Showing posts with label privacy. Show all posts

Sunday, July 7, 2013

Graphing metadata


One element of the NSA revelations of the past month is the apparent fact that the NSA's PRISM program enables the agency to collect wholesale the transactions that occur on the Internet, including email header information. This follows the revelation that all metadata for phone calls made on the Verizon network (and presumably others) have been collected for a period of time -- perhaps as long as seven years. A story by John Naughton in the Guardian (7 July 2013) highlights the intrusiveness that wholesale exposure of communications metadata creates for each individual's privacy -- without ever looking into the contents of the communications.

An MIT research group has created a tool called Immersion to allow anyone to graph his or her own email communications network (link). This is a very powerful tool, and it is accessible to all of us. So it is an experiment well worth undertaking.

Here are the results for my own case. In the past three years I've transacted something like 60,000 email exchanges. The Immersion tool is able to analyze the metadata of this set of exchanges in a few minutes, and the resulting graphs are stunning. The vast and apparently disorderly library of messages turns out to have a very simple and revealing order. (I've removed the individual names from the graph, but these names are provided by the Immersion tool. An "analyst" would be able to focus in on any point of interest by name.) Here is the graph for the past three years of my communications:


It is important to understand the inputs to this analysis. Only the data from my own email library has been used. Lots of connections are represented in the graph between other people. But these connections all come from my own email cache -- as cc's on messages to or from me. Second, I am not represented on the graph. Rather, each circle is one of the 250 or so people with whom I have had email contact in the past three years. Each circle's diameter represents the volume of flow between that person and me; and the breadth of the links between individuals represents the volume of flow between them as registered in my email cache. (The program apparently has a smart way of excluding messages from organizations like newsletters.)
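To make the inputs concrete, here is a minimal sketch of how a graph of this kind could be assembled from the headers of a single mailbox. It is not Immersion's actual code; the message format, the `build_graph` function, and the sample addresses are all hypothetical. Each person's weight counts the messages they appear on (circle size), and each pair's weight counts the messages on which both appear (link breadth). The mailbox owner is left off the graph, as described above.

```python
from collections import Counter
from itertools import combinations


def build_graph(messages, owner):
    """Count per-person and per-pair message volume from one mailbox.

    `messages` is a hypothetical list of dicts with "from", "to" and
    optional "cc" keys; only header metadata is used, never content.
    """
    node_weight = Counter()  # person -> messages shared with the owner
    edge_weight = Counter()  # (person, person) -> messages naming both
    for msg in messages:
        people = {msg["from"], *msg["to"], *msg.get("cc", [])}
        people.discard(owner)  # the owner is not drawn on the graph
        for person in people:
            node_weight[person] += 1
        # Every pair named on the same message counts as one link.
        for a, b in combinations(sorted(people), 2):
            edge_weight[(a, b)] += 1
    return node_weight, edge_weight


# Made-up sample mailbox belonging to "me".
SAMPLE = [
    {"from": "me", "to": ["alice"], "cc": ["bob"]},
    {"from": "alice", "to": ["me"], "cc": ["bob", "carol"]},
    {"from": "dave", "to": ["me"]},
]

nodes, edges = build_graph(SAMPLE, owner="me")
```

In this toy mailbox, "dave" ends up as a single dot with no links, while "alice" and "bob" are joined by a link even though neither of their own mailboxes was consulted.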

It is evident that the graph simplifies my social network substantially (as captured by my email traffic). In fact, it is possible to dissect the graph into several key core groups: the local administrators at the campus where I work, the administrators at the system level of the university, a group of family members, a small group of researchers with whom I communicate and who communicate with each other, a journal editorial group, and a non-profit organization I'm involved in. (There are also students on the graph, but they are single dots, since my email does not provide evidence of email connections among them. Likewise, there are other single-point researchers with whom I communicate, shown as the blue dots at the lower right, but my cache doesn't indicate communications among them.) So here is the same graph with labels for the distinct groups:


It is interesting to observe that there are lots of inter-group communications between members of the two administrative groups, but the two groups are clearly distinguished nonetheless on the graph. The intra-group communication for each is dense enough to pull the two systems apart. This algorithmic network analysis, based on a very limited set of data, accurately uncovers the organizational structure and hierarchy of the university and the campus administrative groups. Also of interest is the fact that the graph does a decent job of analyzing other people's relationships -- without any data from them!
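One simple way to see how dense intra-group traffic can separate two groups that also talk to each other is to ignore weak links and then collect whatever remains connected. The sketch below does exactly that. It is only an illustration under stated assumptions: I do not know what algorithm Immersion uses, and the `groups` function, the threshold, and the edge weights are all invented for the example.

```python
from collections import defaultdict


def groups(edge_weight, min_weight):
    """Return clusters of people joined by links of at least `min_weight`.

    People whose links all fall below the threshold are not reported;
    they would appear as single dots on the graph.
    """
    adjacency = defaultdict(set)
    for (a, b), weight in edge_weight.items():
        if weight >= min_weight:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen = set()
    clusters = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first walk over the strong links reachable from `start`.
        component = set()
        stack = [start]
        while stack:
            person = stack.pop()
            if person in component:
                continue
            component.add(person)
            stack.extend(adjacency[person] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Invented weights: heavy traffic inside each group, light traffic across.
EDGES = {
    ("campus_a", "campus_b"): 40,
    ("campus_b", "campus_c"): 35,
    ("campus_a", "campus_c"): 30,
    ("system_x", "system_y"): 45,
    ("system_y", "system_z"): 38,
    ("campus_a", "system_x"): 6,
    ("campus_c", "system_z"): 4,
}

clusters = groups(EDGES, min_weight=10)
merged = groups(EDGES, min_weight=1)
```

With the threshold at 10 the campus and system groups come out as two separate clusters; with the threshold at 1 the cross-group messages are enough to merge them into one.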

In other words -- 60,000 messages boil down to a pretty simple and accurate picture of one person's activities over the past three years. 

Another facility that the Immersion tool provides is the ability to graph the recorded connections from any single person to everyone with whom he/she has communicated (out of my list of persons). The large purple circle in the first graph is the chief academic officer on my campus, who reports to me. This is the person with whom I communicate most frequently within the university. (By looking at the month-to-month frequency of communication, also provided by Immersion, it is also possible to single out the months where crises arose.) It is possible to blow up the circle of this individual's network as well, and see which individuals are most frequent connections (again, as recorded by cc's and forwards of messages from other people included in my own in-box). Here is the graph of the CAO's network:


Now think how much more informative these graphs would be if an agency also had access to the email caches of the two hundred or so people represented here. Communications between X and Y that are minor or non-existent on my graph may turn out to be substantial when this larger set of data is incorporated. And this is exactly the capability that most observers are now attributing to the PRISM and telephone capture programs of the NSA.

What this implies to me is that the people who are most worried about the NSA's wholesale data collection programs are probably right. The data being collected (phone and email metadata, and perhaps license plate reader records, credit card transactions, and electronic toll and transit card records as well) suffices to map out our personal lives in much, much greater detail than we would ever have imagined possible in a democratic society. We no longer have the luxury of "privacy through anonymity" or "haystack privacy". Once our electronic transactions are fully recorded over time (including, remember, locational data from our phones), our lives can be played back in detail at any point. And the Immersion tool shows that the software exists to make sense of such vast databases, enabling agencies to produce customized and individually tailored "dossiers" for all of us.

The revelations about the FISA court in today's New York Times certainly support skepticism about the notion that these capabilities are carefully and properly controlled by the FISA process (link). Here is a particularly worrisome finding in the NYT article:
The officials said one central concept connects a number of the court’s opinions. The judges have concluded that the mere collection of enormous volumes of “metadata” — facts like the time of phone calls and the numbers dialed, but not the content of conversations — does not violate the Fourth Amendment, as long as the government establishes a valid reason under national security regulations before taking the next step of actually examining the contents of an American’s communications.
This legal theory specifically permits the kind of wholesale collection of metadata that can be used in the ways described here. And that is surely a profound threat to our privacy.


Friday, April 15, 2011

Academic freedom and faculty email

There have been several efforts recently by partisan groups in Michigan and Wisconsin to gain access to faculty email messages on subjects that fall within the scope of the faculty member's research or personal political opinions. These groups have made use of state Freedom of Information laws, on the basis that faculty members are "state officials" and their communications are therefore "business records" of the university.

This is an alarming intrusion into the zone of academic and personal freedom of the faculty member, and it threatens to create a chilling effect on the faculty member's ability to freely communicate his or her ideas with colleagues without fear of retaliation or punishment, or premature disclosure of ideas not yet fully developed. It is vital that universities think very carefully about these issues before complying.

Once a scholar's ideas are published, they are in the public forum and are readily available to anyone who is interested, including the partisan groups who are now attempting to gain access to private emails. But before the scholar chooses to publish her ideas, she needs and deserves to have a zone of private conversation and expression through which she can test and refine them. This is part of being a human being. It is a key reason why academic freedom is so important: it allows the free expression and refinement of ideas through intellectual interaction. And being able to control the publicity or privacy of one's thoughts is essential for this process, and is very close to being a human right.

So accepting the principle that a faculty member at a public university is a state official and his/her communications about research ideas or social and political opinions are "business records" represents a huge erosion of academic and personal freedom. Academic freedom requires a zone of untrammeled private expression and discussion through which the individual can develop and refine her ideas.

If public universities are to be successful in maintaining their commitment to academic freedom for their faculty, they need to draw a bright line between business records and intellectual, critical, and creative documents. Freedom of information laws pertain to the former but should not require disclosure of the latter.

So what is the distinction? Here is one way of drawing it. Business activities have to do with decision making about material issues within the organization. They have to do with concrete decisions involving such issues as purchasing, contracting, personnel decisions, hiring, and other material administrative actions. Intellectual, critical, and creative documents are those that express the faculty member's ideas, thoughts, judgments, and hypotheses about subjects of interest. Transparency about business deliberations and decisions is essential in order to prevent conflict of interest, favoritism, and other improper business activities within any institution. But "intellectual, critical, and creative documents" fall outside the scope of business activity, and their privacy should be protected.

The argument is sometimes made that professors are hired to think and do research; therefore their writings, even in email, are part of their employment work; therefore these writings are business records. But this line of thought is incorrect. The faculty member is hired to teach courses. An expectation of their work is that they will be active intellectuals and scholars. They will exercise their talents, it is expected, in an autonomous and self-directed way, to arrive at their own original results. But the content and product of their intellectual works are not themselves paid work products. Evidence of this, in part, is the fact that the university does not claim ownership of the copyright on faculty writings. Originality, autonomy, independence, and creativity are key to intellectual work, including faculty work. And this in turn underlines the importance of the zone of privacy within the context of which their intellectual and personal thinking and expression take place.

There are extreme and untenable results that ensue if you take this paradigm to its limit. Right now the FOIA requests are for a range of emails delimited by a list of keywords. But if the principle is accepted that the faculty member's intellectual products, in whatever form, are a business record, then the same holds for preliminary drafts of scholarly work, laboratory notebooks, and the jottings of a creative writing professor in preparation of a short story or novel. All these products ultimately lead to a research result, which is a part of the expectations of the faculty member's work. And therefore, by this paradigm, they would be discoverable. It would be possible to FOIA a poet working for a public university to make available preliminary drafts of a poem. Likewise, paintings, drawings, and sculptures are the work of faculty in the arts. By this same principle, it is hard to see a basis for denying a FOIA request for drawings, sketches, and clay models.

On the subject of the expression of political and social opinions, observations, and judgments: Clearly this set of ideas and expressions by the faculty member does not fall within the scope of faculty employment under any description. The university does not hire faculty members to have political opinions. Rather, as citizens they may or may not have such opinions, and it is entirely within their rights to hold and express them. Further, neither the state nor the university has a right to, or an interest in, surveilling, criticizing, or delimiting their expressions of political opinion. So any emails that are primarily expressive of political opinions or judgments are not part of their work, do not have business content, and should not be provided in response to a FOIA request, even though they are expressed by a faculty member hired by a public university.

These FOIA requests, it should be noted, do not depend on the issue of whether the email account is owned by the university or is a private account. FOIA requires university officials to provide emails that have business content relevant to a particular subject, without regard to the platform on which these messages were transmitted. If the judgment were to stand that faculty members' intellectual products are in fact business records, then it wouldn't matter whether they are expressed through an email account owned by the university or through a private account.

Background

Here is a story in TPM about the Mackinac Center FOIA request in Michigan (link). Here is a summary of Michigan's FOIA law. Here is a posting from Inside Higher Ed that describes the decisions the University of Wisconsin administration made with respect to requests for some of history professor William Cronon's email (link). And here is a thoughtful piece from the Center for Free Speech on Campus on the issues (link).