Social_Software: June 2008 Archive Page
The Petabyte Age is different because more is different. Kilobytes were stored on floppy disks. Megabytes were stored on hard disks. Terabytes were stored in disk arrays. Petabytes are stored in the cloud. As we moved along that progression, we went from the folder analogy to the file cabinet analogy to the library analogy to -- well, at petabytes we ran out of organizational analogies.
At the petabyte scale, information is not a matter of simple three- and four-dimensional taxonomy and order but of dimensionally agnostic statistics. It calls for an entirely different approach, one that requires us to lose the tether of data as something that can be visualized in its totality. It forces us to view data mathematically first and establish a context for it later. For instance, Google conquered the advertising world with nothing more than applied mathematics. It didn't pretend to know anything about the culture and conventions of advertising -- it just assumed that better data, with better analytical tools, would win the day. And Google was right.
Google's founding philosophy is that we don't know why this page is better than that one: If the statistics of incoming links say it is, that's good enough. No semantic or causal analysis is required.
[...]
This is a world where massive amounts of data and applied mathematics replace every other tool that might be brought to bear. Out with every theory of human behavior, from linguistics to sociology. Forget taxonomy, ontology, and psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity.
Collaborative Authorship Made Easy
The benefits for collaborative writing should be obvious. Wikis allow multiple authors to edit a text easily. While the video doesn't discuss it, wikis include tracking information so anyone can look at who makes changes to the texts and compare the different versions at different points in its creation. Try to do that with a collaborative paper written in Word.
"What we found was that students using social networking sites are actually practicing the kinds of 21st century skills we want them to develop to be successful today," said Christine Greenhow, a learning technologies researcher in the university's College of Education and Human Development and principal investigator of the study. "Students are developing a positive attitude towards using technology systems, editing and customizing content and thinking about online design and layout. They're also sharing creative original work like poetry and film and practicing safe and responsible use of information and technology. The Web sites offer tremendous educational potential."Greenhow said that the study's results, while proving that social networking sites offer more than just social fulfillment or professional networking, also have implications for educators, who now have a vast opportunity to support what students are learning on the Web sites.
"Now that we know what skills students are learning and what experiences they're being exposed to, we can help foster and extend those skills," said Greenhow. "As educators, we always want to know where our students are coming from and what they're interested in so we can build on that in our teaching. By understanding how students may be positively using these networking technologies in their daily lives and where the as yet unrecognized educational opportunities are, we can help make schools even more relevant, connected and meaningful to kids."
Interestingly, researchers found that very few students in the study were actually aware of the academic and professional networking opportunities that the Web sites provide. Making this opportunity more known to students, Greenhow said, is just one way that educators can work with students and their experiences on social networking sites.
According to the NYT, the person who updated the Wikipedia entry 40 minutes before NBC reported it worked at Internet Broadcasting Services, a company that provides web services to TV stations including NBC affiliates. IBS says a "junior-level employee" changed the Wikipedia entry to reflect Russert's death because he or she thought it was common knowledge. When NBC discovered the entry--and freaked out about it--someone else at IBS deleted the date of Russert's death and changed all of the verb tenses back. And then IBS took care of the employee. NYT:
An I.B.S. spokeswoman...added that the company had "taken the necessary measures with the employee and apologized to NBC." NBC News said it was told the employee was fired.
Fired?
If the employee learned the news because NBC was officially distributing it to affiliates under embargo, that's one thing (the firing would be appropriate). If the employee heard about it unofficially, however, from friends at NBC or I.B.S., then the firing was outrageous.* UPDATE: An NBC exec disputes the NYT report, and says the IBS employee was merely suspended, temporarily. We'll update if we can confirm.
It's one thing for a news organization to decide to delay reporting news of a staffer's death out of deference to his or her family (this makes sense). It's another for the organization to expect other organizations to follow the same policy. And it is yet another thing for someone to deliberately strike accurate facts from a collective record to appease an upset client, which is what someone at IBS apparently did.
Does anybody remember that Facebook thing?
Since the rise of social networking sites, the typical SHU student blog has gotten more academic, since the students who are interested in developing their online identity and relationships already have several well-populated choices.
I only joined Facebook a few months ago. I've connected with a few old friends and people I know from conferences. I envisioned that a handful of students would "friend" me out of pity, but I found myself welcomed fairly quickly.
Yet I'm surprised at how relatively dead Facebook is this summer. I guess when there aren't that many shared real-world events to plan, reflect and post pictures about, there's not much point in visiting Facebook. Stuart Turton on PC Pro has some similar reactions:
Facebook was a shorthand for my life - "here's who he is, what's he's doing and how he did it" condensed onto one page for your pleasure. Old conversational gambits were suddenly redundant, nobody ever had to ask "what you've been up to this week", because you knew and more so, you know exactly what I was thinking about it "Stuart is bored, Stuart is confounded, Stuart is wondering just why he is writing this."
In the end, the novelty has worn off. I don't think Facebook is any less useful than it was, but the novelty of being in my friend's pockets 24/7 has worn off for me. And presumably for them too. So, we're back on email and mobile. We make plans in the pub and dissect the resulting carnage over dinner. I'm won't close my Facebook account, that would take effort that I don't quite have the will to put in.
Hypertext '08: Session 7: Applications of Hypertext
Chair: Ken Anderson (University of Colorado at Boulder, USA)
Enhancing Access to Open Corpus Educational Content: Learning in the Wild (Long Paper)
Seamus Lawless, Lucy Hederman and Vincent Wade
Trends -- content creation moving from the linear authoring of publication to the aggregation of existing content; rise of the prosumers, who produce and consume content in increasing volumes.
WWW already holds content useful for incorporation in eLearning options, but the issues of content discovery, repurposing, mean that even WWW content isn't an easy solution.
Address the reliance of eLearning systems on bespoke proprietary content. Open content availability reduces the need for educators to reinvent the wheel every time they create a course. Addresses the information overload in eLearning experiences. Help students identify what is actually relevant to them in their specific educational context.
Open Corpus Content Service -- OCCS -- WWW and selected digital content repositories. Discovery and harvesting of content -- open-source web crawler, JTCL and Rainbow classification. Indexing with NutchWAX. Visualization -- didn't catch the acronym.
Train the Rainbow text classifier - this dictates what gets included in the cache of content.
[My humanities-trained mind is crying out for examples! I'm putting a lot of conceptual information in temporary storage caches, but the buffer is running out of room. The speaker is actually very good -- but I'm waiting for the payoff that I'm conditioned to expect a humanities presenter would have started to deliver by now. I'm learning just how important the little chart with inputs and outputs is as a convention in scholarly presentations in this genre. We're spending a lot of time on the left-most edge of a rich flowchart that I gather will start moving across the page... we're still on "Training." There we go, now we've got the "Crawling" section. Steve sitting next to me is looking up terms the speaker is using... I'm not yet sure I need to put that level of new information in my neural net until I've seen what it all adds up to.]
Okay -- now we're being walked through an example crawl --
The educator prepares the crawl by identifying the subject area, with seed file generation and training set generation. Ran for almost 2 days, found 370,000+ URLs, passed some 67,000 on for further processing, judged 36,000 at 90%. Had human subject matter experts evaluate the returned content to find out how well the computer's predictions matched the human expert decisions.
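To put the crawl figures in proportion, here is a small worked calculation. It uses only the rounded numbers recorded above, and it assumes that "judged 36,000 at 90%" means 36,000 documents were classified as relevant at a 90% confidence level; the notes do not spell that out, so treat the labels as a guess.

```python
# Rounded figures from the talk notes; the stage labels are assumptions.
found = 370_000      # URLs discovered by the crawl
passed = 67_000      # URLs passed on for further processing
confident = 36_000   # documents "judged at 90%" (assumed: 90% confidence)

def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

print("passed on / found:     ", pct(passed, found), "%")
print("confident / passed on: ", pct(confident, passed), "%")
print("confident / found:     ", pct(confident, found), "%")
```

On these rounded numbers, roughly one discovered URL in ten ends up in the high-confidence set.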
[Drat... at this point Seamus says he's not going into heavy detail -- yet this is exactly the content I was waiting for. This is the material I'd like to have seen so that I understand what the system is designed to do, but it's what he rushed through because he judged it as less important. Steve just shut his laptop. Coincidence?]
U-CREATe interface integrates a link to OCCS.
[Okay... I think I've finally made the phase shift. I came to this talk expecting to read a book. Instead, I got a very meticulous description of new tools for constructing books. Or, to pick a different metaphor, I came expecting to watch a dance, and I got a detailed analysis of how muscles work on the cellular level. Now we're getting usability results -- the convention of the scientific research paper is to deliver the conclusion last, but humanities papers start with the thesis (the answer to the research question).]
Social Web Applications in the City: A Lightweight Infrastructure for Urban Computing (Short Paper)
Frank Allan Hansen and Kaj Grønbæk
Allan says his work focuses on linking physical places. How to do digital physical linking using 2D barcodes. Present programs built with this infrastructure.
Background -- trying to use ubiquitous hypermedia to support urban web applications -- want to let users browse and create and share information while they are mobile in an urban environment. Not just browsing, but browsing information related to the urban environment where they are.
Anchor information in the physical world; identify aspects of the physical world that we can use to anchor our links. GPS offers one sensor useful for anchoring links.
Ubiquitous link anchors: ID Mapping. Not a static model; we specify an anchor value and the system finds resources that match that anchor value. The 2D barcodes [a pattern of squares, not bars -- that name 2D barcode seems oxymoronic -- new to me, but an established term.] provide a visual anchor for the link. A URL can be converted into the 2D barcode, scanned by a cell phone, and used to deliver a resource.
Examples... TagBlogger -- 2007 Aarhus festival, lets users access official location-sensitive information; create and share digital overlays. Had to develop the software and deploy 2D codes in the city. Tags on official festival posters; also tags along a route [pedestrian, I presume].
[I wonder... did the barcodes get vandalized? At any rate, sounds like an interesting project, and far more workable than the old CueCat debacle, which would have required people to carry a specialized device around and tether it to a computer.]
A State of the Art Survey of Soft Skill Simulation Authoring Tools (Short Paper)
Conor Gaffney, Declan Dagger and Vincent Wade
Conor presents. There are physical simulations (you learn about the physical object); procedural simulations (flying an airplane); soft skills (take place in a social context, based on interpersonal relationships). Sales, interviewing, leadership.
Typically you get a short clip of the person being simulated, the learner takes on a role within the simulation, and learns by doing.
The simulations are cost-effective, convenient if online, safe, and educationally effective.
Demo [Thank you for giving the example this early!]
Teaching psychiatric medical students how to deal with patients (this is PARRY for the 21stC). Looks like the same mechanism for adaptive tree fiction.
Difficulty with soft-skill simulations -- difficult to compose. Not only the complexity of the dialog; it also has to be an educationally sound simulation. [I wonder if anyone in our family therapy program would be interested in a tool like this... Obviously I'm interested in the ability to create a model interview for journalists, but the mechanisms will likely be similar to what a family therapist might face.]
VISIOn Composition Tool; Experience Builder; Captivate 3.
- VISIOn seems to be an outliner.
- Experience Builder is accessible through a web browser.
- Captivate 3 is the most typical type of composition tool out there... approach is more towards hard skills, with limited soft skill applications. No back links -- an artifact of the procedural hard-skill origins of the tool.
Will present more about the ActSIM composition tool.
Mark B - the sentimental novel is intended to teach us how to act and behave in certain situations. How does an environment devoted to writing hypertext differ from an environment designed to teach soft skills?
Information Flows and Social Capital in Weblogs: A Case Study in the Brazilian Blogosphere (Long Paper)
Qualitative study. Perception is that bloggers are just wasting time, but people have strong personal reasons for blogging. Went quickly through the obligatory background slide... I wonder that this audience might include so many quantitative researchers that she might have spent a bit of time explaining more about ethnography. Again, I'm used to scholarship with a long discursive introduction, so I always feel out of place when presenters rush through their introduction. I'm generally far more interested in the related research and the motivations for study than in the mechanics of the model, but that's a feature of my disciplinary training.
Ethnographic study of very personal connections in a small web network of Brazilian bloggers. Motives for blogging include creating personal identity, social interaction. Popularity is a strong draw; getting more comments, being the center of a network; a blog is a "publicity strategy."
Age range 15-50 years. Some 32 of [did she say 40 some?] bloggers in the community responded. Tracked "interaction memes" (everyone does it; publish the meme to belong) and "informational memes" (an opportunity to create authority and popularity by being the first to post a meme).
Interaction memes -- send a questionnaire or the equivalent of a chain letter, bond with your group by answering these questions creatively.
This is different from publishing information that there's a new online journal or YouTube link -- these kinds of links aren't repeated.
Interactional memes are connected to creating a personal space. Informational memes are connected to creating authority and knowledge. What social capital does the blogger want?
Interactional memes -- visibility, interaction, social support. (Relatively more emphasis on maintaining existing ties.) [This is about modding and mutating the meme, so that it maintains its novelty, not passing it along.]
Informational memes -- visibility, reputation, popularity, authority. Bridging (creating new ties) rather than maintaining and strengthening existing ones. [It's likely that bloggers who regularly come up with new ideas probably have at least some "long" connections with people who aren't tightly connected within their groups.]
Making Revisions Hyper-Visible (Short Paper)
David Kolb
14 years ago, published "Socrates in the Labyrinth." How do you revise a hypertext? Mentioned some philosophers who published retractions and revisions; scholars publish both versions. Notes that Auden and other poets revised their works when collecting them for many reasons, both internal and external.
Revising literary works and revising expository or argumentative works. Consider that Joyce revised "afternoon, a story" -- if you mark them they seem like part of the text. There are very few reasons to emphasize revisions in a literary hypertext. In an argumentative work, you might make those revisions and the reasons for them explicit.
Not just the revised text, but also the meta-comments about the work.
Print -- you have two volumes, with the later one footnoting the earlier one. The new version generally replaces the old version, since print operates on an economy of scarcity. Hypertext has an economy of abundance. Wikipedia and Word hide the revisions. In hypertext, you will link the old and new versions. You could leave the old structure and add notes. But a significant update would include new links; the revision will embrace the original (or large parts thereof) but add complexity.
Revision of an argumentative hypertext will lead to a new hypertext with a more elaborate link pattern. [I'm following this closely because I'm working on the development of the map to Colossal Cave Adventure, and all this talk about nodes and paths is sparking lots of ideas.]
Why revise hyper-visibly? Helps scholars clarify what was meant; helps readers identify the changes; helps readers judge whether the changes are useful; provides more chance for the author and reader to think together about the issues.
Audience comment: This is a subset of a more general problem -- we don't have rich enough object models. If the objects were all accessible in versions, this problem would go away. [I can't help but think again of the variable implementations of Douglas Adams' H2G2 -- TV show, radio play, IF game, movie...]
Audience comment: When we change words we often intend to change the whole work [but the example was poetry, rather than David's example of expository.]
We're All Stars Now: Reality Television, Web 2.0, and Mediated Identities (Short Paper)
Michael A. Stefanone and Derek Lackaff
Derek began by echoing Raquel's paper. Why would someone post the cursed rabbit confessional meme? What happens to identity when it gets mediated? Invoked the post-corporeality promised by Turkle and others. [I'm reminded of My Tiny Life, where Dibbell notes that the best writers got the most virtual "action" -- while people were no longer limited by their bodies, they were, in a textual environment, defined by their ability to write. I think it was insightful for a writer like Dibbell to perceive that a world that doles out rewards according to writing talent is really no more fair than a world that rewards looks or riches.]
Reality TV recently voted 2nd worst invention of all time, but it's very popular. Rise of Web 2.0 represents ability of people to participate. [I note that "youtubing" has entered the lexicon... ]
Observational learning -- requires a model, a learnable behavior, and a context that conduces people to model behavior. [Reminds me of the Frontline video, "Merchants of Cool," that tracks trends through the various forces that combine to manage what the "mooks" and "midriffs" of the world think are cool.]
Hypotheses -- Reality TV consumption related to time spent on social networking sites, breadth of networks including online-only friends, and photos shared online. Asked participants to self-report.
Watching TV news, fiction, or documentaries has little effect on network size, connectedness, or photo sharing; the rate of watching Reality TV is significant.
Takeaway - we have empirical links between traditional media consumption (watching TV) and the "really cool things that are going on online." Definite change in the understanding of social space. People talking about the social networks that they're part of in new ways. Having an identity online is increasingly banal.
Look at specific media genres -- not TV as a whole, but what kind of TV is being watched. [The reality TV genre really got its start during a writers' strike in 1989. COPS, America's Funniest Home Videos... also a resurgence of sitcoms based on figures who could provide their own content, such as Roseanne Barr, Tim Allen... probably building on the success of The Cosby Show.]
Future directions -- attention as power, validity of articulate network structures.
Audience comment: Note that professionals and academics put up lots of information about themselves; we do a different kind of self-promotion, but is it really different from youth social networking?
Response: The scale of social networking sites is greater... novel in the scope.
Mark noted that it could be social networking that gets people interested in reality TV.
The Revenge of the Page (Long Paper)
David Kolb
The little paper on revision you heard a little while ago was the paper he had intended to write... the issue of complexity began as a footnote, then became an appendix. The dream was complexly linked hypertexts with long, complicated hyperlinks; patterns of links that demand rereading and contemplation beyond the boundaries of the next link.
Quote from Mark B invoking the concept of complex linking... Moulthrop's Victory Garden. Complex literary effects to be achieved from this idea.
14 years later, "Let's face it, there aren't very many complex hypertexts like that."
Wikipedia's links are all single-step links, going from one self-contained mini-essay to another; links are "you want more information? Here's some more."
Reality: Google Analytics looks at Kolb's own example of a complex hypertext: Kolb's Sprawling Places. [I have got to follow up on this for my work on Colossal Cave.]
Kolb notes that Google Images is sending most of his visitors attracted by words in photo captions. Almost nobody visited a large number of pages. Most people navigate through the site by clicking the menu bars rather than the inline links.
A trivial number of people encountered his text in the way he hoped reading would develop. Does it make sense to continue to support the idea of expository and argumentative texts with complex linking patterns?
There are some assertions that can't be made well in a single page; understanding of some concepts requires complexity. [I would add that complex sites can also meet the needs of multiple users, giving newbies a way to explore unfamiliar terms, and advanced users more depth, generalists more breadth, etc.]
The page metaphor -- we expect a page to contain a little mini self-contained essay. We browse things we expect to be relatively self-sufficient. Web-writing tools are optimized for the creation of pages. The link becomes the link between pages rather than part of a chain of links.
But there's a deeper reason. Node and link hypertext itself is one node at a time. We expect one node to replace the other. Maybe we need to do more than we've done if we want complexity. Maybe hypertext is more than nodes and links.
Collage/montage? Make the individual pages more complex. You could use the collage effect of a page to create complexity within the page. Pages are becoming more than pages -- embedded rich media.
You might also make more than one node visible. You can have a web page spawn another window, but that's seldom done.
Replace complexity of linkage with complexity of spatial juxtaposition. [That's a return to the model of the highly annotated illuminated medieval manuscript.]
More sophistication in the relationship between text and graphics. Images aren't simple illustration. [That's an interesting connection to the idea about links... an image that merely illustrates is like a Wikipedia link that simply offers more information. A link can also offer an alternative opinion, provide context, refute opposition, etc.]
layertennis.com -- color commentary on two graphic artists competing with each other to generate images in the same file on different layers. The play of images and text is a way to bring complexity into web habits of reading.
Audience comment: Shocked that the invention of the web browser is a done deal and there's not much else to do with hypertext. The web browser chains hypertexts in the same way that the book when it was first invented was chained to the wall. [But hold on... the illuminated medieval manuscript was chained to the wall because the value of the labor and materials that went into the production of that book was probably higher than the value of the building to which the book was chained. How does the "browser as chained book" metaphor map to the present information economy? The pen that's chained to the desk in the bank isn't there to prevent people from writing, it's there so that people who are in the bank can count on having a pen there for them to use. I don't see the chained book reference hanging together beyond a surface analogy that the medium of the browser is like a chain, but the chained public book was chained so that more people could consult the book and not hide it in their private collection.]
The Very Small World of the Well-Connected (Long Paper)
Xiaolin Shi, Matthew Bonner, Lada Adamic and Anna Gilbert
Vertex-Importance Graph Synopsis. (Promises a definition!)
Opening image -- "Network or Hairball?" Huge networks are difficult to study and share. To shrink or summarize a network, you create a subgraph of vertices you decide are important. Study these important vertices, and compare their behavior to the rest of the graph.
Degree, betweenness, closeness, PageRank. He's spending time describing in detail all the importance measures. Assortativity vs disassortativity... [The graphs of "betweenness" and "PageRank" look very similar, which I gather is because Google bases its linking algorithm on betweenness.]
Demonstrated how subgraphs can differ greatly based on which importance factor your subgraph selects for. [To translate into my discipline, I suppose this is a close reading of a small portion of a text, by blocking out all the information you don't want to find.]
Fewer than 10% of the nodes are needed before the subgraph results look very close to the overall dataset.
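The idea of shrinking a network to its "important" vertices can be sketched in a few lines. This is only an illustration under stated assumptions: it uses degree as the importance measure (one of the four the talk listed), and the toy edge list and function names are hypothetical, not taken from the paper.

```python
from collections import defaultdict

def build_graph(edges):
    """Return an undirected adjacency map {vertex: set(neighbors)}."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)

def top_by_degree(adj, fraction):
    """Pick the highest-degree vertices, keeping at least one."""
    k = max(1, int(len(adj) * fraction))
    # Sort by descending degree; break ties by name so results are repeatable.
    ranked = sorted(adj, key=lambda v: (-len(adj[v]), str(v)))
    return set(ranked[:k])

def induced_subgraph(adj, keep):
    """Keep only the chosen vertices and the edges among them."""
    return {v: adj[v] & keep for v in keep}

# Hypothetical toy network: two hubs ("a" and "b") plus low-degree vertices.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("a", "e"), ("i", "a"),
         ("b", "f"), ("b", "g"), ("b", "h"), ("j", "b"), ("c", "d")]
adj = build_graph(edges)
important = top_by_degree(adj, 0.2)          # keep the top 20% of vertices
synopsis = induced_subgraph(adj, important)
print(sorted(important))
print({v: sorted(n) for v, n in sorted(synopsis.items())})
```

Swapping the sort key for a different measure (betweenness, closeness, PageRank) would select a different subgraph, which is the point the talk made about importance measures disagreeing.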
This is the third time the presenter has apologized because a dotted red line is invisible in the slides being projected. His oral explanation of what the lines should signify is clear enough, I gather, though he's referring to special kinds of graphs that network researchers (not I) would be expected to understand.
A blog aggregator in his dataset was connecting to many, many nodes, not just the most important nodes, so there's a notable anomaly in degree. [I'm actually interested in this, though he's mentioned it only as a footnote and offered to say more on it during the Q & A if desired.]
["NP-complete: reducible to Steiner..." nope, the slide changed even before I could write this down as an example of a statement that's perfectly straightforward but requires background knowledge that I would need to look up. I missed what "Keep One" and "Keep All" mean.]
[What I'm getting out of this is a mathematical illustration of the thesis that it is possible to gain useful, accurate information from a study of a subset within a dataset, and that other factors (which are over my head) seem to control for whatever bias you introduce when you select a set of nodes for a particular characteristic. Unlike the "close reading" metaphor I used earlier, if you know what you're doing, you can select a "synopsis" that emphasizes the features you wish to study, and identify how closely your subset matches the overall dataset's similar features.]
An Epistemic Dynamic Model for Tagging Systems (Long Paper)
Klaas Dellschaft and Steffen Staab
Klaas presented, beginning with a basic introduction of tagging [probably not necessary for this audience, but it was brief... and I don't mind an introduction that gives me a minute or so to adjust to a new speaker's cadence, accent, and relationship to the visuals.]
How do users influence each other in tagging systems?
User interface, user brain, background knowledge, and "something else" all influence tagging behavior. We can only observe the tagging behavior [well, we can learn something about the user interface, too, but I think that comment is intended to contrast with the inscrutability of the brain.]
Klaas introduces folksonomy [again I would assume most in the audience already know the term, but I'm always interested in how individual scholars define their terms... this is necessary in the humanities, since we don't always have empirical evidence to help us define our terms.]
Presented sample of coding mechanism "co-occurrence stream." Measured how many occurrences, how many users, how many different texts, and how many resources.
After 100 occurrences of a term, you have about 50 texts. The number of distinct texts is growing but the rate of growth decreases... it's a nice logarithmic scale showing that each time the tag appears again, it is more likely that the tag is used on an existing text, rather than a new text.
The frequency of texts is inverted -- The most often used text has a 4-5% probability, text #100 was 0.1% probability. [Did he describe that as "fat tail distribution"? I assume that's the opposite of the "long tail"?] Not quite as even as the tag graph, but
Resource streams [I missed something... does this mean he's graphing the channel by which the document is delivered, or by which the tag entered into his dataset, or the timeline of each entry in his dataset regardless of source or other characteristics? He briefly went through some related research, so obviously this is a common concept that I would know about if I knew the literature. The whole idea of measuring the time between the creation of links is an exciting new concept for me, since it blends well with composition pedagogy that considers writing as process rather than product.]
[Equation that I can't type fast enough] Probability of selecting from background knowledge -- probability of selecting word w for topic t modeled by -- [drat the slide changed.]
[Another equation] probability of repeating a previous tag assignment.
Now he's finished presenting the model, and he's moving on to compare the observed behavior with the behavior he modeled.
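The equations were not captured in these notes, so the following is not the presenter's model. It is a minimal simulation sketch that assumes only the two ingredients recorded above: with some probability a tag is drawn from background knowledge, and otherwise a previous tag assignment is repeated. The vocabulary, the Zipf-like weights, the parameter `p_background`, and uniform copying from the stream are all assumptions made for illustration.

```python
import random

def simulate_tag_stream(vocabulary, weights, p_background, n_assignments, seed=0):
    """Generate a stream of tag assignments.

    Each step either samples from background knowledge (assumed here to be a
    fixed weighted vocabulary) or repeats a uniformly chosen earlier assignment.
    """
    rng = random.Random(seed)
    stream = []
    for _ in range(n_assignments):
        if not stream or rng.random() < p_background:
            tag = rng.choices(vocabulary, weights=weights, k=1)[0]
        else:
            tag = rng.choice(stream)  # imitate a previous assignment
        stream.append(tag)
    return stream

def distinct_growth(stream):
    """Number of distinct tags seen after each assignment."""
    seen = set()
    growth = []
    for tag in stream:
        seen.add(tag)
        growth.append(len(seen))
    return growth

# Hypothetical setup: 200 candidate tags with Zipf-like background weights.
vocabulary = [f"tag{i}" for i in range(1, 201)]
weights = [1 / i for i in range(1, 201)]
stream = simulate_tag_stream(vocabulary, weights, p_background=0.3, n_assignments=1000)
growth = distinct_growth(stream)
print("distinct tags after 100 assignments:", growth[99])
print("distinct tags after 1000 assignments:", growth[-1])
```

Comparing the two printed counts gives a rough feel for how the growth of distinct tags behaves under these assumptions; lowering `p_background` makes imitation dominate.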
[I'm starting to pick up on the biological metaphors -- mutating of tag assignments, limited number of texts in the pool. I hadn't realized just how deeply the "virus" metaphor is rooted in this subject matter... Does the metaphor drive the science? Are there patterns that don't already have known biological instances that aren't pursued because there isn't already a vocabulary to describe them? I'm going to have to re-read The Selfish Gene for all the parts that I skimmed over when I was reading it simply to understand "meme".]
After more than 1000 tag assignments, the frequency of assignments is more or less stable. There's a drop around tag number 7 in resource streams. Cited http://isweb.uni-koblenz.de/Research/Tagdataset and http://tagora-project.edu
First question, from the conference host, was whether the user interface affects tagging behavior. [Here I am, thinking that I'm an idiot, and a big-shot followed up on something that dimly appeared to my Humanities brain. I confess I didn't follow the answer but that's my fault -- I was high-fiving myself for having wondered about that earlier.]
Understanding the Efficiency of Social Tagging Systems using Information Theory (Long Paper)
Ed H. Chi and Todd Mytkowicz
Todd presented. Organization of knowledge... the printing press was "a start" at organizing human knowledge. [Well... the codex, the classical form of the argument, rhyme, abstract thought... but I digress.]
Great-- Todd notes that the audience doesn't need an introduction to tagging, but notes the value of communicating the way he frames tagging. We all have individualistic motivations for tagging, but the result is a global knowledge map; the research posits that tags create a lower-dimensional representation of the material the tags describe.
This is a distributive process with hundreds and thousands of users, and trying to infer high-level behavior is challenging. If we sample del.icio.us data, we don't own access to the entire dataset, and therefore don't know whether we're sampling the data accurately. Noted the Measure/Model/Innovate cycle, and indicated roadblocks when you're measuring a dataset that you don't own.
Information theory -- how much information do tags tell us about an underlying document.
Entropy measures the uncertainty in a dataset. A heterogeneous set has maximum entropy at log n, minimum of 0. You can increase entropy by increasing the number of things in your dataset; or you can keep n constant and make data more uniform [did he mean to say less uniform?]
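For what it's worth, the textbook definition of Shannon entropy can be checked in a few lines. The sketch below is not from Todd's slides; the function name and the toy tag lists are made up for illustration. Under that definition, with the number of distinct items n held constant, entropy peaks at log n when every item is equally frequent, falls as one item starts to dominate, and is 0 when only one item appears.

```python
import math
from collections import Counter

def shannon_entropy(items):
    """Shannon entropy (natural log) of the empirical distribution of items."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# Hypothetical toy data: each list stands in for a stream of tag assignments.
uniform = ["a", "b", "c", "d"]                      # four tags, equally frequent
skewed = ["a", "a", "a", "a", "a", "b", "c", "d"]   # same four tags, one dominant
single = ["a", "a", "a", "a"]                       # only one tag ever used

print(shannon_entropy(uniform), math.log(4))  # the two values agree
print(shannon_entropy(skewed))                # lower than log 4
print(shannon_entropy(single))                # zero uncertainty
```

So "more uniform" in the sense of "more evenly spread" does raise entropy, as long as the number of distinct items stays the same.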
[I lost track for a while because my blog was having trouble saving... picking back up]
Overview of the data collected from del.icio.us over about 120 weeks. The number of documents increases rapidly... there have to be many tags for every individual document that comes into the system.
Entropy of the document set is rising... the diversity of the document set is increasing over time. People are creating new content for the system, rather than reinforcing opinions that have already been entered into the system. This is good for del.icio.us in the long term.
Tagging in delicious is both encoding and retrieving. Encoding -- users have some notion of document space and future use; how likely are you to use a document in the future? We're sending a message to ourselves in the future that we're going to have to retrieve. [How interesting -- this concept might be very useful in getting students to think about the value of taking notes in class or annotating their journey through a challenging academic or literary text.]
Around week 80, there's a dramatic stabilization of the entropy curve. The number of tags is increasing over time. People are becoming more and more likely to set up a tag that's already in the document space. [What happened around that time to create that change? Did some blogging tool or some content site like Wired add "delicious" tags to its content?]
Tag "efficiency" is decreasing -- crested around week 40. Tags are becoming less and less significant. The descriptive power of English is limited.
Do people change their behaviors to re-capture some of that efficiency? People are starting to add more tags to each item in response to the navigation pressure. Zipf's principle of least effort. If you are just archiving a document, you probably won't put much effort into the tagging. Putting the notion of tagging into the framework of information theory.
[My observation... the delicious interface changes over time, so the nature of tagging changes over time.]
Rather than delicious suggesting tags others have used, try asking the user to come up with new tags [as Google does in its picture-tagging game].
[During the Q and A I asked whether the drop at week 80 was related to a change in the del.icio.us interface, but Todd says not. They did notice other phenomena that corresponded with changes in the user interface.]
Academics have long talked of the "academic conversation." Now, Web 2.0 has called our bluff. We live in the midst of a non-stop world conversation. But, are conversational skills (in writing) important and, if so, how do we teach them?
Often called a dead genre, interactive fiction continues to flourish long after reaching the end of its commercial lifespan. In the decades since whiz-bang graphics drew away the attention of the masses, hundreds of games have continued to evolve the genre -- to the point where it can be a little intimidating to approach cold. If you've never experienced interactive fiction, or haven't returned to it since its commercial decline, maybe we can offer a little direction. Here are five of our favorite titles from the last decade to ease you into things.
His picks: Lost Pig, Ecdysis, Tales of the Traveling Swordsman, Galatea, and Photopia.
This year's conference emphasizes social linking and its relation to information linking.
A striking slide illustrated the tangled interconnections of online friendships, as opposed to the red and blue nodes that characterize political blogs (with some neutral interconnections). (Backstrom-Huttenlocher-Kleinberg-Lan 2006) and (Adamic and Glance 2005)
Bridging levels of scale; Zachary 1977 studying a university karate club in the process of splitting in two. 34 nodes in the social network, 2 years researching each of these nodes, big chunk of a PhD work to investigate the conflict.
Compare the 30-year-old study of 34 nodes to the kind of information we can get out of massive network datasets -- both more and less information. It's easier to measure, hard to pose nuanced questions about what the connections mean. We can hand-code small subsets, but not on the large scale.
Notes that the goal is not to accumulate huge amounts of data, but rather to find the point where these two lines of research converge.
Intrinsically missing from large-scale studies -- we need to enrich our notion of link structure, so as to be able to talk about more complex, subtle questions.
Social networks aren't static structures -- they are circulatory systems for ideas and information. Tension between the global reach of diffusion and the localized views that most datasets (even massive ones) provide. You can watch one person adopt something and see who's influenced by that, but you can't really watch the first 1000 people to adopt iPhones. How to find a dataset that will let us watch something spread, not just locally in the neighborhood of a node, but on a larger scale.
Example: chain-letter petitions.
Quoted from Vannevar Bush on the associated trails running through documents, and people as trailblazers. A person blazes a trail through a network of documents... a chain letter is a document that blazes a trail through a network of people. The document follows the social structure.
Shows the traditional picture of information propagating along a neat tree. But the full tree is unobservable. But a chain letter petition includes traces -- a signature of all people that this particular incarnation of the chain letter has passed through.
People change the list, changing spellings, or adding joke names, or truncating sections. The analogy to mutational events is computationally very useful. Every kind of genomic mutation that you can imagine.
Showed 20,000 nodes of the Iraq chain letter -- looks like hair. More linear propagation than lateral propagation. Really few branches. (Jon has clarified that we're reconstructing the chain based on what we find... we don't have the whole tree.) Trees for other chain letters have similar structure.
Timing gets you closer to the answer. Viral spreading is implicitly synchronous -- that each branch propagates at basically the same rate. Biological viral infections impose internal schedules, but people don't forward information at the same rate, so different branches will progress at different rates. Notes that there's been productive recent research in the time people take to respond to emails.
Because people within social networks will likely get multiple copies of a chain letter, the relative dates at which people woke up and found multiple copies of the chain letter in their inbox means that the chain letter followed a depth chain -- only one copy was produced from each related group of friends who received it at the same time and got a copy from someone else before they acted on it. LiveJournal friendship networks produce a similar tree shape.
Spatial clusters... even in online social networks, there's a large amount of homophily (you are friends with people who are similar to you).
Opposing influence -- sometimes people act only when they have multiple stimuli (multiple requests to send on the chain letter).
Although you may hear about the chain letter from a remote link, you are not likely to participate until you've heard about it from enough of your closer relationships... so the chain letter only spreads locally, it doesn't jump a great distance to bridge the gap between you and a distant connection who shares few of your friends.
Can we use this information to predict which viral events will lead to cascades? Columbia University "Music Lab" has a leader board. They ran eight parallel versions of the site, letting the universes evolve independently. The leader board feedback leads to inequality. Random symmetry breaking in the beginning gets amplified. Genuinely bad songs didn't get propagated, and good songs weren't totally published; nevertheless there may be an element of inherent unpredictability in the efforts to predict viral success.
Protection of anonymity. Handing data over to researchers with anonymization is "weak to the point of being dangerous." When we have detailed data about people, anonymization runs into trouble. Someone who posts under multiple identities can readily be identified as the same person. [I love the word -- "de-anonymize."]
The implications -- we may not object to our use trails being published without names attached, but as Jon notes it's possible to correlate the anonymized private data (such as Netflix rentals) with public data (such as IMDB ratings).
The attacker of an anonymous network can have more power if they are part of the system. It's not hard to plant yourself in a large network, and leave some kind of privacy-breaching Trojan horse. If you knew the information would be released, it's surprisingly easy to compromise privacy by exploiting the pattern of links. The pattern will be so rich that anything you create will be highly distinguishing and highly findable.
A network of 12 nodes is sufficient to identify within a dataset of 100M nodes. Would it be computationally hard to locate a particular node of 12?
In LiveJournal, you and 6 friends chosen at random can carry out an attack compromising (within the domain of an anonymized data set) 10 users. This can happen even unintentionally -- you and a group of friends likely already have a unique linking structure that compromises a set of people you've already linked to.
A clique is a set of nodes that are all interconnected; an independent set has no links. In a large set, you've got to have either a large clique or a large independent set.
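This sounds like Ramsey's theorem from graph theory (my identification, not something I caught on the slides). The smallest case is easy to verify by brute force: every graph on 6 nodes contains either 3 mutually linked nodes or 3 mutually unlinked nodes, while 5 nodes are not enough. The function names below are made up for this sketch.

```python
from itertools import combinations

def has_clique_or_independent_set_of_3(n, edges):
    """True if some 3 nodes are all linked, or some 3 nodes share no links."""
    for trio in combinations(range(n), 3):
        linked = [frozenset(pair) in edges for pair in combinations(trio, 2)]
        if all(linked) or not any(linked):
            return True
    return False

def every_graph_has_one(n):
    """Brute-force check over every possible graph on n labeled nodes."""
    pairs = [frozenset(p) for p in combinations(range(n), 2)]
    for mask in range(2 ** len(pairs)):
        # Bit i of mask decides whether pair i is an edge in this graph.
        edges = {p for i, p in enumerate(pairs) if (mask >> i) & 1}
        if not has_clique_or_independent_set_of_3(n, edges):
            return False
    return True

five = every_graph_has_one(5)  # False: a 5-node ring is a counterexample
six = every_graph_has_one(6)   # True
print(five, six)
```

The thresholds for larger cliques and independent sets grow very quickly, which is why this kind of exhaustive check only works for the smallest case.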
Takeaway -- one way to prove that something exists is to show that there's a high probability that it will occur in a random structure.
The IM graph is not random, but we're randomizing the subgraph we're targeting. It's very unlikely that your subgraph is in the set of data released. We don't need to randomize to get that signature -- friendship structures already lay the groundwork for this kind of attack.
Reflections on privacy -- social networks are skeletal structures, but they are something that needs to be treated with care. Anonymization of data doesn't really protect users -- only our contractual agreement with the service provider is protecting our information.
t ^ -1.5 describes the likelihood that you will answer a given e-mail in t days. (Barabasi 2005)
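To get a feel for what an exponent of -1.5 implies, here is a small sketch that treats t^-1.5 as a probability density over response delays. The minimum delay `t_min_days` is my assumption (a power law needs a lower cutoff to be a proper distribution); only the exponent comes from the talk. Integrating the density gives the fraction of e-mails still unanswered after t days as (t / t_min)^-0.5.

```python
def fraction_still_unanswered(t_days, t_min_days=1.0, alpha=1.5):
    """Survival function of a power-law delay with density ~ t ** -alpha.

    t_min_days is a hypothetical lower cutoff, not a figure from the talk.
    """
    if t_days <= t_min_days:
        return 1.0
    return (t_days / t_min_days) ** -(alpha - 1)

for t in (1, 4, 25, 100):
    print(t, fraction_still_unanswered(t))
```

With a one-day cutoff, half of the replies would still be outstanding after 4 days and a tenth after 100 days, which is the heavy tail at work: most replies come quickly, but a few take a very long time.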
Toronto Globe and Mail -- MySpace "doesn't just create social network, it anatomizes them. It spreads them out like a digestive tract on the autopsy table." Do we want to know all that we can learn when the guts of our social networks spill out? These relationships have been implicit, but social networking makes those relationships explicit.
Hypertext '08: Social Linking 1: Link Inference
Dynamic Prediction of Communication Flow Using Social Context (Short Paper)
Munmun De Choudhury, Hari Sundaram, Ajita John and Doree Seligmann
Estimate intent to communicate and the associated delay. Using MySpace, successful prediction of intent to communicate. [This section is a review of related work, so the speaker is going quickly through material that's unfamiliar to me... I'm waiting for the statement of the research question and I'm hoping she'll define her terms... aha! Here are some definitions.]
Intent to communicate -- the probability that a person will engage in some communication with a person in her network.
Delay in propagation -- how long it takes for one person to contact a person in her network on a given topic.
Communication context -- the set of attributes that affect communication between two individuals. Has been established that context is dynamic, relationship between messages, past communication behavior of a person, and response patterns of people in the person's neighborhood.
Neighborhood context, topic context, and recipient context. [That was a detailed slide.. I didn't get to process it before she moved on.]
Chart shows people can be categorized as having a strong tendency to send or a strong tendency to receive messages. People can be generators, mediators, and receptors of information. Also categorizing people according to strength of ties - strong or weak, a value that changes depending on what information is being transmitted.
As near as I can tell, the research looked at the past communications between people, and predicted how likely they would communicate with each other on a given topic, and the time delay. I'm not sure what it means to say that their predictions had only 12-15% error rate as opposed to some other measure by which there were 25%-30% of an error rate... how does the control group differ from the group that presented the smaller error rate? Are we looking at a refinement of an existing method to predict future behavior? That would be such a n00b question that I'm not going to ask it. There are limits to my willingness to parade my ignorance.
Correlating User Profiles From Multiple Folksonomies (Long Paper)
Martin Szomszor, Ivan Cantador and Harith Alani
Martin Szomszor: The current Web 2.0 direction is towards the user having different profiles for doing different things. We expose a lot of information about ourselves across various sites. Research suggests that lots of us will have lots of different profiles, to help us do lots of different things. Spoke of the popularity of tagging/folksonomy -- people like expressing themselves through the tags they supply. The way people tag is heavily influenced by the way people around them tag.
People tag in multiple sites that focus on different domains that focus on different tasks. Can we use this behavior to find a fingerprint that can identify a user across all these different profiles? [What's the application of this research? Targeting ads?]
The research searched for userIDs in Flickr and del.icio.us, filtering out the low-activity sites. People often used the same tags on different sites.
A slide noted different kinds of tags likely to be repeated -- dates, activities such as "cooking," and events such as "christmas." As people tag more across more sites, the overlap increases.
Due to the free-form nature of tagging, people aren't always consistent ("podcast" vs "podcasting" or "blog" "blogs" or "blogging") even within the same domain. Filter out the overlaps. Discarding such terms as dates, dealing with misspellings and compound nouns; used Google to do this work for them, since Google will suggest a correction when you mistype a word. Then moves to WordNet and Wikipedia, normalizing terms and looking for synonyms.
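To make the normalization idea concrete, here is a deliberately crude sketch. It is not the researchers' pipeline (they leaned on Google, WordNet and Wikipedia); it just strips a trailing "ing" or "s" and then measures the overlap between two tag sets. The function names, the suffix rules and the sample tags are all my own inventions.

```python
def normalize_tag(tag):
    """Crude normalization: lowercase, then strip a trailing 'ing' or 's'."""
    t = tag.strip().lower()
    if t.endswith("ing") and len(t) > 5:
        t = t[:-3]
        # Undo consonant doubling: "blogging" -> "blogg" -> "blog"
        if len(t) >= 2 and t[-1] == t[-2]:
            t = t[:-1]
    elif t.endswith("s") and not t.endswith("ss") and len(t) > 3:
        t = t[:-1]
    return t

def tag_overlap(tags_a, tags_b):
    """Jaccard overlap between two tag sets after normalization."""
    a = {normalize_tag(t) for t in tags_a}
    b = {normalize_tag(t) for t in tags_b}
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical tag clouds for one person on two sites.
delicious_tags = ["blogging", "podcast", "cooking"]
flickr_tags = ["blogs", "Podcasting", "christmas"]

print(tag_overlap(delicious_tags, flickr_tags))  # 0.5 after normalization
```

Without normalization these two tag sets share nothing; with it, "blogging"/"blogs" and "podcast"/"Podcasting" line up. Rules this blunt also mangle words like "christmas", which is presumably why the real study used dictionaries rather than suffix stripping.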
Little change when uncommon tags were rejected. Biggest change in the results occurs when the dataset is correlated with Wikipedia, checking for acronyms and such.
Compared each individual tag cloud with the group. Most people have a fairly small delicious vocabulary size. Filtering did not really help identify the profiles of users across systems. I'm not sure how to interpret the statistical significance, but it looks like tag clouds are not a terribly reliable way to identify what the researchers feel is the same person using different profiles. Correlating all the tags to Wikipedia did result in an increase in the likelihood that this method would accurately identify the same user on different sites.
Google API was released just after the paper was published; thus there are new tools that help researchers find all their accounts, so there's more data that they can use to re-run the filtering with a better set of data. Other issues -- does "sf" mean "science fiction" or "San Francisco"? A user who uses "second" and "life" as two separate tags is actually referring to the single subject "second life" -- a more accurate study would account for that.
[As you can probably guess from the quality of the notes, this talk was much more accessible to me. I even asked a question about whether the research looks at the content on the other end of the link that's being tagged. No, it doesn't, but it's possible to do so.]
Measuring Social Networks with Digital Photograph Collections (Short Paper)
Scott Golder
Noted that his talk is more about explicit links. Began with a picture of shoeboxes, the historical photo storage technique. Then added XML tags around the photo, saying that what we'd really like is a system to turn a shoebox of photos into a dataset.
The most important information is the identity of the person in each photo.
Showed a photo from his own wedding, showing a group. People are more likely to take a camera to, and travel to, special social events. We're not capturing people's work networks with the same mechanism that is so useful for capturing someone's social network.
Postcard study -- mailing postcards from Massachusetts to Kansas. Diary study -- ask people to record in a diary all the people they talk to. You can have people in a classroom mention names -- who would you ask to help you get homework. Large-scale quantitative studies of networks, including e-mail networks. Connections between people in a corporate e-mail network. Trade-offs between a study with large n and a study that provides a lot of info.
You're likely to be in a photo with someone you know, so pairs of people in a photo are a good way to build a network.
A photo of 30 people implies a much weaker link between any two in that group. As the number of people in the photo grows, the amount of weight implied by the link decreases.
This doesn't work for smart album generation. (Promised to tell why later.)
Noted Facebook's photo-tagging feature.
Link strength is useful for predicting who else is likely to be in a photo. Only a few key people were in many photos, most photos had only a few people. Only half of the photos had any people in them at all, and many had only one person. There were few photos with large numbers of people.
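The idea that crowded photos imply weaker links is easy to sketch. Scott didn't give a formula that I caught, so the 1 / (k - 1) weighting below is purely my assumption, as are the function name and the sample names: each photo with k people adds that share to every pair it depicts, and photos with zero or one person add nothing.

```python
from collections import defaultdict
from itertools import combinations

def co_depiction_weights(photos):
    """Sum a per-pair weight over photos; crowded photos contribute less."""
    weights = defaultdict(float)
    for people in photos:
        people = sorted(set(people))
        if len(people) < 2:
            continue  # empty or single-person photos imply no link
        share = 1.0 / (len(people) - 1)  # assumed weighting, not from the talk
        for a, b in combinations(people, 2):
            weights[(a, b)] += share
    return dict(weights)

# Hypothetical shoebox: each inner list is the set of people in one photo.
photos = [
    ["ann", "bob"],
    ["ann", "bob", "cat", "dan", "eve"],
    ["cat"],
    [],
]
w = co_depiction_weights(photos)
print(w[("ann", "bob")])  # 1.0 from the pair photo + 0.25 from the group photo
print(w[("cat", "dan")])  # 0.25, group photo only
```

Pairs are stored with the names in alphabetical order, so look up ("ann", "bob") rather than ("bob", "ann").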
The value of being co-depicted with the owner of the archive. Photographer can't also be in the picture with you. Being co-depicted with the archive owner implies some effort -- the photographer hands the camera to someone else. Friends of people in photos with the owner are also more likely to be rated higher.
[I wonder... did the act of looking at a subset of all your photos, and seeing a picture of you with someone else, have an effect on the user's self-rating? What about asking the people in the photos to rank their closeness with the photographer? I asked this question... Scott said he had not considered that.]
A close friend brings a stranger that you don't know to an event, and gets into your archive by virtue of proximity to your close friend. So close friends bring strangers into your network. [This seems tautological... close friends probably also bring people who are close to you into your network....]
A question from the audience -- is the timestamp of a photo important -- if you've been taking photos of the same person for 10 years, wouldn't that show more closeness? [I wonder also, is the archive owner's distance from the photo any indication of emotional closeness?]
Can Blog Communication Dynamics be correlated with Stock Market Activity? (Short Paper)
Munmun De Choudhury, Hari Sundaram, Ajita John and Doree Seligmann
This line of inquiry might help us understand the predictive power of online chatter; could be useful to companies interested in their online reputation.
Looked at stock market motion of specific companies correlated with communication on engadget.com. [I wonder... how do the information dynamics of comments posted in response to entries posted on a specific blog differ from the information dynamics of bloggers who choose to create their own blog entry on a topic? Commenters don't drive the discussion the way the original posts do.]
Example -- last year's release of the iPhone. Likely that lots of people on Engadget will be talking about the iPhone, and when the event actually happens it's likely that the stock will move.
[Does the dataset note positive or negative comments?]
Characterize people as early responders and late trailers; by frequency of communication, as loyal readers and outliers.
Presumes that stock market activity is related to blog behavior over the previous week. [But surely at least some of the blog communication will be responding to stock market events... but I suppose that would make more of a difference on a website devoted to business. Well, you have to start somewhere, and Munmun has clearly stated this is an assumption.]
Conclusions -- excellent results correlating blog chatter with stock motion. "Remarkable predictive power." Future directions -- looking at the predictive power of groups and communities, vocal majorities and silent minorities.
Q -- does the work look at the sentiment of the blog posts? (No, this work does not look at the sentiments.)
[Munmun indicated that it was possible to predict whether the stock would go up or down, but I'm not sure I understand how -- if people post more frequently is that a sign that stock will go up? Or if more outliers post, does that mean the stock will go down?]
Hypertext '08 Poster Presentation: Charlie Hargood, A Thematic Model for Narrative Generation
One poster I had no trouble understanding at a glance was Charlie Hargood's poster on his narrative generation project. Themes, motifs, connotation, denotation -- this is familiar language about storytelling, presented in the context of a model for generating rich narrative.
At yesterday's workshop, Chris Crawford dismissed the idea that an interactive narrative should be judged on anything other than its interactive depth; if you want literary richness, then read a book. I would have liked to hear Chris and Charlie discuss their differing approaches to the same problem. (Charlie says he was attending a different workshop yesterday.)
The core I took away from Charlie's presentation (which is a proposed model, rather than a working demo or a finished product) was his term "natom" for "narrative atom." In the past I have referred to the interactivity of a text-adventure game as more finely grained than a Choose-Your-Own-Adventure novel, and "natom" is a wonderfully evocative term for each individual grain. Charlie's model includes tagging each "natom" according to its "features," using the tagged features to denote "motifs," and presenting "themes" as connoted by these "motifs" (as well as by other themes).
Since my approach to interactive narrative is so thoroughly colored by my knowledge of interactive fiction, I couldn't help but point Charlie to the "recipe book" that's part of the Inform 7 design environment. That recipe book includes about 200 examples, most of which were written by Emily Short, that present the code for such concepts as "a person who can be in love with exactly one other person at a time" or "a telephone that lets people talk to and hear characters in distant rooms." The IF community has done a lot of tagging and sorting of the corpus of IF works, and I wonder if IF would be a good testing ground for his world-building model. Can his model accurately represent the kinds of stories IF authors have generated?
At any rate, I gave Charlie some pointers for learning about the theoretical and critical output of the IF community.
Hypertext '08: One-Minute Poster Presentations
About 20 people pre-loaded their slides onto the conference room computer, then lined up in the aisle. Each was given one minute to present their ideas. The host had an ooga-ooga horn that he squeezed when the one minute was up.
It's painful to watch someone cut off in mid-sentence, but it's a fascinating genre. Plus, this one-minute pitch is designed to get the conference attendees to stop by the presenter's table later on. It's an efficient way for conference attendees to sample all the posters, and it's a good chance for the presenters to encapsulate why their work is worth a closer look.
Okay, now that I've processed what I think about the genre, I'm ready to shift my focus to the content of the talks.
Papers 15 and 16, on improving/expanding browser functionality, were the most relevant to my interests so far. Paper 18 explicitly mentions blogs, so naturally I'm interested. Paper 19 "Social WebEx Usage" is an educational tool that interests me; from the quotes from students it seems to be teaching Java, which is not an application I'd need.
Students whose posters are rated the best will give 10-minute talks tomorrow, and the winners of that will go on to the next phase.
Bruegel painting showing the social dynamics of a village festival.
Grounded the talk with a presentation of statistics on user-generated content (Facebook, MySpace, etc.), noting that whether those users are interacting with one another is another question. Noted that his research is observational rather than experimental, and that he won't be able to go into detail; rather, his observations will focus on what we can learn from the large number of users.
Noted that until recently, most content was created by a few and consumed by many. Noted the "remarkable inversion" in the creation of content.
Wikipedia - great example of something created from the bottom up. Invoked Nature's experiment comparing Wikipedia articles with print encyclopedias.
Interested in the overall quality of Wikipedia as a reliable source of information. Study how Wikipedia is produced and generated. Correlation between the number of edits an article receives and its quality. With D. Wilkinson, published in First Monday on the clear correlation between the number of edits and the quality of the article.
Transitioned to the importance of the scarcity of attention. Information that people used to pay for is now freely available, but what is rare is our attention.
Not psychological attention, but social attention. Citing a source so that a reader can attend to it.
How does the phenomenon of attention play into the importance of information? What is the role of novelty? How do you maximize the amount of value you get from a website, given that we have a limited amount of attention?
Noted that very few people are publishing "real scholarly work" on this subject.
In areas of low information density, attention is not a problem. (Showed a desert scene with a stopsign... "information poor environment").
A picture of Howard Rheingold in Tokyo, staring up at the insanely detailed advertising displays. In an information rich environment, attention is valuable, ephemeral, and difficult to obtain.
Two ways to gain the attention of a group. One: to broadcast. At least initially it gets the attention of the people. Another way is propagating the information virally.
Included graphic from a study on Amazon.com recommendations. Most things do not propagate very far, but every so often you have a long chain. Propagation of recommendations of a medical book has many shallow nodes, but the network for a Japanese graphic novel has far fewer nodes, each of which gets much more activity.
People are more likely to buy a DVD recommended by many people than they are to buy a book. Two recommendations leads to a spike in recommendations for book purchases, but after that it drops off. The DVD graph rises steadily with more recommendations, with the value approaching a higher figure.
Prediction 1: the attention among all items is distributed in a log-normal way.
Prediction 2: attention decays in time as a stretched exponential (long tail, invoked radioactive half-life of about 69 minutes, roughly the amount of time a news story is on the front page of a website).
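To see how a stretched exponential differs from ordinary radioactive-style decay, here is a small sketch. It assumes the usual form exp(-(t / tau)^beta) with 0 < beta < 1; the value beta = 0.5 is hypothetical (I only caught the 69-minute half-life), and the function names are mine. Both curves are pinned to the same half-life so the difference shows up in the tail.

```python
import math

def tau_from_half_life(half_life, beta):
    """Scale tau such that exp(-(half_life / tau) ** beta) == 0.5."""
    return half_life / math.log(2) ** (1.0 / beta)

def stretched_attention(t, half_life=69.0, beta=0.5):
    """Remaining attention at t minutes under a stretched exponential.

    beta=0.5 is an assumed, illustrative value, not a figure from the talk.
    """
    tau = tau_from_half_life(half_life, beta)
    return math.exp(-((t / tau) ** beta))

def plain_attention(t, half_life=69.0):
    """Ordinary exponential decay with the same half-life, for comparison."""
    return 0.5 ** (t / half_life)

for minutes in (0, 69, 345, 690):
    print(minutes, stretched_attention(minutes), plain_attention(minutes))
```

Both curves agree at 69 minutes, but after ten half-lives the plain exponential has all but vanished while the stretched version still retains a noticeable share of attention. That slow tail is what "long tail" refers to here.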
Suggested dynamically reconfiguring a website based on the number of hits the stories receive.
Noted that, while we do attend to novelty, we are also social beings, so we will attend to something simply because others are attending to it. How does popularity affect attention?
[I note that Huberman defines "novelty" as "recency" rather than anything to do with the content. It seems then that it would be easy for a computer to notice when a link has unusually interesting or unusually boring content when the measured clickthrough rate differs from the predicted natural decay... I suppose that's what he means by the "popularity" of an item.]
Public opinions are another attention structure. How do opinions form and evolve? Noted psychological study on group deliberation and group polarization, demonstrating that people tend to move towards extreme views. On the web, it is costly (in terms of time) to write a review. Conjecture -- people will only write a review if their opinion differs from the dominant opinion. There's a softening of reviews over time.
Voter's paradox -- why do people bother to vote when an individual vote is so insignificant in a country of millions of voters?
In larger groups, individual contributions have to be larger in order to have a significant effect on the end result.
When it is costless to express an opinion, polarization occurs. When it is costly to express an opinion, a softening takes place. [What implications does this have for multiple-choice or short-answer quizzes, and papers?]
The group opinion does not change, but the selection process governing who expresses an opinion does change.
IMDB -- I think I missed something here; did he compare the effects of assigning a rating (a low-cost operation) with writing a review (a high-cost operation)? He rushed through this point in order to end on time.
Q and A.
Q: How does free will enter into the picture?
A: There's a huge difference between people and the objects that physicists study. Rocks falling and photons behaving can be described by an equation. People, unlike rocks, molecules and atoms, take into account the future when they make each action. "I'm not here as a result of Brownian motion." Using physics-like metaphors to discuss phase transformation is just that -- a metaphor. Not forcing the world of people to be like something that we know from physics. Calls for "a certain amount of humility" when facing the study of human actions.
Q: Clarification of "novelty" -- is it just chronology?
A: An idea is novel if it is not repeated; if you insert something in a familiar piece of music, people will attend to it.
Q: What implications for diversity of opinion?
A: Paris Hilton keeps finding novel ways of attracting attention, so he gives her credit for a certain amount of intelligence. There's a competitive component -- attention research actually can be liberating because distinct and diverse opinions attract attention. The fact that the whole world is collapsing to the point where we all see the same information is a different phenomenon.
Q: In the current issue of The Atlantic, there's an article about traffic management of automobiles in the UK vs. the US. The article says our road system is oversaturated with too many signs.
A: That theory is about a collective attention phenomenon, but not about perceptual attention to individual elements.
Q: Invitation to speculate about tension between the familiar and the new, and the "sweet spot" that attracts... movie genres is an example of people liking things they already know.
A: We haven't done much on that. A personal home page or newspaper is familiar, but has to have content that changes.
Q: Note old comparative psychology study... what kind of animal would a rat want to play with most? Rats liked to play with rats best, but they also liked the human hand. Hypothesis -- moderate unpredictability was good. The guinea pig always had the same behavior (hiding), but the hand was unpredictable. (The speaker noted that this was a comment not a question.)
News Flash: Bloggers Stop Quoting AP Stories
The AP has a right to discourage people from posting the full content of articles online, just as you or I retain the copyright to our own writing (unless we explicitly give those rights away). But to charge money even for brief quotations is to reject Section 107 of the Copyright Act -- known as the "Fair Use Exception."
§ 107. Limitations on exclusive rights: Fair use
Notwithstanding the provisions of sections 106 and 106A, the fair use of a copyrighted work, including such use by reproduction in copies or phonorecords or by any other means specified by that section, for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright. In determining whether the use made of a work in any particular case is a fair use the factors to be considered shall include--
(1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
(2) the nature of the copyrighted work;
(3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
(4) the effect of the use upon the potential market for or value of the copyrighted work.
The fact that a work is unpublished shall not itself bar a finding of fair use if such finding is made upon consideration of all the above factors.
Note that copying an entire book (or song, or movie) in order to avoid purchasing it is not "fair use." Showing a clip from a movie in class, or posting quotations from a novel to back up a review or literary research paper, are all covered by "fair use."
Access to the words of public officials, as reported from various news sources, is an important part of the democratic process. A candidate being interviewed on ABC should be able to quote from what an opponent said on NBC, and someone who calls in on a CBS show should be able to quote from what a guest said on CNN. The Fair Use Exception recognizes that anyone engaging in "criticism" or "comment" should have the same ability to quote brief passages from published materials.