Chair: Ethan Munson (University of Wisconsin)
Estimate intent to communicate and the associated delay. Using MySpace, successful prediction of intent to communicate. [This section is a review of related work, so the speaker is going quickly through material that's unfamiliar to me... I'm waiting for the statement of the research question and I'm hoping she'll define her terms... aha! Here are some definitions.]
Intent to communicate — the probability that a person will engage in some communication with a person in her network.
Delay in propagation — how long it takes for one person to contact a person in her network on a given topic.
Communication context: the set of attributes that affect communication between two individuals. It has been established that context is dynamic and depends on the relationship between messages, the past communication behavior of a person, and the response patterns of people in the person’s neighborhood.
Neighborhood context, topic context, and recipient context. [That was a detailed slide... I didn't get to process it before she moved on.]
Chart shows people can be categorized as having a strong tendency to send or a strong tendency to receive messages. People can be generators, mediators, and receptors of information. Also categorizing people according to strength of ties – strong or weak, a value that changes depending on what information is being transmitted.
As near as I can tell, the research looked at the past communications between people, and predicted how likely they would be to communicate with each other on a given topic, and the time delay. I’m not sure what it means to say that their predictions had only a 12-15% error rate as opposed to some other measure that had a 25%-30% error rate… how does the control group differ from the group that presented the smaller error rate? Are we looking at a refinement of an existing method to predict future behavior? That would be such a n00b question that I’m not going to ask it. There are limits to my willingness to parade my ignorance.
Correlating User Profiles From Multiple Folksonomies (Long Paper)
Martin Szomszor, Ivan Cantador and Harith Alani
Martin Szomszor: The current Web 2.0 direction is towards the user having different profiles for doing different things. We expose a lot of information about ourselves across various sites. Research suggests that lots of us will have lots of different profiles, to help us do lots of different things. Spoke of the popularity of tagging/folksonomy — people like expressing themselves through the tags they supply. The way people tag is heavily influenced by the way people around them tag.
People tag on multiple sites that focus on different domains and different tasks. Can we use this behavior to find a fingerprint that can identify a user across all these different profiles? [What's the application of this research? Targeting ads?]
The research searched for userIDs in Flickr and del.icio.us, filtering out the low-activity sites. People often used the same tags on different sites.
A slide noted different kinds of tags likely to be repeated — dates, activities such as “cooking,” and events such as “christmas.” As people tag more across more sites, the overlap increases.
Due to the free-form nature of tagging, people aren’t always consistent (“podcast” vs “podcasting” or “blog” “blogs” or “blogging”) even within the same domain. Filter out the overlaps. Discarding such terms as dates, dealing with misspellings and compound nouns; used Google to do this work for them, since Google will suggest a correction when you mistype a word. Then moves to WordNet and Wikipedia, normalizing terms and looking for synonyms.
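To make the normalization step concrete, here is a minimal sketch of the idea. It is not the researchers' pipeline: the talk described using Google, WordNet, and Wikipedia, whereas this uses a few crude suffix rules that I made up for illustration. The function names `normalize_tag` and `normalize_tags` are hypothetical.

```python
import re


def normalize_tag(tag):
    """Collapse simple variants of a tag; return None for tags to discard.

    Assumption: crude hand-written suffix rules stand in for the
    Google/WordNet/Wikipedia lookups described in the talk.
    """
    t = re.sub(r"[^a-z0-9]", "", tag.lower())
    if t.isdigit():
        return None  # purely numeric tags (e.g. years) are treated as dates
    if t.endswith("ing") and len(t) > 5:
        t = t[:-3]
        # "blogging" -> "blogg" -> "blog"
        if len(t) >= 2 and t[-1] == t[-2]:
            t = t[:-1]
    elif t.endswith("s") and not t.endswith("ss") and len(t) > 3:
        t = t[:-1]  # naive plural stripping; will mangle words like "christmas"
    return t or None


def normalize_tags(tags):
    """Return the set of normalized tags, dropping discarded ones."""
    return {n for n in map(normalize_tag, tags) if n}


print(normalize_tags(["blog", "blogs", "blogging", "Podcast", "podcasting", "2007"]))
```

The plural rule is deliberately naive, which is part of why a dictionary-backed approach like the one described in the talk is more attractive than suffix stripping.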
Little change when uncommon tags were rejected. Biggest change in the results occurs when the dataset is correlated with Wikipedia, checking for acronyms and such.
Compared each individual tag cloud with the group. Most people have a fairly small delicious vocabulary size. Filtering did not really help identify the profiles of users across systems. I’m not sure how to interpret the statistical significance, but it looks like tag clouds are not a terribly reliable way to identify what the researchers feel is the same person using different profiles. Correlating all the tags to Wikipedia did result in an increase in the likelihood that this method would accurately identify the same user on different sites.
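One way to picture comparing two tag clouds is cosine similarity over tag counts. I don't know which measure the researchers actually used, so treat this as a sketch under that assumption; `cosine_similarity` and the sample tags are made up for illustration.

```python
import math
from collections import Counter


def cosine_similarity(cloud_a, cloud_b):
    """Cosine similarity between two tag clouds given as lists of tags.

    Assumption: each cloud is a bag of tags, weighted by how often
    the user applied each tag.
    """
    a, b = Counter(cloud_a), Counter(cloud_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


flickr_cloud = ["cooking", "christmas", "cooking", "travel"]
delicious_cloud = ["cooking", "programming", "christmas"]
print(cosine_similarity(flickr_cloud, delicious_cloud))
```

A score near 1 means the two clouds use the same tags in similar proportions; a score of 0 means they share no tags at all.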
Google API was released just after the paper was published; thus there are new tools that help researchers find all their accounts, so there’s more data that they can use to re-run the filtering with a better set of data. Other issues — does “sf” mean “science fiction” or “San Francisco”? A user who uses “second” and “life” as two separate tags is actually referring to the single subject “second life” — a more accurate study would account for that.
[As you can probably guess from the quality of the notes, this talk was much more accessible to me. I even asked a question about whether the research looks at the content on the other end of the link that's being tagged. No, it doesn't, but it's possible to do so.]
Noted that his talk is more about explicit links. Began with a picture of shoeboxes, the historical photo storage technique. Then added XML tags around the photo, saying that what we’d really like is a system to turn a shoebox of photos into a dataset.
The most important information is the identity of the person in each photo.
Showed a photo from his own wedding, showing a group. People are more likely to take a camera to, and travel to, special social events. We’re not capturing people’s work networks with the same mechanism that is so useful for capturing someone’s social network.
Postcard study: mailing postcards from Massachusetts to Kansas. Diary study: ask people to record in a diary all the people they talk to. You can have people in a classroom mention names: who would you ask to help you with homework? Large-scale quantitative studies of networks, including e-mail networks. Connections between people in a corporate e-mail network. Trade-offs between a study with large n and a study that provides a lot of info.
You’re likely to be in a photo with someone you know, so pairs of people in a photo are a good way to build a network.
A photo of 30 people implies a much weaker link between any two in that group. As the number of people in the photo grows, the amount of weight implied by the link decreases.
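Here is a rough sketch of how a network could be built from co-depiction, with each photo contributing less weight per pair as the group gets bigger. The talk did not give a formula, so the 1/(n-1) weighting is my assumption, and `build_network` is a hypothetical name.

```python
from collections import defaultdict
from itertools import combinations


def build_network(photos):
    """Build weighted links from lists of people tagged in each photo.

    Assumption: a photo with n people adds 1/(n-1) to every pair in it,
    so a two-person photo counts fully and a 30-person photo barely counts.
    """
    weights = defaultdict(float)
    for people in photos:
        people = sorted(set(people))
        n = len(people)
        if n < 2:
            continue  # photos with zero or one person imply no link
        for a, b in combinations(people, 2):
            weights[(a, b)] += 1.0 / (n - 1)
    return dict(weights)


photos = [["ann", "bob"], ["ann", "bob", "cat"], ["dan"]]
print(build_network(photos))
```

Any decreasing function of group size would fit the description in the talk; 1/(n-1) is just a simple choice that gives a pair photographed alone a weight of exactly 1.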
This doesn’t work for smart album generation. (Promised to tell why later.)
Noted Facebook’s photo-tagging feature.
Link strength is useful for predicting who else is likely to be in a photo. Only a few key people were in many photos, most photos had only a few people. Only half of the photos had any people in them at all, and many had only one person. There were few photos with large numbers of people.
The value of being co-depicted with the owner of the archive. Photographer can’t also be in the picture with you. Being co-depicted with the archive owner implies some effort — the photographer hands the camera to someone else. Friends of people in photos with the owner are also more likely to be rated higher.
[I wonder... did the act of looking at a subset of all your photos, and seeing a picture of you with someone else, have an effect on the user's self-rating? What about asking the people in the photos to rank their closeness with the photographer? I asked this question... Scott said he had not considered that.]
A close friend brings a stranger to an event, and the stranger gets into your archive by virtue of proximity to your close friend. So close friends bring strangers into your network. [This seems tautological... close friends probably also bring people who are close to you into your network....]
A question from the audience — is the timestamp of a photo important — if you’ve been taking photos of the same person for 10 years, wouldn’t that show more closeness? [I wonder also, is the archive owner's distance from the photo any indication of emotional closeness?]
This line of inquiry might help us understand the predictive power of online chatter; could be useful to companies interested in their online reputation.
Looked at stock market motion of specific companies correlated with communication on engadget.com. [I wonder... how do the information dynamics of comments posted in response to entries posted on a specific blog differ from the information dynamics of bloggers who choose to create their own blog entry on a topic? Commenters don't drive the discussion the way the original posts do.]
Example: last year’s release of the iPhone. It is likely that lots of people on Engadget will be talking about the iPhone, and when the event actually happens it’s likely that the stock will move.
[Does the dataset note positive or negative comments?]
Characterize people as early responders and late trailers; by frequency of communication, as loyal readers and outliers.
Presumes that stock market activity is related to blog behavior over the previous week. [But surely at least some of the blog communication will be responding to stock market events... but I suppose that would make more of a difference on a website devoted to business. Well, you have to start somewhere, and Munmun has clearly stated this is an assumption.]
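To illustrate the "previous week" assumption, here is a sketch of a lagged correlation between weekly comment counts and later stock movement. This is not the method presented in the talk, which I could not reconstruct from my notes; the function names and the numbers are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = math.sqrt(var_x * var_y)
    return cov / denom if denom else 0.0


def lagged_correlation(comments, stock_moves, lag=1):
    """Correlate comments in week t with stock movement in week t + lag.

    Assumption: both series are weekly and aligned, one value per week.
    """
    if len(comments) != len(stock_moves):
        raise ValueError("series must have the same length")
    if lag < 0 or lag >= len(comments) - 1:
        raise ValueError("lag must leave at least two overlapping weeks")
    return pearson(comments[: len(comments) - lag], stock_moves[lag:])


# Made-up numbers: weekly comment counts and weekly stock price changes.
weekly_comments = [120, 340, 90, 410, 150, 380]
weekly_stock_change = [0.1, 0.4, 1.2, 0.2, 1.5, 0.5]
print(lagged_correlation(weekly_comments, weekly_stock_change, lag=1))
```

Comparing the result at lag 1 against lag 0 would be one crude way to check my worry above, that some of the chatter is reacting to the market and not preceding it.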
Conclusions: excellent results correlating blog chatter with stock motion. “Remarkable predictive power.” Future directions: looking at the predictive power of groups and communities, vocal majorities and silent minorities.
Q: Does the work look at the sentiment of the blog posts? (No, this work does not look at sentiment.)
[Munmun indicated that it was possible to predict whether the stock would go up or down, but I'm not sure I understand how -- if people post more frequently is that a sign that stock will go up? Or if more outliers post, does that mean the stock will go down?]