If you tweet about your life, a new algorithm can identify your most significant events and assemble them into an accurate life history, say the computer scientists who built it
Twitter
allows anyone to describe their life in unprecedented detail. Many
accounts provide an ongoing commentary of an individual’s interests,
activities and opinions.
So it’s not hard to imagine that it’s possible to reconstruct a person’s life history by analysing their Twitter stream.
But
doing this automatically is trickier than it sounds. That’s because
most Twitter streams contain news of important events mixed up with
entirely trivial details about events of little or no significance. The
difficulty is in telling these apart.
Today,
Jiwei Li at Carnegie Mellon University in Pittsburgh and Claire Cardie
at Cornell University in Ithaca say they’ve developed an algorithm that
does this. Their new technique can create an accurate life history for
any individual by mining their tweets and those of their followers. That
allows them to generate an eerily accurate chronology of a person’s
life-changing events, without knowing anything about them other than
their Twitter handle.
The key to this
work is a technique for separating the wheat from the chaff in any
Twitter stream. Li and Cardie do this by classifying every tweet into one of
four categories. The most important tweets are those that describe
important, time-specific events of a personal nature.
A
tweet about starting a new job would be a good example. By contrast, a
tweet about a 5 kilometre run that is part of a regular exercise regime
would not qualify because it happens regularly. So personal events fall
into two categories–time specific and time general.
Equally,
tweets about other non-personal events fall into a similar two
categories–time specific and time general. A tweet about the US election
would be an example of the former whereas an opinion about the summer
weather would be an example of the latter.
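The four categories amount to two yes/no questions asked of each tweet: is it personal, and is it time specific? A minimal sketch of that two-by-two scheme follows. The names `Category` and `categorise` are illustrative assumptions, not taken from Li and Cardie's paper, and the sketch only names the categories; it does not decide the answers to the two questions, which is the hard part.

```python
from enum import Enum


class Category(Enum):
    # Hypothetical labels for the four categories described in the article.
    PERSONAL_TIME_SPECIFIC = "personal, time specific"  # e.g. starting a new job
    PERSONAL_TIME_GENERAL = "personal, time general"    # e.g. a regular 5 km run
    PUBLIC_TIME_SPECIFIC = "public, time specific"      # e.g. the US election
    PUBLIC_TIME_GENERAL = "public, time general"        # e.g. an opinion on summer weather


def categorise(is_personal: bool, is_time_specific: bool) -> Category:
    """Map the two yes/no answers onto one of the four categories."""
    if is_personal:
        return (Category.PERSONAL_TIME_SPECIFIC if is_time_specific
                else Category.PERSONAL_TIME_GENERAL)
    return (Category.PUBLIC_TIME_SPECIFIC if is_time_specific
            else Category.PUBLIC_TIME_GENERAL)
```

Only tweets that land in the first category are candidates for a life history.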
The
problem that Li and Cardie have solved is to find a way of
automatically distinguishing tweets in the first category from the
others. The solution is based on the discovery that the pattern of
tweets, retweets and replies varies for each of the categories they’ve
defined.
For example, a tweet about
starting a new job has a different pattern of responses from followers
than a tweet about running or the US election or the weather. So the
trick is to identify the ‘Twitter signature’ of these important
personal events and then mine the Twitter stream for other examples. A
chronological list of these events is that person’s life history.
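Once tweets have been labelled, the final step described above is simple: keep the personal, time-specific ones and sort them by date. The sketch below assumes the labels are already given, which skips the part that Li and Cardie's algorithm actually solves. The `LabelledTweet` type, the `life_history` function and the example tweets are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class LabelledTweet:
    # Hypothetical record: a tweet plus the two labels, assumed already known.
    posted: date
    text: str
    personal: bool
    time_specific: bool


def life_history(tweets):
    """Keep personal, time-specific tweets and return them in date order."""
    important = [t for t in tweets if t.personal and t.time_specific]
    return sorted(important, key=lambda t: t.posted)


# Invented example stream, not real data.
stream = [
    LabelledTweet(date(2012, 9, 3), "First day at the new job!", True, True),
    LabelledTweet(date(2012, 6, 1), "Another 5 km run done", True, False),
    LabelledTweet(date(2012, 11, 6), "Election night", False, True),
    LabelledTweet(date(2011, 12, 20), "We just got engaged", True, True),
]

timeline = [t.text for t in life_history(stream)]
```

Here `timeline` holds the engagement followed by the new job; the run and the election are dropped.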
At
least, that’s the theory. Li and Cardie tested their idea by mining the
streams of 20 ordinary Twitter users and 20 celebrities over a 21-month
period from 2011 to 2013. They then asked the ordinary users to create
their own life history by manually identifying their most important
tweets. For the celebrities, Li and Cardie used Wikipedia biographies
and other sources of information to create ‘gold standard’ life
histories manually.
Finally, they compared
these gold standard life-histories against the ones generated by their
algorithm. The results are not bad. The algorithm accurately picks out
many important life events that are also identified in the gold
standards. “Experiments on real Twitter data quantitatively demonstrate
the effectiveness of our method,” they say.
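This article does not say how the comparison was scored. One common way to compare a generated list against a gold standard is precision (what fraction of the algorithm's picks are in the gold standard) and recall (what fraction of the gold standard the algorithm found). The sketch below is offered under that assumption and is not a description of the authors' evaluation; the function name and the event labels are invented.

```python
def precision_recall(predicted, gold):
    """Score a predicted set of events against a gold-standard set."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)  # events the algorithm and the gold standard share
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


# Invented example: the algorithm finds two of four gold events plus one false alarm.
gold = {"engaged", "new job", "moved city", "graduated"}
predicted = {"engaged", "new job", "ran 5 km"}
precision, recall = precision_recall(predicted, gold)
```

In this made-up case two of the three picks are correct and two of the four gold events are recovered.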
But
it is by no means perfect. For example, the technique only works for
users who tweet regularly and have enough followers to allow the
algorithm to spot the unique pattern of responses that identifies
important tweets.
Still, that’s a
significant number of people and Li and Cardie say their technique can
be broadly applied. “It can be extended to any individual, (e.g. friend,
competitor or movie star), if only he or she has a twitter account,” they add.
Li
and Cardie talk about their future plans in terms of improving the
accuracy of their technique. However, they do not talk about making the
algorithm more widely available. If it works as well as they imply,
there should be no shortage of interested parties wanting to use it.
The
ability to mine the Twitter firehose for the life histories of the
masses will be valuable. Just who might want to use this technique and
how, I’ll leave for the comments section below.
The
work raises some interesting questions, not least about privacy. Would
individuals think more carefully about placing their life history in the
public domain if they knew how easily it could be distilled?
The
new technique means that a detailed life history will be available at
the touch of a button to friends and family but also to prospective
employers, business competitors, the government, the media, law
enforcement agencies, stalkers and so on.
What’s
clear is that social networks are an important aspect of modern life.
What is not yet so clear is just how powerful and revealing they will
turn out to be.
Ref: arxiv.org/abs/1309.7313: Timeline Generation: Tracking Individuals on Twitter