Big Data is revolutionising 21st-century business without anybody knowing what it actually means. Now computer scientists have come up with a definition they hope everyone can agree on.
One of the biggest new ideas in computing is “big data”. There is unanimous agreement that big data is revolutionising commerce in the 21st century. When it comes to business, big data offers unprecedented insight and improved decision-making, and reveals untapped sources of profit.
And yet ask a chief technology officer to define big data and he or she will scuff their feet and stare at the floor. Chances are, you will get as many definitions as the number of people you ask. And that’s a problem for anyone attempting to buy, sell or use big data services: what exactly is on offer?
Today, Jonathan Stuart Ward and Adam Barker at the University of St Andrews in Scotland take the issue in hand. These guys survey the various definitions offered by the world’s biggest and most influential hi-tech organisations. They then attempt to distil from all this noise a definition that everyone can agree on.
Stuart Ward and Barker cast their net far and wide but the results are mixed. Formal definitions are hard to come by, with many organisations preferring to give anecdotal examples.
In particular, the notion of ‘big’ is tricky to pin down, not least because a dataset that seems large today will almost certainly seem small in the not-too-distant future. Where one organisation gives hard figures for what constitutes ‘big’, another gives a relative definition, implying that big data will always be more than conventional techniques can handle.
Some organisations point out that large datasets are not always complex, nor are small datasets always simple. Their point is that the complexity of a dataset is an important factor in deciding whether it is ‘big’.
Here is a summary of the kind of descriptions Stuart Ward and Barker discovered from various influential organisations:
1. Gartner.
In 2001, a Meta (now Gartner) report noted the increasing size of data, the increasing rate at which it is produced and the increasing range of formats and representations employed. This report predated the term ‘big data’ but proposed a three-fold definition encompassing the “three Vs”: Volume, Velocity and Variety. This idea has since become popular and sometimes includes a fourth V: veracity, to cover questions of trust and uncertainty.
2. Oracle.
Big data is the derivation of value from traditional relational database-driven business decision-making, augmented with new sources of unstructured data.
3. Intel.
Big data opportunities emerge in organisations generating a median of 300 terabytes of data a week. The most common forms of data analysed in this way are business transactions stored in relational databases, followed by documents, email, sensor data, blogs and social media.
4. Microsoft.
“Big data is the term increasingly used to describe the process of applying serious computing power - the latest in machine learning and artificial intelligence - to seriously massive and often highly complex sets of information.”
5. The Method for an Integrated Knowledge Environment open source project.
The MIKE project argues that big data is not a function of the size of a dataset but its complexity. Consequently, it is the high degree of permutations and interactions within a dataset that defines big data.
6. The National Institute of Standards and Technology.
NIST argues that big data is data which “exceed(s) the capacity or capability of current or conventional methods and systems”. In other words, the notion of “big” is relative to the current standard of computation.
A mixed bag if ever there was one.
In addition to the search for definitions, Stuart Ward and Barker attempted to better understand the way people use the phrase “big data” by searching Google Trends to see which words are most commonly associated with it. They say these are: data analytics, Hadoop, NoSQL, Google, IBM and Oracle.
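The paper itself simply consulted Google Trends by hand. As a rough illustration of how one might repeat that query programmatically, here is a sketch using pytrends, an unofficial third-party Python client for Google Trends; the library choice is an assumption of this sketch rather than part of the original survey, and Google may rate-limit such requests.

```python
# A hypothetical sketch: querying Google Trends via pytrends, an
# unofficial third-party client (an assumption of this sketch; the
# authors consulted Google Trends directly, not this library).
from pytrends.request import TrendReq

pytrends = TrendReq(hl="en-US", tz=0)
pytrends.build_payload(["big data"], timeframe="all")

# related_queries() returns, per keyword, 'top' and 'rising' tables
# of the search terms most associated with it.
related = pytrends.related_queries()["big data"]["top"]
print(related.head(10))
```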
These guys bravely finish their survey with a definition of their own in which they attempt to bring together these disparate ideas. Here’s their definition:
“Big data is a term describing the storage and analysis of large and/or complex data sets using a series of techniques including, but not limited to: NoSQL, MapReduce and machine learning.”
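For readers who haven’t met the techniques name-checked in that definition, the sketch below shows the MapReduce pattern in miniature, using word counting, the canonical example. It is a single-process toy over made-up sample data; real frameworks such as Hadoop distribute the map and reduce phases across a cluster.

```python
from collections import defaultdict

def map_phase(document):
    """Emit one (word, 1) pair per word in the document."""
    for word in document.split():
        yield (word.lower(), 1)

def shuffle_phase(pairs):
    """Group values by key, as a MapReduce framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key: here, sum the counts."""
    return key, sum(values)

# Illustrative sample data, not from the paper.
documents = ["big data is big", "data about data"]
pairs = (pair for doc in documents for pair in map_phase(doc))
counts = dict(reduce_phase(k, v) for k, v in shuffle_phase(pairs).items())
print(counts)  # {'big': 2, 'data': 3, 'is': 1, 'about': 1}
```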
A game attempt at a worthy goal: a definition that everyone can agree on is certainly overdue.
But will this do the trick? Answers please in the comments section below.