What Do We Do with
All This Big Data?
Fostering Insight and Trust in the Digital Age
A Market Definition Report
January 21, 2015
By Susan Etlinger
Edited by Rebecca Lieb
Introduction
Every day, we hear new stories about data: how much there is, how fast it
moves, how it’s used for good or ill. Data ubiquity affects our businesses,
our educational and legal systems, our society, and increasingly, our
dinner-table conversation. I had the opportunity to speak at TED@IBM
in San Francisco on September 23, 2014, about the implications of a
data-rich world, and what we can do as businesspeople, citizens, and
consumers, to use it to our best advantage.1
That talk, as well as this document, examines two themes that underlie many conversations about data and technology, themes that correspond to the fears George Orwell and Aldous Huxley chronicled in their novels 1984 and Brave New World. As the cultural critic Neil Postman put it in his 1985 book,
Amusing Ourselves to Death:
What Orwell feared were those who would ban books. What
Huxley feared was that there would be no reason to ban a book,
for there would be no one who wanted to read one. Orwell
feared those who would deprive us of information. Huxley
feared those who would give us so much that we would be
reduced to passivity and egotism. Orwell feared that the truth
would be concealed from us. Huxley feared the truth would
be drowned in a sea of irrelevance. Orwell feared we would
become a captive culture. Huxley feared we would become a
trivial culture.2
These two themes—irrelevance and narcissism on one hand (Huxley) and
surveillance and power on the other (Orwell)—anticipate modern fears
about the explosion of data in our personal and professional lives. As
individuals, we crave insight and convenience, yet we simultaneously fear
loss of control over our privacy and our digital identities.
Susan Etlinger speaking at TED@IBM at SFJAZZ, San Francisco, California, September 23, 2014. Photo: Daniel K. Davis/TED.
Table of Contents

What’s So Hard About Big Data?
    With Big Data, Size Isn’t Everything
    Unstructured Data Demands New Analytical Approaches
    Traditional Methodologies Must Adapt
From Data to Insight
    Big Data Requires Linguistic Expertise
    Big Data Requires Expertise in Data Science and Critical Thinking
Legal and Ethical Issues of Big Data
Planning for Data Ubiquity
Conclusion
Executive Summary
This document proposes an approach to better understand and address:
•	 How we extract insight from data
•	 How we use data in such a way as to earn and protect trust: the trust of customers,
constituents, patients, and partners
To be clear, these twin challenges of insight and trust will occupy data scientists, engineers,
analysts, ethicists, linguists, lawyers, social scientists, journalists, and, of course, the public for
many years to come. To derive insight from data while protecting and sustaining the trust of their communities, organizations must think deeply about how they source and analyze data, and they must clarify and communicate their roles as stewards of increasingly revealing information. This is only a first step,
but it’s a critical one if we are to derive sustainable advantage from data, big and small.
What’s So Hard
About Big Data?
5
WITH BIG DATA, SIZE ISN’T EVERYTHING
The idea of big data isn’t new; it was defined in the late ’90s by analysts at META Group (now
Gartner Group). According to META/Gartner, big data has three main attributes, known as
the Three Vs:
•	 Volume (the amount of data)
•	 Velocity (the speed at which the data moves)
•	 Variety (the many types of data)3
Now nearly two decades old, this construct has become increasingly pertinent. As IBM has famously said,
“90% of all the data in the world was created in the past two years.”4
To understand why this is, we need to
compare the business conditions that existed when big data was originally defined with today’s. In the early
2000s, technologists were grappling with a burgeoning variety of data types, spurred in large part by the rise of
electronic commerce. Today, social media is a major catalyst of data proliferation. Consider that:
•	 100 hours of video are uploaded to YouTube every minute.5
•	 On WordPress alone, users produce about 64.8 million new blog posts and 60.4 million new comments
each month.6
•	 500 million tweets are sent per day.7
Much data is unstructured. It is, as Gartner defines it, “content that does not conform to a specific, pre-
defined data model. It tends to be the human-generated and people-oriented content that does not fit neatly
into database tables.”8
As a result, the primary challenge of what we think of as big data isn’t actually the size;
it’s the variety. For this reason, the term “big data” can sometimes be misleading.
If this seems counterintuitive, consider this example: the New York Stock Exchange (NYSE) recorded
approximately 9.3 billion shares traded on December 16, 2014, more than 18 times the average number of
tweets (approximately 500 million) created per day.9
Even though the number of trades is much larger than
the number of tweets (volume) and the speed of the market may change from hour to hour and day to day
(velocity), the basic attributes of a trade—price, trade time, change from previous trade, previous close, price/
earnings ratio, and so on—are the same every time. A trade is a trade. It is homogeneous and predictable from a
data perspective (variety).
In contrast, social data is far more complex and variable. While a tweet contains some structured data
(metadata about the time it was posted, the user who posted it, whether it includes hashtags or media, such as
photography, and other attributes), it can express anything that fits into 140 characters. It is a mix of structured
metadata and unstructured text and images that can be expressed with variable lengths, languages, meanings,
and formats. It can contain a news headline, a haiku, a sales message, or a random thought. For this reason, a
much smaller number of tweets can be far more complex to analyze from a data standpoint. Size isn’t everything.
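To make the contrast concrete, here is a minimal sketch of a trade as a fixed, predictable record and a tweet as structured metadata wrapped around free text. The class and field names and the example values are assumptions for illustration, not any exchange’s or Twitter’s actual schema.

```python
# Illustrative sketch only: a trade fits a fixed schema; a tweet mixes
# structured metadata with unstructured text. Field names and values are
# invented for illustration, not an exchange's or Twitter's data model.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Trade:                      # every trade has the same attributes
    symbol: str
    price: float
    trade_time: str
    change_from_previous: float
    previous_close: float
    pe_ratio: Optional[float] = None

@dataclass
class Tweet:                      # structured metadata around free-form text
    posted_at: str
    user: str
    hashtags: List[str] = field(default_factory=list)
    media_urls: List[str] = field(default_factory=list)
    text: str = ""                # anything that fits in 140 characters

trade = Trade("XYZ", 41.27, "2014-12-16T14:30:05", -0.13, 41.40, 17.2)
tweet = Tweet("2014-12-16T14:30:05", "@example_user",
              hashtags=["quitsmoking"],
              text="I've really got to quit smoking cigarettes")

# Analyzing the trade means aggregating known numeric fields; analyzing the
# tweet means interpreting language, slang, and context before any counting.
```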
UNSTRUCTURED DATA DEMANDS NEW ANALYTICAL APPROACHES
The human-generated and people-oriented nature of
unstructured data is both an unprecedented asset and a
disruptive force. Data’s value lies in its ability to capture the
desires, hopes, dreams, preferences, buying habits, likes,
and dislikes of everyday people, whether individually or in
aggregate. The disruptive nature of this data stems from
two attributes:
•	 It’s raw material. It requires processing to translate it
into a format that machines, and therefore people, can
understand and act upon at scale.
•	 It offers a window into human behavior and attitudes.
When enriched with demographic and location
information, data can introduce an unprecedented
level of insight and, potentially, privacy concerns.
Unstructured data requires a number of processes and
technologies to:
•	 Identify the appropriate sources
•	 Crawl and extract it
•	 Detect and interpret the language being used
•	 Filter it for spam
•	 Categorize it for relevance (e.g., “Gap store” versus
“trade gap”)
•	 Analyze the content for context (sentiment, tone,
intensity, keywords, location, demographic information)
•	 Classify it so the business can act on it (a customer
service issue, a request for a product enhancement,
a question, etc.)
Each of these steps is rife with nuances that require both
sophisticated technologies and processes to address
(see Figure 1).
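As a rough illustration of a few of these steps, the toy pipeline below chains a language check, a keyword-based spam filter, and a crude relevance rule for the “Gap store” versus “trade gap” ambiguity. The keyword lists and helper names are assumptions for illustration; production systems replace each rule with trained language-identification, spam, and topic models.

```python
# Toy sketch of a handful of the steps above: detect language, filter spam,
# check relevance, and route for action. Keyword lists are invented for
# illustration; real systems use trained models at each stage.
from dataclasses import dataclass

@dataclass
class Post:
    text: str
    language: str = "unknown"
    relevant: bool = False
    category: str = "uncategorized"

SPAM_MARKERS = {"buy followers", "click here", "free iphone"}
BRAND_CONTEXT = {"store", "jeans", "sale", "mall"}          # "Gap store" sense
OFF_TOPIC_CONTEXT = {"trade", "deficit", "budget", "wage"}  # "trade gap" sense

def detect_language(post: Post) -> Post:
    # Stand-in for a real language detector.
    post.language = "en" if all(ord(c) < 128 for c in post.text) else "unknown"
    return post

def is_spam(post: Post) -> bool:
    text = post.text.lower()
    return any(marker in text for marker in SPAM_MARKERS)

def categorize(post: Post) -> Post:
    words = set(post.text.lower().split())
    if "gap" in words:
        post.relevant = bool(words & BRAND_CONTEXT) and not (words & OFF_TOPIC_CONTEXT)
        post.category = "brand mention" if post.relevant else "off topic"
    return post

def run_pipeline(posts):
    for post in posts:
        post = detect_language(post)
        if post.language != "en" or is_spam(post):
            continue                  # real systems would route, not drop
        yield categorize(post)

sample = [Post("Loved the new jeans at the Gap store today"),
          Post("The trade gap widened last quarter"),
          Post("click here for a free iphone")]
for p in run_pipeline(sample):
    print(p.category, "|", p.text)
```

Even this toy version shows how samples can diverge: change the spam markers or the context keywords and the same input yields different results, which is exactly the disparity-in-filtering risk described below.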
The above challenges add up to a host of risks: missed
signals, inaccurate conclusions, bad decisions, high total
cost of data and tool ownership, and an inability to scale,
among others. Even a small misstep, such as a missing
source, a disparity in filtering algorithms, or a lack of
language support, can have a significant detrimental effect
on the trustworthiness of the results.
A recent story in Foreign Policy magazine provides a timely
example. “Why Big Data Missed the Early Warning Signs of
Ebola” highlights the importance of an early media report
published by Xinhua’s French-language newswire covering
a press conference about an outbreak of an unidentified
hemorrhagic fever in the Macenta prefecture in Guinea.10
The Foreign Policy article debunks some of the hyperbole
about the role of big data in identifying Ebola, not because
the technology wasn’t available (it was) or because the
indications weren’t there (they were), but because, as
author Kalev Leetaru writes, “part of the problem is that
the majority of media in Guinea is not published in English,
while most monitoring systems today emphasize English-
language material.”
FIGURE 1 CHALLENGES OF UNSTRUCTURED DATA

1. Identify Data Sources: Not all data sources provide reliable APIs or consistent access.
2. Crawl and Extract Data: Different tools use different crawlers, which can return different samples.
3. Detect and Interpret Language: Not all tools support multiple languages, or support them equally well.
4. Filter for Spam: Different spam filtering algorithms can also return different samples and accuracy levels.
5. Categorize for Relevance: Inconsistent levels of accuracy and different approaches.
6. Analyze for Sentiment and Keywords/Themes: Sentiment analysis is highly subjective and subject to interpretation or error. Even with human coding (which reduces scalability) and machine learning, no tool is perfect.
7. Classify for Action: Requires both organizational and technology resources to tag data so that it is appropriately classified and shared with the right people.
TRADITIONAL METHODOLOGIES MUST ADAPT
Even in the unlikely event that all relevant data is in English
or another single language, there’s no guarantee that it
will be easy to interpret or that the path to doing so will
be clear. For this reason, researchers in both industry
and academia are grappling with the many challenges
that large, unstructured human data poses as a tool
for conducting scientific or business research. The
following provides an example of how one organization is
addressing these significant methodological issues.
Case Study: Health Media Collaboratory
Applying Methodological Rigor to Big Data
The Health Media Collaboratory (HMC) at the University
of Illinois at Chicago’s Institute for Health Research and
Policy is focused on understanding social data, most of
which is unstructured, to “positively impact the health
behavior of individuals and communities,” according
to its website. In the broadest sense, HMC’s mission is
to develop and propagate a new paradigm for health
media research using innovative strategies to apply
methodological rigor to the analysis of big data.11
The focus of a recent project was to look at how people
talk about quitting smoking on Twitter so that HMC and
the Centers for Disease Control and Prevention (CDC)
could learn how they might promote behavior change.
Recently, HMC turned to Twitter to explore two questions
about the impact, if any, of social data on smoking
cessation. The initial research questions were:
•	 How much electronic-cigarette promotion is there
on Twitter?
•	 How much organic conversation about electronic
cigarettes exists on Twitter?
In another project, HMC also looked at whether Twitter
could be used as a tool to evaluate the efficacy of health-
oriented media campaigns. In particular, the CDC wanted
to assess the impact of several provocative and graphic
television commercials, one of which featured a woman
with a hole in her throat. The questions HMC sought to
answer were:
•	 Did the commercials work?
•	 How can we prove it?
This type of research, as well as the data it presents, is
vastly different from fielding a conventional multiple-
choice survey in which the questions and answers are
predefined and results tabulate the percentage of answers
in each column. HMC instead had to determine, with an
appropriate level of confidence, how people talk about
smoking on Twitter and whether this data could serve as a
useful indicator of public opinion and even of likely behavior.
To do this, the team needed to understand how much
of the Twitter conversation about smoking was spam,
how much was off topic (“smoking marijuana,” “smoking
ribs,” “smoking hot women”), and how much was relevant
(“I’ve really got to quit smoking cigarettes”). For the first
project, it also meant understanding how people talk about
electronic cigarettes in particular. Figure 2 is a recreation
of the search string HMC used in its research, illustrating
why this effort isn’t as simple as it might seem.
The methodology that HMC used to collect, clean, and
analyze the Twitter conversation related to smoking
topics closely mirrors the big data challenges outlined
in Figure 1. While it adheres to scientific method, it’s
important to know that this was a methodology that
HMC itself devised to account for the nuances and
challenges of unstructured data.
1. Data collection. Determine the appropriate source
and sample size of the data to be collected.
2. Keyword selection. Generate the most comprehensive
possible list of keywords, encompassing nonstandard
English usages, slang terms, and misspellings.
3. Metadata. Collect metadata related to the
tweets, including:
a.	 A tweet ID (a unique numerical identifier assigned
to each tweet)
b.	 The username and biographical profile of the
account used to post the tweet
c.	 Geolocation (if enabled by the user)
d.	 Number of followers of the posting account
e.	 The number of accounts the posting
account follows
f.	 The posting account’s Klout score
g.	 Hashtags
h.	 URL links
i.	 Media content attached to the tweet.
4. Filtering for engagement. Because engagement with
the campaign was the determining factor for relevance,
the team filtered tweets that described televised
commercials, later de-duplicating them to ensure that
tweets with multiple keywords would not be counted twice.
5. Human coding. Throughout the process, human
coders reviewed the data to assess relevance and code
message content.
Key Words for E-Cigs
E cigarettes blue cigarette e cigarettes njoy cigarette e cigarettes blu cig e cigarettes njoy cig e cigarettes ecig e
cigarettes e cig e cigarettes @blucigs e cigarettes e-cigarette e cigarettes ecigarette ecigarettes from:blucigs e
cigarettes e-cigarette e cigarettes e-cigs e cigarettes ecigarettes e cigarettes e-cigarettes e cigarettes green
smoke e cigarettes south beach smoke e cigarettes cartomizer ecigarette (atomizer OR atomizers)-perfume e
cigarettes ehookah OR e-hookah e cigarettes ejuice OR ejuices OR e-juice OR e-juice ecigarettes eliquid OR
eliquids OR e-liquid OR e-liquids e cigarettes e-smoke OR e-smokes e cigarettes (esmoke OR esmokes)
sample:5 lang: en e cigarettes lavatube OR lavatubes e cigarettes logicecig OR logicecigs e cigarettes
smartsmoker e cigarettes smokestik OR Smokestiks e cigarettes v2 cig OR “v2 cigs” OR v2cig OR v2cigs vaper
or vapers OR vaping e cigarettes zerocig OR Zerocigs e cigarettes cartomizers e cigarettes e-cigarettes
FIGURE 2 HOW PEOPLE TALK ABOUT E-CIGARETTES
Source: University of Illinois at Chicago’s Institute for Health Research and Policy
6. Precision and relevance. The team used a combination of human and machine coding to assess relevance and eliminate false positives (see the code sketch after this list), using three
teams of trained coders and a process to assess
intercoder reliability using a Kappa score, a statistic
“used to assess inter-rater reliability when observing or
otherwise coding qualitative/categorical variables.”12
According to HMC, “the human-coded tweets were then
used to train a naïve Bayes classifier to automatically
classify the larger dataset of Tips engagement tweets
for relevance. Precision was calculated as the percent
of Tips-relevant tweets yielded by the keyword filters.”13
7. Recall. To assess whether the tweet sample was
representative of and could be generalized to all
potentially relevant Twitter content, the team compared
its sample to a larger sample of unretrieved tweets,
again using trained coders and a Kappa score to
assess how well the filtered tweet sample represented
the larger data set.14
8. Content coding. Finally, the team coded the content
to better understand “fear appeals,” that is, whether the
user accepted, rejected, or disregarded the message.
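The sketch below is not HMC’s code; it is a simplified illustration, on invented toy data, of the agreement and relevance-classification checks in steps 6 and 7 above: Cohen’s kappa for two hypothetical human coders, and a naïve Bayes classifier trained on human-coded tweets to label a larger keyword-filtered set. The use of scikit-learn is an assumption for illustration.

```python
# Simplified sketch (not HMC's actual code): intercoder agreement via
# Cohen's kappa, then a naive Bayes classifier trained on human-coded tweets
# to label a larger set for relevance. Toy data and library choice are
# assumptions for illustration only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import cohen_kappa_score

# 1. Intercoder reliability: two human coders label the same tweets.
coder_a = ["relevant", "relevant", "irrelevant", "relevant", "irrelevant"]
coder_b = ["relevant", "irrelevant", "irrelevant", "relevant", "irrelevant"]
print("Cohen's kappa:", cohen_kappa_score(coder_a, coder_b))

# 2. Train a naive Bayes classifier on the human-coded tweets...
coded_tweets = [
    ("I've really got to quit smoking cigarettes", "relevant"),
    ("that ad with the hole in her throat scared me", "relevant"),
    ("smoking ribs on the grill all day", "irrelevant"),
    ("she is smoking hot", "irrelevant"),
]
texts, labels = zip(*coded_tweets)
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)
classifier = MultinomialNB().fit(X, labels)

# ...then classify the larger keyword-filtered sample and estimate the
# filter's precision as the share of retrieved tweets judged relevant.
retrieved = ["time to quit smoking for good", "smoking brisket tonight"]
predictions = classifier.predict(vectorizer.transform(retrieved))
precision_of_filter = sum(p == "relevant" for p in predictions) / len(predictions)
print(list(predictions), "estimated filter precision:", precision_of_filter)
```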
So, did the CDC’s graphic and disturbing anti-smoking ads
and the Twitter conversation surrounding them actually
lead people to quit? HMC didn’t overstate its data; rather, it
concluded that approximately 87% of the tweets about the
TV commercials expressed fear and that the ads had “the
desired result of jolting the audience into a thought process
that might have some impact on future behavior.”15
HMC’s case study illustrates that unstructured data
requires significant adaptations to analytics methodology
to extract meaning. Certainly it would have been a lot
simpler for the CDC to host a focus group or field a survey
to collect impressions about its anti-smoking campaign,
but that data, as comparatively simple as it would have
been to analyze, would lack the spontaneity and rich variety
of expression available on Twitter or other social networks,
had the teams extended the research to other sources.
The nature of human language demands rigorous and
repeatable processes to extract meaning in a transparent
and defensible way. As a result, analytics methodology is
undergoing an explosive period of change.
From Data to Insight
FIGURE 3 FUNDAMENTALS OF DATA SCIENCE

Good data science combines subject matter expertise, access to tools, and critical thinking with applied statistics. Where all three are present, the result is insights; where any one is missing, the result is irrelevant conclusions, an inability to execute, or incorrect conclusions.
BIG DATA REQUIRES LINGUISTIC EXPERTISE
As counterintuitive as it might seem, an influx of
unstructured data demands not only new and more
sophisticated technologies to process and store it but a
renewed emphasis on the humanistic disciplines as well.
This is because, as Gartner has said, big data “tends to be
the human-generated and people-oriented content” rather
than highly structured data that fits neatly into databases.
Naturally, “human-generated and people-oriented content”
includes language, which is rife with contractions,
sarcasm, slang, and metaphors expressed in multiple
written forms, in hundreds of languages, 24 hours a day,
seven days a week.
Furthermore, language changes constantly, a fact Oxford
Dictionaries marks each November by publishing a word
of the year that encapsulates that year’s zeitgeist. 2014’s
word was “vape,” salient in light of HMC’s research. Five
years ago, “vape” would have been impossible to interpret,
because it—and its cultural context—didn’t exist yet.
A recent article in MIT Technology Review illustrates just
how quickly language and meaning can evolve, both in
obvious and subtle ways.16
Vivek Kulkarni, a PhD student
in the Data Science Lab at Stony Brook University, along
with several of his colleagues, used linguistic mapping
to illustrate the speed at which word meanings change,
gathering inputs from sources such as Google Books,
Amazon, and Twitter.
“Mouse” acquired an entirely new meaning following the
introduction of the computer mouse in the early 1970s, and
“sandy” changed literally overnight with Hurricane Sandy in
2012. Today we see a constant stream of examples both
of redefined words and of new ones (“vaping,” “selfie”) that
require both technological and humanistic expertise to map,
place in context, and understand.
BIG DATA REQUIRES EXPERTISE IN DATA SCIENCE AND CRITICAL THINKING
The speed, size, and variety of data around us—and the
availability of platforms used to visualize and analyze
it—have democratized the function of analytics within
organizations. At the same time, fundamental analytics
education has lagged, creating a situation in which
organizations are at risk of misinterpreting data of all
kinds. Says Philip B. Stark, professor and chair of statistics
at the University of California, Berkeley, “the type of data
(structured, text, etc.) isn’t the point at all. The way of
thinking matters.”17
Stark emphasizes that good data science requires having
subject matter expertise, access to the appropriate
computational tools, and most importantly, critical thinking
and statistics skills. Figure 3 lays out the consequences of
overlooking any of these three foundational elements.
1. Irrelevant conclusions. If tools and critical thinking
are present but subject matter expertise is absent, the
organization risks asking the wrong questions, which
can result in irrelevant conclusions and valueless
answers. In addition, the organization will lack the
context necessary to design experiments that will yield
the answers it needs. It will be unable to understand
the intrinsic limitations of the data, says Stark: noise,
sampling issues, response bias, measurement bias,
and so on. This creates a domino effect that can
squander resources and lead to ineffectual—or worse,
harmful—decisions.
2. Inability to execute. If subject matter expertise and
critical thinking are present, but tools are absent, the
organization will be unable to extract insights at scale
and must resort to time-consuming manual methods.
As a result, the organization risks burning out and
eventually losing top analysts, who now must focus on
brute-force methods of processing and analyzing data,
rather than using their skills for more sophisticated and
rewarding applications.
3. Incorrect conclusions. If subject matter expertise
and tools are present, but critical thinking and a
knowledge of applied statistics are absent, the
organization risks drawing the wrong conclusions from
good data, making poor decisions that may ignore
other critical business signals. Like a lack of subject
matter expertise, this can have harmful consequences
to decision making and, therefore, business results.
Given the spread of data throughout organizations and
the impracticality of hiring legions of trained analysts to
keep pace with its growth, the next step is to evolve from
analytics that simply describe a situation to analytics that
predict what may happen next and then to analytics that
prescribe a course of action.18
But even assuming access to the most sophisticated
algorithms that incorporate the most detailed business
knowledge, widespread access to data necessitates
that more people, irrespective of role, grasp the basics of
logic and statistics to understand that data. This doesn’t
mandate universal PhDs in applied statistics, but it does
require an awareness of basic principles of logic.
The good news is that, while the big data industry is still
in its infancy, many of the most valuable tools for analysis
are widely available—and more than two thousand years
old to boot. As early as 350 BCE, Aristotle described 13
logical fallacies, which logicians and philosophers have
built upon during the last 2,400 years.19
Ignoring these
fallacies leaves organizations vulnerable to a host of risks,
which can harm competitive position, financial success,
customer sentiment and trust, and other critical objectives.
One common example is mistaking correlation for
causation, in which organizations erroneously attribute
one outcome (for example, increased revenue) to a
corresponding data point (for example, reach of a
marketing campaign). The increasing use of technologies
that present complex data visually can exacerbate the
problem. Harvard law student Tyler Vigen succinctly (and
sometimes hilariously) presents this phenomenon on his
Spurious Correlations blog.20
FIGURE 4 MISTAKING CORRELATION FOR CAUSATION

Divorce rate in Maine (divorces per 1,000 people) plotted against per capita consumption of margarine in the US (pounds), 2000–2009. Correlation: 0.992558.

Year: 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009
Divorce rate in Maine: 5.0, 4.7, 4.6, 4.4, 4.3, 4.1, 4.2, 4.2, 4.2, 4.1
Per capita margarine consumption (US): 8.2, 7.0, 6.5, 5.3, 5.2, 4.0, 4.6, 4.5, 4.2, 3.7

Source: Tyler Vigen
In Figure 4, Vigen’s calculations show that there is a 99%
correlation between the divorce rate in Maine and per-
capita margarine consumption. Does the Maine divorce
rate somehow cause US residents to eat margarine? Does
US margarine consumption somehow lead to divorce in
Maine? While these questions are absurd, charts such as this
visually suggest a link.
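The arithmetic behind Figure 4 is easy to reproduce: the Pearson correlation of the two plotted series comes out at roughly 0.99, as the short sketch below confirms, and yet that number says nothing about whether either series drives the other.

```python
# Pearson correlation of the two series plotted in Figure 4 (2000-2009).
# A coefficient near 1.0 says only that the series move together, not that
# either one causes the other.
from math import sqrt

divorce_rate_maine = [5.0, 4.7, 4.6, 4.4, 4.3, 4.1, 4.2, 4.2, 4.2, 4.1]
margarine_lbs_us   = [8.2, 7.0, 6.5, 5.3, 5.2, 4.0, 4.6, 4.5, 4.2, 3.7]

def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

print(round(pearson(divorce_rate_maine, margarine_lbs_us), 4))  # ~0.9926
```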
The correlation/causation fallacy is just one of many
logical fallacies that have been documented and described
over the years, including formal fallacies (fallacies of
logic) and informal fallacies (fallacies of evidence or
relevance).21
As more tools become available to visualize
data sets quickly and easily, organizations must invest
as much in critical thinking and data science expertise
as they do in tools to visualize data. Otherwise, they risk
succumbing to logical fallacies.
Legal and Ethical Issues of Big Data
BIG DATA RAISES MULTIPLE LEGAL AND ETHICAL ISSUES
The good news—and the bad news—about big data is that
it can provide unprecedented insight into people, both as
individuals and in aggregate. While surveys can, arguably,
reveal human attitudes, Christian Rudder, CEO of dating
site OKCupid, points out in his 2014 book, Dataclysm:
Who We Are When We Think No One’s Looking, that “we can
pinpoint the speaker, the words, the moment, even the
latitude and longitude of human communication.”22
Many people know the story of how Target discovered
that a young girl was pregnant before her father did; such
stories have become mainstream.23
But much of the
challenge with recent discussions on ethics and privacy
stems from the extremely broad nature of these terms,
the spectrum of personal preferences, and the beliefs of
individuals about the media environment we live in today.
Consider these recent examples:
•	 Seeking to prevent suicides, Samaritans Radar raises
privacy concerns. In October 2014, the BBC reported
that the Samaritans had launched an app that would
monitor words and phrases such as “hate myself” and
“depressed” on Twitter and would notify users if any
of the people they follow appear to be suicidal.24
While
the app was developed to help people reach out to
those in need, privacy advocates expressed concern
that the information could be used to target and profile
individuals without their consent. According to a
petition filed on Change.org, the Samaritans app was
monitoring approximately 900,000 Twitter accounts
as of late October.25
By November 7, the app was
suspended based on public feedback.26
•	 Facebook’s “Emotional Contagion” experiment
provokes outrage about its methodology. In June
2014, Facebook’s Adam Kramer published a study in
the Proceedings of the National Academy of Sciences,
revealing that “emotional states can be transferred
to others via emotional contagion, leading people
to experience the same emotions without their
awareness.”27
In other words, seeing negative stories
on Facebook can make you sad. The experiment
provoked outrage about the perceived lack of informed
consent, the ethical repercussions of such a study,
the concern over appropriate peer review, the privacy
implications, and the precedent such a study might
set for research using digital data.
•	 Uber knows when and where (and possibly with
whom) you’ve spent the night. In March 2012, Uber
posted, and later deleted, a blog post entitled “Rides of
Glory,” which revealed patterns, by city, of Uber rides
after “brief overnight weekend stays,” also known as
the passenger version of the walk of shame.28
Uber
was later criticized for allegedly revealing its “God
View” at an industry event, showing attendees the
precise location of a particular journalist without his
knowledge, while a December 1, 2014, post on Talking
Points Memo disclosed the story of a job applicant who
was allegedly shown individuals’ live travel information
during an interview.29, 30
•	 A teenager becomes an Internet celebrity—and a
target—in one day. Alex Lee, a 16-year-old Target clerk,
became a meme (#AlexFromTarget) and a celebrity
within hours, based on a photo taken of him unawares
at work. He was invited to appear on The Ellen Show
and was reported to have received death threats on
social media.31
These stories illustrate several attributes of the data
environment we live in now and the attendant ethical
issues they represent:
•	 Data collection. The Samaritans example illustrates
the law of unintended consequences: what may
happen when an app collects data that may, albeit
unintentionally, compromise privacy or put people in
harm’s way.
•	 Methodology and usage. The Facebook example
demonstrates what happens when a company uses
its vast reservoir of data to run technically legal but
ethically ambiguous experiments on its users, raising
questions about the nature of informed consent and
ethical data use in the digital age.
•	 Aggregation, storage, and stewardship. The Uber posts
illustrate, albeit with aggregated data, the intensely
intimate nature of the data users entrust to companies,
raising questions of stewardship, ethics (is aggregating
such data ethical?), and privacy (what happens if data is
intentionally or accidentally disclosed?).
•	 Communication. All of the above examples illustrate
the gray areas between law and ethics, or, from
an organizational point of view, risk management
and customer experience. As data becomes even
more valuable and ubiquitous, the way in which
organizations communicate—about collection,
analysis, intent and usage—will affect not only their
legal risk profile, but their ability to attract and retain
the trust and loyalty of their communities.
Finally, there is, as former secretary of defense Donald
Rumsfeld so famously called it, “the unknown unknown.”
The #AlexFromTarget story demonstrates not only how an
everyday 16-year-old (by definition, a minor) can become an
instant Internet celebrity but also
how a company can unwittingly and suddenly find itself
at the center of a crisis not of its own creation, one that
raises issues (compounded because of Lee’s age) of
employee privacy and even safety.
Figure 5 lays out these issues at a high level.
In the past, many of these ethical issues related to data
were cloaked behind proprietary systems and siloed data
stores. As data becomes ubiquitous, more integrated,
and more portable, however, the number and type of
ethical gray areas will multiply, along with a need to
distinguish the organization’s legal responsibilities, such
as what it discloses in a terms of service, from its ethical
ones—the actions it takes that promote or erode the trust
of its community.
FIGURE 5 ETHICAL ISSUES RELATED TO DATA

Data Collection: data sources; data types; sample size; how the data may have been filtered, enriched, or otherwise modified with demographic, location, or other metadata.
Methodology: keyword selection; human or algorithmic coding; process for assessing precision, relevance, and recall.
Usage: how the organization may change the experience based on data; whether the organization plans to sell the data in any form to a third party; how data is combined and its impact on personally identifiable information (PII) or user experience in general.
Aggregation, Storage & Stewardship: what data is collected; how and for how long data is stored; who owns the data; who has the right to delete data (posts or entire profiles); the process for deleting data (posts or entire profiles); who has the right to view, modify, or share data (administration); whether and how the data can be extracted.
Communication: the extent to which the organization proactively and transparently informs users/customers about what and how it collects, analyzes, stores, aggregates, and uses their data.
Planning for Data Ubiquity

If we—individually and collectively—are to make the best use of data and extract relevant insight from it in a trustworthy manner, we must approach data strategy thoughtfully. Following are some basic tenets of a strategic data plan.
1. Define data strategy and operating model
If data is to be considered a business-critical asset, it must be treated as such by leaders who drive and
instill strategy across the organization. In 2015, leaders must define what critical data streams are needed
to drive business goals, how they will source them, and what operating model is needed to process,
interpret, and act on them at the right time.
The challenge is that an organization’s departments (and therefore the data) tend to be siloed, which can
result in blind spots, organizational politics, and spiraling costs. Organizations must balance their need
for insight and competitive advantage on the one hand and privacy and rational cost of ownership on the
other. All too frequently, these dual imperatives are in conflict, sometimes unnecessarily so, because the
organization does not have a clear strategy for what data will be used and stored, what data will be used
but not stored, and what data is simply unnecessary.
2. Update analytics methodology to reflect new data realities
Analyzing unstructured data will never yield the same confidence levels as a simple binary choice; it will
always require interpretation. The key is to make that interpretation transparent, rigorous, and repeatable so
that others can reliably repeat analyses and yield the same or substantially similar results.
This is one area in which there is a tremendous difference between private and public institutions. In
private institutions, work process, product, and data tend to be proprietary. In public institutions, such as
universities, research is subject to the highest levels of scrutiny among academic publications and journals.
It’s also important to engineer the method of measurement into initiatives to reduce ambiguity and provide
a greater ability to trace impact. The broader the topic, the more hashtags can help confirm the provenance
and relevance of social conversation. Tracking codes and multivariate testing are also useful, if not perfect, solutions.
3. Seek out critical thinking and diverse skill sets
Unquestionably, engineering and analytical skills, not to mention skills in applied statistics and data science,
will continue to gain value as organizations become ever more dependent on multiple data types. At the
same time, the demands of analyzing unstructured data also require skill in interpreting context related
to language and behavior, a challenge humans have had since we developed language. After all, even the
cleanest, most reliable data can be misinterpreted, whether intentionally or unintentionally.
To minimize misinterpretation means valuing not only math and engineering but also social sciences and
humanities. These disciplines—sociology, psychology, anthropology, linguistics, ethics, philosophy, and
rhetoric—provide context and help us become better critical thinkers. Without a balance of critical thinking,
business knowledge, and smart analytics tools, we’re in danger of making the wrong decision much more
efficiently, quickly, and with far greater impact than we have in the past.
4. Insist on ethical data use and transparent disclosure
Earl Warren, former chief justice of the United States, once said, “In civilized life, law floats in a sea of
ethics.”33
This is especially true of the digital age, in which few of the implications of digital transformation
have found their way into case law and, as a result, organizational policy. As organizations become
more data centric, for their own benefit as well as their customers’, they must also look closely at
the affirmative and passive decisions they make about where they get their data; their analytics
methodology; how they store, steward, aggregate, and use the data; and how transparently they disclose
these actions.
5. Reward and reinforce humility and learning
It is nearly impossible to calculate the impact that data will have in our lives in the next decade.
Technologies such as IBM’s Watson, Ayasdi, and others are illustrating the many applications for big
data, whether in healthcare, consumer products, financial services, energy, or elsewhere. Meanwhile, the
Internet of Things introduces data feeds from sensors, which can be combined with other data streams
to deliver specific, relevant, and even predictive insights that will only compound volume, velocity, and
variety challenges.
Yet the world is just starting to come to terms with the impact of data ubiquity from the technology,
business, research, cultural, and ethical perspectives. The most important and perhaps most difficult
impact of data ubiquity is the fact that it radically undermines traditional methods of analysis and laughs
at our desire for certainty. The only strategy to combat the fear of uncertainty is to accept and work with
the limits of the data and approach the science of challenging data sets with an appetite for continuous
learning, whether the goal is to sell a pair of shoes or to help prevent cancer.
CONCLUSION

The hype over “big data” has partially obscured the fact that our ability to collect, analyze, and act on data—and to some extent predict outcomes based upon it—is a potentially transformative force for business and humanity alike. While Aldous Huxley couldn’t have anticipated the impact of a Kim Kardashian magazine cover or the challenges inherent in understanding how people talk about smoking, he was prescient to call out the ever-increasing difficulty of identifying relevance in a “sea of irrelevance.”32

It seems likely that the privacy and ethical implications of data ubiquity, not to mention recent disclosures about government access to and use of personal data, would have confirmed many of Orwell’s worst fears. At the same time, however, we do not need to blindly accept the dystopian nightmare he envisioned as our only future. We have an opportunity, and an obligation, to examine not only the legal but also the ethical implications of ubiquitous data, and use this understanding to decide how we will use it, sustainably and responsibly, for years to come.
ENDNOTES
1
You can view the talk at http://www.ted.com/talks/susan_
etlinger_what_do_we_do_with_all_this_big_data.
2
Neil Postman, Amusing Ourselves to Death: Public Discourse in the
Age of Show Business (New York: Penguin Books,1985), vii.
3
For a more detailed view, a good starting point is “3D Data
Management: Controlling Data Volume, Velocity and Variety,”
published by META Group on February 6, 2001, http://blogs.
gartner.com/doug-laney/files/2012/01/ad949-3D-Data-
Management-Controlling-Data-Volume-Velocity-and-Variety.pdf.
4
“What Is Big Data?” IBM, accessed January 6, 2015, http://www-
01.ibm.com/software/data/bigdata/what-is-big-data.html.
5
“Statistics,” YouTube, accessed January 6, 2015, https://www.
youtube.com/yt/press/statistics.html.
6
“Stats,” WordPress, cached on November 2, 2014, http://
sq.wordpress.com/stats/.
7
“About,” Twitter, accessed January 6, 2015, https://about.twitter.
com/company.
8
Darin Stewart, “Big Content: The Unstructured Side of Big Data,”
Gartner Group, May 1, 2013, http://blogs.gartner.com/darin-
stewart/2013/05/01/big-content-the-unstructured-side-of-big-
data/.
9
Zacks Equity Research, “Stock Market News for
December 17, 2014 - Market News,” Yahoo! Finance,
December 17, 2014, http://finance.yahoo.com/news/
stock-market-news-december-17-151003130.html;_
ylt=AwrBJSCwLpNUWlIAatyTmYlQ.
10
Kalev Leetaru, “Why Big Data Missed the Early Warning Signs of
Ebola,” Foreign Policy, September 26, 2014, http://foreignpolicy.
com/2014/09/26/why-big-data-missed-the-early-warning-signs-
of-ebola/.
11
See also: Sherry L. Emery, Glen Szczypka, Eulàlia P. Abril,
Yoonsang Kim, and Lisa Vera, “Are You Scared Yet? Evaluating
Fear Appeal Messages in Tweets About the Tips Campaign,”
Journal of Communication, 64 (2014): 278–295, doi: 10.1111/
jcom.12083.
12
“Cohen’s Kappa,” University of Nebraska–Lincoln, accessed
January 6, 2015, http://psych.unl.edu/psycrs/handcomp/
hckappa.PDF.
13
Sherry L. Emery, “Are You Scared Yet?”
14
Ibid.
15
Ibid.
16
“Linguistic Mapping Reveals How Word Meanings Sometimes
Change Overnight,” MIT Technology Review, November 23, 2014,
http://www.technologyreview.com/view/532776/linguistic-
mapping-reveals-how-word-meanings-sometimes-change-
overnight/.
17
Philip Stark, Twitter comment, November 24, 2014, https://
twitter.com/philipbstark/status/536955754163363840.
18
For a quick primer on descriptive, predictive, and prescriptive
analytics, see this interview with data scientist Michael Wu
of Lithium by Jeff Bertolucci in InformationWeek: http://www.
informationweek.com/big-data/big-data-analytics/big-
data-analytics-descriptive-vs-predictive-vs-prescriptive/d/d-
id/1113279.
19
To download the text, go to http://classics.mit.edu/Aristotle/
sophist_refut.html.
20
Vigen maintains a running list of spurious correlations at his
blog, Spurious Correlations (http://tylervigen.com/).
21
For an excellent tutorial on logical fallacies, see chapter 2 of
“SticiGui,” an online statistics textbook by Philip B. Stark, professor
and chair of the department of statistics, University of California,
Berkeley: http://www.stat.berkeley.edu/~stark/SticiGui/Text/
reasoning.htm.
22
Rudder, Dataclysm, 146.
23
Kashmir Hill, “How Target Figured Out a Teen Girl Was Pregnant
Before Her Father Did,” Forbes, February 16, 2012, http://www.
forbes.com/sites/kashmirhill/2012/02/16/how-target-figured-
out-a-teen-girl-was-pregnant-before-her-father-did/.
24
Zoe Kleinman, “Samaritans App Monitors Twitter Feeds for
Suicide Warnings,” BBC News, October 28, 2014, http://www.bbc.
com/news/technology-29801214.
25
Adrian Short, “Shut Down Samaritans Radar,” Change.org,
accessed January 6, 2015, https://www.change.org/p/twitter-
inc-shut-down-samaritans-radar.
26
“Samaritans Radar announcement - Friday 7 November,”
Samaritans, November 7, 2014, http://www.samaritans.org/
news/samaritans-radar-announcement-friday-7-november.
27
Adam D.I. Kramer, Jamie E. Guillory, and Jeffrey T. Hancock,
“Experimental Evidence of Massive-Scale Emotional Contagion
Through Social Networks,” Proceedings of the National Academy
of Sciences of the United States of America, vol. 111 (24), DOI:
10.1073/pnas.1320040111.
28
Voytek, “Rides of Glory,” Uber, cached March 26, 2012, https://
web.archive.org/web/20140828024924/http://blog.uber.com/
ridesofglory.
29
Kashmir Hill, “‘God View’: Uber Allegedly Stalked Users for
Party-Goers’ Viewing Pleasure (Updated),” Forbes, October 3,
2014, http://www.forbes.com/sites/kashmirhill/2014/10/03/
god-view-uber-allegedly-stalked-users-for-party-goers-viewing-
pleasure/.
30
Caitlin MacNeal, “Report: Uber Let Job Applicant Access
Controversial ‘God View’ Mode,” Talking Points Memo, December
1, 2014, http://talkingpointsmemo.com/livewire/uber-job-
applicant-ride-logs.
31
Nick Bilton, “Alex from Target: The Other Side of Fame,” The
New York Times, November 12, 2014, http://www.nytimes.
com/2014/11/13/style/alex-from-target-the-other-side-of-fame.
html?_r=0.
32
Aldous Huxley, Brave New World Revisited (New York:
HarperCollins Publishers, 1958), 36.
33
Earl Warren, speech at the Louis Marshall Award Dinner of the
Jewish Theological Seminary (Americana Hotel, New York City,
November 11, 1962).
SOURCES AND ACKNOWLEDGMENTS
This document was developed as a companion piece to a talk
given at TED@IBM in San Francisco, California, on September 23,
2014. As such, it was built on online and in-person conversations
with market influencers, technology vendors, brands, academics,
and others on the effective and ethical use of big data, as well as
secondary research, including relevant and timely books, articles,
and news stories. My deepest gratitude to the following:
•	 The team at the Health Media Collaboratory at the University
of Illinois at Chicago, specifically Sherry Emery, Eman Aly, and
Glen Szczypka for sharing their research and methodology and
educating me about the nuances of interpreting big data for
medical research.
•	 My fellow board members at the Big Boulder Initiative for
their insights and perspective on the effective and ethical
use of social data: Pernille Bruun-Jensen, CMO, NetBase;
Damon Cortesi, Founder and CTO, Simply Measured; Jason
Gowans, Director, Data Lab, Nordstrom; Will McInnes, CMO,
Brandwatch; Chris Moody, Vice President, Data Strategy,
Twitter (Chair); Stuart Shulman, Founder and CEO, Texifier;
Carmen Sutter, Product Manager, Social, Adobe; and Tom
Watson, Head of Sales, Hanweck Associates, LLC.
•	 The team at TED who helped me hone and focus the talk and
provided invaluable feedback throughout: Juliet Blake
and Anna Bechtol.
•	 The team at IBM Social Business for planning, executing and
marketing a superb event: Michela Stribling, Beth McElroy,
Jacqueline Saenz and Michelle Killebrew.
•	 My fellow TED@IBM speakers: Gianluca Ambrosetti, Kare
Anderson, Brad Bird, Monika Blaumueller, Erick Brethenoux,
Lisa Seacat DeLuca, Jon Iwata, Bryan Kramer, Tan Le, Charlene
Li, Florian Pinel, Inhi Cho Suh, Marie Wallace,
and Kareem Yusuf.
•	 Philip Stark, professor and chair of Statistics, University of
California, Berkeley, for an extremely insightful perspective on
the methodological and organizational requirements of big
data, as well as access to his superb course materials.
•	 The organizers and speakers at the International Symposium
on Digital Ethics at Loyola University in November 2014, with
whom I had some incredibly insightful conversations: Don
Heider, dean, School of Communication, Loyola University
Chicago; Thorsten Busch, senior research fellow, Institute for
Business Ethics, University of St. Gallen; Michael Koliska, PhD
candidate at University of Maryland; and Caitlin Ring, assistant
professor of strategic communication at Seattle University.
•	 Farida Vis, research fellow in the Social Sciences in the
Information School at the University of Sheffield.
•	 The teams at DataSift (Nick Halstead, Tim Barker, Jason Rose,
Seth Catalli); Lithium Technologies (Katy Keim and Nicol
Addison); and Oracle (Tara Roberts and Christine Wan) for
valuable insights along the way.
•	 Tyler Vigen for his Spurious Correlations blog, which makes a
complex topic simple and fun to explain; Gary Schroeder for
his wonderful visual storytelling of my TED talk; Daniel K. Davis
for his superb photography at TED@IBM; Vladimir Mirkovic for
graphic design; and Erin Brenner for copyediting.
•	 My talented teammates at Altimeter Group: Rebecca Lieb, who
edited this report, Cheryl Graves, Jessica Groopman, Jaimy
Szymanski, Christine Tran, and, of course, Charlene Li.
Input into this document does not represent a complete
endorsement of the report by the individuals or organizations
listed above. Finally, any errors are mine alone.
OPEN RESEARCH
This independent research report was 100% funded by Altimeter
Group. This report is published under the principle of Open
Research and is intended to advance the industry at no cost. This
report is intended for you to read, utilize, and share with others; if
you do so, please provide attribution to Altimeter Group.
PERMISSIONS
The Creative Commons License is Attribution-Noncommercial-
ShareAlike 3.0 United States, which can be found at https://
creativecommons.org/licenses/by-nc-sa/3.0/us/.
DISCLAIMER
ALTHOUGH THE INFORMATION AND DATA USED IN THIS REPORT HAVE BEEN PRODUCED
AND PROCESSED FROM SOURCES BELIEVED TO BE RELIABLE, NO WARRANTY EXPRESSED
OR IMPLIED IS MADE REGARDING THE COMPLETENESS, ACCURACY, ADEQUACY, OR USE
OF THE INFORMATION. THE AUTHORS AND CONTRIBUTORS OF THE INFORMATION AND
DATA SHALL HAVE NO LIABILITY FOR ERRORS OR OMISSIONS CONTAINED HEREIN OR FOR
INTERPRETATIONS THEREOF. REFERENCE HEREIN TO ANY SPECIFIC PRODUCT OR VENDOR
BY TRADE NAME, TRADEMARK, OR OTHERWISE DOES NOT CONSTITUTE OR IMPLY ITS
ENDORSEMENT, RECOMMENDATION, OR FAVORING BY THE AUTHORS OR CONTRIBUTORS
AND SHALL NOT BE USED FOR ADVERTISING OR PRODUCT ENDORSEMENT PURPOSES.
THE OPINIONS EXPRESSED HEREIN ARE SUBJECT TO CHANGE WITHOUT NOTICE.
About Us
How to Work with Us
Altimeter Group research is applied and brought to life in our client engagements. We help organizations understand and
take advantage of digital disruption. There are several ways Altimeter can help you with your business initiatives:
•	 Strategy Consulting. Altimeter creates strategies and plans to help companies act on disruptive business and
technology trends. Our team of analysts and consultants works with senior executives, strategists, and marketers on
needs assessment, strategy roadmaps, and pragmatic recommendations across disruptive trends.
•	 Education and Workshops. Engage an Altimeter speaker to help make the business case to executives or arm
practitioners with new knowledge and skills.
•	 Advisory. Retain Altimeter for ongoing research-based advisory: conduct an ad-hoc session to address an immediate
challenge; or gain deeper access to research and strategy counsel.
To learn more about Altimeter’s offerings, contact sales@altimetergroup.com.
Altimeter is a research and
consulting firm that helps
companies understand and
act on technology disruption.
We give business leaders the
insight and confidence to help
their companies thrive in the
face of disruption. In addition to
publishing research, Altimeter
Group analysts speak and
provide strategy consulting
on trends in leadership, digital
transformation, social business,
data disruption and content
marketing strategy.
Altimeter Group
1875 S Grant St #680
San Mateo, CA 94402
info@altimetergroup.com
www.altimetergroup.com
@altimetergroup
650.212.2272
Susan Etlinger, Industry Analyst
Susan Etlinger is an industry analyst at Altimeter Group,
where she works with global organizations to develop
data and analytics strategies that support their business
objectives. Susan has a diverse background in marketing
and strategic planning within both corporations and
agencies. She’s a frequent speaker on social data and
analytics and has been extensively quoted in outlets,
including Fast Company, BBC, The New York Times, and The
Wall Street Journal. Find her on Twitter at @setlinger and at
her blog, Thought Experiments, at susanetlinger.com.
Rebecca Lieb, Industry Analyst
Rebecca Lieb (@lieblink) covers digital advertising and
media, encompassing brands, publishers, agencies and
technology vendors. In addition to her background as a
marketing executive, she was VP and editor-in-chief of the
ClickZ Network for over seven years. She’s written two
books on digital marketing: The Truth About Search Engine
Optimization (2009) and Content Marketing (2011). Rebecca
blogs at www.rebeccalieb.com/blog.
[Slides] Four Steps Brands Can Take to Design Internet of Things ExperiencesAltimeter, a Prophet Company
 
The Future Of Business by Altimeter Group
The Future Of Business by Altimeter GroupThe Future Of Business by Altimeter Group
The Future Of Business by Altimeter GroupCharlene Li
 
Leading Digital Transformation: Putting People First
Leading Digital Transformation: Putting People FirstLeading Digital Transformation: Putting People First
Leading Digital Transformation: Putting People FirstCharlene Li
 
[Slides] Content Marketing Software RFP, by Altimeter Group
[Slides] Content Marketing Software RFP, by Altimeter Group[Slides] Content Marketing Software RFP, by Altimeter Group
[Slides] Content Marketing Software RFP, by Altimeter GroupAltimeter, a Prophet Company
 
[Slides] Content Marketing Performance by Altimeter Group
[Slides] Content Marketing Performance by Altimeter Group[Slides] Content Marketing Performance by Altimeter Group
[Slides] Content Marketing Performance by Altimeter GroupAltimeter, a Prophet Company
 
State of Research and Consulting
State of Research and ConsultingState of Research and Consulting
State of Research and ConsultingCharlene Li
 
Social Data Intelligence: Webinar with Susan Etlinger
Social Data Intelligence: Webinar with Susan EtlingerSocial Data Intelligence: Webinar with Susan Etlinger
Social Data Intelligence: Webinar with Susan EtlingerSusan Etlinger
 
[Slides] The Inevitability of a Mobile-Only Customer Experience by Altimeter ...
[Slides] The Inevitability of a Mobile-Only Customer Experience by Altimeter ...[Slides] The Inevitability of a Mobile-Only Customer Experience by Altimeter ...
[Slides] The Inevitability of a Mobile-Only Customer Experience by Altimeter ...Altimeter, a Prophet Company
 
"The Engaged Leader" at SXSW Interactive
"The Engaged Leader" at SXSW Interactive"The Engaged Leader" at SXSW Interactive
"The Engaged Leader" at SXSW InteractiveCharlene Li
 
[Slides] Customer Experience in the Internet of Things by Altimeter Group
[Slides] Customer Experience in the Internet of Things by Altimeter Group[Slides] Customer Experience in the Internet of Things by Altimeter Group
[Slides] Customer Experience in the Internet of Things by Altimeter GroupAltimeter, a Prophet Company
 
Emerging technologies and how they will impact the business and marketing lan...
Emerging technologies and how they will impact the business and marketing lan...Emerging technologies and how they will impact the business and marketing lan...
Emerging technologies and how they will impact the business and marketing lan...McKinsey on Marketing & Sales
 
Top Digital Transformation Trends and Priorities for 2016
Top Digital Transformation Trends and Priorities for 2016Top Digital Transformation Trends and Priorities for 2016
Top Digital Transformation Trends and Priorities for 2016Charlene Li
 

Destacado (17)

Best Practices in Experimenting with Existing Channels - Omni Digital
Best Practices in Experimenting with Existing Channels - Omni DigitalBest Practices in Experimenting with Existing Channels - Omni Digital
Best Practices in Experimenting with Existing Channels - Omni Digital
 
Collaborative Storytelling: Presentation at Startupfest 2013
Collaborative Storytelling: Presentation at Startupfest 2013Collaborative Storytelling: Presentation at Startupfest 2013
Collaborative Storytelling: Presentation at Startupfest 2013
 
[Slides] Four Steps Brands Can Take to Design Internet of Things Experiences
[Slides] Four Steps Brands Can Take to Design Internet of Things Experiences[Slides] Four Steps Brands Can Take to Design Internet of Things Experiences
[Slides] Four Steps Brands Can Take to Design Internet of Things Experiences
 
The Future Of Business by Altimeter Group
The Future Of Business by Altimeter GroupThe Future Of Business by Altimeter Group
The Future Of Business by Altimeter Group
 
Leading Digital Transformation: Putting People First
Leading Digital Transformation: Putting People FirstLeading Digital Transformation: Putting People First
Leading Digital Transformation: Putting People First
 
[Slides] Content Marketing Software RFP, by Altimeter Group
[Slides] Content Marketing Software RFP, by Altimeter Group[Slides] Content Marketing Software RFP, by Altimeter Group
[Slides] Content Marketing Software RFP, by Altimeter Group
 
Crafting a Digital Strategy
Crafting a Digital StrategyCrafting a Digital Strategy
Crafting a Digital Strategy
 
[Slides] Content Marketing Performance by Altimeter Group
[Slides] Content Marketing Performance by Altimeter Group[Slides] Content Marketing Performance by Altimeter Group
[Slides] Content Marketing Performance by Altimeter Group
 
State of Research and Consulting
State of Research and ConsultingState of Research and Consulting
State of Research and Consulting
 
Social Data Intelligence: Webinar with Susan Etlinger
Social Data Intelligence: Webinar with Susan EtlingerSocial Data Intelligence: Webinar with Susan Etlinger
Social Data Intelligence: Webinar with Susan Etlinger
 
[Slides] The Inevitability of a Mobile-Only Customer Experience by Altimeter ...
[Slides] The Inevitability of a Mobile-Only Customer Experience by Altimeter ...[Slides] The Inevitability of a Mobile-Only Customer Experience by Altimeter ...
[Slides] The Inevitability of a Mobile-Only Customer Experience by Altimeter ...
 
"The Engaged Leader" at SXSW Interactive
"The Engaged Leader" at SXSW Interactive"The Engaged Leader" at SXSW Interactive
"The Engaged Leader" at SXSW Interactive
 
[Slides] A Culture of Content by Altimeter Group
[Slides] A Culture of Content by Altimeter Group[Slides] A Culture of Content by Altimeter Group
[Slides] A Culture of Content by Altimeter Group
 
[Slides] Customer Experience in the Internet of Things by Altimeter Group
[Slides] Customer Experience in the Internet of Things by Altimeter Group[Slides] Customer Experience in the Internet of Things by Altimeter Group
[Slides] Customer Experience in the Internet of Things by Altimeter Group
 
Emerging technologies and how they will impact the business and marketing lan...
Emerging technologies and how they will impact the business and marketing lan...Emerging technologies and how they will impact the business and marketing lan...
Emerging technologies and how they will impact the business and marketing lan...
 
B2B Digital Sales - Sell the buyer’s way
B2B Digital Sales - Sell the buyer’s wayB2B Digital Sales - Sell the buyer’s way
B2B Digital Sales - Sell the buyer’s way
 
Top Digital Transformation Trends and Priorities for 2016
Top Digital Transformation Trends and Priorities for 2016Top Digital Transformation Trends and Priorities for 2016
Top Digital Transformation Trends and Priorities for 2016
 

Similar a What-Do-We-Do-with-All-This-Big-Data-Altimeter-Group

23 ijcse-01238-1indhunisha
23 ijcse-01238-1indhunisha23 ijcse-01238-1indhunisha
23 ijcse-01238-1indhunishaShivlal Mewada
 
Introduction to big data
Introduction to big dataIntroduction to big data
Introduction to big dataHari Priya
 
Notes from the Observation Deck // A Data Revolution
Notes from the Observation Deck // A Data Revolution Notes from the Observation Deck // A Data Revolution
Notes from the Observation Deck // A Data Revolution gngeorge
 
Smart Data Module 1 introduction to big and smart data
Smart Data Module 1 introduction to big and smart dataSmart Data Module 1 introduction to big and smart data
Smart Data Module 1 introduction to big and smart datacaniceconsulting
 
Policy paper need for focussed big data & analytics skillset building throu...
Policy  paper  need for focussed big data & analytics skillset building throu...Policy  paper  need for focussed big data & analytics skillset building throu...
Policy paper need for focussed big data & analytics skillset building throu...Ritesh Shrivastava
 
Bigger and Better: Employing a Holistic Strategy for Big Data toward a Strong...
Bigger and Better: Employing a Holistic Strategy for Big Data toward a Strong...Bigger and Better: Employing a Holistic Strategy for Big Data toward a Strong...
Bigger and Better: Employing a Holistic Strategy for Big Data toward a Strong...IT Network marcus evans
 
Big data consumer analytics and the transformation of marketing
Big data consumer analytics and the transformation of marketingBig data consumer analytics and the transformation of marketing
Big data consumer analytics and the transformation of marketingNicha Tatsaneeyapan
 
Semantic Web Investigation within Big Data Context
Semantic Web Investigation within Big Data ContextSemantic Web Investigation within Big Data Context
Semantic Web Investigation within Big Data ContextMurad Daryousse
 
Big data march2016 ipsos mori
Big data march2016 ipsos moriBig data march2016 ipsos mori
Big data march2016 ipsos moriChris Guthrie
 
Introduction to big data – convergences.
Introduction to big data – convergences.Introduction to big data – convergences.
Introduction to big data – convergences.saranya270513
 
Communications of the Association for Information SystemsV.docx
Communications of the Association for Information SystemsV.docxCommunications of the Association for Information SystemsV.docx
Communications of the Association for Information SystemsV.docxmonicafrancis71118
 
Big Data for International Development
Big Data for International DevelopmentBig Data for International Development
Big Data for International DevelopmentAlex Rascanu
 
Climate Change 2015: Continuing Education Programming Implications
Climate Change 2015: Continuing Education Programming ImplicationsClimate Change 2015: Continuing Education Programming Implications
Climate Change 2015: Continuing Education Programming Implicationsdorothydurkin
 
Data-driven journalism: What is there to learn? (Stanford, June 2010) #ddj
Data-driven journalism: What is there to learn? (Stanford, June 2010) #ddjData-driven journalism: What is there to learn? (Stanford, June 2010) #ddj
Data-driven journalism: What is there to learn? (Stanford, June 2010) #ddjMirko Lorenz
 
Making sense-of-the-chaos
Making sense-of-the-chaosMaking sense-of-the-chaos
Making sense-of-the-chaosswaipnew
 

Similar a What-Do-We-Do-with-All-This-Big-Data-Altimeter-Group (20)

23 ijcse-01238-1indhunisha
23 ijcse-01238-1indhunisha23 ijcse-01238-1indhunisha
23 ijcse-01238-1indhunisha
 
Big data Paper
Big data PaperBig data Paper
Big data Paper
 
Big Data-Job 2
Big Data-Job 2Big Data-Job 2
Big Data-Job 2
 
Introduction to big data
Introduction to big dataIntroduction to big data
Introduction to big data
 
Notes from the Observation Deck // A Data Revolution
Notes from the Observation Deck // A Data Revolution Notes from the Observation Deck // A Data Revolution
Notes from the Observation Deck // A Data Revolution
 
Smart Data Module 1 introduction to big and smart data
Smart Data Module 1 introduction to big and smart dataSmart Data Module 1 introduction to big and smart data
Smart Data Module 1 introduction to big and smart data
 
Policy paper need for focussed big data & analytics skillset building throu...
Policy  paper  need for focussed big data & analytics skillset building throu...Policy  paper  need for focussed big data & analytics skillset building throu...
Policy paper need for focussed big data & analytics skillset building throu...
 
Bigger and Better: Employing a Holistic Strategy for Big Data toward a Strong...
Bigger and Better: Employing a Holistic Strategy for Big Data toward a Strong...Bigger and Better: Employing a Holistic Strategy for Big Data toward a Strong...
Bigger and Better: Employing a Holistic Strategy for Big Data toward a Strong...
 
Big data consumer analytics and the transformation of marketing
Big data consumer analytics and the transformation of marketingBig data consumer analytics and the transformation of marketing
Big data consumer analytics and the transformation of marketing
 
Semantic Web Investigation within Big Data Context
Semantic Web Investigation within Big Data ContextSemantic Web Investigation within Big Data Context
Semantic Web Investigation within Big Data Context
 
Monetize Big Data
Monetize Big DataMonetize Big Data
Monetize Big Data
 
data, big data, open data
data, big data, open datadata, big data, open data
data, big data, open data
 
Big data march2016 ipsos mori
Big data march2016 ipsos moriBig data march2016 ipsos mori
Big data march2016 ipsos mori
 
Introduction to big data – convergences.
Introduction to big data – convergences.Introduction to big data – convergences.
Introduction to big data – convergences.
 
Communications of the Association for Information SystemsV.docx
Communications of the Association for Information SystemsV.docxCommunications of the Association for Information SystemsV.docx
Communications of the Association for Information SystemsV.docx
 
Big Data for International Development
Big Data for International DevelopmentBig Data for International Development
Big Data for International Development
 
Climate Change 2015: Continuing Education Programming Implications
Climate Change 2015: Continuing Education Programming ImplicationsClimate Change 2015: Continuing Education Programming Implications
Climate Change 2015: Continuing Education Programming Implications
 
Data-driven journalism: What is there to learn? (Stanford, June 2010) #ddj
Data-driven journalism: What is there to learn? (Stanford, June 2010) #ddjData-driven journalism: What is there to learn? (Stanford, June 2010) #ddj
Data-driven journalism: What is there to learn? (Stanford, June 2010) #ddj
 
GITA April 2015 Newsletter
GITA April 2015 NewsletterGITA April 2015 Newsletter
GITA April 2015 Newsletter
 
Making sense-of-the-chaos
Making sense-of-the-chaosMaking sense-of-the-chaos
Making sense-of-the-chaos
 

Más de Susan Etlinger

TrustImperative_etlinger
TrustImperative_etlingerTrustImperative_etlinger
TrustImperative_etlingerSusan Etlinger
 
Shiny Object or Digital Intelligence Hub? Evolution of the Social Media Comma...
Shiny Object or Digital Intelligence Hub? Evolution of the Social Media Comma...Shiny Object or Digital Intelligence Hub? Evolution of the Social Media Comma...
Shiny Object or Digital Intelligence Hub? Evolution of the Social Media Comma...Susan Etlinger
 
Shiny Object or Digital Intelligence Hub? Evolution of the Social Media Comma...
Shiny Object or Digital Intelligence Hub? Evolution of the Social Media Comma...Shiny Object or Digital Intelligence Hub? Evolution of the Social Media Comma...
Shiny Object or Digital Intelligence Hub? Evolution of the Social Media Comma...Susan Etlinger
 
Data everywhere-lessons-from-big-data-in-the-television-industry-altimeter-group
Data everywhere-lessons-from-big-data-in-the-television-industry-altimeter-groupData everywhere-lessons-from-big-data-in-the-television-industry-altimeter-group
Data everywhere-lessons-from-big-data-in-the-television-industry-altimeter-groupSusan Etlinger
 
Canary in the Coalmine: How Social Media Can Prepare Us for Big Data
Canary in the Coalmine: How Social Media Can Prepare Us for Big Data Canary in the Coalmine: How Social Media Can Prepare Us for Big Data
Canary in the Coalmine: How Social Media Can Prepare Us for Big Data Susan Etlinger
 
Working with-industry-analysts
Working with-industry-analystsWorking with-industry-analysts
Working with-industry-analystsSusan Etlinger
 
A Framework for Social Analytics
A Framework for Social AnalyticsA Framework for Social Analytics
A Framework for Social AnalyticsSusan Etlinger
 

Más de Susan Etlinger (7)

TrustImperative_etlinger
TrustImperative_etlingerTrustImperative_etlinger
TrustImperative_etlinger
 
Shiny Object or Digital Intelligence Hub? Evolution of the Social Media Comma...
Shiny Object or Digital Intelligence Hub? Evolution of the Social Media Comma...Shiny Object or Digital Intelligence Hub? Evolution of the Social Media Comma...
Shiny Object or Digital Intelligence Hub? Evolution of the Social Media Comma...
 
Shiny Object or Digital Intelligence Hub? Evolution of the Social Media Comma...
Shiny Object or Digital Intelligence Hub? Evolution of the Social Media Comma...Shiny Object or Digital Intelligence Hub? Evolution of the Social Media Comma...
Shiny Object or Digital Intelligence Hub? Evolution of the Social Media Comma...
 
Data everywhere-lessons-from-big-data-in-the-television-industry-altimeter-group
Data everywhere-lessons-from-big-data-in-the-television-industry-altimeter-groupData everywhere-lessons-from-big-data-in-the-television-industry-altimeter-group
Data everywhere-lessons-from-big-data-in-the-television-industry-altimeter-group
 
Canary in the Coalmine: How Social Media Can Prepare Us for Big Data
Canary in the Coalmine: How Social Media Can Prepare Us for Big Data Canary in the Coalmine: How Social Media Can Prepare Us for Big Data
Canary in the Coalmine: How Social Media Can Prepare Us for Big Data
 
Working with-industry-analysts
Working with-industry-analystsWorking with-industry-analysts
Working with-industry-analysts
 
A Framework for Social Analytics
A Framework for Social AnalyticsA Framework for Social Analytics
A Framework for Social Analytics
 

What-Do-We-Do-with-All-This-Big-Data-Altimeter-Group

Executive Summary

This document proposes an approach to better understand and address:

• How we extract insight from data
• How we use data in such a way as to earn and protect trust: the trust of customers, constituents, patients, and partners

To be clear, these twin challenges of insight and trust will occupy data scientists, engineers, analysts, ethicists, linguists, lawyers, social scientists, journalists, and, of course, the public for many years to come. To derive insight from data while protecting and sustaining trust with communities, organizations must think deeply about how they source and analyze it, and clarify and communicate their roles as stewards of increasingly revealing information. This is only a first step, but it's a critical one if we are to derive sustainable advantage from data, big and small.
WITH BIG DATA, SIZE ISN'T EVERYTHING

The idea of big data isn't new; it was defined in the late '90s by analysts at META Group (now part of Gartner). According to META/Gartner, big data has three main attributes, known as the Three Vs:

• Volume (the amount of data)
• Velocity (the speed at which the data moves)
• Variety (the many types of data)3

Now nearly two decades old, this construct has become increasingly pertinent. As IBM has famously said, "90% of all the data in the world was created in the past two years."4 To understand why, we need to compare the business conditions that existed when big data was originally defined with today's. In the early 2000s, technologists were grappling with a burgeoning variety of data types, spurred in large part by the rise of electronic commerce. Today, social media is a major catalyst of data proliferation. Consider that:

• 100 hours of video are uploaded to YouTube every minute.5
• On WordPress alone, users produce about 64.8 million new blog posts and 60.4 million new comments each month.6
• 500 million tweets are sent per day.7

Much of this data is unstructured. It is, as Gartner defines it, "content that does not conform to a specific, pre-defined data model. It tends to be the human-generated and people-oriented content that does not fit neatly into database tables."8 As a result, the primary challenge of what we think of as big data isn't actually the size; it's the variety. For this reason, the term "big data" can sometimes be misleading.

If this seems counterintuitive, consider this example: the New York Stock Exchange (NYSE) recorded approximately 9.3 billion shares traded on December 16, 2014, more than 18 times the average number of tweets (approximately 500 million) created per day.9 Even though the number of trades is much larger than the number of tweets (volume) and the speed of the market may change from hour to hour and day to day (velocity), the basic attributes of a trade—price, trade time, change from previous trade, previous close, price/earnings ratio, and so on—are the same every time. A trade is a trade. It is homogeneous and predictable from a data perspective (variety).

In contrast, social data is far more complex and variable. While a tweet contains some structured data (metadata about the time it was posted, the user who posted it, whether it includes hashtags or media such as photography, and other attributes), it can express anything that fits into 140 characters. It is a mix of structured metadata and unstructured text and images that can be expressed with variable lengths, languages, meanings, and formats. It can contain a news headline, a haiku, a sales message, or a random thought. For this reason, a much smaller number of tweets can be far more complex to analyze from a data standpoint. Size isn't everything.
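To make the trade-versus-tweet contrast concrete, here is a minimal Python sketch. It is not drawn from the report; the field names, the sample trade, and the sample tweet are illustrative assumptions.

    # Illustrative only: the same "record" idea behaves very differently
    # for a stock trade and for a tweet.
    from dataclasses import dataclass

    @dataclass
    class Trade:
        # Every trade carries the same, predictable fields (low variety).
        symbol: str
        price: float
        trade_time: str
        change_from_previous: float
        previous_close: float
        pe_ratio: float

    trade = Trade("XYZ", 41.27, "2014-12-16T14:32:05Z", -0.13, 41.40, 17.8)

    # A tweet mixes structured metadata with unstructured free text (high variety).
    tweet = {
        "created_at": "2014-12-16T14:32:05Z",  # structured metadata
        "user": "@example_user",               # hypothetical account
        "hashtags": ["quitsmoking"],
        "text": "day 3 without a cigarette... send help (and gum)",
    }

    # The trade can be aggregated as-is; the tweet's meaning (topic, sentiment,
    # sarcasm) still has to be extracted from the text before anyone can act on it.
    print(trade.price, "|", tweet["text"])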
The nature of human language demands rigorous and repeatable processes to extract meaning from it in a transparent and defensible way.
UNSTRUCTURED DATA DEMANDS NEW ANALYTICAL APPROACHES

The human-generated and people-oriented nature of unstructured data is both an unprecedented asset and a disruptive force. Data's value lies in its ability to capture the desires, hopes, dreams, preferences, buying habits, likes, and dislikes of everyday people, whether individually or in aggregate. The disruptive nature of this data stems from two attributes:

• It's raw material. It requires processing to translate it into a format that machines, and therefore people, can understand and act upon at scale.
• It offers a window into human behavior and attitudes. When enriched with demographic and location information, data can introduce an unprecedented level of insight and, potentially, privacy concerns.

Unstructured data requires a number of processes and technologies to:

• Identify the appropriate sources
• Crawl and extract it
• Detect and interpret the language being used
• Filter it for spam
• Categorize it for relevance (e.g., "Gap store" versus "trade gap")
• Analyze the content for context (sentiment, tone, intensity, keywords, location, demographic information)
• Classify it so the business can act on it (a customer service issue, a request for a product enhancement, a question, etc.)

Each of these steps is rife with nuances that require sophisticated technologies and processes to address (see Figure 1). These challenges add up to a host of risks: missed signals, inaccurate conclusions, bad decisions, high total cost of data and tool ownership, and an inability to scale, among others. Even a small misstep, such as a missing source, a disparity in filtering algorithms, or a lack of language support, can have a significant detrimental effect on the trustworthiness of the results.

A recent story in Foreign Policy magazine provides a timely example. "Why Big Data Missed the Early Warning Signs of Ebola" highlights the importance of an early media report published by Xinhua's French-language newswire covering a press conference about an outbreak of an unidentified hemorrhagic fever in the Macenta prefecture in Guinea.10 The article debunks some of the hyperbole about the role of big data in identifying Ebola: the signal was missed not because the technology wasn't available (it was) or because the indications weren't there (they were), but because, as author Kalev Leetaru writes, "part of the problem is that the majority of media in Guinea is not published in English, while most monitoring systems today emphasize English-language material."
FIGURE 1: CHALLENGES OF UNSTRUCTURED DATA

1. Identify data sources: Not all data sources provide reliable APIs or consistent access.
2. Crawl and extract data: Different tools use different crawlers, which can return different samples.
3. Detect and interpret language: Not all tools support multiple languages, or support them equally well.
4. Filter for spam: Different spam-filtering algorithms can also return different samples and accuracy levels.
5. Categorize for relevance: Inconsistent levels of accuracy and different approaches.
6. Analyze for sentiment and keywords/themes: Sentiment analysis is highly subjective and subject to interpretation or error. Even with human coding (which reduces scalability) and machine learning, no tool is perfect.
7. Classify for action: Requires both organizational and technology resources to tag data so that it is appropriately classified and shared with the right people.
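As a rough illustration of how these steps chain together, the sketch below walks a handful of posts through spam filtering, relevance categorization, simple sentiment cues, and classification for action. The function, the regular expressions, and the rules are invented placeholders, not any vendor's actual pipeline.

    # Hypothetical, highly simplified stand-in for steps 2-7 of Figure 1.
    import re

    SPAM_PATTERNS = [re.compile(p, re.I) for p in (r"click here", r"free followers")]
    RELEVANT = re.compile(r"\b(quit(ting)? smoking|cigarettes?)\b", re.I)
    OFF_TOPIC = re.compile(r"\bsmoking (ribs|hot)\b", re.I)
    NEGATIVE_CUES = ("hate", "hard", "can't")

    def process(posts):
        """Filter, categorize, analyze, and classify a batch of extracted posts."""
        results = []
        for post in posts:                                    # crawl/extract stand-in
            text = post["text"]
            if any(p.search(text) for p in SPAM_PATTERNS):    # filter for spam
                continue
            if OFF_TOPIC.search(text) or not RELEVANT.search(text):  # relevance
                continue
            sentiment = "negative" if any(c in text.lower() for c in NEGATIVE_CUES) else "other"
            action = "route_to_support" if "?" in text else "monitor"  # classify for action
            results.append({"text": text, "sentiment": sentiment, "action": action})
        return results

    sample = [
        {"text": "Click here for FREE followers!"},            # spam
        {"text": "Smoking ribs all weekend"},                   # off topic
        {"text": "It's so hard quitting smoking, any tips?"},   # relevant
    ]
    print(process(sample))

Real tools differ precisely in how they implement each of these steps, which is why two platforms fed the same query can return different samples and, potentially, different conclusions.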
TRADITIONAL METHODOLOGIES MUST ADAPT

Even in the unlikely event that all relevant data is in English or another single language, there's no guarantee that it will be easy to interpret or that the path to doing so will be clear. For this reason, researchers in both industry and academia are grappling with the many challenges that large, unstructured human data poses as a tool for conducting scientific or business research. The following example shows how one organization is addressing these significant methodological issues.

Case Study: Health Media Collaboratory
Applying Methodological Rigor to Big Data

The Health Media Collaboratory (HMC) at the University of Illinois at Chicago's Institute for Health Research and Policy is focused on understanding social data, most of which is unstructured, to "positively impact the health behavior of individuals and communities," according to its website. In the broadest sense, HMC's mission is to develop and propagate a new paradigm for health media research, using innovative strategies to apply methodological rigor to the analysis of big data.11

The focus of a recent project was to look at how people talk about quitting smoking on Twitter so that HMC and the Centers for Disease Control and Prevention (CDC) could learn how they might promote behavior change. HMC turned to Twitter to explore two questions about the impact, if any, of social data on smoking cessation. The initial research questions were:

• How much electronic-cigarette promotion is there on Twitter?
• How much organic conversation about electronic cigarettes exists on Twitter?

In another project, HMC looked at whether Twitter could be used as a tool to evaluate the efficacy of health-oriented media campaigns. In particular, the CDC wanted to assess the impact of several provocative and graphic television commercials, one of which featured a woman with a hole in her throat. The questions HMC sought to answer were:

• Did the commercials work?
• How can we prove it?

This type of research, as well as the data it presents, is vastly different from fielding a conventional multiple-choice survey in which the questions and answers are predefined and results tabulate the percentage of answers in each column. HMC instead had to determine, with an appropriate level of confidence, how people talk about smoking on Twitter and whether this data could serve as a useful indicator of public opinion and even of likely behavior.
To do this, the team needed to understand how much of the Twitter conversation about smoking was spam, how much was off topic ("smoking marijuana," "smoking ribs," "smoking hot women"), and how much was relevant ("I've really got to quit smoking cigarettes"). For the first project, it also meant understanding how people talk about electronic cigarettes in particular. Figure 2 is a recreation of the search string HMC used in its research, illustrating why this effort isn't as simple as it might seem.

The methodology that HMC used to collect, clean, and analyze the Twitter conversation related to smoking topics closely mirrors the big data challenges outlined in Figure 1. While it adheres to the scientific method, it's important to know that this was a methodology that HMC itself devised to account for the nuances and challenges of unstructured data.

1. Data collection. Determine the appropriate source and sample size of the data to be collected.
2. Keyword selection. Generate the most comprehensive possible list of keywords, encompassing nonstandard English usages, slang terms, and misspellings.
3. Metadata. Collect metadata related to the tweets, including:
   a. A tweet ID (a unique numerical identifier assigned to each tweet)
   b. The username and biographical profile of the account used to post the tweet
   c. Geolocation (if enabled by the user)
   d. Number of followers of the posting account
   e. The number of accounts the posting account follows
   f. The posting account's Klout score
   g. Hashtags
   h. URL links
   i. Media content attached to the tweet
4. Filtering for engagement. Because engagement with the campaign was the determining factor for relevance, the team filtered tweets that described televised commercials, later de-duplicating them to ensure that tweets with multiple keywords would not be counted twice.
5. Human coding. Throughout the process, human coders reviewed the data to assess relevance and code message content.

FIGURE 2: HOW PEOPLE TALK ABOUT E-CIGARETTES (Source: University of Illinois at Chicago's Institute for Health Research and Policy). A recreation of HMC's search string: each query pairs "e cigarettes" with dozens of variant terms, including "e-cig," "ecig," "blu cig," "njoy cig," "e-cigarette," "green smoke," "south beach smoke," "cartomizer," "atomizer," "ehookah," "ejuice," "e-liquid," "e-smoke," "lavatube," "smokestik," "v2 cig," "vape"/"vaping," and "zerocig," along with common spelling variants.
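To show why the breadth of the keyword list matters, here is a toy Python filter built from a handful of the Figure 2 terms. The matching logic is an assumption for illustration, not HMC's actual query engine.

    # Toy keyword filter in the spirit of HMC's Figure 2 search string.
    import re

    ECIG_TERMS = [
        "e-cig", "ecig", "e cig", "e-cigarette", "ecigarette",
        "blu cig", "njoy cig", "ehookah", "e-hookah",
        "ejuice", "e-juice", "eliquid", "e-liquid",
        "vape", "vaper", "vaping", "cartomizer",
    ]
    # \b keeps short terms like "vape" from matching inside unrelated words.
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in ECIG_TERMS) + r")\b", re.I)

    tweets = [
        "Trying to quit with an e-cig, week two",
        "New e-juice flavors just dropped",
        "Mind the gap between the train and the platform",
    ]
    matches = [t for t in tweets if pattern.search(t)]
    print(matches)   # the first two tweets match; the third does not

Even with word boundaries, a list like this still catches false positives and misses slang it has never seen, which is why HMC paired keyword filtering with human coding.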
6. Precision and relevance. The team used a combination of human and machine coding to assess relevance and eliminate false positives, using three teams of trained coders and a process to assess intercoder reliability using a Kappa score, a statistic "used to assess inter-rater reliability when observing or otherwise coding qualitative/categorical variables."12 According to HMC, "the human-coded tweets were then used to train a naïve Bayes classifier to automatically classify the larger dataset of Tips engagement tweets for relevance. Precision was calculated as the percent of Tips-relevant tweets yielded by the keyword filters."13
7. Recall. To assess whether the tweet sample was representative of and could be generalized to all potentially relevant Twitter content, the team compared its sample to a larger sample of unretrieved tweets, again using trained coders and a Kappa score to assess how well the filtered tweet sample represented the larger data set.14
8. Content coding. Finally, the team coded the content to better understand "fear appeals," that is, whether the user accepted, rejected, or disregarded the message.

So, did the CDC's graphic and disturbing anti-smoking ads and the Twitter conversation surrounding them actually lead people to quit? HMC didn't overstate its data; rather, it concluded that approximately 87% of the tweets about the TV commercials expressed fear and that the ads had "the desired result of jolting the audience into a thought process that might have some impact on future behavior."15

HMC's case study illustrates that unstructured data requires significant adaptations to analytics methodology to extract meaning. Certainly it would have been a lot simpler for the CDC to host a focus group or field a survey to collect impressions about its anti-smoking campaign, but that data, as comparatively simple as it would have been to analyze, would lack the spontaneity and rich variety of expression available on Twitter or other social networks, had the teams extended the research to other sources. The nature of human language demands rigorous and repeatable processes to extract meaning in a transparent and defensible way. As a result, analytics methodology is undergoing an explosive period of change.
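The two statistical tools the case study names, Cohen's Kappa for intercoder reliability and a naïve Bayes classifier trained on human-coded tweets, can be sketched in a few lines of Python with scikit-learn. The tweets and labels below are invented; this shows the shape of the approach, not HMC's code.

    # Minimal sketch: intercoder reliability plus a relevance classifier.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.metrics import cohen_kappa_score

    # 1) Intercoder reliability: two human coders label the same tweets.
    coder_a = [1, 1, 0, 1, 0, 0, 1, 0]
    coder_b = [1, 1, 0, 1, 0, 1, 1, 0]
    print("kappa:", cohen_kappa_score(coder_a, coder_b))

    # 2) Train a classifier on human-coded tweets, then apply it at scale.
    texts = [
        "really need to quit smoking cigarettes",
        "that tips ad with the hole in her throat scared me",
        "smoking ribs on the grill today",
        "smoking hot deals this weekend",
    ]
    labels = [1, 1, 0, 0]            # 1 = relevant to the campaign, 0 = not

    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(texts)
    clf = MultinomialNB().fit(X, labels)

    new = vectorizer.transform(["trying to quit smoking, wish me luck"])
    print("relevant" if clf.predict(new)[0] == 1 else "not relevant")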
BIG DATA REQUIRES LINGUISTIC EXPERTISE

As counterintuitive as it might seem, an influx of unstructured data demands not only new and more sophisticated technologies to process and store it but a renewed emphasis on the humanistic disciplines as well. This is because, as Gartner has said, big data "tends to be the human-generated and people-oriented content" rather than highly structured data that fits neatly into databases. Naturally, "human-generated and people-oriented content" includes language, which is rife with contractions, sarcasm, slang, and metaphors expressed in multiple written forms, in hundreds of languages, 24 hours a day, seven days a week.

Furthermore, language changes constantly, a fact Oxford Dictionaries marks each November by publishing a word of the year that encapsulates that year's zeitgeist. 2014's word was "vape," salient in light of HMC's research. Five years ago, "vape" would have been impossible to interpret, because it—and its cultural context—didn't exist yet.

A recent article in MIT Technology Review illustrates just how quickly language and meaning can evolve, both in obvious and subtle ways.16 Vivek Kulkarni, a PhD student in the Data Science Lab at Stony Brook University, along with several of his colleagues, used linguistic mapping to illustrate the speed at which word meanings change, gathering inputs from sources such as Google Books, Amazon, and Twitter. "Mouse" acquired an entirely new meaning following the introduction of the computer mouse in the early 1970s, and "sandy" changed literally overnight with Hurricane Sandy in 2012. Today we see a constant stream of examples both of redefined words and of new ones ("vaping," "selfie") that require both technological and humanistic expertise to map, place in context, and understand.

BIG DATA REQUIRES EXPERTISE IN DATA SCIENCE AND CRITICAL THINKING

The speed, size, and variety of data around us—and the availability of platforms used to visualize and analyze it—have democratized the function of analytics within organizations. At the same time, fundamental analytics education has lagged, creating a situation in which organizations are at risk of misinterpreting data of all kinds. Says Philip B. Stark, professor and chair of statistics at the University of California, Berkeley, "the type of data (structured, text, etc.) isn't the point at all. The way of thinking matters."17 Stark emphasizes that good data science requires having subject matter expertise, access to the appropriate computational tools, and, most importantly, critical thinking and statistics skills. Figure 3 lays out the consequences of overlooking any of these three foundational elements.

FIGURE 3: FUNDAMENTALS OF DATA SCIENCE. The figure relates three foundational elements (subject matter expertise, access to tools, and critical thinking/applied statistics) to the outcomes when any one is missing: irrelevant conclusions, inability to execute, and incorrect conclusions. Insights require all three.
1. Irrelevant conclusions. If tools and critical thinking are present but subject matter expertise is absent, the organization risks asking the wrong questions, which can result in irrelevant conclusions and valueless answers. In addition, the organization will lack the context necessary to design experiments that will yield the answers it needs. It will be unable to understand the intrinsic limitations of the data, says Stark: noise, sampling issues, response bias, measurement bias, and so on. This creates a domino effect that can squander resources and lead to ineffectual—or worse, harmful—decisions.
2. Inability to execute. If subject matter expertise and critical thinking are present, but tools are absent, the organization will be unable to extract insights at scale and must resort to time-consuming manual methods. As a result, the organization risks burning out and eventually losing top analysts, who now must focus on brute-force methods of processing and analyzing data, rather than using their skills for more sophisticated and rewarding applications.
3. Incorrect conclusions. If subject matter expertise and tools are present, but critical thinking and a knowledge of applied statistics are absent, the organization risks drawing the wrong conclusions from good data, making poor decisions that may ignore other critical business signals. Like a lack of subject matter expertise, this can have harmful consequences to decision making and, therefore, business results.

Given the spread of data throughout organizations and the impracticality of hiring legions of trained analysts to keep pace with its growth, the next step is to evolve from analytics that simply describe a situation to analytics that predict what may happen next, and then to analytics that prescribe a course of action.18 But even assuming access to the most sophisticated algorithms that incorporate the most detailed business knowledge, widespread access to data necessitates that more people, irrespective of role, grasp the basics of logic and statistics to understand that data. This doesn't mandate universal PhDs in applied statistics, but it does require an awareness of basic principles of logic.

The good news is that, while the big data industry is still in its infancy, many of the most valuable tools for analysis are widely available—and more than two thousand years old to boot. As early as 350 BCE, Aristotle described 13 logical fallacies, which logicians and philosophers have built upon during the last 2,400 years.19 Ignoring these fallacies leaves organizations vulnerable to a host of risks, which can harm competitive position, financial success, customer sentiment and trust, and other critical objectives.

One common example is mistaking correlation for causation, in which organizations erroneously attribute one outcome (for example, increased revenue) to a corresponding data point (for example, reach of a marketing campaign). The increasing use of technologies that present complex data visually can exacerbate the problem. Harvard law student Tyler Vigen succinctly (and sometimes hilariously) presents this phenomenon on his Spurious Correlations blog.20
FIGURE 4: MISTAKING CORRELATION FOR CAUSATION (Source: Tyler Vigen). The chart plots the divorce rate in Maine (divorces per 1,000 people) against US per-capita margarine consumption (pounds) from 2000 to 2009; correlation: 0.992558.

Year: 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009
Divorce rate in Maine (per 1,000 people): 5.0, 4.7, 4.6, 4.4, 4.3, 4.1, 4.2, 4.2, 4.2, 4.1
US per-capita margarine consumption (pounds): 8.2, 7.0, 6.5, 5.3, 5.2, 4.0, 4.6, 4.5, 4.2, 3.7

In Figure 4, Vigen's calculations show that there is a 99% correlation between the divorce rate in Maine and per-capita margarine consumption. Does the Maine divorce rate somehow cause US residents to eat margarine? Does US margarine consumption somehow lead to divorce in Maine? While these questions are absurd, charts such as this visually suggest a link.

The correlation/causation fallacy is just one of many logical fallacies that have been documented and described over the years, including formal fallacies (fallacies of logic) and informal fallacies (fallacies of evidence or relevance).21 As more tools become available to visualize data sets quickly and easily, organizations must invest as much in critical thinking and data science expertise as they do in tools to visualize data. Otherwise, they risk succumbing to logical fallacies.
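Using the data series shown in Figure 4, the correlation is easy to reproduce in a few lines of Python, which is exactly the point: a near-perfect Pearson coefficient says nothing about causation.

    # Pearson correlation of the two series in Figure 4 (data as charted by Vigen).
    from statistics import mean, stdev

    divorce   = [5.0, 4.7, 4.6, 4.4, 4.3, 4.1, 4.2, 4.2, 4.2, 4.1]  # Maine, per 1,000
    margarine = [8.2, 7.0, 6.5, 5.3, 5.2, 4.0, 4.6, 4.5, 4.2, 3.7]  # US lbs per capita

    n = len(divorce)
    mx, my = mean(divorce), mean(margarine)
    cov = sum((x - mx) * (y - my) for x, y in zip(divorce, margarine)) / (n - 1)
    r = cov / (stdev(divorce) * stdev(margarine))
    print(round(r, 4))   # ~0.99: near-perfect correlation, zero causation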
Legal and Ethical Issues of Big Data
BIG DATA RAISES MULTIPLE LEGAL AND ETHICAL ISSUES

The good news—and the bad news—about big data is that it can provide unprecedented insight into people, both as individuals and in aggregate. While surveys can, arguably, reveal human attitudes, Christian Rudder, CEO of dating site OKCupid, points out in his 2014 book, Dataclysm: Who We Are (When We Think No One's Looking), that "we can pinpoint the speaker, the words, the moment, even the latitude and longitude of human communication."22 Many people know the story of how Target discovered that a young girl was pregnant before her father did; such stories have become mainstream.23 But much of the challenge with recent discussions on ethics and privacy stems from the extremely broad nature of these terms, the spectrum of personal preferences, and the beliefs of individuals about the media environment we live in today. Consider these recent examples:

• Seeking to prevent suicides, Samaritans Radar raises privacy concerns. In October 2014, the BBC reported that the Samaritans had launched an app that would monitor words and phrases such as "hate myself" and "depressed" on Twitter and would notify users if any of the people they follow appeared to be suicidal.24 While the app was developed to help people reach out to those in need, privacy advocates expressed concern that the information could be used to target and profile individuals without their consent. According to a petition filed on Change.org, the Samaritans app was monitoring approximately 900,000 Twitter accounts as of late October.25 By November 7, the app was suspended based on public feedback.26

• Facebook's "Emotional Contagion" experiment provokes outrage about its methodology. In June 2014, Facebook's Adam Kramer published a study in the Proceedings of the National Academy of Sciences, revealing that "emotional states can be transferred to others via emotional contagion, leading people to experience the same emotions without their awareness."27 In other words, seeing negative stories on Facebook can make you sad. The experiment provoked outrage about the perceived lack of informed consent, the ethical repercussions of such a study, the concern over appropriate peer review, the privacy implications, and the precedent such a study might set for research using digital data.

• Uber knows when and where (and possibly with whom) you've spent the night. In March 2012, Uber posted, and later deleted, a blog post entitled "Rides of Glory," which revealed patterns, by city, of Uber rides after "brief overnight weekend stays," also known as the passenger version of the walk of shame.28 Uber was later criticized for allegedly revealing its "God View" at an industry event, showing attendees the precise location of a particular journalist without his knowledge, while a December 1, 2014, post on Talking Points Memo disclosed the story of a job applicant who was allegedly shown individuals' live travel information during an interview.29, 30
• A teenager becomes an Internet celebrity—and a target—in one day. Alex Lee, a 16-year-old Target clerk, became a meme (#AlexFromTarget) and a celebrity within hours, based on a photo taken of him unawares at work. He was invited to appear on The Ellen Show and was reported to have received death threats on social media.31

These stories illustrate several attributes of the data environment we live in now and the attendant ethical issues they represent:

• Data collection. The Samaritans example illustrates the law of unintended consequences: what may happen when an app collects data that may, albeit unintentionally, compromise privacy or put people in harm's way.

• Methodology and usage. The Facebook example demonstrates what happens when a company uses its vast reservoir of data to run technically legal but ethically ambiguous experiments on its users, raising questions about the nature of informed consent and ethical data use in the digital age.

• Aggregation, storage, and stewardship. The Uber posts illustrate, albeit with aggregated data, the intensely intimate nature of the data users entrust to companies, raising questions of stewardship, ethics (is aggregating such data ethical?), and privacy (what happens if data is intentionally or accidentally disclosed?).

• Communication. All of the above examples illustrate the gray areas between law and ethics, or, from an organizational point of view, between risk management and customer experience. As data becomes even more valuable and ubiquitous, the way in which organizations communicate—about collection, analysis, intent, and usage—will affect not only their legal risk profile but also their ability to attract and retain the trust and loyalty of their communities.

Finally, there is, as former secretary of defense Donald Rumsfeld so famously called it, "the unknown unknown." The #AlexFromTarget story demonstrates not only how an everyday 16-year-old (by definition, a minor) can become an instant Internet celebrity but also how a company can unwittingly and suddenly find itself at the center of a crisis not of its own creation, one that raises issues (compounded because of Lee's age) of employee privacy and even safety. Figure 5 lays out these issues at a high level.

In the past, many of these ethical issues related to data were cloaked behind proprietary systems and siloed data stores. As data becomes ubiquitous, more integrated, and more portable, however, the number and type of ethical gray areas will multiply, along with a need to distinguish the organization's legal responsibilities, such as what it discloses in a terms of service, from its ethical ones—the actions it takes that promote or erode the trust of its community.
FIGURE 5: ETHICAL ISSUES RELATED TO DATA. The figure maps the questions organizations should ask across data collection, methodology, usage, aggregation, storage and stewardship, and communication, including:

• What data is collected
• Data sources, data types, and sample size
• How the data may have been filtered, enriched, or otherwise modified with demographic, location, or other metadata
• Keyword selection; human or algorithmic coding; the process for assessing precision, relevance, and recall
• How the organization may change the experience based on data
• Whether the organization plans to sell the data in any form to a third party
• How data is combined and its impact on personally identifiable information (PII) or user experience in general
• How and for how long data is stored, and who owns the data
• Who has the right to delete data (posts or entire profiles), and the process for doing so
• Who has the right to view, modify, or share data (administration), and whether and how the data can be extracted
• The extent to which the organization proactively and transparently informs users and customers about what and how it collects, analyzes, stores, aggregates, and uses their data
Planning for Data Ubiquity

If we—individually and collectively—are to make the best use of data and extract relevant insight from it in a trustworthy manner, we must approach data strategy thoughtfully. Following are some basic tenets of a strategic data plan.

1. Define data strategy and operating model. If data is to be considered a business-critical asset, it must be treated as such by leaders who drive and instill strategy across the organization. In 2015, leaders must define what critical data streams are needed to drive business goals, how they will source them, and what operating model is needed to process, interpret, and act on them at the right time. The challenge is that an organization's departments (and therefore its data) tend to be siloed, which can result in blind spots, organizational politics, and spiraling costs. Organizations must balance their need for insight and competitive advantage on the one hand against privacy and a rational cost of ownership on the other. All too frequently, these dual imperatives are in conflict, sometimes unnecessarily so, because the organization does not have a clear strategy for what data will be used and stored, what data will be used but not stored, and what data is simply unnecessary.

2. Update analytics methodology to reflect new data realities. Analyzing unstructured data will never yield the same confidence levels as a simple binary choice; it will always require interpretation. The key is to make that interpretation transparent, rigorous, and repeatable so that others can reliably repeat analyses and yield the same or substantially similar results. This is one area in which there is a tremendous difference between private and public institutions: in private institutions, work process, product, and data tend to be proprietary, while in public institutions, such as universities, research is subject to the highest levels of scrutiny among academic publications and journals. It's also important to engineer the method of measurement into initiatives to reduce ambiguity and provide a greater ability to trace impact. The broader the topic, the more hashtags can help confirm the provenance and relevance of social conversation. Tracking codes and multivariate testing are also useful, if not perfect, solutions.

3. Seek out critical thinking and diverse skill sets. Unquestionably, engineering and analytical skills, not to mention skills in applied statistics and data science, will continue to gain value as organizations become ever more dependent on multiple data types. At the same time, the demands of analyzing unstructured data also require skill in interpreting context related to language and behavior, a challenge humans have had since we developed language. After all, even the cleanest, most reliable data can be misinterpreted, whether intentionally or unintentionally. Minimizing misinterpretation means valuing not only math and engineering but also the social sciences and humanities. These disciplines—sociology, psychology, anthropology, linguistics, ethics, philosophy, and rhetoric—provide context and help us become better critical thinkers. Without a balance of critical thinking, business knowledge, and smart analytics tools, we're in danger of making the wrong decision much more efficiently, quickly, and with far greater impact than we have in the past.
4. Insist on ethical data use and transparent disclosure
Earl Warren, former chief justice of the United States, once said, "In civilized life, law floats in a sea of ethics."33 This is especially true in the digital age, in which few of the implications of digital transformation have found their way into case law and, as a result, into organizational policy. As organizations become more data-centric, for their own benefit as well as their customers', they must look closely at the affirmative and passive decisions they make about where they get their data; their analytics methodology; how they store, steward, aggregate, and use the data; and how transparently they disclose these actions.

5. Reward and reinforce humility and learning
It is nearly impossible to calculate the impact that data will have on our lives in the next decade. Technologies such as IBM's Watson, Ayasdi, and others are illustrating the many applications of big data, whether in healthcare, consumer products, financial services, energy, or elsewhere. Meanwhile, the Internet of Things introduces data feeds from sensors, which can be combined with other data streams to deliver specific, relevant, and even predictive insights, and which will only compound the volume, velocity, and variety challenges. Yet the world is just starting to come to terms with the impact of data ubiquity from the technology, business, research, cultural, and ethical perspectives. The most important, and perhaps most difficult, impact of data ubiquity is that it radically undermines traditional methods of analysis and laughs at our desire for certainty. The only strategy for combating the fear of uncertainty is to accept and work within the limits of the data, and to approach the science of challenging data sets with an appetite for continuous learning, whether the goal is to sell a pair of shoes or to help prevent cancer.

CONCLUSION
The hype over "big data" has partially obscured the fact that our ability to collect, analyze, and act on data, and to some extent predict outcomes based upon it, is a potentially transformative force for business and humanity alike. While Aldous Huxley couldn't have anticipated the impact of a Kim Kardashian magazine cover or the challenges inherent in understanding how people talk about smoking, he was prescient to call out the ever-increasing difficulty of identifying relevance in a "sea of irrelevance."32 It seems likely that the privacy and ethical implications of data ubiquity, not to mention recent disclosures about government access to and use of personal data, would have confirmed many of Orwell's worst fears. At the same time, we do not need to blindly accept the dystopian nightmare he envisioned as our only future. We have an opportunity, and an obligation, to examine not only the legal but also the ethical implications of ubiquitous data, and to use that understanding to decide how we will use data, sustainably and responsibly, for years to come.
Image courtesy of Gary Schroeder.
ENDNOTES

1. You can view the talk at http://www.ted.com/talks/susan_etlinger_what_do_we_do_with_all_this_big_data.
2. Neil Postman, Amusing Ourselves to Death: Public Discourse in the Age of Show Business (New York: Penguin Books, 1985), vii.
3. For a more detailed view, a good starting point is "3D Data Management: Controlling Data Volume, Velocity and Variety," published by META Group on February 6, 2001, http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf.
4. "What Is Big Data?" IBM, accessed January 6, 2015, http://www-01.ibm.com/software/data/bigdata/what-is-big-data.html.
5. "Statistics," YouTube, accessed January 6, 2015, https://www.youtube.com/yt/press/statistics.html.
6. "Stats," WordPress, cached on November 2, 2014, http://sq.wordpress.com/stats/.
7. "About," Twitter, accessed January 6, 2015, https://about.twitter.com/company.
8. Darin Stewart, "Big Content: The Unstructured Side of Big Data," Gartner Group, May 1, 2013, http://blogs.gartner.com/darin-stewart/2013/05/01/big-content-the-unstructured-side-of-big-data/.
9. Zacks Equity Research, "Stock Market News for December 17, 2014 - Market News," Yahoo! Finance, December 17, 2014, http://finance.yahoo.com/news/stock-market-news-december-17-151003130.html;_ylt=AwrBJSCwLpNUWlIAatyTmYlQ.
10. Kalev Leetaru, "Why Big Data Missed the Early Warning Signs of Ebola," Foreign Policy, September 26, 2014, http://foreignpolicy.com/2014/09/26/why-big-data-missed-the-early-warning-signs-of-ebola/.
11. See also: Sherry L. Emery, Glen Szczypka, Eulàlia P. Abril, Yoonsang Kim, and Lisa Vera, "Are You Scared Yet? Evaluating Fear Appeal Messages in Tweets About the Tips Campaign," Journal of Communication 64 (2014): 278–295, doi: 10.1111/jcom.12083.
12. "Cohen's Kappa," University of Nebraska–Lincoln, accessed January 6, 2015, http://psych.unl.edu/psycrs/handcomp/hckappa.PDF.
13. Sherry L. Emery et al., "Are You Scared Yet?"
14. Ibid.
15. Ibid.
16. "Linguistic Mapping Reveals How Word Meanings Sometimes Change Overnight," MIT Technology Review, November 23, 2014, http://www.technologyreview.com/view/532776/linguistic-mapping-reveals-how-word-meanings-sometimes-change-overnight/.
17. Philip Stark, Twitter comment, November 24, 2014, https://twitter.com/philipbstark/status/536955754163363840.
18. For a quick primer on descriptive, predictive, and prescriptive analytics, see this interview with data scientist Michael Wu of Lithium by Jeff Bertolucci in InformationWeek: http://www.informationweek.com/big-data/big-data-analytics/big-data-analytics-descriptive-vs-predictive-vs-prescriptive/d/d-id/1113279.
19. To download the text, go to http://classics.mit.edu/Aristotle/sophist_refut.html.
20. Vigen maintains a running list of spurious correlations at his blog, Spurious Correlations (http://tylervigen.com/).
21. For an excellent tutorial on logical fallacies, see chapter 2 of "SticiGui," an online statistics textbook by Philip B. Stark, professor and chair of the department of statistics, University of California, Berkeley: http://www.stat.berkeley.edu/~stark/SticiGui/Text/reasoning.htm.
22. Christian Rudder, Dataclysm, 146.
23. Kashmir Hill, "How Target Figured Out a Teen Girl Was Pregnant Before Her Father Did," Forbes, February 16, 2012, http://www.forbes.com/sites/kashmirhill/2012/02/16/how-target-figured-out-a-teen-girl-was-pregnant-before-her-father-did/.
24. Zoe Kleinman, "Samaritans App Monitors Twitter Feeds for Suicide Warnings," BBC News, October 28, 2014, http://www.bbc.com/news/technology-29801214.
25. Adrian Short, "Shut Down Samaritans Radar," Change.org, accessed January 6, 2015, https://www.change.org/p/twitter-inc-shut-down-samaritans-radar.
26. "Samaritans Radar announcement - Friday 7 November," Samaritans, November 7, 2014, http://www.samaritans.org/news/samaritans-radar-announcement-friday-7-november.
27. Adam D. I. Kramer, Jamie E. Guillory, and Jeffrey T. Hancock, "Experimental Evidence of Massive-Scale Emotional Contagion Through Social Networks," Proceedings of the National Academy of Sciences of the United States of America 111 (24), doi: 10.1073/pnas.1320040111.
28. Voytek, "Rides of Glory," Uber, cached March 26, 2012, https://web.archive.org/web/20140828024924/http://blog.uber.com/ridesofglory.
29. Kashmir Hill, "'God View': Uber Allegedly Stalked Users for Party-Goers' Viewing Pleasure (Updated)," Forbes, October 3, 2014, http://www.forbes.com/sites/kashmirhill/2014/10/03/god-view-uber-allegedly-stalked-users-for-party-goers-viewing-pleasure/.
30. Caitlin MacNeal, "Report: Uber Let Job Applicant Access Controversial 'God View' Mode," Talking Points Memo, December 1, 2014, http://talkingpointsmemo.com/livewire/uber-job-applicant-ride-logs.
31. Nick Bilton, "Alex from Target: The Other Side of Fame," The New York Times, November 12, 2014, http://www.nytimes.com/2014/11/13/style/alex-from-target-the-other-side-of-fame.html?_r=0.
32. Aldous Huxley, Brave New World Revisited (New York: HarperCollins Publishers, 1958), 36.
33. Earl Warren, speech at the Louis Marshall Award Dinner of the Jewish Theological Seminary (Americana Hotel, New York City, November 11, 1962).

SOURCES AND ACKNOWLEDGMENTS

This document was developed as a companion piece to a talk given at TED@IBM in San Francisco, California, on September 23, 2014. As such, it was built on online and in-person conversations with market influencers, technology vendors, brands, academics, and others on the effective and ethical use of big data, as well as on secondary research, including relevant and timely books, articles, and news stories. My deepest gratitude to the following:

• The team at the Health Media Collaboratory at the University of Illinois at Chicago, specifically Sherry Emery, Eman Aly, and Glen Szczypka, for sharing their research and methodology and educating me about the nuances of interpreting big data for medical research.
• My fellow board members at the Big Boulder Initiative for their insights and perspective on the effective and ethical use of social data: Pernille Bruun-Jensen, CMO, NetBase; Damon Cortesi, Founder and CTO, Simply Measured; Jason Gowans, Director, Data Lab, Nordstrom; Will McInnes, CMO, Brandwatch; Chris Moody, Vice President, Data Strategy, Twitter (Chair); Stuart Shulman, Founder and CEO, Texifter; Carmen Sutter, Product Manager, Social, Adobe; and Tom Watson, Head of Sales, Hanweck Associates, LLC.
• The team at TED who helped me hone and focus the talk and provided invaluable feedback throughout: Juliet Blake and Anna Bechtol.
• The team at IBM Social Business for planning, executing, and marketing a superb event: Michela Stribling, Beth McElroy, Jacqueline Saenz, and Michelle Killebrew.
• My fellow TED@IBM speakers: Gianluca Ambrosetti, Kare Anderson, Brad Bird, Monika Blaumueller, Erick Brethenoux, Lisa Seacat DeLuca, Jon Iwata, Bryan Kramer, Tan Le, Charlene Li, Florian Pinel, Inhi Cho Suh, Marie Wallace, and Kareem Yusuf.
• Philip Stark, professor and chair of Statistics, University of California, Berkeley, for an extremely insightful perspective on the methodological and organizational requirements of big data, as well as access to his superb course materials.
• The organizers and speakers at the International Symposium on Digital Ethics at Loyola University in November 2014, with whom I had some incredibly insightful conversations: Don Heider, dean, School of Communication, Loyola University Chicago; Thorsten Busch, senior research fellow, Institute for Business Ethics, University of St. Gallen; Michael Koliska, PhD candidate at the University of Maryland; and Caitlin Ring, assistant professor of strategic communication at Seattle University.
• Farida Vis, research fellow in the Social Sciences in the Information School at the University of Sheffield.
• The teams at DataSift (Nick Halstead, Tim Barker, Jason Rose, Seth Catalli); Lithium Technologies (Katy Keim and Nicol Addison); and Oracle (Tara Roberts and Christine Wan) for valuable insights along the way.
• Tyler Vigen for his Spurious Correlations blog, which makes a complex topic simple and fun to explain; Gary Schroeder for his wonderful visual storytelling of my TED talk; Daniel K. Davis for his superb photography at TED@IBM; Vladimir Mirkovic for graphic design; and Erin Brenner for copyediting.
• My talented teammates at Altimeter Group: Rebecca Lieb, who edited this report; Cheryl Graves; Jessica Groopman; Jaimy Szymanski; Christine Tran; and, of course, Charlene Li.

Input into this document does not represent a complete endorsement of the report by the individuals or organizations listed above. Finally, any errors are mine alone.

OPEN RESEARCH
This independent research report was 100% funded by Altimeter Group. This report is published under the principle of Open Research and is intended to advance the industry at no cost. This report is intended for you to read, utilize, and share with others; if you do so, please provide attribution to Altimeter Group.

PERMISSIONS
The Creative Commons license is Attribution-NonCommercial-ShareAlike 3.0 United States, which can be found at https://creativecommons.org/licenses/by-nc-sa/3.0/us/.

DISCLAIMER
ALTHOUGH THE INFORMATION AND DATA USED IN THIS REPORT HAVE BEEN PRODUCED AND PROCESSED FROM SOURCES BELIEVED TO BE RELIABLE, NO WARRANTY EXPRESSED OR IMPLIED IS MADE REGARDING THE COMPLETENESS, ACCURACY, ADEQUACY, OR USE OF THE INFORMATION. THE AUTHORS AND CONTRIBUTORS OF THE INFORMATION AND DATA SHALL HAVE NO LIABILITY FOR ERRORS OR OMISSIONS CONTAINED HEREIN OR FOR INTERPRETATIONS THEREOF. REFERENCE HEREIN TO ANY SPECIFIC PRODUCT OR VENDOR BY TRADE NAME, TRADEMARK, OR OTHERWISE DOES NOT CONSTITUTE OR IMPLY ITS ENDORSEMENT, RECOMMENDATION, OR FAVORING BY THE AUTHORS OR CONTRIBUTORS AND SHALL NOT BE USED FOR ADVERTISING OR PRODUCT ENDORSEMENT PURPOSES. THE OPINIONS EXPRESSED HEREIN ARE SUBJECT TO CHANGE WITHOUT NOTICE.
About Us

Altimeter is a research and consulting firm that helps companies understand and act on technology disruption. We give business leaders the insight and confidence to help their companies thrive in the face of disruption. In addition to publishing research, Altimeter Group analysts speak and provide strategy consulting on trends in leadership, digital transformation, social business, data disruption, and content marketing strategy.

How to Work with Us

Altimeter Group research is applied and brought to life in our client engagements. We help organizations understand and take advantage of digital disruption. There are several ways Altimeter can help you with your business initiatives:

• Strategy Consulting. Altimeter creates strategies and plans to help companies act on disruptive business and technology trends. Our team of analysts and consultants works with senior executives, strategists, and marketers on needs assessment, strategy roadmaps, and pragmatic recommendations across disruptive trends.
• Education and Workshops. Engage an Altimeter speaker to help make the business case to executives or arm practitioners with new knowledge and skills.
• Advisory. Retain Altimeter for ongoing research-based advisory: conduct an ad hoc session to address an immediate challenge, or gain deeper access to research and strategy counsel.

To learn more about Altimeter's offerings, contact sales@altimetergroup.com.

Susan Etlinger, Industry Analyst
Susan Etlinger is an industry analyst at Altimeter Group, where she works with global organizations to develop data and analytics strategies that support their business objectives. Susan has a diverse background in marketing and strategic planning within both corporations and agencies. She's a frequent speaker on social data and analytics and has been extensively quoted in outlets including Fast Company, the BBC, The New York Times, and The Wall Street Journal. Find her on Twitter at @setlinger and at her blog, Thought Experiments, at susanetlinger.com.

Rebecca Lieb, Industry Analyst
Rebecca Lieb (@lieblink) covers digital advertising and media, encompassing brands, publishers, agencies, and technology vendors. In addition to her background as a marketing executive, she was VP and editor-in-chief of the ClickZ Network for over seven years. She's written two books on digital marketing: The Truth About Search Engine Optimization (2009) and Content Marketing (2011). Rebecca blogs at www.rebeccalieb.com/blog.

Altimeter Group
1875 S Grant St #680
San Mateo, CA 94402
info@altimetergroup.com
www.altimetergroup.com
@altimetergroup
650.212.2272