7/20/16
1
Data and Algorithmic Bias
in the Web
Ricardo Baeza-Yates
California (NTENT), Catalonia (UPF), Chile (UChile)
WebVisions Barcelona, July 2016
Steve Jobs and Bias
Pervasive Optimistic Bias (Kahneman)
Reality Distortion Field
Every Website is an
Information Market
Good Design
Good Interaction
Right Incentives
All Data has Bias
§  Gender
§  Racial
§  Sexual
§  Religious
§  Social
§  Linguistic
§  Geographic
§  Political
§  Educational
§  Economic
§  Technological
§  from Noise or Spam
§  Validity (e.g. temporal)
§  Completeness
§  Gathering process
§  ….
However, many people extrapolate
results to the whole population
(e.g., social media analysis)
In addition, there is bias when
measuring bias as well as bias
towards measuring it!
Yes, We Live in a (Very) Biased World!
A Non-Technical Question
[Diagram: Biased Data → Algorithm (Neutral?) → Same Bias? Not Always!]
Unbias the data
Tune the algorithm
Unbias the output
Bias awareness!
Big Data and Bias
§  The quality of any algorithm is bounded by the
quality of the data that it uses
§  Data bias awareness
§  Algorithmic fairness
§  Key issues for machine learning
§  Uniformity of data properties
§  In the Web, distributions resemble a power law
§  Uniformity of error
§  Data sample methodology
§  E.g., sample size to see infrequent events or
sampling bias issues
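The sample-size point can be made concrete with a short calculation. This is a minimal sketch, assuming independent draws and a known per-draw event frequency (both are assumptions for illustration, not from the slides); `min_sample_size` is a hypothetical helper name.

```python
import math


def min_sample_size(event_rate: float, confidence: float = 0.95) -> int:
    """Smallest n such that an event with per-draw probability `event_rate`
    shows up at least once in n independent draws with probability >= confidence."""
    if not 0.0 < event_rate < 1.0:
        raise ValueError("event_rate must be in (0, 1)")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    # P(at least one occurrence) = 1 - (1 - p)^n >= c
    #   <=>  n >= ln(1 - c) / ln(1 - p)
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - event_rate))
```

Under these assumptions, an event that occurs once per thousand draws needs about three thousand samples to be seen at all with 95% confidence, which is why small samples of power-law data tend to miss the tail.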
Data bias
Activity bias
Selection bias
Sampling bias and size
Algorithmic bias
Interface
(Self) selection bias
Second order bias
Sparsity
Privacy
Algorithm
[Figure labels: Quantity, Quality, User-generated, Traditional publishing]
What is in the Web? How Much Data? How Good is it?
168 million active web servers, 1083 million hostnames and
infinitely many pages!
What else is in the Web?
Noise and Spam
§  Noise may come from many places:
§  Instruments that measure (e.g., IoT)
§  How we interpret the data
§  Spam is everywhere
§  Fight both with the wisdom of the crowds
Data Bias and Redundancy
§  Is there any dependency in the data?
§  Is there any duplication?
§  Lexical duplication in the Web is around 25%
§  Semantic duplication is larger (more later)
§  Any other biases? Many!
§  Web structure (economic, cultural)
§  Web content (linguistic, geography, gender)
[Baeza-Yates, Castillo & López. Characteristics of the Web of Spain.
The Information Professional (Spanish), 2006, vol. 15, n. 1, pp. 6-17]
Economic Bias in Links
[Figure axes: Number of linked domains vs. Exports (thousands of US$)]
Baeza-Yates & Castillo, WWW2006
Exports/Imports vs. Domain Links
[Baeza-Yates, Castillo, Efthimiadis, TOIT 2007]
Website Structure
Minimal effort
Shame
Linguistic Bias
Geographical Bias
[E. Graells-Garrido and M. Lalmas,
“Balancing diversity to counter-measure
geographical centralization in microblogging
platforms”, ACM Hypertext’14]
Gender Bias
[E. Graells-Garrido et al., “First Women, Second Sex: Gender Bias in Wikipedia”, ACM Hypertext’15]
Systemic bias?
Equal opportunity?
•  The Web is already influenced by small groups
•  "0.05% of the user population, attract almost
50% of all attention within Twitter" (50K users)
[Wu, Hofman, Mason & Watts, WWW 2011]
•  We explored this issue further with four different datasets:
1.  a large one from Twitter (2011),
2.  a small one from Facebook (2009),
3. Amazon reviews (2013), and
4.  Wikipedia editors (2015).
•  Digital desert: the content that is never seen
Activity Bias: Wisdom of a Few?
[Baeza-Yates & Saez-Trumper, ACM Hypertext 2015]
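The "wisdom of a few" question can be posed as a simple measurement: what fraction of users, most active first, accounts for half of all activity? The following is a minimal sketch under the assumption that per-user activity counts are available as a list; the function name and the toy numbers are hypothetical and are not the method of the cited paper.

```python
def share_of_users_for(activity_counts, target_share=0.5):
    """Fraction of users (most active first) needed to account for at least
    `target_share` of the total activity."""
    counts = sorted(activity_counts, reverse=True)
    total = sum(counts)
    if not counts or total <= 0:
        raise ValueError("need at least one user with positive activity")
    running = 0
    for k, count in enumerate(counts, start=1):
        running += count
        if running >= target_share * total:
            return k / len(counts)
    return 1.0


# Toy data: one very active user and four occasional ones.
skewed = [90, 5, 3, 1, 1]
# Toy data: everybody contributes equally.
uniform = [1, 1, 1, 1]
```

With the toy data, one user out of five already produces half of the activity in the skewed case, while the uniform case needs half of the users. Users with zero activity stay in the denominator, so inactive accounts make the concentration look stronger, as they should.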
Examples
[Baeza-Yates & Saez-Trumper, ACM Hypertext 2015]
October 2015
Quality of Content?
•  Does adding content imply adding wisdom?
•  We use the helpfulness of Amazon reviews
•  We computed the text entropy
•  Content-based-wise users
•  How many of those users are being paid?
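Text entropy, mentioned above, can be sketched as Shannon entropy over a text's word distribution. The slides do not say whether entropy was computed over words or characters, so the word-level choice and the whitespace tokenization below are assumptions, and `word_entropy` is a hypothetical name.

```python
import math
from collections import Counter


def word_entropy(text: str) -> float:
    """Shannon entropy, in bits, of the word distribution of `text`.
    Words are lowercased and split on whitespace (an assumed tokenization)."""
    words = text.lower().split()
    if not words:
        return 0.0
    counts = Counter(words)
    n = len(words)
    # H = -sum(p * log2(p)) over the relative frequency p of each distinct word.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

A review that repeats one word has entropy 0, while a review of four distinct words has entropy 2 bits, so higher values indicate more varied text. Varied text is not necessarily wise text, which is the question the slide raises.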
Digital Desert
Weblands
of Wisdom
Bias in the Interface
Position bias
Ranking bias
Presentation bias
Social bias
Interaction bias
Presentation Bias
§  Interaction data will be biased to what is shown
§  In recommender systems, items recommended will get
more clicks than items not recommended
§  In search systems top ranked results will get more clicks
than other results
›  Ranking bias
›  Interaction bias
[Figure: CTR (log scale) vs. Rank]
[Dupret & Piwowarski, SIGIR 2008]
[Chapelle & Zhang, WWW 2009]
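One simple way to discount position bias in click data is to compare an item's clicks with the clicks expected at the positions where it was shown ("clicks over expected clicks"). This is a minimal sketch of that normalization, not the click models of the papers cited above; the per-position baseline CTRs and the function name are assumptions for illustration.

```python
def clicks_over_expected_clicks(impressions, position_ctr):
    """Position-normalized click score for one item.

    impressions: iterable of (position, clicked) pairs for the item.
    position_ctr: dict mapping position -> average CTR at that position
                  (assumed to be estimated elsewhere, e.g. from all traffic).
    Returns observed clicks divided by expected clicks; a value above 1 means
    the item did better than the average item shown at the same positions.
    """
    observed = 0
    expected = 0.0
    for position, clicked in impressions:
        observed += 1 if clicked else 0
        expected += position_ctr[position]
    if expected == 0:
        raise ValueError("no expected clicks: cannot normalize")
    return observed / expected


# Hypothetical baseline: rank 1 is clicked twice as often as rank 2.
baseline = {1: 0.5, 2: 0.25}
top_item = [(1, True), (1, False), (1, True), (1, False)]
low_item = [(2, True), (2, False), (2, True), (2, False)]
```

Both toy items have the same raw CTR of 50%, but the item shown at rank 2 scores 2.0 while the item at rank 1 scores 1.0, because clicks at lower ranks are rarer and therefore more informative.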
[WHY AMAZON’S RATINGS MIGHT MISLEAD YOU; The Story of Herding Effects
Ting Wang and Dashun Wang, Big Data, 2014]
Social Bias
Extreme Algorithmic Bias
Second Order Bias in Web Content
[Baeza-Yates, Pereira & Ziviani,
Genealogical Trees in the Web, WWW 2008]
Person
Web content is redundant
Clicks in results are biased to
the ranking and the interaction
Query
Ranking bias
Redundancy grows (35%)
Search results
New
Most measures in the Web follow a power law
The Long Tail: Sparsity
[Anatomy of the long tail: Ordinary People with Extraordinary Tastes,
Goel, Broder, Gabrilovich, Pang; WSDM 2010]
§  Why is there a long tail?
§  Sampling in the tail
§  When the crowd dominates
§  Empowering the tail
When the Crowd
Dominates
Kills the long tail
Personalization “facets”:
•  Language (not always)
•  Location
•  Semantic facets per user
•  Query intent prediction in search
Empowering the Tail
The Filter “Bubble”, Eli Pariser
•  Avoid the Poor get Poorer Syndrome
•  Avoid the Echo Chamber
•  How to expose opposite views?
Cold start problem solution:
Explore & Exploit
Solutions:
•  Diversity
•  Novelty
•  Serendipity
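Explore & Exploit, named above as a cold start solution, can be illustrated with an epsilon-greedy policy, one of the simplest strategies of that family. The slides do not name a specific algorithm, so this choice, the class name, and the parameters are assumptions made for the sketch.

```python
import random


class EpsilonGreedy:
    """Epsilon-greedy item selection: mostly show the best-known item,
    but spend a fraction `epsilon` of impressions on random items."""

    def __init__(self, n_items, epsilon=0.1, seed=None):
        if not 0.0 <= epsilon <= 1.0:
            raise ValueError("epsilon must be in [0, 1]")
        self.epsilon = epsilon
        self.shows = [0] * n_items
        self.clicks = [0] * n_items
        self.rng = random.Random(seed)

    def _score(self, item):
        # Items never shown get top priority so each is tried at least once.
        if self.shows[item] == 0:
            return float("inf")
        return self.clicks[item] / self.shows[item]

    def choose(self):
        # Explore: a random item, so new and tail items still get exposure.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.shows))
        # Exploit: the item with the best observed click-through rate.
        return max(range(len(self.shows)), key=self._score)

    def update(self, item, clicked):
        self.shows[item] += 1
        self.clicks[item] += int(clicked)
```

A typical loop calls `choose()`, shows the item, then calls `update(item, clicked)`. With `epsilon=0` the policy only exploits, which is the situation where popular items keep getting richer. A positive epsilon trades some short-term clicks for exposure of unseen items.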
A Data Portrait is a visual
context where users can
explore how the system
understands their interests.
This context is used to
embed content-based
recommendations, displayed
visually to facilitate
exploration and user
engagement.
To combat homophily,
recommendations are
generated having political
diversity in mind.
Does it work? Yes, by using
intermediary topics that are shared!
But only when users are interested
in politics.
Demo at http://auroratwittera.cl/perfil/YahooLabs
[E. Graells-Garrido, M. Lalmas and R. Baeza-Yates, ACM UAI 2016]
•  Exploit the context (and deep learning!)
91% accuracy to predict the next app you will use
[Baeza-Yates et al, WSDM 2015]
•  Personalization vs. Contextualization
Recall that user interaction is another long tail
People
Interests
Aggregating in the Tail
[De Choudhury et al, ACM HT 2010]
[Quercia et al, ACM HT 2014]
Crowdsourcing Data: Good Paths
Regions from Pictures
[Thomee et al, Demo at CHI 2014]
AOL Query Logs Release Incident
§  No. 4417749 conducted hundreds of searches over
a three-month period on topics ranging from “numb
fingers” to “60 single men”.
§  Other queries: “landscapers in Lilburn, Ga,” several
people with the last name Arnold and “homes sold in
shadow lake subdivision gwinnett county georgia.”
§  Data trail led to Thelma Arnold, a 62-year-old widow
who lives in Lilburn, Ga., frequently researches her
friends’ medical ailments and loves her three dogs.
A Face Is Exposed for AOL Searcher No. 4417749,
By MICHAEL BARBARO and TOM ZELLER Jr,
The New York Times, Aug 9 2006
Risks of Privacy in Query Logs
§  Profile [Jones, Kumar, Pang, Tomkins, CIKM 2007]
•  Gender: 84%
•  Age (±10): 79%
•  Location (ZIP3): 35%
§  Vanity Queries [Jones et al, CIKM 2008]
•  Partial name: 8.9%
•  Complete: 1.2%
§  More information:
•  A Survey of query log privacy-enhancing techniques
from a policy perspective [Cooper, ACM TWEB 2008]
§  A good anonymization technique is still an open problem
Privacy Awareness
§ How does our privacy change when we change our social network?
§ Information gain to predict a private attribute based on public data
§ Each user may have a promiscuity score
§ Example: new friendship request
Promiscuity( me ) > Promiscuity( new)
Promiscuity( me ) ≥ Promiscuity( new ) + max-gain-I-allow
Promiscuity( me ) < Promiscuity( new ) + max-gain-I-allow
Related work by [Estivill-Castro & Nettleton; Singh, ASONAM 2015]
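The information gain mentioned above can be sketched as the drop in entropy of a private attribute once a public attribute is known. The slides do not define how the promiscuity score is derived from it, so this sketch covers only the information-gain step; the function names and the toy attributes are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy, in bits, of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(public_values, private_values):
    """H(private) - H(private | public), in bits.
    0 means the public attribute reveals nothing about the private one."""
    if len(public_values) != len(private_values) or not private_values:
        raise ValueError("need two non-empty lists of equal length")
    groups = defaultdict(list)
    for pub, priv in zip(public_values, private_values):
        groups[pub].append(priv)
    n = len(private_values)
    # Conditional entropy: entropy within each public-value group,
    # weighted by the size of the group.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(private_values) - conditional


# Toy data: the public attribute fully determines the private one...
revealing = information_gain(["a", "a", "b", "b"], ["x", "x", "y", "y"])
# ...or tells us nothing about it.
harmless = information_gain(["a", "a", "b", "b"], ["x", "y", "x", "y"])
```

In the first toy case the gain is 1 bit, meaning the private attribute is fully exposed. In the second it is 0. A threshold such as the max-gain-I-allow term above would sit somewhere between those extremes.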
The Web Works Thanks to Bias!
§ Web traffic
›  Local caching
›  Proxy/Akamai caching
§ Search engines
›  Answer caching
›  Essential web pages
•  25% of queries can be answered with less than 1% of the URLs!
[Baeza-Yates, Boldi, Chierichetti, WWW 2015]
§ E-Commerce
›  Large fraction of revenue comes from few popular items
Activity bias
(Self) selection bias
Web Data
§  A mirror of ourselves, the good, the bad and the ugly
§  The web amplifies everything, good or bad, but always
leaves traces
§  We have to be aware of the biases and counteract them
§  We have to be aware of our privacy
Big Data of People is huge…..
….. but is tiny compared to the future
Big Data of the Internet of Things (IoT)
It’s Hard to Get Data to Tell the Truth
§  The blindness of the averages
§  Look at distributions
§  Absolute vs. relative
§  Income per capita vs. Inequality
§  Local vs. global optimization
§  Teams competing without knowing, uncorrelated criteria
§  You can always see/torture data as you wish
›  61 analysts, 29 teams: 20 yes and 9 no (Univ. of Virginia, COS)
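The blindness of the averages is easy to show with two toy populations that share the same mean income but differ completely in distribution. This is a minimal sketch with made-up numbers; the Gini coefficient is used here as one common inequality measure, which is an assumption since the slides do not name one.

```python
from statistics import mean, median


def gini(values):
    """Gini coefficient of non-negative values:
    0 means perfectly equal, values near 1 mean highly concentrated."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one positive value")
    # Rank-weighted formula on sorted data:
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with i starting at 1.
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical incomes: same average, very different realities.
equal_incomes = [50, 50, 50, 50]
skewed_incomes = [5, 5, 5, 185]

same_average = mean(equal_incomes) == mean(skewed_incomes)
```

Both lists average 50, yet the median of the skewed list is 5 and its Gini coefficient is 0.675 against 0 for the equal list. Looking at the distribution, or at least at the median and an inequality measure, exposes what the mean hides.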
Contact: rbaeza@acm.org
www.baeza.cl
@polarbearby
ASIST 2012
Book of the
Year Award
Questions?
Biased Questions?

Mind Melds and BattleBots: Creating the Right Kind of Designer/Developer DynamicMind Melds and BattleBots: Creating the Right Kind of Designer/Developer Dynamic
Mind Melds and BattleBots: Creating the Right Kind of Designer/Developer Dynamic
 
Poetry for Robots: A Digital Humanities Experiment
Poetry for Robots: A Digital Humanities ExperimentPoetry for Robots: A Digital Humanities Experiment
Poetry for Robots: A Digital Humanities Experiment
 
Kent Nichols, "Downshifting Your Life to Rev Up Your Creativity"
Kent Nichols, "Downshifting Your Life to Rev Up Your Creativity"Kent Nichols, "Downshifting Your Life to Rev Up Your Creativity"
Kent Nichols, "Downshifting Your Life to Rev Up Your Creativity"
 
Robert Stulle, "Stories From the Agile Agency"
Robert Stulle, "Stories From the Agile Agency"Robert Stulle, "Stories From the Agile Agency"
Robert Stulle, "Stories From the Agile Agency"
 
Mona Patel, "Excuses, Excuses, Excuse Personas"
Mona Patel, "Excuses, Excuses, Excuse Personas"Mona Patel, "Excuses, Excuses, Excuse Personas"
Mona Patel, "Excuses, Excuses, Excuse Personas"
 

Último

IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfDaniel Santiago Silva Capera
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6DianaGray10
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IES VE
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationIES VE
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioChristian Posta
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024D Cloud Solutions
 
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial DataCloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial DataSafe Software
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdfPedro Manuel
 
PicPay - GenAI Finance Assistant - ChatGPT for Customer Service
PicPay - GenAI Finance Assistant - ChatGPT for Customer ServicePicPay - GenAI Finance Assistant - ChatGPT for Customer Service
PicPay - GenAI Finance Assistant - ChatGPT for Customer ServiceRenan Moreira de Oliveira
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsSeth Reyes
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxMatsuo Lab
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8DianaGray10
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfinfogdgmi
 
Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.francesco barbera
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...DianaGray10
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPathCommunity
 
20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf
20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf
20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdfJamie (Taka) Wang
 
GenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation IncGenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation IncObject Automation
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfAijun Zhang
 

Último (20)

IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial DataCloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdf
 
PicPay - GenAI Finance Assistant - ChatGPT for Customer Service
PicPay - GenAI Finance Assistant - ChatGPT for Customer ServicePicPay - GenAI Finance Assistant - ChatGPT for Customer Service
PicPay - GenAI Finance Assistant - ChatGPT for Customer Service
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and Hazards
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptx
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdf
 
Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 
20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf
20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf
20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf
 
GenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation IncGenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation Inc
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdf
 

Data and Algorithmic Bias in the Web

  • 1. 7/20/16 1 Data and Algorithmic Bias in the Web Ricardo Baeza-Yates California (NTENT), Catalonia (UPF), Chile (UChile) WebVisions Barcelona, July 2016 Steve Jobs and Bias Pervasive Optimistic Bias (Kahneman) Reality Distortion Field
  • 2. 7/20/16 2 Every Website is an Information Market Good Design Good Interaction Right Incentives
  • 3. 7/20/16 3 All Data has Bias §  Gender §  Racial §  Sexual §  Religious §  Social §  Linguistic §  Geographic §  Political §  Educational §  Economic §  Technological §  from Noise or Spam §  Validity (e.g. temporal) §  Completeness §  Gathering process §  …. However many people extrapolate results to the whole population (e.g., social media analysis) In addition there is bias when measuring bias as well as bias towards measuring it! Yes, We Live in a (Very) Biased World!
  • 4. 7/20/16 4 A Non-Technical Question Algorithm Biased Data Neutral? Same Bias Not Always! Unbias the data Tune the algorithm Unbias the output Bias awareness! Big Data and Bias 15 §  The quality of any algorithm is bounded by the quality of the data it uses §  Data bias awareness §  Algorithmic fairness §  Key issues for machine learning §  Uniformity of data properties §  In the Web, distributions resemble a power law §  Uniformity of error §  Data sample methodology §  E.g., sample size to see infrequent events or sampling bias issues
  • 5. 7/20/16 5 Data bias Activity bias Selection bias Sampling bias and size Algorithmic bias Interface (Self) selection bias Second order bias Sparsity Privacy Algorithm
  • 6. 7/20/16 6 Quantity Quality User- generated Traditional publishing What is in the Web? How Much Data? How Good is it? 168 million active web servers, 1083 million hostnames and infinitely many pages! 26 What else is in the Web?
  • 7. 7/20/16 7 Noise and Spam 27 §  Noise may come from many places: §  Instruments that measure (e.g., IoT) §  How we interpret the data §  Spam is everywhere §  Fight both with the wisdom of the crowds Data Bias and Redundancy 38 §  Is there any dependency in the data? §  Is there any duplication? §  Lexical duplication in the Web is around 25% §  Semantic duplication is larger (more later) §  Any other biases? Many! §  Web structure (economic, cultural) §  Web content (linguistic, geography, gender)
  • 8. 7/20/16 8 39 [Baeza-Yates, Castillo & López. Characteristics of the Web of Spain. The Information Professional (Spanish), 2006, vol. 15, n. 1, pp. 6-17] Economic Bias in Links Number of linked domains Exports (thousands of US$) 40 Baeza-Yates & Castillo, WWW2006 Exports/Imports vs. Domain Links
  • 9. 7/20/16 9 41 [Baeza-Yates, Castillo, Efthimiadis, TOIT 2007] Website Structure Minimal effort Shame 42 Linguistic Bias
  • 10. 7/20/16 10 Geographical Bias [E. Graells-Garrido and M. Lalmas, “Balancing diversity to counter-measure geographical centralization in microblogging platforms”, ACM Hypertext’14] Gender Bias [E. Graells-Garrido et al,. “First Women, Second Sex: Gender Bias in Wikipedia”, ACM Hypertext’15] Systemic bias? Equal opportunity?
  • 11. 7/20/16 11 •  The Web already is influenced by small groups •  "0.05% of the user population, attract almost 50% of all attention within Twitter" (50K users) [Wu, Hofman, Mason & Watts, WWW 2011] •  We explored this issue further with four different datasets: 1.  a large one from Twitter (2011), 2.  a small one from Facebook (2009), 3. Amazon reviews (2013), and 4.  Wikipedia editors (2015). •  Digital desert: the content that is never seen Activity Bias: Wisdom of a Few? [Baeza-Yates & Saez-Trumper, ACM Hypertext 2015] Examples [Baeza-Yates & Saez-Trumper, ACM Hypertext 2015]
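The "wisdom of a few" question on this slide can be made concrete with a small measurement: what fraction of users, taken from the most active downward, accounts for half of all activity? The sketch below is only an illustration under stated assumptions. The function name `smallest_share_for_half` and the toy counts are hypothetical and are not taken from the cited datasets.

```python
def smallest_share_for_half(activity_counts, target=0.5):
    """Fraction of users needed to reach `target` of total activity,
    counting the most active users first."""
    counts = sorted(activity_counts, reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    running = 0
    for i, c in enumerate(counts, start=1):
        running += c
        if running >= target * total:
            return i / len(counts)
    return 1.0

# Hypothetical toy data: one very active user and nine occasional ones.
posts_per_user = [91, 1, 1, 1, 1, 1, 1, 1, 1, 1]
share = smallest_share_for_half(posts_per_user)
print(f"{share:.0%} of users produce half of the content")
```

With uniform activity the same function returns 0.5, so values far below 0.5 indicate strong activity bias.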
  • 12. 7/20/16 12 October 2015 Quality of Content? 51 Yahoo Confidential & Proprietary •  Adding content implies adding wisdom? •  We use Amazon’s reviews helpfulness •  We computed the text entropy •  Content-based-wise users •  How many of those users are being paid?
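The slide mentions computing text entropy without saying how. One common reading, assumed here, is the Shannon entropy of the word distribution of a text, in bits. The helper `word_entropy` and its whitespace tokenization are illustrative choices, not the method used in the study.

```python
import math
from collections import Counter

def word_entropy(text):
    """Shannon entropy (bits) of the word distribution in `text`.
    Tokenization is a plain lowercase whitespace split (an assumption)."""
    words = text.lower().split()
    if not words:
        return 0.0
    counts = Counter(words)
    n = len(words)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

repetitive = word_entropy("good good good good")
varied = word_entropy("sturdy hinge, weak battery")
print(repetitive, varied)
```

A review that repeats one word scores zero, while a review with varied vocabulary scores higher, which is why entropy can serve as a rough proxy for how much content a text carries.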
  • 13. 7/20/16 13 Digital Desert 52 Yahoo Confidential & Proprietary Weblands of Wisdom
  • 14. 7/20/16 14 Bias in the Interface Position bias Ranking bias Presentation bias Social bias Interaction bias Presentation Bias §  Interaction data will be biased to what is shown §  In recommender systems, items recommended will get more clicks than items not recommended §  In search systems top ranked results will get more clicks than other results ›  Ranking bias ›  Interaction bias [Plot: CTR (log scale) by rank] [Dupret & Piwowarski, SIGIR 2008] [Chapelle & Zhang, WWW 2009]
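Because top-ranked results attract more clicks regardless of quality, raw click counts overstate the appeal of items that happen to be shown first. A simple correction, used here as a stand-in and not the click models of the papers cited on the slide, is "clicks over expected clicks": divide each item's clicks by the clicks an average item would have received at the same ranks. The log format and all names below are hypothetical.

```python
from collections import defaultdict

def rank_ctr(log):
    """log: iterable of (item, rank, clicked). Returns click-through rate per rank."""
    shown = defaultdict(int)
    clicks = defaultdict(int)
    for _item, rank, clicked in log:
        shown[rank] += 1
        clicks[rank] += int(clicked)
    return {r: clicks[r] / shown[r] for r in shown}

def clicks_over_expected(log):
    """Observed clicks per item divided by the clicks expected from its ranks."""
    ctr = rank_ctr(log)
    observed = defaultdict(int)
    expected = defaultdict(float)
    for item, rank, clicked in log:
        observed[item] += int(clicked)
        expected[item] += ctr[rank]
    return {i: observed[i] / expected[i] for i in observed if expected[i] > 0}

# Hypothetical log: A and C are always shown at rank 1, B and D at rank 2.
log = [
    ("A", 1, True), ("A", 1, False), ("C", 1, True), ("C", 1, False),
    ("B", 2, True), ("B", 2, False), ("D", 2, False), ("D", 2, False),
]
scores = clicks_over_expected(log)
print(scores)
```

Items A and B each received one click, but B earned it from a lower position, so its corrected score is higher.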
  • 15. 7/20/16 15 [WHY AMAZON’S RATINGS MIGHT MISLEAD YOU; The Story of Herding Effects Ting Wang and Dashun Wang, Big Data, 2014] Social Bias Extreme Algorithmic Bias
  • 16. 7/20/16 16 Second Order Bias in Web Content [Baeza-Yates, Pereira & Ziviani, Genealogical Trees in the Web, WWW 2008] Person Web content is redundant Clicks in results are biased to the ranking and the interaction Query Ranking bias Redundancy grows (35%) Search results New Most measures in the Web follow a power law The Long Tail: Sparsity [Anatomy of the long tail: Ordinary People with Extraordinary Tastes, Goel, Broder, Gabrilovich, Pang; WSDM 2010] §  Why is there a long tail? §  Sampling in the tail §  When the crowd dominates §  Empowering the tail
  • 17. 7/20/16 17 When the Crowd Dominates Kills the long tail 80 Personalization “facets”: •  Language (not always) •  Location •  Semantic facets per user •  Query intent prediction in search Empowering the Tail The Filter “Bubble”, Eli Pariser •  Avoid the Poor get Poorer Syndrome •  Avoid the Echo Chamber •  How to expose opposite views? 81 Cold start problem solution: Explore & Exploit Solutions: •  Diversity •  Novelty •  Serendipity
  • 18. 7/20/16 18 A Data Portrait is a visual context where users can explore how the system understands their interests. This context is used to embed content-based recommendations, displayed visually to facilitate exploration and user engagement. To combat homophily, recommendations are generated with political diversity in mind. Does it work? Yes, by using intermediary topics that are shared! But only when users are interested in politics. Demo at http://auroratwittera.cl/perfil/YahooLabs [E. Graells-Garrido, M. Lalmas and R. Baeza-Yates, ACM UAI 2016] •  Exploit the context (and deep learning!) 91% accuracy to predict the next app you will use [Baeza-Yates et al, WSDM 2015] •  Personalization vs. Contextualization Recall that user interaction is another long tail People Interests Aggregating in the Tail
  • 19. 7/20/16 19 [De Choudhury et al, ACM HT 2010] 87 [Quercia et al, ACM HT 2014] Crowdsourcing Data: Good Paths
  • 20. 7/20/16 20 Regions from Pictures [Thomee et al, Demo at CHI 2014] AOL Query Logs Release Incident §  No. 4417749 conducted hundreds of searches over a three-month period on topics ranging from “numb fingers” to “60 single men”. §  Other queries: “landscapers in Lilburn, Ga,” several people with the last name Arnold and “homes sold in shadow lake subdivision gwinnett county georgia.” §  Data trail led to Thelma Arnold, a 62-year-old widow who lives in Lilburn, Ga., frequently researches her friends’ medical ailments and loves her three dogs. A Face Is Exposed for AOL Searcher No. 4417749, By MICHAEL BARBARO and TOM ZELLER Jr, The New York Times, Aug 9 2006 90
  • 21. 7/20/16 21 91 Risks of Privacy in Query Logs §  Profile [Jones, Kumar, Pang, Tompkins, CIKM 2007] •  Gender: 84% •  Age (±10): 79% •  Location (ZIP3): 35% §  Vanity Queries [Jones et al, CIKM 2008] •  Partial name: 8.9% •  Complete: 1.2% §  More information: •  A Survey of query log privacy-enhancing techniques from a policy perspective [Cooper, ACM TWEB 2008] §  A good anonymization technique is still an open problem
  • 22. 7/20/16 22 Privacy Awareness § How does our privacy change when we change our social network? § Information gain to predict a private attribute based on public data § Each user may have a promiscuity score § Example: new friendship request Promiscuity( me ) > Promiscuity( new) Promiscuity( me ) ≥ Promiscuity( new ) + max-gain-I-allow Promiscuity( me ) < Promiscuity( new ) + max-gain-I-allow Related work by [Estivill-Castro & Nettleton; Singh, ASONAM 2015] The Web Works Thanks to Bias! § Web traffic ›  Local caching ›  Proxy/Akamai caching § Search engines ›  Answer caching ›  Essential web pages •  25% of queries can be answered with less than 1% of the URLs! [Baeza-Yates, Boldi, Chierichetti, WWW 2015] § E-Commerce ›  Large fraction of revenue comes from few popular items Activity bias (Self) selection bias
  • 23. 7/20/16 23 Web Data §  A mirror of ourselves, the good, the bad and the ugly §  The web amplifies everything, good or bad, but always leaves traces §  We have to be aware of the biases and counteract them §  We have to be aware of our privacy Big Data of People is huge….. ….. but is tiny compared to the future Big Data of the Internet of Things (IoT) It’s Hard to Get Data to Tell the Truth §  The blindness of the averages §  Look at distributions §  Absolute vs. relative §  Income per capita vs. Inequality §  Local vs. global optimization §  Teams competing without knowing, uncorrelated criteria §  You can always see/torture data as you wish ›  61 analysts, 29 teams: 20 yes and 9 no (Univ. of Virginia, COS)
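The "blindness of the averages" point (income per capita vs. inequality) can be illustrated with two made-up populations that share the same mean but have very different distributions. The sketch below uses the Gini coefficient as one possible inequality measure; the slide does not name a specific measure, and the income figures are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def gini(values):
    """Gini coefficient: 0 means perfect equality, values near 1 mean high inequality."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the sorted values (ranks start at 1).
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return (2 * weighted) / (n * total) - (n + 1) / n

# Hypothetical incomes: same average, very different distributions.
equal_town = [10, 10, 10, 10]
unequal_town = [0, 0, 0, 40]
print(mean(equal_town), mean(unequal_town))
print(gini(equal_town), gini(unequal_town))
```

Both towns report the same income per capita, yet only the second number in each pair reveals that one town's income is concentrated in a single person.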