danah boyd is with Microsoft Research New England and a research fellow at the Berkman Center for Internet and Society at Harvard University; she was one of the first people to research social technical networks of teens
Privacy and Publicity In The Context of Big Data
Privacy
- What are teens doing
- Technical (phishing etc.)
- Concerns are not new – long before the Net
- What is new has to do with big data
Big Data
- Remix, aggregate – average people are producing data
- Specifically referring to “social data” – people, their activities, their interactions, their behaviors [FB, Twitter]
Sense-Making
- Data is cheap; analysis is not
- What does it mean to engage with this data? Ethics, privacy, publicity
Methodology
- WWW is one of the places where we see big data being researched
- Ethics : ethnography – map cultures, figure out what people do. I started out in computer science, visualizing social networks. I wanted to understand why.
- George Homans: the methods of social science are dear in time and money and getting dearer every day
- Vint Cerf: We never, ever in the history of mankind have had access to so much information so quickly and so easily
* Bigger data are not always better data
* Not all data are created equal
* What and why are different questions
* Be careful of your interpretations
SAMPLING
- Quality is more important than quantity
- big-ness != whole-ness | Twitter has all of Twitter, but most researchers only have a sample. If you are trying to understand topicalness, your analysis will be wrong no matter how large your sample is
- Know your biases: if you have all tweets and can pull a random sample, it is a random sample of tweets, not of users. Some users have multiple accounts, some consume without accounts, and some tweet more than others
- not all data are created equal
- “better” networks
- Different kinds of social networks: articulated, behavioral, personal. Articulated networks are the people you list on FB etc.; many of the people there are not your “closest” friends, so you can’t take an FB friends list and say you’ve analyzed a person’s social network. Behavioral networks come from traces such as being in the same room, email, or cellphone calls. Neither is the same as a personal network
- Homophily has been shown to play out in all of these in interesting ways
- “Tie” strength — the person I list as my top friend may be there because of politics; just because I talk to my collaborators more than my mom does not mean that they
- No one loves big data more than marketers and no one misunderstands big data more than marketers [coke/Coca-Cola and teens example: linked for a different reason than what Coca-Cola thought]
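The sampling point above (a random sample of tweets is not a random sample of users) can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the population size, the share of heavy tweeters, and the tweets-per-user counts are hypothetical numbers chosen for illustration, not figures from the talk.

```python
import random


def simulate_heavy_user_share(num_users=1000, heavy_fraction=0.1,
                              heavy_tweets=100, light_tweets=1,
                              sample_size=500, seed=0):
    """Compare how often heavy tweeters appear in a tweet sample vs a user sample.

    All parameters are hypothetical: a small group of "heavy" users posts
    many tweets each, and everyone else posts one.
    """
    rng = random.Random(seed)
    num_heavy = int(num_users * heavy_fraction)

    # Represent each tweet by the id of the user who wrote it.
    tweets = []
    for user_id in range(num_users):
        count = heavy_tweets if user_id < num_heavy else light_tweets
        tweets.extend([user_id] * count)

    # Uniform random sample of tweets (what a tweet-level sample gives you).
    tweet_sample = rng.sample(tweets, sample_size)
    # Uniform random sample of users (what you might think you have).
    user_sample = rng.sample(range(num_users), sample_size)

    heavy_share_in_tweets = sum(u < num_heavy for u in tweet_sample) / sample_size
    heavy_share_in_users = sum(u < num_heavy for u in user_sample) / sample_size
    return heavy_share_in_tweets, heavy_share_in_users


if __name__ == "__main__":
    tweet_share, user_share = simulate_heavy_user_share()
    print(f"heavy users behind sampled tweets: {tweet_share:.0%}")
    print(f"heavy users in a user sample:      {user_share:.0%}")
```

In this toy setup heavy tweeters are a tenth of the users but write most of the tweets, so a perfectly random tweet sample mostly reflects them. The simulation does not cover the other biases in the notes, such as people with multiple accounts or people who read without an account.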
What and Why are not the same thing!
- Cobot + Lambdamoo
- Frequency is not tie strength (drama over misinterpretation)
Interpretation
- Fallacy that qualitative research is interpretation and quantitative research is about producing facts
- Robin Dunbar’s work -> you could only keep up with gossip of 150 people max. Friendster interpreted this as saying people would have no more than 150 friends and capped it there
- Hardest part of research is interpretation – it’s why social scientists are focused on methodology
The #1 threat to privacy today is our focus on big data
We have perverted the notion of public … [find this quote]
Just because data is accessible doesn’t mean that using it is ethical — privacy is contextual — people trust each other to maintain context and they feel violated when this is broken
“These walls have ears” — Chaucer (1387)
People believe that they understand the context in which they are operating and they get upset when this changes – FB – technology DOES have eyes and mouths
Big data isn’t arbitrary data – it is data about people’s lives — the process of sharing and using
* security through obscurity is reasonable
- how we act in public spaces depends on context
- often ephemeral
- surveillance cameras capture cartwheels
- people change behavior when they know that they are being recorded
- mediated situations like FB but amplifying is different — people are developing skills
- but when we keep changing the context people get confused — people’s encounters with social systems
- civil inattention [get this quote]: you may be able to stare at everyone who walks by, but you don’t. Why is it OK to demand the right to stare at everyone online just because you can?
* not all publicly accessible data is meant to be publicized
- using a sense of obscurity for context
- paparazzi make lives of celebrities hell
- when we argue for the right to publicize any data that are accessible, we are arguing that everyone should live the life of a celebrity in this sense
- psychological consequences?
- PII – personally identifiable information
- PEI – personally embarrassing information
- aggregating and distributing data out of context is a privacy violation
* Privacy is not access control
- limiting access can be one tool but it’s not the same as privacy
* challenging questions
- “should we?”
* publicity
- hard to distinguish between content meant to be aggregated and that which isn’t (context)
* data are people
- Facebook built its reputation on being a closed system
- Interpreted by the public as “anti-myspace” – narrated as closed, intimate – “safer” because it was more private
- First impressions matter
- To this day, many of the average FB users I talk to believe it is about privacy
- Students Against Facebook News Feed -> the content was already accessible, but News Feed took implicit content and published it as explicit content; this changed the context and people wanted to opt out; it changed people’s behaviors; they thought differently about how; those who joined after 2006 have different norms
- Beacon (2007): advertising messages that made individuals advertisers for their friends; opted in by default; not as visible as News Feed, so people only learned about it when something went wrong; dismantled, and the lawsuit was settled
- Last year – “invite” to change privacy settings — the default choice was /everyone/ — for most people, they just clicked through. FTC challenged. 65% of users made their content publicly accessible (researchers know most people don’t change defaults)
- I asked people to describe their privacy settings and then we went through their settings. Not a single person I interviewed had a mental map that
- clueless, confused and outright screwed (proverbial boiling frog)
- last week’s announcement – it relies on the changes from FB (a form of trickery)
- you have to opt-out of each individual partner site on FB and the partner site
- I spent six hours trying to figure out how to turn off like for politicians
CONTEXT
- opt-out is in the better interest of companies not people
- people are engaging with FB for personal reasons — huge ethical issue/challenge
NOT ABOUT HIDING
Regulation – Lessig – Code
- Changes are coming about because of changes in architecture, changes in code
- Technology’s role as regulator has rapidly changed
- Social norms haven’t radically changed
- Law is interesting player in all of this
GET INVOLVED
2 replies on “WWW2010 : danah boyd”
text of speech from danah’s website