Dangerous Data

Data doesn’t tell stories on its own. It has to be aggregated, collated, and formatted…and that’s where the danger lies. The potential for misuse and misinterpretation runs rampant when it comes to algorithmically assigned identities and massive indices. It is this kind of 21st-century dilemma that John Cheney-Lippold takes up in We Are Data: Algorithms and the Making of Our Digital Selves.

When it comes to the web, for example, everything we do is recorded, given meaning (often semi-arbitrarily), and inevitably commodified. All of this happens without our knowledge, and in ways that don’t actually reflect who we are.

There is a mantra that “wherever data tells us to go, we will find the truth.” Cheney-Lippold complicates this: “But the data that Google uses to categorize people and assign status of identity does not speak; it is evaluated and ordered by a powerful corporation in order to avoid legal culpability” (xiii). If this is the case, which it most certainly is, what does this mean for identity in a general sense? What abilities are we giving up, and what do we enable, in order to be digital citizens? How is our data being used?

More importantly, this data is not necessarily being collected to provide a lifelike, accurate picture of you or me. It is meant for representative purposes: who might we be, what do we represent, and what do our habits indicate? This kind of disconnected identification and inference affects our digital lives in ways we don’t always recognize. Targeted advertising is a simple example. Sites can use these constructed identities to manipulate your browsing experience and the lens through which you negotiate digital life, often in ways that bolster their own business models.

As Cheney-Lippold writes, “The different layers of who we are online, and what who we are means, is decided for us by advertisers, marketers, and governments. And all these categorical identities are functionally unconcerned with what, given your history and sense of self, makes you you” (7). In this system of algorithms and reconfigurations, authenticity is superseded by the desire for effective metrics and classifications.

Now, this isn’t inherently evil, nor is data itself inherently bad simply by virtue of existing. There is, however, a great danger.

Furthermore, digital technology and the tools of the 21st century may not be as democratizing and transformative as we like to believe.

In a 2016 article for The Atlantic entitled “The Internet May Be As Segregated as a City,” Charlton McIlwain, a professor of media, culture, and communication at New York University, was quoted as saying that the research he was conducting “provides more evidence to dispel the notion that the internet is a democratic space.”

He began with the 56 “Top Black Sites” as chosen by Alexa, a web analytics company that ranks pages based on traffic and popularity, and then expanded that set by following the outbound links from those sites. Ending up with over 3,000 pages, McIlwain then looked at how these sites were connected to one another. He found that non-racial and racial sites linked to each other in relatively equal measure, indicating no significant bias toward similar sites for either type.
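The comparison described above can be sketched as a simple link-graph tally: label each site with a category, then count how many hyperlinks stay within a category versus cross between them. This is only a minimal illustration of the idea, not McIlwain’s actual method or data; the site names, categories, and links below are hypothetical placeholders.

```python
# A minimal sketch of comparing same-category vs. cross-category links.
# Site names, categories, and links are hypothetical, for illustration only.

from collections import defaultdict

# Category assigned to each site (illustrative labels).
categories = {
    "site_a": "racial",
    "site_b": "racial",
    "site_c": "non-racial",
    "site_d": "non-racial",
}

# Directed hyperlinks between sites (hypothetical crawl results).
links = [
    ("site_a", "site_b"),
    ("site_a", "site_c"),
    ("site_c", "site_d"),
    ("site_c", "site_a"),
]

def link_mix(links, categories):
    """Count links that stay within a category vs. cross categories."""
    counts = defaultdict(int)
    for src, dst in links:
        key = "same" if categories[src] == categories[dst] else "cross"
        counts[key] += 1
    return dict(counts)

print(link_mix(links, categories))  # {'same': 2, 'cross': 2}
```

In this toy data the same/cross counts come out equal, mirroring the finding that sites linked across categories in relatively equal measure; the interesting divergence, as the next paragraph notes, shows up in traffic rather than in links.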

However, when it came to traffic and how people navigated between sites, he found that “people who usually go to non-racial sites tend to visit other non-racial sites; similarly, visitors to racial sites preferred to click on other racial sites.”

Expounding on that, McIlwain posed the question: “Why, when there’s a pathway to a different neighborhood, don’t I go there?” The answer, he believes, has to do with the ways a space of any kind signals to visitors what it is, and how that shapes the way users engage with it.

This connects directly to how data is used to construct identities, which are then used to manipulate a user and to decide what content is pushed to the front during, say, a browsing experience. This is a quantitative indication that our experiences are being affected.

What this ultimately can do is reinforce existing social and political structures and, for the most part, that is not a good thing.

Cheney-Lippold uses the example of the Chicago Police Department’s “predictive policing,” where crime statistics and algorithmic assessments are used to determine the likelihood of crimes in certain areas and by certain persons. This isn’t inherently problematic until you realize that it led to the creation of a “heat list of four hundred ‘at risk’ individuals, an algorithmic category populated by both ‘victims’ and ‘offenders'” (23). Being identified as ‘at risk’ meant that the police would pay a physical visit to your location, which often put you actually at risk, and this doesn’t even begin to address what data is being used to make that determination.

Google serves up millions of search results every second of every minute, and its finely tuned algorithms are the secret to this speed and success. But these algorithms were, at some point, created by humans. Just like hiring and policing, they are affected by hidden prejudices that are subconsciously (and sometimes even consciously) baked into systems.

This may all seem like a lot of worrying and no answering, and that’s partially true. I don’t have an answer for what we should or could do. What is important is to recognize the dangers and be aware, which hopefully leads to better, more ethical uses.



