My 12-year-old son said to me yesterday after getting home from school, “Hey, Mommy, did you know that Wal-Mart can tell when you’re pregnant? And so can Target! Even before anyone else knows! They got a girl in trouble when they sent her dad coupons for baby stuff and congratulated her!”
Me, “That’s pretty incredible, isn’t it? Companies are able to discover things like that about people more than ever before through analyzing what is called ‘Big Data’.”
Son, “That’s really creepy. I think you should look into that for your privacy business!”
Glad he’s paying attention to privacy issues. 🙂
So, what is “Big Data”?
If you haven’t heard the term “Big Data” yet, you’re seeing it now, and you can expect to see it a lot more in the coming months and years. Basically, Big Data is a term for the huge amount of data that is being created, and collectively examined, across the many online sites, as well as offline sites and vast repositories. The amount of data created now is staggering compared to just a few years ago. Here are a couple of fun Big Data facts:
- 90% of all the data in the world was created in just the past two years. Isn’t that amazing?
- An average of 2.7 billion “likes” and “comments” are posted on Facebook alone each day. Judging by the targeted ads, marketers then use most of this data; all these “likes” and “comments” are added to the Big Data pot and associated with the people who made them.
Every piece of digital data you leave online, right down to each of your “likes,” Twitter retweets, hashtag terms, videos viewed, and so on, becomes part of Big Data. It all gets included and considered within the analysis algorithms being used for research, marketing, investigations, and an unlimited number of other activities.
Who’s using Big Data?
It may be easier to list who isn’t using Big Data than those who are. Here are just a few of the ways in which Big Marketers and others are salivating over the limitless uses they can get from Big Data and the associated analysis.
- Tax preparation organizations, such as TurboTax, like to tout the benefits of using Big Data to make “online tax prep more adaptive and predictive” for their customers.
- Payroll and payment processing businesses, such as Intuit, like to use Big Data to keep their customers “loyal and happy.”
- Museums, zoos and other public attraction businesses, such as the Cincinnati Zoo, are using Big Data analytics to determine what visitors purchase, the areas where they spend the most time, their favorite attractions, and when potentially high-spending visitors are in the area.
- Law enforcement and investigators are using Big Data analytics to track crime incidents, catch crooks and increase public safety.
- New search engines are being created that use semantic technology to improve search results and bring business benefits.
And the list could go on and on.
I certainly agree that being able to better analyze data can bring significant benefits to the public in general, such as medical research breakthroughs and other truly valuable contributions. However, based upon what I’m reading, I’m seeing more emphasis placed upon the benefits that Big Data analysis can have for marketing, driving more sales, and making companies more profitable, after the idea of Big Data was sold to the public for the aforementioned, more noble purposes.
I did a very quick and unscientific check online of news articles written about Big Data in the past two days and got 12,900 results. If I had done this search of news articles a year ago, I would have gotten four results, two of which were for lower case “big data” (meaning it wasn’t used as a specific term as it is now).
And yes, I understand the irony that my searches are now part of the Big Data repositories.
What’s that got to do with privacy?
“With great power comes great responsibility.” – Uncle Ben to Peter Parker in Spider-Man (I love this quote; it applies to so many situations where privacy can be exploited.)
Complex data analysis capabilities not only make it easier for businesses to customize their services for their customers; all that data, and customization, can also reveal a lot of personal information about the customers, their personal lives and activities, and those of their friends and families. Such powerful algorithms can take otherwise “de-identified” data that, on its own, cannot be attributed to specific individuals, quickly correlate many pieces of the data puzzle, and determine, sometimes with amazing clarity, the actions, likes, history, and, as we’ve seen, even medical conditions of specific individuals. The more data that exists, the more likely such correlations become.
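To make that correlation idea concrete, here is a minimal sketch of how “de-identified” records might be matched back to names. Everything in it is an assumption for illustration: the records are made up, and the choice of ZIP code, birth year and sex as the linking fields is hypothetical, not a description of how any particular company works.

```python
from collections import defaultdict

# All records below are invented for illustration only.
# "De-identified" purchase data: no names, but a few descriptive fields remain.
deidentified_purchases = [
    {"zip": "50309", "birth_year": 1994, "sex": "F", "purchase": "prenatal vitamins"},
    {"zip": "50309", "birth_year": 1971, "sex": "M", "purchase": "garden hose"},
    {"zip": "50014", "birth_year": 1994, "sex": "F", "purchase": "unscented lotion"},
]

# A second, separately obtained data set that does carry names.
public_profiles = [
    {"name": "Alice Example", "zip": "50309", "birth_year": 1994, "sex": "F"},
    {"name": "Bob Example", "zip": "50309", "birth_year": 1971, "sex": "M"},
    {"name": "Carol Example", "zip": "50014", "birth_year": 1994, "sex": "F"},
    {"name": "Dana Example", "zip": "50014", "birth_year": 1994, "sex": "F"},
]

# Hypothetical linking fields shared by both data sets.
LINK_FIELDS = ("zip", "birth_year", "sex")


def link_key(record):
    """Build the tuple of linking-field values for one record."""
    return tuple(record[field] for field in LINK_FIELDS)


def reidentify(deidentified, public):
    """Return (name, purchase) pairs where exactly one named person matches."""
    names_by_key = defaultdict(list)
    for person in public:
        names_by_key[link_key(person)].append(person["name"])

    matches = []
    for record in deidentified:
        names = names_by_key.get(link_key(record), [])
        if len(names) == 1:  # an unambiguous match points at one individual
            matches.append((names[0], record["purchase"]))
    return matches


matches = reidentify(deidentified_purchases, public_profiles)
for name, purchase in matches:
    print(f"{name} likely bought: {purchase}")
```

In this toy data, two of the three “de-identified” purchases link to exactly one named person; the third matches two people and so stays ambiguous. Adding more linking fields, or more outside data sets, would tend to shrink that ambiguity.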
Businesses need to carefully consider the potential privacy issues involved with using analytics and Big Data.
“There are no laws against using Big Data, so there are no privacy issues!”
I am hearing this argument more and more often from company and consulting lawyers: if there isn’t a law against using data, then it must be okay and not cause privacy issues. Right? No, not right. Technologies and their associated uses always evolve, and are actively used (and sometimes abused), long before any laws or regulations can be hammered out and agreed to, especially by an increasingly divisive group of lawmakers. And laws and regulations are overwhelmingly reactive. Typically they are not created until a significant number of bad events have happened. Until that time, businesses of all sizes need to become good data and privacy stewards and make thoughtful decisions about how they are using data, keeping in mind that their actions may reveal not only information that is valuable for business, but at the same time explicit information about individuals in unintended ways.
Here are some questions large, medium and small businesses need to ask before they dive into using Big Data and decide upon a Big Data analytics agreement and/or tool:
- Is the analytics company bringing in additional data from elsewhere to combine with your company’s data? Growing numbers of organizations are using de-identified data internally, but with the requirement that it cannot be shared with others.
- Will the analytics company take all your data, including data they label as “de-identified,” and use it outside of your organization, for analytics activities with other companies? If so, you may be violating your own posted privacy policy. Have you read it lately? Does your company actually do what it promises to your website visitors and customers?
- What does the company really mean when they say they have “de-identified” the data? This is a very fuzzy, subjective term. Is such de-identification really removing the ability to point to specific individuals? This may be the case with just your company’s de-identified data, but if it is combined with other data sets it could actually become “re-identified” data capable of pointing to specific individuals.
- If you are using “publicly available” data, how does the data analytics vendor collect all the data they are using? Are they glomming on to every type of data possible, even data from online sites that may have been left unsecured, but really should have been secured? For example, one marketing vendor told me if they find unsecured personal data on financial or retail sites, they grab it; they justify this by telling me that if it was off limits it would have been secured. Just because data is not appropriately secured does not mean it is available for anyone to take. You need to determine the type of online ethics the company has (or lacks).
- Are you planning to contact customers, as a result of your Big Data analysis, in ways that could be considered creepy at the least, or mind-blowingly privacy invasive at worst? (Refer back to the Target baby coupon example.)
- Are you making business decisions using assumptions, based upon the results of Big Data analysis, that may be incorrect? For example, do you deny insurance coverage to someone because of such results? Or make hiring decisions? Send communications regarding physical or medical assumptions? Target individuals as potential terrorists or criminals? Etc.
- Have you discussed your plans with your Information Security, Privacy and Compliance officers? This is very important; you should never make business decisions involving data that reveals information about individuals, their activities, likes and dislikes, medical conditions and so on, without talking to these folks. Even if the data is labeled “de-identified,” make sure that term fits your organization’s definition.
- Do you have company policies covering the use of Big Data? If not, now is the time to create some, in collaboration with your Information Security, Privacy and Compliance areas.
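One rough way to start probing a vendor’s “de-identified” claim from the questions above is to count how many records share each combination of descriptive fields. The sketch below is only that, a sketch: the records are made up, the field names are hypothetical, and the check (often called k-anonymity) is a technique I am bringing in for illustration, not something any vendor mentioned here is known to use. A smallest group of 1 means at least one record is unique on those fields and is a candidate for being re-identified when combined with other data sets.

```python
from collections import Counter


def smallest_group_size(records, fields):
    """Return the size of the smallest group of records sharing the same
    values for the given fields (0 if there are no records)."""
    groups = Counter(tuple(record[f] for f in fields) for record in records)
    return min(groups.values()) if groups else 0


# Invented sample of "de-identified" records; field names are hypothetical.
sample_records = [
    {"zip": "50309", "birth_year": 1994, "sex": "F"},
    {"zip": "50309", "birth_year": 1994, "sex": "F"},
    {"zip": "50309", "birth_year": 1971, "sex": "M"},
]

# With all three fields, one record stands alone.
k_detailed = smallest_group_size(sample_records, ("zip", "birth_year", "sex"))

# With only the ZIP code kept, every record blends in with the others.
k_coarse = smallest_group_size(sample_records, ("zip",))

print("Smallest group using zip, birth year, sex:", k_detailed)
print("Smallest group using zip only:", k_coarse)
```

A larger smallest-group size does not prove data is safe, but a value of 1 is a clear signal to ask the vendor harder questions.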
Big Data is going to continue being used more widely, for more reasons. All organizations need to keep in mind the privacy impacts of such use before they cross the line from doing what’s reasonable to violating individuals’ privacy.
Bottom line for all organizations, from the largest to the smallest: Big Data and associated analytics can be used to improve business and customer experiences, and bring innovation and medical breakthroughs. However, organizations must make sure they don’t cross over that line of customization and business improvement into creepiness, and then full-blown privacy invasion.
Other information about Big Data
To learn more about how Big Data can be used, for good purposes as well as in ways that push the privacy boundaries, consider checking out the following articles.
- “How Companies Learn Your Secrets”: A good article about that situation where Target told a dad his daughter was pregnant based upon her purchasing habits; the situation my son heard about and found so creepy.
- “What is Big Data?”: This provides not only a nice description of Big Data, but has some really interesting statistics.
- “Pastries and Predictions: Finding Hidden Trends”: Using Big Data to determine buying trends.
- “Will Big Data and Big Money Mean Big Trouble?”: Interesting podcast (~51 minutes) interview with a Big Data vendor and Mark Rotenberg, Executive Director at EPIC. NOTE: Some other topics are discussed before getting to this main topic.
This post was written as part of the IBM for Midsize Business program, which provides midsize businesses with the tools, expertise and solutions they need to become engines of a smarter planet.
Tags: audit, big data, breach, breach response, change controls, compliance, data analytics, data mining, encryption, IBM, Information Security, information technology, infosec, IT security, midmarket, non-compliance, personal information, personally identifiable information, PII, policies, privacy, privacy breach, privacy professor, privacyprof, protected health information, Rebecca Herold, security, security engineering, sensitive personal information, SPI, systems security, Target, Wal-Mart