Big data analytics are being used more widely every day, and for an ever wider range of purposes. These new methods of applying analytics can certainly bring innovative improvements for business. For example, retail businesses are successfully using big data analytics to predict the hot items each season and the geographic areas where demand will be greatest, to name just a couple of uses.
The power of big data analytics is so great that in addition to all the positive business possibilities, there are just as many new privacy concerns being created. Here are ten of the most significant privacy risks.
1. Privacy breaches and embarrassments.
The actions taken by businesses and other organizations as a result of big data analytics may breach the privacy of those involved, and lead to embarrassment and even lost jobs. Consider that some retailers have used big data analysis to predict intimate personal details such as the due dates of pregnant shoppers. In such cases, subsequent marketing activities led members of the household to discover that a family member was pregnant before she had told anyone, resulting in an uncomfortable and damaging family situation. Retailers, and other types of businesses, should not take actions that result in such situations.
2. Anonymization could become impossible.
With so much data, and with powerful analytics, it could become impossible to completely remove the ability to identify an individual if there are no rules established for the use of anonymized data files. For example, if one anonymized data set is combined with another, completely separate database without first determining whether any data items should be removed to protect anonymity, individuals could be re-identified. The key piece that is usually missing is a set of rules and policies for how anonymized data files can be combined and used together.
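To make the re-identification risk concrete, here is a minimal Python sketch of a check that could run before two data sets are combined. Everything in it is an illustrative assumption rather than something from a real system: the field names (zip, birth_year, sex), the sample records, and the helper functions k_anonymity and reidentified are all hypothetical.

```python
from collections import Counter

def key_of(row, keys):
    """Build a tuple of the quasi-identifier values for one record."""
    return tuple(row[k] for k in keys)

def k_anonymity(records, keys):
    """Size of the smallest group of records sharing the same quasi-identifiers."""
    groups = Counter(key_of(r, keys) for r in records)
    return min(groups.values()) if groups else 0

def reidentified(anonymized, other, keys):
    """Records that are unique in BOTH data sets on the shared keys,
    and can therefore be linked to exactly one named individual."""
    anon_counts = Counter(key_of(r, keys) for r in anonymized)
    other_index = {}
    for row in other:
        other_index.setdefault(key_of(row, keys), []).append(row)
    matches = []
    for row in anonymized:
        k = key_of(row, keys)
        candidates = other_index.get(k, [])
        if anon_counts[k] == 1 and len(candidates) == 1:
            matches.append({**row, **candidates[0]})
    return matches

# Hypothetical sample data: an "anonymized" file with names removed...
anonymized = [
    {"zip": "50309", "birth_year": 1975, "sex": "F", "diagnosis": "asthma"},
    {"zip": "50309", "birth_year": 1975, "sex": "F", "diagnosis": "diabetes"},
    {"zip": "50310", "birth_year": 1982, "sex": "M", "diagnosis": "flu"},
]
# ...and a completely separate list that still carries names.
public = [
    {"name": "Pat Example", "zip": "50310", "birth_year": 1982, "sex": "M"},
    {"name": "Sam Sample", "zip": "50309", "birth_year": 1975, "sex": "F"},
]
keys = ["zip", "birth_year", "sex"]

smallest_group = k_anonymity(anonymized, keys)
matches = reidentified(anonymized, public, keys)
print("smallest group size:", smallest_group)
for m in matches:
    print("re-identified:", m["name"], "->", m["diagnosis"])
```

A policy for combining anonymized files could, for instance, require that the smallest group size stay above an agreed threshold, and that fields be generalized or removed when it does not. The threshold itself is a policy decision, not something this sketch can supply.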
3. Data masking could be defeated to reveal personal information.
If data masking is not used appropriately, big data analysis could easily reveal the actual individuals whose data has been masked. Organizations must establish effective policies, procedures and processes for using data masking to ensure privacy is preserved. Since big data analytics is so new, most organizations don’t realize there are risks, so they use data masking in ways that could breach privacy. Many resources are available, such as those from IBM, to provide guidance in data masking for big data analytics.
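One way masking can fail is sketched below in Python. It assumes, purely for illustration, that an organization "masks" a short identifier by hashing it without any secret. The four-digit ID format, the key, and the function names are all hypothetical. Because the set of possible IDs is small, anyone can hash every candidate and compare. A keyed hash (HMAC) resists that particular attack as long as the key is stored separately from the data.

```python
import hashlib
import hmac

def mask_plain(value):
    """Weak masking: an unkeyed SHA-256 hash of the identifier."""
    return hashlib.sha256(value.encode()).hexdigest()

def mask_keyed(value, key):
    """Stronger masking: HMAC-SHA256 with a secret key held outside the data set."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()

def brute_force(masked, candidates, mask_fn):
    """Try every candidate identifier until one hashes to the masked value."""
    for candidate in candidates:
        if mask_fn(candidate) == masked:
            return candidate
    return None

# Hypothetical identifier space: every possible four-digit ID.
candidates = [f"{n:04d}" for n in range(10000)]

weak = mask_plain("0427")
recovered_weak = brute_force(weak, candidates, mask_plain)

secret_key = b"example-key-stored-separately"  # illustrative only
strong = mask_keyed("0427", secret_key)
recovered_strong = brute_force(strong, candidates, mask_plain)

print("unkeyed hash reversed to:", recovered_weak)
print("keyed hash reversed to:", recovered_strong)
```

This covers only one failure mode. Keyed masking still produces the same output for the same input, so masked records can still be linked to each other across files, which is why the policies and procedures described above matter as much as the technique.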
4. Unethical actions based on interpretations.
Big data analytics can be used to try to influence behaviors. There are many ethical issues with driving behavior. Just because you CAN do something doesn’t mean you should. For example, in the movie Fight Club, Edward Norton’s character’s job was to determine whether an automobile manufacturer should do a recall based strictly on financial considerations, without taking into account the associated health risks. In other words, he determined whether it was cheaper to let people be killed or injured than to fix the faulty equipment in the vehicles. Big data analytics can be used by organizations to make a much wider variety of business decisions that do not take into account the human lives that are involved. The potential to reveal personal information simply because doing so is not illegal, even though it can damage the lives of individuals, must also be considered.
5. Big data analytics are not 100% accurate.
While big data analytics are powerful, the predictions and conclusions that result are not always accurate. The data files used for big data analysis can often contain inaccurate data about individuals, the data models can be incorrect as they relate to particular individuals, or the algorithms can simply be flawed (the results of big data analytics are only as good, or bad, as the computations used to get those results). These risks increase as more data is added to data sets, and as more complex data analysis models are used without including rigorous validation within the analysis process. As a result, organizations could make bad decisions and take inappropriate and damaging actions. When decisions involving individuals are based upon inaccurate data or flawed models, those individuals can suffer harm by being denied services, being falsely accused or misdiagnosed, or otherwise being treated inappropriately.
6. Discrimination.
Using big data analytics to try to choose job candidates, give promotions, etc. may backfire if the analytics are not truly objective. Discrimination has been a problem for years of course, but the danger is that big data analytics makes it more prevalent, a kind of ‘automated’ discrimination if you will. For example, a bank or other type of financial organization may not be able to tell an applicant’s race or sexual orientation from a credit application (since it is generally illegal to base such a credit decision upon race). It could, however, use big data analytics to deduce race or sexual orientation from a wide variety of data collected online and through the Internet of Things (IoT), and then turn down a loan to an individual after learning such information.
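A simple first check for this kind of ‘automated’ discrimination is to compare outcome rates across groups. The Python sketch below computes approval rates per group and the ratio of the lowest rate to the highest. The group labels, the sample decisions, and the 0.8 cutoff are illustrative assumptions. The cutoff mirrors the "four-fifths" rule of thumb used in U.S. employment selection guidance and is applied here only as a screening heuristic, not as a legal test for lending.

```python
from collections import defaultdict

def approval_rates(decisions):
    """Approval rate per group from (group, approved) pairs."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, was_approved in decisions:
        totals[group] += 1
        if was_approved:
            approved[group] += 1
    return {g: approved[g] / totals[g] for g in totals}

def disparate_impact_ratio(rates):
    """Lowest group approval rate divided by the highest."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest else 0.0

# Hypothetical model decisions: group A approved 8 of 10, group B 4 of 10.
decisions = (
    [("A", True)] * 8 + [("A", False)] * 2 +
    [("B", True)] * 4 + [("B", False)] * 6
)

rates = approval_rates(decisions)
ratio = disparate_impact_ratio(rates)
needs_review = ratio < 0.8  # assumed screening threshold

print("approval rates:", rates)
print("ratio:", round(ratio, 2), "needs review:", needs_review)
```

A low ratio does not prove discrimination, and a high one does not rule it out. The point is that the model’s outputs can be audited by group even when the model never sees a protected attribute directly.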
7. Few (if any) legal protections exist for the involved individuals.
Most organizations still only address privacy risks as explicitly required by existing data protection laws, regulations and contractual requirements. While the U.S. White House, the Federal Trade Commission, and others have recently expressed concern about the privacy risks that are created by using big data analytics, there are no legal requirements for how to protect privacy while using big data analytics.
8. Big data will probably exist forever.
I’ve talked with many organizations about their big data use. I’ve read many studies and articles. I’ve not found any that indicate they will delete big data repositories. In fact, all have indicated that they instead typically view them as infinitely growing repositories; the bigger the better! As more data is collected and retained, analytics will be able to derive more insights into individuals’ lives, and to do so more easily.
9. Concerns for e-discovery.
There was a flurry of articles written in the past year about the e-discovery problems created by big data analytics. The e-discovery process generally requires organizations to identify and produce documents relevant to litigation. When dealing with millions of documents, as most organizations now have in their repositories, this becomes an expensive, time-consuming activity. A big data analytics approach called “predictive coding” is now starting to be used on these huge repositories to more quickly narrow down the documents most likely to be necessary for litigation, and then allow individuals to review them more closely. There are concerns that an organization using such analytics to produce documents may be accused of not including all the necessary documents.
10. Making patents and copyrights irrelevant.
There is concern that big data could make patents harder to obtain because patent offices will not be able to verify whether a submitted patent application is unique, since there will be too much data to check through within the growing number of big data repositories. Big data could make copyrights a thing of the past because it will be too hard to control information that can be hidden or propagated infinitely within big data repositories. As an associated effect, the royalties associated with copyrighted information are expected to decrease or possibly disappear altogether.
Bottom line for all businesses of all sizes…
Big data analytics hold great promise for inspiring significant innovations, improving upon all sectors of organizations, and bringing true benefit to individuals in unlimited ways. However, organizations that choose to use big data analytics must determine the associated privacy and information security impacts before they actually put analytics into use. Always:
1) consider at least these ten privacy risks during the planning stages of your big data analytics strategies,
2) establish responsibility, accountability, policies and procedures for big data analytics and use, and
3) incorporate privacy and security controls into the related processes before actually putting them into business use.
This post was written as part of the IBM for Midsize Business (http://Goo.gl/t3fgW ) program, which provides midsize businesses with the tools, expertise and solutions they need to become engines of a smarter planet. I’ve been compensated to contribute to this program, but the opinions expressed in this post are my own and don’t necessarily represent IBM’s positions, strategies or opinions.