Many folks like to argue and pick apart what is meant by “data mining.” Marketers I’ve spoken with claim they are not doing data mining with their customers’ information, but just “repurposing” it.
Whatever you call it, you need to know how your organization is using personally identifiable information (PII) in ways other than the purposes for which it was collected. Many times these other purposes are achieved through data mining.
Last week the U.S. Department of Homeland Security held a workshop, “Implementing Privacy Protections in Government Data Mining,” that provided some good information about data mining privacy issues that all organizations should consider. The comments the DHS received prior to the event were very interesting.
Several news agencies provided summaries of the workshop, including Federal Computer Week and the BNA Privacy and Security Law Report (a subscription site).
Information from the DHS February 2008 Report on Data Mining was also reportedly covered within the workshop.
The workshop made clear that data mining is a term that means very different things to different people and groups.
Something that is consistent, however, is that the use of data mining involving PII creates significant privacy concerns.
So what kinds of data mining are there?
A professor at the Vanderbilt University Law School, Christopher Slobogin, provided the following categories of data mining models at the workshop:
1) Subject-based data mining: “Focuses on gathering information from the mined data on a particular individual known suspect”
2) Match-driven data mining: “Focuses on a list of known or suspected individuals who pose a threat, such as the terrorist watch lists used to screen airline passengers”
3) Pattern- or event-based data mining: “In which no suspect has been identified and a profile of characteristics is used in an attempt to identify a suspect.”
Organizations typically use the 2nd and 3rd models in their marketing activities.
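To make the distinction between the three models concrete, here is a minimal Python sketch. It is only an illustration: the `Record` type, the function names, and the shopping attributes are hypothetical and do not come from the workshop or from any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    # Hypothetical record: one person and the traits mined about them.
    person_id: str
    attributes: frozenset


def subject_based(records, subject_id):
    """Model 1: gather everything held on one already-known individual."""
    return [r for r in records if r.person_id == subject_id]


def match_driven(records, known_list):
    """Model 2: flag anyone who appears on a list of known individuals."""
    return [r for r in records if r.person_id in known_list]


def pattern_based(records, profile):
    """Model 3: flag anyone whose attributes include every trait in a profile.

    Nobody is named in advance; the profile alone selects people.
    """
    return [r for r in records if profile <= r.attributes]


# Made-up sample data for illustration only.
records = [
    Record("p1", frozenset({"bought_crib", "bought_diapers"})),
    Record("p2", frozenset({"bought_laptop"})),
    Record("p3", frozenset({"bought_crib", "bought_diapers", "bought_stroller"})),
]

new_parent_profile = frozenset({"bought_crib", "bought_diapers"})
flagged = pattern_based(records, new_parent_profile)
```

Note that the pattern-based function never sees a name. It selects people purely by inference, which is why a wrong profile produces wrong conclusions about real individuals.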
A big problem is that the information discovered during data mining for one purpose is often put to other uses. This is where significant errors in PII interpretations, along with erroneous privacy decisions, often occur.
An important question for organizations is how much PII they *NEED* to collect in the first place. Too often much more PII is collected than is necessary for the purpose at hand, and then that superfluous PII is dumped into the very large PII data mining storage heap, followed by inappropriate use and mistaken interpretations.
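One way to picture collecting only what is needed is a per-purpose allowlist applied before anything is stored. The sketch below is a rough illustration under assumed names: `REQUIRED_FIELDS`, `minimize`, the purposes, and the field names are all hypothetical.

```python
# Hypothetical mapping from a stated purpose to the PII fields it needs.
REQUIRED_FIELDS = {
    "order_fulfillment": {"name", "shipping_address", "email"},
    "newsletter": {"email"},
}


def minimize(submitted, purpose):
    """Keep only the fields the stated purpose requires; drop the rest."""
    allowed = REQUIRED_FIELDS[purpose]
    return {field: value for field, value in submitted.items() if field in allowed}


# Made-up form submission that asks for more than a newsletter needs.
signup = {
    "email": "pat@example.com",
    "birth_date": "1970-01-01",
    "household_income": "75000",
}
stored = minimize(signup, "newsletter")
```

Anything that never reaches storage can never end up in the data mining heap, so a filter like this would sit at the point of collection rather than downstream.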
Another issue that concerns me is the length of retention for the data involved in data mining. The associated PII is typically never deleted or removed from the data mining databases even after it has outlived its purpose.
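A retention limit only helps if something actually enforces it. As a minimal sketch, assuming each stored row records why and when it was collected, a periodic purge might look like this. The `RETENTION` periods, the row layout, and `purge_expired` are hypothetical, and the time spans are placeholders, not recommendations.

```python
from datetime import date, timedelta

# Hypothetical retention periods per collection purpose.
RETENTION = {
    "order_fulfillment": timedelta(days=365),
    "newsletter": timedelta(days=730),
}


def purge_expired(rows, today):
    """Return only the rows still inside the retention period for their purpose."""
    kept = []
    for row in rows:
        age = today - row["collected_on"]
        if age <= RETENTION[row["purpose"]]:
            kept.append(row)
    return kept


# Made-up rows for illustration only.
rows = [
    {"person_id": "p1", "purpose": "order_fulfillment", "collected_on": date(2006, 1, 15)},
    {"person_id": "p2", "purpose": "newsletter", "collected_on": date(2007, 6, 1)},
]
remaining = purge_expired(rows, today=date(2008, 3, 1))
```

In this sketch the order record is more than a year old and is dropped, while the newsletter record is still within its two-year window and is kept.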
Another significant concern is how organizations use the data collected during online searches and ecommerce transactions, and how readily government agencies, and perhaps other entities, can access that information for their own separate purposes. The result can be costly, or worse, for individuals whose PII was misinterpreted during the data mining process.
It’s worth thinking about, and worth a chat with your marketing and legal areas. Your organization’s data mining activities may very well be violating your posted privacy policy, or some of the data protection laws and contractual requirements applicable to your organization.