Data-mining, Oversight and Privacy

TechWorld published an interesting and thought-provoking article about data mining today, pointing out some of the potential benefits of data mining but also some of the problems that arise when there is a lack of oversight.

"Data mining is a relatively new field within computer science. In the broadest sense, it combines statistical models, powerful processors, and artificial intelligence to find and retrieve valuable information that might otherwise remain buried inside vast volumes of data. Retailers use it to predict consumer buying patterns, and credit card companies use it to detect fraud. In the aftermath of September 11, the government concluded that data mining could help it prevent future terrorist attacks."

In 2004 a Government Accountability Office (GAO) report found that US federal agencies were actively engaged in or planning 199 data mining projects, with 122 of them involving personal information.  A 2005 GAO report indicated significant concerns that agencies were not following oversight procedures or implementing the recommended (possibly meant to be required) privacy and information security procedures for their data mining initiatives.

A disturbing loophole in the directive covering data mining is nicely summarized in this statement, "While the federal laws and guidance previously outlined provide a wide range of privacy protections, agencies are allowed to claim exemptions from some of these provisions if the records are used for certain purposes."  It sounds as though a large number of agencies claim such exemptions.

The GAO report included the following steps the GAO had recommended to protect privacy.

Table 1: Key Steps Agencies Are Required to Take to Protect Privacy, with Examples of Related Detailed Procedures and Sources
Source: GAO analysis of the Privacy Act, E-Government Act, FISMA, and related guidance.
Key steps to protect privacy of personal information (numbered), each followed by examples of procedures (bulleted)

1.  Publish notice in the Federal Register when creating or modifying system of records
• Specify the routine uses for the system
• Identify the individual responsible for the system
• Outline procedures individuals can use to gain access to their records

2.  Provide individuals with access to their records
• Permit individuals to review records about themselves
• Permit individuals to request corrections to their records

3.  Notify individuals of the purpose and authority for the requested information when it is collected
• Notify individuals of the authority that authorized the agency to collect the information
• Notify individuals of the principal purposes for which the information is to be used

4.  Implement guidance on system security and data quality
• Perform a risk assessment to determine the information system vulnerabilities, identify threats, and develop countermeasures to those threats
• Have the system certified and accredited by management
• Ensure the accuracy, relevance, timeliness, and completeness of information

5.  Conduct a privacy impact assessment
• Describe and analyze how information is secured
• Describe and analyze intended use of information
• Have assessment reviewed by chief information officer or equivalent
• Make assessment publicly available, if practicable

All good recommendations.  I wonder which of the government agencies read, let alone implement, GAO recommendations?  What percentage claim exemptions?  As the TechWorld report noted:

"Most data mining projects are not subjected to a rigorous business case analysis. Two current intelligence CIOs who were otherwise unable to comment for this story agreed that this is an issue that they struggle with. The US DoD’s Technology and Privacy Advisory Committee (TAPAC) developed a 10-point system of checks and balances that it recommended every agency head apply to data mining projects, but Cate says that it has never been implemented. Similarly, the US National Academy of Sciences recently appointed a committee to develop a methodology that the government can use to evaluate the efficacy of its antiterror data mining projects, but the target date for its report is still more than a year away."

I believe, based upon what I’ve heard from colleagues, clients and other info sec and privacy professionals at meetings and conferences, that the use of data mining is going to increase exponentially in the next few years.  This is widely evidenced by the NSA’s data mining of phone records, and also by the growing data mining of public socializing sites, such as described within the January 2006 CRS U.S. government report, "Data Mining and Homeland Security: An Overview."  A couple of snippets to give you a feel for the data mining issues described within the report:

"Data mining has become one of the key features of many homeland security initiatives. Often used as a means for detecting fraud, assessing risk, and product retailing, data mining involves the use of data analysis tools to discover previously unknown, valid patterns and relationships in large data sets. In the context of homeland security, data mining can be a potential means to identify terrorist activities, such as money transfers and communications, and to identify and track individual terrorists themselves, such as through travel and immigration records."

"As with other aspects of data mining, while technological capabilities are important, there are other implementation and oversight issues that can influence the success of a project’s outcome. One issue is data quality, which refers to the accuracy and completeness of the data being analyzed. A second issue is the interoperability of the data mining software and databases being used by different agencies. A third issue is mission creep, or the use of data for purposes other than for which the data were originally collected. A fourth issue is privacy. Questions that may be considered include the degree to which government agencies should use and mix commercial data with government data, whether data sources are being used for purposes other than those for which they were originally designed, and possible application of the Privacy Act to these initiatives. It is anticipated that congressional oversight of data mining projects will grow as data mining efforts continue to evolve."

"As additional information sharing and data mining initiatives have been announced, increased attention has focused on the implications for privacy.  Concerns about privacy focus both on actual projects proposed, as well as concerns about the potential for data mining applications to be expanded beyond their original purposes (mission creep). For example, some experts suggest that anti-terrorism data mining applications might also be useful for combating other types of crime as well. So far there has been little consensus about how data mining should be carried out, with several competing points of view being debated. Some observers contend that tradeoffs may need to be made regarding privacy to ensure security. Other observers suggest that existing laws and regulations regarding privacy protections are adequate, and that these initiatives do not pose any threats to privacy. Still other observers argue that not enough is known about how data mining projects will be carried out, and that greater oversight is needed. There is also some disagreement over how privacy concerns should be addressed. Some observers suggest that technical solutions are adequate. In contrast, some privacy advocates argue in favor of creating clearer policies and exercising stronger oversight. As data mining efforts move forward, Congress may consider a variety of questions including, the degree to which government agencies should use and mix commercial data with government data, whether data sources are being used for purposes other than those for which they were originally designed, and the possible application of the Privacy Act to these initiatives."

Data mining is nothing new…it’s been used in one way or another since the advent of the "super computer."  The differentiators from 25+ years ago to now are 1) the increasing connectivity of multiple repositories of data and multiple computers…computer grids with seemingly limitless data storage, containing what is approaching unlimited amounts of personal information; and 2) the increasing speed and capabilities of the technology to cull through the data in the blink of an eye to find and correlate personal data.

"With great power comes great responsibility."  I use this Spider-Man quote often…I think it applies to so many challenges that information security and privacy practitioners face…technology power and related responsibility really do make our professions interesting, important and often infuriating.  Data mining is powerful and that power must be contained.  You don’t want a data mining effort to turn into an out-of-control, privacy-destroying Doc Ock monstrosity.

With proper oversight, established accountability, and enforced procedures, data mining does not have to invade privacy.  Without these ingredients, however, data mining runs amuck and privacy gets trampled.  There have been many incidents in which data mining produced bad results or the data was misused.  The discussion of these incidents is a good topic…for another time.

Does your organization have data mining initiatives going, or planned?  Be sure you are addressing information security and privacy issues…from the start of the projects and all the way through until the data mining effort is retired…if it ever is.  Remember:

1.  Your organization risks violating your own privacy policies and agreements when you link the consumer and customer data you collect to carry out different customer-facing processes, and subsequently amass them in different databases.
2.  When your organization analyzes web site data and then links the findings with data acquired from other applications or third-party data providers in order to develop lists targeting specific consumers, you are running a high risk of being in noncompliance with your own policies, contracts and applicable laws.  This is particularly true for your non-U.S. customers/consumers.
3.  Does your organization use the data within your data mining initiatives for other purposes outside the scope of your intended and communicated use?  You run a high risk of regulatory noncompliance and potential lawsuits if you do this.
4.  Incorporate information security and privacy requirements and checks throughout your entire systems and applications development life cycle.  Document them.
5.  Document and communicate information security and privacy policies, procedures and standards for data mining projects, initiatives, applications and systems.  This demonstrates due diligence in addition to complying with several data protection laws.
6.  Learn from the mistakes and recommendations of others.  Read the GAO reports covering data mining and implement the recommendations that you could apply within your organization.  This demonstrates due diligence particularly in the eyes of regulatory auditors.
7.  Conduct privacy impact assessments.  Do them while planning the data mining initiative; following implementation; and regularly thereafter.
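
To make item 3 a little more concrete, here is a minimal sketch in Python of a purpose check you might run before a data set is pulled into a data mining effort.  The class, the function, the data set names and the purpose labels are all hypothetical, made up purely for illustration; a real check would need to reflect your own privacy notices, contracts and applicable laws.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class DataSource:
    """A data set plus the purposes communicated to individuals at collection time."""
    name: str
    communicated_purposes: FrozenSet[str]


def out_of_scope_sources(sources: Iterable[DataSource], proposed_purpose: str) -> List[str]:
    """Return the names of sources whose communicated purposes do not cover the proposed use."""
    return [s.name for s in sources if proposed_purpose not in s.communicated_purposes]


# Hypothetical example data, for illustration only.
web_logs = DataSource("web_site_logs", frozenset({"site_analytics"}))
orders = DataSource("order_history", frozenset({"order_fulfillment", "targeted_marketing"}))

# Which data sets were never communicated as being used for targeted marketing?
flagged = out_of_scope_sources([web_logs, orders], "targeted_marketing")
print(flagged)
```

Anything the check flags is a candidate for review with your privacy and legal folks before the project moves ahead.  A simple lookup like this is no substitute for a privacy impact assessment, but it does give you a documented checkpoint.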
