Risky Business: Using Production Data for Test Purposes

Today some stories ran in multiple UK publications, such as Techworld’s "Firms play Data Protection roulette," discussing the use of production data for test purposes.  The article contained some interesting, but unsurprising, statistics.

  • "Nearly half (44 percent) of companies use live data in test environments – something the 1998 Data Protection Act warns against explicitly, according to a recent survey of IT directors by Compuware.
  • Half the directors (48 percent) were only ‘vaguely familiar’ with the Act itself, according to the research, which highlights the importance of understanding the demands and keeping track of how customer data is treated.
  • A further ‘83 percent used only minimal measures such as using non disclosure agreements (NDA) to control data when outsourcing.’"

These statistics come from UK organizations, and actually sound a little low.  Based upon the many business partner and vendor security program reviews I’ve performed, I think the share of organizations using live data is probably at least in the 75% – 90% range…admittedly a very unscientific estimate.

The article provides some discussion of the UK’s Data Protection Act and offers a few high-level recommendations.  It also reminds the reader of the risks of outsourcing and of how precautions such as NDAs will still not stop the insider threat to data, as in the case of the outsourcer employee I blogged about a few days ago who committed fraud using the information he used to perform his job.

There are many, many more issues involved.  There are also many other laws and regulations that prohibit the use of live data for test, pilot and quality assurance purposes…basically any type of use that is not for production.

I wrote about this important topic in the December 2005 issue of the Computer Security Institute Alert newsletter, "Is There Privacy When Testing?"  I plan to update the article and post it in the reading room of my Realtime IT Compliance website sometime in the near future.

In the meantime, here are some paraphrased or abbreviated points from my article, covering key issues organizations need to address when testing, particularly how to de-identify production data so that it can then be used for test purposes:

  • Test and development teams need to work with databases that are structurally correct, functional copies of the live environments. However, they often do not need to be able to view real confidential personal information. For test and development purposes, as long as the data looks real, the actual record content is usually irrelevant.
  • De-identifying data is considered a leading practice, and is also addressed in regulations such as HIPAA.  Basically, de-identification masks, removes or alters real or production data so that the data elements cannot be linked to a specific individual.  Data that has been de-identified is generally considered acceptable to use in the test environment.

De-identifying Data
There are several options for de-identifying data, both operational and automated.  I go into more detail within the article, but here is a bare-bones listing to start your thinking around this topic:

  1. Data deletion
  2. Data NULLing
  3. Data mixing
  4. Data replacement
  5. Data substitution
  6. Encryption
  7. Interjecting unrelated text
  8. Modifying numerical data
  9. Using an isolated testing environment
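
To make a few of these options concrete, here is a minimal Python sketch of NULLing, mixing and substitution applied to a handful of in-memory records.  It is only an illustration under my own assumptions: the field names, the sample rows, the placeholder names and the helper functions (null_field, mix_field, substitute_field, deidentify) are all hypothetical, and a real implementation would work against your actual database and be reviewed against your own legal requirements.

```python
import random

# Hypothetical customer rows; every value here is invented for illustration.
customers = [
    {"id": 1, "name": "Alice Smith", "ssn": "123-45-6789", "city": "Leeds"},
    {"id": 2, "name": "Bob Jones", "ssn": "987-65-4321", "city": "Bristol"},
    {"id": 3, "name": "Carol White", "ssn": "555-12-3456", "city": "Cardiff"},
]

# Hypothetical stand-in values used for substitution.
FAKE_NAMES = ["Pat Example", "Sam Sample", "Chris Placeholder"]


def null_field(rows, field):
    """Data NULLing: blank out one column in every row."""
    return [{**row, field: None} for row in rows]


def mix_field(rows, field, rng):
    """Data mixing: shuffle one column's values across the rows."""
    values = [row[field] for row in rows]
    rng.shuffle(values)
    return [{**row, field: value} for row, value in zip(rows, values)]


def substitute_field(rows, field, replacements):
    """Data substitution: swap real values for made-up ones."""
    return [
        {**row, field: replacements[i % len(replacements)]}
        for i, row in enumerate(rows)
    ]


def deidentify(rows, seed=0):
    """Apply the three techniques in turn and return new rows."""
    rng = random.Random(seed)
    rows = null_field(rows, "ssn")
    rows = mix_field(rows, "city", rng)
    rows = substitute_field(rows, "name", FAKE_NAMES)
    return rows


test_rows = deidentify(customers)
```

Note that mixing on its own can leave some values sitting in their original rows, and with small tables the original pairing may be easy to guess, so it is usually combined with other methods rather than relied on alone.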

Whatever de-identification method you use, you need to make sure the results are appropriate for the context of the application being tested and make sense to the person reviewing the test results.
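
As one hedged illustration of keeping results sensible, the Python sketch below modifies numerical and date values by a bounded random amount, so an altered invoice total still looks like an invoice total and an altered date is still a valid date.  The function names (vary_amount, shift_date), the 10 percent and 30 day bounds, and the sample values are my own assumptions for the example, not recommendations for any particular application.

```python
import random
from datetime import date, timedelta


def vary_amount(amount, rng, pct=0.10):
    """Shift an amount by up to +/- pct, rounded to two decimal places."""
    factor = 1 + rng.uniform(-pct, pct)
    return round(amount * factor, 2)


def shift_date(original, rng, max_days=30):
    """Move a date by up to max_days in either direction."""
    return original + timedelta(days=rng.randint(-max_days, max_days))


rng = random.Random(42)  # fixed seed so test runs are repeatable
masked_total = vary_amount(250.00, rng)
masked_birth_date = shift_date(date(1970, 6, 15), rng)
```

Whether a bounded shift like this is enough depends on the data; a tester reviewing results needs values that pass the application's validation rules, while the privacy side needs the altered values to no longer point back to a real person.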

Because testing activities occur throughout the application lifecycle, organizations must consistently follow documented procedures to thoroughly test applications while at the same time staying in compliance with privacy-related laws, regulations and contracts.  And yes, de-identifying data will be challenging, but still achievable, when the application uses relational databases. 
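
Part of the relational challenge is that the same identifier often appears in several tables, and the masked copies still have to join correctly.  One possible approach, sketched below in Python, is to derive each replacement value from the original with a keyed hash (HMAC), so the same input always yields the same pseudonym in every table.  The table layouts, the key, and the helper names (pseudonym, mask_customers, mask_orders) are hypothetical, and this is a sketch of one technique rather than a complete solution.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; a real key should be stored
# securely and never checked into source control.
SECRET_KEY = b"test-env-only-key"


def pseudonym(value, key=SECRET_KEY, length=12):
    """Return a repeatable keyed-hash pseudonym for an identifier."""
    digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]


# Hypothetical parent and child tables sharing customer_id.
customers = [
    {"customer_id": "C1001", "name": "Alice Smith"},
    {"customer_id": "C1002", "name": "Bob Jones"},
]
orders = [
    {"order_id": 1, "customer_id": "C1001", "total": 19.99},
    {"order_id": 2, "customer_id": "C1002", "total": 5.00},
    {"order_id": 3, "customer_id": "C1001", "total": 42.50},
]


def mask_customers(rows):
    """Replace the key with its pseudonym and blank out the name."""
    return [
        {**row, "customer_id": pseudonym(row["customer_id"]), "name": "REDACTED"}
        for row in rows
    ]


def mask_orders(rows):
    """Replace the foreign key with the same pseudonym the parent row gets."""
    return [{**row, "customer_id": pseudonym(row["customer_id"])} for row in rows]


masked_customers = mask_customers(customers)
masked_orders = mask_orders(orders)
```

Because the pseudonym depends only on the original value and the key, each table can be masked separately and the foreign keys still line up.  Truncating the hash keeps the values short but raises the chance of collisions on very large tables, so the length is something to choose with your own data volumes in mind.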

However, there are many data de-identification solutions and vendors out there, just a few of which include:

I am not endorsing any of these, but provide them to give you an idea of the wide range of automated products available. 
