The PHI/PII Egg Hunt

Locate it to protect it

I love speaking with folks about privacy, information security and compliance. I am sincerely interested in hearing about their challenges and in identifying the challenges they have in common, so that we can then get to solutions.

One of the challenges I've heard most consistently from privacy and security folks over the past several months is creating an inventory of all the protected health information (PHI), personally identifiable information (PII) and other types of personal information within their organization, and then keeping it up to date. Organizations need such an inventory for a variety of compliance activities, and beyond that, how can you adequately safeguard PHI/PII if you don't even know where it is located? You can't effectively do so with such data blindness. You need to go on a hunt for those PHI/PII data eggs and inventory their hiding places. Yeah, I've got a bit of Easter egg hunting on my mind right now, so please pardon the somewhat eggstravagant analogy.

Inventory methods

I did my first PHI/PII inventory project in 2001. At that time I relied on key stakeholders filling out a questionnaire I had created for this purpose, followed by interviews wherever clarification was needed. I still use this method for some types of organizations: especially those with a lot of non-digital PHI/PII, those whose PHI/PII changes very little, and some non-profits that can't afford to run persistent automated tools. Today, however, the ideal approach (more comprehensive, more efficient and a huge time-saver) is an automated, often persistent, tool that creates the inventory and keeps it updated. Such tools are especially useful for finding PHI/PII stored in places where it should not be!
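To make the idea of an automated inventory a bit more concrete, here is a minimal Python sketch of the core loop such a tool runs: walk a directory tree, search each file for a PII pattern, and record where matches turn up. It assumes a hypothetical `build_inventory` function and looks only for U.S. Social Security numbers written as NNN-NN-NNNN; real DLP/DF products cover far more data types, file formats and locations (databases, email, endpoints) and are much smarter about detection.

```python
import os
import re
from collections import defaultdict

# Hypothetical, deliberately simple pattern: SSNs written as NNN-NN-NNNN.
# Real tools combine many patterns with validation and context checks.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def build_inventory(root):
    """Walk `root` and return {file_path: match_count} for files containing SSN-like strings."""
    inventory = defaultdict(int)
    # os.walk descends into hidden (dot-prefixed) directories too,
    # but does not follow symbolic links by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # errors="ignore" lets binary or oddly encoded files be skimmed without crashing.
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    text = handle.read()
            except OSError:
                # Unreadable files (permissions, locks) are skipped in this sketch;
                # a real tool would log them so the inventory has no silent gaps.
                continue
            count = len(SSN_PATTERN.findall(text))
            if count:
                inventory[path] += count
    return dict(inventory)
```

Calling `build_inventory("/shared/drive")` (a hypothetical path) returns a dictionary of file paths and match counts. Re-running it on a schedule and comparing the results is a crude stand-in for the persistent scanning that commercial tools perform.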

Most of the available tools are data leak protection/prevention (DLP) tools that can also create and maintain PHI/PII inventories. However, a few data finding (DF) tools are dedicated solely to creating and maintaining an inventory, without providing other DLP activities such as blocking emails from leaving the network. What an organization decides to use should ultimately depend upon its needs, its risk environment and its budget.

This week I spoke with a hospital about this topic. While I provided them with some ideas for specific inventorying tools, my purpose here is not to promote or recommend one specific tool over another. Some tools will work better and be the cat’s meow for some organizations, and others will be the bee’s knees at yet other organizations.

Things to look for

Here is a modified checklist I created for one of my hospital clients to use when considering a DLP/DF product. Use this to help you determine what will work best within your own organization.

Data discovery:

1. Can detect PHI (and other PII) in the following locations:

  • Network traffic
  • Data at rest (hard drives, USB drives, CDs, DVDs, tapes, etc.)
  • Endpoint operations (email and FTP with desktops, laptops, servers, smartphones, tablets, etc.)

2. Provides a wide variety of pre-formatted data items for you to choose from

3. Allows you to look for specific formats that you create

4. Can detect content in both structured and unstructured data, using registered or described data definitions

5. Can search for special characters, numerals, and a wide variety of languages

6. Can identify specific types of encrypted PHI and PII, such as Social Security numbers (SSNs), credit card numbers, etc.

7. Can detect PII, PHI and other types of specified sensitive content using as many of the following as possible:

  • Partial and exact document matching
  • Structured data matching
  • Statistical analysis
  • Extended expression matching
  • Heuristic analysis
  • Conceptual analysis
  • Lexicon analysis

8. Can scan mobile devices as soon as they connect to the network

9. Can scan within hidden folders and directories

10. Can find deleted items that still remain in storage

11. Can scan within networked fax servers, copiers, printers and scanners
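Custom formats (item 3) and expression-style matching (item 7) are where false positives tend to pile up. One common way to cut them down, shown here as a hedged Python sketch, is to pair a pattern with a validation check: candidate payment card numbers are found with a regular expression and then kept only if they pass the Luhn checksum, which genuine card numbers satisfy. The function names and the 13-to-19-digit pattern are illustrative assumptions, not any particular product's rules.

```python
import re

# Illustrative pattern: 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(text):
    """Return candidate card numbers (digits only) that also pass the Luhn check."""
    found = []
    for match in CARD_CANDIDATE.finditer(text):
        # Strip separators so the checksum sees only digits.
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            found.append(digits)
    return found
```

Validation only reduces false positives; about one in ten random digit strings of the right length will still pass the Luhn check by chance, which is why products layer on further context, such as nearby keywords.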

Tool management:

1. Has a centralized policy and event management console with many different features

2. Provides a variety of reporting options

3. Ideally, can block policy violations that occur via email. The ability to identify and prevent leaks through wireless access points (WAPs), FTP connections and other data paths is also valuable.

4. Effective and timely notification methods for when PHI/PII is identified

5. Compatibility with other products. If it can't search within certain critical databases or systems that contain PHI, its usefulness will be limited.

6. Ability to integrate with other security tools

7. Level of accuracy of identifying items

8. Speed

9. Degree of network disruption (e.g., degraded response times, excessive memory use)

10. A nice-to-have: the ability to remotely wipe specific data items once they are identified

Vendor considerations:

1. Well-vetted product

2. Organization has a good reputation, with no (or only a few minor) published incidents

3. Fair pricing

4. Market responsiveness and a good track record

5. Exceptional customer service

6. Warranty provided

7. Reviews available from other clients

Bottom line for organizations of all sizes…

You cannot protect personal information if you don’t know where it is located. All organizations that possess personal information of any kind (and can you think of any that don’t?) need to decide upon a way to identify and inventory that data so that they can effectively secure it. If you decide upon a technology solution to accomplish this, use the checklists above to help you make your decision.


This post was written as part of the IBM for Midsize Business program, which provides midsize businesses with the tools, expertise and solutions they need to become engines of a smarter planet. I've been compensated to contribute to this program, but the opinions expressed in this post are my own and don't necessarily represent IBM's positions, strategies or opinions.

