Much is written about the many different ways in which sensitive data is leaked…and yes, there certainly are MANY ways!
Something I noticed once more today while doing some online research was the incredibly large amount of personally identifiable information (PII) within the PDFs that turned up in my searches.
The PII was not visibly printed within the PDF document itself; however, when I searched within the document for some specific terms, some of which included names, the search results showed associated PII.
For example, when doing a search for “John Doe” (name obviously changed) within a PDF I found, the search results showed, for each instance of John Doe within the PDF, his home address, phone number, and email address. This was not visible when I was simply viewing the document, but it was visible within the search results, in the metadata.
I’ve seen this happen several times before…I’m sure there is a treasure-trove of PII floating around the Internet within PDFs.
Most people think converting a Word, PPT, or other document into a PDF removes all the metadata.
Au contraire, mon frère! (déjà vu George Carlin? 🙂 )
I certainly am not a PDF expert, but there are a few ways I know that metadata can get into PDFs.
1) If you attach a Word, PPT, Excel, or other document to a PDF file in its native format, the metadata will follow. Yes, you can attach files to a PDF document through Acrobat.
2) If you have the tracked changes visible when you convert a file to PDF, they carry over into the PDF. Yes, you would be able to see the changes clearly within the PDF, but I know *MANY* people who convert files to PDFs and never look over the resulting PDF document to make sure everything looks okay; they cheerfully send it on to others or post it online without realizing the PII is in the associated metadata.
3) If your print configuration in Word, or in other applications that track changes, is set to print ‘tracked changes’ along with the document, then the resulting PDF will include the tracked changes.
4) If you imported a file, or data items, such as items from your email address book, into your native file and then deleted the info you did not want within the viewable file, the deleted viewable portions may still remain in the metadata and become part of the resulting PDF.
And I know there are likely numerous other ways that metadata hides within PDFs.
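For anyone who wants a quick first look at a file, here is a minimal sketch in Python of one way to check for the attachment and metadata cases in the list above. It only searches the raw bytes of the PDF for a few standard PDF name tokens (/EmbeddedFiles, /Info, /Metadata). The function names and the file name in the usage note are my own illustrative choices, and this is a rough heuristic, not a real PDF parser: tokens tucked inside compressed object streams will be missed, so a clean result does not prove the file is clean.

```python
from typing import List

# Standard PDF name tokens and what their presence usually suggests.
# Assumption: a plain byte search is "good enough" for a first pass.
MARKERS = {
    b"/EmbeddedFiles": "embedded file attachments",
    b"/Info": "document information dictionary (author, title, etc.)",
    b"/Metadata": "XMP metadata stream",
}


def find_metadata_markers(data: bytes) -> List[str]:
    """Return descriptions of the marker tokens found in raw PDF bytes."""
    return [desc for token, desc in MARKERS.items() if token in data]


def scan_pdf(path: str) -> List[str]:
    """Read a PDF from disk and report which markers it contains."""
    with open(path, "rb") as fh:
        return find_metadata_markers(fh.read())
```

Calling scan_pdf("report.pdf") (a hypothetical file name) returns a list of descriptions, which you can treat as a prompt to open the file in a proper PDF tool and look closer.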
To re-emphasize: I’m far from being a PDF guru, and in fact know comparatively little beyond what I need to know to do my work, but I know enough to know that metadata can easily creep into your PDF documents unbeknownst to the folks doing the PDF conversions.
Those of you out there who ARE PDF gurus…please enlighten me and share the other ways in which metadata can sneak into PDF files! I love learning something new every day, and this would be a great day to learn more about PDF security. 🙂
Does your company post PDFs on your Internet sites? Do you know if they include any PII or other sensitive data? It would be a good exercise to go look at some of them.
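If you want to make that exercise a bit less manual, here is a rough Python sketch that walks a folder of PDFs and flags any whose raw bytes contain strings that look like email addresses or US-style phone numbers. The patterns, function names, and folder name are all my own assumptions for illustration. The patterns are deliberately simple, they will produce false positives, and they cannot see text inside compressed streams, so treat a hit as a reason to inspect the file and a miss as inconclusive.

```python
import re
from pathlib import Path

# Deliberately simple patterns (assumptions, not a complete PII definition).
EMAIL_RE = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(rb"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")  # e.g. 555-123-4567


def find_pii_like_strings(data: bytes) -> dict:
    """Return unique email-like and phone-like strings found in raw bytes."""
    return {
        "emails": sorted({m.decode("ascii") for m in EMAIL_RE.findall(data)}),
        "phones": sorted({m.decode("ascii") for m in PHONE_RE.findall(data)}),
    }


def scan_folder(folder: str) -> dict:
    """Map each PDF file name in the folder to its hits, skipping clean files."""
    results = {}
    for path in sorted(Path(folder).glob("*.pdf")):
        hits = find_pii_like_strings(path.read_bytes())
        if hits["emails"] or hits["phones"]:
            results[path.name] = hits
    return results
```

Running scan_folder("public_pdfs") (a hypothetical folder of files downloaded from your own site) gives you a short list of documents worth opening and checking by hand.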
A site that seems to have pretty good information about PDFs and may be useful for you is Planet PDF.