Implications Of The CMU SSN Study: What Business Leaders Need To Understand

Following the release of the CMU SSN report on Monday, I’ve had some very interesting discussions with privacy and information security folks, and I’ve been pretty amazed at some of the reactions to the study.
I also posted about this to one of the GRC mailing lists I participate in, and I got some questions asking me for my thoughts about some specific issues. I wanted to share those thoughts here as well…


Yes, many of us in the information security and privacy professions have known for a very long time that SSNs could probably be guessed because of the way they are constructed from such simple items as city and birthdate. However, over the years, when I discussed this topic with executives, the question always arose, “Well, how easy is it really?” I never could provide an answer supported by actual research. The CMU study provides this answer, with successful SSN discovery percentages being comparatively low in densely populated areas but comparatively high in sparsely populated areas, and also depending upon the year the SSN was issued…later years are most vulnerable to discovery. It also shows that it is relatively easy to run a computer program to take a partial SSN (such as is often found in customer and employee IDs) and then determine the full SSN.
I agree with many of the opinions I’ve heard; the CMU report itself, which necessarily meets rigorous academic standards for documentation, is not an easy read for the general public, or even for business leaders who must consider and make decisions about authentication, identity validation, and other security controls involving various types of personally identifiable information (PII). But it truly does point out the ease with which valid, whole SSNs can be determined when knowing just geography and birth date, and even more easily when knowing a portion of the SSN. This should prompt businesses to look critically at how they truly use SSNs.
As a very simplified bit of background, Social Security numbers (SSNs) have historically been, and are currently, created using an algorithm based largely upon geographical information and birthdate. Until the Internet came into wide use, and before large amounts of personal information were posted wholesale to various websites, primarily social networking and other “Web 2.0” sites, the data items used to generate SSNs were not easily found. To determine a person’s SSN, someone would need to go, often physically, to different locations to gather the different items, or remember to collect the items as they happened upon them over time.
As I explain a bit more in my blog posting, now it takes a matter of seconds to find these information items online for most folks, and by using computers to apply a comparatively simple formula, valid SSNs can be discovered.
Now consider that, ever since US organizations first needed customer IDs and employee IDs to quickly, easily, and uniquely identify individuals, SSNs were chosen since they 1) already existed for virtually everyone, and 2) were readily and already available to businesses. When email systems and highly distributed computer systems started being used within businesses, throughout the 1980s, companies realized they shouldn’t be using the full SSNs for their employee IDs, so many started using a large portion of the SSN along with some characters, often the individual’s initials, to make up the employee ID. Customer IDs largely continued to be the full SSNs, as defined within businesses, up until the early years following 2000, when laws (such as CA SB 168, see http://www.privacyguidance.com/downloads/DoesCALawSB168ApplytoYourBusiness.pdf for more info on this) started prohibiting/limiting the use of SSNs for customer IDs and account access.
Let’s now consider that:

  1. Large portions of SSNs (e.g., the last 6 digits or first 5 digits), and in some organizations the full SSNs, are still widely used as identifiers
  2. Many to most employees have access to see these IDs
  3. Many to most employees are using SSNs, birth city, and birth date as identity verification items, not only online but at the businesses where they have accounts
  4. The number of documented and identified identity fraud and identity theft cases continues to grow
  5. Most of the known identity fraud and theft cases are unsolved; the crooks who committed the crimes have not been found, and the crimes come to light only when individuals see fraudulent activities involving their names, credit cards, bank accounts, etc.
  6. Large numbers of identity fraud cases are not known until many months, and too often many years, after the fraudulent activities have started
  7. Growing numbers of insiders are doing bad things with the information to which they have access, but very few are actually caught
  8. The items used to create SSNs are widely and easily found online for many/most individuals

To address some of the questions I got…
1. So what is the likelihood that if you know an individual’s date and state of birth (their required starting points) you can determine that person’s SSN? More to the point, how many attempts are needed to get it right (by way of logon attempts, calls to a help desk, etc.)?
The CMU report shows that, with a fairly simple formula, probably available on multiple Internet sites, valid SSNs can be determined. The likelihood varies greatly depending largely upon a) the density of population for the area and b) the year in which the SSN was assigned. By knowing the SSN, and also knowing the geographic area and birthdate used to generate it, and doing a bit of reverse engineering, information can be found about the individual to whom the SSN likely applies. This does not require attempted logins or calls to the help desk. However, it is important to consider help desks, along with call centers, customer service areas, and any other areas within businesses that have direct contact with customers and consumers. Simple social engineering tactics can often be used to get to sensitive information simply by supplying a valid SSN.
I know many of the businesses I’ve called, when just doing unscientific tests, started out the call with, “May I have your account number please?” To which I say, “Oh, darn; I don’t have that with me! Could I give you my SSN instead?” And usually they say, “Sure; give me the SSN.” Bingo. Social engineering is powerful and used by many crooks.
2. In practice, what is the likelihood that the number of attempts would exceed that allowed by the authentication process – with account suspension and other appropriate actions being taken?
While this is a consideration, and part of the automated application controls, the ability to generate valid SSNs more specifically highlights the concerns of using SSNs as identifiers and, perhaps more importantly, using them, sometimes with birthdate and birth city, as identity verification.
If I have your SSN, birth city and birth date/year, I could go to the businesses where I think you may do business, choose the “forgot password” feature online, and for many of them if I provide any combination of these three items, or even just the SSN, I could get to your information. Or, even more likely, I could call up the businesses where I think you do business, tell them I’m you, give your SSN (and, where required, more information such as birth date and/or birth city), and get into your account to withdraw money or take other nefarious actions. Coupling online ID/password resets with required email response activity helps to mitigate these risks.
Also, and probably more importantly, by having what I know are valid SSNs, I could open up all sorts of accounts and commit all sorts of fraud and other crimes, really messing up your credit history along with causing you all sorts of other headaches, for a long time before you are even aware that I’m doing bad things. A “breach” technically did not occur, under current laws, if I obtained your SSN by using an automated algorithm, so there would not be any notification from any company that someone had your SSN.
3. How does the number of attempts to access information (or other action) using the SSN compare to the number of attempts to guess a 6-digit password or pin?
This is where the study provides a very revealing statistic: overall, an entire SSN can be discovered in fewer than 1,000 attempts, comparable to a 3-digit PIN, which is generally accepted by information security experts and scholars as being highly insecure.
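To put that comparison in numbers, here is a rough back-of-the-envelope sketch in Python. The 1,000-attempt figure is the one cited above; the function name, the variable names, and the assumption that every guess is equally likely are mine, purely for illustration.

```python
import math

def equivalent_pin_digits(attempts: int) -> float:
    """Length of a random numeric PIN whose full search space equals `attempts`."""
    return math.log10(attempts)

# Worst-case guesses needed to exhaust each search space.
three_digit_pin = 10 ** 3        # 1,000
six_digit_pin = 10 ** 6          # 1,000,000
naive_nine_digit_ssn = 10 ** 9   # if all nine digits were truly unpredictable

# Figure cited from the study above: under 1,000 attempts overall.
ssn_attempts_reported = 1_000

# How long a PIN is the SSN "worth" at that attempt count?
print(equivalent_pin_digits(ssn_attempts_reported))

# How many times more guesses a 6-digit PIN takes than the SSN.
print(six_digit_pin // ssn_attempts_reported)         # 1000

# How much of the naive 9-digit search space the attacker gets to skip.
print(naive_nine_digit_ssn // ssn_attempts_reported)  # 1000000
```

In other words, under these simplified assumptions a 6-digit PIN is roughly a thousand times harder to guess than an SSN, even though the SSN has three more digits.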
4. How much additional residual risk is being taken (i.e., given the likelihood of multiple attempts raising red flags) if an enterprise relies on the SSN?
This is a key question. Now that it has been proven (even though it has been assumed for years) that identifying valid SSNs simply takes running an algorithm using geographic locations and general birth dates, using SSNs to securely verify identity becomes demonstrably much less dependable than previously thought by business leaders.
5. How much additional residual risk is being taken if the SSN is only one of two authentication requirements?
It depends upon the other authentication item and how widely known that item is, or could be. However, you have a good point; if you use the SSN *plus* another item to verify identity, then you’ve made it much harder for a crook to get into your account, as long as that other item is not geographic location or birth date (both among the items in the algorithm used to generate an SSN). However, this still does not address the issue that getting into accounts is not even necessary now, when valid SSNs can be used to open accounts and commit any number of crimes.
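The “as long as” caveat can be illustrated with a small sketch. The function name and the 4-digit code are hypothetical, chosen only to show the arithmetic: a second item multiplies the crook’s work only if the crook does not already know it.

```python
def combined_guesses(ssn_guesses: int, second_item_guesses: int) -> int:
    """Worst-case guesses when both items must be right at once
    and the two items are independent of each other."""
    return ssn_guesses * second_item_guesses

SSN_GUESSES = 1_000  # rough figure cited earlier in this post

# Hypothetical second item the crook cannot look up,
# e.g. a random 4-digit code.
independent_item = 10 ** 4

# Birth date or birth city: already in the crook's hands (it was an
# input to the SSN guess), so it takes one "guess" and adds nothing.
already_known_item = 1

print(combined_guesses(SSN_GUESSES, independent_item))    # 10000000
print(combined_guesses(SSN_GUESSES, already_known_item))  # 1000
```

The second line of output is the point: pairing the SSN with birth date or birth city leaves the crook’s effort unchanged.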
It is a good thing that the Social Security Administration (SSA) is changing the algorithm for generating SSNs in 2010. However, that will still leave all the SSNs generated before 2010 at risk of being comparatively easily discovered and inappropriately used, regardless of the controls that exist to protect access to the SSN itself within millions of businesses. Think about the millions of SSNs that will continue to be at this risk for many decades to come.
What I encourage you all to think about is…

  • How does your organization use SSNs?
  • Are they used, wholly or partially, as identifiers?
  • Are they used to verify identities in your customer service and other types of call centers?
  • Are they used to verify identities in your online applications?
  • Are they used to verify identities in person?
  • Are they required to open accounts?
  • Do all your personnel have access to IDs that are made with whole or partial SSNs?
  • Do your business partners and contracted workers have access to IDs that are made with whole or partial SSNs?
  • How do the organizations where you do business use SSNs?
  • Do they ask you for your SSN to do business with them?
  • Do they ask you for your SSN when you call them?
  • Do they use your SSN as some, or all, of your account number?
  • Do they require your SSN to access your account, or change your password, online?
  • Who sees the SSN (partial or whole) you provide to them? Electronically or on hard copy?

People have been concerned for years about SSNs, the many risks that come with how widely they are used, and how much more widely they continue to be used. As publication of this CMU study gains momentum, the public, your customers and your employees are going to become concerned, and will likely ask how your organization is using SSNs. It is always good to be prepared with an answer.
