EPIC's Mission
Focusing public attention on emerging privacy and civil liberties issues

Re-identification

Concerning the Re-Identification of Consumer Information

Background

Introduction

Re-identification is the process by which anonymized personal data is matched with its true owner. In order to protect the privacy interests of consumers, personal identifiers, such as name and social security number, are often removed from databases containing sensitive information. This anonymized, or de-identified, data is intended to safeguard the privacy of consumers while still making useful information available to marketers or datamining companies. Recently, however, computer scientists have shown that such anonymized data can often be re-identified easily, so that the sensitive information may be linked back to an individual. The re-identification process implicates privacy rights, because consumers are not aware of how their personal information is used and have little to no control over the disclosure and use of this information.

Prescriber-Identifiable Data

Datamining companies often purchase and collect doctors’, dentists’, and nurse practitioners’ prescription information, or prescriber-identifiable data, without their knowledge or consent, in order to sell it to research institutions, private companies, or pharmaceutical companies, which are by far their best clients. In a process called “detailing,” pharmaceutical sales representatives analyze this data to identify the trends or habits of prescribers, allowing them to tailor their marketing of prescription drugs to individual prescribers. Pharmaceutical sales representatives only detail brand-name drugs, which are more expensive, although not necessarily more effective, than their generic counterparts. In an effort to curb prescription drug costs and to protect the privacy of their citizens’ prescription information, several states have enacted, or considered enacting, statutes that limit pharmaceutical datamining activities and give prescribers and consumers more control over their prescription information. Datamining companies, such as IMS Health and Verispan, LLC, have aggressively opposed these laws, arguing, among other things, that the prescription data is de-identified, and thus the privacy interests of consumers are not implicated in the detailing process.

Credit Card Data

EPIC's Interest

Law Concerning Anonymized Data

Currently, there is no federal law that provides a general right of information privacy. However, there are federal statutes that do offer consumers protection for particular types of information.

Fair Credit Reporting Act

As early as 1899, creditors, collection agencies, and employers started to furnish information to consumer reporting agencies (CRAs), which then collected and disclosed the credit information of consumers. By the 1960s, significant controversy surrounded the CRAs because their reports were sometimes used to deny services and opportunities, and individuals had no right to see what was in their files. There was abuse in the industry, including requirements that investigators fill quotas of negative information on data subjects. To meet these quotas, some investigators fabricated negative information, while others included incomplete information.

The Fair Credit Reporting Act (FCRA) was passed in 1970 to regulate the growing credit reporting industry in the United States, which compiled "consumer credit reports" and "investigative consumer reports" on individuals. The FCRA was the first federal law to regulate the use of personal information by private businesses. Because credit reports can include sensitive personal information, and because they are used to evaluate eligibility for so many different activities in modern life, they are subject to regulations that follow a framework of Fair Information Practices.

The FCRA limits the information included in consumer credit reports and investigative consumer reports, as well as the use and disclosure of such reports. Target marketing is not allowed, and consumer consent is required before medical information is released. Further, consumers have the right to request their credit file by contacting a CRA directly. For more information, see EPIC’s page on the Fair Credit Reporting Act and the Privacy of Your Credit Report.

Gramm-Leach-Bliley Act

The Gramm-Leach-Bliley Act (GLBA), which is also known as the Financial Services Modernization Act of 1999, provides limited privacy protections against the sale of your private financial information. Additionally, the GLBA codifies protections against pretexting, the practice of obtaining personal information through false pretenses.

The GLBA primarily sought to "modernize" financial services, that is, to end regulations that prevented the merger of banks, stock brokerage companies, and insurance companies. The removal of these regulations, however, raised significant risks that these new financial institutions would have access to an enormous amount of personal information, with no restrictions upon its use. Prior to the GLBA, the insurance company that maintained your health records was distinct from the bank that mortgaged your house and the stockbroker that traded your stocks. Once these companies merged, however, they would have the ability to consolidate, analyze, and sell the personal details of their customers' lives. Because of these risks, the GLBA included three simple requirements to protect the personal data of individuals: First, banks, brokerage companies, and insurance companies must securely store personal financial information. Second, they must advise you of their policies on sharing personal financial information. Third, they must give consumers the option to opt out of some sharing of personal financial information.

Health Insurance Portability and Accountability Act

The HIPAA Privacy Rule (45 CFR Parts 160 and 164) provides the "federal floor" of privacy protection for health information in the United States, while allowing more protective ("stringent") state laws to continue in force. Under the Privacy Rule, protected health information (PHI) is defined very broadly. PHI includes individually identifiable health information related to the past, present or future physical or mental health or condition, the provision of health care to an individual, or the past, present, or future payment for the provision of health care to an individual. Even the fact that an individual received medical care is protected information under the regulation.

The Privacy Rule establishes a federal mandate for individual rights in health information, imposes restrictions on uses and disclosures of individually identifiable health information, and provides for civil and criminal penalties for violations. The complementary Security Rule includes standards for protection of health information in electronic form.

Children’s Online Privacy Protection Act

The Children's Online Privacy Protection Act ("COPPA") specifically protects the privacy of children under the age of 13 by requiring parental consent for the collection or use of their personal information. The Act took effect in April 2000. It was passed in response to a growing awareness of Internet marketing techniques that targeted children and collected their personal information from websites without any parental notification. The Act applies to commercial websites and online services that are directed at children. The main requirements that a website operator must comply with include:
• Incorporation of a detailed privacy policy that describes the information collected from its users.
• Acquisition of verifiable parental consent prior to collection of personal information from a child under the age of 13.
• Disclosure to parents of any information collected on their children by the website.
• A right to revoke consent and have information deleted.
• Limited collection of personal information when a child participates in online games and contests.
• A general requirement to protect the confidentiality, security, and integrity of any personal information that is collected online from children.

De-identified Data and Free Speech

The Process of Re-Identification

The Netflix Study

In 2006, Netflix released data on how 500,000 of its users rated movies over a six-year period. Netflix “anonymized” the data before releasing it by removing usernames. However, it assigned each user a unique identification number so that individual users' ratings and trends could still be tracked over time. Researchers used this information to uniquely identify individual Netflix users. According to the study, a person who knows when and how a user rated six movies can identify 99% of the people in the Netflix database.
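
The matching idea can be sketched in a few lines. This is not the researchers' algorithm, which was more sophisticated; it is a simplified illustration, assuming a hypothetical release (`RELEASED`) in which usernames are replaced by numeric IDs and an attacker who knows a few of a person's ratings with approximate dates. The names `agrees` and `candidates` and the 3-day tolerance are illustrative choices.

```python
from datetime import date

# Hypothetical pseudonymized release: user ID -> {movie: (stars, date rated)}.
# Usernames are gone, but each ID still ties one user's ratings together.
RELEASED = {
    101: {"Movie A": (5, date(2005, 3, 1)), "Movie B": (2, date(2005, 3, 4)),
          "Movie C": (4, date(2005, 6, 10))},
    102: {"Movie A": (5, date(2005, 3, 2)), "Movie D": (3, date(2005, 1, 20))},
    103: {"Movie B": (2, date(2005, 3, 5)), "Movie C": (4, date(2005, 6, 9))},
}


def agrees(record, movie, stars, when, day_tolerance):
    """True if the record rates this movie with the same stars on a nearby date."""
    if movie not in record:
        return False
    r_stars, r_when = record[movie]
    return r_stars == stars and abs((r_when - when).days) <= day_tolerance


def candidates(known, released, day_tolerance=3):
    """IDs whose ratings agree with every (movie, stars, approximate date) known."""
    return [
        uid
        for uid, record in released.items()
        if all(agrees(record, m, s, d, day_tolerance) for m, (s, d) in known.items())
    ]


# Outside knowledge, e.g. ratings a person mentioned publicly (hypothetical).
print(candidates({"Movie A": (5, date(2005, 3, 2))}, RELEASED))  # [101, 102]
print(candidates({"Movie A": (5, date(2005, 3, 2)),
                  "Movie B": (2, date(2005, 3, 3))}, RELEASED))  # [101]
```

Each additional known rating shrinks the candidate set; once only one ID remains, every other rating under that ID is attributed to the same person.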

AOL Data Release

In 2006, as part of its AOL Research initiative, AOL posted 20 million search queries from 650,000 of its users over a three-month span. AOL attempted to de-identify the data before releasing it by removing IP addresses and usernames to protect the privacy of AOL users. However, because AOL wanted the data to remain useful for research, it replaced the usernames and IP addresses with identification numbers, so that all of a user’s searches would still be connected to a single identifier. Because the queries were still grouped by these unique identification numbers, researchers could link search queries with the individuals who conducted the searches. For example, many users made search queries that identified their city, or even neighborhood, their first and/or last name, and their age demographic. With such information, researchers were able to narrow down the population to the one individual responsible for the searches. In the aftermath of this data release, the researcher responsible for releasing the data was dismissed and the Chief Technology Officer resigned. Still, one danger of releasing such re-identifiable personal information remains: the potential for future breaches is much higher.
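
A minimal sketch of why the identification numbers mattered, assuming a hypothetical log (`LOG`) of (ID, query) pairs. Grouping queries by ID builds a profile for each user, and clues scattered across separate queries (a city, a surname, an age group) can then be combined. The function names and data are illustrative, not AOL's format.

```python
from collections import defaultdict

# Hypothetical query log: usernames and IP addresses replaced by numeric IDs.
LOG = [
    (4001, "weather springfield"),
    (4001, "smith family reunion"),
    (4001, "senior discounts springfield"),
    (4002, "cheap flights"),
    (4002, "weather springfield"),
]


def profiles(log):
    """Group every query under its pseudonymous ID, as the release allowed."""
    by_user = defaultdict(list)
    for user_id, query in log:
        by_user[user_id].append(query)
    return dict(by_user)


def matching_users(log, clues):
    """IDs whose combined queries mention every clue (city, surname, age group)."""
    return sorted(
        user_id
        for user_id, queries in profiles(log).items()
        if all(any(clue in q for q in queries) for clue in clues)
    )


print(matching_users(LOG, ["springfield"]))                     # [4001, 4002]
print(matching_users(LOG, ["springfield", "smith", "senior"]))  # [4001]
```

No single query identifies user 4001; it is the combination, made possible by the shared ID, that narrows the population to one profile.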

Unique Identification Through Zip Code, Sex, Birthdate

Latanya Sweeney, a computer science professor, conducted a study using 1990 census data and found that zip code, birth date, and sex could be combined to uniquely identify 87% of the United States population. To illustrate this threat, Sweeney obtained data from the Group Insurance Commission (GIC), a Massachusetts government agency, and used it to identify the medical records of the Massachusetts governor. GIC, which purchases health insurance for state employees, released records of those employees to researchers. With the support of Governor Weld of Massachusetts, GIC removed names, addresses, social security numbers, and other identifying information in order to protect the privacy of these employees. Governor Weld assured Massachusetts residents that the released information would remain private.
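
The idea of "uniquely identifying" a share of a population can be made concrete by counting how many records have a combination of attributes that no other record shares. The sketch below computes that share on five made-up records (`RECORDS`); it illustrates the kind of measurement involved and is not a reproduction of Sweeney's census analysis. The function name `unique_fraction` is hypothetical.

```python
from collections import Counter

# Hypothetical records: (zip code, birth date, sex).
RECORDS = [
    ("02138", "1950-05-05", "M"),
    ("02138", "1950-05-05", "F"),
    ("02139", "1960-01-15", "F"),
    ("02139", "1960-01-15", "F"),
    ("02140", "1972-11-02", "M"),
]


def unique_fraction(records, fields):
    """Share of records whose values in `fields` appear exactly once in the data."""
    keys = [tuple(r[i] for i in fields) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)


print(unique_fraction(RECORDS, (0, 1, 2)))  # zip + birth date + sex -> 0.6
print(unique_fraction(RECORDS, (0, 1)))     # zip + birth date only  -> 0.2
```

Adding a single attribute such as sex splits groups that would otherwise look identical, which is why a few ordinary fields together can single out most people.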

Sweeney purchased voter rolls, which included the name, address, zip code, sex, and birth date of voters in Cambridge, where Governor Weld resided, combined this information with GIC’s data, and easily found the governor. Only six people in Cambridge were born on the same day as the governor, half of them were men, and the governor was the only one of those men who lived in his zip code. The information in the GIC database on the governor included his prescriptions and diagnoses.
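
The narrowing described above (six people, then three, then one) is a sequence of filters on a public list. A minimal sketch, assuming a hypothetical voter list (`VOTERS`) constructed to reproduce those counts; all names, dates, and zip codes are invented.

```python
# Hypothetical voter list: (name, zip code, birth date, sex).
VOTERS = [
    ("Voter 1", "02138", "1950-05-05", "M"),
    ("Voter 2", "02139", "1950-05-05", "M"),
    ("Voter 3", "02140", "1950-05-05", "M"),
    ("Voter 4", "02138", "1950-05-05", "F"),
    ("Voter 5", "02139", "1950-05-05", "F"),
    ("Voter 6", "02141", "1950-05-05", "F"),
    ("Voter 7", "02138", "1962-09-12", "M"),
]


def narrow(voters, zip_code, birth_date, sex):
    """Apply birth date, sex, then zip code filters; record the pool size after each."""
    steps = []
    pool = [v for v in voters if v[2] == birth_date]
    steps.append(len(pool))
    pool = [v for v in pool if v[3] == sex]
    steps.append(len(pool))
    pool = [v for v in pool if v[1] == zip_code]
    steps.append(len(pool))
    return steps, [v[0] for v in pool]


# Attributes taken from a "de-identified" record (hypothetical values).
print(narrow(VOTERS, "02138", "1950-05-05", "M"))  # ([6, 3, 1], ['Voter 1'])
```

When the final pool has exactly one name, that name can be attached to the de-identified record carrying the same three attributes.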

How Data is Re-identified

In each of the above cases, data was re-identified by combining two datasets with different types of information about an individual. One of the datasets contained anonymized information; the other contained outside information (generally available to the public) collected on a daily or routine basis, such as voter registration information. By combining information from each of these datasets, researchers can uniquely identify individuals in the population. While companies tend to focus on removing personally identifiable information (PII), the studies above show that re-identification can occur even by combining non-PII, such as movie ratings in the Netflix study or search engine queries in the AOL example.
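
The general pattern, joining an anonymized dataset to a public one on the attributes they share, can be sketched as follows. The datasets (`ANONYMIZED`, `PUBLIC`), the field names, and the `link` function are all hypothetical; the shared keys could equally be movie ratings or query terms rather than demographics.

```python
from collections import defaultdict

# Hypothetical anonymized records: names removed, other attributes kept.
ANONYMIZED = [
    {"zip": "02138", "dob": "1950-05-05", "sex": "M", "diagnosis": "D1"},
    {"zip": "02139", "dob": "1960-01-15", "sex": "F", "diagnosis": "D2"},
]
# Hypothetical public records that include names.
PUBLIC = [
    {"name": "Voter 1", "zip": "02138", "dob": "1950-05-05", "sex": "M"},
    {"name": "Voter 8", "zip": "02139", "dob": "1960-01-15", "sex": "F"},
    {"name": "Voter 9", "zip": "02139", "dob": "1960-01-15", "sex": "F"},
]
SHARED_KEYS = ("zip", "dob", "sex")


def link(anonymized, public, keys=SHARED_KEYS):
    """Attach a name to each anonymized record whose shared attributes match one person."""
    index = defaultdict(list)
    for person in public:
        index[tuple(person[k] for k in keys)].append(person["name"])
    linked = []
    for record in anonymized:
        names = index.get(tuple(record[k] for k in keys), [])
        if len(names) == 1:  # a unique match re-identifies the record
            linked.append((names[0], record["diagnosis"]))
    return linked


print(link(ANONYMIZED, PUBLIC))  # [('Voter 1', 'D1')]
```

The second anonymized record matches two public entries and stays ambiguous; re-identification succeeds only where the combination of shared attributes is unique.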