Search Engine Privacy
Introduction
Internet search engines are the primary means by which individuals access Internet content. Internet users submit more than 15 billion searches per month. Typically, search engines collect detailed information that is personally identifiable or can be made personally identifiable. This information includes the search terms submitted to the search engine, as well as the time, date, and location of the computer submitting the search. This data is collected for marketing and consumer profiling purposes. Companies also use search engine data to carry out research and compile usage statistics. Search engines also link individuals' names and other personal information with websites and news stories that may be inaccurate, misleading, or harassing.
Search data is one of the most sensitive types of personal information, and its collection and use by Internet firms poses significant consumer privacy risks. As a result of behavioral marketing methods and the potential exposure of sensitive personal information, privacy groups have called for greater protections for search data. Specifically, privacy advocates have called for strict limitations on the collection, retention, and disclosure of information relating to Internet Protocol (IP) addresses. IP addresses are one of the main methods of identifying Internet users. Other methods include browser fingerprinting, tracking cookies, and search query analysis (particularly with regard to vanity searches). Most users are unaware that search engines collect their personally identifiable data. The majority of users polled in 2015 think that online advertisers should not have any information about their online activities.
Top News
- UK Government Releases Statement of Intent Describing New Data Protection Bill: The UK has released a statement of intent describing a forthcoming bill that would make major revisions to the country's data protection law. The new rules would follow the EU's General Data Protection Regulation by strengthening rules for obtaining consent, making it easier for consumers to withdraw consent, and improving consumers' ability to access, move, and remove data about themselves. The bill would also expand the definition of "personal data" to include DNA and IP addresses and would make it a crime to re-identify individuals from anonymized data. EPIC supported the GDPR and the right to be forgotten, has explained that IP addresses are personal data, and has warned of the risks of improperly "de-identified" data. EPIC recently filed a complaint asking the FTC to investigate Google's use of a proprietary, secret algorithm Google claims can "de-identify" consumers while tracking their purchases. (Aug. 10, 2017)
- Top EU Legal Advisor Says IP Addresses are PII: The Advocate General, top advisor to the European Court of Justice, has issued an opinion today about Internet anonymity. He found that dynamic IP addresses are personal data subject to data protection law. The opinion concerns the case of German Pirate Party politician and privacy activist Patrick Breyer, who is suing the German government over logging visits to government websites. "Generation Internet has a right to access information on-line just as unmonitored and without inhibition as our parents read the paper," says Breyer. The opinion is not legally binding but "is usually a good indication of how the court will eventually rule". EPIC has supported Internet anonymity since the 1990s and brought a similar challenge to the US government's tracking of users of government websites. (May 12, 2016)
Background
IP and MAC Addresses
An Internet Protocol ("IP") address is a numerical identifier that is used by a computer to send and receive data on a network. An IP address is to a computer what a telephone number is to a telephone: it serves as the “housing address” of a networked device. Most modern networks use the TCP/IP protocol suite to communicate, and two different standards for IP addresses are now in use. All computers that connect to IP networks have an assigned IPv4 address, which is a 32-bit address expressed as four numbers separated by dots (e.g. 192.168.1.1). Many modern devices now also use IPv6 addresses, which are 128-bit identifiers expressed as eight groups of hexadecimal numbers separated by colons (though consecutive groups consisting entirely of zeroes are often omitted to save space).
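As a minimal sketch of the two address formats described above, the following uses Python's standard `ipaddress` module. The IPv4 sample comes from the text; the IPv6 sample is an illustrative address taken from the range reserved for documentation, not one that appears in the source.

```python
import ipaddress

# IPv4: a 32-bit address written as four dot-separated decimal numbers.
v4 = ipaddress.ip_address("192.168.1.1")

# IPv6: a 128-bit address written as eight colon-separated hexadecimal groups.
# Illustrative address from the 2001:db8::/32 documentation range.
v6 = ipaddress.ip_address("2001:0db8:0000:0000:0000:0000:0000:0001")

v4_bits = v4.max_prefixlen   # address length in bits for IPv4
v6_bits = v6.max_prefixlen   # address length in bits for IPv6
v6_short = v6.compressed     # the run of all-zero groups collapses to "::"
v6_full = v6.exploded        # all eight groups written out in full

print(v4, v4_bits)
print(v6_short, v6_bits)
print(v6_full)
```

The compressed form shows the space-saving convention mentioned above: the six all-zero groups disappear from the written address, although all 128 bits are still there.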
Due to the limited size of the IPv4 address space (4,294,967,296 total numbers) and to avoid confusion, the Internet Assigned Numbers Authority (IANA) has reserved three "blocks" for use by private networks (the 10/8, 172.16/12, and 192.168/16 prefixes). These private addresses are commonly assigned to computers on local networks for homes, businesses, or educational institutions. As a result, "public" IP addresses can be shared by multiple computers. A single computer can also be assigned multiple IP addresses if it has multiple network interfaces (e.g. wireless and wired). The IPv6 address space, by contrast, is much larger (3.4 × 10^38 addresses) and each device can be uniquely identified. In addition to the IP address, each device with a network connection has a unique media access control (MAC) address for each "distinct point of attachment" (network card or interface). Marketing agencies rely on usernames, IP addresses, and other digital identifiers to track users across the web and to deliver targeted ads.
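The figures and prefixes in the paragraph above can be checked with a short sketch, again using Python's standard `ipaddress` module. The helper name `in_private_block` and the sample addresses are assumptions made for illustration.

```python
import ipaddress

# The three private-use IPv4 blocks named in the text.
PRIVATE_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def in_private_block(address: str) -> bool:
    """Return True if the address lies in one of the three private blocks."""
    ip = ipaddress.ip_address(address)
    return any(ip in block for block in PRIVATE_BLOCKS)

ipv4_total = 2 ** 32    # size of the 32-bit IPv4 address space
ipv6_total = 2 ** 128   # size of the 128-bit IPv6 address space

print(in_private_block("192.168.1.1"))
print(in_private_block("8.8.8.8"))
print(f"{ipv4_total:,}")
print(f"{ipv6_total:.1e}")
```

Because many households and offices reuse the same private blocks behind one public address, a public IPv4 address may point to a whole local network rather than to a single machine.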
Behavioral Marketing
The emergence of targeted Internet advertising has led to "behavioral marketing." In the course of recording users' viewing habits and monitoring their search terms, companies collect information about user interests and tastes, including the things they buy, the stories they read, and the websites they visit, in addition to very sensitive personal information. Search terms entered into search engines may reveal a plethora of personal information such as an individual's medical issues, religious beliefs, political preferences, sexual orientation, and investments. The expansion of the behavioral marketing industry, as well as its ability and incentive to monitor online search behavior, has produced significant privacy problems and substantial risks to Internet users. Because industry practices are opaque, consumers remain largely unaware of how their online behavior is monitored, how securely this information is stored, and the extent to which it is kept confidential. In the absence of strong privacy principles, industry practices also prevent users from exercising any meaningful control over the personal data collected about them.
Right to Be Forgotten
In 2014, the European Court of Justice ruled that European citizens have a limited right to have websites deindexed from the results of searches for their own names. A website is subject to removal if it contains information that is “inadequate, irrelevant or excessive in relation to” the information’s original purpose. In so ruling, the Court concluded that the fundamental right to privacy outweighs the economic interest of the commercial firm and, in some circumstances, the public interest in access to information.
Regulation of Search Engines
Public Disclosure of Search Engine Data by US Service Providers
In 2006, America Online (AOL) published three months of search records for 658,000 Americans. AOL attempted to "anonymize" the records, and intended for academics and technologists to use the data for research purposes. The records did not link searches to IP addresses or user names, but did group searches by individual users via randomly-assigned numerical IDs. Subsequent events demonstrated that storing numerical IDs in place of usernames or IP addresses does not necessarily prevent search data from being linked back to individuals. Though the search logs released by AOL had been "anonymized," identifying each user by only a number, quick research by New York Times reporters matched some user numbers with the correct individuals. Other sources identified sensitive and occasionally disturbing personal information in the AOL search data, including user searches for "how to kill your wife," "anti psychotic drugs," and "aftermath of incest." In response, several privacy groups filed complaints with the Federal Trade Commission.
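The weakness described above can be illustrated with a small sketch. It is not AOL's actual procedure, which the text does not detail; it only assumes that each username is swapped for a random number that is reused for every search by that user. The function names, usernames, and queries are all invented for illustration.

```python
import secrets
from collections import defaultdict

def pseudonymize(log):
    """Swap each username for a random numeric ID, reusing the ID for that user."""
    ids = {}
    released = []
    for username, query in log:
        if username not in ids:
            candidate = secrets.randbelow(10**7)
            while candidate in ids.values():  # avoid giving two users the same ID
                candidate = secrets.randbelow(10**7)
            ids[username] = candidate
        released.append((ids[username], query))
    return released

def group_by_id(released):
    """Rebuild each pseudonymous user's full search history from the released rows."""
    histories = defaultdict(list)
    for user_id, query in released:
        histories[user_id].append(query)
    return dict(histories)

# Invented sample log; the usernames and queries are hypothetical.
log = [
    ("user_a", "landscapers in springfield"),
    ("user_b", "weather tomorrow"),
    ("user_a", "homes sold on maple street springfield"),
    ("user_a", "jane q. example"),  # a vanity search
]

released = pseudonymize(log)
histories = group_by_id(released)
for user_id, queries in histories.items():
    print(user_id, queries)
```

No username survives in the released rows, yet each numeric ID still carries a complete search history. A history that combines a town, a street, and a vanity search may be enough for an outside reader to work out who the person is.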
EU Regulation of Search Engines
The European Union Data Protection Directive requires search engines to "delete or irreversibly anonymise personal data once they no longer serve the specified and legitimate purpose" for which they were collected. Retention of personal data by search engines for more than six months is presumed to be unnecessary. Search engines that retain personal data for longer periods must "demonstrate comprehensively that it is strictly necessary for the service." This requirement applies to IP address data, which virtually all search engines collect each time a user runs a search. The EU also imposes limits on the lifetime of search engines' cookies - small computer files that can track users between multiple sessions and web sites. As a technical matter, every cookie expires eventually, and web sites can easily select the expiration dates for their cookies. EU guidelines prohibit search engines from setting expiration dates farther in the future than necessary to provide search services.
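As a rough sketch of how a web site selects a cookie's expiration, the following builds a `Set-Cookie` header with Python's standard `http.cookies` module. The cookie name, value, and 30-minute lifetime are assumptions chosen for illustration; the EU guidance quoted above does not prescribe a specific figure for cookies.

```python
from http.cookies import SimpleCookie

# Hypothetical lifetime: 30 minutes, expressed in seconds.
SESSION_LIFETIME_SECONDS = 30 * 60

def make_cookie_header(name: str, value: str, lifetime_seconds: int) -> str:
    """Build a Set-Cookie header whose Max-Age limits how long the cookie persists."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["max-age"] = lifetime_seconds  # browser discards the cookie after this
    cookie[name]["path"] = "/"
    return cookie.output(header="Set-Cookie:")

header = make_cookie_header("prefs", "abc", SESSION_LIFETIME_SECONDS)
print(header)
```

The lifetime is a single parameter under the site's control, which is why a rule limiting cookie lifetimes is technically simple to follow.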
Article 29 Data Protection Working Party
- Article 29 Data Protection Working Party Opinion on data protection issues related to search engines, April 4, 2008.
- Article 29 Data Protection Working Party Statement, February 19, 2008.
- Article 29 Working Group - Main Page.
The Article 29 Working Group's April 4, 2008 report issued a set of obligations to search engine firms, including:
- Search engines should get informed consent from users if they correlate personal data across different services, such as desktop search;
- Search engine providers must delete or anonymise (in an irreversible and efficient way) personal data once they are no longer necessary for the purpose for which they were collected;
- Personal data should not be held by search engines for longer than six months;
- In case search engine providers retain personal data longer than six months, they must demonstrate comprehensively that it is strictly necessary for the service;
- It is not necessary to collect additional personal data from individual users in order to be able to perform the service of delivering search results and advertisements;
- If search engine providers use cookies, their lifetime should be no longer than demonstrably necessary;
- Search engine providers must give users clear and intelligible information about their identity and location and about the data they intend to collect, store, or transmit, as well as the purpose for which they are collected.
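The six-month retention obligation in the list above could be applied to a search log roughly as follows. This is only a sketch: the record layout, the function name `purge_expired`, and the approximation of six months as 180 days are assumptions, and it implements deletion rather than the irreversible anonymisation the report also permits.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "six months" is approximated as 180 days.
RETENTION = timedelta(days=180)

def purge_expired(records, now):
    """Keep only the records collected within the retention period."""
    cutoff = now - RETENTION
    return [record for record in records if record["timestamp"] >= cutoff]

# Hypothetical log entries with invented queries.
now = datetime(2008, 10, 1, tzinfo=timezone.utc)
records = [
    {"query": "example query one", "timestamp": datetime(2008, 1, 15, tzinfo=timezone.utc)},
    {"query": "example query two", "timestamp": datetime(2008, 9, 20, tzinfo=timezone.utc)},
]

kept = purge_expired(records, now)
print([record["query"] for record in kept])
```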
EPIC's Work
- EPIC Testimony on Search Engine Privacy in European Parliament.
IP Addresses and Privacy
IP Address Privacy in the United States
In the United States, federal law does not provide uniform privacy protections for personal data submitted to search engines or for IP addresses. Some federal regulations (e.g. 45 C.F.R. § 164.514(b)(2)(i)(O)) treat IP addresses as "individually identifiable" information for specific purposes, but such treatment is not comprehensive.
IP Address Privacy in the European Union
The European Commission classifies IP addresses as personal data. Search engine data falls under the relevant EU data protection directives, and EU regulations generally apply to search engine companies even when they are headquartered outside Europe. Search engines must comply with European privacy provisions if they maintain an establishment in one of the EU Member States, or if they use automated equipment based in one of the Member States for the purposes of processing personal data. European privacy rules limit the collection, use, and disclosure of personal information. The privacy officials who make up the EU Article 29 Working Group have stated that "the protection of the users' privacy and the guaranteeing of their rights, such as the right to access to their data and the right to information as provided for by the applicable data protection regulations, remain the core issues of the ongoing debate."
Corporate Policies Regarding IP Address Privacy
Google, the leading Internet search engine, automatically collects its users' search terms in connection with their IP addresses. Google states that, after collection, it retains the personally identifiable information for 18 months, and then "anonymizes" the data linking search terms to specific IP addresses by erasing the last octet of the IP address.
On December 17, 2008, Yahoo announced that it would erase the last octet of the IP address after 90 days. The search engine company previously retained the data for 13 months.
Microsoft makes search query data anonymous after 18 months by permanently removing cookie IDs, the entire IP address and other identifiers from search terms.
Ixquick states that it deletes users' search data (including IP addresses) within 48 hours. Ixquick further states that it does not set any uniquely identifying cookies, and that it shares data with third parties only in limited circumstances.
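The last-octet erasure that Google and Yahoo describe above can be sketched as follows. This is an illustration under stated assumptions, not either company's implementation: the function name `truncate_last_octet` is hypothetical, and the sample address comes from a range reserved for documentation.

```python
import ipaddress

def truncate_last_octet(address: str) -> str:
    """Zero the final octet (the lowest 8 bits) of an IPv4 address."""
    ip = ipaddress.IPv4Address(address)
    return str(ipaddress.IPv4Address(int(ip) & 0xFFFFFF00))

# Illustrative address from the 203.0.113.0/24 documentation range.
truncated = truncate_last_octet("203.0.113.77")

# Every address sharing the first three octets maps to the same truncated value.
candidates = ipaddress.ip_network(f"{truncated}/24").num_addresses

print(truncated)
print(candidates)
```

Truncation leaves the first three octets intact, so a stored record still narrows the user down to a single block of 256 addresses. Combined with retained cookies or search terms, that may be enough to single out an individual, which is consistent with the criticism of partial deletion reported elsewhere on this page.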
News
- Kelly Fiveash, Google Extends Right-to-be-Forgotten Rules to All Search Sites, ArsTechnica (Mar. 7, 2016).
- Liza Tucker, Op-Ed, The Right to Bury the (Online) Past, Wash. Post, Sept. 13, 2015.
- Zoya Sheftalovich, Google ordered to remove UK search results, Politico EU, August 21, 2015
- Sylvia Tippman and Julia Powles, Google Accidentally Reveals Data on 'Right to Be Forgotten' Requests, The Guardian, July 14, 2014.
- Mark Scott, France Wants Google to Apply ‘Right to Be Forgotten’ Ruling Worldwide or Face Penalties, NY Times, June 12, 2015
- Mario Trujillo, Public Wants "Right to Be Forgotten" Online, The Hill, Mar. 19, 2015
- Daniel Wilson, FTC's Brill Backs Enhanced Consumer 'Right To Obscurity', Law360, Mar. 10, 2015.
- Yahoo to purge user data after 90 days, San Francisco Chronicle, December 18, 2008
- Yahoo to anonymize user data after three months, Computerworld, December 18, 2008
- Yahoo to purge user data after 90 days, Los Angeles Times, December 18, 2008
- Yahoo Limits Retention of Search Data, New York Times, December 17, 2008
- Yahoo to Anonymize User Data After 90 Days, Wired, December 17, 2008
- Yahoo Changes Data-Retention Policy, Washington Post, December 17, 2008
- Yahoo! Sets New Industry Privacy Standard with Data Retention Policy, Yahoo (Press Release), December 17, 2008
- Leading article: In search of online privacy, The Independent, UK, April 9, 2008
- EU To Restrict Time Companies Can Hold Online Search Data, Dow Jones, April 7, 2008
- Search engines warned over data, BBC News, April 7, 2008
- European Groups Says Search Engines Must Delete Search Data Within Six-Months, Search Engine Land, April 7, 2008.
- EU: 18 Months Too Long To Keep Search Data, SecurityProNews, April 7, 2008.
- Google, Yahoo Keep User Data Too Long, EU Group Says, Bloomberg, April 4, 2008.
- Google scrambles to avoid EU privacy regulators, CNET, February 25, 2008.
- I.P. Address: Partially Personal Information, The New York Times, February 24, 2008.
- Google mounts Chewbacca defense in EU privacy debate, The Register, February 23, 2008.
- Google Says I.P. Addresses Aren't Personal, The New York Times, February 22, 2008.
- Google argues against calling IP addresses "personal data," Ars Technica, February 22, 2008.
- Are IP addresses personal?, Google Public Policy Blog, February 22, 2008.
- EU: Search Engines Under EU Rules, Associated Press, February 22, 2008.
- EU data guardians: search engines must obey our rules, The Register, February 22, 2008.
- Search Engines Must Comply With Strict EU Privacy Rules, Mashable, February 22, 2008.
- Google, Yahoo, Microsoft & Other Search Engines Must Comply With EU Privacy Rules, Search Engine Land, February 22, 2008.
- European privacy advocates to issue report in April, International Herald Tribune, February 20, 2008.
- EU Ponders Privacy of Internet Addresses, PC World, January 27, 2008.
- IP addresses could become "personal information" in Europe, Ars Technica, January 22, 2008.
- EU official says IP address is personal, MSNBC, January 21, 2008.
- EU: IP Addresses Are Personal Information, CBS News, January 21, 2008.
- Change in Yahoo Search Retention Leaves Privacy Questions Unresolved. Yahoo announced that, after 90 days, it will obscure some elements in the records that it keeps about all Internet users who use the company's services. The search company will continue to keep modified record locators, time/date stamps, web pages viewed, and a persistent user identifier, known as a "cookie" for an indefinite period. Yahoo is also retaining much of the IP address, which typically identifies a user's device, such as a laptop or a mobile phone. Privacy rules classify IP addresses as "personal data." Experts have criticized the partial deletion of IP address data as insufficient to protect consumers, and called for complete deletion. For more information, see EPIC's Search Engine Privacy page. (Dec. 18, 2008)
- Google "Flu Trends" Raises Privacy Concerns. Google announced this week a new web tool that may make it possible to detect flu outbreaks before they might otherwise be reported. Google Flu Trends relies on individual search terms, such as "flu symptoms," provided by Internet users. Google has said that it will only reveal aggregate data, but there are no clear legal or technological privacy safeguards to prevent the disclosure of individual search histories concerning the flu, or related medical concerns, such as "AIDS symptoms," "ritalin," or "Paxil." Privacy and medical groups have urged Google to be more transparent and publish the algorithm on which Flu Trends data is based so that the public can determine whether the privacy safeguards are adequate. (Nov. 12, 2008)
- European Privacy Officials: Privacy Rules Apply to Search Engines. European privacy officials have established "a clear set of responsibilities" on search engine companies regarding their handling of user data. The opinion, issued by the Article 29 Working Group, states that the European Union Data Protection Directive requires search engines to "delete or irreversibly anonymise personal data once they no longer serve the specified and legitimate purpose" for which they were collected. This requirement has particular significance for search engines, because European privacy rules classify Internet Protocol (IP) addresses as "personal data." The opinion further holds that European privacy laws generally apply to search engines "even when their headquarters are outside [Europe]," and requires that search engines must delete personal data within six months of collection. (Apr. 7, 2008)
- Search Histories Subject to European Privacy Rules. European privacy officials determined this week that companies operating search engines will be subject to European privacy rules that limit the collection, use, and disclosure of personal information. The privacy officials who make up the Article 29 Working Group stated that "The protection of the users' privacy and the guaranteeing of their rights, such as the right to access to their data and the right to information as provided for by the applicable data protection regulations, remain the core issues of the ongoing debate." Earlier this year, EPIC urged the European Parliament to protect the privacy of search histories. A report from the Article 29 Working Group on Search Engines and Privacy is expected in April. (Feb. 22, 2008)