- The Office of the Privacy Commissioner of Canada initiated three
complaints against Google Inc. (Google) on May 31, 2010, pursuant to subsection
11(2) of the Act, after being made aware that Google Street View cars
had been collecting payload data from unencrypted WiFi networks during their
collection of publicly broadcast WiFi signals (service set identifier [SSID]
information and Media Access Control [MAC] addresses).
- The three complaints are as follows:
- Google’s collection, use or disclosure of payload data was done without
the individuals’ prior knowledge and consent;
- Google’s collection of payload data was done without prior
identification of the purposes for which personal information (PI) was
collected;
- Google’s collection of payload data was not limited to that which was
necessary for the purposes identified.
- Following a request from the German data protection authority in Hamburg
to audit the WiFi data collected by Google’s Street View cars during a
location-based project, Google discovered in May 2010 that it had been
collecting payload data from unsecured wireless networks as part of its
collection of WiFi data. By Google’s own admission, it appears that this
inadvertent collection was due to the integration of the code developed in 2006
with the software used to collect WiFi signals. As a result, Google grounded
its Street View cars, stopped the collection of WiFi network data on May 7,
2010, and segregated and stored all of the data already collected.
- On June 1, 2010, our Office sent a letter to Google stating that the
Commissioner was launching an investigation with regard to its collection of payload data.
Google responded on June 29, 2010.
- On June 28, 2010, pursuant to subsection 11(2) of the Act, this
Office requested permission to conduct a site visit to Google’s facility in Mountain
View, California. The purpose of this site visit was twofold: 1) to allow the
review of the payload data gathered by Google, and 2) to ask specific questions
of Google’s representatives, such as the circumstances surrounding this
incident, the segregation and storage of the payload data, and the mitigation
and prevention measures Google intended to implement.
- Google agreed to a site visit. Two technical representatives from this
Office then went to the Mountain View facility on July 19, 2010. Although our
technicians reviewed the payload data, no Google representatives were available
in Mountain View to answer our questions. Instead, by letter dated July 16,
2010, Google answered general questions we posed in a questionnaire we sent on
July 12, 2010.
- On August 18, 2010, a videoconference was held between Google’s counsel
and this Office in order to answer supplementary questions.
- The results of our investigation into the three complaints against
Google are summarized in the following sections:
- Google’s Product Counsel’s involvement in product review;
- Circumstances surrounding the collection of payload data and technical
testing;
- Personal information collected;
- Segregation and storage of the payload data;
- Google’s future plans for its location-based services; and
- Privacy implications of future plans, and mitigation and prevention
measures that Google intends to implement to prevent a recurrence.
- Google advised that it has a formal review process for each external
product launch. (“External product” denotes a product to be offered to
consumers.) This process requires that a Product Counsel assess, among other
things, the privacy implications of the product.
- Since the code ultimately used to sample all categories of publicly
broadcast WiFi data was not considered by Google to be an external product, the
formal review process did not apply.
- However, our investigation found that Google’s code design procedure
includes a template and a process by which code must be reviewed by Product
Counsel before being used or integrated with another Google product. The
template, a methodology document, is mandatory and constitutes the first step
in the code design procedure.
- Our investigation also found that, in the code design procedure document
for the particular code later used for the collection of WiFi signals, the
engineer did identify one or more privacy concerns about the information being
collected. These related to the fact that Google could obtain sufficient data
to precisely triangulate a user’s position at a given time (a rough
illustration of such an estimate appears below).
- The engineer qualified his concerns as being “superficial privacy
implications”. He did not forward his code design documents to Product Counsel
for review—contrary to company procedure. Thus, the code’s privacy implications
were never assessed.
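- To illustrate the type of concern the engineer identified, the following is a minimal, hypothetical sketch of how a device’s position could be approximated from nearby WiFi access points whose locations are known. The data structure, weighting scheme and coordinates are assumptions for illustration only and do not describe Google’s actual method.

```python
from dataclasses import dataclass

@dataclass
class AccessPointSighting:
    """A WiFi access point the device can see, with a known mapped location."""
    latitude: float
    longitude: float
    signal_strength_dbm: float  # stronger (less negative) usually means closer

def estimate_position(sightings: list[AccessPointSighting]) -> tuple[float, float]:
    """Approximate the device's position as a signal-weighted centroid of the
    access points it can see; stronger signals receive more weight."""
    # Shift typical dBm readings (about -30 to -90) into positive weights.
    weights = [max(s.signal_strength_dbm + 100.0, 1.0) for s in sightings]
    total = sum(weights)
    lat = sum(s.latitude * w for s, w in zip(sightings, weights)) / total
    lon = sum(s.longitude * w for s, w in zip(sightings, weights)) / total
    return lat, lon

# Three access points near a device (illustrative coordinates).
sightings = [
    AccessPointSighting(45.4215, -75.6972, -45.0),
    AccessPointSighting(45.4218, -75.6969, -60.0),
    AccessPointSighting(45.4211, -75.6975, -70.0),
]
print(estimate_position(sightings))
```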
- We were also informed that Google’s Product Counsel Members consist of
practising lawyers with various legal backgrounds. Google claims that they
usually have some private-sector experience in privacy issues.
- According to Google, Product Counsel Members attend the same
introductory training session available to all new Google employees. As well,
Product Counsel Members participate in weekly privacy- and security-issue
meetings. Google also claims that “Privacy is part of the ongoing CLE
[Continuing Legal Education] obligations of Google counsel.”
- Google allows its engineers to use 20% of their time to work on projects
of interest to them. When using this time in 2006, a Google engineer developed
code to sample all categories of publicly broadcast WiFi data.
- The engineer included lines in the code that allowed for the
collection of payload data. He thought such data might be useful to Google in the
future and that this type of collection would be appropriate.
- This code was later used by Google when it decided to launch a
particular location-based service. The service relies on a variety of signals
(such as GPS, the location of cell towers and the location of WiFi access
points) to provide the user with a location. Google installed antennas and
appropriate software (including Kismet, an open-source application) on its
Google Street View cars in order to collect publicly broadcast WiFi radio
signals within the range of the cars while they travelled through an area.
These signals are then processed to identify the WiFi networks (using their MAC
address) and to map their approximate location (using the GPS co-ordinates of
the car when the signal was received). This information on the identity of WiFi
networks and their approximate location then populates the Google
location-based services database.
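- As an illustration of this mapping step, the sketch below aggregates hypothetical drive-by observations (a BSSID paired with the car’s GPS fix at reception time) into one approximate location per access point. The function name and the simple averaging approach are assumptions for illustration and do not describe Google’s actual processing.

```python
from collections import defaultdict

def build_access_point_map(observations):
    """Aggregate drive-by observations into one approximate location per
    access point, keyed by its MAC address (BSSID).

    observations: iterable of (bssid, car_latitude, car_longitude) tuples,
    one per detected signal, using the car's GPS fix at reception time.
    """
    sightings = defaultdict(list)
    for bssid, lat, lon in observations:
        sightings[bssid].append((lat, lon))

    # Approximate each access point's position as the mean of the car
    # positions at which its signal was received.
    ap_map = {}
    for bssid, points in sightings.items():
        lats, lons = zip(*points)
        ap_map[bssid] = (sum(lats) / len(lats), sum(lons) / len(lons))
    return ap_map

# Two hypothetical access points observed along a drive.
observations = [
    ("00:11:22:33:44:55", 45.4215, -75.6972),
    ("00:11:22:33:44:55", 45.4216, -75.6970),
    ("66:77:88:99:aa:bb", 45.4230, -75.7001),
]
print(build_access_point_map(observations))
```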
- In its representations to this Office, Google provided technical
information on how it uses WiFi network data for location-based services.
Google stated that its software does not store payload transmissions from
encrypted networks, but that payload data sent over unencrypted WiFi
networks is collected and “dumped” on a disk in raw format.
- However, according to Google, the information thus collected would be
fragmented because its cars are on the move when collection occurs and the
equipment it uses to collect WiFi signals automatically changes channels five
times per second.
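- The behaviour Google described can be sketched roughly as follows; the frame structure and function here are hypothetical stand-ins for illustration and do not reproduce Google’s software.

```python
import io
from dataclasses import dataclass

@dataclass
class Frame:
    """A simplified stand-in for one captured 802.11 frame."""
    bssid: str
    encrypted: bool  # True if the source network uses encryption
    payload: bytes   # higher-layer data carried by the frame

def capture_pass(frames_on_channel, raw_dump) -> set[str]:
    """Record network identifiers for every frame, but write payload bytes
    to the raw dump only for frames from unencrypted networks."""
    seen_networks = set()
    for frame in frames_on_channel:
        seen_networks.add(frame.bssid)
        if not frame.encrypted:
            # Payload from open networks lands on disk as-is. Because the
            # equipment dwells on each channel only ~200 ms (five channel
            # changes per second) while the car moves, most transmissions
            # are caught only as fragments.
            raw_dump.write(frame.payload)
    return seen_networks

dump = io.BytesIO()
frames = [
    Frame("00:11:22:33:44:55", True, b"<encrypted bytes>"),
    Frame("66:77:88:99:aa:bb", False, b"GET /inbox HTTP/1.1"),
]
print(capture_pass(frames, dump), dump.getvalue())
```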
- During our investigation, Google acknowledged that it erred in including,
in the software used to collect WiFi network information, code allowing the
collection of payload data. Google contends that the code was originally
designed for a different data-collection purpose, one that preceded its
ultimate use in collecting WiFi network information for location-based
services. Google claims that it did not realize this code was present when it
began using the software for its geo-location project.
- It claims that when the decision was made to use the software for
collecting publicly broadcast WiFi information, the code was reviewed for bugs
and validated by a second engineer before being integrated with, and installed
on, Street View cars. The purpose of this review was to ensure the code did not
interfere with normal Street View operations. The code was not further examined
to verify what kind of data was actually being obtained through the collection
of publicly broadcast WiFi signals.
- Google admitted that since it was not its intention to collect payload
data and it never intended to use payload data in any of its products, it was
not in a position to identify any purposes for the collection of these data or
seek consent from affected individuals. Google also admitted that it did not
inform any affected individuals of the fact that it was collecting payload data
since its employees did not realize they were doing so until May 2010.
- Google provided three reasons to explain why the collection of payload
data was not discovered earlier:
- No one other than the engineer who developed the code was interested in
looking at this program. No one thought payload data would be useful and no one
had planned to use this data.
- Payload data comprised a minuscule amount of the total data collected.
Its collection was thus of minimal concern and no one had any reason to examine
it.
- The engineer had not seen the ramifications of including this code and,
consequently, had not spoken of it with his manager.
- Google also asserted that since it had no purpose for the collection of
payload data, there cannot be any justification for its retention.
Consequently, Google is anticipating its secure destruction as soon as possible
and is seeking this Office’s authorization to do so.
- Our investigation revealed that Google collected WiFi data in Canada
from March 30, 2009 to May 7, 2010, and that its Street View cars had driven
through most urban areas and along major roads.
- Google stated that it cannot accurately distinguish between WiFi
networks and wireless devices. It can, however, identify the unique number of
basic service set identifiers (a.k.a. BSSIDs), which generally identify a
single WiFi access point. Although the BSSID does identify an access point, it
does not indicate how many devices or networks connect through the access
point.
- Google estimates that it collected over 6 million BSSIDs over the period
its Street View cars drove throughout Canada.
- Our two technical experts visited Google’s offices in Mountain View,
California on July 19 and 20, 2010. The purpose of this site visit was for them
to examine the data that had been collected by Google’s Street View cars for
Google’s location-based services so as to determine its nature and the quantity
involved. Their examination focussed on finding examples of personal
information within the WiFi payload data collected in Canada.
- Our technical experts searched the payload data to find anything that
could constitute personal information (e.g., examples of e-mail, usernames,
passwords and phone numbers). They produced an approximate count of possible
personal information through an automated search; for example, the count
included 787 e‑mail headers and 678 phone numbers. A match, however, does not
amount to a positive identification: the searches may have included irrelevant
items or missed others.
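- The following sketch shows the general shape of such an automated search; the regular expressions are illustrative assumptions and are not the queries actually used by our technical experts.

```python
import re

# Illustrative patterns only; the experts' actual queries are not reproduced here.
PATTERNS = {
    "e-mail address": re.compile(rb"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone number": re.compile(rb"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
}

def count_candidates(payload: bytes) -> dict[str, int]:
    """Count pattern matches in raw payload bytes. Counts are approximate:
    a match is not a confirmed identification, and items that do not fit
    the pattern are missed."""
    return {label: len(pattern.findall(payload))
            for label, pattern in PATTERNS.items()}

sample = b"Contact bob@example.com or call 613-555-0199 for details."
print(count_candidates(sample))
```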
- To complement the automated search, our experts performed a manual
verification for five instances of each type of personal information. This was
to demonstrate the existence of each data type, while preventing our experts
from intruding too deeply into any individual’s personal information.
- Our technical experts found at least five instances of e-mails where
they noted the presence of e-mail addresses, complete e-mail headers, IP
addresses, machine hostnames, and contents of messages. The messages were
truncated in the five instances of e-mails they found, but when performing a
manual verification for other items (e.g., phone numbers), they observed
complete e-mail messages.
- They also found five instances of usernames. These could be seen in
cookies, MSN messages and chat sessions. They also found one instance where a
password and username were included in an e-mail message that a person was
sharing with others to tell them how to log in to a server.
- Our experts also found at least five instances of real names of
individuals, five instances of residential addresses and five more of business
addresses. They noted that business addresses were much more common than
residential ones.
- They also found five instances of instant messenger headers and five
instances of phone numbers—both business and personal phone numbers. Like
business addresses, business phone numbers were easier to find than personal
ones.
- A search for nine-digit or sixteen-digit numbers, which could have been
Social Insurance Numbers (SIN) or credit card numbers, did not produce
conclusive results because the dataset contained too many irrelevant or
similar-looking numbers. Therefore, although we found no evidence of SINs or
credit card numbers being collected, we still cannot entirely rule out the
possibility that they were.
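- One way such false positives could in principle be reduced, though it was not part of this investigation, is to apply the Luhn checksum that valid credit card numbers satisfy; the sketch below is purely illustrative.

```python
import re

def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum, a necessary
    (but not sufficient) property of genuine credit card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:       # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def candidate_card_numbers(payload: bytes) -> list[str]:
    """Find 16-digit runs and keep only those with a valid Luhn checksum,
    reducing (but not eliminating) false positives."""
    return [m.decode() for m in re.findall(rb"\b\d{16}\b", payload)
            if luhn_valid(m.decode())]

# The second number is a well-known test card number and passes the check.
print(candidate_card_numbers(b"order id 1234567890123456, card 4539578763621486"))
```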
- Our technical experts also noticed sensitive items during their
searches. For example, they found a list of names, phone numbers, addresses and
medical conditions for specified individuals. They also found a reference to
someone stopped for a speeding violation, along with address information.
- Our experts often saw cookies being passed from client machines to Web
servers. These cookies were unencrypted and some contained personal
information, including IP addresses, user names and postal addresses. They were
surprised by the frequency of unencrypted cookies containing personal
information.
- In summary, our experts found many instances of personal information in
the sample they took of the payload data collected in Canada by Google.
- The WiFi data was collected through WiFi antennas attached to the roof
of the Street View cars. These antennas passively received the publicly
broadcast radio signals within range of the car, which were captured using the
open-source Kismet software. The data was then relayed to a Google-developed
application called “gStumbler” and its executable program “gslite”, which
processed the data for storage. The data was saved to hard drives physically
located in each Street View car and subsequently transferred to Google’s
servers.
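- The sketch below illustrates the kind of per-detection record such a pipeline might write to the in-car drive. The field names and JSON-lines format are assumptions for illustration only and do not reproduce the actual gStumbler or gslite formats.

```python
import json
import time
from pathlib import Path

def write_detection(drive: Path, bssid: str, ssid: str,
                    gps_fix: tuple[float, float], payload: bytes | None) -> None:
    """Append one processed detection to local storage. Payload bytes are
    present only when the frame came from an unencrypted network."""
    record = {
        "timestamp": time.time(),
        "bssid": bssid,        # identifies the access point
        "ssid": ssid,          # broadcast network name
        "lat": gps_fix[0],     # car's GPS position at reception time
        "lon": gps_fix[1],
        "payload_hex": payload.hex() if payload else None,
    }
    with drive.open("a") as f:
        f.write(json.dumps(record) + "\n")

# One hypothetical open-network detection logged to a local file.
write_detection(Path("wifi_log.jsonl"), "66:77:88:99:aa:bb", "CoffeeShopWiFi",
                (45.4215, -75.6972), b"GET /inbox HTTP/1.1")
```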
- Google alleges it grounded its Street View cars and segregated the
payload data on a restricted area of its network as soon as it became aware
that its gStumbler application was collecting payload data from unencrypted
WiFi networks.
- As a follow-up step, a Google system administrator copied onto a total
of four disks the files containing the payload data collected in all affected
countries. This was done from May 9, 2010, to May 13, 2010. These disks
contained two copies of the data: one copy obtained after categorizing and
labelling the data files by country, and one copy of the data before
categorizing.
- On May 15, 2010, the system administrator consolidated the payload data
onto an encrypted hard drive, segregated by country. A second copy of the
encrypted hard drive was made for security and backup preservation. The four
original disks were then destroyed in a disk deformer.
- A Google employee personally delivered one encrypted hard drive to
another Google location for safekeeping, while the system administrator kept
the other one in a secure location. Once the Google employee arrived at the
destination, the system administrator permanently destroyed the backup
encrypted hard drive. The US data was then segregated onto a separate encrypted
drive, while the data from the rest of the world remained on the initial
encrypted drive.
- Google still intends to offer location-based services, but it has
discontinued the collection of WiFi data through its Street View cars and has
no plans to resume it.
- Google does not intend to contract out to a third party the collection
of WiFi data.
- Google intends to rely on its users’ handsets to collect the information
on the location of WiFi networks that it needs for its location-based services
database. The improvements in smart-phone technology in the past few
years have allowed Google to obtain the data it needs for this purpose from the
handsets themselves.
- Although it has no tracking tool to keep records of a customer’s
locations (and does not intend to create one), Google acknowledges that it does
need to examine the potential privacy concerns of this method of collection.
- Google submits that it is striving to design privacy protections into
all its products and services. It states that its employees receive orientation
and code-of-conduct training that includes a privacy and data-security
component. However, the responsibility of aligning Google’s projects with its
Privacy Principles and Privacy Policy lies with each of its product and
engineering teams.
- Google also states that as products are chartered or otherwise provided
with resources and staffing, they are assigned to a Product Counsel in Google’s
legal department. This individual has a first-level responsibility for
identifying privacy issues in a product.
- In order to avoid a recurrence of a product design having a negative
impact on privacy, Google claimed to be reviewing its product launch
procedures, code review procedures and 20% time policy. In so doing, it
would ensure that its internal controls are robust enough to adequately address
future issues. As of the issue date of this report, Google’s review of its
procedures/policies has not yet been completed.
- In making our determinations, we applied Principles 4.1.1 and 4.1.2 of
the Personal Information Protection and Electronic Documents Act.
Principle 4.1.1 stipulates that accountability for the organization’s
compliance with the principles rests with the designated individual(s), even
though other individuals within the organization may be responsible for the
day-to-day collection and processing of personal information. In addition,
other individuals within the organization may be delegated to act on behalf of
the designated individual(s). Principle 4.1.2 continues that the identity of
the individual(s) designated by the organization to oversee the organization’s
compliance with the principles shall be made known upon request.
- We also applied Principle 4.2, which states that the purpose for which
personal information is collected shall be identified by the organization at or
before the time the information is collected.
- Principle 4.3 states that the knowledge and consent of the individual
are required for the collection, use, or disclosure of personal information,
except where inappropriate.
- Lastly, Principle 4.4 states that the collection of personal information
shall be limited to that which is necessary for the purposes identified by the
organization.
- On September 15, 2010, I shared an earlier version of this report with
Google and invited their response. Taking into consideration their response, I
have revised my preliminary letter of findings. What follows is a summary of
our findings and recommendations.
- During their site visit, our technical experts uncovered substantial
amounts of personal information, in the form of e-mail message content as well
as e‑mail, IP and postal addresses, captured in Google’s collection of payload
data in Canada.
- Google acknowledged to this Office that it did collect payload data, but
not with the intent of using it in any of its products. According to Google, it
was “simply mistaken” in collecting the data and did not seek consent from the
affected individuals. Principle 4.3 of the Act requires that the
knowledge and consent of the individual be obtained for the collection, use or
disclosure of their personal information.
- Google also stated that it had not identified any purposes for the
collection of the payload data. Principle 4.2 requires that such a purpose be
identified at or before the time of collection. Further, Principle 4.4
stipulates that the collection of personal information be limited to that which
is necessary for the purposes identified. Since no purpose could be identified,
it follows that the collection in this case clearly could not be limited to any
specific purpose. This is in violation of Principle 4.4.
- Due to the engineer’s failure to forward his design document to the
Product Counsel, the Counsel was unable to assess the privacy implications of
the code designed to collect WiFi data. This is a careless error that I take
very seriously since a review of design documents by a Product Counsel (and the
use of a template) is clearly a mandatory step in Google’s code design
procedure.
- As a result, the un-scrutinized code was later used to collect data
containing personal information. If the Product Counsel had been involved when
and as it should have been, Google might have discovered the risk of data
over-collection and would have been in a position to remedy the situation
before any collection took place. The ensuing negative effects on citizens’
privacy and Google’s reputation could easily have been avoided.
- Google informed our Office that engineering and product teams are
accountable for complying with Google’s privacy policies and principles. Google
then stated that it is working towards improving its code-and-product review
processes, as well as accountability mechanisms, for engineering and product
management personnel in order to improve their sensitivity to privacy issues at
all stages of product and code development. A legal team is working with
engineering directors to ensure a comprehensive review of code for any privacy
issues. Google believes that the review of its policies and procedures that it
has undertaken will ensure no recurrences. Google stated that it will keep this
Office informed as Google completes its review.
- Google asserted that the engineer who developed the lines of code did
not foresee their ramifications, namely that they would ultimately allow the
collection of a broader range of data from wireless networks. Our
investigation was not able to
determine with certainty if this was a one-time error committed by one
individual or, perhaps, a sign of a more generalized lack of awareness among
employees with regard to the privacy implications of new products. At Google, the
effects of new products on privacy should be well understood not only by the
Product Counsel but also by the professionals who develop these products.
- In this case, the review and testing of the product containing the code
were insufficient to assess privacy impact. It would appear that the review
consisted merely of ensuring that the product did not interfere with a second
application, namely the one used to collect pictures of the streets navigated
by Street View vehicles.
- As our investigation revealed, the review was not able to assess the
extended capabilities of the product—including its ability to collect more
information than necessary for the location-based project.
- Once Google realized its Street View cars were collecting more data from
wireless networks than anticipated, it expressed regret for inadvertently
collecting the publicly broadcast data. It immediately grounded its vehicles
and took measures to safeguard the collected payload data and segregate it by
country of origin.
- Google’s actions were justified, appropriate and sufficient to safeguard
the payload data collected in Canada. In my view, Google upheld the
related safeguard provisions under the Act.
- Concerning the data that Google collected, it affirmed that it has
no desire to use the Canadian payload data in any manner and will continue to
secure the data with stringent access restrictions until it is deleted.
- To this, I would like to add that not only privacy laws, but other
applicable laws in the U.S. and in Canada, including laws of evidence, must
also be taken into account in determining when to delete the Canadian payload
data collected.
- The fact that Google does not intend to resume collection of WiFi data
with its Street View cars eliminates the possibility of further inappropriate
collection of personal information through the tool developed by its engineer.
- However, from users’ handsets, Google intends to obtain the information
needed to populate its location-based services database. This alternative
method of collection could also lead to inappropriate collection and retention
of personal information if Google does not put in place appropriate safeguard
measures.
- I share Google’s goal of avoiding any recurrence of similar violations of
individuals’ privacy. While I am pleased that Google has taken under review its
processes and procedures that could impact privacy, I would nonetheless like
the organization to ensure that these controls are complemented by an
overarching governance model addressing all privacy issues pertaining to the
design of internal and external products and services. I would also like Google
to adhere to reasonable timelines in implementing both the governance model and
the revised processes and procedures. With this in mind, and after reviewing the
additional information Google provided to this Office, I am making the following
recommendations:
- That Google re-examine and improve the privacy training it provides to all
its employees, with the goal of increasing staff awareness and understanding of
Google’s obligations under privacy laws.
- That Google ensure it has a governance model in place that includes:
- effective controls to
ensure that all necessary procedures to protect privacy have been duly followed
prior to the launch of any product;
- clearly designated
and identified individuals actively involved in the process and accountable for
compliance with Google’s obligations under privacy laws.
- That Google delete the Canadian payload data it collected, to the extent
that Google is allowed to do so under Canadian and U.S. laws. If the Canadian
payload data cannot immediately be deleted, it must be properly safeguarded and
access to it restricted.
- At this time, I consider the matter to be well-founded and still
unresolved. My Office will consider the matter resolved only upon receiving, on
or before February 1, 2011, confirmation of the implementation of the above
recommendations, at which point I will issue my final report and conclusions.