1. Introduction
This section is non-normative.
Browsers are working to prevent cross-site tracking, which threatens user privacy. In addition to third-party cookies and storage, other client-side methods exist that enable cross-site tracking. Navigational tracking correlates user identities across sites during navigations between those sites. Navigational tracking uses link decoration to convey information, but not all link decoration is tracking. This project attempts to distinguish tracking from non-tracking navigation and to prevent the tracking without damaging similar but benign navigations.
2. Infrastructure
This specification depends on the Infra standard. [INFRA]
3. Terminology
Link decoration is when the source of a hyperlink "decorates" its URL with extra information beyond what’s necessary to identify the page a user wants to navigate to. This information can be placed almost anywhere inside the URL.
Navigational tracking refers to the general use of one or more navigations to identify that a user on one site is the same person as a user on another site. Navigations transmit information cross-site in a few different ways, including in the target URL, which might be decorated, and in the timing of the request.
Examples and non-examples of link decoration and navigational tracking, with the potential decoration or tracking element emphasized:
-
https://publisher.example/page?userId= 5789rhkdsaf8urfnsd -
Link decoration, and also navigational tracking.
-
https://bookshop.org/a/ 1122 /9780062252074 -
Link decoration but not navigational tracking: This number identifies an affiliate to credit with a book sale. Replacing this with another number gets to the same target page.
-
https://bookshop.org/a/1122/ 9780062252074 -
Not decoration: This number identifies a particular book. Changing it yields a different target page.
-
https://bugzilla.mozilla.org/show_bug.cgi?id= 1460058 -
Not decoration: changing the number changes which bug the user sees.
-
https://www.google.com/maps/@ 37.4220328,-122.0847584,17.12z -
Changing the numbers changes what map the user sees, and embedding a user ID would not successfully transfer that user ID to the target site, but it’s hard for an automated system inside a browser to prove that, and even hard for humans reading the URL to be confident of it. [Issue #4]
-
https://publisher.example/unsubscribe?userId= 5789rhkdsaf8urfnsd -
The URL identifies an action rather than a page, and the user ID might be essential for that action to happen. However, this is also clearly a user ID and sufficient to track a user if the source and target collaborate. [Issue #5]
-
https://example.com/auth/callback?token= 1234567 -
This is probably the same case as the unsubscribe link. [Issue #5]
-
https://example.com/login?returnto= item/12345 -
Assuming a request for this URL shows a login page instead of immediately redirecting to item/12345, this is a link decoration but not navigational tracking.
Bounce tracking refers to the use of redirects in a top-level context (including HTTP 3xx statuses, meta elements with http-equiv=refresh attributes, and script-directed navigation that doesn’t wait for user input) along with link decoration to join user identities between sites. Bounce tracking is a subset of navigational tracking and can include automated navigation through the same or different sites from the source or ultimate destination of a link.
Tracking via a bounce through an aggregation domain:
-
The content publisher’s page (on publisher.example) embeds a third-party script from tracker.example.
-
The third-party script tries to read an already-stored identifier, for example one it has set into publisher.example's storage or one read from a third-party tracker.example iframe.
-
If it can’t, it redirects the top-level page to tracker.example using window.location.
-
During this load tracker.example is the first party and can read and write its cookie jar.
-
tracker.example redirects back to the original page URL, with that URL decorated with its user ID in a query parameter.
-
The tracker.example user ID is now available on publisher.example and can be saved into its first-party storage so that future visits don’t need to bounce.
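The bounce described above can be simulated in a few lines. This is a minimal sketch, not a description of any real tracker: the dictionaries standing in for site storage, the `tracker_uid` and `return` parameter names, and the `/bounce` path are all assumptions made for illustration, and the third-party iframe lookup is omitted.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical in-memory stand-ins for each site's first-party storage.
tracker_cookies = {}
publisher_storage = {}

ID_PARAM = "tracker_uid"  # assumed name of the decorating query parameter


def tracker_bounce(return_url):
    """tracker.example loads as first party, reads or sets its ID cookie,
    and redirects back to the original URL decorated with that ID."""
    uid = tracker_cookies.setdefault("uid", "5789rhkdsaf8urfnsd")
    parts = urlsplit(return_url)
    query = parse_qsl(parts.query) + [(ID_PARAM, uid)]
    return urlunsplit(parts._replace(query=urlencode(query)))


def publisher_script(url):
    """Third-party script on publisher.example; returns the next URL to load."""
    params = dict(parse_qsl(urlsplit(url).query))
    if ID_PARAM in params:
        # Returning from the bounce: save the ID so future visits skip it.
        publisher_storage[ID_PARAM] = params[ID_PARAM]
        return url
    if ID_PARAM in publisher_storage:
        return url  # identifier already stored, no bounce needed
    return "https://tracker.example/bounce?" + urlencode({"return": url})


def visit(url):
    """Follow at most one bounce and return the URL the user ends up on."""
    next_url = publisher_script(url)
    if urlsplit(next_url).hostname == "tracker.example":
        return_url = dict(parse_qsl(urlsplit(next_url).query))["return"]
        next_url = publisher_script(tracker_bounce(return_url))
    return next_url
```

On a first visit the user ends up on a decorated URL and the identifier is copied into the publisher's storage; later visits do not bounce.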
4. Threat model
This section will precisely define the goals and non-goals of this specification’s mitigations. It will define a few classes of actors with the ability to modify websites in particular ways. Then it will define what cross-site information each of these actors can or cannot learn.
4.1. Threat actors
TODO
5. Considered Alternatives
This section is non-normative.
So far, the alternative designs consist of mitigations that various browsers have already deployed.
5.1. Deployed Mitigations
Some browsers have deployed and announced protections against navigational tracking. This section is a work in progress to detail what protections have been shipped and/or are planned. This section is not comprehensive.
5.1.1. Safari
Safari uses an algorithmic approach to combat navigational tracking. Safari classifies a site as having cross-site tracking capabilities if the following criteria are met within a particular client:
-
The site appears as a third-party resource under enough different registrable domains.
-
The site automatically redirects the user to enough other sites, immediately or after a short delay.
-
The site redirects to sites that are classified as trackers, recursively.
For example, consider the case of a user clicking on a link on start.example, which redirects to second.example, which redirects to third.example, which redirects to end.example. If Safari has classified third.example as having tracking capabilities, the above behavior can result in Safari classifying second.example as having cross-site tracking capabilities.
If a user navigates or is redirected from a classified tracker with a URL that includes either query parameters or a URL fragment, the lifetime of client-side set cookies on the destination page is capped at 24 hours.
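The recursive redirect criterion and the cookie cap can be sketched as follows. This is an illustration under stated assumptions, not Safari's implementation: the function names are hypothetical, the "enough" thresholds from the other criteria are not modelled, and `redirect_chain` is assumed to list the sites that redirected automatically, followed by the final destination.

```python
from datetime import timedelta
from urllib.parse import urlsplit

COOKIE_CAP = timedelta(hours=24)  # the cap stated in the text above


def propagate_classification(redirect_chain, classified):
    """Return `classified` plus every site in the chain that automatically
    redirects to a site that is (or becomes) classified."""
    result = set(classified)
    hops = list(zip(redirect_chain, redirect_chain[1:]))
    # Walk backward so classification propagates through consecutive redirects.
    for source, target in reversed(hops):
        if target in result:
            result.add(source)
    return result


def capped_cookie_lifetime(requested, source_site, destination_url, classified):
    """Cap a client-side cookie's lifetime when the navigation came from a
    classified site and the destination URL has a query or fragment."""
    parts = urlsplit(destination_url)
    decorated = bool(parts.query or parts.fragment)
    if source_site in classified and decorated:
        return min(requested, COOKIE_CAP)
    return requested
```

With the chain from the example, `second.example` becomes classified because it redirects to the already classified `third.example`; `start.example` is not part of the chain because the user clicked a link there rather than being redirected.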
5.1.2. Firefox
Firefox uses a list-based approach to combat navigational tracking. Sites on the Disconnect list are considered tracking sites. All storage for tracking sites is cleared after 24 hours, unless the user has interacted with the site in the first-party context in the last 45 days.
Firefox is also starting to remove query parameters known to be used for cross-site tracking ([FSN-2021-Q4]). The affected query parameters are chosen using the criteria on the Mozilla Anti Tracking Policy, which include:
-
High-entropy parameters that might identify a user or encode user data, except:
-
Parameters exclusively identifying specific elements or actions on the navigating page (per-click or per-element identifiers), as long as those parameters assign a different value to each click or element they are identifying.
-
Identifiers necessary to complete a user-initiated task such as logging in or submitting a form.
-
High-entropy parameters that are broadly included in nearly all outgoing navigations from a site, even if the parameters don’t uniquely identify a user.
As of May 2022, this query-parameter stripping is applied by default in the Firefox Nightly build, and is planned to be enabled in strict ETP mode and in private browsing.
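Query-parameter stripping of this kind can be sketched as a filter over the URL's query string. The function name and the blocked parameter used in the usage note are hypothetical; this sketch does not reproduce the list any browser actually ships.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_query_params(url, blocked):
    """Return `url` with every query parameter whose name is in `blocked`
    removed; all other parameters keep their original order."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in blocked
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

For instance, calling `strip_query_params("https://publisher.example/page?userId=5789rhkdsaf8urfnsd&page=2", {"userId"})` leaves only the `page` parameter. Note that `urlencode` re-serializes the remaining parameters, so their percent-encoding may be normalized.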
5.1.3. Brave
Brave uses four list-based approaches to combat navigational tracking. First, Brave strips query parameters commonly used for navigational tracking from URLs on navigation. This list is maintained by Brave.
Second, by default, when i) the user is about to visit a list-identified bounce-tracking URL, and ii) the current profile does not contain any cookies or localStorage for that site, Brave will create a new, "ephemeral", empty storage area for the site. This storage area persists as long as the user has any top-level frames open for the site. As soon as the user has no top-level frames for the labeled bounce-tracking site, the ephemeral storage area is deleted.
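The lifetime rule for the ephemeral storage area can be sketched with a per-site count of open top-level frames. The `Profile` class and its method names are assumptions made for illustration and do not correspond to Brave's code.

```python
class Profile:
    """Toy browser profile that tracks storage and open top-level frames."""

    def __init__(self, bounce_sites):
        self.bounce_sites = set(bounce_sites)  # list-identified sites
        self.persistent = {}   # site -> ordinary storage area
        self.ephemeral = {}    # site -> ephemeral storage area
        self.open_frames = {}  # site -> number of open top-level frames

    def open_top_level(self, site):
        """Open a top-level frame and return the storage area it will use."""
        self.open_frames[site] = self.open_frames.get(site, 0) + 1
        if site in self.ephemeral:
            return self.ephemeral[site]
        # Listed site with no existing storage: use a fresh ephemeral area.
        if site in self.bounce_sites and not self.persistent.get(site):
            return self.ephemeral.setdefault(site, {})
        return self.persistent.setdefault(site, {})

    def close_top_level(self, site):
        """Close a top-level frame; drop ephemeral storage with the last one."""
        self.open_frames[site] -= 1
        if self.open_frames[site] == 0:
            del self.open_frames[site]
            self.ephemeral.pop(site, None)
```

Anything a listed site writes while it is open is discarded once its last top-level frame closes, so a later bounce through the same site starts from empty storage.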
Third, in the non-default, "aggressive blocking" configuration, Brave uses popular crowd-sourced filter lists (e.g., EasyList, EasyPrivacy, uBlock Origin) to identify URLs that are used for bounce tracking, and will preempt the navigation with an interstitial (similar to Google SafeBrowsing), giving the user the option to continue the navigation or cancel it.
Fourth, Brave uses a list-based approach for identifying bounce tracking URLs where the destination URL is present in the URL of the intermediate tracking URL. In such cases, Brave will skip the intermediate navigation and request the destination URL instead. For example, if Brave Browser observes the user about to navigate to the URL https://tracker.example/bounce?dest=https://destination.example/, the browser might replace the navigation to tracker.example/bounce with a navigation to https://destination.example/. This list is maintained by Brave, and is drawn from a mix of crowd-sourcing and existing open-source projects.
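Skipping the intermediate navigation amounts to extracting the destination from the tracking URL. The sketch below assumes a hypothetical rule table keyed by host and path; the real list format is not described here, and the scheme check is an added precaution rather than documented behavior.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical rule list: (host, path) -> query parameter holding the destination.
DEBOUNCE_RULES = {("tracker.example", "/bounce"): "dest"}


def debounce(url):
    """Return the embedded destination URL if `url` matches a rule,
    otherwise return `url` unchanged."""
    parts = urlsplit(url)
    param = DEBOUNCE_RULES.get((parts.hostname, parts.path))
    if param is None:
        return url
    values = parse_qs(parts.query).get(param)
    if not values:
        return url
    destination = values[0]
    # Only follow destinations that are themselves web URLs.
    if urlsplit(destination).scheme not in ("http", "https"):
        return url
    return destination
```

Applied to the example URL above, `debounce` returns `https://destination.example/`, so the request to `tracker.example` is never made.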
6. Bounce Tracking Mitigations
The content of this section will provide a "monkey patch" specification for bounce tracking mitigations. There is a Chromium-oriented explainer for this work, but the text in this section is intended for adoption across all browsers. This section is not complete yet, and as the algorithms are developed, they will be specified here and presented for review.
The following is a work-in-progress and does not yet reflect any consensus in the PrivacyCG.
6.1. Data Model
-
TODO: Define how bounce tracking information is stored; e.g. sites, timestamps, etc.
TODO: Define a recurring global timer to run the analyze and delete algorithm.
The user agent holds a user activation map which is a map of site hosts to moments. The moments represent the most recent wall clock time at which the user activated a top-level document on the associated host.
The user agent holds a candidate bounce tracking map which is a map of site hosts to moments. The moments represent the most recent wall clock time at which a page on the given host performed an action that could indicate bounce tracking took place.
The bounce tracking grace period is an implementation-defined duration that represents the length of time after a possible bounce tracking event during which the user agent will wait for an interaction before deleting a host's storage.
The bounce tracking activation lifetime is an implementation-defined duration that represents how long user activations will protect a host from storage deletion.
The bounce tracking timer period is an implementation-defined duration that represents how often to run the bounce tracking timer algorithm.
TODO: Provide reasonable example values for these constants.
Schemeless site is used as the data structure key because by default cookies are sent to both http:// and https:// pages on the same domain.
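One possible in-memory shape for this data model is sketched below. The class and field names are assumptions, and the three durations are placeholders chosen only so the example runs; the specification leaves them implementation-defined and does not yet suggest values.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class BounceTrackingState:
    # Site host -> most recent wall clock time of a user activation.
    user_activation_map: dict[str, datetime] = field(default_factory=dict)
    # Site host -> most recent wall clock time of a possible bounce.
    candidate_bounce_tracking_map: dict[str, datetime] = field(default_factory=dict)
    # Placeholder durations only (assumptions, not values from the spec).
    grace_period: timedelta = timedelta(hours=1)
    activation_lifetime: timedelta = timedelta(days=45)
    timer_period: timedelta = timedelta(hours=1)
```

Both maps are keyed by host string alone, which mirrors the schemeless-site choice: `http://` and `https://` pages on the same domain share one entry.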
6.2. Algorithms
-
TODO: Define the steps necessary to detect and store a "bounce".
TODO: Define the steps to analyze information in the data model and delete appropriate sites.
6.2.1. User Activation Monkey Patch
To record a user activation given a Document document, perform the following steps:
-
Let navigable be document’s node navigable.
-
If navigable is null, then abort these steps.
-
Let topDocument be navigable’s top-level traversable's active document.
-
Let origin be topDocument’s origin.
-
If origin is an opaque origin then abort these steps.
-
Let site be the result of running obtain a site given origin.
-
Set user activation map[site’s host] to topDocument’s relevant settings object's current wall time.
Append the following steps to the activation notification steps in the user activation processing model:
-
Run record a user activation given document.
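The steps above can be sketched with simplified stand-ins for the HTML concepts. Everything here is an assumption made for illustration: `Document` collapses the navigable and top-level traversable lookups into two fields, an opaque origin is modelled as `None`, and `site_host` returns the origin's host as-is, whereas the real "obtain a site" algorithm derives the registrable domain.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Origin:
    scheme: str
    host: str


@dataclass
class Document:
    # Origin of the top-level document; None models an opaque origin.
    top_level_origin: Optional[Origin]
    # False models a document whose node navigable is null.
    has_navigable: bool = True


def site_host(origin):
    """Simplified stand-in for 'obtain a site' followed by taking its host."""
    return origin.host


def record_user_activation(document, user_activation_map, now=None):
    if not document.has_navigable:
        return  # navigable is null: abort
    origin = document.top_level_origin
    if origin is None:
        return  # opaque origin: abort
    if now is None:
        now = datetime.now(timezone.utc)
    user_activation_map[site_host(origin)] = now
```

Because the key is taken from the top-level document, an activation inside an embedded frame is credited to the site the user sees in the address bar, not to the frame's own site.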
6.2.2. Timer
To run the bounce tracking timer algorithm given a moment on the wall clock now, perform the following steps:
For each host -> bounceTime of candidate bounce tracking map:
If bounceTime + bounce tracking grace period is after now, then continue.
Let activationTime be user activation map[host].
If activationTime is not null and activationTime + bounce tracking activation lifetime is after now, then continue.
If there is a top-level traversable whose active document's origin's site's host equals host, then continue.
Clear cookies for host given host.
Clear non-cookie storage for host given host.
Clear cache for host given host.
TODO: Do something to prevent repeated deletions, etc.
TODO: Consider if we should do anything when the clock is moved forward or backward.
Every bounce tracking timer period the user agent should run the bounce tracking timer algorithm given the wall clock's unsafe current time.
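The timer algorithm translates fairly directly into code. In this sketch the function signature is an assumption: the two maps are plain dictionaries, `open_top_level_hosts` stands in for the check against open top-level traversables, and `clear_host` is a hypothetical callback that performs the three clearing steps. As in the text, nothing is removed from the candidate map after deletion, which the TODO about repeated deletions leaves open.

```python
from datetime import datetime, timedelta


def run_bounce_tracking_timer(now, candidates, activations, open_top_level_hosts,
                              grace_period, activation_lifetime, clear_host):
    """Clear state for each candidate host that is past its grace period,
    has no recent user activation, and is not open in a top-level context.
    Returns the list of hosts that were cleared."""
    cleared = []
    for host, bounce_time in list(candidates.items()):
        if bounce_time + grace_period > now:
            continue  # still inside the grace period
        activation_time = activations.get(host)
        if activation_time is not None and activation_time + activation_lifetime > now:
            continue  # protected by a recent user activation
        if host in open_top_level_hosts:
            continue  # the user currently has this site open
        clear_host(host)
        cleared.append(host)
    return cleared
```

A host is spared by any one of the three checks, so deletion happens only when all of them fail.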
6.2.3. Deletion
The cookie and cache clearing algorithms were largely copied from the Clear Site Data spec. It would be nice to unify these in the future.
To clear cookies for host given a host host, perform the following steps:
Let cookieList be the set of cookies from the cookie store whose domain attribute is a domain-match with host.
For each cookie in cookieList:
Remove cookie from the cookie store.
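Cookie clearing can be sketched using the domain-match rule from RFC 6265. Two assumptions are made here and should be read as such: cookies are modelled as dictionaries with a `domain` key, and "is a domain-match with host" is interpreted as the cookie's domain being identical to host or a subdomain of it.

```python
import ipaddress


def is_ip_address(value):
    """Return True if `value` parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def domain_match(string, domain_string):
    """RFC 6265 domain-match: identical, or `string` is a host name that
    ends with '.' followed by `domain_string`."""
    string, domain_string = string.lower(), domain_string.lower()
    if string == domain_string:
        return True
    return string.endswith("." + domain_string) and not is_ip_address(string)


def clear_cookies_for_host(cookie_store, host):
    """Remove matching cookies from `cookie_store` (a list) and return them."""
    cookie_list = [c for c in cookie_store if domain_match(c["domain"], host)]
    for cookie in cookie_list:
        cookie_store.remove(cookie)
    return cookie_list
```

The leading-dot check prevents `nottracker.example` from matching `tracker.example` on a bare suffix comparison.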
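To clear non-cookie storage for host given a host host, perform the following steps:
For each storage shed shed held by the user agent or a traversable navigable:
To clear cache for host given a host host, perform the following steps:
Let cacheList be the set of entries from the network cache whose target URI host equals host.
For each entry in cacheList:
Remove entry from the network cache.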
Acknowledgements
Many thanks to the Privacy Community Group for many good discussions about this proposal.