Navigational-Tracking Mitigations

1. Introduction

This section is non-normative.

Browsers are working to prevent cross-site tracking, which threatens user privacy. In addition to third-party cookies and storage, other client-side methods exist that enable cross-site tracking. Navigational tracking correlates user identities across sites during navigations between those sites. Navigational tracking uses link decoration to convey information, but not all link decoration is tracking. This project attempts to distinguish tracking from non-tracking navigation and to prevent the tracking without damaging similar but benign navigations.

2. Infrastructure

This specification depends on the Infra standard. [INFRA]

3. Terminology

Link decoration is when the source of a hyperlink "decorates" its URL with extra information beyond what’s necessary to identify the page a user wants to navigate to. This information can be placed almost anywhere inside the URL.

Navigational tracking refers to the general use of one or more navigations to identify that a user on one site is the same person as a user on another site. Navigations transmit information cross-site in a few different ways, including in the target URL, which might be decorated, and in the timing of the request.

Examples and non-examples of link decoration and navigational tracking, with the potential decoration or tracking element named in parentheses after each URL:

https://publisher.example/page?userId=5789rhkdsaf8urfnsd (element: 5789rhkdsaf8urfnsd): Link decoration, and also navigational tracking.
https://bookshop.org/a/1122/9780062252074 (element: 1122): Link decoration but not navigational tracking. This number identifies an affiliate to credit with a book sale. Replacing it with another number leads to the same target page.
https://bookshop.org/a/1122/9780062252074 (element: 9780062252074): Not decoration. This number identifies a particular book. Changing it yields a different target page.
https://bugzilla.mozilla.org/show_bug.cgi?id=1460058 (element: 1460058): Not decoration. Changing the number changes which bug the user sees.
https://www.google.com/maps/@37.4220328,-122.0847584,17.12z (element: 37.4220328,-122.0847584,17.12z): Changing the numbers changes what map the user sees, and embedding a user ID would not successfully transfer that user ID to the target site. However, it is hard for an automated system inside a browser to prove that, and hard even for humans reading the URL to be confident of it. [Issue #4]
https://publisher.example/unsubscribe?userId=5789rhkdsaf8urfnsd (element: 5789rhkdsaf8urfnsd): The URL identifies an action rather than a page, and the user ID might be essential for that action to happen. However, this is also clearly a user ID and is sufficient to track a user if the source and target collaborate. [Issue #5]
https://example.com/auth/callback?token=1234567 (element: 1234567): This is probably the same case as the unsubscribe link. [Issue #5]
https://example.com/login?returnto=item/12345 (element: item/12345): Assuming a request for this URL shows a login page instead of immediately redirecting to item/12345, this is link decoration but not navigational tracking.

Bounce tracking refers to the use of redirects in a top-level context (including HTTP 3xx statuses, meta elements with http-equiv=refresh attributes, and script-directed navigation that doesn’t wait for user input) along with link decoration to join user identities between sites. Bounce tracking is a subset of navigational tracking and can include automated navigation through the same or different sites from the source or ultimate destination of a link.

Tracking via a bounce through an aggregation domain:

The content publisher’s page (on publisher.example) embeds a third-party script from tracker.example.
The third-party script tries to read an already-stored identifier, for example one it has set into publisher.example's storage or one read from a third-party tracker.example iframe.
If it can’t, it redirects the top level page to tracker.example using window.location.
During this load tracker.example is the first party and can read and write its cookie jar.
tracker.example redirects back to the original page URL, with that URL decorated with its user ID in a query parameter.
The tracker.example user ID is now available on publisher.example and can be saved into its first-party storage so that future visits don’t need to bounce.
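
The steps above can be modeled as a small simulation. This is only a sketch under stated assumptions: the two dictionaries stand in for browser storage, and the function names, the "return" and "userId" query parameters, and the /bounce path are hypothetical and chosen for illustration.

```python
from typing import Dict, Optional
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical in-memory stand-ins for browser storage.
tracker_cookies: Dict[str, str] = {}    # tracker.example's first-party cookie jar
publisher_storage: Dict[str, str] = {}  # publisher.example's first-party storage


def tracker_bounce(url: str) -> str:
    """Handle the top-level load on tracker.example; return the redirect target."""
    return_to = parse_qs(urlsplit(url).query)["return"][0]
    # As the first party during this load, the tracker can read or set its cookie.
    user_id = tracker_cookies.setdefault("uid", "5789rhkdsaf8urfnsd")
    separator = "&" if urlsplit(return_to).query else "?"
    # Decorate the original page URL with the tracker's user ID.
    return f"{return_to}{separator}{urlencode({'userId': user_id})}"


def publisher_script(page_url: str) -> Optional[str]:
    """Third-party script on the publisher page; return a redirect URL or None."""
    decorated = parse_qs(urlsplit(page_url).query).get("userId")
    if decorated:
        # Save the decorated ID so that future visits do not need to bounce.
        publisher_storage["tracker_uid"] = decorated[0]
    if "tracker_uid" in publisher_storage:
        return None
    # No stored identifier: redirect the top-level page to the tracker.
    return "https://tracker.example/bounce?" + urlencode({"return": page_url})
```

Calling publisher_script on an undecorated page URL yields the bounce URL, tracker_bounce on that URL yields the decorated page URL, and a second call to publisher_script with the decorated URL stores the identifier and returns None.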

4. Threat model

This section will precisely define the goals and non-goals of this specification’s mitigations. It will define a few classes of actors with the ability to modify websites in particular ways. Then it will define what cross-site information each of these actors can or cannot learn.

4.1. Threat actors

TODO

5. Considered Alternatives

This section is non-normative.

So far, the alternative designs consist of mitigations that various browsers have already deployed.

5.1. Deployed Mitigations

Some browsers have deployed and announced protections against navigational tracking. This section is a work in progress that details which protections have shipped or are planned. It is not comprehensive.

5.1.1. Safari

Safari uses an algorithmic approach to combat navigational tracking. Safari classifies a site as having cross-site tracking capabilities if the following criteria are met within a particular client:

The site appears as a third-party resource under enough different registrable domains.
The site automatically redirects the user to enough other sites, immediately or after a short delay.
The site redirects to sites that are classified as trackers, recursively.

For example, consider the case of a user clicking on a link on start.example, which redirects to second.example, which redirects to third.example, which redirects to end.example. If Safari has classified third.example as having tracking capabilities, the above behavior can result in Safari classifying second.example as having cross-site tracking capabilities.
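
The recursive criterion in this example can be sketched as a fixed-point computation over observed redirects. The function name and data shapes are assumptions for illustration; the sketch models only the "redirects to classified trackers" criterion and none of Safari's other thresholds.

```python
from typing import Dict, Set


def propagate_classification(
    redirects: Dict[str, Set[str]], classified: Set[str]
) -> Set[str]:
    """Return `classified` plus every site that redirects, directly or
    through other newly classified sites, to a classified site."""
    result = set(classified)
    changed = True
    while changed:
        changed = False
        for source, targets in redirects.items():
            # A site that redirects to any classified site becomes classified.
            if source not in result and targets & result:
                result.add(source)
                changed = True
    return result


# Redirects from the example: the click on start.example is a user navigation,
# not a redirect, so start.example has no entry here.
observed = {
    "second.example": {"third.example"},
    "third.example": {"end.example"},
}
newly_classified = propagate_classification(observed, {"third.example"})
```

With third.example as the only initially classified site, the result contains second.example and third.example, but not end.example.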

If a user navigates or is redirected from a classified tracker with a URL that includes either query parameters or a URL fragment, the lifetime of client-side set cookies on the destination page is capped at 24 hours.
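
A minimal sketch of this cap follows, assuming the browser already knows whether the navigation came from a classified tracker. The function and parameter names are hypothetical.

```python
from datetime import timedelta
from urllib.parse import urlsplit

COOKIE_LIFETIME_CAP = timedelta(hours=24)


def capped_cookie_lifetime(
    from_classified_tracker: bool, destination_url: str, requested: timedelta
) -> timedelta:
    """Return the lifetime to apply to a client-side set cookie."""
    parts = urlsplit(destination_url)
    # The cap applies only when the URL carries a query or a fragment.
    if from_classified_tracker and (parts.query or parts.fragment):
        return min(requested, COOKIE_LIFETIME_CAP)
    return requested
```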

5.1.2. Firefox

Firefox uses a list-based approach to combat navigational tracking. Sites on the Disconnect list are considered tracking sites. All storage for tracking sites is cleared after 24 hours, unless the user has interacted with the site in the first-party context in the last 45 days.

Firefox is also starting to remove query parameters known to be used for cross-site tracking ([FSN-2021-Q4]). The affected query parameters are chosen using the criteria in the Mozilla Anti Tracking Policy, which include:

High-entropy parameters that might identify a user or encode user data, except:
- Parameters exclusively identifying specific elements or actions on the navigating page (per-click or per-element identifiers), as long as those parameters assign a different value to each click or element they are identifying.
- Identifiers necessary to complete a user-initiated task such as logging in or submitting a form.
High-entropy parameters that are broadly included in nearly all outgoing navigations from a site, even if the parameters don’t uniquely identify a user.

As of May 2022, this query-parameter stripping is applied by default in the Firefox Nightly build, and is planned to be enabled in strict ETP mode and in private browsing.

5.1.3. Brave

Brave uses four list-based approaches to combat navigational tracking.

First, Brave strips query parameters commonly used for navigational tracking from URLs on navigation. This list is maintained by Brave.
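
Query-parameter stripping of this kind can be sketched with Python's standard urllib.parse module. The parameter names below are illustrative only and are not a copy of any browser's actual list; the function name is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Illustrative names only; a real browser ships and maintains its own list.
TRACKING_PARAMS = {"fbclid", "gclid"}


def strip_tracking_params(url: str) -> str:
    """Return `url` with listed query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in TRACKING_PARAMS
    ]
    # Rebuild the URL; path and fragment are left untouched.
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Because the query is parsed and re-encoded, the remaining parameters may be normalized (for example, percent-encoding may change). An implementation that must preserve the exact bytes of the remaining query would need to edit the string in place instead.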

Second, by default, when i) the user is about to visit a list-identified bounce-tracking URL, and ii) the current profile does not contain any cookies or localStorage for that site, Brave will create a new, "ephemeral", empty storage area for the site. This storage area persists as long as the user has any top-level frames open for the site. As soon as the user has no top-level frames for the labeled bounce-tracking site, the ephemeral storage area is deleted.

Third, in the non-default, "aggressive blocking" configuration, Brave uses popular crowd-sourced filter lists (e.g., EasyList, EasyPrivacy, uBlock Origin) to identify URLs that are used for bounce tracking, and will preempt the navigation with an interstitial (similar to Google SafeBrowsing), giving the user the option to continue the navigation or cancel it.

Fourth, Brave uses a list-based approach for identifying bounce tracking URLs where the destination URL is present in the URL of the intermediate tracking URL. In such cases, Brave will skip the intermediate navigation and request the destination URL instead. For example, if Brave Browser observes the user about to navigate to the URL https://tracker.example/bounce?dest=https://destination.example/, the browser might replace the navigation to tracker.example/bounce with a navigation to https://destination.example/. This list is maintained by Brave, and is drawn from a mix of crowd-sourcing and existing open-source projects.
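
The rewrite in this example can be sketched as follows. The rule table, which maps a bounce-tracker host to the query parameter that carries the destination, is a hypothetical data shape and not Brave's actual list format.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical rule list: bounce-tracker host -> parameter holding the destination.
BOUNCE_RULES = {"tracker.example": "dest"}


def debounce(url: str) -> str:
    """Return the embedded destination for a listed bounce URL, else `url`."""
    parts = urlsplit(url)
    param = BOUNCE_RULES.get(parts.hostname or "")
    if param is None:
        return url  # host is not on the list
    candidates = parse_qs(parts.query).get(param, [])
    # Only follow embedded destinations that are ordinary web URLs.
    if candidates and urlsplit(candidates[0]).scheme in ("http", "https"):
        return candidates[0]
    return url
```

Restricting the result to http and https destinations is a design choice of this sketch, so that a crafted parameter value cannot turn the rewrite into a navigation to some other scheme.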

6. Bounce Tracking Mitigations

The content of this section will provide a "monkey patch" specification for bounce tracking mitigations. There is a Chromium-oriented explainer for this work, but the text in this section is intended for adoption across all browsers. This section is not complete yet, and as the algorithms are developed, they will be specified here and presented for review.

The following is a work-in-progress and does not yet reflect any consensus in the PrivacyCG.

6.1. Data Model

TODO: Define how bounce tracking information is stored; e.g. sites, timestamps, etc.

The user agent holds a user activation map which is a map of site hosts to moments. The moments represent the most recent wall clock time at which the user activated a top-level document on the associated host.

The user agent holds a candidate bounce tracking map which is a map of site hosts to moments. The moments represent the most recent wall clock time at which a page on the given host performed an action that could indicate bounce tracking took place.

The bounce tracking grace period is an implementation-defined duration that represents the length of time after a possible bounce tracking event during which the user agent will wait for an interaction before deleting a host's storage.

The bounce tracking activation lifetime is an implementation-defined duration that represents how long user activations will protect a host from storage deletion.

The bounce tracking timer period is an implementation-defined duration that represents how often to run the bounce tracking timer algorithm.

TODO: Provide reasonable example values for these constants.

A schemeless site (that is, the site's host) is used as the key of both maps because, by default, cookies are sent to both http:// and https:// pages on the same domain.

6.2. Algorithms

TODO: Define the steps necessary to detect and store a "bounce".

6.2.1. User Activation Monkey Patch

To record a user activation given a Document document, perform the following steps:

Let navigable be document’s node navigable.
If navigable is null, then abort these steps.
Let topDocument be navigable’s top-level traversable's active document.
Let origin be topDocument’s origin.
If origin is an opaque origin, then abort these steps.
Let site be the result of running obtain a site given origin.
Set user activation map[site’s host] to topDocument’s relevant settings object's current wall time.

Append the following steps to the activation notification steps in the user activation processing model:

Run record a user activation given document.
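
The recording steps can be sketched with simplified stand-in types. The Origin class and obtain_site_host are assumptions of this sketch: a real "obtain a site" computes the registrable domain using the Public Suffix List, which the placeholder below does not do. A top_origin of None models the case where the navigable is null.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass(frozen=True)
class Origin:
    """Simplified stand-in for an origin."""
    host: str
    opaque: bool = False


def obtain_site_host(origin: Origin) -> str:
    # Placeholder assumption: returns the host unchanged. A real implementation
    # would reduce the host to its registrable domain.
    return origin.host


def record_user_activation(
    user_activation_map: Dict[str, datetime],
    top_origin: Optional[Origin],
    now: datetime,
) -> None:
    """Record `now` as the latest activation for the top-level document's site host."""
    if top_origin is None:
        return  # no navigable: abort
    if top_origin.opaque:
        return  # opaque origins are never recorded
    user_activation_map[obtain_site_host(top_origin)] = now


activations: Dict[str, datetime] = {}
record_user_activation(
    activations,
    Origin("publisher.example"),
    datetime(2023, 4, 5, tzinfo=timezone.utc),
)
```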

6.2.2. Timer

To run the bounce tracking timer algorithm given a moment on the wall clock now, perform the following steps:

For each host -> bounceTime of candidate bounce tracking map:
1. If bounceTime + bounce tracking grace period is after now, then continue.
2. Let activationTime be user activation map[host].
3. If activationTime is not null and activationTime + bounce tracking activation lifetime is after now, then continue.
4. If there is a top-level traversable whose active document's origin's site's host equals host, then continue.
5. Clear cookies for host given host.
6. Clear non-cookie storage for host given host.
7. Clear cache for host given host.

TODO: Do something to prevent repeated deletions, etc.

TODO: Consider if we should do anything when the clock is moved forward or backward.

Every bounce tracking timer period, the user agent should run the bounce tracking timer algorithm given the wall clock's unsafe current time.
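
One pass of the timer algorithm can be sketched as below. The function signature is an assumption: the set of hosts with an open top-level traversable and the clear_host callback (standing in for the three clearing algorithms) are passed in by the caller. Like the steps above, the sketch does not remove entries from the candidate map, so the open question about repeated deletions applies to it as well.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Set


def run_bounce_tracking_timer(
    now: datetime,
    candidate_bounces: Dict[str, datetime],
    user_activations: Dict[str, datetime],
    open_top_level_hosts: Set[str],
    grace_period: timedelta,
    activation_lifetime: timedelta,
    clear_host: Callable[[str], None],
) -> List[str]:
    """Run one timer pass and return the hosts whose state was cleared."""
    cleared: List[str] = []
    for host, bounce_time in candidate_bounces.items():
        # Step 1: still inside the grace period.
        if bounce_time + grace_period > now:
            continue
        # Steps 2-3: a recent enough user activation protects the host.
        activation_time = user_activations.get(host)
        if activation_time is not None and activation_time + activation_lifetime > now:
            continue
        # Step 4: the host is currently open in a top-level traversable.
        if host in open_top_level_hosts:
            continue
        # Steps 5-7: clear cookies, non-cookie storage, and cache.
        clear_host(host)
        cleared.append(host)
    return cleared
```

The durations are implementation-defined, so any values a caller passes (for example, a grace period of one hour) are arbitrary choices and not recommendations.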

6.2.3. Deletion

The cookie and cache clearing algorithms were largely copied from the Clear Site Data spec. It would be nice to unify these in the future.

To clear cookies for host given a host host, perform the following steps:

Let cookieList be the set of cookies from the cookie store whose domain attribute is a domain-match with host.
For each cookie in cookieList:
1. Remove cookie from the cookie store.
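
The cookie-clearing steps can be sketched as follows. The Cookie class and the list-based cookie store are stand-ins. The sketch assumes the domain-match rule from RFC 6265, applied with the cookie's domain as the string and host as the domain string, so that cookies scoped to subdomains of host are also cleared; that direction is this sketch's reading of the steps. It also assumes stored domains carry no leading dot.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass
class Cookie:
    name: str
    domain: str


def is_ip_address(value: str) -> bool:
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def domain_match(string: str, domain_string: str) -> bool:
    """Domain-match in the style of RFC 6265, section 5.1.3."""
    string, domain_string = string.lower(), domain_string.lower()
    if string == domain_string:
        return True
    # Suffix match on a label boundary, and only for host names.
    return string.endswith("." + domain_string) and not is_ip_address(string)


def clear_cookies_for_host(cookie_store: List[Cookie], host: str) -> List[Cookie]:
    """Remove matching cookies from `cookie_store` and return them."""
    # Collect first, then remove, so the store is not mutated while scanned.
    cookie_list = [c for c in cookie_store if domain_match(c.domain, host)]
    for cookie in cookie_list:
        cookie_store.remove(cookie)
    return cookie_list
```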

To clear non-cookie storage for host given a host host, perform the following steps:

For each storage shed shed held by the user agent or a traversable navigable:
1. For each storageKey -> storageShelf of shed:
  1. If storageKey’s origin is an opaque origin, then continue.
  2. If storageKey’s origin’s host does not equal host, then continue.
  3. Delete all data stored in storageShelf.
  4. Remove storageKey from shed.
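
A sketch of these steps follows, with each storage shed modeled as a dictionary from storage key to shelf. The StorageKey class is a simplified stand-in in which a host of None models an opaque origin. As in the steps above, hosts are compared for exact equality.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class StorageKey:
    host: Optional[str]  # None models an opaque origin


def clear_non_cookie_storage_for_host(
    sheds: List[Dict[StorageKey, dict]], host: str
) -> int:
    """Clear matching shelves in every shed; return how many keys were removed."""
    removed = 0
    for shed in sheds:
        # Iterate over a copy of the keys because the shed is mutated below.
        for storage_key in list(shed):
            if storage_key.host is None:
                continue
            if storage_key.host != host:
                continue
            shed[storage_key].clear()  # delete all data stored in the shelf
            del shed[storage_key]      # remove the key from the shed
            removed += 1
    return removed
```

Copying the keys before the loop matters: deleting from a dictionary while iterating over it directly raises a RuntimeError in Python.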

To clear cache for host given a host host, perform the following steps:

Let cacheList be the set of entries from the network cache whose target URI host equals host.
For each entry in cacheList:
1. Remove entry from the network cache.

Acknowledgements

Many thanks to the Privacy Community Group for many good discussions about this proposal.

Navigational-Tracking Mitigations

Draft Community Group Report, 4 5 April 2023
