1. Introduction
In today’s web, people’s interests are typically inferred based on observing what sites or pages they visit, which relies on tracking techniques like third-party cookies or less-transparent mechanisms like device fingerprinting. It would be better for privacy if interest-based advertising could be accomplished without needing to collect a particular individual’s browsing history.
This specification provides an API that enables ad targeting based on people’s general browsing interests, without exposing their exact browsing history.
2. Interest cohort
The interest cohort is a user’s assigned interest group under a particular cohort assignment algorithm. An interest cohort comprises an interest cohort id and an interest cohort version.
The interest cohort id represents the interest group that the user is assigned to by the cohort assignment algorithm. The total number of groups should not exceed 2^32, and each group can be mapped to a 32-bit integer. The interest cohort id can be invalid, which means no group is assigned.
The string representation of the interest cohort id is the string representation of the mapped integer of the interest cohort id in decimal (e.g. “17319”). If the interest cohort id is invalid, the string representation will be an empty string.
The interest cohort version identifies the algorithm used to compute the interest cohort id.
The string representation of the interest cohort version is implementation-defined. It’s recommended that the browser vendor name is part of the version (e.g. “chrome.2.1”, “v21/mozilla”), so that when exposed to the Web, there won’t be naming collisions across browser vendors. As an exception, if two browsers choose to deliberately use the same cohort assignment algorithm, they should pick some other way to give it an unambiguous name and avoid collisions.
The InterestCohort dictionary is used to contain the string representation of the interest cohort id and the string representation of the interest cohort version.
dictionary InterestCohort {
  DOMString id;
  DOMString version;
};
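As an illustration of the string representations described above, the following TypeScript sketch builds an InterestCohort value from an id and a version. The helper names (`CohortId`, `cohortIdToString`, `toInterestCohort`) are hypothetical and not part of this specification. Treating the mapped integer as unsigned and modeling an invalid id as `null` are assumptions made for the example.

```typescript
// Sketch only: these names are illustrative, not defined by the specification.
interface InterestCohort {
  id: string;
  version: string;
}

// Assumption: a valid id is an unsigned 32-bit integer; null models an invalid id.
type CohortId = number | null;

function cohortIdToString(id: CohortId): string {
  if (id === null) {
    return ""; // an invalid id is represented by the empty string
  }
  if (!Number.isInteger(id) || id < 0 || id > 0xffffffff) {
    throw new RangeError("interest cohort id must fit in 32 bits");
  }
  return id.toString(10); // decimal representation, e.g. "17319"
}

function toInterestCohort(id: CohortId, version: string): InterestCohort {
  return { id: cohortIdToString(id), version };
}
```

For example, `toInterestCohort(17319, "chrome.2.1")` yields a dictionary whose `id` is "17319", while an invalid id yields an empty `id` string.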
3. The API
The interest cohort API lives under the Document interface since the access permission is tied to the document scope, and the API is only available if the document is in a secure context.
partial interface Document {
  Promise<InterestCohort> interestCohort();
};
The interestCohort() method steps are:
1. Let p be a new promise.
2. Run the following steps in parallel:
   1. If any of the following is true:
      - this is not allowed to use the "interest-cohort" feature.
      - The document is not allowed to access the interest cohort per user preference settings.
      - The user agent believes that too many high-entropy bits of information have already been consumed by the given document, and exposing an interest cohort would violate a privacy budget.
      - The cohort assignment algorithm is unavailable.
      then:
      1. Queue a global task on the interest cohort task source given this's relevant global object to reject p with a "NotAllowedError" DOMException.
      2. Abort these steps.
   2. Let id be the interest cohort id from running the cohort assignment algorithm.
   3. Let version be the interest cohort version corresponding to the cohort assignment algorithm.
   4. Queue a global task on the interest cohort task source given this's relevant global object to perform the following steps:
      1. Let d be the InterestCohort dictionary, with id being the string representation of id, and version being the string representation of version.
      2. Resolve p with d.
3. Return p.
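From a page's point of view, the method steps above mean that a call either resolves with an InterestCohort dictionary or rejects with a "NotAllowedError". The following TypeScript sketch shows one way a caller might handle both outcomes. The `CohortDocument` interface and the `readCohort` helper are hypothetical names introduced for the example. Returning `null` when access is denied is a design choice of this sketch, not something the specification requires.

```typescript
// Sketch only: CohortDocument and readCohort are illustrative names.
interface InterestCohort {
  id: string;
  version: string;
}

// Models a Document that may or may not expose interestCohort().
interface CohortDocument {
  interestCohort?: () => Promise<InterestCohort>;
}

async function readCohort(doc: CohortDocument): Promise<InterestCohort | null> {
  if (typeof doc.interestCohort !== "function") {
    return null; // the API is not exposed (e.g. unsupported browser)
  }
  try {
    return await doc.interestCohort();
  } catch (err) {
    // The method steps reject with a "NotAllowedError" DOMException when
    // access is denied or the algorithm is unavailable.
    const name = (err as { name?: unknown } | null)?.name;
    if (name === "NotAllowedError") {
      return null;
    }
    throw err; // anything else is unexpected; let the caller see it
  }
}
```

In a browser that implements the API, a caller would pass `document` (cast to `CohortDocument`) and treat a `null` result as "no cohort available".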
4. Interpretation
Organizations that wish to interpret cohorts can observe the habits of each interest cohort, and ad targeting can then be partly based on what group the person falls into. The browser vendors could publicly share more information about the interest cohort id (e.g. the range of numbers, whether they have semantics, etc.) or the interest cohort version (e.g. the algorithm detail, the compatibility between versions, etc.) to help with their modeling decisions.
5. Cohort assignment algorithm
The browser could use machine learning algorithms to develop the interest cohort id to expose to a given document.
5.1. Input and output
The input features to the algorithm should be based on information from the browsing history, which may include the URLs, the page contents, or other factors. The input features should be kept local on the browser and should not be uploaded elsewhere.
The output of the algorithm is the interest cohort id .
5.2. Caching the result
For performance reasons and/or to mitigate the risk of recovering the browsing history from cohorts, the algorithm could return a cached interest cohort id that was computed recently, instead of computing one from scratch.
5.3. Privacy guarantees
The algorithm should have the following privacy properties. Sometimes generating an invalid interest cohort id may be helpful to meet these guarantees.
5.3.1. Anonymity
The browser should ensure that the interest cohort ids are well distributed, so that each represents thousands of people, where a person is considered to be associated with an interest cohort id if that interest cohort id was recently computed for them. The browser may further leverage other anonymization methods, such as differential privacy.
5.3.2. No browsing history recovering from cohorts
The browser should ensure that the interest cohort ids exposed to any given site do not reveal the browsing history.
5.3.3. No sensitive cohorts
The browser should ensure that the interest cohort ids are not correlated with sensitive information.
6. Permissions policy integration
This specification defines a policy-controlled feature identified by the string "interest-cohort". Its default allowlist is *.
7. Privacy considerations
7.1. Permission
7.1.1. Eligibility for a page to be included in the interest cohort computation
By default, a page is eligible for the interest cohort computation if the interestCohort() API is used in the page. The page can opt itself out of the interest cohort computation through the "interest-cohort" policy-controlled feature. [PERMISSIONS-POLICY]
The user agent should offer a dedicated permission setting for the user to disallow sites from being included for interest cohort calculations.
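A page opts out through the policy-controlled feature by sending a Permissions-Policy response header with an empty allowlist for "interest-cohort". The following TypeScript sketch adds that directive to a plain header map. The `withCohortOptOut` helper is a hypothetical name. The sketch assumes the map uses the exact key "Permissions-Policy" and that an existing value does not already mention the feature.

```typescript
// Sketch only: withCohortOptOut is an illustrative helper, not a real library API.
const POLICY_HEADER = "Permissions-Policy";

// An empty allowlist disables the feature for the page and its subframes.
const COHORT_OPT_OUT = "interest-cohort=()";

function withCohortOptOut(headers: Record<string, string>): Record<string, string> {
  const existing = headers[POLICY_HEADER];
  // Permissions-Policy directives are comma-separated, so append to any existing value.
  const value = existing ? `${existing}, ${COHORT_OPT_OUT}` : COHORT_OPT_OUT;
  return { ...headers, [POLICY_HEADER]: value };
}
```

A server would attach the returned headers to its HTML responses. The original map is left unmodified so the helper can be applied per response.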
7.1.2. Permission to access the interest cohort
The page can restrict itself or subframes from accessing the interest cohort through the "interest-cohort" policy-controlled feature. [PERMISSIONS-POLICY]
The API will return a rejected promise if the user has specifically disallowed the site from accessing the interest cohort.
7.1.3. Private browsing / Incognito mode
The interest cohort computation algorithm and the interestCohort() API method are applicable to the private browsing mode as well. That is, if the private browsing mode doesn’t save history at all, the "information from the browsing history" is expected to just be an empty set.
7.1.4. Adoption phase
To make adoption easier, the user agent may relax the opt-in requirement while third-party cookies still exist. For example, pages with ad resources are an approximation of the pages that are going to opt in to interest cohort computation in the long run. Thus, during the adoption phase, a page can be eligible to be included in the interest cohort computation if there are ad resources in the page or if the API is used.
Additionally, during the adoption phase, the browser can use the existing cookie settings to approximate the interest cohort permission setting. For example, a page is not allowed to contribute to the interest cohort calculation if cookies are disallowed for that site; when cookies are cleared, previous page visits should not be used for interest cohort computation; access to the interest cohort within a Document should be denied if cookie access is not allowed in the document, or when third-party cookies are disallowed in general.
7.2. Sensitive information
An interest cohort might reveal sensitive information. As a first mitigation, the browser should remove sensitive categories from its data collection. But this does not mean sensitive information can’t be leaked. Some people are sensitive to categories that others are not, and there is no globally accepted notion of sensitive categories.
Cohorts could be evaluated for fairness by measuring and limiting their deviation from population-level demographics with respect to the prevalence of sensitive categories, to prevent their use as proxies for a sensitive category. However, this evaluation would require knowing how many individual people in each cohort were in the sensitive categories, information which could be difficult or intrusive to obtain. As an approximation, the browser could use a mechanism for recognizing which web pages are in sensitive categories.
It should be clear that FLoC will never be able to prevent all misuse. There will be categories that are sensitive in contexts that weren’t predicted. Beyond FLoC’s technical means of preventing abuse, sites that use cohorts will need to ensure that people are treated fairly, just as they must with algorithmic decisions made based on any other data today.
7.3. Tracking people via their interest cohort
An interest cohort could be used as a user identifier. It may not have enough bits of information to individually identify someone, but in combination with other information (such as an IP address), it might. One design mitigation is to ensure cohort sizes are large enough that they are not useful for tracking. In addition, if the user agent believes that too many high-entropy bits of information have already been consumed by a given Document, then the interestCohort() algorithm will return a rejected promise, which can help mitigate such tracking.
If, for any short time period, the interest cohorts exposed to different sites tend to be the same, then the time series of interest cohorts can also be used as a user identifier. Sites could associate users' first-party identity with a series of interest cohorts observed over time, and could report these series to a single tracking service. The tracking service could then associate each series with the sites that reported it, and so learn the browsing history of an individual.
7.4. Recovering the browsing history from cohorts
Updating the interest cohort too often may increase the likelihood of identifying portions of a user’s browsing history, for instance by using compressed sensing.
One possible mitigation is: when the interest cohort is computed and exposed to an origin, pin that interest cohort to that origin for a period of time. When an interest cohort is pinned to an origin, the execution of the cohort assignment algorithm on that origin will return the cached interest cohort instead of computing a new one.
If the browser decides to cache interest cohorts, it should ensure proper handling of data deletion:
- When site data are deleted, and some cached interest cohorts are derived from any affected site, those interest cohorts should be cleared.
- When the browsing history is deleted and some cached interest cohorts are derived from any deleted browsing history, those interest cohorts should be cleared.
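The pinning mitigation and the deletion rules above can be sketched as a small per-origin cache. This TypeScript example is an illustration under stated assumptions, not the method any browser uses. The `CohortPinCache` class, its method names, the pin duration, and the idea of recording which sites each cached cohort was derived from are all hypothetical.

```typescript
// Sketch only: every name here is illustrative, not defined by the specification.
interface ComputedCohort {
  id: string;            // string representation of the interest cohort id
  derivedFrom: string[]; // sites whose history contributed to this cohort
}

interface PinnedCohort extends ComputedCohort {
  computedAt: number; // timestamp in milliseconds
}

class CohortPinCache {
  private pins = new Map<string, PinnedCohort>();
  private pinDurationMs: number;

  constructor(pinDurationMs: number) {
    this.pinDurationMs = pinDurationMs;
  }

  // Returns the cohort pinned to `origin` if the pin is still fresh;
  // otherwise computes a new cohort and pins it.
  get(origin: string, now: number, compute: () => ComputedCohort): string {
    const pin = this.pins.get(origin);
    if (pin !== undefined && now - pin.computedAt < this.pinDurationMs) {
      return pin.id;
    }
    const fresh = compute();
    this.pins.set(origin, { ...fresh, computedAt: now });
    return fresh.id;
  }

  // Clears every pinned cohort derived from any deleted site, covering both
  // site-data deletion and browsing-history deletion.
  onSitesDeleted(deletedSites: string[]): void {
    const deleted = new Set(deletedSites);
    this.pins.forEach((pin, origin) => {
      if (pin.derivedFrom.some((site) => deleted.has(site))) {
        this.pins.delete(origin);
      }
    });
  }
}
```

With a pin duration of one week, for instance, repeated calls from the same origin within that week would observe the same cohort, and deleting the history for a contributing site would force a recomputation on the next call.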