1. Introduction
This section is not normative.
Web applications store data locally on a user’s computer in order to provide functionality while the user is offline, and to increase performance when the user is online. These local caches have significant advantages for both users and developers, but present risks as well.
A user’s data is both sensitive and valuable; web developers ought to take reasonable steps to protect it. One such step would be to encrypt data before storing it. Another would be to remove data from the user’s machine when it is no longer necessary (for example, when the user signs out of the application, or deletes their account).
Site authors can remove data from a number of storage mechanisms via JavaScript, but others are difficult to deal with reliably. Consider cookies, for instance, which can be partially cleared via JavaScript access to document.cookie. HttpOnly cookies, however, can only be removed via a number of Set-Cookie headers in an HTTP response. This, of course, requires exhaustive knowledge of all the cookies set for a host, which can be complicated to ascertain. Cache is still harder; no imperative interface to a browser’s network cache exists, period.
This document defines a new mechanism to deal with removing data from these and other types of local storage, giving web developers the ability to clear out a user’s local cache of data via the Clear-Site-Data HTTP response header.
1.1. Examples
1.1.1. Signing Out
A user signs out at https://supersecretsocialnetwork.example.com/logout, and the site author wishes to ensure that locally stored data is removed as a result.
They can do so by sending the following HTTP header in the response:
Clear-Site-Data: "cache", "cookies", "storage", "executionContexts"
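A minimal sketch of how a server might attach this header, assuming a Node.js-style request handler. The name `handleLogout` and the response body are illustrative assumptions, not something this document defines.

```javascript
// Header value taken from the example above.
const CLEAR_ALL_SITE_DATA =
  '"cache", "cookies", "storage", "executionContexts"';

// Hypothetical logout handler. `res` is expected to offer the
// setHeader()/end() methods of Node's http.ServerResponse.
function handleLogout(req, res) {
  // Server-side session invalidation would happen here (omitted).
  res.setHeader("Clear-Site-Data", CLEAR_ALL_SITE_DATA);
  res.setHeader("Content-Type", "text/plain; charset=utf-8");
  res.statusCode = 200;
  res.end("You have been signed out.\n");
}

// Possible wiring with Node's built-in server. The response has to reach the
// browser over HTTPS, because the header is only honored for secure URLs:
// require("node:http").createServer(handleLogout).listen(8080);
```

The header is set on the response to the logout request itself, so the user agent can clear data before rendering the page.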
1.1.2. Targeted Clearing
A user signs out at https://megacorp.example.com/logout. Megacorp has a large number of services available as subdomains, so many that it’s not entirely clear which of them would be safe to clear as a response to a logout action. One option would be to simply clear everything, and deal with the fallout. Megacorp’s CEO, however, once lost hours and hours of progress in "Irate Ibexes" due to inadvertent site-data clearing, and so refuses to allow such a sweeping impact to the site’s users.
The developers know, however, that the "Minus" application is certainly safe to clear out. They can target this specific subdomain by including a request to that subdomain as part of the logout landing page (ideally as a CORS-enabled, CSRF-protected POST):
fetch("https://minus.megacorp.example.com/clear-site-data", {
  method: "POST",
  mode: "cors",
  headers: new Headers({ "CSRF": "[insert sekrit token here]" })
});
That endpoint would return proper CORS headers in response to that request’s preflight, and would return the following header for the actual request:
Clear-Site-Data: "cache", "cookies", "storage", "executionContexts"
1.1.3. Keep Critical Cookies
A user opts out of tracking at https://ads-are-awesome.example.com/optout. The site author wishes to remove DOM-accessible data which might contain tracking information, but needs to ensure that the opt-out cookie which the user has just received isn’t wiped along with it.
They can do so by sending the following HTTP header in the response, which includes all the types except for "cookies":
Clear-Site-Data: "cache", "storage", "executionContexts"
1.1.4. Kill Switch
Site authors can reduce the risk of a persistent client-side XSS by sending the following HTTP header in a response to wipe out local sources of data:
Clear-Site-Data: "cache", "cookies", "storage", "executionContexts"
Note: Installing a Service Worker guarantees that a request will go out to a server every ~24 hours. That update ping would be a wonderful time to send a header like this one in case of catastrophe. [SERVICE-WORKERS]
1.2. Goals
Generally, the goal is to allow web developers more control over the data stored locally by a user agent for their origins. In particular, developers should be able to reliably ensure the following:
- Data stored in an origin’s client-side storage mechanisms like [INDEXEDDB], WebSQL, Filesystem, localStorage, and sessionStorage is cleared.
- Cookies for an origin’s host are removed [RFC6265].
- Web Workers (dedicated and shared) running for an origin are terminated.
- Service Workers registered for an origin are terminated and deregistered.
- Resources from an origin are removed from the user agent’s local cache.
- The Accept-CH cache for an origin is purged.
- None of the above can be bypassed by a maliciously active document that retains interesting data in memory, and rewrites it if it’s cleared.
2. Infrastructure
This document uses ABNF grammar to specify syntax, as defined in [RFC5234] and updated in [RFC7405], along with the #rule extension defined in Section 7 of [RFC7230], and the quoted-string rule defined in Section 3.2.6 of the same document.
This document depends on the Infra Standard for a number of foundational concepts used in its algorithms and prose [INFRA].
3. Clearing Site Data
Developers may instruct a user agent to clear various types of relevant data by delivering a Clear-Site-Data HTTP response header in response to a request.
3.1. The Clear-Site-Data HTTP Response Header Field
The Clear-Site-Data HTTP response header field sends a signal to the user agent that it ought to remove all data of a certain set of types. The header is represented by the following grammar:
Clear-Site-Data = 1#( quoted-string ) ; #rule is defined in Section 7 of RFC 7230.
§ 3.2 Fetch Integration and § 4.1 Parsing describe how the Clear-Site-Data header is processed.
This document defines an initial set of known data types which can be cleared using this mechanism. See their descriptions below. Future versions of the header can support additional datatypes, which MUST comply with the quoted-string grammar. User agents MUST ignore unknown types when parsing the header.
- "cache"
  The "cache" type indicates that the server wishes to remove locally cached data associated with the origin of a particular response’s url. This includes the network cache, of course, but will also remove data from various other caches which a user agent implements (prerendered pages, back/forward caches, script caches, shader caches, Accept-CH cache, etc.).
  Implementation details are in § 4.2.3 Clear cache for origin.
  When delivered with a response from https://example.com/clear, the following header will cause caches associated with the origin https://example.com to be cleared:
  Clear-Site-Data: "cache"
  Note: Caches are typically not organized by origin, but rather by URL and timestamp. This means that in practice, clearing cache by origin might have to be implemented using a linear scan. For large caches, this makes it a prohibitively expensive operation.
- "cookies"
  The "cookies" type indicates that the server wishes to remove cookies associated with the origin of a particular response’s url. Along with cookies, HTTP authentication credentials [RFC7235], and origin-bound tokens such as those defined by Channel ID [CHANNELID] and Token Binding [TOKBIND] are also cleared.
  Implementation details are in § 4.2.4 Clear cookies for origin.
  When delivered with a response from https://example.com/clear, the following header will cause cookies associated with the origin https://example.com to be cleared, as well as cookies on any origin in the same registered domain (e.g. https://www.example.com/ and https://more.subdomains.example.com/):
  Clear-Site-Data: "cookies"
  Note: Clearing cookies should also clear the Accept-CH cache for origin. This is because the cache is also cleared if the user manually clears cookies.
- "storage"
  The "storage" type indicates that the server wishes to remove locally stored data associated with the origin of a particular response’s url. This includes storage mechanisms such as localStorage, sessionStorage, [INDEXEDDB], [WEBDATABASE], etc., as well as tangentially related mechanisms such as service worker registrations.
  Implementation details are in § 4.2.5 Clear DOM-accessible storage for origin.
  When delivered with a response from https://example.com/clear, the following header will cause DOM-accessible storage for the origin https://example.com to be cleared:
  Clear-Site-Data: "storage"
- "executionContexts"
  The "executionContexts" type indicates that the server wishes to neuter and reload execution contexts currently rendering the origin of a particular response’s url.
  When delivered with a response from https://example.com/clear, the following header will cause execution contexts displaying the origin https://example.com to be neutered and reloaded:
  Clear-Site-Data: "executionContexts"
- "clientHints"
  The "clientHints" type indicates that the server wishes to clear the Accept-CH cache for the origin of a particular response’s url.
  When delivered with a response from https://example.com/clear, the following header will cause the Accept-CH cache for origin https://example.com to be cleared:
  Clear-Site-Data: "clientHints"
  Note: The Accept-CH cache is also cleared for the "cache" and "cookies" options, so "clientHints" should be used only when neither of those options (nor "*") is applied.
- "*"
  The "*" (wildcard) pseudotype has the same effect as specifying all types.
  When delivered with a response from https://example.com/clear, the following header will cause all cookies, caches, and DOM-accessible storage associated with the origin https://example.com to be cleared, as well as execution contexts for the same origin to be neutered and reloaded:
  Clear-Site-Data: "*"
  Note: The wildcard is forward-compatible in the sense that if more datatypes are added in future versions of this header, they will also be covered by it.
  DO use the wildcard if the intention is to perform a broad cleanup, i.e. clear all data associated with an origin that the header knows about.
  DO NOT use the wildcard as a shorthand for the four types listed above, as this meaning might change.
Note: The syntax defined here is compatible with future extensions to this document which might add more granular filtering mechanisms to the types we’ve defined. For example, it’s likely that "cookies" will need to grow a mechanism to prevent deletion of specific cookie values. Wrapping all of the type names in double-quotes means that we can easily shift from simple splitting-strings-on-commas processing to something more complicated (like processing the header value as JSON) without losing backwards compatibility.
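On the sending side, a server could assemble a header value that satisfies the grammar in this section by quoting each type and joining the results with commas. The sketch below is illustrative only: `serializeClearSiteData` and `KNOWN_TYPES` are hypothetical names, and rejecting unknown types is a choice made for this sketch (user agents simply ignore them).

```javascript
// The data types defined by this document, plus the wildcard pseudotype.
const KNOWN_TYPES = new Set([
  "cache",
  "cookies",
  "storage",
  "executionContexts",
  "clientHints",
  "*",
]);

// Builds a Clear-Site-Data value such as '"cache", "storage"'.
function serializeClearSiteData(types) {
  // The 1#( quoted-string ) rule requires at least one element.
  if (types.length === 0) {
    throw new RangeError("At least one type is required");
  }
  return types
    .map((type) => {
      if (!KNOWN_TYPES.has(type)) {
        throw new RangeError(`Unknown Clear-Site-Data type: ${type}`);
      }
      // Each element is a quoted-string, so wrap it in double quotes.
      return `"${type}"`;
    })
    .join(", ");
}
```

For example, `serializeClearSiteData(["cache", "storage"])` produces the value `"cache", "storage"`.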
3.2. Fetch Integration
Monkey patching! Talk with Anne.
If the Clear-Site-Data header is present in an HTTP response received from the network, then data MUST be cleared before rendering the response to the user. That is, after step #14 in the current HTTP-network fetch algorithm, execute the following step:
- If credentials flag is set, and response’s header list contains a header named Clear-Site-Data, then execute § 4.2 Clear data for response on response.
Note: This happens after Set-Cookie headers are processed. If we clear cookies, we clear all of them. This is intentional, as removing only certain cookies might leave an application in an indeterminate and vulnerable state. Removing specific cookies is best done via expiration using the Set-Cookie header.
Note: If we clear the Accept-CH cache via the Clear-Site-Data header then any Accept-CH and Critical-CH headers in the same request must be ignored.
Note: While the fetch credentials flag is intended to restrict the modification of cookies, Clear-Site-Data applies the same restriction to all types for the sake of consistency.
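The gating condition in this section can be sketched as a small predicate. The response shape used here (a `headers` map keyed by lowercase name and a `fromServiceWorker` boolean) is an assumption made for illustration; the service worker check reflects the restriction described in § 5.2 Service workers.

```javascript
// Decides whether a response should trigger the clearing algorithm.
// `response` is a hypothetical object: { headers: Map, fromServiceWorker: boolean }.
function shouldClearSiteData(response, credentialsFlag) {
  // Only responses received from the network are considered.
  if (response.fromServiceWorker) return false;
  // The restriction on credentialed fetches applies to every data type.
  if (!credentialsFlag) return false;
  // Header names are case-insensitive; the map is assumed to store them lowercased.
  return response.headers.has("clear-site-data");
}
```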
4. Algorithms
4.1. Parsing
Given a response, the user agent can parse response’s Clear-Site-Data header, returning a list of types, as follows:
- Let types be an empty list.
- Let header be the result of extracting header list values given Clear-Site-Data and response’s header list.
- If header is null or failure, return an empty list.
- For each type in header, execute the first matching statement, if any, switching on type:
  - `"cache"`: Append "cache" and "clientHints" to types.
  - `"cookies"`: Append "cookies" and "clientHints" to types.
  - `"storage"`: Append "storage" to types.
  - `"executionContexts"`: Append "executionContexts" to types.
  - `"clientHints"`: Append "clientHints" to types.
  - `"*"`: Append "cache", "cookies", "storage", "clientHints", and "executionContexts" to types.
- Return types.
Note: All of the existing values can be handled with the simple switch above. If and when more complex type definitions are created, the parser will likely shift over to JSON entirely.
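The parsing steps above can be sketched as follows. This is a simplified illustration, not a conforming implementation: it takes the combined header value as a string (or null when the header is absent) and splits on commas instead of running the full header list value extraction. `parseClearSiteData` and `EXPANSIONS` are hypothetical names.

```javascript
// Maps each quoted type, as it appears on the wire, to the types it appends.
const EXPANSIONS = new Map([
  ['"cache"', ["cache", "clientHints"]],
  ['"cookies"', ["cookies", "clientHints"]],
  ['"storage"', ["storage"]],
  ['"executionContexts"', ["executionContexts"]],
  ['"clientHints"', ["clientHints"]],
  ['"*"', ["cache", "cookies", "storage", "clientHints", "executionContexts"]],
]);

function parseClearSiteData(headerValue) {
  const types = [];
  // A missing header yields an empty list.
  if (headerValue === null || headerValue === undefined) return types;
  for (const raw of headerValue.split(",")) {
    // Matching is exact and case-sensitive; unknown types are ignored.
    const expansion = EXPANSIONS.get(raw.trim());
    if (expansion) types.push(...expansion);
  }
  return types;
}
```

As in the steps above, entries are appended without de-duplication, so a value listing both "cache" and "cookies" yields "clientHints" twice.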
4.2. Clear data for response
Given a response (response), the user agent can clear site data for response as follows:
- If response’s url is not an a priori authenticated URL, then break.
- Let types be the result of parsing response’s Clear-Site-Data header.
- Let browsing contexts be the result of preparing to clear data for origin and types, where origin is the origin of response’s url.
- For each type in types:
  - Execute the first matching statement, if any, switching on type:
    - "cache": Execute § 4.2.3 Clear cache for origin on origin.
    - "cookies": Execute § 4.2.4 Clear cookies for origin on origin.
    - "storage": Execute § 4.2.5 Clear DOM-accessible storage for origin on origin.
    - "clientHints": Empty Accept-CH cache[origin].
- If types contains "executionContexts", then Reload browsing contexts.
Note: User agents are encouraged to give web developers some mechanism by which the clearing operation can be debugged. This might take the form of a console message or timeline entry indicating success.
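The overall flow of this algorithm can be sketched with the individual clearing operations passed in as callbacks. Everything named here (`clearDataForResponse`, the `hooks` object and its methods) is hypothetical. The sketch assumes that each type dispatches to the clearing algorithm of the same name in § 4.2.3 to § 4.2.5, and it approximates "a priori authenticated URL" as "uses the https: scheme", which is narrower than the real definition.

```javascript
// Simplifying assumption: only https: URLs count as a priori authenticated.
function isAPrioriAuthenticated(url) {
  return new URL(url).protocol === "https:";
}

// `hooks` supplies parse, prepare, clearCache, clearCookies, clearStorage,
// clearClientHints and reload; all of them are stand-ins for user agent internals.
function clearDataForResponse(response, hooks) {
  if (!isAPrioriAuthenticated(response.url)) return;
  const origin = new URL(response.url).origin;
  const types = hooks.parse(response);
  // Sandbox matching contexts first so they cannot rewrite cleared data.
  const contexts = hooks.prepare(origin, types);
  for (const type of types) {
    switch (type) {
      case "cache":
        hooks.clearCache(origin);
        break;
      case "cookies":
        hooks.clearCookies(origin);
        break;
      case "storage":
        hooks.clearStorage(origin);
        break;
      case "clientHints":
        hooks.clearClientHints(origin);
        break;
      // "executionContexts" is handled after the loop.
    }
  }
  if (types.includes("executionContexts")) hooks.reload(contexts);
}
```

Passing the operations in as callbacks keeps the ordering (prepare, clear, reload) visible without committing to how any one store is implemented.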
4.2.1. Prepare to clear origin’s data
Given an origin (origin) and a list of types (types), the user agent can prepare to clear origin’s data by executing the following steps. The algorithm returns a list of browsing contexts which have been sandboxed in order to prevent them from recreating cleared data from in-memory JavaScript variables.
- Let sandboxed be an empty list.
- If types does not contain "executionContexts", return sandboxed.
- For each context in the user agent’s set of browsing contexts:
  - Let document be context’s active document.
  - If document’s relevant settings object's origin is not origin, continue.
  - Parse a sandboxing directive using the empty string as the input, and document’s active sandboxing flag set as the output.
  - Append context to sandboxed.
- Return sandboxed.
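A rough sketch of these steps, using plain objects in place of real browsing contexts. The object shape (`activeDocument.origin`, `activeDocument.sandboxFlags`) is an assumption for illustration; parsing the empty string as a sandboxing directive turns on every sandboxing flag, which is modeled here by the string "all".

```javascript
// `browsingContexts` is a hypothetical array of
// { activeDocument: { origin: string, sandboxFlags: string } } objects.
function prepareToClear(origin, types, browsingContexts) {
  const sandboxed = [];
  if (!types.includes("executionContexts")) return sandboxed;
  for (const context of browsingContexts) {
    const doc = context.activeDocument;
    if (doc.origin !== origin) continue;
    // An empty sandboxing directive applies all restrictions.
    doc.sandboxFlags = "all";
    sandboxed.push(context);
  }
  return sandboxed;
}
```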
4.2.2. Reload browsing contexts
Given a list of browsing contexts (contexts), the user agent can reload browsing contexts as follows:
- For each context in contexts:
  - Execute context’s active document's relevant settings object's global object's Location object’s reload().
This is the simplest thing, but it’s probably reaching a little too far into the documents and mucking with their context. I probably just need to break down and copy/paste the relevant bits from HTML.
4.2.3. Clear cache for origin
Given an origin (origin), the user agent can clear cache for origin as follows:
- Let host be origin’s host.
- Let cache list be the set of entries from the network cache whose target URI host is identical to host.
- For each entry in cache list:
  - Remove entry from the network cache.
- If a user agent implements caches beyond a pure network cache, it MUST remove all entries from those caches which match origin.
- For each traversable in the user agent’s set of top-level traversables:
  - For each entry in the traversable’s session history entries:
    - Let state be entry’s document state.
    - Let entry origin be state’s origin.
    - If entry origin is the same origin as origin:
      - Let document be state’s document.
      - If document is not fully active:
        - Destroy the document.
We’re dealing with the network cache here, as defined in [RFC7234] , but that’s not nearly everything a user agent caches. How hand-wavey with the vendor-specific section can we be? For instance, Chrome clears out prerendered pages, script caches, WebGL shader caches, WebRTC bits and pieces, address bar suggestion caches, various networking bits that aren’t representations (HSTS/HPKP, SCDH, etc.). Perhaps [STORAGE] will make this clearer?
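The network cache portion of this algorithm can be sketched against a toy cache modeled as a Map keyed by URL. The Map and the name `clearCacheForOrigin` are assumptions made for illustration; session history handling and vendor-specific caches are left out.

```javascript
// `networkCache` is a hypothetical Map from absolute URL to cached entry.
// Returns the number of entries removed.
function clearCacheForOrigin(origin, networkCache) {
  const host = new URL(origin).hostname;
  let removed = 0;
  // Linear scan: the cache is keyed by URL, not grouped by origin.
  for (const url of [...networkCache.keys()]) {
    if (new URL(url).hostname === host) {
      networkCache.delete(url);
      removed += 1;
    }
  }
  return removed;
}
```

Because entries are matched on host alone, an entry for http://example.com/ is removed together with entries for https://example.com/, while entries for www.example.com are left in place.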
4.2.4. Clear cookies for origin
Given an origin ( origin ), the user agent can clear cookies for origin as follows:
Note: We remove all the cookies for an entire registered domain, as cookies ignore the same-origin policy, and there’s a distinct risk that we’d leave applications in an ill-defined state if we only cleared cookies for a particular subdomain. Consider accounts.google.com vs mail.google.com, for instance, both of which have cookies that signal a user’s signed-in status.
Note: This algorithm assumes that the user agent has implemented a cookie store (as discussed in Section 5.3 of [RFC6265] ), which offers the ability to retrieve a list of cookies by host, and to remove individual cookies.
- Let registered be the registered domain of origin’s host.
- Let cookie list be the set of cookies from the cookie store whose domain attribute is a domain-match with registered.
- For each cookie in cookie list:
  - Remove cookie from the cookie store.
- If the user agent supports other forms of cookie-like storage, these MUST also be cleared for origins whose host's registered domain is registered.
  Note: For example, if the user agent supports Flash, its local stored objects will be cleared via NPP_ClearSiteData.
- Clear any Channel IDs [CHANNELID] and bound tokens [TOKBIND] associated with origins whose host's registered domain is registered.
- Clear authentication entries and proxy-authentication entries associated with origins whose host's registered domain is registered.
The process of clearing both bound tokens/IDs and HTTP authentication is super hand-wavey. [Issue #w3c/webappsec-clear-site-data#2]
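The cookie-removal steps can be sketched as a filter over a toy cookie store. The array-of-objects store and the function names are assumptions for illustration. Computing a registered domain requires the Public Suffix List, so the sketch takes the registered domain as an input instead of deriving it. `domainMatch` follows the domain-match rule from Section 5.1.3 of [RFC6265], with a deliberately rough IP address check.

```javascript
// Rough IP address test (IPv4 dotted quad, or anything containing a colon).
function isIpAddress(host) {
  return /^\d{1,3}(\.\d{1,3}){3}$/.test(host) || host.includes(":");
}

// True if `string` domain-matches `domainString`.
function domainMatch(string, domainString) {
  const s = string.toLowerCase();
  const d = domainString.toLowerCase();
  if (s === d) return true;
  // d must be a suffix of s, preceded by ".", and s must be a host name.
  return s.endsWith("." + d) && !isIpAddress(s);
}

// `cookieStore` is a hypothetical array of { name, domain } objects.
// Returns the cookies that remain after clearing.
function clearCookiesForRegisteredDomain(registered, cookieStore) {
  return cookieStore.filter((cookie) => !domainMatch(cookie.domain, registered));
}
```

With a registered domain of example.com, cookies for example.com and www.example.com are removed, while cookies for badexample.com and example.org are kept.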
4.2.5. Clear DOM-accessible storage for origin
Given an origin ( origin ), the user agent can clear DOM-accessible storage for origin as follows:
- For each area in the user agent’s set of local storage areas [HTML]:
- For each area in the user agent’s set of session storage areas [HTML]:
- For each database in the set of databases for origin [INDEXEDDB]:
  - Delete database.
- For each registration in the user agent’s set of service worker registrations:
  - If registration’s scope URL’s origin is origin:
    - Execute unregister() on registration.
- For each appcache in the user agent’s set of application caches:
  - If appcache’s application cache group is identified by a URL whose origin is origin:
    - Discard appcache.
- For any other script-accessible storage mechanism, the user agent MUST delete any data associated with this origin. This includes (but is not limited to) the following:
  - An origin’s WebSQL databases [WEBDATABASE].
  - An origin’s filesystems [file-system-api].
  - Plugin data (e.g. Flash via NPP_ClearSiteData).
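Two of the steps above, deleting an origin's databases and unregistering its service worker registrations, can be sketched against an in-memory model. The `state` object and its fields are hypothetical; a real user agent would act on its own storage backends.

```javascript
// `state` is a hypothetical model:
//   databases:     Map from origin to a list of database names
//   registrations: array of { scopeURL: string }
function clearScriptStorage(origin, state) {
  // Delete every database belonging to the origin.
  state.databases.delete(origin);
  // Drop registrations whose scope URL's origin matches.
  state.registrations = state.registrations.filter(
    (registration) => new URL(registration.scopeURL).origin !== origin
  );
}
```

The comparison is on the full origin, so a registration scoped to https://example.com:8443/ survives a clear for https://example.com.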
5. Security Considerations
5.1. Incomplete Clearing
It is possible that an application could be put into an indeterminate state by clearing only one type of storage. We mitigate that to some extent by clearing all storage options as a block, and by requiring that the header be delivered over a secure connection.
5.2. Service workers
It is imperative that the Clear-Site-Data header is only respected on responses fetched over the network, and not on those served by a service worker.
This is because service workers can return arbitrary responses for resource requests in their scope, including third-party requests. Thus, supporting Clear-Site-Data would give them the ability to clear data for any origin.
Note that if a request is sent to a service worker, not handled by it, then restarted with a service-workers mode of "none" and sent to the network, the corresponding response is a network response and can be handled. The previous attempt at obtaining the response from a service worker is irrelevant.
Note also that a service worker update is a network response, and is therefore not affected by this restriction. This is important in order to support the use case in § 1.1.4 Kill Switch .
6. Privacy Considerations
6.1. Web developers control the timing.
If triggered at appropriate times, Clear-Site-Data can increase a user’s privacy and security by clearing sensitive data from their user agent. However, note that the web developer (and not the user) is in control of when the clearing event is triggered. Even assuming a non-malicious site author, users can’t rely on data being cleared at any particular point, nor are users in control of what data types are cleared.
If a user wishes to ensure that site data is indeed cleared at some specific point, they ought to rely on the data-clearing functionality offered by their user agent.
At a bare minimum, user agents OUGHT TO (in the [RFC6919] sense of the words) offer the same functionality to users that they offer to web developers. Ideally, they will offer significantly more than we can offer at a platform level (clearing browsing history, for example).
6.2. Remnants of data on disk.
While Clear-Site-Data triggers a clearing event in a user’s agent, it is difficult to make promises about the state of a user’s disk after a clearing event takes place. In particular, note that it is up to the user agent to ensure that all traces of a site’s data are actually removed from disk, which can be a herculean task (consider virtual memory, as a good example of a larger issue).
In short, most user agents implement data clearing as "best effort", but can’t promise an exhaustive wipe.
If a user wishes to ensure that site data does not remain on disk, the best way to do so is to use a browsing mode that promises not to intentionally write data to disk (Chrome’s "Incognito", Internet Explorer’s "InPrivate", etc). These modes will do a better job of keeping data off disk, but are still subject to a number of limitations at the edges.
7. IANA Considerations
The permanent message header field registry should be updated with the following registration: [RFC3864]
7.1. Clear-Site-Data
- Header field name: Clear-Site-Data
- Applicable protocol: http
- Status: standard
- Author/Change controller: W3C
- Specification document: This specification (See § 3.1 The Clear-Site-Data HTTP Response Header Field)
8. Acknowledgements
Michal Zalewski proposed a variant of this concept, and Mark Knichel helped refine the details.