HTML Sanitizer API

Draft Community Group Report

This version:
https://wicg.github.io/sanitizer-api/
Issue Tracking:
GitHub
Editors:
Frederik Braun (Mozilla)
Mario Heiderich (Cure53)
Daniel Vogelheim (Google LLC)
Tom Schuster (Mozilla)
Not Ready For Implementation

This spec is not yet ready for implementation. It exists in this repository to record the ideas and promote discussion.

Before attempting to implement this spec, please contact the editors.


Abstract

This document specifies a set of APIs which allow developers to take untrusted HTML input and sanitize it for safe insertion into a document’s DOM.

Status of this document

This specification was published by the Web Platform Incubator Community Group. It is not a W3C Standard nor is it on the W3C Standards Track. Please note that under the W3C Community Contributor License Agreement (CLA) there is a limited opt-out and other conditions apply. Learn more about W3C Community and Business Groups.

1. Introduction

This section is not normative.

Web applications often need to work with strings of HTML on the client side, perhaps as part of a client-side templating solution, perhaps as part of rendering user generated content, etc. It is difficult to do so in a safe way. The naive approach of joining strings together and stuffing them into an Element’s innerHTML is fraught with risk, as it can cause JavaScript execution in a number of unexpected ways.

Libraries like [DOMPURIFY] attempt to manage this problem by carefully parsing and sanitizing strings before insertion, by constructing a DOM and filtering its members through an allow-list. This has proven to be a fragile approach, as the parsing APIs exposed to the web don’t always map in reasonable ways to the browser’s behavior when actually rendering a string as HTML in the "real" DOM. Moreover, the libraries need to keep on top of browsers' changing behavior over time; things that once were safe may turn into time-bombs based on new platform-level features.

The browser has a fairly good idea of when it is going to execute code. We can improve upon the user-space libraries by teaching the browser how to render HTML from an arbitrary string in a safe manner, and do so in a way that is much more likely to be maintained and updated along with the browser’s own changing parser implementation. This document outlines an API which aims to do just that.

1.1. Goals

1.2. API Summary

The Sanitizer API offers functionality to parse a string containing HTML into a DOM tree, and to filter the resulting tree according to a user-supplied configuration. The methods come in two by two flavours:

2. Framework

2.1. Sanitizer API

The Element interface defines two methods, setHTML() and setHTMLUnsafe(). Both of these take a DOMString with HTML markup, and an optional configuration.

partial interface Element {
  [CEReactions] undefined setHTMLUnsafe((TrustedHTML or DOMString) html, optional SetHTMLUnsafeOptions options = {});
  [CEReactions] undefined setHTML(DOMString html, optional SetHTMLOptions options = {});
};
Element’s setHTMLUnsafe(html, options) method steps are:
  1. Let compliantHTML be the result of invoking the Get Trusted Type compliant string algorithm with TrustedHTML, this’s relevant global object, html, "Element setHTMLUnsafe", and "script".

  2. Let target be this’s template contents if this is a template element; otherwise this.

  3. Set and filter HTML given target, this, compliantHTML, options, and false.

Element’s setHTML(html, options) method steps are:
  1. Let target be this’s template contents if this is a template element; otherwise this.

  2. Set and filter HTML given target, this, html, options, and true.
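As a non-normative illustration, the following sketch shows how a page might call setHTML() defensively. The helper name renderUntrusted and its plain-text fallback are assumptions made for this example and are not part of the API.

```javascript
// Hypothetical helper (not part of the API): insert untrusted markup
// through setHTML() when the method exists on the target.
function renderUntrusted(target, html, options = {}) {
  if (typeof target.setHTML === "function") {
    // "Safe" variant: script-executing markup is removed whatever the options say.
    target.setHTML(html, options);
    return "sanitized";
  }
  // Assumed fallback for this sketch: show the markup as inert text.
  target.textContent = html;
  return "text";
}
```

With a real Element as target, the first branch corresponds to the setHTML(html, options) method steps above. The fallback never parses the string as HTML.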

These methods are mirrored on the ShadowRoot:

partial interface ShadowRoot {
  [CEReactions] undefined setHTMLUnsafe((TrustedHTML or DOMString) html, optional SetHTMLUnsafeOptions options = {});
  [CEReactions] undefined setHTML(DOMString html, optional SetHTMLOptions options = {});
};

ShadowRoot’s setHTMLUnsafe(html, options) method steps are:
  1. Let compliantHTML be the result of invoking the Get Trusted Type compliant string algorithm with TrustedHTML, this’s relevant global object, html, "ShadowRoot setHTMLUnsafe", and "script".

  2. Set and filter HTML using this, this’s shadow host (as context element), compliantHTML, options, and false.

ShadowRoot’s setHTML(html, options) method steps are:
  1. Set and filter HTML using this (as target), this’s shadow host (as context element), html, options, and true.

The Document interface gains two new methods which parse an entire Document:

partial interface Document {
  static Document parseHTMLUnsafe((TrustedHTML or DOMString) html, optional SetHTMLUnsafeOptions options = {});
  static Document parseHTML(DOMString html, optional SetHTMLOptions options = {});
};
The parseHTMLUnsafe(html, options) method steps are:
  1. Let compliantHTML be the result of invoking the Get Trusted Type compliant string algorithm with TrustedHTML, this’s relevant global object, html, "Document parseHTMLUnsafe", and "script".

  2. Let document be a new Document, whose content type is "text/html".

    Note: Since document does not have a browsing context, scripting is disabled.

  3. Set document’s allow declarative shadow roots to true.

  4. Parse HTML from a string given document and compliantHTML.

  5. Let sanitizer be the result of calling get a sanitizer instance from options with options and false.

  6. Call sanitize on document with sanitizer and false.

  7. Return document.

The parseHTML(html, options) method steps are:
  1. Let document be a new Document, whose content type is "text/html".

    Note: Since document does not have a browsing context, scripting is disabled.

  2. Set document’s allow declarative shadow roots to true.

  3. Parse HTML from a string given document and html.

  4. Let sanitizer be the result of calling get a sanitizer instance from options with options and true.

  5. Call sanitize on document with sanitizer and true.

  6. Return document.

2.2. SetHTML options and the configuration object.

The family of setHTML()-like methods all accept an options dictionary. Right now, only one member of this dictionary is defined:

enum SanitizerPresets { "default" };
dictionary SetHTMLOptions {
  (Sanitizer or SanitizerConfig or SanitizerPresets) sanitizer = "default";
};
dictionary SetHTMLUnsafeOptions {
  (Sanitizer or SanitizerConfig or SanitizerPresets) sanitizer = {};
};

The Sanitizer configuration object encapsulates a filter configuration. The same configuration can be used with both the "safe" and the "unsafe" methods, where the "safe" methods perform an implicit removeUnsafe operation on the passed-in configuration and have a default configuration when none is passed. The default differs between "safe" and "unsafe" methods: The "safe" methods aim to be safe by default and have a restrictive default, while the "unsafe" methods are unrestricted by default. The intent is that one (or a few) configurations will be built up early in a page’s lifetime, and can then be used whenever needed. This allows implementations to pre-process configurations.

The configuration object can be queried to return a configuration dictionary. It can also be modified directly.

[Exposed=Window]
interface Sanitizer {
  constructor(optional (SanitizerConfig or SanitizerPresets) configuration = "default");

  // Query configuration:
  SanitizerConfig get();

  // Modify a Sanitizer’s lists and fields:
  boolean allowElement(SanitizerElementWithAttributes element);
  boolean removeElement(SanitizerElement element);
  boolean replaceElementWithChildren(SanitizerElement element);
  boolean allowAttribute(SanitizerAttribute attribute);
  boolean removeAttribute(SanitizerAttribute attribute);
  boolean setComments(boolean allow);
  boolean setDataAttributes(boolean allow);

  // Remove markup that executes script.
  boolean removeUnsafe();
};

A Sanitizer has an associated SanitizerConfig configuration.

The constructor(configuration) method steps are:
  1. If configuration is a SanitizerPresets string, then:

    1. Assert: configuration is "default".

    2. Set configuration to the built-in safe default configuration.

  2. Let valid be the return value of set a configuration with configuration and true on this.

  3. If valid is false, then throw a TypeError.

The get() method steps are to return the value of this’s configuration.
The allowElement(element) method steps are to allow an element with element and this’s configuration.
The removeElement(element) method steps are to remove an element with element and this’s configuration.
The replaceElementWithChildren(element) method steps are to replace an element with its children with element and this’s configuration.
The allowAttribute(attribute) method steps are to allow an attribute with attribute and this’s configuration.
The removeAttribute(attribute) method steps are to remove an attribute with attribute and this’s configuration.
The setComments(allow) method steps are to set comments with allow and this’s configuration.
The setDataAttributes(allow) method steps are to set data attributes with allow and this’s configuration.
The removeUnsafe() method steps are to update this’s configuration with the result of calling remove unsafe on this’s configuration.

2.3. The Configuration Dictionary

dictionary SanitizerElementNamespace {
  required DOMString name;
  DOMString? _namespace = "http://www.w3.org/1999/xhtml";
};

// Used by "elements"
dictionary SanitizerElementNamespaceWithAttributes : SanitizerElementNamespace {
  sequence<SanitizerAttribute> attributes;
  sequence<SanitizerAttribute> removeAttributes;
};

typedef (DOMString or SanitizerElementNamespace) SanitizerElement;
typedef (DOMString or SanitizerElementNamespaceWithAttributes) SanitizerElementWithAttributes;

dictionary SanitizerAttributeNamespace {
  required DOMString name;
  DOMString? _namespace = null;
};
typedef (DOMString or SanitizerAttributeNamespace) SanitizerAttribute;

dictionary SanitizerConfig {
  sequence<SanitizerElementWithAttributes> elements;
  sequence<SanitizerElement> removeElements;
  sequence<SanitizerElement> replaceWithChildrenElements;

  sequence<SanitizerAttribute> attributes;
  sequence<SanitizerAttribute> removeAttributes;

  boolean comments;
  boolean dataAttributes;
};
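For illustration only, a configuration dictionary written as a JavaScript object literal might look as follows. The chosen elements and attributes are arbitrary, and the sketch assumes that the IDL member _namespace surfaces as the key namespace, following WebIDL's identifier escaping.

```javascript
// Illustrative configuration; the element and attribute choices are arbitrary.
const config = {
  // Element allow-list. Plain strings default to the HTML namespace.
  elements: [
    "p",
    "em",
    // Dictionary form with a per-element attribute allow-list.
    { name: "a", attributes: ["href"] },
    // Dictionary form with an explicit namespace.
    { name: "svg", namespace: "http://www.w3.org/2000/svg" },
  ],
  // Elements whose tags are dropped while their children are kept.
  replaceWithChildrenElements: ["span"],
  // Global attribute allow-list. Plain strings default to the null namespace.
  attributes: ["class", { name: "title", namespace: null }],
  comments: false,
  dataAttributes: true,
};
```

The object uses an element allow-list and an attribute allow-list, so it carries neither removeElements nor removeAttributes.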

2.4. Configuration Invariants

Configurations can and ought to be modified by developers to suit their purposes. Options are to write a new configuration dictionary from scratch, to modify an existing Sanitizer’s configuration by using the modifier methods, or to get() an existing Sanitizer’s configuration as a dictionary and modify the dictionary and then create a new Sanitizer with it.

An empty configuration allows everything (when called with the "unsafe" methods like setHTMLUnsafe). A configuration "default" contains a built-in safe default configuration. Note that "safe" and "unsafe" sanitizer methods have different defaults.

Not all configuration dictionaries are valid. A valid configuration avoids redundancy (like specifying the same element to be allowed twice) and contradictions (like specifying an element to be both removed and allowed).

Several conditions need to hold for a configuration to be valid:

The elements element allow-list can also specify allowing or removing attributes for a given element. This is meant to mirror [HTML]’s structure, which knows both global attributes as well as local attributes that apply to a specific element. Global and local attributes can be mixed, but note that ambiguous configurations where a particular attribute would be allowed by one list and forbidden by another, are generally invalid.

| | global attributes | global removeAttributes |
|---|---|---|
| local attributes | An attribute is allowed if it matches either list. No duplicates are allowed. | An attribute is only allowed if it’s in the local allow list. No duplicate entries between global remove and local allow lists are allowed. Note that the global remove list has no function for this particular element, but may well apply to other elements that do not have a local allow list. |
| local removeAttributes | An attribute is allowed if it’s in the global allow-list, but not in the local remove-list. Local remove must be a subset of the global allow lists. | An attribute is allowed if it is in neither list. No duplicate entries between global remove and local remove lists are allowed. |

Please note the asymmetry where mostly no duplicates between global and per-element lists are permitted, but in the case of a global allow-list and a per-element remove-list the latter must be a subset of the former. An excerpt of the table above, only focusing on duplicates, is as follows:

| | global attributes | global removeAttributes |
|---|---|---|
| local attributes | No duplicates are allowed. | No duplicates are allowed. |
| local removeAttributes | Local remove must be a subset of the global allow lists. | No duplicates are allowed. |

The dataAttributes setting allows custom data attributes. The rules above easily extend to custom data attributes if one considers dataAttributes to be an allow-list:

| | global attributes and dataAttributes set |
|---|---|
| local attributes | All custom data attributes are allowed. No custom data attributes may be listed in any allow-list, as that would mean a duplicate entry. |
| local removeAttributes | A custom data attribute is allowed, unless it’s listed in the local remove-list. No custom data attribute may be listed in the global allow-list, as that would mean a duplicate entry. |

Putting these rules in words:

A SanitizerConfig config is valid if all of the following conditions hold:
  1. The config has either an elements or a removeElements key, but not both.

  2. The config has either an attributes or a removeAttributes key, but not both.

  3. Assert: All SanitizerElementNamespaceWithAttributes, SanitizerElementNamespace, and SanitizerAttributeNamespace items in config are canonical, meaning they have been run through canonicalize a sanitizer element or canonicalize a sanitizer attribute, as appropriate.

  4. None of config[elements], config[removeElements], config[replaceWithChildrenElements], config[attributes], or config[removeAttributes], if they exist, contains duplicates.

  5. If both config[elements] and config[replaceWithChildrenElements] exist, then the intersection of config[elements] and config[replaceWithChildrenElements] is empty.

  6. If both config[removeElements] and config[replaceWithChildrenElements] exist, then the intersection of config[removeElements] and config[replaceWithChildrenElements] is empty.

  7. If config[attributes] exists:

    1. If config[elements] exists:

      1. For any element in config[elements]:

        1. The intersection of config[attributes] and element[attributes] with default « [] » is empty.

        2. element[removeAttributes] with default « [] » is a subset of config[attributes].

        3. If config[dataAttributes] exists and config[dataAttributes] is true:

          1. element[attributes] does not contain a custom data attribute.

    2. If config[dataAttributes] exists and config[dataAttributes] is true:

      1. config[attributes] does not contain a custom data attribute.

  8. If config[removeAttributes] exists:

    1. If config[elements] exists, then for any element in config[elements]:

      1. The intersection of config[removeAttributes] and element[attributes] with default « [] » is empty.

      2. The intersection of config[removeAttributes] and element[removeAttributes] with default « [] » is empty.

    2. config[dataAttributes] does not exist.

Note: Setting a configuration from a dictionary will do a bit of normalization. In particular, if both allow- and remove-lists are missing, it will interpret this as an empty remove-list. So {} itself is not a valid configuration, but it will be normalized to {removeElements:[],removeAttributes:[]}, which is. This normalization step was chosen in order to have a missing dictionary be consistent with an empty one, i.e., to have setHTMLUnsafe(txt) be consistent with setHTMLUnsafe(txt, {sanitizer: {}}).
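The validity conditions above can be sketched as a plain JavaScript predicate. This is a non-normative illustration: it assumes every entry is already canonical (an object with name and namespace keys, so condition 3 is taken for granted), it treats a custom data attribute as any attribute in the null namespace whose name starts with "data-", and the function name isValidConfig is hypothetical.

```javascript
// Two canonical names match when both name and namespace are equal.
const same = (a, b) => a.name === b.name && a.namespace === b.namespace;
const contains = (list, item) => list.some((x) => same(x, item));
const hasDuplicates = (list) =>
  list.some((item, i) => list.findIndex((x) => same(x, item)) !== i);
const intersects = (a, b) => a.some((item) => contains(b, item));
const isSubset = (a, b) => a.every((item) => contains(b, item));
// Simplifying assumption: null namespace plus a "data-" prefix.
const isDataAttr = (attr) => attr.namespace === null && attr.name.startsWith("data-");

function isValidConfig(config) {
  const has = (key) => config[key] !== undefined;
  // Conditions 1 and 2: exactly one list of each allow/remove pair.
  if (has("elements") === has("removeElements")) return false;
  if (has("attributes") === has("removeAttributes")) return false;
  // Condition 4: no list contains duplicates.
  const lists = ["elements", "removeElements", "replaceWithChildrenElements",
    "attributes", "removeAttributes"];
  if (lists.some((key) => has(key) && hasDuplicates(config[key]))) return false;
  // Conditions 5 and 6: replaceWithChildrenElements overlaps neither element list.
  const replace = config.replaceWithChildrenElements ?? [];
  if (has("elements") && intersects(config.elements, replace)) return false;
  if (has("removeElements") && intersects(config.removeElements, replace)) return false;
  const elements = config.elements ?? [];
  if (has("attributes")) {
    // Condition 7: global attribute allow-list.
    const dataOn = config.dataAttributes === true;
    for (const el of elements) {
      if (intersects(config.attributes, el.attributes ?? [])) return false;
      if (!isSubset(el.removeAttributes ?? [], config.attributes)) return false;
      if (dataOn && (el.attributes ?? []).some(isDataAttr)) return false;
    }
    if (dataOn && config.attributes.some(isDataAttr)) return false;
  } else {
    // Condition 8: global attribute remove-list.
    for (const el of elements) {
      if (intersects(config.removeAttributes, el.attributes ?? [])) return false;
      if (intersects(config.removeAttributes, el.removeAttributes ?? [])) return false;
    }
    if (has("dataAttributes")) return false;
  }
  return true;
}
```

Under these assumptions, isValidConfig({}) is false, while the normalized form {removeElements: [], removeAttributes: []} passes.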

3. Algorithms

To set and filter HTML, given an Element or DocumentFragment target, an Element contextElement, a string html, a dictionary options, and a boolean safe:
  1. If safe and contextElement’s local name is "script" and contextElement’s namespace is the HTML namespace or the SVG namespace, then return.

  2. Let sanitizer be the result of calling get a sanitizer instance from options with options and safe.

  3. Let newChildren be the result of the HTML fragment parsing algorithm given contextElement, html, and true.

  4. Let fragment be a new DocumentFragment whose node document is contextElement’s node document.

  5. For each node in newChildren, append node to fragment.

  6. Run sanitize on fragment using sanitizer and safe.

  7. Replace all with fragment within target.

To get a sanitizer instance from options from a dictionary options with a boolean safe:

Note: This algorithm works for both SetHTMLOptions and SetHTMLUnsafeOptions. They only differ in the defaults.

  1. Let sanitizerSpec be "default".

  2. If options["sanitizer"] exists, then:

    1. Set sanitizerSpec to options["sanitizer"].

  3. Assert: sanitizerSpec is either a Sanitizer instance, a string which is a SanitizerPresets member, or a dictionary.

  4. If sanitizerSpec is a string:

    1. Assert: sanitizerSpec is "default".

    2. Set sanitizerSpec to the built-in safe default configuration.

  5. Assert: sanitizerSpec is either a Sanitizer instance, or a dictionary.

  6. If sanitizerSpec is a dictionary:

    1. Let sanitizer be a new Sanitizer instance.

    2. Let setConfigurationResult be the result of set a configuration with sanitizerSpec and not safe on sanitizer.

    3. If setConfigurationResult is false, throw a TypeError.

    4. Set sanitizerSpec to sanitizer.

  7. Assert: sanitizerSpec is a Sanitizer instance.

  8. Return sanitizerSpec.

3.1. Sanitize

For the main sanitize operation, using a ParentNode node, a Sanitizer sanitizer, and a boolean safe, run these steps:
  1. Let configuration be the value of sanitizer’s configuration.

  2. If safe is true, then set configuration to the result of calling remove unsafe on configuration.

  3. Call sanitize core on node, configuration, and with handleJavascriptNavigationUrls set to safe.

The sanitize core operation, using a ParentNode node, a SanitizerConfig configuration, and a boolean handleJavascriptNavigationUrls, recurses over the DOM tree beginning with node. It consists of these steps:
  1. For each child of node’s children:

    1. Assert: child implements Text, Comment, Element, or DocumentType.

      Note: Currently, this algorithm is only called on output of the HTML parser for which this assertion should hold. DocumentType should only occur for parseHTML and parseHTMLUnsafe. If in the future this algorithm will be used in different contexts, this assumption needs to be re-examined.

    2. If child implements DocumentType, then continue.

    3. If child implements Text, then continue.

    4. If child implements Comment:

      1. If configuration["comments"] is not true, then remove child.

    5. Otherwise:

      1. Let elementName be a SanitizerElementNamespace with child’s local name and namespace.

      2. If configuration["replaceWithChildrenElements"] exists and if configuration["replaceWithChildrenElements"] contains elementName:

        1. Call sanitize core on child with configuration and handleJavascriptNavigationUrls.

        2. Replace child with child’s children within node.

        3. Continue.

      3. If configuration["removeElements"] exists and configuration["removeElements"] contains elementName:

        1. Remove child.

        2. Continue.

      4. If configuration["elements"] exists and configuration["elements"] does not contain elementName:

        1. Remove child.

        2. Continue.

      5. If elementName equals «[ "name" → "template", "namespace" → HTML namespace ]», then call sanitize core on child’s template contents with configuration and handleJavascriptNavigationUrls.

      6. If child is a shadow host, then call sanitize core on child’s shadow root with configuration and handleJavascriptNavigationUrls.

      7. Let elementWithLocalAttributes be « [] ».

      8. If configuration["elements"] exists and configuration["elements"] contains elementName:

        1. Set elementWithLocalAttributes to configuration["elements"][elementName].

      9. For each attribute in child’s attribute list:

        1. Let attrName be a SanitizerAttributeNamespace with attribute’s local name and namespace.

        2. If elementWithLocalAttributes["removeAttributes"] with default « [] » contains attrName:

          1. Remove attribute.

        3. Otherwise, if configuration["attributes"] exists:

          1. If configuration["attributes"] does not contain attrName, and elementWithLocalAttributes["attributes"] with default « [] » does not contain attrName, and either attribute is not a custom data attribute (its local name does not have "data-" as a code unit prefix, or its namespace is not null) or configuration["dataAttributes"] is not true:

            1. Remove attribute.

        4. Otherwise:

          1. If elementWithLocalAttributes["attributes"] exists and elementWithLocalAttributes["attributes"] does not contain attrName:

            1. Remove attribute.

          2. Otherwise, if configuration["removeAttributes"] contains attrName:

            1. Remove attribute.

        5. If handleJavascriptNavigationUrls:

          1. If «[elementName, attrName]» matches an entry in the built-in navigating URL attributes list, and if attribute contains a javascript: URL, then remove attribute.

          2. If child’s namespace is the MathML Namespace and attribute’s local name is "href" and attribute’s namespace is null or the XLink namespace and attribute contains a javascript: URL, then remove attribute.

          3. If the built-in animating URL attributes list contains «[elementName, attrName]» and attribute’s value is "href" or "xlink:href", then remove attribute.

      10. Call sanitize core on child with configuration and handleJavascriptNavigationUrls.
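The element and attribute filtering above can be sketched over a toy tree of plain objects. This is a simplified, non-normative model: the node shape (type, name, namespace, attributes, children) is an assumption of the sketch, configuration entries are assumed canonical, and the template, shadow root, and javascript: URL steps are left out.

```javascript
const same = (a, b) => a.name === b.name && a.namespace === b.namespace;
// Returns the matching entry of a (possibly missing) list, or undefined.
const find = (list, item) => (list ?? []).find((x) => same(x, item));
const isDataAttr = (a) => a.namespace === null && a.name.startsWith("data-");

// Returns true when the attribute survives the configuration.
function keepAttribute(attr, local, config) {
  if (find(local.removeAttributes, attr)) return false;
  if (config.attributes) {
    // Global allow-list: keep if listed globally, listed locally, or an allowed data-* attribute.
    return Boolean(find(config.attributes, attr) || find(local.attributes, attr)) ||
      (isDataAttr(attr) && config.dataAttributes === true);
  }
  // Global remove-list: a local allow-list, if present, takes precedence.
  if (local.attributes && !find(local.attributes, attr)) return false;
  return !find(config.removeAttributes, attr);
}

function sanitizeCore(node, config) {
  const kept = [];
  for (const child of node.children) {
    if (child.type === "text") { kept.push(child); continue; }
    if (child.type === "comment") {
      if (config.comments === true) kept.push(child);
      continue;
    }
    if (find(config.replaceWithChildrenElements, child)) {
      // Sanitize the subtree, then splice the children in place of the element.
      sanitizeCore(child, config);
      kept.push(...child.children);
      continue;
    }
    if (find(config.removeElements, child)) continue;
    if (config.elements && !find(config.elements, child)) continue;
    const local = find(config.elements, child) ?? {};
    child.attributes = child.attributes.filter((a) => keepAttribute(a, local, config));
    sanitizeCore(child, config);
    kept.push(child);
  }
  node.children = kept;
}
```

The sketch rebuilds each children array instead of removing nodes while iterating, which avoids mutating the list being traversed.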

Note: Current browsers support javascript: URLs only when navigating. Since navigation itself is not an XSS threat, we handle navigation to javascript: URLs, but not navigations in general.

Declarative navigation falls into a handful of categories:

  1. Anchor elements. (<a> in HTML and SVG namespaces)

  2. Form elements that trigger navigation as part of the form action.

  3. [MathML] allows any element to act as an anchor.

  4. [SVG11] animation.

The first two are covered by the built-in navigating URL attributes list.

The MathML case is covered by a separate rule, because there is no formalism in this spec to cover a "per-namespace global" rule.

The SVG animation case is covered by the built-in animating URL attributes list. But since the interpretation of SVG animation elements depends on the animation target, and since during sanitization we cannot know what the final target will be, the sanitize algorithm blocks any animation of href attributes.

To determine whether an attribute contains a javascript: URL:
  1. Let url be the result of running the basic URL parser on attribute’s value.

  2. If url is failure, then return false.

  3. Return whether url’s scheme is "javascript".
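A minimal sketch of this check, assuming the URL class of the host environment behaves like the basic URL parser invoked without a base URL. The function name containsJavascriptUrl is hypothetical.

```javascript
// Hypothetical helper mirroring the three steps above.
function containsJavascriptUrl(value) {
  let url;
  try {
    // No base URL is supplied, so relative references fail to parse.
    url = new URL(value);
  } catch {
    return false; // Parsing failed: not a javascript: URL.
  }
  // URL.protocol includes the trailing colon and is already lowercased.
  return url.protocol === "javascript:";
}
```

Because the parser strips leading whitespace and embedded tabs or newlines and lowercases the scheme, obfuscated spellings of the scheme are treated the same as the plain form.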

3.2. Modify the Configuration

The configuration modifier methods are methods on Sanitizer that modify its configuration. They will maintain the validity criteria. They return a boolean which informs the caller whether the configuration was modified or not.

// `div` is assumed to reference an existing element.
let s = new Sanitizer({elements: ["div"]});
s.allowElement("p");  // Returns true.
div.setHTML("<div><p>", {sanitizer: s});  // Allows `<div>` and `<p>`.

let s2 = new Sanitizer({elements: ["div"]});
s2.removeElement("p");  // Returns false, as `<p>` was not previously allowed.
div.setHTML("<div><p>", {sanitizer: s2});  // Allows `<div>`. `<p>` is removed.

To allow an element SanitizerElementWithAttributes element with a SanitizerConfig configuration:
Note: This algorithm is relatively involved, because the element allow list may specify per-element allow- or remove-lists for attributes. This requires that we distinguish 4 cases:
  • Whether we have a global allow- or remove-list, and

  • whether these lists already contain element or not.

  1. Set element to the result of canonicalize a sanitizer element with attributes with element.

  2. Set modified to the result of remove element from configuration["replaceWithChildrenElements"].

  3. If configuration["elements"] exists:

    1. Comment: We need to make sure the per-element attributes do not overlap with global attributes.

    2. If element["attributes"] exists:

      1. If configuration["attributes"] exists:

        1. Set element["attributes"] to the difference of element["attributes"] and configuration["attributes"].

        2. If configuration["dataAttributes"] exists and configuration["dataAttributes"] is true:

          1. Remove all items item from element["attributes"] where item is a custom data attribute.

      2. If configuration["removeAttributes"] exists:

        1. Set element["attributes"] to the difference of element["attributes"] and configuration["removeAttributes"].

    3. Otherwise if element["removeAttributes"] exists:

      1. If configuration["attributes"] exists:

        1. Set element["removeAttributes"] to the intersection of element["removeAttributes"] and configuration["attributes"].

      2. If configuration["removeAttributes"] exists:

        1. Set element["removeAttributes"] to the difference of element["removeAttributes"] and configuration["removeAttributes"].

    4. If configuration["elements"] does not contain element:

      1. Comment: This is the case with a global allow-list that does not yet contain element.

      2. Append element to configuration["elements"].

      3. Return true.

    5. Comment: This is the case with a global allow-list that already contains element.

    6. Let current element be the item in configuration["elements"] where item[name] equals element[name] and item[namespace] equals element[namespace].

    7. If element equals current element then return modified.

    8. Remove element from configuration["elements"].

    9. Append element to configuration["elements"].

    10. Return true.

  4. Otherwise:

    1. Comment: If we have a global remove-list, the per-element attributes of element get ignored.

    2. If configuration["removeElements"] does not contain element:

      1. Comment: This is the case with a global remove-list that does not contain element.

      2. Return modified.

    3. Comment: This is the case with a global remove-list that contains element.

    4. Remove element from configuration["removeElements"].

    5. Return true.
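The four cases can be sketched in plain JavaScript over a configuration object. This is a non-normative model: the helper names are hypothetical, stored entries are assumed canonical, the configuration is assumed valid, and element equality is approximated by comparing names plus the per-element attribute lists irrespective of order.

```javascript
const HTML_NS = "http://www.w3.org/1999/xhtml";
const same = (a, b) => a.name === b.name && a.namespace === b.namespace;
const contains = (list, item) => (list ?? []).some((x) => same(x, item));
const difference = (a, b) => a.filter((x) => !contains(b, x));
const isDataAttr = (a) => a.namespace === null && a.name.startsWith("data-");
const canonAttr = (a) =>
  typeof a === "string" ? { name: a, namespace: null } : { name: a.name, namespace: a.namespace ?? null };

function canonElement(el) {
  if (typeof el === "string") return { name: el, namespace: HTML_NS };
  const out = { name: el.name, namespace: el.namespace === undefined ? HTML_NS : el.namespace };
  if (el.attributes) out.attributes = el.attributes.map(canonAttr);
  if (el.removeAttributes) out.removeAttributes = el.removeAttributes.map(canonAttr);
  return out;
}

// Removes the matching entry from a (possibly missing) list; reports whether it did.
function removeFrom(list, item) {
  const index = (list ?? []).findIndex((x) => same(x, item));
  if (index === -1) return false;
  list.splice(index, 1);
  return true;
}

function sameList(a, b) {
  if (a === undefined || b === undefined) return a === b;
  return a.length === b.length && a.every((x) => contains(b, x));
}

function allowElement(config, element) {
  element = canonElement(element);
  const modified = removeFrom(config.replaceWithChildrenElements, element);
  if (!config.elements) {
    // Global remove-list: per-element attribute lists are ignored.
    return removeFrom(config.removeElements, element) || modified;
  }
  // Global allow-list: keep per-element lists free of overlap with global lists.
  if (element.attributes) {
    if (config.attributes) {
      element.attributes = difference(element.attributes, config.attributes);
      if (config.dataAttributes === true) {
        element.attributes = element.attributes.filter((a) => !isDataAttr(a));
      }
    }
    if (config.removeAttributes) {
      element.attributes = difference(element.attributes, config.removeAttributes);
    }
  } else if (element.removeAttributes) {
    if (config.attributes) {
      element.removeAttributes = element.removeAttributes.filter((a) => contains(config.attributes, a));
    }
    if (config.removeAttributes) {
      element.removeAttributes = difference(element.removeAttributes, config.removeAttributes);
    }
  }
  const current = config.elements.find((x) => same(x, element));
  if (current === undefined) {
    config.elements.push(element);
    return true;
  }
  const unchanged = sameList(current.attributes, element.attributes) &&
    sameList(current.removeAttributes, element.removeAttributes);
  if (unchanged) return modified;
  // Replace the stored entry with the new one.
  removeFrom(config.elements, element);
  config.elements.push(element);
  return true;
}
```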

To remove an element SanitizerElement element from a SanitizerConfig configuration:
Note: This method requires that we distinguish 4 cases:
  • Whether we have a global allow- or remove-list,

  • whether they already contain element or not.

  1. Set element to the result of canonicalize a sanitizer element with element.

  2. Set modified to the result of remove element from configuration["replaceWithChildrenElements"].

  3. If configuration["elements"] exists:

    1. If configuration["elements"] contains element:

      1. Comment: We have a global allow list and it contains element.

      2. Remove element from configuration["elements"].

      3. Return true.

    2. Comment: We have a global allow list and it does not contain element.

    3. Return modified.

  4. Otherwise:

    1. If configuration["removeElements"] contains element:

      1. Comment: We have a global remove list and it already contains element.

      2. Return modified.

    2. Comment: We have a global remove list and it does not contain element.

    3. Add element to configuration["removeElements"].

    4. Return true.
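A non-normative sketch of the same steps in JavaScript. The helper names are hypothetical, and the configuration is assumed to be valid and canonical, so exactly one of elements and removeElements is present.

```javascript
const HTML_NS = "http://www.w3.org/1999/xhtml";
const same = (a, b) => a.name === b.name && a.namespace === b.namespace;
// Simplified canonicalization: strings name elements in the HTML namespace.
const canonicalize = (el) =>
  typeof el === "string"
    ? { name: el, namespace: HTML_NS }
    : { name: el.name, namespace: el.namespace === undefined ? HTML_NS : el.namespace };

// Removes the matching entry from a (possibly missing) list; reports whether it did.
function removeFrom(list, item) {
  const index = (list ?? []).findIndex((x) => same(x, item));
  if (index === -1) return false;
  list.splice(index, 1);
  return true;
}

function removeElement(config, element) {
  element = canonicalize(element);
  const modified = removeFrom(config.replaceWithChildrenElements, element);
  if (config.elements) {
    // Global allow-list: dropping the entry is enough to forbid the element.
    return removeFrom(config.elements, element) || modified;
  }
  // Global remove-list: add the entry unless it is already present.
  if (config.removeElements.some((x) => same(x, element))) return modified;
  config.removeElements.push(element);
  return true;
}
```

With an allow-list of only div, removeElement(config, "p") returns false, which matches the removeElement() example earlier in this section.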

To replace an element with its children SanitizerElement element from a SanitizerConfig configuration:
  1. Set element to the result of canonicalize a sanitizer element with element.

  2. If configuration["replaceWithChildrenElements"] contains element:

    1. Return false.

  3. Add element to configuration["replaceWithChildrenElements"].

  4. Remove element from configuration["removeElements"].

  5. Remove element from configuration["elements"].

  6. Return true.

To allow an attribute SanitizerAttribute attribute on a SanitizerConfig configuration:

Note: This method distinguishes two cases, namely whether we have a global allow- or a global remove-list. If we add attribute to a global allow-list, we may need to do additional work to fix up per-element allow- or remove-lists to maintain our validity criteria.

  1. Set attribute to the result of canonicalize a sanitizer attribute with attribute.

  2. If configuration["attributes"] exists:

    1. Comment: If we have a global allow-list, we need to add attribute.

    2. If configuration["dataAttributes"] exists and configuration["dataAttributes"] is true and attribute is a custom data attribute, then return false.

    3. If configuration["attributes"] contains attribute, then return false.

    4. Append attribute to configuration["attributes"].

    5. Comment: Fix-up per-element allow and remove lists.

    6. If configuration["elements"] exists:

      1. For each element in configuration["elements"]:

        1. If element["attributes"] with default « [] » contains attribute:

          1. Remove attribute from element["attributes"].

        2. Assert: element["removeAttributes"] with default « [] » does not contain attribute.

    7. Return true.

  3. Otherwise:

    1. Comment: If we have a global remove-list, we need to remove attribute.

    2. If configuration["removeAttributes"] does not contain attribute:

      1. Return false.

    3. Remove attribute from configuration["removeAttributes"].

    4. Return true.
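The two cases can be sketched as follows. This is a non-normative model with hypothetical helper names. It assumes a valid, canonical configuration and treats a custom data attribute as a null-namespace attribute whose name starts with "data-".

```javascript
const same = (a, b) => a.name === b.name && a.namespace === b.namespace;
const isDataAttr = (a) => a.namespace === null && a.name.startsWith("data-");
// Simplified canonicalization: strings name attributes in the null namespace.
const canonicalizeAttr = (a) =>
  typeof a === "string" ? { name: a, namespace: null } : { name: a.name, namespace: a.namespace ?? null };

// Removes the matching entry from a (possibly missing) list; reports whether it did.
function removeFrom(list, item) {
  const index = (list ?? []).findIndex((x) => same(x, item));
  if (index === -1) return false;
  list.splice(index, 1);
  return true;
}

function allowAttribute(config, attribute) {
  attribute = canonicalizeAttr(attribute);
  if (config.attributes) {
    // Global allow-list: add the attribute unless it is already covered.
    if (config.dataAttributes === true && isDataAttr(attribute)) return false;
    if (config.attributes.some((x) => same(x, attribute))) return false;
    config.attributes.push(attribute);
    // Fix-up: per-element allow-lists must stay disjoint from the global one.
    for (const el of config.elements ?? []) removeFrom(el.attributes, attribute);
    return true;
  }
  // Global remove-list: allowing means deleting the entry, if present.
  return removeFrom(config.removeAttributes, attribute);
}
```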

To remove an attribute SanitizerAttribute attribute from a SanitizerConfig configuration:

Note: This method distinguishes two cases, namely whether we have a global allow- or a global remove-list. If we add attribute to the global remove-list, we may need to do additional work to fix up per-element allow- or remove-lists to maintain our validity criteria. If we remove attribute from a global allow-list, we may also have to remove it from local remove-lists.

  1. Set attribute to the result of canonicalize a sanitizer attribute with attribute.

  2. If configuration["attributes"] exists:

    1. Comment: If we have a global allow-list, we need to remove attribute.

    2. If configuration["attributes"] does not contain attribute:

      1. Return false.

    3. Remove attribute from configuration["attributes"].

    4. Comment: Fix-up per-element allow and remove lists.

    5. If configuration["elements"] exists:

      1. For each element in configuration["elements"]:

        1. If element["removeAttributes"] with default « [] » contains attribute:

          1. Remove attribute from element["removeAttributes"].

    6. Return true.

  3. Otherwise:

    1. Comment: If we have a global remove-list, we need to add attribute.

    2. If configuration["removeAttributes"] contains attribute, then return false.

    3. Append attribute to configuration["removeAttributes"].

    4. Comment: Fix-up per-element allow and remove lists.

    5. If configuration["elements"] exists:

      1. For each element in configuration["elements"]:

        1. If element["attributes"] with default « [] » contains attribute:

          1. Remove attribute from element["attributes"].

        2. If element["removeAttributes"] with default « [] » contains attribute:

          1. Remove attribute from element["removeAttributes"].

    6. Return true.

To set comments with a boolean allow on a SanitizerConfig configuration:
  1. If configuration["comments"] exists and configuration["comments"] equals allow, then return false.

  2. Set configuration["comments"] to allow.

  3. Return true.

To set data attributes with a boolean allow on a SanitizerConfig configuration:
  1. If configuration["attributes"] does not exist, then return false.

  2. If configuration["dataAttributes"] exists and configuration["dataAttributes"] equals allow, then return false.

  3. Set configuration["dataAttributes"] to allow.

  4. If allow is true:

    1. Remove any items attr from configuration["attributes"] where attr is a custom data attribute.

    2. If configuration["elements"] exists:

      1. For each element in configuration["elements"]:

        1. If element["attributes"] exists:

          1. Remove any items attr from element["attributes"] where attr is a custom data attribute.

  5. Return true.
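
A minimal JavaScript sketch of the two setters above follows. The function names are hypothetical, and isCustomDataAttribute is a deliberate simplification of the [HTML] definition of a custom data attribute: it only checks for the null namespace and a "data-" prefix followed by at least one character.

```javascript
// Simplified check (assumption): the full [HTML] definition has further
// conditions, such as XML-compatibility and no ASCII upper alphas.
function isCustomDataAttribute(attr) {
  return attr.namespace === null &&
    attr.name.startsWith("data-") &&
    attr.name.length > "data-".length;
}

function setComments(configuration, allow) {
  if (configuration.comments === allow) return false;
  configuration.comments = allow;
  return true;
}

function setDataAttributes(configuration, allow) {
  // Only meaningful when a global attribute allow-list exists.
  if (configuration.attributes === undefined) return false;
  if (configuration.dataAttributes === allow) return false;
  configuration.dataAttributes = allow;
  if (allow) {
    // Individual data-* entries are now redundant; drop them.
    configuration.attributes =
      configuration.attributes.filter((attr) => !isCustomDataAttribute(attr));
    for (const element of configuration.elements ?? []) {
      if (element.attributes !== undefined) {
        element.attributes =
          element.attributes.filter((attr) => !isCustomDataAttribute(attr));
      }
    }
  }
  return true;
}

// Usage with a small allow-list configuration.
const config = {
  attributes: [
    { name: "data-id", namespace: null },
    { name: "title", namespace: null },
  ],
  comments: false,
  dataAttributes: false,
};
const dataChanged = setDataAttributes(config, true);
const commentsChanged = setComments(config, false);
```

Here the data-id entry is dropped from the allow-list once data attributes are allowed wholesale, while setComments reports no change because comments was already false.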

To remove unsafe from a SanitizerConfig configuration, do this:

Note: While this algorithm is called remove unsafe, we use the term "unsafe" strictly in the sense of this spec, to denote content that will execute JavaScript when inserted into the document. In other words, this method will remove opportunities for XSS.

  1. Assert: The key set of built-in safe baseline configuration equals «[ "removeElements", "removeAttributes" ] ».

  2. Let result be false.

  3. For each element in built-in safe baseline configuration["removeElements"]:

    1. Call remove an element element from configuration.

    2. If the call returned true, set result to true.

  4. For each attribute in built-in safe baseline configuration["removeAttributes"]:

    1. Call remove an attribute attribute from configuration.

    2. If the call returned true, set result to true.

  5. For each attribute listed in event handler content attributes:

    1. Call remove an attribute attribute from configuration.

    2. If the call returned true, set result to true.

  6. Return result.
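
The control flow of remove unsafe can be sketched as below. The point of the sketch is the result accumulation: every removal is attempted, and result becomes true if any of them changed the configuration. The ops parameter and its two stand-in functions are hypothetical simplifications that only handle a global remove-list; they are not the full remove an element and remove an attribute algorithms.

```javascript
function removeUnsafe(configuration, baseline, eventHandlerAttributes, ops) {
  let result = false;
  for (const element of baseline.removeElements) {
    if (ops.removeElement(configuration, element)) result = true;
  }
  for (const attribute of baseline.removeAttributes) {
    if (ops.removeAttribute(configuration, attribute)) result = true;
  }
  for (const attribute of eventHandlerAttributes) {
    if (ops.removeAttribute(configuration, attribute)) result = true;
  }
  return result;
}

// Simplified stand-ins (assumption): remove-list configurations only,
// with attributes kept as plain strings for brevity.
const ops = {
  removeElement(config, element) {
    const present = config.removeElements.some(
      (e) => e.name === element.name && e.namespace === element.namespace);
    if (present) return false;
    config.removeElements.push(element);
    return true;
  },
  removeAttribute(config, attribute) {
    if (config.removeAttributes.includes(attribute)) return false;
    config.removeAttributes.push(attribute);
    return true;
  },
};

// Usage with a shortened baseline and event handler list.
const baseline = {
  removeElements: [{ name: "script", namespace: "http://www.w3.org/1999/xhtml" }],
  removeAttributes: [],
};
const eventHandlers = ["onclick", "onload"];
const config = { removeElements: [], removeAttributes: ["onclick"] };

const firstRun = removeUnsafe(config, baseline, eventHandlers, ops);
const secondRun = removeUnsafe(config, baseline, eventHandlers, ops);
```

The first run changes the configuration, so it returns true; the second run finds everything already removed and returns false.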

3.3. Set the Configuration

To set a configuration, given a dictionary configuration, a boolean allowCommentsAndDataAttributes, and a Sanitizer sanitizer:
  1. Canonicalize configuration with allowCommentsAndDataAttributes.

  2. If configuration is not valid, then return false.

  3. Set sanitizer’s configuration to configuration.

  4. Return true.

3.4. Canonicalize the Configuration

The Sanitizer stores the configuration in a canonical form, as this makes a number of processing steps easier.

An elements list {elements: ["div"]} gets stored as {elements: [{name: "div", namespace: "http://www.w3.org/1999/xhtml"}]}.
To canonicalize the configuration SanitizerConfig configuration with a boolean allowCommentsAndDataAttributes:

Note: We assume that configuration is the result of [WebIDL] converting a JavaScript value to a SanitizerConfig.

  1. If neither configuration["elements"] nor configuration["removeElements"] exist, then set configuration["removeElements"] to « [] ».

  2. If neither configuration["attributes"] nor configuration["removeAttributes"] exist, then set configuration["removeAttributes"] to « [] ».

  3. If configuration["elements"] exists:

    1. Let elements be « [] »

    2. For each element of configuration["elements"] do:

      1. Append the result of canonicalize a sanitizer element with attributes element to elements.

    3. Set configuration["elements"] to elements.

  4. If configuration["removeElements"] exists:

    1. Let elements be « [] »

    2. For each element of configuration["removeElements"] do:

      1. Append the result of canonicalize a sanitizer element element to elements.

    3. Set configuration["removeElements"] to elements.

  5. If configuration["replaceWithChildrenElements"] exists:

    1. Let elements be « [] »

    2. For each element of configuration["replaceWithChildrenElements"] do:

      1. Append the result of canonicalize a sanitizer element element to elements.

    3. Set configuration["replaceWithChildrenElements"] to elements.

  6. If configuration["attributes"] exists:

    1. Let attributes be « [] »

    2. For each attribute of configuration["attributes"] do:

      1. Append the result of canonicalize a sanitizer attribute attribute to attributes.

    3. Set configuration["attributes"] to attributes.

  7. If configuration["removeAttributes"] exists:

    1. Let attributes be « [] »

    2. For each attribute of configuration["removeAttributes"] do:

      1. Append the result of canonicalize a sanitizer attribute attribute to attributes.

    3. Set configuration["removeAttributes"] to attributes.

  8. If configuration["comments"] does not exist, then set configuration["comments"] to allowCommentsAndDataAttributes.

  9. If configuration["attributes"] exists and configuration["dataAttributes"] does not exist, then set configuration["dataAttributes"] to allowCommentsAndDataAttributes.

To canonicalize a sanitizer element with attributes a SanitizerElementWithAttributes element:
  1. Let result be the result of canonicalize a sanitizer element with element.

  2. If element is a dictionary:

    1. For each attribute in element["attributes"]:

      1. Add the result of canonicalize a sanitizer attribute with attribute to result["attributes"].

    2. For each attribute in element["removeAttributes"]:

      1. Add the result of canonicalize a sanitizer attribute with attribute to result["removeAttributes"].

  3. Return result.

In order to canonicalize a sanitizer element a SanitizerElement element, return the result of canonicalize a sanitizer name with element and the HTML namespace as the default namespace.
In order to canonicalize a sanitizer attribute a SanitizerAttribute attribute, return the result of canonicalize a sanitizer name with attribute and null as the default namespace.
In order to canonicalize a sanitizer name name, with a default namespace defaultNamespace, run the following steps:
  1. Assert: name is either a DOMString or a dictionary.

  2. If name is a DOMString, then return «[ "name" → name, "namespace" → defaultNamespace]».

  3. Assert: name is a dictionary and name["name"] exists.

  4. Let namespace be name["namespace"] if it exists, otherwise defaultNamespace.

  5. If namespace is the empty string, then set it to null.

  6. Return «[
    "name" → name["name"],
    "namespace" → namespace
    ]».
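
The canonicalization algorithms of this section can be sketched together in JavaScript. This is a non-normative illustration: function names are hypothetical, "exists" is modelled as "is not undefined", and per-element attribute lists that are absent from the input are left absent (an assumption, since the steps above do not spell out that case).

```javascript
const HTML_NAMESPACE = "http://www.w3.org/1999/xhtml";

function canonicalizeName(name, defaultNamespace) {
  if (typeof name === "string") return { name, namespace: defaultNamespace };
  let namespace = name.namespace !== undefined ? name.namespace : defaultNamespace;
  if (namespace === "") namespace = null;
  return { name: name.name, namespace };
}
const canonicalizeElement = (element) => canonicalizeName(element, HTML_NAMESPACE);
const canonicalizeAttribute = (attribute) => canonicalizeName(attribute, null);

// "Add" appends only if no entry with the same name and namespace exists.
function addName(list, name) {
  const present = list.some(
    (e) => e.name === name.name && e.namespace === name.namespace);
  if (!present) list.push(name);
}

function canonicalizeElementWithAttributes(element) {
  const result = canonicalizeElement(element);
  if (typeof element !== "string") {
    for (const key of ["attributes", "removeAttributes"]) {
      if (element[key] === undefined) continue; // assumption: stays absent
      result[key] = [];
      for (const attribute of element[key]) {
        addName(result[key], canonicalizeAttribute(attribute));
      }
    }
  }
  return result;
}

function canonicalizeConfiguration(configuration, allowCommentsAndDataAttributes) {
  const c = configuration;
  if (c.elements === undefined && c.removeElements === undefined) c.removeElements = [];
  if (c.attributes === undefined && c.removeAttributes === undefined) c.removeAttributes = [];
  if (c.elements !== undefined) {
    c.elements = c.elements.map((e) => canonicalizeElementWithAttributes(e));
  }
  for (const key of ["removeElements", "replaceWithChildrenElements"]) {
    if (c[key] !== undefined) c[key] = c[key].map((e) => canonicalizeElement(e));
  }
  for (const key of ["attributes", "removeAttributes"]) {
    if (c[key] !== undefined) c[key] = c[key].map((a) => canonicalizeAttribute(a));
  }
  if (c.comments === undefined) c.comments = allowCommentsAndDataAttributes;
  if (c.attributes !== undefined && c.dataAttributes === undefined) {
    c.dataAttributes = allowCommentsAndDataAttributes;
  }
  return c;
}

// Usage: string shorthands expand to { name, namespace } dictionaries.
const canonical = canonicalizeConfiguration(
  { elements: ["div", { name: "a", attributes: ["href"] }] }, false);
```

In this usage, "div" expands to a dictionary in the HTML namespace, the href attribute receives the null namespace, and an empty global removeAttributes list is filled in because neither attribute list was given.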

3.5. Supporting Algorithms

For the canonicalized element and attribute name lists used in this spec, list membership is based on matching both "name" and "namespace" entries:

A Sanitizer name list contains an item if there exists an entry of list that is an ordered map, and where item["name"] equals entry["name"] and item["namespace"] equals entry["namespace"].
To remove an item from a list that is an ordered map, remove all entries from list where item["name"] equals entry["name"] and item["namespace"] equals entry["namespace"].
To add a name to a list, where name is canonicalized and list is an ordered map:
  1. If list contains name, then return.

  2. Append name to list.

Equality for ordered sets is equality of its members, but without regard to order: Ordered sets A and B are equal if both A is a superset of B and B is a superset of A.
An ordered map is a sequence of key and value tuples. Equality of ordered maps is equality of this sequence of tuples, when treated as an ordered set. Ordered maps A and B are equal if the ordered set consisting of A’s entries and the ordered set of B’s entries are equal.
A list list has dupes, if for any item of list, there is more than one entry in list where item["name"] equals entry["name"] and item["namespace"] equals entry["namespace"].
The intersection of two lists A and B containing SanitizerElement is the same as set intersection, but with the set entries previously canonicalized:
  1. Let set A be « [] »

  2. Let set B be « [] »

  3. For each entry of A, append the result of canonicalize a sanitizer name entry to set A.

  4. For each entry of B, append the result of canonicalize a sanitizer name entry to set B.

  5. Return the intersection of set A and set B.

To determine not of a boolean bool, return false if bool is true, and return true otherwise.
A Comment contains an explanatory text that applies to a particular point within an algorithm.
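
The name-list helpers defined in this section might look as follows in JavaScript. This is a sketch with hypothetical function names. For intersection, the sketch assumes the HTML namespace as the default when canonicalizing, because the lists hold SanitizerElement entries; the text above does not state the default explicitly.

```javascript
const HTML_NAMESPACE = "http://www.w3.org/1999/xhtml";

// Membership is decided by both "name" and "namespace".
const sameName = (a, b) => a.name === b.name && a.namespace === b.namespace;

const contains = (list, item) => list.some((entry) => sameName(entry, item));

// Removes all matching entries, in place.
function removeName(list, item) {
  for (let i = list.length - 1; i >= 0; i--) {
    if (sameName(list[i], item)) list.splice(i, 1);
  }
}

// Appends name unless an equal entry is already present.
function addName(list, name) {
  if (contains(list, name)) return;
  list.push(name);
}

// True if some entry occurs more than once.
function hasDupes(list) {
  return list.some(
    (item) => list.filter((entry) => sameName(entry, item)).length > 1);
}

// Assumption: elements default to the HTML namespace.
function canonicalizeElement(name) {
  if (typeof name === "string") return { name, namespace: HTML_NAMESPACE };
  let namespace = name.namespace !== undefined ? name.namespace : HTML_NAMESPACE;
  if (namespace === "") namespace = null;
  return { name: name.name, namespace };
}

// Canonicalize both lists first, then intersect by name and namespace.
function intersection(a, b) {
  const setA = a.map((entry) => canonicalizeElement(entry));
  const setB = b.map((entry) => canonicalizeElement(entry));
  return setA.filter((item) => contains(setB, item));
}

// Usage: "div" and { name: "div" } denote the same canonical element.
const common = intersection(["div", "span"], [{ name: "div" }]);
const list = [];
addName(list, { name: "p", namespace: HTML_NAMESPACE });
addName(list, { name: "p", namespace: HTML_NAMESPACE });
```

Canonicalizing before comparing matters here: without it, the string "div" and the dictionary { name: "div" } would never compare equal.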

3.6. Builtins

There are four builtins:

The built-in safe default configuration is as follows:

{
  "elements": [
    {
      "name": "html",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "head",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "title",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "body",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "article",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "section",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "nav",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "aside",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "h1",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "h2",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "h3",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "h4",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "h5",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "h6",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "hgroup",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "header",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "footer",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "address",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "p",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "hr",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "pre",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "blockquote",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": [
        {
          "name": "cite",
          "namespace": null
        }
      ]
    },
    {
      "name": "ol",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": [
        {
          "name": "reversed",
          "namespace": null
        },
        {
          "name": "start",
          "namespace": null
        },
        {
          "name": "type",
          "namespace": null
        }
      ]
    },
    {
      "name": "ul",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "menu",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "li",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": [
        {
          "name": "value",
          "namespace": null
        }
      ]
    },
    {
      "name": "dl",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "dt",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "dd",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "figure",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "figcaption",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "main",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "search",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "div",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "a",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": [
        {
          "name": "href",
          "namespace": null
        },
        {
          "name": "rel",
          "namespace": null
        },
        {
          "name": "hreflang",
          "namespace": null
        },
        {
          "name": "type",
          "namespace": null
        }
      ]
    },
    {
      "name": "em",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "strong",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "small",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "s",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "cite",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "q",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "dfn",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "abbr",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "ruby",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "rt",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "rp",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "data",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": [
        {
          "name": "value",
          "namespace": null
        }
      ]
    },
    {
      "name": "time",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": [
        {
          "name": "datetime",
          "namespace": null
        }
      ]
    },
    {
      "name": "code",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "var",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "samp",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "kbd",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "sub",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "sup",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "i",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "b",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "u",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "mark",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "bdi",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "bdo",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "span",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "br",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "wbr",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "ins",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": [
        {
          "name": "cite",
          "namespace": null
        },
        {
          "name": "datetime",
          "namespace": null
        }
      ]
    },
    {
      "name": "del",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": [
        {
          "name": "cite",
          "namespace": null
        },
        {
          "name": "datetime",
          "namespace": null
        }
      ]
    },
    {
      "name": "table",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "caption",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "colgroup",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": [
        {
          "name": "span",
          "namespace": null
        }
      ]
    },
    {
      "name": "col",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": [
        {
          "name": "span",
          "namespace": null
        }
      ]
    },
    {
      "name": "tbody",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "thead",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "tfoot",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "tr",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": []
    },
    {
      "name": "td",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": [
        {
          "name": "colspan",
          "namespace": null
        },
        {
          "name": "rowspan",
          "namespace": null
        },
        {
          "name": "headers",
          "namespace": null
        }
      ]
    },
    {
      "name": "th",
      "namespace": "http://www.w3.org/1999/xhtml",
      "attributes": [
        {
          "name": "colspan",
          "namespace": null
        },
        {
          "name": "rowspan",
          "namespace": null
        },
        {
          "name": "headers",
          "namespace": null
        },
        {
          "name": "scope",
          "namespace": null
        },
        {
          "name": "abbr",
          "namespace": null
        }
      ]
    },
    {
      "name": "math",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": []
    },
    {
      "name": "merror",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": []
    },
    {
      "name": "mfrac",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": []
    },
    {
      "name": "mi",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": []
    },
    {
      "name": "mmultiscripts",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": []
    },
    {
      "name": "mn",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": []
    },
    {
      "name": "mo",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": [
        {
          "name": "form",
          "namespace": null
        },
        {
          "name": "fence",
          "namespace": null
        },
        {
          "name": "separator",
          "namespace": null
        },
        {
          "name": "lspace",
          "namespace": null
        },
        {
          "name": "rspace",
          "namespace": null
        },
        {
          "name": "stretchy",
          "namespace": null
        },
        {
          "name": "symmetric",
          "namespace": null
        },
        {
          "name": "maxsize",
          "namespace": null
        },
        {
          "name": "minsize",
          "namespace": null
        },
        {
          "name": "largeop",
          "namespace": null
        },
        {
          "name": "movablelimits",
          "namespace": null
        }
      ]
    },
    {
      "name": "mover",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": [
        {
          "name": "accent",
          "namespace": null
        }
      ]
    },
    {
      "name": "mpadded",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": [
        {
          "name": "width",
          "namespace": null
        },
        {
          "name": "height",
          "namespace": null
        },
        {
          "name": "depth",
          "namespace": null
        },
        {
          "name": "lspace",
          "namespace": null
        },
        {
          "name": "voffset",
          "namespace": null
        }
      ]
    },
    {
      "name": "mphantom",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": []
    },
    {
      "name": "mprescripts",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": []
    },
    {
      "name": "mroot",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": []
    },
    {
      "name": "mrow",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": []
    },
    {
      "name": "ms",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": []
    },
    {
      "name": "mspace",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": [
        {
          "name": "width",
          "namespace": null
        },
        {
          "name": "height",
          "namespace": null
        },
        {
          "name": "depth",
          "namespace": null
        }
      ]
    },
    {
      "name": "msqrt",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": []
    },
    {
      "name": "mstyle",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": []
    },
    {
      "name": "msub",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": []
    },
    {
      "name": "msubsup",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": []
    },
    {
      "name": "msup",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": []
    },
    {
      "name": "mtable",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": []
    },
    {
      "name": "mtd",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": [
        {
          "name": "columnspan",
          "namespace": null
        },
        {
          "name": "rowspan",
          "namespace": null
        }
      ]
    },
    {
      "name": "mtext",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": []
    },
    {
      "name": "mtr",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": []
    },
    {
      "name": "munder",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": [
        {
          "name": "accentunder",
          "namespace": null
        }
      ]
    },
    {
      "name": "munderover",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": [
        {
          "name": "accent",
          "namespace": null
        },
        {
          "name": "accentunder",
          "namespace": null
        }
      ]
    },
    {
      "name": "semantics",
      "namespace": "http://www.w3.org/1998/Math/MathML",
      "attributes": []
    }
  ],
  "attributes": [
    {
      "name": "dir",
      "namespace": null
    },
    {
      "name": "lang",
      "namespace": null
    },
    {
      "name": "title",
      "namespace": null
    },
    {
      "name": "displaystyle",
      "namespace": null
    },
    {
      "name": "mathbackground",
      "namespace": null
    },
    {
      "name": "mathcolor",
      "namespace": null
    },
    {
      "name": "mathsize",
      "namespace": null
    },
    {
      "name": "scriptlevel",
      "namespace": null
    }
  ],
  "comments": false,
  "dataAttributes": false
}

Note: Included [MathML] markup is based on [SafeMathML].

The built-in safe baseline configuration is meant to block only script-content. It is as follows:

{
  "removeElements": [
    {
      "namespace": "http://www.w3.org/1999/xhtml",
      "name": "script"
    },
    {
      "namespace": "http://www.w3.org/1999/xhtml",
      "name": "frame"
    },
    {
      "namespace": "http://www.w3.org/1999/xhtml",
      "name": "iframe"
    },
    {
      "namespace": "http://www.w3.org/1999/xhtml",
      "name": "object"
    },
    {
      "namespace": "http://www.w3.org/1999/xhtml",
      "name": "embed"
    },
    {
      "namespace": "http://www.w3.org/2000/svg",
      "name": "script"
    },
    {
      "namespace": "http://www.w3.org/2000/svg",
      "name": "use"
    }
  ],
  "removeAttributes": []
}

Warning: The remove unsafe algorithm specifies to additionally remove any event handler content attributes, as defined in [HTML]. If a user agent defines extensions to the [HTML] spec with additional event handler content attributes, it is its responsibility to decide how to handle them. Using the current event handler content attributes list, the safe baseline configuration looks effectively like so:

{
  "removeElements": [
    {
      "namespace": "http://www.w3.org/1999/xhtml",
      "name": "script"
    },
    {
      "namespace": "http://www.w3.org/1999/xhtml",
      "name": "frame"
    },
    {
      "namespace": "http://www.w3.org/1999/xhtml",
      "name": "iframe"
    },
    {
      "namespace": "http://www.w3.org/1999/xhtml",
      "name": "object"
    },
    {
      "namespace": "http://www.w3.org/1999/xhtml",
      "name": "embed"
    },
    {
      "namespace": "http://www.w3.org/2000/svg",
      "name": "script"
    },
    {
      "namespace": "http://www.w3.org/2000/svg",
      "name": "use"
    }
  ],
  "removeAttributes": [
    "onafterprint",
    "onauxclick",
    "onbeforeinput",
    "onbeforematch",
    "onbeforeprint",
    "onbeforeunload",
    "onbeforetoggle",
    "onblur",
    "oncancel",
    "oncanplay",
    "oncanplaythrough",
    "onchange",
    "onclick",
    "onclose",
    "oncontextlost",
    "oncontextmenu",
    "oncontextrestored",
    "oncopy",
    "oncuechange",
    "oncut",
    "ondblclick",
    "ondrag",
    "ondragend",
    "ondragenter",
    "ondragleave",
    "ondragover",
    "ondragstart",
    "ondrop",
    "ondurationchange",
    "onemptied",
    "onended",
    "onerror",
    "onfocus",
    "onformdata",
    "onhashchange",
    "oninput",
    "oninvalid",
    "onkeydown",
    "onkeypress",
    "onkeyup",
    "onlanguagechange",
    "onload",
    "onloadeddata",
    "onloadedmetadata",
    "onloadstart",
    "onmessage",
    "onmessageerror",
    "onmousedown",
    "onmouseenter",
    "onmouseleave",
    "onmousemove",
    "onmouseout",
    "onmouseover",
    "onmouseup",
    "onoffline",
    "ononline",
    "onpagehide",
    "onpagereveal",
    "onpageshow",
    "onpageswap",
    "onpaste",
    "onpause",
    "onplay",
    "onplaying",
    "onpopstate",
    "onprogress",
    "onratechange",
    "onreset",
    "onresize",
    "onrejectionhandled",
    "onscroll",
    "onscrollend",
    "onsecuritypolicyviolation",
    "onseeked",
    "onseeking",
    "onselect",
    "onslotchange",
    "onstalled",
    "onstorage",
    "onsubmit",
    "onsuspend",
    "ontimeupdate",
    "ontoggle",
    "onunhandledrejection",
    "onunload",
    "onvolumechange",
    "onwaiting",
    "onwheel"
  ]
}
The built-in navigating URL attributes list, for which "javascript:" navigations are "unsafe", is as follows:

«[
[ { "name" → "a", "namespace" → HTML namespace }, { "name" → "href", "namespace" → null } ],
[ { "name" → "area", "namespace" → HTML namespace }, { "name" → "href", "namespace" → null } ],
[ { "name" → "base", "namespace" → HTML namespace }, { "name" → "href", "namespace" → null } ],
[ { "name" → "button", "namespace" → HTML namespace }, { "name" → "formaction", "namespace" → null } ],
[ { "name" → "form", "namespace" → HTML namespace }, { "name" → "action", "namespace" → null } ],
[ { "name" → "iframe", "namespace" → HTML namespace }, { "name" → "src", "namespace" → null } ],
[ { "name" → "input", "namespace" → HTML namespace }, { "name" → "formaction", "namespace" → null } ],
[ { "name" → "a", "namespace" → SVG namespace }, { "name" → "href", "namespace" → null } ],
[ { "name" → "a", "namespace" → SVG namespace }, { "name" → "href", "namespace" → XLink namespace } ]
]»

The built-in animating URL attributes list, which can be used in [SVG11] to declaratively modify navigation elements to use "javascript:" URLs, is as follows:

«[
[ { "name" → "animate", "namespace" → SVG namespace }, { "name" → "attributeName", "namespace" → null } ],
[ { "name" → "animateMotion", "namespace" → SVG namespace }, { "name" → "attributeName", "namespace" → null } ],
[ { "name" → "animateTransform", "namespace" → SVG namespace }, { "name" → "attributeName", "namespace" → null } ],
[ { "name" → "set", "namespace" → SVG namespace }, { "name" → "attributeName", "namespace" → null } ]
]»
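
Both built-in lists hold pairs of a canonical element name and a canonical attribute name. As an illustration of how such a list could be consulted, the following sketch encodes the navigating URL attributes list and checks whether a given element and attribute pair is a member. The helper names (pair, listContainsPair) are hypothetical, and how the sanitizer uses the lookup result is outside the scope of this sketch.

```javascript
const HTML = "http://www.w3.org/1999/xhtml";
const SVG = "http://www.w3.org/2000/svg";
const XLINK = "http://www.w3.org/1999/xlink";

// Builds one [element, attribute] entry in canonical form.
const pair = (element, elementNs, attribute, attributeNs = null) => [
  { name: element, namespace: elementNs },
  { name: attribute, namespace: attributeNs },
];

const navigatingUrlAttributes = [
  pair("a", HTML, "href"),
  pair("area", HTML, "href"),
  pair("base", HTML, "href"),
  pair("button", HTML, "formaction"),
  pair("form", HTML, "action"),
  pair("iframe", HTML, "src"),
  pair("input", HTML, "formaction"),
  pair("a", SVG, "href"),
  pair("a", SVG, "href", XLINK),
];

const sameName = (a, b) => a.name === b.name && a.namespace === b.namespace;

// True if the list has an entry matching both element and attribute.
function listContainsPair(list, element, attribute) {
  return list.some(
    ([el, attr]) => sameName(el, element) && sameName(attr, attribute));
}

// Usage
const htmlAnchorHref = listContainsPair(
  navigatingUrlAttributes,
  { name: "a", namespace: HTML },
  { name: "href", namespace: null });
const divHref = listContainsPair(
  navigatingUrlAttributes,
  { name: "div", namespace: HTML },
  { name: "href", namespace: null });
```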

4. Security Considerations

The Sanitizer API is intended to prevent DOM-based Cross-Site Scripting by traversing supplied HTML content and removing elements and attributes according to a configuration. The specified API must not support the construction of a Sanitizer object that leaves script-capable markup in; any way of doing so would be a bug in the threat model.

That said, there are security issues that even correct usage of the Sanitizer API cannot protect against. These scenarios are laid out in the following sections.

4.1. Server-Side Reflected and Stored XSS

This section is not normative.

The Sanitizer API operates solely in the DOM and adds a capability to traverse and filter an existing DocumentFragment. The Sanitizer does not address server-side reflected or stored XSS.

4.2. DOM clobbering

This section is not normative.

DOM clobbering describes an attack in which malicious HTML confuses an application by naming elements through id or name attributes such that properties like children of an HTML element in the DOM are overshadowed by the malicious content.

The Sanitizer API does not protect against DOM clobbering attacks in its default state, but can be configured to remove id and name attributes.
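
As a sketch of such a configuration, the following passes a remove-list for id and name to Element.setHTML(). The helper insertUntrusted and the variable noClobberConfig are hypothetical names. Since this specification is a draft, the snippet checks for the method before using it and makes no claim about availability in any given user agent.

```javascript
// Configuration dictionary that removes the attributes used for naming
// elements, which is what DOM clobbering relies on.
const noClobberConfig = { removeAttributes: ["id", "name"] };

// Hypothetical helper: inserts untrusted HTML into target using the
// configuration above, and fails loudly where setHTML() is unavailable.
function insertUntrusted(target, untrustedHtml) {
  if (typeof target.setHTML !== "function") {
    throw new Error("Element.setHTML() is not available");
  }
  target.setHTML(untrustedHtml, { sanitizer: noClobberConfig });
}
```

Removing id and name everywhere can break legitimate features such as fragment links and form submission, so authors may prefer to apply this only to content that does not need them.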

4.3. XSS with Script gadgets

This section is not normative.

Script gadgets are a technique in which an attacker uses existing application code from popular JavaScript libraries to cause their own code to execute. This is often done by injecting innocent-looking code or seemingly inert DOM nodes that are only parsed and interpreted by a framework, which then performs the execution of JavaScript based on that input.

The Sanitizer API cannot prevent these attacks, but it requires page authors to explicitly allow unknown elements in general. Authors must additionally explicitly configure unknown attributes and elements, as well as markup that is known to be widely used for templating and framework-specific code, like data- and slot attributes and elements like <slot> and <template>. We believe that these restrictions are not exhaustive and encourage page authors to examine their third party libraries for this behavior.

4.4. Mutated XSS

This section is not normative.

Mutated XSS or mXSS describes an attack based on parser context mismatches when parsing an HTML snippet without the correct context. In particular, when a parsed HTML fragment has been serialized to a string, the string is not guaranteed to be parsed and interpreted exactly the same when inserted into a different parent element. An example for carrying out such an attack is by relying on the change of parsing behavior for foreign content or mis-nested tags.

The Sanitizer API offers only functions that turn a string into a node tree. The context is supplied implicitly by all sanitizer functions: Element.setHTML() uses the current element; Document.parseHTML() creates a new document. Therefore Sanitizer API is not directly affected by mutated XSS.

If a developer were to retrieve a sanitized node tree as a string, e.g. via .innerHTML, and then parse it again, mutated XSS may occur. We discourage this practice. If processing or passing HTML as a string is necessary after all, then any such string should be considered untrusted and should be sanitized (again) when it is inserted into the DOM. In other words, an HTML tree that has been sanitized and then serialized can no longer be considered sanitized.
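
The recommendation above can be sketched as follows. The helper name copyMarkupSafely is hypothetical, and the example assumes that the target element provides Element.setHTML() as described in this specification.

```javascript
// Hypothetical helper (not part of the specification).
// It replaces the discouraged pattern:
//   target.innerHTML = source.innerHTML;
function copyMarkupSafely(source, target, options = {}) {
  // Serializing a sanitized tree yields a plain string, which has to be
  // treated as untrusted again.
  const serialized = source.innerHTML;
  // Element.setHTML() parses the string in the context of `target` and
  // sanitizes it during insertion.
  target.setHTML(serialized, options);
}
```

Where no string is needed at all, moving the nodes themselves avoids the serialize-and-reparse step entirely.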

A more complete treatment of mXSS can be found in [MXSS].

5. Acknowledgements

This work is informed and inspired by [DOMPURIFY] from Cure53, Internet Explorer’s window.toStaticHTML(), as well as the original [HTMLSanitizer] from Ben Bucksch. We thank Anne van Kesteren, Krzysztof Kotowicz, Tom Schuster, Luke Warlow, Guillaume Weghsteen, and Mike West for their valuable feedback.

References

Normative References

[DOM]
Anne van Kesteren. DOM Standard. Living Standard. URL: https://dom.spec.whatwg.org/
[ECMASCRIPT]
ECMAScript Language Specification. URL: https://tc39.es/ecma262/multipage/
[HTML]
Anne van Kesteren; et al. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[INFRA]
Anne van Kesteren; Domenic Denicola. Infra Standard. Living Standard. URL: https://infra.spec.whatwg.org/
[TRUSTED-TYPES]
Krzysztof Kotowicz. Trusted Types. URL: https://w3c.github.io/trusted-types/dist/spec/
[URL]
Anne van Kesteren. URL Standard. Living Standard. URL: https://url.spec.whatwg.org/
[WebIDL]
Edgar Chen; Timothy Gu. Web IDL Standard. Living Standard. URL: https://webidl.spec.whatwg.org/

Informative References

[DOMPURIFY]
DOMPurify. URL: https://github.com/cure53/DOMPurify
[HTMLSanitizer]
HTML Sanitizer. URL: https://www.bucksch.org/1/projects/mozilla/108153/
[MathML]
Patrick D F Ion; Robert R Miner. Mathematical Markup Language (MathML™) 1.01 Specification. 7 March 2023. REC. URL: https://www.w3.org/TR/REC-MathML/
[MXSS]
mXSS Attacks: Attacking well-secured Web-Applications by using innerHTML Mutations. URL: https://cure53.de/fp170.pdf
[SafeMathML]
MathML Safe List. URL: https://w3c.github.io/mathml-docs/mathml-safe-list
[SVG11]
Erik Dahlström; et al. Scalable Vector Graphics (SVG) 1.1 (Second Edition). 16 August 2011. REC. URL: https://www.w3.org/TR/SVG11/

IDL Index

enum SanitizerPresets { "default" };
dictionary SetHTMLOptions {
  (Sanitizer or SanitizerConfig or SanitizerPresets) sanitizer = "default";
};
dictionary SetHTMLUnsafeOptions {
  (Sanitizer or SanitizerConfig or SanitizerPresets) sanitizer = {};
};

[Exposed=Window]
interface Sanitizer {
  constructor(optional (SanitizerConfig or SanitizerPresets) configuration = "default");

  // Query configuration:
  SanitizerConfig get();

  // Modify a Sanitizer’s lists and fields:
  boolean allowElement(SanitizerElementWithAttributes element);
  boolean removeElement(SanitizerElement element);
  boolean replaceElementWithChildren(SanitizerElement element);
  boolean allowAttribute(SanitizerAttribute attribute);
  boolean removeAttribute(SanitizerAttribute attribute);
  boolean setComments(boolean allow);
  boolean setDataAttributes(boolean allow);

  // Remove markup that executes script.
  boolean removeUnsafe();
};

dictionary SanitizerElementNamespace {
  required DOMString name;
  DOMString? _namespace = "http://www.w3.org/1999/xhtml";
};

// Used by "elements"
dictionary SanitizerElementNamespaceWithAttributes : SanitizerElementNamespace {
  sequence<SanitizerAttribute> attributes;
  sequence<SanitizerAttribute> removeAttributes;
};

typedef (DOMString or SanitizerElementNamespace) SanitizerElement;
typedef (DOMString or SanitizerElementNamespaceWithAttributes) SanitizerElementWithAttributes;

dictionary SanitizerAttributeNamespace {
  required DOMString name;
  DOMString? _namespace = null;
};
typedef (DOMString or SanitizerAttributeNamespace) SanitizerAttribute;

dictionary SanitizerConfig {
  sequence<SanitizerElementWithAttributes> elements;
  sequence<SanitizerElement> removeElements;
  sequence<SanitizerElement> replaceWithChildrenElements;

  sequence<SanitizerAttribute> attributes;
  sequence<SanitizerAttribute> removeAttributes;

  boolean comments;
  boolean dataAttributes;
};
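
As an illustration of the dictionary shapes listed above, the following sketch expands the string shorthand of SanitizerElement and SanitizerAttribute into the dictionary form, applying the default namespaces declared in the IDL. In WebIDL, a leading underscore only escapes the identifier, so the _namespace member is exposed to script as namespace. The function names are hypothetical, and the sketch is not the canonicalization algorithm of this specification.

```javascript
// Default namespace for elements, as declared in SanitizerElementNamespace.
const HTML_NAMESPACE = "http://www.w3.org/1999/xhtml";

// Hypothetical helper: expands a string or dictionary entry into
// { name, namespace }, filling in the default when no namespace is given.
function normalizeName(entry, defaultNamespace) {
  if (typeof entry === "string") {
    return { name: entry, namespace: defaultNamespace };
  }
  return {
    name: entry.name,
    // An explicit null is a valid value (DOMString?) and is preserved.
    namespace: entry.namespace === undefined ? defaultNamespace : entry.namespace,
  };
}

// Elements default to the XHTML namespace, attributes to null.
const normalizeElement = (entry) => normalizeName(entry, HTML_NAMESPACE);
const normalizeAttribute = (entry) => normalizeName(entry, null);
```

For example, normalizeElement("p") and normalizeElement({ name: "p" }) describe the same element under these assumptions.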