Infra (PR #171)

Commit Snapshot — Last Updated

Participate:
GitHub whatwg/infra (new issue, open issues)
IRC: #whatwg on Freenode
Commits:
GitHub whatwg/infra/commits
Go to the living standard
@infrastandard
Translation (non-normative):
日本語
This Is a Commit Snapshot of the Standard

This document contains the contents of the standard as of the a1f4c51 commit, and should only be used as a historical reference. This commit may not even have been merged into master.

Do not attempt to implement this version of the specification. Do not reference this version as authoritative in any way. Instead, see https://infra.spec.whatwg.org/ for the living standard.

Abstract

The Infra Standard aims to define the fundamental concepts upon which standards are built.

Goals

Suggestions for more goals welcome.

1. Usage

To make use of the Infra Standard in a document titled X, use "X depends on the Infra Standard." Additionally, cross-referencing terminology is encouraged to avoid ambiguity.

Specification authors are also encouraged to add their specification to the list of dependent specifications in order to help the editors ensure that any future breaking changes to the Infra Standard are correctly reflected by any such dependencies.

2. Conventions

2.1. Conformance

All assertions, diagrams, examples, and notes are non-normative, as are all sections explicitly marked non-normative. Everything else is normative.

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" are to be interpreted as described in RFC 2119. [RFC2119]

These keywords have equivalent meaning when written in lowercase and cannot appear in non-normative content.

This is a willful violation of RFC 8174, motivated by legibility and a desire to preserve long-standing practice in many non-IETF-published pre-RFC 8174 documents. [RFC8174]

All of the above is applicable to both this standard and any document that uses this standard. Documents using this standard are encouraged to limit themselves to "must", "must not", "should", and "may", and to use these in their lowercase form as that is generally considered to be more readable.

For non-normative content, "strongly encouraged", "strongly discouraged", "encouraged", "discouraged", "can", "cannot", "could", "could not", "might", and "might not" can be used instead.

2.2. Compliance with other specifications

In general, specifications interact with and rely on a wide variety of other specifications. In certain circumstances, unfortunately, conflicting needs require a specification to violate the requirements of other specifications. When this occurs, a document using the Infra Standard should denote such transgressions as a willful violation, and note the reason for that violation.

The previous section, §2.1 Conformance, documents a willful violation of RFC 8174 committed by the Infra Standard.

2.3. Terminology

The word "or", in cases where both inclusive "or" and exclusive "or" are possible (e.g., "if either width or height is zero"), means an inclusive "or" (implying "or both"), unless it is called out as being exclusive (with "but not both").

3. Algorithms

Algorithms, and requirements phrased in the imperative as part of algorithms (such as "strip any leading spaces" or "return false"), are to be interpreted with the meaning of the keyword (e.g., "must") used in introducing the algorithm or step. If no such keyword is used, "must" is implied.

For example, were the spec to say:

To eat an orange, the user must:

  1. Peel the orange.
  2. Separate each slice of the orange.
  3. Eat the orange slices.

it would be equivalent to the following:

To eat an orange:

  1. The user must peel the orange.
  2. The user must separate each slice of the orange.
  3. The user must eat the orange slices.

Here the key word is "must".

Modifying the above example, if the algorithm were introduced only with "To eat an orange:", it would still have the same meaning, as "must" is implied.

Conformance requirements phrased as algorithms or specific steps may be implemented in any manner, so long as the end result is equivalent. (In particular, the algorithms are intended to be easy to follow, and not intended to be performant.)

Performance is tricky to get correct as it is influenced by user perception, computer architectures, and different types of input that can change over time in how common they are. For instance, a JavaScript engine likely has many different code paths for what is standardized as a single algorithm, in order to optimize for speed or memory consumption. Standardizing all those code paths would be an insurmountable task and not productive as they would not stand the test of time as well as the single algorithm would. Therefore performance is best left as a field to compete over.

3.1. Variables

A variable is declared with "let" and changed with "set".

Let list be a new list.

  1. Let value be null.

  2. If input is a string, then set value to input.

  3. Otherwise, set value to input, UTF-8 decoded.

  4. Assert: value is a string.
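
As a non-normative illustration, "let" corresponds to declaring a local variable and "set" to reassigning it. The following Python sketch mirrors the steps above. The function name normalize_input is hypothetical, a non-string input is assumed to be a Python bytes object, and Python's "utf-8-sig" codec with replacement is only an approximation of UTF-8 decode from the Encoding Standard.

```python
from typing import Union


def normalize_input(input_value: Union[str, bytes]) -> str:
    # Let value be null.
    value = None
    if isinstance(input_value, str):
        # If input is a string, then set value to input.
        value = input_value
    else:
        # Otherwise, set value to input, UTF-8 decoded (approximation: strips a
        # leading BOM and replaces invalid sequences with U+FFFD).
        value = input_value.decode("utf-8-sig", errors="replace")
    # Assert: value is a string.
    assert isinstance(value, str)
    return value
```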

Variables must not be used before they are declared. Variables are block scoped.

3.2. Control flow

The control flow of algorithms is such that a requirement to "return" or "throw" terminates the algorithm the statement was in. "Return" will hand the given value, if any, to its caller. "Throw" will make the caller automatically rethrow the given value, if any, and thereby terminate the caller’s algorithm. Using prose, the caller can instead "catch" the exception and perform another action.

3.3. Iteration

There’s a variety of ways to repeat a set of steps until a condition is reached.

The Infra Standard is not (yet) exhaustive on this; please file an issue if you need something.

For each

As defined for lists (and derivatives) and maps.

While

An instruction to repeat a set of steps as long as a condition is met.

While condition is "met":

An iteration’s flow can be controlled via requirements to continue or break. Continue will skip over any remaining steps in an iteration, proceeding to the next item. If no further items remain, the iteration will stop. Break will skip over any remaining steps in an iteration, and skip over any remaining items as well, stopping the iteration.

Let example be the list « 1, 2, 3, 4 ». The following prose would perform operation upon 1, then 2, then 3, then 4:

  1. For each item in example:

    1. Perform operation on item.

The following prose would perform operation upon 1, then 2, then 4. 3 would be skipped.

  1. For each item in example:

    1. If item is 3, then continue.
    2. Perform operation on item.

The following prose would perform operation upon 1, then 2. 3 and 4 would be skipped.

  1. For each item in example:

    1. If item is 3, then break.
    2. Perform operation on item.
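
In most programming languages, continue and break map directly onto loop keywords. A Python sketch of the three examples above, where operation is a hypothetical callback standing in for the unspecified operation:

```python
from typing import Callable, Iterable


def perform_all(example: Iterable[int], operation: Callable[[int], None]) -> None:
    # For each item in example: perform operation on item.
    for item in example:
        operation(item)


def perform_skipping_three(example: Iterable[int], operation: Callable[[int], None]) -> None:
    for item in example:
        # If item is 3, then continue: skip the remaining steps for this item only.
        if item == 3:
            continue
        operation(item)


def perform_until_three(example: Iterable[int], operation: Callable[[int], None]) -> None:
    for item in example:
        # If item is 3, then break: stop the whole iteration.
        if item == 3:
            break
        operation(item)
```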

3.4. Assertions

To improve readability, it can sometimes help to add assertions to algorithms, stating invariants. To do this, write "Assert:", followed by a statement that must be true. If the statement ends up being false, that indicates an issue with the document using the Infra Standard, which should be reported and addressed.

Since the statement can only ever be true, it has no implications for implementations.

  1. Let x be "Aperture Science".

  2. Assert: x is "Aperture Science".

4. Primitive data types

4.1. Nulls

The value null is used to indicate the lack of a value. It can be used interchangeably with the JavaScript null value. [ECMA-262]

Let element be null.

If input is the empty string, then return null.

4.2. Booleans

A boolean is either true or false.

Let elementSeen be false.

4.3. Bytes

A byte is a sequence of eight bits, represented as a double-digit hexadecimal number in the range 0x00 to 0xFF, inclusive.

An ASCII byte is a byte in the range 0x00 (NUL) to 0x7F (DEL), inclusive. As illustrated, an ASCII byte, excluding 0x28 and 0x29, may be followed by the representation outlined in the Standard Code section of ASCII format for Network Interchange, between parentheses. [RFC20]

0x28 may be followed by "(left parenthesis)" and 0x29 by "(right parenthesis)".

0x49 (I) when UTF-8 decoded becomes the code point U+0049 (I).

4.4. Byte sequences

A byte sequence is a sequence of bytes, represented as a space-separated sequence of bytes. Byte sequences with bytes in the range 0x20 (SP) to 0x7E (~), inclusive, can alternately be written as a string, but using backticks instead of quotation marks, to avoid confusion with an actual string.

0x48 0x49 can also be represented as `HI`.

Headers, such as `Content-Type`, are byte sequences.

To get a byte sequence out of a string, using UTF-8 encode from the Encoding Standard is encouraged. In rare circumstances isomorphic encode might be needed. [ENCODING]

A byte sequence’s length is the number of bytes it contains.

To byte-lowercase a byte sequence, increase each byte it contains, in the range 0x41 (A) to 0x5A (Z), inclusive, by 0x20.

To byte-uppercase a byte sequence, decrease each byte it contains, in the range 0x61 (a) to 0x7A (z), inclusive, by 0x20.

A byte sequence A is a byte-case-insensitive match for a byte sequence B if the byte-lowercase of A is the byte-lowercase of B.

To isomorphic decode a byte sequence input, return a string whose length is equal to input’s length and whose code points have the same values as input’s bytes, in the same order.
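
A Python sketch of the byte sequence operations above, modeling byte sequences as Python bytes objects (the function names are hypothetical). Only the ASCII letter ranges change under byte-lowercase and byte-uppercase; every other byte is left untouched.

```python
def byte_lowercase(data: bytes) -> bytes:
    # Increase each byte in 0x41 (A) to 0x5A (Z), inclusive, by 0x20.
    return bytes(x + 0x20 if 0x41 <= x <= 0x5A else x for x in data)


def byte_uppercase(data: bytes) -> bytes:
    # Decrease each byte in 0x61 (a) to 0x7A (z), inclusive, by 0x20.
    return bytes(x - 0x20 if 0x61 <= x <= 0x7A else x for x in data)


def byte_case_insensitive_match(a: bytes, b: bytes) -> bool:
    return byte_lowercase(a) == byte_lowercase(b)


def isomorphic_decode(data: bytes) -> str:
    # Byte 0xNN becomes code point U+00NN; length and order are preserved.
    return "".join(chr(x) for x in data)
```

Python's built-in bytes.lower(), bytes.upper(), and decode("latin-1") behave the same way for these operations; the explicit versions are shown to mirror the definitions.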

4.5. Code points

A code point is a Unicode code point and is represented as a four-to-six digit hexadecimal number, typically prefixed with "U+".

A code point may be followed by its name, by its rendered form between parentheses when it is not U+0028 or U+0029, or by both. Documents using the Infra Standard are encouraged to follow code points by their name when they cannot be rendered or are U+0028 or U+0029, and their rendered form between parentheses otherwise, for legibility.

A code point’s name is defined in the Unicode Standard and represented in ASCII uppercase. [UNICODE]

The code point rendered as 🤔 is represented as U+1F914.

When referring to that code point, we might say "U+1F914 (🤔)", to provide extra context. Documents are allowed to use "U+1F914 THINKING FACE (🤔)" as well, though this is somewhat verbose.

Code points that are difficult to render unambiguously, such as U+000A, can be referred to as "U+000A LF". U+0029 can be referred to as "U+0029 RIGHT PARENTHESIS", because even though it renders, this avoids unmatched parentheses.

Code points are sometimes referred to as characters and in certain contexts are prefixed with "0x" rather than "U+".

A surrogate is a code point that is in the range U+D800 to U+DFFF, inclusive.

A scalar value is a code point that is not a surrogate.

A noncharacter is a code point that is in the range U+FDD0 to U+FDEF, inclusive, or U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, U+2FFFE, U+2FFFF, U+3FFFE, U+3FFFF, U+4FFFE, U+4FFFF, U+5FFFE, U+5FFFF, U+6FFFE, U+6FFFF, U+7FFFE, U+7FFFF, U+8FFFE, U+8FFFF, U+9FFFE, U+9FFFF, U+AFFFE, U+AFFFF, U+BFFFE, U+BFFFF, U+CFFFE, U+CFFFF, U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE, U+FFFFF, U+10FFFE, or U+10FFFF.

An ASCII code point is a code point in the range U+0000 NULL to U+007F DELETE, inclusive.

An ASCII tab or newline is U+0009 TAB, U+000A LF, or U+000D CR.

ASCII whitespace is U+0009 TAB, U+000A LF, U+000C FF, U+000D CR, or U+0020 SPACE.

"Whitespace" is a mass noun.

A C0 control is a code point in the range U+0000 NULL to U+001F INFORMATION SEPARATOR ONE, inclusive.

A C0 control or space is a C0 control or U+0020 SPACE.

A control is a C0 control or a code point in the range U+007F DELETE to U+009F APPLICATION PROGRAM COMMAND, inclusive.

An ASCII digit is a code point in the range U+0030 (0) to U+0039 (9), inclusive.

An ASCII upper hex digit is an ASCII digit or a code point in the range U+0041 (A) to U+0046 (F), inclusive.

An ASCII lower hex digit is an ASCII digit or a code point in the range U+0061 (a) to U+0066 (f), inclusive.

An ASCII hex digit is an ASCII upper hex digit or ASCII lower hex digit.

An ASCII upper alpha is a code point in the range U+0041 (A) to U+005A (Z), inclusive.

An ASCII lower alpha is a code point in the range U+0061 (a) to U+007A (z), inclusive.

An ASCII alpha is an ASCII upper alpha or ASCII lower alpha.

An ASCII alphanumeric is an ASCII digit or ASCII alpha.
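
The code point classes above are simple range checks. A Python sketch, representing a code point as an int (the predicate names are hypothetical). The noncharacter check relies on the fact that the listed values outside U+FDD0 to U+FDEF are exactly the last two code points of each of the 17 planes, whose low 16 bits are 0xFFFE or 0xFFFF.

```python
def is_surrogate(cp: int) -> bool:
    return 0xD800 <= cp <= 0xDFFF


def is_scalar_value(cp: int) -> bool:
    return not is_surrogate(cp)


def is_noncharacter(cp: int) -> bool:
    # U+FDD0..U+FDEF, or U+xFFFE / U+xFFFF in any plane (assumes cp <= 0x10FFFF).
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_ascii_whitespace(cp: int) -> bool:
    # TAB, LF, FF, CR, SPACE (note: U+000B is not included).
    return cp in (0x09, 0x0A, 0x0C, 0x0D, 0x20)


def is_c0_control(cp: int) -> bool:
    return 0x00 <= cp <= 0x1F


def is_c0_control_or_space(cp: int) -> bool:
    return is_c0_control(cp) or cp == 0x20


def is_control(cp: int) -> bool:
    return is_c0_control(cp) or 0x7F <= cp <= 0x9F


def is_ascii_digit(cp: int) -> bool:
    return 0x30 <= cp <= 0x39


def is_ascii_upper_hex_digit(cp: int) -> bool:
    return is_ascii_digit(cp) or 0x41 <= cp <= 0x46


def is_ascii_lower_hex_digit(cp: int) -> bool:
    return is_ascii_digit(cp) or 0x61 <= cp <= 0x66


def is_ascii_upper_alpha(cp: int) -> bool:
    return 0x41 <= cp <= 0x5A


def is_ascii_lower_alpha(cp: int) -> bool:
    return 0x61 <= cp <= 0x7A


def is_ascii_alphanumeric(cp: int) -> bool:
    return is_ascii_digit(cp) or is_ascii_upper_alpha(cp) or is_ascii_lower_alpha(cp)
```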

4.6. Strings

A JavaScript string is a sequence of unsigned 16-bit integers, also known as code units.

This is different from how the Unicode Standard defines "code unit". In particular, it refers exclusively to how the Unicode Standard defines it for Unicode 16-bit strings. [UNICODE]

A JavaScript string’s length is the number of code units it contains.

A JavaScript string can also be interpreted as containing code points, per the conversion defined in The String Type section of the JavaScript specification. [ECMA-262]

This conversion process converts surrogate pairs into their corresponding scalar value and maps isolated surrogates to their corresponding code point, leaving them effectively as-is.

A JavaScript string consisting of the code units 0xD83D, 0xDCA9, and 0xD800, when interpreted as containing code points, would consist of the code points U+1F4A9 and U+D800.

A scalar value string is a sequence of scalar values.

A scalar value string is useful for any kind of I/O or other kind of operation where UTF-8 encode comes into play.

String can be used to refer to either a JavaScript string or scalar value string, when it is clear from the context which is meant or when the distinction is immaterial. Strings are denoted by double quotes and monospace font.

"Hello, world!" is a string.

A string’s length is the number of code points it contains.

To convert a JavaScript string into a scalar value string, replace any surrogates with U+FFFD.

The replaced surrogates are always isolated surrogates, since the process of interpreting the JavaScript string as containing code points will have converted surrogate pairs into scalar values.
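
The two conversions above can be sketched in Python by modeling a JavaScript string as a list of 16-bit code units (ints), since Python's own str type is a sequence of code points. The function names are hypothetical, and the surrogate-pair arithmetic is the standard UTF-16 decoding formula.

```python
from typing import List


def to_code_points(code_units: List[int]) -> List[int]:
    """Interpret a JavaScript string (16-bit code units) as code points."""
    result: List[int] = []
    i = 0
    while i < len(code_units):
        unit = code_units[i]
        # A lead surrogate followed by a trail surrogate forms a surrogate pair.
        if 0xD800 <= unit <= 0xDBFF and i + 1 < len(code_units):
            trail = code_units[i + 1]
            if 0xDC00 <= trail <= 0xDFFF:
                result.append(0x10000 + ((unit - 0xD800) << 10) + (trail - 0xDC00))
                i += 2
                continue
        # Anything else, including an isolated surrogate, maps to itself.
        result.append(unit)
        i += 1
    return result


def to_scalar_value_string(code_units: List[int]) -> str:
    # Pairs are already combined, so any surrogate left here is isolated.
    return "".join(
        "\ufffd" if 0xD800 <= cp <= 0xDFFF else chr(cp)
        for cp in to_code_points(code_units)
    )
```

With the code units 0xD83D, 0xDCA9, and 0xD800 from the earlier example, to_code_points gives U+1F4A9 and U+D800, and to_scalar_value_string turns the isolated U+D800 into U+FFFD.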

A scalar value string can always be used as a JavaScript string implicitly, since it is a subset. The reverse is only possible if the JavaScript string is known not to contain surrogates; otherwise, a conversion must be performed.

An implementation likely has to perform explicit conversion, depending on how it actually ends up representing JavaScript and scalar value strings. It is even fairly typical for implementations to have multiple implementations of just JavaScript strings for performance and memory reasons.


To isomorphic encode a string input, run these steps:

  1. Assert: input contains no code points greater than U+00FF.

  2. Return a byte sequence whose length is equal to input’s length and whose bytes have the same values as input’s code points, in the same order.
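
A Python sketch of isomorphic encode (hypothetical function name). Python's "latin-1" codec performs the same mapping, except that it raises UnicodeEncodeError rather than failing an assertion.

```python
def isomorphic_encode(input_string: str) -> bytes:
    # Assert: input contains no code points greater than U+00FF.
    assert all(ord(c) <= 0xFF for c in input_string)
    # Code point U+00NN becomes byte 0xNN; length and order are preserved.
    return bytes(ord(c) for c in input_string)
```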


An ASCII string is a string whose code points are all ASCII code points.

To ASCII lowercase a string, replace all ASCII upper alphas in the string with their corresponding code point in ASCII lower alpha.

To ASCII uppercase a string, replace all ASCII lower alphas in the string with their corresponding code point in ASCII upper alpha.

A string A is an ASCII case-insensitive match for a string B if the ASCII lowercase of A is the ASCII lowercase of B.
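
A Python sketch of the ASCII case operations (hypothetical names). Python's str.lower() and str.upper() are deliberately avoided: they apply full Unicode case mapping, so for example U+212A KELVIN SIGN would lowercase to "k", which an ASCII case-insensitive match must not do.

```python
def ascii_lowercase(s: str) -> str:
    # Only U+0041 (A) to U+005A (Z) change; all other code points are kept.
    return "".join(chr(ord(c) + 0x20) if "A" <= c <= "Z" else c for c in s)


def ascii_uppercase(s: str) -> str:
    # Only U+0061 (a) to U+007A (z) change.
    return "".join(chr(ord(c) - 0x20) if "a" <= c <= "z" else c for c in s)


def ascii_case_insensitive_match(a: str, b: str) -> bool:
    return ascii_lowercase(a) == ascii_lowercase(b)
```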


To strip newlines from a string, remove any U+000A LF and U+000D CR code points from the string.

To strip leading and trailing ASCII whitespace from a string, remove all ASCII whitespace that is at the start or the end of the string.

To strip and collapse ASCII whitespace in a string, replace any sequence of one or more consecutive code points that are ASCII whitespace in the string with a single U+0020 SPACE code point, and then remove any leading and trailing ASCII whitespace from that string.
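
A Python sketch of the three stripping operations (hypothetical names). The set of ASCII whitespace is passed explicitly because Python's argument-less str.strip() and str.split() treat all Unicode whitespace, such as U+00A0 NO-BREAK SPACE, as whitespace.

```python
import re

ASCII_WHITESPACE = "\t\n\f\r "


def strip_newlines(s: str) -> str:
    # Remove every U+000A LF and U+000D CR.
    return "".join(c for c in s if c not in "\n\r")


def strip_leading_and_trailing_ascii_whitespace(s: str) -> str:
    return s.strip(ASCII_WHITESPACE)


def strip_and_collapse_ascii_whitespace(s: str) -> str:
    # Replace each run of ASCII whitespace with one U+0020 SPACE, then trim the ends.
    collapsed = re.sub(r"[\t\n\f\r ]+", " ", s)
    return strip_leading_and_trailing_ascii_whitespace(collapsed)
```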


To collect a sequence of code points meeting a condition condition from a string input, given a position variable position tracking the position of the calling algorithm within input:

  1. Let result be the empty string.

  2. While position doesn’t point past the end of input and the code point at position within input meets the condition condition:

    1. Append that code point to the end of result.

    2. Advance position by 1.

  3. Return result.

In addition to returning the collected code points, this algorithm updates the position variable in the calling algorithm.

To skip ASCII whitespace within a string input given a position variable position, collect a sequence of code points that are ASCII whitespace from input given position. The collected code points are not used, but position is still updated.
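
Because the algorithm updates the caller's position variable, a sketch needs something that behaves like a reference. The following Python sketch assumes a small hypothetical Position class for that purpose and indexes the string by code point, which matches how Python's str works.

```python
from typing import Callable


class Position:
    """Hypothetical mutable stand-in for a spec position variable."""

    def __init__(self, index: int = 0) -> None:
        self.index = index


def collect_code_points(
    condition: Callable[[str], bool], input_string: str, position: Position
) -> str:
    result = []
    # While position is not past the end and the code point meets the condition:
    while position.index < len(input_string) and condition(input_string[position.index]):
        result.append(input_string[position.index])
        position.index += 1  # the caller observes this update
    return "".join(result)


def is_ascii_whitespace(c: str) -> bool:
    return c in "\t\n\f\r "


def skip_ascii_whitespace(input_string: str, position: Position) -> None:
    # The collected code points are discarded; only position matters.
    collect_code_points(is_ascii_whitespace, input_string, position)
```

For example, collecting ASCII digits from "123abc" returns "123" and leaves the position at index 3.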


To strictly split a string input on a particular delimiter code point delimiter:

  1. Let position be a position variable for input, initially pointing at the start of input.

  2. Let tokens be a list of strings, initially empty.

  3. Let token be the result of collecting a sequence of code points that are not equal to delimiter from input, given position.

  4. Append token to tokens.

  5. While position is not past the end of input:

    1. Advance position to the next code point in input. (This skips past the delimiter.)

    2. Let token be the result of collecting a sequence of code points that are not equal to delimiter from input, given position.

    3. Append token to tokens.

  6. Return tokens.

This algorithm is a "strict" split, as opposed to the commonly used variants for ASCII whitespace and for commas below, which are both more lenient in various ways involving interspersed ASCII whitespace.
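
A Python sketch of strictly split (hypothetical name), using a plain integer index as the position variable inside a single function. For a single delimiter code point, the result agrees with Python's str.split(delimiter), which likewise keeps empty tokens.

```python
from typing import List


def strictly_split(input_string: str, delimiter: str) -> List[str]:
    position = 0
    tokens: List[str] = []

    def collect_not_delimiter() -> str:
        nonlocal position
        start = position
        while position < len(input_string) and input_string[position] != delimiter:
            position += 1
        return input_string[start:position]

    tokens.append(collect_not_delimiter())
    while position < len(input_string):
        position += 1  # skip past the delimiter
        tokens.append(collect_not_delimiter())
    return tokens
```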

To split a string input on ASCII whitespace:

  1. Let position be a position variable for input, initially pointing at the start of input.

  2. Let tokens be a list of strings, initially empty.

  3. Skip ASCII whitespace within input given position.

  4. While position is not past the end of input:

    1. Let token be the result of collecting a sequence of code points that are not ASCII whitespace from input, given position.

    2. Append token to tokens.

    3. Skip ASCII whitespace within input given position.

  5. Return tokens.
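
A Python sketch of split on ASCII whitespace (hypothetical name), written as a direct transcription of the steps with an integer position. Unlike Python's argument-less str.split(), it treats U+00A0 as an ordinary code point.

```python
from typing import List

ASCII_WHITESPACE = "\t\n\f\r "


def split_on_ascii_whitespace(input_string: str) -> List[str]:
    position = 0
    tokens: List[str] = []

    def skip_whitespace() -> None:
        nonlocal position
        while position < len(input_string) and input_string[position] in ASCII_WHITESPACE:
            position += 1

    skip_whitespace()
    while position < len(input_string):
        start = position
        # Collect code points that are not ASCII whitespace.
        while position < len(input_string) and input_string[position] not in ASCII_WHITESPACE:
            position += 1
        tokens.append(input_string[start:position])
        skip_whitespace()
    return tokens
```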

To split a string input on commas:

  1. Let position be a position variable for input, initially pointing at the start of input.

  2. Let tokens be a list of strings, initially empty.

  3. While position is not past the end of input:

    1. Let token be the result of collecting a sequence of code points that are not U+002C (,) from input, given position.

      token might be the empty string.

    2. Strip leading and trailing ASCII whitespace from token.
    3. Append token to tokens.

    4. If position is not past the end of input, then:

      1. Assert: the code point at position within input is U+002C (,).

      2. Advance position by 1.

  4. Return tokens.
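
A Python sketch of split on commas (hypothetical name). Tracing the steps shows that a trailing comma does not produce a trailing empty token, whereas a leading or doubled comma does.

```python
from typing import List

ASCII_WHITESPACE = "\t\n\f\r "


def split_on_commas(input_string: str) -> List[str]:
    position = 0
    tokens: List[str] = []
    while position < len(input_string):
        start = position
        # Collect code points that are not U+002C (,); the token may be empty.
        while position < len(input_string) and input_string[position] != ",":
            position += 1
        # Strip leading and trailing ASCII whitespace, then append.
        tokens.append(input_string[start:position].strip(ASCII_WHITESPACE))
        if position < len(input_string):
            assert input_string[position] == ","
            position += 1
    return tokens
```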

5. Data structures

Conventionally, specifications have operated on a variety of vague specification-level data structures, based on shared understanding of their semantics. This generally works well, but can lead to ambiguities around edge cases, such as iteration order or what happens when you append an item to an ordered set that the set already contains. It has also led to a variety of divergent notation and phrasing, especially around more complex data structures such as maps.

This standard provides a small set of common data structures, along with notation and phrasing for working with them, in order to create common ground.

5.1. Lists

A list is a specification type consisting of a finite ordered sequence of items.

For notational convenience, a literal syntax can be used to express lists, by surrounding the list with « » characters and separating its items with a comma. An indexing syntax can be used by providing a zero-based index into a list inside square brackets. The index cannot be out-of-bounds, except when used with exists.

Let example be the list « "a", "b", "c", "a" ». Then example[1] is the string "b".


To append to a list that is not an ordered set is to add the given item to the end of the list.

To prepend to a list that is not an ordered set is to add the given item to the beginning of the list.

To replace within a list that is not an ordered set is to replace all items from the list that match a given condition with the given item, or do nothing if none do.

The above definitions are modified when the list is an ordered set; see below for ordered set append, prepend, and replace.

To insert an item into a list before an index is to add the given item to the list between the given index − 1 and the given index. If the given index is 0, then prepend the given item to the list.

To remove zero or more items from a list is to remove all items from the list that match a given condition, or do nothing if none do.

Removing x from the list « x, y, z, x » is to remove all items from the list that are equal to x. The list is now equivalent to « y, z ».

Removing all items that start with the string "a" from the list « "a", "b", "ab", "ba" » is to remove the items "a" and "ab". The list is now equivalent to « "b", "ba" ».

To empty a list is to remove all of its items.

A list contains an item if it appears in the list. We can also denote this by saying that, for a list list and an index index, "list[index] exists".

A list’s size is the number of items the list contains.

A list is empty if its size is zero.

To iterate over a list, performing a set of steps on each item in order, use phrasing of the form "For each item of list", and then operate on item in the subsequent prose.

To clone a list list is to create a new list clone, of the same designation, and, for each item of list, append item to clone, so that clone contains the same items, in the same order as list.

Note: This is a "shallow clone", as the items themselves are not cloned in any way.

Let original be the ordered set « "a", "b", "c" ». Cloning original creates a new ordered set clone, so that replacing "a" with "foo" in clone gives « "foo", "b", "c" », while original[0] is still the string "a".
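
Python's built-in list is a close analogue of a list. A sketch of insert, remove, and clone in terms of it (hypothetical names), including the shallow nature of clone. One difference: Python's list.insert silently clamps an out-of-range index, whereas the index here is required to be in bounds.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def insert_before(items: List[T], index: int, item: T) -> None:
    # Index 0 prepends; otherwise the item lands between index - 1 and index.
    items.insert(index, item)


def remove_matching(items: List[T], condition: Callable[[T], bool]) -> None:
    # Remove every item matching the condition, in place; a no-op if none match.
    items[:] = [x for x in items if not condition(x)]


def clone_list(items: List[T]) -> List[T]:
    # Shallow clone: a new list holding the same item objects in the same order.
    return list(items)
```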


The list type originates from the JavaScript specification (where it is capitalized, as List); we repeat some elements of its definition here for ease of reference, and provide an expanded vocabulary for manipulating lists. Whenever JavaScript expects a List, a list as defined here can be used; they are the same type. [ECMA-262]

5.1.1. Stacks

Some lists are designated as stacks. A stack is a list, but conventionally, the following operations are used to operate on it, instead of using append, prepend, or remove.

To push onto a stack is to append to it.

To pop from a stack is to remove its last item and return it, if the stack is not empty, or to return nothing otherwise.

5.1.2. Queues

Some lists are designated as queues. A queue is a list, but conventionally, the following operations are used to operate on it, instead of using append, prepend, or remove.

To enqueue in a queue is to append to it.

To dequeue from a queue is to remove its first item and return it, if the queue is not empty, or to return nothing if it is.
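
A Python sketch of the stack and queue operations (hypothetical names), modeling a stack as a list and a queue as a collections.deque, and assuming None as the "nothing" returned for an empty stack or queue (which only works if None is never itself an item).

```python
from collections import deque
from typing import Deque, List, Optional, TypeVar

T = TypeVar("T")


def push(stack: List[T], item: T) -> None:
    stack.append(item)


def pop(stack: List[T]) -> Optional[T]:
    # Remove and return the last item, or return nothing if the stack is empty.
    return stack.pop() if stack else None


def enqueue(queue: Deque[T], item: T) -> None:
    queue.append(item)


def dequeue(queue: Deque[T]) -> Optional[T]:
    # Remove and return the first item, or return nothing if the queue is empty.
    return queue.popleft() if queue else None
```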

5.1.3. Sets

Some lists are designated as ordered sets. An ordered set is a list with the additional semantic that it must not contain the same item twice.

Almost all cases on the web platform require an ordered set, instead of an unordered one, since interoperability requires that any developer-exposed enumeration of the set’s contents be consistent between browsers. In those cases where order is not important, we still use ordered sets; implementations can optimize based on the fact that the order is not observable.

To append to an ordered set is to do nothing if the set already contains the given item, or to perform the normal list append operation otherwise.

To prepend to an ordered set is to do nothing if the set already contains the given item, or to perform the normal list prepend operation otherwise.

To replace within an ordered set set, given item and replacement: if set contains item or replacement, then replace the first instance of either with replacement and remove all other instances.

Replacing "a" with "c" within the ordered set « "a", "b", "c" » gives « "c", "b" ». Within « "c", "b", "a" » it gives « "c", "b" » as well.

An ordered set set is a subset of another ordered set superset (and conversely, superset is a superset of set) if, for each item of set, superset contains item.

This implies that an ordered set is both a subset and a superset of itself.

The intersection of ordered sets A and B is the result of creating a new ordered set set and, for each item of A, if B contains item, appending item to set.

The union of ordered sets A and B is the result of cloning A as set and, for each item of B, appending item to set.


The range n to m, inclusive, creates a new ordered set containing all of the integers from n up to and including m in consecutively increasing order, as long as m is greater than or equal to n.

For each n in the range 1 to 4, inclusive, …
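
A Python sketch of the ordered set operations and the range (hypothetical names), representing an ordered set as a list with no duplicates. Membership tests are linear, which is acceptable here since, as noted in §3 Algorithms, the algorithms are meant to be easy to follow rather than performant.

```python
from typing import List, TypeVar

T = TypeVar("T")


def set_append(s: List[T], item: T) -> None:
    if item not in s:
        s.append(item)


def set_prepend(s: List[T], item: T) -> None:
    if item not in s:
        s.insert(0, item)


def set_replace(s: List[T], item: T, replacement: T) -> None:
    for i, existing in enumerate(s):
        if existing == item or existing == replacement:
            # Replace the first instance of either, then drop all later instances.
            s[i] = replacement
            s[i + 1:] = [x for x in s[i + 1:] if x != item and x != replacement]
            return


def is_subset(s: List[T], superset: List[T]) -> bool:
    return all(x in superset for x in s)


def intersection(a: List[T], b: List[T]) -> List[T]:
    return [x for x in a if x in b]


def union(a: List[T], b: List[T]) -> List[T]:
    result = list(a)  # clone A
    for x in b:
        set_append(result, x)
    return result


def inclusive_range(n: int, m: int) -> List[int]:
    assert m >= n
    return list(range(n, m + 1))
```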

5.2. Maps

An ordered map, or sometimes just "map", is a specification type consisting of a finite ordered sequence of key/value pairs, with no key appearing twice. Each key/value pair is called an entry.

As with ordered sets, by default we assume that maps need to be ordered for interoperability among implementations.

A literal syntax can be used to express ordered maps, by surrounding the ordered map with «[ ]» characters, denoting each of its entries as key → value, and separating its entries with a comma. An indexing syntax can be used to look up and set values by providing a key inside square brackets.

Let example be the ordered map «[ "a" → `x`, "b" → `y` ]». Then example["a"] is the byte sequence `x`.


To get the value of an entry in an ordered map given a key is to retrieve the value of any existing entry if the map contains an entry with the given key, or to return nothing otherwise. We can also use the indexing syntax explained above.

To set the value of an entry in an ordered map to a given value is to update the value of any existing entry if the map contains an entry with the given key, or if none such exists, to add a new entry with the given key/value to the end of the map. We can also denote this by saying, for an ordered map map, key key, and value value, "set map[key] to value".

To remove an entry from an ordered map is to remove all entries from the map that match a given condition, or do nothing if none do. If the condition is having a certain key, then we can also denote this by saying, for an ordered map map and key key, "remove map[key]".

An ordered map contains an entry with a given key if there exists an entry with that key. We can also denote this by saying that, for an ordered map map and key key, "map[key] exists".

To get the keys of an ordered map, return a new ordered set whose items are each of the keys in the map’s entries.

To get the values of an ordered map, return a new list whose items are each of the values in the map’s entries.

An ordered map’s size is the size of the result of running get the keys on the map.

An ordered map is empty if its size is zero.

To iterate over an ordered map, performing a set of steps on each entry in order, use phrasing of the form "For each key → value of map", and then operate on key and value in the subsequent prose.
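
Python's dict preserves insertion order (guaranteed since Python 3.7), and assigning to an existing key keeps that entry's position, so it can model an ordered map as long as the keys are hashable. A sketch of the map operations above (hypothetical names), assuming None stands for the "nothing" returned when a key is absent:

```python
from typing import Dict, List, Optional, TypeVar

K = TypeVar("K")
V = TypeVar("V")


def map_get(m: Dict[K, V], key: K) -> Optional[V]:
    return m.get(key)


def map_set(m: Dict[K, V], key: K, value: V) -> None:
    # Updates an existing entry in place; otherwise adds a new entry at the end.
    m[key] = value


def map_remove(m: Dict[K, V], key: K) -> None:
    m.pop(key, None)


def get_keys(m: Dict[K, V]) -> List[K]:
    return list(m.keys())


def get_values(m: Dict[K, V]) -> List[V]:
    return list(m.values())
```

Iterating "For each key → value of map" then corresponds to `for key, value in m.items():`.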

5.3. Structs

A struct is a specification type consisting of a finite set of items, each of which has a unique and immutable name.


Structs with a defined order are also known as tuples. For notational convenience, a literal syntax can be used to express tuples, by surrounding the tuple with parentheses and separating its items with a comma. To use this notation, the names need to be clear from context. This can be done by preceding the first instance with the name given to the tuple.

A status is an example tuple consisting of a code (a three-digit number) and text (a byte sequence).

A nonsense algorithm that manipulates status tuples for the purpose of demonstrating their usage is then:

  1. Let statusInstance be the status (200, `OK`).
  2. Set statusInstance to (301, `FOO BAR`).
  3. If statusInstance’s code is 404, then …

It is intentional that not all structs are tuples. Documents using the Infra Standard might need the flexibility to add new names to their struct without breaking literal syntax used by their dependencies. In that case a tuple is not appropriate.


Tuples with two items are also known as pairs. For pairs, a slightly shorter literal syntax can be used, separating the two items with a / character.

Another way of expressing our statusInstance tuple above would be as 200/`OK`.
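
In Python, a typing.NamedTuple is one possible stand-in for a tuple: its items have fixed names and a fixed order, so it supports both the positional literal form and access by name. A sketch of the status example (the class name Status is hypothetical):

```python
from typing import NamedTuple


class Status(NamedTuple):
    code: int    # a three-digit number
    text: bytes  # a byte sequence


# Let statusInstance be the status (200, `OK`).
status_instance = Status(200, b"OK")
# Set statusInstance to (301, `FOO BAR`).
status_instance = Status(301, b"FOO BAR")
# If statusInstance's code is 404, then ...
is_not_found = status_instance.code == 404
```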

6. JSON

To parse JSON from bytes given bytes, run these steps:

  1. Let jsonText be the result of running UTF-8 decode on bytes. [ENCODING]

  2. Return ? Call(%JSONParse%, undefined, « jsonText »). [ECMA-262]

    The conventions used in this step are those of the JavaScript specification.
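
A rough Python analogue, assuming Python's json module as a stand-in for %JSONParse% and the "utf-8-sig" codec with replacement as an approximation of UTF-8 decode. The two parsers are not identical: Python's json accepts NaN and Infinity by default (rejected below via parse_constant), and it parses integers with arbitrary precision where JavaScript uses doubles. The function names are hypothetical, and a raised ValueError stands in for a thrown exception.

```python
import json
from typing import Any


def _reject_constant(name: str) -> Any:
    # JSON.parse rejects NaN, Infinity, and -Infinity.
    raise ValueError(f"invalid JSON constant: {name}")


def parse_json_from_bytes(data: bytes) -> Any:
    # Approximate UTF-8 decode: strip a leading BOM, replace errors with U+FFFD.
    json_text = data.decode("utf-8-sig", errors="replace")
    # json.JSONDecodeError (a ValueError subclass) plays the role of a thrown SyntaxError.
    return json.loads(json_text, parse_constant=_reject_constant)
```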

7. Forgiving base64

To forgiving-base64 encode given a byte sequence data, apply the base64 algorithm defined in section 4 of RFC 4648 to data and return the result. [RFC4648]

This is named forgiving-base64 encode for symmetry with forgiving-base64 decode, which is different from the RFC as it defines error handling for certain inputs.
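
Since forgiving-base64 encode is plain RFC 4648 base64, Python's standard base64 module covers it. A sketch (hypothetical function name) that returns the result as a string:

```python
import base64


def forgiving_base64_encode(data: bytes) -> str:
    # RFC 4648 section 4 alphabet, with "=" padding; the output is pure ASCII.
    return base64.b64encode(data).decode("ascii")
```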

To forgiving-base64 decode given a string data, run these steps:

  1. Remove all ASCII whitespace from data.

  2. If data’s length divides by 4 leaving no remainder, then:

    1. If data ends with one or two U+003D (=) code points, then remove them from data.

  3. If data’s length divides by 4 leaving a remainder of 1, then return failure.

  4. If data contains a code point that is not one of U+002B (+), U+002F (/), or an ASCII alphanumeric, then return failure.

  5. Let output be an empty byte sequence.

  6. Let buffer be an empty buffer that can have bits appended to it.

  7. Let position be a position variable for data, initially pointing at the start of data.

  8. While position does not point past the end of data:

    1. Find the code point pointed to by position in the second column of Table 1: The Base 64 Alphabet of RFC 4648. Let n be the number given in the first cell of the same row. [RFC4648]

    2. Append the six bits corresponding to n, most significant bit first, to buffer.

    3. If buffer has accumulated 24 bits, interpret them as three 8-bit big-endian numbers. Append three bytes with values equal to those numbers to output, in the same order, and then empty buffer.

    4. Advance position by 1.

  9. If buffer is not empty, it contains either 12 or 18 bits. If it contains 12 bits, then discard the last four and interpret the remaining eight as an 8-bit big-endian number. If it contains 18 bits, then discard the last two and interpret the remaining 16 as two 8-bit big-endian numbers. Append the one or two bytes with values equal to those one or two numbers to output, in the same order.

    The discarded bits mean that, for instance, "YQ" and "YR" both return `a`.

  10. Return output.
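
A step-by-step Python sketch of forgiving-base64 decode (hypothetical name), using None for failure and an integer as the bit buffer. Python's base64.b64decode is not used because its handling of padding and of characters outside the alphabet differs from the steps above.

```python
import string
from typing import Optional

# RFC 4648 Table 1: index in this string is the 6-bit value n.
BASE64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
ASCII_WHITESPACE = "\t\n\f\r "


def forgiving_base64_decode(data: str) -> Optional[bytes]:
    # 1. Remove all ASCII whitespace.
    data = "".join(c for c in data if c not in ASCII_WHITESPACE)
    # 2. If the length is a multiple of 4, remove one or two trailing "=".
    if len(data) % 4 == 0:
        if data.endswith("=="):
            data = data[:-2]
        elif data.endswith("="):
            data = data[:-1]
    # 3. A remainder of 1 can never be valid.
    if len(data) % 4 == 1:
        return None
    # 4. Only +, /, and ASCII alphanumerics are allowed.
    if any(c not in BASE64_ALPHABET for c in data):
        return None
    output = bytearray()
    buffer = 0
    bits = 0
    for c in data:
        # Append the six bits of n, most significant bit first.
        buffer = (buffer << 6) | BASE64_ALPHABET.index(c)
        bits += 6
        if bits == 24:
            output += buffer.to_bytes(3, "big")
            buffer = 0
            bits = 0
    # 9. Leftover: 12 bits -> drop 4, emit 1 byte; 18 bits -> drop 2, emit 2 bytes.
    if bits == 12:
        output.append(buffer >> 4)
    elif bits == 18:
        output += (buffer >> 2).to_bytes(2, "big")
    return bytes(output)
```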

8. Namespaces

The HTML namespace is "http://www.w3.org/1999/xhtml".

The MathML namespace is "http://www.w3.org/1998/Math/MathML".

The SVG namespace is "http://www.w3.org/2000/svg".

The XLink namespace is "http://www.w3.org/1999/xlink".

The XML namespace is "http://www.w3.org/XML/1998/namespace".

The XMLNS namespace is "http://www.w3.org/2000/xmlns/".

Acknowledgments

Many thanks to Addison Phillips, Aryeh Gregor, Dominic Farolino, Jake Archibald, Jungkee Song, Leonid Vasilyev, Malika Aubakirova, Michael™ Smith, Mike West, Philip Jägenstedt, Rashaun "Snuggs" Stovall, Sergey Shekyan, Simon Pieters, Tab Atkins, Tobie Langel, triple-underscore, and Xue Fuqiao for being awesome!

This standard is written by Anne van Kesteren (Mozilla, annevk@annevk.nl) and Domenic Denicola (Google, d@domenic.me).

Per CC0, to the extent possible under law, the editors have waived all copyright and related or neighboring rights to this work.


References

Normative References

[ECMA-262]
ECMAScript Language Specification. URL: https://tc39.github.io/ecma262/
[ENCODING]
Anne van Kesteren. Encoding Standard. Living Standard. URL: https://encoding.spec.whatwg.org/
[RFC20]
V.G. Cerf. ASCII format for network interchange. October 1969. Internet Standard. URL: https://tools.ietf.org/html/rfc20
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119
[RFC4648]
S. Josefsson. The Base16, Base32, and Base64 Data Encodings. October 2006. Proposed Standard. URL: https://tools.ietf.org/html/rfc4648
[UNICODE]
The Unicode Standard. URL: https://www.unicode.org/versions/latest/

Informative References

[RFC8174]
B. Leiba. Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words. May 2017. Best Current Practice. URL: https://tools.ietf.org/html/rfc8174