URLPattern API

Draft Community Group Report

This version:
https://wicg.github.io/urlpattern/
Editor:
(Google)
Participate:
GitHub WICG/urlpattern (new issue, open issues)
Commits:
GitHub spec.bs commits
Not Ready For Implementation

This spec is not yet ready for implementation. It exists in this repository to record the ideas and promote discussion.

Before attempting to implement this spec, please contact the editors.


Abstract

The URLPattern API provides a web platform primitive for matching URLs based on a convenient pattern syntax.

Status of this document

This specification was published by the Web Platform Incubator Community Group. It is not a W3C Standard nor is it on the W3C Standards Track. Please note that under the W3C Community Contributor License Agreement (CLA) there is a limited opt-out and other conditions apply. Learn more about W3C Community and Business Groups.

1. The URLPattern class

typedef (USVString or URLPatternInit) URLPatternInput;

[Exposed=(Window,Worker)]
interface URLPattern {
  constructor(URLPatternInput input, optional USVString baseURL);

  boolean test(URLPatternInput input, optional USVString baseURL);

  URLPatternResult? exec(URLPatternInput input, optional USVString baseURL);

  readonly attribute USVString protocol;
  readonly attribute USVString username;
  readonly attribute USVString password;
  readonly attribute USVString hostname;
  readonly attribute USVString port;
  readonly attribute USVString pathname;
  readonly attribute USVString search;
  readonly attribute USVString hash;
};

dictionary URLPatternInit {
  USVString protocol;
  USVString username;
  USVString password;
  USVString hostname;
  USVString port;
  USVString pathname;
  USVString search;
  USVString hash;
  USVString baseURL;
};

dictionary URLPatternResult {
  sequence<URLPatternInput> inputs;

  URLPatternComponentResult protocol;
  URLPatternComponentResult username;
  URLPatternComponentResult password;
  URLPatternComponentResult hostname;
  URLPatternComponentResult port;
  URLPatternComponentResult pathname;
  URLPatternComponentResult search;
  URLPatternComponentResult hash;
};

dictionary URLPatternComponentResult {
  USVString input;
  record<USVString, USVString> groups;
};

Each URLPattern object has an associated protocol component, a component, which must be set upon creation.

Each URLPattern object has an associated username component, a component, which must be set upon creation.

Each URLPattern object has an associated password component, a component, which must be set upon creation.

Each URLPattern object has an associated hostname component, a component, which must be set upon creation.

Each URLPattern object has an associated port component, a component, which must be set upon creation.

Each URLPattern object has an associated pathname component, a component, which must be set upon creation.

Each URLPattern object has an associated search component, a component, which must be set upon creation.

Each URLPattern object has an associated hash component, a component, which must be set upon creation.

URLPattern . protocol

The normalized protocol pattern string.

URLPattern . username

The normalized username pattern string.

URLPattern . password

The normalized password pattern string.

URLPattern . hostname

The normalized hostname pattern string.

URLPattern . port

The normalized port pattern string.

URLPattern . pathname

The normalized pathname pattern string.

URLPattern . search

The normalized search pattern string.

URLPattern . hash

The normalized hash pattern string.

The new URLPattern(input, baseURL) constructor steps are:
  1. Let init be null.

  2. If input is a scalar value string then:

    1. Set init to the result of running parse a constructor string given input.

    2. If baseURL is given, then set init["baseURL"] to baseURL.

  3. Else:

    1. Assert: input is a URLPatternInit.

    2. If baseURL is given, then throw a TypeError.

    3. Set init to input.

  4. Let processedInit be the result of process a URLPatternInit given init, "pattern", null, null, null, null, null, null, null, and null.

  5. If processedInit["protocol"] is a special scheme and processedInit["port"] is its corresponding default port, then set processedInit["port"] to the empty string.

  6. Set this's protocol component to the result of compiling a component given processedInit["protocol"], canonicalize a protocol, and default options.

  7. Set this's username component to the result of compiling a component given processedInit["username"], canonicalize a username, and default options.

  8. Set this's password component to the result of compiling a component given processedInit["password"], canonicalize a password, and default options.

  9. Set this's hostname component to the result of compiling a component given processedInit["hostname"], canonicalize a hostname, and hostname options.

  10. Set this's port component to the result of compiling a component given processedInit["port"], canonicalize a port, and default options.

  11. If the result of running protocol component matches a special scheme given this's protocol component is true, then set this's pathname component to the result of compiling a component given processedInit["pathname"], canonicalize a standard pathname, and standard pathname options.

  12. Otherwise, set this's pathname component to the result of compiling a component given processedInit["pathname"], canonicalize a cannot-be-a-base-URL pathname, and default options.

  13. Set this's search component to the result of compiling a component given processedInit["search"], canonicalize a search, and default options.

  14. Set this's hash component to the result of compiling a component given processedInit["hash"], canonicalize a hash, and default options.
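As a non-normative illustration of step 5 above, the following sketch drops a port that equals the default port of a special scheme. The helper name elideDefaultPort and the DEFAULT_PORTS map are hypothetical; the port numbers are those the URL Standard assigns to the special schemes, and the sketch assumes the protocol is a fixed string rather than a pattern.

```javascript
// Default ports of the special schemes (URL Standard). "file" is a special
// scheme too, but it has no default port, so it is not listed.
const DEFAULT_PORTS = new Map([
  ["ftp", "21"],
  ["http", "80"],
  ["https", "443"],
  ["ws", "80"],
  ["wss", "443"],
]);

// Hypothetical helper mirroring step 5: if the port is the default port of
// the (special) protocol, replace it with the empty string.
function elideDefaultPort(processedInit) {
  const defaultPort = DEFAULT_PORTS.get(processedInit.protocol);
  if (defaultPort !== undefined && processedInit.port === defaultPort) {
    return { ...processedInit, port: "" };
  }
  return processedInit;
}

const elided = elideDefaultPort({ protocol: "https", port: "443" });
const kept = elideDefaultPort({ protocol: "https", port: "8443" });
console.log(JSON.stringify(elided.port), JSON.stringify(kept.port));
```

With this step in place, a pattern written with an explicit default port compares equal to one written without it.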

The protocol getter steps are:
  1. Return this's protocol component's pattern string.

The username getter steps are:
  1. Return this's username component's pattern string.

The password getter steps are:
  1. Return this's password component's pattern string.

The hostname getter steps are:
  1. Return this's hostname component's pattern string.

The port getter steps are:
  1. Return this's port component's pattern string.

The pathname getter steps are:
  1. Return this's pathname component's pattern string.

The search getter steps are:
  1. Return this's search component's pattern string.

The hash getter steps are:
  1. Return this's hash component's pattern string.

The test(input, baseURL) method steps are:
  1. Let result be the result of match given this, input, and baseURL if given.

  2. If result is null, return false.

  3. Return true.

The exec(input, baseURL) method steps are:
  1. Return the result of match given this, input, and baseURL if given.
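The relationship between test() and exec() can be illustrated with a short, non-normative usage sketch. It assumes a runtime that exposes URLPattern as a global and returns null otherwise; the function name demoTestAndExec and the example URLs are made up for illustration.

```javascript
// Returns null when the runtime does not provide URLPattern, so the sketch
// can be run anywhere without throwing.
function demoTestAndExec() {
  if (typeof globalThis.URLPattern !== "function") {
    return null;
  }
  const pattern = new globalThis.URLPattern({ pathname: "/books/:id" });
  const input = "https://example.com/books/42";

  // exec() returns a URLPatternResult, or null when the input does not match.
  const result = pattern.exec(input);

  return {
    // test() is true exactly when exec() would return a non-null result.
    matched: pattern.test(input),
    id: result === null ? null : result.pathname.groups.id,
    missed: pattern.test("https://example.com/authors/42"),
  };
}

const demo = demoTestAndExec();
console.log(demo);
```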

1.1. Internals

A URLPattern is associated with multiple component structs.

A component has an associated pattern string, a well formed pattern string, which must be set upon creation.

A component has an associated regular expression, a RegExp, which must be set upon creation.

A component has an associated group name list, a list of strings, which must be set upon creation.

To compile a component given a string input, encoding callback encoding callback, and options options:
  1. Let part list be the result of running parse a pattern string given input, options, and encoding callback.

  2. Let (regular expression string, name list) be the result of running generate a regular expression and name list given part list and options.

  3. Let regular expression be RegExpCreate(regular expression string, "u"). If this throws an exception, catch it, and throw a TypeError.

  4. Let pattern string be the result of running generate a pattern string given part list and options.

  5. Return a new component whose pattern string is pattern string, regular expression is regular expression, and group name list is name list.

To perform a match given a URLPattern urlpattern, a URLPatternInput input, and an optional string baseURLString:
  1. Let protocol be the empty string.

  2. Let username be the empty string.

  3. Let password be the empty string.

  4. Let hostname be the empty string.

  5. Let port be the empty string.

  6. Let pathname be the empty string.

  7. Let search be the empty string.

  8. Let hash be the empty string.

  9. Let inputs be an empty list.

  10. Append input to inputs.

  11. If input is a URLPatternInit then:

    1. If baseURLString was given, throw a TypeError.

    2. Let applyResult be the result of process a URLPatternInit given input, "url", protocol, username, password, hostname, port, pathname, search, and hash. If this throws an exception, catch it, and return null.

    3. Set protocol to applyResult["protocol"].

    4. Set username to applyResult["username"].

    5. Set password to applyResult["password"].

    6. Set hostname to applyResult["hostname"].

    7. Set port to applyResult["port"].

    8. Set pathname to applyResult["pathname"].

    9. Set search to applyResult["search"].

    10. Set hash to applyResult["hash"].

  12. Else:

    1. Let baseURL be null.

    2. If baseURLString was given, then:

      1. Set baseURL to the result of parsing baseURLString.

      2. If baseURL is failure, return null.

      3. Append baseURL to inputs.

    3. Let url be the result of parsing input given baseURL.

    4. If url is failure, return null.

    5. Set protocol to url’s scheme.

    6. Set username to url’s username.

    7. Set password to url’s password.

    8. Set hostname to url’s host or the empty string if the value is null.

    9. Set port to url’s port or the empty string if the value is null.

    10. Set pathname to url’s API pathname string.

    11. Set search to url’s query or the empty string if the value is null.

    12. Set hash to url’s fragment or the empty string if the value is null.

  13. Let protocolExecResult be RegExpBuiltinExec(urlpattern’s protocol component's regular expression, protocol).

  14. Let usernameExecResult be RegExpBuiltinExec(urlpattern’s username component's regular expression, username).

  15. Let passwordExecResult be RegExpBuiltinExec(urlpattern’s password component's regular expression, password).

  16. Let hostnameExecResult be RegExpBuiltinExec(urlpattern’s hostname component's regular expression, hostname).

  17. Let portExecResult be RegExpBuiltinExec(urlpattern’s port component's regular expression, port).

  18. Let pathnameExecResult be RegExpBuiltinExec(urlpattern’s pathname component's regular expression, pathname).

  19. Let searchExecResult be RegExpBuiltinExec(urlpattern’s search component's regular expression, search).

  20. Let hashExecResult be RegExpBuiltinExec(urlpattern’s hash component's regular expression, hash).

  21. If any of protocolExecResult, usernameExecResult, passwordExecResult, hostnameExecResult, portExecResult, pathnameExecResult, searchExecResult, or hashExecResult is null, then return null.

  22. Let result be a new URLPatternResult.

  23. Set result["inputs"] to inputs.

  24. Set result["protocol"] to the result of creating a component match result given urlpattern’s protocol component, protocol, and protocolExecResult.

  25. Set result["username"] to the result of creating a component match result given urlpattern’s username component, username, and usernameExecResult.

  26. Set result["password"] to the result of creating a component match result given urlpattern’s password component, password, and passwordExecResult.

  27. Set result["hostname"] to the result of creating a component match result given urlpattern’s hostname component, hostname, and hostnameExecResult.

  28. Set result["port"] to the result of creating a component match result given urlpattern’s port component, port, and portExecResult.

  29. Set result["pathname"] to the result of creating a component match result given urlpattern’s pathname component, pathname, and pathnameExecResult.

  30. Set result["search"] to the result of creating a component match result given urlpattern’s search component, search, and searchExecResult.

  31. Set result["hash"] to the result of creating a component match result given urlpattern’s hash component, hash, and hashExecResult.

  32. Return result.
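The string-input branch of the match algorithm (step 12) can be approximated in script with the URL class. This is a non-normative sketch: the function name extractComponentStrings is hypothetical, and the URL getters are only an approximation of the internal URL record fields, so the leading ":" / "?" / "#" delimiters that the getters include are stripped by hand.

```javascript
// Parse a string input (optionally against a base URL) and return the eight
// component strings that would be fed to the component regular expressions.
function extractComponentStrings(input, baseURLString) {
  let url;
  try {
    url = baseURLString === undefined
      ? new URL(input)
      : new URL(input, baseURLString);
  } catch {
    // A parsing failure makes the match algorithm return null.
    return null;
  }
  return {
    protocol: url.protocol.slice(0, -1), // getter includes a trailing ":"
    username: url.username,
    password: url.password,
    hostname: url.hostname,
    port: url.port,                      // "" when null or the default port
    pathname: url.pathname,
    search: url.search.slice(1),         // getter includes a leading "?"
    hash: url.hash.slice(1),             // getter includes a leading "#"
  };
}

const absolute = extractComponentStrings("https://user:pw@example.com:8080/a/b?x=1#frag");
const relative = extractComponentStrings("/a", "https://example.com");
const invalid = extractComponentStrings("not a url");
console.log(absolute, relative, invalid);
```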

To create a component match result given a component component, a string input, and an array representing the output of RegExpBuiltinExec execResult:
  1. Let result be a new URLPatternComponentResult.

  2. Set result["input"] to input.

  3. Let groups be a record<USVString, USVString>.

  4. Let index be 1.

  5. While index is less than Get(execResult, "length"):

    1. Let name be component’s group name list[index − 1].

    2. Let value be Get(execResult, ToString(index)).

    3. Set groups[name] to value.

    4. Increment index by 1.

  6. Set result["groups"] to groups.

  7. Return result.
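The following non-normative sketch shows how the group name list lines up with the capture groups of an exec result. The component object shape (regularExpression, groupNameList) and the example regular expression are assumptions made for illustration; a real component is produced by compiling a pattern string.

```javascript
// Build a URLPatternComponentResult-like object from an exec result.
// Capture group i (1-based) is stored under the name at groupNameList[i - 1].
function createComponentMatchResult(component, input, execResult) {
  const groups = {};
  let index = 1;
  while (index < execResult.length) {
    const name = component.groupNameList[index - 1];
    groups[name] = execResult[index];
    index += 1; // without this increment the loop would never terminate
  }
  return { input, groups };
}

// Hand-written stand-in for a compiled "/books/:id" pathname component.
const pathnameComponent = {
  regularExpression: new RegExp("^/books/([^/]+?)$", "u"),
  groupNameList: ["id"],
};

const pathnameInput = "/books/42";
const pathnameExecResult = pathnameComponent.regularExpression.exec(pathnameInput);
const componentResult = createComponentMatchResult(pathnameComponent, pathnameInput, pathnameExecResult);
console.log(componentResult);
```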

The default options is an options struct with delimiter code point set to the empty string and prefix code point set to the empty string.

The hostname options is an options struct with delimiter code point set to "." and prefix code point set to the empty string.

The standard pathname options is an options struct with delimiter code point set to "/" and prefix code point set to "/".

To determine if a protocol component matches a special scheme given a component protocol component:
  1. Let special scheme list be a list populated with all of the special schemes.

  2. For each scheme of special scheme list:

    1. Let test result be RegExpBuiltinExec(protocol component’s regular expression, scheme).

    2. If test result is not null, then return true.

  3. Return false.
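A non-normative sketch of the check above: each special scheme is run through the protocol component's regular expression, and the first hit is enough. The component object shape and the example regular expressions are assumptions; the scheme list is the set of special schemes from the URL Standard.

```javascript
// The special schemes defined by the URL Standard.
const SPECIAL_SCHEMES = ["ftp", "file", "http", "https", "ws", "wss"];

// True if the protocol component's regular expression matches at least one
// special scheme.
function protocolComponentMatchesSpecialScheme(protocolComponent) {
  return SPECIAL_SCHEMES.some(
    (scheme) => protocolComponent.regularExpression.exec(scheme) !== null
  );
}

// Hand-written stand-ins for compiled protocol components.
const httpLike = { regularExpression: /^https?$/u };
const dataOnly = { regularExpression: /^data$/u };
const wildcard = { regularExpression: /^(.*)$/u };

console.log(
  protocolComponentMatchesSpecialScheme(httpLike),
  protocolComponentMatchesSpecialScheme(dataOnly),
  protocolComponentMatchesSpecialScheme(wildcard)
);
```

Because the check runs the regular expression rather than comparing strings, a wildcard protocol pattern is also treated as matching a special scheme.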

1.2. Constructor String Parsing

A constructor string parser is a struct.

A constructor string parser has an associated input, a string, which must be set upon creation.

A constructor string parser has an associated token list, a token list, which must be set upon creation.

A constructor string parser has an associated result, a URLPatternInit, initially set to a new URLPatternInit.

A constructor string parser has an associated component start, a number, initially set to 0.

A constructor string parser has an associated token index, a number, initially set to 0.

A constructor string parser has an associated token increment, a number, initially set to 1.

A constructor string parser has an associated group depth, a number, initially set to 0.

A constructor string parser has an associated should treat as a standard URL, a boolean, initially set to false.

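A constructor string parser has an associated state, a string, initially set to "init". It must be one of the following: "init", "protocol", "authority", "username", "password", "hostname", "port", "pathname", "search", "hash", or "done".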

The URLPattern constructor string algorithm is very similar to the basic URL parser algorithm, but some differences prevent us from using that algorithm directly.

First, the URLPattern constructor string parser operates on tokens generated using the "lenient" tokenize policy. In contrast, the basic URL parser operates on code points. Operating on tokens allows the URLPattern constructor string parser to more easily distinguish between code points that are significant pattern syntax and code points that might be a URL component separator. For example, it makes it trivial to handle named groups like ":hmm" in "https://a.c:hmm.example.com:8080" without confusing them with the port number.

Second, the URLPattern constructor string parser needs to avoid applying URL canonicalization to all code points as the basic URL parser does. Instead, canonicalization is performed later, when compiling each component pattern string, and only on the parts of the pattern string known to be safe.

Finally, the URLPattern constructor string parser does not handle some parts of the basic URL parser state machine. For example, it does not treat backslashes specially as they would all be treated as pattern characters and would require excessive escaping. In addition, this parser may not handle some more esoteric parts of the URL parsing algorithm like file URLs with a hostname. The goal with this parser was to handle the most common URLs while allowing any niche case to be handled instead via the URLPatternInit constructor.

To parse a constructor string given a string input:
  1. Let parser be a new constructor string parser whose input is input and token list is the result of running tokenize given input and "lenient".

    When constructing a pattern using a URLPatternInit like new URLPattern({ pathname: 'foo' }), any missing components will be defaulted to wildcards. In the constructor string case, however, all components are precisely defined as either the empty string or a longer value. This is because there is no way to simply "leave out" a component when writing a URL.

    To implement this we initialize components in parser’s result with the empty string in advance.

    We can’t, however, do this immediately. We want to allow the baseURL to provide information for relative URLs, so we only want to set the default empty string values for components following the first component in the relative URL. We therefore wait to set the default component values until after we exit the "init" state.

  2. While parser’s token index is less than parser’s token list size:

    1. Set parser’s token increment to 1.

      On every iteration of the parse loop the parser’s token index will be incremented by its token increment value. Typically this means incrementing by 1, but at certain times it is set to zero. The token increment is then always reset back to 1 at the top of the loop.

    2. If parser’s token list[parser’s token index]'s type is "end" then:

      1. If parser’s state is "init":

        If we reached the end of the string in the "init" state, then we failed to find a protocol terminator and this must be a relative URLPattern constructor string.

        1. Run rewind given parser.

          We next determine at which component the relative pattern begins. Relative pathnames are most common, but URLs and URLPattern constructor strings can begin with the search or hash components as well.

        2. If the result of running is a hash prefix given parser is true, then run change state given parser, "hash" and 1.

        3. Else if the result of running is a search prefix given parser is true:

          1. Run change state given parser, "search" and 1.

          2. Set parser’s result["hash"] to the empty string.

        4. Else:

          1. Run change state given parser, "pathname" and 0.

          2. Set parser’s result["search"] to the empty string.

          3. Set parser’s result["hash"] to the empty string.

        5. Increment parser’s token index by parser’s token increment.

        6. Continue.

      2. If parser’s state is "authority":

        If we reached the end of the string in the "authority" state, then we failed to find an "@". Therefore there is no username or password.

        1. Run rewind and set state given parser and "hostname".

        2. Increment parser’s token index by parser’s token increment.

        3. Continue.

      3. Run change state given parser, "done" and 0.

      4. Break.

    3. If the result of running is a group open given parser is true:

      We ignore all code points within "{ ... }" pattern groupings. It would not make sense to allow a URL component boundary to lie within a grouping; e.g. "https://example.c{om/fo}o". While not supported within well formed pattern strings, we handle nested groupings here to avoid parser confusion.

      It is not necessary to perform this logic for regexp or named groups since those values are collapsed into individual tokens by the tokenize algorithm.

      1. Increment parser’s group depth by 1.

      2. Increment parser’s token index by parser’s token increment.

      3. Continue.

    4. If parser’s group depth is greater than 0:

      1. If the result of running is a group close given parser is true, then decrement parser’s group depth by 1.

      2. Else:

        1. Increment parser’s token index by parser’s token increment.

        2. Continue.

    5. Switch on parser’s state and run the associated steps:

      "init"
      1. If the result of running is a protocol suffix given parser is true:

        We found a protocol suffix, so this must be an absolute URLPattern constructor string. Therefore initialize all components to the empty string.

        1. Set parser’s result["username"] to the empty string.

        2. Set parser’s result["password"] to the empty string.

        3. Set parser’s result["hostname"] to the empty string.

        4. Set parser’s result["port"] to the empty string.

        5. Set parser’s result["pathname"] to the empty string.

        6. Set parser’s result["search"] to the empty string.

        7. Set parser’s result["hash"] to the empty string.

        8. Run rewind and set state given parser and "protocol".

      "protocol"
      1. If the result of running is a protocol suffix given parser is true:

        1. Run compute should treat as a standard URL given parser.

          We must eagerly compile the protocol component to determine if it matches any special schemes. If it does then we treat the URLPattern constructor string as a "standard URL". This determines whether the pathname defaults to "/" and also whether we should look for the username, password, hostname, and port components. Authority slashes may also cause us to look for these components. Otherwise we treat this as a "cannot be a base URL" and go straight to the pathname component.

        2. If parser’s should treat as a standard URL is true, then set parser’s result["pathname"] to "/".

        3. Let next state be "pathname".

        4. Let skip be 1.

        5. If the result of running next is authority slashes given parser is true:

          1. Set next state to "authority".

          2. Set skip to 3.

        6. Else if parser’s should treat as a standard URL is true, then set next state to "authority".

        7. Run change state given parser, next state, and skip.

      "authority"
      1. If the result of running is an identity terminator given parser is true, then run rewind and set state given parser and "username".

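      2. Else if the result of running is a pathname start given parser, is a search prefix given parser, or is a hash prefix given parser is true, then run rewind and set state given parser and "hostname".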

      "username"
      1. If the result of running is a password prefix given parser is true, then run change state given parser, "password", and 1.

      2. Else if the result of running is an identity terminator given parser is true, then run change state given parser, "hostname", and 1.

      "password"
      1. If the result of running is an identity terminator given parser is true, then run change state given parser, "hostname", and 1.

      "hostname"
      1. If the result of running is a port prefix given parser is true, then run change state given parser, "port", and 1.

      2. Else if the result of running is a pathname start given parser is true, then run change state given parser, "pathname", and 0.

      3. Else if the result of running is a search prefix given parser is true, then run change state given parser, "search", and 1.

      4. Else if the result of running is a hash prefix given parser is true, then run change state given parser, "hash", and 1.

      "port"
      1. If the result of running is a pathname start given parser is true, then run change state given parser, "pathname", and 0.

      2. Else if the result of running is a search prefix given parser is true, then run change state given parser, "search", and 1.

      3. Else if the result of running is a hash prefix given parser is true, then run change state given parser, "hash", and 1.

      "pathname"
      1. If the result of running is a search prefix given parser is true, then run change state given parser, "search", and 1.

      2. Else if the result of running is a hash prefix given parser is true, then run change state given parser, "hash", and 1.

      "search"
      1. If the result of running is a hash prefix given parser is true, then run change state given parser, "hash", and 1.

      "hash"
      1. Do nothing.

      "done"
      1. Assert: This step is never reached.

    6. Increment parser’s token index by parser’s token increment.

  3. Return parser’s result.

To change state given a constructor string parser parser, a state state, and a number skip:
  1. If state is not "init", not "authority", and not "done", then set parser’s result[state] to the result of running make a component string given parser.

  2. Set parser’s state to state.

  3. Set parser’s component start to parser’s token index + skip.

  4. Increment parser’s token index by skip.

  5. Set parser’s token increment to 0.

To rewind given a constructor string parser parser:
  1. Set parser’s token index to parser’s component start.

  2. Set parser’s token increment to 0.

To rewind and set state given a constructor string parser parser and a state state:
  1. Run rewind given parser.

  2. Set parser’s state to state.

To get a safe token given a constructor string parser parser and a number index:
  1. If index is less than parser’s token list's size, then return parser’s token list[index].

  2. Assert: parser’s token list's size is greater than or equal to 1.

  3. Let last index be parser’s token list's size − 1.

  4. Let token be parser’s token list[last index].

  5. Assert: token’s type is "end".

  6. Return token.
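A non-normative sketch of get a safe token: any index past the end of the token list resolves to the trailing "end" token, which lets lookahead helpers read beyond the current position without bounds checks. The parser object shape (tokenList) and the hand-written tokens are assumptions for illustration.

```javascript
// Return the token at index, or the final "end" token when index is out of
// range. The token list is expected to always end with an "end" token.
function getSafeToken(parser, index) {
  if (index < parser.tokenList.length) {
    return parser.tokenList[index];
  }
  if (parser.tokenList.length < 1) {
    throw new Error("assertion failed: token list must not be empty");
  }
  const lastIndex = parser.tokenList.length - 1;
  const token = parser.tokenList[lastIndex];
  if (token.type !== "end") {
    throw new Error("assertion failed: last token must be of type end");
  }
  return token;
}

// Hand-written token list for the two-code-point input "a:".
const safeTokenParser = {
  tokenList: [
    { type: "char", index: 0, value: "a" },
    { type: "char", index: 1, value: ":" },
    { type: "end", index: 2, value: "" },
  ],
};

const inRange = getSafeToken(safeTokenParser, 1);
const pastEnd = getSafeToken(safeTokenParser, 10);
console.log(inRange.type, pastEnd.type);
```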

To run is a non-special pattern char given a constructor string parser parser, a number index, and a string value:
  1. Let token be the result of running get a safe token given parser and index.

  2. If token’s value is not value, then return false.

  3. If token’s type is "char", "escaped-char", or "invalid-char", then return true.

  4. Return false.

To run is a protocol suffix given a constructor string parser parser:
  1. Return the result of running is a non-special pattern char given parser, parser’s token index, and ":".

To run next is authority slashes given a constructor string parser parser:
  1. If the result of running is a non-special pattern char given parser, parser’s token index + 1, and "/" is false, then return false.

  2. If the result of running is a non-special pattern char given parser, parser’s token index + 2, and "/" is false, then return false.

  3. Return true.

To run is an identity terminator given a constructor string parser parser:
  1. Return the result of running is a non-special pattern char given parser, parser’s token index, and "@".

To run is a password prefix given a constructor string parser parser:
  1. Return the result of running is a non-special pattern char given parser, parser’s token index, and ":".

To run is a port prefix given a constructor string parser parser:
  1. Return the result of running is a non-special pattern char given parser, parser’s token index, and ":".

To run is a pathname start given a constructor string parser parser:
  1. Return the result of running is a non-special pattern char given parser, parser’s token index, and "/".

To run is a search prefix given a constructor string parser parser:
  1. If the result of running is a non-special pattern char given parser, parser’s token index, and "?" is true, then return true.

  2. If parser’s token list[parser’s token index]'s value is not "?", then return false.

  3. Let previous index be parser’s token index − 1.

  4. If previous index is less than 0, then return true.

  5. Let previous token be the result of running get a safe token given parser and previous index.

  6. If previous token’s type is "name", "regexp", "close", or "asterisk", then return false.

  7. Return true.
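The "?" code point is ambiguous: it can start the search component or act as a modifier on the preceding group. The following non-normative sketch illustrates how is a search prefix tells the two apart by looking at the previous token. The parser object shape and the hand-written token lists are hypothetical, and the token types treated here as plain characters ("char", "escaped-char", "invalid-char") and as modifier targets ("name", "regexp", "close", "asterisk") are this sketch's assumptions.

```javascript
// Out-of-range indices resolve to the trailing "end" token.
function getSafeToken(parser, index) {
  const last = parser.tokenList.length - 1;
  return index < parser.tokenList.length ? parser.tokenList[index] : parser.tokenList[last];
}

// True if the token at index has the given value and no special pattern
// meaning (assumed set of plain token types).
function isNonSpecialPatternChar(parser, index, value) {
  const token = getSafeToken(parser, index);
  if (token.value !== value) return false;
  return ["char", "escaped-char", "invalid-char"].includes(token.type);
}

function isSearchPrefix(parser) {
  if (isNonSpecialPatternChar(parser, parser.tokenIndex, "?")) return true;
  if (parser.tokenList[parser.tokenIndex].value !== "?") return false;
  const previousIndex = parser.tokenIndex - 1;
  if (previousIndex < 0) return true;
  const previousToken = getSafeToken(parser, previousIndex);
  // After a group-like token, "?" is read as an "optional" modifier instead.
  return !["name", "regexp", "close", "asterisk"].includes(previousToken.type);
}

const end = { type: "end", index: 2, value: "" };
const question = { type: "other-modifier", index: 1, value: "?" };

// "?" directly after a plain character: start of the search component.
const afterChar = { tokenIndex: 1, tokenList: [{ type: "char", index: 0, value: "a" }, question, end] };
// "?" directly after a named group: a modifier, not a search prefix.
const afterName = { tokenIndex: 1, tokenList: [{ type: "name", index: 0, value: "id" }, question, end] };

console.log(isSearchPrefix(afterChar), isSearchPrefix(afterName));
```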

To run is a hash prefix given a constructor string parser parser:
  1. Return the result of running is a non-special pattern char given parser, parser’s token index, and "#".

To run is a group open given a constructor string parser parser:
  1. If parser’s token list[parser’s token index]'s type is "open", then return true.

  2. Else return false.

To run is a group close given a constructor string parser parser:
  1. If parser’s token list[parser’s token index]'s type is "close", then return true.

  2. Else return false.

To run make a component string given a constructor string parser parser:
  1. Assert: parser’s token index is less than parser’s token list's size.

  2. Let token be parser’s token list[parser’s token index].

  3. Let component start token be the result of running get a safe token given parser and parser’s component start.

  4. Let component start input index be component start token’s index.

  5. Let new length be token’s index − component start input index.

  6. Return the substring within parser’s input starting at component start input index with new length.
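A non-normative sketch of make a component string: the component runs from the input position of the token at component start up to, but not including, the input position of the current token. The parser object shape is hypothetical, every code point is given its own "char" token for simplicity, and the input is ASCII so that code point and code unit indices coincide.

```javascript
// Out-of-range indices resolve to the trailing "end" token.
function getSafeToken(parser, index) {
  const last = parser.tokenList.length - 1;
  return index < parser.tokenList.length ? parser.tokenList[index] : parser.tokenList[last];
}

function makeComponentString(parser) {
  const token = parser.tokenList[parser.tokenIndex];
  const componentStartToken = getSafeToken(parser, parser.componentStart);
  const componentStartInputIndex = componentStartToken.index;
  const newLength = token.index - componentStartInputIndex;
  return parser.input.substring(
    componentStartInputIndex,
    componentStartInputIndex + newLength
  );
}

// Simplified token list: one "char" token per code point, then "end".
const componentInput = "https://example.com/";
const componentTokens = Array.from(componentInput).map((value, index) => ({ type: "char", index, value }));
componentTokens.push({ type: "end", index: componentInput.length, value: "" });

// Protocol: tokens 0..4, with the parser positioned at the ":" (token 5).
const protocolString = makeComponentString({
  input: componentInput, tokenList: componentTokens, componentStart: 0, tokenIndex: 5,
});
// Hostname: starts after "://" (token 8), parser positioned at "/" (token 19).
const hostnameString = makeComponentString({
  input: componentInput, tokenList: componentTokens, componentStart: 8, tokenIndex: 19,
});
console.log(protocolString, hostnameString);
```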

To compute should treat as a standard URL given a constructor string parser parser:
  1. Let protocol string be the result of running make a component string given parser.

  2. Let protocol component be the result of compiling a component given protocol string, canonicalize a protocol, and default options.

  3. If the result of running protocol component matches a special scheme given protocol component, then set parser’s should treat as a standard URL to true.

2. Patterns

A pattern string is a string that is written to match a set of target strings. A well formed pattern string conforms to a particular pattern syntax. This pattern syntax is directly based on the syntax used by the popular path-to-regexp JavaScript library.

2.1. Parsing Patterns

2.1.1. Tokens

A token list is a list containing zero or more token structs.

A token is a struct representing a single lexical token within a pattern string.

A token has an associated type, a string, initially "invalid-char". It must be one of the following:

"open"
The token represents a U+007B ({) code point.
"close"
The token represents a U+007D (}) code point.
"regexp"
The token represents a string of the form "(<regular expression>)". The regular expression is required to consist of only ASCII code points.
"name"
The token represents a string of the form ":<name>". The name value is restricted to code points that are consistent with JavaScript identifiers.
"char"
The token represents a valid pattern code point without any special syntactical meaning.
"escaped-char"
The token represents a code point escaped using a backslash like "\<char>".
"other-modifier"
The token represents a matching group modifier that is either the U+003F (?) or U+002B (+) code points.
"asterisk"
The token represents a U+002A (*) code point that can be either a wildcard matching group or a matching group modifier.
"end"
The token represents the end of the pattern string.
"invalid-char"
The token represents a code point that is invalid in the pattern. This could be because of the code point value itself or due to its location within the pattern relative to other syntactic elements.

A token has an associated index, a number, initially 0. It is the position of the first code point in the pattern string represented by the token.

A token has an associated value, a string, initially the empty string. It contains the code points from the pattern string represented by the token.

2.1.2. Tokenizing

A tokenize policy is a string that must be either "strict" or "lenient".

A tokenizer is a struct.

A tokenizer has an associated input, a pattern string, initially the empty string.

A tokenizer has an associated policy, a tokenize policy, initially "strict".

A tokenizer has an associated token list, a token list, initially an empty list.

A tokenizer has an associated index, a number, initially 0.

A tokenizer has an associated next index, a number, initially 0.

A tokenizer has an associated code point, a Unicode code point, initially null.
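As a rough, non-normative illustration of how a tokenizer turns a pattern string into a token list, the sketch below handles only the single-code-point token types and backslash escapes. It deliberately omits ":name" and "(regexp)" tokens, so those code points fall through to "char" here. The function name tokenizeSimplified is hypothetical, and the treatment of a trailing backslash (TypeError under "strict", an "invalid-char" token under "lenient") is an assumption.

```javascript
// Reduced tokenizer sketch. Each token records its type, the code point
// index where it starts, and the code points it represents.
function tokenizeSimplified(input, policy = "strict") {
  const codePoints = Array.from(input); // index by code point, not code unit
  const tokenList = [];
  const singleCodePointTypes = new Map([
    ["*", "asterisk"],
    ["+", "other-modifier"],
    ["?", "other-modifier"],
    ["{", "open"],
    ["}", "close"],
  ]);
  let index = 0;

  while (index < codePoints.length) {
    const codePoint = codePoints[index];

    if (singleCodePointTypes.has(codePoint)) {
      tokenList.push({ type: singleCodePointTypes.get(codePoint), index, value: codePoint });
      index += 1;
      continue;
    }

    if (codePoint === "\\") {
      if (index === codePoints.length - 1) {
        // Assumed error handling for a backslash with nothing to escape.
        if (policy === "strict") throw new TypeError("trailing backslash");
        tokenList.push({ type: "invalid-char", index, value: codePoint });
        index += 1;
        continue;
      }
      // The token starts at the backslash; its value is the escaped code point.
      tokenList.push({ type: "escaped-char", index, value: codePoints[index + 1] });
      index += 2;
      continue;
    }

    tokenList.push({ type: "char", index, value: codePoint });
    index += 1;
  }

  // The token list always finishes with an "end" token.
  tokenList.push({ type: "end", index: codePoints.length, value: "" });
  return tokenList;
}

const tokens = tokenizeSimplified("/a\\*b*");
const tokenTypes = tokens.map((token) => token.type);
console.log(tokenTypes.join(" "));
```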

To tokenize given a string input and a tokenize policy policy:
  1. Let tokenizer be a new tokenizer.

  2. Set tokenizer’s input to input.

  3. Set tokenizer’s policy to policy.

  4. While tokenizer’s index is less than tokenizer’s input's code point length:

    1. Run get the next code point given tokenizer.

    2. If tokenizer’s code point is U+002A (*):

      1. Run add a token with default position and length given tokenizer and "asterisk".

      2. Continue.

    3. If tokenizer’s code point is U+002B (+) or U+003F (?):

      1. Run add a token with default position and length given tokenizer and "other-modifier".

      2. Continue.

    4. If tokenizer’s code point is U+005C (\):

      1. If tokenizer’s index is equal to tokenizer’s input's code point length − 1:

        1. Run process a tokenizing error given tokenizer, tokenizer’s next index, and tokenizer’s index.

        2. Continue.

      2. Let escaped index be tokenizer’s next index.

      3. Run get the next code point given tokenizer.

      4. Run add a token with default length given tokenizer, "escaped-char", tokenizer’s next index, and escaped index.

      5. Continue.

    5. If tokenizer’s code point is U+007B ({):

      1. Run add a token with default position and length given tokenizer and "open".

      2. Continue.

    6. If tokenizer’s code point is U+007D (}):

      1. Run add a token with default position and length given tokenizer and "close".

      2. Continue.

    7. If tokenizer’s code point is U+003A (:):

      1. Let name position be tokenizer’s next index.

      2. Let name start be name position.

      3. While name position is less than tokenizer’s input's code point length:

        1. Run seek and get the next code point given tokenizer and name position.

        2. Let first code point be true if name position equals name start and false otherwise.

        3. Let valid code point be the result of running is a valid name code point given tokenizer’s code point and first code point.

        4. If valid code point is false, then break.

        5. Set name position to tokenizer’s next index.

      4. If name position is less than or equal to name start:

        1. Run process a tokenizing error given tokenizer, name start, and tokenizer’s index.

        2. Continue.

      5. Run add a token with default length given tokenizer, "name", name position, and name start.

      6. Continue.

    8. If tokenizer’s code point is U+0028 (():

      1. Let depth be 1.

      2. Let regexp position be tokenizer’s next index.

      3. Let regexp start be regexp position.

      4. Let error be false.

      5. While regexp position is less than tokenizer’s input's code point length:

        1. Run seek and get the next code point given tokenizer and regexp position.

        2. If the result of running is ASCII given tokenizer’s code point is false:

          1. Run process a tokenizing error given tokenizer, regexp start, and tokenizer’s index.

          2. Set error to true.

          3. Break.

        3. If regexp position equals regexp start and tokenizer’s code point is U+003F (?):

          1. Run process a tokenizing error given tokenizer, regexp start, and tokenizer’s index.

          2. Set error to true.

          3. Break.

        4. If tokenizer’s code point is U+005C (\):

          1. If regexp position equals tokenizer’s input's code point length − 1:

            1. Run process a tokenizing error given tokenizer, regexp start, and tokenizer’s index.

            2. Set error to true.

            3. Break.

          2. Run get the next code point given tokenizer.

          3. If the result of running is ASCII given tokenizer’s code point is false:

            1. Run process a tokenizing error given tokenizer, regexp start, and tokenizer’s index.

            2. Set error to true.

            3. Break.

          4. Set regexp position to tokenizer’s next index.

          5. Continue.

        5. If tokenizer’s code point is U+0029 ()):

          1. Decrement depth by 1.

          2. If depth is 0:

            1. Set regexp position to tokenizer’s next index.

            2. Break.

        6. Else if tokenizer’s code point is U+0028 (():

          1. Increment depth by 1.

          2. If regexp position equals tokenizer’s input's code point length − 1:

            1. Run process a tokenizing error given tokenizer, regexp start, and tokenizer’s index.

            2. Set error to true.

            3. Break.

          3. Let temporary position be tokenizer’s next index.

          4. Run get the next code point given tokenizer.

          5. If tokenizer’s code point is not U+003F (?):

            1. Run process a tokenizing error given tokenizer, regexp start, and tokenizer’s index.

            2. Set error to true.

            3. Break.

          6. Set tokenizer’s next index to temporary position.

        7. Set regexp position to tokenizer’s next index.

      6. If error is true, then continue.

      7. If depth is not zero:

        1. Run process a tokenizing error given tokenizer, regexp start, and tokenizer’s index.

        2. Continue.

      8. Let regexp length be regexp position − regexp start − 1.

      9. If regexp length is zero:

        1. Run process a tokenizing error given tokenizer, regexp start, and tokenizer’s index.

        2. Continue.

      10. Run add a token given tokenizer, "regexp", regexp position, regexp start, and regexp length.

      11. Continue.

    9. Run add a token with default position and length given tokenizer and "char".

  5. Run add a token with default length given tokenizer, "end", tokenizer’s index, and tokenizer’s index.

  6. Return tokenizer’s token list.
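The tokenize steps above can be sketched in JavaScript. This is a minimal, non-normative illustration that assumes the "strict" policy only, so every tokenizing error throws. The function name `tokenize`, the token object shape, and the error messages are hypothetical and not defined by this specification.

```javascript
// Non-normative sketch of tokenizing under the "strict" policy.
// `tokenize` and the { type, index, value } token shape are illustrative names.
function tokenize(input) {
  const cps = Array.from(input); // split by code point, not UTF-16 code unit
  const tokens = [];
  const isNameStart = (c) => /^[$_\p{ID_Start}]$/u.test(c);
  const isNamePart = (c) => /^[$\u200c\u200d\p{ID_Continue}]$/u.test(c);
  const isAscii = (c) => c.codePointAt(0) <= 0x7f;
  const fail = (msg, at) => { throw new TypeError(`${msg} at index ${at}`); };
  let index = 0;

  // Appends a token whose value starts at valueStart, then advances to next.
  const add = (type, next, valueStart, valueLength = next - valueStart) => {
    const value = cps.slice(valueStart, valueStart + valueLength).join("");
    tokens.push({ type, index, value });
    index = next;
  };

  while (index < cps.length) {
    const c = cps[index];
    if (c === "*") { add("asterisk", index + 1, index); continue; }
    if (c === "+" || c === "?") { add("other-modifier", index + 1, index); continue; }
    if (c === "\\") {
      if (index === cps.length - 1) fail("trailing backslash", index);
      add("escaped-char", index + 2, index + 1); // value is the escaped code point
      continue;
    }
    if (c === "{") { add("open", index + 1, index); continue; }
    if (c === "}") { add("close", index + 1, index); continue; }
    if (c === ":") {
      const start = index + 1;
      let pos = start;
      while (pos < cps.length &&
             (pos === start ? isNameStart(cps[pos]) : isNamePart(cps[pos]))) pos++;
      if (pos <= start) fail("missing name", index);
      add("name", pos, start);
      continue;
    }
    if (c === "(") {
      const start = index + 1;
      let pos = start;
      let depth = 1;
      while (pos < cps.length) {
        const r = cps[pos];
        if (!isAscii(r)) fail("non-ASCII code point in regexp", pos);
        if (pos === start && r === "?") fail("regexp starts with '?'", pos);
        if (r === "\\") {
          if (pos === cps.length - 1) fail("trailing backslash in regexp", pos);
          if (!isAscii(cps[pos + 1])) fail("non-ASCII escape in regexp", pos + 1);
          pos += 2;
          continue;
        }
        if (r === ")") {
          depth--;
          if (depth === 0) { pos++; break; }
        } else if (r === "(") {
          depth++;
          // A nested group must be followed by "?", e.g. "(?:...)".
          if (cps[pos + 1] !== "?") fail("nested capturing group", pos);
        }
        pos++;
      }
      if (depth !== 0) fail("unbalanced regexp group", index);
      const length = pos - start - 1; // excludes the closing ")"
      if (length === 0) fail("empty regexp group", index);
      add("regexp", pos, start, length);
      continue;
    }
    add("char", index + 1, index);
  }
  add("end", index, index);
  return tokens;
}
```

For example, under these assumptions the pattern string "/:foo(bar)?" yields "char", "name", "regexp", and "other-modifier" tokens followed by the "end" token.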

To get the next code point for a given tokenizer tokenizer:
  1. Set tokenizer’s code point to the Unicode code point in tokenizer’s input at the position indicated by tokenizer’s next index.

  2. Increment tokenizer’s next index by 1.

To seek and get the next code point for a given tokenizer tokenizer and number index:
  1. Set tokenizer’s next index to index.

  2. Run get the next code point given tokenizer.

To add a token for a given tokenizer tokenizer, type type, number next position, number value position, and number value length:
  1. Let token be a new token.

  2. Set token’s type to type.

  3. Set token’s index to tokenizer’s index.

  4. Set token’s value to the substring within tokenizer’s input starting at value position with value length.

  5. Append token to the back of tokenizer’s token list.

  6. Set tokenizer’s index to next position.

To add a token with default length for a given tokenizer tokenizer, type type, number next position, and number value position:
  1. Let computed length be next position − value position.

  2. Run add a token given tokenizer, type, next position, value position, and computed length.

To add a token with default position and length for a given tokenizer tokenizer and type type:
  1. Run add a token with default length given tokenizer, type, tokenizer’s next index, and tokenizer’s index.

To process a tokenizing error for a given tokenizer tokenizer, a number next position, and a number value position:
  1. If tokenizer’s policy is "strict", then throw a TypeError.

  2. Assert: tokenizer’s policy is "lenient".

  3. Run add a token with default length given tokenizer, "invalid-char", next position, and value position.

To perform is a valid name code point given a Unicode code point code point and a boolean first:
  1. If first is true, then return the result of checking if code point is contained in the IdentifierStart set of code points.

  2. Otherwise, return the result of checking if code point is contained in the IdentifierPart set of code points.

To determine if a Unicode code point is ASCII:
  1. If code point is between U+0000 and U+007F inclusive, then return true.

  2. Otherwise return false.
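The two predicates above might be approximated in JavaScript as follows. This is a non-normative sketch. It assumes that the IdentifierStart set is "$", "_", and code points with the Unicode ID_Start property, and that the IdentifierPart set is "$", U+200C, U+200D, and code points with the ID_Continue property. The function names are hypothetical.

```javascript
// Non-normative sketch; function names are illustrative.
// Takes a numeric code point and a boolean indicating the first position.
function isValidNameCodePoint(codePoint, first) {
  const ch = String.fromCodePoint(codePoint);
  return first
    ? /^[$_\p{ID_Start}]$/u.test(ch)
    : /^[$\u200c\u200d\p{ID_Continue}]$/u.test(ch);
}

// True for U+0000 through U+007F inclusive.
function isAscii(codePoint) {
  return codePoint >= 0x0000 && codePoint <= 0x007f;
}
```

Note that a digit is accepted in a name only after the first position, which is why ":1foo" does not produce a "name" token.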

2.1.3. Parts

A part list is a list of zero or more parts.

A part is a struct representing one piece of a parser pattern string. It can contain at most one matching group, a fixed text prefix, a fixed text suffix, and a modifier. It can contain as little as a single fixed text string or a single matching group.

A part has an associated type, a string, which must be set upon creation. It must be one of the following:

"fixed-text"
The part represents a simple fixed text string.
"regexp"
The part represents a matching group with a custom regular expression.
"segment-wildcard"
The part represents a matching group that matches code points up to the next separator code point. This is typically used for a named group like ":foo" that does not have a custom regular expression.
"full-wildcard"
The part represents a matching group that greedily matches all code points. This is typically used for the "*" wildcard matching group.

A part has an associated value, a string, which must be set upon creation.

A part has an associated modifier, a string, which must be set upon creation. It must be one of the following:

"none"
The part does not have a modifier.
"optional"
The part has an optional modifier indicated by the U+003F (?) code point.
"zero-or-more"
The part has a "zero or more" modifier indicated by the U+002A (*) code point.
"one-or-more"
The part has a "one or more" modifier indicated by the U+002B (+) code point.

A part has an associated name, a string, initially the empty string.

A part has an associated prefix, a string, initially the empty string.

A part has an associated suffix, a string, initially the empty string.

2.1.4. Options

An options struct contains different settings that control how a pattern string behaves. These options originally come from path-to-regexp. We only include the options that are modified within the URLPattern specification and exclude the other options. For the purposes of comparison, this specification acts like path-to-regexp where sensitive, strict, start, and end are always set to true.

An options has an associated delimiter code point, a string, which must be set upon creation. It must contain one ASCII code point or the empty string. This code point is treated as a segment separator and is used for determining how far a :foo named group should match by default. For example, if the delimiter code point is "/" then "/:foo" will match "/bar", but not "/bar/baz". If the delimiter code point is the empty string then the example pattern would match both strings.

An options has an associated prefix code point, a string, which must be set upon creation. It must contain one ASCII code point or the empty string. The code point is treated as an automatic prefix if found immediately preceding a match group. This matters when a match group is modified to be optional or repeating. For example, if prefix code point is "/" then "/foo/:bar?/baz" will treat the "/" before ":bar" as a prefix that becomes optional along with the named group. So in this example the pattern would match "/foo/baz".

2.1.5. Parsing

An encoding callback is an abstract algorithm that takes a given string input. The input will be a simple text piece of a pattern string. An implementing algorithm will validate and encode the input. It must return the encoded string or throw an exception.

A pattern parser is a struct.

A pattern parser has an associated token list, a token list, initially an empty list.

A pattern parser has an associated encoding callback, an encoding callback, that must be set upon creation.

A pattern parser has an associated segment wildcard regexp, a string, that must be set upon creation.

A pattern parser has an associated part list, a part list, initially an empty list.

A pattern parser has an associated pending fixed value, a string, initially the empty string.

A pattern parser has an associated index, a number, initially 0.

A pattern parser has an associated next numeric name, a number, initially 0.

To parse a pattern string given a pattern string input, options options, and encoding callback encoding callback:
  1. Let parser be a new pattern parser whose encoding callback is encoding callback and segment wildcard regexp is the result of running generate a segment wildcard regexp given options.

  2. Set parser’s token list to the result of running tokenize given input and "strict".

  3. While parser’s index is less than parser’s token list's size:

    This first section is looking for the sequence: <prefix char><name><regexp><modifier>. There could be zero to all of these tokens.

    "/:foo(bar)?"
    All four tokens.
    "/"
    One "char" token.
    ":foo"
    One "name" token.
    "(bar)"
    One "regexp" token.
    "/:foo"
    "char" and "name" tokens.
    "/(bar)"
    "char" and "regexp" tokens.
    "/:foo?"
    "char", "name", and "other-modifier" tokens.
    "/(bar)?"
    "char", "regexp", and "other-modifier" tokens.
    1. Let char token be the result of running try to consume a token given parser and "char".

    2. Let name token be the result of running try to consume a token given parser and "name".

    3. Let regexp or wildcard token be the result of running try to consume a regexp or wildcard token given parser and name token.

    4. If name token is not null or regexp or wildcard token is not null:

      If there is a matching group, we need to add the part immediately.

      1. Let prefix be the empty string.

      2. If char token is not null then set prefix to char token’s value.

      3. If prefix is not the empty string and not options’s prefix code point:

        1. Append prefix to the end of parser’s pending fixed value.

        2. Set prefix to the empty string.

      4. Run maybe add a part from the pending fixed value given parser.

      5. Let modifier token be the result of running try to consume a modifier token given parser.

      6. Run add a part given parser, prefix, name token, regexp or wildcard token, the empty string, and modifier token.

      7. Continue.

    5. Let fixed token be char token.

      If there was no matching group, then we need to buffer any fixed text. We want to collect as much text as possible before adding it as a "fixed-text" part.

    6. If fixed token is null, then set fixed token to the result of running try to consume a token given parser and "escaped-char".

    7. If fixed token is not null:

      1. Append fixed token’s value to parser’s pending fixed value.

      2. Continue.

    8. Let open token be the result of running try to consume a token given parser and "open".

      Next we look for the sequence <open><char prefix><name><regexp><char suffix><close><modifier>. The open and close are required, but the other tokens are optional.

      "{a:foo(bar)b}?"
      All tokens are present.
      "{:foo}?"
      "open", "name", "close", and "other-modifier" tokens.
      "{(bar)}?"
      "open", "regexp", "close", and "other-modifier" tokens.
      "{ab}?"
      "open", "char", "close", and "other-modifier" tokens.
    9. If open token is not null:

      1. Let prefix be the result of running consume text given parser.

      2. Set name token to the result of running try to consume a token given parser and "name".

      3. Set regexp or wildcard token to the result of running try to consume a regexp or wildcard token given parser and name token.

      4. Let suffix be the result of running consume text given parser.

      5. Run consume a required token given parser and "close".

      6. Set modifier token to the result of running try to consume a modifier token given parser.

      7. Run add a part given parser, prefix, name token, regexp or wildcard token, suffix, and modifier token.

      8. Continue.

    10. Run maybe add a part from the pending fixed value given parser.

    11. Run consume a required token given parser and "end".

  4. Return parser’s part list.

The full wildcard regexp value is the string ".*".

To generate a segment wildcard regexp given an options options:
  1. Let result be "[^".

  2. Append the result of running escape a regexp string given options’s delimiter code point to the end of result.

  3. Append "]+?" to the end of result.

  4. Return result.
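As a non-normative illustration, the segment wildcard generation above could be sketched like this. The `options` object shape with a `delimiterCodePoint` property and both function names are assumptions made for the example.

```javascript
// Non-normative sketch; names and the options shape are illustrative.
// Escapes the code points listed in "escape a regexp string".
function escapeRegexpString(input) {
  return input.replace(/[.+*?^${}()[\]|\/\\]/g, "\\$&");
}

// Builds a lazy "anything except the delimiter" character class.
function generateSegmentWildcardRegexp(options) {
  return "[^" + escapeRegexpString(options.delimiterCodePoint) + "]+?";
}

const segment = generateSegmentWildcardRegexp({ delimiterCodePoint: "/" });
const anything = generateSegmentWildcardRegexp({ delimiterCodePoint: "" });
```

With a "/" delimiter the group stops at the next "/", so a "/:foo" pattern matches "/bar" but not "/bar/baz". With an empty delimiter the class becomes "[^]", which in JavaScript regular expressions matches any code point, so both strings match.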

To try to consume a token given a pattern parser parser and type type:
  1. Assert: parser’s index is less than parser’s token list's size.

  2. Let next token be parser’s token list[parser’s index].

  3. If next token’s type is not type return null.

  4. Increment parser’s index by 1.

  5. Return next token.

To try to consume a modifier token given a pattern parser parser:
  1. Let token be the result of running try to consume a token given parser and "other-modifier".

  2. If token is not null, then return token.

  3. Set token to the result of running try to consume a token given parser and "asterisk".

  4. Return token.

To try to consume a regexp or wildcard token given a pattern parser parser and token name token:
  1. Let token be the result of running try to consume a token given parser and "regexp".

  2. If name token is null and token is null, then set token to the result of running try to consume a token given parser and "asterisk".

  3. Return token.

To consume a required token given a pattern parser parser and type type:
  1. Let result be the result of running try to consume a token given parser and type.

  2. If result is null, then throw a TypeError.

  3. Return result.

To consume text given a pattern parser parser:
  1. Let result be the empty string.

  2. While true:

    1. Let token be the result of running try to consume a token given parser and "char".

    2. If token is null, then set token to the result of running try to consume a token given parser and "escaped-char".

    3. If token is null, then break.

    4. Append token’s value to the end of result.

  3. Return result.
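The token consumption helpers above translate almost directly into code. The following non-normative sketch assumes a parser object with `tokenList` and `index` properties and tokens with `type` and `value` properties. All of these names are hypothetical.

```javascript
// Non-normative sketch; the parser and token shapes are illustrative.
function tryConsumeToken(parser, type) {
  const nextToken = parser.tokenList[parser.index];
  if (nextToken.type !== type) return null;
  parser.index += 1;
  return nextToken;
}

function tryConsumeModifierToken(parser) {
  return tryConsumeToken(parser, "other-modifier") ||
         tryConsumeToken(parser, "asterisk");
}

function tryConsumeRegexpOrWildcardToken(parser, nameToken) {
  let token = tryConsumeToken(parser, "regexp");
  // A bare "*" is only a wildcard group when no name precedes it.
  if (nameToken === null && token === null) {
    token = tryConsumeToken(parser, "asterisk");
  }
  return token;
}

function consumeRequiredToken(parser, type) {
  const result = tryConsumeToken(parser, type);
  if (result === null) throw new TypeError(`Expected a "${type}" token.`);
  return result;
}

function consumeText(parser) {
  let result = "";
  while (true) {
    const token = tryConsumeToken(parser, "char") ||
                  tryConsumeToken(parser, "escaped-char");
    if (token === null) break;
    result += token.value;
  }
  return result;
}
```

Because the token list always ends with an "end" token, consume text stops before running off the end of the list.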

To maybe add a part from the pending fixed value given a pattern parser parser:
  1. If parser’s pending fixed value is the empty string, then return.

  2. Let encoded value be the result of running parser’s encoding callback given parser’s pending fixed value.

  3. Set parser’s pending fixed value to the empty string.

  4. Let part be a new part whose type is "fixed-text", value is encoded value, and modifier is "none".

  5. Append part to parser’s part list.

To add a part given a pattern parser parser, a string prefix, a token name token, a token regexp or wildcard token, a string suffix, and a token modifier token:
  1. Let modifier be "none".

  2. If modifier token is not null:

    1. If modifier token’s value is "?" then set modifier to "optional".

    2. Else if modifier token’s value is "*" then set modifier to "zero-or-more".

    3. Else if modifier token’s value is "+" then set modifier to "one-or-more".

  3. If name token is null and regexp or wildcard token is null and modifier is "none":

    This was a "{foo}" grouping. We add this to the pending fixed value so that it will be combined with any previous or subsequent text.

    1. Append prefix to the end of parser’s pending fixed value.

    2. Return.

  4. Run maybe add a part from the pending fixed value given parser.

  5. If name token is null and regexp or wildcard token is null:

    This was a "{foo}?" grouping. The modifier means we cannot combine it with other text. Therefore we add it as a part immediately.

    1. Assert: suffix is the empty string.

    2. If prefix is the empty string, then return.

    3. Let encoded value be the result of running parser’s encoding callback given prefix.

    4. Let part be a new part whose type is "fixed-text", value is encoded value, and modifier is modifier.

    5. Append part to parser’s part list.

    6. Return.

  6. Let regexp value be the empty string.

    Next, we convert the regexp or wildcard token into a regular expression.

  7. If regexp or wildcard token is null, then set regexp value to parser’s segment wildcard regexp.

  8. Else if regexp or wildcard token’s type is "asterisk", then set regexp value to the full wildcard regexp value.

  9. Else set regexp value to regexp or wildcard token’s value.

  10. Let type be "regexp".

    Next, we convert regexp value into a part type. We make sure to go to a regular expression first so that an equivalent "regexp" token will be treated the same as a "name" or "asterisk" token.

  11. If regexp value is parser’s segment wildcard regexp:

    1. Set type to "segment-wildcard".

    2. Set regexp value to the empty string.

  12. Else if regexp value is the full wildcard regexp value:

    1. Set type to "full-wildcard".

    2. Set regexp value to the empty string.

  13. Let name be the empty string.

    Next, we determine the part name. This can be explicitly provided by a "name" token or be automatically assigned.

  14. If name token is not null, then set name to name token’s value.

  15. Else if regexp or wildcard token is not null:

    1. Set name to parser’s next numeric name.

    2. Increment parser’s next numeric name by 1.

  16. Let encoded prefix be the result of running parser’s encoding callback given prefix.

    Finally, we encode the fixed text values and create the part.

  17. Let encoded suffix be the result of running parser’s encoding callback given suffix.

  18. Let part be a new part whose type is type, value is regexp value, modifier is modifier, name is name, prefix is encoded prefix, and suffix is encoded suffix.

  19. Append part to parser’s part list.
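The part type selection in add a part can be easier to follow in code. The sketch below is non-normative and covers only the steps that turn a regexp or wildcard token into a part type and value. The function name `classifyGroup` and its return shape are hypothetical.

```javascript
// Non-normative sketch of the type selection in "add a part".
const FULL_WILDCARD_REGEXP_VALUE = ".*";

function classifyGroup(regexpOrWildcardToken, segmentWildcardRegexp) {
  // First normalize every kind of group to a regular expression string.
  let regexpValue;
  if (regexpOrWildcardToken === null) {
    regexpValue = segmentWildcardRegexp;
  } else if (regexpOrWildcardToken.type === "asterisk") {
    regexpValue = FULL_WILDCARD_REGEXP_VALUE;
  } else {
    regexpValue = regexpOrWildcardToken.value;
  }
  // Then map well-known regular expressions back to wildcard part types.
  if (regexpValue === segmentWildcardRegexp) {
    return { type: "segment-wildcard", value: "" };
  }
  if (regexpValue === FULL_WILDCARD_REGEXP_VALUE) {
    return { type: "full-wildcard", value: "" };
  }
  return { type: "regexp", value: regexpValue };
}
```

Going through the regular expression first means a custom group written as "(.*)" is classified the same way as a bare "*".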

2.2. Converting Part Lists to Regular Expressions

To generate a regular expression and name list from a given part list part list and options options:
  1. Let result be "^".

  2. Let name list be a new list.

  3. For each part of part list:

    1. If part’s type is "fixed-text":

      1. If part’s modifier is "none", then append the result of running escape a regexp string given part’s value to the end of result.

      2. Else:

        A "fixed-text" part with a modifier uses a non capturing group. It uses the following form.

        (?:<fixed text>)<modifier>

        1. Append "(?:" to the end of result.

        2. Append the result of running escape a regexp string given part’s value to the end of result.

        3. Append ")" to the end of result.

        4. Append the result of running convert a modifier to a string given part’s modifier to the end of result.

      3. Continue.

    2. Assert: part’s name is not the empty string and is not null.

    3. Append part’s name to name list.

      We collect the list of matching group names in a parallel list. This is largely done for legacy reasons to match path-to-regexp. We could attempt to convert this to use regular expression named capture groups, but given the complexity of this algorithm there is a real risk of introducing unintended bugs. In addition, if we ever end up exposing the generated regular expressions to the web we would like to maintain compatibility with path-to-regexp, which has indicated it is unlikely to switch to using named capture groups.

    4. Let regexp value be part’s value.

    5. If part’s type is "segment-wildcard", then set regexp value to the result of running generate a segment wildcard regexp given options.

    6. Else if part’s type is "full-wildcard", then set regexp value to the full wildcard regexp value.

    7. If part’s prefix is the empty string and part’s suffix is the empty string:

      If there is no prefix or suffix then we generate a simple capturing group. It uses the following form.

      (<regexp value>)<modifier>

      1. Append "(" to the end of result.

      2. Append regexp value to the end of result.

      3. Append ")" to the end of result.

      4. Append the result of running convert a modifier to a string given part’s modifier to the end of result.

      5. Continue.

    8. If part’s modifier is "none" or "optional":

      This section handles non-repeating parts with a prefix and/or suffix. There is an inner capturing group that contains the primary regexp value. The inner group is then combined with the prefix and/or suffix in an outer non-capturing group. Finally the modifier is applied. The resulting form is as follows.

      (?:<prefix>(<regexp value>)<suffix>)<modifier>

      1. Append "(?:" to the end of result.

      2. Append the result of running escape a regexp string given part’s prefix to the end of result.

      3. Append "(" to the end of result.

      4. Append regexp value to the end of result.

      5. Append ")" to the end of result.

      6. Append the result of running escape a regexp string given part’s suffix to the end of result.

      7. Append ")" to the end of result.

      8. Append the result of running convert a modifier to a string given part’s modifier to the end of result.

      9. Continue.

    9. Assert: part’s modifier is "zero-or-more" or "one-or-more".

    10. Assert: part’s prefix is not the empty string or part’s suffix is not the empty string.

      Repeating parts with a prefix and/or suffix are dramatically more complicated. We want to exclude the initial prefix and the final suffix, but include them between any repeated elements. To achieve this we provide a separate initial expression that excludes the prefix. Then the expression is duplicated with the prefix/suffix values included in an optional repeating element. If zero values are permitted then a final optional modifier may be appended. The resulting form is as follows.

      (?:<prefix>((?:<regexp value>)(?:<suffix><prefix>(?:<regexp value>))*)<suffix>)?

    11. Append "(?:" to the end of result.

    12. Append the result of running escape a regexp string given part’s prefix to the end of result.

    13. Append "((?:" to the end of result.

    14. Append regexp value to the end of result.

    15. Append ")(?:" to the end of result.

    16. Append the result of running escape a regexp string given part’s suffix to the end of result.

    17. Append the result of running escape a regexp string given part’s prefix to the end of result.

    18. Append "(?:" to the end of result.

    19. Append regexp value to the end of result.

    20. Append "))*)" to the end of result.

    21. Append the result of running escape a regexp string given part’s suffix to the end of result.

    22. Append ")" to the end of result.

    23. If part’s modifier is "zero-or-more" then append "?" to the end of result.

  4. Append "$" to the end of result.

  5. Return (result, name list).

To escape a regexp string given a string input:
  1. Assert: input is an ASCII string.

  2. Let result be the empty string.

  3. Let index be 0.

  4. While index is less than input’s length:

    1. Let c be input[index].

    2. Increment index by 1.

    3. If c is one of ".", "+", "*", "?", "^", "$", "{", "}", "(", ")", "[", "]", "|", "/", or "\", then append "\" to the end of result.

    4. Append c to the end of result.

  5. Return result.
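The regular expression generation above can be sketched as a single function. This is a non-normative illustration. The part objects use property names taken from the part struct, while the function names and the `options.delimiterCodePoint` property are assumptions made for the example.

```javascript
// Non-normative sketch; names and object shapes are illustrative.
function escapeRegexpString(input) {
  return input.replace(/[.+*?^${}()[\]|\/\\]/g, "\\$&");
}

function modifierToString(modifier) {
  return { "zero-or-more": "*", optional: "?", "one-or-more": "+" }[modifier] || "";
}

function generateRegexpAndNameList(partList, options) {
  let result = "^";
  const nameList = [];
  const segment = "[^" + escapeRegexpString(options.delimiterCodePoint) + "]+?";
  for (const part of partList) {
    const mod = modifierToString(part.modifier);
    if (part.type === "fixed-text") {
      const text = escapeRegexpString(part.value);
      result += part.modifier === "none" ? text : `(?:${text})${mod}`;
      continue;
    }
    nameList.push(part.name);
    let value = part.value;
    if (part.type === "segment-wildcard") value = segment;
    else if (part.type === "full-wildcard") value = ".*";
    const prefix = escapeRegexpString(part.prefix);
    const suffix = escapeRegexpString(part.suffix);
    if (prefix === "" && suffix === "") {
      result += `(${value})${mod}`; // simple capturing group
      continue;
    }
    if (part.modifier === "none" || part.modifier === "optional") {
      result += `(?:${prefix}(${value})${suffix})${mod}`;
      continue;
    }
    // Repeating group: prefix and suffix appear between repetitions only.
    result += `(?:${prefix}((?:${value})(?:${suffix}${prefix}(?:${value}))*)${suffix})`;
    if (part.modifier === "zero-or-more") result += "?";
  }
  return [result + "$", nameList];
}

// Parts that "/foo/:bar?" would plausibly parse into with a "/" prefix code point.
const [source, names] = generateRegexpAndNameList([
  { type: "fixed-text", value: "/foo", modifier: "none", name: "", prefix: "", suffix: "" },
  { type: "segment-wildcard", value: "", modifier: "optional", name: "bar", prefix: "/", suffix: "" },
], { delimiterCodePoint: "/" });
```

Here the optional modifier wraps both the "/" prefix and the capturing group, so the resulting expression accepts "/foo" as well as "/foo/baz".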

2.3. Converting Part Lists to Pattern Strings

To generate a pattern string from a given part list part list and options options:
  1. Let result be the empty string.

  2. For each part of part list:

    1. If part’s type is "fixed-text" then:

      1. If part’s modifier is "none" then:

        1. Append the result of running escape a pattern string given part’s value to the end of result.

        2. Continue.

      2. Append "{" to the end of result.

      3. Append the result of running escape a pattern string given part’s value to the end of result.

      4. Append "}" to the end of result.

      5. Append the result of running convert a modifier to a string given part’s modifier to the end of result.

      6. Continue.

    2. Let needs grouping be true if part’s suffix is not the empty string, or if part’s prefix is not the empty string and is not options’s prefix code point; otherwise let it be false.

    3. Assert: part’s name is not the empty string or null.

    4. Let custom name be true if part’s name[0] is not an ASCII digit; otherwise let it be false.

    5. If needs grouping is true, then append "{" to the end of result.

    6. Append the result of running escape a pattern string given part’s prefix to the end of result.

    7. If custom name is true:

      1. Append ":" to the end of result.

      2. Append part’s name to the end of result.

    8. If part’s type is "regexp" then:

      1. Append "(" to the end of result.

      2. Append part’s value to the end of result.

      3. Append ")" to the end of result.

    9. Else if part’s type is "segment-wildcard" and custom name is false:

      1. Append "(" to the end of result.

      2. Append the result of running generate a segment wildcard regexp given options to the end of result.

      3. Append ")" to the end of result.

    10. Else if part’s type is "full-wildcard":

      1. If custom name is true:

        1. Append "(" to the end of result.

        2. Append full wildcard regexp value to the end of result.

        3. Append ")" to the end of result.

      2. Else append "*" to the end of result.

    11. Append the result of running escape a pattern string given part’s suffix to the end of result.

    12. If needs grouping is true, then append "}" to the end of result.

    13. Append the result of running convert a modifier to a string given part’s modifier to the end of result.

  3. Return result.

To escape a pattern string given a string input:
  1. Assert: input is an ASCII string.

  2. Let result be the empty string.

  3. Let index be 0.

  4. While index is less than input’s length:

    1. Let c be input[index].

    2. Increment index by 1.

    3. If c is one of "+", "*", "?", ":", "{", "}", "(", ")", or "\", then append "\" to the end of result.

    4. Append c to the end of result.

  5. Return result.

To convert a modifier to a string given a modifier modifier:
  1. If modifier is "zero-or-more", then return "*".

  2. If modifier is "optional", then return "?".

  3. If modifier is "one-or-more", then return "+".

  4. Return the empty string.
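The algorithms in this section can be sketched together in JavaScript. This is a non-normative illustration. It assumes that a part needs grouping when it has a non-empty suffix, or a non-empty prefix that differs from the options prefix code point. The function names and the `delimiterCodePoint` and `prefixCodePoint` property names are hypothetical.

```javascript
// Non-normative sketch; names and object shapes are illustrative.
function escapePatternString(input) {
  return input.replace(/[+*?:{}()\\]/g, "\\$&");
}

function escapeRegexpString(input) {
  return input.replace(/[.+*?^${}()[\]|\/\\]/g, "\\$&");
}

function modifierToString(modifier) {
  return { "zero-or-more": "*", optional: "?", "one-or-more": "+" }[modifier] || "";
}

function generatePatternString(partList, options) {
  let result = "";
  const segment = "[^" + escapeRegexpString(options.delimiterCodePoint) + "]+?";
  for (const part of partList) {
    const mod = modifierToString(part.modifier);
    if (part.type === "fixed-text") {
      const text = escapePatternString(part.value);
      result += part.modifier === "none" ? text : `{${text}}${mod}`;
      continue;
    }
    // Assumption: grouping is needed for a suffix or a non-default prefix.
    const needsGrouping = part.suffix !== "" ||
      (part.prefix !== "" && part.prefix !== options.prefixCodePoint);
    // Automatically assigned names are numeric and are not serialized.
    const customName = !/^[0-9]/.test(String(part.name));
    if (needsGrouping) result += "{";
    result += escapePatternString(part.prefix);
    if (customName) result += ":" + part.name;
    if (part.type === "regexp") result += `(${part.value})`;
    else if (part.type === "segment-wildcard" && !customName) result += `(${segment})`;
    else if (part.type === "full-wildcard") result += customName ? "(.*)" : "*";
    result += escapePatternString(part.suffix);
    if (needsGrouping) result += "}";
    result += mod;
  }
  return result;
}
```

Under these assumptions, a part list produced from "/foo/:bar?" serializes back to the same string, while a part with an "a" prefix and "b" suffix is wrapped in braces.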

3. Canonicalization

3.1. Encoding Callbacks

To canonicalize a protocol given a string value:
  1. Let dummyURL be a new URL record.

  2. Let parseResult be the result of running the basic URL parser given value followed by "://dummy.test", with dummyURL as url.

    Note, state override is not used here because it enforces restrictions that are only appropriate for the protocol setter. Instead we use the protocol to parse a dummy URL using the normal parsing entry point.

  3. If parseResult is failure, then throw a TypeError.

  4. Return dummyURL’s scheme.
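The protocol canonicalization above can be approximated with the `URL` class. This non-normative sketch assumes that constructing a `URL` from the value followed by "://dummy.test" is an acceptable stand-in for invoking the basic URL parser directly. The function name is hypothetical.

```javascript
// Non-normative sketch using the URL class as a stand-in for the basic URL parser.
function canonicalizeProtocol(value) {
  let dummyURL;
  try {
    dummyURL = new URL(value + "://dummy.test");
  } catch {
    throw new TypeError(`Invalid protocol "${value}".`);
  }
  // URL.protocol includes the trailing ":", the scheme does not.
  return dummyURL.protocol.slice(0, -1);
}
```

For example, "HTTPS" canonicalizes to "https", while a value such as "1http" fails to parse because a scheme must begin with an ASCII alpha.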

To canonicalize a username given a string value:
  1. Let dummyURL be a new URL record.

  2. Set the username given dummyURL and value.

  3. Return dummyURL’s username.

To canonicalize a password given a string value:
  1. Let dummyURL be a new URL record.

  2. Set the password given dummyURL and value.

  3. Return dummyURL’s password.
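The username and password canonicalization above can be illustrated with the `URL` setters, which percent-encode their input. This is a non-normative sketch. The function names and the "https://dummy.test" placeholder URL are assumptions.

```javascript
// Non-normative sketch; the placeholder URL is only a carrier for the setters.
function canonicalizeUsername(value) {
  const dummyURL = new URL("https://dummy.test");
  dummyURL.username = value;
  return dummyURL.username;
}

function canonicalizePassword(value) {
  const dummyURL = new URL("https://dummy.test");
  dummyURL.password = value;
  return dummyURL.password;
}
```

Code points outside the allowed set, such as a space or "@", come back percent-encoded.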

To canonicalize a hostname given a string value:
  1. Let dummyURL be a new URL record.

  2. Let parseResult be the result of running the basic URL parser given value with dummyURL as url and hostname state as state override.

  3. If parseResult is failure, then throw a TypeError.

  4. Return dummyURL’s host.

To canonicalize a port given a string portValue and string protocolValue:
  1. Let dummyURL be a new URL record.

  2. Set dummyURL’s scheme to protocolValue.

    Note, we set the URL record's scheme in order for the basic URL parser to properly recognize and normalize default port values.

  3. Let parseResult be the result of running the basic URL parser given portValue with dummyURL as url and port state as state override.

  4. If parseResult is failure, then throw a TypeError.

  5. Return dummyURL’s port.
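The effect of setting the scheme before parsing the port can be seen with the `URL` class. This non-normative sketch approximates the algorithm with the `port` setter. Unlike the algorithm above, the setter silently ignores values it cannot parse instead of throwing, and the function name is hypothetical.

```javascript
// Non-normative sketch; approximates the port state with the URL.port setter.
function canonicalizePort(portValue, protocolValue) {
  const dummyURL = new URL(protocolValue + "://dummy.test");
  dummyURL.port = portValue;
  // A default port for the scheme is normalized to the empty string.
  return dummyURL.port;
}
```

With an "https" protocol, "443" canonicalizes to the empty string because it is the default port, while "8080" is preserved.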

To canonicalize a standard pathname given a string value:
  1. Let dummyURL be a new URL record.

  2. Let parseResult be the result of running the basic URL parser given value with dummyURL as url and path start state as state override.

  3. If parseResult is failure, then throw a TypeError.

  4. Return dummyURL’s API pathname string.
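Standard pathname canonicalization can be approximated with the `pathname` setter on a special-scheme URL. This is a non-normative sketch. The function name and the "https://dummy.test" placeholder URL are assumptions, and the setter is only a stand-in for running the parser with a state override.

```javascript
// Non-normative sketch; approximates the path start state with URL.pathname.
function canonicalizeStandardPathname(value) {
  const dummyURL = new URL("https://dummy.test");
  dummyURL.pathname = value;
  return dummyURL.pathname;
}
```

Dot segments are resolved and code points such as a space are percent-encoded, so "/foo/./bar/../baz" becomes "/foo/baz".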

To canonicalize a cannot-be-a-base-URL pathname given a string value:
  1. Let dummyURL be a new URL record.

  2. Set dummyURL’s path[0] to the empty string.

  3. Let parseResult be the result of running the basic URL parser given value with dummyURL as url and cannot be a base URL path state as state override.

  4. If parseResult is failure, then throw a TypeError.

  5. Return dummyURL’s API pathname string.

To canonicalize a search given a string value:
  1. Let dummyURL be a new URL record.

  2. Set dummyURL’s query to the empty string.

  3. Let parseResult be the result of running the basic URL parser given value with dummyURL as url and query state as state override.

  4. If parseResult is failure, then throw a TypeError.

  5. Return dummyURL’s query.

To canonicalize a hash given a string value:
  1. Let dummyURL be a new URL record.

  2. Set dummyURL’s fragment to the empty string.

  3. Let parseResult be the result of running the basic URL parser given value with dummyURL as url and fragment state as state override.

  4. If parseResult is failure, then throw a TypeError.

  5. Return dummyURL’s fragment.
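
The search and hash canonicalization above can be sketched, non-normatively, with the `search` and `hash` setters of the `URL` class. The placeholder URL and function names are assumptions. Because the placeholder uses a special scheme, query encoding may differ slightly from an empty URL record for a few characters such as U+0027 (').

```typescript
// Non-normative sketch. The getters prepend "?" or "#", which is removed
// here because the algorithms return the bare query and fragment.
function canonicalizeSearch(value: string): string {
  const dummyURL = new URL("https://dummy.test"); // hypothetical placeholder
  dummyURL.search = value;
  return dummyURL.search.replace(/^\?/, "");
}

function canonicalizeHash(value: string): string {
  const dummyURL = new URL("https://dummy.test"); // hypothetical placeholder
  dummyURL.hash = value;
  return dummyURL.hash.replace(/^#/, "");
}
```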

3.2. URLPatternInit Processing

To process a URLPatternInit given a URLPatternInit init, a string type, a string or null protocol, a string or null username, a string or null password, a string or null hostname, a string or null port, a string or null pathname, a string or null search, and a string or null hash:
  1. Let result be the result of creating a new URLPatternInit.

  2. Set result["protocol"] to protocol.

  3. Set result["username"] to username.

  4. Set result["password"] to password.

  5. Set result["hostname"] to hostname.

  6. Set result["port"] to port.

  7. Set result["pathname"] to pathname.

  8. Set result["search"] to search.

  9. Set result["hash"] to hash.

  10. If init["baseURL"] is not null:

    1. Let baseURL be the result of parsing init["baseURL"].

    2. If baseURL is failure, then throw a TypeError.

    3. Set result["protocol"] to baseURL’s scheme.

    4. Set result["username"] to baseURL’s username.

    5. Set result["password"] to baseURL’s password.

    6. Set result["hostname"] to baseURL’s host or the empty string if the value is null.

    7. Set result["port"] to baseURL’s port or the empty string if the value is null.

    8. Set result["pathname"] to baseURL’s API pathname string.

    9. Set result["search"] to baseURL’s query or the empty string if the value is null.

    10. Set result["hash"] to baseURL’s fragment or the empty string if the value is null.

  11. If init["protocol"] is not null then set result["protocol"] to the result of process protocol for init given init["protocol"] and type.

  12. If init["username"] is not null then set result["username"] to the result of process username for init given init["username"] and type.

  13. If init["password"] is not null then set result["password"] to the result of process password for init given init["password"] and type.

  14. If init["hostname"] is not null then set result["hostname"] to the result of process hostname for init given init["hostname"] and type.

  15. If init["port"] is not null then set result["port"] to the result of process port for init given init["port"], result["protocol"], and type.

  16. If init["pathname"] is not null then set result["pathname"] to the result of process pathname for init given init["pathname"], result["protocol"], and type.

  17. If init["search"] is not null then set result["search"] to the result of process search for init given init["search"] and type.

  18. If init["hash"] is not null then set result["hash"] to the result of process hash for init given init["hash"] and type.

  19. Return result.
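
The ordering of these steps matters: the base URL fills in every component first, and members present in init then override them one by one. The following non-normative sketch shows this for type "pattern" only, where the per-component processing reduces to stripping delimiters. `InitSketch`, `ResultSketch`, and `processInitForPattern` are hypothetical names, and the `URL` class stands in for URL parsing.

```typescript
type ComponentName =
  | "protocol" | "username" | "password" | "hostname"
  | "port" | "pathname" | "search" | "hash";

// Hypothetical stand-ins for URLPatternInit and the result dictionary.
type InitSketch = Partial<Record<ComponentName, string>> & { baseURL?: string };
type ResultSketch = Record<ComponentName, string | null>;

function processInitForPattern(init: InitSketch, defaults: ResultSketch): ResultSketch {
  // Steps 1-9: start from the caller-supplied component values.
  const result: ResultSketch = { ...defaults };

  // Step 10: a base URL, when present, supplies every component.
  if (init.baseURL !== undefined) {
    const base = new URL(init.baseURL); // throws a TypeError if parsing fails
    result.protocol = base.protocol.replace(/:$/, "");
    result.username = base.username;
    result.password = base.password;
    result.hostname = base.hostname;
    result.port = base.port;
    result.pathname = base.pathname;
    result.search = base.search.replace(/^\?/, "");
    result.hash = base.hash.replace(/^#/, "");
  }

  // Steps 11-18: members present in init override the inherited values.
  if (init.protocol !== undefined) result.protocol = init.protocol.replace(/:$/, "");
  if (init.username !== undefined) result.username = init.username;
  if (init.password !== undefined) result.password = init.password;
  if (init.hostname !== undefined) result.hostname = init.hostname;
  if (init.port !== undefined) result.port = init.port;
  if (init.pathname !== undefined) result.pathname = init.pathname;
  if (init.search !== undefined) result.search = init.search.replace(/^\?/, "");
  if (init.hash !== undefined) result.hash = init.hash.replace(/^#/, "");

  return result; // step 19
}

const unset: ResultSketch = {
  protocol: null, username: null, password: null, hostname: null,
  port: null, pathname: null, search: null, hash: null,
};
const example = processInitForPattern(
  { pathname: "/books/:id", baseURL: "https://example.com/a/b?x=1#frag" },
  unset,
);
```

Here `example.pathname` is `"/books/:id"`, taken from init, while `example.protocol`, `example.hostname`, `example.search`, and `example.hash` are inherited from the base URL as `"https"`, `"example.com"`, `"x=1"`, and `"frag"`.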

To process protocol for init given a string value and a string type:
  1. Let strippedValue be the given value with a single trailing U+003A (:) removed, if any.

  2. If type is "pattern" then return strippedValue.

  3. Return the result of running canonicalize a protocol given strippedValue.

To process username for init given a string value and a string type:
  1. If type is "pattern" then return value.

  2. Return the result of running canonicalize a username given value.

To process password for init given a string value and a string type:
  1. If type is "pattern" then return value.

  2. Return the result of running canonicalize a password given value.

To process hostname for init given a string value and a string type:
  1. If type is "pattern" then return value.

  2. Return the result of running canonicalize a hostname given value.

To process port for init given a string portValue, a string protocolValue, and a string type:
  1. If type is "pattern" then return portValue.

  2. Return the result of running canonicalize a port given portValue and protocolValue.

To process pathname for init given a string pathnameValue, a string protocolValue, and a string type:
  1. If type is "pattern" then return pathnameValue.

  2. If protocolValue is a special scheme, then return the result of running canonicalize a standard pathname given pathnameValue.

  3. Otherwise, return the result of running canonicalize a cannot-be-a-base-URL pathname given pathnameValue.
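
The choice between the two pathname canonicalizers depends only on type and on whether the protocol is a special scheme. The non-normative sketch below makes that dispatch explicit. The scheme list is the URL Standard's set of special schemes; the function name, the return labels, and the non-pattern type value `"url"` are assumptions for illustration.

```typescript
// The special schemes defined by the URL Standard.
const SPECIAL_SCHEMES = ["ftp", "file", "http", "https", "ws", "wss"];

// Hypothetical labels naming which handling the algorithm selects.
type PathnameHandling = "as-is" | "standard" | "cannot-be-a-base-URL";

function pathnameHandlingFor(protocolValue: string, type: string): PathnameHandling {
  // Step 1: pattern strings are returned without canonicalization.
  if (type === "pattern") return "as-is";
  // Steps 2-3: special schemes use the standard pathname canonicalizer.
  return SPECIAL_SCHEMES.indexOf(protocolValue) !== -1
    ? "standard"
    : "cannot-be-a-base-URL";
}
```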

To process search for init given a string value and a string type:
  1. Let strippedValue be the given value with a single leading U+003F (?) removed, if any.

  2. If type is "pattern" then return strippedValue.

  3. Return the result of running canonicalize a search given strippedValue.

To process hash for init given a string value and a string type:
  1. Let strippedValue be the given value with a single leading U+0023 (#) removed, if any.

  2. If type is "pattern" then return strippedValue.

  3. Return the result of running canonicalize a hash given strippedValue.
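
The protocol, search, and hash algorithms each remove at most one delimiter before any further processing. A minimal, non-normative sketch of that stripping for type "pattern" follows; all function names are hypothetical.

```typescript
// Remove a single trailing occurrence of ch, if any.
function stripTrailing(value: string, ch: string): string {
  return value.charAt(value.length - 1) === ch ? value.slice(0, -1) : value;
}

// Remove a single leading occurrence of ch, if any.
function stripLeading(value: string, ch: string): string {
  return value.charAt(0) === ch ? value.slice(1) : value;
}

// For type "pattern", processing stops after the stripping step.
function processProtocolForPattern(value: string): string {
  return stripTrailing(value, ":");
}

function processSearchForPattern(value: string): string {
  return stripLeading(value, "?");
}

function processHashForPattern(value: string): string {
  return stripLeading(value, "#");
}
```

Only one delimiter is removed, so `processSearchForPattern("??a")` yields `"?a"`.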

4. Patching

This spec depends on factoring out the URL Standard’s pathname getter steps into a new exported algorithm, API pathname string, that operates on a URL record.

References

Normative References

[ECMASCRIPT]
ECMAScript Language Specification. URL: https://tc39.es/ecma262/multipage/
[INFRA]
Anne van Kesteren; Domenic Denicola. Infra Standard. Living Standard. URL: https://infra.spec.whatwg.org/
[URL]
Anne van Kesteren. URL Standard. Living Standard. URL: https://url.spec.whatwg.org/
[WebIDL]
Boris Zbarsky. Web IDL. URL: https://heycam.github.io/webidl/

IDL Index

typedef (USVString or URLPatternInit) URLPatternInput;

[Exposed=(Window,Worker)]
interface URLPattern {
  constructor(URLPatternInput input, optional USVString baseURL);

  boolean test(URLPatternInput input, optional USVString baseURL);

  URLPatternResult? exec(URLPatternInput input, optional USVString baseURL);

  readonly attribute USVString protocol;
  readonly attribute USVString username;
  readonly attribute USVString password;
  readonly attribute USVString hostname;
  readonly attribute USVString port;
  readonly attribute USVString pathname;
  readonly attribute USVString search;
  readonly attribute USVString hash;
};

dictionary URLPatternInit {
  USVString protocol;
  USVString username;
  USVString password;
  USVString hostname;
  USVString port;
  USVString pathname;
  USVString search;
  USVString hash;
  USVString baseURL;
};

dictionary URLPatternResult {
  sequence<URLPatternInput> inputs;

  URLPatternComponentResult protocol;
  URLPatternComponentResult username;
  URLPatternComponentResult password;
  URLPatternComponentResult hostname;
  URLPatternComponentResult port;
  URLPatternComponentResult pathname;
  URLPatternComponentResult search;
  URLPatternComponentResult hash;
};

dictionary URLPatternComponentResult {
  USVString input;
  record<USVString, USVString> groups;
};