WebDriver BiDi

1. Introduction

This section is non-normative.

WebDriver defines a protocol for introspection and remote control of user agents. This specification extends WebDriver by introducing bidirectional communication. In place of the strict command/response format of WebDriver, this permits events to stream from the user agent to the controlling software, better matching the evented nature of the browser DOM.

2. Infrastructure

This specification depends on the Infra Standard. [INFRA]

Network protocol messages are defined using CDDL. [RFC8610]

Tasks are scheduled using the WebDriver task queue, which is the result of starting a new parallel queue.

To queue a WebDriver task consisting of a list of steps steps, enqueue steps to the WebDriver task queue.

This specification defines a wait queue which is a map.

Surely there’s a better mechanism for doing this "wait for an event" thing.

When an algorithm algorithm awaits a set of events events, and resume id:

Pause the execution of algorithm.
Assert: wait queue does not contain resume id.
Set wait queue [ resume id ] to ( events, algorithm ).

To resume given name, id and parameters:

If wait queue does not contain id, return.
Let ( events, algorithm ) be wait queue [ id ]
For each event in events:
1. If event equals name:
  1. Remove id from wait queue .
  2. Resume running the steps in algorithm from the point at which they were paused, passing name and parameters as the result of the await .
    
    Should we have something like microtasks to ensure this runs before any other tasks on the event loop?
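
The await and resume steps above can be illustrated with the following non-normative sketch. It assumes that a paused algorithm is modelled as a callback taking the event name and parameters; the names wait_queue, await_events and Continuation are hypothetical and do not appear in this specification.

```python
from typing import Any, Callable

# Assumption: a paused algorithm is represented by a callback that receives
# (name, parameters) when it is resumed.
Continuation = Callable[[str, Any], None]

# The wait queue: a map from resume id to (events, algorithm).
wait_queue: dict[str, tuple[set[str], Continuation]] = {}


def await_events(events: set[str], resume_id: str, continuation: Continuation) -> None:
    """Record a paused algorithm under resume_id."""
    assert resume_id not in wait_queue
    wait_queue[resume_id] = (events, continuation)


def resume(name: str, resume_id: str, parameters: Any) -> None:
    """Resume the algorithm stored under resume_id if it awaits event `name`."""
    if resume_id not in wait_queue:
        return
    events, continuation = wait_queue[resume_id]
    if name in events:
        del wait_queue[resume_id]
        continuation(name, parameters)


received = []
await_events({"load", "error"}, "nav-1", lambda n, p: received.append((n, p)))
resume("unrelated", "nav-1", None)  # ignored: not one of the awaited events
resume("load", "nav-1", {"url": "https://example.com/"})
resume("load", "nav-1", {})  # ignored: the entry was already removed
```

A real implementation would suspend and continue the algorithm itself rather than invoke a callback; the map bookkeeping is the part this sketch is meant to show.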

3. Protocol

This section defines the basic concepts of the WebDriver BiDi protocol. These terms are distinct from their representation at the transport layer.

The protocol is defined using a CDDL definition. For the convenience of implementors two separate CDDL definitions are defined: the remote end definition, which defines the format of messages produced on the local end and consumed on the remote end, and the local end definition, which defines the format of messages produced on the remote end and consumed on the local end.

3.1. Definition

Should this be an appendix?

This section gives the initial contents of the remote end definition and local end definition . These are augmented by the definition fragments defined in the remainder of the specification.

Remote end definition

Command = {
  id: uint,
  CommandData,
  *text => any,
}
CommandData = (
  SessionCommand //
  BrowsingContextCommand
)
EmptyParams = { *text }

Local end definition

Message = (
  CommandResponse //
  ErrorResponse //
  Event
)
CommandResponse = {
  id: uint,
  result: ResultData,
  *text => any
}
ErrorResponse = {
  id: uint / null,
  error: "unknown error" / "unknown command" / "invalid argument",
  message: text,
  ?stacktrace: text,
  *text => any
}
ResultData = (
  EmptyResult //
  SessionResult //
  BrowsingContextResult //
  ScriptResult
)
EmptyResult = {}
Event = {
  EventData,
  *text => any
}
EventData = (
  BrowsingContextEvent //
  ScriptEvent
)
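
As a non-normative illustration of the local end definition, the following sketch shows how a local end might tell the three kinds of Message apart by the fields they carry. The function name classify_message and the event name script.exampleEvent are placeholders, and the check is much looser than real CDDL validation.

```python
import json


def classify_message(raw: str) -> str:
    """Name the local end definition production a message most plausibly matches."""
    message = json.loads(raw)
    if not isinstance(message, dict):
        return "unknown"
    if "error" in message and "message" in message:
        return "ErrorResponse"
    if "id" in message and "result" in message:
        return "CommandResponse"
    if "method" in message and "params" in message:
        return "Event"
    return "unknown"


# Hypothetical payloads; the event name is a placeholder.
response = json.dumps({"id": 1, "result": {}})
error = json.dumps({"id": None, "error": "invalid argument", "message": "not JSON"})
event = json.dumps({"method": "script.exampleEvent", "params": {}})
```

Note that an ErrorResponse can carry a null id, so the presence of id alone does not distinguish a response from an error.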

3.2. Session

WebDriver BiDi extends the session concept from WebDriver .

A session has a BiDi flag , which is false unless otherwise stated.

A BiDi session is a session which has the BiDi flag set to true.

The set of active BiDi sessions is given by:

Let BiDi sessions be a new set.
For each session in active sessions :
1. If session is a BiDi session append session to BiDi sessions.
Return BiDi sessions

3.3. Modules

The WebDriver BiDi protocol is organized into modules.

Each module represents a collection of related commands and events pertaining to a certain aspect of the user agent. For example, a module might contain functionality for inspecting and manipulating the DOM, or for script execution.

Each module has a module name which is a string. The command name and event name for commands and events defined in the module start with the module name followed by a period " . ".

Modules which contain commands define remote end definition fragments. These provide choices in the CommandData group for the module’s commands , and can also define additional definition properties. They can also define local end definition fragments that provide additional choices in the ResultData group for the results of commands in the module.

Modules which contain events define local end definition fragments that are choices in the Event group for the module’s events .

An implementation may define extension modules . These must have a module name that contains a single colon " : " character. The part before the colon is the prefix; this is typically the same for all extension modules specific to a given implementation and should be unique for a given implementation. Such modules extend the local end definition and remote end definition providing additional groups as choices for the defined commands and events .
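
The naming rules above can be sketched as follows. This is non-normative; it assumes that the module name is everything before the first period, and the extension module name vendor:cache is invented for illustration.

```python
def split_name(name: str) -> tuple[str, str]:
    """Split a command or event name into (module name, remainder)."""
    module, separator, rest = name.partition(".")
    if not separator or not module or not rest:
        raise ValueError(f"not of the form [module name].[name]: {name!r}")
    return module, rest


def is_extension_module(module_name: str) -> bool:
    """Extension module names contain exactly one colon."""
    return module_name.count(":") == 1


builtin = split_name("session.new")
extension = split_name("vendor:cache.clear")  # hypothetical extension command
```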

3.4. Commands

A command is an asynchronous operation, requested by the local end and run on the remote end , resulting in either a result or an error being returned to the local end . Multiple commands can run at the same time, and commands can potentially be long-running. As a consequence, commands can finish out-of-order.

Each command is defined by:

A command type which is defined by a remote end definition fragment containing a group. Each such group has two fields:
- method which is a string literal of the form [module name].[method name]. This is the command name .
- params which defines a mapping containing data to be passed into the command. The populated value of this map is the command parameters.
A result type , which is defined by a local end definition fragment.
A set of remote end steps which define the actions to take for a command given a BiDi session and command parameters, and which return an instance of the command's result type.

A command that can run without an active session is a static command . Commands are not static commands unless stated in their definition.

When commands are sent from the local end they have a command id. This is an identifier used by the local end to identify the response from a particular command. From the point of view of the remote end this identifier is opaque and cannot be used internally to identify the command.

Note: This is because the command id is entirely controlled by the local end and isn’t necessarily unique over the course of a session. For example a local end which ignores all responses could use the same command id for each command.

The set of all command names is a set containing all the defined command names , including any belonging to extension modules .
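
Because commands can finish out of order, a local end typically matches each response to its command through the command id. The following non-normative sketch shows one way to do that; the class PendingCommands and the command name browsingContext.example are hypothetical, and nothing in this specification requires a local end to behave this way.

```python
import itertools
from typing import Any, Optional


class PendingCommands:
    """Tracks commands that were sent but have not yet been answered."""

    def __init__(self) -> None:
        self._next_id = itertools.count(1)
        self._pending: dict[int, str] = {}

    def send(self, method: str, params: dict[str, Any]) -> dict[str, Any]:
        """Build a command message with a fresh command id."""
        command_id = next(self._next_id)
        self._pending[command_id] = method
        return {"id": command_id, "method": method, "params": params}

    def on_response(self, message: dict[str, Any]) -> Optional[str]:
        """Return the method the response belongs to, or None if unknown."""
        return self._pending.pop(message.get("id"), None)


commands = PendingCommands()
first = commands.send("session.new", {})
second = commands.send("browsingContext.example", {})  # hypothetical command

# Responses arrive in the opposite order to the commands.
late = commands.on_response({"id": second["id"], "result": {}})
early = commands.on_response({"id": first["id"], "result": {}})
```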

3.5. Events

An event is a notification, sent by the remote end to the local end , signaling that something of interest has occurred on the remote end .

Each event is defined by:

An event type which is defined by a local end definition fragment containing a group. Each such group has two fields:
- method which is a string literal of the form [module name].[event name]. This is the event name .
- params which defines a mapping containing event data. The populated value of this map is the event parameters.
A remote end event trigger which defines when the event is triggered and steps to construct the event type data.
Optionally, a set of remote end subscribe steps , which define steps to take when a local end subscribes to an event. Where defined these steps have an associated subscribe priority which is an integer controlling the order in which the steps are run when multiple events are enabled at once, with lower integers indicating steps that run earlier.

A BiDi session has a global event set which is a set containing the event names for events that are enabled for all browsing contexts. This initially contains the event name for events that are in the default event set .

A BiDi session has a browsing context event map, which is a map with top-level browsing context keys and values that are a set of event names for events that are enabled in the given browsing context.

To obtain a list of event enabled browsing contexts given session and event name:

Let contexts be an empty set.
For each context → events of session ’s browsing context event map :
1. If events contains event name, append context to contexts
Return contexts.

The set of sessions for which an event is enabled given event name and browsing contexts is:

Let sessions be a new set.
For each session in active BiDi sessions:
1. If event is enabled with session, event name and browsing contexts, append session to sessions.
Return sessions

To determine if an event is enabled given session, event name and browsing contexts:

Note: browsing contexts is a set because a shared worker can be associated with multiple contexts.

Let top-level browsing contexts be an empty set.
For each browsing context of browsing contexts, append browsing context ’s top-level browsing context to top-level browsing contexts.
Let event map be the browsing context event map for session.
For each browsing context of top-level browsing contexts:
1. If event map contains browsing context, let browsing context events be event map [ browsing context ]. Otherwise let browsing context events be null.
2. If browsing context events is not null, and browsing context events contains event name, return true.
If the global event set for session contains event name return true.
Return false.
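
The steps to determine if an event is enabled can be sketched as below. This is non-normative: BrowsingContext and Session are simplified stand-ins for the concepts of the same name, and the event name log.exampleEvent is a placeholder.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass(eq=False)  # eq=False keeps identity-based hashing, so contexts can be map keys
class BrowsingContext:
    name: str
    parent: Optional["BrowsingContext"] = None

    def top_level(self) -> "BrowsingContext":
        context = self
        while context.parent is not None:
            context = context.parent
        return context


@dataclass
class Session:
    global_event_set: set = field(default_factory=set)
    browsing_context_event_map: dict = field(default_factory=dict)


def event_is_enabled(session: Session, event_name: str,
                     browsing_contexts: Iterable[BrowsingContext]) -> bool:
    top_level_contexts = {context.top_level() for context in browsing_contexts}
    for context in top_level_contexts:
        events = session.browsing_context_event_map.get(context)
        if events is not None and event_name in events:
            return True
    return event_name in session.global_event_set


top = BrowsingContext("top")
frame = BrowsingContext("frame", parent=top)
other = BrowsingContext("other")
session = Session()
session.browsing_context_event_map[top] = {"log.exampleEvent"}
```

An event enabled for a top-level browsing context therefore also applies to its descendants, since each browsing context is first mapped to its top-level browsing context.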

To obtain a set of event names given a name:

Let events be an empty set.
If name contains a U+002E (period):
1. If name is the event name for an event, append name to events and return success with data events.
2. Return an error with error code Invalid Argument
Otherwise name is interpreted as representing all the events in a module. If name is not a module name return an error with error code Invalid Argument .
Append the event name for each event in the module with name name to events.
Return success with data events.
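
A non-normative sketch of the steps to obtain a set of event names follows. The registry EVENTS_BY_MODULE and every event name in it are invented for illustration, and an exception stands in for returning an error with error code invalid argument.

```python
# Hypothetical registry mapping module names to their event names.
EVENTS_BY_MODULE = {
    "browsingContext": {"browsingContext.exampleCreated", "browsingContext.exampleClosed"},
    "script": {"script.exampleEvent"},
}


class InvalidArgument(Exception):
    """Stands in for an error with error code invalid argument."""


def obtain_event_names(name: str) -> set[str]:
    if "." in name:
        # A name containing a period must be exactly one event name.
        if any(name in events for events in EVENTS_BY_MODULE.values()):
            return {name}
        raise InvalidArgument(name)
    # Otherwise the name stands for every event in a module.
    if name not in EVENTS_BY_MODULE:
        raise InvalidArgument(name)
    return set(EVENTS_BY_MODULE[name])
```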

4. Transport

Message transport is provided using the WebSocket protocol. [RFC6455]

Note: In the terms of the WebSocket protocol, the local end is the client and the remote end is the server / remote host.

Note: The encoding of commands and events as messages is similar to JSON-RPC, but this specification does not normatively reference it. [JSON-RPC] The normative requirements on remote ends are instead given as a precise processing model, while no normative requirements are given for local ends .

A WebSocket listener is a network endpoint that is able to accept incoming WebSocket connections.

A WebSocket listener has a host , a port , a secure flag , and a list of WebSocket resources .

When a WebSocket listener listener is created, a remote end must start to listen for WebSocket connections on the host and port given by listener ’s host and port . If listener ’s secure flag is set, then connections established from listener must be TLS encrypted.

A remote end has a set of WebSocket listeners active listeners , which is initially empty.

A remote end has a set of WebSocket connections not associated with a session , which is initially empty.

A WebSocket connection is a network connection that follows the requirements of the WebSocket protocol

A BiDi session has a set of session WebSocket connections whose elements are WebSocket connections . This is initially empty.

A BiDi session session is associated with connection connection if session ’s session WebSocket connections contains connection.

Note: Each WebSocket connection is associated with at most one BiDi session .

When a client establishes a WebSocket connection connection by connecting to one of the set of active listeners listener, the implementation must proceed according to the WebSocket server-side requirements , with the following steps run when deciding whether to accept the incoming connection:

Let resource name be the resource name from reading the client’s opening handshake . If resource name is not in listener ’s list of WebSocket resources , then stop running these steps and act as if the requested service is not available.
If resource name is the byte string " /session ", and the implementation supports BiDi-only sessions :
1. Run any other implementation-defined steps to decide if the connection should be accepted, and if it is not stop running these steps and act as if the requested service is not available.
2. Add the connection to the set of WebSocket connections not associated with a session .
3. Return.
Get a session ID for a WebSocket resource with resource name and let session id be that value. If session id is null then stop running these steps and act as if the requested service is not available.
If there is a session in the list of active sessions with session id as its session ID then let session be that session. Otherwise stop running these steps and act as if the requested service is not available.
Run any other implementation-defined steps to decide if the connection should be accepted, and if it is not stop running these steps and act as if the requested service is not available.
Otherwise append connection to session ’s session WebSocket connections , and proceed with the WebSocket server-side requirements when a server chooses to accept an incoming connection.


When a WebSocket message has been received for a WebSocket connection connection with type type and data data, a remote end must handle an incoming message given connection, type and data.

When the WebSocket closing handshake is started or when the WebSocket connection is closed for a WebSocket connection connection, a remote end must handle a connection closing given connection.

Note: Both conditions are needed because it is possible for a WebSocket connection to be closed without a closing handshake.

To construct a WebSocket resource name given a session session:

If session is null, return " /session "
Return the result of concatenating the string " /session/ " with session ’s session ID .

To construct a WebSocket URL given a WebSocket listener listener and session session:

Let resource name be the result of constructing a WebSocket resource name given session.
Return a WebSocket URI constructed with host set to listener's host, port set to listener's port, path set to resource name, following the wss-URI construct if listener's secure flag is set and the ws-URI construct otherwise.

To get a session ID for a WebSocket resource given resource name:

If resource name doesn’t begin with the byte string " /session/ ", return null.
Let session id be the bytes in resource name following the " /session/ " prefix.
If session id is not the string representation of a UUID , return null.
Return session id.
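
The three algorithms above can be sketched as follows. This is non-normative and makes two assumptions: a session is represented only by its session ID string, and "the string representation of a UUID" means the canonical hyphenated form. The host, port and session ID used in the example are arbitrary.

```python
import uuid
from typing import Optional

PREFIX = "/session/"


def construct_resource_name(session_id: Optional[str]) -> str:
    return "/session" if session_id is None else PREFIX + session_id


def construct_websocket_url(host: str, port: int, secure: bool,
                            session_id: Optional[str]) -> str:
    scheme = "wss" if secure else "ws"
    return f"{scheme}://{host}:{port}{construct_resource_name(session_id)}"


def get_session_id(resource_name: str) -> Optional[str]:
    if not resource_name.startswith(PREFIX):
        return None
    candidate = resource_name[len(PREFIX):]
    try:
        parsed = uuid.UUID(candidate)
    except ValueError:
        return None
    # uuid.UUID also accepts braces and missing hyphens; this sketch only
    # accepts the canonical hyphenated form (an assumption).
    if str(parsed) != candidate.lower():
        return None
    return candidate


example_id = "6ba7b810-9dad-11d1-80b4-00c04fd430c8"
example_url = construct_websocket_url("localhost", 9222, False, example_id)
```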

To start listening for a WebSocket connection given a session session:

If there is an existing WebSocket listener in the set of active listeners which the remote end would like to reuse, let listener be that listener. Otherwise let listener be a new WebSocket listener with implementation-defined host , port , secure flag , and an empty list of WebSocket resources .
Let resource name be the result of constructing a WebSocket resource name given session.
Append resource name to the list of WebSocket resources for listener.
Append listener to the remote end 's active listeners .
Return listener.

Note: An intermediary node handling multiple sessions can use one or many WebSocket listeners. WebDriver defines that an endpoint node supports at most one session at a time, so it’s expected to only have a single listener.

Note: For an endpoint node the host in the above steps will typically be " localhost ".

To handle an incoming message given a WebSocket connection connection, type type and data data:

If type is not text , respond with an error given connection, null, and invalid argument , and finally return.
Assert : data is a scalar value string , because the WebSocket handling errors in UTF-8-encoded data would already have failed the WebSocket connection otherwise.

Nothing seems to define what status code is used for UTF-8 errors.
If there is a BiDi Session associated with connection connection, let session be that session. Otherwise if connection is in the set of WebSocket connections not associated with a session , let session be null. Otherwise, return.
Let parsed be the result of parsing JSON into Infra values given data. If this throws an exception, then respond with an error given connection, null, and invalid argument , and finally return.
Match parsed against the remote end definition . If this results in a match:
1. Let matched be the map representing the matched data.
2. Assert: matched contains " id ", " method ", and " params ".
3. Let command id be matched [" id "].
4. Let method be matched [" method "]
  1. Let command be the command with command name method.
  2. If session is null and command is not a static command , then respond with an error given connection, command id, and invalid session id , and return.
  3. Queue a WebDriver task to run the following:
    1. Let result be the result of running the remote end steps for command given session and command parameters matched [" params "]
  4. If result is an error , then respond with an error given connection, command id, and result ’s error code , and finally return.
  5. Let value be result ’s data.
  6. Assert: value matches the definition for the result type corresponding to the command with command name method.
  7. If method is " session.new ", let session be the entry in the list of active sessions whose session ID is equal to the " sessionId " property of value, let session ’s WebSocket connection be connection, and remove connection from the set of WebSocket connections not associated with a session .
  8. Let response be a new map matching the CommandResponse production in the local end definition with the id field set to command id and the result field set to value.
  9. Let serialized be the result of serialize an infra value to JSON bytes given response.
  10. Send a WebSocket message comprised of serialized over connection.
Otherwise:
1. Let command id be null.
2. If parsed is a map and parsed [" id "] exists and is an integer greater than or equal to zero, set command id to that integer.
3. Let error code be invalid argument .
4. If parsed is a map and parsed [" method "] exists and is a string, but parsed [" method "] is not in the set of all command names , set error code to unknown command .
5. Respond with an error given connection, command id, and error code.
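
The "Otherwise" branch above, which picks the command id and error code for a message that does not match the remote end definition, can be sketched as follows. This is non-normative; ALL_COMMAND_NAMES is a one-entry placeholder for the set of all command names, and the function is only meaningful for input that has already failed to match.

```python
import json
from typing import Optional

# Placeholder for the set of all command names.
ALL_COMMAND_NAMES = {"session.new"}


def error_for_unmatched(data: str) -> tuple[Optional[int], str]:
    """Return (command id, error code) for a message that failed to match."""
    try:
        parsed = json.loads(data)
    except json.JSONDecodeError:
        return None, "invalid argument"

    command_id = None
    if isinstance(parsed, dict):
        candidate = parsed.get("id")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(candidate, int) and not isinstance(candidate, bool) and candidate >= 0:
            command_id = candidate

    error_code = "invalid argument"
    if isinstance(parsed, dict):
        method = parsed.get("method")
        if isinstance(method, str) and method not in ALL_COMMAND_NAMES:
            error_code = "unknown command"
    return command_id, error_code
```

Recovering the command id where possible lets the local end associate the error with the command that caused it, even though the message as a whole was malformed.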

To get related browsing contexts given a settings object settings:

Let related browsing contexts be an empty set
If the responsible document of settings is a Document , append the responsible document 's browsing context to related browsing contexts.

Otherwise if the global object specified by settings is a WorkerGlobalScope, for each owner in the global object 's owner set , if owner is a Document , append owner ’s browsing context to related browsing contexts.
Return related browsing contexts.

To emit an event given session, and body:

Assert : body has size 2 and contains " method " and " params ".
Let connection be session ’s WebSocket connection .
If connection is null, return.
Let serialized be the result of serialize an infra value to JSON bytes given body.
Send a WebSocket message comprised of serialized over connection.

To respond with an error given a WebSocket connection connection, command id, and error code:

Let error data be a new map matching the ErrorResponse production in the local end definition , with the id field set to command id, the error field set to error code, the message field set to an implementation-defined string containing a human-readable definition of the error that occurred and the stacktrace field optionally set to an implementation-defined string containing a stack trace report of the active stack frames at the time when the error occurred.
Let response be the result of serialize an infra value to JSON bytes given error data.

Note: command id can be null, in which case the id field will also be set to null, not omitted from response.
Send a WebSocket message comprised of response over connection.
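
A non-normative sketch of constructing the error data follows. The function name build_error_response is hypothetical, and json.dumps stands in for serializing an Infra value to JSON bytes.

```python
import json
from typing import Optional


def build_error_response(command_id: Optional[int], error_code: str,
                         message: str, stacktrace: Optional[str] = None) -> bytes:
    """Serialize a map matching the ErrorResponse production."""
    # The id field is always present; a null command id becomes JSON null.
    error_data = {"id": command_id, "error": error_code, "message": message}
    if stacktrace is not None:
        error_data["stacktrace"] = stacktrace
    return json.dumps(error_data).encode("utf-8")


payload = build_error_response(None, "invalid argument", "message was not valid JSON")
```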

To handle a connection closing given a WebSocket connection connection:

If there is a BiDi session associated with connection connection:
1. Let session be the BiDi session associated with connection connection.
2. Remove connection from session ’s session WebSocket connections .
Otherwise, if the set of WebSocket connections not associated with a session contains connection, remove connection from that set.

Note: This does not end any session .

Need to hook in to the session ending to allow the UA to close the listener if it wants.

4.1. Establishing a Connection

WebDriver clients opt in to a bidirectional connection by requesting a capability with the name " webSocketUrl " and value true.

This specification defines an additional webdriver capability with the capability name " webSocketUrl ".

The additional capability deserialization algorithm for the "webSocketUrl" capability, with parameter value is:

If value is not a boolean, return error with code invalid argument .
Return success with data value.

The matched capability serialization algorithm for the "webSocketUrl" capability, with parameter value is:

If value is false, return success with data null.
Return success with data true.

The WebDriver new session algorithm defined by this specification, with parameters session, capabilities, and flags is:

If flags contains " bidi ", return.
Let webSocketUrl be the result of getting a property named " webSocketUrl " from capabilities.
If webSocketUrl is undefined or false, return.
Assert : webSocketUrl is true.
Let listener be the result of start listening for a WebSocket connection given session.
Set webSocketUrl to the result of constructing a WebSocket URL given listener and session.
Set a property on capabilities named " webSocketUrl " to webSocketUrl.
Set session ’s BiDi flag to true.
Append " bidi " to flags.
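
The WebDriver new session algorithm above can be sketched as below. This is non-normative: the session and capabilities are modelled as plain dictionaries, the key bidi_flag is a hypothetical stand-in for the BiDi flag, and make_url stands in for starting a listener and constructing a WebSocket URL.

```python
from typing import Callable


def new_session_hook(session: dict, capabilities: dict, flags: set,
                     make_url: Callable[[dict], str]) -> None:
    if "bidi" in flags:
        return
    web_socket_url = capabilities.get("webSocketUrl")
    if web_socket_url is None or web_socket_url is False:
        return
    assert web_socket_url is True
    # Replace the boolean request with the URL the client should connect to.
    capabilities["webSocketUrl"] = make_url(session)
    session["bidi_flag"] = True
    flags.add("bidi")


def example_url(session: dict) -> str:
    # Arbitrary host and port, chosen for illustration only.
    return "ws://localhost:9222/session/" + session["id"]


session = {"id": "6ba7b810-9dad-11d1-80b4-00c04fd430c8", "bidi_flag": False}
capabilities = {"webSocketUrl": True}
flags: set = set()
new_session_hook(session, capabilities, flags, example_url)

plain_session = {"id": "unused", "bidi_flag": False}
plain_capabilities: dict = {}
plain_flags: set = set()
new_session_hook(plain_session, plain_capabilities, plain_flags, example_url)
```

A client that does not request the capability gets an ordinary session with the BiDi flag left false.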

Implementations should also allow clients to establish a BiDi session which is not an HTTP session. In this case the URL to the WebSocket server is communicated out-of-band. An implementation that allows this supports BiDi-only sessions. At the time such an implementation is ready to accept requests to start a WebDriver session, it must:

Start listening for a WebSocket connection given null.

5. Common Data Types

5.1. Remote Value

Values accessible from the ECMAScript runtime are represented by a mirror object, specified as RemoteValue. The value’s type is specified in the type property. In the case of JSON-representable primitive values, this contains the value in the value property; in the case of non-JSON-representable primitives, the value property contains a string representation of the value. For non-primitive objects, the objectId property contains a string id that provides a unique handle to the object, valid for its lifetime inside the engine. For some non-primitive types, the value property contains a representation of the data in the ECMAScript object; for container types this can contain further RemoteValue instances. The value property can be null if there is a duplicate object i.e. the object has already been serialized in the current RemoteValue, perhaps as part of a cycle, or otherwise when the maximum serialization depth is reached.

Nodes are also represented by RemoteValue instances. These have a partial serialization of the node in the value property.

Note: mirror objects do not keep the original object alive in the runtime. If an object is discarded in the runtime, subsequent attempts to access it via the protocol will result in an error.

A BiDi session has an object id map . This is a weak map from objects to their corresponding id.

Should this be explicitly per realm?

To get the object id for an object given a session and object:

If session ’s object id map does not contain object, run the following steps:
1. Let object id be a new, unique, string identifier for object. If object is an element this must be the web element reference for object ; if it’s a WindowProxy object, this must be the window handle for object.
2. Set the value of object in session ’s object id map to object id.
Return the result of getting the value for object in session ’s object id map .
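
The object id map can be sketched with a weak map, as below. This is non-normative: the class names are hypothetical, ids are random UUID strings by assumption, and the special cases for elements and WindowProxy objects are left out.

```python
import uuid
import weakref


class ObjectIdMap:
    """A weak map from objects to their ids; entries do not keep objects alive."""

    def __init__(self) -> None:
        self._ids = weakref.WeakKeyDictionary()

    def get_object_id(self, obj: object) -> str:
        if obj not in self._ids:
            self._ids[obj] = str(uuid.uuid4())
        return self._ids[obj]


class Thing:
    """Placeholder for a runtime object (hashable and weakly referenceable)."""


object_ids = ObjectIdMap()
first, second = Thing(), Thing()
first_id = object_ids.get_object_id(first)
```

Asking for the id of the same object again returns the same string, while distinct objects receive distinct ids.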

remote end definition and local end definition

RemoteValue = (
  UndefinedValue /
  NullValue /
  StringValue /
  NumberValue /
  BooleanValue /
  BigIntValue /
  SymbolValue /
  ArrayValue /
  ObjectValue /
  FunctionValue /
  RegExpValue /
  DateValue /
  MapValue /
  SetValue /
  WeakMapValue /
  WeakSetValue /
  IteratorValue /
  GeneratorValue /
  ErrorValue /
  ProxyValue /
  PromiseValue /
  TypedArrayValue /
  ArrayBufferValue /
  NodeValue /
  WindowProxyValue
)
ObjectId = text;
ListValue = [*RemoteValue];
MappingValue = [*[(RemoteValue / text), RemoteValue]];
UndefinedValue = {
  type: "undefined",
}
NullValue = {
  type: "null",
}
StringValue = {
  type: "string",
  value: text,
}
SpecialNumber = "NaN" / "-0" / "+Infinity" / "-Infinity";
NumberValue = {
  type: "number",
  value: number / SpecialNumber,
}
BooleanValue = {
  type: "boolean",
  value: bool,
}
BigIntValue = {
  type: "bigint",
  value: text,
}
SymbolValue = {
  type: "symbol",
  objectId: ObjectId,
}
ArrayValue = {
  type: "array",
  objectId: ObjectId,
  ?value: ListValue,
}
ObjectValue = {
  type: "object",
  objectId: ObjectId,
  ?value: MappingValue,
}
FunctionValue = {
  type: "function",
  objectId: ObjectId,
}
RegExpValue = {
  type: "regexp",
  objectId: ObjectId,
  value: text
}
DateValue = {
  type: "date",
  objectId: ObjectId,
  value: text
}
MapValue = {
  type: "map",
  objectId: ObjectId,
  ?value: MappingValue,
}
SetValue = {
  type: "set",
  objectId: ObjectId,
  ?value: ListValue
}
WeakMapValue = {
  type: "weakmap",
  objectId: ObjectId,
}
WeakSetValue = {
  type: "weakset",
  objectId: ObjectId,
}
ErrorValue = {
  type: "error",
  objectId: ObjectId,
}
PromiseValue = {
  type: "promise",
  objectId: ObjectId,
}
TypedArrayValue = {
  type: "typedarray",
  objectId: ObjectId,
}
ArrayBufferValue = {
  type: "arraybuffer",
  objectId: ObjectId,
}
NodeValue = {
  type: "node",
  objectId: ObjectId,
  ?value: NodeProperties,
}
NodeProperties = {
  nodeType: uint,
  nodeValue: text,
  ?localName: text,
  ?namespaceURI: text,
  childNodeCount: uint,
  ?children: [*NodeValue],
  ?attributes: {*text => text},
  ?shadowRoot: NodeValue / null,
}
WindowProxyValue = {
  type: "window",
  objectId: ObjectId,
}
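
As a non-normative illustration of the NumberValue and SpecialNumber productions, the following sketch maps a double to the value it would carry on the wire. It assumes a Python float stands in for an ECMAScript Number; the function name serialize_number is hypothetical.

```python
import math
from typing import Union


def serialize_number(value: float) -> dict:
    """Build a map matching NumberValue, using SpecialNumber strings where JSON cannot represent the value."""
    serialized: Union[float, str]
    if math.isnan(value):
        serialized = "NaN"
    elif value == 0 and math.copysign(1.0, value) < 0:
        # -0 compares equal to 0, so inspect the sign bit.
        serialized = "-0"
    elif value == math.inf:
        serialized = "+Infinity"
    elif value == -math.inf:
        serialized = "-Infinity"
    else:
        serialized = value
    return {"type": "number", "value": serialized}
```

The four special cases are exactly the Number values that have no JSON representation, or in the case of -0, no representation that reliably survives a round trip.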

Add WASM types?

Should WindowProxy get attributes in a similar style to Node?

handle String / Number / etc. wrapper objects specially?

To serialize as a remote value given a value, a max depth, node details, and a set of known objects:

In the following list of conditions and associated steps, run the first set of steps for which the associated condition is true:
Type ( value ) is Undefined
Let remote value be a map matching the UndefinedValue production in the local end definition .
Type ( value ) is Null
Let remote value be a map matching the NullValue production in the local end definition .
Type ( value ) is String
Let remote value be a map matching the StringValue production in the local end definition , with the value property set to value.
This doesn’t handle lone surrogates

Type ( value ) is Number
1. Switch on the value of value:
  
  NaN
  Let serialized be "NaN"
  -0
  Let serialized be "-0"
  +Infinity
  Let serialized be "+Infinity"
  -Infinity
  Let serialized be "-Infinity"
  Otherwise:
  Let serialized be value
2. Let remote value be a map matching the NumberValue production in the local end definition , with the value property set to serialized.
Type ( value ) is Boolean
Let remote value be a map matching the BooleanValue production in the local end definition , with the value property set to value.
Type ( value ) is BigInt
Let remote value be a map matching the BigIntValue production in the local end definition , with the value property set to the result of running the ToString operation on value.
Type ( value ) is Symbol
Let remote value be a map matching the SymbolValue production in the local end definition , with the objectId property set to the object id for an object value.
IsArray ( value )
1. Let serialized be null.
2. If value is not in the set of known objects, and max depth is not null and greater than 0, run the following steps:
  1. Append value to the set of known objects
  2. Let serialized be the result of serialize as a list given CreateArrayIterator ( value, value), max depth, node details and set of known objects.
3. Let remote value be a map matching the ArrayValue production in the local end definition, with the objectId property set to the object id for an object value, and the value field set to serialized if it’s not null, or omitted otherwise.
IsRegExp ( value )
1. Let pattern be ToString ( Get ( value, "source")).
2. Let flags be ToString ( Get ( value, "flags")).
3. Let serialized be the string-concatenation of "/", pattern, "/", and flags.
4. Let remote value be a map matching the RegExpValue production in the local end definition, with the objectId property set to the object id for an object value and the value property set to serialized.
value has a [[DateValue]] internal slot .
1. Let serialized be ToDateString ( thisTimeValue ( value )).
2. Let remote value be a map matching the DateValue production in the local end definition, with the objectId property set to the object id for an object value and the value property set to serialized.
value has a [[MapData]] internal slot
1. Let serialized be null.
2. If value is not in the set of known objects, and max depth is not null and greater than 0, run the following steps:
  1. Append value to the set of known objects
  2. Let serialized be the result of serialize as a mapping given CreateMapIterator ( value, key+value), max depth, node details and set of known objects.
3. Let remote value be a map matching the MapValue production in the local end definition, with the objectId property set to the object id for an object value, and the value field set to serialized if it’s not null, or omitted otherwise.
value has a [[SetData]] internal slot
1. Let serialized be null.
2. If value is not in the set of known objects, and max depth is not null and greater than 0, run the following steps:
  1. Append value to the set of known objects
  2. Let serialized be the result of serialize as a list given CreateSetIterator ( value, value), max depth, node details and set of known objects.
3. Let remote value be a map matching the SetValue production in the local end definition, with the objectId property set to the object id for an object value, and the value field set to serialized if it’s not null, or omitted otherwise.
value has a [[WeakMapData]] internal slot
Let remote value be a map matching the WeakMapValue production in the local end definition , with the objectId property set to the object id for an object value.
value has a [[WeakSetData]] internal slot
Let remote value be a map matching the WeakSetValue production in the local end definition , with the objectId property set to the object id for an object value.
value has an [[ErrorData]] internal slot
Let remote value be a map matching the ErrorValue production in the local end definition , with the objectId property set to the object id for an object value.
IsPromise ( value )
Let remote value be a map matching the PromiseValue production in the local end definition , with the objectId property set to the object id for an object value.
value has a [[TypedArrayName]] internal slot
Let remote value be a map matching the TypedArrayValue production in the local end definition , with the objectId property set to the object id for an object value.
value has an [[ArrayBufferData]] internal slot
Let remote value be a map matching the ArrayBufferValue production in the local end definition , with the objectId property set to the object id for an object value.
value is a platform object that implements Node
1. Let serialized be null.
2. If node details is true, run the following steps:
  1. Let serialized be a map.
  2. Set serialized [" nodeType "] to Get ( value, "nodeType").
  3. Set serialized [" nodeValue "] to Get ( value, "nodeValue")
  4. If value is an Element or an Attribute :
    
    Set serialized [" localName "] to Get ( value, "localName").
    
    Set serialized [" namespaceURI "] to Get ( value, "namespaceURI")
  5. Let child node count be the size of value ’s children .
  6. Set serialized [" childNodeCount "] to child node count.
  7. If max depth is equal to 0 let children be null. Otherwise, let children be an empty list and, for each node child in the children of value:
    
    Let child depth be max depth - 1 if max depth is not null, or null otherwise.
    
    Let serialized child be the result of serialize as a remote value with child, child depth, node details and set of known objects.
    
    Append serialized child to children.
  8. Set serialized [" children "] to children.
  9. If value is an Element :
    
    Let attributes be a new map.
    
    For each attribute in value ’s attribute list :
    
    Let name be attribute ’s qualified name
    
    Let attribute value be attribute ’s value .
    
    Set attributes [ name ] to attribute value.
    
    Set serialized [" attributes "] to attributes.
    
    Let shadow root be value ’s shadow root .
    
    If shadow root is null, let serialized shadow be null. Otherwise run the following substeps:
    
    Let child depth be max depth - 1 if max depth is not null, or null otherwise.
    
    Let serialized shadow be the result of serialize as a remote value with shadow root, child depth, false and set of known objects.
    
    Note: this means the objectId for the shadow root will be serialized irrespective of whether the shadow is open or closed, but no properties of the node will be returned.
    
    Set serialized [" shadowRoot "] to serialized shadow.
3. Let remote value be a map matching the NodeValue production in the local end definition , with the objectId property set to the object id for an object value, and value set to serialized, if serialized is not null.
value is a platform object that implements WindowProxy
1. Let remote value be a map matching the WindowProxyValue production in the local end definition , with the objectId property set to the object id for an object value.
value is a platform object
1. Let remote value be a map matching the ObjectValue production in the local end definition , with the objectId property set to the object id for an object value.
IsCallable ( value )
Let remote value be a map matching the FunctionValue production in the local end definition , with the objectId property set to the object id for an object value.
Otherwise:
1. Assert : type ( value ) is Object
2. Let serialized be null.
3. If value is not in the set of known objects, and max depth is greater than 0, run the following steps:
  1. Append value to the set of known objects
  2. Let serialized be the result of serialize as a mapping given EnumerableOwnPropertyNames ( value, key+value), max depth, node details and set of known objects
4. Let remote value be a map matching the ObjectValue production in the local end definition , with the objectId property set to the object id for an object value, and the value field set to serialized.
Return remote value

Does it make sense to use the same depth parameter for nodes and objects in general?

To serialize as a list given iterable, max depth, node details and set of known objects:

Let serialized be a new list.
For each child value in iterable:
1. Let child depth be max depth - 1 if max depth is not null, or null otherwise.
2. Let serialized child be the result of serialize as a remote value with arguments child value, child depth, node details and set of known objects.
3. Append serialized child to serialized.
Return serialized

this assumes for-in works on iterators

To serialize as a mapping given iterable, max depth, node details and set of known objects:

Let serialized be a new list.
For each item in iterable:
1. Assert: IsArray ( item )
2. Let property be CreateListFromArrayLike ( item )
3. Assert: property is a list of size 2
4. Let key be property [0] and let value be property [1]
5. Let child depth be max depth - 1 if max depth is not null, or null otherwise.
6. If Type ( key ) is String, let serialized key be key, otherwise let serialized key be the result of serialize as a remote value with arguments key, child depth, node details and set of known objects.
7. Let serialized value be the result of serialize as a remote value with arguments value, child depth, node details and set of known objects.
8. Let serialized child be (« serialized key, serialized value »).
9. Append serialized child to serialized.
Return serialized
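
The two helper algorithms above can be sketched in Python as follows. This is only an illustration under stated assumptions: serialize_remote_value is a hypothetical stub that records its arguments instead of implementing serialize as a remote value, and the node details and set of known objects arguments are left out for brevity.

```python
from typing import Any, Iterable, List, Optional, Tuple


def child_depth_of(max_depth: Optional[int]) -> Optional[int]:
    """max depth - 1 if max depth is not null, or null otherwise."""
    return None if max_depth is None else max_depth - 1


def serialize_remote_value(value: Any, max_depth: Optional[int]) -> dict:
    # Hypothetical stub: records the value and the depth it was called with.
    return {"value": value, "depth": max_depth}


def serialize_as_list(iterable: Iterable[Any], max_depth: Optional[int]) -> List[dict]:
    serialized = []
    for child_value in iterable:
        depth = child_depth_of(max_depth)
        serialized.append(serialize_remote_value(child_value, depth))
    return serialized


def serialize_as_mapping(
    iterable: Iterable[Tuple[Any, Any]], max_depth: Optional[int]
) -> List[tuple]:
    serialized = []
    for item in iterable:
        assert len(item) == 2  # each entry is a (key, value) pair
        key, value = item
        depth = child_depth_of(max_depth)
        # String keys are kept as-is; any other key is itself serialized.
        if isinstance(key, str):
            serialized_key = key
        else:
            serialized_key = serialize_remote_value(key, depth)
        serialized_value = serialize_remote_value(value, depth)
        serialized.append((serialized_key, serialized_value))
    return serialized
```

Note that the mapping is returned as a list of pairs rather than a dictionary, since keys that are not strings are serialized to structures that cannot act as dictionary keys.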

6. Modules

6.1. The session Module

The session module contains commands and events for monitoring the status of the remote end.

6.1.1. Definition

remote end definition

SessionCommand = (SessionStatusCommand //
                  SessionSubscribeCommand)

local end definition

SessionResult = (SessionStatusResult)

To update the event map , given session, requested event names, browsing contexts, and enabled:

Note: The return value of this algorithm is a map between event names and contexts. When events are being enabled, the contexts in the return value are those for which the events are now enabled but were not previously. When events are disabled, the return value is always empty.

Let global event set be a clone of the global event set for session.
Let event map be a new map.
For each key → value of the browsing context event map for session:
1. Set event map [ key ] to a clone of value.
Let event names be an empty set.
For each entry name in requested event names, let event names be the union of event names and the result of trying to obtain a set of event names with name.
Let enabled events be a new map.
If browsing contexts is null:
1. If enabled is true:
  1. For each event name of event names:
    1. If global event set doesn’t contain event name:
      1. Let already enabled contexts be the event enabled browsing contexts given session and event name
      2. Add event name to global event set.
      3. For each context of already enabled contexts, remove event name from event map [ context ].
      4. Let newly enabled contexts be a list of all top-level browsing contexts that are not contained in already enabled contexts.
      5. Set enabled events [ event name ] to newly enabled contexts.
2. If enabled is false:
  1. For each event name in event names:
    1. If global event set contains event name, remove event name from global event set. Otherwise return error with error code invalid argument .
Otherwise, if browsing contexts is not null:
1. Let targets be an empty map.
2. For each context id in browsing contexts:
  1. Let context be the result of trying to get a browsing context with context id.
  2. Let top-level context be the top-level browsing context for context.
  3. If event map does not contain top-level context, set event map [ top-level context ] to a new set.
  4. Set targets [ top-level context ] to event map [ top-level context ].
3. For each event name in event names:
  1. If enabled is true and global event set contains event name, continue.
  2. For each context → target in targets:
    1. If enabled is true and target does not contain event name:
      1. Add event name to target.
      2. If enabled events does not contain event name, set enabled events [ event name ] to a new set.
      3. Append context to enabled events [ event name ].
    2. If enabled is false:
      1. If target contains event name, remove event name from target. Otherwise return error with error code invalid argument .
Set the global event set for session to global event set.
Set the browsing context event map for session to event map.
Return success with data enabled events.

Note: Implementations that do additional work when an event is enabled, e.g. subscribing to the relevant engine-internal events, will likely perform those additional steps when updating the event map. This specification uses a model where hooks are always called and then the event map is used to filter only those that ought to be returned to the local end.
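
The bookkeeping in update the event map can be sketched in Python as below. This is a simplified illustration, not the normative algorithm: it assumes event names have already been expanded, that context ids already refer to top-level browsing contexts, and that the event enabled browsing contexts can be approximated by the contexts whose per-context set contains the event. The Session class and the all_top_level parameter are hypothetical.

```python
from typing import Dict, List, Optional, Set


class InvalidArgument(Exception):
    """Stands in for an error with error code "invalid argument"."""


class Session:
    def __init__(self) -> None:
        self.global_event_set: Set[str] = set()
        self.context_event_map: Dict[str, Set[str]] = {}


def update_event_map(
    session: Session,
    event_names: List[str],
    contexts: Optional[List[str]],
    enabled: bool,
    all_top_level: List[str],
) -> Dict[str, List[str]]:
    # Work on clones so that an error leaves the session state untouched.
    global_events = set(session.global_event_set)
    event_map = {k: set(v) for k, v in session.context_event_map.items()}
    enabled_events: Dict[str, List[str]] = {}
    if contexts is None:
        for name in event_names:
            if enabled:
                if name in global_events:
                    continue
                already = [c for c, events in event_map.items() if name in events]
                global_events.add(name)
                for context in already:
                    event_map[context].discard(name)
                enabled_events[name] = [c for c in all_top_level if c not in already]
            else:
                if name not in global_events:
                    raise InvalidArgument(name)
                global_events.remove(name)
    else:
        targets = {c: event_map.setdefault(c, set()) for c in contexts}
        for name in event_names:
            if enabled and name in global_events:
                continue
            for context, target in targets.items():
                if enabled:
                    if name not in target:
                        target.add(name)
                        enabled_events.setdefault(name, []).append(context)
                else:
                    if name not in target:
                        raise InvalidArgument(name)
                    target.remove(name)
    session.global_event_set = global_events
    session.context_event_map = event_map
    return enabled_events
```

Subscribing for one context and then globally reports only the contexts that were not already covered, which mirrors the note on the return value above.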

6.1.2. Types

6.1.2.1. The session.CapabilitiesRequest Type

remote end definition and local end definition

CapabilitiesRequest = {
    ?acceptInsecureCertificates: bool,
    ?browserName: text,
    ?browserVersion: text,
    ?platformName: text,
    ?proxy: {
        ?proxyType: "pac" / "direct" / "autodetect" / "system" / "manual",
        ?proxyAutoconfigUrl: text,
        ?ftpProxy: text,
        ?httpProxy: text,
        ?noProxy: [*text],
        ?sslProxy: text,
        ?socksProxy: text,
        ?socksVersion: int,
    },
    *text => any
};

The CapabilitiesRequest type represents the capabilities requested for a session.

6.1.3. Commands

6.1.3.1. The session.status Command

The session.status command returns information about whether a remote end is in a state in which it can create new sessions, but may additionally include arbitrary meta information that is specific to the implementation.

This is a static command .

Command Type

SessionStatusCommand = {
  method: "session.status",
  params: EmptyParams,
}

Return Type

SessionStatusResult = {
  ready: bool,
  message: text,
}

The remote end steps given session, and command parameters are:

Let body be a new map with the following properties:

"ready"
The remote end ’s readiness state.
"message"
An implementation-defined string explaining the remote end ’s readiness state.
Return success with data body

6.1.3.2. The session.new Command

The session.new command allows creating a new BiDi session .

Note: A session created this way will not be accessible via HTTP.

This is a static command .

Command Type

SessionNewCommand = {
  method: "session.new",
  params: {capabilities: CapabilitiesRequestParameters},
}
CapabilitiesRequestParameters = {
  ?alwaysMatch: CapabilitiesRequest,
  ?firstMatch: [*CapabilitiesRequest]
}

Return Type

SessionNewResult = {
  sessionId: text,
  capabilities: {
    acceptInsecureCertificates: bool,
    browserName: text,
    browserVersion: text,
    platformName: text,
    proxy: {
      ?proxyType: "pac" / "direct" / "autodetect" / "system" / "manual",
      ?proxyAutoconfigUrl: text,
      ?ftpProxy: text,
      ?httpProxy: text,
      ?noProxy: [*text],
      ?sslProxy: text,
      ?socksProxy: text,
      ?socksVersion: int,
    },
    setWindowRect: bool,
    *text => any
  }
}

The remote end steps given session and command parameters are:

If session is not null, return an error with error code session not created .
If the implementation is unable to start a new session for any reason, return an error with error code session not created .
Let flags be a set containing " bidi ".
Let capabilities be the result of trying to process capabilities with command parameters and flags.
Let session be the result of trying to create a session with capabilities and flags.
Set session ’s BiDi flag to true.

Note: the connection for this session will be set to the current connection by the caller.
Let body be a new map matching the SessionNewResult production, with the sessionId field set to session ’s session ID , and the capabilities field set to capabilities.
Return success with data body.

6.1.3.3. The session.subscribe Command

The session.subscribe command enables certain events either globally or for a set of browsing contexts.

This needs to be generalized to work with realms too

Command Type

SessionSubscribeCommand = {
  method: "session.subscribe",
  params: SessionSubscribeParameters
}
SessionSubscribeParameters = {
  events: [*text],
  ?contexts: [*BrowsingContext],
}

Return Type

EmptyResult

The remote end steps with session and command parameters are:

Let the list of event names be the value of the events field of command parameters
Let the list of contexts be the value of the contexts field of command parameters if it is present or null if it isn’t.
Let enabled events be the result of trying to update the event map with session, list of event names , list of contexts and enabled true.
Let subscribe step events be a new map.
For each event name → contexts in enabled events:
1. If the event with event name event name defines remote end subscribe steps , set subscribe step events [ event name ] to contexts.
Sort in ascending order subscribe step events using the following less than algorithm given two entries with keys event name one and event name two:
1. Let event one be the event with name event name one
2. Let event two be the event with name event name two
3. Return true if event one ’s subscribe priority is less than event two ’s subscribe priority, or false otherwise.
If list of contexts is null, let include global be true, otherwise let include global be false.
For each event name → contexts in subscribe step events:
1. Run the remote end subscribe steps for the event with event name event name given session, contexts and include global.
Return success with data null.
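
The filtering and ordering of subscribe steps can be sketched as follows. This is an illustrative sketch only: the priorities map is a hypothetical stand-in for "the event defines remote end subscribe steps with a given subscribe priority", and the event name hypothetical.event is invented for the example.

```python
from typing import Dict, List, Tuple


def order_subscribe_steps(
    enabled_events: Dict[str, List[str]],
    priorities: Dict[str, int],
) -> List[Tuple[str, List[str]]]:
    # Keep only events that define remote end subscribe steps.
    with_steps = {
        name: contexts
        for name, contexts in enabled_events.items()
        if name in priorities
    }
    # Sort ascending by subscribe priority.
    return sorted(with_steps.items(), key=lambda entry: priorities[entry[0]])
```

For example, with priorities of 1 for browsingContext.contextCreated and 2 for hypothetical.event, the context created steps run first, and an event without subscribe steps such as browsingContext.load is skipped.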

6.1.3.4. The session.unsubscribe Command

The session.unsubscribe command disables events either globally or for a set of browsing contexts.

This needs to be generalised to work with realms too

Command Type

SessionUnsubscribeCommand = {
  method: "session.unsubscribe",
  params: SessionSubscribeParameters
}

Return Type

EmptyResult

The remote end steps with session and command parameters are:

Let the list of event names be the value of the events field of command parameters.
Let the list of contexts be the value of the contexts field of command parameters if it is present or null if it isn’t.
Try to update the event map with session, list of event names, list of contexts and enabled false.
Return success with data null.

6.2. The browsingContext Module

The browsingContext module contains commands and events relating to browsing contexts.

The progress of navigation is communicated using an immutable WebDriver navigation status struct, which has the following items:

id: The navigation id for the navigation, or null when the navigation is canceled before making progress.
status: A status code that is either " canceled ", " pending ", or " complete ".
url: The URL which is being loaded in the navigation

6.2.1. Definition

remote end definition

BrowsingContextCommand = (
    BrowsingContextGetTreeCommand //
    BrowsingContextNavigateCommand //
    BrowsingContextReloadCommand
)

local end definition

BrowsingContextResult = (
    BrowsingContextGetTreeResult //
    BrowsingContextNavigateResult
)
BrowsingContextEvent = (
    BrowsingContextCreatedEvent //
    BrowsingContextDestroyedEvent //
    BrowsingContextNavigationStartedEvent //
    BrowsingContextFragmentNavigatedEvent //
    BrowsingContextDomContentLoadedEvent //
    BrowsingContextLoadEvent //
    BrowsingContextDownloadWillBegin //
    BrowsingContextNavigationAbortedEvent //
    BrowsingContextNavigationFailedEvent
)

6.2.2. Types

6.2.2.1. The browsingContext.BrowsingContext Type

remote end definition and local end definition

BrowsingContext = text;

Each browsing context has an associated browsing context id , which is a string uniquely identifying that browsing context. This is implicitly set when the context is created. For browsing contexts with an associated WebDriver window handle the browsing context id must be the same as the window handle .

To get a browsing context given context id:

If context id is null, return success with data null.
If there is no browsing context with browsing context id context id return error with error code no such frame
Let context be the browsing context with id context id.
Return success with data context

6.2.2.2. The browsingContext.BrowsingContextInfo Type

local end definition

BrowsingContextInfoList = [* BrowsingContextInfo]
BrowsingContextInfo = {
  context: BrowsingContext,
  ?parent: BrowsingContext / null,
  url: text,
  children: BrowsingContextInfoList / null
}

The BrowsingContextInfo type represents the properties of a browsing context.

To get the browsing context info given context, depth and max depth:

Let context id be the browsing context id for context.
If context has a parent browsing context let parent id be the browsing context id of that parent. Otherwise let parent id be null.
Let document be context ’s active document .
Let url be the result of running the URL serializer , given document ’s URL .

Note: This includes the fragment component of the URL.
Let child info be the result of get the descendent browsing contexts given context id, depth + 1, and max depth.
Let context info be a map matching the BrowsingContextInfo production with the context field set to context id, the parent field set to parent id if depth is 0, or unset otherwise, the url field set to url, and the children field set to child info.
Return context info.

To get the descendent browsing contexts given parent id, depth and max depth:

If max depth is greater than zero, and depth is equal to max depth, return null.
Let parent be the result of trying to get a browsing context given parent id.
If parent is null, let child contexts be a list containing all top-level browsing contexts . Otherwise let child contexts be a list containing all browsing contexts which are child browsing contexts of parent.
Let contexts info be a list.
For each context of child contexts:
1. Let info be the result of get the browsing context info given context, depth, and max depth.
2. Append info to contexts info
Return contexts info
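
The mutual recursion between get the browsing context info and get the descendent browsing contexts can be sketched in Python as below. This is a simplified illustration: the Context class is hypothetical, and the descendant helper takes the list of child contexts directly instead of looking them up from a parent id.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Context:
    id: str
    url: str
    parent: Optional["Context"] = None
    children: List["Context"] = field(default_factory=list)


def get_context_info(context: Context, depth: int, max_depth: int) -> dict:
    info = {"context": context.id}
    # The parent field is only set for the roots of the returned tree.
    if depth == 0:
        info["parent"] = context.parent.id if context.parent else None
    info["url"] = context.url
    info["children"] = get_descendants(context.children, depth + 1, max_depth)
    return info


def get_descendants(
    child_contexts: List[Context], depth: int, max_depth: int
) -> Optional[List[dict]]:
    # A max depth of 0 means "no limit"; otherwise stop once it is reached.
    if max_depth > 0 and depth == max_depth:
        return None
    return [get_context_info(c, depth, max_depth) for c in child_contexts]
```

With a max depth of 1 the children field of each top-level entry is null, while a max depth of 0 walks the whole tree and ends in empty lists at the leaves.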

To await a navigation given context, request, wait condition, and optionally history handling (default: "default") and ignore cache (default: false):

Let navigation id be the string representation of a UUID based on truly random, or pseudo-random numbers.
Navigate context with resource request, and using context as the source browsing context , with navigation id navigation id, and history handling behavior history handling. If ignore cache is true, the navigation must not load resources from the HTTP cache.

Properly specify how the ignore cache flag works. This needs to consider whether only the first load of a resource bypasses the cache (i.e. whether this is like initially clearing the cache and proceeding like normal), or whether resources not directly loaded by the HTML parser (e.g. loads initiated by scripts or stylesheets) also bypass the cache.
Let ( event received, navigate status ) be await given «" navigation started ", " navigation failed ", " fragment navigated "», and navigation id.
Assert: navigate status ’s id is navigation id.
If navigate status ’s status is " complete ":
1. Let body be a map matching the BrowsingContextNavigateResult production, with the navigation field set to navigation id, and the url field set to the result of the URL serializer given navigate status ’s url.
2. Return success with data body.
Note: this is the case if the navigation only caused the fragment to change.
If navigate status ’s status is " canceled " return error with error code unknown error .

TODO: is this the right way to handle errors here?
Assert: navigate status ’s status is " pending " and navigation id is not null.
If wait condition is " none ":
1. Let body be a map matching the BrowsingContextNavigateResult production, with the navigation field set to navigation id, and the url field set to the result of the URL serializer given navigate status ’s url.
2. Return success with data body.
If wait condition is " interactive ", let event name be " domContentLoaded ", otherwise let event name be " load ".
Let ( event received, status ) be await given « event name, " download started ", " navigation aborted ", " navigation failed "» and navigation id.
If event received is " navigation failed " return error with error code unknown error .

Are we surfacing enough information about what failed and why with an error here? What error code do we want? Is there going to be a problem where local ends parse the implementation-defined strings to figure out what actually went wrong?
Let body be a map matching the BrowsingContextNavigateResult production, with the navigation field set to status ’s id, and the url field set to the result of the URL serializer given status ’s url.
Return success with data body.
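
The decision logic of the steps above can be sketched in Python as follows. This is only an approximation under stated assumptions: instead of pausing and resuming, the hypothetical navigation_outcome function consumes a pre-recorded sequence of (event name, navigation status) pairs, and navigation statuses are modelled as plain dictionaries with id, status and url keys.

```python
from typing import Dict, Iterable, Tuple

Status = Dict[str, object]  # keys: "id", "status", "url"


class UnknownError(Exception):
    """Stands in for an error with error code "unknown error"."""


def navigation_outcome(
    events: Iterable[Tuple[str, Status]], wait_condition: str
) -> dict:
    pending = iter(events)
    # First await: "navigation started", "navigation failed"
    # or "fragment navigated".
    _, status = next(pending)
    if status["status"] == "complete":
        # Only the fragment changed, so there is nothing more to wait for.
        return {"navigation": status["id"], "url": status["url"]}
    if status["status"] == "canceled":
        raise UnknownError("navigation canceled")
    assert status["status"] == "pending"
    if wait_condition == "none":
        return {"navigation": status["id"], "url": status["url"]}
    wanted = "domContentLoaded" if wait_condition == "interactive" else "load"
    awaited = {wanted, "download started", "navigation aborted", "navigation failed"}
    # Second await: events outside the awaited set are ignored.
    for name, status in pending:
        if name not in awaited:
            continue
        if name == "navigation failed":
            raise UnknownError("navigation failed")
        return {"navigation": status["id"], "url": status["url"]}
    raise RuntimeError("event stream ended before the navigation settled")
```

A wait condition of "complete" skips over a domContentLoaded event and returns only once load (or one of the other awaited events) arrives.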

6.2.2.3. The browsingContext.Navigation Type

remote end definition and local end definition

Navigation = text;

The Navigation type is a unique string identifying an ongoing navigation.

TODO: Link to the definition in the HTML spec.

6.2.2.4. The browsingContext.NavigationInfo Type

local end definition :

NavigationInfo = {
  context: BrowsingContext,
  navigation: Navigation / null,
  url: text,
}

The NavigationInfo type provides details of an ongoing navigation.

To get the navigation info , given context and navigation status:

Let context id be the browsing context id for context.
Let navigation id be navigation status ’s id.
Let url be navigation status ’s url.
Return a map matching the NavigationInfo production, with the context field set to context id, the navigation field set to navigation id, and the url field set to the result of the URL serializer given url.

6.2.3. Commands

6.2.3.1. The browsingContext.getTree Command

The browsingContext.getTree command returns a tree of all browsing contexts that are descendants of the given context, or all top-level contexts when no parent is provided.

Command Type

BrowsingContextGetTreeCommand = {
  method: "browsingContext.getTree",
  params: BrowsingContextGetTreeParameters
}
BrowsingContextGetTreeParameters = {
  ?maxDepth: uint,
  ?parent: BrowsingContext,
}

Return Type

BrowsingContextGetTreeResult = {
  contexts: BrowsingContextInfoList
}

The remote end steps with session and command parameters are:

Let the parent id be the value of the parent field of command parameters if present, or null otherwise.
Let max depth be the value of the maxDepth field of command parameters if present, or 0 otherwise.
Let depth be 0.
Let contexts be the result of get the descendent browsing contexts , given parent id, depth, and max depth.
Let body be a map matching the BrowsingContextGetTreeResult production, with the contexts field set to contexts.
Return success with data body.

6.2.3.2. The browsingContext.navigate Command

The browsingContext.navigate command navigates a browsing context to the given URL.

Command Type

BrowsingContextNavigateCommand = {
  method: "browsingContext.navigate",
  params: BrowsingContextNavigateParameters
}
BrowsingContextNavigateParameters = {
  context: BrowsingContext,
  url: text,
  ?wait: ReadinessState,
}
 ReadinessState = "none" / "interactive" / "complete"

Return Type

BrowsingContextNavigateResult = {
    navigation: Navigation / null,
    url: text,
}

The remote end steps with session and command parameters are:

Let context id be the value of the context field of command parameters.
Let context be the result of trying to get a browsing context with context id.
Assert: context is not null.
Let wait condition be the value of the wait field of command parameters if present, or " none " otherwise.
Let url be the value of the url field of command parameters.
Let document be context ’s active document .
Let base be document ’s base URL .
Let url record be the result of applying the URL parser to url, with base URL base.
If url record is failure, return error with error code invalid argument .
Let request be a new request whose URL is url record.
Return the result of await a navigation with context, request and wait condition.
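
As an illustration of the command shape, a local end might build a browsingContext.navigate message roughly as sketched below. The helper names are hypothetical, and the top-level id field is an assumption about the message envelope, which is defined outside this section.

```python
from itertools import count
from typing import Optional

READINESS_STATES = ("none", "interactive", "complete")

# Assumption: each command carries a numeric id assigned by the local end.
_command_ids = count(1)


def make_command(method: str, params: dict) -> dict:
    return {"id": next(_command_ids), "method": method, "params": params}


def navigate_command(context: str, url: str, wait: Optional[str] = None) -> dict:
    if wait is not None and wait not in READINESS_STATES:
        raise ValueError(f"wait must be one of {READINESS_STATES}")
    params = {"context": context, "url": url}
    # The wait field is optional; when absent the remote end uses "none".
    if wait is not None:
        params["wait"] = wait
    return make_command("browsingContext.navigate", params)
```

Because the remote end resolves url against the base URL of the active document, a relative value such as "/next" is acceptable in the url field.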

6.2.3.3. The browsingContext.reload Command

The browsingContext.reload command reloads a browsing context.

Command Type

BrowsingContextReloadCommand = {
  method: "browsingContext.reload",
  params: BrowsingContextReloadParameters
}
BrowsingContextReloadParameters = {
  context: BrowsingContext,
  ?ignoreCache: bool,
  ?wait: ReadinessState,
}

The remote end steps with session and command parameters are:

Let context id be the value of the context field of command parameters.
Let context be the result of trying to get a browsing context with context id.
Assert: context is not null.
Let ignore cache be the value of the ignoreCache field of command parameters if present, or false otherwise.
Let wait condition be the value of the wait field of command parameters if present, or " none " otherwise.
Let document be context ’s active document .
Let url be document ’s URL .
Let request be a new request whose URL is url.
Return the result of await a navigation with context, request, wait condition, history handling " reload ", and ignore cache ignore cache.

6.2.4. Events

6.2.4.1. The browsingContext.contextCreated Event

Event Type

 BrowsingContextCreatedEvent = {
  method: "browsingContext.contextCreated",
  params: BrowsingContextInfo
}

To Recursively emit context created events given session and context:

Emit a context created event with session and context.
For each child browsing context, child, of context:
1. Recursively emit context created events given session and child.

To Emit a context created event given session and context:

Let params be the result of get the browsing context info given context, 0, and 1.
Let body be a map matching the BrowsingContextCreatedEvent production, with the params field set to params.
Emit an event with session and body.

The remote end event trigger is:

When the create a new browsing context algorithm is invoked, after the active document of the browsing context is set, queue a WebDriver task to run the following steps:

Let context be the newly created browsing context.
Let related browsing contexts be a set containing context.
For each session in the set of sessions for which an event is enabled given " browsingContext.contextCreated " and related browsing contexts:
1. Emit a context created event given session and context.

The remote end subscribe steps , with subscribe priority 1, given session, contexts and include global are:

For each context in contexts:
1. Recursively emit context created events given session and context.
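
The recursive emission performed on subscription can be sketched as a pre-order walk over the context tree. This is an illustrative sketch only: contexts are modelled as hypothetical nested dictionaries, and emit is any callable that receives the event body.

```python
from typing import Callable, Optional


def emit_context_created(
    emit: Callable[[dict], None], context: dict, parent_id: Optional[str]
) -> None:
    # Mirrors get the browsing context info with depth 0 and max depth 1:
    # the parent field is present and the children field is null.
    emit({
        "method": "browsingContext.contextCreated",
        "params": {
            "context": context["id"],
            "parent": parent_id,
            "url": context["url"],
            "children": None,
        },
    })


def recursively_emit_context_created(
    emit: Callable[[dict], None], context: dict, parent_id: Optional[str] = None
) -> None:
    # Parents are announced before their children.
    emit_context_created(emit, context, parent_id)
    for child in context.get("children", []):
        recursively_emit_context_created(emit, child, context["id"])
```

Passing events.append as emit collects the events in the order a newly subscribed local end would receive them.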

6.2.4.2. The browsingContext.contextDestroyed Event

Event Type

 BrowsingContextDestroyedEvent = {
  method: "browsingContext.contextDestroyed",
  params: BrowsingContextInfo
}

The remote end event trigger is:

Define the following browsing context tree discarded steps:

Queue a WebDriver task to run the following steps:
1. Let context be the browsing context being discarded.
2. Let params be the result of get the browsing context info , given context, 0, and 0.
3. Let body be a map matching the BrowsingContextDestroyedEvent production, with the params field set to params.
4. Let related browsing contexts be a set containing the parent browsing context of context, if that is not null, or an empty set otherwise.
5. For each session in the set of sessions for which an event is enabled given " browsingContext.contextDestroyed " and related browsing contexts:
  1. Emit an event with session and body.

the way this hooks into HTML feels very fragile. See https://github.com/whatwg/html/issues/6194

It’s unclear if we ought to only fire this event for browsing contexts that have active documents; navigation can also cause contexts to become inaccessible but not yet get discarded because bfcache.

6.2.4.3. The browsingContext.navigationStarted Event

Event Type

 BrowsingContextNavigationStartedEvent = {
  method: "browsingContext.navigationStarted",
  params: NavigationInfo
}

The remote end event trigger is the WebDriver BiDi navigation started steps given context and navigation status:

Queue a WebDriver task to run the following steps:
1. Let params be the result of get the navigation info given context and navigation status.
2. Let body be a map matching the BrowsingContextNavigationStartedEvent production, with the params field set to params.
3. Let navigation id be navigation status ’s id.
4. Let related browsing contexts be a set containing context.
5. Resume with " navigation started ", navigation id, and navigation status.
6. For each session in the set of sessions for which an event is enabled given " browsingContext.navigationStarted " and related browsing contexts:
  1. Emit an event with session and body.

6.2.4.4. The browsingContext.fragmentNavigated Event

Event Type

 BrowsingContextFragmentNavigatedEvent = {
  method: "browsingContext.fragmentNavigated",
  params: NavigationInfo
}

The remote end event trigger is the WebDriver BiDi fragment navigated steps given context and navigation status:

Queue a WebDriver task to run the following steps:
1. Let params be the result of get the navigation info given context and navigation status.
2. Let body be a map matching the BrowsingContextFragmentNavigatedEvent production, with the params field set to params.
3. Let navigation id be navigation status ’s id.
4. Let related browsing contexts be a set containing context.
5. Resume with " fragment navigated ", navigation id, and navigation status.
6. For each session in the set of sessions for which an event is enabled given " browsingContext.fragmentNavigated " and related browsing contexts:
  1. Emit an event with session and body.

6.2.4.5. The browsingContext.domContentLoaded Event

Event Type

 BrowsingContextDomContentLoadedEvent = {
  method: "browsingContext.domContentLoaded",
  params: NavigationInfo
}

The remote end event trigger is the WebDriver BiDi DOM content loaded steps given context and navigation status:

Queue a WebDriver task to run the following steps:
1. Let params be the result of get the navigation info given context and navigation status.
2. Let body be a map matching the BrowsingContextDomContentLoadedEvent production, with the params field set to params.
3. Let related browsing contexts be a set containing context.
4. Let navigation id be navigation status ’s id.
5. Resume with " domContentLoaded ", navigation id, and navigation status.
6. For each session in the set of sessions for which an event is enabled given " browsingContext.domContentLoaded " and related browsing contexts:
  1. Emit an event with session and body.

6.2.4.6. The browsingContext.load Event

Event Type

 BrowsingContextLoadEvent = {
  method: "browsingContext.load",
  params: NavigationInfo
}

The remote end event trigger is the WebDriver BiDi load complete steps given context and navigation status:

Queue a WebDriver task to run the following steps:
1. Let params be the result of get the navigation info given context and navigation status.
2. Let body be a map matching the BrowsingContextLoadEvent production, with the params field set to params.
3. Let related browsing contexts be a set containing context.
4. Let navigation id be navigation status ’s id.
5. Resume with " load ", navigation id and navigation status.
6. For each session in the set of sessions for which an event is enabled given " browsingContext.load " and related browsing contexts:
  1. Emit an event with session and body.

6.2.4.7. The browsingContext.downloadWillBegin Event

Event Type

 BrowsingContextDownloadWillBegin = {
  method: "browsingContext.downloadWillBegin",
  params: NavigationInfo
}

The remote end event trigger is the WebDriver BiDi download started steps given context and navigation status:

Queue a WebDriver task to run the following steps:
1. Let params be the result of get the navigation info given context and navigation status.
2. Let body be a map matching the BrowsingContextDownloadWillBegin production, with the params field set to params.
3. Let navigation id be navigation status ’s id.
4. Let related browsing contexts be a set containing context.
5. Resume with " download started ", navigation id, and navigation status.
6. For each session in the set of sessions for which an event is enabled given " browsingContext.downloadWillBegin " and related browsing contexts:
  1. Emit an event with session and body.

6.2.4.8. The browsingContext.navigationAborted Event

Event Type

 BrowsingContextNavigationAborted = {
  method: "browsingContext.navigationAborted",
  params: NavigationInfo
}

The remote end event trigger is the WebDriver BiDi navigation aborted steps given context and navigation status:

Queue a WebDriver task to run the following steps:
1. Let params be the result of get the navigation info given context and navigation status.
2. Let body be a map matching the BrowsingContextNavigationAborted production, with the params field set to params.
3. Let navigation id be navigation status ’s id.
4. Let related browsing contexts be a set containing context.
5. Resume with " navigation aborted ", navigation id, and navigation status.
6. For each session in the set of sessions for which an event is enabled given " browsingContext.navigationAborted " and related browsing contexts:
  1. Emit an event with session and body.

6.2.4.9. The browsingContext.navigationFailed Event

Event Type

 BrowsingContextNavigationFailed = {
  method: "browsingContext.navigationFailed",
  params: NavigationInfo
}

The remote end event trigger is the WebDriver BiDi navigation failed steps given context and navigation status:

Queue a WebDriver task to run the following steps:
1. Let params be the result of get the navigation info given context and navigation status.
2. Let body be a map matching the BrowsingContextNavigationFailed production, with the params field set to params.
3. Let navigation id be navigation status ’s id.
4. Let related browsing contexts be a set containing context.
5. Resume with " navigation failed ", navigation id, and navigation status.
6. For each session in the set of sessions for which an event is enabled given " browsingContext.navigationFailed " and related browsing contexts:
  1. Emit an event with session and body.
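
This example is non-normative. A local end might route the navigation events defined in this section by their method field. The sketch below assumes events arrive as JSON text; the EventRouter class and its handler signature are hypothetical and are not defined by this specification.

```python
import json
from collections import defaultdict


class EventRouter:
    """Hypothetical local-end helper: routes events by their `method` field."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, method, handler):
        # Register a callback for one event name, e.g. "browsingContext.load".
        self._handlers[method].append(handler)

    def dispatch(self, raw_message):
        # Decode the message and pass `params` to every matching handler.
        message = json.loads(raw_message)
        handlers = self._handlers.get(message.get("method"), [])
        for handler in handlers:
            handler(message.get("params"))
        return len(handlers)


router = EventRouter()
seen = []
router.on("browsingContext.load", lambda params: seen.append(("load", params)))
router.on("browsingContext.navigationFailed",
          lambda params: seen.append(("failed", params)))

# `params` is a NavigationInfo map; it is left empty here because this
# sketch does not depend on its fields.
router.dispatch(json.dumps({"method": "browsingContext.load", "params": {}}))
```

Events with no registered handler are ignored, so a local end only needs handlers for the events it has subscribed to.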

6.3. The script Module

The script module contains commands and events relating to script realms and execution.

6.3.1. Definition

Remote end definition

ScriptCommand = (ScriptGetRealmsCommand)

Local end definition

ScriptResult = (ScriptGetRealmsResult)
ScriptEvent = (
    ScriptRealmCreatedEvent //
    ScriptRealmDestroyedEvent
)

6.3.2. Types

6.3.2.1. The script.Realm type

Remote end definition and local end definition

Realm = text

Each realm has an associated realm id , which is a string uniquely identifying that realm. This is implicitly set when the realm is created.

6.3.2.2. The script.RealmInfo type

Local end definition

RealmInfo = {
  realm: Realm,
  type: RealmType,
  origin: text
}
RealmType = "window" / "dedicated-worker" / "shared-worker" / "service-worker" / "worker" / "paint-worklet" / "audio-worklet" / "worklet" / text

The RealmInfo type represents the properties of a realm.

To get the realm info given environment settings:

Let realm be environment settings’ realm execution context’s Realm component.
Let realm id be the realm id for realm.
Run the steps under the first matching condition:
The global object specified by environment settings is a Window object
1. Let type be " window ".
The global object specified by environment settings is a DedicatedWorkerGlobalScope object
1. Let type be " dedicated-worker ".
The global object specified by environment settings is a SharedWorkerGlobalScope object
1. Let type be " shared-worker ".
The global object specified by environment settings is a ServiceWorkerGlobalScope object
1. Let type be " service-worker ".
The global object specified by environment settings is a WorkerGlobalScope object
1. Let type be " worker ".
The global object specified by environment settings is a PaintWorkletGlobalScope object
1. Let type be " paint-worklet ".
The global object specified by environment settings is an AudioWorkletGlobalScope object
1. Let type be " audio-worklet ".
The global object specified by environment settings is a WorkletGlobalScope object
1. Let type be " worklet ".
Otherwise:
1. Return null.
Let origin be the serialization of an origin given environment settings ’s origin.
Let realm info be a map matching the RealmInfo production, with the realm field set to realm id, the type field set to type and the origin field set to origin.
Return realm info.
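
This example is non-normative. The order of the conditions above matters, because a dedicated worker global is also a WorkerGlobalScope. The sketch below models the global object interfaces as plain Python classes, which is an assumption made purely for illustration, and picks the first matching entry.

```python
# Stand-in classes mirroring the interface inheritance of the global objects.
class Window: pass
class WorkerGlobalScope: pass
class DedicatedWorkerGlobalScope(WorkerGlobalScope): pass
class SharedWorkerGlobalScope(WorkerGlobalScope): pass
class ServiceWorkerGlobalScope(WorkerGlobalScope): pass
class WorkletGlobalScope: pass
class PaintWorkletGlobalScope(WorkletGlobalScope): pass
class AudioWorkletGlobalScope(WorkletGlobalScope): pass

# Ordered from most specific to least specific, as in the steps above.
REALM_TYPES = [
    (Window, "window"),
    (DedicatedWorkerGlobalScope, "dedicated-worker"),
    (SharedWorkerGlobalScope, "shared-worker"),
    (ServiceWorkerGlobalScope, "service-worker"),
    (WorkerGlobalScope, "worker"),
    (PaintWorkletGlobalScope, "paint-worklet"),
    (AudioWorkletGlobalScope, "audio-worklet"),
    (WorkletGlobalScope, "worklet"),
]


def realm_type(global_object):
    """Return the RealmType string for a global, or None for unknown kinds."""
    for interface, name in REALM_TYPES:
        if isinstance(global_object, interface):
            return name
    return None
```

If the generic WorkerGlobalScope entry came first, every worker would be reported as " worker ", which is why the specific interfaces are tested before it.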

We currently don’t provide information about realms of unknown types. That might be a problem for e.g. extension-related realms.

Note: Future variations of this specification will retain the invariant that the last component of the type name after splitting on " - " will always be " worker " for globals implementing WorkerGlobalScope, and " worklet " for globals implementing WorkletGlobalScope.
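
This example is non-normative. A local end could rely on the invariant in the note above to group realm types without knowing every specific name. The function name realm_family is hypothetical.

```python
def realm_family(realm_type):
    """Return the last "-"-separated component of a RealmType string."""
    return realm_type.split("-")[-1]


families = {name: realm_family(name)
            for name in ("window", "dedicated-worker", "worker", "audio-worklet")}
```

A type with no " - " in it, such as " window ", is its own family.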

6.3.3. Commands

6.3.3.1. The script.getRealms Command

The script.getRealms command returns a list of all realms, optionally filtered to realms of a specific type, or to the realm associated with the document currently loaded in a specified browsing context .

Command Type

ScriptGetRealmsCommand = {
  method: "script.getRealms",
  params: GetRealmsParameters
}
GetRealmsParameters = {
  ?context: BrowsingContext,
  ?type: RealmType,
}

Return Type

RealmInfoList = [* RealmInfo]
ScriptGetRealmsResult = {
  realms: RealmInfoList
}
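
This example is non-normative. The sketch below shows one way a local end might build a script.getRealms command as a map. The id field is an assumption taken from the general command framing defined elsewhere in this specification, and the realm id and origin in the sample result are made-up values.

```python
import json


def make_get_realms_command(command_id, context=None, realm_type=None):
    """Build a script.getRealms command; both parameters are optional."""
    params = {}
    if context is not None:
        params["context"] = context
    if realm_type is not None:
        params["type"] = realm_type
    return {"id": command_id, "method": "script.getRealms", "params": params}


command = make_get_realms_command(1, realm_type="window")
encoded = json.dumps(command, sort_keys=True)

# Shape of a ScriptGetRealmsResult; the values are illustrative only.
example_result = {
    "realms": [
        {"realm": "realm-1", "type": "window", "origin": "https://example.com"}
    ]
}
```

Omitting both parameters yields an empty params map, which asks for every realm.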

The remote end steps with session and command parameters are:

Let environment settings be a list of all the environment settings objects that have their execution ready flag set.
If command parameters contains context:
1. Let context be the result of trying to get a browsing context with command parameters [" context "].
2. Let document be context ’s active document .
3. Let context environment settings be a list.
4. For each settings of environment settings:
  1. If any of the following conditions hold:
    - The responsible document of settings is document
    - The global object specified by settings is a WorkerGlobalScope with document in its owner set
    Append settings to context environment settings.
5. Set environment settings to context environment settings.
Let realms be a list.
For each settings of environment settings:
1. Let realm info be the result of get the realm info given settings.
2. If realm info is null, continue.
3. If command parameters contains type and realm info [" type "] is not equal to command parameters [" type "], continue.
4. Append realm info to realms.
Let body be a map matching the ScriptGetRealmsResult production, with the realms field set to realms.
Return success with data body.
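
This example is non-normative. The filtering part of the steps above can be sketched as follows, with realm infos represented as Python dicts and null as None. The function name is hypothetical, and the sketch skips null entries before reading their type.

```python
def filter_realms(realm_infos, realm_type=None):
    """Collect non-null realm infos, optionally keeping one type only."""
    realms = []
    for realm_info in realm_infos:
        if realm_info is None:
            # Realms of unknown type produce no realm info.
            continue
        if realm_type is not None and realm_info["type"] != realm_type:
            continue
        realms.append(realm_info)
    return {"realms": realms}


candidates = [
    {"realm": "r1", "type": "window", "origin": "https://example.com"},
    None,
    {"realm": "r2", "type": "dedicated-worker", "origin": "https://example.com"},
]
only_windows = filter_realms(candidates, realm_type="window")
```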

Extend this to also allow realm parents e.g. for nested workers? Or get all ancestor workers.

We might want to have a more sophisticated filter system than just a literal match.

6.3.4. Events

6.3.4.1. The script.realmCreated Event

Event Type

 ScriptRealmCreatedEvent = {
  method: "script.realmCreated",
  params: RealmInfo
}

The remote end event trigger is:

When any of the set up a window environment settings object , set up a worker environment settings object or set up a worklet environment settings object algorithms are invoked, immediately prior to returning the settings object, queue a WebDriver task to run the following steps:

Let environment settings be the newly created environment settings object .
Let realm info be the result of get the realm info given environment settings.
If realm info is null, return.
Let related browsing contexts be the result of get related browsing contexts given environment settings.
Let body be a map matching the ScriptRealmCreatedEvent production, with the params field set to realm info.
For each session in the set of sessions for which an event is enabled given " script.realmCreated " and related browsing contexts:
1. Emit an event with session and body.

The remote end subscribe steps with subscribe priority 2, given session, contexts and include global are:

Let environment settings be a list of all the environment settings objects that have their execution ready flag set.
For each settings of environment settings:
1. Let related browsing contexts be a new set.
2. If the responsible document of settings is a Document :
  1. Let context be settings ’s responsible document ’s browsing context ’s top-level browsing context .
  2. If context is not in contexts, continue.
  3. Append context to related browsing contexts.
  Otherwise, if include global is false, continue.
3. Let realm info be the result of get the realm info given settings.
4. Let body be a map matching the ScriptRealmCreatedEvent production, with the params field set to realm info.
5. If event is enabled given session, " script.realmCreated " and related browsing contexts:
  1. Emit an event with session and body.

Should the order here be better defined?

6.3.4.2. The script.realmDestroyed Event

Event Type

RealmDestroyedParameters = {
  realm: Realm
}
ScriptRealmDestroyedEvent = {
  method: "script.realmDestroyed",
  params: RealmDestroyedParameters
}

The remote end event trigger is:

Define the following unloading document cleanup steps with document:

Queue a WebDriver task to run the following steps:
1. Let related browsing contexts be an empty set.
2. Append document ’s browsing context to related browsing contexts.
3. For each worklet global scope in document ’s worklet global scopes :
  1. Let realm be worklet global scope ’s relevant Realm .
  2. Let realm id be the realm id for realm.
  3. Let params be a map matching the RealmDestroyedParameters production, with the realm field set to realm id.
  4. Let body be a map matching the ScriptRealmDestroyedEvent production, with the params field set to params.
  5. For each session in the set of sessions for which an event is enabled given " script.realmDestroyed " and related browsing contexts:
    1. Emit an event with session and body.
4. Let environment settings be the environment settings object whose responsible document is document.
5. Let realm be environment settings’ realm execution context’s Realm component.
6. Let realm id be the realm id for realm.
7. Let params be a map matching the RealmDestroyedParameters production, with the realm field set to realm id.
8. Let body be a map matching the ScriptRealmDestroyedEvent production, with the params field set to params.
9. For each session in the set of sessions for which an event is enabled given " script.realmDestroyed " and related browsing contexts:
  1. Emit an event with session and body.

Whenever a worker event loop event loop is destroyed, either because the worker comes to the end of its lifecycle, or prematurely via the terminate a worker algorithm, queue a WebDriver task to run the following steps:

Let environment settings be the environment settings object for which event loop is the responsible event loop .
Let related browsing contexts be the result of get related browsing contexts given environment settings.
Let realm be environment settings ’s environment settings object’s Realm .
Let realm id be the realm id for realm.
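Let params be a map matching the RealmDestroyedParameters production, with the realm field set to realm id.
Let body be a map matching the ScriptRealmDestroyedEvent production, with the params field set to params.
For each session in the set of sessions for which an event is enabled given " script.realmDestroyed " and related browsing contexts:
1. Emit an event with session and body.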
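
This example is non-normative. The event body built in both sets of steps above has the same shape, sketched here as a Python dict. The helper name and the realm id value are hypothetical.

```python
def realm_destroyed_event(realm_id):
    """Build the body of a script.realmDestroyed event for one realm."""
    params = {"realm": realm_id}
    return {"method": "script.realmDestroyed", "params": params}


body = realm_destroyed_event("realm-1")
```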

6.4. Log

The log module contains functionality and events related to logging.

A BiDi Session has a log event buffer which is a map from browsing context id to a list of log events for that context that have not been emitted. User agents may impose a maximum size on this buffer, subject to the condition that if events A and B happen in the same context with A occurring before B, and both are added to the buffer, the entry for B must not be removed before the entry for A.
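
This example is non-normative. One simple policy that satisfies the ordering condition is to cap each context's list and always discard the oldest entry first. The sketch below assumes a per-context limit, which is only one of several possible ways to bound the buffer, and the function name is hypothetical.

```python
from collections import deque


def make_context_log_list(max_size):
    """Create a bounded list that evicts its oldest entry when full."""
    # A deque with maxlen drops items from the opposite (oldest) end on append.
    return deque(maxlen=max_size)


entries = make_context_log_list(2)
for event in ["A", "B", "C"]:
    entries.append(event)
```

Here A is evicted to make room for C, so no later entry is ever removed before an earlier one.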

To buffer a log event given session, contexts and event:

Let buffer be session ’s log event buffer .
Let context ids be a new list.
For each context of contexts:
1. Append the browsing context id for context to context ids.
For each context id in context ids:
1. Let other contexts be an empty list.
2. For each other id in context ids:
  1. If other id is not equal to context id, append other id to other contexts.
3. If buffer does not contain context id, let buffer [ context id ] be a new list.
4. Append ( event, other contexts ) to buffer [ context id ].

Note: we store the other contexts here so that each event is only emitted once. In practice this is only relevant for workers that can be associated with multiple browsing contexts.
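
This example is non-normative. The buffering steps can be sketched with a Python dict standing in for the log event buffer and strings standing in for browsing context ids. The function name and sample values are hypothetical.

```python
def buffer_log_event(buffer, context_ids, event):
    """Store `event` once per context, remembering the other contexts."""
    for context_id in context_ids:
        other_contexts = [other for other in context_ids if other != context_id]
        # Create the per-context list on first use, then append the pair.
        buffer.setdefault(context_id, []).append((event, other_contexts))


log_buffer = {}
buffer_log_event(log_buffer, ["ctx-a", "ctx-b"], "event-1")
buffer_log_event(log_buffer, ["ctx-a"], "event-2")
```

Each stored pair names the other contexts holding the same event, which is what later allows the event to be emitted only once.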

Do we want to key this on browsing context or top-level browsing context? The difference is in what happens if an event occurs in a frame and that frame is then navigated before the local end subscribes to log events for the top level context.

6.4.1. Definition

Local end definition

LogEvent = (
  LogEntryAddedEvent
)

6.4.2. Types

6.4.2.1. log.LogEntry

LogLevel = "debug" / "info" / "warning" / "error"
LogEntry = (
  GenericLogEntry //
  ConsoleLogEntry //
  JavascriptLogEntry
)
BaseLogEntry = {
  level: LogLevel,
  text: text / null,
  timestamp: int,
  ?stackTrace: [*StackFrame],
}
GenericLogEntry = {
  BaseLogEntry,
  type: text,
}
ConsoleLogEntry = {
  BaseLogEntry,
  type: "console",
  method: text,
  realm: Realm,
  args: [*RemoteValue],
}
JavascriptLogEntry = {
  BaseLogEntry,
  type: "javascript",
}

Each log event is represented by a LogEntry object. This has a type property which represents the type of log entry added, a level property representing severity, a text property with the log message string itself, and a timestamp property corresponding to the time the log entry was generated. Specific variants of the LogEntry are used to represent logs from different sources, and provide additional fields specific to the entry type.
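
This example is non-normative. The sketch below shows a ConsoleLogEntry as a Python dict together with a small check of the BaseLogEntry fields. The checker is hypothetical and not part of the protocol, and the timestamp, realm and text values are illustrative only.

```python
LOG_LEVELS = {"debug", "info", "warning", "error"}


def check_base_log_entry(entry):
    """Return a list of problems found in the common fields of `entry`."""
    problems = []
    if entry.get("level") not in LOG_LEVELS:
        problems.append("level must be one of the LogLevel strings")
    if "text" not in entry or not isinstance(entry["text"], (str, type(None))):
        problems.append("text must be a string or null")
    timestamp = entry.get("timestamp")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(timestamp, int) or isinstance(timestamp, bool):
        problems.append("timestamp must be an integer")
    if not isinstance(entry.get("type"), str):
        problems.append("type must be a string")
    return problems


console_entry = {
    "type": "console",
    "level": "info",
    "text": "hello",
    "timestamp": 1632355200000,  # illustrative value
    "method": "log",
    "realm": "realm-1",
    "args": [],  # serialized RemoteValue items would appear here
}
```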

6.4.2.2. log.StackFrame

StackFrame = {
  url: text,
  functionName: text,
  lineNumber: int,
  columnNumber: int,
}

A frame in a stacktrace is represented by a StackFrame object. This has a url property, which represents the URL of the script, a functionName property which represents the name of the executing function, and lineNumber and columnNumber properties, which represent the line and column number of the executed code.

The current stack trace is a representation of the stack of the running execution context . The details of this are unspecified, and so the behaviour here is implementation defined, but the general process is as follows:

Let stack trace be a new list.
For each stack frame frame in the stack of the running execution context, starting from the most recently executed frame, run the following steps:
1. Let url be the result of running the URL serializer , given the URL of frame ’s associated script resource.
2. Let functionName be the name of frame ’s associated function.
3. Let lineNumber and columnNumber be the one-based line and zero-based column numbers, respectively, of the location in frame ’s associated script resource corresponding to frame.
4. Let frame info be a new map matching the StackFrame production, with the url field set to url, the functionName field set to functionName, the lineNumber field set to lineNumber and the columnNumber field set to columnNumber.
5. Append frame info to stack trace.
Return stack trace.
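
This example is non-normative. As a loose analogy only, the same process can be sketched for Python's own call stack using the standard traceback module. A file name stands in for the script URL, and the column falls back to 0 when the interpreter provides none; neither reflects how a user agent obtains these values.

```python
import traceback


def current_stack_trace():
    """Return StackFrame-like maps for the caller's stack, newest first."""
    # extract_stack() lists frames oldest first; drop this helper's own frame.
    frames = traceback.extract_stack()[:-1]
    stack_trace = []
    for frame in reversed(frames):
        stack_trace.append({
            "url": frame.filename,
            "functionName": frame.name,
            "lineNumber": frame.lineno,
            # Column data is often unavailable for live frames; use 0 then.
            "columnNumber": getattr(frame, "colno", None) or 0,
        })
    return stack_trace


def outer():
    return current_stack_trace()


trace = outer()
```

The first entry describes outer, the most recently executed frame, matching the ordering used in the steps above.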

6.4.3. Events

6.4.3.1. entryAdded

Event Type

 LogEntryAddedEvent = {
  method: "log.entryAdded",
  params: LogEntry,
}

The remote end event trigger is:

Define the following console steps with method, args, and options:

Queue a WebDriver task to run the following steps:
1. If method is " error " or " assert ", let level be " error ". If method is " debug " or " trace ", let level be " debug ". If method is " warn " or " warning ", let level be " warning ". Otherwise let level be " info ".
2. Let timestamp be a time value representing the current date and time in UTC.
3. Let text be an empty string.
4. If Type ( args [0] ) is String, and args [0] contains a formatting specifier , let formatted args be Formatter ( args ). Otherwise let formatted args be args.
  
  This is underdefined in the console spec, so it’s unclear if we can get interoperable behaviour here.
5. For each arg in formatted args:
  1. If arg is not the first entry in formatted args, append a U+0020 SPACE to text.
  2. If arg is a primitive value , append ToString ( arg ) to text. Otherwise append an implementation-defined string to text.
6. Let serialized args be a new list.
7. For each arg of args, append the result of serialize as a remote value given arg, null, true, and an empty set to serialized args.
8. Let realm be the realm id of the current Realm Record .
9. If method is " assert ", " error ", " trace ", or " warn ", let stack be the current stack trace . Otherwise let stack be null.
10. Let entry be a map matching the ConsoleLogEntry production, with the level field set to level, the text field set to text, the timestamp field set to timestamp, the stackTrace field set to stack if stack is not null, or omitted otherwise, the method field set to method, the realm field set to realm and the args field set to serialized args.
11. Let body be a map matching the LogEntryAddedEvent production, with the params field set to entry.
12. Let settings be the current settings object.
13. Let related browsing contexts be the result of get related browsing contexts given settings.
14. For each session in active BiDi sessions :
  1. If event is enabled with session, " log.entryAdded " and related browsing contexts, emit an event with session and body.
    
    Otherwise, buffer a log event with session, related browsing contexts, and body.
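
This example is non-normative. The level mapping, text construction and stack trace condition from the console steps can be sketched as below. The function names are hypothetical, Python values stand in for JavaScript values, and the placeholder used for non-primitive arguments is one arbitrary choice of implementation-defined string.

```python
def console_log_level(method):
    """Map a console method name to a LogLevel string."""
    if method in ("error", "assert"):
        return "error"
    if method in ("debug", "trace"):
        return "debug"
    if method in ("warn", "warning"):
        return "warning"
    return "info"


def console_text(formatted_args):
    """Join arguments with single spaces, stringifying primitive values."""
    parts = []
    for arg in formatted_args:
        if arg is None or isinstance(arg, (str, int, float)):
            parts.append(str(arg))
        else:
            parts.append("[object]")  # arbitrary implementation-defined string
    return " ".join(parts)


def wants_stack_trace(method):
    """Only these methods attach the current stack trace to the entry."""
    return method in ("assert", "error", "trace", "warn")
```

Note that " trace " maps to the " debug " level yet still carries a stack trace, so the two decisions are independent.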

Define the following error reporting steps with arguments script, line number, column number, message and handled:

Queue a WebDriver task to run the following steps:
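1. If handled is true, return.
2. Let settings be script ’s settings object .
3. Let timestamp be a time value representing the current date and time in UTC.
4. Let stack be the current stack trace for the exception.
5. Let entry be a map matching the JavascriptLogEntry production, with the level field set to " error ", the text field set to message, the timestamp field set to timestamp, and the stackTrace field set to stack.
6. Let body be a map matching the LogEntryAddedEvent production, with the params field set to entry.
7. Let related browsing contexts be the result of get related browsing contexts given settings.
8. For each session in active BiDi sessions :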
  1. If event is enabled with session, " log.entryAdded " and related browsing contexts, emit an event with session and body.
    
    Otherwise, buffer a log event with session, related browsing contexts, and body.

Lots more things require logging. CDP has LogEntryAdded types xml, javascript, network, storage, appcache, rendering, security, deprecation, worker, violation, intervention, recommendation, other. These are in addition to the js exception and console API types that are represented by different methods.

Allow implementation-defined log types

The remote end subscribe steps , with subscribe priority 10, given session, contexts and include global are:

For each context id → events in session ’s log event buffer :
1. Let maybe context be the result of getting a browsing context given context id.
2. If maybe context is an error , remove context id from log event buffer and continue.
3. Let context be maybe context ’s data.
4. Let top level context be context ’s top-level browsing context .
5. If include global is true and top level context is not in contexts, or if include global is false and top level context is in contexts:
  1. For each ( event, other contexts ) in events:
    1. Emit an event with session and event.
    2. For each other context id in other contexts:
      1. If log event buffer contains other context id, remove event from log event buffer [ other context id ].
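
This example is non-normative. The sketch below illustrates how the stored other contexts keep a buffered event from being emitted more than once. The buffer is a dict from context id to a list of ( event, other contexts ) pairs. Resolving ids to browsing contexts is omitted, and the decision of step 5 is passed in as the hypothetical should_emit callback.

```python
def flush_log_buffer(buffer, should_emit, emit):
    """Emit buffered events, removing copies held under other contexts."""
    for context_id in list(buffer.keys()):
        if not should_emit(context_id):
            continue
        for event, other_contexts in list(buffer[context_id]):
            emit(event)
            for other_id in other_contexts:
                if other_id in buffer:
                    # Drop the same event object from the other context's list.
                    buffer[other_id] = [
                        entry for entry in buffer[other_id]
                        if entry[0] is not event
                    ]


shared_event = {"method": "log.entryAdded"}
log_buffer = {
    "ctx-a": [(shared_event, ["ctx-b"])],
    "ctx-b": [(shared_event, ["ctx-a"])],
}
emitted = []
flush_log_buffer(log_buffer, lambda context_id: True, emitted.append)
```

The event is emitted while processing the first context and removed from the second, so it reaches the session once. Like the steps above, the sketch does not clear the entries it has emitted.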

7. Patches to Other Specifications

This specification requires some changes to external specifications to provide the necessary integration points. It is assumed that these patches will be committed to the other specifications as part of the standards process.

7.1. HTML

The a browsing context is discarded algorithm is modified to read as follows:

To discard a browsing context browsingContext, run these steps:

If this is not a recursive invocation of this algorithm, call any browsing context tree discarded steps defined in other applicable specifications with browsingContext.
Discard all Document objects for all the entries in browsingContext ’s session history .
If browsingContext is a top-level browsing context , then remove a browsing context browsingContext.

The actual patch might be better to split the algorithm into an outer algorithm that is called by external callers and an inner algorithm that’s used for recursive calls. That’s quite hard to express as a patch to the specification since it requires changing multiple parts.

The report an error algorithm is modified with an additional step at the end:

Call any error reporting steps defined in external specifications with script, line, col, message, and true if the error is handled , or false otherwise.

7.2. Console

Other specifications can define console steps . When any method of the console interface is called, with method name method and argument args:

If that method does not call the Printer operation, call any console steps defined in external specifications with arguments method, args, and undefined.

Otherwise, at the point when the Printer operation is called with arguments name, printerArgs and options (which is undefined if the argument is not provided), call any console steps defined in external specifications with arguments name, printerArgs, and options.

WebDriver BiDi

Editor’s Draft, 23 September 2021
