Open Screen Protocol

Add short names to Presentation API spec, so that BS autolinking works as designed.

Can autolinks to HTML51 be automatically generated?

Status of this document

This specification was published by the Second Screen Community Group. It is not a W3C Standard nor is it on the W3C Standards Track. It should not be viewed as a stable specification, and may change in substantial ways at any time. A future version of this document will be published as a Community Group Report.

Please note that under the W3C Community Contributor License Agreement (CLA) there is a limited opt-out and other conditions apply.

Learn more about W3C Community and Business Groups.

1. Introduction

The Open Screen Protocol connects browsers to devices capable of rendering Web content for a shared audience. Typically, these are devices like Internet-connected TVs, HDMI dongles, or "smart" speakers.

The protocol is a suite of subsidiary network protocols that enable two user agents to implement the Presentation API and Remote Playback API in an interoperable fashion. This means that a user can expect these APIs to work as intended when connecting two devices from independent implementations of the Open Screen Protocol.

The Open Screen Protocol is a specific implementation of these two APIs, meaning that it does not handle all possible ways that browsers and presentation displays could support these APIs. The Open Screen Protocol specifically supports browsers and displays that are connected via the same local area network, and that initiate presentation or remote playback by sending a URL from the browser to the target display.

The Open Screen Protocol is intended to be extensible, so that additional capabilities can be added over time. This may include new implementations of existing APIs, or new APIs.

1.1. Terminology

We use the term "agent" to mean any implementation of this protocol (browser, device, or otherwise), acting as a controller or a receiver.

We borrow terminology from the Presentation API. We call the agent that is used to discover and initiate presentation of Web content on another device the controller (or controlling user agent when it is a browser). We call the agent on the device rendering the Web content the receiver or presentation display (or receiving user agent when it is a browser). Presentation display availability refers to whether or not a receiver is compatible with a presentation request URL. However, in the Presentation API, a "controller" refers to a specific browsing context within the browser, whereas here the "controller" refers to the browser itself, although it may be acting on behalf of a browsing context.

We borrow terminology from the Remote Playback API. The agent responsible for rendering media of a remote playback is called the remote playback device. In this document, we also refer to it as the receiver because it is shorter and keeps terminology consistent between presentations and remote playbacks. Similarly, we use the term "controller" (referred to as the "user agent" in the Remote Playback API) to refer to the agent that starts, terminates, and controls the remote playback.

For media streaming, we refer to the agent sending media as the media sender and the agent receiving the media as the media receiver. Note that a media receiver may or may not be a receiver or controller as defined by the Presentation API or Remote Playback API. Also note that an agent may be both a sender and receiver.

For additional terms and idioms specific to the Presentation API or Remote Playback API, please consult the respective specifications.

Receiver/Controller/Agent terminology. <https://github.com/webscreens/openscreenprotocol/issues/144>

2. Requirements

2.1. Presentation API Requirements

A controlling user agent must be able to discover the presence of a presentation display connected to the same IPv4 or IPv6 subnet and reachable by IP multicast.
A controlling user agent must be able to obtain the IPv4 or IPv6 address of the display, a friendly name for the display, and an IP port number for establishing a network transport to the display.
A controlling user agent must be able to determine if the receiver is reasonably capable of rendering a specific presentation request URL.
A controlling user agent must be able to start a new presentation on a receiver given a presentation request URL and presentation ID.
A controlling user agent must be able to create a new PresentationConnection to an existing presentation on the receiver, given its presentation request URL and presentation ID.
It must be possible to close a PresentationConnection between a controller and a presentation, and signal both parties with the reason why the connection was closed.
Multiple controllers must be able to connect to a single presentation simultaneously, possibly from one or more controlling user agents.
Messages sent by the controller must be delivered to the presentation (or vice versa) in a reliable and in-order fashion.
If a message cannot be delivered, then the controlling user agent must be able to signal the receiver (or vice versa) that the connection should be closed with reason error.
The controller and presentation must be able to send and receive DOMString messages (represented as string type in ECMAScript).
The controller and presentation must be able to send and receive binary messages (represented as Blob objects in HTML5, or ArrayBuffer or ArrayBufferView types in ECMAScript).
The controlling user agent must be able to signal to the receiver to terminate a presentation, given its presentation request URL and presentation ID.
The receiver must be able to signal all connected controlling user agents when a presentation is terminated.

2.2. Remote Playback API Requirements

A controlling user agent must be able to find out whether there is at least one compatible remote playback device available for a given HTMLMediaElement, both instantaneously and continuously.
A controlling user agent must be able to initiate remote playback of an HTMLMediaElement to a compatible remote playback device.
The controlling user agent must be able to send media sources as URLs and text tracks from an HTMLMediaElement to a compatible remote playback device.
During remote playback, the controlling user agent and the remote playback device must be able to synchronize the media element state of the HTMLMediaElement.
During remote playback, either the controlling user agent or the remote playback device must be able to disconnect from the other party.
The controlling user agent should be able to pass locale and text direction information to the remote playback device to assist in rendering text during remote playback.

2.3. Non-Functional Requirements

It should be possible to implement an Open Screen presentation display using modest hardware requirements, similar to what is found in a low end smartphone, smart TV or streaming device. See the Device Specifications document for expected presentation display hardware specifications.
It should be possible to implement an Open Screen controlling user agent on a low-end smartphone. See the Device Specifications document for expected controlling user agent hardware specifications.
The discovery and connection protocols should minimize power consumption, especially on the controlling user agent which is likely to be battery powered.
The protocol should minimize the amount of information provided to a passive network observer about the identity of the user, activity on the controlling user agent and activity on the receiver.
The protocol should prevent passive network eavesdroppers from learning presentation URLs, presentation IDs, or the content of presentation messages passed between controllers and presentations.
The protocol should prevent active network attackers from impersonating a display and observing or altering data intended for the controller or presentation.
The controlling user agent should be able to discover quickly when a presentation display becomes available or unavailable (i.e., when it connects or disconnects from the network).
The controlling user agent should present sensible information to the user when a protocol operation fails. For example, if a controlling user agent is unable to start a presentation, it should be possible to report in the controlling user agent interface if it was a network error, authentication error, or the presentation content failed to load.
The controlling user agent should be able to remember authenticated presentation displays. This means it is not required for the user to intervene and re-authenticate each time the controlling user agent connects to a pre-authenticated display.
Message latency between the controller and a presentation should be minimized to permit interactive use. For example, it should be comfortable to type in a form in the controller and have the text appear in the presentation in real time. Real-time latency for gaming or mouse use is ideal, but not a requirement.
The controlling user agent initiating a presentation should communicate its preferred locale to the receiver, so it can render the presentation content in that locale.
It should be possible to extend the control protocol (above the discovery and transport levels) with optional features not defined explicitly by the specification, to facilitate experimentation and enhancement of the base APIs.

3. Discovery with mDNS

Agents may discover one another using DNS-SD over mDNS. To do so, agents must use the service name "_openscreen._udp.local".

Define suspend and resume behavior for discovery protocol. <https://github.com/webscreens/openscreenprotocol/issues/107>

Advertising agents must use an instance name that is a prefix of the agent’s display name. If the instance name is not the complete display name (that is, if it has been truncated), it must be terminated by a null character. It is a prefix so that the name displayed to the user pre-verification can be verified later. It is terminated by a null character in the case of truncation so that the listening agent knows it has been truncated. This complexity is necessary to allow for display names that exceed the size allowed in an instance name and for such (possibly truncated) display names to be visible to the user sooner (before a QUIC connection is made). Listening agents must treat instance names as unverified and must verify that the instance name is a prefix of the verified display name before showing the user a verified display name.
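As a non-normative illustration, the truncation and verification rules above could be sketched as follows. The 63-byte limit (the size of a DNS label) and both function names are assumptions made for this example. Requiring an exact match when the instance name has no trailing null character is this sketch's reading of the rule, not something the text states outright.

```python
# Assumed limit: a DNS label holds at most 63 bytes.
MAX_INSTANCE_NAME_BYTES = 63


def make_instance_name(display_name: str) -> str:
    """Return the display name, or a null-terminated prefix if it is too long."""
    encoded = display_name.encode("utf-8")
    if len(encoded) <= MAX_INSTANCE_NAME_BYTES:
        return display_name
    # Reserve one byte for the terminating null character, and drop any
    # UTF-8 sequence that was cut in half by the byte-level truncation.
    prefix = encoded[: MAX_INSTANCE_NAME_BYTES - 1].decode("utf-8", errors="ignore")
    return prefix + "\x00"


def instance_name_is_consistent(instance_name: str, verified_display_name: str) -> bool:
    """Check an unverified instance name against a verified display name."""
    if instance_name.endswith("\x00"):
        # Truncated name: everything before the null must be a prefix.
        return verified_display_name.startswith(instance_name[:-1])
    # Untruncated name: assumed here to be the complete display name.
    return instance_name == verified_display_name
```

A listening agent would call the second function only after authentication, once the display name from the agent-info-response can be trusted.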

Advertising agents must include DNS TXT records with the following keys and values:

fp: The certificate fingerprint of the advertising agent. The format of the fingerprint is defined by RFC 8122 section 5, excluding the "fingerprint:" prefix and including the hash function, space, and hex-encoded fingerprint. The fingerprint value also functions as an ID for the agent. All agents must support the following hash functions: "sha-256", "sha-512". Agents must not support the following hash functions: "md2", "md5".

Include cross references to the specs for these hash functions.

mv: An unsigned integer value that indicates that metadata has changed. Whenever its metadata changes, the advertising agent must update this value to a greater value. This signals to the listening agent that it should connect to the advertising agent to discover updated metadata.
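The following non-normative sketch shows one way the two TXT values could be produced. It assumes the certificate is available as DER bytes, that the fingerprint bytes are written as colon-separated uppercase hex pairs in the style of RFC 8122, and that mv is written as a decimal string; the function names are hypothetical.

```python
import hashlib

# Hash functions that all agents must support, per the text above.
SUPPORTED_HASHES = {"sha-256": hashlib.sha256, "sha-512": hashlib.sha512}


def fp_txt_value(cert_der: bytes, hash_name: str = "sha-256") -> str:
    """Format '<hash function> <hex fingerprint>' without the 'fingerprint:' prefix."""
    digest = SUPPORTED_HASHES[hash_name](cert_der).digest()
    return hash_name + " " + ":".join(f"{byte:02X}" for byte in digest)


def txt_records(cert_der: bytes, metadata_version: int) -> dict:
    """Build the TXT key/value pairs for an advertising agent."""
    if metadata_version < 0:
        raise ValueError("mv is an unsigned integer")
    # Assumption: mv is serialized as a decimal string.
    return {"fp": fp_txt_value(cert_der), "mv": str(metadata_version)}
```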

Add examples of sample mDNS records.

Future extensions to this QUIC-based protocol can use the same metadata discovery process to indicate support for those extensions, through a capabilities mechanism to be determined. If a future version of the Open Screen Protocol uses mDNS but breaks compatibility with the metadata discovery process, it should change the DNS-SD service name to a new value, indicating a new mechanism for metadata discovery.

4. Transport and metadata discovery with QUIC

If an agent wants to connect to or learn further metadata about another agent, it initiates a [QUIC] connection to the IP and port from the SRV record. Prior to authentication, a message may be exchanged (such as further metadata), but such info should be treated as unverified (such as indicating to a user that a display name of an unauthenticated agent is unverified).

To learn further metadata, an agent may send an agent-info-request message (see Appendix A: Messages) and receive back an agent-info-response message. Any agent may send this request to learn about the capabilities of another device.

The agent-info-response message contains the following properties:

display-name (required): The display name of the agent, intended to be displayed to a user by the requester. The requester should indicate through the UI if the responder is not authenticated or if the display name changes.
model-name (optional): If the agent is a hardware device, the model name of the device. This is used mainly for debugging purposes, but may be displayed to the user of the requesting agent.
receives-audio (optional): The agent has to indicate that it supports audio. If false or not included, it is assumed audio content is not supported.
receives-video (optional): The agent has to indicate that it supports video. If false or not included, it is assumed video content is not supported.
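A requester might interpret a decoded agent-info-response along the following lines. This is a sketch only: it assumes the CBOR message has already been decoded into a map keyed by the property names listed above (the actual key encoding is defined in Appendix A), and the AgentInfo type and parse_agent_info function are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class AgentInfo:
    display_name: str
    model_name: Optional[str] = None
    receives_audio: bool = False
    receives_video: bool = False


def parse_agent_info(decoded: Mapping[str, Any]) -> AgentInfo:
    """Apply the required/optional rules to an already-decoded response."""
    if "display-name" not in decoded:
        raise ValueError("display-name is required")
    return AgentInfo(
        display_name=decoded["display-name"],
        model_name=decoded.get("model-name"),
        # A missing capability is treated the same as false.
        receives_audio=bool(decoded.get("receives-audio", False)),
        receives_video=bool(decoded.get("receives-video", False)),
    )
```

Keys that the parser does not know about are simply not read, so a response carrying extra properties is still accepted.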

Listening agents act as QUIC clients. Advertising agents act as QUIC servers.

If a listening agent wishes to receive messages from an advertising agent or an advertising agent wishes to send messages to a listening agent, it may wish to keep the QUIC connection alive. Once neither side needs to keep the connection alive for the purposes of sending or receiving messages, the connection should be closed with an error code of 5139. In order to keep a QUIC connection alive, an agent may send an agent-status-request message, and any agent that receives an agent-status-request message should send an agent-status-response message. Such messages should be sent more frequently than the QUIC idle_timeout transport parameter (see section 18 of [QUIC]) and QUIC PING frames should not be used. An idle_timeout transport parameter of 25 seconds is recommended. The agent should behave as though a timer less than the idle_timeout were reset every time a message is sent on a QUIC stream. If the timer expires, an agent-status-request message should be sent.
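The timer behavior described above can be sketched as a small helper that is driven by the caller's clock. The class name and the 5-second safety margin below the idle_timeout are assumptions for illustration; the text only requires that the timer be shorter than the idle_timeout.

```python
from typing import Optional


class KeepAliveTimer:
    """Tracks when an agent-status-request is due on an otherwise idle connection."""

    def __init__(self, idle_timeout: float = 25.0, margin: float = 5.0) -> None:
        if not 0 < margin < idle_timeout:
            raise ValueError("margin must be positive and below idle_timeout")
        # The timer interval must be shorter than the QUIC idle_timeout.
        self.interval = idle_timeout - margin
        self.deadline: Optional[float] = None

    def on_message_sent(self, now: float) -> None:
        """Reset the timer whenever any message is sent on a QUIC stream."""
        self.deadline = now + self.interval

    def should_send_status_request(self, now: float) -> bool:
        """Return True once the timer has expired without a reset."""
        return self.deadline is not None and now >= self.deadline
```

Sending the agent-status-request is itself a message on a QUIC stream, so the caller would invoke on_message_sent again right after sending it.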

If a client agent wishes to send messages to a server agent, the client agent can connect to the server agent "on demand"; it does not need to keep the connection alive.

Define suspend and resume behavior for connection protocol. <https://github.com/webscreens/openscreenprotocol/issues/108>

The agent-info-response and agent-status-response messages may be extended to include additional information not defined in this spec. If done ad-hoc by applications and not in future specs, keys should be chosen to avoid collision, such as by choosing large integers or long strings. Agents must ignore keys in the agent-info-response message that they do not understand, to allow agents to easily extend this message.

5. Message delivery using CBOR and QUIC streams

Messages are serialized using CBOR. To send a group of messages in order, that group of messages must be sent in one QUIC stream. Independent groups of messages (with no ordering dependency across groups) should be sent in different QUIC streams. In order to put multiple CBOR-serialized messages into the same QUIC stream, the following is used.

For each message, the sender must write to the QUIC stream the following:

A type key representing the type of the message, encoded as a variable-length integer (see Appendix A: Messages for type keys)
The message encoded as CBOR.

If an agent receives a message for which it does not recognize a type key, it must close the QUIC connection with an application error code of 404 and should include the unknown type key in the reason phrase (see QUIC transport section 19.4).

Variable-length integers are encoded in the same format as defined by QUIC transport section 16.
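To make the framing concrete, here is a non-normative sketch of the QUIC variable-length integer format (the two most significant bits of the first byte select a 1, 2, 4, or 8 byte encoding) and of prefixing a message with its type key. The CBOR body is treated as opaque bytes produced elsewhere, and the function names are hypothetical.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Encode an integer below 2**62 as a QUIC variable-length integer."""
    if value < 0:
        raise ValueError("value must be non-negative")
    if value < 2**6:
        return value.to_bytes(1, "big")
    if value < 2**14:
        return (value | 0x4000).to_bytes(2, "big")
    if value < 2**30:
        return (value | 0x80000000).to_bytes(4, "big")
    if value < 2**62:
        return (value | 0xC000000000000000).to_bytes(8, "big")
    raise ValueError("value does not fit in 62 bits")


def decode_varint(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode a varint at offset; return (value, offset just past it)."""
    if offset >= len(data):
        raise ValueError("no data at offset")
    length = 1 << (data[offset] >> 6)  # 1, 2, 4, or 8 bytes
    if offset + length > len(data):
        raise ValueError("truncated varint")
    value = data[offset] & 0x3F
    for byte in data[offset + 1 : offset + length]:
        value = (value << 8) | byte
    return value, offset + length


def frame_message(type_key: int, cbor_body: bytes) -> bytes:
    """Write the type key followed by the CBOR-encoded message."""
    return encode_varint(type_key) + cbor_body
```

A reader of the stream would call decode_varint first, look up the type key, and then hand the remaining bytes to a CBOR decoder.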

Many messages are requests and responses, so a common format is defined for those. A request and a response includes a request ID which is an unsigned integer chosen by the requester. Responses must include the request ID of the request they are associated with.

Clarify scoping/uniqueness of request IDs. <https://github.com/webscreens/openscreenprotocol/issues/139>

6. Authentication

Each supported authentication method is implemented via authentication messages specific to that method. The authentication method is explicitly specified by the message itself. The authentication status message is common for all authentication methods. Any new authentication method added must define new authentication messages. The default authentication method is a challenge-response authentication with auth-request-hkdf-scrypt-psk and auth-response-hkdf-scrypt-psk-result.

Prior to authentication, agents exchange auth-capabilities messages specifying pre-shared key (PSK) ease of input for the user and supported PSK input methods. The agent with the lowest PSK ease of input presents a PSK to the user when the agent either sends or receives an authentication request. In case both agents have the same PSK ease of input value, the server presents the PSK to the user. The same pre-shared key is used by both agents to issue an authentication request.

PSK ease of input is an integer in the range from 0 to 100 inclusive, where 0 means it is not possible for the user to input PSK on this device and 100 means that it’s easy for the user to input PSK on the device. Supported PSK input methods are numeric and scanning a QR-code. Devices with non-zero PSK ease of input must support the numeric PSK input method.
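The rule for deciding which agent presents the PSK can be sketched as below. The "client" and "server" labels refer to the QUIC roles; the function name is hypothetical.

```python
def psk_presenter(client_ease: int, server_ease: int) -> str:
    """Return which agent presents the PSK, given each agent's ease of input."""
    for ease in (client_ease, server_ease):
        if not 0 <= ease <= 100:
            raise ValueError("PSK ease of input must be in the range 0 to 100")
    # The agent with the lowest ease of input presents the PSK, so that the
    # user types it on the device where input is easier. Ties go to the server.
    return "client" if client_ease < server_ease else "server"
```

For example, a phone (ease 90) connecting to a TV with only a remote control (ease 10) leads to the TV presenting the PSK and the user entering it on the phone.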

In order for one agent (the challenger) to authenticate another (the responder), the challenger may send an authentication-request message and expect an authentication-response message to be sent back from the responder. To mutually authenticate, this mechanism is used twice, once by each side acting as the challenger. This mechanism assumes the agents share a low-entropy secret, such as a number or a short password that could be entered by a user on a keyboard or TV remote control.

For all messages and objects defined in this section, see Appendix A for the full CDDL definitions.

The challenger sends an auth-request-hkdf-scrypt-psk message with the following values:

salt: 32 random bytes. This salt is used in HKDF, so see https://tools.ietf.org/html/rfc5869#section-3.1 for more details on how this value should be generated.
cost: log base 2 of the cost parameter (N) for scrypt defined in RFC 7914 section 2. It must be greater than or equal to 14 (to avoid being too weak) and less than or equal to 128 (the limit defined by scrypt). A value of 15 is recommended (an scrypt N of 2^15 or 32768).

The responder replies with an auth-response-hkdf-scrypt-psk-result message with the following values:

result: Whether the responder was able to calculate proof of possession of the shared secret and, if it failed, why it failed.
proof: The result of running the authentication mechanism. The steps for hkdf-of-scrypt-of-psk are described below.

The challenger verifies the proof and sends the responder an auth-status message with the following values:

result: Whether the challenger was able to authenticate the responder and, if not, why not.

The challenger must limit the time the responder has to send a response to 60 seconds (to avoid the possibility of brute-force attacks).

For hkdf-of-scrypt-of-psk, the proof is calculated using the following steps:

Let secret be the pre-shared secret.
Let N be 2 to the power of the cost from the authentication-request message.
Let r be 8.
Let p be 1.
Let keyLength be 32.
Let scryptResult be the result of running scrypt on secret with cost parameter N, block size r, parallelization parameter p, and derived key length of keyLength.
Let hashFunction be sha-256.
Let salt be the salt from the authentication-request message.
Let info be a 64-byte array containing certificate fingerprint pair with the following values:

Bytes 0-31 of the array are the challenger’s fingerprint: The result of running sha-256 on the Distinguished Encoding Rules (DER) form (see https://tools.ietf.org/html/rfc8122#section-5) of the certificate used by the challenger in the QUIC crypto handshake during connection establishment.
Bytes 32-63 of the array are the responder’s fingerprint: The result of running sha-256 on the Distinguished Encoding Rules (DER) form (see https://tools.ietf.org/html/rfc8122#section-5) of the certificate used by the responder in the QUIC crypto handshake during connection establishment.

Let proof be the result of running HKDF on scryptResult with both the extract and expand steps, hash function hashFunction, application-specific info, and output key length keyLength.

To verify that the responder’s proof is correct, the challenger makes the same calculation of the proof and compares the result. If the results are the same, the challenger considers the responder authenticated, and considers it unauthenticated otherwise.
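The steps for hkdf-of-scrypt-of-psk can be sketched with the Python standard library as follows. Two points are assumptions of this sketch rather than statements of the specification: the steps above do not name a salt for scrypt itself, so the salt from the request is reused there, and the same salt is passed to the HKDF extract step. The function names are hypothetical, and certificates are taken as DER bytes.

```python
import hashlib
import hmac

KEY_LENGTH = 32  # output size of sha-256


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF (RFC 5869) with both the extract and expand steps, using sha-256."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()  # extract
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def compute_proof(secret: bytes, salt: bytes, cost: int,
                  challenger_cert_der: bytes, responder_cert_der: bytes) -> bytes:
    """Compute the hkdf-of-scrypt-of-psk proof from the request values."""
    if not 14 <= cost <= 128:
        raise ValueError("cost must be between 14 and 128")
    n, r, p = 1 << cost, 8, 1
    # Headroom for scrypt's working memory at the recommended cost of 15;
    # hashlib may reject the much larger values needed for very high costs.
    maxmem = 2 * 128 * r * (n + p + 2)
    # Assumption: the request salt is also used as the scrypt salt.
    scrypt_result = hashlib.scrypt(secret, salt=salt, n=n, r=r, p=p,
                                   maxmem=maxmem, dklen=KEY_LENGTH)
    # info is the 64-byte fingerprint pair: challenger first, then responder.
    info = (hashlib.sha256(challenger_cert_der).digest()
            + hashlib.sha256(responder_cert_der).digest())
    return hkdf_sha256(scrypt_result, salt, info, KEY_LENGTH)


def proof_matches(expected: bytes, received: bytes) -> bool:
    """Compare proofs in constant time."""
    return hmac.compare_digest(expected, received)
```

Because info binds both certificate fingerprints in a fixed order, a proof computed for one challenger and responder pair does not verify if the roles are swapped.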

Note: the values of 32 above (for salt length, keyLength) are based on the output size of sha-256. If a different hash mechanism is used in the future, these values should be updated as well.

7. Control Protocols

7.1. Presentation Protocol

This section defines the use of the Open Screen Protocol for starting, terminating, and controlling presentations as defined by the Presentation API. §7.2 Presentation API defines how APIs in the Presentation API map to the protocol messages defined in this section.

For all messages defined in this section, see Appendix A: Messages for the full CDDL definitions.

Add a capability that indicates support for the presentation protocol. <https://github.com/webscreens/openscreenprotocol/issues/123>

Refinements to Presentation API protocol. <https://github.com/webscreens/openscreenprotocol/issues/160>

To learn which receivers are available presentation displays for a particular URL or set of URLs, the controller may send a presentation-url-availability-request message with the following values:

urls: A list of presentation URLs. Must not be empty.
watch-duration: The period of time that the controller is interested in receiving updates about their URLs, should the availability change.
watch-id: An identifier the receiver must use when sending updates about URL availability so that the controller knows which URLs the receiver is referring to. The controller must choose a value that is unique across all presentation URL availability watches to the same receiver.

Watch ID Uniqueness. <https://github.com/webscreens/openscreenprotocol/issues/145>

In response, the receiver should send one presentation-url-availability-response message with the following values:

url-availabilities: A list of URL availability states. Each state must correspond to the matching URL from the request by list index.

While the watch is valid (the watch-duration has not expired), the receiver should send presentation-url-availability-event messages when URL availabilities change. Such events contain the following values:

watch-id: The watch-id given in the presentation-url-availability-request, used to refer to the presentation URLs whose availability has changed.
url-availabilities: A list of URL availability states. Each state must correspond to the URLs from the request referred to by the watch-id.
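On the controller side, the bookkeeping implied by watch-id and index-matched url-availabilities could look like this sketch. Messages are shown as plain maps keyed by the value names above, the availability state strings are placeholders (the real values are defined in Appendix A), and the class name is hypothetical.

```python
from typing import Dict, List


class AvailabilityWatches:
    """Remembers which URLs each watch-id refers to and their latest state."""

    def __init__(self) -> None:
        self._urls_by_watch: Dict[int, List[str]] = {}
        self._next_watch_id = 1
        self.availability: Dict[str, str] = {}

    def new_request(self, urls: List[str], watch_duration: int) -> dict:
        """Build a request with a watch-id unique across this receiver's watches."""
        if not urls:
            raise ValueError("urls must not be empty")
        watch_id = self._next_watch_id
        self._next_watch_id += 1
        self._urls_by_watch[watch_id] = list(urls)
        return {"urls": list(urls), "watch-duration": watch_duration,
                "watch-id": watch_id}

    def apply(self, watch_id: int, url_availabilities: List[str]) -> None:
        """Apply a response or event: states match the request URLs by index."""
        urls = self._urls_by_watch[watch_id]
        if len(urls) != len(url_availabilities):
            raise ValueError("one availability state is expected per URL")
        self.availability.update(zip(urls, url_availabilities))
```

The same apply method serves both the initial response and later availability event messages, since both carry one state per URL of the original request.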

Note that these messages are not broadcasted to all controllers. They are sent individually to controllers that have requested availability for the URLs that have changed in availability state within the watch duration of the original availability request.

To save power, the controller may disconnect the QUIC connection and later reconnect to send availability requests and receive availability responses and updates. The QUIC connection ID may or may not be the same when reconnecting. Note that the lifetime of a watch-id is not limited to one QUIC connection. The receiver must continue sending updates for watches even if the QUIC connection changes, and thus the controller need not send new URL availability requests if the QUIC connection changes.

To start a presentation, the controller may send a presentation-start-request message to the receiver with the following values:

presentation-id: The presentation identifier
url: The selected presentation URL
headers: headers that the receiver should use to fetch the presentationUrl. For example, section 6.6.1 of the Presentation API says that the Accept-Language header should be provided.

The presentation ID must follow the restrictions defined by section 6.1 of the Presentation API, in that it must consist of at least 16 ASCII characters.
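A minimal check for the restriction stated above, together with one possible way to generate a conforming ID, is sketched below. Using secrets.token_urlsafe is a choice made for this example, not something the protocol requires, and both function names are hypothetical.

```python
import secrets


def is_valid_presentation_id(presentation_id: str) -> bool:
    """At least 16 characters, all of them ASCII."""
    return len(presentation_id) >= 16 and presentation_id.isascii()


def new_presentation_id() -> str:
    """Generate a random ID from 16 bytes, encoded as URL-safe ASCII text."""
    return secrets.token_urlsafe(16)
```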

When the receiver receives the presentation-start-request, it should send back a presentation-start-response message after either the presentation URL has been fetched and loaded, or the receiver has failed to do so. If it has failed, it must respond with the appropriate result (such as invalid-url or timeout). If it has succeeded, it must reply with a success result. Additionally, the response must include the following:

connection-id: An ID that both agents can use to send connection messages to each other. It is chosen by the receiver for ease of implementation: if the message receiver chooses the connection-id, it may keep the ID unique across connections, thus making message demuxing/routing easier.

To send a presentation message, the controller or receiver may send a presentation-connection-message with the following values:

connection-id: The ID from the presentation-start-response or presentation-connection-open-response messages.
message: The presentation message data.
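The routing benefit of receiver-chosen connection IDs can be illustrated with a small registry. This is a sketch under assumptions: messages are shown as maps keyed by the value names above, handlers are plain callables, and the class name is hypothetical.

```python
from typing import Callable, Dict


class ConnectionRegistry:
    """Receiver-side allocation of connection-ids and routing of messages."""

    def __init__(self) -> None:
        self._next_id = 1
        self._handlers: Dict[int, Callable[[object], None]] = {}

    def open(self, handler: Callable[[object], None]) -> int:
        """Allocate a connection-id that is unique across all connections."""
        connection_id = self._next_id
        self._next_id += 1
        self._handlers[connection_id] = handler
        return connection_id

    def deliver(self, message: dict) -> bool:
        """Route a presentation-connection-message by its connection-id."""
        handler = self._handlers.get(message["connection-id"])
        if handler is None:
            return False  # unknown or already closed connection
        handler(message["message"])
        return True

    def close(self, connection_id: int) -> None:
        """Forget a connection so later messages for it are not delivered."""
        self._handlers.pop(connection_id, None)
```

Because the receiver hands out the IDs, a single lookup is enough to find the target connection regardless of which controller sent the message.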

To terminate a presentation, the controller may send a presentation-termination-request message with the following values:

presentation-id: The ID of the presentation to terminate.
reason: The reason the presentation is being terminated.

When a receiver receives a presentation-termination-request, it should send back a presentation-termination-response message to the requesting controller. It should also notify other controllers about the termination by sending a presentation-termination-event message. It may also send the same message if it terminates a presentation without a request from a controller to do so. This message contains the following values:

presentation-id: The ID of the presentation that was terminated.
reason: The reason the presentation was terminated.

To accept incoming connection requests from a controller, a receiver must receive and process the presentation-connection-open-request message, which contains the following values:

presentation-id: The ID of the presentation to connect to.
url: The URL of the presentation to connect to.

The receiver should, upon receipt of a presentation-connection-open-request message, send back a presentation-connection-open-response message which contains the following values:

result: a code indicating success or failure, and the reason for the failure
connection-id: An ID that both agents can use to send connection messages to each other. It is chosen by the receiver for ease of implementation (if the message receiver chooses the connection-id, it may keep the ID unique across connections, thus making message demuxing/routing easier).

A controller may terminate a connection without terminating the presentation by sending a presentation-connection-close-request message with the following values:

connection-id: The ID of the connection to close.

Is a Presentation close/terminate from a controller a request/response or event? <https://github.com/webscreens/openscreenprotocol/issues/124>

The receiver should, upon receipt of a presentation-connection-close-request, send back a presentation-connection-close-response message with the following values:

result: Whether the close succeeded or failed and, if it failed, why it failed.

Remove presentation-connection-close-response message. <https://github.com/webscreens/openscreenprotocol/issues/138>

The receiver may also close a connection without a request from the controller to do so and without terminating a presentation. If it does so, it should send a presentation-connection-close-event to the controller with the following values:

connection-id: The ID of the connection that was closed
reason: The reason the connection was closed
error-message: A debug message suitable for a log or perhaps presented to the user with more explanation as to why it was closed.

7.2. Presentation API

This section defines how the Presentation API uses the §7.1 Presentation Protocol.

When section 6.4.2 says "This list of presentation displays ... is populated based on an implementation specific discovery mechanism", the controlling user agent may use the mDNS, QUIC, agent-info-request, and presentation-url-availability-request messages defined previously in this spec to discover receivers.

When section 6.4.2 says "To further save power, ... implementation specific discovery of presentation displays can be resumed or suspended.", the controlling user agent may use the power saving mechanism defined in the previous section.

When section 6.3.4 says "Using an implementation specific mechanism, tell U to create a receiving browsing context with D, presentationUrl, and I as parameters.", U (the controlling user agent) may send a presentation-start-request message to D (the receiver), with I for the presentation identifier and presentationUrl for the selected presentation URL.

Once the Presentation API has text about reconnecting via an implementation specific mechanism, quote that here and map it to a message.

When section 6.5.2 says "Using an implementation specific mechanism, transmit the contents of messageOrData as the presentation message data and messageType as the presentation message type to the destination browsing context", the controlling user agent may send a presentation-connection-message with messageOrData for the presentation message data. Note that the messageType is embedded in the encoded CBOR type and does not need an additional value in the message.

When section 6.5.6 says "Send a termination request for the presentation to its receiving user agent using an implementation specific mechanism", the controlling user agent may send a presentation-termination-request message.

When section 6.7.1 says "it MUST listen to and accept incoming connection requests from a controlling browsing context using an implementation specific mechanism", the receiving user agent must receive and process the presentation-connection-open-request.

When section 6.7.1 says "Establish the connection between the controlling and receiving browsing contexts using an implementation specific mechanism.", the receiving user agent must send a presentation-connection-open-response message.

7.3. Remote Playback Protocol

This section defines the use of the Open Screen Protocol for starting, terminating, and controlling remote playback of media as defined by the Remote Playback API. §7.5 Remote Playback API defines how APIs in the Remote Playback API map to the protocol messages defined in this section.

For all messages defined in this section, see Appendix A for the full CDDL definitions.

Add a capability that indicates support for the remote playback protocol. <https://github.com/webscreens/openscreenprotocol/issues/123>

Make a required/default remote playback state table. <https://github.com/webscreens/openscreenprotocol/issues/148>

Refinements to Remote Playback protocol. <https://github.com/webscreens/openscreenprotocol/issues/159>

To learn which receivers are compatible remote playback devices (also called available remote playback devices) for a particular URL or set of URLs, the controller may send a remote-playback-availability-request message with the following values:

urls: A list of media resources. Must not be empty.

Remote Playback HTTP headers. <https://github.com/webscreens/openscreenprotocol/issues/146>

headers: headers that the receiver should use to fetch the urls. For example, section 6.2.4 of the Remote Playback API says that the Accept-Language header should be provided.
watch-duration: The period of time that the controller is interested in receiving updates about these URLs, should the availability change.
watch-id: An identifier the receiver must use when sending updates about URL availability so that the controller knows which URLs the receiver is referring to. The controller must choose a value that is unique across all remote playback availability watches to the same receiver.

In response, the receiver should send a remote-playback-availability-response message with the following values:

url-availabilities: A list of URL availability states. Each state must correspond to the matching URL from the request by list index.

The receiver should later (up to the current time plus the requested watch-duration) send remote-playback-availability-event messages if URL availabilities change. Such events contain the following values:

watch-id: The watch-id given in the remote-playback-availability-request, used to refer to the remote playback URLs whose availability has changed.
url-availabilities: A list of URL availability states. Each state must correspond to the URLs from the request referred to by the watch-id.
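
This example is non-normative. As a minimal sketch of the request, response, and event flow above, a controller might keep one table of watches per receiver. The class name AvailabilityWatches, its method names, the integer watch-id, and the string availability states are all hypothetical and are not defined by this specification (see Appendix A for the actual message definitions). The sketch also assumes that an event carries one state per URL of the original request, in request order.

```python
import itertools


class AvailabilityWatches:
    """Controller-side bookkeeping for one receiver (hypothetical helper)."""

    def __init__(self):
        self._next_id = itertools.count(1)  # unique across watches to this receiver
        self._urls_by_watch = {}            # watch-id -> URLs from the request
        self.availability = {}              # URL -> last reported state

    def new_request(self, urls):
        """Build the values of a remote-playback-availability-request."""
        if not urls:
            raise ValueError("urls must not be empty")
        watch_id = next(self._next_id)
        self._urls_by_watch[watch_id] = list(urls)
        return {"urls": list(urls), "watch-id": watch_id}

    def on_response(self, request, url_availabilities):
        # States correspond to the request URLs by list index.
        self._record(request["urls"], url_availabilities)

    def on_event(self, watch_id, url_availabilities):
        # Events refer back to the request URLs through the watch-id.
        self._record(self._urls_by_watch[watch_id], url_availabilities)

    def _record(self, urls, states):
        if len(urls) != len(states):
            raise ValueError("one availability state is expected per URL")
        self.availability.update(zip(urls, states))
```

A real controller would also drop a watch once its watch-duration has elapsed; that is omitted here for brevity.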

To start remote playback, the controller may send a remote-playback-start-request message to the receiver with the following values:

remote-playback-id: An identifier that uniquely identifies the remote playback from the controller to the receiver. It does not need to be unique across all remote playbacks from that controller to all receivers, nor unique across all remote playbacks from all controllers to that receiver.
urls: The media resources that the controller has selected for playback on the receiver.
text-track-urls: URLs of text tracks associated with the media resources.
controls: Initial controls for modifying the initial state of the remote playback, as defined in §7.4 Remote Playback State and Controls. The controller may send controls that are optional for the receiver to support before it knows the receiver supports them. If the receiver does not support them, it will ignore them and the controller will learn that it does not support them from the remote-playback-start-response message.

Remote playback ID uniqueness. <https://github.com/webscreens/openscreenprotocol/issues/147>

When the receiver receives a remote-playback-start-request message, it should send back a remote-playback-start-response message. It should do so quickly, usually before the media resource has been loaded, and then report the progress of loading with remote-playback-state-event messages. If the receiver decides not to attempt to load the resource at all, it must respond with the appropriate failure result (such as timeout or invalid-url). Additionally, the response must include the following:

state: The initial state of the remote playback, as defined in §7.4 Remote Playback State and Controls.

If the controller wishes to modify the state of the remote playback (for example, to pause, resume, or skip), it may send a remote-playback-modify-request message with the following values:

remote-playback-id: The ID of the remote playback to be modified.
controls: Updated controls, as defined in §7.4 Remote Playback State and Controls.

When a receiver receives a remote-playback-modify-request it should send a remote-playback-modify-response message in reply with the following values:

state: The updated state of the remote playback as defined in §7.4 Remote Playback State and Controls.

When the state of remote playback changes without a request for modification from the controller (such as when the media skips or pauses due to user interaction on the receiver), the receiver may send a remote-playback-state-event to the controller with the following values:

remote-playback-id: The ID of the remote playback whose state has changed.
state: The updated state of the remote playback, as defined in §7.4 Remote Playback State and Controls.

To terminate the remote playback, the controller may send a remote-playback-termination-request message with the following values:

remote-playback-id: The ID of the remote playback to terminate.
reason: The reason the remote playback is being terminated.

When a receiver receives a remote-playback-termination-request, it should send back a remote-playback-termination-response message to the controller.

If a receiver terminates a remote playback without a request from the controller to do so, it must send a remote-playback-termination-event message to the controller with the following values:

remote-playback-id: The ID of the remote playback that was terminated.
reason: The reason the remote playback was terminated.

As mentioned in Remote Playback API section 6.2.7, terminating the remote playback means the controller is no longer controlling the remote playback and does not necessarily stop media from rendering on the receiver. Whether or not the receiver stops rendering media depends upon the implementation of the receiver.

7.4. Remote Playback State and Controls

In order for the controller and receiver to stay in sync with regard to the state of the remote playback, the controller may send controls to modify the state (for example, via the remote-playback-modify-request message) and the receiver may send updates about state changes (for example, via the remote-playback-state-event message).

The controls sent by the controller include the following individual control values, each of which is optional. This allows the controller to change one control value or many control values at once without having to specify all control values every time. A non-present control value indicates no change. A present control value indicates the change defined below. These controls intentionally mirror settable attributes and methods of the HtmlMediaElement.

source: Change the media resource URL. See HtmlMediaElement.src for more details. Must not be used in the initial controls of the remote-playback-start-request message (which already contains a list of URLs).
preload: Set how aggressively to preload media. See HtmlMediaElement.preload for more details. Should only be used in the initial controls of the remote-playback-start-request message or when the source is changed. If not set in the initial controls, it is left to the receiver to decide. This is optional for the receiver to support and if not supported, the receiver will behave as though it were never set.
loop: Set whether or not to loop media. See HtmlMediaElement.loop for more details. Should only be used in the initial control of the remote-playback-start-request. If not set in the initial controls, it is assumed to be false.
paused: If true, pause; if false, resume. See HtmlMediaElement.pause() and HtmlMediaElement.play() for more details. If not set in the initial controls, it is left to the receiver to decide.
muted: If true, mute; if false, unmute. See HtmlMediaElement.muted for more details. If not set in the initial controls, it is left to the receiver to decide.
volume: Set the audio volume in the range from 0.0 to 1.0 inclusive. See HtmlMediaElement.volume for more details. If not set in the initial controls, it is left to the receiver to decide.
seek: Seek to a precise time. See HtmlMediaElement.currentTime for more details.
fast-seek: Seek to an approximate time as fast as possible. See HtmlMediaElement.fastSeek() for more details.
playback-rate: Set the rate at which the media plays. See HtmlMediaElement.playbackRate for more details. If not set in the initial controls, it is left to the receiver to decide. This is optional for the receiver to support and if not supported, the receiver will behave as though it were never set.
poster: Set the URL of an image to show when video data is not available. See HtmlMediaElement.poster for more details. If not set in the initial controls, no poster is used and the receiver can choose what to render when video data is unavailable. This is optional for the receiver to support and if not supported, the receiver will behave as though it were never set.
enabled-audio-track-ids: Enable included audio tracks by ID and disable all other audio tracks. See HtmlMediaElement.audioTracks for more details.
select-video-track-id: Select the given video track by ID and unselect all other video tracks. See HtmlMediaElement.videoTracks for more details.
added-text-tracks: Add text tracks with the given kinds, labels, and languages. See HtmlMediaElement.addTextTrack for more details. This is optional for the receiver to support and if not supported, the receiver will behave as though it were never set.
changed-text-tracks: Change text tracks by ID. All other text tracks are left unchanged. Set the mode, add cues, and remove cues by ID. See HtmlMediaElement.textTracks for more details. Note that future specifications or extensions to this specification are expected to add new properties to the text-track-cue (such as text size, alignment, position, etc.). Adding and removing cues is optional for the receiver to support and if not supported, the receiver will behave as though no cues were added or removed (both adding and removing are indicated via the support for "added-cues"). As specified in HtmlMediaElement.textTracks, if a cue ID is invalid (removing an un-added ID or adding an ID twice, for example), the receiver may reject the text track change.
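
This example is non-normative. Because a non-present control value indicates no change, a controller only needs to put the values it wants to change into the controls. The following Python sketch shows one way to build such a map and to check two of the constraints listed above. The function name make_controls, its keyword-argument convention, and the use of a plain dictionary are assumptions made for illustration; the wire format is defined in Appendix A.

```python
def make_controls(initial=False, **changes):
    """Build a controls map holding only the values that should change.

    Keyword names use underscores and are mapped to the hyphenated control
    names used in this section (hypothetical convention).
    """
    controls = {name.replace("_", "-"): value for name, value in changes.items()}
    # source must not be used in the initial controls of the
    # remote-playback-start-request, which already carries a list of URLs.
    if initial and "source" in controls:
        raise ValueError("source must not be used in the initial controls")
    volume = controls.get("volume")
    if volume is not None and not 0.0 <= volume <= 1.0:
        raise ValueError("volume must be between 0.0 and 1.0 inclusive")
    return controls
```

For instance, make_controls(paused=True) yields a map with a single entry, leaving every other control untouched on the receiver.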

Add a table for whether it’s required and what the default is.

The states sent by the receiver include the following individual state values, each of which is optional. This allows the receiver to update the controller about more than one state value at once without having to specify all state values every time. A non-present state value indicates the state has not changed.

supports: The controls the receiver supports. These may differ for different media resources and should not change unless the media resource changes. The default is empty (support for nothing) for the initial state in the remote-playback-start-response message.
source: The current media resource URL. See HtmlMediaElement.currentSrc. Must be present in the initial state in the remote-playback-start-response message.
loading: The state of network activity for loading the media resource. See HtmlMediaElement.networkState. The default is empty (NETWORK_EMPTY) for the initial state in the remote-playback-start-response message.
loaded: The state of the loaded media (whether enough is loaded to play). See HtmlMediaElement.readyState. The default is nothing (HAVE_NOTHING) for the initial state in the remote-playback-start-response message.
error: A major error occurred which prevents the remote playback from continuing. See HtmlMediaElement.error and HtmlMediaElement media error codes. The default is no error for the initial state in the remote-playback-start-response message.
epoch: The "zero time" of the media timeline. See HtmlMediaElement’s timeline offset and HtmlMediaElement.getStartDate(). The default is an unknown epoch for the initial state in the remote-playback-start-response message.
duration: The duration of the media timeline. See HtmlMediaElement.duration. The default is an unknown duration for the initial state in the remote-playback-start-response message.
buffered-time-ranges: The time ranges for which media has been buffered. See HtmlMediaElement.buffered.
played-time-ranges: The time ranges reached by the playback position during normal playback. See HtmlMediaElement.played.
seekable-time-ranges: The time ranges for which media is seekable by the controller or the receiver. See HtmlMediaElement.seekable.
position: The playback position. See HtmlMediaElement’s official playback position and HtmlMediaElement.currentTime. The default is 0 for the initial state in the remote-playback-start-response message.
playbackRate: The current rate of playback on a scale where 1.0 is "normal speed". See HtmlMediaElement.playbackRate. The default is 1.0 for the initial state in the remote-playback-start-response message.
paused: Whether media is paused or not. See HtmlMediaElement.paused. The default is false for the initial state in the remote-playback-start-response message.
seeking: Whether the receiver is seeking or not. See HtmlMediaElement.seeking. The default is false for the initial state in the remote-playback-start-response message.
stalled: If true, media is not playing because not enough media is loaded, and false otherwise. See HtmlMediaElement.stalled. The default is false for the initial state in the remote-playback-start-response message.
ended: Whether media has reached the end or not. See HtmlMediaElement.ended. The default is false for the initial state in the remote-playback-start-response message.
volume: The current volume of playback on a scale of 0.0 to 1.0. See HtmlMediaElement.volume.
muted: True if audio is muted (overriding the volume value) and false otherwise. See HtmlMediaElement.muted.
resolution: The "intrinsic width" and "intrinsic height" of the video. See HtmlMediaElement.videoWidth and HtmlMediaElement.videoHeight.
audio-tracks: The available audio tracks, which can be individually enabled or disabled. See HtmlMediaElement.audioTracks.
video-tracks: The available video tracks. Only one may be selected. See HtmlMediaElement.videoTracks.
text-tracks: The available text tracks, which can be individually shown, hidden, or disabled. See HtmlMediaElement.textTracks. The controller can also add cues to and remove cues from text tracks.
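
This example is non-normative. Since a non-present state value means "not changed", a controller can keep a local copy of the remote playback state and merge each update into it. The Python sketch below illustrates this under several assumptions: the state is modelled as a plain dictionary, the names INITIAL_DEFAULTS, initial_state, and apply_update are hypothetical, and only those defaults from the list above that have a simple literal representation are included.

```python
# Defaults that apply to the initial state in the
# remote-playback-start-response message (subset, for illustration).
INITIAL_DEFAULTS = {
    "supports": (),        # empty: support for nothing
    "position": 0,
    "playbackRate": 1.0,
    "paused": False,
    "seeking": False,
    "stalled": False,
    "ended": False,
}


def initial_state(start_response_state):
    """Fill in defaults for values missing from the initial state."""
    if "source" not in start_response_state:
        raise ValueError("source must be present in the initial state")
    return {**INITIAL_DEFAULTS, **start_response_state}


def apply_update(current, update):
    """Merge a later state update; absent values keep their old value."""
    return {**current, **update}
```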

All times, time ranges, and durations (such as position, duration, and seekable-time-ranges) used above use a common media-time value (see Appendix A) which includes a time scale. This allows time values which work on different time scales to be expressed without loss of precision. The scale is represented in hertz, such as 90000 for 90000 Hz, a common time scale for video.
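
This example is non-normative. To illustrate why carrying a time scale avoids loss of precision, the following Python sketch treats a media time as an integer count of ticks on a scale given in hertz and uses exact rational arithmetic. The function names and the (value, scale) parameter pair are assumptions made for illustration; the actual media-time definition is in Appendix A.

```python
from fractions import Fraction


def to_seconds(value, scale):
    """Convert a tick count on a scale (in hertz) to exact seconds."""
    return Fraction(value, scale)


def rescale(value, from_scale, to_scale):
    """Re-express a time on another scale, refusing to round."""
    result = Fraction(value, from_scale) * to_scale
    if result.denominator != 1:
        raise ValueError("time is not exactly representable on the target scale")
    return result.numerator
```

With a 90000 Hz scale, a frame duration of 3003 ticks is exactly 1001/30000 seconds, a value that has no exact representation in milliseconds or in binary floating point.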

7.5. Remote Playback API

This section defines how the Remote Playback API uses the messages defined in §7.3 Remote Playback Protocol.

When section 6.2.1.2 says "This list contains remote playback devices and is populated based on an implementation specific discovery mechanism" and section 6.2.1.4 says "Retrieve available remote playback devices (using an implementation specific mechanism)", the user agent may use the mDNS, QUIC, agent-info-request, and remote-playback-availability messages defined previously in this spec to discover remote playback devices. The remote-playback-availability urls must contain the availability sources set.

When section 6.2.4 says "Request connection of remote to device. The implementation of this step is specific to the user agent." and "Synchronize the current media element state with the remote playback state", the user agent may send the remote-playback-start-request message to start remote playback. The remote-playback-start-request urls must contain the remote playback source. The current Remote Playback API only allows a single source, but the protocol allows for several and future versions of Remote Playback API may allow for several.

When section 6.2.4 says "The mechanism that is used to connect the user agent with the remote playback device and play the remote playback source is an implementation choice of the user agent. The connection will likely have to provide a two-way messaging abstraction capable of carrying media commands to the remote playback device and receiving media playback state in order to keep the media element state and remote playback state in sync", the user agent may send remote-playback-modify-request messages to change the remote playback state based on changes to the local media element and receive remote-playback-modify-response and remote-playback-state-event messages to change the local media element based on changes to the remote playback state.

Algorithm for what messages to send when local/remote media element changes. <https://github.com/webscreens/openscreenprotocol/issues/158>

When section 6.2.7 says "Request disconnection of remote from the device. The implementation of this step is specific to the user agent.", the controlling user agent may send the remote-playback-termination-request message.

8. Streaming Protocol

This section defines the use of the Open Screen Protocol for streaming media from a media sender to a media receiver.

8.1. Capabilities

If the advertiser is already authenticated, the requester has the ability to request additional information by sending a streaming-capabilities-request message, and receive back a streaming-capabilities-response message with the following properties:

receive-audio (required): A list of capabilities for receiving audio. For an explanation of fields, see below.
receive-video (required): A list of capabilities for receiving video. For an explanation of fields, see below.
receive-data (required): A list of arbitrary data formats the device supports for receiving data.

The format type is used as the basis for audio, video, and data capabilities. Formats are composed of the following properties:

name (required): The name of the format. Expected values include "vp8", "h264", and "opus".
parameters (required): A list of (key, value) parameters that can be used to pass fields that are properties of a specific format, and not shared by other formats of that type (audio, video, etc.).

Audio capabilities are composed of the above format type, with the following additional fields:

max-audio-channels (optional): An optional field indicating the maximum number of audio channels the receiver is capable of supporting. Default value is 2, meaning a stereo speaker setup.
min-bit-rate (optional): An optional field indicating the minimum audio bit rate that the receiver can handle, in kilobits per second. Default is no minimum.

Video capabilities are similarly composed of the above format type, with the following additional fields:

max-resolution (optional): An optional field indicating the maximum video-resolution (width, height) that the receiver is capable of processing. Default is no maximum.
max-frames-per-second (optional): An optional field indicating the maximum frames-per-second the receiver is capable of processing. Default is no maximum.
max-pixels-per-second (optional): An optional field indicating the maximum number of pixels per second the receiver is capable of processing. Default is no maximum.
min-video-bit-rate (optional): An optional field indicating the minimum video bit rate the device is capable of processing, in kilobits per second. Default is no minimum.
aspect-ratio (optional): An optional field indicating what its ideal aspect ratio is, e.g. a 16:10 display could return this value as 1.6 to indicate its preferred content scaling. Default is none.
color-profiles (optional): An optional field indicating what color profiles are understood. The listener may use these values to determine how to encode video. Some examples include: sRGBv4, Rec709, DciP3. The default value is sRGBv4.
native-resolutions (optional): An optional field indicating what video-resolutions the receiver supports and considers to be "native," meaning that scaling is not required. The default value is none.
supports-scaling (optional): An optional boolean field indicating whether the receiver can scale content provided in a video-resolution not listed in the native-resolutions list (if provided) or of a different aspect ratio. The default value is true.
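
This example is non-normative. A sender can use the video capabilities above to decide whether a candidate stream fits within what a receiver has advertised. The Python sketch below checks the format name and the three "maximum" fields, treating an absent field as "no maximum" per the defaults above. The function name, the dictionary representation, and the (width, height) tuple used for max-resolution are assumptions made for illustration; see Appendix A for the actual encoding.

```python
def receiver_accepts_video(capability, codec, width, height, fps):
    """Return True if a stream fits within one advertised video capability."""
    if capability["name"] != codec:
        return False
    max_res = capability.get("max-resolution")  # (width, height) or absent
    if max_res is not None and (width > max_res[0] or height > max_res[1]):
        return False
    max_fps = capability.get("max-frames-per-second")
    if max_fps is not None and fps > max_fps:
        return False
    max_pps = capability.get("max-pixels-per-second")
    if max_pps is not None and width * height * fps > max_pps:
        return False
    return True
```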

8.2. Sessions

TODO

8.3. Audio

Senders may send audio to receivers by sending audio-frame messages (see Appendix A: Messages) with the following keys and values. An audio frame message contains a set of encoded audio samples for a range of time. A series of encoded audio frames that share a codec, codec parameters and a timeline form an audio encoding.

Unlike most Open Screen Protocol messages, this one uses an array-based grouping rather than a struct-based grouping. For required fields, this allows for a more efficient use of bytes on the wire, which is important for streaming audio because the payload is typically so small and every byte of overhead is relatively large. In order to accommodate optional values in the array-based grouping, one optional field in the array is used to hold all optional values in a struct-based grouping. This will hopefully provide a good balance of efficiency and flexibility.

To allow for audio frames to be sent out of order, they should be sent in separate QUIC streams.

encoding-id: Identifies the media encoding to which this audio frame belongs. This can be used to reference properties of the encoding (from the audio-encoding-offer message) such as the codec, codec properties, time scale (aka clock rate or sample rate), and default duration. Referencing properties of the encoding through the encoding id helps to avoid sending duplicate information in every frame.
start-time: Identifies the beginning of the time range of the audio frame. The time scale is inferred from the properties of the encoding (from the audio-encoding-offer). The end time can be inferred from the start time and duration.
duration: If present, the duration of the audio frame. The time scale is inferred from the properties of the encoding. Likewise, if not present, the duration is inferred from the properties of the encoding.
sync-time: If present, a time used to synchronize the start time of this audio frame (and thus, this encoding) with that of other media encodings on different timelines. It may be wall clock time, but it need not be; it can be any clock chosen by the sender.
payload: The data. The type of data is inferred from the properties of the encoding.
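
This example is non-normative. The end time of an audio frame is not sent on the wire; it follows from the start time and the duration, where a missing duration falls back to the default duration of the encoding. A minimal Python sketch, assuming frames and encodings are modelled as dictionaries and that the encoding's default is stored under the hypothetical key "default-duration":

```python
def audio_frame_end_time(frame, encoding):
    """Return the frame's end time in ticks of the encoding's time scale."""
    # If the frame carries no duration, infer it from the encoding.
    duration = frame.get("duration", encoding["default-duration"])
    return frame["start-time"] + duration
```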

8.4. Video

Senders may send video to receivers by sending video-frame messages (see Appendix A: Messages) with the following keys and values. A video frame message contains an encoded video frame (an encoded image) at a specific point in time or over a specific time range (if the duration is known). A series of encoded video frames that share a codec, codec parameters and a timeline form a video encoding.

To allow for video frames to be sent out of order, they may be sent in separate QUIC streams. If the encoding is a long chain of encoded video frames, each dependent on the previous one back to an independent frame, it may make sense to send them in a single QUIC stream starting at the independent frame and ending at the last dependent frame.

encoding-id: Identifies the media encoding to which this video frame belongs. This can be used to reference properties of the encoding such as the codec, codec properties, time scale, and default rotation. Referencing properties of the encoding through the encoding id helps to avoid sending duplicate information in every frame.
sequence-number: Identifies the frame and its order in the encoding. Within an encoding, larger sequence numbers mean later start times. Within an encoding, gaps in sequence numbers mean frames are missing.
depends-on: If present, the sequence numbers of the frames this frame depends on. If a sequence number is negative, it is treated as a relative sequence number and the actual sequence number is calculated by adding it to the sequence number of this frame. If empty, this is an independent frame (a key frame). If not present, the default value is [-1].
start-time: Identifies the beginning of the time range of the video frame. The time scale is inferred from the properties of the encoding (from the video-encoding-offer). The end time can be inferred from the start time and duration.
duration: If present, the duration of the video frame. The time scale is inferred from the properties of the encoding. If not present, that means duration is unknown.
sync-time: If present, a time used to synchronize the start time of this frame (and thus, this encoding) with that of other media encodings on different timelines.
rotation: If present, indicates how the frame should be rotated after decoding but before rendering. Rotation is clockwise in increments of 90 degrees. The default is 0 (no rotation).
payload: The encoded video frame (encoded image). The codec and codec parameters are inferred from the properties of the encoding.
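
This example is non-normative. The rules for depends-on (negative values are relative, an empty list marks a key frame, and an absent field defaults to [-1]) can be sketched in Python as follows. The function names are hypothetical, and an absent field is modelled as None.

```python
def resolve_dependencies(sequence_number, depends_on=None):
    """Return the absolute sequence numbers a frame depends on."""
    if depends_on is None:
        depends_on = [-1]  # default when the field is not present
    # Negative entries are relative to this frame's own sequence number.
    return [sequence_number + d if d < 0 else d for d in depends_on]


def is_key_frame(depends_on):
    """An explicitly empty depends-on list marks an independent frame."""
    return depends_on is not None and len(depends_on) == 0
```

Under the default, frame 10 depends on frame 9, which matches the common case of a chain of frames that each depend on their predecessor.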

8.5. Data

Senders may send timed data to receivers by sending data-frame messages (see Appendix A: Messages) with the following keys and values. A data frame message contains an arbitrary payload that can be synchronized with audio and video, such as text track data. A series of data frames that share a data type and timeline form a data encoding.

To allow for data frames to be sent out of order, they may be sent in separate QUIC streams, but more than one data frame may be sent in one QUIC stream if that makes sense for a specific type of data.

Text track data uses a payload type of text, a default duration of unknown, and a timescale of 1000000 (microseconds).

encoding-id: Identifies the media encoding to which this data frame belongs. This can be used to reference properties of the encoding such as the type of data and time scale. Referencing properties of the encoding through the encoding id helps to avoid sending duplicate information in every frame.
sequence-number: Identifies the frame and its order in the encoding. Within an encoding, larger sequence numbers mean later start times. Within an encoding, gaps in sequence numbers mean frames are missing.
start-time: Identifies the beginning of the time range of the data frame. The time scale is inferred from the properties of the encoding. The end time can be inferred from the start time and duration.
duration: If present, the duration of the data frame. The time scale is inferred from the properties of the encoding. Likewise, if not present, the duration is inferred from the properties of the encoding.
sync-time: If present, a time used to synchronize the start time of this data frame (and thus, this encoding) with that of other media encodings on different timelines.
payload: The data. The format and parameters are inferred from the properties of the encoding.

8.6. Feedback

The receiver can send feedback to the sender, such as key frame requests.

A video key frame is requested by sending a video-request message with the following keys and values.

To allow for video-request messages to be sent out of order, they may be sent in separate QUIC streams.

encoding-id: The encoding for which the sender should send a new key frame.
sequence-number: Gives the order in the encoding. Within an encoding, larger sequence numbers invalidate previous ones. A sender may ignore smaller sequence numbers after a larger one has been processed. This is to prevent out-of-order requests from generating more key frames than necessary.
highest-decoded-frame-sequence-number: If set, the sender may generate a video frame dependent on the last decoded frame. If not set, the sender must generate an independent (key) frame.
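
This example is non-normative. A sender that wants to ignore stale video-request messages can remember the largest sequence number it has processed for each encoding. The Python sketch below shows this, along with the rule for when a key frame is required. The class and function names are hypothetical, and requests are modelled as dictionaries keyed by the field names above.

```python
class VideoRequestFilter:
    """Sender-side filter that drops requests superseded by newer ones."""

    def __init__(self):
        self._highest = {}  # encoding-id -> largest sequence-number processed

    def should_process(self, encoding_id, sequence_number):
        highest = self._highest.get(encoding_id)
        # A smaller sequence number may be ignored once a larger one
        # has been processed for the same encoding.
        if highest is not None and sequence_number < highest:
            return False
        self._highest[encoding_id] = sequence_number
        return True


def must_send_key_frame(request):
    """Without a highest decoded frame, only an independent frame will do."""
    return "highest-decoded-frame-sequence-number" not in request
```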

8.7. Stats

TODO

9. Security and Privacy

The Open Screen Protocol allows two networked agents to discover each other and exchange user and application data. As such, its security and privacy considerations should be closely examined. We first evaluate the protocol itself using the W3C Security and Privacy Questionnaire. We then examine whether the security and privacy guidelines recommended by the Presentation API and the Remote Playback API are met. Finally, we discuss recommended mitigations that agents can use to meet these security and privacy requirements.

9.1. Threat Models

9.1.1. Passive Network Attackers

The Open Screen Protocol should assume that all parties that are connected to the same LAN, either through a wired connection or through WiFi, are able to observe all data flowing between Open Screen Protocol agents.

These parties will be able to collect any data exposed through unencrypted messages, such as mDNS records and the QUIC handshakes.

These parties may attempt to learn cryptographic parameters by observing data flows on the QUIC connection, or by observing cryptographic timing.

9.1.2. Active Network Attackers

Active attackers, such as compromised routers, will be able to manipulate data exchanged between agents. They can inject traffic into existing QUIC connections and attempt to initiate new QUIC connections. These abilities can be used to attempt the following:

Impersonate an agent or one already trusted by the user, in an attempt to convince the user to authenticate to it.
Connect to an agent and query its capabilities.
Connect to and control a presentation or remote playback, or extract data from the application state of the presentation or remote playback.

One particular attack of concern is misconfigured or compromised routers that expose local network devices (such as Open Screen Protocol agents) to the Internet. This vector of attack has been used by malicious parties to take control of printers and smart TVs by connecting to local network services that would normally be inaccessible from the Internet.

9.1.3. Denial of Service

Parties connected to the LAN may attempt to deny access to Open Screen Protocol agents. For example, an attacker may attempt to open a large number of QUIC connections to an agent in an attempt to block legitimate connections or exhaust the agent’s system resources. They may also multicast spurious DNS-SD records in an attempt to exhaust the cache capacity for mDNS listeners, or to get listeners to open a large number of bogus QUIC connections.

9.1.4. Same-Origin Policy Violations

The Presentation API allows cross-origin communication between controlling pages and presentations with the consent of each origin (through their use of the API). This is similar to cross-origin communication via postMessage() with a targetOrigin of *. However, the Presentation API does not convey source origin information with each message. Therefore, the Open Screen Protocol does not convey origin information between its agents.

The presentation ID carries some protection against unrestricted cross-origin access, but rigorous authentication of the parties connected by a PresentationConnection must be done at the application level.

9.2. Open Screen Protocol Security and Privacy Considerations

9.2.1. Personally Identifiable Information & High-Value Data

The following data exchanged by the protocol can be personally identifiable and/or high value data:

Presentation URLs and availability results
Presentation IDs
Presentation connection IDs
Presentation connection messages
Remote playback URLs
Remote playback commands and status messages

Presentation IDs are considered high value data because they can be used in conjunction with a Presentation URL to connect to a running presentation.

Presentation display friendly names, model names, and capabilities, while not considered personally identifiable, are important to protect to prevent an attacker from changing them or substituting other values during the discovery and authentication process.

The following data cannot be reasonably made confidential and should be considered public and untrusted data:

IP addresses and ports used by the Open Screen Protocol.
Data advertised through mDNS, including the display name prefix, the certificate fingerprint, and the metadata version.

9.2.2. Cross Origin State Considerations

Access to origin state across browsing sessions is possible through the Presentation API by reconnecting to a presentation that was started by a previous session. This scenario is addressed in Presentation API §cross-origin-access.

Presentation display availability and remote playback device availability are states that are available cross-origin depending on the user’s network context. Exposure of this data to the Web is also discussed in Presentation API §personally-identifiable-information and Remote Playback API §personally-identifiable-information.

9.2.3. Origin Access to Other Devices

By design, the Open Screen Protocol allows access to presentation displays and remote playback devices from the Web. By implementing the protocol, these devices are knowingly making themselves available to the Web and should be designed accordingly.

Below, we discuss mitigation steps to prevent malicious use of these devices.

9.2.4. Incognito Mode

The Open Screen Protocol does not distinguish between the user agent’s normal browsing and incognito modes, and agents that follow the specification behave identically regardless of which mode is in use.

It’s recommended that user agents use separate authentication contexts and QUIC connections for normal and incognito profiles from the same user agent instance. This prevents Open Screen agents from correlating activity among profiles belonging to the same user (both normal and incognito).

9.2.5. Persistent State

An agent is likely to persist the identity of agents that have successfully completed §6 Authentication. This may include the public key fingerprints, metadata versions, and metadata for those parties.

However, this data is not normally exposed to the Web, only through the native UI of the user agent during the display selection or display authentication process. It can be an implementation choice whether the user agent clears or retains this data when the user clears browsing data.

Fate of metadata / authentication history when clearing browsing data. <https://github.com/webscreens/openscreenprotocol/issues/132>

9.2.6. Other Considerations

The Open Screen Protocol does not grant to the Web additional access to the following:

New script loading mechanisms
Access to the user’s location
Access to device sensors
Access to the user’s local computing environment
Control over the user agent’s native UI
Security characteristics of the user agent

9.3. Presentation API Considerations

Presentation API §security-and-privacy-considerations places these requirements on the Open Screen Protocol:

Presentation URLs and presentation IDs should remain private among the parties that are allowed to connect to a presentation, per the cross-origin access guidelines.
Controllers and receivers should be notified when connections representing multiple user agent profiles have been made to a presentation, per the user interface guidelines.
Messaging between controllers and receivers should be authenticated and confidential, per the guidelines for messaging between presentation connections.

The Open Screen Protocol addresses these considerations by:

Requiring mutual authentication and a TLS-secured QUIC connection before presentation URLs, IDs, or messages are exchanged.
Adding explicit messages and connection IDs for individual PresentationConnections so that agents can track the number of active connections.

Notify endpoints when new connection is created. <https://github.com/webscreens/openscreenprotocol/issues/143>

9.4. Remote Playback API Considerations

The Remote Playback API §security-and-privacy-considerations also states that messaging between local and remote playback devices should be authenticated and confidential.

This consideration is handled by requiring mutual authentication and a TLS-secured QUIC connection before any remote playback related messages are exchanged.

9.5. Mitigation Strategies

9.5.1. Local passive network attackers

Local passive attackers may attempt to harvest data about user activities and device capabilities using the Open Screen Protocol. The main strategy to address this is data minimization, by only exposing opaque public key fingerprints before user-mediated authentication takes place.

Passive attackers may also attempt timing attacks to learn the cryptographic parameters of the TLS 1.3 QUIC connection.

Review attack and mitigation considerations for TLS 1.3 <https://github.com/webscreens/openscreenprotocol/issues/130>

9.5.2. Local active network attackers

Local active attackers may attempt to impersonate a presentation display the user would normally trust. The §6 Authentication step of the Open Screen Protocol prevents a man-in-the-middle who does not know the shared secret from impersonating an agent. However, it is possible for an attacker to impersonate an existing, trusted display or a newly discovered display that is not yet authenticated and try to convince the user to authenticate it.

This can be addressed through a combination of techniques. The first is detecting and flagging attempts at impersonation; a few of the situations that should be flagged include:

Untrusted agents whose public key fingerprint collides with that of an already-trusted agent that is concurrently being advertised.
Untrusted agents whose friendly name differs from the one previously advertised under a given public key fingerprint.
Untrusted agents that fail the authentication challenge a certain number of times.
Untrusted agents that advertise a friendly name that is similar to that of an already-trusted agent.
Already-trusted agents whose metadata provided through the agent-info message has changed.

Flagging means that the user is notified of the attempt at impersonation. In the last case, the user should be required to re-authenticate to the already-trusted agent to verify its identity.
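As a non-normative illustration, the four checks on untrusted agents listed above could be sketched as follows. The `Advert` record, the flag names, the failure limit of 3, and the use of a string-similarity ratio of 0.8 to decide that two friendly names are "similar" are all assumptions made for this sketch; the specification does not define them. The caller is assumed to have already decided that the advertisement comes from an untrusted agent.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Advert:
    """Hypothetical view of one discovered, not-yet-trusted agent."""
    fingerprint: str
    friendly_name: str
    failed_challenges: int = 0


def impersonation_flags(advert, trusted_live, names_seen,
                        max_failures=3, similarity=0.8):
    """Return the reasons (possibly none) to flag an untrusted advert.

    trusted_live: fingerprint -> friendly name of trusted agents that are
                  currently being advertised.
    names_seen:   fingerprint -> friendly name previously advertised under
                  that fingerprint.
    The thresholds are illustrative assumptions, not values from the spec.
    """
    flags = []
    # Fingerprint collides with a concurrently advertised trusted agent.
    if advert.fingerprint in trusted_live:
        flags.append("fingerprint-collision")
    # Friendly name differs from the one seen earlier for this fingerprint.
    previous = names_seen.get(advert.fingerprint)
    if previous is not None and previous != advert.friendly_name:
        flags.append("name-changed")
    # Too many failed authentication challenges.
    if advert.failed_challenges >= max_failures:
        flags.append("too-many-failures")
    # Friendly name resembles that of a different, already-trusted agent.
    candidate = advert.friendly_name.lower()
    for fingerprint, name in trusted_live.items():
        if fingerprint == advert.fingerprint:
            continue
        if SequenceMatcher(None, name.lower(), candidate).ratio() >= similarity:
            flags.append("similar-name")
            break
    return flags
```

A user agent would surface any returned flag to the user before offering to authenticate the agent. The fifth case, changed metadata on an already-trusted agent, is not covered by this sketch because it applies after trust has been established.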

UI guidelines for pairing and trusted/untrusted data. <https://github.com/webscreens/openscreenprotocol/issues/118>

The second is through management of the low-entropy secret during mutual authentication:

Rotate the low-entropy secret to prevent brute force attacks.
Use an increasing backoff to respond to authentication challenges, also to prevent brute force attacks.
Use a cryptographically sound source of entropy to generate the shared secret.
Require the end user to manually type the shared secret - shown only on the display - to prevent the user from blindly clicking through this step.
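The first three points above can be sketched as follows. This is a non-normative illustration: the 6-digit secret, the doubling delay capped at 300 seconds, the rotation after 3 failures, and the `PairingSession` class are assumptions made for the example, and the sketch does not model the authentication challenge exchange itself.

```python
import secrets


def new_shared_secret(digits=6):
    """Generate a numeric secret from the operating system's CSPRNG."""
    return "".join(secrets.choice("0123456789") for _ in range(digits))


def backoff_seconds(failed_attempts, base=1.0, cap=300.0):
    """Delay before answering the next challenge.

    The delay doubles with each failure and is capped; both parameters
    are illustrative assumptions.
    """
    if failed_attempts <= 0:
        return 0.0
    return min(cap, base * 2 ** (failed_attempts - 1))


class PairingSession:
    """Hypothetical per-peer state kept by the agent showing the secret."""

    def __init__(self, attempts_per_secret=3):
        self.attempts_per_secret = attempts_per_secret
        self.total_failures = 0
        self.failures_on_secret = 0
        self.secret = new_shared_secret()

    def record_failure(self):
        """Note a failed challenge and return the delay to apply next.

        The secret is rotated once its failure budget is spent, so a
        brute-force search cannot keep narrowing down a single value.
        """
        self.total_failures += 1
        self.failures_on_secret += 1
        if self.failures_on_secret >= self.attempts_per_secret:
            self.secret = new_shared_secret()
            self.failures_on_secret = 0
        return backoff_seconds(self.total_failures)
```

The backoff is driven by the total failure count rather than the per-secret count, so rotating the secret does not reset the delay an attacker faces.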

The active attacker may also attempt to disrupt data exchanged over the QUIC connection by injecting or modifying traffic. These attacks should be mitigated by a correct implementation of TLS 1.3.

Review attack and mitigation considerations for TLS 1.3 <https://github.com/webscreens/openscreenprotocol/issues/130>

9.5.3. Remote active network attackers

Unfortunately, we cannot rely on network devices to fully protect Open Screen Protocol agents, because a misconfigured firewall or NAT could expose a LAN-connected agent to the broader Internet. Open Screen Protocol agents should be secure against attack from any Internet host.

Mitigations for remote network attackers. <https://github.com/webscreens/openscreenprotocol/issues/131>

9.5.4. Denial of service

It will be difficult to completely prevent denial of service attacks that originate on the user’s local area network. Open Screen Protocol agents can refuse new connections, close connections that receive too many messages, or limit the number of mDNS records cached from a specific responder in an attempt to allow existing activities to continue in spite of such an attack.
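Two of these limits can be sketched as follows. This is a non-normative illustration: the class names, the per-responder cap of 32 records, and the sliding-window message budget are assumptions made for the example, not values defined by the protocol.

```python
from collections import deque


class BoundedRecordCache:
    """Caps the number of mDNS records cached per responder."""

    def __init__(self, max_per_responder=32):
        self.max_per_responder = max_per_responder
        self._records = {}  # responder address -> {record name: record}

    def add(self, responder, name, record):
        """Cache a record; refuse new names once the responder is at its cap."""
        records = self._records.setdefault(responder, {})
        if name not in records and len(records) >= self.max_per_responder:
            return False
        records[name] = record
        return True

    def count(self, responder):
        return len(self._records.get(responder, {}))


class MessageBudget:
    """Sliding-window limit on messages received over one connection."""

    def __init__(self, max_messages, window_seconds):
        self.max_messages = max_messages
        self.window_seconds = window_seconds
        self._times = deque()

    def allow(self, now):
        """Return False when the connection has exceeded its budget."""
        # Forget messages that have fallen out of the window.
        while self._times and now - self._times[0] >= self.window_seconds:
            self._times.popleft()
        if len(self._times) >= self.max_messages:
            return False
        self._times.append(now)
        return True
```

An agent could close a connection the first time `allow()` returns False. The cache above bounds records per responder only; a complete implementation would also need to bound the number of responders it tracks.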

9.5.5. Malicious input

Open Screen Protocol agents should be robust against malicious input that attempts to compromise the target device by exploiting parsing vulnerabilities.

CBOR is intended to be less vulnerable to such attacks relative to alternatives like JSON and XML. Still, agents should be thoroughly tested using approaches like fuzz testing.

Where possible, Open Screen Protocol agents (including the content rendering components) should use defense-in-depth techniques like sandboxing to prevent vulnerabilities from gaining access to user data or leading to persistent exploits.

Appendix A: Messages

The following messages are defined with [CDDL]. When integer keys are used, a comment is appended to the line to indicate the name of the field. Object definitions in this specification have this unusual syntax to reduce the number of bytes-on-the-wire, while maintaining a human-readable name for each key. Integer keys are used instead of object arrays to allow for easy indexing of optional fields.
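To illustrate the saving in bytes, the following non-normative sketch computes the encoded size of a map key under the standard CBOR encoding rules (RFC 8949): an unsigned integer below 24 occupies a single byte, while a text string costs a length header plus its UTF-8 bytes. The field name used in the usage note is only an example.

```python
def cbor_uint_len(n):
    """Encoded size in bytes of a CBOR unsigned integer (major type 0)."""
    if n < 0:
        raise ValueError("not an unsigned integer")
    if n < 24:
        return 1   # value fits in the initial byte
    if n < 2 ** 8:
        return 2   # initial byte + 1-byte argument
    if n < 2 ** 16:
        return 3   # initial byte + 2-byte argument
    if n < 2 ** 32:
        return 5   # initial byte + 4-byte argument
    if n < 2 ** 64:
        return 9   # initial byte + 8-byte argument
    raise ValueError("too large for a CBOR unsigned integer")


def cbor_text_len(s):
    """Encoded size in bytes of a CBOR text string (major type 3)."""
    payload = len(s.encode("utf-8"))
    # The length header uses the same argument encoding as an integer.
    return cbor_uint_len(payload) + payload
```

For example, `cbor_uint_len(1)` is 1 byte, whereas `cbor_text_len("presentation-id")` is 16 bytes, so using an integer key for such a field saves 15 bytes each time the key appears in a message.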

Each root message (one that can be put into a QUIC stream without being enclosed by another message) has a comment indicating the message type key.

Smaller numbers should be reserved for messages that are sent frequently, are very small, or both. Larger numbers should be reserved for messages that are sent infrequently, are large, or both. This is because smaller type keys take fewer bytes on the wire.