Copyright © 2023 World Wide Web Consortium. W3C® liability, trademark and permissive document license rules apply.
This document defines a set of ECMAScript APIs in WebIDL to extend the [mediacapture-streams] specification.
This section describes the status of this document at the time of its publication. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.
This is an unofficial proposal.
This document was published by the Web Real-Time Communications Working Group as an Editor's Draft.
Publication as an Editor's Draft does not imply endorsement by W3C and its Members.
This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This document is governed by the 2 November 2021 W3C Process Document.
This document contains proposed extensions and modifications to the [mediacapture-streams] specification.
New features and modifications to existing features proposed here may be considered for addition into the main specification post Recommendation. Deciding factors will include maturity of the extension or modification, consensus on adding it, and implementation experience.
A concrete long-term goal is reducing the fingerprinting surface of enumerateDevices() by deprecating exposure of the device label in its results. This requires relieving applications of the burden of building user interfaces to select cameras and microphones in-content, by offering this in user agents as part of getUserMedia() instead.
Miscellaneous other smaller features are under consideration as well, such as constraints to control multi-channel audio beyond stereo.
This document uses the definitions MediaDevices, MediaStreamTrack, MediaStreamConstraints, ConstrainablePattern, MediaTrackSupportedConstraints, MediaTrackCapabilities, MediaTrackConstraintSet, MediaTrackSettings and ConstrainBoolean from [mediacapture-streams].
The terms permission state, request permission to use, and prompt the user to choose are defined in [permissions].
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
The key words MAY, MUST, MUST NOT, and SHOULD in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
The existing enumerateDevices() function exposes camera and microphone labels to let applications build in-content user interfaces for camera and microphone selection. Applications have had to do this because getUserMedia() did not offer a web compatible in-agent device picker. This specification aims to rectify that.
Due to the significant fingerprinting vector caused by device labels, and the well-established nature of the existing APIs, the scope of this particular effort is limited to removing label, leaving the overall constraints-based model intact. This helps ensure a more viable migration path than moving to a less-powerful API. For that reason as well, this specification augments the existing getUserMedia() function instead of introducing a new less-powerful API to compete with it.
This specification introduces slightly altered semantics to the getUserMedia() function, called "user-chooses", that guarantee a picker will be shown to the user in cases where the user agent would otherwise choose for the user (that is: when application constraints do not narrow down the choices to a single device). This is orthogonal to permission, and offers a better and more consistent user experience across applications and user agents.
Unfortunately, since the "user-chooses" semantics may produce user agent prompts at different times and in different situations compared to the old semantics, they are somewhat incompatible with expectations in some existing web applications that tend to call getUserMedia() repeatedly and lazily instead of using e.g. stream.clone(). User agents are encouraged to provide the new semantics as opt-in initially for web compatibility.
User agents MUST deprecate (remove) label from MediaDeviceInfo over time, though specific migration strategies are left to user agents. User agents SHOULD migrate to offering the new semantics by default (opt-out) over time.
Since the constraints-model remains intact, web compatibility problems are expected to be limited to:
WebIDL
partial interface MediaDevices {
  readonly attribute GetUserMediaSemantics defaultSemantics;
};
defaultSemantics of type GetUserMediaSemantics, readonly
The default semantics of getUserMedia() in this user agent. User agents SHOULD default to "browser-chooses" for backwards compatibility, until a transition plan has been enacted where a majority of user agents collectively switch their defaults to "user-chooses" for improved user privacy, and usage metrics suggest this transition is feasible without major breakage.
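Non-normatively, a page could feature-detect this attribute before relying on the new semantics. A minimal sketch:

// Sketch (non-normative): feature-detect the proposed attribute.
if ("defaultSemantics" in navigator.mediaDevices) {
  console.log(`getUserMedia() defaults to "${navigator.mediaDevices.defaultSemantics}"`);
} else {
  console.log("GetUserMediaSemantics is not implemented here.");
}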
WebIDL
partial dictionary MediaStreamConstraints {
  GetUserMediaSemantics semantics;
};
MediaStreamConstraints Members
semantics of type GetUserMediaSemantics
In cases where the specified constraints do not narrow multiple choices between devices down to one per kind, specifies how the final determination of which devices to pick from the remaining choices MUST be made. If not specified, then the defaultSemantics are used.
WebIDL
enum GetUserMediaSemantics {
  "browser-chooses",
  "user-chooses"
};
GetUserMediaSemantics Enumeration description

| Value | Description |
| --- | --- |
| browser-chooses | When application-specified constraints do not narrow multiple choices between devices down to one per kind, the user agent is allowed to make the final determination between the remaining choices. |
| user-chooses | When application-specified constraints do not narrow multiple choices between devices down to one per kind, the user agent MUST prompt the user to choose between the remaining choices, even if the application already has permission to some or all of them. |
When the getUserMedia() method is invoked, run the following steps before invoking the getUserMedia() algorithm:
Let mediaDevices be the object on which this method was invoked.
Let constraints be the method's first argument.
Let semanticsPresent be true if constraints.semantics exists, otherwise false.
Let semantics be constraints.semantics if present, or the value of mediaDevices.defaultSemantics otherwise.
Replace step 6.5.1. of the getUserMedia() algorithm in its entirety with the following two steps:
Let descriptor be a PermissionDescriptor with its name member set to the permission name associated with kind (e.g. "camera" for "video", "microphone" for "audio") and, optionally, its deviceId member set to any appropriate device's deviceId.
If the number of unique devices sourcing tracks of media type kind in candidateSet is greater than 1 and semantics is "user-chooses", then prompt the user to choose a device with descriptor, resulting in provided media. Otherwise, request permission to use a device with descriptor, while considering all devices attached to a live and same-permission MediaStreamTrack in the current browsing context to mean having permission status "granted", resulting in provided media.
Same-permission in this context means a MediaStreamTrack that required the same level of permission to obtain as what is being requested.
When asking the user's permission, the user agent MUST disclose whether permission will be granted only to the device chosen, or to all devices of that kind.
Let track be the provided media, which MUST be precisely one track of type kind from finalSet. If semantics is "browser-chooses" then the decision of which track to choose from finalSet is up to the User Agent, which MAY use the value of the computed "fitness distance" from the SelectSettings algorithm, the value of semanticsPresent, or any other internally-available information about the devices, as inputs to its decision. If semantics is "user-chooses", and the application has not narrowed down the choices to one, then the user agent MUST ask the user to make the final selection. Once selected, the source of the MediaStreamTrack MUST NOT change.
User Agents are encouraged to default to or present a default choice based primarily on fitness distance, and secondarily on the user's primary or system default device for kind (when possible). User Agents MAY allow users to use any media source, including pre-recorded media files.
This example shows a setup with a start button and a camera selector using the new semantics (microphone is not shown for brevity but is equivalent).
<button id="start">Start</button>
<button id="chosenCamera" disabled>Camera: none</button>
<script>
let cameraTrack = null;
start.onclick = async () => {
try {
const stream = await navigator.mediaDevices.getUserMedia({
video: {deviceId: localStorage.cameraId}
});
setCameraTrack(stream.getVideoTracks()[0]);
} catch (err) {
console.error(err);
}
}
chosenCamera.onclick = async () => {
try {
const stream = await navigator.mediaDevices.getUserMedia({
video: true,
semantics: "user-chooses"
});
setCameraTrack(stream.getVideoTracks()[0]);
} catch (err) {
console.error(err);
}
}
function setCameraTrack(track) {
cameraTrack = track;
const {deviceId, label} = track.getSettings();
localStorage.cameraId = deviceId;
chosenCamera.innerText = `Camera: ${label}`;
chosenCamera.disabled = false;
}
</
script
>
A MediaStreamTrack is a transferable object. This allows manipulating real-time media outside the context it was requested or created in, for instance in workers or third-party iframes.
To preserve the existing privacy and security infrastructure, in particular for capture tracks, the track source lifetime management remains tied to the context that created it. The transfer algorithm MUST ensure the following behaviors:
The context named originalContext that created a track named originalTrack remains in control of the originalTrack source, named trackSource , even when originalTrack is transferred into transferredTrack .
In particular, originalContext remains the proxy to privacy indicators of trackSource . transferredTrack or any of its clones are considered as tracks using trackSource as if they were tracks created in and controlled by originalContext .
When originalContext goes away, trackSource gets ended, thus transferredTrack gets ended.
When originalContext would have muted/unmuted originalTrack , transferredTrack gets muted/unmuted.
If transferredTrack is cloned in transferredTrackClone , transferredTrackClone is tied to trackSource . It is not tied to originalTrack in any way.
If transferredTrack is transferred into transferredAgainTrack , transferredAgainTrack is tied to trackSource . It is not tied to transferredTrack or originalTrack in any way.
The WebIDL changes to make the track transferable are the following:
WebIDL
[Exposed=(Window,Worker), Transferable]
partial interface MediaStreamTrack {
};
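For illustration (non-normative), a transferred track travels through the regular postMessage() transfer list; the worker script name below is an assumption:

// Page context: capture a camera track and transfer it to a worker.
(async () => {
  const worker = new Worker("worker.js"); // hypothetical worker script
  const stream = await navigator.mediaDevices.getUserMedia({video: true});
  const [track] = stream.getVideoTracks();
  worker.postMessage({track}, [track]);
  // track is now detached in this context; the source stays tied to this page.
})();

// worker.js: the received track is usable in the worker context.
onmessage = ({data: {track}}) => {
  console.log(track.kind, track.readyState);
};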
At creation of a MediaStreamTrack object, called track, run the following steps:
Initialize track.[[IsDetached]] to false.
The MediaStreamTrack transfer steps, given value and dataHolder, are:
If value.[[IsDetached]] is true, throw a "DataCloneError" DOMException.
Set dataHolder.[[id]] to value.id.
Set dataHolder.[[kind]] to value.kind.
Set dataHolder.[[label]] to value.label.
Set dataHolder.[[readyState]] to value.readyState.
Set dataHolder.[[enabled]] to value.enabled.
Set dataHolder.[[muted]] to value.muted.
Set dataHolder.[[source]] to value's underlying source.
Set dataHolder.[[constraints]] to value's active constraints.
Set value.[[IsDetached]] to true.
Set value.[[ReadyState]] to "ended" (without stopping the underlying source or firing an ended event).
The MediaStreamTrack transfer-receiving steps, given dataHolder and track, are:
Initialize track.id to dataHolder.[[id]].
Initialize track.kind to dataHolder.[[kind]].
Initialize track.label to dataHolder.[[label]].
Initialize track.readyState to dataHolder.[[readyState]].
Initialize track.enabled to dataHolder.[[enabled]].
Initialize track.muted to dataHolder.[[muted]].
Initialize the underlying source of track to dataHolder.[[source]].
Set track's constraints to dataHolder.[[constraints]].
The underlying source is supposed to be kept alive between the transfer and transfer-receiving steps, or as long as the data holder is alive. In a sense, between these steps, the data holder is attached to the underlying source as if it was a track.
On camera and screenshare tracks, frame counters allow the application to tell what the frame rate is, which may be lower than the target frameRate. For example, if the track is sourced from a camera then the production of frames could be slowed down if it's dark, or frames could be dropped if the system is CPU starved. This could impact the total number of frames produced by the source and impact how many frames are delivered, discarded or dropped for other reasons.
WebIDL
partial interface MediaStreamTrack {
  Promise<MediaTrackFrameStats> getFrameStats();
};
If a MediaStreamTrack is sourced from getUserMedia() or getDisplayMedia(), the user agent is required to count each frame from its source as follows:
A frame is considered delivered if it either was delivered to a sink or would have been delivered to a sink, if one was connected. This is a subset of total frames and it is incremented at the same time as total frames .
A frame is considered discarded if it was discarded in order to achieve the target frameRate. This is a subset of total frames and it is incremented at the same time as total frames.
The total number of frames that have been processed by this source, meaning it is known whether the frame was considered delivered, discarded or dropped for any other reason. The number of dropped frames for various unknown reasons can be calculated by subtracting delivered frames and discarded frames from total frames .
If the track is unmuted and enabled and the source is backed by a camera, total frames is incremented by frames produced by the camera. If no frames are flowing, such as if the track is muted or disabled, then total frames does not increment.
getFrameStats
When this method is called, the user agent MUST run the following steps:
Let track be the MediaStreamTrack that this method was called on.
If track is not sourced from getUserMedia() or getDisplayMedia(), reject this method with NotSupportedError and abort these steps.
Let p be a new promise. Begin running the following steps in parallel and return p:
Queue a task to resolve p with a newly constructed MediaTrackFrameStats dictionary where timestamp is set to Performance.timeOrigin + Performance.now(), deliveredFrames is set to the total number of delivered frames, discardedFrames is set to the total number of discarded frames, and totalFrames is set to the total frames count.
WebIDL
dictionary MediaTrackFrameStats {
  DOMHighResTimeStamp timestamp;
  unsigned long long deliveredFrames;
  unsigned long long discardedFrames;
  unsigned long long totalFrames;
};
MediaTrackFrameStats Members
timestamp of type DOMHighResTimeStamp
The timestamp, relative to the UNIX epoch (Jan 1, 1970, UTC), for when these stats were collected.
deliveredFrames of type unsigned long long
The total number of delivered frames.
discardedFrames of type unsigned long long
The total number of discarded frames.
totalFrames of type unsigned long long
The total frames count.
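As a non-normative illustration, an application could sample getFrameStats() twice to estimate the delivered frame rate and the number of frames dropped for other, unknown reasons (per the subtraction described above):

// Sketch: compare two samples of the frame counters over an interval.
async function measureFrameDelivery(track, intervalMs = 1000) {
  const before = await track.getFrameStats();
  await new Promise(resolve => setTimeout(resolve, intervalMs));
  const after = await track.getFrameStats();
  const seconds = (after.timestamp - before.timestamp) / 1000;
  const deliveredFps = (after.deliveredFrames - before.deliveredFrames) / seconds;
  // Dropped for unknown reasons = total - delivered - discarded.
  const droppedFrames = (after.totalFrames - before.totalFrames) -
      (after.deliveredFrames - before.deliveredFrames) -
      (after.discardedFrames - before.discardedFrames);
  return {deliveredFps, droppedFrames};
}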
WebIDL
partial dictionary MediaTrackSupportedConstraints {
  boolean powerEfficientPixelFormat = true;
};
MediaTrackSupportedConstraints Members
powerEfficientPixelFormat of type boolean, defaulting to true
WebIDL
partial dictionary MediaTrackCapabilities {
  sequence<boolean> powerEfficientPixelFormat;
};
MediaTrackCapabilities Members
powerEfficientPixelFormat of type sequence<boolean>
If the source only has power efficient pixel formats, a single true is reported. If the source only has power inefficient pixel formats, a single false is reported. If the script can control the feature, the source reports a list with both true and false as possible values. See powerEfficientPixelFormat for additional details.
WebIDL
partial dictionary MediaTrackSettings {
  boolean powerEfficientPixelFormat;
};
MediaTrackSettings Members
powerEfficientPixelFormat of type boolean
The constrainable properties in this document are defined below.
| Property Name | Values | Notes |
| --- | --- | --- |
| powerEfficientPixelFormat | ConstrainBoolean | Compressed pixel formats often need to be decoded, for instance for display purposes or when being encoded during a video call. The user agent SHOULD label compressed pixel formats that incur significant power penalty when decoded as power inefficient. The labeling is up to the user agent, but decoding MJPEG in software is an example of an expensive mode. Pixel formats that have not been labeled power inefficient by the user agent are for the purpose of this API considered power efficient. As a constraint, setting it to true allows filtering out inefficient pixel formats and setting it to false allows filtering out efficient pixel formats. As a setting, this reflects whether or not the current pixel format is considered power efficient by the user agent. |
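A non-normative sketch of using this constraint, falling back to an unconstrained request when the user agent does not support it:

// Sketch: prefer configurations with power-efficient pixel formats.
async function getEfficientCameraTrack() {
  const supported = navigator.mediaDevices.getSupportedConstraints();
  const video = supported.powerEfficientPixelFormat
      ? {powerEfficientPixelFormat: true} // filter out inefficient formats
      : true; // constraint unknown: request any camera
  const stream = await navigator.mediaDevices.getUserMedia({video});
  return stream.getVideoTracks()[0];
}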
Some platforms or User Agents may provide built-in support for background blurring of video frames, in particular for camera video streams. Web applications may either want to control or at least be aware that background blur is applied at the source level. This may for instance allow the web application to update its UI or to not apply background blur on its own. For that reason, we extend MediaStreamTrack with the following properties.
The WebIDL changes are the following:
WebIDL
partial dictionary MediaTrackSupportedConstraints {
  boolean backgroundBlur = true;
};

partial dictionary MediaTrackConstraintSet {
  ConstrainBoolean backgroundBlur;
};

partial dictionary MediaTrackSettings {
  boolean backgroundBlur;
};

partial dictionary MediaTrackCapabilities {
  sequence<boolean> backgroundBlur;
};
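By way of non-normative illustration, a page that applies its own blur could try to turn platform blur off when the capability indicates the feature is under script control:

// Sketch: disable platform blur if the capability lists false as a value.
async function preferInAppBlur(track) {
  const blurCapability = track.getCapabilities().backgroundBlur;
  if (blurCapability && blurCapability.includes(false)) {
    await track.applyConstraints({backgroundBlur: false});
  }
  return track.getSettings().backgroundBlur; // state after the attempt
}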
Some platforms offer functionality for voice isolation: Attempting to remove all parts of an audio track that do not correspond to a human voice. Some platforms even attempt to remove extraneous voices, leaving the "main voice" as the dominant component of the audio. The exact methods used may vary between implementations.
This constraint permits the platform to turn on that functionality, with the desired result being that the "main voice" in the audio signal is the dominant component of the audio.
This will have large effects on audio that is presented for other reasons than to transmit voice (for instance music or ambient noises), so it needs to be off by default.
This constraint is a stronger version of noise cancellation, which means that if the "noiseSuppression" constraint is set to false and "voiceIsolation" is set to true, the value of "noiseSuppression" will be ignored.
This constraint has no such relationship with any other constraint; in particular it does not affect echoCancellation.
The WebIDL changes are the following:
WebIDL
partial dictionary MediaTrackSupportedConstraints {
  boolean voiceIsolation = true;
};

partial dictionary MediaTrackConstraintSet {
  ConstrainBoolean voiceIsolation;
};

partial dictionary MediaTrackSettings {
  boolean voiceIsolation;
};

partial dictionary MediaTrackCapabilities {
  sequence<boolean> voiceIsolation;
};
When the "voiceIsolation" setting is set to true by the ApplyConstraints algorithm, the UA will attempt to remove the components of the audio track that do not correspond to a human voice. If a dominant voice can be identified, the UA will attempt to enhance that voice.

When the "voiceIsolation" constraint setting is set to false by the ApplyConstraints algorithm, the UA will process the audio according to other settings in its normal fashion.
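A non-normative sketch of opting a microphone track into voice isolation where supported:

// Sketch: apply the voiceIsolation constraint and report the resulting setting.
async function enableVoiceIsolation(track) {
  if (!navigator.mediaDevices.getSupportedConstraints().voiceIsolation) {
    return false; // user agent does not implement this extension
  }
  await track.applyConstraints({voiceIsolation: true});
  return track.getSettings().voiceIsolation === true;
}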
The configuration (capabilities, constraints or settings) of a MediaStreamTrack may be changed dynamically outside the control of web applications. One example is when a user decides to switch on background blur through the operating system. Web applications might want to know that the configuration of a particular MediaStreamTrack has changed. For that purpose, a new event is defined below.
WebIDL
partial interface MediaStreamTrack {
  attribute EventHandler onconfigurationchange;
};
The onconfigurationchange attribute is an event handler IDL attribute for the onconfigurationchange event handler, whose event handler event type is configurationchange.
When the User Agent detects a change of configuration in a track 's underlying source, the User Agent MUST run the following steps:
If track.muted is true, wait for track.muted to become false or track.readyState to be "ended".
Queue a task on the current settings object's responsible event loop to perform the following steps:
This task will run before any other task that may set track.muted to true.
If track.readyState is "ended", abort these steps.
If track's capabilities, constraints and settings match the source configuration, abort these steps.
Update track's capabilities, constraints and settings according to track's underlying source.
Fire an event named configurationchange on track.
These events are potentially triggered simultaneously on documents of different origins. User Agents MAY add fuzzing on the timing of events to avoid cross-origin activity correlation.
This example shows how to monitor external background blur changes.
const stream = await navigator.mediaDevices.getUserMedia({video: true});
const [track] = stream.getVideoTracks();
let {backgroundBlur} = track.getSettings();
applyBlurInSoftwareInstead(!backgroundBlur);
track.addEventListener("configurationchange", () => {
  if (backgroundBlur != track.getSettings().backgroundBlur) {
    backgroundBlur = track.getSettings().backgroundBlur;
    applyBlurInSoftwareInstead(!backgroundBlur);
  }
});