Media Capabilities

Draft Community Group Report, 19 November 2018

Not Ready For Implementation

This spec is not yet ready for implementation. It exists in this repository to record the ideas and promote discussion.

Before attempting to implement this spec, please contact the editors.

Abstract

This specification intends to provide APIs to allow websites to make an optimal decision when picking media content for the user. The APIs will expose information about the decoding and encoding capabilities for a given format but also output capabilities to find the best match based on the device’s display.

Status of this document

This specification was published by the Web Platform Incubator Community Group. It is not a W3C Standard nor is it on the W3C Standards Track. Please note that under the W3C Community Contributor License Agreement (CLA) there is a limited opt-out and other conditions apply. Learn more about W3C Community and Business Groups.

1. Introduction

This section is non-normative

This specification relies on exposing the following sets of properties:

An API to query the user agent with regard to the decoding and encoding abilities of the device based on information such as the codecs, profile, resolution, bitrates, etc. The API exposes information such as whether the playback should be smooth and power efficient.

The intent of the decoding capabilities API is to provide a powerful replacement for APIs such as isTypeSupported() or canPlayType(), which are vague and mostly help callers to know whether something cannot be decoded, but not how well it should perform.
Better information about the display properties such as supported color gamut or dynamic range abilities in order to pick the right content for the display and avoid providing HDR content to an SDR display.
Real time feedback about the playback so that adaptive streaming can alter the quality of the content based on actual user perceived quality. Such information will allow websites to react to a peak of CPU/GPU usage in real time. It is expected that this will be tackled as part of the [media-playback-quality] specification.

2. Decoding and Encoding Capabilities

2.1. Media Configurations

2.1.1. MediaConfiguration

dictionary MediaConfiguration {
  VideoConfiguration video;
  AudioConfiguration audio;
};

dictionary MediaDecodingConfiguration : MediaConfiguration {
  required MediaDecodingType type;
  MediaCapabilitiesKeySystemConfiguration keySystemConfiguration;
};

dictionary MediaEncodingConfiguration : MediaConfiguration {
  required MediaEncodingType type;
};

The input to the decoding capabilities is represented by a MediaDecodingConfiguration dictionary and the input of the encoding capabilities by a MediaEncodingConfiguration dictionary.

For a MediaConfiguration to be a valid MediaConfiguration, all of the following conditions MUST be true:

audio and/or video MUST be present.
audio MUST be a valid audio configuration if present.
video MUST be a valid video configuration if present.

For a MediaDecodingConfiguration to be a valid MediaDecodingConfiguration, all of the following conditions MUST be true:

It MUST be a valid MediaConfiguration.
If keySystemConfiguration is present:
1. If keySystemConfiguration.audioRobustness is present, audio MUST also be present.
2. If keySystemConfiguration.videoRobustness is present, video MUST also be present.

For a MediaDecodingConfiguration to describe [ENCRYPTED-MEDIA], a keySystemConfiguration MUST be present.
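The presence rules above can be sketched as a small check. This is an illustration only: the function names are hypothetical, and treating an empty string as "not present" is an assumption made here because the robustness members default to "". Validation of the audio and video members themselves is deliberately left out.

```javascript
// Hypothetical helper: a member counts as "present" when it is neither
// undefined nor the empty string (an assumption, see the lead-in).
function isPresent(value) {
  return value !== undefined && value !== '';
}

// Checks only the presence rules for a MediaDecodingConfiguration.
// Whether audio and video are valid configurations is checked elsewhere.
function satisfiesPresenceRules(configuration) {
  const { audio, video, keySystemConfiguration } = configuration;
  // audio and/or video MUST be present.
  if (audio === undefined && video === undefined) return false;
  if (keySystemConfiguration !== undefined) {
    // A robustness level requires the matching track description.
    if (isPresent(keySystemConfiguration.audioRobustness) && audio === undefined) return false;
    if (isPresent(keySystemConfiguration.videoRobustness) && video === undefined) return false;
  }
  return true;
}
```

A caller could run such a check before calling decodingInfo() to catch configurations that would otherwise be rejected.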

2.1.2. MediaDecodingType

enum MediaDecodingType {
  "file",
  "media-source",
};

A MediaDecodingConfiguration can have one of two types:

file is used to represent a configuration that is meant to be used for a plain file playback.
media-source is used to represent a configuration that is meant to be used for playback of a MediaSource as defined in the [media-source] specification.

2.1.3. MediaEncodingType

enum MediaEncodingType {
  "record",
  "transmission"
};

A MediaEncodingConfiguration can have one of two types:

record is used to represent a configuration for recording of media, e.g. using MediaRecorder as defined in [mediastream-recording].
transmission is used to represent a configuration meant to be transmitted over electronic means (e.g. using RTCPeerConnection).

2.1.4. MIME types

In the context of this specification, a MIME type is also called content type. A valid media MIME type is a string that is a valid MIME type per [mimesniff]. If the MIME type does not imply a codec, the string MUST also have one and only one parameter that is named codecs with a value describing a single media codec. Otherwise, it MUST contain no parameters.

A valid audio MIME type is a string that is a valid media MIME type and for which the type per [RFC7231] is either audio or application.

A valid video MIME type is a string that is a valid media MIME type and for which the type per [RFC7231] is either video or application.
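The following sketch approximates these definitions. It is not a full [mimesniff] parser: it splits on semicolons, so quoted parameter values containing semicolons are not handled, and it cannot know whether a given container implies a codec, so it accepts either no parameters or exactly one codecs parameter naming a single codec. All function names are hypothetical.

```javascript
// Simplified parse of "type/subtype; name=value" strings.
function parseMediaMimeType(contentType) {
  const [essence, ...rest] = contentType.split(';').map((part) => part.trim());
  const match = /^([a-z0-9!#$&^_.+-]+)\/([a-z0-9!#$&^_.+-]+)$/i.exec(essence);
  if (!match) return null;
  const parameters = rest.filter((part) => part.length > 0).map((part) => {
    const eq = part.indexOf('=');
    if (eq === -1) return { name: part.toLowerCase(), value: '' };
    return {
      name: part.slice(0, eq).trim().toLowerCase(),
      // Strip one pair of surrounding double quotes, if any.
      value: part.slice(eq + 1).trim().replace(/^"(.*)"$/, '$1'),
    };
  });
  return { type: match[1].toLowerCase(), subtype: match[2].toLowerCase(), parameters };
}

function isValidMediaMimeType(contentType, allowedTypes) {
  const parsed = parseMediaMimeType(contentType);
  if (!parsed || !allowedTypes.includes(parsed.type)) return false;
  // Assumption: no parameters means the MIME type implies its codec.
  if (parsed.parameters.length === 0) return true;
  if (parsed.parameters.length !== 1) return false;
  const [parameter] = parsed.parameters;
  // A comma would indicate a list of codecs rather than a single codec.
  return parameter.name === 'codecs' && parameter.value.length > 0 && !parameter.value.includes(',');
}

const isValidAudioMimeType = (s) => isValidMediaMimeType(s, ['audio', 'application']);
const isValidVideoMimeType = (s) => isValidMediaMimeType(s, ['video', 'application']);
```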

2.1.5. VideoConfiguration

dictionary VideoConfiguration {
  required DOMString contentType;
  required unsigned long width;
  required unsigned long height;
  required unsigned long long bitrate;
  required DOMString framerate;
};

The contentType member represents the MIME type of the video track.

To check if a VideoConfiguration configuration is a valid video configuration, the following steps MUST be run:

If configuration’s contentType is not a valid video MIME type, return false and abort these steps.
If none of the following is true, return false and abort these steps:
- Applying the rules for parsing floating-point number values to configuration’s framerate results in a number that is finite and greater than 0.
- configuration’s framerate contains one occurrence of the U+002F SLASH character (/), and applying the rules for parsing floating-point number values to the substrings before and after this character results, in each case, in a number that is finite and greater than 0.
Return true.
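The framerate condition can be sketched as follows. This is an approximation under one stated assumption: JavaScript's Number.parseFloat is used as a stand-in for the HTML rules for parsing floating-point number values, which it resembles (both ignore trailing characters) but does not match exactly. The function names are hypothetical.

```javascript
// True when the text parses to a finite number greater than 0.
// Number.parseFloat stands in for the HTML parsing rules (an approximation).
function parsesToPositiveFinite(text) {
  const value = Number.parseFloat(text);
  return Number.isFinite(value) && value > 0;
}

// Accepts either a plain number ("30", "29.97") or a fraction ("30000/1001").
function isValidFramerate(framerate) {
  if (parsesToPositiveFinite(framerate)) return true;
  const parts = framerate.split('/');
  return parts.length === 2 && parts.every(parsesToPositiveFinite);
}
```

The fraction form lets callers express rates such as 30000/1001 without rounding.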

The width and height members represent respectively the visible horizontal and vertical encoded pixels in the encoded video frames.

The bitrate member represents the average bitrate of the video track given in units of bits per second. In the case of a video stream encoded at a constant bit rate (CBR) this value should be accurate over a short term window. For the case of variable bit rate (VBR) encoding, this value should be usable to allocate any necessary buffering and throughput capability to provide for the uninterrupted decoding of the video stream over the long term based on the indicated contentType.

The framerate member represents the framerate of the video track. The framerate is the number of frames used in one second (frames per second). It is represented either as a double or as a fraction.

2.1.6. AudioConfiguration

dictionary AudioConfiguration {
  required DOMString contentType;
  DOMString channels;
  unsigned long long bitrate;
  unsigned long samplerate;
};

The contentType member represents the MIME type of the audio track.

To check if an AudioConfiguration configuration is a valid audio configuration, the following steps MUST be run:

If configuration’s contentType is not a valid audio MIME type, return false and abort these steps.
Return true.

The channels member represents the audio channels used by the audio track.

The channels member needs to be defined as a double (2.1, 4.1, 5.1, ...), an unsigned short (number of channels) or as an enum value. The current definition is a placeholder.

The bitrate member represents the average bitrate of the audio track. The bitrate is the number of bits used to encode a second of the audio track.

The samplerate member represents the samplerate of the audio track. The samplerate is the number of samples of audio carried per second.

The samplerate is expressed in Hz (i.e. number of samples of audio per second). Sometimes samplerate values are expressed in kHz, which represents the number of thousands of samples of audio per second.
44100 Hz is equivalent to 44.1 kHz.

2.1.7. MediaCapabilitiesKeySystemConfiguration

dictionary MediaCapabilitiesKeySystemConfiguration {
  required DOMString keySystem;
  DOMString initDataType = "";
  DOMString audioRobustness = "";
  DOMString videoRobustness = "";
  MediaKeysRequirement distinctiveIdentifier = "optional";
  MediaKeysRequirement persistentState = "optional";
  sequence<DOMString> sessionTypes;
};

This dictionary refers to a number of types defined by [ENCRYPTED-MEDIA] (EME). Sequences of EME types are flattened to a single value whenever the intent of the sequence was to have requestMediaKeySystemAccess() choose a subset it supports. With MediaCapabilities, callers provide the sequence across multiple calls, ultimately letting the caller choose which configuration to use.

The keySystem member represents a keySystem name as described in [ENCRYPTED-MEDIA].

The initDataType member represents a single value from the initDataTypes sequence described in [ENCRYPTED-MEDIA].

The audioRobustness member represents an audio robustness level as described in [ENCRYPTED-MEDIA].

The videoRobustness member represents a video robustness level as described in [ENCRYPTED-MEDIA].

The distinctiveIdentifier member represents a distinctiveIdentifier requirement as described in [ENCRYPTED-MEDIA].

The persistentState member represents a persistentState requirement as described in [ENCRYPTED-MEDIA].

The sessionTypes member represents a sequence of required sessionTypes as described in [ENCRYPTED-MEDIA].
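Because sequences are flattened to single values, a caller that would have passed several robustness levels to requestMediaKeySystemAccess() instead issues one query per level. The sketch below shows one way to do that. The helper names are hypothetical, the robustness strings are key-system specific and chosen by the caller, and the mediaCapabilities object is passed in as a parameter so that the code does not assume a particular global.

```javascript
// Produce one decoding configuration per candidate video robustness level,
// without mutating the base configuration.
function expandRobustnessCandidates(baseConfiguration, videoRobustnessLevels) {
  return videoRobustnessLevels.map((videoRobustness) => ({
    ...baseConfiguration,
    keySystemConfiguration: {
      ...baseConfiguration.keySystemConfiguration,
      videoRobustness,
    },
  }));
}

// Query candidates in order of preference and return the first supported one,
// or null when none is supported.
async function firstSupported(mediaCapabilities, candidates) {
  for (const candidate of candidates) {
    const info = await mediaCapabilities.decodingInfo(candidate);
    if (info.supported) return { configuration: candidate, info };
  }
  return null;
}
```

Querying sequentially rather than all at once keeps the number of encrypted queries low, which matters because such queries may have user-visible effects.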

2.2. Media Capabilities Information

dictionary MediaCapabilitiesInfo {
  required boolean supported;
  required boolean smooth;
  required boolean powerEfficient;
};

dictionary MediaCapabilitiesDecodingInfo : MediaCapabilitiesInfo {
  required MediaKeySystemAccess keySystemAccess;
};

The MediaCapabilitiesInfo has an associated configuration which is a MediaDecodingConfiguration or MediaEncodingConfiguration.

A MediaCapabilitiesInfo has associated supported, smooth and powerEfficient fields which are booleans.

Authors can use powerEfficient in conjunction with the Battery Status API [battery-status] in order to determine whether the media they would like to play is appropriate for the user configuration. It is worth noting that even when a device is not power constrained, high power usage has side effects such as increasing the temperature or the fan noise.

A MediaCapabilitiesDecodingInfo has an associated keySystemAccess which is a MediaKeySystemAccess or null as appropriate.

If the encrypted decoding configuration is supported, the resulting MediaCapabilitiesInfo will include a MediaKeySystemAccess. Authors may use this to create MediaKeys and set up encrypted playback.
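As an illustration of how a page might act on these three booleans, the sketch below ranks already-collected results. The scoring is a hypothetical policy chosen for the example (supported is mandatory, smooth outweighs powerEfficient) and is not something this specification prescribes; a real page might weigh powerEfficient more heavily when running on battery.

```javascript
// Hypothetical policy: unsupported results are excluded, smooth counts
// for more than powerEfficient.
function scoreInfo(info) {
  if (!info.supported) return -1;
  return (info.smooth ? 2 : 0) + (info.powerEfficient ? 1 : 0);
}

// results: array of { label, info }, ordered from most to least preferred.
// Returns the highest-scoring entry; ties keep the earlier (preferred) entry.
function pickBest(results) {
  let best = null;
  for (const candidate of results) {
    const score = scoreInfo(candidate.info);
    if (score >= 0 && (best === null || score > best.score)) {
      best = { ...candidate, score };
    }
  }
  return best;
}
```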

2.3. Algorithms

2.3.1. Create a MediaCapabilitiesInfo

Given a MediaEncodingConfiguration configuration, this algorithm returns a MediaCapabilitiesInfo. The following steps are run:

Let info be a new MediaCapabilitiesInfo instance. Unless stated otherwise, reading and writing apply to info for the next steps.
Set the associated configuration to configuration.
If the user agent is able to encode the media represented by configuration, set supported to true. Otherwise set it to false.
If the user agent is able to encode the media represented by configuration at a pace that allows encoding frames at the same pace as they are sent to the encoder, set smooth to true. Otherwise set it to false.
If the user agent is able to encode the media represented by configuration in a power efficient manner, set powerEfficient to true. Otherwise set it to false. The user agent SHOULD NOT take into consideration the current power source in order to determine the encoding power efficiency unless the device’s power source has side effects such as enabling different encoding modules.
Return info.

2.3.2. Create a MediaCapabilitiesDecodingInfo

Given a MediaDecodingConfiguration configuration, this algorithm returns a MediaCapabilitiesDecodingInfo. The following steps are run:

If configuration.keySystemConfiguration is present:
1. Set keySystemAccess to the result of running the Check Encrypted Decoding Support algorithm with configuration.
2. If keySystemAccess is not null, set supported to true. Otherwise set it to false.
Otherwise, run the following steps:
1. Set keySystemAccess to null.
2. If the user agent is able to decode the media represented by configuration, set supported to true.
3. Otherwise, set it to false.
If the user agent is able to decode the media represented by configuration at a pace that allows a smooth playback, set smooth to true. Otherwise set it to false.
If the user agent is able to decode the media represented by configuration in a power efficient manner, set powerEfficient to true. Otherwise set it to false. The user agent SHOULD NOT take into consideration the current power source in order to determine the decoding power efficiency unless the device’s power source has side effects such as enabling different decoding modules.
Return info.
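The control flow of these steps can be sketched in a few lines. The decoder object and the checkEncryptedDecodingSupport callback are hypothetical stand-ins for user agent internals that this specification does not define; only the branching on keySystemConfiguration mirrors the steps above.

```javascript
// `decoder` (canDecode, canDecodeSmoothly, canDecodePowerEfficiently) and
// `checkEncryptedDecodingSupport` are hypothetical user agent internals.
function createMediaCapabilitiesDecodingInfo(configuration, decoder, checkEncryptedDecodingSupport) {
  const info = { supported: false, smooth: false, powerEfficient: false, keySystemAccess: null };
  if (configuration.keySystemConfiguration !== undefined) {
    // Encrypted content: support follows from obtaining a key system access.
    info.keySystemAccess = checkEncryptedDecodingSupport(configuration);
    info.supported = info.keySystemAccess !== null;
  } else {
    // Clear content: support depends only on the decoder.
    info.keySystemAccess = null;
    info.supported = decoder.canDecode(configuration);
  }
  info.smooth = decoder.canDecodeSmoothly(configuration);
  info.powerEfficient = decoder.canDecodePowerEfficiently(configuration);
  return info;
}
```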

2.3.3. Check Encrypted Decoding Support

Given a MediaDecodingConfiguration config with a keySystemConfiguration present, this algorithm returns a MediaKeySystemAccess or null as appropriate. The following steps are run:

If the keySystem member of config.keySystemConfiguration is not one of the Key Systems supported by the user agent, return null. String comparison is case-sensitive.
Let origin be the origin of the calling context’s Document.
Let implementation be the implementation of config.keySystemConfiguration.keySystem.
Let emeConfiguration be a new MediaKeySystemConfiguration, and initialize it as follows:
1. Set the initDataTypes attribute to a sequence containing config.keySystemConfiguration.initDataType.
2. Set the distinctiveIdentifier attribute to config.keySystemConfiguration.distinctiveIdentifier.
3. Set the persistentState attribute to config.keySystemConfiguration.persistentState.
4. Set the sessionTypes attribute to config.keySystemConfiguration.sessionTypes.
5. If an audio is present in config, set the audioCapabilities attribute to a sequence containing a single MediaKeySystemMediaCapability, initialized as follows:
  1. Set the contentType attribute to config.audio.contentType.
  2. Set the robustness attribute to config.keySystemConfiguration.audioRobustness.
6. If a video is present in config, set the videoCapabilities attribute to a sequence containing a single MediaKeySystemMediaCapability, initialized as follows:
  1. Set the contentType attribute to config.video.contentType.
  2. Set the robustness attribute to config.keySystemConfiguration.videoRobustness.
Let supported configuration be the result of executing the Get Supported Configuration algorithm on implementation, emeConfiguration, and origin.
If supported configuration is NotSupported, return null and abort these steps.
Let access be a new MediaKeySystemAccess object, and initialize it as follows:
1. Set the keySystem attribute to emeConfiguration.keySystem.
2. Let the configuration value be supported configuration.
3. Let the cdm implementation value be implementation.
Return access.
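The mapping from a decoding configuration to the emeConfiguration built in these steps can be sketched as a plain function. The function name is hypothetical. Since a plain JavaScript object does not receive the dictionary defaults that Web IDL would apply, the sketch applies them by hand; sessionTypes has no default and is passed through unchanged.

```javascript
// Builds the MediaKeySystemConfiguration-shaped object described above.
// Assumes config.keySystemConfiguration is present.
function toMediaKeySystemConfiguration(config) {
  const ks = config.keySystemConfiguration;
  const emeConfiguration = {
    // The single initDataType becomes a one-element sequence.
    initDataTypes: [ks.initDataType ?? ''],
    distinctiveIdentifier: ks.distinctiveIdentifier ?? 'optional',
    persistentState: ks.persistentState ?? 'optional',
    sessionTypes: ks.sessionTypes,
  };
  if (config.audio !== undefined) {
    emeConfiguration.audioCapabilities = [
      { contentType: config.audio.contentType, robustness: ks.audioRobustness ?? '' },
    ];
  }
  if (config.video !== undefined) {
    emeConfiguration.videoCapabilities = [
      { contentType: config.video.contentType, robustness: ks.videoRobustness ?? '' },
    ];
  }
  return emeConfiguration;
}
```

Each capability sequence holds exactly one entry, which reflects the flattening of EME sequences into single values.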

2.4. Navigator and WorkerNavigator extension

[Exposed=Window]
partial interface Navigator {
  [SameObject] readonly attribute MediaCapabilities mediaCapabilities;
};

[Exposed=Worker]
partial interface WorkerNavigator {
  [SameObject] readonly attribute MediaCapabilities mediaCapabilities;
};
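Since the attribute is exposed on both Navigator and WorkerNavigator, the same lookup can serve window and worker code. A minimal feature-detection sketch follows; the function name is hypothetical, and the scope parameter exists only so that the lookup can be exercised outside a browser.

```javascript
// Returns the MediaCapabilities object for the given global scope,
// or null when the user agent does not expose one.
function getMediaCapabilities(scope = globalThis) {
  return scope.navigator?.mediaCapabilities ?? null;
}
```

Callers can fall back to canPlayType() or isTypeSupported() when the function returns null.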

2.5. Media Capabilities Interface

[Exposed=(Window, Worker)]
interface MediaCapabilities {
  [NewObject] Promise<MediaCapabilitiesDecodingInfo> decodingInfo(MediaDecodingConfiguration configuration);
  [NewObject] Promise<MediaCapabilitiesInfo> encodingInfo(MediaEncodingConfiguration configuration);
};

The decodingInfo() method MUST run the following steps:

If configuration is not a valid MediaDecodingConfiguration, return a Promise rejected with a newly created TypeError.
If configuration.keySystemConfiguration is present, run the following substeps:
1. If the global object is of type WorkerGlobalScope, return a Promise rejected with a newly created TypeError.
2. If the result of running Is the environment settings object settings a secure context? [secure-contexts] with the global object’s relevant settings object is not "Secure", return a Promise rejected with a newly created DOMException whose name is SecurityError.
Let p be a new promise.
In parallel, run the Create a MediaCapabilitiesDecodingInfo algorithm with configuration and resolve p with its result.
Return p.

Note, calling decodingInfo() with a keySystemConfiguration present may have user-visible effects, including requests for user consent. Such calls should only be made when the author intends to create and use a MediaKeys object with the provided configuration.

The encodingInfo() method MUST run the following steps:

If configuration is not a valid MediaConfiguration, return a Promise rejected with a newly created TypeError.
Let p be a new promise.
In parallel, run the Create a MediaCapabilitiesInfo algorithm with configuration and resolve p with its result.
Return p.
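From the caller's side, both methods return a promise that either resolves with the capability information or rejects when the configuration is invalid. The sketch below queries decoding support for a plain file and turns a TypeError rejection into a message. The function names are hypothetical, and the mediaCapabilities object is passed in rather than read from navigator so that the code makes no assumption about the environment.

```javascript
// Turns the three booleans into a short human-readable summary.
function summarizeInfo(info) {
  return [
    info.supported ? 'supported' : 'not supported',
    info.smooth ? 'smooth' : 'not smooth',
    info.powerEfficient ? 'power efficient' : 'not power efficient',
  ].join(', ');
}

// Resolves with a summary string; an invalid configuration (TypeError)
// is reported as text, any other rejection is passed on to the caller.
async function describeDecodingSupport(mediaCapabilities, configuration) {
  try {
    const info = await mediaCapabilities.decodingInfo(configuration);
    return summarizeInfo(info);
  } catch (error) {
    if (error instanceof TypeError) return 'invalid configuration: ' + error.message;
    throw error;
  }
}
```

In a page, a call could look like describeDecodingSupport(navigator.mediaCapabilities, { type: 'file', audio: { contentType: 'audio/ogg; codecs=opus' } }).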

3. Display Capabilities

This section is still a work in progress and has no shipping implementation. Please look into it in detail before implementing it.

3.1. Screen Luminance

interface ScreenLuminance {
  readonly attribute double min;
  readonly attribute double max;
  readonly attribute double maxAverage;
};

The ScreenLuminance object represents the known luminance characteristics of the screen.

The min attribute MUST return the minimal screen luminance that a pixel of the screen can emit in candela per square metre. The minimal screen luminance is the luminance used when showing the darkest color a pixel on the screen can display.

The max attribute MUST return the maximal screen luminance that a pixel of the screen can emit in candela per square metre. The maximal screen luminance is the luminance used when showing the whitest color a pixel on the screen can display.

The maxAverage attribute MUST return the maximal average screen luminance that the screen can emit in candela per square metre. The maximal average screen luminance is the maximal luminance value such that all the pixels of the screen emit the same luminance. The value returned by maxAverage is expected to be different from max as screens usually can’t apply the maximal screen luminance to the entire panel.

3.2. Screen Color Gamut

enum ScreenColorGamut {
  "srgb",
  "p3",
  "rec2020",
};

The ScreenColorGamut represents the color gamut supported by a Screen, that is, the range of colors that the screen can display.

The ScreenColorGamut values are:

srgb, it represents the [sRGB] color gamut.
p3, it represents the DCI P3 Color Space color gamut. This color gamut includes the srgb gamut.
rec2020, it represents the ITU-R Recommendation BT.2020 color gamut. This color gamut includes the p3 gamut.
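Because each gamut includes the previous one, the three values form an ordering that a page can use to choose among content variants. The sketch below relies only on that inclusion relationship; the function names are hypothetical, and since colorGamut reports only approximate support, the result should be read as a hint rather than a guarantee.

```javascript
// Ordered from narrowest to widest, following the inclusion rules above.
const GAMUT_ORDER = ['srgb', 'p3', 'rec2020'];

// True when a screen reporting screenGamut can display contentGamut.
function screenCoversGamut(screenGamut, contentGamut) {
  const screenIndex = GAMUT_ORDER.indexOf(screenGamut);
  const contentIndex = GAMUT_ORDER.indexOf(contentGamut);
  if (screenIndex === -1 || contentIndex === -1) return false;
  return screenIndex >= contentIndex;
}

// Picks the widest available content gamut the screen covers, or null.
function widestDisplayableGamut(screenGamut, availableGamuts) {
  const displayable = availableGamuts.filter((g) => screenCoversGamut(screenGamut, g));
  if (displayable.length === 0) return null;
  return displayable.reduce((widest, g) =>
    (GAMUT_ORDER.indexOf(g) > GAMUT_ORDER.indexOf(widest) ? g : widest));
}
```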

3.3. Screen extension

Part of this section is monkey patching of the CSSOM View Module. Issue #4 is tracking merging the changes. This partial interface requires the Screen interface to become an EventTarget.

partial interface Screen {
  readonly attribute ScreenColorGamut colorGamut;
  readonly attribute ScreenLuminance? luminance;
  attribute EventHandler onchange;
};

The colorGamut attribute SHOULD return the ScreenColorGamut approximately supported by the screen. In other words, the screen does not need to fully support the given color gamut but needs to be close enough. If the user agent does not know the color gamut supported by the screen, if the supported color gamut is smaller than srgb, or if the user agent does not want to expose this information for privacy reasons, it SHOULD return srgb as a default value. The value returned by colorGamut MUST match the value returned by the color-gamut CSS media query.

The luminance attribute SHOULD return a ScreenLuminance object that will expose the luminance characteristics of the screen. If the user agent has no access to the luminance characteristics of the screen, it MUST return null. The user agent MAY also return null if it does not want to expose the luminance information for privacy reasons.

The onchange attribute is an event handler whose corresponding event handler event type is change.

Whenever the user agent is aware that the state of the Screen object has changed, that is, if one of the values exposed on the Screen object or on an object exposed on the Screen object has changed, it MUST queue a task to fire an event named change on Screen.
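A page could combine the attributes and the change event roughly as follows. This is a sketch under several assumptions: the section has no shipping implementation, so the code treats missing attributes like the documented defaults (srgb, null), and it only subscribes when the Screen object actually behaves as an event target. The function names are hypothetical.

```javascript
// Reads the display capabilities, falling back to the default values
// when the attributes are not implemented.
function readDisplayCapabilities(screen) {
  return {
    colorGamut: screen.colorGamut ?? 'srgb',
    luminance: screen.luminance ?? null,
  };
}

// Calls callback with fresh capabilities on every change event.
// Returns a function that stops watching.
function watchDisplayCapabilities(screen, callback) {
  if (typeof screen.addEventListener !== 'function') return () => {};
  const listener = () => callback(readDisplayCapabilities(screen));
  screen.addEventListener('change', listener);
  return () => screen.removeEventListener('change', listener);
}
```

Re-reading the attributes inside the listener matters because the event itself carries no information about what changed.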

4. Security and Privacy Considerations

This specification does not introduce any security-sensitive information or APIs but it provides easier access to some information that can be used to fingerprint users.

4.1. Decoding/Encoding and Fingerprinting

The information exposed by the decoding/encoding capabilities can already be discovered via experimentation with the exception that the API will likely provide more accurate and consistent information. This information is expected to have a high correlation with other information already available to the web pages as a given class of device is expected to have very similar decoding/encoding capabilities. In other words, high end devices from a certain year are expected to decode some type of videos while older devices may not. Therefore, it is expected that the entropy added with this API isn’t going to be significant.

If an implementation wishes to implement a fingerprint-proof version of this specification, it would be recommended to fake a given set of capabilities (i.e. decode up to 1080p VP9, etc.) instead of always returning yes or always returning no, as the latter approach could considerably degrade the user’s experience.

4.2. Display and Fingerprinting

The information exposed by the display capabilities can already be accessed via CSS for the most part. The specification also provides default values when the user agent does not wish to expose the feature for privacy reasons.

5. Examples

5.1. Query recording capabilities with `encodingInfo()`

The following example can also be found in e.g. this codepen with minimal modifications.

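<script>
  // Describe a 640x480 VP8 recording at 30 frames per second.
  const configuration = {
    type : 'record',
    video : {
      contentType : 'video/webm;codecs=vp8',
      width : 640,
      height : 480,
      bitrate : 10000,
      framerate : '30'
    }
  };
  navigator.mediaCapabilities.encodingInfo(configuration)
      .then((result) => {
        // MediaCapabilitiesInfo has no contentType member, so the content
        // type is read from the configuration that was queried.
        console.log(configuration.video.contentType + ' is:'
            + (result.supported ? '' : ' NOT') + ' supported,'
            + (result.smooth ? '' : ' NOT') + ' smooth and'
            + (result.powerEfficient ? '' : ' NOT') + ' power efficient');
      })
      .catch((err) => {
        // The promise rejects with a TypeError for an invalid configuration.
        console.error(err, ' caused encodingInfo to reject');
      });
</script>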

Conformance

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.