Copyright © 2024 World Wide Web Consortium. W3C® liability, trademark and permissive document license rules apply.
This document defines a set of ECMAScript APIs in WebIDL to extend the [ mediacapture-streams ] specification.
This section describes the status of this document at the time of its publication. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.
This is an unofficial proposal.
This document was published by the Web Real-Time Communications Working Group as an Editor's Draft.
Publication as an Editor's Draft does not imply endorsement by W3C and its Members.
This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This document is governed by the 03 November 2023 W3C Process Document .
This document contains proposed extensions and modifications to the [ mediacapture-streams ] specification.
New features and modifications to existing features proposed here may be considered for addition into the main specification post Recommendation. Deciding factors will include maturity of the extension or modification, consensus on adding it, and implementation experience.
A concrete long-term goal is reducing the fingerprinting surface of enumerateDevices() by deprecating exposure of the device label in its results. This requires relieving applications of the burden of building user interfaces to select cameras and microphones in-content, by offering this in user agents as part of getUserMedia() instead.
Miscellaneous other smaller features are under consideration as well, such as constraints to control multi-channel audio beyond stereo.
This document uses the definitions MediaDevices, MediaStreamTrack, MediaStreamConstraints, ConstrainablePattern, MediaTrackSupportedConstraints, MediaTrackCapabilities, MediaTrackConstraintSet, MediaTrackSettings and ConstrainBoolean from [ mediacapture-streams ].
The terms permission state, request permission to use, and prompt the user to choose are defined in [ permissions ].
Performance.now() is defined in [ hr-time ].
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
The key words MAY , MUST , MUST NOT , and SHOULD in this document are to be interpreted as described in BCP 14 [ RFC2119 ] [ RFC8174 ] when, and only when, they appear in all capitals, as shown here.
The existing enumerateDevices() function exposes camera and microphone labels to let applications build in-content user interfaces for camera and microphone selection. Applications have had to do this because getUserMedia() did not offer a web compatible in-agent device picker. This specification aims to rectify that.
Due to the significant fingerprinting vector caused by device labels, and the well-established nature of the existing APIs, the scope of this particular effort is limited to removing label, leaving the overall constraints-based model intact. This helps ensure a more viable migration path than switching to a less-powerful API. For that reason as well, this specification augments the existing getUserMedia() function instead of introducing a new, less-powerful API to compete with it.
This specification introduces slightly altered semantics to the getUserMedia() function, called "user-chooses", that guarantee a picker will be shown to the user in cases where the user agent would otherwise choose for the user (that is: when application constraints do not narrow down the choices to a single device). This is orthogonal to permission, and offers a better and more consistent user experience across applications and user agents.
Unfortunately, since the "user-chooses" semantics may produce user agent prompts at different times and in different situations compared to the old semantics, they are somewhat incompatible with expectations in some existing web applications that tend to call getUserMedia() repeatedly and lazily instead of using e.g. stream.clone(). User agents are encouraged to provide the new semantics as opt-in initially for web compatibility.
User agents MUST deprecate (remove) label from MediaDeviceInfo over time, though specific migration strategies are left to user agents. User agents SHOULD migrate to offering the new semantics by default (opt-out) over time.
Since the constraints-model remains intact, web compatibility problems are expected to be limited to:
WebIDLpartial interface MediaDevices {
  readonly attribute GetUserMediaSemantics defaultSemantics;
};
defaultSemantics of type GetUserMediaSemantics, readonly
The default semantics of getUserMedia() in this user agent. User agents SHOULD default to "browser-chooses" for backwards compatibility, until a transition plan has been enacted where a majority of user agents collectively switch their defaults to "user-chooses" for improved user privacy, and usage metrics suggest this transition is feasible without major breakage.
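For illustration, here is a minimal, non-normative sketch of how a page might consult the proposed defaultSemantics attribute. Because this attribute is part of this proposal and not yet widely available, feature detection (the ?? fallback) is an assumption of the sketch:

const semantics = navigator.mediaDevices.defaultSemantics ?? "browser-chooses";
if (semantics === "browser-chooses") {
  // The user agent may pick a device itself; an application that wants an
  // in-agent picker can pass {semantics: "user-chooses"} explicitly
  // (see MediaStreamConstraints.semantics below).
}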
WebIDLpartial dictionary MediaStreamConstraints {
  GetUserMediaSemantics semantics;
};
MediaStreamConstraints Members
semantics of type GetUserMediaSemantics
In cases where the specified constraints do not narrow multiple choices between devices down to one per kind, specifies how the final determination of which devices to pick from the remaining choices MUST be made. If not specified, then the defaultSemantics are used.
WebIDLenum GetUserMediaSemantics {
  "browser-chooses",
  "user-chooses"
};
GetUserMediaSemantics Enumeration description
Enumeration value | Description |
---|---|
browser-chooses | When application-specified constraints do not narrow multiple choices between devices down to one per kind, the user agent is allowed to make the final determination between the remaining choices. |
user-chooses | When application-specified constraints do not narrow multiple choices between devices down to one per kind, the user agent MUST prompt the user to choose between the remaining choices, even if the application already has permission to some or all of them. |
When the getUserMedia() method is invoked, run the following steps before invoking the getUserMedia() algorithm:
Let mediaDevices be the object on which this method was invoked.
Let constraints be the method's first argument.
Let semanticsPresent be true if constraints.semantics exists, otherwise false.
Let semantics be constraints.semantics if it exists, or the value of mediaDevices.defaultSemantics otherwise.
Replace step 6.5.1. of the getUserMedia() algorithm in its entirety with the following two steps:
Let descriptor be a PermissionDescriptor with its name member set to the permission name associated with kind (e.g. "camera" for "video", "microphone" for "audio").
If the number of unique devices sourcing tracks of media type kind in candidateSet is greater than 1 and semantics is "user-chooses", then prompt the user to choose a device with descriptor, resulting in provided media. Otherwise, request permission to use a device with descriptor, while considering all devices being attached to a live and same-permission MediaStreamTrack in the current browsing context to mean having permission status "granted", resulting in provided media.
Same-permission in this context means a MediaStreamTrack that required the same level of permission to obtain as what is being requested.
When asking the user’s permission, the user agent MUST disclose whether permission will be granted only to the device chosen, or to all devices of that kind .
Let track be the provided media, which MUST be precisely one track of type kind from finalSet. If semantics is "browser-chooses" then the decision of which track to choose from finalSet is up to the User Agent, which MAY use the value of the computed "fitness distance" from the SelectSettings algorithm, the value of semanticsPresent, or any other internally-available information about the devices, as inputs to its decision. If semantics is "user-chooses", and the application has not narrowed down the choices to one, then the user agent MUST ask the user to make the final selection. Once selected, the source of the MediaStreamTrack MUST NOT change.
User Agents are encouraged to default to or present a default choice based primarily on fitness distance, and secondarily on the user's primary or system default device for kind (when possible). User Agents MAY allow users to use any media source, including pre-recorded media files.
This example shows a setup with a start button and a camera selector using the new semantics (the microphone is not shown for brevity but is equivalent).
<button id="start">Start</button>
<button id="chosenCamera" disabled>Camera: none</button>
<script>
let cameraTrack = null;
start.onclick = () => {
start.onclick = async () => {
try {
navigator.mediaDevices.getUserMedia({
.cameraId}
const stream = await navigator.mediaDevices.getUserMedia({
video: {deviceId: localStorage.cameraId}
});
setCameraTrack(stream.getVideoTracks()[]);
setCameraTrack(stream.getVideoTracks()[0]);
} catch (err) {
.error(err);
console.error(err);
}
}
chosenCamera.onclick = () => {
chosenCamera.onclick = async () => {
try {
navigator.mediaDevices.getUserMedia({
const stream = await navigator.mediaDevices.getUserMedia({
video: true,
semantics: "user-chooses"
});
setCameraTrack(stream.getVideoTracks()[]);
setCameraTrack(stream.getVideoTracks()[0]);
} catch (err) {
.error(err);
console.error(err);
}
}
{
function setCameraTrack(track) {
cameraTrack = track;
{deviceId, label} = track.getSettings();
.cameraId = deviceId;
chosenCamera.innerText = ;
chosenCamera.disabled = ;
const {deviceId, label} = track.getSettings();
localStorage.cameraId = deviceId;
chosenCamera.innerText = `Camera: ${label}`;
chosenCamera.disabled = false;
}
</
script
>
A MediaStreamTrack is a transferable object. This allows manipulating real-time media outside the context it was requested or created in, for instance in workers or third-party iframes.
To preserve the existing privacy and security infrastructure, in particular for capture tracks, the track source lifetime management remains tied to the context that created it. The transfer algorithm MUST ensure the following behaviors:
The context named originalContext that created a track named originalTrack remains in control of the originalTrack source, named trackSource , even when originalTrack is transferred into transferredTrack .
In particular, originalContext remains the proxy to privacy indicators of trackSource . transferredTrack or any of its clones are considered as tracks using trackSource as if they were tracks created in and controlled by originalContext .
When originalContext goes away, trackSource gets ended, thus transferredTrack gets ended.
When originalContext would have muted/unmuted originalTrack , transferredTrack gets muted/unmuted.
If transferredTrack is cloned in transferredTrackClone , transferredTrackClone is tied to trackSource . It is not tied to originalTrack in any way.
If transferredTrack is transferred into transferredAgainTrack , transferredAgainTrack is tied to trackSource . It is not tied to transferredTrack or originalTrack in any way.
The WebIDL changes to make the track transferable are the following:
WebIDL[Exposed=(Window,Worker), Transferable]
partial interface MediaStreamTrack {
};
At creation of a MediaStreamTrack object, called track, run the following steps:
Initialize track.[[IsDetached]] to false.
The MediaStreamTrack transfer steps, given value and dataHolder, are:
If value.[[IsDetached]] is true, throw a "DataCloneError" DOMException.
Set dataHolder.[[id]] to value.id.
Set dataHolder.[[kind]] to value.kind.
Set dataHolder.[[label]] to value.label.
Set dataHolder.[[readyState]] to value.readyState.
Set dataHolder.[[enabled]] to value.enabled.
Set dataHolder.[[muted]] to value.muted.
Set dataHolder.[[source]] to value's underlying source.
Set dataHolder.[[constraints]] to value's active constraints.
Set dataHolder.[[contentHint]] to value's application-set content hint.
Set value.[[IsDetached]] to true.
Set value.[[ReadyState]] to "ended" (without stopping the underlying source or firing an ended event).
MediaStreamTrack transfer-receiving steps, given dataHolder and track, are:
Initialize track.id to dataHolder.[[id]].
Initialize track.kind to dataHolder.[[kind]].
Initialize track.label to dataHolder.[[label]].
Initialize track.readyState to dataHolder.[[readyState]].
Initialize track.enabled to dataHolder.[[enabled]].
Initialize track.muted to dataHolder.[[muted]].
Set track's application-set content hint to dataHolder.[[contentHint]].
Initialize the underlying source of track to dataHolder.[[source]].
Set track's constraints to dataHolder.[[constraints]].
The underlying source is supposed to be kept alive between the transfer and transfer-receiving steps, or as long as the data holder is alive. In a sense, between these steps, the data holder is attached to the underlying source as if it was a track.
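As a non-normative illustration, the sketch below transfers a camera track to a dedicated worker. "worker.js" is a hypothetical script name, and MediaStreamTrackProcessor (from mediacapture-transform) is used only to show that the transferred track remains usable in the worker while its source stays under the page's control:

// main thread
const stream = await navigator.mediaDevices.getUserMedia({video: true});
const [track] = stream.getVideoTracks();
const worker = new Worker("worker.js");
// Transferring detaches the original track object; the page context still owns the source.
worker.postMessage({track}, [track]);

// worker.js (hypothetical)
self.onmessage = ({data: {track}}) => {
  const processor = new MediaStreamTrackProcessor({track});
  processor.readable.pipeTo(new WritableStream({write(frame) { frame.close(); }}));
};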
On microphone audio tracks, frame counters let the application compute the ratio of audio that is delivered (one quality indicator), while the latency metrics measure the input delay from capture to application.
On camera and screenshare video tracks, frame counters allow the application to tell what the frame rate is, which may be lower than the target frameRate. For example, if the track is sourced from a camera then the production of frames could be slowed down if it's dark, or frames could be dropped if the system is CPU starved. This could impact the total number of frames produced by the source and impact how many frames are delivered, discarded or dropped for other reasons.
WebIDLpartial interface MediaStreamTrack {
  [SameObject] readonly attribute (MediaStreamTrackAudioStats or MediaStreamTrackVideoStats)? stats;
};
Let the MediaStreamTrack have a [[Stats]] internal slot, initialized to null unless otherwise specified below.
If the track is of kind "audio", run the following steps:
If the MediaStreamTrack is sourced from getUserMedia(), initialize [[Stats]] to a new instance of MediaStreamTrackAudioStats set up to expose audio stats for this MediaStreamTrack.
If the track is of kind "video", run the following steps:
If the MediaStreamTrack is sourced from getUserMedia() or getDisplayMedia(), initialize [[Stats]] to a new instance of MediaStreamTrackVideoStats set up to expose video stats for this MediaStreamTrack.
stats of type (MediaStreamTrackAudioStats or MediaStreamTrackVideoStats), readonly
When this getter is called, the user agent MUST run the following steps:
Let track be the MediaStreamTrack that this getter is called on.
Return track.[[Stats]].
WebIDL[Exposed=Window]
interface MediaStreamTrackAudioStats {
  readonly attribute unsigned long long deliveredFrames;
  readonly attribute DOMHighResTimeStamp deliveredFramesDuration;
  readonly attribute unsigned long long totalFrames;
  readonly attribute DOMHighResTimeStamp totalFramesDuration;
  readonly attribute DOMHighResTimeStamp latency;
  readonly attribute DOMHighResTimeStamp averageLatency;
  readonly attribute DOMHighResTimeStamp minimumLatency;
  readonly attribute DOMHighResTimeStamp maximumLatency;
  undefined resetLatency();
  [Default] object toJSON();
};
The following metrics lack Working Group consensus: deliveredFrames, deliveredFramesDuration, totalFrames and totalFramesDuration. See Issue #129.
The MediaStreamTrackAudioStats interface exposes frame counters for the MediaStreamTrack that created it. For this track, the user agent is required to count each audio frame from its source as follows:
A frame is considered a delivered audio frame if it either was delivered to a sink or would have been delivered to a sink, if one was connected.
The delivered audio frames duration is the total duration of all delivered audio frames . This measurement is incremented at the same time as delivered audio frames and is measured in milliseconds.
An audio frame that is discarded because it cannot be delivered on time, or it cannot be delivered for any other reason, is considered dropped .
The dropped audio frames duration is the total duration of all dropped audio frames . This measurement is incremented at the same time as dropped audio frames and is measured in milliseconds.
If the track is unmuted and enabled, the counters increase as audio is produced by the capture device. If no audio is flowing, such as if the track is muted or disabled, then the counters do not increase.
Input latency is the time, in milliseconds, between the point in time an audio input device has acquired a signal and the time it is available for consumption, which may include buffering by the user agent.
The latest input latency is the latest available input latency as estimated between the track's input device and delivery to any of its sinks.
The user agent updates its estimates at sufficient frequency to allow monitoring. The latency is representative of the experienced delay, but is not necessarily an exact measurement of the last individual audio frame that was delivered.
A sink that consumes audio may add additional processing latency not included in this measurement, such as playout delay or encode time.
Every time the latest input latency measurement is updated, the user agent also updates its average input latency , minimum input latency and maximum input latency which are the average, minimum and maximum observed measurements since the last latency reset time .
Let the MediaStreamTrackAudioStats have internal slots [[DeliveredFrames]], [[DeliveredFramesDuration]], [[DroppedFrames]], [[DroppedFramesDuration]], [[Latency]], [[AverageLatency]], [[MinimumLatency]] and [[MaximumLatency]], initialized to 0. Let the MediaStreamTrackAudioStats also have internal slots [[LastTask]] and [[LastExposureTime]], initialized to undefined.
The expose audio frame counters steps are the following:
Let task be the current task .
If [[LastTask]] is equal to task, abort these steps.
Set [[LastTask]] to task.
Set [[DeliveredFrames]] to delivered audio frames, set [[DeliveredFramesDuration]] to delivered audio frames duration, set [[DroppedFrames]] to dropped audio frames, set [[DroppedFramesDuration]] to dropped audio frames duration, set [[Latency]] to the latest input latency, set [[AverageLatency]] to the average input latency, set [[MinimumLatency]] to the minimum input latency and set [[MaximumLatency]] to the maximum input latency.
Set [[LastExposureTime]] to reflect the time that these metrics were exposed.
Only updating these counters once per task preserves the run-to-completion semantics defined in [ API-DESIGN-PRINCIPLES ].
deliveredFrames of type unsigned long long, readonly
Upon getting, run the expose audio frame counters steps and return [[DeliveredFrames]].
deliveredFramesDuration of type DOMHighResTimeStamp, readonly
Upon getting, run the expose audio frame counters steps and return [[DeliveredFramesDuration]].
totalFrames of type unsigned long long, readonly
Upon getting, run the expose audio frame counters steps and return the sum of [[DeliveredFrames]] and [[DroppedFrames]].
totalFramesDuration of type DOMHighResTimeStamp, readonly
Upon getting, run the expose audio frame counters steps and return the sum of [[DeliveredFramesDuration]] and [[DroppedFramesDuration]].
Because audio capture devices produce audio in real-time, audio frames may be dropped if not processed in a timely manner.
The ratio of audio duration that was delivered, i.e. not dropped, can be calculated as deliveredFramesDuration / totalFramesDuration.
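For example, a non-normative sketch of how an application might monitor microphone delivery and latency using the proposed stats attribute; the one-second polling interval is an arbitrary choice of the sketch:

const stream = await navigator.mediaDevices.getUserMedia({audio: true});
const [micTrack] = stream.getAudioTracks();
setInterval(() => {
  const stats = micTrack.stats;
  if (!stats || !stats.totalFramesDuration) return;
  const deliveredRatio = stats.deliveredFramesDuration / stats.totalFramesDuration;
  console.log(`delivered ${(deliveredRatio * 100).toFixed(1)}% of audio, ` +
              `latency ${stats.latency} ms (average ${stats.averageLatency} ms)`);
  // Start a fresh observation window for the latency aggregates.
  stats.resetLatency();
}, 1000);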
latency of type DOMHighResTimeStamp, readonly
Upon getting, run the expose audio frame counters steps and return [[Latency]].
averageLatency of type DOMHighResTimeStamp, readonly
Upon getting, run the expose audio frame counters steps and return [[AverageLatency]].
minimumLatency of type DOMHighResTimeStamp, readonly
Upon getting, run the expose audio frame counters steps and return [[MinimumLatency]].
maximumLatency of type DOMHighResTimeStamp, readonly
Upon getting, run the expose audio frame counters steps and return [[MaximumLatency]].
resetLatency
When called, run the following steps:
Run the expose audio frame counters steps .
Set [[AverageLatency]], [[MinimumLatency]] and [[MaximumLatency]] to [[Latency]].
Set the latency reset time to [[LastExposureTime]].
toJSON
When called, run [ WEBIDL ]'s default toJSON steps .
WebIDL[Exposed=Window]
interface MediaStreamTrackVideoStats {
  readonly attribute unsigned long long deliveredFrames;
  readonly attribute unsigned long long discardedFrames;
  readonly attribute unsigned long long totalFrames;
  [Default] object toJSON();
};
The MediaStreamTrackVideoStats interface exposes frame counters for the MediaStreamTrack that created it. For this track, the user agent is required to count each video frame from its source as follows:
A frame is considered a delivered video frame if it either was delivered to a sink or would have been delivered to a sink, if one was connected. This is a subset of total video frames and it is incremented at the same time as total video frames .
A video frame is considered discarded if it was discarded in order to achieve the target frameRate. This is a subset of total video frames and it is incremented at the same time as total video frames.
The total video frames is the total number of frames that have been processed by this source, meaning it is known whether the frame was considered delivered, discarded or dropped for any other reason. The number of frames dropped for various unknown reasons can be calculated by subtracting delivered video frames and discarded video frames from total video frames.
If the track is unmuted and enabled and the source is backed by a camera, total frames is incremented by frames produced by the camera. If no frames are flowing, such as if the track is muted or disabled, then total frames does not increment.
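For example, a non-normative sketch of how an application might derive the number of frames dropped for other reasons from the proposed video stats; the polling interval is an arbitrary choice of the sketch:

const stream = await navigator.mediaDevices.getUserMedia({video: true});
const [cameraTrack] = stream.getVideoTracks();
setInterval(() => {
  const stats = cameraTrack.stats;
  if (!stats) return;
  const {deliveredFrames, discardedFrames, totalFrames} = stats;
  // Frames dropped for reasons other than frame-rate decimation.
  const droppedFrames = totalFrames - deliveredFrames - discardedFrames;
  console.log(`delivered ${deliveredFrames}, discarded ${discardedFrames}, ` +
              `dropped ${droppedFrames} out of ${totalFrames} frames`);
}, 1000);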
Let the MediaStreamTrackVideoStats have internal slots [[DeliveredFrames]], [[DiscardedFrames]] and [[TotalFrames]], initialized to 0. Let the MediaStreamTrackVideoStats also have an internal slot [[LastTask]] initialized to null.
The expose video frame counters steps are the following:
Let task be the current task .
If [[LastTask]] is equal to task, abort these steps.
Set [[LastTask]] to task.
Set [[DeliveredFrames]] to delivered video frames, set [[DiscardedFrames]] to discarded video frames and set [[TotalFrames]] to total video frames.
Only updating these counters once per task preserves the run-to-completion semantics defined in [ API-DESIGN-PRINCIPLES ].
deliveredFrames of type unsigned long long, readonly
Upon getting, run the expose video frame counters steps and return [[DeliveredFrames]].
discardedFrames of type unsigned long long, readonly
Upon getting, run the expose video frame counters steps and return [[DiscardedFrames]].
totalFrames of type unsigned long long, readonly
Upon getting, run the expose video frame counters steps and return [[TotalFrames]].
toJSON
When called, run [ WEBIDL ]'s default toJSON steps .
WebIDLpartial dictionary MediaTrackSupportedConstraints {
  boolean powerEfficient = true;
};
MediaTrackSupportedConstraints Members
powerEfficient of type boolean, defaulting to true
WebIDLpartial dictionary MediaTrackCapabilities {
  sequence<boolean> powerEfficient;
};
MediaTrackCapabilities Members
powerEfficient of type sequence<boolean>
The source may operate in different configurations. If all configurations have the same power efficiency impact, a single false is reported. Otherwise, the source reports a list with both true and false as possible values. See powerEfficient for additional details.
WebIDLpartial dictionary MediaTrackSettings {
  boolean powerEfficient;
};
MediaTrackSettings Members
powerEfficient of type boolean
The constrainable properties in this document are defined below.
Property Name | Values | Notes |
---|---|---|
powerEfficient | ConstrainBoolean | Cameras can often operate in different configurations. Configurations are typically selected based on constraints that are related to observable parameters like width or height. Configurations may have less directly observable characteristics: power consumption, low light sensitivity, fast autofocus... The powerEfficient constraint allows web applications to favor selection of configurations that consume less power. This may be useful for web applications that may use the camera for an extended amount of time, like video conference web applications. On the other hand, applications that may use the camera for a small amount of time may prefer to not use the powerEfficient constraint. This constraint is only applicable to camera sources. As a constraint, setting it to true instructs the user agent to prefer a configuration that it considers power efficient. |
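A non-normative sketch of how an application might express this preference, assuming the powerEfficient constraint above is implemented; the resolution values are arbitrary choices of the sketch:

const constraints = {video: {width: 1280, height: 720}};
if (navigator.mediaDevices.getSupportedConstraints().powerEfficient) {
  constraints.video.powerEfficient = true; // a preference, not a requirement
}
const stream = await navigator.mediaDevices.getUserMedia(constraints);
const [track] = stream.getVideoTracks();
console.log("power efficient configuration:", track.getSettings().powerEfficient);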
WebIDLpartial dictionary MediaTrackSupportedConstraints {
  boolean powerEfficientPixelFormat = true;
};
MediaTrackSupportedConstraints Members
powerEfficientPixelFormat of type boolean, defaulting to true
WebIDLpartial dictionary MediaTrackCapabilities {
  sequence<boolean> powerEfficientPixelFormat;
};
MediaTrackCapabilities Members
powerEfficientPixelFormat of type sequence<boolean>
If the source only has power efficient pixel formats, a single true is reported. If the source only has power inefficient pixel formats, a single false is reported. If the script can control the feature, the source reports a list with both true and false as possible values. See powerEfficientPixelFormat for additional details.
WebIDLpartial dictionary MediaTrackSettings {
  boolean powerEfficientPixelFormat;
};
MediaTrackSettings Members
powerEfficientPixelFormat of type boolean
The constrainable properties in this document are defined below.
Property Name | Values | Notes |
---|---|---|
powerEfficientPixelFormat | ConstrainBoolean | Compressed pixel formats often need to be decoded, for instance for display purposes or when being encoded during a video call. The user agent SHOULD label compressed pixel formats that incur significant power penalty when decoded as power inefficient. The labeling is up to the user agent, but decoding MJPEG in software is an example of an expensive mode. Pixel formats that have not been labeled power inefficient by the user agent are for the purpose of this API considered power efficient. As a constraint, setting it to true allows filtering out inefficient pixel formats and setting it to false allows filtering out efficient pixel formats. As a setting, this reflects whether or not the current pixel format is considered power efficient by the user agent. |
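A non-normative sketch of how an application might steer selection away from pixel formats that are expensive to decode, assuming the powerEfficientPixelFormat constraint above is implemented:

const stream = await navigator.mediaDevices.getUserMedia({
  video: {powerEfficientPixelFormat: {ideal: true}}
});
const [track] = stream.getVideoTracks();
console.log("power efficient pixel format:", track.getSettings().powerEfficientPixelFormat);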
This section is non-normative.
Video media flowing inside media stream tracks consists of a sequence of video frames, where the frames are sampled from the media at instants spread out over time.
Each video frame must have a presentation timestamp which is relative to a source specific origin. A source of frames can define how this timestamp is set. A sink of frames can define how this timestamp is used.
The timestamp is present for sinks to be able to define an absolute presentation timeline of the frames relative to a clock reference, for example for playback.
Each frame may have an absolute capture timestamp representing the instant the frame capture process began, which is useful for example for delay measurements and synchronization. A source of frames can define how this timestamp is set, otherwise it is unset. A sink of frames can define how this timestamp is used if set.
Each frame may have an absolute receive timestamp representing the time when the last packet used to produce this video frame was received in its entirety. The timestamp is useful for example for network jitter measurements. A source of frames can define how this timestamp is set, otherwise it is unset. A sink of frames can define how this timestamp is used if set.
Each frame may have a RTP timestamp representing the packet RTP timestamp used to produce this video frame. The timestamp is useful for example for frame identification and playback quality measurements. A source of frames can define how the timestamp is set, otherwise it is unset. A sink of frames can define how this timestamp is used if set. The packet RTP timestamp concept is defined in [ RFC3550 ] Section 5.1.
The capture timestamp and receive timestamp are using the same clock and offset. The presentation timestamp and capture timestamp are using the same clock and have an offset which can be arbitrarily chosen by the user agent since it isn't directly observable by script.
VideoFrameMetadata
WebIDLpartial dictionary VideoFrameMetadata {
  DOMHighResTimeStamp captureTime;
  DOMHighResTimeStamp receiveTime;
  unsigned long rtpTimestamp;
};
captureTime of type DOMHighResTimeStamp
The capture timestamp of the frame relative to Performance.timeOrigin. It corresponds to the capture timestamp of MediaStreamTrack video frames.
receiveTime of type DOMHighResTimeStamp
The receive time of the corresponding encoded frame relative to Performance.timeOrigin. It corresponds to the receive timestamp of MediaStreamTrack video frames.
rtpTimestamp of type unsigned long
The RTP timestamp of the corresponding encoded frame. It corresponds to the RTP timestamp of MediaStreamTrack video frames.
timestamp from presentation timestamp minus offset.
captureTime from capture timestamp if set.
receiveTime from receive timestamp if set.
rtpTimestamp from RTP timestamp if set.
timestamp.
captureTime if present.
receiveTime if present.
rtpTimestamp if present.
The user agent MUST set the capture timestamp of each video frame that is sourced from getUserMedia() and getDisplayMedia() to its best estimate of the time that the frame was captured. This value MUST be monotonically increasing.
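A non-normative sketch of how an application might read these per-frame timestamps, using MediaStreamTrackProcessor from mediacapture-transform (assumed available in the execution context; in practice the track may first need to be transferred to a worker):

const stream = await navigator.mediaDevices.getUserMedia({video: true});
const [track] = stream.getVideoTracks();
const {readable} = new MediaStreamTrackProcessor({track});
await readable.pipeTo(new WritableStream({
  write(frame) {
    const {captureTime, receiveTime, rtpTimestamp} = frame.metadata();
    if (captureTime !== undefined) {
      // Both values are relative to Performance.timeOrigin.
      console.log(`capture-to-now delay: ${performance.now() - captureTime} ms`);
    }
    frame.close();
  }
}));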
Some platforms or User Agents may provide built-in support for video effects triggered by user motion heuristics, in particular for camera video streams. Web applications may either want to control or at least be aware that these heuristics are active and might trigger these effects at the source level. This can for instance allow the web application to update its UI or to turn off these heuristics where having such effects triggered accidentally might be considered insensitive or inappropriate. For that reason, we extend MediaStreamTrack with the following properties.
The WebIDL changes are the following:
WebIDLpartial dictionary MediaTrackSupportedConstraints {
  boolean gestureReactions = true;
};
partial dictionary MediaTrackConstraintSet {
  ConstrainBoolean gestureReactions;
};
partial dictionary MediaTrackSettings {
  boolean gestureReactions;
};
partial dictionary MediaTrackCapabilities {
  sequence<boolean> gestureReactions;
};
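A non-normative sketch of how an application might detect and disable these heuristics, assuming the gestureReactions constraint above is implemented:

const stream = await navigator.mediaDevices.getUserMedia({video: true});
const [videoTrack] = stream.getVideoTracks();
const capabilities = videoTrack.getCapabilities();
if ((capabilities.gestureReactions || []).includes(false)) {
  // Turn the heuristics off where accidentally triggered effects would be inappropriate.
  await videoTrack.applyConstraints({gestureReactions: false});
} else {
  // The platform either has no gesture reactions or does not let the page control them.
}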
Some platforms or User Agents may provide built-in support for automatic continuous framing based on the position of human faces within the field of view, in particular for camera video streams. Web applications may either want to control or at least be aware that automatic continuous human face framing is applied at the source level. This may for instance allow the web application to update its UI or to not apply human face framing on its own. For that reason, we extend MediaStreamTrack with the following properties.
The WebIDL changes are the following:
WebIDLpartial dictionary MediaTrackSupportedConstraints {
  boolean faceFraming = true;
};
partial dictionary MediaTrackCapabilities {
  sequence<boolean> faceFraming;
};
partial dictionary MediaTrackConstraintSet {
  ConstrainBoolean faceFraming;
};
partial dictionary MediaTrackSettings {
  boolean faceFraming;
};
When the "faceFraming" setting is set to true by the ApplyConstraints algorithm, the UA will attempt to continuously improve framing by cropping to human faces. When the "faceFraming" setting is set to false by the ApplyConstraints algorithm, the UA will not crop to human faces.
<video></video>
<script>
// Open camera.
const stream = await navigator.mediaDevices.getUserMedia({video: true});
const [videoTrack] = stream.getVideoTracks();

// Try to improve framing.
const capabilities = videoTrack.getCapabilities();
if ("faceFraming" in capabilities) {
  await videoTrack.applyConstraints({faceFraming: true});
} else {
  // Face framing is not supported by the platform or by the camera.
  // Consider falling back to some other method.
}

// Show to user.
const videoElement = document.querySelector("video");
videoElement.srcObject = stream;
</script>
Some platforms or User Agents may provide built-in support for human eye gaze correction to make the eyes of faces appear to look at the camera, in particular for camera video streams. This may for instance allow the web application to update its UI or to not apply human eye gaze correction on its own. For that reason, we extend MediaStreamTrack with the following properties.
The WebIDL changes are the following:
WebIDLpartial dictionary MediaTrackSupportedConstraints {
  boolean eyeGazeCorrection = true;
};
partial dictionary MediaTrackCapabilities {
  sequence<boolean> eyeGazeCorrection;
};
partial dictionary MediaTrackConstraintSet {
  ConstrainBoolean eyeGazeCorrection;
};
partial dictionary MediaTrackSettings {
  boolean eyeGazeCorrection;
};
When the "eyeGazeCorrection" setting is set to true, the User Agent will attempt to correct human eye gaze so that the eyes of faces appear to look at the camera. When the "eyeGazeCorrection" setting is set to false, the User Agent will not correct human eye gaze.
<video></video>
<script>
// Open camera.
const stream = await navigator.mediaDevices.getUserMedia({video: true});
const [videoTrack] = stream.getVideoTracks();

// Try to correct eye gaze.
const videoCapabilities = videoTrack.getCapabilities();
if ((videoCapabilities.eyeGazeCorrection || []).includes(true)) {
  await videoTrack.applyConstraints({eyeGazeCorrection: {exact: true}});
} else {
  // Eye gaze correction is not supported by the platform or by the camera.
  // Consider falling back to some other method.
}

// Show to user.
const videoElement = document.querySelector("video");
videoElement.srcObject = stream;
</script>
Some platforms offer functionality for voice isolation: Attempting to remove all parts of an audio track that do not correspond to a human voice. Some platforms even attempt to remove extraneous voices, leaving the "main voice" as the dominant component of the audio. The exact methods used may vary between implementations.
This constraint permits the platform to turn on that functionality, with the desired result being that the "main voice" in the audio signal is the dominant component of the audio.
This will have large effects on audio that is presented for other reasons than to transmit voice (for instance music or ambient noises), so needs to be off by default.
This constraint is a stronger version of noise suppression, which means that if the "noiseSuppression" constraint is set to false and "voiceIsolation" is set to true, the value of "noiseSuppression" will be ignored.
This constraint has no such relationship with any other constraint; in particular it does not affect echoCancellation.
The WebIDL changes are the following:
WebIDLpartial dictionary MediaTrackSupportedConstraints {
  boolean voiceIsolation = true;
};
partial dictionary MediaTrackConstraintSet {
  ConstrainBoolean voiceIsolation;
};
partial dictionary MediaTrackSettings {
  boolean voiceIsolation;
};
partial dictionary MediaTrackCapabilities {
  sequence<boolean> voiceIsolation;
};
When the "voiceIsolation" setting is set to true by the ApplyConstraints algorithm, the UA will attempt to remove the components of the audio track that do not correspond to a human voice. If a dominant voice can be identified, the UA will attempt to enhance that voice. When the "voiceIsolation" setting is set to false by the ApplyConstraints algorithm, the UA will process the audio according to other settings in its normal fashion.
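A non-normative sketch of how an application could opt into voice isolation for a microphone track, assuming the voiceIsolation constraint above is implemented:

const stream = await navigator.mediaDevices.getUserMedia({
  audio: {voiceIsolation: true}
});
const [micTrack] = stream.getAudioTracks();
if (!micTrack.getSettings().voiceIsolation) {
  // Voice isolation is unavailable; regular noiseSuppression processing still applies.
}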
The configuration (capabilities and settings) of a MediaStreamTrack may be changed dynamically outside the control of web applications. One example is when a user decides to switch on background blur through the operating system. Web applications might want to know that the configuration of a particular MediaStreamTrack has changed. For that purpose, a new event is defined below.
WebIDLpartial interface MediaStreamTrack {
  attribute EventHandler onconfigurationchange;
};
The onconfigurationchange attribute is an event handler IDL attribute for the onconfigurationchange event handler, whose event handler event type is configurationchange.
When the User Agent detects a change of configuration in a track 's underlying source, the User Agent MUST run the following steps:
If track.muted is true, wait for track.muted to become false or track.readyState to be "ended".
Queue a task on current settings object 's responsible event loop to perform the following steps:
This task will run before any other task that may set track.muted to true.
If track.readyState is "ended", abort these steps.
If track 's capabilities and settings are matching source configuration, abort these steps.
Update track's capabilities and settings according to track's underlying source.
Fire an event named configurationchange on track.
These events are potentially triggered simultaneously on documents of different origins. User Agents MAY add fuzzing on the timing of events to avoid cross-origin activity correlation.
This example shows how to monitor external background blur changes.
const stream = await navigator.mediaDevices.getUserMedia({video: true});
const [track] = stream.getVideoTracks();
let {backgroundBlur} = track.getSettings();
applyBlurInSoftwareInstead(!backgroundBlur);
track.addEventListener("configurationchange", () => {
  if (backgroundBlur != track.getSettings().backgroundBlur) {
    backgroundBlur = track.getSettings().backgroundBlur;
    applyBlurInSoftwareInstead(!backgroundBlur);
  }
});
Human face metadata describes the geometrical information of human faces in video frames. It can be set by web applications using the standard means when creating VideoFrameMetadata for VideoFrames, or it can be set by a user agent when the media track constraint, defined below, is used to enable face detection for the MediaStreamTrack which provides the VideoFrames.
The facial metadata can be used by video encoders to enhance the quality of the faces in encoded video streams or for other suitable purposes.
VideoFrameMetadata
WebIDLpartial dictionary VideoFrameMetadata {
  sequence<Segment> segments;
};
segments of type sequence<Segment>
The set of known geometrical segments in the video frame.
Segment
WebIDLdictionary Segment {
  required SegmentType type;
  required long id;
  long partOf;
  required float probability;
  Point2D centerPoint;
  DOMRectInit boundingBox;
};
WebIDLenum SegmentType {
  "human-face",
  "left-eye",
  "right-eye",
  "eye",
  "mouth",
};
Segment Members
type of type SegmentType
The type of segment which the segment refers to.
It must be one of the following values:
human-face
The segment describes a human face.
left-eye
The segment describes oculus sinister .
right-eye
The segment describes oculus dexter .
eye
The segment describes an eye, either left or right.
mouth
The segment describes a mouth.
id of type long
An identifier of the object described by the segment, unique within a sequence. If the same object can be tracked over multiple frames originating from the same MediaStreamTrack source, or it can be matched to correspond to the same object in MediaStreamTracks which are cloned from the same original MediaStreamTrack, the user agent SHOULD use the same id for the segments which describe the object. id is also used in conjunction with the member partOf.
The user agent MUST NOT select the value to assign to id in such a way that the detected objects could be correlated to match between different MediaStreamTrack sources unless the MediaStreamTracks are cloned from the same original MediaStreamTrack.
partOf of type long
If defined, references another segment which has the member id set to the same value. The referenced segment corresponds to an object of which the object described by this segment is part. If undefined, the object described by this segment is not known to be part of any other object described by any segment associated with the MediaStreamTrack.
probability of type float
If nonzero, this is the estimate of the conditional probability that the segmented object actually is of the type indicated by the type member, on the condition that the detection has been made. The value of this member must always be zero or above, with a maximum of one. The special value of zero indicates that the probability estimate is not available.
centerPoint of type Point2D
The coordinates of the approximate center of the object described by this Segment. The object location in the frame can be specified even if it is obscured by other objects in front of it or it lies partially or fully outside of the frame.
The x and y values of the point are interpreted to represent a coordinate in a normalized square space. The origin of coordinates {x,y} = {0.0, 0.0} represents the upper left corner whereas {x,y} = {1.0, 1.0} represents the lower right corner relative to the rendered frame.
boundingBox of type DOMRectInit
A bounding box surrounding the object described by this segment.
The object bounding box in the frame can be specified even if it is obscured by other objects in front of it or it lies partially or fully outside of the frame.
See the member centerPoint for the definition of the coordinate system.
MediaTrackSupportedConstraints dictionary extensions
WebIDLpartial dictionary MediaTrackSupportedConstraints {
  boolean humanFaceDetectionMode = true;
};
MediaTrackSupportedConstraints Members
humanFaceDetectionMode of type boolean, defaulting to true
MediaTrackCapabilities dictionary extensions
WebIDLpartial dictionary MediaTrackCapabilities {
  sequence<DOMString> humanFaceDetectionMode;
};
MediaTrackCapabilities Members
humanFaceDetectionMode of type sequence<DOMString>
The sequence of supported face detection modes. Each string MUST be one of the members of HumanFaceDetectionModeEnum. See humanFaceDetectionMode for additional details.
MediaTrackConstraintSet dictionary extensions
WebIDLpartial dictionary MediaTrackConstraintSet {
  ConstrainDOMString humanFaceDetectionMode;
};
MediaTrackConstraintSet Members
humanFaceDetectionMode of type ConstrainDOMString
MediaTrackSettings
WebIDLpartial dictionary MediaTrackSettings {
  DOMString humanFaceDetectionMode;
};
MediaTrackSettings Members
humanFaceDetectionMode of type DOMString
HumanFaceDetectionModeEnum
WebIDLenum HumanFaceDetectionModeEnum {
  "none",
  "bounding-box",
  "bounding-box-with-landmark-center-point",
};
HumanFaceDetectionModeEnum Enumeration Description
none
This MediaStreamTrack source does not set metadata in VideoFrameMetadata of VideoFrames related to human faces or human face landmarks, that is, to any Segment which has the type set to any of the alternatives listed in enumeration SegmentType. As an input, this is interpreted as a command to turn off the setting of human face and landmark detection.
bounding-box
This source sets metadata related to human faces (segment type of "human-face") including bounding box information in the member boundingBox of each Segment related to a detected face. The source does not set the human face landmark information. As an input, this is interpreted as a command to enable the setting of human face detection and to find the bounding box of each detected face.
bounding-box-with-landmark-center-point
With this setting, the source sets a superset of the metadata compared to the "bounding-box" setting. The source sets the same metadata and additionally metadata related to human face landmarks (all other SegmentTypes except "human-face") including center point information in the member centerPoint of each Segment related to a detected landmark. As an input, this is interpreted as a command to enable the setting of human face and face landmark detection, to set bounding box related information to face segment metadata, and to set the center point information of each detected face landmark.
The constrainable properties in this section are defined below.
Property Name | Values | Notes |
---|---|---|
humanFaceDetectionMode | ConstrainDOMString | This string (or each string, when a list) should be one of the members of HumanFaceDetectionModeEnum. As a setting, this reflects which face geometrical properties the user agent detects and sets in the metadata of the VideoFrames. |
// main.js:
// Open camera with face detection enabled
const stream = await navigator.mediaDevices.getUserMedia({
  video: { humanFaceDetectionMode: 'bounding-box' }
});
const [videoTrack] = stream.getVideoTracks();
if (videoTrack.getSettings().humanFaceDetectionMode != 'bounding-box') {
  throw('Face bounding box detection is not supported');
}

// Use a video worker and show to user.
const videoElement = document.querySelector('video');
const videoWorker = new Worker('video-worker.js');
videoWorker.postMessage({track: videoTrack}, [videoTrack]);
const {data} = await new Promise(r => videoWorker.onmessage = r);
videoElement.srcObject = new MediaStream([data.videoTrack]);

// video-worker.js:
self.onmessage = async ({data: {track}}) => {
  const generator = new VideoTrackGenerator();
  self.postMessage({videoTrack: generator.track}, [generator.track]);
  const {readable} = new MediaStreamTrackProcessor({track});
  const transformer = new TransformStream({
    async transform(frame, controller) {
      for (const segment of frame.metadata().segments || []) {
        if (segment.type === 'human-face') {
          // the metadata is coming directly from the video track with
          // bounding-box face detection enabled
          console.log(
            `Face @ (${segment.boundingBox.x}, ${segment.boundingBox.y}), size ` +
            `${segment.boundingBox.width}x${segment.boundingBox.height}`);
        }
      }
      controller.enqueue(frame);
    }
  });
  await readable.pipeThrough(transformer).pipeTo(generator.writable);
};
MediaStreamTrack objects are exposed to workers, so MediaStream objects can be as well.
The WebIDL changes are the following:
WebIDL[Exposed=(Window,Worker)]
partial interface MediaStream {
};
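A non-normative sketch: combined with transferable MediaStreamTrack above, this allows a worker to assemble a MediaStream from a transferred track ("worker.js" is a hypothetical script name):

// worker.js (hypothetical)
self.onmessage = ({data: {track}}) => {
  const stream = new MediaStream([track]);
  // The stream can now be handed to worker-exposed consumers.
  console.log(stream.id, stream.getTracks().length);
};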