Copyright © 2023 World Wide Web Consortium. W3C® liability, trademark and permissive document license rules apply.
This document defines a set of ECMAScript APIs in WebIDL to extend the [mediacapture-streams] specification.
This section describes the status of this document at the time of its publication. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.
This is an unofficial proposal.
This document was published by the Web Real-Time Communications Working Group as an Editor's Draft.
Publication as an Editor's Draft does not imply endorsement by W3C and its Members.
This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This document is governed by the 2 November 2021 W3C Process Document.
This document contains proposed extensions and modifications to the [mediacapture-streams] specification.
New features and modifications to existing features proposed here may be considered for addition into the main specification post Recommendation. Deciding factors will include maturity of the extension or modification, consensus on adding it, and implementation experience.
A concrete long-term goal is reducing the fingerprinting surface of enumerateDevices() by deprecating exposure of the device label in its results. This requires relieving applications of the burden of building user interfaces to select cameras and microphones in-content, by offering this in user agents as part of getUserMedia() instead.
Miscellaneous other smaller features are under consideration as well, such as constraints to control multi-channel audio beyond stereo.
This document uses the definitions MediaDevices, MediaStreamTrack, MediaStreamConstraints, ConstrainablePattern, MediaTrackSupportedConstraints, MediaTrackCapabilities, MediaTrackConstraintSet, MediaTrackSettings and ConstrainBoolean from [mediacapture-streams].
The terms permission state, request permission to use, and prompt the user to choose are defined in [permissions].
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
The key words MAY, MUST, MUST NOT, and SHOULD in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
The existing enumerateDevices() function exposes camera and microphone labels to let applications build in-content user interfaces for camera and microphone selection. Applications have had to do this because getUserMedia() did not offer a web compatible in-agent device picker. This specification aims to rectify that.
Due to the significant fingerprinting vector caused by device labels, and the well-established nature of the existing APIs, the scope of this particular effort is limited to removing label, leaving the overall constraints-based model intact. This helps ensure a more viable migration path than moving to a less-powerful API. For that reason as well, this specification augments the existing getUserMedia() function instead of introducing a new less-powerful API to compete with it.
This specification introduces slightly altered semantics to the getUserMedia() function, called "user-chooses", that guarantee a picker will be shown to the user in cases where the user agent would otherwise choose for the user (that is: when application constraints do not narrow down the choices to a single device). This is orthogonal to permission, and offers a better and more consistent user experience across applications and user agents.
Unfortunately, since the "user-chooses" semantics may produce user agent prompts at different times and in different situations compared to the old semantics, they are somewhat incompatible with expectations in some existing web applications that tend to call getUserMedia() repeatedly and lazily instead of using e.g. stream.clone(). User agents are encouraged to provide the new semantics as opt-in initially for web compatibility.
User agents MUST deprecate (remove) label from MediaDeviceInfo over time, though specific migration strategies are left to user agents. User agents SHOULD migrate to offering the new semantics by default (opt-out) over time.
Since the constraints-model remains intact, web compatibility problems are expected to be limited to:
WebIDL
partial interface MediaDevices {
  readonly attribute GetUserMediaSemantics defaultSemantics;
};
defaultSemantics of type GetUserMediaSemantics, readonly
The default semantics of getUserMedia() in this user agent. User agents SHOULD default to "browser-chooses" for backwards compatibility, until a transition plan has been enacted where a majority of user agents collectively switch their defaults to "user-chooses" for improved user privacy, and usage metrics suggest this transition is feasible without major breakage.
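Non-normatively, a page could feature-detect this attribute before relying on the new semantics. A minimal sketch:

// Sketch (non-normative): feature-detect the proposed attribute.
if ("defaultSemantics" in navigator.mediaDevices) {
  console.log(`getUserMedia() defaults to "${navigator.mediaDevices.defaultSemantics}"`);
} else {
  console.log("GetUserMediaSemantics is not implemented here.");
}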
WebIDL
partial dictionary MediaStreamConstraints {
  GetUserMediaSemantics semantics;
};
MediaStreamConstraints Members
semantics of type GetUserMediaSemantics
In cases where the specified constraints do not narrow multiple choices between devices down to one per kind, specifies how the final determination of which devices to pick from the remaining choices MUST be made. If not specified, then the defaultSemantics are used.
WebIDL
enum GetUserMediaSemantics {
  "browser-chooses",
  "user-chooses"
};
GetUserMediaSemantics Enumeration description

| Value | Description |
| --- | --- |
| browser-chooses | When application-specified constraints do not narrow multiple choices between devices down to one per kind, the user agent is allowed to make the final determination between the remaining choices. |
| user-chooses | When application-specified constraints do not narrow multiple choices between devices down to one per kind, the user agent MUST prompt the user to choose between the remaining choices, even if the application already has permission to some or all of them. |
When the getUserMedia() method is invoked, run the following steps before invoking the getUserMedia() algorithm:
Let mediaDevices be the object on which this method was invoked.
Let constraints be the method's first argument.
Let semanticsPresent be true if constraints.semantics exists, otherwise false.
Let semantics be constraints.semantics if present, or the value of mediaDevices.defaultSemantics otherwise.
Replace step 6.5.1. of the getUserMedia() algorithm in its entirety with the following two steps:
Let descriptor be a PermissionDescriptor with its name member set to the permission name associated with kind (e.g. "camera" for "video", "microphone" for "audio") and, optionally, its deviceId member set to any appropriate device's deviceId.
If the number of unique devices sourcing tracks of media type kind in candidateSet is greater than 1 and semantics is "user-chooses", then prompt the user to choose a device with descriptor, resulting in provided media. Otherwise, request permission to use a device with descriptor, while considering all devices attached to a live and same-permission MediaStreamTrack in the current browsing context to mean having permission status "granted", resulting in provided media.
Same-permission in this context means a MediaStreamTrack that required the same level of permission to obtain as what is being requested.
When asking the user's permission, the user agent MUST disclose whether permission will be granted only to the device chosen, or to all devices of that kind.
Let track be the provided media, which MUST be precisely one track of type kind from finalSet. If semantics is "browser-chooses" then the decision of which track to choose from finalSet is up to the User Agent, which MAY use the value of the computed "fitness distance" from the SelectSettings algorithm, the value of semanticsPresent, or any other internally-available information about the devices, as inputs to its decision. If semantics is "user-chooses", and the application has not narrowed down the choices to one, then the user agent MUST ask the user to make the final selection. Once selected, the source of the MediaStreamTrack MUST NOT change.
User Agents are encouraged to default to or present a default choice based primarily on fitness distance, and secondarily on the user's primary or system default device for kind (when possible). User Agents MAY allow users to use any media source, including pre-recorded media files.
This example shows a setup with a start button and a camera selector using the new semantics (microphone is not shown for brevity but is equivalent).
<button id="start">Start</button>
<button id="chosenCamera" disabled>Camera: none</button>
<script>
let cameraTrack = null;
start.onclick = async () => {
try {
const stream = await navigator.mediaDevices.getUserMedia({
video: {deviceId: localStorage.cameraId}
});
setCameraTrack(stream.getVideoTracks()[0]);
} catch (err) {
console.error(err);
}
}
chosenCamera.onclick = async () => {
try {
const stream = await navigator.mediaDevices.getUserMedia({
video: true,
semantics: "user-chooses"
});
setCameraTrack(stream.getVideoTracks()[0]);
} catch (err) {
console.error(err);
}
}
function setCameraTrack(track) {
cameraTrack = track;
const {deviceId, label} = track.getSettings();
localStorage.cameraId = deviceId;
chosenCamera.innerText = `Camera: ${label}`;
chosenCamera.disabled = false;
}
</
script
>
A MediaStreamTrack is a transferable object. This allows manipulating real-time media outside the context it was requested or created in, for instance in workers or third-party iframes.
To preserve the existing privacy and security infrastructure, in particular for capture tracks, the track source lifetime management remains tied to the context that created it. The transfer algorithm MUST ensure the following behaviors:
The context named originalContext that created a track named originalTrack remains in control of the originalTrack source, named trackSource , even when originalTrack is transferred into transferredTrack .
In particular, originalContext remains the proxy to privacy indicators of trackSource . transferredTrack or any of its clones are considered as tracks using trackSource as if they were tracks created in and controlled by originalContext .
When originalContext goes away, trackSource gets ended, thus transferredTrack gets ended.
When originalContext would have muted/unmuted originalTrack , transferredTrack gets muted/unmuted.
If transferredTrack is cloned in transferredTrackClone , transferredTrackClone is tied to trackSource . It is not tied to originalTrack in any way.
If transferredTrack is transferred into transferredAgainTrack , transferredAgainTrack is tied to trackSource . It is not tied to transferredTrack or originalTrack in any way.
The WebIDL changes to make the track transferable are the following:
WebIDL
[Exposed=(Window,Worker), Transferable]
partial interface MediaStreamTrack {
};
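For illustration (non-normative), a transferred track travels through the regular postMessage() transfer list; the worker script name below is an assumption:

// Page context: capture a camera track and transfer it to a worker.
(async () => {
  const worker = new Worker("worker.js"); // hypothetical worker script
  const stream = await navigator.mediaDevices.getUserMedia({video: true});
  const [track] = stream.getVideoTracks();
  worker.postMessage({track}, [track]);
  // track is now detached in this context; the source stays tied to this page.
})();

// worker.js: the received track is usable in the worker context.
onmessage = ({data: {track}}) => {
  console.log(track.kind, track.readyState);
};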
At creation of a MediaStreamTrack object, called track, run the following steps:
Initialize track.[[IsDetached]] to false.
The MediaStreamTrack transfer steps, given value and dataHolder, are:
If value.[[IsDetached]] is true, throw a "DataCloneError" DOMException.
Set dataHolder.[[id]] to value.id.
Set dataHolder.[[kind]] to value.kind.
Set dataHolder.[[label]] to value.label.
Set dataHolder.[[readyState]] to value.readyState.
Set dataHolder.[[enabled]] to value.enabled.
Set dataHolder.[[muted]] to value.muted.
Set dataHolder.[[source]] to value's underlying source.
Set dataHolder.[[constraints]] to value's active constraints.
Set value.[[IsDetached]] to true.
Set value.[[ReadyState]] to "ended" (without stopping the underlying source or firing an ended event).
The MediaStreamTrack transfer-receiving steps, given dataHolder and track, are:
Initialize track.id to dataHolder.[[id]].
Initialize track.kind to dataHolder.[[kind]].
Initialize track.label to dataHolder.[[label]].
Initialize track.readyState to dataHolder.[[readyState]].
Initialize track.enabled to dataHolder.[[enabled]].
Initialize track.muted to dataHolder.[[muted]].
Initialize the underlying source of track to dataHolder.[[source]].
Set track's constraints to dataHolder.[[constraints]].
The underlying source is supposed to be kept alive between the transfer and transfer-receiving steps, or as long as the data holder is alive. In a sense, between these steps, the data holder is attached to the underlying source as if it was a track.
On camera and screenshare tracks, frame counters allow the application to tell what the frame rate is, which may be lower than the target frameRate. For example, if the track is sourced from a camera then the production of frames could be slowed down if it's dark, or frames could be dropped if the system is CPU starved. This could impact the total number of frames produced by the source and impact how many frames are delivered, discarded or dropped for other reasons.
WebIDL
partial interface MediaStreamTrack {
  Promise<MediaTrackFrameStats> getFrameStats();
};
If a MediaStreamTrack is sourced from getUserMedia() or getDisplayMedia(), the user agent is required to count each frame from its source as follows:
A frame is considered delivered if it either was delivered to a sink or would have been delivered to a sink, if one was connected. This is a subset of total frames and it is incremented at the same time as total frames .
A frame is considered discarded if it was discarded in order to achieve the target frameRate. This is a subset of total frames and it is incremented at the same time as total frames.
The total number of frames that have been processed by this source, meaning it is known whether the frame was considered delivered, discarded or dropped for any other reason. The number of dropped frames for various unknown reasons can be calculated by subtracting delivered frames and discarded frames from total frames .
If the track is unmuted and enabled and the source is backed by a camera, total frames is incremented by frames produced by the camera. If no frames are flowing, such as if the track is muted or disabled, then total frames does not increment.
getFrameStats
When this method is called, the user agent MUST run the following steps:
Let track be the MediaStreamTrack that this method was called on.
If track is not sourced from getUserMedia() or getDisplayMedia(), reject this method with NotSupportedError and abort these steps.
Let p be a new promise. Begin running the following steps in parallel and return p:
Queue a task to resolve p with a newly constructed MediaTrackFrameStats dictionary where timestamp is set to Performance.timeOrigin + Performance.now(), deliveredFrames is set to the total number of delivered frames, discardedFrames is set to the total number of discarded frames, and totalFrames is set to the total frames count.
WebIDL
dictionary MediaTrackFrameStats {
  DOMHighResTimeStamp timestamp;
  unsigned long long deliveredFrames;
  unsigned long long discardedFrames;
  unsigned long long totalFrames;
};
MediaTrackFrameStats Members
timestamp of type DOMHighResTimeStamp
The timestamp, relative to the UNIX epoch (Jan 1, 1970, UTC), for when these stats were collected.
deliveredFrames of type unsigned long long
The total number of delivered frames.
discardedFrames of type unsigned long long
The total number of discarded frames.
totalFrames of type unsigned long long
The total frames count.
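As a non-normative illustration, an application could sample getFrameStats() twice to estimate the delivered frame rate and the number of frames dropped for other, unknown reasons (per the subtraction described above):

// Sketch: compare two samples of the frame counters over an interval.
async function measureFrameDelivery(track, intervalMs = 1000) {
  const before = await track.getFrameStats();
  await new Promise(resolve => setTimeout(resolve, intervalMs));
  const after = await track.getFrameStats();
  const seconds = (after.timestamp - before.timestamp) / 1000;
  const deliveredFps = (after.deliveredFrames - before.deliveredFrames) / seconds;
  // Dropped for unknown reasons = total - delivered - discarded.
  const droppedFrames = (after.totalFrames - before.totalFrames) -
      (after.deliveredFrames - before.deliveredFrames) -
      (after.discardedFrames - before.discardedFrames);
  return {deliveredFps, droppedFrames};
}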
WebIDL
partial dictionary MediaTrackSupportedConstraints {
  boolean powerEfficientPixelFormat = true;
};
MediaTrackSupportedConstraints Members
powerEfficientPixelFormat of type boolean, defaulting to true
WebIDL
partial dictionary MediaTrackCapabilities {
  sequence<boolean> powerEfficientPixelFormat;
};
MediaTrackCapabilities Members
powerEfficientPixelFormat of type sequence<boolean>
If the source only has power efficient pixel formats, a single true is reported. If the source only has power inefficient pixel formats, a single false is reported. If the script can control the feature, the source reports a list with both true and false as possible values. See powerEfficientPixelFormat for additional details.
WebIDL
partial dictionary MediaTrackSettings {
  boolean powerEfficientPixelFormat;
};
MediaTrackSettings Members
powerEfficientPixelFormat of type boolean
The constrainable properties in this document are defined below.
| Property Name | Values | Notes |
| --- | --- | --- |
| powerEfficientPixelFormat | ConstrainBoolean | Compressed pixel formats often need to be decoded, for instance for display purposes or when being encoded during a video call. The user agent SHOULD label compressed pixel formats that incur significant power penalty when decoded as power inefficient. The labeling is up to the user agent, but decoding MJPEG in software is an example of an expensive mode. Pixel formats that have not been labeled power inefficient by the user agent are for the purpose of this API considered power efficient. As a constraint, setting it to true allows filtering out inefficient pixel formats and setting it to false allows filtering out efficient pixel formats. As a setting, this reflects whether or not the current pixel format is considered power efficient by the user agent. |
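A non-normative sketch of using this constraint, falling back to an unconstrained request when the user agent does not support it:

// Sketch: prefer configurations with power-efficient pixel formats.
async function getEfficientCameraTrack() {
  const supported = navigator.mediaDevices.getSupportedConstraints();
  const video = supported.powerEfficientPixelFormat
      ? {powerEfficientPixelFormat: true} // filter out inefficient formats
      : true; // constraint unknown: request any camera
  const stream = await navigator.mediaDevices.getUserMedia({video});
  return stream.getVideoTracks()[0];
}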
Some platforms or User Agents may provide built-in support for background blurring of video frames, in particular for camera video streams. Web applications may either want to control or at least be aware that background blur is applied at the source level. This may for instance allow the web application to update its UI or to not apply background blur on its own. For that reason, we extend MediaStreamTrack with the following properties.
The WebIDL changes are the following:
WebIDL
partial dictionary MediaTrackSupportedConstraints {
  boolean backgroundBlur = true;
};

partial dictionary MediaTrackConstraintSet {
  ConstrainBoolean backgroundBlur;
};

partial dictionary MediaTrackSettings {
  boolean backgroundBlur;
};

partial dictionary MediaTrackCapabilities {
  sequence<boolean> backgroundBlur;
};
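By way of non-normative illustration, a page that applies its own blur could try to turn platform blur off when the capability indicates the feature is under script control:

// Sketch: disable platform blur if the capability lists false as a value.
async function preferInAppBlur(track) {
  const blurCapability = track.getCapabilities().backgroundBlur;
  if (blurCapability && blurCapability.includes(false)) {
    await track.applyConstraints({backgroundBlur: false});
  }
  return track.getSettings().backgroundBlur; // state after the attempt
}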
Some platforms offer functionality for voice isolation: Attempting to remove all parts of an audio track that do not correspond to a human voice. Some platforms even attempt to remove extraneous voices, leaving the "main voice" as the dominant component of the audio. The exact methods used may vary between implementations.
This constraint permits the platform to turn on that functionality, with the desired result being that the "main voice" in the audio signal is the dominant component of the audio.
This will have large effects on audio that is presented for other reasons than to transmit voice (for instance music or ambient noises), so it needs to be off by default.
This constraint is a stronger version of noise cancellation, which means that if the "noiseSuppression" constraint is set to false and "voiceIsolation" is set to true, the value of "noiseSuppression" will be ignored.
This constraint has no such relationship with any other constraint; in particular it does not affect echoCancellation.
The WebIDL changes are the following:
WebIDL
partial dictionary MediaTrackSupportedConstraints {
  boolean voiceIsolation = true;
};

partial dictionary MediaTrackConstraintSet {
  ConstrainBoolean voiceIsolation;
};

partial dictionary MediaTrackSettings {
  boolean voiceIsolation;
};

partial dictionary MediaTrackCapabilities {
  sequence<boolean> voiceIsolation;
};
When the "voiceIsolation" setting is set to true by the ApplyConstraints algorithm, the UA will attempt to remove the components of the audio track that do not correspond to a human voice. If a dominant voice can be identified, the UA will attempt to enhance that voice.

When the "voiceIsolation" constraint setting is set to false by the ApplyConstraints algorithm, the UA will process the audio according to other settings in its normal fashion.
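A non-normative sketch of opting a microphone track into voice isolation where supported:

// Sketch: apply the voiceIsolation constraint and report the resulting setting.
async function enableVoiceIsolation(track) {
  if (!navigator.mediaDevices.getSupportedConstraints().voiceIsolation) {
    return false; // user agent does not implement this extension
  }
  await track.applyConstraints({voiceIsolation: true});
  return track.getSettings().voiceIsolation === true;
}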
The configuration (capabilities, constraints or settings) of a MediaStreamTrack may be changed dynamically outside the control of web applications. One example is when a user decides to switch on background blur through the operating system. Web applications might want to know that the configuration of a particular MediaStreamTrack has changed. For that purpose, a new event is defined below.
WebIDL
partial interface MediaStreamTrack {
  attribute EventHandler onconfigurationchange;
};
The onconfigurationchange attribute is an event handler IDL attribute for the onconfigurationchange event handler, whose event handler event type is configurationchange.
When the User Agent detects a change of configuration in a track 's underlying source, the User Agent MUST run the following steps:
If track.muted is true, wait for track.muted to become false or track.readyState to be "ended".
Queue a task on the current settings object's responsible event loop to perform the following steps:
This task will run before any other task that may set track.muted to true.
If track.readyState is "ended", abort these steps.
If track's capabilities, constraints and settings match the source configuration, abort these steps.
Update track's capabilities, constraints and settings according to track's underlying source.
Fire an event named configurationchange on track.
These events are potentially triggered simultaneously on documents of different origins. User Agents MAY add fuzzing on the timing of events to avoid cross-origin activity correlation.
This example shows how to monitor external background blur changes.
const stream = await navigator.mediaDevices.getUserMedia({video: true});
const [track] = stream.getVideoTracks();
let {backgroundBlur} = track.getSettings();
applyBlurInSoftwareInstead(!backgroundBlur);
track.addEventListener("configurationchange", () => {
  if (backgroundBlur != track.getSettings().backgroundBlur) {
    backgroundBlur = track.getSettings().backgroundBlur;
    applyBlurInSoftwareInstead(!backgroundBlur);
  }
});