Copyright © 2021 W3C® (MIT, ERCIM, Keio, Beihang). W3C liability, trademark and permissive document license rules apply.
This document defines a set of ECMAScript APIs in WebIDL to extend the [GETUSERMEDIA] specification.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.
This is an unofficial proposal.
This document was published by the Web Real-Time Communications Working Group as an Editor's Draft.
GitHub Issues are preferred for discussion of this specification.
Publication as an Editor's Draft does not imply endorsement by the W3C Membership.
This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This document is governed by the 15 September 2020 W3C Process Document.
This document contains proposed extensions and modifications to the [GETUSERMEDIA] specification.
New features and modifications to existing features proposed here may be considered for addition into the main specification post Recommendation. Deciding factors will include maturity of the extension or modification, consensus on adding it, and implementation experience.
A concrete long-term goal is reducing the fingerprinting surface of enumerateDevices() by deprecating exposure of the device label in its results. This requires relieving applications of the burden of building user interfaces to select cameras and microphones in-content, by offering this in user agents as part of getUserMedia() instead.
Miscellaneous other smaller features are under consideration as well, such as constraints to control multi-channel audio beyond stereo.
This document uses the definitions MediaDevices, MediaStreamTrack, MediaStreamConstraints and ConstrainablePattern from [GETUSERMEDIA].
The terms permission state, request permission to use, and prompt the user to choose are defined in [permissions].
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
The key words MAY, MUST, MUST NOT, and SHOULD in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
The existing enumerateDevices() function exposes camera and microphone labels to let applications build in-content user interfaces for camera and microphone selection. Applications have had to do this because getUserMedia() did not offer a web compatible in-agent device picker. This specification aims to rectify that.
Due to the significant fingerprinting vector caused by device labels, and the well-established nature of the existing APIs, the scope of this particular effort is limited to removing label, leaving the overall constraints-based model intact. This helps ensure a migration path that is more viable than one to a less-powerful API.
This specification augments the existing getUserMedia() function instead of introducing a new less-powerful API to compete with it, for that reason as well.
This specification introduces slightly altered semantics to the getUserMedia() function called "user-chooses" that guarantee a picker will be shown to the user in cases where the user agent would otherwise choose for the user (that is: when application constraints do not narrow down the choices to a single device). This is orthogonal to permission, and offers a better and more consistent user experience across applications and user agents.
Unfortunately, since the "user-chooses" semantics may produce user agent prompts at different times and in different situations compared to the old semantics, they are somewhat incompatible with expectations in some existing web applications that tend to call getUserMedia() repeatedly and lazily instead of using e.g. stream.clone().
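The pattern of cloning instead of calling getUserMedia() repeatedly can be sketched as follows. This is a minimal illustration, not part of the specification: createStreamSource and its acquire callback are hypothetical names, with acquire standing in for a call such as navigator.mediaDevices.getUserMedia({video: true}).

```javascript
// Sketch: request capture once, then hand out clones of the result.
// `acquire` is a hypothetical callback that returns a Promise for a
// MediaStream-like object (anything with a clone() method).
function createStreamSource(acquire) {
  let pending = null; // cached promise for the single capture request

  return function getStream() {
    if (!pending) {
      pending = acquire();
      // Forget a failed request so that a later call can retry.
      pending.catch(() => { pending = null; });
    }
    // Every caller receives its own clone, which it can stop independently.
    return pending.then((stream) => stream.clone());
  };
}

// Possible use in a page (assumes a browser environment):
//   const getCamera = createStreamSource(
//     () => navigator.mediaDevices.getUserMedia({video: true}));
//   const preview = await getCamera();
//   const recording = await getCamera(); // no second prompt
```

Because only the first call reaches the user agent, an application written this way would see at most one picker or permission prompt under either semantics.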
User agents are encouraged to provide the new semantics as opt-in initially for web compatibility.
User agents MUST deprecate (remove) label from MediaDeviceInfo over time, though specific migration strategies are left to user agents.
User agents SHOULD migrate to offering the new semantics by default (opt-out) over time.
Since the constraints-model remains intact, web compatibility problems are expected to be limited to:
partial interface MediaDevices {
  readonly attribute GetUserMediaSemantics defaultSemantics;
};
defaultSemantics of type GetUserMediaSemantics, readonly
The default semantics of getUserMedia() in this user agent.
User agents SHOULD default to "browser-chooses" for backwards compatibility, until a transition plan has been enacted where a majority of user agents collectively switch their defaults to "user-chooses" for improved user privacy, and usage metrics suggest this transition is feasible without major breakage.
partial dictionary MediaStreamConstraints {
  GetUserMediaSemantics semantics;
};
MediaStreamConstraints Members
semantics of type GetUserMediaSemantics
In cases where the specified constraints do not narrow multiple choices between devices down to one per kind, specifies how the final determination of which devices to pick from the remaining choices MUST be made. If not specified, then the defaultSemantics are used.
enum GetUserMediaSemantics {
  "browser-chooses",
  "user-chooses"
};
GetUserMediaSemantics Enumeration description

| Enum value | Description |
|---|---|
| browser-chooses | When application-specified constraints do not narrow multiple choices between devices down to one per kind, the user agent is allowed to make the final determination between the remaining choices. |
| user-chooses | When application-specified constraints do not narrow multiple choices between devices down to one per kind, the user agent MUST prompt the user to choose between the remaining choices, even if the application already has permission to some or all of them. |
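An application might opt in to the "user-chooses" value roughly as sketched below. The helper name requestUserChoice and the shape of its return value are hypothetical; the sketch also assumes that the presence of defaultSemantics on MediaDevices is a usable signal that the user agent implements this extension.

```javascript
// Sketch: add semantics: "user-chooses" when the extension appears to be
// implemented. `mediaDevices` would be navigator.mediaDevices in a page;
// it is passed in so the helper does not depend on a browser environment.
function requestUserChoice(mediaDevices, constraints) {
  // Assumption: a user agent implementing this extension exposes
  // the defaultSemantics attribute on MediaDevices.
  const supported = "defaultSemantics" in mediaDevices;
  return {
    constraints: supported
      ? { ...constraints, semantics: "user-chooses" }
      : { ...constraints },
    userChoosesSupported: supported,
  };
}

// Possible use in a page:
//   const { constraints, userChoosesSupported } =
//     requestUserChoice(navigator.mediaDevices, { video: true });
//   const stream = await navigator.mediaDevices.getUserMedia(constraints);
//   // If userChoosesSupported is false, the application may still need
//   // its own in-content device selector.
```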
When the getUserMedia() method is invoked, run the following steps before invoking the getUserMedia() algorithm:
Let mediaDevices be the object on which this method was invoked.
Let constraints be the method's first argument.
Let semanticsPresent be true if constraints.semantics exists, otherwise false.
Let semantics be constraints.semantics if present, or the value of mediaDevices.defaultSemantics otherwise.
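The steps above can be sketched in script as follows. This is an illustration of the logic only, not user agent code: resolveSemantics is a hypothetical name, and a dictionary member is treated as existing when its value is not undefined.

```javascript
// Sketch of the pre-steps: derive semanticsPresent and semantics from
// the getUserMedia() argument and the MediaDevices default.
function resolveSemantics(mediaDevices, constraints = {}) {
  // A dictionary member "exists" when the caller supplied a value for it.
  const semanticsPresent = constraints.semantics !== undefined;
  const semantics = semanticsPresent
    ? constraints.semantics
    : mediaDevices.defaultSemantics;
  return { semanticsPresent, semantics };
}
```

For example, with a default of "browser-chooses", a call that passes {video: true} resolves to "browser-chooses" with semanticsPresent false, while a call that passes semantics: "user-chooses" resolves to "user-chooses" with semanticsPresent true.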
Replace step 6.5.1. of the getUserMedia() algorithm in its entirety with the following two steps:
Let descriptor be a PermissionDescriptor with its name member set to the permission name associated with kind (e.g. "camera" for "video", "microphone" for "audio"), and, optionally, consider its deviceId member set to any appropriate device's deviceId.
If the number of unique devices sourcing tracks of media type kind in candidateSet is greater than 1 and semantics is "user-chooses", then prompt the user to choose a device with descriptor, resulting in provided media.
Otherwise, request permission to use a device with descriptor, while considering all devices being attached to a live and same-permission MediaStreamTrack in the current browsing context to mean having permission status "granted", resulting in provided media.
Same-permission in this context means a MediaStreamTrack that required the same level of permission to obtain as what is being requested.
When asking the user’s permission, the user agent MUST disclose whether permission will be granted only to the device chosen, or to all devices of that kind.
Let track be the provided media, which MUST be precisely one track of type kind from finalSet.
If semantics is "browser-chooses" then the decision of which track to choose from finalSet is up to the User Agent, which MAY use the value of the computed "fitness distance" from the SelectSettings algorithm, the value of semanticsPresent, or any other internally-available information about the devices, as inputs to its decision.
If semantics is "user-chooses", and the application has not narrowed down the choices to one, then the user agent MUST ask the user to make the final selection.
Once selected, the source of the MediaStreamTrack MUST NOT change.
User Agents are encouraged to default to or present a default choice based primarily on fitness distance, and secondarily on the user's primary or system default device for kind (when possible). User Agents MAY allow users to use any media source, including pre-recorded media files.
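The branch between prompting the user to choose and requesting permission can be modelled as below. This is a simplified sketch under stated assumptions: chooseAcquisitionStep and its string results are hypothetical, and candidateSet is reduced to a list of device IDs, one entry per candidate track.

```javascript
// Sketch: decide which of the two replacement steps applies for one kind.
// `candidateDeviceIds` lists the deviceId of each candidate track's
// source and may contain duplicates.
function chooseAcquisitionStep(candidateDeviceIds, semantics) {
  const uniqueDevices = new Set(candidateDeviceIds).size;
  if (uniqueDevices > 1 && semantics === "user-chooses") {
    // More than one device remains: the user makes the final selection.
    return "prompt-the-user-to-choose";
  }
  // Otherwise fall back to the ordinary permission request.
  return "request-permission-to-use";
}
```

Note that under "user-chooses" a picker is only required when more than one unique device remains; constraints that already narrow the candidates to a single device lead to the ordinary permission request.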
This example shows a setup with a start button and a camera selector using the new semantics (microphone is not shown for brevity but is equivalent).
<button id="start">Start</button>
<button id="chosenCamera" disabled>Camera: none</button>
<script>
  let cameraTrack = null;

  // Start with the previously chosen camera, if one was stored.
  start.onclick = async () => {
    try {
      const stream = await navigator.mediaDevices.getUserMedia({
        video: {deviceId: localStorage.cameraId}
      });
      setCameraTrack(stream.getVideoTracks()[0]);
    } catch (err) {
      console.error(err);
    }
  };

  // Let the user pick a camera through the user agent's picker.
  chosenCamera.onclick = async () => {
    try {
      const stream = await navigator.mediaDevices.getUserMedia({
        video: true,
        semantics: "user-chooses"
      });
      setCameraTrack(stream.getVideoTracks()[0]);
    } catch (err) {
      console.error(err);
    }
  };

  function setCameraTrack(track) {
    cameraTrack = track;
    // deviceId is a track setting; label is an attribute of the track itself.
    const {deviceId} = track.getSettings();
    localStorage.cameraId = deviceId;
    chosenCamera.innerText = `Camera: ${track.label}`;
    chosenCamera.disabled = false;
  }
</script>
A MediaStreamTrack is a transferable object. This allows manipulating real-time media outside the context it was requested or created in, for instance in workers or third-party iframes.
To preserve the existing privacy and security infrastructure, in particular for capture tracks, the track source lifetime management remains tied to the context that created it. The transfer algorithm MUST ensure the following behaviors:
The context named originalContext that created a track named originalTrack remains in control of the originalTrack source, named trackSource, even when originalTrack is transferred into transferredTrack.
In particular, originalContext remains the proxy to privacy indicators of trackSource. transferredTrack or any of its clones are considered as tracks using trackSource as if they were tracks created in and controlled by originalContext.
When originalContext goes away, trackSource gets ended, thus transferredTrack gets ended.
When originalContext would have muted/unmuted originalTrack, transferredTrack gets muted/unmuted.
If transferredTrack is cloned in transferredTrackClone, transferredTrackClone is tied to trackSource. It is not tied to originalTrack in any way.
If transferredTrack is transferred into transferredAgainTrack, transferredAgainTrack is tied to trackSource. It is not tied to transferredTrack or originalTrack in any way.
The WebIDL changes are the following:
[Exposed=(Window,Worker), Transferable]
partial interface MediaStreamTrack {
};
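From script, a transfer would presumably use the standard postMessage() transfer list, as with other transferable objects. The helper below is a minimal sketch: sendTrack is a hypothetical name, and port stands for anything with a postMessage(message, transferList) method, such as a Worker or MessagePort.

```javascript
// Sketch: move a track to another context instead of cloning it.
// Listing the track in the transfer list is what triggers the
// transfer steps; the sender's track is detached afterwards.
function sendTrack(port, track) {
  port.postMessage({ track }, [track]);
}

// Possible use in a page ("processor.js" is a hypothetical worker script):
//   const worker = new Worker("processor.js");
//   const [track] = stream.getVideoTracks();
//   sendTrack(worker, track);
//
// And in the worker:
//   onmessage = ({ data }) => {
//     const { track } = data; // the transferred MediaStreamTrack
//   };
```

After the call, the sending context should treat its own reference to the track as unusable.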
At creation of a MediaStreamTrack object, called track, run the following steps:
Initialize track.[[IsDetached]] to false.
The MediaStreamTrack transfer steps, given value and dataHolder, are:
If value.[[IsDetached]] is true, throw a "DataCloneError" DOMException.
Set dataHolder.[[id]] to value.id.
Set dataHolder.[[kind]] to value.kind.
Set dataHolder.[[label]] to value.label.
Set dataHolder.[[readyState]] to value.readyState.
Set dataHolder.[[enabled]] to value.enabled.
Set dataHolder.[[muted]] to value.muted.
Set dataHolder.[[source]] to value's underlying source.
Set dataHolder.[[constraints]] to value's active constraints.
Set value.[[IsDetached]] to true.
Set value.readyState to "ended".
The MediaStreamTrack transfer-receiving steps, given dataHolder and track, are:
Initialize track.id to dataHolder.[[id]].
Initialize track.kind to dataHolder.[[kind]].
Initialize track.label to dataHolder.[[label]].
Initialize track.readyState to dataHolder.[[readyState]].
Initialize track.enabled to dataHolder.[[enabled]].
Initialize track.muted to dataHolder.[[muted]].
Initialize the underlying source of track to dataHolder.[[source]] with tieSourceToContext equal to false.
Set track's constraints to dataHolder.[[constraints]].
The underlying source is supposed to be kept alive between the transfer and transfer-receiving steps, or as long as the data holder is alive. In a sense, between these steps, the data holder is attached to the underlying source as if it was a track.
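The transfer and transfer-receiving steps can be modelled with plain objects, as in the sketch below. This is an illustration only: internal slots such as [[IsDetached]] are represented by ordinary properties, a plain Error named "DataCloneError" stands in for the DOMException, and the function names are hypothetical.

```javascript
// Attributes copied verbatim in both directions.
const COPIED_FIELDS = ["id", "kind", "label", "readyState", "enabled", "muted"];

// Model of the transfer steps: fill a data holder, then detach `value`.
function transferSteps(value) {
  if (value.isDetached) {
    const error = new Error("track has already been transferred");
    error.name = "DataCloneError"; // stand-in for a DOMException
    throw error;
  }
  const dataHolder = {};
  for (const field of COPIED_FIELDS) dataHolder[field] = value[field];
  dataHolder.source = value.source;           // underlying source
  dataHolder.constraints = value.constraints; // active constraints
  value.isDetached = true;
  value.readyState = "ended";
  return dataHolder;
}

// Model of the transfer-receiving steps: build the new track.
function transferReceivingSteps(dataHolder) {
  const track = { isDetached: false };
  for (const field of COPIED_FIELDS) track[field] = dataHolder[field];
  // The source is shared, not re-created (tieSourceToContext is false).
  track.source = dataHolder.source;
  track.constraints = dataHolder.constraints;
  return track;
}
```

In this model the data holder captures readyState before the original is ended, so a live track arrives live on the receiving side, while the sender's track ends and a second transfer attempt throws.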
When using echo cancellation, the reference signal that is cancelled against has so far been left unspecified. This extension specifies a default and a means of overriding it.
partial dictionary MediaTrackConstraintSet {
  ConstrainDOMString echoCancellationReferenceSinkId;
};
partial dictionary MediaTrackCapabilities {
  sequence<DOMString> echoCancellationReferenceSinkId;
};
partial dictionary MediaTrackSupportedConstraints {
  boolean echoCancellationReferenceSinkId = true;
};
partial dictionary MediaTrackSettings {
  DOMString echoCancellationReferenceSinkId;
};
echoCancellationReferenceSinkId refers to a sink ID from [AUDIO-OUTPUT]. It controls the reference for echo cancellation. It applies only to audio MediaStreamTrack objects.
When the MediaTrackSettings object of the track contains "echoCancellation = false", this constraint has no effect.
When the echoCancellationReferenceSinkId is not specified, the default reference signal is the default system output. Specifying the sink id "" chooses the default system output.
The property is changeable with applyConstraints().
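A page might build the constraint set for applyConstraints() roughly as follows. This is a hedged sketch: echoReferenceConstraints is a hypothetical helper, and it assumes the supported-constraints dictionary is obtained from navigator.mediaDevices.getSupportedConstraints().

```javascript
// Sketch: build constraints that select the echo cancellation reference,
// or return null when the user agent does not report support.
// Passing "" as sinkId selects the default system output.
function echoReferenceConstraints(supportedConstraints, sinkId) {
  if (!supportedConstraints.echoCancellationReferenceSinkId) {
    return null;
  }
  return {
    // The reference only matters while echo cancellation is enabled.
    echoCancellation: true,
    echoCancellationReferenceSinkId: sinkId,
  };
}

// Possible use in a page, for an audio track:
//   const supported = navigator.mediaDevices.getSupportedConstraints();
//   const constraints = echoReferenceConstraints(supported, sinkId);
//   if (constraints) await audioTrack.applyConstraints(constraints);
```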