Copyright © 2015-2023 World Wide Web Consortium . W3C ® liability , trademark and permissive document license rules apply.
This document defines how a user's display, or parts thereof, can be used as the source of a media stream using getDisplayMedia, an extension to the Media Capture API [GETUSERMEDIA].
This section describes the status of this document at the time of its publication. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.
This document is not complete. It is subject to major changes and, while early experimentations are encouraged, it is therefore not intended for implementation.
This document was published by the Web Real-Time Communications Working Group as an Editor's Draft.
Publication as an Editor's Draft does not imply endorsement by W3C and its Members.
This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the W3C Patent Policy . W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy .
This document is governed by the 2 November 2021 W3C Process Document .
This section is non-normative.
This document describes an extension to the Media Capture API [ GETUSERMEDIA ] that enables the acquisition of a user's display, or part thereof, in the form of a video track. In some cases system, application or window audio is also captured which is presented in the form of an audio track. This enables a number of applications, including screen sharing using WebRTC [ WEBRTC ].
This feature has significant security implications. Applications that use this API to access information that is displayed to users could access confidential information from other origins if that information is under the control of the application. This includes content that would otherwise be inaccessible due to the protections offered by the user agent sandbox.
This document concerns itself primarily with the capture of video and audio [ GETUSERMEDIA ], but the general mechanisms defined here could be extended to other types of media, of which depth [ MEDIACAPTURE-DEPTH ] is currently defined.
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
The key words MAY , MUST , MUST NOT , and SHOULD in this document are to be interpreted as described in BCP 14 [ RFC2119 ] [ RFC8174 ] when, and only when, they appear in all capitals, as shown here.
This specification defines conformance criteria that apply to a single product: the user agent that implements the interfaces that it contains.
Implementations that use ECMAScript [ ECMA-262 ] to implement the APIs defined in this specification must implement them in a manner consistent with the ECMAScript Bindings defined in the Web IDL specification [ WEBIDL ], as this specification uses that specification and terminology.
The following example demonstrates a request for display capture using the navigator.mediaDevices.getDisplayMedia method defined in this document.
try {
let mediaStream = await navigator.mediaDevices.getDisplayMedia({video:true});
videoElement.srcObject = mediaStream;
} catch (e) {
console.log('Unable to acquire screen capture: ' + e);
}
This document uses the definition of MediaStream, MediaStreamTrack and ConstrainablePattern from [GETUSERMEDIA].
Screen capture encompasses the capture of several different types of screen-based surfaces. Collectively, these are referred to as display surfaces , of which this document defines the following types:
MediaStreamTrack
.
This document draws a distinction between two variants of each type of display surface:
Some operating systems permit windows from different applications to occlude other windows, in whole or part, so the visible display surface is a strict subset of the logical display surface .
The source pixel ratio of a display surface is 1/96th of 1 inch divided by its vertical pixel size.
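The definition above is a plain ratio, so it can be worked through numerically. The sketch below is illustrative only: the function name and its input (the vertical size of one source pixel, in inches) are assumptions, since this document does not say how a user agent obtains that measurement.

```javascript
// Sketch: source pixel ratio = (1/96 inch) / (vertical size of one source pixel, in inches).
// `verticalPixelSizeInches` is a hypothetical input chosen for this illustration.
function sourcePixelRatio(verticalPixelSizeInches) {
  if (!(verticalPixelSizeInches > 0)) {
    throw new RangeError('vertical pixel size must be a positive number');
  }
  const referenceInches = 1 / 96; // 1/96th of 1 inch
  return referenceInches / verticalPixelSizeInches;
}

// A surface with 192 pixels per vertical inch has pixels that are 1/192 inch tall.
const ratio = sourcePixelRatio(1 / 192);
console.log(ratio); // 2
```

A ratio above 1 indicates a surface whose pixels are smaller than 1/96th of an inch, which is the case the resizeMode description refers to.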
The devicechange event is defined in [ GETUSERMEDIA ].
Capture of displayed media is enabled through the addition of a new getDisplayMedia method on the MediaDevices interface, that is similar to getUserMedia(), except that it acquires media from one display device chosen by the end-user each time.
WebIDL
partial interface MediaDevices {
  Promise<MediaStream> getDisplayMedia(optional DisplayMediaStreamOptions options = {});
};
getDisplayMedia
Prompts the user for permission to live-capture their display.
The user agent MUST let the end-user choose which display surface to share out of all available choices every time, and MUST NOT use any MediaTrackConstraints in options.video or options.audio to limit that choice.
The user agent MAY use the presence of the displaySurface constraint and its value to influence the presentation to the user of the sources to pick from. The user agent MUST still offer the user unlimited choice of any display surface. The user agent is strongly recommended to steer users away from sharing a monitor, as this poses risks to user privacy.
Any MediaTrackConstraints in options.video or options.audio MUST be applied to the media chosen by the user only after the user has made their selection.
In the case of audio, the user agent MAY present the end-user with audio sources to share. Which choices are available to choose from is up to the user agent, and the audio source(s) are not necessarily the same as the video source(s). An audio source may be a particular window, browser, the entire system audio or any combination thereof.
Unlike getUserMedia() with regards to audio+video, the user agent is allowed not to return audio even if the audio constraint is present. If the user agent knows no audio will be shared for the lifetime of the stream it MUST NOT include an audio track in the resulting stream. The user agent MAY accept a request for audio and video by only returning a video track in the resulting stream, or it MAY accept the request by returning both an audio track and a video track in the resulting stream. The user agent MUST reject audio-only requests.
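Because audio may be absent even when it was requested, an application would typically inspect the returned stream rather than assume an audio track exists. The following is a minimal sketch of such a check; the helper name summarizeCapture and the shape of its return value are assumptions made for this illustration.

```javascript
// Summarizes which kinds of tracks a capture actually returned.
// `stream` is any object exposing getVideoTracks() and getAudioTracks(),
// such as a MediaStream. `audioRequested` is what the application asked for.
function summarizeCapture(stream, audioRequested) {
  const videoCount = stream.getVideoTracks().length;
  const audioCount = stream.getAudioTracks().length;
  return {
    hasVideo: videoCount > 0,
    hasAudio: audioCount > 0,
    // Audio may legitimately be missing even though it was requested.
    audioOmitted: Boolean(audioRequested) && audioCount === 0,
  };
}

// Browser-only usage (shown as a comment because it needs a user gesture):
//   const stream = await navigator.mediaDevices.getDisplayMedia({ video: true, audio: true });
//   if (summarizeCapture(stream, true).audioOmitted) { /* hide audio controls */ }
```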
In addition to drawing from a different set of sources and requiring user selection, getDisplayMedia also differs from getUserMedia() in that "granted" permissions cannot be persisted.
When the getDisplayMedia() method is called, the user agent MUST run the following steps:
Let controller be options.controller if present, or null otherwise.
If controller is not null, run the following steps:
If controller.[[IsBound]] is true, return a promise rejected with a DOMException object whose name attribute has the value InvalidStateError.
Set controller.[[IsBound]] to true.
If the relevant global object of this does not have transient activation, return a promise rejected with a DOMException object whose name attribute has the value InvalidStateError.
Let options be the method's first argument.
Let constraints be [options.audio, options.video].
If constraints.video is false, return a promise rejected with a newly created TypeError.
For each existing member in constraints whose value, CS , is a dictionary, run the following steps:
If CS contains a member named advanced, return a promise rejected with a newly created TypeError.
If CS contains a member whose name specifies a constrainable property applicable to display surfaces, and whose value in turn is a dictionary containing a member named either min or exact, return a promise rejected with a newly created TypeError.
If CS contains a member whose name specifies a constrainable property applicable to display surfaces, and whose value in turn is a dictionary containing a member named max, and that member's value in turn is less than the constrainable property's floor value, then let failedConstraint be the name of the member, let message be either undefined or an informative human-readable message, and return a promise rejected with a new OverconstrainedError created by calling OverconstrainedError(failedConstraint, message).
Let requestedMediaTypes be the set of media types in constraints with either a dictionary value or a value of true.
If the current settings object's relevant global object's associated Document is NOT fully active or does NOT have focus, return a promise rejected with a DOMException object whose name attribute has the value InvalidStateError.
Let p be a new promise.
Run the following steps in parallel :
For each media type T in requestedMediaTypes ,
If no sources of type T are available, reject p with a new DOMException object whose name attribute has the value NotFoundError.
Read the current permission state for obtaining sources of type T in the current browsing context. If the permission state is "denied", jump to the step labeled Permission Failure below.
Optionally, e.g., based on a previously-established user preference, for security reasons, or due to platform limitations, jump to the step labeled Permission Failure below.
Prompt the user to choose a display device, for a PermissionDescriptor with its name set to "display-capture", resulting in a set of provided media.
The provided media MUST include precisely one video track.
The provided media MUST include at most one audio track. This audio track MUST NOT be included if audio was not specified in requestedMediaTypes, or if it was specified as false.
The devices chosen MUST be the ones determined by the user. Once selected, the source of a MediaStreamTrack MUST NOT change, unless the user permits it through their interaction with the user agent.
User agents are encouraged to warn users against sharing browser display devices as well as monitor display devices where browser windows are visible, or otherwise try to discourage their selection on the basis that these represent a significantly higher risk when shared.
If the result of the request is "granted", then for each device that is sourcing the provided media, using a stable and private id for the device, deviceId, set [[devicesLiveMap]][deviceId] to true, if it isn’t already true, and set the [[devicesAccessibleMap]][deviceId] to true, if it isn’t already true.
The user agent MUST NOT store a "granted" permission entry.
If the result is "denied", jump to the step labeled Permission Failure below. If the user never responds, this algorithm stalls on this step.
If the user grants permission but a hardware error such as an OS/program/webpage lock prevents access, reject p with a new DOMException object whose name attribute has the value NotReadableError and abort these steps.
If the result is "granted" but device access fails for any reason other than those listed above, reject p with a new DOMException object whose name attribute has the value AbortError and abort these steps.
Let stream be the MediaStream object for which the user granted permission.
Run the ApplyConstraints algorithm on all tracks in stream with the appropriate constraints. Should this fail, let failedConstraint be the result of the algorithm that failed, and let message be either undefined or an informative human-readable message, and then reject p with a new OverconstrainedError created by calling OverconstrainedError(failedConstraint, message).
This invocation of getDisplayMedia() is now considered to have produced a new capture-session.
If controller is not null, run the following steps:
Let toplevelTraversable be the current settings object's relevant global object's navigable's top-level traversable.
Listen to toplevelTraversable 's change of system focus .
The first time toplevelTraversable is losing focus , queue a global task on the user interaction task source given current settings object 's relevant global object to run the following step:
Set controller.[[FocusChangeDisabled]] to true.
These steps ensure CaptureController will not override explicit focus actions made by the user, typically if a user decides to switch to another surface shortly after starting capture.
This algorithm describes what to do for surface pickers implemented by the user agent but the same requirement applies to surface pickers implemented outside of the user agent, where the loss of capturing document focus is not necessarily the signal triggering setting [[FocusChangeDisabled]] to true.
Set controller.[[Source]] to stream's video track's [[Source]].
Set controller.[[DisplaySurfaceType]] to stream's video track's DisplayCaptureSurfaceType.
Queue a task to run the finalize focus decision algorithm on controller .
Resolve p with stream and abort these steps.
Permission Failure: Reject p with a new DOMException object whose name attribute has the value NotAllowedError.
Return p .
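The synchronous argument checks in the steps above can be sketched as a pure function, which may help applications predict a rejection before calling getDisplayMedia(). This is a simplified model under stated assumptions, not the user agent's implementation: the floor values and the list of display-surface properties below are hypothetical choices for the illustration, and the transient activation, focus and controller checks are omitted because they depend on browser state.

```javascript
// Hypothetical floor values; the spec only requires them to be constant and greater than 0.
const FLOOR_VALUES = { width: 1, height: 1, frameRate: 1, aspectRatio: 0.01 };
// Constrainable properties this sketch treats as applicable to display surfaces.
const DISPLAY_PROPERTIES = [
  'width', 'height', 'frameRate', 'aspectRatio',
  'resizeMode', 'displaySurface', 'logicalSurface', 'cursor',
];

function isDictionary(value) {
  return typeof value === 'object' && value !== null;
}

// Returns null when the options pass the modelled checks, otherwise an object
// describing the rejection ({ name } or { name, constraint }).
function precheckDisplayMediaOptions(options = {}) {
  const video = options.video === undefined ? true : options.video; // default true
  const audio = options.audio === undefined ? false : options.audio; // default false
  if (video === false) return { name: 'TypeError' };

  for (const cs of [audio, video]) {
    if (!isDictionary(cs)) continue; // booleans carry no constraints
    if ('advanced' in cs) return { name: 'TypeError' };
    // min and exact are rejected outright.
    for (const prop of DISPLAY_PROPERTIES) {
      const value = cs[prop];
      if (isDictionary(value) && ('min' in value || 'exact' in value)) {
        return { name: 'TypeError' };
      }
    }
    // max below the floor value is reported as overconstrained.
    for (const prop of Object.keys(FLOOR_VALUES)) {
      const value = cs[prop];
      if (isDictionary(value) && 'max' in value && value.max < FLOOR_VALUES[prop]) {
        return { name: 'OverconstrainedError', constraint: prop };
      }
    }
  }
  return null;
}
```

For example, `precheckDisplayMediaOptions({ video: { width: { ideal: 1280 } } })` returns null in this model, whereas a `min` or `exact` member yields a TypeError description.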
When the top-level document loses focus, run the following steps on all CaptureController objects in that document and in documents of its nested browsing contexts:
If [[Source]] is undefined, abort these steps.
Set [[FocusChangeDisabled]] to true.
The user agent MUST NOT capture content that's behind a partially transparent captured display surface .
For the newly created MediaStreamTrack, the user agent MUST NOT capture the prompt that was shown to the user.
Information that is not currently rendered to the screen SHOULD be obscured in captures unless the application has been specifically authorized to access that content (e.g. through means such as elevated permissions ).
The user agent MUST NOT share audio without active user consent , for example if the capture of the video of a window is accompanied by capture of the audio of the entire system, including applications unrelated to that window.
A display surface that is being shared may temporarily or permanently become inaccessible to the application because of actions taken by the operating system or user agent. What makes a display surface considered inaccessible is outside the scope of this specification, but examples MAY include a monitor disconnecting, window or browser closing or becoming minimized, or due to an incoming call on a phone.
User agents ultimately control what inaccessible means in this context, but are encouraged to only fire mute and unmute events for interruptions that have external reasons.
When a display surface enters an inaccessible state that is not necessarily permanent, the user agent MUST queue a task that sets the muted state of the corresponding media track to true.
When a display surface exits an inaccessible state and becomes accessible, the user agent MUST queue a task that sets the muted state of the corresponding media track to false.
When a display surface enters an inaccessible state that is permanent (such as the source window closing), the user agent MUST queue a task that ends the corresponding media track.
A stream that was just returned by getDisplayMedia MAY contain tracks that are muted by default. Audio and video tracks belonging to the same stream MAY be muted/unmuted independently of one another.
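From the application's side, these state changes surface as mute, unmute and ended events on the track. The sketch below shows one way an application might follow them; the helper name watchCaptureTrack and the three state labels are assumptions for this illustration.

```javascript
// Follows the accessibility of a captured track through its events.
// `track` is any EventTarget with a `muted` property, such as a MediaStreamTrack.
// `onChange` is called with 'muted', 'live' or 'ended' whenever the state changes.
function watchCaptureTrack(track, onChange) {
  // A freshly returned track may already be muted.
  let state = track.muted ? 'muted' : 'live';
  const update = (next) => {
    if (state === 'ended' || state === next) return; // ended is final
    state = next;
    onChange(state);
  };
  track.addEventListener('mute', () => update('muted'));
  track.addEventListener('unmute', () => update('live'));
  track.addEventListener('ended', () => update('ended'));
  return () => state; // accessor for the current state
}

// Browser-only usage (shown as a comment):
//   const [videoTrack] = stream.getVideoTracks();
//   watchCaptureTrack(videoTrack, (state) => console.log('capture is', state));
```

Audio and video tracks would each need their own watcher, since they can be muted independently.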
Not accepting constraints for source selection means that getDisplayMedia only provides fingerprinting surface that exposes whether audio, video or audio and video display sources are present. Note that accepting the displaySurface constraint does not limit user selection.
Constraints serve a different purpose in getDisplayMedia than they do in getUserMedia(). They do not aid discovery, instead they are applied only after user-selection.
This section defines which constraints apply to getDisplayMedia tracks; constraints defined for getUserMedia() do not apply unless listed here.
Some of these constraints enable user agent processing like downscaling and frame decimation, as well as display-specific features. Others enable observation of inherent properties of a user-selected display surface , as capabilities and settings.
The following new and existing MediaStreamTrack Constrainable Properties are defined to apply to the user-selected video display surface, with the following behavior:
| Property Name | Type | Behavior |
|---|---|---|
| width | unsigned long | The width, in pixels. As a capability, max MUST reflect the display surface's width, and min MUST reflect the width of the smallest aspect-preserving representation available through downscaling by the user agent. |
| height | unsigned long | The height, in pixels. As a capability, max MUST reflect the display surface's height, and min MUST reflect the height of the smallest aspect-preserving representation available through downscaling by the user agent. |
| frameRate | double | The frame rate (frames per second). As a capability, max MUST reflect the display surface's frame rate, and min MUST reflect the lowest frame rate available through frame decimation by the user agent. |
| aspectRatio | double | The exact aspect ratio (width in pixels divided by height in pixels, represented as a double rounded to the tenth decimal place) or aspect ratio range. As a setting, represents width / height. As a capability, min and max both MUST be the current setting value, rendering this property immutable from the application viewpoint. |
| resizeMode | DOMString | This string is one of the members of VideoResizeModeEnum. As a setting, "none" means the MediaStreamTrack contains all bits needed to render the display in full detail, which if the source pixel ratio > 1, means width and height will be larger than the display's appearance from an end-user viewpoint would suggest, whereas "crop-and-scale" means the MediaStreamTrack contains an aspect-preserved representation of the display surface that has been downscaled by the user agent, but not cropped. As a capability, the values "none" and "crop-and-scale" both MUST be present. |
| displaySurface | DOMString | This string is one of the members of DisplayCaptureSurfaceType. As a setting, indicates the type of display surface that is being captured. As a capability, the setting value MUST be the lone value present, rendering this property immutable from the application viewpoint. As a constraint, the value signals the application's preference of a particular display surface type to the user agent; the user agent MAY reorder the options offered to the user according to that preference. This constraint is ignored for all other purposes, and can therefore not cause any side effects (such as being the cause of |
| logicalSurface | boolean | As a setting, a value of true indicates capture of a logical display surface, whereas a value of false indicates a capture of a visible display surface. As a capability, this same value MUST be the lone value present, rendering this property immutable from the application viewpoint. |
| cursor | DOMString | This string is one of the members of CursorCaptureConstraint. As a setting, indicates if and when the cursor is included in the captured display surface. As a capability, the user agent MUST include only the set of values from CursorCaptureConstraint it is capable of supporting for this display surface. |
The following new and existing MediaStreamTrack Constrainable Properties are defined to apply to the user-selected audio sources, with the following behavior:
| Property Name | Type | Behavior |
|---|---|---|
| restrictOwnAudio | boolean | As a setting, this value indicates whether or not the user agent is applying own audio restriction to the source. As a constraint, this property can be constrained resulting in a source with own audio restriction enabled or disabled. When own audio restriction is applied, the user agent MUST attempt to remove any audio from the audio being captured that was produced by the document that performed |
| suppressLocalAudioPlayback | boolean | As a setting, this value indicates whether or not the application instructed the user agent to apply local audio playback suppression to the source. As a constraint, this value is only meaningful if the user selects capturing a browser display surface. In that case, a value of When local audio playback suppression is applied, the user agent SHOULD stop relaying audio to the local speakers, but that audio MUST still be captured by any ongoing audio-capturing capture-sessions. This suppression MUST NOT be observable to the captured document. Furthermore, the capturing document may only observe whether it is applying suppressLocalAudioPlayback; not whether that suppression is having an effect (i.e. can't observe if the user is overriding this in the user agent). When a browser display surface is subject to multiple concurrent captures, local audio playback suppression SHOULD be applied as long as at least one active audio-capturing capture-session is constraining suppressLocalAudioPlayback to |
When inherent properties of the underlying source of a user-selected display surface change, for example in response to the end-user resizing a captured window, and these changes render the capabilities and/or settings of one or more constrainable properties outdated, the user agent MUST queue a task to run the following step:
Update all affected constrainable properties at the same time.
If this causes an "overconstrained" situation, then the user agent MUST ignore the culprit constraints for as long as they overconstrain. The user agent MUST NOT mute the track.
While min and exact constraints produce TypeError on getDisplayMedia(), this specification does not alter the track.applyConstraints() method. Therefore, they may instead produce OverconstrainedError or succeed depending on values, and therefore potentially be present to cause this "overconstrained" situation. The max constraint may also cause this, e.g. with aspectRatio. This spec considers these to be edge cases that aren't useful.
For the purposes of the SelectSettings algorithm, the user agent SHOULD consider all possible combinations of downscaled dimensions that preserve the aspect ratio of the original display surface (to the nearest pixel), and frame rates available through frame decimation, as available settings dictionaries .
The downscaling and decimation effects of constraints are then effectively governed by the fitness distance algorithm.
The intent is for the user agent to produce output that is close to the ideal width, ideal height, and/or ideal frameRate when these are specified, while at all times preserving the aspect ratio of the original display surface.
The user agent SHOULD downscale by the source pixel ratio by default, unless otherwise directed by applied constraints.
The user agent MUST NOT crop the captured output.
The user agent MUST NOT upscale the captured output, or create additional frames, except as needed to preserve high resolutions and frame rates in an aggregated display surface .
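The combined effect of these rules (preserve aspect ratio, never crop, never upscale) can be illustrated with a small calculation. This is a deliberately simplified sketch and not the fitness distance algorithm: it assumes the output should fit within both ideal dimensions, and the function name downscaleToFit is hypothetical.

```javascript
// Picks output dimensions at or below the source size that keep the source's
// aspect ratio (to the nearest pixel) and fit within the ideal width and height.
function downscaleToFit(sourceWidth, sourceHeight, idealWidth, idealHeight) {
  // One uniform scale factor for both axes preserves the aspect ratio.
  // Capping it at 1 means the output is never upscaled.
  const scale = Math.min(1, idealWidth / sourceWidth, idealHeight / sourceHeight);
  return {
    width: Math.max(1, Math.round(sourceWidth * scale)),
    height: Math.max(1, Math.round(sourceHeight * scale)),
  };
}

console.log(downscaleToFit(1920, 1080, 1280, 720)); // { width: 1280, height: 720 }
console.log(downscaleToFit(1920, 1080, 3840, 2160)); // { width: 1920, height: 1080 }
```

The second call shows the no-upscaling rule: ideals larger than the source leave the dimensions unchanged.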
For each constrainable property of positive numeric type in this specification, the user agent MUST establish a floor value, representing the smallest allowable value supported by the user agent regardless of source. This value MUST be constant and MUST be greater than 0. The user agent is encouraged to support all values above the floor value regardless of source.
The purpose of the floor value is to help user agents avoid failing getDisplayMedia() with OverconstrainedError after the user has already been prompted, and avoid leaking information about the user's system.
Describes whether an application invoking setFocusBehavior() would like the user agent to focus the display surface associated with that CaptureController's capture-session.
WebIDL
enum CaptureStartFocusBehavior {
  "focus-capturing-application",
  "focus-captured-surface",
  "no-focus-change"
};
| Enumeration description | |
|---|---|
| focus-capturing-application | The application prefers to be focused. |
| focus-captured-surface | The application prefers that the display surface associated with this CaptureController's capture-session be focused. |
| no-focus-change | The application prefers that the user agent not change focus, leaving focus with whichever surface last had focus following the user's interaction with the user agent and/or operating system. |
no-focus-change
".
A CaptureController object may be associated with a capture-session. It would be used to expose functionality that's associated with the capture-session itself, rather than with the call to getDisplayMedia() or its resulting stream or tracks. Any given capture-session is associated with at most one CaptureController. At most one CaptureController is associated with any given capture-session.
WebIDL
[Exposed=Window, SecureContext]
interface CaptureController : EventTarget {
  constructor();
  undefined setFocusBehavior(CaptureStartFocusBehavior focusBehavior);
};
CaptureController does not yet define event handlers, so it is not required to inherit from EventTarget. This is for the benefit of future specifications that extend CaptureController with event handler attributes; if inheritance is not used, it can be removed.
constructor
CaptureController object with the following internal slots:
| Internal Slot | Initial value | Description (non-normative) |
|---|---|---|
| [[IsBound]] | false | Whether an application has attempted to associate this with a capture-session. |
| [[Source]] | null | The source of the associated capture-session. |
| [[DisplaySurfaceType]] | null | Once capture starts, this will be set to the type of the captured display surface. |
| [[FocusChangeDisabled]] | false | Whether focus-change has been disabled by an external event or a user agent consideration. |
| [[FocusDecisionFinalized]] | false | Set to true when the focus decision is finalized. |
| [[FocusBehavior]] | null | The focus behavior desired by the application. |
The user agent MAY set [[FocusChangeDisabled]] to true at any moment based on its own logic.
setFocusBehavior
Run the following steps:
Let focusBehavior be the method's first argument.
If [[Source]] is null, set [[FocusBehavior]] to focusBehavior and abort these steps.
If [[Source]] has been stopped, throw an "InvalidStateError" DOMException.
If [[DisplaySurfaceType]] is neither "browser" nor "window", throw an "InvalidStateError" DOMException.
If [[FocusDecisionFinalized]] is true, throw an "InvalidStateError" DOMException.
Set [[FocusBehavior]] to focusBehavior.
Run the finalize focus decision algorithm on this .
The finalize focus decision algorithm , given a controller , consists of running the following steps:
If too much time has elapsed since the capture-session started, the user agent SHOULD set [[FocusDecisionFinalized]] to true. The timespan is left up to the user agent, but it is recommended that a value of one second be used.
If controller.[[FocusDecisionFinalized]] is true, abort these steps.
Set controller.[[FocusDecisionFinalized]] to true.
If controller.[[FocusChangeDisabled]] is true, abort these steps.
If controller.[[DisplaySurfaceType]] is neither "browser" nor "window", abort these steps.
Run the following step in parallel:
If controller.[[FocusBehavior]] is "focus-capturing-application", focus the display surface representing the capturing document.
If controller.[[FocusBehavior]] is "focus-captured-surface", focus the display surface referred to by controller.[[Source]].
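The interplay between setFocusBehavior() and the finalize focus decision algorithm can be modelled with plain objects. The sketch below is an assumption-laden simulation for reasoning about outcomes, not the user agent's implementation: slots are ordinary properties, a stopped source is modelled by a hypothetical `stopped` flag, the elapsed-time check is left out, and "focusing" is represented by returning a label.

```javascript
// Plain-object model of the internal slots listed for CaptureController.
function createControllerModel() {
  return {
    isBound: false,
    source: null, // hypothetical shape once capturing: { stopped: boolean }
    displaySurfaceType: null,
    focusChangeDisabled: false,
    focusDecisionFinalized: false,
    focusBehavior: null,
  };
}

function invalidStateError(message) {
  const error = new Error(message);
  error.name = 'InvalidStateError';
  return error;
}

// Returns 'capturing', 'captured', or null when focus is left unchanged.
function finalizeFocusDecision(controller) {
  if (controller.focusDecisionFinalized) return null;
  controller.focusDecisionFinalized = true;
  if (controller.focusChangeDisabled) return null;
  const type = controller.displaySurfaceType;
  if (type !== 'browser' && type !== 'window') return null;
  if (controller.focusBehavior === 'focus-capturing-application') return 'capturing';
  if (controller.focusBehavior === 'focus-captured-surface') return 'captured';
  return null;
}

function setFocusBehavior(controller, focusBehavior) {
  if (controller.source === null) {
    // Before capture starts, the preference is only recorded.
    controller.focusBehavior = focusBehavior;
    return null;
  }
  if (controller.source.stopped) throw invalidStateError('source stopped');
  const type = controller.displaySurfaceType;
  if (type !== 'browser' && type !== 'window') throw invalidStateError('not focusable');
  if (controller.focusDecisionFinalized) throw invalidStateError('already finalized');
  controller.focusBehavior = focusBehavior;
  return finalizeFocusDecision(controller);
}
```

In this model, a preference set before capture starts takes effect when the decision is later finalized, and any later call fails because the decision can only be finalized once.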
Describes the different hints an application can provide about whether the display surface the application is in, should be among the choices offered to the user.
WebIDL
enum SelfCapturePreferenceEnum {
  "include",
  "exclude"
};
| Enum value | Description |
|---|---|
| include | The application prefers the surface be included among the choices offered. |
| exclude | The application prefers the surface be excluded from the choices offered. |
Describes whether an application invoking getDisplayMedia() would like the user agent to include system audio among the audio sources offered to the user.
WebIDL
enum SystemAudioPreferenceEnum {
  "include",
  "exclude"
};
| Enumeration description | |
|---|---|
| include | The application prefers that options to share system audio be offered to the user for monitor display surfaces. |
| exclude | The application prefers that options to share system audio not be offered to the user. |
Describes whether an application invoking getDisplayMedia() would like the user agent to offer the user an option to dynamically switch the source display surface during the capture.
WebIDL
enum SurfaceSwitchingPreferenceEnum {
  "include",
  "exclude"
};
| Enumeration description | |
|---|---|
| include | The application prefers that an option to dynamically switch the source display surface during the capture be offered to the user. |
| exclude | The application prefers that an option to dynamically switch the source display surface during the capture NOT be offered to the user. |
The DisplayMediaStreamOptions dictionary is used to instruct the user agent what sort of MediaStreamTracks may be included in the MediaStream returned by getDisplayMedia.
WebIDL
dictionary DisplayMediaStreamOptions {
  (boolean or MediaTrackConstraints) video = true;
  (boolean or MediaTrackConstraints) audio = false;
  CaptureController controller;
  SelfCapturePreferenceEnum selfBrowserSurface;
  SystemAudioPreferenceEnum systemAudio;
  SurfaceSwitchingPreferenceEnum surfaceSwitching;
};
DisplayMediaStreamOptions Members
video of type (boolean or MediaTrackConstraints), defaulting to true
If true, it requests that the returned MediaStream contain a video track. If a Constraints structure is provided, it further specifies desired processing options to be applied to the video track rendition of the display surface chosen by the user. If false, the request will be rejected with a TypeError, as per the getDisplayMedia algorithm.
audio of type (boolean or MediaTrackConstraints), defaulting to false
If true, it signals an interest that the returned MediaStream contain an audio track, if supported and audio is available for the display surface chosen by the user. If a Constraints structure is provided, it further specifies desired processing options to be applied to the audio track. If false, the MediaStream will not contain an audio track.
controller of type CaptureController
If present, this CaptureController object will be associated with the capture-session. Through the methods exposed on this object, the capture-session can be manipulated.
selfBrowserSurface of type SelfCapturePreferenceEnum
systemAudio of type SystemAudioPreferenceEnum
surfaceSwitching of type SurfaceSwitchingPreferenceEnum
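As an illustration of how these members combine, the sketch below assembles an options object. The helper name buildDisplayMediaOptions, its parameters, and the particular preference values chosen are assumptions for this example; the three preference members are hints to the user agent, not guarantees.

```javascript
// Builds a DisplayMediaStreamOptions-shaped object from a few application choices.
function buildDisplayMediaOptions({ withAudio = false, maxFrameRate, controller } = {}) {
  const options = {
    // Only `max` is used here; `min` and `exact` would be rejected with a TypeError.
    video: maxFrameRate ? { frameRate: { max: maxFrameRate } } : true,
    audio: withAudio,
    selfBrowserSurface: 'exclude', // prefer not to offer the application's own tab
    systemAudio: withAudio ? 'include' : 'exclude',
    surfaceSwitching: 'include', // prefer offering a switch-source option
  };
  if (controller) options.controller = controller;
  return options;
}

// Browser-only usage (shown as a comment because it needs a user gesture):
//   const stream = await navigator.mediaDevices.getDisplayMedia(
//     buildDisplayMediaOptions({ withAudio: true, maxFrameRate: 15 }));
```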
MediaTrackSupportedConstraints
MediaTrackSupportedConstraints is extended here with the list of constraints that a user agent recognizes.
WebIDL
partial dictionary MediaTrackSupportedConstraints {
  boolean displaySurface = true;
  boolean logicalSurface = true;
  boolean cursor = true;
  boolean restrictOwnAudio = true;
  boolean suppressLocalAudioPlayback = true;
};
displaySurface of type boolean, defaulting to true
Whether displaySurface constraint is recognized.
logicalSurface of type boolean, defaulting to true
Whether logicalSurface constraint is recognized.
cursor of type boolean, defaulting to true
Whether cursor constraint is recognized.
restrictOwnAudio of type boolean, defaulting to true
Whether restrictOwnAudio constraint is recognized.
suppressLocalAudioPlayback of type boolean, defaulting to true
Whether suppressLocalAudioPlayback constraint is recognized.
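An application can check which of these constraints a user agent recognizes before relying on them. A minimal sketch, assuming the supported-constraints dictionary is obtained from navigator.mediaDevices.getSupportedConstraints(); the helper name recognizedCaptureConstraints is hypothetical.

```javascript
// The five constraint names added to MediaTrackSupportedConstraints above.
const SCREEN_CAPTURE_CONSTRAINTS = [
  'displaySurface',
  'logicalSurface',
  'cursor',
  'restrictOwnAudio',
  'suppressLocalAudioPlayback',
];

// Returns the subset of screen-capture constraint names marked as recognized.
// `supported` is a MediaTrackSupportedConstraints-shaped object.
function recognizedCaptureConstraints(supported) {
  return SCREEN_CAPTURE_CONSTRAINTS.filter((name) => supported[name] === true);
}

// Browser-only usage (shown as a comment):
//   const names = recognizedCaptureConstraints(
//     navigator.mediaDevices.getSupportedConstraints());
```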
MediaTrackConstraintSet
MediaTrackConstraintSet
is
used
for
reading
the
current
status
of
constraints.
partial dictionary MediaTrackConstraintSet {
  ConstrainDOMString displaySurface;
  ConstrainBoolean logicalSurface;
  ConstrainDOMString cursor;
  ConstrainBoolean restrictOwnAudio;
  ConstrainBoolean suppressLocalAudioPlayback;
};
displaySurface of type ConstrainDOMString
The type of display surface that is being captured. This assumes values from the DisplayCaptureSurfaceType enumeration.
logicalSurface of type ConstrainBoolean
A value of true indicates capture of a logical display surface; a value of false indicates a capture of a visible display surface.
cursor of type ConstrainDOMString
Assumes values from the CursorCaptureConstraint enumeration that determines if and when the cursor is included in the captured display surface.
restrictOwnAudio of type ConstrainBoolean
This constraint is only applicable to audio tracks. See restrictOwnAudio.
suppressLocalAudioPlayback of type ConstrainBoolean
This constraint is only applicable to audio tracks. See suppressLocalAudioPlayback.
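Because restrictOwnAudio and suppressLocalAudioPlayback apply only to audio tracks, an application holding one flat set of preferences may want to route each constraint to the matching track. The sketch below is one possible way to do that; splitDisplayConstraints is a hypothetical helper, and treating every other name as a video constraint is an assumption of this example.

```typescript
type FlatConstraintBag = { [name: string]: unknown };

// Constraints that the text above describes as applicable to audio tracks only.
const AUDIO_ONLY_DISPLAY_CONSTRAINTS = ["restrictOwnAudio", "suppressLocalAudioPlayback"];

// Split a flat constraint set into an audio part and a video part.
// Assumption: any name not in the audio-only list is routed to the video track.
function splitDisplayConstraints(
  all: FlatConstraintBag
): { audio: FlatConstraintBag; video: FlatConstraintBag } {
  const audio: FlatConstraintBag = {};
  const video: FlatConstraintBag = {};
  for (const name in all) {
    if (!Object.prototype.hasOwnProperty.call(all, name)) continue;
    if (AUDIO_ONLY_DISPLAY_CONSTRAINTS.indexOf(name) !== -1) {
      audio[name] = all[name];
    } else {
      video[name] = all[name];
    }
  }
  return { audio, video };
}
```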
MediaTrackSettings
When the getSettings() method is invoked on a video stream track, the user agent must return the extended MediaTrackSettings dictionary, representing the current status of the underlying user agent.
partial dictionary MediaTrackSettings {
  DOMString displaySurface;
  boolean logicalSurface;
  DOMString cursor;
  boolean restrictOwnAudio;
  boolean suppressLocalAudioPlayback;
};
displaySurface of type DOMString
The type of display surface that is being captured. This assumes values from the DisplayCaptureSurfaceType enumeration.
logicalSurface of type boolean
A value of true indicates capture of a logical display surface; a value of false indicates a capture of a visible display surface.
cursor of type DOMString
Assumes values from the CursorCaptureConstraint enumeration that determines if and when the cursor is included in the captured display surface.
restrictOwnAudio of type boolean
Indicates whether the restrictOwnAudio constraint is applied (true) or not (false).
suppressLocalAudioPlayback of type boolean
Indicates whether or not the application instructed the user agent to apply local audio playback suppression to the source.
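A minimal sketch of reading these settings follows. The interface CaptureSettingsView and the function describeCapture are hypothetical names used only for illustration; in a browser the input would typically be the result of calling getSettings() on the captured video track, and any member may be absent if the user agent does not report it.

```typescript
// Hypothetical view of the video-related settings members described above.
interface CaptureSettingsView {
  displaySurface?: string;
  logicalSurface?: boolean;
  cursor?: string;
}

// Produce a short human-readable summary of what a track reports about its capture.
function describeCapture(settings: CaptureSettingsView): string {
  const surface = settings.displaySurface === undefined ? "unknown" : settings.displaySurface;
  let extent = "unspecified";
  if (settings.logicalSurface === true) extent = "logical";
  if (settings.logicalSurface === false) extent = "visible";
  const cursor = settings.cursor === undefined ? "unspecified" : settings.cursor;
  return "surface=" + surface + ", extent=" + extent + ", cursor=" + cursor;
}
```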
MediaTrackCapabilities
When the getCapabilities() method is invoked on a video stream track, the user agent must return the extended MediaTrackCapabilities dictionary, representing the capabilities of the underlying user agent.
partial dictionary MediaTrackCapabilities {
  DOMString displaySurface;
  boolean logicalSurface;
  sequence<DOMString> cursor;
};
displaySurface of type DOMString
MUST be the same value as is returned by getSettings(), rendering this property immutable from the application's viewpoint.
logicalSurface of type boolean
MUST be the same value as is returned by getSettings(), rendering this property immutable from the application's viewpoint.
cursor of type sequence<DOMString>
MUST consist of exactly the set of values from CursorCaptureConstraint that the user agent is capable of supporting for this track.
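Since the cursor capability lists exactly the values the user agent supports for a track, an application can pick the first of its preferred values that appears there. The following is a sketch under that reading; chooseCursorMode and the CursorMode type are hypothetical names, not part of this specification.

```typescript
type CursorMode = "never" | "always" | "motion";

// Return the first preferred cursor mode that the track's capabilities list,
// or undefined when the capability is missing or none of the preferences match.
function chooseCursorMode(
  supported: readonly string[] | undefined,
  preferences: readonly CursorMode[]
): CursorMode | undefined {
  if (!supported) return undefined;
  for (const mode of preferences) {
    if (supported.indexOf(mode) !== -1) return mode;
  }
  return undefined;
}

// In a browser, the chosen value might then be applied with
//   track.applyConstraints({ cursor: mode })
// subject to what the user agent actually supports.
```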
The DisplayCaptureSurfaceType enumeration describes the different types of display surface.
enum DisplayCaptureSurfaceType {
  "monitor",
  "window",
  "browser"
};
| Enum value | Description |
|---|---|
| monitor | a monitor display surface, physical display, or collection of physical displays |
| window | a window display surface, or single application window |
| browser | a browser display surface, or single browser window |
The CursorCaptureConstraint enumerates the conditions under which the cursor is captured.
enum CursorCaptureConstraint {
  "never",
  "always",
  "motion"
};
| Enum value | Description |
|---|---|
| never | a "never" cursor capture constraint omits the cursor from the captured display surface. |
| always | a "always" cursor capture constraint includes the cursor in the captured display surface. |
| motion | a "motion" cursor capture constraint includes the cursor in the captured display surface when the cursor/pointer is moved. The captured cursor is removed when there is no further movement of the pointer/cursor for a certain period of time, as determined by the user agent. |
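The three cursor behaviours in the table above can be modelled as a small decision function. This is only an illustrative sketch: the idle period for "motion" is determined by the user agent, so the idleTimeoutMs parameter and the function name cursorCaptured are assumptions of the example rather than anything the API exposes.

```typescript
type CursorRule = "never" | "always" | "motion";

// Decide whether the cursor would appear in a captured frame at time nowMs.
// lastMoveMs is the time of the most recent pointer movement, or null if none occurred.
// idleTimeoutMs stands in for the user-agent-chosen idle period (hypothetical parameter).
function cursorCaptured(
  rule: CursorRule,
  lastMoveMs: number | null,
  nowMs: number,
  idleTimeoutMs: number
): boolean {
  if (rule === "never") return false;
  if (rule === "always") return true;
  // "motion": shown after movement, removed once the pointer has been idle long enough.
  if (lastMoveMs === null) return false;
  return nowMs - lastMoveMs < idleTimeoutMs;
}
```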
Each potential source of capture is treated by this API as a discrete media source. However, display capture sources MUST NOT be enumerated by enumerateDevices(), since this would reveal too much information about the host system.
Display capture sources therefore cannot be selected with the deviceId constraint, since their deviceIds are not exposed.
This is not to be confused with the stable and private id of the same name used in algorithms to implement privacy indicators.
Screen Capture is a powerful feature which is identified by the name "display-capture", requiring express permission to be used.
As required for integration with the Permissions specification, this specification defines the following:
Valid values for this descriptor's permission state are "prompt" and "denied". The user agent MUST NOT ever set this descriptor's permission state to "granted".
This specification defines a policy-controlled feature identified by the string "display-capture". Its default allowlist is "self".
A document's permissions policy determines whether any content in that document is allowed to use getDisplayMedia. If disabled in any document, no content in the document will be allowed to use getDisplayMedia. This is enforced by the prompt the user to choose algorithm.
This specification extends the Privacy Indicator Requirements of getUserMedia() to include getDisplayMedia.
References in this specification to [[devicesLiveMap]], [[devicesAccessibleMap]], and [[kindsAccessibleMap]] refer to the definitions already created to support Privacy Indicator Requirements for getUserMedia().
For each kind of device that getDisplayMedia exposes, using a stable and private id for the device, deviceId, set kind to "Display" + kind, and do the following:
false
.
false
.
Then, given the new definitions above, the requirements on the user agent are those specified in Privacy Indicator Requirements of getUserMedia().
Even though there's a single permission descriptor for getDisplayMedia, the above definitions distinguish by kind to enable user agents to implement privacy indicators that show the end-user the specific kinds of display sources that are being shared at any point.
Since this specification forbids user agents from persisting "granted" permissions, only the "Live" indicators are significant.
The user agent MUST NOT fire the devicechange event based on changes in the set of available sources from getDisplayMedia.
This section is informative; however, it notes some serious risks to platform security if the advice it contains is not adhered to.
The risks to user privacy and security posed by capture of displayed content are twofold. The immediate and obvious risk is that users inadvertently share content that they did not wish to share, or might not have realized would be shared.
Display capture presents a less obvious risk to the cross-site request forgery protections offered by the browser sandbox. Display and capture of information that is also under the control of an application, even indirectly, can allow that application to access information that would otherwise be inaccessible to it directly. For example, the canvas API does not permit sampling of a canvas, or conversion to an accessible form, if it is not origin-clean [ 2DCONTEXT ].
This issue is discussed in further detail in [ RTCWEB-SECURITY-ARCH ] and [ RTCWEB-SECURITY ].
Display capture that includes browser windows, particularly those that are under any form of control by the application, risks violation of these basic security protections. This risk is not entirely contained to browser windows, since control channels may exist between browser applications and other applications, depending on the operating system. The key consideration is whether the captured display surface could be somehow induced to present information that would otherwise be secret from the application that is receiving the resulting media.
Capture of logical display surfaces creates the potential for content to be shared without the user being made aware of it. A logical display surface might render information that a user did not intend to expose. This can be more easily recognized if this information is visible. Such means are likely ineffectual against a machine, but a human recipient is less able to process content that appears only briefly.
It is encouraged that information that is not currently rendered to the screen be obscured in captures unless the application has been specifically authorized to access that content through elevated permissions .
How obscured areas of the logical display surface are captured to produce a visible display surface capture MAY vary. Some applications, like presentation software, benefit from having obscured portions of the screen render the image that appeared prior to being obscured. Freezing images can cause visual artifacts for changing content, or hide the fact that content is being obscured. Note that frozen portions of a capture can be incorrectly perceived as a bug. Alternatively, obscured areas might be replaced with content that marks them as being obscured, such as a grey color or hatching.
Some systems may only capture the logical display surface . Devices with small screens, for instance, do not typically have the concept of a window , and render applications in full screen modes only. These systems might provide a capture of an application that is not currently visible, which could be unusable without capturing the logical display surface .
When capturing a window or other display surface that is partially transparent, any content behind it will not be captured.
There is a risk that the user prompt be exposed to the web page for a short amount of time by the newly created MediaStreamTrack, for instance if the user selects the screen on which the user prompt is displayed. In the case of the user prompt displaying previews of the various surfaces available for selection, those previews will not be captured by the newly created MediaStreamTrack.
getDisplayMedia allows capturing audio alongside video. This poses privacy and security concerns, as it may expose additional information about system applications, and the set of shared audio sources is not necessarily the same as the set of shared video sources. For example, the capture of the video of a window that is accompanied by the audio of the entire system, including applications unrelated to that window, will not be shared without active user consent.
It is important that the user is aware of what content will be shared, including any possible audio. It is strongly encouraged that the user is allowed to give consent to video but not audio, resulting in a video-only stream. This ensures that the request for audio is always optional and does not restrict the user's choices compared to a video-only request.
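Because the user may consent to video but not audio, an application that asked for audio needs to cope with a video-only stream. The sketch below shows one way to detect that case; StreamLike, TrackLike and summarizeCapture are hypothetical names, with StreamLike modelling only the getTracks() method that a MediaStream provides.

```typescript
// Minimal structural stand-ins for MediaStreamTrack and MediaStream.
interface TrackLike {
  kind: string;
}
interface StreamLike {
  getTracks(): TrackLike[];
}

interface CaptureSummary {
  hasVideo: boolean;
  hasAudio: boolean;
  // True when audio was requested but the resulting stream carries none,
  // for example because the user declined it or no audio was available.
  audioMissing: boolean;
}

function summarizeCapture(stream: StreamLike, audioRequested: boolean): CaptureSummary {
  let videoCount = 0;
  let audioCount = 0;
  for (const track of stream.getTracks()) {
    if (track.kind === "video") videoCount++;
    else if (track.kind === "audio") audioCount++;
  }
  return {
    hasVideo: videoCount > 0,
    hasAudio: audioCount > 0,
    audioMissing: audioRequested && audioCount === 0,
  };
}
```

An application could use audioMissing to hide audio controls rather than treat the missing track as an error.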
Implementations are advised to provide user feedback and control mechanisms similar to those offered to users when sharing a camera or microphone, as encouraged in [ GETUSERMEDIA ].
It is important that a user be aware that content is being shared when content is actively being captured. User agents are advised to display a prominent indicator while content is being captured. In addition to an indicator, a user agent is advised to provide a means to learn precisely what is being shared; while this capability is trivially provided by an application by rendering the captured content, this information allows a user to accurately assess what is being shared.
In addition to feedback mechanisms, a means for the user to stop any active capture is advisable.
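The advice above is directed at user agents, but an application can complement it with its own stop control. The following is a minimal sketch of such a handler; stopCapture and the two interfaces are hypothetical names that model only the readyState attribute and stop() method of MediaStreamTrack.

```typescript
// Minimal structural stand-ins for MediaStreamTrack and MediaStream.
interface StoppableTrack {
  readyState: string; // "live" or "ended" on a real MediaStreamTrack
  stop(): void;
}
interface StoppableStream {
  getTracks(): StoppableTrack[];
}

// Stop every track that is still live and report how many were stopped.
function stopCapture(stream: StoppableStream): number {
  let stopped = 0;
  for (const track of stream.getTracks()) {
    if (track.readyState !== "ended") {
      track.stop();
      stopped++;
    }
  }
  return stopped;
}
```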