Copyright © 2022 W3C® (MIT, ERCIM, Keio, Beihang). W3C liability, trademark and permissive document license rules apply.
This document introduces an API for cropping a video track derived from display-capture of the current tab.
This section describes the status of this document at the time of its publication. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.
This First Public Working Draft represents the direction the Web Real-Time Communications Working Group intends to explore to solve the use case of partial capture of browsing contexts. The Working Group is particularly interested in feedback from potential adopters of the API on how well this direction matches that use case. This document was published by the Web Real-Time Communications Working Group as an Editor's Draft.
Publication as an Editor's Draft does not imply endorsement by W3C and its Members.
This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This document is governed by the 2 November 2021 W3C Process Document.
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
The key words MUST and MUST NOT in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
This document uses the definition of the following concepts from [SCREEN-CAPTURE]: display-surface and browser display-surface.
This specification defines self-capture as the capture of a browser display-surface that is the rendered form of the top-level browsing context of the associated Document of the MediaDevices object from which the application initiated the capture session. A self-capture video track is a MediaStreamTrack sourced by self-capture.
This section is non-normative.
Complex applications often comprise multiple documents in distinct iframes, all displayed within the same browsing context. Consider such an application. Assume one of these documents, CAPTURING-DOC, uses getDisplayMedia() or getViewportMedia to capture the entire current browsing context. If this document then wishes to crop the video track to the coordinates of some sub-section CAPTURE-TARGET of a collaborating document CAPTURED-DOC, how can CAPTURING-DOC do so performantly and reliably? Recall especially that changes in layout due to scrolling, zooming or window resizing present additional challenges.
Consider a combo-application consisting of two major parts hosted in different iframes within the same tab: a video-conferencing application and a productivity-suite application. Assume the video-conferencing application uses existing/upcoming APIs such as getDisplayMedia() and/or getViewportMedia and captures the entire tab. Now it needs to crop away everything other than a particular section of the productivity-suite. It needs to crop away its own video-conferencing content, any speaker notes and other private and/or irrelevant content in the productivity-suite, before transmitting the resulting cropped video remotely.
Moreover, consider that it is likely that the two collaborating applications are cross-origin from each other. They can post messages, but all communication is asynchronous, and it's easier and more performant if information is transmitted sparingly between them. That precludes solutions involving posting of entire frames, as well as solutions which are too slow to react to changes in layout (e.g. scrolling, zooming and window-size changes).
It is worthwhile to note that most applications would likely prefer to use getViewportMedia in such scenarios. However, as of this writing, getViewportMedia is still unspecified and unimplemented. It will have non-trivial requirements whose adoption will take some time and effort. As such, many applications will likely use a combination of getDisplayMedia() and Region Capture for some time to come.
The combination of getDisplayMedia() and Region Capture is also useful for applications that allow the users to choose whichever display-surface they wish, but offer distinct functionality depending on whether users choose to self-capture or, conversely, choose to capture a window or monitor. Such applications would only succeed in using Region Capture if the user chose to self-capture; otherwise, the attempt to apply cropping would be a no-op.
As presently defined, cropTo(cropTarget) returns a rejected Promise if the cropTarget is not associated with an Element within either the current top-level browsing context or any of its descendant browsing contexts. That means that all of the mechanisms introduced by this document are only relevant for self-capture. An immediate corollary is that navigation of the (shared) top-level browsing context breaks off the capture, and therefore also the cropping session.
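The following is a minimal sketch of how an application might cope with a track that cannot be cropped, for example because the user chose to capture a window or monitor instead of the current tab. The helper names `supportsCropTo` and `tryCropTo`, and the boolean return convention, are assumptions made for illustration; only `cropTo` comes from this document.

```javascript
// Hypothetical helper: true if the track exposes a cropTo() method.
// Tracks that are not sourced by self-capture may not expose it.
function supportsCropTo(track) {
  return track != null && typeof track.cropTo === 'function';
}

// Hypothetical helper: attempts to crop and reports whether cropping is active.
// A rejected promise from cropTo() is treated as "not cropped", so the caller
// can decide whether to transmit the uncropped track or stop the capture.
async function tryCropTo(track, cropTarget) {
  if (!supportsCropTo(track)) {
    return false;
  }
  try {
    await track.cropTo(cropTarget);
    return true;
  } catch (error) {
    console.warn('cropTo() was rejected:', error && error.name);
    return false;
  }
}
```

An application could call `tryCropTo(track, cropTarget)` right after `getDisplayMedia()` resolves and only start remote transmission when it returns `true`.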
The region-capture mechanism comprises two parts:
Element as a potential target for the cropping mechanism.
Element, or to stop such cropping and revert a track to its uncropped state.
We define two crop-states for video tracks: cropped and uncropped. Tracks start out uncropped, and may turn to cropped when cropTo is successfully called on them.
The cropping mechanism presented in this document (cropTo) relies on CropTarget rather than on direct node references. This serves a dual purpose.
CropTarget is an intentionally empty, opaque identifier that exposes nothing. Its sole purpose is to be handed to cropTo as input.
WebIDL
[Exposed=(Window,Worker), Serializable]
interface CropTarget {
  [SecureContext] static Promise<CropTarget> fromElement(Element element);
};
There is no consensus yet on the name for CropTarget. This is under discussion in issue #18.
There is no consensus yet on whether fromElement should be exposed beyond secure contexts.
fromElement()
Calling fromElement with an Element of a supported type associates that Element with a CropTarget. This CropTarget may be used as input to cropTo. We define a valid CropTarget as one returned by a previous call to CropTarget.fromElement() in the current top-level browsing context or any of its descendant browsing contexts.
When fromElement is called with a given element, the user agent creates a CropTarget with element as input. The user agent MUST return a Promise p. The user agent MUST resolve p only after it has finished all the necessary internal propagation of state associated with the new CropTarget, at which point the user agent MUST be ready to receive the new CropTarget as a valid parameter to cropTo.
When cloning an Element on which fromElement was previously called, the clone is not associated with any CropTarget. If fromElement is later called on the clone, a new CropTarget will be assigned to it.
There is no consensus yet on whether producing a CropTarget should be done by invoking an asynchronous method like CropTarget.fromElement(), or a CropTarget constructor that accepts an Element as input. This is further discussed on issue #17.
To create a CropTarget with element as input, run the following steps:
Let cropTarget be a new object of type CropTarget.
Let weakRef be a weak reference to element.
Create cropTarget.[[Element]] initialized to weakRef.
cropTarget keeps a weak reference to the element it represents. In other words, cropTarget will not prevent garbage collection of its element.
CropTarget objects are serializable. The serialization steps, given value, serialized, and a boolean forStorage, are:
If forStorage is true, throw a new DOMException object whose name attribute has the value "DataCloneError".
Set serialized.[[CropTargetElement]] to value.[[Element]].
The deserialization steps, given serialized and value, are:
Set value.[[Element]] to serialized.[[CropTargetElement]].
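The creation, serialization and deserialization steps above can be modelled in plain script. This is only an illustrative sketch: a real CropTarget is created by the user agent and its internal slots are not visible to script. `ModelCropTarget`, `serializeCropTarget` and `deserializeCropTarget` are hypothetical names, and a plain `Error` with its `name` set stands in for the DOMException.

```javascript
// Hypothetical stand-in for CropTarget; elementRef models the [[Element]] slot.
class ModelCropTarget {
  constructor(element) {
    // A weak reference, so the target does not keep the element alive.
    this.elementRef = new WeakRef(element);
  }
}

// Models the serialization steps: storage is refused, otherwise the
// weak reference is copied into the serialized record.
function serializeCropTarget(value, forStorage) {
  if (forStorage) {
    const error = new Error('CropTarget cannot be serialized for storage');
    error.name = 'DataCloneError'; // stands in for a DOMException
    throw error;
  }
  return { cropTargetElement: value.elementRef };
}

// Models the deserialization steps: the new value points at the same element.
function deserializeCropTarget(serialized) {
  const value = Object.create(ModelCropTarget.prototype);
  value.elementRef = serialized.cropTargetElement;
  return value;
}
```

In this model, a round trip through `serializeCropTarget(target, false)` and `deserializeCropTarget` yields a distinct object that refers to the same element, which mirrors how a CropTarget sent with postMessage() still identifies the original Element.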
Recall that, as per [SCREEN-CAPTURE], when getDisplayMedia() is called, it returns a Promise<MediaStream>, and that this MediaStream contains exactly one video track, whose type is MediaStreamTrack.
We specify that if the user chooses to capture a browser display-surface, the user agent MUST instantiate the video track as either MediaStreamTrack, or as some sub-class of MediaStreamTrack, and that cropTo MUST be exposed on this track. For simplicity's sake, this document assumes that a subclass called BrowserCaptureMediaStreamTrack is used by the user agent.
The track MUST be initially uncropped.
WebIDL
[Exposed = Window]
interface BrowserCaptureMediaStreamTrack : MediaStreamTrack {
  Promise<undefined> cropTo(CropTarget? cropTarget);
  BrowserCaptureMediaStreamTrack clone();
};
cropTo()
Calls to this method instruct the user agent to start/stop cropping a self-capture video track to the bounding client rectangle of cropTarget.[[Element]]. Since the track is restricted to the visible viewport of the display-surface, the captured area will be the intersection of the visible viewport and the element bounding client rectangle.
Whenever cropTo is invoked, the user agent MUST execute the following algorithm:
Promise, rejected with a NotSupportedError.
The user agent MUST validate cropTarget according to this track's current crop-state.
undefined.
If the user agent does not accept cropTarget, return a Promise rejected with an UnknownError.
Promise.
Run the following steps in parallel:
If cropTarget is neither undefined nor a valid CropTarget, reject p with a NotAllowedError and abort these steps.
If cropTarget is either undefined or a valid CropTarget, the user agent MUST update this video track's crop-state according to cropTarget:
If cropTarget is undefined, the user agent MUST stop cropping. This video track reverts to the uncropped state.
CropTarget. This means that for each new frame produced on the track, the user agent calculates the bounding box of the pixels belonging to the element, and crops the frame to the coordinates of this bounding box.
Call the track's state before this method invocation PRE-STATE, and after this method invocation POST-STATE. The user agent MUST resolve p when it is guaranteed that no more frames cropped (or uncropped) according to PRE-STATE will be delivered to the application, and that any additional frames delivered to the application will therefore be cropped (or uncropped) according to either POST-STATE or a later state.
The timing of the cropTo promise resolution and the timing of the actual cropping of video frames are observable to JavaScript through MediaStreamTrack transforms. It is expected that the first newly cropped video frame will be enqueued on the MediaStreamTrack ReadableStream just after the cropTo promise is resolved.
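The crop-state update at the heart of the algorithm can be sketched as a small synchronous function. This is a simplified model under stated assumptions, not the user agent's implementation: it leaves out the promise handling and the in-parallel steps, `nextCropState` is a hypothetical name, and `validTargets` is a hypothetical set standing in for the user agent's record of valid CropTargets. It follows the prose above in treating `undefined` as the request to stop cropping.

```javascript
// Hypothetical model of the crop-state update performed by cropTo().
function nextCropState(cropTarget, validTargets) {
  if (cropTarget === undefined) {
    // Stop cropping: the track reverts to the uncropped state.
    return { state: 'uncropped', target: null };
  }
  if (!validTargets.has(cropTarget)) {
    // Neither undefined nor a valid CropTarget: modelled as a thrown error
    // whose name matches the rejection described above.
    const error = new Error('cropTarget is not a valid CropTarget');
    error.name = 'NotAllowedError';
    throw error;
  }
  // A valid CropTarget: the track becomes (or stays) cropped to it.
  return { state: 'cropped', target: cropTarget };
}
```

Calling `nextCropState(undefined, validTargets)` on a cropped track models reverting to the uncropped state, and calling it with a second valid target models re-targeting an already cropped track.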
clone()
When a BrowserCaptureMediaStreamTrack is cloned, the user agent MUST produce a track which is initially uncropped, regardless of the crop-state of the original track.
We define an Element for which a CropTarget was produced (through a call to fromElement) as a potential crop-target.
We define a potential crop-target which is targeted by a successful call to cropTo as the crop-session target.
Consider a frame produced on a cropped video track. The user agent calculates the intersection of (i) the top-level browsing context's viewport and (ii) the bounding box of all pixels belonging to the crop-session target. This intersection is defined as the crop-session target's coordinates for that frame.
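As a worked illustration of this definition, the intersection can be computed as follows. The sketch assumes rectangles are plain objects of the form `{x, y, width, height}` in viewport coordinates; the function names are hypothetical and the user agent is free to compute the result differently.

```javascript
// Returns the intersection of two rectangles, or null when they do not overlap.
function intersectRects(a, b) {
  const left = Math.max(a.x, b.x);
  const top = Math.max(a.y, b.y);
  const right = Math.min(a.x + a.width, b.x + b.width);
  const bottom = Math.min(a.y + a.height, b.y + b.height);
  if (right <= left || bottom <= top) {
    return null; // no pixels in common
  }
  return { x: left, y: top, width: right - left, height: bottom - top };
}

// Hypothetical helper: the crop-session target's coordinates for one frame.
function cropCoordinatesForFrame(viewport, targetBoundingBox) {
  return intersectRects(viewport, targetBoundingBox);
}
```

For a 1280x720 viewport at the origin and a target whose bounding box is `{x: 1000, y: 600, width: 400, height: 300}`, the result is `{x: 1000, y: 600, width: 280, height: 120}`: the part of the target that extends past the viewport's right and bottom edges is not captured.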
Consider a video track VT cropped to a given crop-session target TARGET. We define the behavior of the crop-session of VT in the face of changes undergone by TARGET.
We define as an empty crop-session target the case where a crop-session target is attached to the DOM, yet consists of zero pixels which are drawn inside of the top-level browsing context's viewport.
Some examples of when this could happen include:
The user agent MUST NOT produce new frames on tracks with an empty crop-session target. For such a track, the user agent MUST resume the production of frames if either the track becomes uncropped, or its crop-session target stops being empty.
We define as a disconnected crop-session target a crop-session target that has been detached from the DOM.
The difference between an empty crop-session target and a disconnected crop-session target is that a disconnected one may become unreachable, in which case it would not produce any new frames. Nevertheless, the user agent MUST treat a disconnected crop-session target the same way it treats an empty crop-session target. The application may call cropTo on the track with either undefined or a new CropTarget, thereby allowing the production of frames on the track to be resumed.
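The rules for empty and disconnected crop-session targets can be summarised in a small decision function. This is a hypothetical model for illustration only: the shape of the `track` object, the `connected` flag and the `visiblePixels` count are assumptions, not part of the API.

```javascript
// Hypothetical model: decides whether a new frame is produced on a track.
// track.cropState is 'cropped' or 'uncropped'. For a cropped track,
// track.target.connected says whether the element is attached to the DOM and
// track.target.visiblePixels counts its pixels drawn inside the viewport.
function shouldProduceFrame(track) {
  if (track.cropState === 'uncropped') {
    return true; // uncropped tracks are unaffected by crop-session targets
  }
  const { connected, visiblePixels } = track.target;
  if (!connected) {
    return false; // disconnected target: treated the same way as an empty one
  }
  return visiblePixels > 0; // an empty target produces no frames
}
```

In this model, frame production resumes as soon as the target has visible pixels again, or once the track is uncropped or re-targeted through another call to cropTo.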
Code in the capture-target:
const mainContentArea = document.getElementById('mainContentArea');
const cropTarget = await CropTarget.fromElement(mainContentArea);
sendCropTarget(cropTarget);

function sendCropTarget(cropTarget) {
  // Can send the crop-target to another document in this tab
  // using postMessage() or using any other means.
  // Possibly there is no other document, and this is just consumed locally.
}
Code in the capturing-document:
async function startCroppedCapture(cropTarget) {
  const stream = await navigator.mediaDevices.getDisplayMedia();
  const [track] = stream.getVideoTracks();
  // cropTo is only exposed on tracks that support cropping.
  if (!track.cropTo) {
    handleError(stream);
    return;
  }
  await track.cropTo(cropTarget);
  transmitVideoRemotely(track);
}