1. Introduction
The [WEBRTC-NV-USE-CASES] document describes several functions that can only be achieved by access to media (requirements N20-N22), including, but not limited to:
- Funny Hats
- Machine Learning
- Virtual Reality Gaming
These use cases further require that processing can be done in worker threads (requirements N23-N24).
This specification gives an interface based on [WEBCODECS] and [STREAMS] to provide access to such functionality.
This specification provides access to raw media, which is the output of a media source such as a camera, microphone, screen capture, or the decoder part of a codec, and the input to the encoder part of a codec. The processed media can be consumed by any destination that can take a MediaStreamTrack, including HTML <video> tags, RTCPeerConnection, canvas or MediaRecorder.
This specification explicitly aims to support the following use cases:
- Video processing: This is the "Funny Hats" use case, where the input is a single video track and the output is a transformed video track.
- Custom video sink: In this use case, the purpose is not to produce a processed MediaStreamTrack, but to consume the media in a different way. For example, an application could use [WEBCODECS] and [WEBTRANSPORT] to create an RTCPeerConnection-like sink, but using different codec configuration and networking protocols.
- Multi-source processing: In this use case, two or more tracks are combined into one. For example, a presentation containing a live weather map and a camera track with the speaker can be combined to produce a weather report application.
Note: There is no WG consensus on whether or not audio use cases should be supported.
Note: The WG expects that the Streams spec will adopt the solutions outlined in the relevant explainer, to solve some issues with the current Streams specification.
2. Specification
This specification shows the IDL extensions for [MEDIACAPTURE-STREAMS]. It defines some new objects that inherit the MediaStreamTrack interface, and can be constructed from a MediaStreamTrack.
The API consists of two elements. One is a track sink that is capable of exposing the unencoded media frames from the track to a ReadableStream. The other one is the inverse of that: it provides a track source that takes media frames as input.
2.1. MediaStreamTrackProcessor
A MediaStreamTrackProcessor allows the creation of a ReadableStream that can expose the media flowing through a given MediaStreamTrack. If the MediaStreamTrack is a video track, the chunks exposed by the stream will be VideoFrame objects. This makes MediaStreamTrackProcessor effectively a sink in the MediaStream model.
A MediaStreamTrackProcessor internally contains a circular queue that allows buffering incoming media frames delivered by the track it is connected to. This buffering allows the MediaStreamTrackProcessor to temporarily hold frames waiting to be read from its associated ReadableStream. The application can influence the maximum size of the queue via a parameter provided in the MediaStreamTrackProcessor constructor. However, the maximum size of the queue is decided by the UA and can change dynamically, but it will not exceed the size requested by the application. If the application does not provide a maximum size parameter, the UA is free to decide the maximum size of the queue.
When a new frame arrives at the MediaStreamTrackProcessor, if the queue has reached its maximum size, the oldest frame will be removed from the queue, and the new frame will be added to the queue. This means that, for the particular case of a queue with a maximum size of 1, if there is a queued frame, it will always be the most recent one.
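The following non-normative sketch illustrates this latest-frame behavior. It assumes it runs in a DedicatedWorker to which a video MediaStreamTrack named track has been transferred; slowAnalysis() is a hypothetical, potentially slow per-frame function.

```javascript
// Non-normative sketch: with maxBufferSize: 1, a slow consumer always sees the
// most recently delivered frame; older frames are dropped by the UA.
async function consumeLatestFrames(track) {
  const processor = new MediaStreamTrackProcessor({ track, maxBufferSize: 1 });
  const reader = processor.readable.getReader();
  while (true) {
    const { value: frame, done } = await reader.read();
    if (done) break;
    // Frames delivered while slowAnalysis() was running have been dropped,
    // so this frame is always the most recent one.
    await slowAnalysis(frame);
    frame.close(); // Release the frame's media resources promptly.
  }
}
```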
The UA is also free to remove any frames from the queue at any time. The UA may remove frames in order to save resources or to improve performance in specific situations. In all cases, frames that are not dropped must be made available to the ReadableStream in the order in which they arrive at the MediaStreamTrackProcessor.
A MediaStreamTrackProcessor makes frames available to its associated ReadableStream only when a read request has been issued on the stream. The idea is to avoid the stream’s internal buffering, which does not give the UA enough flexibility to choose the buffering policy.
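As a consequence of this pull-based design, an application can also use the stream to grab a single frame on demand. The following non-normative sketch (the grabFrame() helper name is illustrative, and track is assumed to be a video MediaStreamTrack available in the worker) issues exactly one read request and then releases the stream.

```javascript
// Non-normative sketch: read a single VideoFrame on demand from a video track.
async function grabFrame(track) {
  const processor = new MediaStreamTrackProcessor({ track });
  const reader = processor.readable.getReader();
  // No frame is enqueued until this read request is issued.
  const { value: frame } = await reader.read();
  // Cancelling the stream disconnects the processor from the track.
  await reader.cancel();
  return frame; // The caller is responsible for calling frame.close().
}
```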
2.1.1. Interface definition
```webidl
[Exposed=DedicatedWorker]
interface MediaStreamTrackProcessor {
  constructor(MediaStreamTrackProcessorInit init);
  readonly attribute ReadableStream readable;
};

dictionary MediaStreamTrackProcessorInit {
  required MediaStreamTrack track;
  [EnforceRange] unsigned short maxBufferSize;
};
```
Note: There is WG consensus that the interface should be exposed on DedicatedWorker. There is no WG consensus on whether or not the interface should be exposed on Window.
Note: There is consensus in the WG that creating a MediaStreamTrackProcessor from a MediaStreamTrack of kind "video" should exist. There is no WG consensus on whether or not creating a MediaStreamTrackProcessor from a MediaStreamTrack of kind "audio" should be supported.
2.1.2. Internal slots
- [[track]]: Track whose raw data is to be exposed by the MediaStreamTrackProcessor.
- [[maxBufferSize]]: The maximum number of media frames to be buffered by the MediaStreamTrackProcessor, as specified by the application. It may have no value if the application does not provide it. Its minimum valid value is 1.
- [[queue]]: A queue used to buffer media frames not yet read by the application.
- [[numPendingReads]]: An integer whose value represents the number of read requests issued by the application that have not yet been handled.
- [[isClosed]]: A boolean whose value indicates if the MediaStreamTrackProcessor is closed.
2.1.3. Constructor
MediaStreamTrackProcessor(init)

1. If init.track is not a valid MediaStreamTrack, throw a TypeError.
2. Let maxBufferSize be 1.
3. If init.maxBufferSize has an integer value greater than 1, run the following substeps:
   1. Set maxBufferSize to init.maxBufferSize.
   2. The user agent MAY decide to clamp maxBufferSize to a lower value, but no lower than 1. Clamping maxBufferSize can be useful for some sources like cameras, for instance in case they can only use a limited number of VideoFrames at any given time.
4. Let processor be a new MediaStreamTrackProcessor object.
5. Set processor.[[track]] to init.track.
6. Set processor.[[maxBufferSize]] to maxBufferSize.
7. Set processor.[[queue]] to an empty queue.
8. Set processor.[[numPendingReads]] to 0.
9. Set processor.[[isClosed]] to false.
10. Return processor.
2.1.4. Attributes
- readable, of type ReadableStream, readonly

  Allows reading the frames delivered by the MediaStreamTrack stored in the [[track]] internal slot. This attribute is created the first time it is accessed, according to the following steps:

  1. Initialize this.readable to be a new ReadableStream.
  2. Set up this.readable with its pullAlgorithm set to processorPull with this as parameter, cancelAlgorithm set to processorCancel with this as parameter, and highWaterMark set to 0.
The processorPull algorithm is given a processor as input. It is defined by the following steps:

1. Increment the value of processor.[[numPendingReads]] by 1.
2. Queue a task to run the maybeReadFrame algorithm with processor as parameter.
3. Return a promise resolved with undefined.
The maybeReadFrame algorithm is given a processor as input. It is defined by the following steps:

1. If processor.[[queue]] is empty, abort these steps.
2. If processor.[[numPendingReads]] equals zero, abort these steps.
3. Let frame be the result of dequeueing a frame media data from processor.[[queue]].
4. Enqueue frame in processor.readable.
5. Decrement processor.[[numPendingReads]] by 1.
6. Go to step 1.
The processorCancel algorithm is given a processor as input. It is defined by running the following steps:

1. Run the processorClose algorithm with processor as parameter.
2. Return a promise resolved with undefined.
The processorClose algorithm is given a processor as input. It is defined by running the following steps:

1. If processor.[[isClosed]] is true, abort these steps.
2. Disconnect processor from processor.[[track]]. The mechanism to do this is UA specific and the result is that processor is no longer a sink of processor.[[track]].
3. Close processor.readable.[[controller]].
4. Empty processor.[[queue]].
5. Set processor.[[isClosed]] to true.
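For illustration, the following non-normative sketch shows one way the cancel path can be reached from application code: aborting a pipeTo() cancels processor.readable (via error propagation through the pipe chain), which runs processorCancel and hence processorClose, disconnecting the processor from its track. The track, myTransform and destinationWritable names are placeholders.

```javascript
// Non-normative sketch: aborting the pipe eventually cancels processor.readable,
// which triggers the processorCancel/processorClose algorithms defined above.
const processor = new MediaStreamTrackProcessor({ track });
const abort = new AbortController();

processor.readable
  .pipeThrough(new TransformStream({ transform: myTransform })) // hypothetical transform
  .pipeTo(destinationWritable, { signal: abort.signal })        // hypothetical sink
  .catch(() => { /* pipe was aborted */ });

// Later, e.g. when the user stops the effect:
abort.abort(); // the processor stops being a sink of the track
```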
2.1.5. Handling interaction with the track
When the [[track]] of a MediaStreamTrackProcessor processor delivers a frame to processor, the UA MUST execute the handleNewFrame algorithm with processor as parameter.
The handleNewFrame algorithm is given a processor as input. It is defined by running the following steps:
1. If processor.[[queue]] has processor.[[maxBufferSize]] elements, run the following steps:
   1. Let droppedFrame be the result of dequeueing processor.[[queue]].
   2. Run the Close VideoFrame algorithm with droppedFrame.
2. Initialize timestamp from presentation timestamp if set; otherwise leave the attribute not present.
3. Initialize captureTime from capture timestamp if set; otherwise leave the attribute not present.
4. Initialize receiveTime from receive timestamp if set; otherwise leave the attribute not present.
5. Initialize rtpTimestamp from RTP timestamp if set; otherwise leave the attribute not present.
6. Enqueue the new frame media data in processor.[[queue]].
7. Queue a task to run the maybeReadFrame algorithm with processor as parameter.
At any time, the UA MAY remove any frame from processor.[[queue]]. The UA may decide to remove frames from processor.[[queue]], for example, to prevent resource exhaustion or to improve performance in certain situations.
The application may detect that frames have been dropped by noticing that there is a gap in the timestamps of the frames.
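As a non-normative illustration of the preceding sentence, an application that knows the nominal frame duration of its source (a placeholder 30 fps value is used below) can detect drops by comparing consecutive timestamps:

```javascript
// Non-normative sketch: detect dropped frames by looking for timestamp gaps.
// Assumes frames arrive with monotonically increasing `timestamp` values (in
// microseconds) from a source with a roughly constant frame rate.
const expectedDeltaUs = 1e6 / 30; // placeholder: nominal 30 fps source
let previousTimestamp = null;

function checkForDrops(frame) {
  if (previousTimestamp !== null) {
    const delta = frame.timestamp - previousTimestamp;
    if (delta > expectedDeltaUs * 1.5) {
      console.warn(`Gap of ${delta} µs suggests dropped frame(s)`);
    }
  }
  previousTimestamp = frame.timestamp;
}
```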
When the [[track]] of a MediaStreamTrackProcessor processor ends, the processorClose algorithm must be executed with processor as parameter.
2.2. VideoTrackGenerator
A VideoTrackGenerator allows the creation of a video source for a MediaStreamTrack in the MediaStream model that generates its frames from a Stream of VideoFrame objects. It has two readonly attributes: a writable WritableStream and a track MediaStreamTrack. The VideoTrackGenerator is the underlying sink of its writable attribute. The track attribute is the output. Further tracks connected to the same VideoTrackGenerator can be created using the clone method on the track attribute.

The WritableStream accepts VideoFrame objects. When a VideoFrame is written to writable, the frame’s close() method is automatically invoked, so that its internal resources are no longer accessible from JavaScript.
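The following non-normative sketch shows the basic flow implied by this description: frames written to writable appear on track, and additional tracks can be obtained with clone(). It is assumed to run in an async context in a DedicatedWorker, and someVideoFrame is a placeholder for a VideoFrame produced elsewhere.

```javascript
// Non-normative sketch of the VideoTrackGenerator data flow.
const generator = new VideoTrackGenerator();

// The track attribute is the output; further tracks come from clone().
const previewTrack = generator.track;
const sendTrack = generator.track.clone();

// Writing a VideoFrame delivers it to all live tracks sourced from the
// generator; after the write, the frame is automatically closed.
const writer = generator.writable.getWriter();
await writer.write(someVideoFrame);
// someVideoFrame's internal resources are no longer accessible here.
```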
Note: There is consensus in the WG that a source capable of generating a MediaStreamTrack of kind "video" should exist. There is no WG consensus on whether or not a source capable of generating a MediaStreamTrack of kind "audio" should exist.
2.2.1. Interface definition
```webidl
[Exposed=DedicatedWorker]
interface VideoTrackGenerator {
  constructor();
  readonly attribute WritableStream writable;
  attribute boolean muted;
  readonly attribute MediaStreamTrack track;
};
```
Note: There is WG consensus that this interface should be exposed on DedicatedWorker. There is no WG consensus on whether or not it should be exposed on Window.
2.2.2. Internal slots
- [[track]]: The MediaStreamTrack output of this source.
- [[isMuted]]: A boolean whose value indicates whether this source, and all the MediaStreamTracks it sources, are currently muted or not.
2.2.3. Constructor
VideoTrackGenerator()

1. Let generator be a new VideoTrackGenerator object.
2. Let track be a newly created MediaStreamTrack with source set to generator and tieSourceToContext set to false.
3. Initialize generator.track to track.
4. Return generator.
2.2.4. Attributes
- writable, of type WritableStream, readonly

  Allows writing video frames to the VideoTrackGenerator. When this attribute is accessed for the first time, it MUST be initialized with the following steps:

  1. Initialize this.writable to be a new WritableStream.
  2. Set up this.writable, with its writeAlgorithm set to writeFrame with this as parameter, with closeAlgorithm set to closeWritable with this as parameter, and abortAlgorithm set to closeWritable with this as parameter.
The writeFrame algorithm is given a generator and a frame as input. It is defined by running the following steps:

1. If frame is not a VideoFrame object, return a promise rejected with a TypeError.
2. If the value of frame’s [[Detached]] internal slot is true, return a promise rejected with a TypeError.
3. If generator.[[isMuted]] is false, for each live track sourced from generator, named track, run the following steps:
   1. Let clone be the result of running the Clone videoFrame algorithm with frame.
   2. Set presentation timestamp to the value of timestamp.
   3. Set capture timestamp to the value of captureTime if present.
   4. Set receive timestamp to the value of receiveTime if present.
   5. Set RTP timestamp to the value of rtpTimestamp if present.
   6. Send clone to track.
4. Run the Close VideoFrame algorithm with frame.
5. Return a promise resolved with undefined.
When the media data is sent to a track, the UA may apply processing (e.g., cropping and downscaling) to ensure that the media data sent to the track satisfies the track’s constraints. Each track may receive a different version of the media data depending on its constraints.
The closeWritable algorithm is given a generator as input. It is defined by running the following steps:

1. For each track t sourced from generator, end t.
2. Return a promise resolved with undefined.
- muted, of type boolean

  Mutes the VideoTrackGenerator. The getter steps are to return this.[[isMuted]]. The setter steps, given a value newValue, are as follows:

  1. If newValue is equal to this.[[isMuted]], abort these steps.
  2. Set this.[[isMuted]] to newValue.
  3. Unless one has been queued already this run of the event loop, queue a task to run the following steps:
     1. Let settledValue be this.[[isMuted]].
     2. For each live track sourced by this, queue a task to set a track’s muted state to settledValue.

- track, of type MediaStreamTrack, readonly

  The MediaStreamTrack output. The getter steps are to return this.[[track]].
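As a non-normative illustration of the muted attribute, toggling it pauses and resumes delivery to all tracks sourced from the generator without closing the writable:

```javascript
// Non-normative sketch: while muted is true, written frames are still closed
// but are not delivered to the generator's tracks.
const generator = new VideoTrackGenerator();

generator.muted = true;   // tracks sourced from the generator become muted
// ... frames written now are dropped (and closed) rather than delivered ...
generator.muted = false;  // delivery resumes; the tracks are unmuted again
```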
2.2.5. Specialization of MediaStreamTrack behavior
A VideoTrackGenerator acts as the source for one or more MediaStreamTracks. This section adds clarifications on how a MediaStreamTrack sourced from a VideoTrackGenerator behaves.
2.2.5.1. stop
The stop method stops the track. When the last track sourced from a VideoTrackGenerator ends, that VideoTrackGenerator's writable is closed.
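A non-normative consequence of this rule is that a producer can observe downstream teardown; the sketch below uses the writer's closed promise for that purpose:

```javascript
// Non-normative sketch: stopping the last track closes generator.writable.
const generator = new VideoTrackGenerator();
const writer = generator.writable.getWriter();

// Resolves once the writable has been closed.
writer.closed.then(() => console.log('all tracks ended; writable closed'));

generator.track.stop(); // the last (only) track sourced from the generator ends
```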
2.2.5.2. Constrainable properties
The following constrainable properties are defined for any MediaStreamTracks sourced from a VideoTrackGenerator:
| Property Name | Values | Notes |
|---|---|---|
| width | ConstrainULong | As a setting, this is the width, in pixels, of the latest frame received by the track. As a capability, max MUST reflect the largest width a VideoFrame may have, and min MUST reflect the smallest width a VideoFrame may have. |
| height | ConstrainULong | As a setting, this is the height, in pixels, of the latest frame received by the track. As a capability, max MUST reflect the largest height a VideoFrame may have, and min MUST reflect the smallest height a VideoFrame may have. |
| frameRate | ConstrainDouble | As a setting, this is an estimate of the frame rate based on frames recently received by the track. As a capability, min MUST be zero and max MUST be the maximum frame rate supported by the system. |
| aspectRatio | ConstrainDouble | As a setting, this is the aspect ratio of the latest frame delivered by the track; this is the width in pixels divided by height in pixels as a double rounded to the tenth decimal place. As a capability, min MUST be the smallest aspect ratio supported by a VideoFrame, and max MUST be the largest aspect ratio supported by a VideoFrame. |
| resizeMode | ConstrainDOMString | As a setting, this string should be one of the members of VideoResizeModeEnum. The value "none" means that the frames output by the MediaStreamTrack are unmodified versions of the frames written to the writable backing the track, regardless of any constraints. The value "crop-and-scale" means that the frames output by the MediaStreamTrack may be cropped and/or downscaled versions of the source frames, based on the values of the width, height and aspectRatio constraints of the track. As a capability, the values "none" and "crop-and-scale" both MUST be present. |
The applyConstraints method applied to a video MediaStreamTrack sourced from a VideoTrackGenerator supports the properties defined above. It can be used, for example, to resize frames or adjust the frame rate of the track. Note that these constraints have no effect on the VideoFrame objects written to the writable of a VideoTrackGenerator, just on the output of the track on which the constraints have been applied. Note also that, since a VideoTrackGenerator can in principle produce media data with any setting for the supported constrainable properties, an applyConstraints call on a track backed by a VideoTrackGenerator will generally not fail with OverconstrainedError unless the given constraints are outside the system-supported range, as reported by getCapabilities.
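The following non-normative sketch shows the kind of call the preceding paragraph describes; it is assumed to run in an async context, and the requested values are placeholders that only affect the output of this particular track clone:

```javascript
// Non-normative sketch: downscale and throttle one clone of the generator's
// output, leaving frames written to generator.writable untouched.
const generator = new VideoTrackGenerator();
const sendTrack = generator.track.clone();

await sendTrack.applyConstraints({
  width: 640,            // placeholder values
  height: 360,
  frameRate: 15,
  resizeMode: 'crop-and-scale'
});
// generator.track still outputs the unmodified frames.
```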
2.2.5.3. Events and attributes
Events and attributes work the same as for any MediaStreamTrack.
It is relevant to note that if the writable stream of a VideoTrackGenerator is closed, all the live tracks connected to it are ended and the ended event is fired on them.
3. Examples
3.1. Video Processing
Consider a face recognition function detectFace(videoFrame) that returns a face position (in some format), and a manipulation function blurBackground(videoFrame, facePosition) that returns a new VideoFrame similar to the given videoFrame, but with the non-face parts blurred. The example also shows the video before and after effects on video elements.
```javascript
// main.js
const stream = await navigator.mediaDevices.getUserMedia({ video: true });
const videoBefore = document.getElementById('video-before');
const videoAfter = document.getElementById('video-after');
videoBefore.srcObject = stream.clone();
const [track] = stream.getVideoTracks();
const worker = new Worker('worker.js');
worker.postMessage({ track }, [track]);
const { data } = await new Promise(r => worker.onmessage = r);
videoAfter.srcObject = new MediaStream([data.track]);

// worker.js
self.onmessage = async ({ data: { track } }) => {
  const source = new VideoTrackGenerator();
  self.postMessage({ track: source.track }, [source.track]);
  const { readable } = new MediaStreamTrackProcessor({ track });
  const transformer = new TransformStream({
    async transform(frame, controller) {
      const facePosition = await detectFace(frame);
      const newFrame = blurBackground(frame, facePosition);
      frame.close();
      controller.enqueue(newFrame);
    }
  });
  await readable.pipeThrough(transformer).pipeTo(source.writable);
};
```
3.2. Multi-consumer post-processing with constraints
A common use case is to remove the background from live camera video fed into a video conference, with a live self-view showing the result. It’s desirable for the self-view to have a high frame rate even if the frame rate used for actual sending may dip lower due to back pressure from bandwidth constraints. This can be achieved by applying constraints to a track clone, avoiding having to process twice.

```javascript
// main.js
const stream = await navigator.mediaDevices.getUserMedia({ video: true });
const [track] = stream.getVideoTracks();
const worker = new Worker('worker.js');
worker.postMessage({ track }, [track]);
const { data } = await new Promise(r => worker.onmessage = r);
const selfView = document.getElementById('video-self');
selfView.srcObject = new MediaStream([data.track.clone()]); // 60 fps
await data.track.applyConstraints({ width: 320, height: 200, frameRate: 30 });
const pc = new RTCPeerConnection(config);
pc.addTrack(data.track); // 30 fps

// worker.js
self.onmessage = async ({ data: { track } }) => {
  const source = new VideoTrackGenerator();
  self.postMessage({ track: source.track }, [source.track]);
  const { readable } = new MediaStreamTrackProcessor({ track });
  const transformer = new TransformStream({ transform: myRemoveBackgroundFromVideo });
  await readable.pipeThrough(transformer).pipeTo(source.writable);
};
```
3.3. Multi-consumer post-processing with constraints in a worker
Being able to show a higher frame-rate self-view is also relevant when sending video frames over WebTransport in a worker. The same technique above may be used here, except constraints are applied to a track clone in the worker.

```javascript
// main.js
const stream = await navigator.mediaDevices.getUserMedia({ video: true });
const [track] = stream.getVideoTracks();
const worker = new Worker('worker.js');
worker.postMessage({ track }, [track]);
const { data } = await new Promise(r => worker.onmessage = r);
const selfView = document.getElementById('video-self');
selfView.srcObject = new MediaStream([data.track]); // 60 fps

// worker.js
self.onmessage = async ({ data: { track } }) => {
  const source = new VideoTrackGenerator();
  const sendTrack = source.track.clone();
  self.postMessage({ track: source.track }, [source.track]);
  await sendTrack.applyConstraints({ width: 320, height: 200, frameRate: 30 });
  const wt = new WebTransport("https://webtransport.org:8080/up");
  const { readable } = new MediaStreamTrackProcessor({ track });
  const transformer = new TransformStream({ transform: myRemoveBackgroundFromVideo });
  await readable
    .pipeThrough(transformer)
    .pipeThrough({
      writable: source.writable,
      readable: new MediaStreamTrackProcessor({ track: sendTrack }).readable
    })
    .pipeThrough(createMyEncodeVideoStream({
      codec: "vp8",
      width: 640,
      height: 480,
      bitrate: 1000000,
    }))
    .pipeThrough(new TransformStream({ transform: mySerializer }))
    .pipeTo(wt.createUnidirectionalStream()); // 30 fps
};
```
The above example avoids using the tee() function to serve multiple consumers, due to its issues with real-time streams.
For brevity, the example also over-simplifies using a WebCodecs wrapper to encode and send video frames over a single WebTransport stream (incurring head-of-line blocking).
4. Implementation advice
This section is informative.
4.1. Use with multiple consumers
There are use cases where the programmer may desire that a single stream of frames is consumed by multiple consumers.
Examples include the case where the result of a background blurring function should be both displayed in a self-view and encoded using a VideoEncoder.

For cases where both consumers are consuming unprocessed frames, and synchronization is not desired, instantiating multiple MediaStreamTrackProcessor objects is a robust solution.
For cases where both consumers intend to convert the result of a processing step into a MediaStreamTrack using a VideoTrackGenerator, for example when feeding a processed stream to both a <video> tag and an RTCPeerConnection, attaching the resulting MediaStreamTrack to multiple sinks may be the most appropriate mechanism.
For cases where the downstream processing takes frames, not streams, the frames can be cloned as needed and sent off to the downstream processing; "clone" is a cheap operation.
When the stream is the output of some processing, and both branches need a Stream object to do further processing, one needs a function that produces two streams from one stream.
However, the standard tee() operation is problematic in this context:
- It defeats the backpressure mechanism that guards against excessive queueing.
- It creates multiple links to the same buffers, meaning that the question of which consumer gets to destroy() the buffer is a difficult one to address.
Therefore, the use of tee() with Streams containing media should only be done when fully understanding the implications. Instead, custom mechanisms for splitting streams, more appropriate to the use case, should be used:
- If both branches require the ability to dispose of the frames, clone() the frame and enqueue distinct copies in both queues. This corresponds to the function ReadableStreamTee(stream, cloneForBranch2=true). Then choose one of the alternatives below.
- If one branch requires all frames, and the other branch tolerates dropped frames, enqueue buffers in the all-frames-required stream and use the backpressure signal from that stream to stop reading from the source. If the backpressure signal from the other stream indicates room, enqueue the same frame in that queue too (see the sketch after this list).
- If neither stream tolerates dropped frames, use the combined backpressure signal to stop reading from the source. In this case, frames will be processed in lockstep if the buffer sizes are both 1.
- If it is OK for the incoming stream to be stalled only when the underlying buffer pool allocated to the process is exhausted, standard tee() may be used.
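The following non-normative sketch illustrates the second alternative above: a hand-rolled splitter that always delivers to a "required" consumer and only delivers to a "best effort" consumer when that consumer is ready. The requiredWritable and bestEffortWritable names are placeholder WritableStreams accepting VideoFrames.

```javascript
// Non-normative sketch: split one stream of VideoFrames into a branch that
// must see every frame and a branch that tolerates dropped frames.
async function splitWithDrops(readable, requiredWritable, bestEffortWritable) {
  const reader = readable.getReader();
  const required = requiredWritable.getWriter();
  const bestEffort = bestEffortWritable.getWriter();
  let bestEffortReady = true;
  while (true) {
    const { value: frame, done } = await reader.read();
    if (done) break;
    if (bestEffortReady) {
      bestEffortReady = false;
      // Deliver a clone without awaiting; mark the branch ready again once the
      // write settles. Each branch owns (and closes) its own VideoFrame.
      bestEffort.write(frame.clone())
        .then(() => { bestEffortReady = true; }, () => {});
    }
    // Backpressure from the required branch gates reading from the source.
    await required.write(frame);
  }
  await Promise.all([required.close(), bestEffort.close()]);
}
```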
Note: There are issues filed on the Streams spec where the resolution might affect this section: https://github.com/whatwg/streams/issues/1157, https://github.com/whatwg/streams/issues/1156, https://github.com/whatwg/streams/issues/401, https://github.com/whatwg/streams/issues/1186
5. Security and Privacy considerations
This API defines a MediaStreamTrack source and a MediaStreamTrack sink. The security and privacy of the source (VideoTrackGenerator) relies on the same-origin policy. That is, the data VideoTrackGenerator can make available in the form of a MediaStreamTrack must be visible to the document before a VideoFrame object can be constructed and pushed into the VideoTrackGenerator. Any attempt to create VideoFrame objects using cross-origin data will fail. Therefore, VideoTrackGenerator does not introduce any new fingerprinting surface.
The MediaStreamTrack sink introduced by this API (MediaStreamTrackProcessor) exposes the same data that is exposed by other MediaStreamTrack sinks such as WebRTC peer connections and media elements. The security and privacy of MediaStreamTrackProcessor relies on the security and privacy of the MediaStreamTrack sources of the tracks to which MediaStreamTrackProcessor is connected. For example, camera, microphone and screen-capture tracks rely on explicit user authorization via permission dialogs (see [MEDIACAPTURE-STREAMS] and [SCREEN-CAPTURE]), while element capture and VideoTrackGenerator rely on the same-origin policy.
A potential issue with MediaStreamTrackProcessor is resource exhaustion. For example, a site might hold on to too many open VideoFrame objects and deplete a system-wide pool of GPU-memory-backed frames. UAs can mitigate this risk by limiting the number of pool-backed frames a site can hold. This can be achieved by reducing the maximum number of buffered frames and by refusing to deliver more frames to readable once the budget limit is reached. Accidental exhaustion is also mitigated by automatic closing of VideoFrame objects once they are written to a VideoTrackGenerator.
6. Backwards compatibility with earlier proposals
This section is informative.
Previous proposals for this interface had an API like this:
```webidl
[Exposed=Window,DedicatedWorker]
interface MediaStreamTrackGenerator : MediaStreamTrack {
  constructor(MediaStreamTrackGeneratorInit init);
  attribute WritableStream writable;  // VideoFrame or AudioData
};

dictionary MediaStreamTrackGeneratorInit {
  required DOMString kind;
};
```
The VideoTrackGenerator can be shimmed on top of MediaStreamTrackGenerator like this:
```javascript
// Not tested, unlikely to work as written!
class VideoTrackGenerator {
  constructor() {
    this.innerGenerator = new MediaStreamTrackGenerator({ kind: 'video' });
    this.writable = this.innerGenerator.writable;
    this.track = this.innerGenerator.clone();
  }
  // Missing: shim for setting of the "muted" attribute.
}
```
Further description of the previous proposals, including considerations involving processing of audio, can be found in earlier versions of this document.
Note: A link will be placed here pointing to the chrome-96 branch when we have finished moving repos about.