Copyright © 2018 W3C® (MIT, ERCIM, Keio, Beihang). W3C liability, trademark and permissive document license rules apply.
This document collects use cases and requirements for improved support for timed events related to audio or video media on the Web, such as subtitles, captions, or other web content, where synchronization to a playing audio or video media stream is needed, and makes recommendations for new or changed Web APIs to realize these requirements.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.
This document was published by the Media & Entertainment Interest Group as an Editor's Draft.
Comments regarding this document are welcome. Please send them to public-web-and-tv@w3.org (archives).
Publication as an Editor's Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This document is governed by the 1 February 2018 W3C Process Document.
The term media timed events describes a generic capability for making changes to a Web page, or executing application code triggered from JavaScript events, at specific points on the media timeline of an audio or video media stream.
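For example, the following JavaScript sketch shows one way this can be approached today using the existing TextTrack and VTTCue APIs: a cue is added to a metadata text track and application code runs when playback enters or exits the cue's time range. The showBanner and hideBanner functions are hypothetical application code, and, as discussed later in this report, the timing of cue enter and exit events is not precisely guaranteed.

// Minimal sketch: trigger application code at a point on the media timeline
// using the existing TextTrack API. showBanner and hideBanner are
// hypothetical application-defined functions.
const video = document.querySelector('video');
const track = video.addTextTrack('metadata');

// A cue that is active from 30s to 35s on the media timeline.
const cue = new VTTCue(30, 35, JSON.stringify({ type: 'banner', id: 'ad-1' }));
cue.onenter = () => showBanner(JSON.parse(cue.text)); // around 30s
cue.onexit = () => hideBanner();                      // around 35s
track.addCue(cue);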
The following terms are used in this document:
The following terms are defined in [HTML52]:
Media timed events carry metadata that is related to points in time, or regions of time on the media timeline, which can be used to trigger retrieval and/or rendering of web resources synchronized with media playback. Such resources can be used to enhance the user experience in the context of media that is being rendered. Some examples include display of social media feeds corresponding to a live broadcast such as a sporting event, banner advertisements for sponsored content, accessibility-related assets, such as large print rendering of captions, and display of track titles or images alongside an audio stream.

The following sections describe a few use cases for media timed events in more detail.
A media content provider wants to provide visual information alongside an audio stream, such as an image of the artist and title of the current playing track, to give users live information about the content they are listening to. Examples include HLS timed metadata [HLS-TIMED-METADATA], which uses in-band ID3 metadata to carry the image content, and RadioVIS in DVB ([DVB-DASH], section 9.1.7), which defines in-band event messages that contain image URLs and text messages to be displayed, with information about when the content should be displayed in relation to the media timeline. Such image displays may be activated and deactivated at different intervals on the media timeline, for the duration of the associated emsg event. In this use case, synchronization of the image rendering to within a second or so is acceptable.
An in-band emsg event is used to notify a DASH player Web application that it should refresh its copy of the manifest (MPD) document ([MPEGDASH], section 5.10.4). This is used as an alternative to setting a cache duration in the response to the HTTP request for the manifest, so the client can refresh the MPD when it actually changes, so reducing the load on HTTP servers caused by frequent requests.
Reference: M&E IG call 1 Feb 2018: Minutes, [DASH-EVENTING].
See also this issue against the [WEB-MEDIA-GUIDELINES]. TODO: Add detail here.
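As an illustration, the following sketch shows how a DASH player Web application might react to such a notification once the event has been surfaced to it (for example through a DataCue-like API). The manifestUrl variable and applyManifest function are hypothetical application code.

// Minimal sketch: refresh the MPD only when the server signals, via an
// in-band event, that it has changed. manifestUrl and applyManifest are
// hypothetical application-defined names.
async function onMpdValidityExpiration(event) {
  const response = await fetch(manifestUrl, { cache: 'no-cache' });
  const mpdText = await response.text();
  applyManifest(mpdText); // parse the new MPD and update the player state
}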
[WebVMT] is a format for metadata cues, synchronised with a timed media file, that can drive an online map, e.g., OpenStreetMap, rendered in a separate HTML element alongside the media element on the web page. The media playhead position controls presentation and animation of the map, e.g., pan and zoom, and allows annotations to be added and removed, e.g., markers, at specified times during media playback. Control can also be overridden by the user with the usual interactive features of the map at any time, e.g., zoom. Concrete examples are provided by the tech demos at the WebVMT website.
Reference: M&E IG TF call 17 Sept 2018: Minutes.
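To illustrate the general approach (without using WebVMT itself), the following sketch pans and zooms a map as the media plays, driven from the timeupdate event. It assumes the Leaflet library is available as L, and uses a hypothetical hard-coded list of location keyframes in place of cues parsed from a WebVMT file.

// Minimal sketch: drive an online map from the media playhead position.
// The keyframes array is hypothetical example data.
const keyframes = [
  { time: 0,  lat: 51.505, lng: -0.090, zoom: 13 },
  { time: 10, lat: 51.510, lng: -0.100, zoom: 14 },
  { time: 20, lat: 51.515, lng: -0.120, zoom: 15 },
];

const map = L.map('map').setView([keyframes[0].lat, keyframes[0].lng], keyframes[0].zoom);
L.tileLayer('https://tile.openstreetmap.org/{z}/{x}/{y}.png').addTo(map);

const video = document.querySelector('video');
video.addEventListener('timeupdate', () => {
  // Use the most recent keyframe at or before the current playhead position.
  // timeupdate fires only a few times per second; smoother animation would
  // need requestAnimationFrame or similar.
  const active = keyframes.filter(k => k.time <= video.currentTime).pop();
  if (active) {
    map.setView([active.lat, active.lng], active.zoom);
  }
});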
A video image analysis system processes a media stream to detect and recognize objects shown in the video. This system generates metadata describing the objects, including timestamps that describe when the objects are visible, together with position information (e.g., bounding boxes). A web application then uses this timed metadata to overlay labels and annotations on the video, which could be rendered using HTML and CSS.
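A minimal sketch of the rendering step is shown below, assuming each item of timed metadata contains a label and a bounding box expressed as fractions of the video frame; the cueData structure is an assumption, and how the metadata is delivered (for example as cues on a metadata text track) is not shown.

// Minimal sketch: position an overlay element over the detected object,
// relative to the viewport. cueData is assumed to look like:
//   { label: 'bicycle', box: { x: 0.1, y: 0.2, width: 0.3, height: 0.25 } }
function renderOverlay(videoElement, overlayElement, cueData) {
  const rect = videoElement.getBoundingClientRect();
  overlayElement.textContent = cueData.label;
  overlayElement.style.position = 'fixed';
  overlayElement.style.left = (rect.left + cueData.box.x * rect.width) + 'px';
  overlayElement.style.top = (rect.top + cueData.box.y * rect.height) + 'px';
  overlayElement.style.width = (cueData.box.width * rect.width) + 'px';
  overlayElement.style.height = (cueData.box.height * rect.height) + 'px';
  overlayElement.style.border = '2px solid red';
}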
During a live media presentation, dynamic and unpredictable events may occur which cause temporary suspension of the media presentation. During that suspension interval, auxiliary content such as the presentation of UI controls and media files may be unavailable. Depending on the specific user engagement (or not) with the UI controls and the time at which any such engagement occurs, specific web resources may be rendered at defined times in a synchronized manner. For example, a multimedia A/V clip along with subtitles corresponding to an advertisement, which were previously downloaded and cached by the UA, are played out.
This section describes gaps in existing Web platform capabilities needed to support the use cases and requirements described in this document. Where applicable, this section also describes how existing Web platform features can be used as workarounds, and any associated limitations.
The DataCue API has been previously discussed as a means to deliver in-band event data to Web applications, but this is not implemented in all of the main browser engines. It is included in the 18 October 2018 HTML 5.3 draft [HTML53-20181018], but is not included in [HTML]. See discussion here and notes on implementation status here.

WebKit supports a DataCue interface that extends HTML5 DataCue with two attributes to support non-text metadata, type and value.
interface DataCue : TextTrackCue {
  attribute ArrayBuffer data; // Always empty
  // Proposed extensions.
  attribute any value;
  readonly attribute DOMString type;
};
type is a string identifying the type of metadata:

WebKit DataCue metadata types

| Type | Description |
| --- | --- |
| "com.apple.quicktime.udta" | QuickTime User Data |
| "com.apple.quicktime.mdta" | QuickTime Metadata |
| "com.apple.itunes" | iTunes metadata |
| "org.mp4ra" | MPEG-4 metadata |
| "org.id3" | ID3 metadata |
and value is an object with the metadata item key, data, and optionally a locale:

value = {
  key: String
  data: String | Number | Array | ArrayBuffer | Object
  locale: String
}
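For example, a page running in WebKit might read ID3 timed metadata from an HLS stream as follows; this relies on the WebKit-specific type and value attributes described above and does not work in other browser engines.

// Minimal sketch: read WebKit-style DataCue metadata from a 'metadata' text
// track, e.g. ID3 timed metadata carried in an HLS stream.
const video = document.querySelector('video');

video.textTracks.addEventListener('addtrack', (event) => {
  const track = event.track;
  if (track.kind !== 'metadata') return;
  track.mode = 'hidden'; // receive cues without rendering them

  track.addEventListener('cuechange', () => {
    for (let i = 0; i < track.activeCues.length; i++) {
      const cue = track.activeCues[i];
      if (cue.type === 'org.id3') {
        // value.key and value.data carry the metadata item key and payload.
        console.log('ID3 metadata', cue.value.key, cue.value.data);
      }
    }
  });
});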
Neither [MSE-BYTE-STREAM-FORMAT-ISOBMFF] nor [INBANDTRACKS] describes handling of emsg boxes.
On resource constrained devices such as smart TVs and streaming sticks, parsing media segments to extract event information leads to a significant performance penalty, which can have an impact on UI rendering updates if this is done on the UI thread. There can also be an impact on the battery life of mobile devices. Given that the media segments will be parsed anyway by the user agent, parsing in JavaScript is an expensive overhead that could be avoided.
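As an indication of what such application-level parsing involves, the sketch below scans the top-level boxes of an ISO BMFF media segment to find emsg boxes; a real player does considerably more work, e.g., parsing the emsg fields themselves and handling 64-bit box sizes, and has to do this for every segment it fetches.

// Minimal sketch: find emsg boxes at the top level of an ISO BMFF segment.
// 64-bit box sizes and boxes that extend to the end of the file are not
// handled here.
function findEmsgBoxes(segmentBuffer) {
  const view = new DataView(segmentBuffer);
  const boxes = [];
  let offset = 0;
  while (offset + 8 <= view.byteLength) {
    const size = view.getUint32(offset); // 32-bit big-endian box size
    const type = String.fromCharCode(
      view.getUint8(offset + 4), view.getUint8(offset + 5),
      view.getUint8(offset + 6), view.getUint8(offset + 7));
    if (size < 8) break; // size 0 (to end of file) or 1 (64-bit size) not handled
    if (type === 'emsg') {
      boxes.push(segmentBuffer.slice(offset, offset + size));
    }
    offset += size;
  }
  return boxes;
}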
[HBBTV] section 9.3.2 describes a mapping between the emsg fields described above and the TextTrack and DataCue APIs. A TextTrack instance is created for each event stream signalled in the MPD document (as identified by the schemeIdUri and value), and the inBandMetadataTrackDispatchType TextTrack attribute contains the scheme_id_uri and value values. Because HbbTV devices include a native DASH client, parsing of the MPD document and creation of the TextTracks is done by the UA.
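Under this mapping, an application can locate the track for a given event stream as sketched below; the dispatch type string shown is only an example, and the exact concatenation of scheme_id_uri and value is defined by [HBBTV].

// Minimal sketch: find the metadata TextTrack that the UA created for a DASH
// event stream, and listen for its cues. The dispatch type value is an
// example only.
const video = document.querySelector('video');
const wantedDispatchType = 'urn:example:event:2020 1'; // example scheme_id_uri and value

for (let i = 0; i < video.textTracks.length; i++) {
  const track = video.textTracks[i];
  if (track.kind === 'metadata' &&
      track.inBandMetadataTrackDispatchType === wantedDispatchType) {
    track.mode = 'hidden';
    track.addEventListener('cuechange', () => {
      // Handle the active cues created by the UA for this event stream.
    });
  }
}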
To support DASH clients implemented in Web applications, there is therefore either a need for an API that allows applications to tell the UA which event schemes they want to receive, or the UA should simply expose all event streams to applications. Which of these approaches is preferred?
The timing guarantees provided in HTML5 regarding the triggering of TextTrackCue events may not be enough to avoid events being missed.
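One workaround used by applications today is to poll the playhead position and compare it against a list of event times, rather than relying on cue enter and exit events, as sketched below; handleMediaTimedEvent and the events list are hypothetical application code, and seeking past an event still has to be handled by the application.

// Minimal sketch: poll the playhead position on each animation frame instead
// of relying on cue events. handleMediaTimedEvent is an application-defined
// handler and the events array is example data.
const video = document.querySelector('video');
const events = [{ time: 10, fired: false }, { time: 25, fired: false }];

function poll() {
  for (const e of events) {
    if (!e.fired && video.currentTime >= e.time) {
      e.fired = true;
      handleMediaTimedEvent(e);
    }
  }
  requestAnimationFrame(poll);
}
requestAnimationFrame(poll);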
Describe gaps relating to synchronized rendering of web resources. Can we define a generic web API for scheduling page changes synchronized to playing media? Related: [css-animations-1], [web-animations-1], [css-transitions-1]. See also: https://github.com/bbc/VideoContext. Should this be in scope for the TF?
There is no API for surfacing Web content embedded in ISO BMFF containers into the browser (e.g., the HTMLCue proposal discussed at TPAC 2015).
Add more detail on what's required. Some questions / considerations:
This section describes recommendations from the Media & Entertainment Interest Group for the development of a generic media timed event API.
The API should allow Web applications to subscribe to receive specific event types. For example, to support DASH emsg and MPD events, the API should allow subscription by id and (optional) value. This is to make receiving events opt-in from the application point of view. The user agent should deliver only those events to a Web application for which the application has subscribed. The API should also allow Web applications to unsubscribe from specific event streams by event type.
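A purely hypothetical sketch of what such a subscription model could look like from the application point of view is shown below; none of the method, attribute, or event names used here are defined by any current specification.

// Hypothetical sketch only: subscribe to a specific DASH event scheme.
const video = document.querySelector('video');

video.mediaTimedEvents.subscribe({
  schemeIdUri: 'urn:mpeg:dash:event:2012', // MPD validity expiration scheme
  value: '1',
});

video.mediaTimedEvents.addEventListener('event', (e) => {
  // Only events matching an active subscription are delivered.
  console.log(e.schemeIdUri, e.startTime, e.data);
});

// Later, stop receiving this event type.
video.mediaTimedEvents.unsubscribe({
  schemeIdUri: 'urn:mpeg:dash:event:2012',
  value: '1',
});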
To be able to handle out of band events, the API must allow Web applications to create events to be added to the media timeline, to be triggered by the user agent. The API should allow the Web application to provide all necessary parameters to define the event, including start and end times, event type, and data payload. The payload should be any data type (e.g., the set of types supported by the WebKit DataCue). For DASH MPD events, the event type is defined by the id and (optional) value fields.
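The following purely hypothetical sketch illustrates the kind of application-created, out of band event this implies; the DataCue constructor signature shown is illustrative only and is not currently specified.

// Hypothetical sketch only: an application creates an out of band event with
// an arbitrary payload and adds it to the media timeline.
const video = document.querySelector('video');
const track = video.addTextTrack('metadata');

const cue = new DataCue(
  30.0,                                                // start time, seconds
  30.5,                                                // end time, seconds
  { action: 'show-overlay', assetUrl: 'overlay.png' }, // arbitrary data payload
  'org.example.overlay'                                // application-defined event type
);
track.addCue(cue);
cue.onenter = () => {
  // Triggered by the user agent when playback reaches the start time.
};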
For those events that the application has subscribed to receive, the API must:
The API must provide guarantees that no events can be missed during linear playback of the media.
We recommend updating [INBANDTRACKS] to describe handling of in-band media timed events supported on the web platform, following a registry approach with one specification per media format that describes the event details for that format. In particular, we recommend that browser engines support emsg events.
The time marches on algorithm should be reviewed and updated to ensure that events are delivered to the Web application within time constraints described elsewhere in this report.
Thanks to Charles Lo, Nigel Megitt, Jon Piesing, and Rob Smith for their contributions to this document.