Copyright © 2018 W3C® (MIT, ERCIM, Keio, Beihang). W3C liability, trademark and permissive document license rules apply.
This document collects use cases and requirements for improved support for timed events related to audio or video media on the Web, such as subtitles, captions, or other web content that needs to be synchronized to a playing audio or video media stream, and makes recommendations for new or changed Web APIs to realize these requirements.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.
This document was published by the Media & Entertainment Interest Group as an Editor's Draft.
Comments regarding this document are welcome. Please send them to public-web-and-tv@w3.org (archives).
Publication as an Editor's Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This document is governed by the 1 February 2018 W3C Process Document.
Media timed events describes a generic capability for making changes to a Web page, or executing application code, triggered by JavaScript events at specific points on the media timeline of an audio or video media stream.
The following terms are used in this document:
The following terms are defined in [HTML52]:
This section describes specific use cases for media timed events.
Use cases for media timed events include the display of images or other content synchronized with points on the media timeline. For example, an emsg event may contain an image URL, which the user agent requests. In this use case, synchronization of the image rendering to within a second or so is acceptable ([DVB-DASH], section 9.1.7). Such image displays may be activated and deactivated at different times within the duration of the associated emsg event.
Reference: M&E IG call 1 Feb 2018: Minutes, [DASH-EVENTING].
See also this issue against the [WEB-MEDIA-GUIDELINES]. TODO: Add detail here.
[WebVMT] is a format for metadata cues, synchronised with the timed media file, that can drive an online map, e.g., OpenStreetMap, rendered in a separate HTML element alongside the media element on the web page. The media playhead position controls presentation and animation of the map, e.g., pan and zoom, and allows annotations to be added and removed, e.g., markers, at specified times during media playback. Control can also be overridden by the user with the usual interactive features of the map at any time, e.g., zoom. Concrete examples are provided by the tech demos at the WebVMT website.
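As an informal illustration of this kind of map synchronisation, the sketch below assumes a metadata text track whose cues carry JSON location commands, and a map object from a mapping library exposing a setView() method (as in Leaflet); the cue payload format shown is hypothetical and is not WebVMT syntax itself.

// Illustrative sketch only: the JSON command format is hypothetical,
// and `map` is assumed to be a map object exposing setView(), as in
// Leaflet. Cues are assumed to be carried on a metadata text track.
const video = document.querySelector('video');
const track = Array.from(video.textTracks)
    .find((t) => t.kind === 'metadata');
track.mode = 'hidden'; // Receive cue events without native rendering.

track.addEventListener('cuechange', () => {
  for (let i = 0; i < track.activeCues.length; i++) {
    const command = JSON.parse(track.activeCues[i].text);
    if (command.moveTo) {
      // Pan and zoom the map as the media playhead reaches the cue.
      map.setView([command.moveTo.lat, command.moveTo.lng], command.zoom);
    }
  }
});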
Reference: M&E IG TF call 17 Sept 2018: Minutes.
Add use case descriptions for synchronised rendering here. Note that this could be rendering of any web resource, not necessarily those embedded in media containers. Describe a few motivating application scenarios.
During a live media presentation, dynamic and unpredictable events may occur which cause temporary suspension of the media presentation. During that suspension interval, auxiliary content, such as the presentation of UI controls and media files, may be unavailable. Depending on the specific user engagement (or not) with the UI controls, and the time at which any such engagement occurs, specific web resources may be rendered at defined times in a synchronized manner. For example, a multimedia A/V clip with subtitles, corresponding to an advertisement and previously downloaded and cached by the UA, is played out.
Media timed events can be used to trigger retrieval and/or rendering of web resources. Such resources can be used to enhance the user experience in the context of the media that is being rendered. Some examples include:
Add use case descriptions for rendering of Web content embedded in media containers (e.g., [WEB-ISOBMFF]). Describe a few motivating application scenarios.
This section describes gaps in existing Web platform capabilities needed to support the use cases and requirements described in this document. Where applicable, this section also describes how existing Web platform features can be used as workarounds, and any associated limitations.
The DataCue API has been previously discussed as a means to deliver in-band event data to Web applications, but this is not implemented in all of the main browser engines. It is included in the 26 April 2018 HTML 5.3 draft [HTML53-20180426], but is not included in [HTML]. See discussion here and notes on implementation status here.
WebKit supports a DataCue interface that extends HTML5 DataCue with two attributes to support non-text metadata, type and value.
interface DataCue : TextTrackCue {
attribute ArrayBuffer data; // Always empty
// Proposed extensions.
attribute any value;
readonly attribute DOMString type;
};
type is a string identifying the type of metadata:
WebKit DataCue metadata types:

| Type | Description |
| --- | --- |
| "com.apple.quicktime.udta" | QuickTime User Data |
| "com.apple.quicktime.mdta" | QuickTime Metadata |
| "com.apple.itunes" | iTunes metadata |
| "org.mp4ra" | MPEG-4 metadata |
| "org.id3" | ID3 metadata |
and value is an object with the metadata item key, data, and optionally a locale:
value = {
key: String
data: String | Number | Array | ArrayBuffer | Object
locale: String
}
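The following sketch, which assumes the WebKit DataCue extension described above, shows how an application might read ID3 metadata cues from an in-band metadata text track; the handling of the value object is illustrative only.

// Sketch assuming the WebKit DataCue extension described above; the
// 'org.id3' type and value layout follow the table above, but the
// handling shown here is illustrative only.
const video = document.querySelector('video');

video.textTracks.addEventListener('addtrack', (event) => {
  const track = event.track;
  if (track.kind !== 'metadata') return;
  track.mode = 'hidden'; // Receive cue events without native rendering.
  track.addEventListener('cuechange', () => {
    for (let i = 0; i < track.activeCues.length; i++) {
      const cue = track.activeCues[i];
      if (cue.type === 'org.id3') {
        console.log('ID3 metadata item:', cue.value.key, cue.value.data);
      }
    }
  });
});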
Neither [MSE-BYTE-STREAM-FORMAT-ISOBMFF] nor [INBANDTRACKS] describes the handling of emsg boxes.
On resource constrained devices such as smart TVs and streaming sticks, parsing media segments to extract event information leads to a significant performance penalty, which can have an impact on UI rendering updates if this is done on the UI thread. There can also be an impact on the battery life of mobile devices. Given that the media segments will be parsed anyway by the user agent, parsing in JavaScript is an expensive overhead that could be avoided.
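As an illustration of the workaround described here, the following sketch parses version 0 emsg boxes from a fetched DASH media segment in JavaScript. It is deliberately simplified (no 64-bit box sizes, no version 1 emsg fields) and is only intended to show the kind of parsing that the application must duplicate when the user agent does not expose the events.

// Simplified sketch of application-level emsg extraction from an ISO
// BMFF segment, to illustrate the work duplicated in JavaScript.
// Version 0 emsg boxes only; 64-bit box sizes and version 1 fields
// are not handled.
function findEmsgEvents(buffer) {
  const view = new DataView(buffer);
  const events = [];
  let offset = 0;
  while (offset + 8 <= view.byteLength) {
    const size = view.getUint32(offset);
    if (size < 8) break; // Not handled: size 0 (to end) or 1 (64-bit).
    const type = String.fromCharCode(
        view.getUint8(offset + 4), view.getUint8(offset + 5),
        view.getUint8(offset + 6), view.getUint8(offset + 7));
    if (type === 'emsg' && view.getUint8(offset + 8) === 0) {
      let pos = offset + 12; // Skip box header, version and flags.
      const readString = () => {
        let s = '';
        while (view.getUint8(pos) !== 0) {
          s += String.fromCharCode(view.getUint8(pos++));
        }
        pos++; // Skip the null terminator.
        return s;
      };
      const schemeIdUri = readString();
      const value = readString();
      events.push({
        schemeIdUri,
        value,
        timescale: view.getUint32(pos),
        presentationTimeDelta: view.getUint32(pos + 4),
        eventDuration: view.getUint32(pos + 8),
        id: view.getUint32(pos + 12),
        messageData: new Uint8Array(buffer, pos + 16, offset + size - (pos + 16)),
      });
    }
    offset += size;
  }
  return events;
}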
[HBBTV] section 9.3.2 describes a mapping between the emsg fields described above and the TextTrack and DataCue APIs. A TextTrack instance is created for each event stream signalled in the MPD document (as identified by the schemeIdUri and value), and the inBandMetadataTrackDispatchType TextTrack attribute contains the scheme_id_uri and value values. Because HbbTV devices include a native DASH client, parsing of the MPD document and creation of the TextTracks is done by the UA.
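Under this kind of mapping, an application might locate the TextTrack for a given event stream as sketched below; the scheme URI and the exact concatenation of scheme_id_uri and value in inBandMetadataTrackDispatchType are illustrative assumptions.

// Sketch assuming an HbbTV-style mapping in which
// inBandMetadataTrackDispatchType carries the scheme_id_uri and value;
// the exact concatenation and the scheme URI below are assumptions.
function findEventStreamTrack(video, schemeIdUri, value) {
  return Array.from(video.textTracks).find((track) =>
      track.kind === 'metadata' &&
      track.inBandMetadataTrackDispatchType === schemeIdUri + ' ' + value);
}

const track = findEventStreamTrack(
    document.querySelector('video'), 'urn:example:event-scheme', '1');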
To support DASH clients implemented in Web applications, there is therefore a need either for an API that allows applications to tell the UA which event schemes they want to receive, or for the UA simply to expose all event streams to applications. Which of these is preferred?
The timing guarantees provided in HTML5 regarding the triggering of TextTrackCue events may not be enough to avoid events being missed.
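One application-level mitigation, sketched below, is to poll the media timeline and act on any cues whose start time has been crossed since the previous check; this reduces, but does not eliminate, the risk of late or missed events, and adds its own processing cost.

// Sketch of a polling workaround for missed or late cue events: on each
// animation frame, fire a callback for cues whose start time was crossed
// since the previous check. Seeking and playback rate changes are ignored
// for simplicity.
function pollCues(video, track, onCue) {
  let lastTime = video.currentTime;
  function check() {
    const now = video.currentTime;
    for (let i = 0; i < track.cues.length; i++) {
      const cue = track.cues[i];
      if (cue.startTime > lastTime && cue.startTime <= now) {
        onCue(cue);
      }
    }
    lastTime = now;
    requestAnimationFrame(check);
  }
  requestAnimationFrame(check);
}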
Describe gaps relating to synchronized rendering of web resources. Can we define a generic web API for scheduling page changes synchronized to playing media? Related: [css-animations-1], [web-animations-1], [css-transitions-1]. See also: https://github.com/bbc/VideoContext. Should this be in scope for the TF?
There is no API for surfacing Web content embedded in ISO BMFF containers into the browser (e.g., the HTMLCue proposal discussed at TPAC 2015).
Add more detail on what's required. Some questions / considerations:
This section describes recommendations from the Media & Entertainment Interest Group for the development of a generic media timed event API.
The API should allow Web applications to subscribe to receive specific event types. For example, to support DASH emsg and MPD events, the API should allow subscription by scheme_id_uri and (optional) value.

This is to make receiving events opt-in from the application point of view. The user agent should deliver to a Web application only those events for which the application has subscribed. The API should also allow Web applications to unsubscribe from specific event streams by event type.
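Purely to illustrate this subscribe and unsubscribe model, and not as a proposed interface, a Web application might use such an API as follows; the method names and the event scheme URI are hypothetical.

// Hypothetical API shape, for illustration only: the method names and
// the scheme URI are not defined by any specification.
const video = document.querySelector('video');

function onTimedEvent(event) {
  console.log('Media timed event:', event.startTime, event.data);
}

// Opt in to one DASH event stream, identified by scheme_id_uri and
// (optional) value.
video.addMediaTimedEventListener('urn:example:event-scheme', '1', onTimedEvent);

// Later, stop receiving events of that type.
video.removeMediaTimedEventListener('urn:example:event-scheme', '1', onTimedEvent);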
To be able to handle out of band events, the API must allow Web applications to create events to be added to the media timeline, to be triggered by the user agent. The API should allow the Web application to provide all necessary parameters to define the event, including start and end times, event type, and data payload. The payload should be allowed to be of any data type (e.g., the set of types supported by the WebKit DataCue). For DASH MPD events, the event type is defined by the scheme_id_uri and (optional) value fields.
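As a point of comparison, an application can approximate out of band events today with the existing text track APIs, as sketched below; the times and payload are example values, and the payload must be serialised to a string because VTTCue carries only text.

// Sketch of an application-created (out of band) event on the media
// timeline using the existing TextTrack and VTTCue APIs. The times and
// payload below are example values only.
const video = document.querySelector('video');
const track = video.addTextTrack('metadata', 'Application events');

const cue = new VTTCue(30, 40, JSON.stringify({
  schemeIdUri: 'urn:example:event-scheme',
  value: '1',
  messageData: 'show overlay',
}));
cue.onenter = () => { /* Event start time reached: render the resource. */ };
cue.onexit = () => { /* Event end time reached: remove it. */ };
track.addCue(cue);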
For those events that the application has subscribed to receive, the API must:
The API must provide guarantees that no events can be missed during linear playback of the media.
We recommend updating [INBANDTRACKS] to describe handling of in-band media timed events supported on the web platform, following a registry approach with one specification per media format that describes the event details for that format. In particular, we recommend that browser engines support emsg events.
The time marches on algorithm should be reviewed and updated to ensure that events are delivered to the Web application within the time constraints described elsewhere in this report.
Thanks to Charles Lo, Nigel Megitt, Jon Piesing, and Rob Smith for their contributions to this document.