Media Timed Events

W3C Editor's Draft

This version:
https://w3c.github.io/me-media-timed-events/
Latest published version:
https://www.w3.org/TR/media-timed-events/
Latest editor's draft:
https://w3c.github.io/me-media-timed-events/
Editors:
(British Broadcasting Corporation)
(Qualcomm)

Abstract

This document collects use cases and requirements for improved support for timed events related to audio or video media on the Web, such as subtitles, captions, or other web content, where synchronization to a playing audio or video media stream is needed, and makes recommendations for new or changed Web APIs to realize these requirements.

Status of This Document

This is a preview

Do not attempt to implement this version of the specification. Do not reference this version as authoritative in any way. Instead, see https://w3c.github.io/me-media-timed-events/ for the Editor's draft.

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.

This document was published by the Media & Entertainment Interest Group as an Editor's Draft.

Comments regarding this document are welcome. Please send them to public-web-and-tv@w3.org (archives).

Publication as an Editor's Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 1 February 2018 W3C Process Document.

1. Introduction

The term "media timed events" describes a generic capability for making changes to a Web page, or executing application code triggered from JavaScript events, at specific points on the media timeline of an audio or video media stream.

2. Terminology

The following terms are used in this document:

The following terms are defined in [HTML52]:

3. Use cases

This section describes specific use cases for media timed events.

3.1 Synchronized event triggering

Use cases for media timed events include:

Reference: M&E IG call 1 Feb 2018: Minutes, [DASH-EVENTING].

Editor's note

See also this issue against the [WEB-MEDIA-GUIDELINES]. TODO: Add detail here.

3.2 Synchronized rendering of web resources

[WebVMT] is a format for metadata cues, synchronised with the timed media file, that can drive an online map, e.g., OpenStreetMap, rendered in a separate HTML element alongside the media element on the web page. The media playhead position controls presentation and animation of the map, e.g., pan and zoom, and allows annotations to be added and removed, e.g., markers, at specified times during media playback. Control can also be overridden by the user with the usual interactive features of the map at any time, e.g., zoom. Concrete examples are provided by the tech demos at the WebVMT website.

Reference: M&E IG TF call 17 Sept 2018: Minutes.

Editor's note

Add use case descriptions for synchronised rendering here. Note that this could be rendering of any web resource, not necessarily those embedded in media containers. Describe a few motivating application scenarios.

During a live media presentation, dynamic and unpredictable events may occur that cause temporary suspension of the media presentation. During that suspension interval, auxiliary content, such as UI controls and media files, may be unavailable. Depending on whether the user engages with the UI controls, and on the time at which any such engagement occurs, specific web resources may be rendered at defined times in a synchronized manner. For example, a multimedia A/V clip corresponding to an advertisement, along with its subtitles, both previously downloaded and cached by the UA, is played out.

3.3 Rendering of Web content embedded in media containers

Media-timed events can be used to trigger retrieval and/or rendering of web resources. Such resources can be used to enhance user experience in the context of media that is being rendered. Some examples include:

Editor's note

Add use case descriptions for rendering of Web content embedded in media containers (e.g., [WEB-ISOBMFF]). Describe a few motivating application scenarios.

5. Gap analysis

This section describes gaps in existing Web platform capabilities needed to support the use cases and requirements described in this document. Where applicable, this section also describes how existing Web platform features can be used as workarounds, and any associated limitations.

5.1 Synchronized event triggering

5.1.1 DASH and ISO BMFF emsg events

The DataCue API has been previously discussed as a means to deliver in-band event data to Web applications, but this is not implemented in all of the main browser engines. It is included in the 26 April 2018 HTML 5.3 draft [HTML53-20180426], but is not included in [HTML]. See discussion here and notes on implementation status here.

WebKit supports a DataCue interface that extends HTML5 DataCue with two attributes to support non-text metadata, type and value.

interface DataCue : TextTrackCue {
  attribute ArrayBuffer data; // Always empty

  // Proposed extensions.
  attribute any value;
  readonly attribute DOMString type;
};

type is a string identifying the type of metadata:

WebKit DataCue metadata types
"com.apple.quicktime.udta"    QuickTime User Data
"com.apple.quicktime.mdta"    QuickTime Metadata
"com.apple.itunes"            iTunes metadata
"org.mp4ra"                   MPEG-4 metadata
"org.id3"                     ID3 metadata

and value is an object with the metadata item key, data, and optionally a locale:

value = {
  key: String
  data: String | Number | Array | ArrayBuffer | Object
  locale: String
}
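As an informal illustration of the type and value attributes described above, the following sketch shows how an application might summarize a cue before acting on it. The describeCue helper, the KNOWN_TYPES list name, and the example key and data are hypothetical and are not part of the WebKit interface; only the type strings and the key, data, and locale members come from the description above.

```javascript
// The metadata type strings listed in the table above.
const KNOWN_TYPES = [
  "com.apple.quicktime.udta",
  "com.apple.quicktime.mdta",
  "com.apple.itunes",
  "org.mp4ra",
  "org.id3",
];

// Hypothetical helper: returns a short description of a cue's type and value,
// or a diagnostic string if the inputs do not have the expected shape.
function describeCue(type, value) {
  if (!KNOWN_TYPES.includes(type)) {
    return "unknown metadata type: " + type;
  }
  if (value === null || typeof value !== "object" || typeof value.key !== "string") {
    return "malformed value for " + type;
  }
  // locale is optional, so it is only appended when present.
  const locale = typeof value.locale === "string" ? " [" + value.locale + "]" : "";
  return type + " " + value.key + locale;
}

// Illustrative values only; the key and data are made up for this example.
const exampleValue = { key: "TXXX", data: "example text", locale: "en" };
const summary = describeCue("org.id3", exampleValue);
```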

Neither [MSE-BYTE-STREAM-FORMAT-ISOBMFF] nor [INBANDTRACKS] describes handling of emsg boxes.

On resource-constrained devices such as smart TVs and streaming sticks, parsing media segments to extract event information leads to a significant performance penalty, which can have an impact on UI rendering updates if this is done on the UI thread. There can also be an impact on the battery life of mobile devices. Given that the media segments will be parsed anyway by the user agent, parsing in JavaScript is an expensive overhead that could be avoided.
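To give a sense of the application-level parsing that this workaround involves, the following is a minimal sketch of extracting one emsg box in JavaScript. It assumes the version 0 box layout from [MPEGDASH] (box size and type, version and flags, two null-terminated strings, then four 32-bit big-endian fields and the message data) and a box that starts at byte 0 of the buffer. The function names parseEmsgV0, readCString, and buildEmsgV0 and the sample scheme URI are hypothetical. A real application would also have to locate emsg boxes within each media segment, which adds to the cost described above.

```javascript
// Read a null-terminated UTF-8 string starting at `offset`.
function readCString(bytes, offset) {
  let end = offset;
  while (end < bytes.length && bytes[end] !== 0) end++;
  const text = new TextDecoder("utf-8").decode(bytes.subarray(offset, end));
  return { text, next: end + 1 };
}

// Parse one version 0 emsg box that starts at byte 0 of `buffer`.
function parseEmsgV0(buffer) {
  const view = new DataView(buffer);
  const bytes = new Uint8Array(buffer);
  const size = view.getUint32(0); // big-endian box size
  const type = String.fromCharCode(bytes[4], bytes[5], bytes[6], bytes[7]);
  if (type !== "emsg") throw new Error("not an emsg box: " + type);
  if (view.getUint8(8) !== 0) throw new Error("only version 0 is handled here");
  const scheme = readCString(bytes, 12); // after version (1 byte) and flags (3 bytes)
  const value = readCString(bytes, scheme.next);
  const p = value.next;
  return {
    schemeIdUri: scheme.text,
    value: value.text,
    timescale: view.getUint32(p),
    presentationTimeDelta: view.getUint32(p + 4),
    eventDuration: view.getUint32(p + 8),
    id: view.getUint32(p + 12),
    messageData: bytes.slice(p + 16, size),
  };
}

// Build a sample box so the parser can be exercised without a media file.
function buildEmsgV0(schemeIdUri, value, timescale, delta, duration, id, payload) {
  const enc = new TextEncoder();
  const s = enc.encode(schemeIdUri);
  const v = enc.encode(value);
  const size = 12 + s.length + 1 + v.length + 1 + 16 + payload.length;
  const bytes = new Uint8Array(size); // zero-filled, so string terminators are in place
  const view = new DataView(bytes.buffer);
  view.setUint32(0, size);
  bytes.set(enc.encode("emsg"), 4);
  let p = 12;
  bytes.set(s, p); p += s.length + 1;
  bytes.set(v, p); p += v.length + 1;
  view.setUint32(p, timescale);
  view.setUint32(p + 4, delta);
  view.setUint32(p + 8, duration);
  view.setUint32(p + 12, id);
  bytes.set(payload, p + 16);
  return bytes.buffer;
}

const sample = buildEmsgV0("urn:example:scheme", "1", 1000, 500, 2000, 7,
                           new Uint8Array([1, 2, 3]));
const parsed = parseEmsgV0(sample);
```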

[HBBTV] section 9.3.2 describes a mapping between the emsg fields described above and the TextTrack and DataCue APIs. A TextTrack instance is created for each event stream signalled in the MPD document (as identified by the schemeIdUri and value), and the inBandMetadataTrackDispatchType TextTrack attribute contains the scheme_id_uri and value values. Because HbbTV devices include a native DASH client, parsing of the MPD document and creation of the TextTracks is done by the UA.
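The following sketch illustrates how an application could use this mapping to find the track for one event stream. The exact serialization of inBandMetadataTrackDispatchType is defined in [HBBTV]; the sketch assumes, purely for illustration, that it is the scheme_id_uri and value joined by a single space. The helper names and the plain objects standing in for TextTrack instances are hypothetical.

```javascript
// Assumption for this sketch: the dispatch type is "<scheme_id_uri> <value>".
function dispatchTypeFor(schemeIdUri, value) {
  return schemeIdUri + " " + value;
}

// Return the first metadata track whose dispatch type matches, or null.
function findEventTrack(tracks, schemeIdUri, value) {
  const wanted = dispatchTypeFor(schemeIdUri, value);
  for (const track of tracks) {
    if (track.kind === "metadata" &&
        track.inBandMetadataTrackDispatchType === wanted) {
      return track;
    }
  }
  return null;
}

// Plain objects standing in for TextTrack instances created by the UA.
const mockTracks = [
  { kind: "subtitles", inBandMetadataTrackDispatchType: "" },
  { kind: "metadata", inBandMetadataTrackDispatchType: "urn:example:scheme 1" },
];
const match = findEventTrack(mockTracks, "urn:example:scheme", "1");
```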

Editor's note

To support DASH clients implemented in Web applications, either an API is needed that allows applications to tell the UA which schemes they want to receive, or the UA should simply expose all event streams to applications. Which of these is preferred?

5.1.2 Synchronization and timing

The timing guarantees provided in HTML5 regarding the triggering of TextTrackCue events may not be enough to avoid events being missed.
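One way in which a short event can go unnoticed is sketched below. This is not the HTML time marches on algorithm; it is a simplified model, assuming the playhead position is sampled at a fixed interval and that only events active at the sampled instant are reported. The event times, the 250 ms and 50 ms intervals, and the function name observeBySampling are illustrative assumptions.

```javascript
// Events on the media timeline, in seconds. Values are illustrative.
const sampledEvents = [
  { id: "long", startTime: 1.0, endTime: 3.0 },
  { id: "short", startTime: 1.3, endTime: 1.4 },
];

// Return the ids of events seen when the playhead is sampled at fixed steps
// and only events active at the sampled instant are reported.
function observeBySampling(timeline, stepSeconds, durationSeconds) {
  const seen = new Set();
  const steps = Math.round(durationSeconds / stepSeconds);
  for (let i = 0; i <= steps; i++) {
    const now = i * stepSeconds;
    for (const ev of timeline) {
      if (ev.startTime <= now && now < ev.endTime) seen.add(ev.id);
    }
  }
  return [...seen];
}

// Sampling every 250 ms lands on 1.25 s and 1.5 s, either side of "short".
const seenAt250ms = observeBySampling(sampledEvents, 0.25, 4);
// Sampling every 50 ms lands inside the 100 ms window of "short".
const seenAt50ms = observeBySampling(sampledEvents, 0.05, 4);
```

In this model, the event lasting 100 ms is only observed when the sampling interval is shorter than the event itself.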

5.2 Synchronized rendering of web resources

Editor's note

Describe gaps relating to synchronized rendering of web resources. Can we define a generic web API for scheduling page changes synchronized to playing media? Related: [css-animations-1], [web-animations-1], [css-transitions-1]. See also: https://github.com/bbc/VideoContext. Should this be in scope for the TF?

5.3 Rendering of Web content embedded in media containers

There is no API for surfacing Web content embedded in ISO BMFF containers into the browser (e.g., the HTMLCue proposal discussed at TPAC 2015).

Editor's note

Add more detail on what's required. Some questions / considerations:

  • Are the web resources intended to be handed to a Web application for rendering, or direct rendering by the UA?
  • How do we guarantee that resources are delivered to the browser sufficiently ahead of time?
  • How does same-origin policy affect such resources?

6. Recommendations

This section describes recommendations from the Media & Entertainment Interest Group for the development of a generic media timed event API.

6.1 Subscribing to event streams

The API should allow Web applications to subscribe to receive specific event types. For example, to support DASH emsg and MPD events, the API should allow subscription by id and (optional) value. This is to make receiving events opt-in from the application point of view. The user agent should deliver to a Web application only those events to which the application has subscribed. The API should also allow Web applications to unsubscribe from specific event streams by event type.
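The opt-in behavior described above can be sketched as follows. This is not a proposed API shape; the EventSubscriptions class, its method names, and the example URNs are hypothetical, and the sketch assumes that a subscription made without a value matches any value for that id.

```javascript
class EventSubscriptions {
  constructor() {
    this.anyValue = new Set(); // ids subscribed without a value
    this.exact = new Set();    // "id\nvalue" pairs subscribed with a value
  }

  // Combine id and value into a single lookup key.
  static key(id, value) {
    return id + "\n" + value;
  }

  subscribe(id, value) {
    if (value === undefined) this.anyValue.add(id);
    else this.exact.add(EventSubscriptions.key(id, value));
  }

  unsubscribe(id, value) {
    if (value === undefined) this.anyValue.delete(id);
    else this.exact.delete(EventSubscriptions.key(id, value));
  }

  // True only for events the application has opted in to.
  shouldDeliver(event) {
    return this.anyValue.has(event.id) ||
      this.exact.has(EventSubscriptions.key(event.id, event.value));
  }
}

const subs = new EventSubscriptions();
subs.subscribe("urn:example:ads", "1"); // id and value
subs.subscribe("urn:example:scores");   // id only, any value

const incoming = [
  { id: "urn:example:ads", value: "1" },
  { id: "urn:example:ads", value: "2" },
  { id: "urn:example:scores", value: "9" },
  { id: "urn:example:other", value: "" },
];
const delivered = incoming.filter((ev) => subs.shouldDeliver(ev));
```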

6.2 Out-of-band events

To be able to handle out-of-band events, the API must allow Web applications to create events to be added to the media timeline, to be triggered by the user agent. The API should allow the Web application to provide all necessary parameters to define the event, including start and end times, event type, and data payload. The payload should be any data type (e.g., the set of types supported by the WebKit DataCue). For DASH MPD events, the event type is defined by the id and (optional) value fields.
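A minimal sketch of the parameters listed above, assuming an application-side model of the media timeline rather than any existing Web API. The MediaTimeline class, the addEvent method, its validation rules, and the example URNs and payloads are hypothetical.

```javascript
class MediaTimeline {
  constructor() {
    this.events = []; // kept sorted by startTime
  }

  // Add an application-created event with start and end times (seconds),
  // an event type, and a payload of any data type.
  addEvent({ startTime, endTime, type, data }) {
    if (!Number.isFinite(startTime) || startTime < 0) {
      throw new RangeError("startTime must be a non-negative number");
    }
    if (!(endTime >= startTime)) {
      throw new RangeError("endTime must not precede startTime");
    }
    if (typeof type !== "string" || type === "") {
      throw new TypeError("type must be a non-empty string");
    }
    const event = { startTime, endTime, type, data };
    // Insert after any existing events that start at or before this one.
    let i = this.events.length;
    while (i > 0 && this.events[i - 1].startTime > startTime) i--;
    this.events.splice(i, 0, event);
    return event;
  }
}

const timeline = new MediaTimeline();
timeline.addEvent({
  startTime: 30,
  endTime: 45,
  type: "urn:example:ad-break",
  data: { campaign: "example" }, // object payload
});
timeline.addEvent({
  startTime: 10,
  endTime: 10, // zero-length event
  type: "urn:example:intro",
  data: new Uint8Array([1, 2, 3]).buffer, // binary payload
});
```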

6.3 Event triggering

For those events that the application has subscribed to receive, the API must:

The API must provide guarantees that no events can be missed during linear playback of the media.
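One way such a guarantee could be approached is sketched below, as an illustration rather than a proposed algorithm. Instead of testing which events are active at a sampled instant, each step reports every event whose start time falls within the interval since the previous step, so an event shorter than the step, or of zero duration, is still reported. The function names, event times, and 250 ms step are assumptions made for this example, and the sketch covers linear playback only, not seeking.

```javascript
// Events on the media timeline, in seconds. Values are illustrative.
const timedEvents = [
  { id: "long", startTime: 1.0, endTime: 3.0 },
  { id: "short", startTime: 1.3, endTime: 1.4 },
  { id: "instant", startTime: 2.1, endTime: 2.1 },
];

// Ids of events whose start time lies in the interval (previousTime, currentTime].
function collectStarted(timeline, previousTime, currentTime) {
  return timeline
    .filter((ev) => ev.startTime > previousTime && ev.startTime <= currentTime)
    .map((ev) => ev.id);
}

// Step through linear playback and collect every event that started.
function playLinearly(timeline, stepSeconds, durationSeconds) {
  const fired = [];
  let previous = -Infinity; // so that an event starting at 0 is reported
  const steps = Math.round(durationSeconds / stepSeconds);
  for (let i = 0; i <= steps; i++) {
    const now = i * stepSeconds;
    fired.push(...collectStarted(timeline, previous, now));
    previous = now;
  }
  return fired;
}

const firedAt250ms = playLinearly(timedEvents, 0.25, 4);
```

Because consecutive intervals share no start times and leave no gaps, each event is reported exactly once in this model, although possibly later than its start time by up to one step.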

6.4 In-band event processing

We recommend updating [INBANDTRACKS] to describe handling of in-band media timed events supported on the web platform, following a registry approach with one specification per media format that describes the event details for that format. In particular, we recommend that browser engines support emsg events.

6.5 Synchronization

The time marches on algorithm should be reviewed and updated to ensure that events are delivered to the Web application within time constraints described elsewhere in this report.

7. Acknowledgments

Thanks to Charles Lo, Nigel Megitt, Jon Piesing, and Rob Smith for their contributions to this document.

A. References

A.1 Normative references

[3GPP-INTERACTIVITY-TR]
TR 26.953: Interactivity Support for 3GPP-Based Streaming and Download Services (Release 15). 3GPP. June 2018. URL: http://www.3gpp.org/ftp/Specs/archive/26_series/26.953/26953-f00.zip
[3GPP-INTERACTIVITY-WID]
SP-170796: New WID on 3GPP Service Interactivity. 3GPP. September 2017. URL: http://www.3gpp.org/ftp/tsg_sa/TSG_SA/TSGS_77/Docs/SP-170796.zip
[BBC-SUBTITLES]
Subtitle Guidelines. BBC. May 2018. URL: http://bbc.github.io/subtitle-guidelines/
[css-animations-1]
CSS Animations Level 1. Dean Jackson; David Baron; Tab Atkins Jr.; Brian Birtles. W3C. 11 October 2018. W3C Working Draft. URL: https://www.w3.org/TR/css-animations-1/
[css-transitions-1]
CSS Transitions. David Baron; Dean Jackson; Brian Birtles; David Hyatt. W3C. 11 October 2018. W3C Working Draft. URL: https://www.w3.org/TR/css-transitions-1/
[DASH-EVENTING]
DASH Eventing and HTML5. Giridhar Mandyam. February 2018. URL: https://www.w3.org/2011/webtv/wiki/images/a/a5/DASH_Eventing_and_HTML5.pdf
[DVB-DASH]
DVB Document A168. Digital Video Broadcasting (DVB); MPEG-DASH Profile for Transport of ISO BMFF Based DVB Services over IP Based Networks. DVB. November 2017. URL: https://www.dvb.org/resources/public/standards/a168_dvb_mpeg-dash_nov_2017.pdf
[EBU-TT-D]
EBU TECH 3380: "EBU-TT-D Subtitling Distribution Format". European Broadcasting Union. URL: https://tech.ebu.ch/docs/tech/tech3380.pdf
[HBBTV]
HbbTV 2.0.2 Specification. HbbTV Association. 16 February 2018. URL: https://www.hbbtv.org/wp-content/uploads/2018/02/HbbTV_v202_specification_2018_02_16.pdf
[HBBTV-TESTS]
HbbTV Test Suite 2018-1. HbbTV Association. 2018. URL: https://www.hbbtv.org/wp-content/uploads/2018/03/HbbTV-testcases-2018-1.pdf
[HTML]
HTML Standard. Anne van Kesteren; Domenic Denicola; Ian Hickson; Philip Jägenstedt; Simon Pieters. WHATWG. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[HTML52]
HTML 5.2. Steve Faulkner; Arron Eicholz; Travis Leithead; Alex Danilo; Sangwhan Moon. W3C. 14 December 2017. W3C Recommendation. URL: https://www.w3.org/TR/html52/
[HTML53-20180426]
HTML 5.3. Patricia Aas; Shwetank Dixit; Terence Eden; Bruce Lawson; Sangwhan Moon; Xiaoqian Wu; Scott O'Hara. W3C. 26 April 2018. W3C Working Draft. URL: https://www.w3.org/TR/2018/WD-html53-20180426/
[INBANDTRACKS]
Sourcing In-band Media Resource Tracks from Media Containers into HTML. Silvia Pfeiffer; Bob Lund. W3C. 26 April 2015. Unofficial Draft. URL: https://dev.w3.org/html5/html-sourcing-inband-tracks/
[MPEGDASH]
ISO/IEC 23009-1:2014 Information technology -- Dynamic adaptive streaming over HTTP (DASH) -- Part 1: Media presentation description and segment formats. ISO/IEC. URL: http://standards.iso.org/ittf/PubliclyAvailableStandards/c065274_ISO_IEC_23009-1_2014.zip
[MSE-BYTE-STREAM-FORMAT-ISOBMFF]
ISO BMFF Byte Stream Format. Matthew Wolenetz; Jerry Smith; Mark Watson; Aaron Colwell; Adrian Bateman. W3C. 4 October 2016. W3C Note. URL: https://www.w3.org/TR/mse-byte-stream-format-isobmff/
[SCTE-35]
Digital Program Insertion Cueing Message for Cable. The Society of Cable and Television Engineers. 2016. URL: https://www.scte.org/SCTEDocs/Standards/SCTE%2035%202016.pdf
[web-animations-1]
Web Animations. Brian Birtles; Robert Flack; Stephen McGruer; Antoine Quint; Shane Stephens; Alex Danilo; Tab Atkins Jr.. W3C. 11 October 2018. W3C Working Draft. URL: https://www.w3.org/TR/web-animations-1/
[WEB-ISOBMFF]
ISO/IEC JTC1/SC29/WG11 N16944 Working Draft on Carriage of Web Resources in ISOBMFF. Thomas Stockhammer; Cyril Concolato. MPEG. July 2017. URL: https://mpeg.chiariglione.org/standards/mpeg-4/timed-text-and-other-visual-overlays-iso-base-media-file-format/wd-carriage-web
[WEB-MEDIA-GUIDELINES]
Web Media Application Developer Guidelines 2018. Joel Korpi; Thasso Griebel; Jeff Burtoft. W3C. 26 April 2018. CG-DRAFT. URL: https://w3c.github.io/webmediaguidelines/
[WebVMT]
WebVMT: The Web Video Map Tracks Format. Rob Smith. W3C. 11 October 2018. W3C Editor's Draft. URL: https://w3c.github.io/sdw/proposals/geotagging/webvmt/
[WEBVTT]
WebVTT: The Web Video Text Tracks Format. Simon Pieters; Silvia Pfeiffer; Philip Jägenstedt; Ian Hickson. W3C. 10 May 2018. W3C Candidate Recommendation. URL: https://www.w3.org/TR/webvtt1/