Media Timed Events

W3C Interest Group Note 28 January 13 February 2019

This version:: ~~https://www.w3.org/TR/2019/NOTE-media-timed-events-20190128/~~ https://www.w3.org/TR/2019/NOTE-media-timed-events-20190213/
Latest published version:: https://www.w3.org/TR/media-timed-events/
Latest editor's draft:: https://w3c.github.io/me-media-timed-events/
Editors:: Chris Needham ( British Broadcasting Corporation ); Giridhar Mandyam ( Qualcomm )
Participate:: GitHub w3c/me-media-timed-events; File a bug; Commit history; Pull requests

Abstract

This document collects use cases and requirements for improved support for timed events related to audio or video media on the web, where synchronization to a playing audio or video media stream is needed, and makes recommendations for new or changed web APIs to realize these requirements. The goal is to extend the existing support in HTML for text track cue events to add support for dynamic content replacement cues and generic metadata events that drive synchronized interactive media ~~experiences.~~ experiences, and improve synchronization timing accuracy.

Status of This Document

This is a preview

Do not attempt to implement this version of the specification. Do not reference this version as authoritative in any way. Instead, see https://w3c.github.io/me-media-timed-events/ for the Editor's draft.

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.

This document was published by the Media & Entertainment Interest Group as an Interest Group Note.

GitHub Issues are preferred for discussion of this specification.

Publication as an Interest Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

The disclosure obligations of the Participants of this group are described in the charter .

This document is governed by the 1 February 2018 W3C Process Document .

1. Introduction

There is a need in the media industry for an API to support metadata events synchronized to audio or video media, specifically for both out-of-band event streams and in-band discrete events (for example, MPD and emsg events in MPEG-DASH). These media timed events can be used to support use cases such as dynamic content replacement, ad insertion, or presentation of supplemental content alongside the audio or video, or more generally, making changes to a web page, or executing application code triggered from JavaScript events, at specific points on the media timeline of an audio or video media stream.

2. Terminology

The following terms are used in this document:

in-band — timed event information that is delivered within the audio or video media container or multiplexed with the media stream.
out-of-band — timed event information that is delivered over some other mechanism external to the media container or media stream.

The following terms are defined in [ HTML ]:

media timeline

3. Use cases

Media-timed events carry metadata that is related to points in time, or regions of time on the media timeline , which can be used to trigger retrieval and/or rendering of web resources synchronized with media playback. Such resources can be used to enhance user experience in the context of media that is being rendered. Some examples include display of social media feeds corresponding to a live broadcast such as a sporting event, banner advertisements for sponsored content, accessibility-related assets, such as large print rendering of captions, and display of track titles or images alongside an audio stream.

The following sections describe a few use cases in more detail.

3.1 Dynamic content insertion

A media content provider wants to allow insertion of content, such as personalised video, local news, or advertisements, into a video media stream that contains the main program content. To achieve this, media timed events used to describe the points on the media timeline , known as splice points, where switching playback to inserted content is possible.

The Society for Cable and Televison Engineers (SCTE) specification "Digital Program Insertion Cueing for Cable" [ SCTE35 ] defines a data cue format for describing such insertion points. Use of these cues in MPEG-DASH and HLS streams is described in [ SCTE35 ], sections 12.1 and 12.2.

3.2 Audio stream with titles and images

A media content provider wants to provide visual information alongside an audio stream, such as an image of the artist and title of the current playing track, to give users live information about the content they are listening to.

Examples include HLS timed metadata [ HLS-TIMED-METADATA ], which uses in-band ID3 metadata to carry the image content, and RadioVIS in DVB ([ DVB-DASH ], section 9.1.7), which defines in-band event messages that contain image URLs and text messages to be displayed, with information about when the content should be displayed in relation to the media timeline .

3.2 3.3 MPEG-DASH manifest expiry notifications

Section 5.10.4 of [ MPEGDASH ] describes an MPEG-DASH specific event that is used to notify a DASH player web application that it should refresh its copy of the manifest (MPD) document. An in-band emsg event is used an alternative to setting a cache duration in the response to the HTTP request for the manifest, so the client can refresh the MPD when it actually changes, so reducing the load on HTTP servers caused by frequent server requests.

Reference: M&E IG call 1 Feb 2018: Minutes , [ DASH-EVENTING ].

Editor's note

See also this issue against the [ WEB-MEDIA-GUIDELINES ]. TODO: Add detail here.

3.4 Subtitle and caption rendering synchronization

A subtitle or caption author wants ensure that subtitle changes are aligned as closely as possible to shot changes in the video. The BBC Subtitle Guidelines [ BBC-SUBTITLE ] describes authoring best practices. In particular, in section 6.1 authors are advised "it is likely to be less tiring for the viewer if shot changes and subtitle changes occur at the same time. Many subtitles therefore start on the first frame of the shot and end on the last frame."

3.3 3.5 Synchronized map animations

A user records footage with metadata, including geolocation, on a mobile video device, e.g., drone or dashcam, to share on the web alongside a map, e.g., OpenStreetMap.

[ WebVMT ] is an open format for metadata cues, synchronized with a timed media file, that can be used to drive an online map rendered in a separate HTML element alongside the media element on the web page. The media playhead position controls presentation and animation of the map, e.g., pan and zoom, and allows annotations to be added and removed, e.g., markers, at specified times during media playback. Control can also be overridden by the user with the usual interactive features of the map at any time, e.g., zoom. Concrete examples are provided by the tech demos at the WebVMT website.

Reference: M&E IG TF call 17 Sept 2018: Minutes .

3.4 3.6 Media analysis visualization

A video image analysis system processes a media stream to detect and recognize objects shown in the video. This system generates metadata describing the objects, including timestamps that describe the when the objects are visible, together with position information (e.g., bounding boxes). A web application then uses this timed metadata to overlay labels and annotations on the video using HTML and CSS.

3.5 3.7 Presentation of auxiliary content in live media

During a live media presentation, dynamic and unpredictable events may occur which cause temporary suspension of the media presentation. During that suspension interval, auxiliary content such as the presentation of UI controls and media files, may be unavailable. Depending on the specific user engagement (or not) with the UI controls and the time at which any such engagement occurs, specific web resources may be rendered at defined times in a synchronized manner. For example, a multimedia A/V clip along with subtitles corresponding to an advertisement, and which were previously downloaded and cached by the UA, are played out.

This section describes existing media industry specifications and standards that specify carriage of media timed events, or otherwise provide requirements for web APIs related to the triggering of media timed events.

4.1 MPEG Common Media Application Format (CMAF)

MPEG Common Media Application Format (CMAF) [ MPEGCMAF ] is a media container format optimized for large scale delivery of a single encrypted, adaptable multimedia presentation to a wide range of devices and adaptive streaming methods, including HTTP Live Streaming [ RFC8216 ] and MPEG-DASH [ MPEGDASH ]. It is based on the ISO BMFF [ ISOBMFF ] and supports the AVC, AAC, HEVC codecs, Common Encryption (CENC), and subtitles using IMSC1 and WebVTT. The goal is to reduce media storage and delivery costs by using a single common media format across different client devices.

CMAF media may contain in-band events in the form of Event Message ( emsg ) boxes in ISO BMFF files. emsg is specified in [ MPEGDASH ], section 5.10.3.3, and described in more detail in the following section of this document.

4.2 MPEG-DASH

MPEG-DASH is an adaptive bitrate streaming technique in which the audio and video media is partitioned into segments. The Media Presentation Description (MPD) is an XML document that contains metadata required by a DASH client to access the media segments and to provide the streaming service to the user. The media segments can use any codec, typically within a fragmented MP4 (ISO BMFF) container or MPEG-2 transport stream.

In MPEG-DASH, media timed events may be delivered either in-band or out-of-band :

In-band events are emsg boxes in ISO BMFF files. The presence of emsg events in the media container for given event schemes is signalled in the MPD document using an EventStream XML element ([ MPEGDASH ], section 5.10.2).
Out-of-band events are represented by Event XML elements contained within an EventStream element in the MPD.

An emsg event contains the following information, as specified in [ MPEGDASH ], section 5.10.3.3:

scheme_id_uri — A URI that identifies the message scheme
value — The event value (string)
timescale — Timescale units, in ticks per second
presentation_time_delta — Presentation time delta (with respect to the media segment), in timescale units
event_duration — Event duration, in timescale units
id — Event message identifier
message_data — Message body (may be empty)

4.2 4.3 HbbTV

HbbTV is an interactive TV application standard that supports both broadcast (DVB) media delivery, and internet streaming using MPEG-DASH. The HbbTV application environment is based on HTML and JavaScript. MPEG-DASH streaming is implemented nativey by the user agent, rather than through a JavaScript web application using Media Source Extensions.

HbbTV includes support for emsg events ([ DVB-DASH ], section 9.1) and requires this be mapped to HTML5 DataCue ([ HBBTV ], section 9.3.2). The revision of HTML5 referenced by [ HBBTV ] is [ html51-20151008 ]. This feature is included in user agents shipping in connected TVs across Europe from 2017.

The HbbTV device test suite includes test pages and streams that cover emsg support ([ HBBTV-TESTS ]). HbbTV has a reference application and content for DASH+DRM which includes emsg support.

4.3 4.4 DASH Industry Forum APIs for Interactivity

The DASH-IF InterOp Working Group has an ongoing work item, DAInty , "DASH APIs for Interactivity", which aims to specify a set of APIs between the DASH client/player and interactivity-capable applications, for both web and native applications. The origin of this work is a related 3GPP work item on Service Interactivity ([ 3GPP-INTERACTIVITY-WID ], [ 3GPP-INTERACTIVITY-TR ]). The objective is to provide service enablers for user engagement with auxiliary content and UIs on mobile device during live or time-shifted viewing of streaming content delivered over 3GPP broadcast or unicast bearers, and the measurement and reporting of such interactive consumption.

Two APIs are being developed that are relevant to the scope of the present document:

Application subscription/DASH client dispatch of DASH event stream messages containing interactivity information. Events can be delivered in-band ( emsg ) and/or as MPD events.
Application subscription/DASH client dispatch of ISO BMFF Timed Metadata tracks providing similar functionality to DASH event streams.

Two modes for dispatching events are defined. In Mode 1, events are dispatched at the time the event arrives, and in Mode 2, events are dispatched at the given time on the media timeline . The "arrival" of events from the DASH client perspective may be either static or pre-provisioned, in the case MPD Events, or dynamic in the case of inband events carried in the emsg. The application can register with the DASH client which Mode to use.

Reference: M&E IG, Media Timed Events Task Force call 20 Aug 2018: Minutes .

4.4 4.5 BBC Subtitle Guidelines SCTE-35

The ~~BBC Subtitle Guidelines ([~~ Society for Cable and Televison Engineers (SCTE) has produced the SCTE-35 specification "Digital Program Insertion Cueing for Cable" [ ~~BBC-SUBTITLES~~ SCTE35 ~~]) describe best practice~~ ], which defines a data cue format for authoring subtitles or captions. In particular, the guidelines place requirements on the synchronization accuracy of caption rendering. For example, in section 6.1, caption authors are advised "it is likely describing insertion points, to ~~be less tiring for the viewer if shot changes and subtitle changes occur at the same time. Many subtitles therefore start on the first frame of the shot and end on~~ support the ~~last frame."~~ dynamic content insertion use case.

Subtitles for video are typically authored against video at a nominal frame rate, e.g., 25 frames per second, which corresponds to 40 milliseconds per frame. The actual video frame rate may be adjusted dynamically according to the video encoding, but the subtitle timing must remain the same ([ [ ~~EBU-TT-D~~ SCTE214-1 ~~], Annex E). Where captions are rendered by application JavaScript code,~~ ] section 6.7 describes the carriage of SCTE-35 events in ~~response to TextTrackCue events, this places~~ a ~~requirement on user agents for timely delivery of these events, so that application code can respond and render the cues. 4.5 SCTE Media Splicing Requirements~~ MPEG-DASH MPD document, as out-of-band events. [ SCTE214-2 ~~The Society for Cable~~ ] section 9 and ~~Televison Engineers (SCTE) has produced~~ [ SCTE214-3 ] section 7.3 describe the carriage of SCTE-35 ~~specification "Digital Program Insertion Cueing for Cable"~~ events as in-band events in MPEG-DASH using MPEG2-TS and ISO BMFF respectively, using emsg.

[ SCTE35 ~~]. In it,~~ ] section 9.1 describes the requirements for ~~splicing (Sec. 9.1) are~~ content splicing: "In order to give advance warning of the impending splice (a pre-roll function), the splice_insert() command could be sent multiple times before the splice point. For example, the splice_insert() command could be sent at 8, 5, 4 and 2 seconds prior to the packet containing the related splice point. In order to meet other splicing deadlines in the system, any message received with less than 4 seconds of advance notice may not create the desired result."

This places an implicit requirement on the user agent in handling media-timed events related to insertion cues. The content originator may provide the cue in advance with as little as 2 seconds of the insertion time. Therefore the propagation of the event data associated with the insertion cue to the application by the user agent should be considerably less than 2 seconds.

4.6 MPEG Working Draft on Carriage of Web Resources in ISO BMFF

The MPEG Working Draft on Carriage of Web Resources in ISO BMFF [ WEB-ISOBMFF ] is a draft document that specifies the use of the ISO BMFF container format for the storage and delivery of web content. The goal is to allow web resources (HTML, JavaScript, etc.) to be parsed from the storage and processed by a user agent at specific presentation times on the media timeline , and so be synchronized with other tracks within the container, such as audio, video, and subtitles.

The Media & Entertainment Interest Group is actively tracking this work is open to discussing specific requirements for media timed events as development progresses.

4.7 WebVTT

[ WEBVTT ] is a W3C specification that provides a format for web video text tracks. A VTTCue is a text track cue, and may have attributes that affect rendering of the cue text on a web page. WebVTT metadata cues are text that is aligned to the media timeline . Web applications can use VTTCue to schedule out-of-band metadata events by serializing the event data to a string format (JSON, for example) when creating the cue, and deserializing the data when the cue is triggered.

Web applications can also use VTTCue to trigger rendering of out-of-band delivered timed text cues, such as TTML or IMSC format captions.

5. Gap analysis

This section describes gaps in existing existing web platform capabilities needed to support the use cases and requirements described in this document. Where applicable, this section also describes how existing web platform features can be used as workarounds, and any associated limitations.

~~5.1 Synchronized event triggering~~

5.1.1 5.1 MPEG-DASH and ISO BMFF emsg events

The DataCue API has been previously discussed as a means to deliver in-band event data to web applications, but this is not implemented in all of the main browser engines. It is included in the 18 October 2018 HTML 5.3 draft [ HTML53-20181018 ], but is not included in [ HTML ]. See discussion here and notes on implementation status here .

WebKit supports a DataCue interface that extends HTML5 DataCue with two attributes to support non-text metadata, type and value.

Example 1

interface DataCue : TextTrackCue {
  attribute ArrayBuffer data; // Always empty
  // Proposed extensions.
  attribute any value;
  readonly attribute DOMString type;
};

type is a string identifying the type of metadata:

WebKit `DataCue` metadata types
`"com.apple.quicktime.udta"`	QuickTime User Data
`"com.apple.quicktime.mdta"`	QuickTime Metadata
`"com.apple.itunes"`	iTunes metadata
`"org.mp4ra"`	MPEG-4 metadata
`"org.id3"`	ID3 metadata

and value is an object with the metadata item key, data, and optionally a locale:

Example 2

value = {
  key: String
  data: String | Number | Array | ArrayBuffer | Object
  locale: String
}

Neither [ MSE-BYTE-STREAM-FORMAT-ISOBMFF ] nor [ INBANDTRACKS ] describe handling of emsg boxes.

On resource constrained devices such as smart TVs and streaming sticks, parsing media segments to extract event information leads to a significant performance penalty, which can have an impact on UI rendering updates if this is done on the UI thread. There can also be an impact on the battery life of mobile devices. Given that the media segments will be parsed anyway by the user agent, parsing in JavaScript is an expensive overhead that could be avoided.

[ HBBTV ] section 9.3.2 describes a mapping between the emsg fields described above and the TextTrack and DataCue APIs. A TextTrack instance is created for each event stream signalled in the MPD document (as identified by the schemeIdUri and value ), and the inBandMetadataTrackDispatchType TextTrack attribute contains the scheme_id_uri and value values. Because HbbTV devices include a native DASH client, parsing of the MPD document and creation of the TextTrack s is done by the user agent, rather than by application JavaScript code.

Editor's note 5.2 Synchronization of text track cue rendering To support DASH clients implemented in web applications, there is therefore either a need

Subtitles for ~~an API that allows applications to tell the UA~~ video are typically authored against video at a nominal frame rate, e.g., 25 frames per second, which ~~schemes it wants~~ corresponds to ~~receive, or the UA should simply expose all event streams~~ 40 milliseconds per frame. The actual video frame rate may be adjusted dynamically according to ~~applications. Which of these is preferred? 5.1.2 Synchronization and~~ the video encoding, but the subtitle timing must remain the same ([ EBU-TT-D ], Annex E).

This places a requirement on user agents for timely delivery of TextTrackCue and VTTCue events, so that application code can respond and render the cues. For subtitle rendering to be possible with frame accuracy, we recommend that cue events are fired within 20 milliseconds of their position on the media timeline .

The ~~timing guarantees provided~~ time marches on steps in [ HTML ] ~~regarding~~ control the ~~triggering~~ firing of cue events during media playback. Time marches on requires a TextTrackCue timeupdate ~~events may~~ event to be ~~not~~ fired at the HTMLMediaElement between 15 and 250 milliseconds since the last such event, and this requirement therefore specifies the rate at which time marches on is executed during playback. In practice it has been found that the timing varies between browser implementations.

There are two methods a web application can use to handle text track cues:

Add an oncuechange handler function to the TextTrack and inspect the track's activeCues list. Because activeCues contains the list of cues that are active at the time that time marches on is run, it is possible for cues to be ~~enough~~ missed by a web application using this method, where cues appear on the media timeline between successive executions of time marches on during media playback. This may occur if the cues have short duration, or by a long-running event handler function.
Add onenter and onexit handler functions to ~~avoid~~ each cue. The time marches on steps guarantee that enter and exit events ~~being missed .~~ will be fired for all cues, including those that appear on the media timeline between successive executions of time marches on during media playback. This method is only possible for cues created by the web application, i.e., VTTCue objects, and not cue objects created by the user agent.

An issue with handling of text track and data cue events in HbbTV was reported in 2013. HbbTV requires the user agent to implement an MPEG-DASH client, and so applications must use the first of the above methods for cue handling, which means that applications can miss cues as described above.

5.2 5.3 Synchronized rendering of web resources

Editor's note

Describe gaps relating to synchronized rendering of web resources. Can we define a generic web API for scheduling page changes synchronized to playing media? Related: [ css-animations-1 ], [ web-animations-1 ], [ css-transitions-1 ]. See also: https://github.com/bbc/VideoContext . Should this be in scope for the TF?

5.3 5.4 Rendering of web content embedded in media containers

There is no API for surfacing web content embedded in ISO BMFF containers into the browser (e.g., the HTMLCue proposal discussed at TPAC 2015 ).

Editor's note

Add more detail on what's required. Some questions / considerations:

Editor's note

Are the web resources intended to be handed to a web application for rendering, or direct rendering by the UA?
How do we guarantee that resources are delivered to the browser sufficiently ahead of time?
How does same-origin policy affect such resources?

6. Recommendations

This section describes recommendations from the Media & Entertainment Interest Group for the development of a generic media timed event ~~API.~~ API, and associated synchronization considerations.

6.1 Subscribing to event streams

The API should allow web applications to subscribe to receive specific event types. For example, to support DASH emsg and MPD events, the API should allow subscription by id and (optional) value. This is to make receiving events opt-in from the application point of view. The user agent should deliver only those events to a web application for which the application has subscribed. The API should also allow web applications to unsubscribe from specific event streams by event type.

6.2 Out-of-band events

To be able to handle out of band events, the API must allow web applications to create events to be added to the media timeline , to be triggered by the user agent. The API should allow the web application to provide all necessary parameters to define the event, including start and end times, event type, and data payload. The payload should be any data type (e.g., the set of types supported by the WebKit DataCue ). For DASH MPD events, the event type is defined by the id and (optional) value fields.

6.3 Event triggering

For those events that the application has subscribed to receive, the API ~~must:~~ should:

Generate a JavaScript event when an in-band media timed event is parsed from the media container or media stream (DAInty Mode 1) 1).
Generate a JavaScript ~~event~~ events when the current media playback position reaches the start time and the end time of a media timed event during playback (DAInty Mode 2). This applies equally to in-band events that the user agent has extracted from the media container, and out-of-band events added by the web application.

The API must provide guarantees that no events can be missed during linear playback of the media.

6.4 In-band event processing

We recommend updating [ INBANDTRACKS ] to describe handling of in-band media timed events supported on the web platform, following a registry approach with one specification per media format that describes the event details for that format.

6.5 MPEG-DASH events

We recommend that browser engines support MPEG-DASH emsg in-band events and MPD out-of-band events, as part of their support for the MPEG Common Media Application Format (CMAF) [ MPEGCMAF ].

6.6 Synchronization

~~The~~ In order to acheive greater synchronization accuracy between media playback and web content rendered by an application, the time marches on ~~algorithm~~ steps in [ HTML ] should be ~~reviewed and updated to ensure that events are delivered~~ modified to ~~the web application within~~ allow delivery of media timed event start time ~~constraints described elsewhere in this report.~~ and end time notifications within 20 milliseconds of their positions on the media timeline .

7. Acknowledgments

Thanks to Charles Lo, Nigel Megitt, Jon Piesing, and Rob Smith for their contributions to this document.

A. References

A.1 Normative references

[3GPP-INTERACTIVITY-TR]: TR 26.953: Interactivity Support for 3GPP-Based Streaming and Download Services (Release 15) . 3GPP. June 2018. URL: http://www.3gpp.org/ftp/Specs/archive/26_series/26.953/26953-f00.zip
[3GPP-INTERACTIVITY-WID]: SP-170796: New WID on 3GPP Service Interactivity . 3GPP. September 2017. URL: http://www.3gpp.org/ftp/tsg_sa/TSG_SA/TSGS_77/Docs/SP-170796.zip ~~[BBC-SUBTITLES]~~
[BBC-SUBTITLE]: Subtitle ~~Guidelines~~ Guidelines, Version 1.1.7 . BBC. May 2018. URL: ~~http://bbc.github.io/subtitle-guidelines/~~ https://bbc.github.io/subtitle-guidelines/
[DASH-EVENTING]: DASH Eventing and HTML5 . Giridhar Mandyam.February 2018. URL: https://www.w3.org/2011/webtv/wiki/images/a/a5/DASH_Eventing_and_HTML5.pdf
[DVB-DASH]: ~~DVB Document A168.~~ ETSI TS 103 285 Digital Video Broadcasting (DVB); MPEG-DASH Profile for Transport of ISO BMFF Based DVB Services over IP Based Networks . ~~DVB. November 2017.~~ European Telecommunications Standards Insitute. March 2018. URL: ~~https://www.dvb.org/resources/public/standards/a168_dvb_mpeg-dash_nov_2017.pdf~~ http://www.etsi.org/deliver/etsi_ts/103200_103299/103285/01.02.01_60/ts_103285v010201p.pdf
[EBU-TT-D]: EBU TECH 3380: "EBU-TT-D Subtitling Distribution Format" . European Broadcasting Union. URL: https://tech.ebu.ch/docs/tech/tech3380.pdf
[HBBTV]: HbbTV 2.0.2 Specification . HbbTV Association. 16 February 2018. URL: https://www.hbbtv.org/wp-content/uploads/2018/02/HbbTV_v202_specification_2018_02_16.pdf
[HBBTV-TESTS]: HbbTV Test Suite 2018-1 . HbbTV Association. 2018. URL: https://www.hbbtv.org/wp-content/uploads/2018/03/HbbTV-testcases-2018-1.pdf
[HLS-TIMED-METADATA]: Timed Metadata for HTTP Live Streaming . URL: https://developer.apple.com/library/archive/documentation/AudioVideo/Conceptual/HTTP_Live_Streaming_Metadata_Spec/Introduction/Introduction.html
[HTML]: HTML Standard . Anne van Kesteren; Domenic Denicola; Ian Hickson; Philip Jägenstedt; Simon Pieters. WHATWG. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[html51-20151008]: HTML 5.1 . Simon Pieters; Anne van Kesteren; Philip Jägenstedt; Domenic Denicola; Ian Hickson; Steve Faulkner; Travis Leithead; Erika Doyle Navara; Theresa O'Connor; Robin Berjon. W3C. 8 October 2015. W3C Working Draft. URL: https://www.w3.org/TR/2015/WD-html51-20151008/
[HTML53-20181018]: HTML 5.3 . Patricia Aas; Shwetank Dixit; Terence Eden; Bruce Lawson; Sangwhan Moon; Xiaoqian Wu; Scott O'Hara. W3C. 18 October 2018. W3C Working Draft. URL: https://www.w3.org/TR/2018/WD-html53-20181018/
[INBANDTRACKS]: Sourcing In-band Media Resource Tracks from Media Containers into HTML . Silvia Pfeiffer; Bob Lund. W3C. 26 April 2015. Unofficial Draft. URL: https://dev.w3.org/html5/html-sourcing-inband-tracks/
[ISOBMFF]: Information technology — Coding of audio-visual objects — Part 12: ISO Base Media File Format . ISO/IEC. December 2015. International Standard. URL: http://standards.iso.org/ittf/PubliclyAvailableStandards/c068960_ISO_IEC_14496-12_2015.zip
[MPEGCMAF]: ISO/IEC 23000-19:2018 Information technology -- Multimedia application format (MPEG-A) -- Part 19: Common media application format (CMAF) for segmented media . ISO/IEC. January 2018. URL: https://www.iso.org/standard/71975.html
[MPEGDASH]: ISO/IEC 23009-1:2014 Information technology -- Dynamic adaptive streaming over HTTP (DASH) -- Part 1: Media presentation description and segment formats . ISO/IEC. URL: http://standards.iso.org/ittf/PubliclyAvailableStandards/c065274_ISO_IEC_23009-1_2014.zip
[MSE-BYTE-STREAM-FORMAT-ISOBMFF]: ISO BMFF Byte Stream Format . Matthew Wolenetz; Jerry Smith; Mark Watson; Aaron Colwell; Adrian Bateman. W3C. 4 October 2016. W3C Note. URL: https://www.w3.org/TR/mse-byte-stream-format-isobmff/
[RFC8216]: HTTP Live Streaming . R. Pantos, Ed.; W. May. IETF. August 2017. Informational. URL: https://tools.ietf.org/html/rfc8216
[SCTE214-1]: MPEG DASH for IP-Based Cable Services, Part 1: MPD Constraints and Extensions . ANSI/SCTE. URL: https://www.scte.org/SCTEDocs/Standards/ANSI_SCTE%20214-1%202015.pdf
[SCTE214-2]: MPEG DASH for IP-Based Cable Services, Part 2: DASH/TS Profile . ANSI/SCTE. URL: https://www.scte.org/SCTEDocs/Standards/ANSI_SCTE%20214-2%202015.pdf
[SCTE214-3]: MPEG DASH for IP-Based Cable Services, Part 3: DASH/FF Profile . ANSI/SCTE. URL: https://www.scte.org/SCTEDocs/Standards/ANSI_SCTE%20214-3%202015.pdf
[SCTE35]: Digital Program Insertion Cueing Message for Cable . ANSI/SCTE. URL: https://www.scte.org/SCTEDocs/Standards/SCTE%2035%202016.pdf
[WEB-ISOBMFF]: ISO/IEC JTC1/SC29/WG11 N16944 Working Draft on Carriage of Web Resources in ISOBMFF . Thomas Stockhammer; Cyril Concolato. MPEG. July 2017. URL: https://mpeg.chiariglione.org/standards/mpeg-4/timed-text-and-other-visual-overlays-iso-base-media-file-format/wd-carriage-web
[WebVMT]: WebVMT: The Web Video Map Tracks Format . Rob Smith. W3C. 11 October 2018. W3C Editor's Draft. URL: https://w3c.github.io/sdw/proposals/geotagging/webvmt/
[WEBVTT]: WebVTT: The Web Video Text Tracks Format . Silvia Pfeiffer. W3C. 10 May 2018. W3C Candidate Recommendation. URL: https://www.w3.org/TR/webvtt1/

A.2 Informative references

[css-animations-1]: CSS Animations Level 1 . Dean Jackson; David Baron; Tab Atkins Jr.; Brian Birtles. W3C. 11 October 2018. W3C Working Draft. URL: https://www.w3.org/TR/css-animations-1/
[css-transitions-1]: CSS Transitions . David Baron; Dean Jackson; Brian Birtles; David Hyatt. W3C. 11 October 2018. W3C Working Draft. URL: https://www.w3.org/TR/css-transitions-1/
[web-animations-1]: Web Animations . Brian Birtles; Robert Flack; Stephen McGruer; Antoine Quint; Shane Stephens; Alex Danilo; Tab Atkins Jr.. W3C. 11 October 2018. W3C Working Draft. URL: https://www.w3.org/TR/web-animations-1/
[WEB-MEDIA-GUIDELINES]: Web Media Application Developer Guidelines 2018 . Joel Korpi; Thasso Griebel; Jeff Burtoft. W3C. 26 April 2018. CG-DRAFT. URL: https://w3c.github.io/webmediaguidelines/

Media Timed Events

W3C Interest Group Note 28 January 13 February 2019

Abstract

Status of This Document

1. Introduction

2. Terminology

3. Use cases

3.1 Dynamic content insertion

3.2 Audio stream with titles and images

3.2 3.3 MPEG-DASH manifest expiry notifications

3.4 Subtitle and caption rendering synchronization

3.3 3.5 Synchronized map animations

3.4 3.6 Media analysis visualization

3.5 3.7 Presentation of auxiliary content in live media

4. Related industry standards specifications

4.1 MPEG Common Media Application Format (CMAF)

4.2 MPEG-DASH

4.2 4.3 HbbTV

4.3 4.4 DASH Industry Forum APIs for Interactivity

4.4 4.5 BBC Subtitle Guidelines SCTE-35

4.6 MPEG Working Draft on Carriage of Web Resources in ISO BMFF

4.7 WebVTT

5. Gap analysis

5.1.1 5.1 MPEG-DASH and ISO BMFF emsg events

Editor's note 5.2 Synchronization of text track cue rendering To support DASH clients implemented in web applications, there is therefore either a need

5.2 5.3 Synchronized rendering of web resources

5.3 5.4 Rendering of web content embedded in media containers

6. Recommendations

6.1 Subscribing to event streams

6.2 Out-of-band events

6.3 Event triggering

6.4 In-band event processing

6.5 MPEG-DASH events

6.6 Synchronization

7. Acknowledgments

A. References

A.1 Normative references

A.2 Informative references