Carriage of ID3 Timed Metadata in the Common Media Application Format (CMAF)

v1.0.0

AOM Final Deliverable,

This version:
https://AOMediaCodec.github.io/id3-emsg
Issue Tracking:
GitHub
Editors:
Not Ready For Implementation

This spec is not yet ready for implementation. It exists in this repository to record the ideas and promote discussion.

Before attempting to implement this spec, please contact the editors.

Copyright 2020, AOM

Licensing information is available at http://aomedia.org/license/

The MATERIALS ARE PROVIDED “AS IS.” The Alliance for Open Media, its members, and its contributors expressly disclaim any warranties (express, implied, or otherwise), including implied warranties of merchantability, non-infringement, fitness for a particular purpose, or title, related to the materials. The entire risk as to implementing or otherwise using the materials is assumed by the implementer and user. IN NO EVENT WILL THE ALLIANCE FOR OPEN MEDIA, ITS MEMBERS, OR CONTRIBUTORS BE LIABLE TO ANY OTHER PARTY FOR LOST PROFITS OR ANY FORM OF INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES OF ANY CHARACTER FROM ANY CAUSES OF ACTION OF ANY KIND WITH RESPECT TO THIS DELIVERABLE OR ITS GOVERNING AGREEMENT, WHETHER BASED ON BREACH OF CONTRACT, TORT (INCLUDING NEGLIGENCE), OR OTHERWISE, AND WHETHER OR NOT THE OTHER MEMBER HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


Abstract

This specification defines how ID3 metadata can be carried as timed metadata in Common Media Application Format (CMAF) compatible fragmented MP4 streams using Event Message ('emsg') and Event Message Instance ('emib') boxes.

1. Introduction

HTTP Live Streaming (HLS) [HLS] supports the inclusion of timed metadata in ID3 format [ID3] in various container formats, as described in [TM-HLS].

A large ecosystem has built up around carrying timed ID3 metadata in HLS for applications such as ad delivery and audience measurement. There are many benefits to adopting CMAF for HLS media delivery, but without a specification for carrying ID3 as sparse timed metadata in CMAF, companies in this ecosystem cannot deploy it.

This specification describes how such ID3 metadata can be carried as timed metadata in a CMAF-compatible fragmented MP4 (fMP4) stream [CMAF] as used by the HLS protocol.

CMAF-compatible fragmented MP4 can also be used in DASH. The elements defined in this specification may also be used with DASH.

The specification also describes how ID3 metadata can be carried in a CMAF-compatible event message track as described in [EMSG-TRACK].

2. Timed Metadata in a CMAF-compatible stream

2.1. Overview

Timed Metadata in a CMAF-compatible stream can be signaled via one or more Event Message boxes (emsg) [CMAF] per segment. Timed Metadata can also be signaled via an Event Message Instance Box (emib) as defined in [EMSG-TRACK].

Event messages that use the scheme defined in this document identify boxes that carry ID3v2 metadata [ID3].

2.2. ID3 Metadata in an Event Message Box

2.2.1. Introduction

One or more Event Message boxes (emsg) [CMAF] can be included per segment. Version 1 of the Event Message box [DASH] must be used.

2.2.2. Syntax

For convenience, the following box definition is reproduced from [DASH], section 5.10.3.3.3.

aligned(8) class DASHEventMessageBox extends FullBox('emsg', version, flags = 0)
{
  if (version==0) {
    string              scheme_id_uri;
    string              value;
    unsigned int(32)    timescale;
    unsigned int(32)    presentation_time_delta;
    unsigned int(32)    event_duration;
    unsigned int(32)    id;
  } else if (version==1) {
    unsigned int(32)    timescale;
    unsigned int(64)    presentation_time;
    unsigned int(32)    event_duration;
    unsigned int(32)    id;
    string              scheme_id_uri;
    string              value;
  }
  unsigned int(8) message_data[];
}
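
As an informal illustration of the layout above, the following Python sketch parses a single emsg box from a byte string. The names EventMessage, read_cstring, and parse_emsg are hypothetical and are not defined by [DASH] or by this specification. The sketch also assumes a 32-bit box size (no largesize) and UTF-8 encoded strings.

```python
import struct
from dataclasses import dataclass
from typing import Optional


@dataclass
class EventMessage:  # hypothetical container, not defined by any spec
    version: int
    scheme_id_uri: str
    value: str
    timescale: int
    presentation_time: Optional[int]        # present in version 1 only
    presentation_time_delta: Optional[int]  # present in version 0 only
    event_duration: int
    id: int
    message_data: bytes


def read_cstring(buf, pos):
    """Read a null-terminated UTF-8 string; return (text, next position)."""
    end = buf.index(b"\x00", pos)
    return buf[pos:end].decode("utf-8"), end + 1


def parse_emsg(box):
    """Parse one emsg box. Assumes a 32-bit size field (no largesize)."""
    size, box_type = struct.unpack_from(">I4s", box, 0)
    if box_type != b"emsg":
        raise ValueError("not an emsg box")
    version = box[8]
    pos = 12  # 4 bytes size + 4 bytes type + 1 byte version + 3 bytes flags
    ptime = delta = None
    if version == 0:
        # Version 0 puts the two strings before the numeric fields.
        scheme, pos = read_cstring(box, pos)
        value, pos = read_cstring(box, pos)
        timescale, delta, duration, event_id = struct.unpack_from(">IIII", box, pos)
        pos += 16
    elif version == 1:
        # Version 1 puts the numeric fields first, with a 64-bit time.
        timescale, ptime, duration, event_id = struct.unpack_from(">IQII", box, pos)
        pos += 20
        scheme, pos = read_cstring(box, pos)
        value, pos = read_cstring(box, pos)
    else:
        raise ValueError("unsupported emsg version %d" % version)
    # Everything up to the end of the box is message_data.
    return EventMessage(version, scheme, value, timescale, ptime, delta,
                        duration, event_id, box[pos:size])
```

A caller would typically check that the returned version is 1 and that scheme_id_uri matches the value required by this specification before handing message_data to an ID3 parser.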

2.2.3. Semantics

scheme_id_uri MUST be set to https://aomedia.org/emsg/ID3 to identify ID3v2 metadata [ID3].

value may be either an absolute or a relative user-specified URI that defines the semantics of the id field. Any relative URI is considered to be relative to the scheme_id_uri.

message_data MUST contain complete ID3 version 2.4 data [ID3].
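
One possible way to sanity-check this requirement is sketched below in Python. It inspects only the 10-byte ID3v2 tag header (the "ID3" identifier, the major version, the flags, and the synchsafe size) and confirms that the buffer is long enough to hold the whole tag. The function name id3v24_tag_length is hypothetical, and the sketch does not validate individual frames.

```python
def id3v24_tag_length(data):
    """Return the total length in bytes of the ID3v2.4 tag at the start of
    data, or raise ValueError if the header is invalid or data is truncated.

    Hypothetical helper: checks the tag header only, not the frames.
    """
    if len(data) < 10 or data[:3] != b"ID3":
        raise ValueError("missing ID3v2 header")
    major, flags = data[3], data[5]
    if major != 4:
        raise ValueError("not ID3 version 2.4 (major version %d)" % major)
    size_bytes = data[6:10]
    if any(b & 0x80 for b in size_bytes):
        raise ValueError("tag size is not a synchsafe integer")
    # Synchsafe integer: 7 significant bits per byte, most significant first.
    size = ((size_bytes[0] << 21) | (size_bytes[1] << 14)
            | (size_bytes[2] << 7) | size_bytes[3])
    # The size excludes the 10-byte header and the optional 10-byte footer
    # (footer presence is flag bit 0x10 in ID3v2.4).
    total = 10 + size + (10 if flags & 0x10 else 0)
    if len(data) < total:
        raise ValueError("ID3 tag is truncated")
    return total
```

A packager could call this on message_data before writing the box and reject payloads for which it raises.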

In general, ID3 tags do not carry a duration; in those cases the event_duration field should be set to 0xFFFFFFFF. If a particular ID3 message does carry a duration, that duration should be reflected in the event_duration field.

The presentation_time must be within the time interval of the fragment.

The id field is not restricted in this version of the specification.
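
The semantics above can be sketched as a small Python serializer that wraps an ID3 tag in a version 1 emsg box. This is a minimal sketch and not a reference implementation: the function name build_id3_emsg and its parameters are hypothetical, the ID3 payload is treated as opaque bytes, and the caller is assumed to supply a presentation_time that lies within the fragment.

```python
import struct

ID3_SCHEME_ID_URI = "https://aomedia.org/emsg/ID3"
UNKNOWN_DURATION = 0xFFFFFFFF  # used when the ID3 message carries no duration


def build_id3_emsg(id3_tag, timescale, presentation_time, event_id=0,
                   value="", event_duration=UNKNOWN_DURATION):
    """Serialize a version 1 emsg box carrying id3_tag (hypothetical helper)."""
    # Version 1 layout: numeric fields first, then two null-terminated strings.
    body = struct.pack(">IQII", timescale, presentation_time,
                       event_duration, event_id)
    body += ID3_SCHEME_ID_URI.encode("utf-8") + b"\x00"
    body += value.encode("utf-8") + b"\x00"
    body += id3_tag
    # FullBox header: 32-bit size, 'emsg', version = 1, 24-bit flags = 0.
    header = struct.pack(">I4sB3s", 12 + len(body), b"emsg", 1, b"\x00\x00\x00")
    return header + body
```

For instance, build_id3_emsg(tag, timescale=90000, presentation_time=900000) would place the event 10 seconds into the presentation timeline with an unknown duration.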

2.3. ID3 Metadata in an Event Message Instance Box

2.3.1. Introduction

In an event message track, one or more Event Message Instance Boxes (emib) can be included per segment.

2.3.2. Syntax

For convenience, the following box definition is reproduced from [EMSG-TRACK], section 6.1.2.

aligned(8) class EventMessageInstanceBox extends FullBox('emib', version, flags = 0)
{
  unsigned int(32)    reserved = 0;
  signed int(64)      presentation_time_delta;
  unsigned int(32)    event_duration;
  unsigned int(32)    id;
  string              scheme_id_uri;
  string              value;
  unsigned int(8)     message_data[];
}

2.3.3. Semantics

The id, scheme_id_uri, value, and message_data semantics are identical to the Event Message Box semantics.

The presentation_time_delta semantics defined in [EMSG-TRACK] remain unchanged.

In general, ID3 tags do not carry a duration; in those cases the event_duration field should be set to 0. If a particular ID3 message does carry a duration, that duration should be reflected in the event_duration field.
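
As with the Event Message box, the Event Message Instance Box can be illustrated with a short Python sketch. The function name build_id3_emib is hypothetical, the box version is assumed to be 0, and the caller is assumed to compute presentation_time_delta according to [EMSG-TRACK]. Note the signed 64-bit delta and the default event_duration of 0.

```python
import struct

ID3_SCHEME_ID_URI = "https://aomedia.org/emsg/ID3"


def build_id3_emib(id3_tag, presentation_time_delta, event_id=0,
                   value="", event_duration=0):
    """Serialize an emib box carrying id3_tag (hypothetical helper).

    Assumes box version 0; event_duration defaults to 0 for ID3 data
    that carries no duration.
    """
    # reserved (32 bits, zero), signed 64-bit delta, duration, id.
    body = struct.pack(">IqII", 0, presentation_time_delta,
                       event_duration, event_id)
    body += ID3_SCHEME_ID_URI.encode("utf-8") + b"\x00"
    body += value.encode("utf-8") + b"\x00"
    body += id3_tag
    # FullBox header: 32-bit size, 'emib', version (assumed 0), flags = 0.
    header = struct.pack(">I4sB3s", 12 + len(body), b"emib", 0, b"\x00\x00\x00")
    return header + body
```

Because presentation_time_delta is signed, a negative value such as -3000 serializes correctly with the ">q" format, which would not be the case if the field were packed as unsigned.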

2.4. Signaling

Files compliant with this specification should signal this using the brand aid3 in the list of compatible brands in the file type box. Manifest formats that reference files compliant with this specification may signal these files using the following URN: urn:aomedia:cmaf:id3.
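
A reader could detect the brand signaling described above along the following lines. This Python sketch assumes the standard file type box layout (major brand, minor version, then a list of four-character compatible brands) and a 32-bit box size; the function names compatible_brands and signals_id3 are hypothetical.

```python
import struct


def compatible_brands(ftyp_box):
    """Return the compatible brands listed in an ftyp box as 4-byte values.

    Hypothetical helper; assumes a 32-bit size field (no largesize).
    """
    size, box_type = struct.unpack_from(">I4s", ftyp_box, 0)
    if box_type != b"ftyp":
        raise ValueError("not an ftyp box")
    # Skip size (4), type (4), major_brand (4), and minor_version (4).
    brands = ftyp_box[16:size]
    return [brands[i:i + 4] for i in range(0, len(brands) - 3, 4)]


def signals_id3(ftyp_box):
    """True if the file type box lists the aid3 compatible brand."""
    return b"aid3" in compatible_brands(ftyp_box)
```

Since the brand is only a "should" in this specification, a reader may still want to inspect emsg boxes for the scheme_id_uri even when aid3 is absent.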

Conformance

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

References

Normative References

[CMAF]
Information technology - Multimedia application format (MPEG-A) - Part 19: Common media application format (CMAF) for segmented media. Standard. URL: http://www.iso.org/iso/catalogue_detail?csnumber=71975
[DASH]
Information technology - Dynamic adaptive streaming over HTTP (DASH) - Part 1: Media presentation description and segment formats. Standard. URL: https://www.iso.org/standard/65274.html
[EMSG-TRACK]
Information technology - MPEG systems technologies - Part 18: Event message track format for the ISO base media file format. Standard. URL: https://www.iso.org/standard/82529.html
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://datatracker.ietf.org/doc/html/rfc2119

Informative References

[HLS]
HTTP Live Streaming. Standard. URL: https://tools.ietf.org/html/rfc8216
[ID3]
The ID3 audio file data tagging format. Standard. URL: http://www.id3.org/Developer_Information
[TM-HLS]
Timed Metadata for HTTP Live Streaming. Documentation. URL: https://developer.apple.com/library/archive/documentation/AudioVideo/Conceptual/HTTP_Live_Streaming_Metadata_Spec/Introduction/Introduction.html