This specification defines DAPT, a TTML-based file format for the exchange of timed text content in dubbing and audio description workflows.
Status of This Document
This is a preview.
Do not attempt to implement this version of the specification. Do not
reference this version as authoritative in any way.
Instead, see
https://w3c.github.io/dapt/ for the Editor's Draft.
This section describes the status of this
document at the time of its publication. A list of current W3C
publications and the latest revision of this technical report can be found
in the W3C technical reports index at
https://www.w3.org/TR/.
Publication as an Editor's Draft does not
imply endorsement by W3C and its Members.
This is a draft document and may be updated, replaced or obsoleted by other
documents at any time. It is inappropriate to cite this document as other
than work in progress.
This document was produced by a group
operating under the
W3C Patent
Policy.
W3C maintains a
public list of any patent disclosures
made in connection with the deliverables of
the group; that page also includes
instructions for disclosing a patent. An individual who has actual
knowledge of a patent which the individual believes contains
Essential Claim(s)
must disclose the information in accordance with
section 6 of the W3C Patent Policy.
This specification defines a text-based profile of the Timed Text Markup Language version 2.0 [TTML2]
intended to support dubbing and audio description workflows worldwide,
to meet the requirements defined in DAPT Requirements, and to permit usage of visual presentation
features within [TTML2] and its profiles, for example those in [ttml-imsc1.2].
2. Introduction
This section is non-normative.
Creating a dub is a complex, multi-step process that involves:
Transcribing and timing the dialogue in the original language from a completed programme to create a source transcription text
Notating dialogue with character information and other annotations
Generating localization notes to guide further adaptation
Translating the dialogue to a target language
Adapting the translation to the dubbing and subtitling specifications, e.g. matching the actor's lip movements in the case of dubs, and considering reading speeds and shot changes.
The result of creating a dubbing script can be useful as a starting point for creation of subtitles or closed captions in alternate languages.
This specification is designed to facilitate the addition of, and conversion to,
subtitle and caption documents in other profiles of TTML, such as [ttml-imsc1.2],
for example by permitting subtitle styling syntax to be carried in DAPT documents.
Alternatively, styling can be applied to assist voice artists when recording scripted dialogue or audio descriptions.
Creating audio description content is also a complex process with similar steps.
An audio description, also known as video description, is an audio service
to assist viewers who cannot fully see a visual presentation to understand the content.
It is the result of the audio rendition of one or more descriptions
mixed with the audio associated with the programme prior to any mixing with audio description
(sometimes referred to as main programme audio),
at moments when this does not clash with dialogue, to deliver an audio description mixed audio track.
A description is a set of words that describe an aspect of the programme presentation,
suitable for rendering into audio by means of vocalisation and recording
or used as a text alternative source for text to speech translation, as defined in [WCAG20].
More information about what audio description is and how it works can be found at [WHP051].
Editor's note
Improve introduction to describe audio description steps and how it relates to dubbing. Maybe reuse the diagrams from the requirements document, or link the requirements doc.
This specification defines DAPT, a TTML-based format for the exchange of timed text content among authoring and prompting tools in the localization and audio description pipeline.
A DAPT file is designed to carry pertinent information for dubbing or audio description such as type of script, dialogue, descriptions, timing, metadata, original language text, transcribed text, language information, and to be extensible for future annotations.
This specification first defines the data model (see 4. DAPT Data Model and corresponding TTML syntax) for DAPT scripts and then its representation as a TTML document with restrictions (see #ttml-format).
2.1 Example documents
2.1.1 Basic document structure
The top level structure of a document is as follows. The xmlns attribute and the tt element indicate that this is a TTML document and the contentProfiles attribute indicates that it adheres to the DAPT profile defined in this specification. The daptm:scriptType attribute indicates the type of script but in this empty example, it is not relevant, as only the structure of the document is shown. The structure is applicable to all types of scripts, dubbing or audio description.
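A minimal sketch of this structure follows. It is illustrative only, not a normative example: the daptm namespace URI and the placement of daptm:scriptType on a metadata element in the head are assumptions, and the xml:lang and scriptType values are arbitrary.

<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt"
    xml:lang="en">
  <head>
    <!-- the scriptType value here is arbitrary; the structure is the same for all script types -->
    <metadata daptm:scriptType="AUDIO_DESCRIPTION"/>
  </head>
  <body/>
</tt>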
The following examples correspond to the timed text scripts produced
at each stage of the workflow described in [DAPT-REQS].
The first example shows a script where timed opportunities for descriptions
or transcriptions have been identified but no text has been written:
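A minimal sketch follows (body only; the times and identifiers are invented for illustration):

<body>
  <!-- timed opportunities identified, no text written yet -->
  <div xml:id="e1" begin="10s" end="13s"/>
  <div xml:id="e2" begin="18s" end="20.5s"/>
  <div xml:id="e3" begin="57.1s" end="62s"/>
</body>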
The following examples will demonstrate different uses in dubbing and audio description workflows.
2.1.2 Audio Description Examples
When descriptions are added, this becomes a Pre-Recording Script:
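A sketch follows (body only; text, times and language are illustrative, and the daptm:langSrc attribute anticipates the Text Language Source annotation defined in 4.5):

<body>
  <div xml:id="e1" begin="10s" end="13s">
    <p xml:lang="en" daptm:langSrc="original">A man walks into the room.</p>
  </div>
  <div xml:id="e2" begin="18s" end="20.5s">
    <p xml:lang="en" daptm:langSrc="original">He looks out of the window.</p>
  </div>
</body>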
After creating audio recordings, if not using text to speech, instructions for playback
mixing can be inserted. For example, the gain of "received" audio can be changed before mixing in
the audio played from inside the span, smoothly
animating the value on the way in and returning it on the way out:
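A sketch of what this could look like follows. It assumes TTML2's animate element and the tta:gain audio styling attribute (tta bound to the TTML2 audio styling namespace); the gain values, times and audio file name are illustrative only.

<div xml:id="e1" begin="25s" end="28s"
     xmlns:tta="http://www.w3.org/ns/ttml#audio">
  <!-- dip the received audio gain from 1 to 0.39 over the first 0.3s and hold it -->
  <animate begin="0s" end="0.3s" calcMode="linear" fill="freeze" tta:gain="1;0.39"/>
  <!-- restore the gain over the final 0.3s of the description -->
  <animate begin="2.7s" end="3s" calcMode="linear" tta:gain="0.39;1"/>
  <p>
    <!-- the recorded description starts only after the first dip has completed -->
    <span begin="0.3s"><audio src="description1.wav"/>A man walks into the room.</span>
  </p>
</div>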
In the above example, the div element's
begin time becomes the "syncbase" for its child,
so the times on the animate and span
elements are relative to 25s here.
The first animate element drops the gain from 1
to 0.39 over 0.3s, freezing that value after it ends,
and the second one raises it back in the
final 0.3s of this description. Then the span is
timed to begin only after the first audio dip has finished.
If the audio recording is long and just a snippet needs to be played,
that can be done using clipBegin and clipEnd.
If we just want to play the part of the audio file from 5s to
8s, it would look like:
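One possible sketch, assuming TTML2's clipBegin and clipEnd attributes on the audio element (the file name and times are illustrative):

<span begin="0.3s">
  <!-- play only the 5s-8s portion of the referenced recording -->
  <audio src="description1.wav" clipBegin="5s" clipEnd="8s"/>
  A man walks into the room.
</span>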
Or audio attributes can be added to trigger the text to be spoken:
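A sketch assuming TTML2's tta:speak audio styling attribute; the attribute value shown is an assumption, not defined in this excerpt:

<p xml:lang="en" daptm:langSrc="original" tta:speak="normal">A man walks into the room.</p>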
2.1.3 Dubbing Examples
From the basic structure of Example 1, a transcription of the audio programme produces an original language dubbing script,
which can look as follows. No specific style or layout is defined, and here the focus is on the transcription of the dialogue.
Characters are identified in the metadata section.
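A sketch follows. It is illustrative rather than normative: the daptm namespace URI, the placement of daptm:scriptType, the character name and the frame rate are assumptions; the event identifier, timing and dialogue match the example shown later in 4.4.

<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttm="http://www.w3.org/ns/ttml#metadata"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt"
    ttp:frameRate="25"
    xml:lang="pt-BR">
  <head>
    <metadata daptm:scriptType="DUBBING_ORIGINAL_DIALOGUE_LIST">
      <ttm:agent type="character" xml:id="character_3">
        <ttm:name type="alias">CHARACTER 3</ttm:name>
      </ttm:agent>
    </metadata>
  </head>
  <body>
    <div xml:id="event_3" begin="9663f" end="9682f" ttm:agent="character_3">
      <p xml:lang="pt-BR" daptm:langSrc="original">Você vai ter.</p>
    </div>
  </body>
</tt>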
After translating the text, the document is modified. It includes translation text, and
in this case the original text is preserved. The main document language is changed to indicate
that the focus is on the translated language:
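A sketch of the changed parts of the same document follows (illustrative only; the target language fr is arbitrary):

<!-- on the tt element: xml:lang="fr", the translated (target) language -->
<metadata daptm:scriptType="DUBBING_TRANSLATED_DIALOGUE_LIST">
  <ttm:agent type="character" xml:id="character_3">
    <ttm:name type="alias">CHARACTER 3</ttm:name>
  </ttm:agent>
</metadata>
...
<div xml:id="event_3" begin="9663f" end="9682f" ttm:agent="character_3">
  <p xml:lang="pt-BR" daptm:langSrc="original">Você vai ter.</p>
  <p xml:lang="fr" daptm:langSrc="translation">Bah, il arrive.</p>
</div>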
The process of adaptation, before recording, could adjust the wording and/or add further timing to assist in the recording.
The daptm:scriptType attribute is also modified, as in the following example:
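A sketch of the adapted event follows; the reworded French text is invented for illustration:

<!-- on the metadata element in the head: daptm:scriptType="DUBBING_PRE_RECORDING" -->
<div xml:id="event_3" begin="9663f" end="9682f" ttm:agent="character_3">
  <p xml:lang="pt-BR" daptm:langSrc="original">Você vai ter.</p>
  <p xml:lang="fr" daptm:langSrc="translation">Bah, il est en route.</p>
</div>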
3. Documentation Conventions
This document uses the same conventions as [TTML2] for the specification of parameter attributes, styling attributes and metadata elements. In particular:
Section 2.3 of [TTML2] specifies conventions used in the [XML] representation of elements; and
Sections 6.2 and 8.2 of [TTML2] specify conventions used when specifying the syntax of attribute values.
This specification uses Feature designations as defined in Appendix E of [TTML2]:
when making reference to content conformance, these designations refer to the syntactic expression or the semantic capability associated with each designated Feature; and
when making reference to Processor conformance, these designations refer to processing requirements associated with each designated Feature.
If the name of an element referenced in this specification is not namespace qualified, then the TT namespace applies (see 5.3 Namespaces).
4. DAPT Data Model and corresponding TTML syntax
This section specifies the data model for DAPT and its corresponding TTML syntax. In the model, there are objects which can have properties and be associated with other objects. In the TTML syntax, these objects and properties are expressed as elements and attributes.
Figure 1
Class diagram showing main entities in the DAPT data model.
The definitions of the types of documents and the corresponding daptm:scriptType values are:
Original Language Dialogue List:
When the daptm:scriptType value is DUBBING_ORIGINAL_DIALOGUE_LIST,
the document is a literal transcription of the dialogue and on-screen text in their original spoken/written language(s).
Translated Dialogue List:
When the daptm:scriptType value is DUBBING_TRANSLATED_DIALOGUE_LIST,
the document represents a translation of the Original Language Dialogue List in a common language.
It can be adapted to produce a Pre-Recording Dub Script,
and/or used as the basis for producing translation subtitles,
or used as the basis for a further translation into the Target Recording Language.
Pre-Recording Dub Script:
When the daptm:scriptType value is DUBBING_PRE_RECORDING,
the document represents the result of the adaptation of a Translated Dialogue List for dubbing,
e.g. for better lip-sync.
Audio Description Script:
When the daptm:scriptType value is AUDIO_DESCRIPTION,
the document represents the script text, timing, optional links to audio, and mixing instructions
for the purpose of producing an audio description audio track.
The Primary Language is a mandatory property of a DAPT Script
which represents the default language for the Text content of Script Events.
It is represented in TTML with the following structure and constraints:
the xml:lang attribute MUST be present on the tt element and its value MUST NOT be empty.
Note
All text content in a DAPT Script has a specified language.
When multiple languages are used, the Primary Language can correspond to the language of the majority of Script Events,
to the language being spoken for the longest duration, or to the language arbitrarily chosen by the author.
In Dubbing Scripts, it is necessary to identify each character in the programme. This is done with a Character object which has the following properties:
a mandatory Identifier
which is a unique identifier used to reference the character from elsewhere in the document,
for example to indicate when a Character participates in a Script Event,
or to link a Character Style to its Character.
a mandatory Name which is the name of the Character in the programme
an optional Talent Name, which is the name of the actor speaking dialogue for this Character
zero or more Character Style objects
which can be applied to control the visual appearance of Script Events spoken by the Character,
for example during recording by an actor or when transforming the script into subtitles.
A Character is represented in TTML with the following structure and constraints:
There MUST be a ttm:agent element, child of the metadata element that indicates the Script Type in the head element, with the following constraints:
The type attribute MUST be set to character.
The xml:id attribute MUST be present on the ttm:agent and set to the Character Identifier.
The ttm:agent MUST contain a ttm:name element with its type attribute set to alias and its content set to the Character Name.
If the Character has a Talent Name property, the following TTML constraints apply:
Another ttm:agent MUST be added to the metadata element that indicates the Script Type in the head element, with the following:
its type attribute MUST be set to person
its xml:id attribute MUST be set.
it MUST have a ttm:name child element whose type attribute MUST be set to full and its content set to the Talent Name
the ttm:agent whose type is set to character MUST have a ttm:actor child, with its agent attribute set to the xml:id of the ttm:agent whose type is set to person.
the ttm:agent whose type is set to character should appear before the ttm:agent whose type is set to person.
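For illustration, a Character with a Talent Name could be represented as follows (the names and identifiers are hypothetical):

<ttm:agent type="character" xml:id="character_1">
  <ttm:name type="alias">DETECTIVE</ttm:name>
  <ttm:actor agent="actor_1"/>
</ttm:agent>
<ttm:agent type="person" xml:id="actor_1">
  <ttm:name type="full">Jane Doe</ttm:name>
</ttm:agent>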
A Character Style is represented in TTML with the following structure and constraints:
Each Character Style is represented by one or more <style> element children of the <styling> element.
Each such <style> element is associated with the Character by having a ttm:agent attribute
whose value is the xml:id of the <ttm:agent> element representing the Character.
A Script Event MAY apply Character Styles by including the xml:id of each style
in the style attribute of the <div> element that defines that Script Event.
A Text object MAY apply Character Styles by including the xml:id of each style
in the style attribute of the <p> element that defines that Text object.
A Text object SHOULD NOT apply a Character Style for a Character that is not associated with that Text object's Script Event.
Note
Any style attribute defined in [TTML2] or [ttml-imsc1.2]
(or other profiles using non-W3C namespaces) can be present on the <style> element.
A <style> element MAY omit the ttm:agent attribute if it is not associated with a Character.
Such styles MAY be applied in the same way as any other style, via a reference in the style attribute.
Character Styles are applied to Script Events and Text by using the style attribute to specify the set of applicable styles.
Presentation Processors MUST NOT apply Character Styles to text if they are not specified using the style attribute.
Editor's note
We should define our own classes of conformant implementation types, to avoid using the generic "presentation processor" or "transformation processor" ones. We could link to them.
At the moment, I can think of the following classes:
DAPT Authoring Tool: a tool that produces or consumes compliant DAPT documents. I don't think these map to TTML2 processors.
DAPT Audio Recorder/Renderer: a tool that takes DAPT Audio Description scripts, e.g. with mixing instructions, and produces audio output, e.g. a WAVE file. I think it is a "presentation processor".
DAPT Validator: a tool that verifies that a DAPT document is compliant with the specification. I'm not sure what it maps to in TTML2 terminology.
Editor's note
Styles are used for characters, texts and contextual texts. From my reading, they are essentially the same property for all three "contexts" where they are used, but they have a different name in each context (Style, Text Styles and Contextual Text Styles). This could be represented as one property (e.g. Style) that has a cardinality of 1..* and references a Style object (with style features, e.g. color). The association is implied by the object it belongs to.
4.3 Script Event
A Script Event object represents dialogue, on screen text or audio descriptions to be spoken and has the following properties:
A mandatory Script Event Identifier which is unique in the script
An optional Begin property, an optional End property, and an optional Duration property
that together define the Script Event's time interval in the programme timeline
Note
Typically Script Events do not overlap in time. However, there can be cases where they do, e.g. in Dubbing Scripts when different Characters speak different text at the same time.
An optional Script Event Type used in Dubbing Scripts to identify whether the Script Event represents spoken text or on-screen text, and in the latter case, the type of on-screen text (title, credit, location, ...).
While a Script Event typically corresponds to a single Character, there are cases where multiple Characters can be associated with a Script Event, i.e. when all Characters speak the same text at the same time.
Zero or more Text objects, each representing either the Original language script or
Translations of the original language script in other languages.
An optional Script Event Description property, as a human-readable description of the Script Event.
Note
The Script Event Description does not need to be unique, i.e. it does not need to have a different value for each Script Event.
For example a particular value could be re-used to identify in a human-readable way one or more Script Events that are intended to be processed together,
e.g. in a batch recording.
A Script Event is represented in TTML by a div element with the following structure and constraints:
The begin and end attributes SHOULD be present.
The dur attribute MAY be present.
Note
As noted in [TTML2], if both an end attribute and a dur attribute are present,
the end time is the earlier of end and (begin + dur).
Note
If timing attributes are omitted, the following rules apply:
The default value for begin is zero, i.e. the same as the begin time of the parent element.
The default value for end is indefinite,
i.e. it resolves to the same as the end time of the parent timed element,
if there is one.
The topmost timed element is the <body> element,
whose end time is for practical purposes the end of the Related Media Object.
The default value for dur is indefinite, i.e. the end time resolves to the same as the end time of the parent element.
The ttm:agent attribute MAY be present and if present,
MUST contain a reference to each ttm:agent that represents an associated Character.
It MAY contain zero or more p elements, each representing a Text object.
The style attribute MAY be present. If present, it MAY contain a reference to the <style> defining the Character Style. Additional style references or inline styles MAY be used.
It MAY contain a metadata element representing the On Screen property.
4.4 Text
The Text object contains text content in a single language, and may be styled and associated with a Character.
It indicates whether it is in the Original language or if it is a Translation, as well as its language.
The language for which a dubbing or audio description script is being prepared is called the Target Recording Language.
Editor's note
Styles are used for characters, texts and contextual texts. From my reading, they are essentially the same property for all three "contexts" where they are used, but they have a different name in each context (Style, Text Styles and Contextual Text Styles). This could be represented as one property (e.g. Style) that has a cardinality of 1..* and references a Style object (with style features, e.g. color). The association is implied by the object it belongs to.
A Text object is represented with a p element with the following constraints:
The text content of the paragraph can be structured using TTML elements such as
<br> or <span>
which can include TTML attributes such as tts:ruby used to alter the layout or styling of
sections of text within each paragraph.
Similarly metadata can be added using attributes or <metadata> elements.
Editor's note
Need to specify that <audio> elements are permitted too, for AD mixing.
The style attribute MAY be present.
If present, it MAY contain a reference to the <style> that defines the relevant Character Style.
Additional style references or inline styles MAY be used as defined in [TTML2],
and MAY be applied to sub-sections of the text defined by <span> elements.
The <p> element SHOULD have an xml:lang attribute corresponding to the language of the Text object.
Note
If a <p> element omits the xml:lang attribute then its computed language
is derived by inheritance from its parent element, and so forth up to the root <tt> element,
which is required to set the Primary Language via its xml:lang attribute.
Care should be taken if changing the Primary Language of a Document Instance in case
doing so affects descendant elements unexpectedly.
Authors can mitigate this risk by explicitly setting xml:lang on all <p> elements.
Note
Within a <div> representing a Script Event
the order of <p> children is significant.
<div xml:id="event_3" begin="9663f" end="9682f" style="style_a" ttm:agent="character_3">
  <p xml:lang="pt-BR" daptm:langSrc="original">Você vai ter.</p>
  <p xml:lang="fr" daptm:langSrc="translation">Bah, il arrive.</p>
</div>
4.5 Text Language Source
The Text Language Source property is an annotation indicating whether a Text object is
in the same language as the relevant part of the Related Media Object's
language (original), or if it is a representation in another language (translation):
Original - the Text is in the same language as the relevant part of the Related Media Object.
Translation - the Text is a representation of the Original language text in a different language.
The Text Language Source property is represented as a daptm:langSrc attribute with the following constraints:
Editor's note
Should we use an abbreviated attribute name?
Editor's note
Initial design is to use an abbreviated name and original|translation,
though I considered using an abbreviated value too, since this attribute will appear on every <p> element.
Abbreviating O for Original is probably a bad idea because the letter O and the number 0 can easily be confused.
I also considered P for Primary but that caused potential confusion between Primary Language and Primary Text Language Source.
daptm:langSrc
: "original"
| "translation"
4.6 On Screen
The On Screen property is an annotation indicating the position in the scene of the character during the Script Event:
ON - the character is on screen for the entire duration
OFF - the character is off screen for the entire duration
ON_OFF - the character starts on screen, but goes off screen at some point
OFF_ON - the character starts off screen, but goes on screen at some point
The On Screen property is represented as a metadata element with the following constraints:
The following attribute corresponding to the On Screen Script Event property may be present:
The predefined entities are (including the leading ampersand and trailing semicolon):
&amp; for an ampersand (&)
&apos; for an apostrophe (')
&gt; for a greater than sign (>)
&lt; for a less than sign (<)
&quot; for a quote symbol (")
Note
A Document Instance can also be used as an in-memory model
for processing, in which case the serialisation requirements do not apply.
5.2 Foreign Elements and Attributes
A Document Instance MAY contain elements and attributes that are neither specifically permitted nor forbidden by a profile.
Note
Document Instances remain subject to the content conformance requirements specified at Section 3.1 of [TTML2].
In particular, a Document Instance can contain elements and attributes not in any TT namespace, i.e. in foreign namespaces, since such elements and attributes are pruned by the algorithm at Section 4 of [TTML2] prior to evaluating content conformance.
Note
For validation purposes it is good practice to define and use a content specification for all foreign namespace elements and attributes used within a Document Instance.
Many dubbing and audio description workflows permit annotation of Script Events or documents with proprietary metadata.
Metadata vocabulary defined in this specification or in [TTML2] MAY be included.
Additional vocabulary in other namespaces MAY also be included.
Note
It is possible to add information such as the title of the programme using [TTML2] constructs.
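For illustration, a sketch using the TTML2 ttm:title element (the title text is hypothetical):

<head>
  <metadata>
    <ttm:title>A Documentary About Mountains</ttm:title>
  </metadata>
</head>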
Note
It is possible to add workflow-specific information using a foreign namespace.
In the following example, a fictitious namespace vendorm from an "example vendor" is used
to provide document-level information not defined by DAPT.
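For illustration, a sketch using a hypothetical vendorm namespace (the namespace URI and element name are invented):

<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:vendorm="http://vendor.example/ns/metadata"
    xml:lang="en">
  <head>
    <metadata>
      <!-- workflow-specific, document-level information not defined by DAPT -->
      <vendorm:programCode>ABC-123</vendorm:programCode>
    </metadata>
  </head>
  <body/>
</tt>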
5.3 Namespaces
The following namespaces (see [xml-names]) are used in this specification:
The namespace prefix values defined above are for convenience and Document Instances MAY use any prefix value that conforms to [xml-names].
The namespaces defined by this specification are mutable [namespaceState]; all undefined names in these namespaces are reserved for future standardization by the W3C.
5.4 Related Video Object
A Document Instance MAY be associated with a Related Video Object, i.e. a Related Media Object that consists of a sequence of image frames, each a rectangular array of pixels.
In the context of this specification rendering could be visual presentation of text,
for example to show an actor what words to speak, or could be audible playback of an audio resource,
or could be physical or haptic, such as a Braille display.
EXAMPLE 1
A media time expression of 00:00:05.1 corresponds to frame ceiling( 5.1 × ( 1000 / 1001 × 30) ) = 153 of a Related Video Object with a frame rate of 1000 / 1001 × 30 ≈ 29.97.
Note
In a typical scenario, the same video programme (the Related Video Object) will be used for Document Instance authoring, delivery and user playback.
The mapping from media time expression to Related Video Object above allows the author to precisely associate audio description content with video frames, e.g. around existing audio dialogue and sound effects.
In circumstances where the video programme is downsampled during delivery, the application can specify that, at playback, the Related Video Object be considered to be the delivered video programme upsampled to its original rate, thereby allowing audio content to be presented at the same temporal locations at which it was authored.
5.6 Profile Signaling
TTML documents representing DAPT Scripts MUST specify a ttp:contentProfiles attribute
on the tt element with one value equal to the designator of the DAPT 1.0 profile
to which the Document Instance conforms.
Other values MAY be present to declare conformance to other profiles of [TTML2].
5.6.1 Profile Designator
This profile is associated with the following profile designator:
Profile Name: DAPT 1.0
Profile Designator: http://www.w3.org/ns/ttml/profile/dapt
5.7 Timing constraints
Within a DAPT Script, the following constraints apply in relation to time attributes and time expressions:
5.7.1 ttp:timeBase
The only permitted ttp:timeBase value is media,
since 5.8 Features prohibits all timeBase features
other than #timeBase-media.
This means that the beginning of the document timeline,
i.e. time "zero",
is the beginning of the Related Media Object.
5.7.2 timeContainer
The only permitted value of the timeContainer attribute is the default value, par.
Documents SHOULD omit the timeContainer attribute on all elements.
Documents MUST NOT set the timeContainer attribute to any value other than par on any element.
Note
This means that the begin attribute value for every timed element is relative to
the computed begin time of its parent element,
or for the <body> element, to time zero.
5.7.3 ttp:frameRate
If the document contains any time expression that uses the f metric,
or any time expression that contains a frames component,
the ttp:frameRate attribute MUST be present on the <tt> element.
5.7.4 ttp:tickRate
If the document contains any time expression that uses the t metric,
the ttp:tickRate attribute MUST be present on the <tt> element.
5.7.5 #time-clock-with-frames
This section is non-normative.
As specified in [TTML2], a #time-clock-with-frames expression is translated to
a media time M according to
M = 3600 · hours + 60 · minutes + seconds + (frames ÷ (ttp:frameRateMultiplier · ttp:frameRate)).
Note
For example, the expression 01:23:45:15,
where ttp:frameRate="25" and ttp:frameRateMultiplier is unspecified,
and therefore takes the default of 1, is equivalent to a media time of 3600 + 60 × 23 + 45 + 15 ÷ 25 = 5025.6 seconds.
All time expressions within a document SHOULD use the same syntax,
either clock-time or offset-time
as defined in [TTML2].
Note
A clock-time has one of the forms:
hh:mm:ss.sss
hh:mm:ss
hh:mm:ss:ff
hh:mm:ss:ff.x
where
hh is hours,
mm is minutes,
ss is seconds,
ss.sss is seconds with a decimal fraction of seconds (any precision),
ff is frames, and
x is sub-frames.
Note
An offset-time has one of the forms:
nn metric
nn.nn metric
where
nn is an integer,
nn.nn is a number with a decimal fraction (any precision), and
metric is one of:
h for hours,
m for minutes,
s for seconds,
ms for milliseconds,
f for frames, and
t for ticks.
5.8 Features
See Conformance for a definition of permitted, prohibited and optional.
Editor's note
Intent is to make it as easy as possible to transform a Document Instance into
IMSC ([ttml-imsc1.2]), so we need to check that there are no IMSC permitted Features that we prohibit, or if there
are, then we can explain the reason.
Editor's note
Editorial task: go through this list of features and check the disposition of each. IMSC-only features should be optional.
This specification does not put additional constraints on the layout and rendering features defined in [ttml-imsc1.2].
Note
Layout of the paragraphs may rely on the default TTML region (i.e. if no layout is used in the head element) or may be made explicit by use of the region attribute, if a region element is defined in a layout element in the head element.
6. Conformance
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
The key words MAY, MUST, MUST NOT, SHOULD, and SHOULD NOT in this document
are to be interpreted as described in
BCP 14
[RFC2119] [RFC8174]
when, and only when, they appear in all capitals, as shown here.
Profile
A TTML profile specification is a document that lists all the features of TTML that are required / optional / prohibited within "document instances" (files) and "processors" (things that process the files), and any extensions or constraints.
A Document Instance conforming to a profile defined in this specification:
MUST satisfy all normative provisions specified by the profile;
MAY include any vocabulary, syntax or attribute value associated with a Feature whose disposition is permitted or optional in the profile;
MUST include any vocabulary, syntax or attribute value associated with a Feature whose disposition is required in the profile.
MUST NOT include any vocabulary, syntax or attribute value associated with a Feature whose disposition is prohibited in the profile.
Note
A Document Instance, by definition, satisfies the requirements of Section 3.1 at [TTML2], and hence a Document Instance that conforms to a profile defined herein is also a conforming TTML2 Document Instance.
A presentation processor conforming to a profile defined in this specification:
MUST satisfy the Generic Processor Conformance requirements at Section 3.2.1 of [TTML2];
MUST satisfy all normative provisions specified by the profile; and
MUST implement presentation semantic support for every Feature designated as permitted or required by the profile, subject to any additional constraints on each Feature as specified by the profile.
MAY implement presentation semantic support for every Feature designated as optional by the profile, subject to any additional constraints on each Feature as specified by the profile.
A transformation processor conforming to a profile defined in this specification:
MUST satisfy the Generic Processor Conformance requirements at Section 3.2.1 of [TTML2];
MUST satisfy all normative provisions specified by the profile; and
MUST implement transformation semantic support for every Feature designated as permitted or required by the profile, subject to any additional constraints on each Feature as specified by the profile.
MAY implement transformation semantic support for every Feature designated as optional by the profile, subject to any additional constraints on each Feature as specified by the profile.
Note
The use of the terms presentation processor
and transformation processor within this document does not imply conformance per se to any of the Standard Profiles defined in [TTML2].
In other words, it is not considered an error for a presentation processor or transformation processor to conform to the profile defined in this document without also conforming to the TTML2 Presentation Profile or the TTML2 Transformation Profile.
Note
This document does not specify presentation processor or transformation processor behavior when processing or transforming a non-conformant Document Instance.
Note
The permitted and prohibited dispositions do not refer to the specification of a ttp:feature or ttp:extension element as being permitted or prohibited within a ttp:profile element.
Since 5.6 Profile Signaling imposes constraints on profile signalling,
by requiring the presence of the ttp:contentProfiles attribute,
the algorithm for profile resolution can be simplified.
For the purpose of content processing,
the determination of the resolved profile SHOULD take into account both the signaled profile,
as defined in 5.6 Profile Signaling, and profile metadata,
as designated by either or both of the Document Interchange Context and the Document Processing Context,
which MAY entail inspecting document content.