Text-to-Speech Rendering of Electronic Documents Containing Ruby: User Requirements

This document addresses issues related to text-to-speech functionality in HTML documents and EPUB publications that contain ruby annotations. Although the typographic aspects of ruby are addressed in [ JLREQ ] and [ CLREQ ], the text-to-speech implications have received little attention. This document examines the various uses of ruby and discusses the implications of different reading strategies, identifying the types of information that may be relevant for text-to-speech. It does not specify processing procedures or algorithms, nor does it introduce new mechanisms or markup related to reading ruby aloud beyond those already defined; such matters are outside the scope of this document.

The primary purpose of ruby annotations is to indicate how to pronounce CJK ideographic characters, a practice known as Furigana (see also the JLReq terminology).

In contemporary usage, ruby annotations are rarely attached to every CJK ideographic character in a document; more commonly, only some of the characters are annotated.

Ruby annotations are used in a variety of contexts, including trade books, newspapers, textbooks, and teaching materials, but rarely in business documents.

Even for simple CJK ideographic characters, ruby annotations may be added for some users who have particular difficulties with CJK ideographic characters (in electronic documents, it is easy to make ruby annotations visible or invisible based on user preferences). Such ruby annotations are referred to as "furigana added for enhanced accessibility".

Some simple CJK ideographic characters have more than one possible reading and thus require ruby annotations for disambiguation. This is common for names of people and places. For example, 山崎 (a person's name) may be read as YAMAZAKI or YAMASAKI.

When ruby annotations are applied only to selected CJK ideographic characters in a document, typically only the first occurrence of such characters or words receives an annotation, and subsequent occurrences do not. This practice assumes that users will learn the correct pronunciation from the first occurrence.

Especially in Japan, ruby annotations are also used to indicate something other than the reading of a CJK ideographic character. Such ruby annotations are referred to as Gikun. Gikun is commonly employed in light novels and comics.

Here are some examples of Gikun:

敵とも (where 敵 means 'enemy' and とも means 'friend'). The combination means 'frenemy'.
生命いのち (where the typical reading of 生命 is SEIMEI rather than いのち (INOCHI), both of which mean 'Life')
背景バック (where the typical reading of 背景 is HAIKEI rather than バック (back), an English translation)
牛乳ミルク (where the typical reading of 牛乳 is GYUUNYUU rather than ミルク (milk), an English translation)

Even when Gikun is used for a compound word, it is unlikely to be repeated for later occurrences of the same word. Moreover, a different Gikun may be added to a subsequent occurrence of the same word. For example, the next occurrence of 生命 may well be 生命ライフ, where ライフ (life) is an English translation.

Note

Unusual names of people in Japan are typically written using CJK ideographic characters but are pronounced quite differently from the standard reading of these characters. For instance, 男あだむ is an unusual name, where 男 (usually read as OTOKO) means 'man', and あだむ represents 'Adam' in Kana.

Character names in comics, animations, and light novels can sometimes be extremely challenging to pronounce. Many of the character names in Demon Slayer (Kimetsu no Yaiba) fall into this category. For example, almost no one can read 不死川玄弥 as SHINAZUGAWA GENYA without assistance.

Names of places can also be difficult to read for historical reasons. For instance, 神居古潭かむいこたん, 温根沼おんねとう, and 音威子府おといねっぷ are names of places in Hokkaido (the northern island of Japan). These names are challenging to pronounce because they originated from the Ainu language, which is entirely different from the Japanese language.

In many instances, the first occurrence of an unusual name is accompanied by a ruby annotation, but subsequent occurrences are not.

Interlinear notes resemble ruby annotations in appearance. A note in JLReq introduces interlinear notes as follows:

Note : Quoted note from JLReq

In the example shown in a figure referenced in the quoted note ("An example of a note in inter lines") , 徳川慶喜 (Tokugawa Yoshinobu) is accompanied by an interlinear note "1837-1913 江戸幕府最後の将軍 " (1837-1913 the last shogun of the Edo shogunate). Other examples are: a modern kana phrase as an interlinear note for a historical kana phrase, a standard Japanese expression as an interlinear note for an expression in a dialect, a modern CJK ideographic character as an interlinear note for a traditional CJK ideographic character, an English text chunk as an interlinear note for a Japanese text chunk, and an official name as an interlinear note for an abbreviated name.

One could argue that HTML ruby elements should not be used for representing interlinear notes (see Kobayashi Sensei's mail in Japanese ). However, it is not difficult to imagine that ruby elements are actually used for representing interlinear notes.

Ruby annotations can be used for purposes other than indicating the reading of CJK ideographic text. For example, they may be applied to foreign-language text written in non-native scripts, chemical formulas, mathematical expressions, or other specialized notations, in order to convey pronunciation or supplementary information.

However, such uses of ruby annotations fall outside the scope of this note. This document focuses exclusively on ruby annotations associated with CJK ideographic text, where language-dependent reading behavior and accessibility considerations raise specific and non-trivial issues.

A sequence of characters can be accompanied by two ruby annotations, typically consisting of Furigana and either Gikun or an interlinear note. In an example provided in JLReq ("An example of ruby annotations attached to both sides of the base characters"), 東南 is accompanied by たつみ and とうなん. Here 東南 means 'southeast', with とうなん (TOUNAN) serving as Furigana and たつみ (TATSUMI) as Gikun, since 辰巳 (read as TATSUMI) indicates the same direction as 東南.

We offer two additional illustrative examples.

Double-sided ruby example 1 — Figure 1 東洋 features an upper-side ruby annotation オリエント and a lower-side ruby annotation とうよう

In this example, とうよう serves as Furigana, while オリエント is used as Gikun.

Double-sided ruby example 2 — Figure 2 織田信長 features an upper-side ruby annotation "1534〜82" and a lower-side ruby annotation おだのぶなが

In this example, おだのぶなが serves as Furigana , while "1534〜82" is presented as an interlinear note .

There are three possible strategies for text-to-speech rendering: (1) reading aloud both ruby bases and ruby annotations, (2) reading aloud ruby annotations only, and (3) reading aloud ruby bases only. This section evaluates the consequences of these strategies, assuming the roles and semantics of ruby annotations described in 2. Roles of ruby annotations.
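The three strategies can be illustrated with a minimal sketch. It assumes that the ruby markup has already been parsed into a list of segments, each either plain text or a (ruby base, ruby annotation) pair. The function name `tts_input` and the strategy labels are hypothetical and chosen only for illustration; the sketch shows what string would reach a text-to-speech engine, not how any existing implementation works.

```python
from typing import List, Tuple, Union

# A segment is either plain text or a (ruby base, ruby annotation) pair.
Segment = Union[str, Tuple[str, str]]


def tts_input(segments: List[Segment], strategy: str) -> str:
    """Build the string handed to a TTS engine under one of the three strategies."""
    out = []
    for seg in segments:
        if isinstance(seg, str):
            out.append(seg)  # plain text is always kept
            continue
        base, annotation = seg
        if strategy == "both":
            out.append(base + annotation)  # base first, then annotation
        elif strategy == "annotation":
            out.append(annotation)
        elif strategy == "base":
            out.append(base)
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return "".join(out)


# 山崎 annotated with やまさき, followed by plain text.
example = [("山崎", "やまさき"), "さん"]
for name in ("both", "annotation", "base"):
    print(name, tts_input(example, name))
```

With the "both" strategy the engine receives the name twice (once as CJK ideographic characters and once as kana), which is the double reading discussed in the following subsections.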

The strategies discussed in this section for reading aloud content that contains ruby annotations are inherently language-dependent. The presence of ruby annotations does not necessarily imply that the annotations themselves should be read aloud. The appropriateness, effectiveness, and potential drawbacks of a given strategy depend on the writing system, linguistic conventions, and typical roles that ruby annotations play in each language.

This Note focuses primarily on Japanese and Chinese, which both employ ruby annotations but do so in fundamentally different ways. In Japanese, ruby annotations can be essential for correct interpretation and comprehension of the base text in read-aloud scenarios. As a result, how ruby annotations are handled during read-aloud processing can have a significant impact on user experience, and careful selection among alternative strategies is required.

In contrast, in Chinese, ruby annotations such as pinyin or zhuyin are typically used as auxiliary pronunciation aids and are not generally required for intelligible read-aloud output. In many practical reading scenarios, text-to-speech (TTS) or other read-aloud output based solely on the ruby base text is sufficient. Reading ruby annotations aloud may in some cases introduce confusion, particularly when the assumptions underlying the annotations do not align with those of the read-aloud output.

The analysis of Chinese reading practices in this Note is limited. While some general observations are provided to inform the discussion of read-aloud strategies, a comprehensive treatment of Chinese reading practices — including dialectal variation, prosodic information, and phonological detail — is beyond the scope of this version of the Note.

The strategies described in the following subsections should therefore not be interpreted as universally applicable recommendations. User agents and assistive technologies are expected to take the target language, content type, and usage context into account when selecting or implementing a strategy for reading aloud content that includes ruby annotations.

Note

This strategy reads aloud both ruby bases and ruby annotations (double reading). Many implementations (screen readers, in particular) support only this strategy. For example, the ruby base foo with the ruby annotation bar is read aloud as 'foo bar' or 'bar foo'.

The strategy of reading aloud both interferes significantly with readers' understanding.

彼の名前は出羽内でわないです。

This sentence is intended to mean "His name is Dewanai". Double reading completely changes the meaning: it will be interpreted as "His name is not Dewanai".

それでは話はなしにならない。

This sentence is intended to mean "Nonsense!". Double reading completely changes the meaning: it will be interpreted as "You have to deal with it".

Consider this English sentence having a ruby annotation: "My name is Knot not ".

Double reading completely changes the meaning: it will be interpreted as "My name is not Knot".

Another example: "There is a road in Austin spelled both Manchaca Man-Chack and Menchaca Man-Chack ".

Double reading makes the road name read aloud twice, possibly differently.

Yet another example: " Oxoerythromycin oxo-eur-ithro-mycin is a ketone derived from erythromycin".

Double reading makes this compound name read aloud twice, possibly differently.

The strategy of reading aloud both is sensible. It is common to read aloud ruby annotations first and ruby bases next, but it is sometimes better to read aloud ruby bases first and ruby annotations next ([ Transliteration Training Course ]).

敵とも is read aloud as TEKI TOMO or TOMO TEKI, which means 'enemy friend' or 'friend enemy' (equal to 'frenemy').

生命いのち is read aloud as SEIMEI INOCHI or INOCHI SEIMEI, where SEIMEI is a loan word from Chinese and INOCHI is a native Japanese word. Both mean 'life'.

The strategy of reading aloud both interferes significantly with readers' understanding.

不死川玄弥しなずがわげんや is read aloud as FUSHIKAWA GENYA SHINAZUGAWA GENYA or SHINAZUGAWA GENYA FUSHIKAWA GENYA, which suggests two persons rather than one person.

The strategy of reading aloud both is sensible. It is necessary to read aloud ruby bases first and ruby annotations next.

For example, 徳川慶喜 1837-1913 江戸幕府最後の将軍 is read aloud as TOKUGAWA YOSHINOBU 1837-1913 EDO BAKUFU SAIGONO SHOUGUN, which means 'Tokugawa Yoshinobu 1837-1913, the last shogun of the Edo shogunate'.

Since there are two ruby annotations, double-sided ruby leads to reading aloud three times. One of the ruby annotations is typically furigana, so the description in 3.2.1 applies. If the other ruby annotation is a Gikun, the description in 3.2.2 applies; if it is an interlinear note, the description in 3.2.4 applies.

Note

This strategy reads aloud ruby annotations without reading the ruby bases. For example, the ruby base foo with the ruby annotation bar is read aloud as 'bar'.

Even native Japanese speakers may easily assume, without thorough consideration, that reading only ruby annotations used as furigana aloud will provide reasonable results. This common assumption, however, is not always correct.

Each hiragana character represents a mora (a basic timing unit in phonology), which is typically a single vowel or a consonant followed by a single vowel. The same sequence of moras may represent different words depending on the pitch accent. For example, both 雨 (rain) and 飴 (candy) consist of the same moras: あ and め. However, in the Tokyo accent, the first mora in 雨 has a high pitch and the second has a low pitch; 飴 has the opposite pitch accent.

Reading aloud ruby annotations rather than ruby bases often leads to incorrect pitch accent. As an example, consider 雨あめが好き (I like rain) and 飴あめが好き (I like candy). In both cases, reading aloud ruby annotations rather than ruby bases implies that the TTS engine will receive あめが好き and produce the same result.

A similar example is 牡蠣かきを食べる (I eat oysters) and 柿かきを食べる (I eat persimmons), where 牡蠣 and 柿 have the same two moras but opposite pitch accents.
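The loss of information can be made concrete with a short sketch. It assumes that each sentence is represented as a list of plain-text strings and (ruby base, ruby annotation) pairs; the helper name `annotations_only` is hypothetical. The sketch only shows that two different sentences collapse into the same engine input, so the engine has no basis for choosing between the two pitch accents.

```python
def annotations_only(segments):
    """Keep plain text and replace each (base, annotation) pair by its annotation."""
    return "".join(seg if isinstance(seg, str) else seg[1] for seg in segments)


rain = [("雨", "あめ"), "が好き"]          # I like rain
candy = [("飴", "あめ"), "が好き"]         # I like candy
oyster = [("牡蠣", "かき"), "を食べる"]     # I eat oysters
persimmon = [("柿", "かき"), "を食べる"]    # I eat persimmons

# Both members of each pair yield an identical string for the TTS engine.
print(annotations_only(rain) == annotations_only(candy))
print(annotations_only(oyster) == annotations_only(persimmon))
```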

In modern Japanese, there is basically only one way to read each hiragana character. But は and へ are exceptions. は is usually read aloud as HA but is read aloud as WA when it is used as a particle. Likewise, へ is usually read aloud as HE but is read aloud as E when it is used as a particle.

Reading aloud ruby annotations rather than ruby bases implies that the CJK ideographic characters in ruby bases will not be passed to the TTS engine; only the hiragana characters in ruby annotations will be.

Without CJK ideographic characters, Japanese morphological analysis is likely to fail. For example, consider やがて廃止はいしになる . This sentence means "It will be abolished eventually". But if やがてはいしになる is passed to the TTS engine, は may well be mistakenly interpreted as a particle and read aloud as WA rather than HA. The resulting utterance means "I will eventually become a doctor". A similar problem arises with へ . When へ appears in ruby annotations without the corresponding CJK ideographic characters, it may also be misinterpreted as a particle and read aloud as E rather than HE.

Here are some further examples. Every は and へ in these ruby annotations is likely to be misread as a particle.

人員配置はいち
自然破壊はかい
社会波紋はもん
天皇陛下へいか
大学併願へいがん
学級閉鎖へいさ
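As a rough illustration, annotations that carry this risk can be flagged before they are sent to a TTS engine. The sketch below assumes the ruby pairs have already been extracted, and it simply reports any annotation containing は or へ; the function name `flag_particle_risk` is hypothetical, and a real system would need morphological analysis rather than this character test.

```python
PARTICLE_LIKE = ("は", "へ")  # kana that are read as WA / E when used as particles


def flag_particle_risk(pairs):
    """Return the (base, annotation) pairs whose annotation contains は or へ."""
    return [(base, ann) for base, ann in pairs
            if any(kana in ann for kana in PARTICLE_LIKE)]


pairs = [
    ("配置", "はいち"),
    ("陛下", "へいか"),
    ("生命", "せいめい"),  # no は or へ, so not flagged
]
for base, ann in flag_particle_risk(pairs):
    print(base, ann)
```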

As described in 2.1.1, furigana as a ruby annotation may be attached to only the first occurrence of a CJK ideographic character or a word composed of such characters. Thus, there is a risk that the first occurrence and the others are read aloud differently. For example, consider 智子 as the name of a character in a novel. There are several possible readings of this name, such as さとこ and ともこ. If さとこ as a ruby annotation is attached only to the first occurrence of the name, it will be read as さとこ and the other occurrences may be read as ともこ. The reader would then think that さとこ and ともこ are different characters.

Note

The strategy of reading aloud only ruby annotations provides an understandable result but does not properly convey the author's intention.

敵とも is read aloud as TOMO, which means 'friend', but 'frenemy' is intended.

生命いのち will be read aloud as INOCHI (いのち).

The strategy of reading aloud only ruby annotations works correctly. However, if the first occurrence of a name is accompanied by a ruby annotation and the other occurrences are not, the first occurrence is read aloud differently from the others, suggesting different persons or places.

For example, 不死川玄弥しなずがわげんや is read aloud as SHINAZUGAWA GENYA correctly. But later occurrences of 不死川玄弥 are read aloud as FUSHIKAWA GENYA if they do not have ruby annotations.

Note

The strategy of reading aloud only ruby annotations often provides incomprehensible results.

If "1837-1913 江戸幕府最後の将軍" is attached to 徳川慶喜 as a ruby annotation, it will be read aloud as 1837-1913 EDO BAKUFU SAIGONO SHOUGUN (1837-1913 the last shogun of the Edo shogunate), which is reasonable. But if only "1837-1913" is attached as a ruby annotation, the result is just 1837-1913, which does not make sense on its own.

The strategy of reading aloud only ruby annotations causes both ruby annotations to be read aloud while their ruby base is ignored. Since one of the two ruby annotations is typically furigana, the description in 3.3.1 applies. If the other ruby annotation is a Gikun, the description in 3.3.2 applies; if it is an interlinear note, the description in 3.3.4 applies.

Note

This strategy reads aloud ruby bases without reading ruby annotations. For example, the ruby base foo with the ruby annotation bar is read aloud as 'foo'.

Note

The strategy of reading aloud only ruby bases may or may not provide good results, depending on the text-to-speech engine.

The following is a quote from [ ACCESSIBLE_E_BOOKS ].

Note

Furthermore, compound words made up from CJK ideographic characters in JIS X 0208 are sometimes read aloud incorrectly.

As the importance of accessibility becomes widely recognized and text-to-speech engines improve, more and more words will be read aloud correctly. However, some words, such as the aforementioned YAMAZAKI, cannot be read aloud correctly by text-to-speech engines or even by native Japanese speakers.

The strategy of reading aloud only ruby bases produces a perfectly understandable result. However, since Gikun is ignored, the author's intent is not completely conveyed.

敵とも is read aloud as TEKI, which means 'enemy', but 'frenemy' is intended.

生命いのち is read out as SEIMEI.

The strategy of reading aloud only ruby bases leads to incorrect results. However, since every occurrence of a name is read aloud in the same way, users will not be confused.

Every occurrence of 不死川玄弥しなずがわげんや will be incorrectly read aloud as FUSHIKAWA GENYA, regardless of the presence or absence of ruby annotations.

The strategy of reading aloud only ruby bases provides a perfectly understandable result. However, since interlinear notes are ignored, the author's intention is not conveyed well.

徳川慶喜 1837-1913 江戸幕府最後の将軍 (Tokugawa Yoshinobu 1837-1913, the last shogun of the Edo shogunate), will be read aloud as とくがわよしのぶ (Tokugawa Yoshinobu).

The strategy of reading aloud only ruby bases ignores the two ruby annotations and reads only their ruby base. When one of the two ruby annotations is furigana, the description in 3.4.1 applies. If the other is a Gikun, the description in 3.4.2 applies, and if it is an interlinear note, the description in 3.4.4 applies.

Note

Small kana characters ゃ , ゅ , ょ , and っ are too small when they appear in ruby annotations. For this reason, instead of these small characters, full-size kana characters や , ゆ , よ , and つ are used in ruby annotations.

However, since full-size kana characters are pronounced differently from small kana, ruby annotations containing full-size kana are read aloud differently.

CSS has a mechanism for overcoming this problem. The 'full-size-kana' value of the text-transform property, specified in CSS Text [ css-text-3 ], converts small kana characters to full-size kana. It is thus possible to use small kana in ruby annotations while rendering them using full-size kana. Text-to-speech engines can then provide correct results even when ruby annotations are read aloud.
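The effect of this conversion can be sketched in a few lines. The mapping below is an assumption made for illustration: it covers only the four small hiragana mentioned above and their katakana counterparts, whereas the CSS Text specification defines a longer mapping. The function name `full_size_kana` is hypothetical. The point is that the source text keeps the small kana (which a text-to-speech engine needs), while only the displayed form changes.

```python
# Partial mapping (assumption): the four small kana named in this section,
# plus their katakana counterparts. CSS Text defines the complete mapping.
SMALL_TO_FULL = str.maketrans("ゃゅょっャュョッ", "やゆよつヤユヨツ")


def full_size_kana(text: str) -> str:
    """Return the display form of a ruby annotation with small kana enlarged."""
    return text.translate(SMALL_TO_FULL)


source = "しょうがっこう"          # what is stored and passed to TTS
display = full_size_kana(source)   # what is rendered as ruby text
print(source, display)
```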

When attaching a ruby annotation to a compound word consisting of multiple CJK ideographic characters in an HTML or EPUB document, one way is to create a single HTML ruby element for the entire word. However, in some cases, a separate ruby element is created for each CJK ideographic character. For example, to attach the ruby annotation せいめい to the word 生命 (meaning “life” in Japanese), the typical approach is to create a single ruby element for this word. This ruby element may have a single rt element for “せいめい” or two rt elements (one for “せい” and another for “めい”). However, it is not entirely uncommon to see two ruby elements for this word: one for “生” and another for “命”.
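The difference between these markup patterns can be seen by extracting (ruby base, ruby annotation) pairs from each. The following is a minimal sketch using the standard `html.parser` module. It assumes that every rt element has an explicit end tag and that ruby elements are not nested; the class name `RubyExtractor` and the helper `extract_pairs` are hypothetical.

```python
from html.parser import HTMLParser


class RubyExtractor(HTMLParser):
    """Collect one (base, annotation) pair per ruby element."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._in_ruby = False
        self._in_rt = False
        self._in_rp = False
        self._base = []
        self._ann = []

    def handle_starttag(self, tag, attrs):
        if tag == "ruby":
            self._in_ruby = True
            self._base, self._ann = [], []
        elif tag == "rt":
            self._in_rt = True
        elif tag == "rp":
            self._in_rp = True

    def handle_endtag(self, tag):
        if tag == "ruby":
            self.pairs.append(("".join(self._base), "".join(self._ann)))
            self._in_ruby = False
        elif tag == "rt":
            self._in_rt = False
        elif tag == "rp":
            self._in_rp = False

    def handle_data(self, data):
        if not self._in_ruby or self._in_rp:
            return  # ignore text outside ruby and fallback parentheses in rp
        (self._ann if self._in_rt else self._base).append(data)


def extract_pairs(html: str):
    parser = RubyExtractor()
    parser.feed(html)
    parser.close()
    return parser.pairs


print(extract_pairs("<ruby>生命<rt>せいめい</rt></ruby>"))
print(extract_pairs("<ruby>生<rt>せい</rt>命<rt>めい</rt></ruby>"))
print(extract_pairs("<ruby>生<rt>せい</rt></ruby><ruby>命<rt>めい</rt></ruby>"))
```

The first two patterns yield a single pair for the whole word, while the third yields one pair per character, so a consumer no longer sees 生命 as a unit.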

Some people argue that creating a ruby element per compound word is better than creating a ruby element for each character in a compound word. They argue that it becomes easier for the text-to-speech engine to maintain a correspondence table between ruby bases and ruby annotations so that subsequent occurrences of the compound word without ruby can be pronounced correctly.
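Such a correspondence table might be sketched as follows. The sketch assumes the document is already a list of plain-text strings and (ruby base, ruby annotation) pairs, and it re-attaches a remembered annotation to later unannotated occurrences of the same base by plain substring matching. The function name `propagate` is hypothetical, and substring matching is a simplification: it can match inside longer words, which is one reason word-level ruby bases make the table more reliable.

```python
import re


def propagate(segments):
    """Re-annotate later plain-text occurrences of bases seen earlier with ruby."""
    table = {}   # ruby base -> ruby annotation, filled as the text is read
    result = []
    for seg in segments:
        if isinstance(seg, tuple):
            table[seg[0]] = seg[1]
            result.append(seg)
            continue
        if not table:
            result.append(seg)
            continue
        # Prefer longer bases so that a compound word wins over its parts.
        keys = sorted(table, key=len, reverse=True)
        pattern = re.compile("|".join(re.escape(k) for k in keys))
        pos = 0
        for match in pattern.finditer(seg):
            if match.start() > pos:
                result.append(seg[pos:match.start()])
            result.append((match.group(), table[match.group()]))
            pos = match.end()
        if pos < len(seg):
            result.append(seg[pos:])
    return result


text = [("智子", "さとこ"), "は笑った。智子は泣いた。"]
print(propagate(text))
```

The output is again a list of segments, so the choice of reading strategy is left to whatever consumes it.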

Meanwhile, others argue that there is a good reason to attach ruby annotations to some, but not all, characters in a compound word. For example, consider 佳人, where 佳 is taught in junior high schools while 人 is taught in the first grade of elementary schools. Therefore, it makes sense to attach a ruby annotation to 佳 only (one ruby element for 佳 and no ruby element for 人). Similarly, it is reasonable to attach ruby annotations only to the first and third CJK ideographic characters in 屯田兵, not to the second one (thus, two ruby elements).

Although furigana added for enhanced accessibility is necessary for those readers who have particular difficulties with CJK ideographic characters, it is unnecessary or slightly disturbing for others. If furigana added for enhanced accessibility is distinguishable from normal furigana, it can be made visible or invisible depending on user preferences. It is thus desirable to have a standardized way to indicate furigana added for enhanced accessibility.

In 3. Which should be read aloud, ruby bases or ruby annotations, or both? , we have seen that ruby annotations used as gikun or interlinear notes should be read aloud differently from the other cases. Therefore, it is important to be able to distinguish ruby annotations used as gikun or interlinear notes from other ruby annotations in order to support appropriate reading behavior. How such distinctions are made — whether through linguistic analysis, lexical resources, markup conventions, or other means — is outside the scope of this document.

[ SSML ] and [ PRONUNCIATION-LEXICON ] offer alternatives for conveying phonemic and phonetic pronunciations of CJK ideographic characters to speech synthesis engines. These methods are not intended for visual presentations but can offer superior control over text-to-speech compared to using ruby annotations.

[ SSML ] employs symbol collections (such as IPA and [ JEITA_IT-4006 ]) to represent the sounds of human languages. Phonemic and phonetic pronunciations are conveyed through sequences of these symbols.
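As a small illustration, an SSML phoneme element can be generated with the standard `xml.etree.ElementTree` module. The `phoneme` element and its `alphabet` and `ph` attributes are defined in [ SSML ]; the helper name `ssml_phoneme` is hypothetical, and the transcription "jamasaki" is only a rough, broad IPA rendering of YAMASAKI supplied for the example, not an authoritative one.

```python
import xml.etree.ElementTree as ET


def ssml_phoneme(text: str, ph: str, alphabet: str = "ipa") -> str:
    """Serialize an SSML <phoneme> element giving the pronunciation of text."""
    element = ET.Element("phoneme", {"alphabet": alphabet, "ph": ph})
    element.text = text
    return ET.tostring(element, encoding="unicode")


# "jamasaki" is an illustrative broad transcription (assumption).
print(ssml_phoneme("山崎", "jamasaki"))
```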

[ epub-32 ] allows the use of SSML attributes within XHTML content documents in EPUB publications. In [ epub-33 ], these attributes are relocated to [ epub-tts-10 ]. Meanwhile, the W3C Accessible Platform Architectures Working Group is developing [ spoken-html ], which outlines two potential methods for incorporating SSML attributes into HTML elements.

In Japan, SSML is used extensively in digital textbooks, including those of the country's largest textbook publisher. However, it has been noted that attaching SSML attributes to CJK ideographic characters significantly raises authoring costs. DAISY textbooks in Japan do not use SSML, as they contain recorded voice. Trade books in Japan do not typically employ SSML either.

PLS ([ PRONUNCIATION-LEXICON ]) enables the use of pronunciation lexicons, which map words to sequences of symbols drawn from collections such as IPA or [ JEITA_IT-4006 ].

While SSML attributes are embedded within XHTML content documents in EPUB publications, PLS lexicons in EPUB publications are stored externally to and referenced by XHTML content documents (see the Pronunciation Lexicons section in [ epub-tts-10 ]). At present, [ spoken-html ] does not offer a mechanism for associating PLS lexicons with HTML documents.

PLS is a robust tool for rendering unusual names of people and places in text-to-speech applications. In particular, PLS allows every occurrence of a word or phrase to be consistently pronounced, regardless of the presence of ruby annotations. At the time of this writing, PLS is used by at least one digital textbook publisher in Japan.
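A minimal lexicon of this kind can be generated as follows. The element names (`lexicon`, `lexeme`, `grapheme`, `alias`) and the namespace come from [ PRONUNCIATION-LEXICON ]; the function name `build_lexicon` is hypothetical, and using a kana `alias` rather than a `phoneme` entry is a simplification chosen for this sketch. A kana alias may still be subject to the pitch accent and particle issues described earlier.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_lexicon(entries, lang="ja", alphabet="ipa"):
    """Serialize (grapheme, alias) pairs as a PLS 1.0 lexicon document."""
    ET.register_namespace("", PLS_NS)
    lexicon = ET.Element(
        f"{{{PLS_NS}}}lexicon",
        {"version": "1.0", "alphabet": alphabet, f"{{{XML_NS}}}lang": lang},
    )
    for grapheme, alias in entries:
        lexeme = ET.SubElement(lexicon, f"{{{PLS_NS}}}lexeme")
        ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = grapheme
        ET.SubElement(lexeme, f"{{{PLS_NS}}}alias").text = alias
    return ET.tostring(lexicon, encoding="unicode")


print(build_lexicon([("不死川玄弥", "しなずがわげんや")]))
```

Because the lexicon is keyed on the written form, every occurrence of 不死川玄弥 can receive the same reading whether or not it carries a ruby annotation.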

This section summarizes the kinds of information that may be relevant to text-to-speech rendering of content containing ruby annotations. It does not introduce new requirements or prescribe how such information is to be used, but instead enumerates the information sources discussed in previous sections.

These information sources include, but are not limited to:

Ruby base
Ruby annotation
Morphological analysis (for particle pronunciation and word segmentation)
Contextual reuse of ruby base/annotation pairs
Author-supplied pronunciation hints (e.g., via PLS or SSML)
Heuristics or AI-based inference for legacy content lacking explicit semantics

In many cases, multiple sources of information may be considered together. For example, morphological analysis of the ruby base may affect how adjacent particles are pronounced, while PLS or SSML data may override default pronunciations. Inconsistencies between ruby base and annotation can also provide clues for determining whether the ruby is phonetic or non-phonetic in nature.
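One very coarse clue of this kind is whether an annotation consists entirely of kana. The sketch below is an illustrative heuristic only, not a procedure this document prescribes: the function name `is_kana_only` is hypothetical, and the test cannot separate furigana from kana-only Gikun such as オリエント. It merely shows that an annotation like "1534〜82" cannot be a phonetic reading.

```python
def is_kana_only(annotation: str) -> bool:
    """True if every character lies in the Hiragana or Katakana Unicode block."""
    return bool(annotation) and all(
        "\u3040" <= ch <= "\u309f" or "\u30a0" <= ch <= "\u30ff"
        for ch in annotation
    )


for annotation in ("おだのぶなが", "オリエント", "1534〜82"):
    kind = "possibly phonetic" if is_kana_only(annotation) else "non-phonetic"
    print(annotation, kind)
```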

In summary, text-to-speech rendering of content containing ruby annotations may draw on multiple layers of information, including linguistic, typographic, and author-supplied data. This document identifies these information sources but intentionally avoids prescribing algorithms, prioritization strategies, or weighting mechanisms. Such implementation decisions may evolve through heuristics, rule-based systems, or AI-based approaches, and are expected to be documented elsewhere.

The conversion of HTML documents and EPUB publications to braille is expected to become increasingly important in the near future.

Japanese braille lacks CJK ideographic characters and does not distinguish between hiragana and katakana. (Note: Han braille in Japan does include CJK ideographic characters, but it is not widely used.)

Braille exhibits some syntactical differences from the Japanese writing system. First, space characters are inserted as delimiters between words. Second, two Japanese particles, は and へ, are transcribed as they are pronounced, meaning は and へ are represented as if they were わ and え, respectively. Third, う pronounced as an elongated sound is represented using the long vowel character. For example, to translate たいよう to braille, たいよう is first converted to たいよー and then translated to braille.
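The second and third differences can be sketched as a kana preprocessing step. The sketch assumes that morphological analysis has already produced kana tokens with a flag marking particles, and it treats every う that follows an o-row or u-row kana as an elongated sound. That last assumption is a simplification (it does not check whether the う is really pronounced as a long vowel), and the names `lengthen` and `to_braille_kana` are hypothetical. Word spacing is left out of the sketch.

```python
# Kana after which う is treated as a long vowel in this sketch (assumption).
O_ROW = set("おこそとのほもよろごぞどぼぽょ")
U_ROW = set("うくすつぬふむゆるぐずづぶぷゅ")


def lengthen(kana: str) -> str:
    """Replace う after an o-row or u-row kana with the long vowel mark ー."""
    out = []
    for ch in kana:
        if ch == "う" and out and out[-1] in (O_ROW | U_ROW):
            out.append("ー")
        else:
            out.append(ch)
    return "".join(out)


def to_braille_kana(tokens):
    """Convert (kana, is_particle) tokens to the kana forms used for braille."""
    converted = []
    for kana, is_particle in tokens:
        if is_particle and kana == "は":
            converted.append("わ")
        elif is_particle and kana == "へ":
            converted.append("え")
        else:
            converted.append(lengthen(kana))
    return converted


tokens = [("たいよう", False), ("は", True), ("あかるい", False)]
print(to_braille_kana(tokens))
```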

Natural language processing is required to handle these differences during the conversion to braille. However, unlike the case of text-to-speech, intonation is not relevant.

When converting HTML or EPUB content to braille, it is essential to select the correct reading for each CJK ideographic character. Choosing an incorrect reading can result in erroneous braille output. Similar to text-to-speech, ruby annotations provide valuable hints, while [ SSML ] and PLS ([ PRONUNCIATION-LEXICON ]) serve as effective alternatives.

For furigana and the transcription of unusual names of people and places, natural language processing is more effective when using ruby bases (typically containing CJK ideographic characters) as the foundation. In contrast, the correct readings are chosen when using ruby annotations as the basis. It is also possible to combine both ruby bases and ruby annotations.

[ACCESSIBLE_E_BOOKS]: Guidelines for creating accessible e-books for text-to-speech . Ministry of Internal Affairs and Communications. 2015. URL: https://web.archive.org/web/20220118065321/https://www.soumu.go.jp/main_content/000354698.pdf
[CLREQ]: Requirements for Chinese Text Layout - 中文排版需求 . Fuqiao Xue; Richard Ishida. W3C. 2 December 2025. DNOTE. URL: https://www.w3.org/TR/clreq/
[css-text-3]: CSS Text Module Level 3 . Elika Etemad; Koji Ishii; Florian Rivoal. W3C. 30 September 2024. CRD. URL: https://www.w3.org/TR/css-text-3/
[epub-32]: EPUB 3.2 . W3C. 08 May 2019. W3C Final Community Group Specification. URL: https://www.w3.org/publishing/epub3/epub-spec.html
[epub-33]: EPUB 3.3 . Ivan Herman; Matt Garrish; Dave Cramer. W3C. 27 March 2025. W3C Recommendation. URL: https://www.w3.org/TR/epub-33/
[epub-tts-10]: EPUB 3 Text-to-Speech Enhancements 1.0 . Matt Garrish. W3C. 28 August 2025. W3C Working Group Note. URL: https://www.w3.org/TR/epub-tts-10/
[JEITA_IT-4006]: Symbols for Japanese Text-to-Speech Synthesizer . Japan Electronics and Information Technology Industries Association. March 2010.
[JLREQ]: Requirements for Japanese Text Layout 日本語組版処理の要件(日本語版) . Hiroyuki Chiba; Junzaburo Edamoto; Richard Ishida; Seiichi Kato; Tatsuo KOBAYASHI; Toshi Kobayashi; Nathaniel McCully; Felix Sasaki; Atsushi Shimono; Hajime Shiozawa; Fuqiao Xue et al. W3C. 11 August 2020. W3C Working Group Note. URL: https://www.w3.org/TR/jlreq/
[PRONUNCIATION-LEXICON]: Pronunciation Lexicon Specification (PLS) Version 1.0 . Paolo Baggia. W3C. 14 October 2008. W3C Recommendation. URL: https://www.w3.org/TR/pronunciation-lexicon/
[spoken-html]: Specification for Spoken Presentation in HTML . Irfan Ali; Markku Hakkinen; Paul Grenier; Ruoxi Ran. W3C. 23 September 2021. W3C Working Draft. URL: https://www.w3.org/TR/spoken-html/
[SSML]: Speech Synthesis Markup Language (SSML) Version 1.1 . Daniel Burnett; Zhi Wei Shuang. W3C. 7 September 2010. W3C Recommendation. URL: https://www.w3.org/TR/speech-synthesis11/
[Tokyo Ghoul: re]: Tokyo Ghoul: re, Vol. 8, Chapter 76 “Daso (惰疎)” . Sui Ishida. Shueisha. 2016.
[Transliteration Training Course]: Textbook for the Volunteer Transliteration Training Course: Basics (In Japanese, 音訳ボランティア養成講習会テキスト　基礎課程編) . National Council of Japan for the Visually Impaired (In Japanese, 全国視覚障害者情報提供施設協会). March 2010. URL: https://naiiv-books.net/shopdetail/000000000023/

Abstract

Status of This Document

1. Scope

2. Roles of ruby annotations

2.1 Furigana, background

2.2 Gikun, background

2.3 Unusual names of people and places, background

2.4 Interlinear notes, background

2.5 Others

2.6 Double-sided ruby, background

3. Which should be read aloud, ruby bases or ruby annotations, or both?

3.1 Language-dependent considerations for reading aloud content with ruby annotations

3.2 Reading aloud both ruby bases and ruby annotations

3.2.1 Furigana, when both read aloud

3.2.1.1 Examples of harmful double reading: Japanese

3.2.1.2 Examples of harmful double reading: English

3.2.2 Gikun, when both read aloud

3.2.3 Unusual names of people and places, when both read aloud

3.2.4 Interlinear notes, when both read aloud

3.2.5 Double-sided ruby, when both read aloud

3.3 Reading aloud ruby annotations only

3.3.1 Furigana, when ruby annotations read aloud

3.3.1.1 Incorrect pitch accent

3.3.1.2 Incorrectly pronouncing non-particle は or へ as particles

3.3.1.3 Inconsistency between the first and subsequent occurrences

3.3.2 Gikun, when ruby annotations read aloud

3.3.3 Unusual names of people and places, when ruby annotations read aloud

3.3.4 Interlinear notes, when ruby annotations read aloud

3.3.5 Double-sided ruby, when ruby annotations read aloud

3.4 Reading aloud ruby bases only

3.4.1 Furigana, when bases read aloud

3.4.2 Gikun, when bases read aloud

3.4.3 Unusual names of people and places, when bases read aloud

3.4.4 Interlinear notes, when bases read aloud

3.4.5 Double-sided ruby, when bases read aloud

4. Miscellaneous issues around ruby markup

4.1 Conversion from small kana characters to full-size kana characters

4.2 A single ruby element or multiple ruby elements per one compound word

4.3 Distinguishing furigana added for enhanced accessibility

4.4 Distinguishing ruby annotations used as gikun or interlinear notes

5. Alternatives to ruby

5.1 SSML

5.2 PLS

6. Information sources relevant to text-to-speech rendering of ruby annotations

7. Use of ruby for automatic braille translation

A. References

A.1 Informative references