Copyright © 2025 World Wide Web Consortium. W3C® liability, trademark and permissive document license rules apply.
This document describes user requirements related to text-to-speech rendering of electronic documents containing ruby annotations. It examines the roles and practices of ruby in different writing systems and discusses the implications of various reading strategies for text-to-speech, without prescribing algorithms or implementation-specific behavior.
This section describes the status of this document at the time of its publication. A list of current W3C publications and the latest revision of this technical report can be found in the W3C standards and drafts index.
This document was published by the Internationalization Working Group as an Editor's Draft.
Publication as an Editor's Draft does not imply endorsement by W3C and its Members.
This is a draft document and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to cite this document as other than a work in progress.
This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent that the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This document is governed by the 18 August 2025 W3C Process Document.
This document addresses issues related to text-to-speech functionality in HTML documents and EPUB publications that contain ruby annotations. Although the typographic aspects of ruby are addressed in [ JLREQ ] and [ CLREQ ], the text-to-speech implications have received little attention. This document examines the various uses of ruby and discusses the implications of different reading strategies, identifying the types of information that may be relevant for text-to-speech. It does not specify processing procedures or algorithms, nor does it introduce new mechanisms or markup related to reading ruby aloud beyond those already defined; such matters are outside the scope of this document.
The primary purpose of ruby annotations is to indicate how to pronounce CJK ideographic characters, a practice known as Furigana (see also JLReq terminology).
In contemporary usage, it is uncommon to attach ruby annotations to every CJK ideographic character in a document; more commonly, only some of the characters are annotated.
Ruby annotations are used in various contexts, including trade books, newspapers, textbooks, and teaching materials, but rarely in business documents.
Even for simple CJK ideographic characters, ruby annotations may be added for some users who have particular difficulties with CJK ideographic characters (in electronic documents, it is easy to make ruby annotations visible or invisible based on user preferences). Such ruby annotations are referred to as "furigana added for enhanced accessibility".
Some simple CJK ideographic characters have more than one possible reading and thus require ruby annotations for disambiguation. This is common for names of people and places. For example, 山崎 (a person's name) may be read as YAMAZAKI or YAMASAKI.
When ruby annotations are applied only to selected CJK ideographic characters in a document, typically only the first occurrence of such characters or words receives an annotation, and subsequent occurrences do not. This practice assumes that users will learn the correct pronunciation from the first occurrence.
Especially in Japan, ruby annotations are also used to indicate something different from the reading of a CJK ideographic character. Such ruby annotations are referred to as Gikun. Gikun is commonly employed in light novels and comics.
Here are some examples of Gikun:
Even when Gikun is used for a compound word, it is unlikely to be repeated for later occurrences of the same word. Moreover, a different Gikun may be added for subsequent occurrences of the same word. For example, the next occurrence of 生命 may well be
Unusual names of people in Japan are typically written using CJK ideographic characters but are pronounced quite differently from the standard reading of these characters. For instance,
Character names in comics, animations, and light novels can sometimes be extremely challenging to pronounce. Many of the character names in Demon Slayer (Kimetsu no Yaiba) fall into this category. For example, almost no one can read 不死川 玄弥 as SHINAZUGAWA GENNYA without assistance.
Names of places can also be difficult to read due to historical reasons. For instance,
In many instances, the first occurrence of an unusual name is accompanied by a ruby annotation, but subsequent occurrences are not.
Interlinear notes resemble ruby annotations in appearance. A note in JLreq introduces interlinear notes as follows:
In the example shown in a figure referenced in the quoted note ("An example of a note in inter lines"), 徳川慶喜 (Tokugawa Yoshinobu) is accompanied by an interlinear note "1837-1913 江戸幕府最後の将軍" (1837-1913 the last shogun of the Edo shogunate). Other examples are: a modern kana phrase as an interlinear note for a historical kana phrase, a standard Japanese expression as an interlinear note for an expression in a dialect, a modern CJK ideographic character as an interlinear note for a traditional CJK ideographic character, an English text chunk as an interlinear note for a Japanese text chunk, and an official name as an interlinear note for an abbreviated name.
One could argue that HTML ruby elements should not be used for representing interlinear notes (see Kobayashi Sensei's mail in Japanese). However, it is not difficult to imagine that ruby elements are in practice used for representing interlinear notes.
Ruby annotations can be used for purposes other than indicating the reading of CJK ideographic text. For example, they may be applied to foreign-language text written in non-native scripts, chemical formulas, mathematical expressions, or other specialized notations, in order to convey pronunciation or supplementary information.
However, such uses of ruby annotations fall outside the scope of this note. This document focuses exclusively on ruby annotations associated with CJK ideographic text, where language-dependent reading behavior and accessibility considerations raise specific and non-trivial issues.
A sequence of characters can be accompanied by two ruby annotations, typically consisting of Furigana and either Gikun or an interlinear note. In an example provided in JLreq ("An example of ruby annotations attached to both sides of the base characters"), 東南 is accompanied by たつみ and とうなん. Here 東南 means 'southeast', with とうなん (TOUNAN) serving as Furigana and たつみ (TATSUMI) as Gikun, since 辰巳 (read as TATSUMI) indicates the same direction as 東南.
We offer two additional illustrative examples.
In this example, とうよう serves as Furigana, while オリエント is used as Gikun.
In this example, おだのぶなが serves as Furigana, while "1534〜82" is presented as an interlinear note.
There are three possible options for text-to-speech rendering: (1) reading aloud both ruby bases and ruby annotations, (2) reading aloud ruby annotations only, and (3) reading aloud ruby bases only. This section evaluates the consequences of these options, assuming the roles and semantics of ruby annotations described in "2. Roles of ruby annotations".
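The three options can be sketched as follows. This is a minimal illustration rather than a prescribed algorithm: `Segment`, `text_for_tts`, and the sample sentence are hypothetical names and data introduced here, and the annotation-first order used for option (1) is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """A run of text; `annotation` is None for text outside any ruby."""
    base: str
    annotation: Optional[str] = None


def text_for_tts(segments: List[Segment], strategy: str) -> str:
    """Build the string handed to a (hypothetical) text-to-speech engine.

    strategy is "both", "annotation", or "base".
    """
    out = []
    for seg in segments:
        if seg.annotation is None:
            out.append(seg.base)                    # plain text is always read
        elif strategy == "both":
            out.append(seg.annotation + seg.base)   # option (1), annotation first (assumed order)
        elif strategy == "annotation":
            out.append(seg.annotation)              # option (2)
        elif strategy == "base":
            out.append(seg.base)                    # option (3)
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return "".join(out)


# Hypothetical sentence: the word 生命 annotated with せいめい, then plain text.
sentence = [Segment("生命", "せいめい"), Segment("は尊い。")]
for strategy in ("both", "annotation", "base"):
    print(strategy, text_for_tts(sentence, strategy))
```

The sketch only shows which characters reach the engine under each option; how the engine then pronounces them is a separate matter.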
In this option, both ruby bases and ruby annotations are read aloud (double reading). Many implementations (screen readers, in particular) support only the option of reading both ruby bases and annotations. For example,
The option of reading aloud both interferes with readers' understanding significantly.
彼の名前は
This sentence is intended to mean "His name is Dewanai". Double reading completely changes the meaning: it will be interpreted as "His name is not Dewanai".
それでは
This sentence is intended to mean "Nonsense!". Double reading completely changes the meaning: it will be interpreted as "You have to deal with it".
Consider this English sentence having a ruby annotation: "My name is
Double reading completely changes the meaning: it will be interpreted as "My name is not Knot".
Another example: "There is a road in Austin spelled both
Double reading makes the road name read aloud twice, possibly differently.
Yet another example: "
Double reading makes this compound name read aloud twice, possibly differently.
The option of reading aloud both is sensible. It is common to read aloud ruby annotations first and ruby bases next, but it is sometimes better to read aloud ruby bases first and ruby annotations next (see [ Transliteration Training Course ]).
The option of reading aloud both interferes with readers' understanding significantly.
The option of reading aloud both is sensible. It is necessary to read aloud ruby bases first then ruby annotations next.
For example,
Since there are two ruby annotations, double-sided ruby leads to reading aloud three times. One of the ruby annotations is typically furigana, so the description in 3.1.1 applies. If the other ruby annotation is a Gikun, the description in 3.1.2 applies; if it is an interlinear note, the description in 3.1.4 applies.
In this option, ruby annotations are read aloud but ruby bases are not. For example,
Even native Japanese speakers may easily assume, without thorough consideration, that reading only ruby annotations used as furigana aloud will provide reasonable results. This common assumption, however, is not always correct.
Each hiragana character represents a mora (a basic timing unit in phonology), which is typically a single vowel or a consonant followed by a single vowel. The same sequence of moras may mean different words depending on the pitch accent. For example, both 雨 (rain) and 飴 (candy) consist of the same moras: あ and め. However, if the Tokyo accent is used as a basis, the first mora in 雨 has a high pitch, and the second has a low pitch; 飴 has the opposite pitch accent.
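The ambiguity can be made concrete with a small sketch. The two-entry `LEXICON` and both function names are hypothetical, and the pitch patterns simply restate the Tokyo-accent description above ("H" for high, "L" for low).

```python
# Hypothetical mini-lexicon: written form -> (moras, pitch pattern).
LEXICON = {
    "雨": (("あ", "め"), ("H", "L")),  # rain
    "飴": (("あ", "め"), ("L", "H")),  # candy
}


def pitch_for_base(base: str):
    """With the ruby base available, the lookup is unambiguous."""
    return LEXICON[base][1]


def pitch_candidates(kana: str):
    """With only the kana available, several patterns may match."""
    return sorted(
        {pitch for moras, pitch in LEXICON.values() if "".join(moras) == kana}
    )


print(pitch_for_base("雨"))       # one pattern
print(pitch_candidates("あめ"))   # two competing patterns
```

Given only あめ, the lookup returns both patterns, so a choice between them has to come from some other source of information.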
Reading aloud ruby annotations rather than ruby bases often leads to incorrect pitch accent. As an example, consider
A similar example is
In modern Japanese, there is basically only one way to read each hiragana character. But は and へ are exceptions. は is usually read aloud as HA but is read aloud as WA when it is used as a particle. Likewise, へ is usually read aloud as HE but is read aloud as E when it is used as a particle.
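A minimal sketch of this rule follows. It assumes that each token already carries a flag saying whether it is a particle; in practice that flag would come from morphological analysis, which is not performed here. The function name and the sample tokens are hypothetical.

```python
# Kana whose reading changes when used as a particle (HA -> WA, HE -> E).
PARTICLE_READING = {"は": "わ", "へ": "え"}


def spoken_kana(tokens):
    """tokens: list of (kana, is_particle) pairs from an assumed analyzer."""
    out = []
    for kana, is_particle in tokens:
        if is_particle and kana in PARTICLE_READING:
            out.append(PARTICLE_READING[kana])
        else:
            out.append(kana)
    return "".join(out)


# Hypothetical sentence in kana: はは (mother) は まち (town) へ いく (go).
tokens = [("はは", False), ("は", True), ("まち", False), ("へ", True), ("いく", False)]
print(spoken_kana(tokens))
```

The first two は characters stay as HA because they belong to a noun; only the tokens flagged as particles change, which is why the particle information matters.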
Reading aloud ruby annotations rather than ruby bases implies that the CJK ideographic characters in ruby bases will not be passed to the TTS engine; only the hiragana characters in ruby annotations will be.
Without CJK ideographic characters, Japanese morphological analysis is likely to fail. For example, consider やがて
Here are some further examples. All occurrences of は and へ in ruby annotations are likely to be read aloud incorrectly.
As described in 2.1.1, furigana as a ruby annotation may be attached to only the first occurrence of a CJK ideographic character or a word composed of such characters. Thus, there is a risk that the first occurrence and the others are read aloud differently. For example, consider 智子 as the name of a character in a novel. There are several possible readings of this name, such as さとこ and ともこ. If さとこ as a ruby annotation is attached only to the first occurrence of the name, it will be read as さとこ and the other occurrences may be read as ともこ. The reader would then think that さとこ and ともこ are different characters.
The option of reading aloud ruby annotations only provides an understandable result but does not properly convey the author's intention.
The option of reading aloud ruby annotations only works correctly. However, if the first occurrence of a name is accompanied by a ruby annotation and the other occurrences are not, the first occurrence is read aloud differently from the others, thus suggesting different persons or places.
For example,
The option of reading aloud ruby annotations only often provides incomprehensible results.
If "1837-1913 江戸幕府最後の将軍" is attached to 徳川慶喜 as a ruby annotation, it will be read aloud as 1837-1913 EDOBAKUFU SAIGO NO SHOGUN (1837-1913 the last shogun of the Edo shogunate), which is reasonable. But if only "1837-1913" is attached as a ruby annotation, the result is 1837-1913, which does not make any sense.
The option of reading aloud ruby annotations only makes two ruby annotations be read aloud while ignoring their ruby base. Since one of the two ruby annotations is typically furigana, the description in 3.2.1 applies. If the other ruby annotation is a Gikun, the description in 3.2.2 applies; if it is an interlinear note, the description in 3.2.4 applies.
In this option, ruby bases are read aloud but ruby annotations are not. For example,
The option of reading aloud ruby bases only may or may not provide good results, depending on text-to-speech engines.
The following is a quote from [ ACCESSIBLE_E_BOOKS ].
Furthermore, compound words made up from CJK ideographic characters in JIS X 0208 are sometimes read aloud incorrectly.
As the importance of accessibility is increasingly recognized and text-to-speech engines improve, more and more words will be read aloud correctly. However, there are some words, such as the aforementioned YAMAZAKI, that cannot be read aloud correctly by text-to-speech engines or even by native Japanese speakers.
The option of reading aloud ruby bases only results in a perfectly understandable result. However, since gikun is ignored, the author's intent is not completely conveyed.
The option of reading ruby bases only leads to incorrect results. However, since every occurrence of a name is read aloud in the same way, users will not be confused.
The option of reading ruby bases only provides a perfectly understandable result. However, since interlinear notes are ignored, the author's intention is not conveyed well.
The option of reading ruby bases only will ignore the two ruby annotations and read their ruby base only. When one of the two ruby annotations is furigana, the description in 3.3.1 applies. If the other is a gikun, the description in 3.3.2 applies, and if it is an interlinear note, the description in 3.3.4 applies.
Small kana characters ゃ , ゅ , ょ , and っ are too small when they appear in ruby annotations. For this reason, instead of these small characters, full-size kana characters や , ゆ , よ , and つ are used in ruby annotations.
However, since full-size kana characters are pronounced differently from small kana, ruby annotations containing full-size kana are read aloud differently.
CSS has a mechanism for overcoming this problem. The 'full-size-kana' value of the text-transform property, as specified in CSS Text, converts small kana characters to full-size kana. It is thus possible to use small kana in ruby annotations while rendering them using full-size kana. Text-to-speech engines can then provide correct results even when ruby annotations are read aloud.
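The effect of this approach can be imitated in a few lines. The sketch below is not the CSS implementation: it covers only the four small kana named above (the CSS Text definition covers more characters), and `display_form` is a hypothetical name.

```python
# Partial mapping: only the four small kana discussed in this section.
SMALL_TO_FULL = str.maketrans({"ゃ": "や", "ゅ": "ゆ", "ょ": "よ", "っ": "つ"})


def display_form(annotation: str) -> str:
    """Return the visually rendered form; the authored text is left unchanged."""
    return annotation.translate(SMALL_TO_FULL)


authored = "きょう"                 # what the author writes, and what TTS can use
rendered = display_form(authored)   # what the reader sees in the ruby annotation
print(authored, rendered)
```

Because the conversion happens only at rendering time, the authored annotation keeps the small kana and therefore the intended pronunciation.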
When attaching a ruby annotation to a compound word consisting of multiple CJK ideographic characters in an HTML or EPUB document, one way is to create a single HTML ruby element for the entire word. However, in some cases, a separate ruby element is created for each CJK ideographic character.
For example, to attach the ruby annotation せいめい to the word 生命 (meaning “life” in Japanese), the typical approach is to create a single ruby element for this word. This ruby element may have a single rt element for “せいめい” or two rt elements (one for “せい” and another for “めい”). However, it is not entirely uncommon to see two ruby elements for this word: one for “生” and another for “命”.
Some people argue that creating a ruby element per compound word is better than creating a ruby element for each character in a compound word. They argue that it becomes easier for the text-to-speech engine to maintain a correspondence table between ruby bases and ruby annotations so that subsequent occurrences of the compound word without ruby can be pronounced correctly.
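The argument about correspondence tables can be illustrated with a small sketch that collects one (base, annotation) pair per ruby element. It uses Python's standard `html.parser`; the class and function names are hypothetical, explicit `</rt>` end tags are assumed, and `rp` elements are not handled.

```python
from html.parser import HTMLParser


class RubyPairs(HTMLParser):
    """Collect (base, annotation) pairs, one per <ruby> element."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._in_ruby = False
        self._in_rt = False
        self._base = []
        self._rt = []

    def handle_starttag(self, tag, attrs):
        if tag == "ruby":
            self._in_ruby, self._base, self._rt = True, [], []
        elif tag == "rt":
            self._in_rt = True

    def handle_endtag(self, tag):
        if tag == "rt":
            self._in_rt = False
        elif tag == "ruby":
            self.pairs.append(("".join(self._base), "".join(self._rt)))
            self._in_ruby = False

    def handle_data(self, data):
        if self._in_rt:
            self._rt.append(data)      # text inside <rt> is annotation
        elif self._in_ruby:
            self._base.append(data)    # other text inside <ruby> is base


def reading_table(html: str) -> dict:
    parser = RubyPairs()
    parser.feed(html)
    parser.close()
    return dict(parser.pairs)


per_word = "<ruby>生命<rt>せいめい</rt></ruby>"
per_word_two_rt = "<ruby>生<rt>せい</rt>命<rt>めい</rt></ruby>"
per_char = "<ruby>生<rt>せい</rt></ruby><ruby>命<rt>めい</rt></ruby>"

for markup in (per_word, per_word_two_rt, per_char):
    print(reading_table(markup))
```

Both single-element variants yield a word-level entry for 生命, whereas the per-character markup yields only character-level entries, so a later unannotated 生命 cannot be looked up directly.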
Meanwhile, others argue that there is a good reason to attach ruby annotations to some, but not all, characters in a compound word. For example, consider 佳人, where 佳 is taught in junior high schools while 人 is taught in the first grade of elementary schools. Therefore, it makes sense to attach a ruby annotation to 佳 only (one ruby element for 佳 and no ruby element for 人). Similarly, it is reasonable to attach ruby annotations only to the first and third CJK ideographic characters in 屯田兵, but not to the second one (thus, two ruby elements).
Although furigana added for enhanced accessibility is necessary for those readers who have particular difficulties with CJK ideographic characters, it is unnecessary or slightly disturbing for others. If furigana added for enhanced accessibility is distinguishable from normal furigana, it can be made visible or invisible depending on user preferences. It is thus desirable to have a standardized way to indicate furigana added for enhanced accessibility.
In "3. Which should be read aloud, ruby bases or ruby annotations, or both?", we have seen that ruby annotations used as gikun or interlinear notes should be read aloud differently from the other cases. Therefore, it is important to be able to distinguish ruby annotations used as gikun or interlinear notes from other ruby annotations in order to support appropriate reading behavior. How such distinctions are made (whether through linguistic analysis, lexical resources, markup conventions, or other means) is outside the scope of this document.
[ SSML ] and [ PRONUNCIATION-LEXICON ] offer alternatives for conveying phonemic and phonetic pronunciations of CJK ideographic characters to speech synthesis engines. These methods are not intended for visual presentations but can offer superior control over text-to-speech compared to using ruby annotations.
[ SSML ] employs symbol collections (such as IPA and [ JEITA_IT-4006 ]) to represent the sounds of human languages. Phonemic and phonetic pronunciations are conveyed through sequences of these symbols.
[ epub-32 ] allows the use of SSML attributes within XHTML content documents in EPUB publications. In [ epub-33 ], these attributes are relocated to [ epub-tts-10 ]. Meanwhile, the W3C Accessible Platform Architectures Working Group is developing [ spoken-html ], which outlines two potential methods for incorporating SSML attributes into HTML elements.
In Japan, SSML is used extensively in digital textbooks and has been adopted by the biggest textbook publisher in Japan. However, it has been noted that attaching SSML attributes to CJK ideographic characters significantly raises authoring costs. In the case of DAISY textbooks in Japan, SSML is not used, as they contain recorded voice. Trade books in Japan do not typically employ SSML either.
PLS ([ PRONUNCIATION-LEXICON ]) enables the use of pronunciation lexicons, which map words to sequences of symbol collections such as those found in IPA or [ JEITA_IT-4006 ].
While SSML attributes are embedded within XHTML content documents in EPUB publications, PLS lexicons in EPUB publications are stored externally to and referenced by XHTML content documents (see Pronunciation Lexicons section in [ epub-tts-10 ]). As of the present, [ spoken-html ] does not offer a mechanism for associating PLS lexicons with HTML documents.
PLS is a robust tool for rendering unusual names of people and places in text-to-speech applications. In particular, PLS allows every occurrence of a word or phrase to be consistently pronounced, regardless of the presence of ruby annotations. At the time of this writing, PLS is used by at least one digital textbook publisher in Japan.
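As a rough illustration of what such a lexicon contains, the sketch below builds a tiny PLS document with Python's standard `xml.etree.ElementTree`. The function name and the single entry are hypothetical. To avoid inventing phonetic transcriptions, the entry uses the PLS `alias` element with a kana spelling instead of a `phoneme` element; a production lexicon would more likely carry phoneme strings in the declared alphabet.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_lexicon(entries: dict) -> str:
    """Serialize {written form: kana reading} as a PLS lexicon string."""
    ET.register_namespace("", PLS_NS)  # emit PLS as the default namespace
    root = ET.Element(
        f"{{{PLS_NS}}}lexicon",
        {"version": "1.0", "alphabet": "ipa", XML_LANG: "ja"},
    )
    for written, reading in entries.items():
        lexeme = ET.SubElement(root, f"{{{PLS_NS}}}lexeme")
        ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = written
        # Illustrative choice: a kana alias instead of a phoneme string.
        ET.SubElement(lexeme, f"{{{PLS_NS}}}alias").text = reading
    return ET.tostring(root, encoding="unicode")


# Hypothetical entry fixing one of the two readings of 山崎 for a publication.
lexicon_xml = build_lexicon({"山崎": "やまざき"})
print(lexicon_xml)
```

Because the lexicon is keyed by the written form, every occurrence of 山崎 is covered, whether or not it carries a ruby annotation.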
This section summarizes the kinds of information that may be relevant to text-to-speech rendering of content containing ruby annotations. It does not introduce new requirements or prescribe how such information is to be used, but instead enumerates the information sources discussed in previous sections.
These information sources include, but are not limited to:
In many cases, multiple sources of information may be considered together. For example, morphological analysis of the ruby base may affect how adjacent particles are pronounced, while PLS or SSML data may override default pronunciations. Inconsistencies between ruby base and annotation can also provide clues for determining whether the ruby is phonetic or non-phonetic in nature.
In summary, text-to-speech rendering of content containing ruby annotations may draw on multiple layers of information, including linguistic, typographic, and author-supplied data. This document identifies these information sources but intentionally avoids prescribing algorithms, prioritization strategies, or weighting mechanisms. Such implementation decisions may evolve through heuristics, rule-based systems, or AI-based approaches, and are expected to be documented elsewhere.
The conversion of HTML documents and EPUB publications to braille is expected to become increasingly important in the near future.
Japanese braille lacks CJK ideographic characters and does not distinguish between hiragana and katakana. (Note: Han braille in Japan does include CJK ideographic characters, but it is not widely used.)
Braille exhibits some syntactical differences from the Japanese writing system. First, space characters are inserted as delimiters between words. Second, two Japanese particles, は and へ, are transcribed as they are pronounced, meaning は and へ are represented as if they were わ and え, respectively. Third, う pronounced as an elongated sound is represented using the long vowel character. For example, to translate たいよう to braille, たいよう is first converted to たいよー and then translated to braille.
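The second and third differences can be sketched as a kana pre-processing step. Everything here is an assumption for illustration: tokens are presumed to arrive already segmented and flagged as particles, word spacing is left out because it depends on segmentation rules not described here, and the long-vowel rule is deliberately naive (it treats every う that follows an o-row or u-row kana as elongated, which may not hold for all words).

```python
PARTICLE_READING = {"は": "わ", "へ": "え"}
O_ROW = set("おこそとのほもよろごぞどぼぽょ")
U_ROW = set("うくすつぬふむゆるぐずづぶぷゅ")


def mark_long_vowels(word: str) -> str:
    """Naive rule: う after an o-row or u-row kana becomes the long vowel mark."""
    out = []
    for ch in word:
        if ch == "う" and out and out[-1] in (O_ROW | U_ROW):
            out.append("ー")
        else:
            out.append(ch)
    return "".join(out)


def braille_kana(tokens) -> str:
    """tokens: list of (kana, is_particle) pairs from an assumed analyzer."""
    out = []
    for kana, is_particle in tokens:
        if is_particle and kana in PARTICLE_READING:
            out.append(PARTICLE_READING[kana])   # は -> わ, へ -> え
        else:
            out.append(mark_long_vowels(kana))
    return "".join(out)


print(mark_long_vowels("たいよう"))
print(braille_kana([("たいよう", False), ("は", True)]))
```

The output is still kana; mapping each kana to braille cells is a separate table lookup that this sketch does not attempt.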
Natural language processing is required to handle these differences during the conversion to braille. However, unlike the case of text-to-speech, intonation is not relevant.
When converting HTML or EPUB content to braille, it is essential to select the correct reading for each CJK ideographic character. Choosing an incorrect reading can result in erroneous braille output. Similar to text-to-speech, ruby annotations provide valuable hints, while [ SSML ] and PLS ([ PRONUNCIATION-LEXICON ]) serve as effective alternatives.
For furigana and the transcription of unusual names of people and places, natural language processing is more effective when using ruby bases (typically containing CJK ideographic characters) as the foundation. In contrast, the correct readings are chosen when using ruby annotations as the basis. It is also possible to combine both ruby bases and ruby annotations.