Rules for Simple Placement of Japanese Ruby

W3C Editor's Draft

This version:
https://w3c.github.io/simple-ruby/
Latest published version:
https://www.w3.org/TR/simple-ruby/
Latest editor's draft:
https://w3c.github.io/simple-ruby/
Previous editor's draft:
https://w3c.github.io/jlreq/docs/simple-ruby/
Editor:
Florian Rivoal (Invited Expert)
Author:
Toshi Kobayashi
Participate:
GitHub w3c/simple-ruby
File a bug
Commit history
Pull requests

This document is also available in this non-normative format: Original Japanese (PDF)


Abstract

This document describes a simple method of ruby composition for Japanese layout realized with technologies like CSS, SVG, and XSL-FO, as information for rendering engine implementers. Unlike JLReq [JLREQ], this document presents only one layout method for each case, taking into account best practices and important points in Japanese layout. The points taken into consideration are described in § 1. Matters considered by the simple placement rules. This document also covers the layout of double-sided ruby, which has two distinct runs of ruby text attached to the same ruby base character string and is not described in [JLREQ].

[JLREQ] is in part a record of Japanese layout practice as established in the printing industry. It explains multiple ways of doing one thing, and these can sometimes be very complex. Ruby is one such case. There are many factors to consider, and the requirements often contradict each other (cf. Note "Protrusion of ruby from base characters"). This complexity makes ruby layout challenging to automate.

It would seem beneficial to come up with a method that is simple and robust, and one that is suitable for automatic processing. The positioning might not be as sophisticated but we must at least make sure that it causes no misunderstanding.

The following is a proposal for a simple processing system. The target audience is implementers and specification writers. It is expected that a full system may be more complex than what is described here, both because of the interaction with other features or other writing systems, and because those designing such systems may wish to provide alternative options. Note that the terminology is based on that defined in [JLREQ].

Status of This Document

This is a preview

Do not attempt to implement this version of the specification. Do not reference this version as authoritative in any way. Instead, see https://w3c.github.io/simple-ruby/ for the Editor's draft.

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.

This document was initially written in Japanese and translated to English by the Japanese Writing Technology Working Group of the Advanced Publishing Laboratory of Keio University.

It represents the subjective view of its authors and contributors as to one possible approach to address the problem, and does not claim to be the only possible solution. It is submitted to present a non-Japanese speaking audience with this particular approach, and to encourage discussion of this topic.

The original Japanese version is available in PDF format.

This document was published by the Internationalization Working Group as an Editor's Draft.

GitHub Issues are preferred for discussion of this specification.

Publication as an Editor's Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the W3C Patent Policy. The group does not expect this document to become a W3C Recommendation. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 1 March 2019 W3C Process Document.

1. Matters considered by the simple placement rules

Ruby is the name given to the small annotations in Japanese content that are rendered alongside base text, usually to provide a pronunciation guide, but sometimes to provide other information. (See the article “What is ruby” by the Internationalization Working Group for more information.)

Note: Ruby as notes

1.1 The Difficulties of Ruby Processing

When performing ruby layout in Japanese, the following factors need to be considered in order to decide on the position:

  1. How to handle the correspondence between the base characters and the ruby
  2. What to do when the string of base characters is longer than the ruby string
  3. What to do when the string of base characters is shorter than the ruby string
  4. When the ruby string protrudes from the base character string, whether it can be allowed to be laid over the characters preceding or following, and whether this affects the position of the base characters

    Note: Protrusion of ruby from base characters
  5. When the ruby string protrudes from the base character string, and the base character string is at the start or the end of the line, whether the base character string or the ruby string should be aligned with the line edge
  6. When there are multiple base characters, whether there can be line wrap opportunity between them

In movable type typography, such matters were resolved based on generic principles, and could always be corrected during the proofreading phase. Essentially, each case was adjusted individually in a flexible manner.

In computer-based typesetting, the layout needs to be more or less determined based on predetermined rules, but it remained necessary to adjust the results in certain cases, for example by changing the association between base characters and the ruby string, or by switching to a different placement policy.

When thinking about computing placement for web content, it is not practical to decide on the positioning case by case as was done in movable type typography. It is therefore necessary to decide upon comprehensive rules that provide solutions to all the problems listed above, so that placement may be determined fully automatically. Considering all the possibilities that existed in movable type typesetting, the system to be designed needs to be very complex.

1.2 Matters considered by the placement rules

Here are the fundamental assumptions underlying the simple placement rules.

  1. Ruby is used to display the reading or the meaning of the base characters. Therefore, the number one priority here is to avoid misreadings. Specifically, a ruby string which protrudes from the base character string is not allowed to be laid over the characters preceding or following, whether these are kanji or kana characters.
    Note: Protruding over surrounding characters
  2. The method is agnostic to horizontal vs vertical writing, and will use the same logic in either case. Specifically, the center of the ruby string and of the base character string are aligned in the inline direction for mono-ruby.
  3. Processing is done in two steps. The first step considers only the ruby string and the base character string (together called the ruby block in this document) and decides their position relative to each other. The second step decides the position of the base character string in the line, taking the preceding and following characters into account. In other words, the relative position of the ruby string and the base character string decided in the first step is not modified regardless of any preceding or following characters. Likewise, when the base character string falls at the line head or the line end, this document does not adjust that relative position in order to align the first or last base character with the line head or the line end. In summary, the positioning produced by the first step is never modified by the second step.
  4. Although [JLREQ] and JIS X 4051 [JISX4051] show multiple ways of positioning ruby in some cases, this document describes only one method, based on the policies described above. The methods described in this document are mostly chosen from those provided in JIS X 4051 [JISX4051]. In some cases, this document picks methods that are allowed there as implementation-defined options, for example not laying a protruding ruby string over any preceding or following kana characters.
  5. There is demand for using a larger (or smaller) font size for the ruby string. In this document, the default font size of the ruby string is half the font size of the base character string, and the examples in the figures use this default. The sizes of spacing adjustments during justification are defined in terms of the font size of the base character string, not of the ruby string, which keeps the layout methods applicable when the ruby font size is not half that of the base character string.
Note: Wrap opportunities

1.3 Types of ruby

Ruby in Japanese may be divided into the following 3 different types, based on the relationship between the ruby and the base characters (see JLReq “3.3.1 Usage of Ruby” [JLREQ]).

  1. Mono-ruby
  2. Jukugo-ruby
  3. Group-ruby
Figure 4 Types of ruby

Which one to use depends on the relationship between the ruby and the base characters. Mono-ruby is used to connect ruby to a single base character, Jukugo-ruby is used when multiple base characters each have a corresponding ruby and at the same time the whole group needs to be processed together, and group-ruby is used when ruby is attached to a group of base characters together (see Figure 4). Each is used when specified.

2. Rules for Simple Placement of Japanese Ruby

2.1 Ruby character size and character placement

The size of the ruby characters and their placement in the inline direction relative to the base characters is as follows:

  1. The size of the ruby is by default set to half of the size of the base characters.
  2. In vertical text, ruby is placed to the right of the base characters, and the character frame of the ruby is placed flush against the character frame of the base characters.
    Figure 5 Example of vertical ruby
  3. In horizontal text, ruby is placed above the base characters, and the character frame of the ruby is placed flush against the character frame of the base characters.
    Figure 6 Example of horizontal ruby
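
As a minimal sketch of the defaults listed above, the following Python function returns the default ruby font size and the side on which the ruby is placed. The names `RubyDefaults`, `ruby_defaults`, and the strings "horizontal", "vertical", "above", and "right" are illustrative assumptions and do not come from this document or from any existing API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RubyDefaults:
    font_size: float  # ruby font size, in the same unit as the base font size
    side: str         # side of the base characters on which the ruby is placed


def ruby_defaults(base_font_size: float, writing_mode: str) -> RubyDefaults:
    """Return the default ruby size and side for a writing mode (hypothetical helper)."""
    if writing_mode not in ("horizontal", "vertical"):
        raise ValueError("writing_mode must be 'horizontal' or 'vertical'")
    # Ruby goes above the base in horizontal text and to its right in vertical text.
    side = "above" if writing_mode == "horizontal" else "right"
    # By default the ruby is half the size of the base characters.
    return RubyDefaults(font_size=base_font_size / 2, side=side)
```

In both writing modes the ruby character frame sits flush against the base character frame, so no gap in the block direction is modeled here.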

The following sections describe in detail the placement of mono-ruby, jukugo-ruby, and group-ruby. However, since jukugo-ruby is more complex, it is explained last.

2.2 Placement of mono-ruby

Mono-ruby is placed as follows. In terms of the two-step processing method described in § 1.2 Matters considered by the placement rules, points 1, 2, and 3 belong to the first step, and points 4 and 5 belong to the second step.

  1. When the ruby is made of two or more characters, each character in the ruby string is placed immediately next to its neighboring character, without any inter-letter spacing. Furthermore, when the ruby is composed of characters such as Grouped numerals (cl-24), Unit symbols (cl-25), Western word space (cl-26), or Western characters (cl-27) [JLREQ] which have their own individual width, they are placed based on each character’s metrics.

    Figure 7 Example mono-ruby with western characters
  2. The center of the ruby string and of the base character string are aligned in the inline direction. (see Figure 8).
  3. Since the base character and its associated ruby form a single unit there is no line wrapping opportunity inside a mono-ruby.
  4. When the ruby string is longer than the base character string, the part of the ruby string that extends beyond the base characters must not hang over the characters preceding or following (see Figure 8). Space is introduced accordingly between these preceding or following characters and the base characters.

    Figure 8 Example 1 of mono-ruby protruding
    However, when the preceding or following character is a punctuation mark such as Full stops (cl-06) [JLREQ], which carries spacing before or after the symbol, the ruby characters do hang over that preceding or following character (see Figure 9). (Punctuation marks like Full stops (cl-06) [JLREQ] play an important role as breaks between sentences, so it is desirable to keep the spacing before and after them constant: extra spacing around these characters could change how the break between sentences is read. There is also no issue of the kind described in the note "Protrusion of ruby from base characters". Therefore, this method uses a different layout next to punctuation marks like Full stops (cl-06) [JLREQ].)
    Figure 9 Example 2 of mono-ruby protruding
  5. When the ruby string is longer than the base character string, and the ruby falls at the start of the line, then the start of the ruby string is aligned with the line’s start edge (see Figure 10), while if the ruby falls at the end of the line, then the end of the ruby string is aligned with the line’s end edge (see Figure 11).

    Figure 10 Example of mono-ruby at the line start
    Figure 11 Example of mono-ruby at the line end
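
The two steps for mono-ruby can be sketched in Python as follows. This is a simplified illustration, not a normative algorithm: lengths are plain numbers in the inline direction, the names `RubyBlock`, `place_centered`, and `position_in_line` are hypothetical, and the `overlap_before` parameter (how far the ruby may hang over the blank part of a preceding punctuation mark) is an assumption, since this document does not state an amount.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RubyBlock:
    width: float        # inline size of the whole ruby block
    base_offset: float  # start of the base string, measured from the block start
    ruby_offset: float  # start of the ruby string, measured from the block start


def place_centered(base_len: float, ruby_len: float) -> RubyBlock:
    """Step 1: center the ruby and the base on each other, ignoring neighbours."""
    width = max(base_len, ruby_len)
    return RubyBlock(width, (width - base_len) / 2, (width - ruby_len) / 2)


def position_in_line(block: RubyBlock, cursor: float,
                     overlap_before: float = 0.0) -> tuple:
    """Step 2: return (base_start, ruby_start) in line coordinates.

    `cursor` is the end of the preceding character, or the line start edge.
    The block is never reshaped here: only its position in the line changes.
    """
    # The ruby may only move back over the preceding character by the allowed
    # overlap, and never by more than it actually protrudes from the base.
    shift = min(block.base_offset, overlap_before)
    start = cursor - shift
    return start + block.base_offset, start + block.ruby_offset
```

With a base character of length 10 and a ruby string of length 15, `place_centered(10, 15)` leaves 2.5 of ruby protruding on each side. Calling `position_in_line` with `cursor=0.0` puts the ruby start on the line start edge and pushes the base character in by 2.5, which matches point 5. The trailing side and the line end are symmetrical.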

2.3 Placement of group-ruby

In this section, the placement rules for group-ruby are described in terms of two groups of characters. One is "Western characters", which have proportional widths and consist of characters like Grouped numerals (cl-24), Unit symbols (cl-25), Western word space (cl-26), and Western characters (cl-27) [JLREQ]. The other is "Japanese characters", which have a fixed fullwidth size (see also 2.1.2 Kanji, Hiragana and Katakana [JLREQ]) and consist of characters like Hiragana (cl-15), Katakana (cl-16), and Ideographic characters (cl-19) [JLREQ]. Because strings of Western characters are read as clusters of multiple characters, it is desirable to avoid adding spacing between them for justification. The way the ruby string and the base character string are positioned depends on how their respective lengths would compare if each were laid out without any inter-letter spacing. When their lengths would be the same, both are laid out without inter-letter spacing and placed such that their respective centers in the inline direction are aligned (see Figure 12). For other cases, the placement depends on the following:

Figure 12 Example 1 of group-ruby
Note: Inter-letter spacing in group-ruby

In terms of the two-step processing method described in § 1.2 Matters considered by the placement rules, points 1, 2, and 3 belong to the first step, and points 4 and 5 belong to the second step.

  1. When both the ruby string and the base character string consist of "Japanese characters", the placement depends on the following:

    • When the ruby string is shorter than the base character string, space is inserted between every character in the ruby string as well as at the start and the end of the ruby string so that it becomes the same length as the base character string, then their centers in the inline direction are aligned. The size of the space inserted between each of the ruby characters is twice the size of the space inserted at the end and at the start (see Figure 13).

      Figure 13 Example 2 of group-ruby
      However, the size of the space inserted at the start and at the end must be capped at no more than half the size of one base character, and the space inserted between each ruby character is enlarged to compensate (see Figure 14).
      Figure 14 Example 3 of group-ruby
    • When the ruby string is longer than the base character string, space is inserted between every character in the base character string as well as at the start and the end of the base character string so that it becomes the same length as the ruby string, then their centers in the inline direction are aligned. The size of the space inserted between each of the base characters is twice the size of the space inserted at the end and at the start (see Figure 15).

      Figure 15 Example 4 of group-ruby
  2. When the ruby string consists of "Japanese characters" and the base character string consists of Western characters, the placement depends on the following (see Figure 16):

    • When the ruby string is shorter than the base character string, space is inserted between every character in the ruby string as well as at the start and the end of the ruby string so that it becomes the same length as the base character string, then their centers in the inline direction are aligned. The size of the space inserted between each of the ruby characters is twice the size of the space inserted at the end and at the start.
    • When the ruby string is longer than the base character string, both are laid out without inter-letter spacing and placed such that their respective centers in the inline direction are aligned. In this case, the ruby string protrudes from the base character string.
    Figure 16 Example of ruby with western characters
  3. When the ruby string consists of Western characters and the base character string consists of "Japanese characters", the placement depends on the following (see Figure 16):

    • When the ruby string is shorter than the base character string, both are laid out without inter-letter spacing and placed such that their respective centers in the inline direction are aligned.
    • When the ruby string is longer than the base character string, space is inserted between every character in the base character string as well as at the start and the end of the base character string so that it becomes the same length as the ruby string, then their centers in the inline direction are aligned. The size of the space inserted between each of the base characters is twice the size of the space inserted at the end and at the start.
  4. When the ruby string is longer than the base character string and protrudes, whether and how it hangs over characters preceding or following the base character string is handled in the same way as with mono-ruby (see Figure 17). Also, when the ruby string is longer than the base character string, protrudes, and is located at the start or end of the line, the resulting layout is also identical to that of mono-ruby.

    Figure 17 Example of protruding group-ruby
  5. In the case of group ruby, the base character string and its associated ruby string are treated as a unit, so there is no line wrapping opportunity inside either string.

    Note: Wrap opportunities in group-ruby
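
The spacing rules of points 1 to 3 can be sketched in Python as follows. This is an illustration under stated assumptions rather than a definitive implementation: the function names and parameters are hypothetical, the cap on the space at the start and end is applied only in the case where point 1 states it (both strings are "Japanese characters"), and the case where both strings are Western characters, which the list above does not cover, is assumed to be laid out without inter-letter spacing.

```python
from typing import Dict, Optional, Tuple


def one_two_one(n_chars: int, extra: float,
                cap: Optional[float] = None) -> Tuple[float, float]:
    """Split `extra` space over a string of `n_chars` characters.

    Returns (edge, between): the space at each end of the string and the
    space between adjacent characters. `between` is twice `edge` unless the
    cap on `edge` applies, in which case `between` grows to compensate.
    """
    if n_chars < 1 or extra < 0:
        raise ValueError("need at least one character and non-negative space")
    if n_chars == 1:
        # Nothing to space apart: the single character is simply centered.
        return extra / 2, 0.0
    edge = extra / (2 * n_chars)
    if cap is not None and edge > cap:
        edge = cap
    between = (extra - 2 * edge) / (n_chars - 1)
    return edge, between


def group_ruby_spacing(base_len: float, ruby_len: float,
                       n_base: int, n_ruby: int,
                       base_is_japanese: bool, ruby_is_japanese: bool,
                       base_font_size: float) -> Dict[str, Tuple[float, float]]:
    """Return the (edge, between) spacing for the base and the ruby strings.

    Lengths are measured without inter-letter spacing. Both strings are then
    centered on each other in the inline direction.
    """
    solid = (0.0, 0.0)
    if ruby_len < base_len and ruby_is_japanese:
        # Spread the ruby to the base length (points 1 and 2, first bullet).
        cap = base_font_size / 2 if base_is_japanese else None
        return {"base": solid,
                "ruby": one_two_one(n_ruby, base_len - ruby_len, cap)}
    if ruby_len > base_len and base_is_japanese:
        # Spread the base to the ruby length (points 1 and 3, second bullet).
        return {"base": one_two_one(n_base, ruby_len - base_len),
                "ruby": solid}
    # Western strings are not spread: both are set solid and centered.
    return {"base": solid, "ruby": solid}
```

For example, with three base characters of size 10 and four ruby characters of size 5, the extra length of 10 is split into 1.25 at each end and 2.5 between ruby characters. With four base characters and two ruby characters, the uncapped end space would be 7.5, so it is capped at 5 and the space between the two ruby characters grows to 20.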

2.4 Placement of Jukugo-ruby

Jukugo-ruby is placed as follows:

In terms of the two-step processing method described in § 1.2 Matters considered by the placement rules, points 1, 2, and 3 belong to the first step, and point 4 belongs to the second step.

  1. With jukugo-ruby, each base character is associated with its own ruby string. When each of these ruby strings, laid out without inter-letter spacing, is no longer than its corresponding base character, placement is determined as follows:

    • When the ruby string associated with an individual base character is 1 character long, the ruby character and the base character are placed such that their respective centers in the inline direction are aligned (see Figure 19).
      Figure 19 Example 1 of jukugo-ruby
    • When the ruby string associated with an individual base character is 2 characters long or more, the ruby string is laid out without inter-letter spacing, and placed such that its center and the center of its base character are aligned in the inline direction (see Figure 19).
  2. For simple ruby implementations, if even a single ruby string is longer than its corresponding base character when laid out without inter-letter spacing, the resulting layout would look identical to group-ruby (see Figure 20 and Figure 21).

    Figure 20 Example 2 of jukugo-ruby
    Figure 21 Example 3 of jukugo-ruby
  3. With jukugo-ruby, individual base characters and their associated ruby string are treated as a unit, and line wrap opportunities are allowed between two base characters. When such a line wrap occurs, if a single base character that is part of the jukugo is placed alone at the end or at the start of a line, it is laid out identically to mono-ruby; conversely when several base characters that are part of the jukugo are placed together at the end or start of a line, they are laid out together as has been described in this section about jukugo-ruby (see Figure 22).
    Figure 22 Example of wrapping jukugo-ruby
  4. When the ruby string is longer than the base character string and protrudes, whether and how it hangs over characters preceding or following the base character string is handled in the same way as with mono-ruby. Also, when the ruby string is longer than the base character string, protrudes, and is located at the start or end of the line, the resulting layout is also identical to that of mono-ruby.
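
The choice between the layouts of points 1 to 3 can be sketched in Python as follows. The function names and the mode strings "mono", "per-character", and "group" are hypothetical labels introduced for this illustration. Lengths are measured without inter-letter spacing, and a ruby string exactly as long as its base character is assumed to count as fitting.

```python
from typing import List, Sequence


def jukugo_mode(base_lens: Sequence[float], ruby_lens: Sequence[float]) -> str:
    """Pick the layout for the jukugo base characters that share one line."""
    if len(base_lens) != len(ruby_lens) or not base_lens:
        raise ValueError("need one ruby string per base character")
    if len(base_lens) == 1:
        # A single base character left alone on a line is laid out as mono-ruby.
        return "mono"
    if any(r > b for b, r in zip(base_lens, ruby_lens)):
        # Even one ruby string longer than its base character means the
        # whole run is laid out like group-ruby.
        return "group"
    return "per-character"


def per_character_offsets(base_lens: Sequence[float],
                          ruby_lens: Sequence[float]) -> List[float]:
    """Offset of each ruby string from the start of its own base character."""
    return [(b - r) / 2 for b, r in zip(base_lens, ruby_lens)]


def modes_after_wrap(base_lens: Sequence[float], ruby_lens: Sequence[float],
                     wrap_index: int) -> List[str]:
    """Layout of the two parts when a line wraps before `wrap_index`."""
    if not 0 < wrap_index < len(base_lens):
        raise ValueError("wrap_index must fall between two base characters")
    return [jukugo_mode(base_lens[:wrap_index], ruby_lens[:wrap_index]),
            jukugo_mode(base_lens[wrap_index:], ruby_lens[wrap_index:])]
```

For a three-character jukugo whose first ruby string is too long, `jukugo_mode` gives "group" when the jukugo stays on one line, but a wrap after the first character gives "mono" for the first part and "per-character" for the rest, because each part is evaluated on its own.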

3. Placement of Double-Sided Ruby

3.1 Placement of Double-Sided Ruby by Combination of Type of Ruby

A full set of rules for the placement of double-sided ruby requires quite complex methods. For simple placement of double-sided ruby, the rules can be written per combination of mono-ruby, group-ruby, and jukugo-ruby on the two sides. As in the two-step processing, the handling of a ruby string that extends beyond the base characters relative to the preceding and following characters, and the placement at the line head or the line end, are processed in the same way as when ruby is attached to one side only.

Note: Space between adjacent lines and double-sided ruby

3.2 Combination of type of ruby

Possible combinations of type of ruby are as follows:

  1. Mono-ruby and mono-ruby
  2. Group-ruby and group-ruby
  3. Mono-ruby and group-ruby
  4. Mono-ruby and jukugo-ruby
  5. Jukugo-ruby with group-ruby or jukugo-ruby

3.3 Rules for Placement of Double-Sided Ruby per Combinations

JIS X 4051 [JISX4051] gives rules for the first, second, and third cases in the above list of combinations (see the note in JLReq [JLREQ]). The rule for the third case is to process the run of consecutive mono-ruby as group-ruby, so the result is the same as the second case.

For the fourth case, mono-ruby and jukugo-ruby, the first case applies once the jukugo-ruby is divided into consecutive mono-ruby, each pairing an individual kanji base character with its ruby string. For the fifth case, jukugo-ruby with group-ruby or jukugo-ruby, the second case applies once the jukugo-ruby is handled as group-ruby.
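
The reduction of the five combinations to the first two cases can be sketched as a small Python lookup. The function name and the strings "mono", "group", "jukugo", "mono+mono", and "group+group" are hypothetical labels chosen for this illustration.

```python
def reduce_combination(side_a: str, side_b: str) -> str:
    """Map the ruby types on the two sides to the rule that handles them."""
    kinds = frozenset((side_a, side_b))
    if not kinds <= {"mono", "group", "jukugo"}:
        raise ValueError("ruby type must be 'mono', 'group', or 'jukugo'")
    if kinds == {"mono"} or kinds == {"mono", "jukugo"}:
        # Case 1, and case 4 with the jukugo-ruby divided into mono-ruby.
        return "mono+mono"
    # Case 2, case 3 (the run of mono-ruby processed as group-ruby), and
    # case 5 (the jukugo-ruby handled as group-ruby).
    return "group+group"
```

The order of the two arguments does not matter, since which ruby string goes on which side does not affect which rule applies.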

The following sections give the rules for simple placement of double-sided ruby for the first and second cases.

In addition, which of the two ruby strings is placed on which side follows what is specified by the content.

3.4 Placement of combination of mono-ruby and mono-ruby

For the combination of mono-ruby and mono-ruby, both ruby strings are set solid and placed so that their centers match that of the ruby base character (see Figure 23). For other points, follow the rules for placement of mono-ruby described in § 2.2 Placement of mono-ruby.

Figure 23 Double-sided ruby example with both mono-ruby

3.5 Placement of combination of group-ruby and group-ruby

When both ruby strings are shorter than the ruby base character string, follow the rules for placement of group-ruby described in § 2.3 Placement of group-ruby. When a ruby string consists of "Japanese characters" as defined in § 2.3 Placement of group-ruby, spacing is inserted between every character in the ruby string as well as at the start and the end of the ruby string (see Figure 24).

Figure 24 Double-sided ruby example 1 with both group-ruby

When one of the ruby strings is longer than the base character string, take the longer of the two ruby strings and place it following the rules for placement of group-ruby described in § 2.3 Placement of group-ruby. When the ruby base character string consists of "Japanese characters" as defined in § 2.3 Placement of group-ruby, spacing is inserted between every character in the ruby base character string as well as at the start and the end of the ruby base character string. Once the ruby base character string is placed, place the shorter ruby string based on the length of the ruby base character string, excluding the spacing at its start and end but including the inter-character spacing when the ruby base character string consists of "Japanese characters".

When the shorter ruby string is longer than the ruby base character string with inter-character spacing, the shorter ruby string is set solid and placed so that its center matches that of the ruby base character string (see Figure 25).

Figure 25 Double-sided ruby example 2 with both group-ruby

When the shorter ruby string is shorter than the ruby base character string with inter-character spacing, follow the rules for placement of group-ruby described in § 2.3 Placement of group-ruby, using the length of the ruby base character string with inter-character spacing. When the shorter ruby string consists of "Japanese characters" as defined in § 2.3 Placement of group-ruby, spacing is inserted between every character in the ruby string as well as at the start and the end of the ruby string (see Figure 26).

Figure 26 Double-sided ruby example 3 with both group-ruby

For other points, follow the rules for placement of group-ruby described in § 2.3 Placement of group-ruby.
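
The case where one ruby string is longer than the base character string can be sketched in Python as follows. This is a simplified illustration under several assumptions: all three strings consist of fullwidth "Japanese characters", the cap on the start and end space from § 2.3 is left out, and the names `spread` and `double_sided_group` are hypothetical.

```python
from typing import Dict, Tuple, Union


def spread(n_chars: int, extra: float) -> Tuple[float, float]:
    """Return (edge, between) space, with between equal to twice edge."""
    if n_chars == 1:
        return extra / 2, 0.0
    return extra / (2 * n_chars), extra / n_chars


def double_sided_group(base_len: float, n_base: int,
                       long_ruby_len: float,
                       short_ruby_len: float,
                       n_short: int) -> Dict[str, Union[float, Tuple[float, float]]]:
    """Spacing for the base string and for the shorter of the two ruby strings.

    Lengths are measured without inter-letter spacing. The longer ruby string
    is set solid and all three strings share the same center.
    """
    if long_ruby_len <= base_len or short_ruby_len > long_ruby_len:
        raise ValueError("expects base < longer ruby and shorter <= longer ruby")
    # The base string is spread to the length of the longer ruby string.
    base_edge, base_between = spread(n_base, long_ruby_len - base_len)
    # Length of the spread base string, excluding the space at both ends.
    reference = base_len + (n_base - 1) * base_between
    if short_ruby_len >= reference:
        short = (0.0, 0.0)  # set solid and centered on the base string
    else:
        short = spread(n_short, reference - short_ruby_len)
    return {"base": (base_edge, base_between),
            "reference": reference,
            "short_ruby": short}
```

For two base characters of size 10, a longer ruby string of length 40, and a shorter ruby string of three characters of size 5, the base string gets 5 at each end and 10 between its characters. The reference length is then 30, and the shorter ruby string is spread over it with 2.5 at each end and 5 between its characters.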

A. References

A.1 Informative references

[JISX4051]
Formatting rules for Japanese documents (『日本語文書の組版方法』; JIS X 4051). Japanese Standards Association. 2004.
[JLREQ]
Requirements for Japanese Text Layout. Yasuhiro Anan; Hiroyuki Chiba; Junzaburo Edamoto; Richard Ishida; Tatsuo KOBAYASHI; Toshi Kobayashi; Kenzou Onozawa; Felix Sasaki; Seiichi Kato; Hajime Shiozawa et al. W3C. 3 April 2012. W3C Note. URL: https://www.w3.org/TR/jlreq/