Web Publications

W3C Editor's Draft

This version:
https://w3c.github.io/wpub/
Latest published version:
https://www.w3.org/TR/wpub/
Latest editor's draft:
https://w3c.github.io/wpub/
Editors:
Matt Garrish (DAISY Consortium)
Ivan Herman (W3C)
Authors:
Baldur Bjarnason (The Rebus Foundation)
Timothy W. Cole (University of Illinois at Urbana-Champaign)
Dave Cramer (Hachette Livre)
Hadrien Gardeur (Feedbooks)
Florian Rivoal (Vivliostyle Inc.)
Participate:
GitHub w3c/wpub
File a bug
Commit history
Pull requests

Abstract

This specification defines a collection of information that describes the structure of Web Publications so that user agents can provide user experiences tailored to reading publications, such as sequential navigation and offline reading. This information includes the default reading order, a list of resources, and publication-wide metadata.

Status of This Document

This is a preview

Do not attempt to implement this version of the specification. Do not reference this version as authoritative in any way. Instead, see https://w3c.github.io/wpub/ for the Editor's draft.

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.

This draft provides a preliminary outline of a Web Publication. Many details are under active consideration within the Publishing Working Group and are subject to change. The most prominent known issues have been identified in this document and links provided to comment on them.

This document was published by the Publishing Working Group as an Editor's Draft. Comments regarding this document are welcome. Please send them to public-publ-wg@w3.org (subscribe, archives).

Publication as an Editor's Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 1 February 2018 W3C Process Document.

1. Introduction

1.1 What is a Web Publication

This section is non-normative.

A Web Publication is a discoverable and identifiable collection of resources. Information about the Web Publication is expressed in a machine-readable document called a manifest, which is what enables user agents to understand the bounds of the Web Publication and the connection between its resources.

The manifest includes metadata that describe the Web Publication, as a publication has an identity and nature beyond its constituent resources. The manifest also provides a list of all the resources that belong to the Web Publication and a default reading order, which is how it connects resources into a single contiguous work.

A Web Publication is discoverable in one of two ways: resources either include a link to the manifest (via an HTTP Link header or an HTML link element [html]), or the manifest can be loaded directly by a compatible user agent.

With the establishment of Web Publications, user agents can build new experiences tailored specifically for their unique reading needs.

Figure 1: Simplified diagram of the structure of Web Publications. The flowchart depicts the resources of a Web Publication, their attachment to a manifest, and the manifest's relationship to the infoset.
A description of the structure diagram is available in the Appendix.

1.2 Scope

This section is non-normative.

This specification only defines requirements for the production and rendering of valid Web Publications. As much as possible, it leverages existing Open Web Platform technologies to achieve its goal—that being to allow for a measure of boundedness on the Web without changing the way that the Web itself operates.

Moreover, the specification is designed to adapt automatically to updates to Open Web Platform technologies in order to ensure that Web Publications continue to interoperate seamlessly as the Web evolves (e.g., by referencing the latest published versions instead of specific dated versions).

Further, this specification does not attempt to constrain the nature of a Web Publication: any type of work that can be represented on the Web constitutes a potential Web Publication.

The specification is also intended to facilitate different user agent architectures for the consumption of Web Publications. While a primary goal is that traditional Web user agents (browsers) will be able to consume Web Publications, this should not limit the capabilities of any other possible type of user agent (e.g., applications, whether standalone or running within a user agent, or even Web Publications that include their own user interface). As a result, the specification does not attempt to architect required solutions for situations whose expected outcome will vary depending on the nature of the user agent and the expectations of the user (e.g., how to prompt to initiate a Web Publication, or at what point or how much of a Web Publication to cache for offline use).

1.3 Relationship to Other Specifications

1.3.1 Web App Manifest

Editor's note

We may want to write something here on the relationships...

1.4 Terminology

This document uses terminology defined by the W3C Note "Publishing and Linking on the Web" [publishing-linking], including, in particular, user, user agent, browser, and address.

Identifier

An identifier is metadata that can be used to refer to Web Content in a persistent and unambiguous manner. URLs, URNs, DOIs, ISBNs, and PURLs are all examples of persistent identifiers frequently used in publishing.

Manifest

A manifest represents structured information about a Web Publication, such as informative metadata, a list of all resources, and a default reading order.

Non-empty

For the purposes of this specification, non-empty is used to refer to an element, attribute or property whose text content or value consists of one or more characters after whitespace normalization, where whitespace normalization rules are defined per the host format.
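
As a rough illustration of this definition, the following Python sketch tests whether a string value is non-empty. Whitespace normalization is defined by the host format, so the normalization used here (trimming and collapsing runs of Unicode whitespace) is an assumption, and the function name is_non_empty is hypothetical.

```python
def is_non_empty(value):
    """Return True if value keeps at least one character after normalization.

    Assumption: normalization trims the value and collapses runs of Unicode
    whitespace, which is what str.split() with no arguments provides.
    """
    if not isinstance(value, str):
        return False
    normalized = " ".join(value.split())
    return len(normalized) >= 1
```

Under these assumptions, is_non_empty("  Moby-Dick \n") is true, while a value made only of spaces, tabs, or line breaks is treated as empty.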

URL

The general term URL is defined by the URL Standard [url]. It is used as in other W3C specifications, like HTML [html]. In particular, a URL allows for the usage of characters from Unicode following [rfc3987]. See the note in the HTML5 specification for further details.

Web Publication

A Web Publication is a collection of one or more resources, organized together through a manifest into a single logical work with a default reading order. The Web Publication is uniquely identifiable and presentable using Open Web Platform technologies.

2. Conformance

As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

The key words MAY, MUST, MUST NOT, RECOMMENDED, REQUIRED, SHOULD, and SHOULD NOT are to be interpreted as described in [RFC2119].

2.1 Conformance Classes

This specification defines two conformance classes: one for Web Publications and one for user agents that process them.

A Web Publication conforms to this specification if it meets the following criteria:

A user agent conforms to this specification if it meets the following criteria:

3. Information Set

Editor's note

As the serialization of the manifest remains an open issue, specifics about how properties are compiled into the infoset remain under-specified. This includes, but is not limited to, what specific names the properties will have in the infoset, whether the names in the manifest will be the same as those in the infoset and/or whether mappings to properties from known vocabularies will be used.

The name "infoset" might change depending on feedback. Although this term has a different meaning for individuals familiar with XML, alternatives such as "properties" and "metadata" do not fully capture the nature or purpose of the collected information.

3.1 Explanation

This section is non-normative.

A Web Publication is defined by a set of properties known as its information set (infoset). The infoset is logically divided into two sets of properties: those that describe the publication and those that express key structures. These classifications only exist for the purposes of understanding the function of the properties, however.

The infoset is both abstract and concrete. It is abstract in the sense that it represents a set of information that a user agent has to compile about the Web Publication, but it also becomes concrete when the user agent creates an internal representation of that information.

The infoset is primarily compiled from a Web Publication's manifest, whose serialization requirements are defined in 4. Web Publication Manifest. Some information can be obtained outside the manifest; for example, the fallback rules for properties defined in the following subsections allow a user agent to compile information that the author has not provided in the manifest.

3.2 Requirements

The requirements for the expression of infoset properties are as follows:

Descriptive Properties

REQUIRED: address

RECOMMENDED:

Structural Properties

REQUIRED: default reading order

RECOMMENDED:

Editor's note

These requirements reflect the current minimum consensus, though a number of issues remain open that could change whether an item is required or recommended. Refer to the property descriptions for more information.

3.3 Descriptive Properties

3.3.1 Accessibility Report

An accessibility report provides information about the suitability of a Web Publication for consumption by users with varying preferred reading modalities. These reports typically identify the result of an evaluation against established accessibility criteria, such as those provided in [WCAG20], and are an important source of information in determining the usability of a Web Publication.

The infoset SHOULD include a link to an accessibility report when one is available for a Web Publication. It is RECOMMENDED that the report be included as a resource of the Web Publication.

It is also RECOMMENDED that the accessibility report be provided in a human-readable format, such as HTML [html]. Augmenting these reports with machine-processable metadata, such as provided in Schema.org[schema.org], is also RECOMMENDED.

Editor's note

Machine-readable accessibility metadata may be recommended in whatever format is used to externalize publication metadata (e.g., to ensure availability for search). Depending on how this externalizing is done, adding machine-processable accessibility metadata to such a record could take precedence over, or complement, the accessibility record.

3.3.2 Address

A Web Publication's address is a URL [url] that represents the primary entry page for the Web Publication. If this URL does not resolve to an HTML document [html], user agents SHOULD NOT provide access to it to users.

The referenced document SHOULD be a resource of the Web Publication. It can be any resource, including one that is not listed in the default reading order. This document MUST include a link to the manifest to ensure a bidirectional linking relationship (i.e., that user agents can also locate the manifest from the document at the address).

If the document is not a Web Publication resource, user agents SHOULD load the first document in the default reading order when initiating the Web Publication.
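
The choice of the document to load can be sketched as follows. This is a minimal illustration, not a required algorithm: the function name initial_document is hypothetical, and the sketch assumes that all URLs have already been made absolute and are compared as plain strings.

```python
def initial_document(address, reading_order, resource_list):
    """Pick the document a user agent loads when initiating the publication.

    address: URL of the primary entry page.
    reading_order: URLs in the default reading order (at least one entry).
    resource_list: URLs of the other resources within the publication.
    """
    publication_resources = set(reading_order) | set(resource_list)
    if address in publication_resources:
        return address
    # The entry page is not a resource of the publication, so fall back to
    # the first document in the default reading order.
    return reading_order[0]
```

A real user agent would also need to normalize URLs (for example, fragments and trailing slashes) before comparing them, which this sketch leaves out.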

Note

To improve the usability of Web Publications, particularly in user agents that do not support Web Publications, include navigation aids in the referenced document that facilitate consumption of the content (e.g., provide a table of contents or a link to one).

The availability of the address does not preclude the creation and use of other identifiers and/or addresses to retrieve a Web Publication, whether in whole or in part.

Note
The Web Publication's address can also be used as the value for an identifier link relation [link-relation].

3.3.3 Canonical Identifier

A Web Publication's canonical identifier is a unique identifier that resolves to the preferred version of the Web Publication. The canonical identifier MUST be an address or a value that allows a one-to-one mapping to an address (e.g., a DOI [doi] can be resolved to a URL [url] via a DOI resolver). If a user agent cannot resolve the canonical identifier to an address, it SHOULD ignore the property.
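
The following sketch shows one way a user agent might map a canonical identifier to an address. It is illustrative only: the function name is hypothetical, only DOIs written with a doi: prefix and HTTP(S) URLs are handled, and the use of the public https://doi.org/ resolver is an assumption rather than a requirement of this specification.

```python
from urllib.parse import urlsplit


def resolve_canonical_identifier(identifier):
    """Map a canonical identifier to an address, or return None to ignore it."""
    value = identifier.strip()
    if value.lower().startswith("doi:"):
        # Assumption: DOIs are resolved through the public doi.org resolver.
        return "https://doi.org/" + value[4:].strip()
    parts = urlsplit(value)
    if parts.scheme in ("http", "https") and parts.netloc:
        # The identifier is already an address.
        return value
    # This sketch cannot resolve the value, so the property is ignored.
    return None
```

Returning None corresponds to the case where the user agent cannot resolve the identifier and therefore ignores the property.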

If a Web Publication is hosted at more than one address, the canonical identifier allows a user agent to identify the shared relationship between the versions and to determine which of the available addresses is primary.

The canonical identifier is also intended to provide a measure of permanence above and beyond the Web Publication's address. Even if a Web Publication is permanently relocated to a new address, for example, the canonical identifier will provide a way of locating the new location (e.g., a DOI registry could be updated with the new URL, or a redirect could be added to the URL of the canonical identifier).

When assigned, the canonical identifier needs to be unique to one and only one Web Publication, independent of its address(es). Ensuring uniqueness is outside the scope of this specification, however. The actual uniqueness achievable depends on such factors as the conventions of the identifier scheme used and the degree of control over assignment of identifiers.

Note

If the canonical identifier is a URL, it can be used as the target of a "canonical" link [rfc6596] (e.g., a link element [html] whose rel attribute has the value canonical or an HTTP Link header field [rfc5988] similarly identified).

Issue 58: is a canonical identifier necessary (topic:identifiers, topic:metadata)

Is a canonical identifier necessary to call out explicitly in the infoset, or can it be handled by other metadata?

3.3.4 Creators

Creators are the individuals or entities responsible for the creation of the Web Publication.

The creator property SHOULD include the role the creator played in the creation of the Web Publication (e.g., 'author', 'illustrator', 'translator').

3.3.5 Language and Base Direction

Each textual property in the Web Publication's infoset (e.g., title, creators) is Localizable [string-meta]. This means it is possible to assign:

  • the natural language, which MUST be a tag that conforms to [bcp47];
  • the base direction, which identifies the display direction for the property using one of the following possible values:
    • ltr: left-to-right;
    • rtl: right-to-left;
    • auto: direction is determined from the value of the property.

The Web Publication's infoset MAY contain global language and base direction declarations using the same conventions (i.e., [bcp47] tags for language, and the values ltr/rtl/auto for base direction). These are used as defaults for textual properties in the infoset, unless overridden by the properties themselves.

Note

These features make it possible to add the title of a publication in different languages, or repeat the creators’ names using different scripts.

User agents MUST NOT use the language and base direction outside the context of the manifest (e.g., in the processing or rendering of the Web Publication content). These values are not a replacement for such declarations in each resource as defined by its format.

If a user agent requires the language and one is not available in the infoset (globally or specifically for the property), or the obtained value is invalid, the user agent MAY attempt to determine the language. This specification does not mandate how such a language tag is created. The user agent might:

  • use the non-empty language declaration of the manifest;
  • use the first non-empty language declaration found in a resource in the default reading order;
  • calculate the language using its own algorithm.

If a language tag cannot be determined, user agents MUST use the value "und" (undetermined). If the base direction cannot be determined, user agents MUST assume the value auto.
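
The defaulting rules in this section can be sketched as follows. The function names are hypothetical, the optional guess argument stands for whatever value the user agent's own detection produced, and the regular expression is only a rough shape check: a complete [bcp47] validator is considerably stricter.

```python
import re

# Rough shape check only; this is an assumption, not a full [bcp47] validator.
_LANGUAGE_TAG = re.compile(r"^[A-Za-z]{2,8}(-[A-Za-z0-9]{1,8})*$")
_DIRECTIONS = ("ltr", "rtl", "auto")


def resolve_language(property_language, global_language, guess=None):
    """Return the language tag to use for one textual infoset property."""
    # Order: property-specific value, global default, user agent's own guess.
    for candidate in (property_language, global_language, guess):
        if isinstance(candidate, str) and _LANGUAGE_TAG.match(candidate):
            return candidate
    return "und"  # undetermined


def resolve_direction(property_direction, global_direction):
    """Return the base direction to use for one textual infoset property."""
    for candidate in (property_direction, global_direction):
        if candidate in _DIRECTIONS:
            return candidate
    return "auto"
```

For example, a title that declares no language of its own in a manifest with a global language of "en" resolves to "en", and a property with no usable direction information resolves to auto.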

3.3.6 Last Modification Date

The last modification date is the date when the Web Publication was last updated (i.e., whenever changes were last made to any of the resources of the Web Publication, including the manifest).

This date does not necessarily reflect all changes to the Web Publication (e.g., third-party content could change without the author being aware). User agents SHOULD check the last modification date of individual resources to determine if they have changed and need updating.

3.3.7 Publication Date

The publication date is the date on which the Web Publication was originally published. It represents a static event in the lifecycle of a Web Publication and allows subsequent revisions to be identified and compared.

The exact moment of publication is intentionally left open to interpretation: it could be when the Web Publication is first made available online or could be a point in time before publication when the Web Publication is considered final.

3.3.8 Reading Progression Direction

The reading progression establishes the reading direction from one resource to the next within a Web Publication. The value of this property may be:

  • ltr: left-to-right;
  • rtl: right-to-left;
  • auto: the user agent chooses the direction.

The default value is auto.

This information item has no effect on the rendering of the individual primary resources; it is only relevant for the progression direction from one resource to the other.
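
As an illustration of how the property might drive navigation between resources, the sketch below maps a tap on the left or right edge of the screen to the next or previous resource. Everything beyond the three values and the auto default is an assumption: the function names are hypothetical, unknown values are treated as auto, and auto is resolved to left-to-right, which a real user agent is free to decide differently.

```python
_PROGRESSIONS = ("ltr", "rtl", "auto")


def reading_progression(value=None):
    """Return the declared progression, defaulting to auto."""
    # Assumption: values outside the defined set are treated as auto.
    return value if value in _PROGRESSIONS else "auto"


def resource_for_tap(side, index, reading_order, progression="auto"):
    """Return the resource reached by tapping the 'left' or 'right' edge.

    Assumption: auto is resolved to left-to-right, and the edge that lies in
    the direction of progression means "next".
    """
    effective = reading_progression(progression)
    forward_side = "left" if effective == "rtl" else "right"
    step = 1 if side == forward_side else -1
    target = index + step
    if 0 <= target < len(reading_order):
        return reading_order[target]
    return None  # already at the first or last resource
```

With a right-to-left progression, tapping the right edge while on the second resource leads back to the first one, whereas the same tap leads forward in a left-to-right publication.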

Note

The reading progression of a Web Publication is used to adapt such publication-level interactions as menu position, swipe direction, defining tap zones to lead the user to the next and previous pages, touch gestures, etc.

3.3.9 Title

The title provides the human-readable name of the Web Publication.

When specified in the infoset, the title MUST be non-empty.

If a user agent requires a title and one is not available in the infoset, the user agent MAY create one. This specification does not mandate how such a title is created. The user agent might:

  • use the first non-empty title element found in a resource in the default reading order;
  • provide a language-specific placeholder title (e.g., 'Untitled Publication');
  • use the URL of the manifest;
  • calculate a title using its own algorithm.
Note

A user agent is not expected to produce a meaningful title [wcag20] for a Web Publication when one is not specified.
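
One possible fallback sequence for the title is sketched below. This specification does not mandate any particular order or algorithm, so the order chosen here, the function names, and the argument shapes are all assumptions; the placeholder text is the one used as an example in the list above.

```python
def _usable(text):
    """True for strings that contain something other than whitespace."""
    return isinstance(text, str) and text.strip() != ""


def choose_title(infoset_title, resource_titles, manifest_url=None):
    """Return a title to display for the Web Publication.

    resource_titles: text of the title element of each resource, listed in
    default reading order (None where a resource has no title element).
    """
    if _usable(infoset_title):
        return infoset_title
    for title in resource_titles:
        if _usable(title):
            return title.strip()
    if manifest_url:
        return manifest_url
    return "Untitled Publication"
```

A user agent that localizes its interface would substitute a language-specific placeholder for the final return value.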

3.4 Structural Properties

3.4.1 Default Reading Order

The default reading order is a specific progression through a set of Web Publication resources.

A user might follow alternative pathways through the content, but in the absence of such interaction the default reading order defines the expected progression from one resource to the next.

The default reading order MUST include at least one resource.

The default reading order is specified directly in the manifest. If the reading order consists of one single resource, namely the primary entry page of the Web Publication (i.e., the resource the user accesses to reach the manifest, see 5.2 Linking to a Manifest), the default reading order need not be specified in the manifest.
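
This rule can be sketched as a small helper. The key name readingOrder follows the JSON name mentioned in Issue 225, the function name is hypothetical, and the manifest is assumed to be already parsed into a Python dictionary whose readingOrder value, when present, is a list.

```python
def effective_reading_order(manifest, entry_page_url):
    """Return the default reading order, falling back to the entry page.

    manifest: the parsed manifest as a dictionary.
    entry_page_url: URL of the primary entry page that linked to the manifest.
    """
    order = manifest.get("readingOrder")
    if order:
        return list(order)
    # No reading order was given: the publication consists of the single
    # primary entry page.
    return [entry_page_url]
```

Either way the result contains at least one resource, which satisfies the requirement stated earlier in this section.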

3.4.2 Resource List

The resource list enumerates all resources that are used in the processing and rendering of a Web Publication (i.e., that are within its bounds) and that are not listed in the default reading order. This list is the definitive source that user agents have in determining which referenced resources belong to the Web Publication and which are external to it.

The completeness of the resource list will affect the usability of the Web Publication in certain reading scenarios (e.g., the ability to read the Web Publication offline). For this reason, it is strongly RECOMMENDED to provide a comprehensive list of all of the Web Publication's constituent resources beyond those listed in the default reading order.

In some cases, a comprehensive list of these resources might not be easily achieved (e.g., third-party scripts that reference resources from deep within their source), but a user agent SHOULD still be able to render a Web Publication even if some of these resources are not identified as belonging to the Web Publication (e.g., when it is taken offline without them).
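
A user agent could use the two lists to separate the resources that belong to the Web Publication from those that do not, for instance when deciding what to cache for offline reading. The sketch below is illustrative: the function names are hypothetical, and URLs are assumed to be absolute and compared as plain strings.

```python
def is_within_bounds(url, reading_order, resource_list):
    """True if url is listed in the default reading order or the resource list."""
    return url in reading_order or url in resource_list


def split_references(referenced_urls, reading_order, resource_list):
    """Partition referenced URLs into (within_bounds, not_listed)."""
    within, not_listed = [], []
    for url in referenced_urls:
        if is_within_bounds(url, reading_order, resource_list):
            within.append(url)
        else:
            not_listed.append(url)
    return within, not_listed
```

How a user agent treats the second group is not settled by this draft, so the sketch only reports it.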

Editor's note
The previous version of the draft included:

If a user agent encounters a resource that it cannot locate in the resource list, it MUST treat the resource as external to the Web Publication (e.g., it might alert the user before loading, open the resource in a new window, or unload the current Web Publication and resume normal Web browsing).

This was not decided at the Toronto F2F, and is still open.

3.4.3 List of extra resources

The list of extra resources enumerates all resources that are used in the processing and rendering of a Web Publication but are external to it, that is, outside its bounds (i.e., not listed in the default reading order or the resource list).

The completeness of this list will affect the usability of the Web Publication in certain scenarios (e.g., the ability to access privacy policy information). For this reason, it is strongly RECOMMENDED to provide a comprehensive list of all of the Web Publication's external resources beyond those listed in the default reading order or resource list.

In some cases, a comprehensive list of these resources might not be easily achieved (e.g., third-party scripts that reference resources from deep within their source), but a user agent SHOULD still be able to render a Web Publication even if some of these resources are not identified as belonging to the Web Publication (e.g., when it is taken offline without them).

Issue 225: yet another 'resource list' in the manifest? (topic:manifest)

At this moment (see also #221) we have two "lists" of resources in the manifest, mirroring the infoset: reading order and list of resources (in JSON, readingOrder and resources). How exactly do we represent a cover reference and/or the privacy policies? We did agree that we represent these in the manifest as external lists (see again #221 and also #222 for initial proposals), but we have not yet decided how exactly the mapping for cover is done.

3.4.4 Table of Contents

The table of contents is a hierarchical list of links that reflects the structural outline of the major sections of the Web Publication. There are no requirements on the completeness of the table of contents, except that, when specified, it MUST include a link to at least one resource.

The table of contents is not specified directly in the manifest. Instead, the manifest SHOULD provide a link to an HTML element in one of the resources (most likely a nav element [html]), with a role attribute value set to doc-toc.
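
To illustrate how a user agent might read such a table of contents, the following sketch collects the links found inside the first element whose role attribute contains doc-toc, using Python's standard html.parser module. The class name is hypothetical, and the sketch assumes markup in which every non-void element is explicitly closed; it does not reconstruct the hierarchy of the list.

```python
from html.parser import HTMLParser

# Elements that never have an end tag and so must not affect nesting depth.
_VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
         "link", "meta", "source", "track", "wbr"}


class TocLinkCollector(HTMLParser):
    """Collect href values of links inside the first role=doc-toc element."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # nesting depth inside the table of contents
        self.done = False   # set once the table of contents element closes
        self.links = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        attributes = dict(attrs)
        if self.depth == 0:
            roles = (attributes.get("role") or "").split()
            if "doc-toc" in roles and tag not in _VOID:
                self.depth = 1
            return
        if tag == "a" and attributes.get("href"):
            self.links.append(attributes["href"])
        if tag not in _VOID:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in _VOID:
            self.depth -= 1
            if self.depth == 0:
                self.done = True
```

After feeding a document to the collector, a non-empty links list indicates that the table of contents links to at least one resource.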

Issue

This question arises only if the table of contents is accepted: can a table of contents navigation element refer, via links, to any resource that is not listed in the default reading order?

3.4.5 Cover

The infoset SHOULD include a reference to a cover image. This image can be used by user agents to present the Web Publication to users (e.g., in a library or bookshelf, or when initially loading the Web Publication).

User agents SHOULD NOT use the cover image as the sole means of selecting or accessing Web Publications. A user agent SHOULD use the Web Publication's title and creators as text alternatives for such interfaces.

More than one cover image MAY be referenced from the infoset to provide alternative sizes and resolutions for different device screens.

A user agent MAY create a cover for a Web Publication if one is not present. This specification does not define requirements for the creation of such cover images (e.g., the user agent could use a placeholder image, generate an image dynamically, or incorporate properties of the infoset into a graphic, such as the title or creators).

Issue 210: Is 'cover' a structural property? (topic:manifest)

This was discussed and proposed at the F2F meeting in Toronto, but no decision has been taken.

3.4.6 Privacy Policy

Users often have the legal right to know and control what information is collected about them, how such information is stored and for how long, whether it is personally identifiable, and how it can be expunged. Including a statement that addresses all such privacy concerns is consequently an important part of publishing Web Publications. Even if no information is collected, such a declaration increases the trust users have in the content.

To address this concern, a link to a privacy policy can be included in the infoset. It is RECOMMENDED that the privacy policy be included as a resource of the Web Publication.

It is RECOMMENDED that the privacy policy be provided in a human-readable format, such as HTML [html].

Refer to 10. Privacy for more information about privacy considerations in Web Publications.

Issue 203: Info Set for privacy

https://w3c.github.io/wpub/#wp-privacy needs more clarity, and not be so general. Most of the privacy policy collection and enforcement is upstream from the document markup, except where the markup explicitly collects data.

3.5 Extensibility

The infoset is designed to provide a basic set of properties for use by user agents in presenting and rendering a Web Publication, but MAY be extended in the following ways:

  1. through the inclusion of additional properties in the manifest;
  2. by the provision of linked metadata records.

User Agents MAY support additional properties but MUST NOT include unrecognized properties in the infoset. The use of linked records is RECOMMENDED whenever possible, as the use of native formats standardizes and simplifies processing by user agents.
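
The filtering implied by this requirement can be sketched as follows. The set of core property names is illustrative only, since the names used in the infoset are still under discussion, and the function name and the supported_extensions parameter are hypothetical.

```python
# Illustrative only: the final set of infoset property names is not settled.
CORE_PROPERTIES = {"url", "id", "identifier", "readingOrder", "resources",
                   "accessMode", "accessModeSufficient"}


def compile_infoset(manifest, supported_extensions=frozenset()):
    """Copy recognized manifest properties into a new infoset dictionary.

    supported_extensions: names of additional properties this user agent
    understands; anything else is left out of the infoset.
    """
    recognized = CORE_PROPERTIES | set(supported_extensions)
    return {name: value for name, value in manifest.items()
            if name in recognized}
```

A user agent that understands an extension property passes its name in supported_extensions; properties it does not recognize never reach the infoset.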

4. Web Publication Manifest

4.1 Overview

A manifest is a serialization of a Web Publication's infoset. The manifest is serialized using JSON [ecma-404], more specifically the JSON-LD [json-ld] format. The manifest can be a separate JSON-LD file, or it can be part of an HTML resource using the script element in HTML [html]. If the latter, the type attribute of the script element MUST be set to application/ld+json.
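
For the embedded case, a user agent has to locate the script element and parse its content. The sketch below does this with Python's standard html.parser and json modules. It assumes that the first script element typed application/ld+json holds the manifest; the class and function names are hypothetical.

```python
import json
from html.parser import HTMLParser


class ManifestScriptFinder(HTMLParser):
    """Capture the text of the first script typed application/ld+json."""

    def __init__(self):
        super().__init__()
        self.capturing = False
        self.found = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and not self.found:
            script_type = (dict(attrs).get("type") or "").strip().lower()
            if script_type == "application/ld+json":
                self.capturing = True

    def handle_data(self, data):
        if self.capturing:
            self.chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self.capturing:
            self.capturing = False
            self.found = True


def embedded_manifest(html_text):
    """Return the embedded manifest as a dictionary, or None if absent."""
    finder = ManifestScriptFinder()
    finder.feed(html_text)
    finder.close()
    if not finder.found:
        return None
    return json.loads("".join(finder.chunks))
```

A manifest served as a separate JSON-LD file needs only the json.loads step.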

Example 1: A Web Publication Manifest included in an HTML resource
<script id="example_manifest" type="application/ld+json">
{
    ...
}
</script>

4.1.1 Descriptive Infoset Properties

Descriptive Properties in the Web Publication Manifest are based, wherever possible, on the terms defined by Schema.org [schema.org] (including hosted extensions of Schema.org). This means that the descriptive infoset properties are mapped to one or several Schema.org properties (inheriting their syntax and semantics).

Note

Schema.org includes a large number of terms that, though relevant for publishing, are not mentioned in this Recommendation. Web Publication authors may use any of those; this document defines only the minimal set of infoset items, and their mapping to Schema.org when appropriate.

Editor's note

There is discussion on whether a best practices document should be created, referring to more Schema.org terms. If so, it should be linked from here.

4.1.2 Structural Infoset Properties

Structural Properties in the Web Publication Manifest use terms that refer to one or more external resources (images, script files, separate metadata files, etc.). These terms do not necessarily have a counterpart in Schema.org, and are therefore defined separately by this specification.

Values of structural properties are usually links to external resources, or an array thereof. Such a link may be expressed in one of two ways:

  1. a string encoding the (absolute or relative) URL of the resource; or
  2. an instance of a Schema.org StructuredValue that can be used to express, beyond the URL, the media type and other characteristics of the target resource.
Example 2: Expressing external links, either as strings or as objects
{
        ...
        "resources" : [
            "datatypes.svg",
            {
                "@type"      : "StructuredValue",
                "url"        : "test-utf8.csv",
                "fileFormat" : "text/csv"
            }
        ]
}
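
Because a link can take either form, a user agent will typically normalize both into one shape before further processing. The sketch below does so with urllib.parse.urljoin from the Python standard library. The function names are hypothetical, and resolving relative URLs against the URL of the manifest is an assumption of this sketch.

```python
from urllib.parse import urljoin


def normalize_link(link, base_url):
    """Return a dict with an absolute 'url' and an optional 'fileFormat'."""
    if isinstance(link, str):
        return {"url": urljoin(base_url, link), "fileFormat": None}
    if isinstance(link, dict) and isinstance(link.get("url"), str):
        return {"url": urljoin(base_url, link["url"]),
                "fileFormat": link.get("fileFormat")}
    raise ValueError("a link must be a URL string or an object with a url")


def normalize_links(value, base_url):
    """Accept a single link or an array of links and return a list."""
    links = value if isinstance(value, list) else [value]
    return [normalize_link(link, base_url) for link in links]
```

Applied to the resources array of Example 2, this yields two entries with absolute URLs, the second of which carries the media type text/csv.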
Note

There will be continuous contacts with Schema.org to see whether some of the structural property terms should be included in the core Schema.org hierarchy, or one of its extensions.

4.1.3 Web Publication Manifest Contexts

A Web Publication Manifest MUST start by setting the appropriate (JSON-LD) contexts. This context has two major components:

  • the “core” Schema.org context, i.e., http://schema.org;
  • the separate, WP-specific context file: https://www.w3.org/ns/wpub.jsonld

Note that the latter may also add some features to terms defined in Schema.org.

Note

An example for the latter is the requirement for the creator term to be order preserving.

As part of the continuous contacts with Schema.org the additional requirements defined in the WP specific context file may migrate to the core Schema.org.

In practice, this means that a Web Publication Manifest MUST begin with, at the minimum:

Example 3
{
    "@context" : ["http://schema.org", "https://www.w3.org/ns/wpub.jsonld"],
    ...
}

In some cases, this structure may have to be extended by additional, local information; see 4.2.1.5.1 Default language and direction.
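
A conformance checker could verify this requirement along the following lines. The sketch assumes the manifest has been parsed into a Python dictionary and that the two required entries appear first and in the order shown in Example 3; the function name is hypothetical.

```python
REQUIRED_CONTEXTS = ["http://schema.org", "https://www.w3.org/ns/wpub.jsonld"]


def has_required_contexts(manifest):
    """True if @context is a list that begins with the two required entries."""
    context = manifest.get("@context")
    if not isinstance(context, list):
        return False
    # Additional, local entries may follow the two required ones.
    return context[:2] == REQUIRED_CONTEXTS
```

A context array with further entries after the first two still passes, which leaves room for the local extensions mentioned above.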

4.1.4 Publication Types

The manifest MUST also include a Publication Type. This MAY be mapped onto the Schema.org CreativeWork type, using the @type of JSON-LD [json-ld]. Schema.org also includes a number of more specific types (see the list on the Schema.org site), such as Article, Book, or Course; these MAY also be used instead of CreativeWork.

Example 4: Setting type of the publication to be a Book
{
    "@context" : ["http://schema.org", "https://www.w3.org/ns/wpub.jsonld"],
    "@type"    : "Book",
    ...
}

4.2 Specification of Web Publication Manifest Items

4.2.1 Descriptive Infoset Properties

4.2.1.1 Accessibility
Issue 216: Mismatch in the accessibility references in the current text and schema... (status:editorial, topic:metadata, topic:schema mapping)

The current draft says:

The infoset SHOULD include a link to an accessibility report when one is available for a Web Publication. It is RECOMMENDED that the report be included as a resource of the Web Publication.

which suggests that the accessibility report is more a structural property (i.e., linking out to a full report) rather than descriptive. This is in contrast with the discussion in Toronto where accessibility was listed as a descriptive property.

Cc @avneeshsingh @GeorgeKerscher

As defined in 3.3.1 Accessibility Report, the Web Publication Manifest MAY include accessibility metadata. These SHOULD be mapped on the family of accessibility terms expressed by Schema.org. (A more detailed description of these terms and their possible values is available on the WebSchemas Wiki site.) These terms are:

| Term name | Short description |
| --- | --- |
| accessMode | The human sensory perceptual system or cognitive faculty through which a person may process or perceive information. |
| accessModeSufficient | A list of single or combined accessModes that are sufficient to understand all the intellectual content of a resource. |
| accessibilityAPI | Indicates that the resource is compatible with the referenced accessibility API. |
| accessibilityControl | Identifies input methods that are sufficient to fully control the described resource. |
| accessibilityFeature | Content features of the resource, such as accessible media, alternatives and supported enhancements for accessibility. |
| accessibilityHazard | A characteristic of the described resource that is physiologically dangerous to some users. |
| accessibilitySummary | A human-readable summary of specific accessibility features or deficiencies, consistent with the other accessibility metadata but expressing subtleties such as “short descriptions are present but long descriptions will be needed for non-visual users” or “short descriptions are present and no long descriptions are needed.” |

Note that the author MAY also provide a reference to a more detailed Accessibility Report, beyond the accessibility information expressed by these terms.

Example 5: Example for accessibility metadata of a purely textual document
{
    "@context" : ["http://schema.org","https://www.w3.org/ns/wpub.jsonld"],
    "@type"    : "CreativeWork",
    ...
    "accessMode"            : ["textual", "visual"],
    "accessModeSufficient"  : ["textual"],
    ...
}
4.2.1.2 Address

As described in 3.3.2 Address, a Web Publication's address is a URL [url] that represents the primary entry page for the Web Publication. This infoset item MUST be mapped on the url term.

Term name with link to definition Short description
url URL of the primary entry page.
Example 6: Example for setting the address of the main entry point
{
    "@context" : ["http://schema.org","https://www.w3.org/ns/wpub.jsonld"],
    "@type"    : "Book",
    ...
    "url"      : "https://publisher.example.org/mobydick",
    ...
}
4.2.1.3 Canonical Identifier

As described in 3.3.3 Canonical Identifier, a Web Publication's canonical identifier is a unique identifier that resolves to the preferred version of the Web Publication. This infoset item MUST be mapped on the id term, whose value is a URL.

Additionally, the term identifier MAY also be used; in Schema.org, the value of this term can be a URL, a text string that represents a unique identification, or a more complex object defined in Schema.org. There are also sub-properties that can be used in their own right, like isbn, issn, or serialNumber. See also the Schema.org description for further details.

Editor's note

Not clear how one would describe the relationships between these two...

Term name with link to definition Short description
id Preferred version of the Web Publication.
Example 7: Example for setting both the canonical identifier as URL and the address of the same document
{
    "@context" : ["http://schema.org","https://www.w3.org/ns/wpub.jsonld"],
    "@type"    : "CreativeWork",
    ...
    "id"       : "http://www.w3.org/TR/tabular-data-model/",
    "url"      : "http://www.w3.org/TR/2015/REC-tabular-data-model-20151217/",
    ...
}
Example 8: Example for setting both the ISBN and the address of the same document
{
    "@context" : ["http://schema.org","https://www.w3.org/ns/wpub.jsonld"],
    "@type"    : "Book",
    ...
    "isbn"     : "1234567890123",
    "url"      : "https://publisher.example.org/mobydick",
    ...
}
4.2.1.4 Creator

As described in 3.3.4 Creators, a Web Publication's creators are the individuals or entities responsible for the creation of the Web Publication. There is no single Schema.org term onto which this item must be mapped; instead, there are a number of terms and, if this infoset item is used, the Web Publication Manifest SHOULD use one of them. The value of each of these terms is one or more Person objects or, in some cases, Person or Organization objects. These terms are:

Term name with link to definition Short description
author The author of this content. The value can be one or more Person or Organization.
creator The creator of this content. The value can be one or more Person or Organization.
editor The editor of this content. The value can be one or more Person.
publisher The publisher of the creative work. The value can be one or more Person or Organization.
illustrator The illustrator of a publication. The value can be one or more Person.
translator The translator of a publication. The value can be one or more Person or Organization.
readBy A person who reads (performs) the audiobook. The value can be one or more Person.
artist The primary artist for a work in a medium other than pencils or digital line art. The value can be one or more Person.
colorist The individual who adds color to inked drawings. The value can be one or more Person.
letterer The individual who adds lettering, including speech balloons and sound effects, to artwork. The value can be one or more Person.
penciler The individual who draws the primary narrative artwork. The value can be one or more Person.
Example 9: Author of a book
{
    "@context" : ["http://schema.org","https://www.w3.org/ns/wpub.jsonld"],
    "@type"    : "Book",
    ...
    "url"      : "https://publisher.example.org/mobydick",
    "author"   : {
        "@type" : "Person",
        "name"  : "Herman Melville"
    }
}
Example 10: Separate listing of editors, authors, and publisher
{
    "@context"   : ["http://schema.org","https://www.w3.org/ns/wpub.jsonld"],
    "@type"      : "CreativeWork",
    ...
    "identifier" : "http://www.w3.org/TR/tabular-data-model/",
    "url"        : "http://www.w3.org/TR/2015/REC-tabular-data-model-20151217/",
    "author"     : [{
        "@type" : "Person",
        "name" : "Jeni Tennison"
    },{
        "@type" : "Person",
        "name" : "Gregg Kellogg"
    },{
        "@type" : "Person",
        "name" : "Ivan Herman",
        "url"  : "https://www.w3.org/People/Ivan/"
    }],
    "editor"    : [{
        "@type" : "Person",
        "name" : "Jeni Tennison"
    },{
        "@type" : "Person",
        "name" : "Gregg Kellogg"
    }],
    "publisher" : {
        "@type" : "Organization",
        "name" : "World Wide Web Consortium",
        "url"  : "https://www.w3.org/"
    },
    ...
}
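Because each creator term can hold either a single object (Example 9) or an array of objects (Example 10), a consumer of the manifest has to accept both shapes. The following Python sketch shows one way to do that; the function name creator_names is hypothetical, and the sketch assumes that each entry carries a plain string in its name property.

```python
def creator_names(manifest: dict, term: str = "author") -> list:
    """Return the names listed under a creator term such as "author" or "editor".

    Hypothetical helper: accepts both a single object and an array of objects.
    """
    value = manifest.get(term, [])
    # A single Person or Organization object is treated as a one-element list.
    entries = value if isinstance(value, list) else [value]
    return [
        entry["name"]
        for entry in entries
        if isinstance(entry, dict) and "name" in entry
    ]


single = {"author": {"@type": "Person", "name": "Herman Melville"}}
several = {
    "editor": [
        {"@type": "Person", "name": "Jeni Tennison"},
        {"@type": "Person", "name": "Gregg Kellogg"},
    ]
}
print(creator_names(single))
print(creator_names(several, "editor"))
```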
Issue 217: Order sensitivity of terms like 'creator', 'author', ...

There is no order sensitivity for many items in Schema.org. It is possible to use a JSON-LD array for, say, the author, but the order is not preserved (technically, the mapping is on a set of statements, not a list).

4.2.1.5 Language and Base Direction

As described in 3.3.5 Language and Base Direction, this infoset item refers to several aspects of setting language and direction; these are treated separately.

4.2.1.5.1 Default language and direction

The infoset described in 3.3.5 Language and Base Direction requires the possibility to set a default language and base direction for all textual information in the manifest. These MUST be set by extending the context of the manifest to include the right features.

To set the language, the context information must be extended by the @language term of JSON-LD [json-ld]:

Example 11: Setting the default metadata language to French
{
    "@context"   : [
        "http://schema.org",
        "https://www.w3.org/ns/wpub.jsonld",
        {
            "@language" : "fr"
        }
    ],
    ...
}

The value of the language tag must be set to the language code as defined in [bcp47]. If not set, the default value is en.
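The lookup of the default language can be sketched as follows. This Python fragment is illustrative only: the function name default_language is hypothetical, and letting a later @language entry in the context list override an earlier one is an assumption based on how JSON-LD contexts are usually combined.

```python
def default_language(manifest: dict) -> str:
    """Return the @language found in the manifest's @context, or "en" if none is set."""
    context = manifest.get("@context", [])
    # The context may be a single value or a list of values.
    entries = context if isinstance(context, list) else [context]
    language = "en"  # default stated in this section
    for entry in entries:
        if isinstance(entry, dict) and isinstance(entry.get("@language"), str):
            language = entry["@language"]  # assumption: later entries win
    return language


manifest = {
    "@context": [
        "http://schema.org",
        "https://www.w3.org/ns/wpub.jsonld",
        {"@language": "fr"},
    ]
}
print(default_language(manifest))
```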

Issue 218: Expressing the overall language for the Metadata

The right approach is to extend the context through

{
   "@language" : "fr"
} 

in the (list of) @context. Although not rejected by schema.org, it is also ignored.

4.2.1.5.2 Item specific language and direction

The infoset described in 3.3.5 Language and Base Direction also requires the possibility to set the language and base direction for any textual information in the manifest. This MUST be set for each item separately, using the @value and @language terms of JSON-LD [json-ld]:

Example 12: Setting the language of the author's name to French
{
    "@context" : ["http://schema.org","https://www.w3.org/ns/wpub.jsonld"],
    "@type"    : "Book",
    ...
    "author" : {
        "@type" : "Person",
        "name" : {
            "@value" : "Marcel Proust",
            "@language" : "fr"
        }
    }
}

The value of the language tag must be set to the language code as defined in [bcp47]. If not set, the default value is the default language of the manifest.
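A consumer therefore has to handle two shapes for a textual value: a plain string, which takes the default language of the manifest, and an object with @value and @language. A minimal Python sketch, in which the function name text_and_language and its manifest_default parameter are hypothetical:

```python
def text_and_language(value, manifest_default: str = "en"):
    """Return a (text, language) pair for a textual manifest value.

    Hypothetical helper: `value` is either a plain string or an object
    using the JSON-LD @value and @language terms.
    """
    if isinstance(value, dict):
        return value.get("@value"), value.get("@language", manifest_default)
    # A plain string inherits the default language of the manifest.
    return value, manifest_default


print(text_and_language({"@value": "Marcel Proust", "@language": "fr"}))
print(text_and_language("Herman Melville"))
```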

The requirement is that it should be possible to express the language of, say, the title or name of the author individually. This does not seem to work in Schema.org...

This is a completely open issue at this moment, both for JSON-LD and Schema.org... The only (incomplete) approach would be to rely on, and base everything on, the UTF encoding of the text...

4.2.1.6 Last Modification Date

As described in 3.3.6 Last Modification Date, the last modification date is the date when the Web Publication was last updated. This infoset item MUST be mapped on the dateModified term, whose value is a Date or a DateTime, expressed in the ISO 8601 date or date-time format, respectively [iso8601].

Term name with link to definition Short description
dateModified Last modification date of the publication
Example 13: Modification date of the publication
{
    "@context"     : ["http://schema.org","https://www.w3.org/ns/wpub.jsonld"],
    "@type"        : "CreativeWork",
    ...
    "identifier"   : "http://www.w3.org/TR/tabular-data-model/",
    "url"          : "http://www.w3.org/TR/2015/REC-tabular-data-model-20151217/",
    "dateModified" : "2015-12-17",
    ...
}
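A date value can be checked before it is used. The Python sketch below is an illustration, not part of the mapping: the function name parse_date_value is hypothetical, and the standard library parsers it relies on accept only a subset of ISO 8601, so a stricter or broader parser may be needed in practice.

```python
from datetime import date, datetime


def parse_date_value(value: str):
    """Parse a Date or DateTime string; return None if neither form matches.

    Hypothetical helper covering only the ISO 8601 subset that the
    Python standard library understands.
    """
    try:
        return date.fromisoformat(value)  # e.g. "2015-12-17"
    except ValueError:
        pass
    try:
        return datetime.fromisoformat(value)  # e.g. "2015-12-17T10:30:00"
    except ValueError:
        return None


print(parse_date_value("2015-12-17"))
print(parse_date_value("2015-12-17T10:30:00"))
print(parse_date_value("17 December 2015"))
```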
4.2.1.7 Publication Date

As described in 3.3.7 Publication Date, the publication date is the date when the Web Publication was originally published. This infoset item MUST be mapped on the datePublished term, whose value is a Date or a DateTime, expressed in the ISO 8601 date or date-time format, respectively [iso8601].

Term name with link to definition Short description
datePublished Date of original publication
Example 14: Publication and modification date of the publication
{
    "@context"      : ["http://schema.org","https://www.w3.org/ns/wpub.jsonld"],
    "@type"         : "CreativeWork",
    ...
    "identifier"    : "http://www.w3.org/TR/tabular-data-model/",
    "url"           : "http://www.w3.org/TR/2015/REC-tabular-data-model-20151217/",
    "datePublished" : "2015-12-17",
    "dateModified"  : "2016-01-30",
    ...
}
4.2.1.8 Reading Progression Direction

As described in 3.3.8 Reading Progression Direction, this infoset item establishes the reading direction from one resource to the next. There is no corresponding term in Schema.org; instead, this item MUST be mapped on the readingProgression term, defined specifically for Web Publications.

Term name with link to definition Short description
readingProgression Reading direction from one resource to the other; the value of this term MUST be ltr, rtl, or auto (see 3.3.8 Reading Progression Direction for further details).

If this value is not set, its default value is ltr.

Example 15: Reading progression set explicitly to ltr
{
    "@context"           : ["http://schema.org","https://www.w3.org/ns/wpub.jsonld"],
    "@type"              : "Book",
    ...
    "url"                : "https://publisher.example.org/mobydick",
    "readingProgression" : "ltr"
}
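The default and the allowed values described above can be checked with a few lines of code. The following Python sketch is illustrative only: the function name reading_progression is hypothetical, and raising an error for an unknown value is an assumption, since this section does not say how a user agent treats invalid values.

```python
ALLOWED_PROGRESSIONS = ("ltr", "rtl", "auto")


def reading_progression(manifest: dict) -> str:
    """Return the reading progression of a manifest, defaulting to "ltr"."""
    value = manifest.get("readingProgression", "ltr")
    if value not in ALLOWED_PROGRESSIONS:
        # Assumption: invalid values are reported rather than silently replaced.
        raise ValueError(f"unsupported readingProgression: {value!r}")
    return value


print(reading_progression({"@type": "Book"}))
print(reading_progression({"@type": "Book", "readingProgression": "rtl"}))
```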
4.2.1.9 Title

As described in 3.3.9 Title, the title provides the human-readable name of the Web Publication. If set explicitly in the Manifest (i.e., if it is not to be derived from the title element of the primary entry point), this item MUST be mapped on the Schema.org name term.

Term name with link to definition Short description
name Human-readable name of the Web Publication.
Example 16: Title of the book set explicitly
{
    "@context" : ["http://schema.org","https://www.w3.org/ns/wpub.jsonld"],
    "@type"    : "Book",
    ...
    "url"      : "https://publisher.example.org/mobydick",
    "name"     : "Moby Dick"
}

4.2.2 Structural Infoset Properties

Structural infoset properties typically refer to external resources via a URL. These URLs may refer to resources playing different roles (e.g., cover or privacy policy) and may need some additional metadata, e.g., for accessibility purposes.

Term name Short description Required/Optional
url URL [url] of the resource. Required.
fileFormat Media type of the content, typically in MIME format [rfc2046], e.g., application/zip. Optional.
name Name of the item. Optional.
description Description of the item. Optional.
rel One or more relations; the values are either the relevant relationship terms of the IANA link registry [iana-link-relations], or specially minted URLs if no suitable link registry item exists. Optional.
Issue 235: Use LinkRole instead of PublicationLink

(Extracting a discussion on #232 into a separate Issue.)

Schema.org has a (currently "pending") type LinkRole which may be a good alternative to the (publication specific) PublicationLink. Maybe worth considering using the schema.org type.


Ref: #232 (review), #232 (comment), #232 (comment)

Editor's note

If the document is reorganized, the "specific" external resources (cover, accessibility report, etc.) should be separated from the general structures and list definitions.

4.2.2.1 Default Reading Order

As defined in 3.4.1 Default Reading Order, the default reading order is a specific progression through a set of Web Publication resources. If present in the Web Publication Manifest (i.e., if the Web Publication has other items in the default reading order than just the primary entry page), this item MUST be mapped on the readingOrder term, defined specifically for Web Publications.

Term name with link to definition Short description
readingOrder An array of:
  • a string, representing the URL [url] of the resource; or
  • an instance of a PublicationLink object
The order in the array is significant.
Example 17: Reading order expressed as a simple list of URLs
{
    "@context" : ["http://schema.org","https://www.w3.org/ns/wpub.jsonld"],
    "@type"    : "Book",
    ...
    "url"      : "https://publisher.example.org/mobydick",
    "name"     : "Moby Dick",
    "readingOrder" : [
        "html/title.html",
        "html/copyright.html",
        "html/introduction.html",
        "html/epigraph.html",
        "html/c001.html",
        ...
    ]
}
Example 18: Reading order expressed as objects providing more information on items
{
    "@context" : ["http://schema.org","https://www.w3.org/ns/wpub.jsonld"],
    "@type"    : "Book",
    ...
    "url"      : "https://publisher.example.org/mobydick",
    "name"     : "Moby Dick",
    "readingOrder" : [{
        "@type"      : "PublicationLink",
        "url"        : "html/title.html",
        "fileFormat" : "text/html",
        "name"       : "Title page"
    },{
        "@type"      : "PublicationLink",
        "url"        : "html/copyright.html",
        "fileFormat" : "text/html",
        "name"       : "Copyright page"
    },{
        ...
    }]
}
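Since the array can mix plain URL strings with PublicationLink objects, it is often convenient to bring every entry into the object form before further processing. A minimal Python sketch, assuming that a bare string is equivalent to a PublicationLink carrying only a url; the function name normalize_links is hypothetical:

```python
def normalize_links(entries: list) -> list:
    """Turn a mixed array of URL strings and PublicationLink objects
    into a list of PublicationLink-like dictionaries, preserving order."""
    links = []
    for entry in entries:
        if isinstance(entry, str):
            # Assumption: a bare string stands for a link with only a url.
            links.append({"@type": "PublicationLink", "url": entry})
        elif isinstance(entry, dict) and "url" in entry:
            links.append(dict(entry))  # url is the only required property
        else:
            raise ValueError(f"not a URL string or PublicationLink: {entry!r}")
    return links


reading_order = [
    "html/title.html",
    {"@type": "PublicationLink", "url": "html/copyright.html", "name": "Copyright page"},
]
for link in normalize_links(reading_order):
    print(link["url"])
```

The same helper could be applied to the other arrays of this form, although only the order of the reading order is significant.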
4.2.2.2 Resource List

As defined in 3.4.2 Resource List, the resource list enumerates all resources that are used in the processing and rendering of a Web Publication (i.e., that are within its bounds) and that are not listed in the default reading order. If present in the Web Publication Manifest (i.e., if the Web Publication has other items in the default reading order than just the primary entry page), this item MUST be mapped on the resources term, defined specifically for Web Publications.

Term name with link to definition Short description
resources An array of:
  • a string, representing the URL [url] of the resource; or
  • an instance of a PublicationLink object
The order in the array is not significant.
Example 19: Listing resources, some via a simple URL, some with more details
{
    "@context"   : ["http://schema.org","https://www.w3.org/ns/wpub.jsonld"],
    "@type"      : "CreativeWork",
    ...
    "identifier" : "http://www.w3.org/TR/tabular-data-model/",
    "url"        : "http://www.w3.org/TR/2015/REC-tabular-data-model-20151217/",
    ...
    "resources"  : [
        "datatypes.html",
        "datatypes.svg",
        "datatypes.png",
        "diff.html",
        {
            "@type"         : "PublicationLink",
            "url"           : "test-utf8.csv",
            "fileFormat"    : "text/csv"
        },{
            "@type"         : "PublicationLink",
            "url"           : "test-utf8-bom.csv",
            "fileFormat"    : "text/csv"
        },{
            ...
        }
    ],
    ...
}
4.2.2.3 List of extra resources

As defined in 3.4.3 List of extra resources, the list of extra resources enumerates all resources that are used in the processing and rendering of a Web Publication but are not within its bounds, i.e., that are external to the Web Publication. If present in the Web Publication Manifest, this item MUST be mapped on the extraResources term, defined specifically for Web Publications.

Note

The extraResources term to be used in JSON has not yet been decided; waiting on the resolution of issue #225.

Term name with link to definition Short description
extraResources An array of:
  • a string, representing the URL [url] of the resource; or
  • an instance of a PublicationLink object
The order in the array is not significant.
Issue 225: yet another 'resource list' in the manifest?

At this moment (see also #221 ) we have two "lists" of resources in the manifest, mirroring the infoset: reading order and list of resources. (In JSON, readingOrder and resources). How exactly do we represent a cover reference and/or the privacy policies? We did agree that we represent these in the manifest as external lists (see again #221 and also #222 for initial proposals), but we did not decide yet how exactly the mapping for cover is done.

4.2.2.4 Table of Contents

As defined in 3.4.4 Table of Contents, the manifest SHOULD provide a link to an HTML element in one of the resources representing the table of contents. If present in the Web Publication Manifest, this item MUST be mapped on the tableOfContents term, defined specifically for Web Publications.

Term name with link to definition Short description
tableOfContents A (relative or absolute) URL [url]
Example 20: Reference to a table of contents element in the same file that contains the manifest
<head>
    ...
    <script type="application/ld+json">
    {
        "@context"        : ["http://schema.org","https://www.w3.org/ns/wpub.jsonld"],
        "@type"           : "CreativeWork",
        ...
        "identifier"      : "http://www.w3.org/TR/tabular-data-model/",
        "url"             : "http://www.w3.org/TR/2015/REC-tabular-data-model-20151217/",
        "tableOfContents" : "#toc",
        ...
    }
    </script>
    ...
</head>
<body>
    ...
    <section id="toc" role="doc-toc">
        ...
    </section>
    ...
</body>
Issue

The term tableOfContents has not been approved by the Working Group yet; it is currently a placeholder.

Issue 234: representation of table of contents in the manifest

(Translating a telco discussion to an issue.)

The current (2018.06.19) draft has a separate term for the ToC infoset item in the manifest, tentatively using tableOfContents. The question is whether to:

  1. not use a separate term, but use a PublicationLink instance in resources, using the (IANA) rel value of contents
  2. not have an entry in the manifest at all, relying on the fact that the TOC is already defined as a link to an HTML element with a particular aria attribute value; if the TOC is restricted to such an element in the entry page, it is unnecessary to duplicate the information in the manifest
4.2.2.5 Cover

As described in 3.4.5 Cover, the infoset SHOULD include a reference to a cover. When present, the link to such a resource MUST be expressed using a PublicationLink. The rel value of the PublicationLink MUST include the https://www.w3.org/ns/wpub/cover-page identifier.

Note

The Working Group will attempt to have the cover-page term defined by IANA, to avoid using a URL.

Example 21: Cover page expressed as resource
{
    "@context"   : ["http://schema.org","https://www.w3.org/ns/wpub.jsonld"],
    "@type"      : "Book",
    ...
    "url"        : "https://publisher.example.org/mobydick",
    "name"       : "Moby Dick",
    "resources"  : [{
        "@type"      : "PublicationLink",
        "url"        : "whale-image.jpg",
        "fileFormat" : "image/jpeg",
        "rel"        : "https://www.w3.org/ns/wpub/cover-page"
    },{
        ...
    }],
    ...
}
4.2.2.6 Privacy Policy

As described in 3.4.6 Privacy Policy, it is RECOMMENDED that the privacy policy be included as a resource of the Web Publication. When present, the link to such a resource MUST be expressed using a PublicationLink object. The rel value of the PublicationLink MUST include the privacy-policy (IANA) identifier.

Example 22: Privacy policy expressed as an external link
{
    "@context"   : ["http://schema.org","https://www.w3.org/ns/wpub.jsonld"],
    "@type"      : "CreativeWork",
    ...
    "identifier" : "http://www.w3.org/TR/tabular-data-model/",
    "url"        : "http://www.w3.org/TR/2015/REC-tabular-data-model-20151217/",
    ...
    "extraResources"  : [{
        "@type"      : "PublicationLink",
        "url"        : "https://www.w3.org/Consortium/Legal/privacy-statement-20140324",
        "fileFormat" : "text/html",
        "rel"        : "privacy-policy"
    },{
            ...
    }],
    ...
}
4.2.2.7 Accessibility Report

As described in 3.3.1 Accessibility Report, authors MAY provide an accessibility report giving information about the suitability of a Web Publication for consumption by users with varying preferred reading modalities. This report may be complementary to the information expressed by the descriptive properties described in 4.2.1.1 Accessibility. The report is accessed via an external resource (e.g., an HTML file). When present, the link to such a resource MUST be expressed using a PublicationLink object. The rel value of the PublicationLink MUST include the https://www.w3.org/ns/wpub#accessibility-report identifier.

Note

The Working Group will attempt to have the accessibility-report term defined by IANA, to avoid using a URL.

Example 23: Link to an accessibility report
{
    "@context"   : ["http://schema.org","https://www.w3.org/ns/wpub.jsonld"],
    "@type"      : "Book",
    ...
    "url"        : "https://publisher.example.org/mobydick",
    "name"       : "Moby Dick",
    "extraResources"  : [{
        "@type"       : "PublicationLink",
        "url"         : "https://www.publisher.example.org/mobydick-accessibility.html",
        "rel"         : "https://www.w3.org/ns/wpub/accessibility-report"
    },{
        ...
    }],
    ...
}
4.2.2.8 External Metadata

As described in 3.5 Extensibility, the infoset items of a Web Publication MAY be extended by linking to further metadata records. This may include, for example, links to an external ONIX [onix] or BibTeX [bibtex] file. When present, the link to such a resource MUST be expressed using a PublicationLink object. The rel value of the PublicationLink MUST include the describedby (IANA) identifier.

Example 24: Link to external ONIX for Books Metadata file
{
    "@context"   : ["http://schema.org","https://www.w3.org/ns/wpub.jsonld"],
    "@type"      : "Book",
    ...
    "url"        : "https://publisher.example.org/mobydick",
    "name"       : "Moby Dick",
    "extraResources"  : [{
        "@type"       : "PublicationLink",
        "url"         : "https://www.publisher.example.org/mobydick-onix.xml",
        "fileFormat"  : "application/xml",
        "rel"         : "describedby"
    },{
        ...
    }],
    ...
}
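The cover, privacy policy, accessibility report, and external metadata are all located the same way: by looking for a PublicationLink whose rel value includes a given identifier. The Python sketch below illustrates that lookup. The function name links_with_rel is hypothetical, as is the assumption that rel is serialized either as a single string or as an array of strings, and that both resources and extraResources are searched.

```python
def links_with_rel(manifest: dict, rel: str,
                   lists=("resources", "extraResources")) -> list:
    """Return the PublicationLink objects whose rel value includes `rel`."""
    matches = []
    for term in lists:
        for entry in manifest.get(term, []):
            if not isinstance(entry, dict):
                continue  # plain URL strings carry no rel information
            value = entry.get("rel", [])
            # Assumption: rel is either one string or an array of strings.
            rels = value if isinstance(value, list) else [value]
            if rel in rels:
                matches.append(entry)
    return matches


manifest = {
    "resources": [
        "html/title.html",
        {"@type": "PublicationLink", "url": "whale-image.jpg",
         "rel": "https://www.w3.org/ns/wpub/cover-page"},
    ],
    "extraResources": [
        {"@type": "PublicationLink",
         "url": "https://www.publisher.example.org/mobydick-onix.xml",
         "rel": ["describedby"]},
    ],
}
print(links_with_rel(manifest, "describedby")[0]["url"])
```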

5. Web Publication Construction

5.1 Resources

A Web Publication MUST include at least one HTML document [html] that links to the manifest.

There are no restrictions on a Web Publication beyond this requirement. The Web Publication MAY include references to resources of any media type, both in the default reading order and as dependencies of other resources.

Note

When adding resources to a Web Publication, consider support in user agents. The use of progressive enhancement techniques and the provision of fallback content, as appropriate, will ensure a more consistent reading experience for users regardless of their preferred user agent.

5.2 Linking to a Manifest

To enable discovery, resources SHOULD provide a link to the manifest of the Web Publication to which they belong. Links MUST take one or both of the following forms:

The href attribute may also contain a fragment identifier, referring to the Web Publication Manifest expressed as part of an HTML resource (see 4.1 Overview).

Example 27: Link to a manifest within the same HTML resource
<link href="#example_manifest" rel="publication">
...
<script id="example_manifest" type="application/ld+json">
{
    "@context" : "http://schema.org",
    ...
}
</script>
Issue

The exact value of rel is still to be agreed upon.

Editor's note

The following details might be moved to the lifecycle section in a future draft.

When a resource links to multiple manifests, a user agent MAY choose to present one or more alternatives to the end user, or choose a single alternative on its own. The user agent MAY choose to present any manifest based upon information that it possesses, even one that is not explicitly listed as a parent (e.g., based upon information it calculates or acquires out of band). In the absence of a preference by user agent implementers, selection of the first manifest listed is suggested as a default.

6. Web Publication Lifecycle

Editor's note

The Publishing Working Group is currently evaluating the best approach for implementing Web Publications in user agents. This note is intended to provide an overview of current thinking and of the issues under consideration.

The development of Web Publications is not viewed as a separate forking of the Web, but an enhancement layer that can be supported by user agents. To that end, the primary constraints on any solution for Web Publications are that:

  • the rendering of Web Publications must not interfere with the underlying Web model and APIs — all functionality and enhancements must be layered on top;
  • a Web Publication should not have to carry its own implementation code — functionality is ideally provided by the user agent and/or polyfill.

While this specification will provide implementation flexibility for user agents, there are still a number of areas that have been identified as potentially needing to be detailed. These include:

  • initialization expectations for a Web Publication:

    • automatic initiation v. user prompts;
    • linked v. directly loaded manifests;
    • resources that belong to more than one Web Publication.
  • the creation of a "publication state":

    • persistence of publication information across page loads;
    • location and persistence of UI;
    • indications of supported features;
    • DOM issues such as persistence of numbering schemes.
  • tracking the extent of a Web Publication:

    • taking an entire publication offline;
    • enabling search across documents.
  • establishing the bounds of a Web Publication:

    • when to end the publication state;
    • document history traversal;
    • how to handle links outside the publication.
  • updating of the manifest.

The working group intends to flesh out the lifecycle in later revisions once it is clearer what models are viable and what solutions can be standardized. Input on the feasibility and challenges of these approaches is welcome at any time.

6.1 Obtaining a manifest

The steps for obtaining a manifest are given by the following algorithm. The algorithm, if successful, returns a processed manifest and the manifest URL; otherwise, it terminates prematurely and returns nothing. In the case of nothing being returned, the user agent MUST ignore the manifest declaration.

  1. From the Document of the top-level browsing context, let origin be the Document's origin, and manifest link be the first link element in tree order whose rel attribute contains the token publication.
  2. If origin is an [html] opaque origin, terminate this algorithm.
  3. If manifest link is null, terminate this algorithm.
  4. If manifest link's href attribute's value is the empty string, then abort these steps.
  5. Let manifest URL be the result of parsing the value of the href attribute, relative to the element's base URL. If parsing fails, then abort these steps.
  6. Let request be a new [fetch] request, whose URL is manifest URL, and whose context is the same as the browsing context of the Document.
  7. If the manifest link's crossOrigin attribute's value is 'use-credentials', then set request's credentials to 'include'.
  8. Await the result of performing a fetch with request, letting response be the result.
  9. If response is a network error, terminate this algorithm.
  10. Let text be the result of UTF-8 decoding response's body.
  11. Let manifest be the result of running the steps for processing a manifest given text, manifest URL, and the URL that represents the address of the top-level browsing context.
  12. Return manifest and manifest URL.
Note

See the diagram in the appendix for a visual representation of the algorithm.
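Steps 1 and 3 to 5 of the algorithm (locating the link element and resolving its href) can be sketched outside a browser as follows. This Python fragment is an approximation under stated assumptions: the class and function names are hypothetical, the document is given as a string with a known base URL, and the origin check and the fetch of steps 2 and 6 to 12 are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ManifestLinkFinder(HTMLParser):
    """Records the first link element whose rel contains the token "publication"."""

    def __init__(self):
        super().__init__()
        self.found = False
        self.href = None

    def handle_starttag(self, tag, attrs):
        if self.found or tag != "link":
            return
        attributes = dict(attrs)
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "publication" in rel_tokens:
            self.found = True
            self.href = attributes.get("href")


def manifest_url(html_text: str, base_url: str):
    """Return the absolute manifest URL, or None when the steps terminate early."""
    finder = ManifestLinkFinder()
    finder.feed(html_text)
    if not finder.found:   # step 3: no manifest link
        return None
    if not finder.href:    # step 4: empty (or missing) href
        return None
    return urljoin(base_url, finder.href)  # step 5: resolve against the base URL


page = '<head><link rel="stylesheet" href="a.css"><link rel="publication" href="manifest.json"></head>'
print(manifest_url(page, "https://publisher.example.org/mobydick/index.html"))
```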

Editor's note

This section will require additional work if we also decide to allow JSON-LD embedded in HTML.

6.2 Processing the manifest

The steps for processing a manifest are given by the following algorithm. The algorithm takes three arguments: a text string, which represents a manifest; a manifest URL [url], which represents the location of the manifest; and a document URL [url]. Given a JSON document as input, the algorithm outputs a processed manifest.

  1. Let json be the result of parsing text. If parsing throws an error:
    1. Issue a developer warning with any details pertaining to the JSON parsing error.
    2. Set json to be the result of parsing the string "{}".
  2. If Type(json) is not Object:
    1. Issue a developer warning that the manifest needs to be an object.
    2. Set json to be the result of parsing the string "{}".
  3. Extension point: process any proprietary and/or other supported members at this point in the algorithm.
  4. Let manifest be the result of converting [webidl-1] json to a WebPublicationManifest dictionary.
Note

See the diagram in the appendix for a visual representation of the algorithm.
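Steps 1 and 2 of the algorithm, which fall back to an empty object when the text cannot be used, can be sketched as follows. The Python function name parse_manifest_text is hypothetical, developer warnings are modeled as a returned list of strings, and the extension point and the WebIDL conversion of steps 3 and 4 are not covered.

```python
import json


def parse_manifest_text(text: str):
    """Return (json_object, warnings) following steps 1 and 2 of the algorithm."""
    warnings = []
    try:
        data = json.loads(text)
    except json.JSONDecodeError as error:
        # Step 1: report the parsing error and continue with "{}".
        warnings.append(f"JSON parsing error: {error}")
        data = {}
    if not isinstance(data, dict):
        # Step 2: the manifest needs to be an object.
        warnings.append("the manifest needs to be an object")
        data = {}
    return data, warnings


print(parse_manifest_text('{"name": "Moby Dick"}'))
print(parse_manifest_text("[1, 2]"))
```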

Editor's note

The new JSON-LD based approach will require additional processing from the client. Due to the flexible nature of JSON-LD and schema.org, a simple conversion from JSON to WebIDL won't be enough.

7. Affordances

Editor's note

This section contains placeholders for possible reading enhancements/affordances the user agent may/should/must provide. The list is subject to addition, modification and removal as the enhancements get discussed in more detail.

Issue 143: What should be in the spec for affordances?

Before starting a discussion on the individual affordances' issues, the WG should have a consensus on what exactly is to be defined for each of those.

7.1 Switch to publication mode

When a user agent obtains a manifest it SHOULD display an affordance for switching the display to publication mode.

This affordance has the following requirements:

  1. it MUST inform the user that the current resource is part of a Web Publication
  2. it SHOULD display the title of the Web Publication
  3. it MAY display additional metadata from the infoset

Publication mode is a display mode implemented by the user agent that follows the conventions listed in presentation and navigation.

7.2 Presentation

7.2.1 Layout

The layout and rendering of Web Publications is governed by the same rules that apply to all Web content: HTML documents are styled and laid out according to the rules of CSS, SVG documents are rendered as defined by that format, etc. This specification requires no particular profile or subset of CSS, HTML, or SVG to be supported, other than the expectations set for these technologies by their respective specifications.

Editor's note

This specification intentionally avoids introducing any new layout features. Any shortcoming of the Web platform in terms of layout needs to be addressed for the whole Web platform, which means via CSS.

This working group will work with other relevant groups of the W3C to address platform-wide limitations that negatively impact Web Publications.

For the purposes of layout, each resource of a Web Publication is treated as a separate document. User agents MUST NOT mix content from multiple resources in the same rendering (e.g., CSS floats or absolutely positioned elements from one resource cannot intrude on or overlap with content from another resource).

Note

Despite this general requirement that each resource should be treated as a separate document for the purpose of layout, there are some places where CSS specifications should be amended to be able to deal more intelligently with collections of resources like Web Publications.

One instance is the definition of cross-references, which are currently restricted to work only within a single document. This restriction should be relaxed to allow for cross-references between separate resources of a single Web Publication.

Another related change would be to allow counters to accumulate across multiple resources of a single Web Publication (e.g., so that figures in multiple sections may be numbered in a single sequence).

7.2.2 User Settings

When a user agent renders a Web Publication, it SHOULD provide user settings to customize the experience.

User settings MAY include:

  • text size;
  • font family;
  • display mode (night, high contrast, etc.);
  • playback speed (for audio and video resources).

This specification does not cover how user agents override author styles to offer user settings.

Editor's note

To provide user settings in their reader mode, browsers usually get rid of most of the author styles. There is always a tension in reading environments between author styles and the user's preference, which is very hard to balance.

Issue 138: WP affords personalization (topic:affordances)

2.1.11 Personalization
The user must have the possibility to personalize his or her reading experience.

Picking up on #52

7.2.3 Scrolling or Paginating

This section is non-normative.

Publications have historically been presented via paged media, whereas Web pages almost always scroll. As the preferences of individual readers vary, and as different types of publications are better suited for one or the other, this specification encourages user agents to support both, and to offer a choice to their users.

Editor's note

It might be useful for authors to be able to specify a preference between scrolling and pagination, even if a strict requirement is not possible. This should most likely be addressed through an extension of @viewport or of the viewport meta tag (see [css-device-adapt]), or possibly through an extension of @page (see [css-page-3]). This should be discussed with the relevant working groups (CSSWG, WebPlatformWG, WHATWG).

Issue 137: WP affords "paging" through a publication (topic:affordances)

2.1.10 Pagination
It should be possible to see the Web Publication in a “paginated” view.
picking up on #52
See also https://w3c.github.io/wpub/#aff-presentation

7.2.4 Paginated Layout

When a user agent renders a Web Publication in a paginated layout, it MUST lay out each document in the default reading order sequentially, with the last page of a resource being followed by the first page of the subsequent one.
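
The sequencing described above can be sketched as follows. This is a non-normative illustration: the function `paginate` and the per-resource page counts are hypothetical, and a real user agent would obtain page counts from its layout engine. The running page number is an illustrative choice, not a requirement of this section.

```python
from typing import Dict, List, Tuple


def paginate(reading_order: List[str],
             page_counts: Dict[str, int]) -> List[Tuple[int, str, int]]:
    """Return (running page number, resource URL, page within resource) tuples."""
    sequence = []
    running_page = 0
    for url in reading_order:
        # All pages of one resource are emitted before the next resource starts.
        for local_page in range(1, page_counts[url] + 1):
            running_page += 1
            sequence.append((running_page, url, local_page))
    return sequence


pages = paginate(["cover.html", "chapter1.html"],
                 {"cover.html": 1, "chapter1.html": 3})
```

Here the single page of `cover.html` is immediately followed by the first page of `chapter1.html`, with no content from the two resources sharing a page.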

Editor's note

To avoid blank pages, if a resource ends on a left page (resp. right page), the subsequent one should start on a right page (resp. left page) even if the page progression (see [css-page-3]) would otherwise lead to it starting on the opposite page. It should also be possible to use the break-before property (see [css-break-3]) to force the content to resume on the opposite side if that was desired by the author.

[css-page-3] needs to be amended to describe this exception to the general behavior when dealing with collections of documents instead of individual documents.

Editor's note

How is pagination supposed to work when subsequent resources have opposite page progression directions (see [css-page-3]), for example due to a different writing mode? This is not necessarily a problem from a layout point of view, as each page is independent, but it is from a UI point of view. If swiping left means next page until the end of one chapter, and starts meaning previous page in the next chapter because the language has switched from English to Hebrew, this is going to be confusing.

Editor's note

[css-page-3] needs to be amended so that page counters are not automatically reset at the beginning of each new resource belonging to the same Web Publication.

7.3 Navigation

Issue 86: Accessibility requirements for navigation (topic:accessibility)

The following wiki page describes accessibility requirements for navigation from the user's perspective, and provides clarifications for some related concepts discussed in the group.
https://github.com/w3c/publ-a11y/wiki/Description-of-Accessibility-Requirements-for-WP

7.3.1 Reading Order

Hyperlinks are the means by which multiple resources are linked together on the Web. When users reach the end of one resource, they have to activate a hyperlink to move to the next resource in the sequence. While this model of navigation is effective, it is also disruptive for immersive reading — it forces users to disengage from the content and perform the actions necessary to activate the links. It is also limited to media types that support hyperlinks.

The default reading order provides an enhancement to the hyperlink model, allowing the user agent to automatically move the user to the next resource when a more natural action occurs, like a swipe across the screen. It is similar conceptually and functionally to the link element's next and prev relationships [html].

User agents MUST provide an affordance for moving forward and backward in the default reading order of a Web Publication.
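
A minimal, non-normative sketch of the lookup behind such an affordance is shown below. It assumes the default reading order is available as a simple list of resource URLs; the function name `neighbor` and the choice to return `None` at either end of the sequence are illustrative assumptions.

```python
from typing import List, Optional


def neighbor(reading_order: List[str], current: str, step: int) -> Optional[str]:
    """Return the resource `step` positions away from `current`, or None."""
    if current not in reading_order:
        return None  # the current resource is not in the default reading order
    target = reading_order.index(current) + step
    if 0 <= target < len(reading_order):
        return reading_order[target]
    return None  # already at the first or last resource


order = ["cover.html", "chapter1.html", "chapter2.html"]
next_resource = neighbor(order, "chapter1.html", +1)
previous_resource = neighbor(order, "cover.html", -1)
```

A user agent could bind `step=+1` and `step=-1` to whatever gestures or controls it uses for moving forward and backward.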

Issue 144: WP affords moving forward and backward (topic:affordances, topic:navigation)

User agents must provide an affordance for moving forward and backward in the default reading order of a Web Publication

Issue 38: What does "Reading Order" mean in the context of a Web Publication? (topic:affordances, topic:navigation)

Is it merely the action of some supposed "next" and "previous" interface elements?

Is it required to be publication-wide? Or can it be contextual (choose-your-own-adventure)?

If it's contextual, must there only be one "next"?

This relates to musings on #36, but I think has value all on its own.

I'm also very keen to get input from a wide(r) range of people including those who use Assistive Technology (AT) when reading/listening.

7.3.2 Progression

While reading a Web Publication, the user follows a natural progression within a resource as well as between resources (following the default reading order).

User agents SHOULD provide an affordance that saves this progression in the publication and returns the user to their last location the next time they open the publication.
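
One possible shape for this behavior is sketched below, non-normatively. The class `ProgressionStore`, the use of the publication's address as a key, and the representation of a location as a resource URL plus a fractional offset are all assumptions made for illustration. Falling back to the first resource in the default reading order when nothing has been saved is likewise an assumption of this sketch.

```python
from typing import Dict, List, Tuple

# (resource URL, assumed 0.0-1.0 offset within that resource)
Location = Tuple[str, float]


class ProgressionStore:
    """Hypothetical in-memory store of the last location per publication."""

    def __init__(self) -> None:
        self._saved: Dict[str, Location] = {}

    def save(self, publication_url: str, resource_url: str, offset: float) -> None:
        # Overwrite any earlier location recorded for this publication.
        self._saved[publication_url] = (resource_url, offset)

    def restore(self, publication_url: str, reading_order: List[str]) -> Location:
        if publication_url in self._saved:
            return self._saved[publication_url]
        # Nothing saved yet: start at the top of the first resource.
        return (reading_order[0], 0.0)
```

A persistent implementation would replace the dictionary with storage that survives the user agent being closed.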

When the user agent obtains a manifest for the first time, it MAY also prompt the user whether they would like to:

  • continue reading the publication from their current location; or
  • start reading the publication from the first resource in the default reading order.

Issue 145: WP affords saving and retrieve progression (topic:affordances, topic:navigation)

User agents should provide an affordance that saves the reading progression in the publication and return the user to that location the next time that she opens the publication again.

7.3.3 Table of Contents

Issue 146: WP provides a TOC without leaving the resource (propose closing, topic:affordances, topic:navigation)

User agents should provide an affordance for accessing the table of contents without leaving the resource they are currently viewing.

7.3.3.1 Short description

The user agent should provide access to the table of contents from anywhere in the publication without leaving the current resource.

For accessibility reasons, it is RECOMMENDED for User Agents to use a table of contents to allow multiple ways for users to access content.

7.3.3.2 Affordances

The table of contents is listed as a structural property in the infoset; see 3.4.4 Table of Contents.

The table of contents is referred to in the Web Publication Manifest (see ) and is expressed using an HTML element; see 3.4.4 Table of Contents for further details.

If a table of contents is not explicitly specified, user agents MAY use the default reading order to create one.
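
The fallback to the default reading order could look like the following non-normative sketch. The dictionary keys `url` and `title`, and the use of the URL as a label when no title is present, are assumptions made for illustration only.

```python
from typing import Dict, List, Optional

Entry = Dict[str, str]


def effective_toc(toc: Optional[List[Entry]],
                  reading_order: List[Entry]) -> List[Entry]:
    """Return the authored table of contents, or one derived from the reading order."""
    if toc:
        return toc  # an explicitly specified table of contents always wins
    # Fallback: one entry per resource, in default reading order.
    return [{"url": item["url"], "title": item.get("title", item["url"])}
            for item in reading_order]


generated = effective_toc(None, [{"url": "cover.html", "title": "Cover"},
                                 {"url": "chapter1.html"}])
```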

7.3.3.3 Use Case References
Req. 12
“There should be a means to indicate the author’s preferred navigation structure among the resources of a Web Publication. A user agent needs to know the sequence in which to present components of a Web Publication to the user, including the starting point.” (See [pwp-ucr])
Req. 13
“Authors of a Web Publication should be able to provide the user agent with information to access random parts of the publication” (See [pwp-ucr])

7.4 Offline Access

Issue 141: WP affords offline reading capabilities. (topic:affordances, topic:offline access)

A WP can be read in a browser offline with no change in fidelity from the online experience

7.6 Reading State

Issue 145: WP affords saving and retrieve progression (topic:affordances, topic:navigation)

User agents should provide an affordance that saves the reading progression in the publication and return the user to that location the next time that she opens the publication again.

7.6.1 Short description

The user must be able to leave the Web Publication and return to it at the last position they left from. The User Agent must retain the reading position, based on the last known position of the reader in the web publication. The position should be based on the reader's position in the file, within the reading order.

The user agent may retain reading state if the web publication is revised.

7.6.2 Affordances

The navigation of the web publication should be defined in the Default Reading Order required by the Information Set.

User Agents should not have to set the reading state in the following types of resources:

  • External links (e.g., a link to google.com)
  • Data references (e.g., a linked CSV file)
  • Multimedia content (e.g., a video)

Reading state should only apply to content documents listed as being within the bounds of the Web Publication.
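
A simple check along these lines is sketched below, non-normatively. It assumes that "within the bounds" can be approximated by membership in the default reading order, which is an assumption of this sketch and not a definition given here; the function name `tracks_reading_state` is hypothetical.

```python
from typing import List
from urllib.parse import urldefrag


def tracks_reading_state(url: str, reading_order: List[str]) -> bool:
    """Return True only for documents listed in the default reading order."""
    # Ignore any fragment so that "#section" links map to their document.
    resource, _fragment = urldefrag(url)
    return resource in reading_order


order = ["https://example.org/book/ch1.html", "https://example.org/book/ch2.html"]
inside = tracks_reading_state("https://example.org/book/ch1.html#s2", order)
outside = tracks_reading_state("https://google.com/", order)
```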

7.6.3 Examples

Example 1:
Sarah is reading a long article on her way to work. She arrives before she has finished, but wants to continue from the place she left off. The user agent should remember her reading state for the next time she opens the publication.

Testing

If a tester opens a web publication in a WP-aware UA, moves ahead in the publication, closes the reader, then reopens it, they should be returned to the last known reading state.

8. Web Publication Locators

This section is non-normative.

Editor's note

The document referred to from this section, i.e., Web Annotation Extensions for Web Publications [wpub-ann], has been recently renamed. Its previous name was "Locators for Web Publication". The terminology used in this section has to be realigned with the name change.

Locators are used to identify, locate, retrieve, and/or reference locations and content fragments within Web Publications (e.g., for address(es), bookmarks, and annotations). Locators traditionally take the form of fragment identifiers [rfc3986], where the portion of a URL preceded by a number sign character (#) identifies a specific position within the referenced resource.

For some use cases, it is essential to identify and reference a Web Publication resource—or a location in or a segment of a resource—in the scope or context of the Web Publication to which it belongs. A traditional fragment identifier cannot satisfy this requirement, since only the URL of the constituent resource containing the location or content fragment of interest is expressed. The Web Annotation Extensions for Web Publications [wpub-ann] document, based on the Web Annotation Model [annotation-model], addresses this issue by providing the means to express both the URL of the resource and the URL of the Web Publication.
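
The limitation can be illustrated with a short, non-normative sketch. Splitting a URL with a fragment identifier yields only the resource URL and the fragment; the publication has to be recorded separately. The `locator` dictionary and its key names are hypothetical and are not taken from [wpub-ann].

```python
from urllib.parse import urldefrag

target = "https://example.org/book/chapter1.html#section-2"
resource_url, fragment = urldefrag(target)

# A plain fragment URL carries no reference to the publication itself,
# so this illustrative record stores the publication URL alongside it.
locator = {
    "publication": "https://example.org/book/",  # assumed publication address
    "resource": resource_url,
    "fragment": fragment,
}
```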

Web Publication Locators also address the problem of referencing into a resource that was not authored with such a need in mind. A fragment identifier can only reference elements with explicit identifiers and locations with explicit anchor points. Web Publication Locators include a variety of selectors that work with the general structures and content of a resource (e.g., text selectors, CSS selectors).
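
To show how a text-based selector can reference content that has no explicit identifier, the following non-normative sketch locates a quotation by its surrounding context. The `exact`, `prefix` and `suffix` parameters follow the idea of the text quote selector in the Web Annotation Data Model [annotation-model]; the function `find_text_quote` and its plain string matching are simplifying assumptions.

```python
from typing import Optional


def find_text_quote(text: str, exact: str,
                    prefix: str = "", suffix: str = "") -> Optional[int]:
    """Return the index of `exact` where it appears between `prefix` and `suffix`."""
    position = text.find(prefix + exact + suffix)
    if position == -1:
        return None  # the quotation, in this context, does not occur
    return position + len(prefix)


content = "Call me Ishmael. Some years ago, never mind how long precisely."
start = find_text_quote(content, exact="Some years ago", prefix="Ishmael. ")
```

Because the match relies on the text itself, it works even when the author provided no anchor at that location.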

Editor's note

As Web Publication Locators currently rely on a JSON-based expression syntax, it is not yet clear how much of this syntax can be translated to a fragment identifier. This may limit the usefulness beyond expressions that are also JSON-based (e.g., outside of annotations or bookmarks).

Editor's note

Illustrate with example of an easy to understand Web Publication Locator, such as might be used in annotating a simple Web Publication.

The semantics of Web Publication Locators are a mapping and extension of the Web Annotation Data Model [annotation-model] and Vocabulary [annotation-vocab] for describing and referencing a segment of a Web resource. As a result, Web Publication Locators provide the expressiveness needed for a broad range of annotation and bookmarking use cases. Additionally, Web Publication Locators provide a way to identify and reference a location within a Web Publication (i.e., as distinct from identifying and referencing a content fragment consisting of a span of characters or bytes). A Web Publication Locator can be used to identify, retrieve and/or reference a fragment of a Web Publication that spans multiple resources.

Note

In composing a Web Publication Locator, use the canonical identifier of the Web Publication in preference to any alternative addresses. Such use facilitates the collation of Web Publication Locators associated with a particular Web Publication. URLs of Web Publication resources appearing in a Web Publication Locator should match the URL of the resource provided in the infoset.

9. Security

Editor's note
Placeholder for security issues.

10. Privacy

Editor's note
Placeholder for privacy issues.

A. WebIDL

A.1 Introduction

This section is non-normative.

Although a Web Publication manifest is authored as [json-ld], user agents process this information into an internal data structure representing the infoset in order to utilize the properties. The exact manner in which this processing occurs, and how the data is used internally, is user agent-dependent. To ensure interoperability when exposing the infoset items, however, this appendix defines a common, abstract representation of the data structures using the standard formalism of the Web Interface Definition Language [webidl-1] which can express the expected names, datatypes, and possible restrictions for each member of the infoset. (A WebIDL representation can be mapped onto ECMAScript, C, or other programming languages.)

A.2 WebPublicationManifest dictionary

dictionary WebPublicationManifest {
    required DOMString                   url;
             DOMString                   lang;
             TextDirection               direction = "auto";
             TextDirection               readingProgression = "auto";
             sequence<LocalizableString> name;
             DOMString                   id;
             sequence<Contributor>       authors;
             DOMString                   dateModified;
             DOMString                   datePublished;
             sequence<PublicationLink>   links;
             sequence<PublicationLink>   readingOrder;
             sequence<PublicationLink>   resources;
             sequence<PublicationLink>   toc;
};

The WebPublicationManifest has the following members:

url
Contains the address. Required.
lang
Contains the default language.
direction
Contains the default value for base direction.
readingProgression
Contains the value for the reading progression direction.
name
Contains the title.
id
Contains the canonical identifier.
authors
Contains one or more creators.
dateModified
Contains the last modification date.
datePublished
Contains the publication date.
links
Contains links to external resources.
readingOrder
Contains the default reading order.
resources
Contains the resource list.
toc
Contains the table of contents.
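
As a non-normative illustration of mapping this dictionary onto a programming language, the sketch below models a subset of the members in Python. The class `Manifest`, the function `to_manifest`, and the decision to ignore unknown members are assumptions of this sketch; it does not implement the WebIDL conversion rules.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Manifest:
    url: str                                   # required member
    lang: Optional[str] = None
    direction: str = "auto"                    # TextDirection default
    readingProgression: str = "auto"           # TextDirection default
    readingOrder: Optional[List[Dict[str, Any]]] = None


def to_manifest(data: Dict[str, Any]) -> Manifest:
    """Build a Manifest from parsed JSON, keeping only the members modeled above."""
    if "url" not in data:
        raise ValueError("missing required member: url")
    known = {name: data[name]
             for name in Manifest.__dataclass_fields__ if name in data}
    return Manifest(**known)


manifest = to_manifest(json.loads('{"url": "https://example.org/book/", "lang": "en"}'))
```

Members that are absent from the input take the defaults declared in the dictionary, so `manifest.direction` is `"auto"` here.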

A.3 authors member

Editor's note

The current infoset for creators is not fully defined; this dictionary might be further improved once there is agreement on how they should be handled.

dictionary Contributor {
    required LocalizableString name;
             DOMString         id;
};

The authors member is a sequence of Contributor dictionaries where each dictionary has the following members:

name
Contains one or more localizable strings for the contributor's name.
id
Contains a canonical identifier for the contributor.

A.4 LocalizableString dictionary

Editor's note

This definition includes a slightly tweaked version of the i18n recommendation that also includes a string value in addition to a language and a direction.

Some metadata in the infoset have strong requirements for internationalization. For those members, this specification relies on the best practices established by the i18n WG and on the LocalizableString dictionary.

dictionary LocalizableString {
    required DOMString     value;
             DOMString     lang;
             TextDirection dir = "auto";
};

When lang or dir is specified in LocalizableString, its value overrides the default language or base direction specified in WebPublicationManifest.

LocalizableString has the following members:

value
Contains the localized string.
lang
Contains the language.
dir
Contains the base direction.
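
The override of the manifest-level defaults by a LocalizableString can be sketched, non-normatively, as follows. The function `effective_lang_dir` is hypothetical and operates on the parsed JSON rather than on the converted dictionaries, on the assumption that an explicitly specified dir needs to be distinguished from the "auto" default.

```python
from typing import Dict, Optional, Tuple


def effective_lang_dir(string: Dict[str, str],
                       manifest: Dict[str, str]) -> Tuple[Optional[str], str]:
    """Resolve the language and base direction that apply to one string."""
    # A value given on the string wins; otherwise use the manifest default.
    lang = string.get("lang", manifest.get("lang"))
    direction = string.get("dir", manifest.get("direction", "auto"))
    return lang, direction


french = effective_lang_dir({"value": "Titre", "lang": "fr"},
                            {"lang": "en", "direction": "ltr"})
inherited = effective_lang_dir({"value": "Title"}, {"lang": "en"})
```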

A.6 TextDirection enum

enum TextDirection {
    "ltr",
    "rtl",
    "auto"
};

The TextDirection enum can contain the following values:

ltr
Left-to-right text.
rtl
Right-to-left text.
auto
Determined by the user agent.
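
For illustration only, the same enumeration might be represented in Python as shown below. The helper `parse_direction` is hypothetical; rejecting any string outside the three listed values is how Python's `Enum` behaves and is used here as a simple stand-in for validation.

```python
from enum import Enum


class TextDirection(Enum):
    LTR = "ltr"    # left-to-right text
    RTL = "rtl"    # right-to-left text
    AUTO = "auto"  # determined by the user agent


def parse_direction(value: str) -> TextDirection:
    """Map a manifest string to a TextDirection; raises ValueError otherwise."""
    return TextDirection(value)
```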

B. Lifecycle diagrams

This section is non-normative.

These diagrams provide a visual view of the lifecycle steps, as specified in 6. Web Publication Lifecycle.

Flowchart depicts how to obtain a manifest.

Figure 2 Obtain a manifest.
See the normative description of the algorithm in 6.1 Obtaining a manifest. Image available in SVG and PNG formats.

Flowchart depicts how to process a manifest.

Figure 3 Process a manifest.
See the normative description of the algorithm in 6.2 Processing the manifest. Image available in SVG and PNG formats.

C. Image Descriptions

This section is non-normative.

Description for the “Structure of Web Publications” diagram:
A simplified diagram of the structure of a Web Publication. The Web Publication is broken down into two elements. The first element is the actual contents (all the real things listed in the manifest). This element is broken down into the CSS, the actual “things” such as the HTML documents, audio, etc, and the images, fonts etc. The actual “things” have an additional subset of items that includes the entry page to the publication and all of the other documents. The second element is the Manifest (JSON). The manifest is used to generate the Information Set (“Infoset”), which consists of a list of all the “things” in the publication, the publication metadata, and the default reading order of content. It is noted in the diagram that the entry page has to link to the manifest. (Return to the diagram of Web Publication.)

D. Acknowledgements

This section is non-normative.

The following people contributed to the development of this specification:

The Working Group would also like to thank the members of the Digital Publishing Interest Group for all the hard work they did paving the road for this specification.

E. References

E.1 Normative references

[annotation-model]
Web Annotation Data Model. Robert Sanderson; Paolo Ciccarese; Benjamin Young. W3C. 23 February 2017. W3C Recommendation. URL: https://www.w3.org/TR/annotation-model/
[bcp47]
Tags for Identifying Languages. A. Phillips; M. Davis. IETF. September 2009. IETF Best Current Practice. URL: https://tools.ietf.org/html/bcp47
[css-break-3]
CSS Fragmentation Module Level 3. Rossen Atanassov; Elika Etemad. W3C. 9 February 2017. W3C Candidate Recommendation. URL: https://www.w3.org/TR/css-break-3/
[ecma-404]
The JSON Data Interchange Format. Ecma International. 1 October 2013. Standard. URL: https://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf
[ecmascript]
ECMAScript Language Specification. Ecma International. URL: https://tc39.github.io/ecma262/
[fetch]
Fetch Standard. Anne van Kesteren. WHATWG. Living Standard. URL: https://fetch.spec.whatwg.org/
[html]
HTML 5.2. Steve Faulkner; Arron Eicholz; Travis Leithead; Alex Danilo; Sangwhan Moon. W3C. 2017-12-14. W3C Recommendation. URL: https://www.w3.org/TR/html/
Link Relations. URL: https://www.iana.org/assignments/link-relations/link-relations.xhtml
[json-ld]
JSON-LD 1.0. Manu Sporny; Gregg Kellogg; Markus Lanthaler. W3C. 16 January 2014. W3C Recommendation. URL: https://www.w3.org/TR/json-ld/
[publishing-linking]
Publishing and Linking on the Web. Ashok Malhotra; Larry Masinter; Jeni Tennison; Daniel Appelquist. W3C. 30 April 2013. W3C Note. URL: https://www.w3.org/TR/publishing-linking/
[rfc2046]
Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types. N. Freed; N. Borenstein. IETF. November 1996. Draft Standard. URL: https://tools.ietf.org/html/rfc2046
[RFC2119]
Key words for use in RFCs to Indicate Requirement Levels. S. Bradner. IETF. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119
[rfc3987]
Internationalized Resource Identifiers (IRIs). M. Duerst; M. Suignard. IETF. January 2005. Proposed Standard. URL: https://tools.ietf.org/html/rfc3987
[rfc5988]
Web Linking. M. Nottingham. IETF. October 2010. Proposed Standard. URL: https://tools.ietf.org/html/rfc5988
[schema.org]
Schema.org. URL: https://schema.org
[url]
URL Standard. Anne van Kesteren. WHATWG. Living Standard. URL: https://url.spec.whatwg.org/
[webidl-1]
WebIDL Level 1. Cameron McCormack. W3C. 15 December 2016. W3C Recommendation. URL: https://www.w3.org/TR/2016/REC-WebIDL-1-20161215/
[wpub-ann]
Web Annotation Extensions for Web Publications. Timothy W. Cole; Ivan Herman. 2018-01-04. URL: https://www.w3.org/TR/wpub-ann/

E.2 Informative references

[annotation-vocab]
Web Annotation Vocabulary. Robert Sanderson; Paolo Ciccarese; Benjamin Young. W3C. 23 February 2017. W3C Recommendation. URL: https://www.w3.org/TR/annotation-vocab/
[bibtex]
BibTeX Format Description. URL: http://www.bibtex.org/Format/
[css-device-adapt]
CSS Device Adaptation Module Level 1. Rune Lillesveen; Florian Rivoal; Matt Rakow. W3C. 29 March 2016. W3C Working Draft. URL: https://www.w3.org/TR/css-device-adapt-1/
[css-page-3]
CSS Paged Media Module Level 3. Melinda Grant; Elika Etemad; Håkon Wium Lie; Simon Sapin. W3C. 14 March 2013. W3C Working Draft. URL: https://www.w3.org/TR/css3-page/
[doi]
Information and documentation — Digital object identifier system. 2012-05. Published. URL: https://www.iso.org/standard/43506.html
[iso8601]
Representation of dates and times. ISO 8601:2004. International Organization for Standardization (ISO). 2004. ISO 8601:2004. URL: http://www.iso.org/iso/catalogue_detail?csnumber=40874
Identifier: A Link Relation to Convey a Preferred URI for Referencing. H. Van de Sompel; M. Nelson; G. Bilder; J. Kunze; S. Warner. IETF. URL: https://tools.ietf.org/html/draft-vandesompel-identifier-00
[onix]
ONIX for Books. URL: http://www.editeur.org/83/Overview
[pwp-ucr]
Web Publications Use Cases and Requirements. Heather Flanagan; Ivan Herman; Leonard Rosenthol. W3C. 2 May 2017. W3C Note. URL: https://www.w3.org/TR/pwp-ucr/
[rfc3986]
Uniform Resource Identifier (URI): Generic Syntax. T. Berners-Lee; R. Fielding; L. Masinter. IETF. January 2005. Internet Standard. URL: https://tools.ietf.org/html/rfc3986
[rfc6596]
The Canonical Link Relation. M. Ohye; J. Kupke. IETF. April 2012. Informational. URL: https://tools.ietf.org/html/rfc6596
[string-meta]
Requirements for Language and Direction Metadata in Data Formats. Addison Phillips; Richard Ishida. 2017-12-01. URL: https://w3c.github.io/string-meta/
[WCAG20]
Web Content Accessibility Guidelines (WCAG) 2.0. Ben Caldwell; Michael Cooper; Loretta Guarino Reid; Gregg Vanderheiden et al. W3C. 11 December 2008. W3C Recommendation. URL: https://www.w3.org/TR/WCAG20/
[WEBIDL]
Web IDL. Cameron McCormack; Boris Zbarsky; Tobie Langel. W3C. 15 December 2016. W3C Editor's Draft. URL: https://heycam.github.io/webidl/