Copyright © 2018 W3C® (MIT, ERCIM, Keio, Beihang). W3C liability, trademark and permissive document license rules apply.
This specification defines a collection of information that describes the structure of Web Publications so that user agents can provide user experiences tailored to reading publications, such as sequential navigation and offline reading. This information includes the default reading order, a list of resources, and publication-wide metadata.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.
This document is a draft of the Web Publications specification. Many details are under active consideration within the Publishing Working Group and are subject to change. The most prominent known issues have been identified in this document and links provided to comment on them.
Work over the past few months has focused on sections 3. Web Publication Construction and 4. Web Publication Properties, which include the definition of a manifest making use of terms in schema.org, as well as on the Lifecycle and WebIDL sections.
This document was published by the Publishing Working Group as an Editor's Draft.
GitHub Issues are preferred for discussion of this specification. Alternatively, you can send comments to our mailing list. Please send them to public-publ-wg@w3.org (archives).
Publication as an Editor's Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This document is governed by the 1 February 2018 W3C Process Document.
This section is non-normative.
A Web Publication is a discoverable and identifiable collection of resources. Information about the Web Publication is expressed in a machine-readable document called a manifest, which is what enables user agents to understand the bounds of the Web Publication and the connection between its resources.
The manifest includes metadata that describe the Web Publication, as a publication has an identity and nature beyond its constituent resources. The manifest also provides a list of all the resources that belong to the Web Publication and a default reading order, which is how it connects resources into a single contiguous work.
A Web Publication is discoverable in one of two ways: resources either include a link to the manifest (via an HTTP Link header or an HTML link element [html]), or the manifest can be loaded directly by a compatible user agent.
With the establishment of Web Publications, user agents can build new experiences tailored specifically for their unique reading needs.
This section is non-normative.
This specification only defines requirements for the production and rendering of valid Web Publications. As much as possible, it leverages existing Open Web Platform technologies to achieve its goal—that being to allow for a measure of boundedness on the Web without changing the way that the Web itself operates.
Moreover, the specification is designed to adapt automatically to updates to Open Web Platform technologies in order to ensure that Web Publications continue to interoperate seamlessly as the Web evolves (e.g., by referencing the latest published versions instead of specific dated versions).
Further, this specification does not attempt to constrain the nature of a Web Publication: any type of work that can be represented on the Web constitutes a potential Web Publication.
The specification is also intended to facilitate different user agent architectures for the consumption of Web Publications. While a primary goal is that traditional Web user agents (browsers) will be able to consume Web Publications, this should not limit the capabilities of any other possible type of user agent (e.g., applications, whether standalone or running within a user agent, or even Web Publications that include their own user interface). As a result, the specification does not attempt to architect required solutions for situations whose expected outcome will vary depending on the nature of the user agent and the expectations of the user (e.g., how to prompt to initiate a Web Publication, or at what point or how much of a Web Publication to cache for offline use).
This document uses terminology defined by the W3C Note "Publishing and Linking on the Web" [publishing-linking], including, in particular, user, user agent, browser, and address.
An identifier is metadata that can be used to refer to Web Content in a persistent and unambiguous manner. URLs, URNs, DOIs, ISBNs, and PURLs are all examples of persistent identifiers frequently used in publishing.
A manifest represents structured information about a Web Publication, such as informative metadata, a list of all resources, and a default reading order.
For the purposes of this specification, non-empty is used to refer to an element, attribute or property whose text content or value consists of one or more characters after whitespace normalization, where whitespace normalization rules are defined per the host format.
The general term URL is defined by the URL Standard [url]. It is used as in other W3C specifications, like HTML [html]. In particular, a URL allows for the usage of characters from Unicode following [rfc3987]. See the note in the HTML5 specification for further details.
A Web Publication is a collection of one or more resources, organized together through a manifest into a single logical work with a default reading order. The Web Publication is uniquely identifiable and presentable using Open Web Platform technologies.
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
The key words MAY, MUST, MUST NOT, OPTIONAL, RECOMMENDED, REQUIRED, SHOULD, and SHOULD NOT are to be interpreted as described in [RFC2119].
This specification defines two conformance classes: one for Web Publications and one for user agents that process them.
A Web Publication conforms to this specification if it meets the following criteria:
A user agent conforms to this specification if it meets the following criteria:
This section is non-normative.
A Web Publication is defined by a set of items known as its information set (infoset). The infoset is both abstract and concrete. It is abstract in the sense that it represents a set of information that a user agent has to compile about the Web Publication, but it also becomes concrete when the user agent creates an internal representation of that information.
A manifest, on the other hand, is a serialization of an infoset created by the author of a Web Publication. The manifest is expressed using the JSON-LD [json-ld] format — a variant of JSON [ecma-404] for expressing linked data. The manifest can be created as a standalone resource or it can be embedded within an HTML document.
Although the infoset is primarily compiled from a Web Publication's manifest, some information is obtained outside the manifest. The table of contents, for example, may be referenced from the manifest but is serialized in an HTML document.
This specification describes the requirements for creating both the infoset and manifest. This section, in particular, details how to create a manifest, and the next lists the various properties common to infosets and manifests.
A Web Publication Manifest MUST start by setting the JSON-LD context [json-ld]. The context has the following two major components:
https://schema.org
https://www.w3.org/ns/wp-context
{
"@context" : ["https://schema.org", "https://www.w3.org/ns/wp-context"],
…
}
The Web Publication context file MAY add features to the properties defined in Schema.org (e.g., the requirement for the creator property to be order preserving).
As part of the ongoing contact with Schema.org, the additional features defined in the Web Publication context file could migrate to the core Schema.org vocabulary.
Although Schema.org is often referenced using the http URI scheme, the vocabulary is being migrated to use the secure https scheme as its default. This specification requires the use of https when referencing Schema.org in the manifest.
Various manifest properties can have one or more values. As a general rule, these values can be expressed as [json] arrays. When the property value is an array with a single element, however, the array syntax can be omitted.
Various manifest properties are expected to be expressed as [json] objects. Although the use of objects is usually RECOMMENDED, it is also acceptable to use string values that are interpreted as objects depending on the context. The exact mapping of text values to objects is part of the property or object definitions.
With the exception of the descriptive properties, the Web Publication properties typically link to one or more resources. When a property requires a link value, the link MUST be expressed in one of the following two ways:
- as a string containing the URL of the target resource; or
- as a PublicationLink object that can be used to express the URL, the media type, and other characteristics of the target resource.
In other words, a single string value is a shorthand for a PublicationLink object whose url property is set to that string value. (See also 3.2.2.2 Text Values or Objects.)
{
…
"resources" : [
"datatypes.svg",
{
"@type" : "PublicationLink",
"url" : "test-utf8.csv",
"encodingFormat" : "text/csv",
"name" : "Test Results",
"description" : "CSV file containing the full data set used."
},
{
"@type" : "PublicationLink",
"url" : "terminology.html",
"encodingFormat" : "text/html",
"rel" : "glossary"
}
]
}
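The two shorthand rules above (a single value may omit the array syntax, and a bare string stands for a PublicationLink whose url is that string) can be sketched as a small normalization step. This is only an illustration, not part of the specification: the function name `normalize_links` and the choice to raise an error when url is missing are assumptions made for the example.

```python
from typing import Any


def normalize_links(value: Any) -> list:
    """Expand a link-valued manifest property into a list of PublicationLink dicts.

    Hypothetical helper: the name and the error handling are illustrative choices.
    """
    # A single value is treated as a one-element array.
    items = value if isinstance(value, list) else [value]
    links = []
    for item in items:
        if isinstance(item, str):
            # A bare string is shorthand for an object whose url is that string.
            links.append({"@type": "PublicationLink", "url": item})
        elif isinstance(item, dict):
            if "url" not in item:
                # url is the only REQUIRED term of PublicationLink.
                raise ValueError("PublicationLink is missing its url")
            # Keep the author's own @type if one was given.
            links.append({"@type": "PublicationLink", **item})
        else:
            raise TypeError("link values must be strings or objects")
    return links


resources = normalize_links([
    "datatypes.svg",
    {"@type": "PublicationLink", "url": "test-utf8.csv", "encodingFormat": "text/csv"},
])
```

After this step every entry in `resources` is an object with a url, so later processing does not need to distinguish the two forms.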
PublicationLink Definition
This specification defines a new type for links called PublicationLink. It consists of the following properties:
Term | Description | Required Value | [schema.org] Mapping | Optionality |
---|---|---|---|---|
url | Location of the resource. | A URL [url]. Refer to the property definitions that accept this type for additional restrictions. | url | REQUIRED |
encodingFormat | Media type of the resource (e.g., text/html). | MIME Media Type [rfc2046]. | encodingFormat | OPTIONAL |
name | Name of the item. | One or more Text items. | name | OPTIONAL |
description | Description of the item. | Text. | description | OPTIONAL |
rel | The relationship of the resource to the Web Publication. | One or more relations. The values are either the relevant relationship terms of the IANA link registry [iana-link-relations], or specially-defined URLs if no suitable link registry item exists. | (None) | OPTIONAL |
(Extracting a discussion on #232 into a separate Issue.)
Schema.org has a (currently "pending") type LinkRole which may be a good alternative to the (publication specific) PublicationLink. Maybe worth considering using the schema.org type.
The Web Publication Manifest MUST include a Publication Type using the @type keyword [json-ld]. The type MAY be mapped onto the CreativeWork type [schema.org].
{
"@context" : ["https://schema.org", "https://www.w3.org/ns/wp-context"],
"@type" : "CreativeWork",
…
}
Schema.org also includes a number of more specific types, all subtypes of CreativeWork, such as Article, Book, and Course. These MAY be used instead of CreativeWork.
{
"@context" : ["https://schema.org", "https://www.w3.org/ns/wp-context"],
"@type" : "Book",
…
}
Each schema.org type defines a set of properties that are valid for use with it. To ensure that the manifest can be validated and processed by schema.org aware processors, the manifest SHOULD contain only the properties associated with the selected type.
If properties from more than one type are needed, the manifest MAY include multiple type declarations.
{
"@context" : ["https://schema.org", "https://www.w3.org/ns/wp-context"],
"@type" : ["Book", "VisualArtwork"],
…
}
User agents SHOULD NOT fail to process manifests that are not valid to their declared schema.org type(s).
Refer to the Schema.org site for the complete list of CreativeWork subtypes.
The naming, syntax, and requirements for manifest properties are defined in 4. Web Publication Properties.
Although authors only have to understand the serialization requirements for manifest terms, they are encouraged to read through the infoset definitions for each property, as well. The infoset definitions describe, in some cases, how items are compiled in the absence of explicit information in the manifest.
Relative URL strings MAY be used in the manifest. These URLs are resolved into absolute URL strings using a base URL [url].
The base URL for relative URLs is determined as follows:
For embedded manifests, this means that relative URLs are resolved against the URL of the primary entry page unless the page declares a base URL (i.e., in a <base> element in its header or via an xml:base attribute [html]).
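As a rough illustration of the resolution step, the sketch below resolves manifest URL strings against a base URL that is assumed to have been determined already. It uses Python's `urllib.parse.urljoin`, which follows RFC 3986 rather than the URL Standard [url], so edge cases may differ. The function name `resolve_manifest_url` and the example URLs are hypothetical.

```python
from urllib.parse import urljoin


def resolve_manifest_url(base_url: str, reference: str) -> str:
    """Resolve a possibly relative manifest URL string against the base URL."""
    # Absolute references are returned unchanged; relative ones are joined.
    return urljoin(base_url, reference)


# Embedded manifest, no <base> element: the primary entry page is the base.
page_url = "https://example.com/webpub/index.html"
chapter = resolve_manifest_url(page_url, "chapters/c1.html")

# Same page, but assuming it declared <base href="https://example.com/assets/">.
asset = resolve_manifest_url("https://example.com/assets/", "cover.jpg")
```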
A manifest can be embedded within an HTML document using the script element [html].
When embedding a manifest, the type attribute of the containing script element MUST be set to application/ld+json.
Additionally, the script element MUST include a unique identifier in an id attribute [html]. This identifier ensures that the manifest can be referenced.
<script id="example_manifest" type="application/ld+json">
{
…
}
</script>
Resources SHOULD provide a link to the manifest of the Web Publication to which they belong to enable discovery. Links MUST take one or both of the following forms:
An HTTP Link header field [rfc5988] with its rel parameter set to the value "publication".
Link: <https://example.com/webpub/manifest>; rel=publication
A link element [html] with its rel attribute set to the value "publication".
<link href="https://example.com/webpub/manifest" rel="publication"/>
When a manifest is embedded within an HTML document, the link MUST include a fragment identifier that references the script element that contains the manifest (see 3.3 Embedding a Manifest).
<link href="#example_manifest" rel="publication">
…
<script id="example_manifest" type="application/ld+json">
{
"@context" : ["https://schema.org", "https://www.w3.org/ns/wp-context"],
…
}
</script>
The exact value of rel is still to be agreed upon and should be registered by IANA.
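For the embedded case described above, where a link element with rel="publication" points by fragment identifier at a script element of type application/ld+json, a user agent could locate and parse the manifest roughly as follows. This is a minimal sketch using only the Python standard library. The class and function names are hypothetical, and it ignores external manifests and HTTP Link headers entirely.

```python
import json
from html.parser import HTMLParser


class _ManifestFinder(HTMLParser):
    """Collects the fragment target of the publication link and JSON-LD scripts."""

    def __init__(self):
        super().__init__()
        self.target_id = None  # id referenced by <link rel="publication" href="#...">
        self.scripts = {}      # script id -> accumulated text content
        self._current = None   # id of the script currently being read, if any

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and "publication" in (a.get("rel") or "").split():
            href = a.get("href") or ""
            # Only same-document references are handled in this sketch.
            if href.startswith("#") and self.target_id is None:
                self.target_id = href[1:]
        elif tag == "script" and a.get("type") == "application/ld+json" and a.get("id"):
            self._current = a["id"]
            self.scripts[self._current] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.scripts[self._current] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._current = None


def embedded_manifest(html_text):
    """Return the parsed embedded manifest, or None if the page does not link to one."""
    finder = _ManifestFinder()
    finder.feed(html_text)
    finder.close()
    if finder.target_id is None or finder.target_id not in finder.scripts:
        return None
    return json.loads(finder.scripts[finder.target_id])
```

A script element without a matching link (or with the wrong type attribute) is ignored here, which mirrors the requirement that the manifest be reachable through the link.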
The following details might be moved to the lifecycle section in a future draft.
When a resource links to multiple manifests, a user agent MAY choose to present one or more alternatives to the end user, or choose a single alternative on its own. The user agent MAY choose to present any manifest based upon information that it possesses, even one that is not explicitly listed as a parent (e.g., based upon information it calculates or acquires out of band). In the absence of a preference by user agent implementers, selection of the first manifest listed is suggested as a default.
A Web Publication MUST include at least one HTML document [html] that links to the manifest. This page is referred to as the primary entry page of the Web Publication.
There are no restrictions on a Web Publication beyond this requirement. The Web Publication MAY include references to resources of any media type, both in the default reading order and as dependencies of other resources.
When adding resources to a Web Publication, consider support in user agents. The use of progressive enhancement techniques and the provision of fallback content, as appropriate, will ensure a more consistent reading experience for users regardless of their preferred user agent.
The table of contents provides a hierarchical list of links that reflects the structural outline of the major sections of the Web Publication.
The table of contents is expressed via an HTML element in one of the resources (typically a nav element [html]). This element MUST be identified by the role attribute [html] value "doc-toc" [dpub-aria-1.0], and MUST be the first element in the document so designated.
The table of contents SHOULD be located in the primary entry page of the Web Publication. If not, the manifest SHOULD identify the resource that contains the structure.
There are no requirements on the table of contents itself, except that, when specified, it MUST include a link to at least one resource.
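The rules above (the table of contents is the first element whose role is doc-toc, and it has to contain at least one link) could be checked along these lines. This is a sketch only: the names `_TocFinder` and `toc_links` are hypothetical, and Python's `html.parser` does not implement the full HTML parsing algorithm, so malformed markup may be handled differently than in a browser.

```python
from html.parser import HTMLParser


class _TocFinder(HTMLParser):
    """Finds the first element with role="doc-toc" and records the links inside it."""

    def __init__(self):
        super().__init__()
        self.toc_tag = None  # tag name of the first doc-toc element
        self.depth = 0       # nesting depth of same-named tags inside that element
        self.done = False    # True once the first doc-toc element has closed
        self.links = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return  # later doc-toc elements are ignored
        a = dict(attrs)
        if self.toc_tag is None:
            if "doc-toc" in (a.get("role") or "").split():
                self.toc_tag = tag
                self.depth = 1
            return
        if tag == self.toc_tag:
            self.depth += 1
        if tag == "a" and a.get("href"):
            self.links.append(a["href"])

    def handle_endtag(self, tag):
        if self.toc_tag is not None and not self.done and tag == self.toc_tag:
            self.depth -= 1
            if self.depth == 0:
                self.done = True


def toc_links(html_text):
    """Return link targets of the table of contents, or None if no doc-toc element exists."""
    finder = _TocFinder()
    finder.feed(html_text)
    finder.close()
    return None if finder.toc_tag is None else finder.links
```

An empty list returned by `toc_links` would indicate a table of contents that does not link to any resource.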
Refer to the table of contents property definition for more information on how to identify in the infoset and manifest which resource contains the table of contents.
Do we need a more detailed definition for the HTML TOC format?
This section is non-normative.
Both the Web Publication infoset and manifest are defined by a common set of properties that describe the basic information a user agent requires to process and render a Web Publication. These properties are categorized as follows:
Descriptive properties describe aspects of a Web Publication, such as its title, creator, and language. These properties are primarily drawn from Schema.org and its hosted extensions [schema.org], so they map to one or several Schema.org properties and inherit their syntax and semantics. (The following property categories typically do not have Schema.org equivalents, so are defined specifically for Web Publications.)
Resource categorization properties describe or identify common sets of resources, such as the resource list and default reading order. These properties refer to one or more external resources (images, script files, separate metadata files, etc.).
Informative properties identify resources that contain additional information about the Web Publication, such as its privacy policy or an accessibility report.
Structural properties identify key meta structures of the Web Publication, such as the cover image or the location of the table of contents.
The categorization of properties is done to simplify comprehension of their purpose; the groupings have no relevance outside this specification (i.e., the groupings do not exist in the infoset or manifest).
Each manifest item drawn from schema.org identifies the property it maps to and includes its defining type in parentheses. Properties are often available in many types, however, as a result of the schema.org inheritance model. Refer to each property definition for more detailed information about where it is valid to use.
Schema.org additionally includes a large number of properties that, though relevant for publishing, are not mentioned in this specification — Web Publication authors can use any of these properties. This document defines only the minimal set of infoset items.
There is discussion on whether a best practices document should be created, referring to more schema.org terms. If so, it should be linked from here.
The requirements for the expression of Web Publication properties are defined by the infoset as follows:
As the infoset properties do not all have to be serialized in the manifest, the requirements for the manifest will differ in some cases. Refer to each property's definition to determine whether it is required in the manifest or can be compiled from other information.
The accessibility properties provide information about the suitability of a Web Publication for consumption by users with varying preferred reading modalities. These properties typically supplement an evaluation against established accessibility criteria, such as those provided in [WCAG20]. (For linking to a detailed accessibility report, see 4.5.1 Accessibility Report.)
The following infoset items are categorized as accessibility properties:
More detailed descriptions of these properties, as well as their possible values, are available on the WebSchemas Wiki site.
Term | Description | Required Value | [schema.org] Mapping |
---|---|---|---|
accessMode | The human sensory perceptual system or cognitive faculty through which a person may process or perceive information. | One or more text(s). Expected values. | accessMode (CreativeWork) |
accessModeSufficient | A list of single or combined accessModes that are sufficient to understand all the intellectual content of a resource. | One or more texts, each a comma-separated list of terms. Expected values. | accessModeSufficient (CreativeWork) |
accessibilityAPI | Indicates that the resource is compatible with the referenced accessibility APIs. | One or more text(s). Expected values. | accessibilityAPI (CreativeWork) |
accessibilityControl | Identifies input methods that are sufficient to fully control the described resource. | One or more text(s). Expected values. | accessibilityControl (CreativeWork) |
accessibilityFeature | Content features of the resource, such as accessible media, alternatives and supported enhancements for accessibility. | One or more text(s). Expected values. | accessibilityFeature (CreativeWork) |
accessibilityHazard | A characteristic of the described resource that is physiologically dangerous to some users. | One or more text(s). Expected values. | accessibilityHazard (CreativeWork) |
accessibilitySummary | A human-readable summary of specific accessibility features or deficiencies, consistent with the other accessibility metadata but expressing subtleties such as “short descriptions are present but long descriptions will be needed for non-visual users” or “short descriptions are present and no long descriptions are needed.” | Text. | accessibilitySummary (CreativeWork) |
Note that the author MAY also provide a reference to a more detailed Accessibility Report, beyond the accessibility information expressed by these properties.
{
"@context" : ["https://schema.org","https://www.w3.org/ns/wp-context"],
"@type" : "CreativeWork",
…
"accessibilityAPI" : "ARIA",
"accessMode" : ["textual", "visual"],
"accessModeSufficient" : ["textual"],
…
}
A Web Publication's address is a URL [url] that represents the primary entry page for the Web Publication.
If the address does not resolve to an HTML document [html], user agents SHOULD NOT provide access to it to users. A Web Publication MAY have more than one address, but all the addresses MUST resolve to the same document.
The referenced document SHOULD be a resource of the Web Publication. It can be any resource, including one that is not listed in the default reading order. This document MUST include a link to the manifest to ensure a bidirectional linking relationship (i.e., that user agents can also locate the manifest from the document at the address).
If the document is not a Web Publication resource, user agents SHOULD load the first document in the default reading order when initiating the Web Publication.
To improve the usability of Web Publications, particularly in user agents that do not support Web Publications, authors are encouraged to include navigation aids in the referenced document that facilitate consumption of the content (e.g., provide a table of contents or a link to one).
Term | Description | Required Value | [schema.org] Mapping |
---|---|---|---|
url | URL of the primary entry page. | A URL [url]. | url (Thing) |
{
"@context" : ["https://schema.org","https://www.w3.org/ns/wp-context"],
"@type" : "Book",
…
"url" : "https://publisher.example.org/mobydick",
…
}
A Web Publication's canonical identifier is a unique identifier that resolves to the preferred version of the Web Publication.
Ensuring uniqueness of canonical identifiers is outside the scope of this specification. The actual achievable uniqueness depends on such factors as the conventions of the identifier scheme used and the degree of control over assignment of identifiers.
The canonical identifier is intended to provide a measure of permanence above and beyond the Web Publication's address(es). If a Web Publication is permanently relocated to a new URL, for example, the canonical identifier provides a way of discovering the new location (e.g., a DOI registry could be updated with the new URL, or a redirect could be added to the URL of the canonical identifier). It is also intended to provide a means of identifying instances of the same Web Publication hosted at different URLs.
The canonical identifier MUST be a URL [url].
If a URL is not provided in the manifest, or the value is an invalid URL, the Web Publication does not have a canonical identifier. User agents MUST NOT attempt to construct a canonical identifier from any other identifiers provided in the manifest.
The canonical identifier can be used as the target of a "canonical" link [rfc6596] (e.g., a link element [html] whose rel attribute has the value canonical or an HTTP Link header field [rfc5988] similarly identified).
Is a canonical identifier necessary to call out explicitly in the infoset, or can it be handled by other metadata?
Term | Description | Required Value | [schema.org] Mapping |
---|---|---|---|
@id | Preferred version of the Web Publication. | A URL [url]. | (None) |
The specification of the canonical identifier MAY be complemented by the inclusion of additional types of identifiers for the Web Publication using the identifier property [schema.org] and/or its subtypes.
{
"@context" : ["https://schema.org","https://www.w3.org/ns/wp-context"],
"@type" : "CreativeWork",
…
"@id" : "http://www.w3.org/TR/tabular-data-model/",
"url" : "http://www.w3.org/TR/2015/REC-tabular-data-model-20151217/",
…
}
{
"@context" : ["https://schema.org","https://www.w3.org/ns/wp-context"],
"@type" : "Book",
…
"isbn" : "9780123456789",
"url" : "https://publisher.example.org/mobydick",
…
}
A creator is an individual or entity responsible for the creation of the Web Publication. Creators are represented in one of the following two ways:
- as a string containing the name of the creator; or
- as a Person or Organization object, for individuals and entities respectively.
In other words, a single string value is a shorthand for a Person object whose name property is set to that string value. (See also 3.2.2.2 Text Values or Objects.)
The following infoset items are categorized as creators:
A Web Publication MAY have more than one of each of these types of creators.
Term | Description | Required Value | [schema.org] Mapping |
---|---|---|---|
artist | The primary artist for the publication, in a medium other than pencils or digital line art. | One or more Person. | artist (VisualArtwork) |
author | The author of the publication. | One or more Person and/or Organization. | author (CreativeWork) |
colorist | The individual who adds color to inked drawings. | One or more Person. | colorist (VisualArtwork) |
contributor | Contributor whose role does not fit to one of the other roles in this table. | One or more Person and/or Organization. | contributor (CreativeWork) |
creator | The creator of the publication. | One or more Person and/or Organization. | creator (CreativeWork) |
editor | The editor of the publication. | One or more Person. | editor (CreativeWork) |
illustrator | The illustrator of the publication. | One or more Person. | illustrator (Book) |
inker | The individual who traces over the pencil drawings in ink. | One or more Person. | inker (VisualArtwork) |
letterer | The individual who adds lettering, including speech balloons and sound effects, to artwork. | One or more Person. | letterer (VisualArtwork) |
penciler | The individual who draws the primary narrative artwork. | One or more Person. | penciler (VisualArtwork) |
publisher | The publisher of the publication. | One or more Person and/or Organization. | publisher (CreativeWork) |
readBy | A person who reads (performs) the publication (for audiobooks). | One or more Person. | readBy (Audiobook) |
translator | The translator of the publication. | One or more Person and/or Organization. | translator (CreativeWork) |
{
"@context" : ["https://schema.org","https://www.w3.org/ns/wp-context"],
"@type" : "Book",
…
"url" : "https://publisher.example.org/mobydick",
"author" : {
"@type" : "Person",
"name" : "Herman Melville"
}
}
{
"@context" : ["https://schema.org","https://www.w3.org/ns/wp-context"],
"@type" : "CreativeWork",
…
"@id" : "http://www.w3.org/TR/tabular-data-model/",
"url" : "http://www.w3.org/TR/2015/REC-tabular-data-model-20151217/",
"author" : [
"Jeni Tennison",
{
"@type" : "Person",
"name" : "Gregg Kellogg"
},{
"@type" : "Person",
"name" : "Ivan Herman",
"@id" : "https://www.w3.org/People/Ivan/"
}
],
"editor" : [
"Jeni Tennison",
{
"@type" : "Person",
"name" : "Gregg Kellogg"
}
],
"publisher" : {
"@type" : "Organization",
"name" : "World Wide Web Consortium",
"@id" : "https://www.w3.org/"
},
…
}
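The mixed string and object forms shown in the example above can be reduced to a uniform list of objects before further processing. The following is a minimal sketch of that shorthand expansion. The function name `normalize_creators` is hypothetical, and order is deliberately preserved, since the context file is said to make creator lists order preserving.

```python
def normalize_creators(value):
    """Expand a creator-valued property into an ordered list of Person/Organization dicts."""
    # A single value may omit the array syntax.
    items = value if isinstance(value, list) else [value]
    creators = []
    for item in items:
        if isinstance(item, str):
            # A bare string is shorthand for a Person whose name is that string.
            creators.append({"@type": "Person", "name": item})
        elif isinstance(item, dict):
            creators.append(dict(item))  # copy, leaving the manifest untouched
        else:
            raise TypeError("creators must be strings or objects")
    return creators


authors = normalize_creators([
    "Jeni Tennison",
    {"@type": "Person", "name": "Gregg Kellogg"},
    {"@type": "Person", "name": "Ivan Herman", "@id": "https://www.w3.org/People/Ivan/"},
])
```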
The Web Publication has a natural language value (e.g., English, French, Chinese), as well as a natural base writing direction (left-to-right or right-to-left). The infoset has entries to set these values, which can influence, for example, the behavior of a user agent (e.g., it might place a pop-up for a table of contents on the right hand side for publications whose natural base direction is right-to-left).
Similarly, each natural language property value in the Web Publication's infoset (e.g., title, creators) is localizable [string-meta], meaning that the same information is available for each.
As a result, the infoset has entries to set the language and the base direction of both the Web Publication and the natural language property values of the infoset.
The infoset MAY contain global language and base direction declarations for the Web Publication. The natural language MUST be a tag that conforms to [bcp47], while the base language direction MUST have one of the following values:
- ltr: indicates that the textual values are explicitly directionally set to left-to-right text;
- rtl: indicates that the textual values are explicitly directionally set to right-to-left text;
- auto: indicates that the textual values are explicitly directionally set to the direction of the first character with a strong directionality.
When specified, these properties are also used as defaults for textual values in the infoset.
It is important to differentiate the language of the publication from the language and the base direction of the individual resources that compose it. If such resources are, for example, in HTML, the language and direction need to be set in those resources, too. The language and base direction of the publication are not inherited.
The global language information MAY be overridden by individual values.
When using Web Publication manifests with bidirectional text, user agents SHOULD identify the base direction of any given natural language value by scanning the text for the first strong directional character. Once the base direction has been identified, user agents MUST determine the appropriate rendering and display of natural language values according to the Unicode Bidirectional Algorithm [bidi]. This could require wrapping additional control characters or markup around the string prior to display, in order to apply the base direction. (See C. Examples for bidirectional texts.)
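The first-strong scan mentioned above can be approximated with the bidirectional class data in Python's `unicodedata` module. This is a simplified sketch, not the full paragraph-level rules of the Unicode Bidirectional Algorithm [bidi]: it does not skip characters inside directional isolates, and the function name `base_direction` is hypothetical.

```python
import unicodedata
from typing import Optional


def base_direction(text: str) -> Optional[str]:
    """Return "ltr" or "rtl" from the first strong directional character, else None."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    # Digits, punctuation and whitespace alone carry no strong direction.
    return None
```

When the function returns None, a user agent would have to fall back on some other source, such as the global base direction, since the string itself gives no indication.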
This section, in particular the features related to text directions, must be reviewed by I18N experts.
If the manifest is embedded in the primary entry page via a script element, and the manifest does not set the global language and/or the base direction (see 4.3.5.2.1 Global Language and Direction), the lang and the dir attributes of the script element are used as the global language and base direction, respectively (see the details on handling the lang and dir attributes in [html]).
It is to be discussed whether this last paragraph, i.e., inheriting values from script, should be kept.
If a user agent requires the language and one is not available in the infoset (globally, or specifically for that property), or the obtained value is invalid, the user agent MAY attempt to determine the language. This specification does not mandate how such a language tag is created. The user agent might:
No default values are specified for the language or the default base direction.
As this infoset item refers to several aspects of setting language and direction, these are treated separately.
Term | Description | Required Value | [schema.org] Mapping |
---|---|---|---|
inLanguage | Default language for the Web Publication as well as the textual infoset values | language code as defined in [bcp47] | inLanguage (Property) |
inDirection | Default base direction for the Web Publication as well as the textual infoset values | ltr, rtl, or auto | (None) |
If authors intend to use a manifest, or a manifest template, both as an embedded manifest and as a separate resource, they are strongly encouraged to set these properties explicitly to avoid interference from the containing script element when the manifest is embedded.
It is possible to set the language for any textual value in the manifest. This information MUST be set as a localizable string, i.e., using the @value and @language keywords (instead of a simple string) [json-ld]:
{
"@context" : ["https://schema.org","https://www.w3.org/ns/wp-context"],
"@type" : "Book",
…
"author" : {
"@type" : "Person",
"name" : {
"@value" : "Marcel Proust",
"@language" : "fr"
}
}
}
The value of the language tag MUST be set to a language code as defined in [bcp47].
When used in a context of localizable texts, a simple string value is a shorthand for a localizable string, with the @value set to the string value, and the language set to the value of the inLanguage property, if applicable, and unset otherwise. In other words, the previous example is equivalent to:
{
"@context" : ["https://schema.org","https://www.w3.org/ns/wp-context"],
"@type" : "Book",
"inLanguage" : "fr",
…
"author" : "Marcel Proust"
}
(See also 3.2.2.2 Text Values or Objects.)
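The shorthand rule above can be illustrated with a small helper. This is a sketch under the assumption that the manifest has already been parsed into Python dictionaries; the function name `to_localizable` is hypothetical and not defined by this specification.

```python
def to_localizable(value, in_language=None):
    """Expand a simple string into a localizable-string object.

    `in_language` stands for the manifest's inLanguage value, if any.
    """
    if isinstance(value, dict):
        return value  # already a localizable string; keep it unchanged
    expanded = {"@value": value}
    if in_language is not None:
        expanded["@language"] = in_language  # inherit the global language
    return expanded
```

When no global language is available the `@language` key is simply omitted, matching the "unset otherwise" wording above.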
It is not possible to set the direction explicitly for a value.
Setting the direction for a natural text value is currently not possible in JSON-LD [json-ld]. In case the JSON-LD community, as well as the schema.org community, introduces such a feature, future versions of this specification may extend the ability of Web Publication Manifests to include this.
The last modification date is the date when the Web Publication was last updated (i.e., whenever changes were last made to any of the resources of the Web Publication, including the manifest).
The last modification date does not necessarily reflect all changes to the Web Publication (e.g., third-party content could change without the author being aware). User agents SHOULD check the last modification date of individual resources to determine if they have changed and need updating.
Term | Description | Required Value | [schema.org] Mapping |
---|---|---|---|
dateModified | Last modification date of the publication. | A Date or DateTime value [schema.org], expressed in ISO 8601 Date or Date Time format, respectively [iso8601]. | dateModified (CreativeWork) |
{
"@context" : ["https://schema.org","https://www.w3.org/ns/wp-context"],
"@type" : "CreativeWork",
…
"@id" : "http://www.w3.org/TR/tabular-data-model/",
"url" : "http://www.w3.org/TR/2015/REC-tabular-data-model-20151217/",
"dateModified" : "2015-12-17",
…
}
The publication date is the date on which the Web Publication was originally published. It represents a static event in the lifecycle of a Web Publication and allows subsequent revisions to be identified and compared.
The exact moment of publication is intentionally left open to interpretation: it could be when the Web Publication is first made available online or could be a point in time before publication when the Web Publication is considered final.
Term | Description | Required Value | [schema.org] Mapping |
---|---|---|---|
datePublished | Creation date of the publication. | A Date or DateTime, expressed in ISO 8601 Date or Date Time format, respectively [iso8601]. | datePublished (CreativeWork) |
{
"@context" : ["https://schema.org","https://www.w3.org/ns/wp-context"],
"@type" : "CreativeWork",
…
"@id" : "http://www.w3.org/TR/tabular-data-model/",
"url" : "http://www.w3.org/TR/2015/REC-tabular-data-model-20151217/",
"datePublished" : "2015-12-17",
"dateModified" : "2016-01-30",
…
}
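As an illustration of how the two dates might be compared, the following sketch checks whether a publication was modified on a later day than it was published. It assumes the values use the extended ISO 8601 form beginning with YYYY-MM-DD and deliberately compares calendar dates only; the function names are hypothetical.

```python
from datetime import date


def calendar_date(value):
    """Parse the leading YYYY-MM-DD part of a Date or DateTime string."""
    return date.fromisoformat(value[:10])


def modified_since_publication(manifest):
    """True if dateModified falls on a later day than datePublished."""
    published = manifest.get("datePublished")
    modified = manifest.get("dateModified")
    if published is None or modified is None:
        return False  # nothing to compare
    return calendar_date(modified) > calendar_date(published)
```

Ignoring the time component avoids comparing values with and without time zone information, at the cost of not detecting same-day revisions.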
The reading progression establishes the reading direction from one resource to the next within a Web Publication.
The value of this property may be:
- ltr: left-to-right;
- rtl: right-to-left.

The default value is ltr.
This infoset item has no effect on the rendering of the individual primary resources; it is only relevant for the progression direction from one resource to the other.
The reading progression of a Web Publication is used to adapt such publication level interactions as menu position, swap direction, defining tap zones to lead the user to the next and previous pages, touch gestures, etc.
Term | Description | Required Value | [schema.org] Mapping |
---|---|---|---|
readingProgression | Reading direction from one resource to the other. | ltr or rtl | (None) |

If this value is not set, its default value is ltr.
{
"@context" : ["https://schema.org","https://www.w3.org/ns/wp-context"],
"@type" : "Book",
…
"url" : "https://publisher.example.org/mobydick",
"readingProgression" : "ltr"
}
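A user agent could apply the default and derive navigation gestures along these lines. This is only a sketch: the gesture names and the mapping of swipes to "next" and "previous" are assumptions made for illustration, not behavior required by this specification.

```python
def reading_progression(manifest):
    """Return the reading progression, falling back to the default 'ltr'."""
    return manifest.get("readingProgression", "ltr")


def swipe_targets(manifest):
    """Map swipe gestures to navigation actions (hypothetical UI convention)."""
    if reading_progression(manifest) == "ltr":
        return {"swipe_left": "next", "swipe_right": "previous"}
    return {"swipe_left": "previous", "swipe_right": "next"}
```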
The title provides the human-readable name of the Web Publication.
The title is specified by the manifest expression, when present. If not included in the manifest, the title is taken from the title element [html] of the Web Publication’s primary entry page (if present and not empty).
Relying on the title element could be semantically problematic if the Web Publication consists of several HTML resources (e.g., one per chapter of a book), because HTML defines this element as "metadata" for the enclosing HTML document, not for a collection of resources. Using this element is, on the other hand, preferred in the case of a publication consisting of a single HTML document (e.g., a scholarly journal article).
When specified in the infoset, the title MUST be non-empty.
If a user agent requires a title and one is not available in the infoset, it MAY create one (e.g., provide a language-specific placeholder title or use the URL of the manifest).
A user agent is not expected to produce a meaningful title [wcag20] for a Web Publication when one is not specified.
Term | Description | Required Value | [schema.org] Mapping |
---|---|---|---|
name | Human-readable title of the Web Publication. | One or more text items for the title. | name (Thing) |
{
"@context" : ["https://schema.org","https://www.w3.org/ns/wp-context"],
"@type" : "Book",
…
"url" : "https://publisher.example.org/mobydick",
"name" : "Moby Dick"
}
Web Publication resources are specified via the default reading order, the resource list, and the links, as defined in this section. These lists contain references to informative properties like the privacy policy, and structural properties like the table of contents.
Note that a particular resource's URL MUST NOT appear in more than one of these lists, and a URL MUST NOT be repeated within a list.
The manifest itself MUST NOT include a reference to itself, i.e., the reference to the manifest MUST NOT appear within these lists.
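The uniqueness constraint on URLs can be checked with a short routine such as the one below. It is a sketch that compares URL strings exactly as written in a parsed manifest (it does not resolve relative URLs first), and the helper names are hypothetical.

```python
from collections import Counter

LISTS = ("readingOrder", "resources", "links")


def entry_url(entry):
    """Return the URL of a list entry (a simple string or a link object)."""
    return entry if isinstance(entry, str) else entry.get("url")


def repeated_urls(manifest):
    """Return URLs that occur more than once within or across the lists."""
    counts = Counter(
        entry_url(entry)
        for name in LISTS
        for entry in manifest.get(name, [])
    )
    return sorted(url for url, n in counts.items() if url is not None and n > 1)
```

An empty result means no URL is repeated; a production check would also need to normalize URLs and confirm that the manifest's own URL is absent from the lists.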
The default reading order is a specific progression through a set of Web Publication resources.
A user might follow alternative pathways through the content, but in the absence of such interaction the default reading order defines the expected progression from one resource to the next.
The default reading order MUST include at least one resource.
The default reading order is specified directly in the manifest. However, if the reading order consists of only a single resource, namely the primary entry page of the Web Publication, the default reading order need not be specified.
If present in the Web Publication Manifest, this item MUST be mapped to the readingOrder term, defined specifically for Web Publications.
Term | Description | Required Value | [schema.org] Mapping |
---|---|---|---|
readingOrder | An array of: The order in the array is significant. The URLs MUST NOT include fragment identifiers. Non-HTML resources SHOULD be expressed as | (None) |
{
"@context" : ["https://schema.org","https://www.w3.org/ns/wp-context"],
"@type" : "Book",
…
"url" : "https://publisher.example.org/mobydick",
"name" : "Moby Dick",
"readingOrder" : [
"html/title.html",
"html/copyright.html",
"html/introduction.html",
"html/epigraph.html",
"html/c001.html",
…
]
}
{
"@context" : ["https://schema.org","https://www.w3.org/ns/wp-context"],
"@type" : "Book",
…
"url" : "https://publisher.example.org/mobydick",
"name" : "Moby Dick",
"readingOrder" : [{
"@type" : "PublicationLink",
"url" : "html/title.html",
"encodingFormat" : "text/html",
"name" : "Title page"
},{
"@type" : "PublicationLink",
"url" : "html/copyright.html",
"encodingFormat" : "text/html",
"name" : "Copyright page"
},{
…
}]
}
The resource list enumerates any additional resources used in the processing and rendering of a Web Publication that are not already listed in the default reading order.
The union of the resource list and default reading order represents the definitive list of resources that belong to the Web Publication. All other resources are external to the Web Publication.
The completeness of the resource list will affect the usability of the Web Publication in certain reading scenarios (e.g., the ability to read the Web Publication offline). For this reason, it is strongly RECOMMENDED to provide a comprehensive list of all of the Web Publication's constituent resources beyond those listed in the default reading order.
In some cases, a comprehensive list of these resources might not be easily achieved (e.g., third-party scripts that reference resources from deep within their source), but a user agent SHOULD still be able to render a Web Publication even if some of these resources are not identified as belonging to the Web Publication (e.g., when it is taken offline without them).
If a user agent encounters a resource that it cannot locate in the resource list, it MUST treat the resource as external to the Web Publication (e.g., it might alert the user before loading, open the resource in a new window, or unload the current Web Publication and resume normal Web browsing).
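The membership test implied above, where a resource belongs to the Web Publication only if it appears in the default reading order or the resource list, might look like the following sketch. It assumes a parsed manifest and a known base URL; the function names are hypothetical.

```python
from urllib.parse import urldefrag, urljoin


def publication_urls(manifest, base):
    """Absolute URLs of the default reading order plus the resource list."""
    urls = set()
    for name in ("readingOrder", "resources"):
        for entry in manifest.get(name, []):
            url = entry if isinstance(entry, str) else entry["url"]
            urls.add(urljoin(base, url))
    return urls


def is_external(url, manifest, base):
    """True if url is not one of the Web Publication's own resources."""
    # Fragments are dropped because listed URLs carry no fragment identifiers.
    absolute, _fragment = urldefrag(urljoin(base, url))
    return absolute not in publication_urls(manifest, base)
```

What the user agent does with an external resource (alerting the user, opening a new window, and so on) is left open by the text above and is not modeled here.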
This was not decided at the Toronto F2F, and is still open.
We talk about the bounds of the publication but we never explicitly define what it means, where it comes from and what a UA is to do with it (and in what specific use cases).
If present in the Web Publication Manifest, this item MUST be mapped to the resources term, defined specifically for Web Publications.
Term | Description | Required Value | [schema.org] Mapping |
---|---|---|---|
resources | An array of: The order in the array is not significant. The URLs MUST NOT include fragment identifiers. It is RECOMMENDED to use | (None) |
{
"@context" : ["https://schema.org","https://www.w3.org/ns/wp-context"],
"@type" : "CreativeWork",
…
"@id" : "http://www.w3.org/TR/tabular-data-model/",
"url" : "http://www.w3.org/TR/2015/REC-tabular-data-model-20151217/",
…
"resources" : [
"datatypes.html",
"datatypes.svg",
"datatypes.png",
"diff.html",
{
"@type" : "PublicationLink",
"url" : "test-utf8.csv",
"encodingFormat" : "text/csv"
},{
"@type" : "PublicationLink",
"url" : "test-utf8-bom.csv",
"encodingFormat" : "text/csv"
},{
…
}
],
…
}
The links property provides a list of resources that are not required for the processing and rendering of a Web Publication (i.e., the content of the Web Publication remains unaffected even if these resources are not available). Linked resources are typically made available to user agents to augment or enhance the processing or rendering of it, such as:
The links property can also be used to identify resources that are used in the online rendering of a Web Publication, but that are not essential to include with it when it is taken offline or packaged (e.g., to minimize the size). These include:
The links list SHOULD include resources necessary to render a linked resource (e.g., scripts, images, style sheets).
Resources listed in the links list MUST NOT be listed in the default reading order or resource list.
User agents MAY ignore linked resources, and are not required to take them offline with a Web Publication. These resources SHOULD NOT be included when packaging a Web Publication.
Term | Description | Required Value | [schema.org] Mapping |
---|---|---|---|
links | An array of: The order in the array is not significant. It is RECOMMENDED to use | (None) |
An accessibility report provides information about the suitability of a Web Publication for consumption by users with varying preferred reading modalities. These reports typically identify the result of an evaluation against established accessibility criteria, such as those provided in [WCAG21], and are an important source of information in determining the usability of a Web Publication.
The infoset SHOULD include a link to an accessibility report when one is available for a Web Publication. It is RECOMMENDED that the report be included as a resource of the Web Publication.
It is also RECOMMENDED that the accessibility report be provided in a human-readable format, such as HTML [html]. Augmenting these reports with machine-processable metadata, such as provided in Schema.org [schema.org], is also RECOMMENDED.
If present in the manifest, the accessibility report MUST be expressed as a PublicationLink. The rel value of the PublicationLink MUST include the https://www.w3.org/ns/wp#accessibility-report identifier.
The Working Group will attempt to have the accessibility-report term registered with IANA, to avoid using a URL.
{
"@context" : ["https://schema.org","https://www.w3.org/ns/wp-context"],
"@type" : "Book",
…
"url" : "https://publisher.example.org/mobydick",
"name" : "Moby Dick",
"links" : [{
"@type" : "PublicationLink",
"url" : "https://www.publisher.example.org/mobydick-accessibility.html",
"rel" : "https://www.w3.org/ns/wp#accessibility-report"
},{
…
}],
…
}
Users often have the legal right to know and control what information is collected about them, how such information is stored and for how long, whether it is personally identifiable, and how it can be expunged. Including a statement that addresses such privacy concerns is consequently an important part of publishing Web Publications. Even if no information is collected, such a declaration increases the trust users have in the content.
A link to a privacy policy can be included in the infoset. It is RECOMMENDED that the privacy policy be included as a resource of the Web Publication.
It is RECOMMENDED that the privacy policy be provided in a human-readable format, such as HTML [html].
Refer to 10. Privacy for more information about privacy considerations in Web Publications.
If present in the manifest, the privacy policy MUST be expressed as a PublicationLink. The rel value of the PublicationLink MUST include the privacy-policy identifier [iana-link-relations].
{
"@context" : ["https://schema.org","https://www.w3.org/ns/wp-context"],
"@type" : "CreativeWork",
…
"@id" : "http://www.w3.org/TR/tabular-data-model/",
"url" : "http://www.w3.org/TR/2015/REC-tabular-data-model-20151217/",
…
"links" : [{
"@type" : "PublicationLink",
"url" : "https://www.w3.org/Consortium/Legal/privacy-statement-20140324",
"encodingFormat" : "text/html",
"rel" : "privacy-policy"
},{
…
}],
…
}
The cover is a resource that user agents can use to present the Web Publication (e.g., in a library or bookshelf, or when initially loading the Web Publication).
The working group has not reached consensus on whether the cover should be any resource or should be limited to images.
The infoset SHOULD include a reference to a cover.
More than one cover MAY be referenced from the infoset (e.g., to provide alternative formats and sizes for different device screens). If multiple covers are specified, each instance MUST define at least one unique property to allow user agents to determine its usability (e.g., a different format, height, width or relationship).
If present in the manifest, the cover MUST be expressed as a PublicationLink. The URL expressed in the url term MUST NOT include a fragment identifier.
The rel value of the PublicationLink MUST include the https://www.w3.org/ns/wp#cover identifier.
If the cover is in an image format, a title and description SHOULD be provided. User agents can use these properties to provide alternative text and descriptions when necessary for accessibility.
The Working Group will attempt to have the cover term registered with IANA, to avoid using a URL.
{
"@context" : ["https://schema.org","https://www.w3.org/ns/wp-context"],
"@type" : "Book",
…
"url" : "https://publisher.example.org/donquixote",
"name" : "Don Quixote",
"resources" : [{
"@type" : "PublicationLink",
"url" : "cover.html",
"encodingFormat" : "text/html",
"rel" : "https://www.w3.org/ns/wp#cover"
},{
…
}],
…
}
{
"@context" : ["https://schema.org","https://www.w3.org/ns/wp-context"],
"@type" : "Book",
…
"url" : "https://publisher.example.org/mobydick",
"name" : "Moby Dick",
"resources" : [{
"@type" : "PublicationLink",
"url" : "whale-image.jpg",
"encodingFormat" : "image/jpeg",
"rel" : "https://www.w3.org/ns/wp#cover",
"title" : "Moby Dick attacking hunters",
"description" : "A white whale is seen surfacing from the water to attack a small whaling boat"
},{
…
}],
…
}
{
"@context" : ["https://schema.org","https://www.w3.org/ns/wp-context"],
"@type" : "Book",
…
"url" : "https://publisher.example.org/gulliverstravels",
"name" : "Gulliver's Travels",
"resources" : [{
"@type" : "PublicationLink",
"url" : "lilliput.jpg",
"encodingFormat" : "image/jpeg",
"rel" : "https://www.w3.org/ns/wp#cover"
},{
"@type" : "PublicationLink",
"url" : "lilliput.svg",
"encodingFormat" : "image/svg+xml",
"rel" : "https://www.w3.org/ns/wp#cover"
},{
…
}],
…
}
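When several covers are referenced, a user agent has to choose one. The sketch below selects a cover by preferred media type, as one possible strategy; the preference order, the helper names, and the decision to search all three lists are assumptions made for this illustration.

```python
COVER_REL = "https://www.w3.org/ns/wp#cover"


def as_list(value):
    """Treat a single value as a one-element list."""
    return value if isinstance(value, list) else [value]


def covers(manifest):
    """Collect every link object whose rel includes the cover identifier."""
    found = []
    for name in ("readingOrder", "resources", "links"):
        for entry in manifest.get(name, []):
            if isinstance(entry, dict) and COVER_REL in as_list(entry.get("rel", [])):
                found.append(entry)
    return found


def pick_cover(manifest, preferred_formats):
    """Return the first cover matching the preferred formats, in order."""
    candidates = covers(manifest)
    for fmt in preferred_formats:
        for cover in candidates:
            if cover.get("encodingFormat") == fmt:
                return cover
    # No preferred format available: fall back to the first cover, if any.
    return candidates[0] if candidates else None
```

Other distinguishing properties mentioned above, such as height or width, could be handled in the same way.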
The table of contents property identifies the resource that contains the Web Publication's table of contents.
User agents MUST compute the table-of-contents as follows:
- If a PublicationLink has a rel value including contents [iana-link-relations], the corresponding url value identifies the table of contents resource.
- If the table of contents resource contains an element with the role [html] value doc-toc [dpub-aria-1.0], the user agent MUST use that element as the table of contents.

If identifying the table of contents is ambiguous (e.g., several table of contents resources are identified, or several elements with a role value doc-toc are found within the table of contents resource), the user agent MAY choose among them. This specification does not mandate how this choice is made. The user agent might:
If this process does not result in a link to the table of contents, the Web Publication does not have a table of contents and this property MUST NOT be included in the infoset.
Depending on the resolution to this issue, the infoset might contain a separate entry for a machine-processable table of contents, restrictions could be placed on the HTML structure of the referenced table of contents, or parsing rules for extracting a table of contents could be added.
If present in the manifest, the table of contents MUST be expressed as a PublicationLink.
The URL expressed in the url term MUST NOT include a fragment identifier.
The rel value of the PublicationLink MUST include the contents identifier [iana-link-relations].
The link to the table of contents MAY be specified in either the default reading order or the resource list, but MUST NOT be specified in both.
{
"@context" : ["https://schema.org","https://www.w3.org/ns/wp-context"],
"@type" : "Book",
…
"url" : "https://publisher.example.org/mobydick",
"name" : "Moby Dick",
"resources" : [{
"@type" : "PublicationLink",
"url" : "toc_file.html",
"rel" : "contents"
},{
…
}],
…
}
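The two lookups involved, finding the link with the contents relation and then finding an element with the doc-toc role, can be sketched as follows. This is an illustration under assumptions: the names `toc_link` and `TocElementFinder` are hypothetical, the first match is used when several exist (the specification leaves that choice open), and Python's standard HTML parser stands in for a real DOM.

```python
from html.parser import HTMLParser


def toc_link(manifest):
    """Return the first link object whose rel includes 'contents', if any."""
    for name in ("readingOrder", "resources"):
        for entry in manifest.get(name, []):
            if not isinstance(entry, dict):
                continue
            rel = entry.get("rel", [])
            rel = rel if isinstance(rel, list) else [rel]
            if "contents" in rel:
                return entry
    return None


class TocElementFinder(HTMLParser):
    """Record the tag name of every element carrying role="doc-toc"."""

    def __init__(self):
        super().__init__()
        self.matches = []

    def handle_starttag(self, tag, attrs):
        role = dict(attrs).get("role") or ""
        if "doc-toc" in role.split():
            self.matches.append(tag)
```

A caller would fetch the resource identified by `toc_link(...)["url"]`, feed its markup to a `TocElementFinder`, and treat an empty `matches` list as "no table of contents element found".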
<head>
…
<script type="application/ld+json">
{
"@context" : ["https://schema.org","https://www.w3.org/ns/wp-context"],
"@type" : "CreativeWork",
…
"@id" : "http://www.w3.org/TR/tabular-data-model/",
"url" : "http://www.w3.org/TR/2015/REC-tabular-data-model-20151217/",
…
}
</script>
…
</head>
<body>
…
<section role="doc-toc">
…
</section>
…
</body>
The infoset is designed to provide a basic set of properties for use by user agents in presenting and rendering a Web Publication, but MAY be extended in the following ways:
Although both methods are valid, the use of linked records to extend the infoset is RECOMMENDED.
This specification does not define how such additional properties are compiled, stored or exposed by user agents in their internal representation of the infoset. A user agent MAY ignore some or all extended properties.
Extending the manifest through links to a record, such as an ONIX [onix] or BibTeX [bibtex] file, MUST be expressed using a PublicationLink object, where:
- the rel value of the PublicationLink SHOULD include a relevant identifier defined by IANA or by other organizations; if the link record contains descriptive metadata it MUST include the describedby (IANA) identifier;
- the encodingFormat in the link MUST use the MIME media type [rfc2046] defined for that particular type of record, if applicable.

Linked records MUST be included in the resource list when they are part of the Web Publication (i.e., are needed for more than just infoset extensibility). Otherwise, they MUST be included in the links list.
{
"@context" : ["https://schema.org","https://www.w3.org/ns/wp-context"],
"@type" : "Book",
…
"url" : "https://publisher.example.org/mobydick",
"name" : "Moby Dick",
"links" : [{
"@type" : "PublicationLink",
"url" : "https://www.publisher.example.org/mobydick-onix.xml",
"encodingFormat" : "application/onix+xml",
"rel" : "describedby"
},{
…
}],
…
}
The application/onix+xml MIME type has not yet been registered by IANA at the time of writing this document, and is included in the example for illustrative purposes only.
Additional properties can be included directly in the manifest. It is RECOMMENDED that these properties be taken from public schemes like [schema.org] or [dcterms] and use values from controlled vocabularies whenever possible. Proprietary terms MAY be used, but it is RECOMMENDED that such terms be included using Compact IRIs [json-ld], with prefixes defined as part of the context.
{
"@context" : ["https://schema.org","https://www.w3.org/ns/wp-context"],
"@type" : "CreativeWork",
…
"@id" : "http://www.w3.org/TR/tabular-data-model/",
"url" : "http://www.w3.org/TR/2015/REC-tabular-data-model-20151217/",
"copyrightYear" : "2015",
"copyrightHolder" : "World Wide Web Consortium",
…
}
{
"@context" : ["https://schema.org","https://www.w3.org/ns/wp-context"],
"@type" : "CreativeWork",
…
"@id" : "http://www.w3.org/TR/tabular-data-model/",
"url" : "http://www.w3.org/TR/2015/REC-tabular-data-model-20151217/",
"dc:subject" : ["Document Structures","Resource Description Framework (RDF)"],
…
}
A prefix definition dc for [dcterms] is included in the context file of [schema.org]. This means that it is not necessary to add the prefix explicitly. The same is true for a number of other public vocabularies; see the schema.org context file for further details.
A Canonical Web Publication Manifest (or Canonical Manifest) is a version of the Web Publication Manifest where all possible ambiguities on property values (see, e.g., 3.2.2.1 Arrays and Single Values or 3.2.2.2 Text Values or Objects) have been removed, and all values that are possibly harnessed from the primary entry page are incorporated.
To help in understanding the result of the algorithm, there is a link to the corresponding canonical manifests for all the examples in B. Manifest Examples.
The steps to convert a Web Publication Manifest into a Canonical Manifest are given by the following algorithm. The algorithm takes the following arguments:
- manifest: string, that represents the manifest in JSON
- base: URL string, that represents the base URL for the manifest, and has the value of: script element in the primary entry page, in case the manifest is embedded; or
- document: HTML Document (DOM) Node [html], representing the primary entry page
The steps of the algorithm are described below. As an abuse of notation, P["term"] refers to the value in the object P for the label "term", where P is either manifest, or an object appearing within manifest (e.g., a Person).
1. Let lang be the lang value [html] for the script element in the primary entry page, in case the manifest is embedded; or undefined otherwise.
2. Let dir be the dir value [html] for the script element in the primary entry page, in case the manifest is embedded; or undefined otherwise.
3. If manifest["name"] is undefined and the manifest is embedded then locate the title HTML element [html] using document. If that element exists and is not empty, let t be its text content and:
   - if the language of title is explicitly set to the value of l, then add "name": {"@value": t, "@language": l} to manifest;
   - otherwise add "name": t to manifest.
4. If manifest["inLanguage"] is undefined and the value of lang is not undefined, add "inLanguage": lang to manifest.
5. If manifest["inDirection"] is undefined and the value of dir is not undefined, add "inDirection": dir to manifest.
6. If manifest["readingOrder"] is undefined, add "readingOrder": [{"@type": "PublicationLink", "url": document.URL}] to manifest.
7. If P["term"], where P is any object in manifest (including itself) and term is @type, accessibilitySummary, name, or rel, has a single value v, then change the relevant term/value to "term" : [v].
8. If an element v in the manifest["term"] array, where term is one of the creator terms, is a simple string or localizable string, replace that element of the array with {"@type": ["Person"], "name": [v]}.
9. If an element v in the manifest["term"] array, where term is one of the resource categorization properties, is a simple string, replace that element of the array with {"@type": ["PublicationLink"], "url": v}.
10. If P["term"], where P is any object in manifest (including itself) and term is accessibilitySummary, name, or description, is a simple string v, then change the relevant term/value to:
    - if manifest[inLanguage] is set to the value of l, then "term": {"@value": v, "@language": l};
    - otherwise "term": {"@value": v}.
11. If P["term"], where P is any object in manifest (including itself) and term is url or @id, has a value u which is not an absolute URL string [url], then resolve this value (considered to be a relative URL) using the value of base, yielding the value of au, and replace the term/value pair by "term": au.
See the diagrams in the appendix for a visual representation of the algorithm.
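The canonicalization steps can be approximated in code. The following is a non-normative sketch under several stated assumptions: the DOM document argument is replaced by pre-extracted `doc_title` and `doc_url` values (to be passed only when the manifest is embedded), the language of the title element is not inspected, and `CREATOR_TERMS` and `RESOURCE_TERMS` are assumed, possibly incomplete, lists standing in for the creator terms and resource categorization properties. All function names are hypothetical.

```python
import json
from urllib.parse import urljoin

ARRAY_TERMS = ("@type", "accessibilitySummary", "name", "rel")
TEXT_TERMS = ("accessibilitySummary", "name", "description")
URL_TERMS = ("url", "@id")
CREATOR_TERMS = ("author", "editor", "publisher")       # assumed subset
RESOURCE_TERMS = ("readingOrder", "resources", "links")  # assumed list


def as_list(value):
    return value if isinstance(value, list) else [value]


def person(value):
    """Wrap a simple or localizable string in a Person object."""
    if isinstance(value, dict) and "@value" not in value:
        return value  # already an object such as a Person
    return {"@type": ["Person"], "name": [value]}


def link(value):
    """Wrap a simple string in a PublicationLink object."""
    if isinstance(value, str):
        return {"@type": ["PublicationLink"], "url": value}
    return value


def localize(value, language):
    """Turn simple strings into localizable strings."""
    if isinstance(value, list):
        return [localize(item, language) for item in value]
    if isinstance(value, str):
        text = {"@value": value}
        if language is not None:
            text["@language"] = language
        return text
    return value


def normalize(node, base, language):
    """Recursively apply the array, text, and URL rules to every object."""
    if isinstance(node, list):
        return [normalize(item, base, language) for item in node]
    if not isinstance(node, dict) or "@value" in node:
        return node
    out = {}
    for term, value in node.items():
        if term in ARRAY_TERMS:
            value = as_list(value)
        if term in TEXT_TERMS:
            value = localize(value, language)
        elif term in URL_TERMS and isinstance(value, str):
            value = urljoin(base, value)  # absolute URLs are left unchanged
        else:
            value = normalize(value, base, language)
        out[term] = value
    return out


def canonicalize(text, base, doc_title=None, doc_url=None, lang=None, direction=None):
    manifest = json.loads(text)
    if "name" not in manifest and doc_title:
        manifest["name"] = doc_title
    if "inLanguage" not in manifest and lang is not None:
        manifest["inLanguage"] = lang
    if "inDirection" not in manifest and direction is not None:
        manifest["inDirection"] = direction
    if "readingOrder" not in manifest and doc_url is not None:
        manifest["readingOrder"] = [{"@type": "PublicationLink", "url": doc_url}]
    for term in CREATOR_TERMS:
        if term in manifest:
            manifest[term] = [person(v) for v in as_list(manifest[term])]
    for term in RESOURCE_TERMS:
        if term in manifest:
            manifest[term] = [link(v) for v in as_list(manifest[term])]
    return normalize(manifest, base, manifest.get("inLanguage"))
```

The sketch applies the array rule before the text rule, so a title given as a simple string ends up as a one-element array containing a localizable string.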
The steps for obtaining a manifest, starting from the primary entry page, are given by the following algorithm. The algorithm, if successful, returns a processed manifest; otherwise, it terminates prematurely and returns nothing. In the case of nothing being returned, the user agent MUST ignore the manifest declaration.
- Document of the top-level browsing context of the primary entry page, let origin be the Document's origin, and manifest link be the first link element in tree order in Document whose rel attribute contains the publication token.
- null, terminate this algorithm.
- href attribute's value is the empty string, terminate this algorithm.
- href attribute's value is a relative URL, i.e., it points to origin and it has a non-null fragment identifying an identifier id in Document:
  - script element in tree order, whose id attribute is equal to id and whose type attribute is equal to application/ld+json.
  - null, terminate this algorithm.
- href attribute, relative to the element's base URL. If parsing fails, then abort these steps.
- Document.
- crossOrigin attribute's value is 'use-credentials', then set request's credentials to 'include'.
- Object, terminate this algorithm.
- Document as input to the algorithm described in 5. Canonical Manifest.
The algorithm does not describe how error and warning messages should be reported. This is implementation dependent.
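The first part of this procedure, locating the manifest from the primary entry page, can be sketched as below. This is a simplified illustration and not the algorithm itself: it treats any href starting with "#" as a reference to an embedded manifest, performs no fetching or credential handling, and uses Python's standard HTML parser in place of a DOM. The names `ManifestLocator` and `locate_manifest` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ManifestLocator(HTMLParser):
    """Find the first rel="publication" link and any JSON-LD scripts."""

    def __init__(self):
        super().__init__()
        self.href = None      # href of the first matching link element
        self.scripts = {}     # id -> text of application/ld+json scripts
        self._current = None  # id of the script currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").split()
        if tag == "link" and "publication" in rel and self.href is None:
            self.href = attrs.get("href") or ""
        elif tag == "script" and attrs.get("type") == "application/ld+json":
            script_id = attrs.get("id")
            # Only the first script with a given id is recorded.
            if script_id and script_id not in self.scripts:
                self._current = script_id
                self.scripts[script_id] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.scripts[self._current] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._current = None


def locate_manifest(html, document_url):
    """Return ('embedded', text), ('linked', absolute_url), or None."""
    locator = ManifestLocator()
    locator.feed(html)
    locator.close()
    if not locator.href:  # no manifest link, or an empty href
        return None
    if locator.href.startswith("#"):
        text = locator.scripts.get(locator.href[1:])
        return ("embedded", text) if text is not None else None
    return ("linked", urljoin(document_url, locator.href))
```

For a linked manifest the caller would still have to fetch and parse the returned URL, and in both cases confirm that the parsed JSON is an object before canonicalizing it.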
The steps for processing a manifest are given by the following algorithm. The algorithm takes a text string as an argument, which represents a canonical manifest. The output of running this algorithm on a JSON document is a processed manifest. The goal of the algorithm is to ensure that the data represented in text abides by the minimal requirements on the data, removing, if applicable, non-conformant data.
- WebPublicationManifest dictionary.
- accessModeSufficient and accessibilitySummary, check whether all tokens listed in manifest[term] are allowed (see the list of expected values for each of those terms). If the check fails, remove the token from the manifest[term] array and issue a warning.
This section contains placeholders for possible reading enhancements/features the user agent may/should/must provide. The list is subject to addition, modification and removal as the enhancements get discussed in more detail.
Before starting a discussion on the individual affordances' issues, the WG should have a consensus on what exactly is to be defined for each of those.
When a user agent obtains a manifest it SHOULD provide the option to switch the display to publication mode.
This feature has the following requirements:
Publication mode is a display mode implemented by the user agent that follows the conventions listed in presentation and navigation.
The layout and rendering of Web Publications is governed by the same rules that apply to all Web content: HTML documents are styled and laid out according to the rules of CSS, SVG documents are rendered as defined by that format, etc. This specification requires no particular profile or subset of CSS, HTML, or SVG to be supported, other than the expectations set for these technologies by their respective specifications.
This specification intentionally avoids introducing any new layout features. Any shortcoming of the Web platform in terms of layout needs to be addressed for the whole Web platform, which means via CSS.
This working group will work with other relevant groups of the W3C to address platform-wide limitations that negatively impact Web Publications.
For the purposes of layout, each resource of a Web Publication is treated as a separate document. User agents MUST NOT mix content from multiple resources in the same rendering (e.g., CSS floats or absolutely positioned elements from one resource cannot intrude or overlap with content from another resource).
Despite this general requirement that each resource should be treated as a separate document for the purpose of layout, there are some places where CSS specifications should be amended to be able to deal more intelligently with collections of resources like Web Publications.
One instance is the definition of cross-references, which are currently restricted to work only within a single document. This restriction should be relaxed to allow for cross-references between separate resources of a single Web Publication.
Another related change would be to allow counters to accumulate across multiple resources of a single Web Publication (e.g., so that figures in multiple sections may be numbered in a single sequence).
When a user agent renders a Web Publication, it SHOULD provide user settings to customize the experience.
User settings MAY include:
This specification does not cover how user agents override author styles to offer user settings.
To provide user settings in their reader mode, browsers usually get rid of most of the author styles. There is always a tension in reading environments between author styles and the user's preference, which is very hard to balance.
2.1.11 Personalization
The user must have the possibility to personalize his or her reading experience.
Picking up on #52
This section is non-normative.
Publications have historically been presented via paged media, whereas Web pages almost always scroll. As the preferences of individual readers vary, and as different types of publications are better suited for one or the other, this specification encourages user agents to support both, and to offer a choice to their users.
It might be useful for authors to be able to specify a preference between scrolling and pagination, even if a strict requirement is not possible. This should most likely be addressed through an extension of @viewport or of the viewport meta tag (see [css-device-adapt]), or possibly through an extension of @page (see [css-page-3]). This should be discussed with the relevant working groups (CSSWG, WebPlatformWG, WHATWG).
2.1.10 Pagination
It should be possible to see the Web Publication in a “paginated” view.
picking up on #52
See also https://w3c.github.io/wpub/#feature-presentation
When a user agent renders a Web Publication in a paginated layout, it MUST lay out each document in the default reading order sequentially, with the last page of a resource being followed by the first page of the subsequent one.
To avoid blank pages, if a resource ends on a left page (resp. right page), the subsequent one should start on a right page (resp. left page) even if the page progression (see [css-page-3]) would otherwise lead to it starting on the opposite page. It should also be possible to use the break-before property (see [css-break-3]) to force the content to resume on the opposite side if that was desired by the author.
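The alternation rule above can be sketched as follows. This is only an illustration, not part of the specification: the names `place_resources` and `PlacedResource` are hypothetical, the assumption that the publication opens on a right page is ours, and forced breaks via break-before are not modelled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PlacedResource:
    url: str
    first_side: str  # "left" or "right"
    last_side: str   # side on which the resource's last page falls


def opposite(side: str) -> str:
    return "right" if side == "left" else "left"


def place_resources(page_counts: List[Tuple[str, int]],
                    first_side: str = "right") -> List[PlacedResource]:
    """Assign page sides to resources laid out in default reading order.

    Each resource starts on the side opposite to where the previous one
    ended, so no blank page is ever inserted between two resources.
    """
    placed = []
    side = first_side  # assumption: the publication opens on this side
    for url, pages in page_counts:
        if pages < 1:
            raise ValueError("a resource occupies at least one page")
        # An odd page count ends on the starting side, an even one on the other.
        last = side if pages % 2 == 1 else opposite(side)
        placed.append(PlacedResource(url, side, last))
        side = opposite(last)
    return placed
```

For instance, a three-page resource starting on a right page ends on a right page, so the next resource begins on the left page that follows it.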
[css-page-3] needs to be amended to describe this exception to the general behavior when dealing with collections of documents instead of individual documents.
How is pagination supposed to work when subsequent resources have opposite page progression directions (see [css-page-3]), for example, due to a different writing mode? This is not necessarily a problem from a layout point of view, as each page is independent, but it is from a UI point of view. If swiping left means next page until the end of one chapter, and starts meaning previous page in the next chapter because the language has switched from English to Hebrew, this is going to be confusing.
[css-page-3] needs to be amended so that page counters are not automatically reset at the beginning of each new resource belonging to the same Web Publication.
A WP can be read in a browser offline with no change in fidelity from the online experience
Detail on inter-publication search across multiple resources will be included in a future draft.
User agents should provide an affordance that saves the reading progression in the publication and returns the user to that location the next time they open the publication.
The user must be able to leave the Web Publication and return to it at the last position they left from. The User Agent must retain the reading position, based on the last known position of the reader in the web publication. The position should be based on the reader's position in the file, within the reading order.
The user agent may retain reading state if the web publication is revised.
The navigation of the web publication should be defined in the Default Reading Order required by the Information Set.
User Agents should not have to set the reading state in the following type of resources:
Reading state should only apply to content documents listed as being within the bounds of the Web Publication.
Example 1:
Sarah is reading a long article on her way to work. She arrives before she has finished, but wants to continue from the place she left off. The user agent should remember her reading state for the next time she opens the publication.
If a tester opens a web publication in a WP-aware UA, moves ahead in the publication, closes the reader, then reopens it, they should be returned to the last known reading state.
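One way a user agent might retain and restore reading state is sketched below. It is a minimal illustration under stated assumptions: the function names, the `storage` dictionary standing in for persistent storage, and the representation of a position as a fractional offset within a resource are all hypothetical, and the reading order is used as a simplified stand-in for the bounds of the Web Publication.

```python
from typing import Dict, List, Optional


def save_state(storage: Dict[str, dict], publication_url: str,
               reading_order: List[str], resource_url: str,
               offset: float) -> bool:
    """Record the reader's position; return False if it is not recorded.

    State is kept only for resources listed in the reading order
    (a simplification of "within the bounds of the Web Publication").
    """
    if resource_url not in reading_order:
        return False
    # Hypothetical position model: fraction of the resource already read.
    clamped = min(max(offset, 0.0), 1.0)
    storage[publication_url] = {"resource": resource_url, "offset": clamped}
    return True


def restore_state(storage: Dict[str, dict],
                  publication_url: str) -> Optional[dict]:
    """Return the last saved position, or None if nothing was saved."""
    return storage.get(publication_url)
```

In the scenario of Example 1, the user agent would call `save_state` when Sarah closes the article and `restore_state` when she reopens it.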
This section is non-normative.
The document referred to from this section, i.e., Web Annotation Extensions for Web Publications [wpub-ann], has recently been renamed. Its previous name was "Locators for Web Publication". The terminology used in this section has to be realigned with the name change.
Locators are used to identify, locate, retrieve, and/or reference locations and content fragments within Web Publications (e.g., for address(es), bookmarks, and annotations). Locators traditionally take the form of fragment identifiers [rfc3986], where the portion of a URL that follows a number sign character (#) identifies a specific position within the referenced resource.
For some use cases, it is essential to identify and reference a Web Publication resource—or a location in or a segment of a resource—in the scope or context of the Web Publication to which it belongs. A traditional fragment identifier cannot satisfy this requirement, since only the URL of the constituent resource containing the location or content fragment of interest is expressed. The Web Annotation Extensions for Web Publications [wpub-ann] document, based on the Web Annotation Model [annotation-model], addresses this issue by providing the means to express both the URL of the resource and the URL of the Web Publication.
Web Publication Locators also address the problem of referencing into a resource that was not authored with such a need in mind. A fragment identifier can only reference elements with explicit identifiers and locations with explicit anchor points. Web Publication Locators include a variety of selectors that work with the general structures and content of a resource (e.g., text selectors, CSS selectors).
As Web Publication Locators currently rely on a JSON-based expression syntax, it is not yet clear how much of this syntax can be translated to a fragment identifier. This may limit the usefulness beyond expressions that are also JSON-based (e.g., outside of annotations or bookmarks).
Illustrate with example of an easy to understand Web Publication Locator, such as might be used in annotating a simple Web Publication.
The semantics of Web Publication Locators are a mapping and extension of the Web Annotation Data Model [annotation-model] and Vocabulary [annotation-vocab] for describing and referencing a segment of a Web resource. As a result, Web Publication Locators provide the expressiveness needed for a broad range of annotation and bookmarking use cases. Additionally, Web Publication Locators provide a way to identify and reference a location within a Web Publication (i.e., as distinct from identifying and referencing a content fragment consisting of a span of characters or bytes). A Web Publication Locator can be used to identify, retrieve and/or reference a fragment of a Web Publication that spans multiple resources.
In composing a Web Publication Locator, use the canonical identifier of the Web Publication in preference to any alternative addresses. Such use facilitates the collation of Web Publication Locators associated with a particular Web Publication. URLs of Web Publication resources appearing in a Web Publication Locator should match the URL of the resource provided in the infoset.
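The idea of expressing both the resource URL and the Web Publication URL can be illustrated with terms from the Web Annotation Data Model [annotation-model]. The sketch below is an assumption-laden illustration rather than the syntax defined in [wpub-ann]: the helper `make_locator` and the resource URL are hypothetical, and the use of `scope` to carry the publication identifier is our guess at how the two URLs might be combined.

```python
import json


def make_locator(publication_url: str, resource_url: str, exact: str,
                 prefix: str = "", suffix: str = "") -> dict:
    """Build a JSON-serializable structure pointing at quoted text.

    "SpecificResource", "source", "selector", "scope" and
    "TextQuoteSelector" are Web Annotation Data Model terms; how
    [wpub-ann] actually uses them is not confirmed here.
    """
    selector = {"type": "TextQuoteSelector", "exact": exact}
    if prefix:
        selector["prefix"] = prefix
    if suffix:
        selector["suffix"] = suffix
    return {
        "type": "SpecificResource",
        "source": resource_url,    # the constituent resource
        "selector": selector,      # works without explicit anchors
        "scope": publication_url,  # assumption: the publication as context
    }


locator = make_locator(
    "https://publisher.example.org/mobydick",
    "https://publisher.example.org/mobydick/html/c001.html",  # hypothetical
    "Call me Ishmael",
)
print(json.dumps(locator, indent=2))
```

A text selector of this kind does not require the target resource to carry an explicit identifier at the referenced location.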
This section is non-normative.
Although a Web Publication manifest is authored as [json-ld], user agents process this information into an internal data structure representing the infoset in order to utilize the properties. The exact manner in which this processing occurs, and how the data is used internally, is user agent-dependent. To ensure interoperability when exposing the infoset items, however, this appendix defines a common, abstract representation of the data structures using the standard formalism of the Web Interface Definition Language [webidl-1] which can express the expected names, datatypes, and possible restrictions for each member of the infoset. (A WebIDL representation can be mapped onto ECMAScript, C, or other programming languages.)
WebPublicationManifest dictionary
dictionary WebPublicationManifest {
    required DOMString url;
    required sequence<DOMString> type;
    sequence<DOMString> accessMode;
    sequence<DOMString> accessModeSufficient;
    sequence<DOMString> accessibilityAPI;
    sequence<DOMString> accessibilityControl;
    sequence<DOMString> accessibilityFeature;
    sequence<DOMString> accessibilityHazard;
    LocalizableString accessibilitySummary;
    DOMString id;
    sequence<Contributor> artist;
    sequence<Contributor> author;
    sequence<Contributor> colorist;
    sequence<Contributor> contributor;
    sequence<Contributor> creator;
    sequence<Contributor> editor;
    sequence<Contributor> illustrator;
    sequence<Contributor> inker;
    sequence<Contributor> letterer;
    sequence<Contributor> penciler;
    sequence<Contributor> publisher;
    sequence<Contributor> readby;
    sequence<Contributor> translator;
    DOMString inLanguage;
    TextDirection inDirection;
    DOMString dateModified;
    DOMString datePublished;
    ProgressionDirection readingProgression = "ltr";
    sequence<LocalizableString> name;
    required sequence<PublicationLink> readingOrder;
    sequence<PublicationLink> resources = [];
    sequence<PublicationLink> links = [];
    PublicationLink accessibilityReport;
    PublicationLink privacyPolicy;
    sequence<PublicationLink> cover;
    HTMLElement toc;
};
The WebPublicationManifest has the following members:
url
type
accessMode
accessModeSufficient
accessibilityAPI
accessibilityControl
accessibilityFeature
accessibilityHazard
accessibilitySummary
id
artist
author
colorist
contributor
creator
editor
illustrator
inker
letterer
penciler
publisher
readby
translator
inLanguage
inDirection
dateModified
datePublished
readingProgression
name
readingOrder
resources
links
accessibilityReport
privacyPolicy
cover
toc
Contributor members
These definitions reflect the basic information derived from the schema.org Person and Organization classes. The WebIDL definitions only contain the minimal information for the infoset; user agents MAY interpret a wider range of properties, as defined by schema.org.
The artist, author, etc., members are each a sequence of Contributor dictionaries, each of whose type member indicates whether the contributor is a Person or an Organization. The members of this dictionary are:
dictionary Contributor {
    sequence<DOMString> type;
    required sequence<LocalizableString> name;
    DOMString id;
    DOMString url;
};
type
name
id
url
LocalizableString dictionary
dictionary LocalizableString {
    required DOMString value;
    DOMString language;
};
When the language is specified in LocalizableString, this value overrides the default language specified in WebPublicationManifest.
LocalizableString has the following members:
value
language
PublicationLink dictionary
dictionary PublicationLink {
    required DOMString url;
    DOMString encodingFormat;
    sequence<LocalizableString> name;
    LocalizableString description;
    sequence<DOMString> rel;
};
The PublicationLink dictionary contains the following members:
url
encodingFormat
name
description
rel
TextDirection enum
enum TextDirection {
    "ltr",
    "rtl",
    "auto"
};
The TextDirection enum can contain the following values:
ltr
rtl
auto
ProgressionDirection enum
enum ProgressionDirection {
    "ltr",
    "rtl"
};
The ProgressionDirection enum can contain the following values:
ltr
rtl
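As an illustration of how the abstract WebIDL definitions in this appendix might be mapped onto a programming language, the following sketch renders a subset of them as Python dataclasses. It is not a normative binding: the choice of Python, the omission of most optional members, the use of None for absent members, and the validation in `__post_init__` are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Allowed values of the two enums defined above.
TEXT_DIRECTIONS = ("ltr", "rtl", "auto")
PROGRESSION_DIRECTIONS = ("ltr", "rtl")


@dataclass
class LocalizableString:
    value: str                      # required
    language: Optional[str] = None  # absent unless specified


@dataclass
class PublicationLink:
    url: str                        # required
    encodingFormat: Optional[str] = None
    name: Optional[List[LocalizableString]] = None
    description: Optional[LocalizableString] = None
    rel: Optional[List[str]] = None


@dataclass
class WebPublicationManifest:
    # Required members come first; only a subset of members is shown.
    url: str
    type: List[str]
    readingOrder: List[PublicationLink]
    inDirection: Optional[str] = None
    # Defaults below mirror the defaults given in the dictionary definition.
    readingProgression: str = "ltr"
    resources: List[PublicationLink] = field(default_factory=list)
    links: List[PublicationLink] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.readingProgression not in PROGRESSION_DIRECTIONS:
            raise ValueError("readingProgression must be 'ltr' or 'rtl'")
        if (self.inDirection is not None
                and self.inDirection not in TEXT_DIRECTIONS):
            raise ValueError("inDirection must be 'ltr', 'rtl' or 'auto'")
```

Constructing a manifest with only the required members yields the default reading progression of "ltr" and empty resources and links lists.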
This section is non-normative.
A manifest for a simple book. The canonical version of this manifest is also available.
{
"@context": ["https://schema.org", "https://www.w3.org/ns/wp-context"],
"@type": "Book",
"url": "https://publisher.example.org/mobydick",
"author": "Herman Melville",
"dateModified": "2018-02-10T17:00:00Z",
"readingOrder": [
"html/title.html",
"html/copyright.html",
"html/introduction.html",
"html/epigraph.html",
"html/c001.html",
"html/c002.html",
"html/c003.html",
"html/c004.html",
"html/c005.html",
"html/c006.html"
],
"resources": [
"css/mobydick.css",
{
"@type": "PublicationLink",
"rel": "https://www.w3.org/ns/wp#cover-page",
"url": "images/cover.jpg",
"encodingFormat": "image/jpeg"
},{
"@type": "PublicationLink",
"url": "html/toc.html",
"rel": "contents"
},{
"@type": "PublicationLink",
"url": "fonts/STIXGeneral.otf",
"encodingFormat": "application/vnd.ms-opentype"
},{
"@type": "PublicationLink",
"url": "fonts/STIXGeneralBol.otf",
"encodingFormat": "application/vnd.ms-opentype"
},{
"@type": "PublicationLink",
"url": "fonts/STIXGeneralBolIta.otf",
"encodingFormat": "application/vnd.ms-opentype"
},{
"@type": "PublicationLink",
"url": "fonts/STIXGeneralItalic.otf",
"encodingFormat": "application/vnd.ms-opentype"
}
]
}
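The relative URLs in the readingOrder of such a manifest need to be resolved before the resources can be fetched. The sketch below shows one way to do so, assuming a hypothetical manifest location as the base URL; the function name `resolve_reading_order` is ours, and the rules that actually determine the base URL are defined elsewhere in this specification.

```python
from typing import List
from urllib.parse import urljoin


def resolve_reading_order(manifest: dict, base_url: str) -> List[str]:
    """Return absolute URLs for every entry of the reading order.

    Entries may be bare strings or objects with a "url" member,
    as in the resources list of the example above.
    """
    resolved = []
    for item in manifest.get("readingOrder", []):
        url = item if isinstance(item, str) else item["url"]
        resolved.append(urljoin(base_url, url))
    return resolved


manifest = {"readingOrder": ["html/title.html", {"url": "html/c001.html"}]}
# Assumption: the manifest is served from this (hypothetical) location.
base = "https://publisher.example.org/mobydick/manifest.json"
print(resolve_reading_order(manifest, base))
```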
An example of an embedded manifest. The canonical version of the manifest, as well as a more elaborate version for the same document, are also available.
<!DOCTYPE html>
<html lang="en-US">
<head>
<title>Model for Tabular Data and Metadata on the Web</title>
<link href="#wpm" rel="publication" />
...
<script id="wpm" type="application/ld+json">
{
"@context" : ["https://schema.org", "https://www.w3.org/ns/wp-context"],
"@type" : "CreativeWork",
"@id" : "http://www.w3.org/TR/tabular-data-model/",
"url" : "http://www.w3.org/TR/2015/REC-tabular-data-model-20151217/",
"copyrightYear" : "2015",
"copyrightHolder" : "World Wide Web Consortium",
"creator" : ["Jeni Tennison", "Gregg Kellogg", "Ivan Herman"],
"publisher" : {
"@type" : "Organization",
"name" : "World Wide Web Consortium",
"@id" : "https://www.w3.org/"
},
"datePublished" : "2015-12-17",
"resources" : [
"datatypes.html",
"datatypes.svg",
"datatypes.png",
"diff.html",
{
"@type" : "StructuredValue",
"url" : "test-utf8.csv",
"encodingFormat" : "text/csv"
},
{
"@type" : "StructuredValue",
"url" : "test.xlsx",
"encodingFormat" : "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
}
]
}
</script>
</head>
<body>
....
<section id="toc" role="doc-toc">
<h2 resource="#h-toc" id="h-toc" class="introductory">Table of Contents</h2>
<ul class="toc">
<li class="tocline"><a class="tocxref" href="#intro"><span class="secno">1. </span>Introduction</a></li>
...
</ul>
</section>
...
</body>
</html>
A manifest for an audiobook. The canonical version of this manifest is also available.
{
"@context": ["https://schema.org", "https://www.w3.org/ns/wp-context"],
"@type": "Audiobook",
"@id": "https://librivox.org/flatland-a-romance-of-many-dimensions-by-edwin-abbott-abbott/",
"url": "https://w3c.github.io/wpub/experiments/audiobook/",
"name": "Flatland: A Romance of Many Dimensions",
"author": "Edwin Abbott Abbott",
"readBy": "Ruth Golding",
"publisher": "Librivox",
"inLanguage": "en",
"dateModified": "2018-06-14T19:32:18Z",
"datePublished": "2008-10-12",
"license": "https://creativecommons.org/publicdomain/zero/1.0/",
"resources": [
{"rel": "cover", "url": "http://ia800704.us.archive.org/9/items/LibrivoxCdCoverArt12/Flatland_1109.jpg", "encodingFormat": "image/jpeg"},
{"rel": "contents", "url": "toc.html", "encodingFormat": "text/html"}
],
"readingOrder": [
{"url": "http://www.archive.org/download/flatland_rg_librivox/flatland_1_abbott.mp3", "encodingFormat": "audio/mpeg", "name": "Part 1, Sections 1 - 3"},
{"url": "http://www.archive.org/download/flatland_rg_librivox/flatland_2_abbott.mp3", "encodingFormat": "audio/mpeg", "name": "Part 1, Sections 4 - 5"},
{"url": "http://www.archive.org/download/flatland_rg_librivox/flatland_3_abbott.mp3", "encodingFormat": "audio/mpeg", "name": "Part 1, Sections 6 - 7"},
{"url": "http://www.archive.org/download/flatland_rg_librivox/flatland_4_abbott.mp3", "encodingFormat": "audio/mpeg", "name": "Part 1, Sections 8 - 10"},
{"url": "http://www.archive.org/download/flatland_rg_librivox/flatland_5_abbott.mp3", "encodingFormat": "audio/mpeg", "name": "Part 1, Sections 11 - 12"},
{"url": "http://www.archive.org/download/flatland_rg_librivox/flatland_6_abbott.mp3", "encodingFormat": "audio/mpeg", "name": "Part 2, Sections 13 - 14"},
{"url": "http://www.archive.org/download/flatland_rg_librivox/flatland_7_abbott.mp3", "encodingFormat": "audio/mpeg", "name": "Part 2, Sections 15 - 17"},
{"url": "http://www.archive.org/download/flatland_rg_librivox/flatland_8_abbott.mp3", "encodingFormat": "audio/mpeg", "name": "Part 2, Sections 18 - 20"},
{"url": "http://www.archive.org/download/flatland_rg_librivox/flatland_9_abbott.mp3", "encodingFormat": "audio/mpeg", "name": "Part 2, Sections 21 - 22"}
]
}
This section is non-normative.
(These examples were originally published in the Activity Streams Recommendation [activitystreams-core].)
Character order in memory | Direction | Method | Expected display |
---|---|---|---|
פעילות הבינאום, W3C | rtl | First strong directional character | פעילות הבינאום, W3C |
The document is titled, "⁧פעילות הבינאום, W3C⁩" | ltr | First strong directional character | The document is titled, "פעילות הבינאום, W3C" |
‏HTML היא שפת סימון | rtl | Bidi Control Character | HTML היא שפת סימון |
‎'سلام' is hello in Persian | ltr | Bidi Control Character | 'سلام' is hello in Persian |
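The "first strong directional character" method named in the table can be sketched as follows. This is a simplified illustration, not the full Unicode Bidirectional Algorithm: the function name `first_strong_direction` and the "ltr" fallback are assumptions, and characters inside directional isolates are not skipped. In the last two rows, the leading RIGHT-TO-LEFT MARK (U+200F) and LEFT-TO-RIGHT MARK (U+200E) are themselves strong characters, which is how a bidi control character steers this detection.

```python
import unicodedata


def first_strong_direction(text: str, default: str = "ltr") -> str:
    """Guess the base direction from the first strongly directional character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":           # strong left-to-right
            return "ltr"
        if bidi_class in ("R", "AL"):   # strong right-to-left (Hebrew, Arabic)
            return "rtl"
    return default  # assumption: fall back when no strong character exists
```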
This section is non-normative.
These diagrams provide a visual view of the lifecycle steps, as specified in 6. Web Publication Lifecycle.
This section is non-normative.
This section is non-normative.
The editors would like to specially thank the following individuals for making significant contributions to the authoring and editing of this specification:
Additionally, the following people were members of the Working Group at the time of publication:
The Working Group would also like to thank the members of the Digital Publishing Interest Group for all the hard work they did paving the road for this specification.