W3C Group Draft Note (Unofficial Draft)
Copyright © 2019-2022 W3C® (MIT, ERCIM, Keio, Beihang). W3C liability, trademark and permissive document license rules apply.
Copyright © 2019-2022 the document editors/authors. Text is available under the Creative Commons Attribution 4.0 International Public License; additional terms may apply.
Developers share a common problem: they want a simple but extensible way to create an API for a web service that gets the job done, doesn't design them into a corner, and allows other developers to interact with their service easily without reinventing the wheel. JSON-LD [ JSON-LD ] has become an important solution, as it bridges the gap between formally described data and the more colloquial JSON interfaces used in APIs from numerous providers. This guide attempts to define certain best practices for publishing data using JSON-LD, and for interacting with such services.
This section describes the status of this document at the time of its publication. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.
This document is a draft of a potential specification. It has no official standing of any kind and does not represent the support or consensus of any standards organization.
This unofficial document has been developed by the JSON-LD Working Group.
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
The key words SHOULD and SHOULD NOT in this document are to be interpreted as described in BCP 14 [ RFC2119 ] [ RFC8174 ] when, and only when, they appear in all capitals, as shown here.
This document describes best practices for generating JSON-LD. Where normative language is used, it should be considered advisory.
This section is non-normative.
Coming up with a data format for your API is a common problem. It can be hard to choose between different data representations and to pick names, and harder still if you want to leave room for extensibility. How do you make all these decisions? How do you make your API easy to use, so that people can use short strings to reference common things but URLs to define their own, so it isn't limiting? How can you make it easy for other people to add their own data and keep it interoperable? How do you consume data from other similar apps? There are technologies that can help you do this. They aren't perfect and won't solve every problem, but they may solve a lot of them.
The use of JSON on the web has grown immensely in the last decade, particularly with the explosion of APIs that eschew XML in favor of what is considered to be a more developer friendly format which is directly compatible with JavaScript. As a result, different sites have chosen their own proprietary representations for interacting with their sites, sometimes described using frameworks such as [ swagger ] which imply a particular URI composition for interacting with their services. This practice leads to vendor-specific semantic silos, where the meaning of a particular JSON document makes sense only by programming directly to the API documentation for a given service.
show examples from GitHub, Twitter, …?
As services grow, they often introduce incompatible changes, leading to a Version 2 or Version 3 of their API and requiring developers to update client code to properly handle JSON documents. In many cases, even small changes can lead to incompatibilities. Additionally, composing information from multiple APIs becomes problematic, due to namespace or document format conventions that may differ between API endpoints. Moreover, the same principles are often repeated across different endpoints using arbitrary identifiers (name, email, website, etc.); the community needs to learn to stop repeating itself (the DRY concept) and reuse common conventions, although this does not necessarily mean using exactly the same identifiers within the JSON itself (see JSON-LD Context).
This Note proposes to outline a number of best practices for API designers or JSON developers based on the principles of separation of data model from syntax, the use of discoverable identifiers describing document contents, and general organizing principles that allow documents to be machine understandable (read, interpreted as JSON-LD using Linked Data, RDF and RDFS vocabulary, and data model principles).
Key among these is the notion of vocabulary re-use, so that each endpoint does not need to separately describe the properties and structure of their JSON documents. Schema.org provides a great example of doing this, and includes an extension mechanism that may already be familiar to API designers.
JSON-LD is JSON, and good JSON-LD is first and foremost good JSON. Since it is also Linked Data, developers and especially data publishers may find further useful advice at Data on the Web Best Practices [ dwbp ] and Best Practices for Publishing Linked Data [ ld-bp ].
This section is non-normative.
JSON [ json ] is the most popular format for publishing data through APIs; developers like it, it is easy to parse, and it is supported natively in most programming languages.
For example, the following is reasonably idiomatic JSON which can also be interpreted as JSON-LD, given the appropriate context.
{
"name": "Barack Obama",
"givenName": "Barack",
"familyName": "Obama",
"jobTitle": "44th President of the United States"
}
JSON documents may be in the form of an object, or an array of objects. For most purposes, developers need a single entry point, so the JSON SHOULD be in the form of a single top-level object.
When possible, property values SHOULD use native JSON datatypes such as numbers (integer, decimal and floating point) and booleans (true and false).
JSON has a single numeric type, so using native representation of numbers can lose precision.
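The precision concern can be illustrated with a minimal Python sketch. It assumes a consumer that, like JavaScript's `JSON.parse`, reads every number as an IEEE 754 double; the `parse_int=float` argument only simulates that behaviour, and the key name `value` is purely illustrative.

```python
import json

# 2**53 + 1 cannot be represented exactly as an IEEE 754 double.
big = 9007199254740993

# Simulate a parser that reads all JSON numbers as doubles (an assumption
# made for illustration; Python's default parser keeps integers exact).
as_double = json.loads(json.dumps({"value": big}), parse_int=float)["value"]

# Publishing the same number as a string avoids the rounding.
as_string = json.loads(json.dumps({"value": str(big)}))["value"]

lost_precision = int(as_double) != big
kept_precision = int(as_string) == big
```

Whether a native number or a string is the better choice depends on the range and precision the data actually needs.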
JSON specifies that the values in an array are ordered; however, in many cases arrays are also used for values which are unordered. Unless specified within the JSON-LD Context, multiple array values SHOULD be presumed to be unordered. (See Lists and Sets in [ JSON-LD ]).
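A client that treats arrays as unordered should compare them without regard to position. The following Python sketch shows one way this might be done; the helper name `same_values` and the sample data are hypothetical, and the comparison is a simple approximation rather than anything defined by JSON-LD.

```python
import json

def same_values(left, right, ordered=False):
    """Compare two JSON arrays, ignoring order unless ordered=True."""
    if ordered:
        return left == right

    def canonical(items):
        # Serialize each element with sorted keys so objects can be sorted too.
        return sorted(json.dumps(item, sort_keys=True) for item in items)

    return canonical(left) == canonical(right)

a = ["Barack", "Michelle"]
b = ["Michelle", "Barack"]

unordered_equal = same_values(a, b)              # order ignored
ordered_equal = same_values(a, b, ordered=True)  # order significant, as for a list
```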
By sticking to basic JSON data expression, and providing a JSON-LD Context, all keys used within a JSON document can have unambiguous meaning, as they bind to URLs which describe their meaning.
By adding an @context entry, the previous example can now be interpreted as JSON-LD.
{
"@context": "http://schema.org",
"name": "Barack Obama",
"givenName": "Barack",
"familyName": "Obama",
"jobTitle": "44th President of the United States"
}
When expanding such a data representation, a JSON-LD processor replaces these terms with the URIs they expand to (as well as making property values unambiguous):
[
{
"http://schema.org/familyName": [{"@value": "Obama"}],
"http://schema.org/givenName": [{"@value": "Barack"}],
"http://schema.org/jobTitle": [{"@value": "44th President of the United States"}],
"http://schema.org/name": [{"@value": "Barack Obama"}]
}
]
Expanded form is not useful as is, but is necessary for performing further algorithmic transformations of JSON-LD data and is useful when validating that JSON-LD entity descriptions say what the publisher means.
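To make the transformation concrete, here is a deliberately simplified Python sketch. It is not the JSON-LD expansion algorithm: it assumes a flat object whose values are all plain strings and a single vocabulary IRI (`http://schema.org/`) passed in by hand. The function name `expand_flat` is hypothetical; a real application would use a conforming JSON-LD processor.

```python
def expand_flat(doc, vocab):
    """Expand a flat object of string values against a single vocabulary IRI."""
    node = {}
    for key in sorted(doc):
        if key.startswith("@"):
            continue  # keywords such as @context are not properties
        # Each term becomes a full IRI and each value an explicit value object.
        node[vocab + key] = [{"@value": doc[key]}]
    return [node]

compact = {
    "@context": "http://schema.org",
    "name": "Barack Obama",
    "givenName": "Barack",
    "familyName": "Obama",
    "jobTitle": "44th President of the United States",
}

expanded = expand_flat(compact, "http://schema.org/")
```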
Principles of Linked Data dictate that messages SHOULD be self-describing, which includes adding a type to such messages.
Many APIs use JSON messages where the type of information being conveyed is inferred from the retrieval endpoint. For example, when retrieving information about a GitHub Commit, you might see the following response:
{
"sha": "7638417db6d59f3c431d3e1f261cc637155684cd",
"url": "https://api.github.com/repos/octocat/Hello-World/git/commits/7638417db6d59f3c431d3e1f261cc637155684cd",
"author": {
"date": "2014-11-07T22:01:45Z",
"name": "Scott Chacon",
"email": "schacon@gmail.com"
},
"committer": {
"date": "2014-11-07T22:01:45Z",
"name": "Scott Chacon",
"email": "schacon@gmail.com"
},
"message": "added readme, because im a good github citizen\n",
"tree": {
"url": "https://api.github.com/repos/octocat/Hello-World/git/trees/691272480426f78a0138979dd3ce63b77f706feb",
"sha": "691272480426f78a0138979dd3ce63b77f706feb"
},
"parents": [
{
"url": "https://api.github.com/repos/octocat/Hello-World/git/commits/1acc419d4d6a9ce985db7be48c6349a0475975b5",
"sha": "1acc419d4d6a9ce985db7be48c6349a0475975b5"
}
]
}
The only way to know this is a commit is to infer it based on the published API documentation, and the fact that it was returned from an endpoint defined for retrieving information about commits.
{
"@context": "http://schema.org",
"id": "http://www.wikidata.org/entity/Q76",
"type": "Person",
"name": "Barack Obama",
"givenName": "Barack",
"familyName": "Obama",
"jobTitle": "44th President of the United States"
}
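A self-describing message lets a client route on the message itself rather than on the endpoint it came from. The Python sketch below illustrates the idea; the handler table, the function names and the error behaviour are all hypothetical choices made for this example.

```python
def handle_person(message):
    return "person: " + message["name"]

# Hypothetical handler table keyed by the message's declared type.
HANDLERS = {"Person": handle_person}

def dispatch(message):
    """Route a message by its own type entry, not by where it was retrieved."""
    message_type = message.get("type") or message.get("@type")
    handler = HANDLERS.get(message_type)
    if handler is None:
        raise ValueError("message does not declare a known type")
    return handler(message)

typed = {"type": "Person", "name": "Barack Obama"}
untyped = {"sha": "7638417db6d59f3c431d3e1f261cc637155684cd"}

result = dispatch(typed)  # the untyped message would raise ValueError
```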
Entities described in JSON objects often describe web resources having a URL; entity descriptions SHOULD use an identifier uniquely identifying that entity. In this case, using the resource location as the identity of the object is consistent with this practice.
Adding an id entry (an alias for @id) allows the same person to be referred to from different locations.
{
"@context": "http://schema.org",
"id": "http://www.wikidata.org/entity/Q76",
"type": "Person",
"name": "Barack Obama",
"givenName": "Barack",
"familyName": "Obama",
"jobTitle": "44th President of the United States"
}
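Because the identifier is shared, descriptions of the same entity obtained from different places can be combined. The following Python sketch shows the idea under simple assumptions: every description carries an `id` entry, and later values overwrite earlier ones. The helper `merge_by_id` and the two sample sources are hypothetical.

```python
def merge_by_id(descriptions):
    """Combine entity descriptions that share an id into one record per entity."""
    merged = {}
    for description in descriptions:
        entity = merged.setdefault(description["id"], {})
        entity.update(description)
    return merged

# Two partial descriptions of the same person, as two services might return them.
from_service_a = {
    "id": "http://www.wikidata.org/entity/Q76",
    "name": "Barack Obama",
}
from_service_b = {
    "id": "http://www.wikidata.org/entity/Q76",
    "jobTitle": "44th President of the United States",
}

entities = merge_by_id([from_service_a, from_service_b])
obama = entities["http://www.wikidata.org/entity/Q76"]
```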
It can be ambiguous whether an identifier denotes the entity description or the entity itself. As an example, Barack Obama may have a Wikidata entry http://www.wikidata.org/entity/Q76, but it would be a mistake to say that http://www.wikidata.org/entity/Q76 is Barack Obama. However, it is common to use this pattern, particularly if the type of the entity describes a Person, rather than a WebPage.
When describing attributes, entity references SHOULD be used instead of string literals.
In some cases, when describing an attribute of an entity, it is tempting to use string values which have no independent meaning. Such values are often used for well-known things. A JSON-LD context can define a term for such values, which allows them to appear as strings within the message, but be associated with specific identifiers. In this case, the property must be defined with type @vocab so that values will be interpreted relative to a vocabulary rather than the file location.
{
"@context": ["http://schema.org", {
"gender": {"@id": "schema:gender", "@type": "@vocab"}
}],
"id": "http://www.wikidata.org/entity/Q76",
"type": "Person",
"name": "Barack Obama",
"givenName": "Barack",
"familyName": "Obama",
"jobTitle": "44th President of the United States",
"gender": "Male"
}
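The difference between vocabulary-relative and document-relative interpretation can be sketched in Python. This is a simplification of what a JSON-LD processor does (it ignores terms, compact IRIs and absolute IRIs), and it assumes a vocabulary mapping of `http://schema.org/` and a hypothetical document location `http://example.org/people/obama.jsonld`.

```python
from urllib.parse import urljoin

def resolve(value, coercion, vocab, base):
    """Resolve a string value according to the term's type coercion."""
    if coercion == "@vocab":
        return vocab + value         # relative to the vocabulary
    if coercion == "@id":
        return urljoin(base, value)  # relative to the document location
    return value                     # no coercion: a plain string literal

VOCAB = "http://schema.org/"                     # assumed vocabulary mapping
BASE = "http://example.org/people/obama.jsonld"  # hypothetical document URL

as_vocab = resolve("Male", "@vocab", VOCAB, BASE)
as_id = resolve("Male", "@id", VOCAB, BASE)
as_literal = resolve("Male", None, VOCAB, BASE)
```

With `@vocab` the value names a thing in the vocabulary, whereas with `@id` it would point at a location next to the file, which is rarely what the publisher intends.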
See the article in SEO Skeptic [ seo-strings-to-things ] for further elaboration on the advantages of using things instead of strings.
When multiple related entity descriptions are provided inline, related entities SHOULD be nested.
For example, when relating one entity to another, where the related entity is described in the same message:
{
"@context": "http://schema.org",
"id": "http://www.wikidata.org/entity/Q76",
"type": "Person",
"name": "Barack Obama",
"givenName": "Barack",
"familyName": "Obama",
"jobTitle": "44th President of the United States",
"spouse": {
"id": "http://www.wikidata.org/entity/Q13133",
"type": "Person",
"name": "Michelle Obama",
"spouse": "http://www.wikidata.org/entity/Q76"
}
}
In this example, the spouse relationship is bi-directional. We have arbitrarily rooted the message with Barack Obama, and created a symmetric relationship from Michelle back to Barack by reference, rather than by nesting.
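A consumer can follow both the nested description and the back-reference by indexing every described entity by its identifier. The Python sketch below assumes that each described entity carries an `id` entry and that a string value of `spouse` is a reference; the helper `index_nodes` is hypothetical.

```python
def index_nodes(node, index=None):
    """Collect every nested object that has an id, keyed by that id."""
    if index is None:
        index = {}
    if isinstance(node, dict):
        if "id" in node:
            index[node["id"]] = node
        for value in node.values():
            index_nodes(value, index)
    elif isinstance(node, list):
        for item in node:
            index_nodes(item, index)
    return index

message = {
    "id": "http://www.wikidata.org/entity/Q76",
    "type": "Person",
    "name": "Barack Obama",
    "spouse": {
        "id": "http://www.wikidata.org/entity/Q13133",
        "type": "Person",
        "name": "Michelle Obama",
        "spouse": "http://www.wikidata.org/entity/Q76",
    },
}

nodes = index_nodes(message)
michelle = nodes["http://www.wikidata.org/entity/Q13133"]
# The back-reference is a string naming a node that is already in the index.
barack = nodes[michelle["spouse"]]
```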
This section is non-normative.
When using a property intended to reference another entity, properties SHOULD be defined to type string values as being references.
For example, the schema:image property relates a Thing to an Image:
{
"@context": "http://schema.org",
"id": "http://www.wikidata.org/entity/Q76",
"type": "Person",
"name": "Barack Obama",
"givenName": "Barack",
"familyName": "Obama",
"jobTitle": "44th President of the United States",
"image": "https://commons.wikimedia.org/wiki/File:President_Barack_Obama.jpg"
}
This will be interpreted as a reference, rather than a string literal, because (at the time of publication), the schema.org JSON-LD Context defines image to be of type @id:
{
"@context": {
...
"image": { "@id": "schema:image", "@type": "@id"},
...
}
}
If not defined as such in a remote context, terms may be (re-) defined in a local context:
{
"@context": ["http://schema.org", {
"image": { "@id": "schema:image", "@type": "@id"}
}],
"id": "http://www.wikidata.org/entity/Q76",
"type": "Person",
"name": "Barack Obama",
"givenName": "Barack",
"familyName": "Obama",
"jobTitle": "44th President of the United States",
"image": "https://commons.wikimedia.org/wiki/File:President_Barack_Obama.jpg"
}
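The effect of typing a term as `@id` can be sketched as follows. This Python fragment is a simplified stand-in for the value handling a JSON-LD processor performs; the function `coerce` and the small hand-written context are assumptions for illustration only.

```python
def coerce(term, value, context):
    """Turn a string value into a reference or a literal, per the term definition."""
    definition = context.get(term)
    if isinstance(definition, dict) and definition.get("@type") == "@id":
        return {"@id": value}   # interpreted as a reference to another entity
    return {"@value": value}    # interpreted as a plain string literal

# Hand-written context fragment mirroring the local definition of image.
local_context = {
    "image": {"@id": "schema:image", "@type": "@id"},
    "name": "schema:name",
}

image = coerce(
    "image",
    "https://commons.wikimedia.org/wiki/File:President_Barack_Obama.jpg",
    local_context,
)
name = coerce("name", "Barack Obama", local_context)
```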
Unless specifically described as ordered using @list, do not depend on the order of elements in an array.
By default, arrays in JSON-LD do not convey any ordering of contained elements. However, for the processing of contexts, the ordering of elements in arrays does matter. When writing array-based contexts, this fact should be kept in mind.
Ordered contexts in arrays allow inheritance and overriding of context entries. When processing the following example, the first name entry will be overridden by the second name entry.
{
"@context": [
{
"id": "@id",
"name": "http://schema.org/name"
},
{
"name": "http://xmlns.com/foaf/0.1/name"
}
],
"@id": "http://www.wikidata.org/entity/Q76",
"name": "Barack Obama"
}
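The override behaviour amounts to processing the array from first to last, with later definitions replacing earlier ones. A minimal Python sketch of that idea follows; it treats each context as a simple dictionary of term definitions and is not the full context processing algorithm. The name `merge_contexts` is hypothetical.

```python
def merge_contexts(contexts):
    """Apply contexts in array order; later entries override earlier ones."""
    active = {}
    for context in contexts:
        active.update(context)
    return active

contexts = [
    {"id": "@id", "name": "http://schema.org/name"},
    {"name": "http://xmlns.com/foaf/0.1/name"},
]

active = merge_contexts(contexts)
```

Terms that are not redefined, such as `id` here, are inherited from the earlier context.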
Order is important when processing protected terms. While the first example will cause a term redefinition error, the second example will not throw this error.
{
"@context": [
{
"@version": 1.1,
"name": {
"@id": "http://schema.org/name",
"@protected": true
}
},
{
"name": "http://xmlns.com/foaf/0.1/name"
}
],
"@id": "http://www.wikidata.org/entity/Q76",
"name": "Barack Obama"
}
{
"@context": [
{
"name": "http://xmlns.com/foaf/0.1/name"
},
{
"@version": 1.1,
"Person": "http://schema.org/Person",
"knows": "http://schema.org/knows",
"name": {
"@id": "http://schema.org/name",
"@protected": true
}
}
],
"@id": "http://www.wikidata.org/entity/Q76",
"name": "Barack Obama"
}
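The two outcomes can be reproduced with a simplified Python sketch. It assumes only the rule illustrated by the examples, namely that a later context may not change a term already defined as protected; the exceptions a conforming processor allows are reduced here to permitting an identical redefinition. The exception class and function names are hypothetical.

```python
class ProtectedTermRedefinition(Exception):
    """Raised by this sketch when a later context changes a protected term."""

def is_protected(definition):
    return isinstance(definition, dict) and definition.get("@protected") is True

def merge_with_protection(contexts):
    active = {}
    for context in contexts:
        for term, definition in context.items():
            if term.startswith("@"):
                continue  # skip keywords such as @version
            previous = active.get(term)
            if is_protected(previous) and definition != previous:
                raise ProtectedTermRedefinition(term)
            active[term] = definition
    return active

protected_name = {"@id": "http://schema.org/name", "@protected": True}
first_example = [
    {"@version": 1.1, "name": protected_name},
    {"name": "http://xmlns.com/foaf/0.1/name"},
]
second_example = [
    {"name": "http://xmlns.com/foaf/0.1/name"},
    {"@version": 1.1, "name": protected_name},
]

try:
    merge_with_protection(first_example)
    first_error = None
except ProtectedTermRedefinition as error:
    first_error = str(error)  # the protected term was redefined

active = merge_with_protection(second_example)  # succeeds: protection comes last
```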
This section is non-normative.
When dereferencing an entity related via a URL, the location SHOULD provide a representation of that entity.
This practice replicates that described in [ ld-bp ]: Provide at least one machine-readable representation of the resource identified by the URI.
A corollary of this best practice is that Cool URIs don't change [ cooluris ], meaning that URLs describing entities SHOULD be stable and not depend on variable information. Also, the URL used to identify an entity is the best API endpoint for that entity (see also 12. API Versioning).
This section is non-normative.
While most use of JSON-LD SHOULD NOT require a client to change the data representation, JSON-LD does allow the use of various algorithms to re-shape a JSON-LD document. These require the use of the JSON-LD Context, which is typically represented using a link to a remote document. Because it is remote, processing time can be severely impacted by the time it takes to retrieve this context.
Services providing a JSON-LD Context SHOULD set HTTP cache-control headers to allow liberal caching of such contexts, and clients SHOULD attempt to use a locally cached version of these documents.
Typically, libraries used to process JSON-LD documents should do this for you. (See also [ json-ld-best-practice-caching ]).
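For clients that manage context retrieval themselves, the caching idea might look like the following Python sketch. The class `CachingContextLoader` and the `fake_fetch` stand-in are hypothetical, no network request is made, and the sketch ignores expiry rules from HTTP cache-control headers that a production cache should honor.

```python
class CachingContextLoader:
    """Keep retrieved context documents in memory, keyed by URL."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable taking a URL and returning a parsed context
        self._cache = {}
        self.fetch_count = 0

    def load(self, url):
        if url not in self._cache:
            self.fetch_count += 1
            self._cache[url] = self._fetch(url)
        return self._cache[url]

def fake_fetch(url):
    # Stand-in for an HTTP request; returns a tiny illustrative context.
    return {"@context": {"name": "http://schema.org/name"}}

loader = CachingContextLoader(fake_fetch)
first = loader.load("http://schema.org")
second = loader.load("http://schema.org")  # served from the cache
```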
This section is non-normative.
Describe schema.org extension using Role sub-class, Hydra collections, and LDP collections.
This section is non-normative.
Focus on schema.org?
This section is non-normative.
Describe the use of schema.org Actions and work in Hydra.
Describe anti-pattern of URI construction emphasizing affordances.
This section is non-normative.
Remember that Cool URIs don't change [ cooluris ]; correctly modeling data allows changes to the data representation to be limited.
Describe the use of API keys for controlling API versions, rather than the use of different versioned URLs.