CBOR-LD 1.0

A CBOR-based Serialization for Linked Data

W3C Editor's Draft

This version:
https://digitalbazaar.github.io/cbor-ld-spec/
Latest published version:
https://www.w3.org/TR/cbor-ld/
Latest editor's draft:
https://digitalbazaar.github.io/cbor-ld-spec/
Editors:
Manu Sporny (Digital Bazaar)
Dave Longley (Digital Bazaar)
Authors:
Manu Sporny (Digital Bazaar)
Dave Longley (Digital Bazaar)
Participate:
GitHub digitalbazaar/cbor-ld-spec
File a bug
Commit history
Pull requests

Abstract

CBOR is a compact binary data serialization and messaging format. This specification defines CBOR-LD 1.0, a CBOR-based format to serialize Linked Data. The encoding is designed to leverage the existing JSON-LD ecosystem, which is deployed on hundreds of millions of systems today, to provide a compact serialization format for those seeking efficient encoding schemes for Linked Data. By utilizing semantic compression schemes, it is possible to achieve compression ratios more than 60% better than those of generalized compression schemes. This format is primarily intended to be a way to use Linked Data in storage- and bandwidth-constrained programming environments, to build interoperable semantic wire-level protocols, and to efficiently store Linked Data in CBOR-based storage engines.

Status of This Document

This is a preview

Do not attempt to implement this version of the specification. Do not reference this version as authoritative in any way. Instead, see https://digitalbazaar.github.io/cbor-ld-spec/ for the Editor's draft.

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.

This document is experimental.

There is a reference implementation that is capable of demonstrating the features described in this document.

This document was published by the JSON-LD Community Group as an Editor's Draft.

GitHub Issues are preferred for discussion of this specification. Alternatively, you can send comments to our mailing list at public-linked-json@w3.org (archives).

Publication as an Editor's Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 1 March 2019 W3C Process Document.

1. Introduction

This section is non-normative.

CBOR is a compact binary data serialization and messaging format. This specification defines CBOR-LD 1.0, a CBOR-based format to serialize Linked Data. The encoding is designed to leverage the existing JSON-LD ecosystem, which is deployed on hundreds of millions of systems today, to provide a compact serialization format for those seeking efficient encoding schemes for Linked Data. By utilizing semantic compression schemes, it is possible to achieve compression ratios more than 60% better than those of generalized compression schemes. This format is primarily intended to be a way to use Linked Data in storage- and bandwidth-constrained programming environments, to build interoperable semantic wire-level protocols, and to efficiently store Linked Data in CBOR-based storage engines.

1.1 How to Read this Document

This section is non-normative.

This document is a detailed specification for a serialization of Linked Data in CBOR. The document is primarily intended for the following audiences:

1.1.1 Conformance

As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

The key word MUST in this document is to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, it appears in all capitals, as shown here.

1.2 Contributing

This section is non-normative.

There are a number of ways that one may participate in the development of this specification:

1.3 Design Goals and Rationale

This section is non-normative.

CBOR-LD satisfies the following design goals:

Simplicity
Encoders and decoders should be simple to implement given an existing JSON-LD implementation.
Efficient Storage
The encoding process should generate an aggressively compact Linked Data binary format.
Generalized Algorithm
The encoding algorithm must be generalized.
Semantic Compression
The encoding format should maximize compression of Linked Data URLs (terms and values). Focusing here ensures that the algorithms can achieve compression ratios better than generalized compression algorithms.
Raw Binary
Base-encoded binary values, and other compressible data types, should be translated to their raw binary forms when possible without sacrificing generality.

Similarly, the following are non-goals.

The following minefields have been identified while working on this specification:

2. Basic Concept

This section is non-normative.

The general CBOR-LD encoding algorithm takes a JSON-LD Document and does the following:

3. Algorithms

3.1 JSON-LD to CBOR-LD Algorithm

This algorithm takes a JSON-LD object jsonldDocument and options as input.

  1. Let result be an empty CBOR-encoded byte array.
  2. Initialize contextUrls to the return value of the "Get Context URLs Algorithm" passing jsonldDocument as input.
  3. If the "Get Context URLs Algorithm" resulted in an error, set result to the return value of the "Uncompressed CBOR-LD Buffer Algorithm", passing jsonldDocument and options as input.
  4. Otherwise, set result to the return value of the "Compressed CBOR-LD Buffer Algorithm", passing jsonldDocument and options as input, with contextUrls set as options.contextUrls.
  5. Return result.
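
The dispatch described in the steps above can be sketched as follows. This is a minimal illustration, not the reference implementation: the function name jsonld_to_cborld, the use of ValueError to signal a context error, and the three stub functions are all assumptions made for the example. The stubs stand in for the algorithms in the sections that follow and return only the three-byte tag header.

```python
def jsonld_to_cborld(jsonld_document, options, get_context_urls,
                     encode_uncompressed, encode_compressed):
    """Dispatch between the uncompressed and compressed encoders."""
    try:
        # Step 2: collect the context URLs referenced by the document.
        context_urls = get_context_urls(jsonld_document)
    except ValueError:
        # Step 3: context discovery failed, so fall back to the
        # uncompressed form.
        return encode_uncompressed(jsonld_document, options)
    # Step 4: hand the URLs to the compressed encoder as
    # options["contextUrls"], leaving the caller's options untouched.
    compressed_options = {**options, "contextUrls": context_urls}
    return encode_compressed(jsonld_document, compressed_options)


# Stub algorithms, for illustration only (hypothetical helpers).
def stub_get_context_urls(doc):
    ctx = doc.get("@context")
    if not isinstance(ctx, str):
        raise ValueError("ERR_NON_URL_JSONLD_CONTEXT_DETECTED")
    return [ctx]


def stub_uncompressed(doc, options):
    return bytes.fromhex("d90500")  # tag 1280 header only


def stub_compressed(doc, options):
    return bytes.fromhex("d90501")  # tag 1281 header only
```

Passing the sub-algorithms in as parameters keeps the sketch self-contained; a real encoder would more likely call them directly.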

3.2 Uncompressed CBOR-LD Buffer Algorithm

This algorithm takes a JSON-LD object jsonldDocument and options as input.

  1. Let result be an empty CBOR-encoded byte array.
  2. Set the first three bytes of result to 0xd90500 (CBOR Tag - 0xd9, CBOR-LD - 0x05, Uncompressed - 0x00, Tag 1280).
  3. For every key-value pair in jsonldDocument, append its CBOR encoding to result by converting the pair to the associated CBOR-LD header and value. For complex values (maps, arrays), recursively convert the value to a form that will losslessly encode and decode back to JSON-LD.
  4. Return result.
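
As a rough illustration of the uncompressed form, the sketch below wraps a plain CBOR encoding of the document in tag 1280. When tag 1280 is serialized as a CBOR tag head it occupies three bytes: 0xd9 (major type 6 with a two-byte argument) followed by 0x05 0x00. The encoder covers only the JSON data model and is written by hand so the example has no dependencies. The names encode_head, encode_item, and encode_uncompressed are assumptions for this example, not names defined by this specification.

```python
import struct


def encode_head(major, n):
    """Encode a CBOR initial byte plus argument for the given major type."""
    if n < 24:
        return bytes([(major << 5) | n])
    for info, fmt, limit in ((24, ">B", 1 << 8), (25, ">H", 1 << 16),
                             (26, ">I", 1 << 32), (27, ">Q", 1 << 64)):
        if n < limit:
            return bytes([(major << 5) | info]) + struct.pack(fmt, n)
    raise ValueError("argument too large for a CBOR head")


def encode_item(value):
    """Encode a JSON-compatible Python value as CBOR (subset only)."""
    # Booleans are checked before integers because bool subclasses int.
    if value is None:
        return b"\xf6"
    if value is True:
        return b"\xf5"
    if value is False:
        return b"\xf4"
    if isinstance(value, int):
        return encode_head(0, value) if value >= 0 else encode_head(1, -1 - value)
    if isinstance(value, float):
        return b"\xfb" + struct.pack(">d", value)  # always double precision
    if isinstance(value, str):
        data = value.encode("utf-8")
        return encode_head(3, len(data)) + data
    if isinstance(value, list):
        return encode_head(4, len(value)) + b"".join(encode_item(v) for v in value)
    if isinstance(value, dict):
        body = b"".join(encode_item(k) + encode_item(v) for k, v in value.items())
        return encode_head(5, len(value)) + body
    raise TypeError(f"unsupported type: {type(value).__name__}")


def encode_uncompressed(jsonld_document):
    """Prefix the CBOR encoding of the document with tag 1280."""
    return encode_head(6, 1280) + encode_item(jsonld_document)
```

For example, encode_uncompressed({"a": 1}).hex() yields d90500a1616101: the tag header, a one-entry map, the text string "a", and the integer 1.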

3.3 Compressed CBOR-LD Buffer Algorithm

This algorithm takes a JSON-LD object jsonldDocument and options as input. The options MUST contain:

applicationContextMap
A map of application-specific JSON-LD context URL strings that are mapped to their encoded CBOR-LD values. The values MUST be greater than 32767 (0x7FFF). Values from 0 to 32767 (0x0000-0x7FFF) are reserved for globally recognized JSON-LD Context URL values.
applicationTermMap
A map of JSON-LD terms and their associated CBOR-LD term codecs.
  1. Let result be an empty CBOR-encoded byte array.
  2. Set the first three bytes of result to 0xd90501 (CBOR Tag - 0xd9, CBOR-LD - 0x05, Compressed, CBOR-LD compression algorithm version 1 - 0x01, Tag 1281).
  3. Initialize termCodecMap to the result of the § 3.4 Get Term Codec Map Algorithm, passing contextUrls as input.
  4. Add to result by recursively processing every name-value pair in jsonldDocument:
    1. Let termHint be the value associated with the JSON name in termCodecMap.
    2. Set the CBOR key to termHint.value.
    3. Set the CBOR value to the result of calling the termHint.valueCompressor function on the JSON value.
  5. Return result.
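
The recursive key substitution in step 4 might look like the sketch below. It stops short of CBOR serialization and returns a Python structure whose map keys are the integer term values; a CBOR encoder would then serialize that structure after the three-byte header. Several behaviors here are assumptions the text does not specify: names missing from termCodecMap are kept verbatim, nested maps and arrays are recursed into rather than passed to the compressor, and scalars inside arrays are left unchanged. The names COMPRESSED_HEADER and compress_node are illustrative.

```python
# Three-byte CBOR head for tag 1281 (0xd9, then 0x0501).
COMPRESSED_HEADER = bytes.fromhex("d90501")


def compress_node(node, term_codec_map):
    """Replace JSON names with term values and compress scalar values."""
    if isinstance(node, list):
        return [compress_node(item, term_codec_map) for item in node]
    if isinstance(node, dict):
        out = {}
        for name, value in node.items():
            term_hint = term_codec_map.get(name)
            if term_hint is None:
                # Assumption: unknown names pass through unchanged.
                out[name] = compress_node(value, term_codec_map)
            elif isinstance(value, (dict, list)):
                # Assumption: containers are recursed into, not compressed.
                out[term_hint["value"]] = compress_node(value, term_codec_map)
            else:
                out[term_hint["value"]] = term_hint["valueCompressor"](value)
        return out
    # Scalars reached through an array are returned as-is.
    return node


def identity(value):
    """Placeholder compressor that leaves the value unchanged."""
    return value


example_map = {
    "age": {"value": 0, "valueCompressor": identity},
    "knows": {"value": 1, "valueCompressor": identity},
    "name": {"value": 2, "valueCompressor": str.upper},
}
example_doc = {"name": "Ada", "knows": [{"name": "Grace", "age": 85}]}
compressed = compress_node(example_doc, example_map)
```

The str.upper compressor is used only to make the effect of valueCompressor visible in the output; a real compressor would return a more compact representation.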

3.4 Get Term Codec Map Algorithm

This algorithm takes a list of URL strings contextUrls and returns a CBOR-LD term codec map that maps JSON-LD terms to their associated byte values and value compression functions.

  1. Let result be an ordered map.
  2. For each value in contextUrls, dereference the JSON-LD context and process every entry:
    1. Set the entry key to the JSON-LD term key.
    2. Set the entry value to an unordered map with two entries:
      1. The first entry, value, is initially undefined.
      2. The second entry, valueCompressor, is set to compressor, where compressor is a known global compressor function associated with the @type property, a known local compressor function that was provided to this algorithm, or the generic CBOR compressor function, which returns the bytes associated with a typical CBOR encoding of the given datatype.
  3. Let sortedTerms be the list produced by sorting all of the keys in result.
  4. For every term in sortedTerms, set the associated termHint.value to the index of that term in sortedTerms.
  5. Return result.
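
A minimal sketch of the term numbering follows. To avoid network access, it assumes the contexts have already been dereferenced and are supplied as a dictionary keyed by URL; that loaded_contexts parameter, the type_compressors parameter, and the function names are assumptions for illustration. The context URL https://example.org/ctx/v1 is hypothetical. Sorting uses Python's default string ordering (by code point), which the text does not mandate.

```python
def generic_compressor(value):
    """Placeholder: defer to whatever the CBOR encoder does by default."""
    return value


def get_term_codec_map(context_urls, loaded_contexts, type_compressors=None):
    """Map each JSON-LD term to a numeric value and a compressor function."""
    type_compressors = type_compressors or {}
    result = {}
    for url in context_urls:
        context = loaded_contexts[url]["@context"]
        for term, definition in context.items():
            # Expanded term definitions are maps and may carry an @type.
            term_type = definition.get("@type") if isinstance(definition, dict) else None
            result[term] = {
                "value": None,  # assigned after sorting
                "valueCompressor": type_compressors.get(term_type, generic_compressor),
            }
    # Number the terms by their position in sorted order.
    for index, term in enumerate(sorted(result)):
        result[term]["value"] = index
    return result


XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"
example_contexts = {
    "https://example.org/ctx/v1": {
        "@context": {
            "name": "https://schema.org/name",
            "birthDate": {"@id": "https://schema.org/birthDate", "@type": XSD_DATE},
            "age": "https://example.org/vocab#age",
        }
    }
}
codec_map = get_term_codec_map(["https://example.org/ctx/v1"], example_contexts)
```

Because the numbering depends only on the sorted term list, an encoder and a decoder that load the same contexts derive the same values without exchanging the map itself.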

3.5 Get Context URLs Algorithm

  1. Let result be an ordered map.
  2. Walk the JSON tree. For each JSON name-value pair:
    1. If the name is @context:
      1. Add all values that are referenced by a URL to result, where the key in the map is set to the JSON value associated with @id.
    2. If a non-URL value is detected, throw an ERR_NON_URL_JSONLD_CONTEXT_DETECTED error.
  3. Return result.
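
The tree walk can be sketched as below. This is one reading of the steps, under stated assumptions: every @context value that is a string is treated as a URL and used as the map key, the map value records the order of first appearance (the text does not say what the map values hold), and any @context value that is not a string, such as an inline context object, raises the error. The exception class NonUrlContextError is a hypothetical stand-in for ERR_NON_URL_JSONLD_CONTEXT_DETECTED.

```python
class NonUrlContextError(ValueError):
    """Stands in for ERR_NON_URL_JSONLD_CONTEXT_DETECTED."""


def get_context_urls(node, result=None):
    """Collect every context URL referenced anywhere in a JSON-LD document."""
    if result is None:
        result = {}  # Python dicts preserve insertion order
    if isinstance(node, list):
        for item in node:
            get_context_urls(item, result)
    elif isinstance(node, dict):
        for name, value in node.items():
            if name == "@context":
                contexts = value if isinstance(value, list) else [value]
                for context in contexts:
                    if not isinstance(context, str):
                        raise NonUrlContextError(
                            "ERR_NON_URL_JSONLD_CONTEXT_DETECTED")
                    # Keep the position of the first appearance only.
                    result.setdefault(context, len(result))
            else:
                get_context_urls(value, result)
    return result


example_doc = {
    "@context": [
        "https://www.w3.org/2018/credentials/v1",
        "https://w3id.org/security/suites/ed25519-2020/v1",
    ],
    "credentialSubject": {
        "@context": "https://w3id.org/age/v1",
        "overAge": 21,
    },
}
context_urls = get_context_urls(example_doc)
```

Since NonUrlContextError subclasses ValueError, a caller can catch either one when deciding to fall back to the uncompressed encoding.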

A. Term Codec Registry

The following is a registry of well-known term codecs. These will be registered on a first-come, first-served basis.

Value | Context URL | Context Name
0x00 - 0x0F | RESERVED | Reserved for future use.
0x10 | https://www.w3.org/ns/activitystreams | ActivityStreams 2.0
0x11 | https://www.w3.org/2018/credentials/v1 | Verifiable Credentials Data Model v1
0x12 | https://www.w3.org/ns/did/v1 | Decentralized Identifiers (DID) Core Spec v1
0x13 | https://w3id.org/security/suites/ed25519-2018/v1 | Ed25519Signature2018 Suite
0x14 | https://w3id.org/security/suites/ed25519-2020/v1 | Ed25519Signature2020 Suite
0x15 | https://w3id.org/cit/v1 | Concealed Id Token
0x16 | https://w3id.org/age/v1 | Age Verification
0x17 | https://w3id.org/security/suites/x25519-2020/v1 | X25519KeyAgreementKey2020 Suite
0x18 | https://w3id.org/veres-one/v1 | Veres One DID Method
0x19 | https://w3id.org/webkms/v1 | WebKMS (Key Management System)
0x1A | https://w3id.org/zcap/v1 | Authorization Capabilities (zCap)
0x1B | https://w3id.org/security/suites/hmac-2019/v1 | Sha256HmacKey2019 Crypto Suite
0x1C | https://w3id.org/security/suites/aes-2019/v1 | AesKeyWrappingKey2019 Crypto Suite
0x1D | https://w3id.org/vaccination/v1 | Vaccination Certificate Vocabulary v0.1
0x1E | https://w3id.org/vc-revocation-list-2020/v1 | Verifiable Credentials Revocation List 2020
0x1F | https://w3id.org/dcc/v1 | DCC (Decentralized Credentials Consortium) Core Context
0x20 | https://w3id.org/vc/status-list/v1 | Verifiable Credentials Status List
0x21 - 0x2F | | Available for use.
0x30 | https://w3id.org/security/data-integrity/v1 | Data Integrity v1.0
0x31 | https://w3id.org/security/multikey/v1 | Multikey v1.0
0x32 | https://purl.imsglobal.org/spec/ob/v3p0/context.json | Reserved for future use. OpenBadges v3.0.0
0x33 | https://w3id.org/security/data-integrity/v2 | Data Integrity v2.0
0x34 | ecdsa-rdfc-2019 | Data Integrity ECDSA RDFC 2019 cryptosuite identifier
0x35 | ecdsa-sd-2023 | Data Integrity ECDSA-SD 2023 cryptosuite identifier
0x36 | eddsa-rdfc-2022 | Data Integrity EDDSA RDFC 2022 cryptosuite identifier

B. References

B.1 Normative references

[RFC2119]
Key words for use in RFCs to Indicate Requirement Levels. S. Bradner. IETF. March 1997. Best Current Practice. URL: https://www.rfc-editor.org/rfc/rfc2119
[RFC8174]
Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words. B. Leiba. IETF. May 2017. Best Current Practice. URL: https://www.rfc-editor.org/rfc/rfc8174