Copyright © 2012-2024 World Wide Web Consortium. W3C® liability, trademark and permissive document license rules apply.
The formats CSV [RFC4180] (comma separated values) and TSV [IANA-TSV] (tab separated values) provide simple, easy to process formats for the transmission of tabular data. They are supported as input data formats by many tools, particularly spreadsheets. This document describes their use for expressing SPARQL query results from SELECT queries.
This section describes the status of this document at the time of its publication. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.
This specification is published by the RDF-star Working Group as part of the update of specifications for format and errata.
This document was published by the RDF-star Working Group as an Editor's Draft.
Publication as an Editor's Draft does not imply endorsement by W3C and its Members.
This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress. Future updates to this specification may incorporate new features.
This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This document is governed by the 03 November 2023 W3C Process Document.
This document describes CSV and TSV formats for expressing the results of a SPARQL SELECT query. They provide lowest common denominator formats between systems using different implementation technologies.
Other formats for expression of SPARQL results are the SPARQL Results XML Format [ SPARQL12-RESULTS-XML ] and the SPARQL Results JSON Format [ SPARQL12-RESULTS-JSON ]. Each format is useful in different application scenarios.
The SPARQL Results CSV Format is a lossy encoding of a table of results. It does not encode all the details of each RDF term in the results; instead, it just gives a string without indicating the type of the term (IRI, Literal, Literal with datatype, Literal with language, or blank node). This makes it simple to consume data, such as text and numbers, in applications that don't need to understand the details of RDF. In some applications, guesses as to which elements are hyperlinks are made pragmatically, for example, guessing that strings starting with "http://" are links.
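The pragmatic guess described above could be sketched as follows. This is only an illustration: the function name `looks_like_link` and the choice of prefixes are assumptions, not part of the format.

```python
def looks_like_link(value: str) -> bool:
    """Heuristically guess whether a CSV field holds a hyperlink."""
    # The CSV format carries no term type, so this is only a guess:
    # a string literal "http://..." is indistinguishable from an IRI.
    return value.startswith(("http://", "https://"))
```

A consumer that needs certainty about term types would have to use the TSV, JSON, or XML formats instead.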
The SPARQL Results TSV Format does encode the details of RDF terms in the results table, by using the syntax that SPARQL [SPARQL12-QUERY] and Turtle [RDF12-TURTLE] use. An application receiving a TSV-encoded result set can split each line into elements of the result row, and extract all the details of the RDF terms it wishes to process by simple string processing, without a complete XML or JSON parser as may be required by the more complex SPARQL result formats.
When this document uses the words MUST, MUST NOT, SHOULD, SHOULD NOT, MAY, and RECOMMENDED, and the words appear as emphasized text, they must be interpreted as described in RFC 2119 [RFC2119].
The following artificial example is used to illustrate the features of serializing results in each format.
| x | literal | Comment (not part of the table) |
|---|---|---|
| <http://example/x> | String | An IRI and a string consisting of characters S-t-r-i-n-g |
| <http://example/x> | String-with-dquote" | String with a double quote in it. |
| _:b0 | Blank node | Blank node |
| | Missing 'x' | No RDF term for the x column |
| | | This row has no terms in it. |
| <http://example/x> | | No term in the literal column. |
| _:b1 | "String-with-lang"@en | An RDF literal with a language tag |
| _:b1 | "String-with-lang-dir"@en--ltr | An RDF literal with a directional language tag |
| _:b1 | 123 | An RDF literal, datatype xsd:integer, and lexical form 123. |
The following artificial example is used to illustrate the features of serializing results containing triple terms.
| x | triple | Comment (not part of the table) |
|---|---|---|
| "Alice" | <<( <http://example/alice> <http://example/knows> <http://example/bob> )>> | A plain string and triple term with subject IRI http://example/alice, predicate IRI http://example/knows, and object IRI http://example/bob. |
| "Bob" | <<( <http://example/bob> <http://example/knows> <http://example/alice> )>> | A plain string and triple term with subject IRI http://example/bob, predicate IRI http://example/knows, and object IRI http://example/alice. |
| "Carol" | <<( <http://example/carol> <http://example/says> "Hello world, my name is \"Alice\"." )>> | A plain string and triple term with subject IRI http://example/carol, predicate IRI http://example/says, and object literal Hello world, my name is "Alice". |
The SPARQL result formats described here conform to the formal specifications of the relevant formats, Comma Separated Values (CSV) [RFC4180] and Tab Separated Values (TSV) [IANA-TSV].
Systems providing these formats should note that the content types are text/csv for CSV and text/tab-separated-values for TSV. Being text/*, the default character set is US-ASCII. The charset parameter SHOULD be used in conjunction with SPARQL Results; UTF-8 is recommended, giving text/csv; charset=utf-8 and text/tab-separated-values; charset=utf-8.
The end-of-line in CSV is CRLF, i.e., Unicode codepoints 13 (U+000D) and 10 (U+000A). The end-of-line in TSV is EOL, i.e., Unicode codepoint 10 (U+000A).
Applications reading these formats are advised to cope with both CRLF and LF as end of line markers and not rely on conformance to the formal specifications.
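A tolerant reader for TSV results might split rows as in the sketch below. The function name `split_rows` is an assumption for illustration. The approach only suits TSV, where raw newlines never appear inside a field; CSV fields may contain quoted newlines, so CSV needs a quote-aware reader.

```python
import re


def split_rows(text: str) -> list:
    """Split a TSV result document into rows, accepting CRLF or LF."""
    rows = re.split(r"\r\n|\n", text)
    # A final end-of-line marker produces one trailing empty string;
    # drop only that one, so genuinely empty rows are preserved.
    if rows and rows[-1] == "":
        rows.pop()
    return rows
```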
In the SPARQL Results CSV Format, the results table is serialized as one line listing the variables in the results, using the CSV header line, followed by one line for each query solution. (Note: a line may end up split by newlines in the data). Values in the results are:
The first line of a SPARQL Results CSV Format response is the header line, giving the names of the variables used in the result set. The header line consists of the variable names, without leading question marks (?), separated by commas. While the text/csv format does not require a header row, the SPARQL Results CSV Format MUST use a header row. If the content type parameter header is used, it MUST be header=present.
The remaining rows are the values of the results, with each binding determined by the position in the row, corresponding to the entry in the header line.
If a variable is not bound, an empty field is used (e.g. ,,). Each row MUST have the same number of fields, with each field corresponding to a binding to the variable in the header line in the same field position.
The entry in each field is the string corresponding to the RDF term value (cf. SPARQL STR()), without syntax to denote what kind of term it is. The quoting rules of the CSV format must be used.
Blank nodes use the _:label form from Turtle and SPARQL. Use of the same label indicates the same blank node within the result set but has no significance outside the result set.
Triple terms are enclosed in <<( … )>>, as <<( subject predicate object )>>, where the subject, predicate, and object terms are recursively serialized. The serializations of subject, predicate, and object MUST be separated by a single space character (SPACE, code point 32, U+0020). The single space character preceding subject, and the single space character following object are optional, and are mainly valuable for human readability. If the object is a literal, then it MUST be encapsulated within a pair of quotation marks "".
Fields containing any of " (QUOTATION MARK, code point 34, U+0022 in Unicode [UNICODE]), , (COMMA, code point 44, U+002C), LF (code point 10, U+000A), or CR (code point 13, U+000D) MUST be quoted using the quoting mechanism of RFC4180 [RFC4180].
Fields are delimited by a pair of quotation marks " (code point U+0022). Within quoted strings, all characters except ", including new line characters, have their exact meaning; in particular, newlines do not end a CSV record. Inline " is written using a pair of quotation marks "". This quoting mechanism is applied recursively for terms in triple terms.
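The quoting rule above can be sketched as a small helper. The function name `csv_field` is an assumption for illustration, and the sketch covers plain field values only, not the recursive case for triple terms.

```python
def csv_field(value: str) -> str:
    """Quote a field with the RFC 4180 mechanism when it is required."""
    # Quoting is needed for quotation marks, commas, LF and CR.
    if any(ch in value for ch in '",\n\r'):
        # An inline quotation mark is written as a pair of quotation marks.
        return '"' + value.replace('"', '""') + '"'
    return value
```

For example, `csv_field('String-with-dquote"')` yields `"String-with-dquote"""`, matching the corresponding row of the CSV example in this document.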
Since literals in the object position of triple terms are encapsulated within a pair of quotation marks ", and the full triple term is always encapsulated within quotation marks as well, this requires the subsequent escaping of the "inner" quotation marks encapsulating the literal. Such cases will result in triple terms in the form of "<<( subject predicate ""object-literal"" )>>".
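The two levels of quoting can be illustrated with the following sketch. It assumes the subject and predicate are already plain IRI strings and handles only the case of a literal object; the function names `csv_quote` and `csv_triple_term_field` are hypothetical.

```python
def csv_quote(value: str) -> str:
    """Wrap a value in quotation marks, doubling any inline ones."""
    return '"' + value.replace('"', '""') + '"'


def csv_triple_term_field(subject: str, predicate: str, object_literal: str) -> str:
    """Serialize a triple term with a literal object as one CSV field."""
    # First level: the literal in the object position is quoted.
    inner = csv_quote(object_literal)
    term = "<<( " + subject + " " + predicate + " " + inner + " )>>"
    # Second level: quoting the whole field doubles the inner marks again.
    return csv_quote(term)
```

Applied to the "Carol" row of the example table, this produces the field with four consecutive quotation marks around Alice, because the marks inside the literal are doubled twice.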
The standard CSV format does not distinguish between missing values and empty strings. The SPARQL 1.2 Results CSV Format uses the same representation for unbound variables as for variables bound to an empty string literal. The other SPARQL Result formats (based on JSON, TSV, or XML) can be used if this distinction is required.
    x,literal
    http://example/x,String
    http://example/x,"String-with-dquote"""
    _:b0,Blank node
    ,Missing 'x'
    ,
    http://example/x,
    _:b1,String-with-lang
    _:b1,String-with-lang-dir
    _:b1,123
    x,triple
    "Alice",<<( http://example/alice http://example/knows http://example/bob )>>
    "Bob",<<( http://example/bob http://example/knows http://example/alice )>>
    "Carol","<<( http://example/carol http://example/says ""Hello world, my name is """"Alice""""."" )>>"
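A consumer could read results like the examples above with Python's standard `csv` module, as in this minimal sketch. The function name `read_csv_results` is an assumption; the sketch relies on the module's default dialect, which uses commas, quotation marks, and doubled quotation marks in the way RFC 4180 describes.

```python
import csv
import io


def read_csv_results(text: str) -> list:
    """Parse SPARQL CSV results into a list of {variable: string} dicts."""
    # newline="" leaves line endings untouched, so the csv module itself
    # handles CRLF, LF, and newlines inside quoted fields.
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader)
    return [dict(zip(header, row)) for row in reader]
```

An unbound variable and an empty string literal both come back as `""`, which reflects the lossy nature of the CSV format. One caveat: the `csv` module returns an empty list for a completely blank line, so a result set with a single variable and an unbound row would need extra handling.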
In the SPARQL Results TSV Format, the results table is serialized as one line listing the variables in the results, followed by one line for each query solution. All RDF terms used in the format are encoded in the format specified by Turtle [RDF12-TURTLE] except that the triple-quoted forms for the lexical part of literals MUST NOT be used. These forms would allow raw newlines and tabs, which serve as the record and field delimiters of the TSV format. A TSV format SPARQL result set must use the single quoted literal forms, together with any necessary escapes such as \t, \n, and \r.
The results table is serialized as one line listing the variables in the results, followed by one line for each query solution. This first line is required by the TSV format [ IANA-TSV ], unlike CSV, where it is optional.
Variables are serialized in SPARQL syntax, using a question mark ? character followed by the variable name.
Each row of the result set is serialized as a sequence of RDF terms in SPARQL syntax, separated by a tab (Horizontal Tab, Unicode code point U+0009) character. If a variable is not bound in a row, an empty field is used. Each row MUST have the same number of fields, corresponding to the variables listed in the first row.
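The row structure just described can be read with simple string processing, as in the following sketch. The function name `read_tsv_results` and the choice of `None` for unbound variables are assumptions for illustration; the terms are left in their serialized SPARQL syntax.

```python
def read_tsv_results(text: str) -> list:
    """Parse SPARQL TSV results into a list of {variable: term or None}."""
    # Accept both CRLF and LF line endings.
    lines = text.replace("\r\n", "\n").split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty string after the final end-of-line
    # Header: variables in SPARQL syntax, e.g. ?x; strip the question mark.
    variables = [v[1:] if v.startswith("?") else v for v in lines[0].split("\t")]
    rows = []
    for line in lines[1:]:
        fields = line.split("\t")
        if len(fields) != len(variables):
            raise ValueError("row does not match the number of variables")
        # An empty field means the variable is not bound in this row.
        rows.append({var: (f if f != "" else None) for var, f in zip(variables, fields)})
    return rows
```

Unlike the CSV case, an unbound variable (empty field) remains distinguishable from an empty string literal, which is serialized as `""`.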
The SPARQL Results TSV Format serializes RDF terms in the results table by using the syntax that SPARQL [ SPARQL12-QUERY ] and Turtle [ RDF12-TURTLE ] use.
IRIs are enclosed in <...>, literals are enclosed with double quotes "..." or single quotes '...' with optional @lang or ^^ for datatype.
IRIs are written enclosed in <...>. They MUST conform to the IRI rule of Internationalized Resource Identifiers (IRIs). Such IRIs include the IRI scheme and MUST NOT be a Relative Reference. This includes IRIs used as datatypes.
Literals are written with the lexical form in quotes. Tab, newline, and carriage return characters (Unicode code points U+0009 (tab), U+000A (line feed) and U+000D (carriage return)) are encoded in strings as \t, \n and \r respectively. The long string forms using triple quotes (""" or ''') MUST NOT be used. The abbreviated forms for numbers (XSD integers, decimals, and doubles) SHOULD be used.
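A literal serializer following these rules might look like the sketch below. The function name `tsv_literal` and its parameters are hypothetical, and the sketch always emits the double-quoted form; it does not produce the abbreviated numeric forms, which a real producer SHOULD use for XSD integers, decimals, and doubles.

```python
def tsv_literal(lexical: str, lang: str = "", datatype: str = "") -> str:
    """Serialize an RDF literal in the double-quoted Turtle form for TSV."""
    escaped = (
        lexical.replace("\\", "\\\\")  # escape backslashes first
        .replace('"', '\\"')           # then the delimiter itself
        .replace("\t", "\\t")          # raw tabs would split the field
        .replace("\n", "\\n")          # raw newlines would split the row
        .replace("\r", "\\r")
    )
    term = '"' + escaped + '"'
    if lang:
        return term + "@" + lang
    if datatype:
        # Datatype IRIs are absolute and enclosed in <...>.
        return term + "^^<" + datatype + ">"
    return term
```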
Blank nodes use the _:label form from Turtle and SPARQL. Use of the same label indicates the same blank node within the result set, but has no significance outside the result set.
Triple terms are enclosed in <<( … )>>, as in <<( subject predicate object )>>, where the subject, predicate, and object terms are recursively serialized. The serializations of subject, predicate, and object MUST be separated by a single space character (SPACE, code point 32, U+0020). The single space character preceding subject, and the single space character following object are optional, and are mainly valuable for human readability.
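The recursive serialization of triple terms can be sketched as follows. The representation is an assumption made for illustration: a term is either a string that is already serialized in SPARQL syntax, or a 3-tuple of terms standing for a triple term. The function name `tsv_term` is hypothetical.

```python
def tsv_term(term) -> str:
    """Serialize a term; 3-tuples are treated as triple terms."""
    if isinstance(term, tuple):
        subject, predicate, obj = term
        # Components are serialized recursively and separated by
        # single spaces; the spaces inside the brackets are optional.
        parts = [tsv_term(t) for t in (subject, predicate, obj)]
        return "<<( " + " ".join(parts) + " )>>"
    # Anything else is assumed to be already in SPARQL syntax.
    return term
```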
Writing <TAB> for a raw tab character (Unicode code point U+0009):
    ?x<TAB>?literal
    <http://example/x><TAB>"String"
    <http://example/x><TAB>"String-with-dquote\""
    _:blank0<TAB>"Blank node"
    <TAB>"Missing 'x'"
    <TAB>
    <http://example/x><TAB>
    _:blank1<TAB>"String-with-lang"@en
    _:blank1<TAB>"String-with-lang-dir"@en--ltr
    _:blank1<TAB>123
Writing <TAB> for a raw tab character (Unicode code point U+0009):
    ?x<TAB>?triple
    "Alice"<TAB><<( <http://example/alice> <http://example/knows> <http://example/bob> )>>
    "Bob"<TAB><<( <http://example/bob> <http://example/knows> <http://example/alice> )>>
    "Carol"<TAB><<( <http://example/carol> <http://example/says> "Hello world, my name is \"Alice\"." )>>
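Because TSV fields keep their SPARQL syntax, a consumer can tell the kinds of terms apart by looking at how a field begins. The sketch below shows one way to do this; the function name `classify_tsv_field` and the returned labels are assumptions, and fields that match none of the delimited forms are simply reported as abbreviated literals (for example the number 123).

```python
def classify_tsv_field(field: str) -> str:
    """Guess the kind of RDF term held in a TSV field from its syntax."""
    if field == "":
        return "unbound"
    # Check the triple term opener before the plain IRI opener,
    # since both begin with "<".
    if field.startswith("<<("):
        return "triple term"
    if field.startswith("<"):
        return "IRI"
    if field.startswith("_:"):
        return "blank node"
    if field.startswith(('"', "'")):
        return "literal"
    return "abbreviated literal"
```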
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
The key words MAY, MUST, MUST NOT, SHOULD, and SHOULD NOT in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
This section is non-normative.
TODO
TODO
TODO