Work with RDF-related concepts, datasets, and files in Go.
- Decode TriG, N-Quads, RDF/XML, JSON-LD, HTML, and other RDF-based sources.
- Reference IRI constants generated from vocabulary definitions.
- Track data lineage for RDF properties within source files.
- Build higher-level abstractions based on RDF primitives.
Import the module and refer to the code's documentation (pkg.go.dev).
```go
import "github.com/dpb587/rdfkit-go/rdf"
```
Some sample use cases and starter snippets can be found in the examples directory.
```
examples$ go run ./html-extract https://microsoft.com
@base <https://www.microsoft.com/en-us/> .
@prefix og: <http://ogp.me/ns#> .
@prefix schema: <http://schema.org/> .

</>
	a schema:WebSite ;
	schema:potentialAction [
		a schema:SearchAction ;
		schema:query-input "required name=search_term_string" ;
		schema:target [
			a schema:EntryPoint ;
			schema:urlTemplate "https://www.microsoft.com/en-us/search/explore?q={search_term_string}&ocid=AID_seo_sitelinks_search"
		]
	] ;
	schema:url </> .

<>
	<SHORTCUT> </favicon.ico?v2> ;
	<canonical>
		</en-us> ,
		</en-us> ;
	og:description "Explore Microsoft products and services and support for your home or business. Shop Microsoft 365, Copilot, Teams, Xbox, Windows, Azure, Surface and more."@en-US ;
	og:title "Microsoft – AI, Cloud, Productivity, Computing, Gaming & Apps"@en-US ;
	og:type "website"@en-US ;
	og:url "https://www.microsoft.com/en-us"@en-US .

_:b0
	a schema:Organization ;
	<logo> <https://uhf.microsoft.com/images/microsoft/RE1Mu3b.png> ;
	<name> "Microsoft" ;
	<url> <https://www.microsoft.com> .
```
The Resource Description Framework (RDF) defines three primitive value types, also called terms, which are used to represent data: IRIs, literals, and blank nodes. These terms are the basis of triples and other assertions about information.
An IRI (Internationalized Resource Identifier) records a URL-like identity as a string value.
```go
resourceIRI := rdf.IRI("http://example.com/resource")
```
The iriutil package provides additional support for mapping IRIs from prefixes and CURIEs. Some well-known IRIs are defined in subpackages such as rdfiri and xsdiri; see Ontologies for more details.
A literal records a more traditional data value, such as a boolean or a string. It must include both a datatype (an IRI) and a string-encoded lexical form. The lexical form should always follow the datatype's rules for what a valid form looks like.
```go
trueLiteral := rdf.Literal{
	Datatype:    xsdiri.Boolean_Datatype,
	LexicalForm: "true",
}
```
Literals can be tedious to work with, so some well-known datatypes have factory-style functions. See Ontologies for these utilities and other methods that use Go primitives.
A blank node represents an anonymous resource and is always created with a unique, internal identifier. Two blank nodes are equivalent if and only if they have the same identifier.
```go
bnode := rdf.NewBlankNode()
```
The blanknodeutil package provides additional support for using string-based identifiers (e.g. b0), mapping blank nodes from implementations, and scoped factories.
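To make the identity rule concrete, here is a minimal stand-alone sketch. The BlankNode, newBlankNode, and labelScope names are hypothetical and only model the semantics described above (unique internal identifiers, and label-to-node mapping within a scope); they are not the rdf or blanknodeutil APIs.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// BlankNode is a hypothetical stand-in for an anonymous resource; it is
// not the module's blank node type.
type BlankNode struct {
	id uint64 // internal identifier, never reused by newBlankNode
}

// blankNodeCounter gives every newBlankNode call a fresh identifier.
var blankNodeCounter atomic.Uint64

// newBlankNode returns a blank node with a unique internal identifier.
func newBlankNode() BlankNode {
	return BlankNode{id: blankNodeCounter.Add(1)}
}

// sameBlankNode reports whether two blank nodes are the same resource,
// which is true only when their identifiers match.
func sameBlankNode(a, b BlankNode) bool {
	return a.id == b.id
}

// labelScope maps document-local labels such as "b0" to blank nodes: the
// same label within one scope resolves to the same node, while another
// scope creates its own node for that label.
type labelScope struct {
	nodes map[string]BlankNode
}

func newLabelScope() *labelScope {
	return &labelScope{nodes: map[string]BlankNode{}}
}

func (s *labelScope) get(label string) BlankNode {
	if n, ok := s.nodes[label]; ok {
		return n
	}
	n := newBlankNode()
	s.nodes[label] = n
	return n
}

func main() {
	a, b := newBlankNode(), newBlankNode()
	fmt.Println(sameBlankNode(a, b)) // false
	fmt.Println(sameBlankNode(a, a)) // true

	doc1, doc2 := newLabelScope(), newLabelScope()
	fmt.Println(sameBlankNode(doc1.get("b0"), doc1.get("b0"))) // true
	fmt.Println(sameBlankNode(doc1.get("b0"), doc2.get("b0"))) // false
}
```

Scoping labels this way is why two files that both use _:b0 do not accidentally describe the same resource when merged.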
A triple describes a statement about the world. Within a triple, a subject is said to have a relationship, the predicate, with an object.
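```go
nameTriple := rdf.Triple{
	// blank nodes are created with a unique internal identifier; see blanknodeutil for string-based labels
	Subject:   rdf.NewBlankNode(),
	Predicate: schemairi.Name_Property,
	Object:    xsdliteral.NewString("Web Vocab"),
}
```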
The fields of a triple are restricted to the term types they support, as described in the table below. Triple terms (using a triple as the object of another triple, as introduced in RDF 1.2) are not yet supported.
Field | IRI | Literal | Blank Node |
---|---|---|---|
Subject | Valid | Invalid | Valid |
Predicate | Valid | Invalid | Invalid |
Object | Valid | Valid | Valid |
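The position rules in the table can be expressed as a small check. This is a sketch using a hypothetical TermKind enumeration instead of the module's term types.

```go
package main

import (
	"errors"
	"fmt"
)

// TermKind is a hypothetical classification of RDF terms used only in
// this sketch; the module models terms with its own types.
type TermKind int

const (
	KindIRI TermKind = iota
	KindLiteral
	KindBlankNode
)

// Triple records only the kind of each position, which is all the
// position rules need.
type Triple struct {
	Subject, Predicate, Object TermKind
}

// validateTriple applies the table: subjects may be IRIs or blank nodes,
// predicates must be IRIs, and objects may be any of the three kinds.
func validateTriple(t Triple) error {
	if t.Subject == KindLiteral {
		return errors.New("subject must be an IRI or blank node")
	}
	if t.Predicate != KindIRI {
		return errors.New("predicate must be an IRI")
	}
	return nil // every kind is valid as an object
}

func main() {
	fmt.Println(validateTriple(Triple{KindBlankNode, KindIRI, KindLiteral})) // <nil>
	fmt.Println(validateTriple(Triple{KindLiteral, KindIRI, KindIRI}))       // subject must be an IRI or blank node
	fmt.Println(validateTriple(Triple{KindIRI, KindBlankNode, KindIRI}))     // predicate must be an IRI
}
```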
A graph is a set of triples that collectively describe the state of a world. The rdfio.Graph* interfaces support basic operations, such as working with triples.
```go
err := storage.PutTriple(ctx, nameTriple)
```
A dataset is a collection of graphs, each of which is a set of triples. By convention, when a graph-agnostic function such as PutTriple is invoked and the underlying storage is a dataset, it is executed against the default graph. Assuming storage is a dataset, the following is equivalent to the previous example.
```go
err := storage.PutGraphTriple(ctx, rdf.DefaultGraph, nameTriple)
```
Whether to use a dataset, a graph, or the graphs within a dataset is very application-specific. In this module, separate interfaces are defined for datasets and graphs, but they can be used interchangeably for some use cases. For a broader discussion of the semantics and logical considerations of datasets, review this W3C Note.
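A minimal sketch of the default-graph convention, assuming a toy dataset that maps graph names to sets of triples and uses the empty string as a hypothetical key for the default graph. It is not the inmemory implementation and uses plain strings for terms.

```go
package main

import "fmt"

// Triple uses plain strings for terms to keep the sketch short; it is not
// the module's rdf.Triple type.
type Triple struct{ S, P, O string }

// defaultGraph is a hypothetical key for the unnamed default graph.
const defaultGraph = ""

// dataset maps a graph name to the set of triples in that graph.
type dataset struct {
	graphs map[string]map[Triple]struct{}
}

func newDataset() *dataset {
	return &dataset{graphs: map[string]map[Triple]struct{}{}}
}

// PutGraphTriple adds a triple to the named graph, creating it on demand.
func (d *dataset) PutGraphTriple(graph string, t Triple) {
	g, ok := d.graphs[graph]
	if !ok {
		g = map[Triple]struct{}{}
		d.graphs[graph] = g
	}
	g[t] = struct{}{} // a graph is a set, so duplicates collapse
}

// PutTriple is graph-agnostic and, following the convention above,
// targets the default graph.
func (d *dataset) PutTriple(t Triple) {
	d.PutGraphTriple(defaultGraph, t)
}

// count returns the number of triples in one graph.
func (d *dataset) count(graph string) int {
	return len(d.graphs[graph])
}

func main() {
	d := newDataset()
	t := Triple{"_:b0", "http://schema.org/name", `"Web Vocab"`}
	d.PutTriple(t)
	d.PutGraphTriple(defaultGraph, t) // same triple, same graph: no change
	d.PutGraphTriple("http://example.com/g1", t)
	fmt.Println(d.count(defaultGraph), d.count("http://example.com/g1")) // 1 1
}
```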
A statement is the representation of a triple within a graph, and it is described by the rdfio.Statement interface.
```go
iter := storage.NewStatementIterator(ctx)
defer iter.Close()

for iter.Next() {
	statement := iter.GetStatement()
	statementTriple := statement.GetTriple()

	fmt.Fprintf(os.Stderr, "%v\t%v\t%v\n", statementTriple.Subject, statementTriple.Predicate, statementTriple.Object)
}

if err := iter.Err(); err != nil {
	panic(err)
}
```
Because rdfio.Statement is an interface, storage implementations may offer additional capabilities for statements.
A node is the representation of a resource (i.e. a blank node or IRI) within a graph. Similar to statements, implementations of the rdfio.Node interface may offer additional capabilities.
```go
iter := storage.NewNodeIterator(ctx)
defer iter.Close()

for iter.Next() {
	node := iter.GetNode()

	fmt.Fprintf(os.Stderr, "%v\n", node.GetTerm())
}

if err := iter.Err(); err != nil {
	panic(err)
}
```
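Both loops above follow the same Next/Get/Err/Close shape. A self-contained sketch of that contract, using a hypothetical slice-backed iterator, shows why Err must be checked once Next returns false.

```go
package main

import (
	"errors"
	"fmt"
)

// sliceIterator is a hypothetical iterator with the Next/Get/Err/Close
// shape used by the storage iterators above.
type sliceIterator struct {
	items   []string
	failAt  int // index at which to simulate a read error; -1 disables
	pos     int // index of the next item to read
	current string
	err     error
	closed  bool
}

// Next advances to the next item, returning false when the items are
// exhausted, the iterator is closed, or an error occurred.
func (it *sliceIterator) Next() bool {
	if it.closed || it.err != nil || it.pos >= len(it.items) {
		return false
	}
	if it.pos == it.failAt {
		it.err = errors.New("simulated read failure")
		return false
	}
	it.current = it.items[it.pos]
	it.pos++
	return true
}

func (it *sliceIterator) Get() string { return it.current }
func (it *sliceIterator) Err() error  { return it.err }

func (it *sliceIterator) Close() error {
	it.closed = true
	return nil
}

// collect drains an iterator and returns the items read before any error.
func collect(it *sliceIterator) ([]string, error) {
	defer it.Close()
	var out []string
	for it.Next() {
		out = append(out, it.Get())
	}
	// Next returning false means either exhaustion or failure, so the
	// error has to be checked separately after the loop.
	return out, it.Err()
}

func main() {
	items, err := collect(&sliceIterator{items: []string{"a", "b", "c"}, failAt: -1})
	fmt.Println(items, err) // [a b c] <nil>

	items, err = collect(&sliceIterator{items: []string{"a", "b", "c"}, failAt: 2})
	fmt.Println(items, err) // [a b] simulated read failure
}
```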
The experimental inmemory package currently offers a single in-memory dataset implementation, which may be useful for small collections and labeled property graph conventions.
```go
storage := inmemory.NewDataset()
```
Better-supported storage, or clients for alternative remote services, will likely be a focus in the future.
An encoding (or file format) is used to decode and encode RDF data. The following encodings are available under the encoding package.
Package | Decode | Encode |
---|---|---|
htmljsonld | Dataset | n/a |
htmlmicrodata | Graph | n/a |
jsonld | Dataset | n/a |
nquads | Dataset | Triple, Quad |
ntriples | Graph | Triple |
rdfa | Graph | n/a |
rdfjson | Graph | Triple |
rdfxml | Graph | n/a |
trig | Dataset | n/a |
turtle | Graph | Triple, Description |
Some encodings do not yet support all syntactic features defined by their official specifications, though they should cover common practices. Most are tested against a conformance test suite (such as those published by the W3C), and the latest results can be found in their testsuites/*/RESULTS.md files.
Broader support for encoders will likely be added in the future.
Encodings provide a NewDecoder function, which requires an io.Reader and accepts optional DecoderConfig options. The decoder can be used as an iterator over all statements found in the input. Depending on the capabilities of the encoding format, the decoder fulfills either the encoding.DatasetDecoder or encoding.GraphDecoder interface.
```go
decoder := nquads.NewDecoder(os.Stdin)
defer decoder.Close()

for decoder.Next() {
	statement := decoder.GetStatement()
	triple := statement.GetTriple()

	fmt.Fprintf(os.Stdout, "[%v] %v\t%v\t%v\n", statement.GetGraphName(), triple.Subject, triple.Predicate, triple.Object)
}

if err := decoder.Err(); err != nil {
	panic(err)
}
```
Most decoders are stream processors, so valid statements may be produced before a syntax error is encountered. When a syntax error occurs, the error includes the byte offset of the occurrence (and the text offset, when enabled).
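The streaming behavior can be sketched with a hypothetical, heavily simplified line-oriented format (whitespace-free terms followed by a final "."), which is not real N-Quads: valid lines are emitted as soon as they are read, and the first malformed line stops decoding with its byte offset. The decodeLines, countStatements, and SyntaxError names are illustrative only.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// SyntaxError records the byte offset of the line that failed to parse.
type SyntaxError struct {
	Offset int64
	Msg    string
}

func (e *SyntaxError) Error() string {
	return fmt.Sprintf("syntax error at byte %d: %s", e.Offset, e.Msg)
}

// decodeLines is a hypothetical line-oriented decoder: every non-blank line
// must hold at least three whitespace-free terms followed by ".". Each
// valid line is passed to emit as soon as it is read, so statements that
// precede a syntax error are still delivered.
func decodeLines(r io.Reader, emit func(terms []string)) error {
	br := bufio.NewReader(r)
	var offset int64 // byte offset of the current line's first byte
	for {
		line, err := br.ReadString('\n')
		if fields := strings.Fields(line); len(fields) > 0 {
			last := len(fields) - 1
			if last < 3 || fields[last] != "." {
				return &SyntaxError{Offset: offset, Msg: `expected at least three terms and a final "."`}
			}
			emit(fields[:last])
		}
		offset += int64(len(line))
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
	}
}

// countStatements decodes a string and reports how many statements were
// emitted before decoding finished or failed.
func countStatements(input string) (int, error) {
	n := 0
	err := decodeLines(strings.NewReader(input), func([]string) { n++ })
	return n, err
}

func main() {
	input := "<a> <b> <c> .\n<d> <e> <f> .\n<g> <h>\n"
	n, err := countStatements(input)
	fmt.Println(n, err)
}
```

Here the third line is missing its object and final ".", so two statements are emitted before the error reports byte 28, the start of that line.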
Most decoders can capture the exact byte and line+column offsets from which a statement's graph name, subject, predicate, and object were decoded. To include this metadata in the decoded statements, enable CaptureTextOffsets in the decoder's options. A map of property offsets can then be accessed through the encoding.DecoderTextOffsetsStatement interface.
```go
// use a checked type assertion, since not every statement is guaranteed to carry offsets
if offsetsStatement, ok := statement.(encoding.DecoderTextOffsetsStatement); ok {
	for propertyType, propertyOffsets := range offsetsStatement.GetDecoderTextOffsets() {
		fmt.Fprintf(
			os.Stderr,
			"> found %s from L%dC%d (byte %d) until %s (byte %d)\n",
			encoding.StatementOffsetsTypeName(propertyType),
			propertyOffsets.From.LineColumn[0],
			propertyOffsets.From.LineColumn[1],
			propertyOffsets.From.Byte,
			// same as L%dC%d
			propertyOffsets.Until.LineColumn.TextOffsetRangeString(),
			propertyOffsets.Until.Byte,
		)
	}
}
```
When working with offsets, consider the following caveats.
- Capturing and processing text offsets comes with a slight cost in performance and memory.
- Offsets for some properties may not always be available due to decoding limitations. For example, jsonld does not always capture a predicate's offset, although this will likely be fixed in the future.
- Offsets for some properties may be "incomplete" due to stream processing. For example, turtle may only refer to the opening [ token of an anonymous resource when the closing ] token has not yet been read.
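Line and column offsets can be derived from byte offsets. A minimal sketch of that conversion follows, with a hypothetical lineColumn helper; it assumes 1-based lines and columns counted in Unicode code points, which may not match how this module counts columns.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// lineColumn converts a byte offset into 1-based line and column numbers,
// counting columns in runes (Unicode code points) rather than bytes.
func lineColumn(src []byte, offset int) (line, col int) {
	line, col = 1, 1
	for i := 0; i < offset && i < len(src); {
		r, size := utf8.DecodeRune(src[i:])
		if r == '\n' {
			line++
			col = 1
		} else {
			col++
		}
		i += size
	}
	return line, col
}

func main() {
	// "é" takes two bytes but one column
	src := []byte("<s> <p> \"é\" .\n<s> <p> <o> .\n")
	fmt.Println(lineColumn(src, 13)) // 1 13 (a byte-based column would be 14)
	fmt.Println(lineColumn(src, 15)) // 2 1
}
```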
A few encodings similarly provide a NewEncoder function, which requires an io.Writer and accepts EncoderConfig options.
```go
encoder := nquads.NewEncoder(os.Stdout)
defer encoder.Close()

for _, triple := range tripleList {
	if err := encoder.PutGraphTriple(ctx, rdf.DefaultGraph, triple); err != nil {
		return err
	}
}
```
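For a sense of what an N-Quads encoder writes, here is a sketch that formats a single statement. It is not the nquads encoder: the Term type and formatQuad helper are hypothetical, only IRIs and plain string literals are handled, and IRIs are assumed to need no escaping. Statements in the default graph are written without a graph label, as N-Quads allows.

```go
package main

import (
	"fmt"
	"strings"
)

// Term is a hypothetical minimal term: either an IRI or a plain string
// literal (written without a datatype in N-Quads).
type Term struct {
	IRI     string
	Literal string
	IsLit   bool
}

// literalEscaper escapes the characters that must not appear raw inside a
// double-quoted N-Quads string literal.
var literalEscaper = strings.NewReplacer(
	`\`, `\\`,
	`"`, `\"`,
	"\n", `\n`,
	"\r", `\r`,
)

func formatTerm(t Term) string {
	if t.IsLit {
		return `"` + literalEscaper.Replace(t.Literal) + `"`
	}
	return "<" + t.IRI + ">" // assumes the IRI needs no escaping
}

// formatQuad writes one N-Quads line; an empty graph name stands for the
// default graph, so no graph label is written.
func formatQuad(s, p, o Term, graph string) string {
	parts := []string{formatTerm(s), formatTerm(p), formatTerm(o)}
	if graph != "" {
		parts = append(parts, "<"+graph+">")
	}
	return strings.Join(parts, " ") + " .\n"
}

func main() {
	s := Term{IRI: "http://example.com/s"}
	p := Term{IRI: "http://schema.org/name"}
	o := Term{Literal: "Say \"hi\"\n", IsLit: true}
	fmt.Print(formatQuad(s, p, o, ""))
	fmt.Print(formatQuad(s, p, o, "http://example.com/g"))
}
```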
The rdfdescription package offers an alternative method for describing a resource with statements.
```go
resource := rdfdescription.SubjectResource{
	Subject: rdf.IRI("http://example.com/product"),
	Statements: rdfdescription.StatementList{
		rdfdescription.ObjectStatement{
			Predicate: rdfiri.Type_Property,
			Object:    schemairi.Product_Thing,
		},
		rdfdescription.AnonResourceStatement{
			// schema.org defines the property as schema:offers
			Predicate: schemairi.Offers_Property,
			AnonResource: rdfdescription.AnonResource{
				Statements: rdfdescription.StatementList{
					rdfdescription.ObjectStatement{
						Predicate: rdfiri.Type_Property,
						Object:    schemairi.Offer_Thing,
					},
					rdfdescription.ObjectStatement{
						Predicate: schemairi.Price_Property,
						Object:    schemaliteral.NewNumber(55.00),
					},
					rdfdescription.ObjectStatement{
						Predicate: schemairi.PriceCurrency_Property,
						Object:    schemaliteral.NewText("USD"),
					},
				},
			},
		},
	},
}
```
A description can be converted to triples by calling its AsTriples method.

```go
err := rdfioutil.GraphPutTriples(ctx, storage, resource.AsTriples())
```
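Conceptually, converting a description to triples flattens nested descriptions by giving each anonymous resource a blank node. The following sketch models that idea with hypothetical Statement and Triple types and string terms; it is not the rdfdescription implementation.

```go
package main

import "fmt"

// Statement is a hypothetical predicate with either a simple object or a
// nested anonymous resource (Anon non-nil), but not both.
type Statement struct {
	Predicate string
	Object    string
	Anon      []Statement
}

type Triple struct{ S, P, O string }

// asTriples flattens statements about subject into triples, giving each
// anonymous resource a fresh blank node label from next.
func asTriples(subject string, stmts []Statement, next func() string) []Triple {
	var out []Triple
	for _, st := range stmts {
		if st.Anon != nil {
			bnode := next()
			out = append(out, Triple{subject, st.Predicate, bnode})
			out = append(out, asTriples(bnode, st.Anon, next)...)
			continue
		}
		out = append(out, Triple{subject, st.Predicate, st.Object})
	}
	return out
}

// blankNodeLabels returns a generator of labels _:b0, _:b1, and so on.
func blankNodeLabels() func() string {
	n := 0
	return func() string {
		label := fmt.Sprintf("_:b%d", n)
		n++
		return label
	}
}

func main() {
	product := []Statement{
		{Predicate: "rdf:type", Object: "schema:Product"},
		{Predicate: "schema:offers", Anon: []Statement{
			{Predicate: "rdf:type", Object: "schema:Offer"},
			{Predicate: "schema:price", Object: "55"},
			{Predicate: "schema:priceCurrency", Object: `"USD"`},
		}},
	}
	for _, t := range asTriples("<http://example.com/product>", product, blankNodeLabels()) {
		fmt.Println(t.S, t.P, t.O)
	}
}
```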
Some encodings that support syntax for structured statements, such as Turtle, implement the rdfdescriptionio.DatasetEncoder or rdfdescriptionio.GraphEncoder interface.

```go
err := resourceEncoder.PutResource(ctx, resource)
```
An ontology (or vocabulary) offers domain-specific conventions for working with data. Several well-known ontologies are available within the ontology package, offering IRI constants, helpers for literals, and other data utilities.
- owl - owliri
- rdf - rdfiri, rdfliteral, and rdfvalue
- rdfa - rdfairi
- rdfs - rdfsiri
- schema - schemairi, schemaliteral, and other utilities
- xsd - xsdiri, xsdliteral, xsdvalue, and other utilities
To help maintain consistency, the following practices are used for naming and implementation.
- The {prefix} should be based on the RDFa Core Initial Context, vann:preferredNamespacePrefix, or a similarly-defined term.
- {prefix}iri package - constants for resource IRIs defined in the vocabulary. The irigen command generates most of these.
  - const Base rdf.IRI - the preferred base IRI. For example, http://www.w3.org/1999/02/22-rdf-syntax-ns# for rdfiri.
  - const {Name}_{Type} rdf.IRI - for example, the statement rdf:type a rdf:Property becomes the constant rdfiri.Type_Property with a value of Base + "type". If a resource is defined with multiple types, the first type listed in the vocabulary should be used.
- {prefix}literal package - utility functions for working with literal datatypes, such as the following.
  - func New{Datatype}(...) rdf.Literal - given any necessary parameters as Go types, returns a valid rdf.Literal value.
  - func Map{Datatype}(v string) (literalutil.CustomValue, error) - maps the lexical form of a literal value into a Go-native type.
- {prefix}value package - Go-native types which represent a literal datatype.
  - type {Datatype} {any} - any builtin type that can natively represent the datatype and satisfies the literalutil.CustomValue interface.
  - func Map{Datatype}(v string) ({Datatype}, error) - same as {prefix}literal.Map{Datatype}, but returns the custom, concrete type.
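As an illustration of the {Name}_{Type} convention, here is a hand-written sketch with a local IRI type standing in for rdf.IRI and a hypothetical constantName helper. The irigen command's actual rules (for example, for unusual local names) may differ.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// IRI stands in for rdf.IRI so that this sketch compiles on its own.
type IRI string

// Hand-written constants following the {prefix}iri conventions; in the
// module, constants like these are generated by irigen.
const (
	Base          IRI = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
	Type_Property IRI = Base + "type"
)

// constantName is a hypothetical helper applying the {Name}_{Type} rule:
// upper-case the first rune of the local name, then append "_" and the
// local name of the resource's first listed type.
func constantName(localName, firstType string) string {
	if localName == "" {
		return "_" + firstType
	}
	r, size := utf8.DecodeRuneInString(localName)
	return string(unicode.ToUpper(r)) + localName[size:] + "_" + firstType
}

func main() {
	fmt.Println(constantName("type", "Property")) // Type_Property
	fmt.Println(Type_Property)                    // http://www.w3.org/1999/02/22-rdf-syntax-ns#type
}
```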
Mapping functions decode a lexical form into a Go-native type which represents the datatype, or return an error for invalid input.
```go
trueValue, err := xsdvalue.MapBoolean(trueLiteral.LexicalForm)
if err != nil {
	panic(err)
}

// Each of the following holds:
//   trueValue == xsdvalue.Boolean(true)
//   bool(trueValue) == true
//   trueValue.AsLiteralTerm() == trueLiteral
```
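For comparison, a stand-alone sketch of an xsd:boolean mapping function. XML Schema defines exactly four lexical forms for xsd:boolean (true, false, 1, and 0), with true and false as the canonical forms. The Boolean type and mapBoolean helper here are hypothetical, not the xsdvalue API.

```go
package main

import "fmt"

// Boolean is a hypothetical Go-native type for xsd:boolean; it mirrors the
// idea of xsdvalue.Boolean but is not that type.
type Boolean bool

// mapBoolean maps an xsd:boolean lexical form to a Go value, rejecting
// anything outside the four valid lexical forms.
func mapBoolean(lexical string) (Boolean, error) {
	switch lexical {
	case "true", "1":
		return true, nil
	case "false", "0":
		return false, nil
	}
	return false, fmt.Errorf("invalid xsd:boolean lexical form: %q", lexical)
}

// canonical returns the canonical lexical form, so "1" maps back to "true".
func (b Boolean) canonical() string {
	if b {
		return "true"
	}
	return "false"
}

func main() {
	for _, lexical := range []string{"true", "1", "0", "yes"} {
		v, err := mapBoolean(lexical)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Println(lexical, "->", bool(v), "canonical:", v.canonical())
	}
}
```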
A common practice with IRIs is to define prefixes, which can be used to expand and compact IRIs. These prefixes are often used in encoding formats, such as the @prefix directives in the Turtle output above.
```go
prefixes := iriutil.NewPrefixMap(
	iriutil.PrefixMapping{
		Prefix:   "ex",
		Expanded: "http://example.com/",
	},
)

rIRI, ok := prefixes.ExpandPrefix("ex", "resource")
// ok && rIRI == rdf.IRI("http://example.com/resource")

_, ok = prefixes.ExpandPrefix("unknown", "resource")
// !ok

rNS, rLocalName, ok := prefixes.CompactPrefix(rdf.IRI("http://example.com/resource"))
// ok && rNS == "http://example.com/" && rLocalName == "resource"

_, _, ok = prefixes.CompactPrefix(rdf.IRI("https://example.com/secure"))
// !ok, since https://example.com/ does not match the http://example.com/ namespace
```
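A minimal sketch of expanding and compacting with a prefix table, using a hypothetical prefixMap type rather than iriutil. Compaction picks the longest matching namespace so that a more specific prefix wins; unlike CompactPrefix above, this sketch returns the prefix label rather than the namespace.

```go
package main

import (
	"fmt"
	"strings"
)

// prefixMap is a hypothetical prefix table: prefix label -> namespace IRI.
type prefixMap map[string]string

// expand joins a known prefix's namespace with a local name.
func (m prefixMap) expand(prefix, local string) (string, bool) {
	ns, ok := m[prefix]
	if !ok {
		return "", false
	}
	return ns + local, true
}

// compact finds the namespace that starts the IRI and returns its prefix
// label and the remaining local name.
func (m prefixMap) compact(iri string) (prefix, local string, ok bool) {
	best := -1 // length of the best namespace match so far
	for p, ns := range m {
		if !strings.HasPrefix(iri, ns) {
			continue
		}
		// Prefer the longest namespace; break ties by prefix label so the
		// result does not depend on map iteration order.
		if len(ns) > best || (len(ns) == best && p < prefix) {
			prefix, local, best = p, iri[len(ns):], len(ns)
		}
	}
	return prefix, local, best >= 0
}

func main() {
	m := prefixMap{
		"ex":  "http://example.com/",
		"exv": "http://example.com/vocab#",
	}
	fmt.Println(m.expand("ex", "resource"))                 // http://example.com/resource true
	fmt.Println(m.compact("http://example.com/vocab#name")) // exv name true
	_, _, ok := m.compact("https://example.com/secure")
	fmt.Println(ok) // false
}
```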
The rdfacontext package provides the prefix mappings defined by the W3C in the RDFa Core Initial Context. This includes prefixes such as owl:, rdfa:, and xsd:. The list of widely-used prefixes is included as well, with prefixes such as dc: and schema:.
The curie package provides several functions for working with CURIEs, based on the W3C CURIE Syntax specification.
```go
rCURIE, ok := curie.Parse("[ex:resource]")
// ok && rCURIE.Safe && rCURIE.Prefix == "ex" && rCURIE.Reference == "resource"

mappings := curie.MappingScope{
	Prefixes: prefixes,
}

curieIRI, ok := mappings.ExpandCURIE(rCURIE)
// ok && curieIRI == "http://example.com/resource"
```
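A sketch of parsing and expanding CURIEs with hypothetical parseCURIE and expandCURIE helpers, not the curie package. The bracketed safe form exists because an unbracketed value such as http://example.com is ambiguous: it also parses as the prefix http with the reference //example.com.

```go
package main

import (
	"fmt"
	"strings"
)

// CURIE is a hypothetical parse result, loosely modeled on the fields used
// in the snippet above (Safe, Prefix, Reference).
type CURIE struct {
	Safe      bool
	Prefix    string
	Reference string
}

// parseCURIE accepts "prefix:reference" or the safe form
// "[prefix:reference]". It does not validate that the prefix is an NCName.
func parseCURIE(s string) (CURIE, bool) {
	var c CURIE
	if len(s) >= 2 && strings.HasPrefix(s, "[") && strings.HasSuffix(s, "]") {
		c.Safe = true
		s = s[1 : len(s)-1]
	}
	prefix, reference, found := strings.Cut(s, ":")
	if !found {
		return CURIE{}, false // this sketch requires a ":" separator
	}
	c.Prefix, c.Reference = prefix, reference
	return c, true
}

// expandCURIE resolves the CURIE's prefix against a prefix table.
func expandCURIE(c CURIE, prefixes map[string]string) (string, bool) {
	ns, ok := prefixes[c.Prefix]
	if !ok {
		return "", false
	}
	return ns + c.Reference, true
}

func main() {
	prefixes := map[string]string{"ex": "http://example.com/"}
	c, ok := parseCURIE("[ex:resource]")
	fmt.Println(c.Safe, c.Prefix, c.Reference, ok) // true ex resource true
	fmt.Println(expandCURIE(c, prefixes))          // http://example.com/resource true
}
```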
The cmd/rdfkit package offers a command-line interface for some development utilities. Most notably:
- irigen - generate Go constants from an RDF vocabulary. Used for most of the *iri packages.
- pipe - decode local files or remote URLs, and then re-encode using any of the supported RDF formats.