-
Notifications
You must be signed in to change notification settings - Fork 0
SPARQL
SPARQL is a query language for RDF.
- We have a copy of the Learning Sparql book. Reach out to the team and someone will let you borrow it.
- SPARQL in 11 minutes is a youtube video with a quick walkthrough of the language. Its examples can be downloaded from the Learning Sparql source code section.
- Wikidata has a SPARQL endpoint and a list of example queries.
- The SPARQL 1.1 Query Language has a comprehensive description of the language which is helpful reference once you've gotten some experience.
These resources will cover much more than we do here.
The examples in this wiki are included as files which can be retrieved by cloning the wiki's repo.
git clone https://github.com/GSS-Cogs/knowledge.wiki.git
rdflib
is a python library for interacting with RDF. It has functionality for running SPARQL queries against a datasource.
You'll have to have rdflib
and pandas
installed:
pip3 install pandas
pip3 install rdflib
The g.parse(...)
command can be used to load RDF from a local file (by passing a filepath) or a URL. Once the RDF has been loaded, the g.query(...)
command can be used to execute a SPARQL query and store the results. Those results can then be convered to a pandas
dataframe.
#%%
import pandas as pd
import rdflib
g = rdflib.Graph()
g.parse("http://www.learningsparql.com/2ndeditionexamples/ex002.ttl")
results = g.query("""
# filename: ex003.rq
PREFIX ab: <http://learningsparql.com/ns/addressbook#>
SELECT ?craigEmail
WHERE
{ ab:craig ab:email ?craigEmail . }
""")
pd.DataFrame(results.bindings)
You can run this interactively in a notebook or using the VS Code python interactive window.
Apache Jena has a command line untility, arq
, which can be used to perform SPARQL queries. We can specify local turtle files and sparql files through the --data
and --query
arguments, as a way to practice.
We recommend having docker installed, then from a terminal (terminal app, VS Code) you can run the following (using some examples from the Learning Sparql website):
docker run -it -v $PWD:/workspace -w /workspace stain/jena \
arq \
--data http://www.learningsparql.com/2ndeditionexamples/ex002.ttl \
--query http://www.learningsparql.com/2ndeditionexamples/ex003.rq
arq
will output a nicely formatted table of results.
--------------------------------
| craigEmail |
================================
| "c.ellis@usairwaysgroup.com" |
| "craigellis@yahoo.com" |
--------------------------------
Once you've done that, you can run SPARQL queries by doing something like:
docker run -it -v $PWD:/workspace -w /workspace stain/jena \
arq \
--data files/observation.ttl \
--query files/ex-001.rq
Additionally, we have our own SPARQL endpoint with a query editor at https://beta.gss-data.org.uk/tools/sparql which can run SPARQL queries against the RDF we create.
We can query against this endpoint using rdflib
by including a SERVICE
keyword and the URL of the endpoint (<https://beta.gss-data.org.uk/sparql>
), or alternatively by using the curl
command line utility:
curl https://beta.gss-data.org.uk/sparql \
--data """query=
SELECT *
WHERE {
?s ?p ?o .
}
LIMIT 10
"""
Some sites make their data available via a SPARQL endpoint. With a SPARQL endpoint, you don't have Turtle files to read the RDF structure - but you can discover the data through SPARQL queries.
The following query is a good starting point for exploration. It defines three variables, ?s
, ?p
and ?o
, and will return all triples which match the triple pattern ?s ?p ?o
.
Since all triples match that pattern, you'll get a list of every triple.
The LIMIT
keyword limits the number of results you get (to get quicker results during exploration). Running this query against some data will return 10 triples.
# ex-001.rq
SELECT *
WHERE {
?s ?p ?o
}
LIMIT 10
To continue exploring, it can be helpful to get a list of predicates which are used.
# ex-002.rq
SELECT DISTINCT ?p
WHERE {
?s ?p ?o
}
LIMIT 10
It can also be helpful to get a list of resource classes within the RDF.
# ex-003.rq
SELECT DISTINCT ?class
WHERE {
?s a ?class
}
LIMIT 10
Bringing this together, imagine we discovered http://www.w3.org/ns/dcat#Dataset
was a class within the data, we could then ask for all resources which of that class, and the predicates and objects connected to resources of that class.
This could help us discover what information about datasets is available in the SPARQL endpoint.
SELECT *
WHERE {
?dataset a <http://www.w3.org/ns/dcat#Dataset> .
?dataset ?p ?o .
}
LIMIT 10
We'll show some quick example queries against the following dataset. The dataset contains three people, some personal details and their relationships to one another.
# people.ttl
@prefix ex: <http://example.org/> .
@prefix schema: <http://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
ex:Alice a schema:Person ;
schema:name "Alice"@en ;
schema:knows ex:Bob, ex:Charlie ;
schema:gender ex:Female ;
schema:weight 80 ;
schema:birthDate "1990-01-01"^^xsd:date ;
schema:telephone "07712345678" ;
.
ex:Bob a schema:Person ;
schema:name "Bob"@en ;
schema:knows ex:Alice, ex:Charlie ;
schema:gender ex:Male ;
schema:weight 90 ;
schema:birthDate "1994-01-01"^^xsd:date ;
.
ex:Charlie a schema:Person ;
schema:name "Charlie"@en ;
schema:knows ex:Bob ;
schema:gender ex:Male ;
schema:weight 100 ;
schema:birthDate "1998-01-01"^^xsd:date ;
.
Related section in SPARQL 1.1 Query Language (Prefixed names)
We can select the identifiers, names, genders and weights of each of the people in our dataset.
Similarly to Turtle you may include the PREFIX
keyword at the beginning of a SPARQL query to shorten IRIs. Note the slight difference in syntax - there is no preceding @
and no trailing dot .
# people-001.rq
PREFIX schema: <http://schema.org/>
SELECT *
WHERE {
?person a schema:Person ;
schema:name ?name ;
schema:gender ?gender ;
schema:weight ?weight ;
.
}
docker run -it -v $PWD:/workspace -w /workspace stain/jena \
arq \
--data files/people.ttl \
--query files/people-001.rq
--------------------------------------------------------------------------------------
| person | name | gender | weight |
======================================================================================
| <http://example.org/Charlie> | "Charlie"@en | <http://example.org/Male> | 100 |
| <http://example.org/Bob> | "Bob"@en | <http://example.org/Male> | 90 |
| <http://example.org/Alice> | "Alice"@en | <http://example.org/Female> | 80 |
--------------------------------------------------------------------------------------
You can select a subset of variables by replacing the *
in the SELECT
statement to a list of variables separated by spaces e.g. SELECT ?name ?gender ?weight
.
You can limit the number of results returned by a query by using the LIMIT
keyword, which appears at the end of a query. This is helpful during exploratory queries, to only return a small number of results rather than returning everything.
# people-002.rq
PREFIX schema: <http://schema.org/>
SELECT *
WHERE {
?person a schema:Person ;
schema:name ?name ;
schema:gender ?gender ;
schema:weight ?weight ;
.
}
LIMIT 2
docker run -it -v $PWD:/workspace -w /workspace stain/jena \
arq \
--data files/people.ttl \
--query files/people-002.rq
------------------------------------------------------------------------------------
| person | name | gender | weight |
====================================================================================
| ex:Charlie | "Charlie"@en | ex:Male | 100 |
| ex:Bob | "Bob"@en | ex:Male | 90 |
------------------------------------------------------------------------------------
You can order the results of a query by using the ORDER BY
keyword. It will sort things alphabetically or numerically depending on the datatype of the variable.
# people-003.rq
PREFIX schema: <http://schema.org/>
SELECT *
WHERE {
?person a schema:Person ;
schema:name ?name ;
schema:gender ?gender ;
schema:weight ?weight ;
.
}
ORDER BY ?name
docker run -it -v $PWD:/workspace -w /workspace stain/jena \
arq \
--data files/people.ttl \
--query files/people-003.rq
--------------------------------------------------------------------------------------
| person | name | gender | weight |
======================================================================================
| <http://example.org/Alice> | "Alice"@en | <http://example.org/Female> | 80 |
| <http://example.org/Bob> | "Bob"@en | <http://example.org/Male> | 90 |
| <http://example.org/Charlie> | "Charlie"@en | <http://example.org/Male> | 100 |
--------------------------------------------------------------------------------------
You can order in decending order by using the DESC
function.
# people-004.rq
PREFIX ex: <http://example.org/>
PREFIX schema: <http://schema.org/>
SELECT *
WHERE {
?person a schema:Person ;
schema:name ?name ;
schema:gender ?gender ;
schema:weight ?weight ;
.
}
ORDER BY DESC(?weight)
docker run -it -v $PWD:/workspace -w /workspace stain/jena \
arq \
--data files/people.ttl \
--query files/people-004.rq
--------------------------------------------------------------------------------------
| person | name | gender | weight |
======================================================================================
| ex:Charlie | "Charlie"@en | ex:Male | 100 |
| ex:Bob | "Bob"@en | ex:Male | 90 |
| ex:Alice | "Alice"@en | ex:Female | 80 |
--------------------------------------------------------------------------------------
Related section in SPARQL 1.1 Query Language (Creating Values with Expressions)
Related section in SPARQL 1.1 Query Language (Function Definitions)
New variables can be created by using the BIND(... AS ?x)
syntax. SPARQL has many functions for string manipulation and arithmetic. The following example uses the birthDate
and the NOW()
function to compute an age for each individual.
# people-005.rq
PREFIX schema: <http://schema.org/>
SELECT ?person ?age
WHERE {
?person a schema:Person ;
schema:birthDate ?birth_date .
BIND(
(YEAR(NOW()) - YEAR(?birth_date)) -
IF(
(MONTH(NOW())<MONTH(?birth_date) ||
(MONTH(NOW())=MONTH(?birth_date) && DAY(NOW())<DAY(?birth_date))),
1, 0
)
AS ?age)
}
docker run -it -v $PWD:/workspace -w /workspace stain/jena \
arq \
--data files/people.ttl \
--query files/people-005.rq
--------------------------------------
| person | age |
======================================
| <http://example.org/Charlie> | 24 |
| <http://example.org/Bob> | 28 |
| <http://example.org/Alice> | 32 |
--------------------------------------
We can perform operations over groups, similar to SQL.
# people-006.rq
PREFIX schema: <http://schema.org/>
SELECT ?gender (AVG(?weight) as ?average_weight)
WHERE {
?person a schema:Person ;
schema:name ?name ;
schema:gender ?gender ;
schema:weight ?weight ;
.
}
GROUP BY ?gender
docker run -it -v $PWD:/workspace -w /workspace stain/jena \
arq \
--data files/people.ttl \
--query files/people-006.rq
------------------------------------------------
| gender | average_weight |
================================================
| <http://example.org/Male> | 95.0 |
| <http://example.org/Female> | 80.0 |
------------------------------------------------
We can perform operations over groups, similar to SQL. Many logical statments can be included in filters. &&
means AND, ||
means OR.
# people-007.rq
PREFIX schema: <http://schema.org/>
PREFIX ex: <http://example.org/>
SELECT *
WHERE {
?person a schema:Person ;
schema:name ?name ;
schema:gender ?gender ;
schema:weight ?weight ;
.
FILTER (
?weight >= 60 && ?gender = ex:Male
)
}
docker run -it -v $PWD:/workspace -w /workspace stain/jena \
arq \
--data files/people.ttl \
--query files/people-007.rq
------------------------------------------------------------------------------------
| person | name | gender | weight |
====================================================================================
| ex:Charlie | "Charlie"@en | ex:Male | 100 |
| ex:Bob | "Bob"@en | ex:Male | 90 |
------------------------------------------------------------------------------------
We can filter for existence of triples by using FILTER EXISTS {}
and FILTER NOT EXISTS {}
keywords.
The following returns people who know Charlie:
# people-008.rq
PREFIX schema: <http://schema.org/>
PREFIX ex: <http://example.org/>
SELECT *
WHERE {
?person a schema:Person ;
schema:name ?name ;
schema:gender ?gender ;
schema:weight ?weight ;
.
FILTER EXISTS {
?person schema:knows ex:Charlie .
}
}
docker run -it -v $PWD:/workspace -w /workspace stain/jena \
arq \
--data files/people.ttl \
--query files/people-008.rq
----------------------------------------------------------------------------------
| person | name | gender | weight |
==================================================================================
| ex:Bob | "Bob"@en | ex:Male | 90 |
| ex:Alice | "Alice"@en | ex:Female | 80 |
----------------------------------------------------------------------------------
Whereas the following returns people who don't know Charlie:
# people-009.rq
PREFIX schema: <http://schema.org/>
PREFIX ex: <http://example.org/>
SELECT *
WHERE {
?person a schema:Person ;
schema:name ?name ;
schema:gender ?gender ;
schema:weight ?weight ;
.
FILTER NOT EXISTS {
?person schema:knows ex:Charlie .
}
}
docker run -it -v $PWD:/workspace -w /workspace stain/jena \
arq \
--data files/people.ttl \
--query files/people-009.rq
------------------------------------------------------------------------------------
| person | name | gender | weight |
====================================================================================
| ex:Charlie | "Charlie"@en | ex:Male | 100 |
------------------------------------------------------------------------------------
We can notice that Alice has an entry for schema:telephone
but the other people don't. Our queries will only return results where all triple patterns are present, so the following query will only return an entry for Alice.
# people-010.rq
PREFIX schema: <http://schema.org/>
SELECT *
WHERE {
?person a schema:Person ;
schema:name ?name ;
schema:telephone ?telephone_number ;
.
}
docker run -it -v $PWD:/workspace -w /workspace stain/jena \
arq \
--data files/people.ttl \
--query files/people-010.rq
--------------------------------------------------------------
| person | name | telephone_number |
==============================================================
| <http://example.org/Alice> | "Alice"@en | "07712345678" |
--------------------------------------------------------------
If a triple might not be present and we wish to return those results, we can wrap it in the OPTIONAL {}
keyword.
# people-011.rq
PREFIX schema: <http://schema.org/>
SELECT *
WHERE {
?person a schema:Person ;
schema:name ?name ;
.
OPTIONAL {
?person schema:telephone ?telephone_number .
}
}
docker run -it -v $PWD:/workspace -w /workspace stain/jena \
arq \
--data files/people.ttl \
--query files/people-011.rq
------------------------------------------------------------------
| person | name | telephone_number |
==================================================================
| <http://example.org/Charlie> | "Charlie"@en | |
| <http://example.org/Bob> | "Bob"@en | |
| <http://example.org/Alice> | "Alice"@en | "07712345678" |
------------------------------------------------------------------
Property paths give a convenient method for traversing a graph's properties without having to create intermediate variables. The following two queries essentially do the same thing, but the first creates an additional surplus variable whereas the second uses the /
notation between properties to express a property path.
# people-012.rq
PREFIX ex: <http://example.org/>
PREFIX schema: <http://schema.org/>
SELECT *
WHERE {
# Who does Alice know?
ex:Alice schema:knows ?person .
# What are the names of people that Alice knows?
?person schema:name ?name .
}
docker run -it -v $PWD:/workspace -w /workspace stain/jena \
arq \
--data files/people.ttl \
--query files/people-012.rq
-----------------------------------------------
| person | name |
===============================================
| ex:Charlie | "Charlie"@en |
| ex:Bob | "Bob"@en |
-----------------------------------------------
The method using property paths only returns one column, as only one variable is defined.
# people-013.rq
PREFIX ex: <http://example.org/>
PREFIX schema: <http://schema.org/>
SELECT *
WHERE {
# What are the names of people that Alice knows?
ex:Alice schema:knows / schema:name ?name .
}
docker run -it -v $PWD:/workspace -w /workspace stain/jena \
arq \
--data files/people.ttl \
--query files/people-013.rq
----------------
| name |
================
| "Charlie"@en |
| "Bob"@en |
----------------
A row in a dataframe could look as follows:
area | period | sex | life_expectancy |
---|---|---|---|
W06000022 | 2004-01-01T00:00:00/P3Y | Male | 76.7 |
In RDF, we represent the entire row as a qb:Observation
, with additional triples formed from each of the columns. Each columns behaves as a predicate and each of the cell items ("Newport", "Male" etc.) is turned into an object (see Datacubes).
The equivalent data, represented in RDF could look as follows:
@prefix qb: <http://purl.org/linked-data/cube#> .
<http://data.gov.uk/dataset/life-expectancy-by-region-sex-and-time/datacube/obs/W06000022-2004-01-01T00:00:00/P3Y-Male>
a qb:Observation ;
<http://data.gov.uk/dataset/life-expectancy-by-region-sex-and-time/dimension/area>
<http://statistics.data.gov.uk/id/statistical-geography/W06000022> ;
<http://data.gov.uk/dataset/life-expectancy-by-region-sex-and-time/dimension/period>
<http://reference.data.gov.uk/id/gregorian-interval/2004-01-01T00:00:00/P3Y> ;
<http://data.gov.uk/dataset/life-expectancy-by-region-sex-and-time/dimension/sex>
<http://data.gov.uk/dataset/life-expectancy-by-region-sex-and-time/codelist/sex/code/Male> ;
<http://data.gov.uk/dataset/life-expectancy-by-region-sex-and-time/measure/life-expectancy>
76.7 ;
qb:dataSet <http://data.gov.uk/dataset/life-expectancy-by-region-sex-and-time/datacube> ;
.
Given an RDF representation of a data cube, we can retrieve a tabular view of the data using SPARQL by specifying qb:dataSet
and the URI of the RDF data cube (qb:DataSet
) and listing each of the components of the cube as predicates.
# qb-001.rq
PREFIX qb: <http://purl.org/linked-data/cube#>
SELECT *
WHERE {
?observation a qb:Observation ;
qb:dataSet <http://data.gov.uk/dataset/life-expectancy-by-region-sex-and-time/datacube> ;
<http://data.gov.uk/dataset/life-expectancy-by-region-sex-and-time/dimension/area> ?area ;
<http://data.gov.uk/dataset/life-expectancy-by-region-sex-and-time/dimension/period> ?period ;
<http://data.gov.uk/dataset/life-expectancy-by-region-sex-and-time/dimension/sex> ?sex ;
<http://data.gov.uk/dataset/life-expectancy-by-region-sex-and-time/measure/life-expectancy> ?life_expectancy ;
}
The result of this query would return all of the object URIs in a table.
docker run -it -v $PWD:/workspace -w /workspace stain/jena \
arq \
--data files/observation.ttl \
--query files/qb-001.rq
Often, when dealing with a tabular view of the data we don't want URIs, but instead the labels of those resources. We can solve this issue with blank nodes. Note how the query has changed.
# qb-002.rq
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?area ?period ?sex ?life_expectancy
WHERE {
?observation a qb:Observation ;
qb:dataSet <http://data.gov.uk/dataset/life-expectancy-by-region-sex-and-time/datacube> ;
<http://data.gov.uk/dataset/life-expectancy-by-region-sex-and-time/dimension/area> [ rdfs:label ?area ] ;
<http://data.gov.uk/dataset/life-expectancy-by-region-sex-and-time/dimension/period> [ rdfs:label ?period ];
<http://data.gov.uk/dataset/life-expectancy-by-region-sex-and-time/dimension/sex> [ rdfs:label ?sex ];
<http://data.gov.uk/dataset/life-expectancy-by-region-sex-and-time/measure/life-expectancy> ?life_expectancy ;
}
Running this new query returns a tabular view which is preferable for analysis and charting.
docker run -it -v $PWD:/workspace -w /workspace stain/jena \
arq \
--data files/observation.ttl \
--query files/qb-002.rq
------------------------------------------------------------
| area | period | sex | life_expectancy |
============================================================
| "Newport"@en | "2004-2006" | "Male"@en | 76.7 |
------------------------------------------------------------
Equivalently, we can use SPARQL property paths.
# qb-003.rq
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?area ?period ?sex ?life_expectancy
WHERE {
?observation a qb:Observation ;
qb:dataSet <http://data.gov.uk/dataset/life-expectancy-by-region-sex-and-time/datacube> ;
<http://data.gov.uk/dataset/life-expectancy-by-region-sex-and-time/dimension/area>/rdfs:label ?area ;
<http://data.gov.uk/dataset/life-expectancy-by-region-sex-and-time/dimension/period>/rdfs:label ?period ;
<http://data.gov.uk/dataset/life-expectancy-by-region-sex-and-time/dimension/sex>/rdfs:label ?sex ;
<http://data.gov.uk/dataset/life-expectancy-by-region-sex-and-time/measure/life-expectancy> ?life_expectancy ;
}
The following RDF has an example of a book with a title in serveral languages.
# harry-potter.ttl
@prefix ex: <http://example.org#> .
ex:Q102438 a ex:Book ;
ex:name "Harry Potter and the Philosopher's Stone"@en,
"Harri Potter a Maen yr Athronydd"@cy,
"Harry Potter à l'école des sorciers"@fr,
"Harry Potter und der Stein der Weisen"@de,
"Harry Potter y la piedra filosofal"@es ;
.
The following SPARQL query asks for names of the books in the data. As our book has names in multiple languages, we will get back multiple results.
# lang-001.rq
PREFIX ex: <http://example.org#>
SELECT ?book_uri ?title
WHERE {
?book_uri a ex:Book ;
ex:name ?title ;
.
}
--------------------------------------------------------------------------------
| book_uri | title |
================================================================================
| ex:Q102438 | "Harry Potter and the Philosopher's Stone"@en |
| ex:Q102438 | "Harry Potter y la piedra filosofal"@es |
| ex:Q102438 | "Harri Potter a Maen yr Athronydd"@cy |
| ex:Q102438 | "Harry Potter à l'école des sorciers"@fr |
| ex:Q102438 | "Harry Potter und der Stein der Weisen"@de |
--------------------------------------------------------------------------------
This can be confusing at times because we now have multiple rows for a single book. To limit the language of the strings returned, we can use the FILTER
keyword inside our WHERE
statement. The following asks for the title in Welsh (cy
) only.
# lang-002.rq
PREFIX ex: <http://example.org#>
SELECT ?book_uri ?title
WHERE {
?book_uri a ex:Book ;
ex:name ?title ;
.
FILTER(LANG(?title) = "cy") .
}
docker run -it -v $PWD:/workspace -w /workspace stain/jena \
arq --data files/harry-potter.ttl --query files/lang-002.rq
--------------------------------------------------------------------------------
| book_uri | title |
================================================================================
| ex:Q102438 | "Harri Potter a Maen yr Athronydd"@cy |
--------------------------------------------------------------------------------
Logical expressions can be joined together within the FILTER
statement. A double ampersand &&
is a logical AND
, whereas a double vertical bar ||
is a logical OR
.
# lang-003.rq
PREFIX ex: <http://example.org#>
SELECT ?book_uri ?title
WHERE {
?book_uri a ex:Book ;
ex:name ?title ;
.
FILTER(
LANG(?title) = "cy" || LANG(?title) = "en"
) .
}
docker run -it -v $PWD:/workspace -w /workspace stain/jena \
arq --data files/harry-potter.ttl --query files/lang-003.rq
--------------------------------------------------------------------------------
| book_uri | title |
================================================================================
| ex:Q102438 | "Harry Potter and the Philosopher's Stone"@en |
| ex:Q102438 | "Harri Potter a Maen yr Athronydd"@cy |
--------------------------------------------------------------------------------