SPARQL

SPARQL is a query language for RDF.

Resources

We have a copy of the Learning Sparql book. Reach out to the team and someone will let you borrow it.
SPARQL in 11 minutes is a youtube video with a quick walkthrough of the language. Its examples can be downloaded from the Learning Sparql source code section.
Wikidata has a SPARQL endpoint and a list of example queries.
The SPARQL 1.1 Query Language has a comprehensive description of the language which is helpful reference once you've gotten some experience.

These resources will cover much more than we do here.

Tools

The examples in this wiki are included as files which can be retrieved by cloning the wiki's repo.

git clone https://github.com/GSS-Cogs/knowledge.wiki.git

`rdflib`

rdflib is a python library for interacting with RDF. It has functionality for running SPARQL queries against a datasource.

You'll have to have rdflib and pandas installed:

pip3 install pandas
pip3 install rdflib

The g.parse(...) command can be used to load RDF from a local file (by passing a filepath) or a URL. Once the RDF has been loaded, the g.query(...) command can be used to execute a SPARQL query and store the results. Those results can then be convered to a pandas dataframe.

#%%
import pandas as pd
import rdflib

g = rdflib.Graph()

g.parse("http://www.learningsparql.com/2ndeditionexamples/ex002.ttl")

results = g.query("""
# filename: ex003.rq

PREFIX ab: <http://learningsparql.com/ns/addressbook#> 

SELECT ?craigEmail
WHERE
{ ab:craig ab:email ?craigEmail . }

""")

pd.DataFrame(results.bindings)

You can run this interactively in a notebook or using the VS Code python interactive window.

`arq`

Apache Jena has a command line untility, arq, which can be used to perform SPARQL queries. We can specify local turtle files and sparql files through the --data and --query arguments, as a way to practice.

We recommend having docker installed, then from a terminal (terminal app, VS Code) you can run the following (using some examples from the Learning Sparql website):

docker run -it -v $PWD:/workspace -w /workspace stain/jena \
arq \
--data http://www.learningsparql.com/2ndeditionexamples/ex002.ttl \
--query http://www.learningsparql.com/2ndeditionexamples/ex003.rq

arq will output a nicely formatted table of results.

--------------------------------
| craigEmail                   |
================================
| "c.ellis@usairwaysgroup.com" |
| "craigellis@yahoo.com"       |
--------------------------------

Once you've done that, you can run SPARQL queries by doing something like:

docker run -it -v $PWD:/workspace -w /workspace stain/jena \
arq \
--data files/observation.ttl \
--query files/ex-001.rq

IDS-D SPARQL endpoint

Additionally, we have our own SPARQL endpoint with a query editor at https://beta.gss-data.org.uk/tools/sparql which can run SPARQL queries against the RDF we create.

We can query against this endpoint using rdflib by including a SERVICE keyword and the URL of the endpoint (<https://beta.gss-data.org.uk/sparql>), or alternatively by using the curl command line utility:

curl https://beta.gss-data.org.uk/sparql \
--data """query=
    SELECT *
    WHERE {
        ?s ?p ?o .
    }
    LIMIT 10
"""

Exploring RDF via SPARQL

Some sites make their data available via a SPARQL endpoint. With a SPARQL endpoint, you don't have Turtle files to read the RDF structure - but you can discover the data through SPARQL queries.

The following query is a good starting point for exploration. It defines three variables, ?s, ?p and ?o, and will return all triples which match the triple pattern ?s ?p ?o. Since all triples match that pattern, you'll get a list of every triple.

The LIMIT keyword limits the number of results you get (to get quicker results during exploration). Running this query against some data will return 10 triples.

# ex-001.rq

SELECT *
WHERE {
    ?s ?p ?o
}
LIMIT 10

To continue exploring, it can be helpful to get a list of predicates which are used.

# ex-002.rq

SELECT DISTINCT ?p
WHERE {
    ?s ?p ?o
}
LIMIT 10

It can also be helpful to get a list of resource classes within the RDF.

# ex-003.rq

SELECT DISTINCT ?class
WHERE {
    ?s a ?class
}
LIMIT 10

Bringing this together, imagine we discovered http://www.w3.org/ns/dcat#Dataset was a class within the data, we could then ask for all resources which of that class, and the predicates and objects connected to resources of that class.

This could help us discover what information about datasets is available in the SPARQL endpoint.

SELECT *
WHERE {
    ?dataset a <http://www.w3.org/ns/dcat#Dataset> .
    ?dataset ?p ?o .
}
LIMIT 10

Quick example queries

We'll show some quick example queries against the following dataset. The dataset contains three people, some personal details and their relationships to one another.

# people.ttl

@prefix ex: <http://example.org/> .
@prefix schema: <http://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:Alice a schema:Person ;
    schema:name "Alice"@en ;
    schema:knows ex:Bob, ex:Charlie ;
    schema:gender ex:Female ;
    schema:weight 80 ;
    schema:birthDate "1990-01-01"^^xsd:date ;
    schema:telephone "07712345678" ;
    .

ex:Bob a schema:Person ;
    schema:name "Bob"@en ;
    schema:knows ex:Alice, ex:Charlie ;
    schema:gender ex:Male ;
    schema:weight 90 ;
    schema:birthDate "1994-01-01"^^xsd:date ;
    .

ex:Charlie a schema:Person ;
    schema:name "Charlie"@en ;
    schema:knows ex:Bob ;
    schema:gender ex:Male ;
    schema:weight 100 ;
    schema:birthDate "1998-01-01"^^xsd:date ;
    .

Selecting

We can select the identifiers, names, genders and weights of each of the people in our dataset.

Similarly to Turtle you may include the PREFIX keyword at the beginning of a SPARQL query to shorten IRIs. Note the slight difference in syntax - there is no preceding @ and no trailing dot .

# people-001.rq

PREFIX schema: <http://schema.org/>

SELECT *
WHERE {
    ?person a schema:Person ;
        schema:name ?name ;
        schema:gender ?gender ;
        schema:weight ?weight ;
        .
}

docker run -it -v $PWD:/workspace -w /workspace stain/jena \
arq \
--data files/people.ttl \
--query files/people-001.rq

--------------------------------------------------------------------------------------
| person                       | name         | gender                      | weight |
======================================================================================
| <http://example.org/Charlie> | "Charlie"@en | <http://example.org/Male>   | 100    |
| <http://example.org/Bob>     | "Bob"@en     | <http://example.org/Male>   | 90     |
| <http://example.org/Alice>   | "Alice"@en   | <http://example.org/Female> | 80     |
--------------------------------------------------------------------------------------

You can select a subset of variables by replacing the * in the SELECT statement to a list of variables separated by spaces e.g. SELECT ?name ?gender ?weight.

Limiting

You can limit the number of results returned by a query by using the LIMIT keyword, which appears at the end of a query. This is helpful during exploratory queries, to only return a small number of results rather than returning everything.

# people-002.rq

PREFIX schema: <http://schema.org/>

SELECT *
WHERE {
    ?person a schema:Person ;
        schema:name ?name ;
        schema:gender ?gender ;
        schema:weight ?weight ;
        .
}
LIMIT 2

docker run -it -v $PWD:/workspace -w /workspace stain/jena \
arq \
--data files/people.ttl \
--query files/people-002.rq

------------------------------------------------------------------------------------
| person                       | name         | gender                    | weight |
====================================================================================
| ex:Charlie                   | "Charlie"@en | ex:Male                   | 100    |
| ex:Bob                       | "Bob"@en     | ex:Male                   | 90     |
------------------------------------------------------------------------------------

Ordering

You can order the results of a query by using the ORDER BY keyword. It will sort things alphabetically or numerically depending on the datatype of the variable.

# people-003.rq

PREFIX schema: <http://schema.org/>

SELECT *
WHERE {
    ?person a schema:Person ;
        schema:name ?name ;
        schema:gender ?gender ;
        schema:weight ?weight ;
        .
}
ORDER BY ?name

docker run -it -v $PWD:/workspace -w /workspace stain/jena \
arq \
--data files/people.ttl \
--query files/people-003.rq

--------------------------------------------------------------------------------------
| person                       | name         | gender                      | weight |
======================================================================================
| <http://example.org/Alice>   | "Alice"@en   | <http://example.org/Female> | 80     |
| <http://example.org/Bob>     | "Bob"@en     | <http://example.org/Male>   | 90     |
| <http://example.org/Charlie> | "Charlie"@en | <http://example.org/Male>   | 100    |
--------------------------------------------------------------------------------------

You can order in decending order by using the DESC function.

# people-004.rq

PREFIX ex: <http://example.org/>
PREFIX schema: <http://schema.org/>

SELECT *
WHERE {
    ?person a schema:Person ;
        schema:name ?name ;
        schema:gender ?gender ;
        schema:weight ?weight ;
        .
}
ORDER BY DESC(?weight)

docker run -it -v $PWD:/workspace -w /workspace stain/jena \
arq \
--data files/people.ttl \
--query files/people-004.rq

--------------------------------------------------------------------------------------
| person                       | name         | gender                      | weight |
======================================================================================
| ex:Charlie                   | "Charlie"@en | ex:Male                     | 100    |
| ex:Bob                       | "Bob"@en     | ex:Male                     | 90     |
| ex:Alice                     | "Alice"@en   | ex:Female                   | 80     |
--------------------------------------------------------------------------------------

Functions

New variables can be created by using the BIND(... AS ?x) syntax. SPARQL has many functions for string manipulation and arithmetic. The following example uses the birthDate and the NOW() function to compute an age for each individual.

# people-005.rq

PREFIX schema: <http://schema.org/>

SELECT ?person ?age
WHERE {
    ?person a schema:Person ;
        schema:birthDate ?birth_date .

    BIND(
        (YEAR(NOW()) - YEAR(?birth_date)) - 
        IF(
            (MONTH(NOW())<MONTH(?birth_date) || 
            (MONTH(NOW())=MONTH(?birth_date) && DAY(NOW())<DAY(?birth_date))),
            1, 0
        )
    AS ?age)
}

docker run -it -v $PWD:/workspace -w /workspace stain/jena \
arq \
--data files/people.ttl \
--query files/people-005.rq

--------------------------------------
| person                       | age |
======================================
| <http://example.org/Charlie> | 24  |
| <http://example.org/Bob>     | 28  |
| <http://example.org/Alice>   | 32  |
--------------------------------------

Grouping and summarising

We can perform operations over groups, similar to SQL.

# people-006.rq

PREFIX schema: <http://schema.org/>

SELECT ?gender (AVG(?weight) as ?average_weight)
WHERE {
    ?person a schema:Person ;
        schema:name ?name ;
        schema:gender ?gender ;
        schema:weight ?weight ;
        .
}
GROUP BY ?gender

docker run -it -v $PWD:/workspace -w /workspace stain/jena \
arq \
--data files/people.ttl \
--query files/people-006.rq

------------------------------------------------
| gender                      | average_weight |
================================================
| <http://example.org/Male>   | 95.0           |
| <http://example.org/Female> | 80.0           |
------------------------------------------------

Filtering

We can perform operations over groups, similar to SQL. Many logical statments can be included in filters. && means AND, || means OR.

# people-007.rq

PREFIX schema: <http://schema.org/>
PREFIX ex: <http://example.org/>

SELECT *
WHERE {
    ?person a schema:Person ;
        schema:name ?name ;
        schema:gender ?gender ;
        schema:weight ?weight ;
        .

    FILTER (
        ?weight >= 60 && ?gender = ex:Male
    )
}

docker run -it -v $PWD:/workspace -w /workspace stain/jena \
arq \
--data files/people.ttl \
--query files/people-007.rq

------------------------------------------------------------------------------------
| person                       | name         | gender                    | weight |
====================================================================================
| ex:Charlie                   | "Charlie"@en | ex:Male                   | 100    |
| ex:Bob                       | "Bob"@en     | ex:Male                   | 90     |
------------------------------------------------------------------------------------

We can filter for existence of triples by using FILTER EXISTS {} and FILTER NOT EXISTS {} keywords.

The following returns people who know Charlie:

# people-008.rq

PREFIX schema: <http://schema.org/>
PREFIX ex: <http://example.org/>

SELECT *
WHERE {
    ?person a schema:Person ;
        schema:name ?name ;
        schema:gender ?gender ;
        schema:weight ?weight ;
        .

    FILTER EXISTS {
        ?person schema:knows ex:Charlie .
    }
}

docker run -it -v $PWD:/workspace -w /workspace stain/jena \
arq \
--data files/people.ttl \
--query files/people-008.rq

----------------------------------------------------------------------------------
| person                     | name       | gender                      | weight |
==================================================================================
| ex:Bob                     | "Bob"@en   | ex:Male                     | 90     |
| ex:Alice                   | "Alice"@en | ex:Female                   | 80     |
----------------------------------------------------------------------------------

Whereas the following returns people who don't know Charlie:

# people-009.rq

PREFIX schema: <http://schema.org/>
PREFIX ex: <http://example.org/>

SELECT *
WHERE {
    ?person a schema:Person ;
        schema:name ?name ;
        schema:gender ?gender ;
        schema:weight ?weight ;
        .

    FILTER NOT EXISTS {
        ?person schema:knows ex:Charlie .
    }
}

docker run -it -v $PWD:/workspace -w /workspace stain/jena \
arq \
--data files/people.ttl \
--query files/people-009.rq

------------------------------------------------------------------------------------
| person                       | name         | gender                    | weight |
====================================================================================
| ex:Charlie                   | "Charlie"@en | ex:Male                   | 100    |
------------------------------------------------------------------------------------

Optional selections

We can notice that Alice has an entry for schema:telephone but the other people don't. Our queries will only return results where all triple patterns are present, so the following query will only return an entry for Alice.

# people-010.rq

PREFIX schema: <http://schema.org/>

SELECT *
WHERE {
    ?person a schema:Person ;
        schema:name ?name ;
        schema:telephone ?telephone_number ;
        .
}

docker run -it -v $PWD:/workspace -w /workspace stain/jena \
arq \
--data files/people.ttl \
--query files/people-010.rq

--------------------------------------------------------------
| person                     | name       | telephone_number |
==============================================================
| <http://example.org/Alice> | "Alice"@en | "07712345678"    |
--------------------------------------------------------------

If a triple might not be present and we wish to return those results, we can wrap it in the OPTIONAL {} keyword.

# people-011.rq

PREFIX schema: <http://schema.org/>

SELECT *
WHERE {
    ?person a schema:Person ;
        schema:name ?name ;
        .
    
    OPTIONAL { 
        ?person schema:telephone ?telephone_number . 
    }
}

docker run -it -v $PWD:/workspace -w /workspace stain/jena \
arq \ 
--data files/people.ttl \
--query files/people-011.rq

------------------------------------------------------------------
| person                       | name         | telephone_number |
==================================================================
| <http://example.org/Charlie> | "Charlie"@en |                  |
| <http://example.org/Bob>     | "Bob"@en     |                  |
| <http://example.org/Alice>   | "Alice"@en   | "07712345678"    |
------------------------------------------------------------------

Property paths

Property paths give a convenient method for traversing a graph's properties without having to create intermediate variables. The following two queries essentially do the same thing, but the first creates an additional surplus variable whereas the second uses the / notation between properties to express a property path.

# people-012.rq

PREFIX ex: <http://example.org/>
PREFIX schema: <http://schema.org/>

SELECT *
WHERE {
    # Who does Alice know?
    ex:Alice schema:knows ?person .

    # What are the names of people that Alice knows?
    ?person schema:name ?name .
}

docker run -it -v $PWD:/workspace -w /workspace stain/jena \
arq \
--data files/people.ttl \
--query files/people-012.rq

-----------------------------------------------
| person                       | name         |
===============================================
| ex:Charlie                   | "Charlie"@en |
| ex:Bob                       | "Bob"@en     |
-----------------------------------------------

The method using property paths only returns one column, as only one variable is defined.

# people-013.rq

PREFIX ex: <http://example.org/>
PREFIX schema: <http://schema.org/>

SELECT *
WHERE {
    # What are the names of people that Alice knows?
    ex:Alice schema:knows / schema:name ?name .
}

docker run -it -v $PWD:/workspace -w /workspace stain/jena \
arq \
--data files/people.ttl \
--query files/people-013.rq

----------------
| name         |
================
| "Charlie"@en |
| "Bob"@en     |
----------------

Using SPARQL with RDF data cubes

A row in a dataframe could look as follows:

area	period	sex	life_expectancy
W06000022	2004-01-01T00:00:00/P3Y	Male	76.7

In RDF, we represent the entire row as a qb:Observation, with additional triples formed from each of the columns. Each columns behaves as a predicate and each of the cell items ("Newport", "Male" etc.) is turned into an object (see Datacubes).

The equivalent data, represented in RDF could look as follows:

@prefix qb: <http://purl.org/linked-data/cube#> .

<http://data.gov.uk/dataset/life-expectancy-by-region-sex-and-time/datacube/obs/W06000022-2004-01-01T00:00:00/P3Y-Male>
    a qb:Observation ;
    <http://data.gov.uk/dataset/life-expectancy-by-region-sex-and-time/dimension/area>
        <http://statistics.data.gov.uk/id/statistical-geography/W06000022> ;
    <http://data.gov.uk/dataset/life-expectancy-by-region-sex-and-time/dimension/period>
        <http://reference.data.gov.uk/id/gregorian-interval/2004-01-01T00:00:00/P3Y> ;
    <http://data.gov.uk/dataset/life-expectancy-by-region-sex-and-time/dimension/sex>
        <http://data.gov.uk/dataset/life-expectancy-by-region-sex-and-time/codelist/sex/code/Male> ;
    <http://data.gov.uk/dataset/life-expectancy-by-region-sex-and-time/measure/life-expectancy>
        76.7 ;
    qb:dataSet <http://data.gov.uk/dataset/life-expectancy-by-region-sex-and-time/datacube> ;
  .

Given an RDF representation of a data cube, we can retrieve a tabular view of the data using SPARQL by specifying qb:dataSet and the URI of the RDF data cube (qb:DataSet) and listing each of the components of the cube as predicates.

# qb-001.rq

PREFIX qb: <http://purl.org/linked-data/cube#>

SELECT *
WHERE {
    ?observation a qb:Observation ;
        qb:dataSet <http://data.gov.uk/dataset/life-expectancy-by-region-sex-and-time/datacube> ;
        <http://data.gov.uk/dataset/life-expectancy-by-region-sex-and-time/dimension/area> ?area ;
        <http://data.gov.uk/dataset/life-expectancy-by-region-sex-and-time/dimension/period> ?period ;
        <http://data.gov.uk/dataset/life-expectancy-by-region-sex-and-time/dimension/sex> ?sex ;
        <http://data.gov.uk/dataset/life-expectancy-by-region-sex-and-time/measure/life-expectancy> ?life_expectancy ;
}

The result of this query would return all of the object URIs in a table.

docker run -it -v $PWD:/workspace -w /workspace stain/jena \
arq \
--data files/observation.ttl \
--query files/qb-001.rq

Asking for labels, not URIs

Often, when dealing with a tabular view of the data we don't want URIs, but instead the labels of those resources. We can solve this issue with blank nodes. Note how the query has changed.

# qb-002.rq

PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?area ?period ?sex ?life_expectancy
WHERE {
    ?observation a qb:Observation ;
        qb:dataSet <http://data.gov.uk/dataset/life-expectancy-by-region-sex-and-time/datacube> ;
        <http://data.gov.uk/dataset/life-expectancy-by-region-sex-and-time/dimension/area> [ rdfs:label ?area ] ;
        <http://data.gov.uk/dataset/life-expectancy-by-region-sex-and-time/dimension/period> [ rdfs:label ?period ];
        <http://data.gov.uk/dataset/life-expectancy-by-region-sex-and-time/dimension/sex> [ rdfs:label ?sex ];
        <http://data.gov.uk/dataset/life-expectancy-by-region-sex-and-time/measure/life-expectancy> ?life_expectancy ;
}

Running this new query returns a tabular view which is preferable for analysis and charting.

docker run -it -v $PWD:/workspace -w /workspace stain/jena \
arq \
--data files/observation.ttl \
--query files/qb-002.rq

------------------------------------------------------------
| area         | period      | sex       | life_expectancy |
============================================================
| "Newport"@en | "2004-2006" | "Male"@en | 76.7            |
------------------------------------------------------------

Equivalently, we can use SPARQL property paths.

# qb-003.rq

PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?area ?period ?sex ?life_expectancy
WHERE {
    ?observation a qb:Observation ;
        qb:dataSet <http://data.gov.uk/dataset/life-expectancy-by-region-sex-and-time/datacube> ;
        <http://data.gov.uk/dataset/life-expectancy-by-region-sex-and-time/dimension/area>/rdfs:label ?area ;
        <http://data.gov.uk/dataset/life-expectancy-by-region-sex-and-time/dimension/period>/rdfs:label ?period ;
        <http://data.gov.uk/dataset/life-expectancy-by-region-sex-and-time/dimension/sex>/rdfs:label ?sex ;
        <http://data.gov.uk/dataset/life-expectancy-by-region-sex-and-time/measure/life-expectancy> ?life_expectancy ;
}

Dealing with language strings

The following RDF has an example of a book with a title in serveral languages.

# harry-potter.ttl

@prefix ex: <http://example.org#> .

ex:Q102438 a ex:Book ;
    ex:name "Harry Potter and the Philosopher's Stone"@en,
        "Harri Potter a Maen yr Athronydd"@cy,
        "Harry Potter à l'école des sorciers"@fr,
        "Harry Potter und der Stein der Weisen"@de,
        "Harry Potter y la piedra filosofal"@es ;
    .

The following SPARQL query asks for names of the books in the data. As our book has names in multiple languages, we will get back multiple results.

# lang-001.rq

PREFIX ex: <http://example.org#>

SELECT ?book_uri ?title
WHERE {
    ?book_uri a ex:Book ;
        ex:name ?title ;
        .
}

--------------------------------------------------------------------------------
| book_uri                     | title                                         |
================================================================================
| ex:Q102438                   | "Harry Potter and the Philosopher's Stone"@en |
| ex:Q102438                   | "Harry Potter y la piedra filosofal"@es       |
| ex:Q102438                   | "Harri Potter a Maen yr Athronydd"@cy         |
| ex:Q102438                   | "Harry Potter à l'école des sorciers"@fr      |
| ex:Q102438                   | "Harry Potter und der Stein der Weisen"@de    |
--------------------------------------------------------------------------------

This can be confusing at times because we now have multiple rows for a single book. To limit the language of the strings returned, we can use the FILTER keyword inside our WHERE statement. The following asks for the title in Welsh (cy) only.

# lang-002.rq

PREFIX ex: <http://example.org#>

SELECT ?book_uri ?title
WHERE {
    ?book_uri a ex:Book ;
        ex:name ?title ;
        .

    FILTER(LANG(?title) = "cy") .
}

docker run -it -v $PWD:/workspace -w /workspace stain/jena \
arq --data files/harry-potter.ttl --query files/lang-002.rq

--------------------------------------------------------------------------------
| book_uri                     | title                                         |
================================================================================
| ex:Q102438                   | "Harri Potter a Maen yr Athronydd"@cy         |
--------------------------------------------------------------------------------

Logical expressions can be joined together within the FILTER statement. A double ampersand && is a logical AND, whereas a double vertical bar || is a logical OR.

# lang-003.rq

PREFIX ex: <http://example.org#>

SELECT ?book_uri ?title
WHERE {
    ?book_uri a ex:Book ;
        ex:name ?title ;
        .

    FILTER(
        LANG(?title) = "cy" || LANG(?title) = "en" 
    ) .
}

docker run -it -v $PWD:/workspace -w /workspace stain/jena \
arq --data files/harry-potter.ttl --query files/lang-003.rq

--------------------------------------------------------------------------------
| book_uri                     | title                                         |
================================================================================
| ex:Q102438                   | "Harry Potter and the Philosopher's Stone"@en |
| ex:Q102438                   | "Harri Potter a Maen yr Athronydd"@cy         |
--------------------------------------------------------------------------------

Provide feedback

Saved searches

Use saved searches to filter your results more quickly