feat: proposed Dataset API changes #3060

Draft: wants to merge 9 commits into base branch 8.x
1 change: 0 additions & 1 deletion CHANGELOG.md
@@ -54,7 +54,6 @@ Merged PRs:
* 2024-10-23 - build(deps-dev): bump ruff from 0.6.9 to 0.7.0
[PR #2942](https://github.com/RDFLib/rdflib/pull/2942)


## 2024-10-17 RELEASE 7.1.0

This minor release incorporates just over 100 substantive PRs - interesting
158 changes: 158 additions & 0 deletions dataset_api.md
@@ -0,0 +1,158 @@
Incorporate the changes proposed by Martynas, with the exception of graphs(), which would now return a dictionary mapping graph names (URIRef or BNode) to Graph objects (as the graph's identifier would be removed).

```
add add_named_graph(name: IdentifiedNode, graph: Graph) method
add has_named_graph(name: IdentifiedNode) method
add remove_named_graph(name: IdentifiedNode) method
add replace_named_graph(name: IdentifiedNode, graph: Graph) method
add graphs() method as an alias for contexts()
add default_graph property as an alias for default_context
add get_named_graph as an alias for get_graph
deprecate graph(graph) method
deprecate remove_graph(graph) method
deprecate contexts() method
Using IdentifiedNode as a super-interface for URIRef and BNode (since both are allowed as graph names in RDF 1.1).
```
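
The named-graph methods listed above can be sketched with a toy model. This is not rdflib code: `ToyDataset` is a hypothetical stand-in that uses plain strings for graph names and sets of tuples for graphs, and the merge, no-op and overwrite behaviours are assumptions made for illustration, since the proposal does not spell them out.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


class ToyDataset:
    """Hypothetical stand-in for the proposed Dataset; not the rdflib class."""

    def __init__(self) -> None:
        self.default_graph: Set[Triple] = set()
        self._named: Dict[str, Set[Triple]] = {}

    def add_named_graph(self, name: str, graph: Set[Triple]) -> None:
        # Assumption: adding to an existing name merges the triples.
        self._named.setdefault(name, set()).update(graph)

    def has_named_graph(self, name: str) -> bool:
        return name in self._named

    def remove_named_graph(self, name: str) -> None:
        # Assumption: removing an absent graph is a silent no-op.
        self._named.pop(name, None)

    def replace_named_graph(self, name: str, graph: Set[Triple]) -> None:
        # Assumption: the previous contents are discarded entirely.
        self._named[name] = set(graph)

    def graphs(self) -> Dict[str, Set[Triple]]:
        # A name -> graph mapping, because graphs no longer carry an identifier.
        return dict(self._named)


d = ToyDataset()
d.add_named_graph("http://example.com/graph-B", {("s", "p", "o")})
print(d.has_named_graph("http://example.com/graph-B"))
```

Keeping the name in the dataset's mapping rather than on the graph is what lets the same graph object be registered under more than one name.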

Make the following enhancements to the triples, quads, and subject/predicate/object APIs.

Major changes:
P1. Remove `default_union` attribute and make the Dataset inclusive, so that triples() and quads() span the default graph and all named graphs unless restricted.
P2. Remove the Default Graph URI ("urn:x-rdflib:default").
P3. Remove Graph class's "identifier" attribute to align with the W3C spec, impacting Dataset methods which use the Graph class.
P4. Make the graphs() method of Dataset return a dictionary of named graph names to Graph objects.
Enhancements:
P5. Support passing of iterables of Terms to triples, quads, and related methods, similar to the triples_choices method.
P6. Default the triples method to iterate with `(None, None, None)`
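
A minimal sketch of enhancements P5 and P6, assuming a toy representation in which terms are plain strings and the data is a list of tuples. The `triples` function and `_normalise` helper here are hypothetical illustrations of the matching rule, not the rdflib implementation.

```python
from typing import FrozenSet, Iterable, Iterator, List, Optional, Tuple, Union

Triple = Tuple[str, str, str]
# None is a wildcard, a str is a single term, any other iterable is a set of alternatives.
PatternTerm = Union[None, str, Iterable[str]]
Pattern = Tuple[PatternTerm, PatternTerm, PatternTerm]

DATA: List[Triple] = [
    ("http://example.com/subject-a", "http://example.com/predicate-a", "Triple A"),
    ("http://example.com/subject-b", "http://example.com/predicate-b", "Triple B"),
]


def _normalise(term: PatternTerm) -> Optional[FrozenSet[str]]:
    """Turn one pattern position into None (wildcard) or a set of accepted terms."""
    if term is None:
        return None
    if isinstance(term, str):
        return frozenset([term])
    return frozenset(term)


def triples(data: Iterable[Triple], pattern: Pattern = (None, None, None)) -> Iterator[Triple]:
    """Yield triples matching the pattern; the default pattern matches everything (P6)."""
    wanted = [_normalise(term) for term in pattern]
    for triple in data:
        if all(w is None or value in w for w, value in zip(wanted, triple)):
            yield triple


filter_preds = ["http://example.com/predicate-a", "http://www.w3.org/2000/01/rdf-schema#label"]
for t in triples(DATA, (None, filter_preds, None)):
    print(t)
```

Normalising each position once, before the loop, means a generator passed as a set of alternatives is consumed only once.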

With all of the above changes, including those changes proposed by Martynas, here are some examples:

```python
from rdflib import Dataset, Graph, URIRef, Literal
from rdflib.namespace import RDFS

# ============================================
# Adding Data to the Dataset
# ============================================

# Initialize the dataset
d = Dataset()

# Add a single triple to the Default Graph, and a single triple to a Named Graph
g1 = Graph()
g1.add(
    (
        URIRef("http://example.com/subject-a"),
        URIRef("http://example.com/predicate-a"),
        Literal("Triple A")
    )
)
# merge with the default graph
d.default_graph += g1
# or set the default graph
d.default_graph = g1

# Add a Graph to a Named Graph in the Dataset.
g2 = Graph()
g2.add(
    (
        URIRef("http://example.com/subject-b"),
        URIRef("http://example.com/predicate-b"),
        Literal("Triple B")
    )
)
# Pass both arguments by keyword: a positional argument may not follow a keyword argument.
d.add_named_graph(name=URIRef("http://example.com/graph-B"), graph=g2)

# ============================================
# Iterate over the entire Dataset returning triples
# ============================================

for triple in d.triples():
    print(triple)
# Output:
# (rdflib.term.URIRef('http://example.com/subject-a'), rdflib.term.URIRef('http://example.com/predicate-a'), rdflib.term.Literal('Triple A'))
# (rdflib.term.URIRef('http://example.com/subject-b'), rdflib.term.URIRef('http://example.com/predicate-b'), rdflib.term.Literal('Triple B'))

# ============================================
# Iterate over the entire Dataset returning quads
# ============================================

for quad in d.quads():
    print(quad)
# Output:
# (rdflib.term.URIRef('http://example.com/subject-a'), rdflib.term.URIRef('http://example.com/predicate-a'), rdflib.term.Literal('Triple A'), None)
# (rdflib.term.URIRef('http://example.com/subject-b'), rdflib.term.URIRef('http://example.com/predicate-b'), rdflib.term.Literal('Triple B'), rdflib.term.URIRef('http://example.com/graph-B'))

# ============================================
# Get the Default graph
# ============================================

dg = d.default_graph # same as current default_context

# ============================================
# Iterate on triples in the Default Graph only
# ============================================

for triple in d.triples(graph="default"):

I question the usefulness of this. Why not simply:

d.default_graph.triples()

?

Contributor Author

Providing "default_graph" as a convenience necessarily means there will be more than one way to iterate over the triples. There's no functional change from the current classes here, just name changes: you can already call Dataset.triples(context=) and you can also call Dataset.default_context.triples().

    print(triple)
# Output:
# (rdflib.term.URIRef('http://example.com/subject-a'), rdflib.term.URIRef('http://example.com/predicate-a'), rdflib.term.Literal('Triple A'))

# ============================================
# Access quads in Named Graphs only
# ============================================

for quad in d.quads(graph="named"):

Shouldn't this be equivalent to simply d.quads()? Since the default graph does not produce quads.

Or is the graph element of the default graph None?

Contributor Author

Yes, the proposal is to have the "graph" of triples in the default graph set to None.

    print(quad)
# Output:
# (rdflib.term.URIRef('http://example.com/subject-b'), rdflib.term.URIRef('http://example.com/predicate-b'), rdflib.term.Literal('Triple B'), rdflib.term.URIRef('http://example.com/graph-B'))

# ============================================
# Equivalent to iterating over graphs()
# ============================================

for ng_name, ng_object in d.graphs().items():
    for quad in d.quads(graph=ng_name):
        print(quad)
# Output:
# (rdflib.term.URIRef('http://example.com/subject-b'), rdflib.term.URIRef('http://example.com/predicate-b'), rdflib.term.Literal('Triple B'), rdflib.term.URIRef('http://example.com/graph-B'))

# ============================================
# Access triples in the Default Graph and specified Named Graphs.
# ============================================

for triple in d.triples(graph=["default", URIRef("http://example.com/graph-B")]):

d.triples() doesn't really make sense? There should be Graph.triples() and Dataset.quads() only?

Contributor Author

I'm comfortable with it - SPARQL queries in triplestores where named graphs are used frequently omit the graph, only having basic graph patterns, and we understand this to be across all graphs?

Union graph is an extension feature though, not a feature of an RDF dataset.

    print(triple)
# Output:
# (rdflib.term.URIRef('http://example.com/subject-a'), rdflib.term.URIRef('http://example.com/predicate-a'), rdflib.term.Literal('Triple A'))
# (rdflib.term.URIRef('http://example.com/subject-b'), rdflib.term.URIRef('http://example.com/predicate-b'), rdflib.term.Literal('Triple B'))

# ============================================
# Access quads in the Default Graph and specified Named Graphs.
# ============================================

for quad in d.quads(graph=["default", URIRef("http://example.com/graph-B")]):

for quad in (q for q in d.quads() if q[3] in (None, URIRef("http://example.com/graph-B"))): 

not much longer really.

Contributor Author

@recalcitrantsupplant, Feb 16, 2025

Yes, I think this is the point on which to get a broader consensus. The way I see it, if we include the graph parameter:
Pros:

  • can restrict "named", "default" enums to only be used in the graph= attribute, and not in the quads methods.
  • can separate concerns a bit better, similar to how dataset clauses are used in SPARQL. E.g. set up an instance with graph= to restrict the scope to certain named graphs, then at runtime graphs can be passed in using quads
  • provides a convenience/clean interface for what is a common pattern (for me at least!)

Cons:

  • two ways to do the same thing, as you've pointed out.

Maybe I'm too used to Jena where Dataset is used via getDefaultModel and getNamedModel, but I don't really see myself needing the new parameters 🤷‍♂️

    print(quad)
# Output:
# (rdflib.term.URIRef('http://example.com/subject-a'), rdflib.term.URIRef('http://example.com/predicate-a'), rdflib.term.Literal('Triple A'), None)
# (rdflib.term.URIRef('http://example.com/subject-b'), rdflib.term.URIRef('http://example.com/predicate-b'), rdflib.term.Literal('Triple B'), rdflib.term.URIRef('http://example.com/graph-B'))

# ============================================
# "Slice" the dataset on specified predicates. Same can be done on subjects, objects, graphs
# ============================================

filter_preds = [URIRef("http://example.com/predicate-a"), RDFS.label]
for quad in d.quads((None, filter_preds, None, None)):
    print(quad)
# Output:
# (rdflib.term.URIRef('http://example.com/subject-a'), rdflib.term.URIRef('http://example.com/predicate-a'), rdflib.term.Literal('Triple A'), None)

# ============================================
# Serialize the Dataset in a quads format.
# ============================================

print(d.serialize(format="nquads"))
# Output:
# <http://example.com/subject-a> <http://example.com/predicate-a> "Triple A" .
# <http://example.com/subject-b> <http://example.com/predicate-b> "Triple B" <http://example.com/graph-B> .
```
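
The `graph=` selector discussed in the review threads above can be sketched as a plain filter over quads. This is a toy model under stated assumptions, not the rdflib implementation: `select_quads` and `_selected` are hypothetical names, terms are plain strings, and the default graph is represented by `None` in the fourth position, as proposed.

```python
from typing import Iterable, Iterator, List, Optional, Tuple, Union

# A quad is (subject, predicate, object, graph); graph is None for the default graph.
Quad = Tuple[str, str, str, Optional[str]]
GraphSelector = Union[None, str, Iterable[str]]

GRAPH_B = "http://example.com/graph-B"
QUADS: List[Quad] = [
    ("http://example.com/subject-a", "http://example.com/predicate-a", "Triple A", None),
    ("http://example.com/subject-b", "http://example.com/predicate-b", "Triple B", GRAPH_B),
]


def _selected(graph: Optional[str], selector: str) -> bool:
    """Check one quad's graph against one selector."""
    if selector == "default":
        return graph is None
    if selector == "named":
        return graph is not None
    # Anything else is treated as a graph name (a toy simplification: a graph
    # literally named "default" or "named" cannot be selected this way).
    return graph == selector


def select_quads(quads: Iterable[Quad], graph: GraphSelector = None) -> Iterator[Quad]:
    """Yield quads whose graph matches the selector; None selects everything."""
    if graph is None:
        yield from quads
        return
    selectors = [graph] if isinstance(graph, str) else list(graph)
    for quad in quads:
        if any(_selected(quad[3], s) for s in selectors):
            yield quad


for q in select_quads(QUADS, graph=["default", GRAPH_B]):
    print(q)
```

On this data the list selector yields the same quads as the reviewer's inline filter on `q[3] in (None, GRAPH_B)`, which is the trade-off under discussion: a convenience parameter versus one obvious way to filter.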
14 changes: 11 additions & 3 deletions docs/apidocs/examples.rst
@@ -3,10 +3,18 @@ examples Package

These examples all live in ``./examples`` in the source-distribution of RDFLib.

:mod:`~examples.conjunctive_graphs` Module
------------------------------------------
:mod:`~examples.datasets` Module
--------------------------------

.. automodule:: examples.datasets
   :members:
   :undoc-members:
   :show-inheritance:

:mod:`~examples.jsonld_serialization` Module
--------------------------------------------

.. automodule:: examples.conjunctive_graphs
.. automodule:: examples.jsonld_serialization
   :members:
   :undoc-members:
   :show-inheritance:
104 changes: 80 additions & 24 deletions docs/developers.rst
@@ -434,6 +434,8 @@ flag them as expecting to fail.
Compatibility
-------------

RDFLib 8.x is likely to support only the Python versions in bugfix status at the time of its release, so perhaps 3.12+.

RDFlib 7.0.0 release and later only support Python 3.8.1 and newer.

RDFlib 6.0.0 release and later only support Python 3.7 and newer.
@@ -443,22 +445,46 @@ RDFLib 5.0.0 maintained compatibility with Python versions 2.7, 3.4, 3.5, 3.6, 3
Releasing
---------

These are the major steps for releasing new versions of RDFLib:

#. Create a pre-release PR

   * that updates all the version numbers
   * merge it with all tests passing

#. Do the PyPI release
#. Do the GitHub release
#. Create a post-release PR

   * that updates all version numbers to next (alpha) release
   * merge it with all tests passing

#. Let the world know


1. Create a pre-release PR
~~~~~~~~~~~~~~~~~~~~~~~~~~

Create a release-preparation pull request with the following changes:

* Updated version and date in ``CITATION.cff``.
* Updated copyright year in the ``LICENSE`` file.
* Updated copyright year in the ``docs/conf.py`` file.
* Updated main branch version and current version in the ``README.md`` file.
* Updated version in the ``pyproject.toml`` file.
* Updated ``__date__`` in the ``rdflib/__init__.py`` file.
* Accurate ``CHANGELOG.md`` entry for the release.
#. In ``pyproject.toml``, update the version number
#. In ``README.md``, update the *Versions & Releases* section
#. In ``rdflib/__init__.py``, update the ``__date__`` value
#. In ``docs/conf.py``, update copyright year
#. In ``CITATION.cff``, update the version and date
#. In ``LICENSE``, update the copyright year
#. In ``CHANGELOG.md``, write an entry for this release
#. You can use the tool ``admin/get_merged_prs.py`` to assist with compiling a log of PRs and commits since last release

2. Do the PyPI release
~~~~~~~~~~~~~~~~~~~~~~

Once the PR is merged, switch to the main branch, build the release and upload it to PyPI:
Once the pre-release PR is merged, switch to the main branch, build the release and upload it to PyPI:

.. code-block:: bash

    # Clean up any previous builds
    \rm -vf dist/*
    rm -vf dist/*

    # Build artifacts
    poetry build
@@ -487,24 +513,54 @@ Once the PR is merged, switch to the main branch, build the release and upload i
    ## poetry publish -u __token__ -p pypi-<REDACTED>


Once this is done, create a release tag from `GitHub releases
<https://github.com/RDFLib/rdflib/releases/new>`_. For a release of version
6.3.1 the tag should be ``6.3.1`` (without a "v" prefix), and the release title
should be "RDFLib 6.3.1". The release notes for the latest version should be added to
the release description. The artifacts built with ``poetry build`` should be
uploaded to the release as release artifacts.
3. Do the GitHub release
~~~~~~~~~~~~~~~~~~~~~~~~

The resulting release will be available at https://github.com/RDFLib/rdflib/releases/tag/6.3.1
Once the PyPI release is done, tag the main branch with the version number of the release. For a release of version
6.3.1 the tag should be ``6.3.1`` (without a "v" prefix):

.. code-block:: bash

    git tag 6.3.1

Once this is done, announce the release at the following locations:

* Twitter: Just make a tweet from your own account linking to the latest release.
* RDFLib mailing list.
* RDFLib Gitter / matrix.org chat room.
Push this tag to GitHub:

.. code-block:: bash

    git push --tags


Make a release from this tag at https://github.com/RDFLib/rdflib/releases/new

The release title should be "{DATE} RELEASE {VERSION}". See previous releases at https://github.com/RDFLib/rdflib/releases

The release notes should be just the same as the release info in ``CHANGELOG.md``, as authored in the first major step in this release process.

The resulting release will be available at https://github.com/RDFLib/rdflib/releases/tag/6.3.1
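
The tag and title conventions described in this step can be sketched as two small helpers. `release_tag` and `release_title` are hypothetical names, not part of RDFLib's tooling, and the ISO date format is an assumption based on the existing ``CHANGELOG.md`` heading "2024-10-17 RELEASE 7.1.0".

```python
from datetime import date


def release_tag(version: str) -> str:
    """Return the git tag for a release: the bare version, without a "v" prefix."""
    return version.removeprefix("v")


def release_title(release_date: date, version: str) -> str:
    """Format the GitHub release title as "{DATE} RELEASE {VERSION}"."""
    # Assumption: DATE is written as YYYY-MM-DD, as in the changelog headings.
    return f"{release_date.isoformat()} RELEASE {release_tag(version)}"


print(release_tag("6.3.1"))
print(release_title(date(2024, 10, 17), "7.1.0"))
```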

4. Create a post-release PR
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Once this is all done, create a post-release pull request with the following changes:

* Set the just released version in ``docker/latest/requirements.in`` and run
``task docker:prepare`` to update the ``docker/latest/requirements.txt`` file.
* Set the version in the ``pyproject.toml`` file to the next minor release with
a ``a0`` suffix to indicate alpha 0.
#. In ``pyproject.toml``, update to the next minor release alpha

   * so a 6.3.1 release would have 6.4.0a0 as the next release alpha

#. In ``docker/latest/requirements.in`` set the version to the just released version
#. Use ``task docker:prepare`` to update ``docker/latest/requirements.txt``
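
The "next minor release alpha" rule for ``pyproject.toml`` can be sketched as a version bump. `next_minor_alpha` is a hypothetical helper, not part of the RDFLib repository, and it assumes a plain ``MAJOR.MINOR.PATCH`` version string with no pre-release suffix.

```python
def next_minor_alpha(released: str) -> str:
    """Return the alpha-0 version of the minor release following `released`."""
    major, minor, _patch = (int(part) for part in released.split("."))
    # Bump the minor number, reset the patch number, and mark the result as alpha 0.
    return f"{major}.{minor + 1}.0a0"


print(next_minor_alpha("6.3.1"))
```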



5. Let the world know
~~~~~~~~~~~~~~~~~~~~~

Announce the release at the following locations:

* RDFLib mailing list
* RDFLib Gitter / matrix.org chat room
* Twitter: Just make a tweet from your own account linking to the latest release
* related mailing lists
   * Jena: users@jena.apache.org
   * W3C (currently RDF-Star WG): public-rdf-star@w3.org