Derive shapes from maps #125

Open
tpluscode opened this issue Apr 13, 2023 · 4 comments

@tpluscode
Contributor

I would like to propose a new feature where minimal SHACL shapes are generated from the mappings. The purpose is to generate a starting point for defining more specific constraints over the output data. For example, given the mapping shown in the language reference:

map AirportMapping from airport {
	subject template "http://airport.example.com/{0}" with id;
	
	graphs
		template "http://airport.example.com/graph/stop/{0}" with id;
		constant "http://www.w3.org/ns/r2rml#defaultGraph";
	
	types transit.Stop
	
	properties
		transit.route from stop with datatype xsd.integer;
		wgs84_pos.lat from latitude;
		wgs84_pos.long from longitude;
}

one could produce a shape with minimal constraints:

<AirportMappingShape>
  a sh:NodeShape ;
  sh:targetClass transit:Stop ;
  sh:property 
    <AirportMappingShape/transit:route> ,
    <AirportMappingShape/wgs84_pos:lat> ,
    <AirportMappingShape/wgs84_pos:long> ;
.

<AirportMappingShape/transit:route>
  sh:path transit:route ;
  sh:datatype xsd:integer ;
  sh:nodeKind sh:Literal ;
.

<AirportMappingShape/wgs84_pos:lat>
  sh:path wgs84_pos:lat ;
  sh:nodeKind sh:Literal ;
.

<AirportMappingShape/wgs84_pos:long>
  sh:path wgs84_pos:long ;
  sh:nodeKind sh:Literal ;
.

It's important that the property shapes are named nodes, so that they can be extended by adding statements in a separate document and merging the two.
Multiple mappings for the same predicate might require sh:or, or a broader node kind such as sh:IRIOrLiteral.
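
A rough Python sketch of the derivation described above follows. It assumes a hypothetical, already-parsed representation of a mapping (`Mapping`, `PropertyMapping`). This is not the XRM data model. Each predicate gets one named property shape. When several mappings target the same predicate, their node kinds are combined, for example into `sh:IRIOrLiteral`. Conflicting datatypes are simply left out rather than expressed with `sh:or`, and blank nodes are not considered.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical, simplified stand-in for a parsed XRM mapping (not the real XRM model).
@dataclass(frozen=True)
class PropertyMapping:
    predicate: str                # prefixed name, e.g. "transit:route"
    node_kind: str                # "IRI" or "Literal"
    datatype: str | None = None   # e.g. "xsd:integer" when declared


@dataclass(frozen=True)
class Mapping:
    name: str
    types: tuple[str, ...]
    properties: tuple[PropertyMapping, ...]


# SHACL node kind for each combination of term kinds seen for one predicate.
NODE_KINDS = {
    frozenset({"IRI"}): "sh:IRI",
    frozenset({"Literal"}): "sh:Literal",
    frozenset({"IRI", "Literal"}): "sh:IRIOrLiteral",
}


def derive_shape(mapping: Mapping) -> str:
    """Emit a minimal node shape (Turtle) with one named property shape per predicate."""
    shape = f"<{mapping.name}Shape>"
    # Group by predicate so repeated predicates share a single property shape.
    by_predicate: dict[str, list[PropertyMapping]] = {}
    for prop in mapping.properties:
        by_predicate.setdefault(prop.predicate, []).append(prop)
    prop_shapes = {p: f"<{mapping.name}Shape/{p}>" for p in by_predicate}

    lines = [shape, "  a sh:NodeShape ;"]
    # A mapping declaring several types yields several sh:targetClass statements.
    lines += [f"  sh:targetClass {t} ;" for t in mapping.types]
    lines.append("  sh:property " + " ,\n    ".join(prop_shapes.values()) + " ;")
    lines.append(".")

    for predicate, props in by_predicate.items():
        lines += ["", prop_shapes[predicate], f"  sh:path {predicate} ;"]
        datatypes = {p.datatype for p in props if p.datatype}
        # Only emit sh:datatype when every mapping agrees on a single datatype.
        if len(datatypes) == 1 and all(p.datatype for p in props):
            lines.append(f"  sh:datatype {datatypes.pop()} ;")
        kinds = frozenset(p.node_kind for p in props)
        lines.append(f"  sh:nodeKind {NODE_KINDS[kinds]} ;")
        lines.append(".")
    return "\n".join(lines)


if __name__ == "__main__":
    airport = Mapping("AirportMapping", ("transit:Stop",), (
        PropertyMapping("transit:route", "Literal", "xsd:integer"),
        PropertyMapping("wgs84_pos:lat", "Literal"),
        PropertyMapping("wgs84_pos:long", "Literal"),
    ))
    print(derive_shape(airport))
```

When run with the `AirportMapping` properties (all literals, with `transit:route` typed as `xsd:integer`), the sketch yields the same node shape and property shapes as the hand-written example. Prefix declarations are omitted, as in that example.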


To implement this feature, I would propose slightly adapting (and simplifying) the feature proposed in #115. I will create a draft PR to illustrate it.

@mchlrch
Member

mchlrch commented Apr 19, 2023

Shapes derived from the mapping don't necessarily describe the output graph of the pipeline; often there are post-processing steps after the mapping.

Nevertheless, there are likely cases for which shapes derived from the mapping are useful (maybe also for troubleshooting pipelines or the mapping itself by validating intermediate results).

Some things to consider, if shapes are derived from the mapping (in general, not related to the proposal in PR #126 ... more of a "notes-to-self"):

  • The mapping might be overspecified and not representative of the resulting data graph (e.g. using an XPath expression that doesn't match anything)
  • A mapping block declaring multiple types would result in a shape targeting multiple classes
  • One graph resource can be populated from multiple mapping blocks. In this case only the sum of the constraints from the resulting multiple NodeShapes would describe the resource (and the derived NodeShapes could not be sh:closed individually)
  • Mapping blocks are aligned to input blocks (e.g. a table). One input block can have multiple mapping blocks
  • In the mapping block, we don't have an alias for the property, so the property name would have to be used verbatim. This could become an issue if the generated shapes are extended with statements from a separate document and the schema changes
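
Some of the points above can be checked mechanically before shapes are generated. This is a minimal Python sketch under loudly stated assumptions. `MappingBlock` is a hypothetical summary of a mapping block. Two blocks are treated as populating the same resources when their subject templates are identical. That is only a heuristic: different templates can still mint the same IRIs, which this check misses. A derived node shape is reported as closable on its own only if no other block writes to the same resources. The union of classes and predicates per template approximates the "sum of the constraints" for such a resource.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingBlock:
    """Hypothetical summary of one XRM mapping block (illustration only)."""
    name: str
    subject_template: str
    types: tuple
    predicates: frozenset


def closable(blocks):
    """Per block: could its derived NodeShape be sh:closed on its own?"""
    writers = defaultdict(list)
    for block in blocks:
        writers[block.subject_template].append(block.name)
    # Heuristic: blocks sharing a subject template describe the same resources.
    return {b.name: len(writers[b.subject_template]) == 1 for b in blocks}


def combined_constraints(blocks):
    """Union of target classes and predicates per subject template."""
    combined = defaultdict(lambda: (set(), set()))
    for block in blocks:
        classes, predicates = combined[block.subject_template]
        classes.update(block.types)
        predicates.update(block.predicates)
    return dict(combined)


if __name__ == "__main__":
    # AirportGeoMapping, RouteMapping, ex:Route and ex:name are hypothetical.
    blocks = [
        MappingBlock("AirportMapping", "http://airport.example.com/{0}",
                     ("transit:Stop",), frozenset({"transit:route"})),
        MappingBlock("AirportGeoMapping", "http://airport.example.com/{0}",
                     ("transit:Stop",), frozenset({"wgs84_pos:lat", "wgs84_pos:long"})),
        MappingBlock("RouteMapping", "http://route.example.com/{0}",
                     ("ex:Route",), frozenset({"ex:name"})),
    ]
    print(closable(blocks))
    print(combined_constraints(blocks))
```

In this example, the two airport blocks could come from the same input block. Because they share a subject template, neither of their derived shapes should be closed individually. A closed shape would have to be built from their combined predicates instead.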

(Unrelated to this feature request, but related to the last point of the above list) Decoupling the mapping from the schema by means of pointing from the mapping to shape elements, rather than schema elements could be an option to facilitate handling schema changes (shape-first, shape-as-contract).

My plan is to make xrm more hackable, in order to unlock possibilities for toolchain improvements outside of the xrm editor itself, like #127 and #128.

@mchlrch
Member

mchlrch commented Apr 19, 2023

For one-time scaffolding, introspecting the shapes from the output graph of the pipeline might be an alternative.

Here's a query to illustrate this, based on the CONSTRUCT query that SPEX runs in "introspection" mode. I used it in a customer project.

Note: The query has dependencies on spif: functions which GraphDB has built-in. They need to be replaced for running the query on other stores.

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX mobi: <https://schema.mobicorp.ch/>
PREFIX sh: <http://www.w3.org/ns/shacl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#> 
PREFIX schema: <http://schema.org/>
PREFIX spif: <http://spinrdf.org/spif#>
CONSTRUCT {
    ?nodeShape a sh:NodeShape .
    ?nodeShape sh:targetClass ?cls .
    ?nodeShape sh:property ?propertyShape .
    ?propertyShape a sh:PropertyShape .
    ?propertyShape sh:path ?property .
    ?propertyShape sh:class ?linktype .
    ?propertyShape sh:datatype ?datatype .
} WHERE {
    VALUES ?cls {
        #            mobi:Table
        #            mobi:Column
        mobi:Mitarbeiter
        mobi:Organisationseinheit
    }
    ?subject a ?cls .
    ?subject ?property ?object .
    OPTIONAL {
        ?object a ?linktype .
    }    
    MINUS {
        # --- blacklist ---
        VALUES ?cls {
            rdf:Property
            owl:TransitiveProperty
            owl:SymmetricProperty
            rdf:List
            rdfs:Class
            rdfs:Datatype
            rdfs:ContainerMembershipProperty
            # -------------
            mobi:ArchitektursichtElement
            mobi:OrganisationsElement
            mobi:ProzessElement
            mobi:FunktionsElement
            mobi:IntegrationsElement
            mobi:InformationsElement
            # -------------
            mobi:Informationsobjekt
            mobi:Informationsobjektbeziehung
            mobi:Informationsattribut
            mobi:Rollenbesetzung
            # -------------
            mobi:edc\/UiView
            mobi:edc\/Link
            sh:PropertyShape
            skos:ConceptScheme
            skos:Concept
        } 
        ?subject a ?cls .
    } 
    BIND(DATATYPE(?object) AS ?datatype)
    BIND(spif:buildURI("<urn:NodeShape:{?1}>", spif:encodeURL(str(?cls))) AS ?nodeShape)
    BIND(spif:buildURI("<urn:PropertyShape:{?1}/{?2}>", spif:encodeURL(str(?cls)), spif:encodeURL(str(?property))) AS ?propertyShape)
}
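
This minimal Python sketch makes the query's naming scheme and matching logic concrete. It runs over an in-memory list of triples. The `Literal` class, the function names and the triple representation are assumptions for illustration only. Shape IRIs are built with `urllib.parse.quote(..., safe="")`. This escapes everything except unreserved characters, as SPARQL 1.1's `ENCODE_FOR_URI` does. A portable replacement for the `spif:` calls could therefore be something like `BIND(IRI(CONCAT("urn:NodeShape:", ENCODE_FOR_URI(STR(?cls)))) AS ?nodeShape)`. However, `spif:encodeURL` may escape some characters differently, so the IRIs are not guaranteed to match the GraphDB output exactly. One subtlety the sketch models: `?cls` is shared between the outer pattern and the `MINUS` block. As a result, `MINUS` only removes solutions whose `?cls` is itself blacklisted. With the `VALUES` restriction above, the blacklist has no effect. It only matters when the class list is left open, which corresponds to `target_classes=None` here.

```python
from collections import defaultdict
from dataclasses import dataclass
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass(frozen=True)
class Literal:
    """Minimal literal term; IRIs are plain strings in this sketch."""
    value: str
    datatype: str


def node_shape_iri(cls):
    # quote(..., safe="") keeps only unreserved characters, like ENCODE_FOR_URI.
    return "urn:NodeShape:" + quote(cls, safe="")


def property_shape_iri(cls, prop):
    return f"urn:PropertyShape:{quote(cls, safe='')}/{quote(prop, safe='')}"


def introspect(triples, target_classes=None, blacklist=frozenset()):
    """Map node shape IRI -> {property shape IRI: (path, sh:class set, sh:datatype set)}."""
    types = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)

    shapes = defaultdict(dict)
    for s, p, o in triples:
        # VALUES ?cls { ... } restricts the classes; None means "any class".
        classes = types[s] if target_classes is None else types[s] & set(target_classes)
        # MINUS block: drop solutions whose ?cls is itself blacklisted.
        for cls in classes - set(blacklist):
            entry = shapes[node_shape_iri(cls)].setdefault(
                property_shape_iri(cls, p), (p, set(), set()))
            if isinstance(o, Literal):
                entry[2].add(o.datatype)           # BIND(DATATYPE(?object) AS ?datatype)
            else:
                entry[1].update(types.get(o, ()))  # OPTIONAL { ?object a ?linktype }
    return {ns: dict(props) for ns, props in shapes.items()}
```

As in the query, an `rdf:type` statement is itself treated as a property, and objects without a type contribute no `sh:class`.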

@tpluscode
Contributor Author

> Shapes derived from the mapping don't necessarily describe the output graph of the pipeline; often there are post-processing steps after the mapping.

Yes, I realised that too while thinking about my proposal. In museumplus it is exactly like that: the XRM output is only a temporary representation and has nothing in common with the final one.

Maybe I did not state it clearly, but my idea was that shapes defined in XRM could also be independent of the mapping itself:

-node-shape PersonNodeShape from PersonMapping {
+node-shape PersonNodeShape {
}

That way one could take advantage of a simpler syntax, although that would be somewhat incomplete without proper support for vocabularies (re #14).

> My plan is to make xrm more hackable

I cannot really comment on that, but I'm intrigued by how hackability would help. Let's discuss it.

@mchlrch
Member

mchlrch commented Feb 16, 2024

See also https://github.com/RMLio/RML2SHACL

Paper: RML2SHACL: RDF Generation Is Shaping Up
https://lirias.kuleuven.be/retrieve/641696

CC @BenjaminHofstetter
