Implement FAIR Project Type for Validation (#28)
* added branch dev

* finished license paragraph in readme

* improved logging

* removed obsolete token

* optimized docker configuration

* fixed docker config

* fixed issues with docker compose (issue #3)

* refactored navigation

* styling improvements

* package update

* fixed bug in OWL approach

* prepared interface for verbalization

* removed unnecessary code

* added fair project type

* simplified access to stores; form simplification; navbar improvements

* migrated from rollup to vite

* updated docker config

* jsconfig fix

* updated .gitignore

* ui and code improvements

* mode is now also stored in localStorage

* restructured frontend

* restructured frontend

* improved componentization + refactoring

* minor

* Fix #8 by adding error handling to license retrieval

* Implement FAIR Project Type for Validation (#27)

* Add comments to existing SHACL project shapes #11

* Temporarily fix error when there is no result path #15

* Model FAIR principles as SHACL rules #11

* Fix semantic versioning regex

* Use match-case syntax #20

* Switch from short SHACL notation to more verbose one

* For files, make check in root directory explicit

* Extract the sh:or and sh:and components into node shapes

* Extend repository representation with homepage, tags, and DOIs in readme

* Introduce property shapes

* Split the shapes graph into several files.

* Fix selection and display of project types

* Extend repository representation for FAIR type (#13)

* Add check for valid version increment to repository representation (#13)

* Create separate methods for each repository property (#22)

* Limit repository representation to requirements of project type (#22)

* Remove unnecessary graph returns and process description literal earlier (#22)

* Make check for valid version increment less strict (#13)

* Visualize repository representation ontology (#21)

* Fix cut off letters at the end of words in diagram (#21)

* Adapt verbalized explanation for FAIR project type (#15)

* For sh:or,and,xone: use source shape instead of message (#15)

* Add URLs of IRIs in README.md (#21)

* Replace <repo> "has_default_branch" with <branch> "is_default_branch" (#13, #21, #22)

* Add minCount to qualifiedValueShapes (#11)

Previously, qualifiedValueShape only included qualifiedMinCount. This meant that a graph in which the specified path did not exist at all was still considered valid.

* Add missing minCount constraints (#11)

* Adapt property shapes to changed default branch representation (#11)

* Include number of violations in validation response (#19)

* Show share of fulfilled criteria, v1 (#19)

* Adjust button of progress bar (#19)

* Adapt checks for files in root directory of default branch (#11)

* Use properties from the Software Description Ontology (#24)

* Integration tests for validation of FAIR project type, part 1 (#18)

* Use camelCase for own properties (as in SDO) (#24)

* Replace "has_section" property with more specific ones from SDO (#24)

* Integration tests for validation of FAIR project type, part 2 (#18)

* Introduce mocking (#18)

* Remove OWL part (#25)

* Fix ZeroDivisionError in benchmark.py (#26)

* Adapt FAIR criteria as a result of tests with 26 repos (#14)

* Remove "using" as keyword for usage documentation because of false positives (#14)

---------

Co-authored-by: Leon Martin <Leon@Home>
Co-authored-by: lema <leon_martin@t-online.de>
Co-authored-by: Leon Martin <lema@work>
4 people authored Jan 15, 2024
1 parent facab13 commit ec074ea
Showing 24 changed files with 1,724 additions and 1,941 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -7,3 +7,4 @@ backend/data/benchmarks/
!backend/data/benchmarks/.gitkeep

.DS_Store
/.idea/
89 changes: 84 additions & 5 deletions README.md
@@ -12,6 +12,8 @@
<a href="#usage">Usage</a>
<a href="#repository-representation-ontology">Repository Representation Ontology</a>
<a href="#developer-information">Developer Information</a>
<a href="#license">License</a>
@@ -27,27 +29,104 @@ Thanks to Docker, only [Docker](https://www.docker.com/) and [Docker Compose](ht

## Usage

After cloning or downloading this repository, simply run `docker compose up` in a command line from the root folder of the repository to start the tool. The frontend can then be accessed it via [http://localhost:3000](http://localhost:3000). (If necessary, the backend can be accessed via [http://localhost:5000](http://localhost:5000).)
After cloning or downloading this repository, simply run `docker compose up` in a command line from the root folder of the repository to start the tool. The frontend can then be accessed via [http://localhost:3000](http://localhost:3000). (If necessary, the backend can be accessed via [http://localhost:5000](http://localhost:5000).)

The frontend currently provides two pages, namely the [Validation page](#the-validation-page) and the [Specification page](#the-specification-page) which can be selected using the navigation bar.

### The Validation Page

Here you can enter the names of the repositories you want to validate against the available project types. If you plan to validate private repositories or want to make multiple requests in short succession, make sure to also enter a GitHub access token, which can be generated in the settings of your GitHub profile (read access suffices).

When you have filled out the form, you can issue the validation of the specified repositories. For the validation, you can choose between the SHACL and the OWL approach using the provided switch. We recommend the SHACL approach due to the comprehensive explanations it provides in case the validation fails. If the validation succeeds a green symbol is presented, otherwise a red symbol is shown. You can view the explanations (one raw and a verbalized one) by pressing the button next to the red symbols.
When you have filled out the form, you can issue the validation of the specified repositories. If the validation succeeds, a green symbol is presented; otherwise, a red symbol is shown. You can view the explanations (a raw one and a verbalized one) by pressing the button next to the red symbols.
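
The validation can also be triggered directly against the backend. A minimal sketch of the request body, with field names taken from `backend/api.py` in this commit (the repository name, project type, and helper function are hypothetical examples, not part of the tool):

```python
import json

def build_validation_request(repo_name: str, repo_type: str,
                             access_token: str = "") -> str:
    # Field names mirror what the /validate endpoint reads from the request.
    return json.dumps({
        "accessToken": access_token,  # may be empty for public repositories
        "repoName": repo_name,        # "<user_or_organization>/<repository>"
        "repoType": repo_type,        # one of the available project types
    })

payload = build_validation_request("octocat/Hello-World", "FAIR")
# POST this payload to http://localhost:5000/validate; the JSON response
# contains repoName, returnCode, numberOfViolations, report, and verbalized.
print(payload)
```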

### The Specification Page

Here you can view the available project types and the quality constraints that are assigned to them. In the future, it is planned that the project types and criteria can be edited directly here. Currently, you have to edit the SHACL shapes graph or the ontology manually. If you want to change the criteria or add other project types, we strongly recommend editing the shapes graph and thereby using the SHACL approach because this is far easier than editing the ontology.
Here you can view the available project types and the quality constraints that are assigned to them. In the future, it is planned that the project types and criteria can be edited directly here. Currently, you have to edit the SHACL shapes graph manually.

## Repository Representation Ontology
A representation of the given repository is created for validation. Its individual components depend on the corresponding project type. The following visualization shows all possible nodes and edges of this ontology. IRIs (Internationalized Resource Identifiers) are depicted in blue, literals in yellow.

```mermaid
---
title: Ontology for GitHub repositories - maximum cardinality in round brackets
---
flowchart LR
%% NODE SECTION
%% IRIs and Literals that are directly linked to the repository node
repo([<b>Repository</b>]):::iri
visibility[Boolean]:::literal
topic[String]:::literal
description[String]:::literal
homepage[String]:::literal
mainLanguage[String]:::literal
release([<b>Release</b>]):::iri
validVersionIncrement[Boolean]:::literal
branch([<b>Branch</b>]):::iri
issue([<b>Issue</b>]):::iri
license([<b>License</b>]):::iri
readme([<b>Readme file</b>]):::iri
installationInstructions[String]:::literal
usageNotes[String]:::literal
purpose[String]:::literal
softwareRequirements[String]:::literal
citation[String]:::literal
%% Literals that can be reached from the other IRIs
tagName[String]:::literal
branchName[String]:::literal
isDefaultBranch[Boolean]:::literal
fileInRootDirectory[String]:::literal
issueState[String]:::literal
licenseName[String]:::literal
doiInReadme[Boolean]:::literal
%% LINK SECTION
%% Outgoing links of the repository node
repo -- "props:isPrivate (1)" --> visibility
repo -- "sd:keywords (*)" --> topic
repo -- "sd:description (1)" --> description
repo -- "sd:website (1)" --> homepage
repo -- "sd:programmingLanguage (1)" --> mainLanguage
repo -- "sd:hasVersion (*)" --> release
repo -- "props:versionsHaveValidIncrement (1)" --> validVersionIncrement
repo -- "props:hasBranch (*)" --> branch
repo -- "props:hasIssue (*)" --> issue
repo -- "sd:license (1)" --> license
repo -- "sd:readme (1)" --> readme
repo -- "sd:hasInstallationInstructions (1)" --> installationInstructions
repo -- "sd:hasUsageNotes (1)" --> usageNotes
repo -- "sd:hasPurpose (1)" --> purpose
repo -- "sd:softwareRequirements (1)" --> softwareRequirements
repo -- "sd:citation (1)" --> citation
%% Outgoing links of the other IRIs
release -- "sd:hasVersionId (1)" --> tagName
branch -- "sd:name (1)" --> branchName
branch -- "props:isDefaultBranch (1)" --> isDefaultBranch
branch -- "props:hasFileInRootDirectory (*)" --> fileInRootDirectory
issue -- "props:hasState (1)" --> issueState
license -- "sd:name (1)" --> licenseName
readme -- "props:containsDoi (1)" --> doiInReadme
%% STYLING
classDef literal fill:#FFEA85, stroke:#000
classDef iri fill:#00407A, color:white, stroke:#000
```
The IRIs mentioned have the following URL structure:
* Repository: `https://github.com/<user_or_organization_name>/<repository_name>`
* Release: `<repository_URL>/releases/tag/<tag_name>`
* Branch: `<repository_URL>/tree/<branch_name>`
* Issue: `<repository_URL>/issues/<issue_id>`
* License: `<repository_URL>/blob/<path_to_license_file>`
* Readme file: `<repository_URL>/blob/<path_to_readme_file>`
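The URL patterns above can be composed programmatically. A minimal sketch, assuming illustrative inputs (the function name and signature are our own, not part of the tool):

```python
def repository_iris(owner: str, repo: str, tag: str, branch: str,
                    issue_id: int, license_path: str,
                    readme_path: str) -> dict:
    """Build the IRIs described above for one repository (illustrative)."""
    base = f"https://github.com/{owner}/{repo}"
    return {
        "repository": base,
        "release": f"{base}/releases/tag/{tag}",
        "branch": f"{base}/tree/{branch}",
        "issue": f"{base}/issues/{issue_id}",
        "license": f"{base}/blob/{license_path}",
        "readme": f"{base}/blob/{readme_path}",
    }

iris = repository_iris("octocat", "Hello-World", "v1.0.0", "main",
                       42, "main/LICENSE", "main/README.md")
```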

## Developer Information

Instead of starting frontend and backend together via `docker compose up`, you can run them independently for easier debugging.
### Running the Backend

- Run `docker compose run --service-ports --entrypoint bash backend` to get a bash that is attached to the backend container.
- Run `./backend_api.py` to start the backend.
- Run `./api.py` to start the backend.

### Running the Frontend

@@ -60,7 +139,7 @@ Note that the frontend depends on the backend. The backend should therefore be s

To reproduce the performance benchmarks shown in the paper, perform the following steps:

- Create a file called `github_access_token` in the [backend](./backend/) folder. Then enter your GitHub access token and that file and save.
- Create a file called `git_access_token` in the [backend](./backend/) folder. Then enter your GitHub access token in that file and save.
- Run `docker compose run --service-ports --entrypoint bash backend` to get a bash that is attached to the backend container.
- Run `./benchmark.py` to run the benchmarks.
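
`benchmark.py` profiles each validator run with `cProfile` and later aggregates cumulative times from the `pstats` entries (index 3 of each stats tuple). A minimal, self-contained sketch of that pattern, with a stand-in for one of the profiled validation steps:

```python
import cProfile
import pstats

def create_repository_representation() -> int:
    # Stand-in workload; in the tool this would build the RDF graph.
    return sum(range(100_000))

profiler = cProfile.Profile()
profiler.enable()
create_repository_representation()
profiler.disable()

# pstats keys are (filename, line, function); entry[3] holds the
# cumulative time, the same field benchmark.py sums per step.
stats = pstats.Stats(profiler)
cumulative = {func: entry[3]
              for (_, _, func), entry in stats.stats.items()}
share = (cumulative["create_repository_representation"]
         / max(sum(cumulative.values()), 1e-12))
print(f"step share of total runtime: {share:.2%}")
```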

1 change: 0 additions & 1 deletion backend/Dockerfile
@@ -6,7 +6,6 @@ RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install --yes pytho
RUN apt-get update && apt-get install --yes python3-pip

RUN pip install -U rdflib
RUN pip install -U owlready2
RUN pip install -U PyGithub
RUN pip install -U markdown
RUN pip install -U bs4
19 changes: 9 additions & 10 deletions backend/api.py
@@ -3,7 +3,7 @@
import json
import logging

from flask import Flask, jsonify, request
from flask import Flask, jsonify, request, Response
from flask_cors import CORS

import validation_interface
@@ -16,28 +16,27 @@


@app.route("/", methods=['GET'])
def hello_world():
def hello_world() -> Response:
return jsonify({"response": "Hello, World!"})


@app.route("/project-type-specifications", methods=['GET'])
def repo_types():
return jsonify(validation_interface.get_project_type_specifcations())
def repo_types() -> Response:
return jsonify(validation_interface.get_project_type_specifications())


@app.route("/validate", methods=['POST'])
def validate():

def validate() -> Response:
request_data = json.loads(request.data)
github_access_token = request_data["accessToken"]
repo_name = request_data["repoName"]
repo_type = request_data["repoType"]
method = request_data["method"]

returncode, report = validation_interface.run_validator(github_access_token, repo_name, repo_type, method)
verbalized = verbalization_interface.run_verbalizer(report, repo_name, repo_type, method)
return_code, number_of_violations, report = validation_interface.run_validator(github_access_token, repo_name,
repo_type)
verbalized = verbalization_interface.run_verbalizer(report, repo_name, repo_type)

results = {"repoName": repo_name, "returnCode": returncode,
results = {"repoName": repo_name, "returnCode": return_code, "numberOfViolations": number_of_violations,
"report": report, "verbalized": verbalized}

return jsonify(results)
99 changes: 27 additions & 72 deletions backend/benchmark.py
@@ -10,7 +10,6 @@


def run_benchmark():

# retrieved 2022/03/22 from https://github.com/trending?since=monthly
trending_github_repos = [
"Anduin2017/HowToCook",
@@ -38,119 +37,75 @@ def run_benchmark():
with open("./git_access_token") as file:
github_access_token = file.readline().strip()

benchmark_scenarios = [(github_access_token, repo_name, "FinishedResearchProject") for repo_name in trending_github_repos] + [
(github_access_token, repo_name, "InternalDocumentation") for repo_name in trending_github_repos]
benchmark_scenarios = [(github_access_token, repo_name, "FinishedResearchProject") for repo_name in
trending_github_repos] + [(github_access_token, repo_name, "InternalDocumentation")
for repo_name in trending_github_repos]

for github_access_token, repo_name, repo_type in benchmark_scenarios:
file_name = f"{repo_name.split('/')[1]}-{repo_type}"

cmd_owl = ["./owl_validator.py", "--github_access_token", github_access_token,
"--repo_name", repo_name, "--expected_type", repo_type]

file_name = f"-{repo_name.split('/')[1]}-{repo_type}"

run(["python3", "-m", "cProfile", "-o",
f"data/benchmarks/OWL{file_name}", "-s", "cumulative"] + cmd_owl)

sleep(3)

cmd_shacl = ["./shacl_validator.py", "--github_access_token", github_access_token,
"--repo_name", repo_name, "--expected_type", repo_type]
cmd = ["./shacl_validator.py", "--github_access_token", github_access_token, "--repo_name", repo_name,
"--expected_type", repo_type]

run(["python3", "-m", "cProfile", "-o",
f"data/benchmarks/SHACL{file_name}", "-s", "cumulative"] + cmd_shacl)
f"data/benchmarks/{file_name}", "-s", "cumulative"] + cmd)

sleep(3)


def process_results():
all_finished = []
all_internal = []

all_owl_finished = []
all_shacl_finished = []
all_owl_internal = []
all_shacl_internal = []

shacl_step_durations = [0, 0, 0]
owl_step_durations = [0, 0, 0]

for result_file in glob.glob("./data/benchmarks/SHACL-*"):
stats = pstats.Stats(result_file)

for k, v in stats.stats.items():
_, _, function = k

if function == "test_repo_against_specs":
if "FinishedResearchProject" in result_file:
all_shacl_finished.append(v[3])
else:
all_shacl_internal.append(v[3])

elif function == "create_project_type_representation":
shacl_step_durations[0] += v[3]

elif function == "create_repository_representation":
shacl_step_durations[1] += v[3]

elif function == "run_validation":
shacl_step_durations[2] += v[3]
step_durations = [0, 0, 0]

for result_file in glob.glob("./data/benchmarks/OWL-*"):
for result_file in glob.glob("./data/benchmarks/*"):
stats = pstats.Stats(result_file)

for k, v in stats.stats.items():
_, _, function = k

if function == "test_repo_against_specs":
if function == "validate_repo_against_specs":
if "FinishedResearchProject" in result_file:
all_owl_finished.append(v[3])
all_finished.append(v[3])
else:
all_owl_internal.append(v[3])
all_internal.append(v[3])

elif function == "create_project_type_representation":
owl_step_durations[0] += v[3]
step_durations[0] += v[3]

elif function == "create_repository_representation":
owl_step_durations[1] += v[3]
step_durations[1] += v[3]

elif function == "run_validation":
owl_step_durations[2] += v[3]
step_durations[2] += v[3]

_, ax = plt.subplots(figsize=(6, 3))

ax.set(
ylabel='Seconds',
)

ax.boxplot([all_shacl_finished, all_owl_finished,
all_shacl_internal, all_owl_internal])
ax.boxplot([all_finished, all_internal])

ax.set_xticklabels(
["$SHACL, T_{F}$", "$OWL, T_{F}$", "$SHACL, T_{I}$", "$OWL, T_{I}$"])
["$T_{F}$", "$T_{I}$"])

plt.tight_layout(pad=0)

plt.savefig("./data/benchmarks/benchmark_results.pdf")

owl_total = sum(all_owl_finished) + sum(all_owl_internal)
shacl_total = sum(all_shacl_finished) + sum(all_shacl_internal)

owl_step_one_percent = '{:.2f}%'.format(
owl_step_durations[0]/owl_total*100)
owl_step_two_percent = '{:.2f}%'.format(
owl_step_durations[1]/owl_total*100)
owl_step_three_percent = '{:.2f}%'.format(
owl_step_durations[2]/owl_total*100)
total = sum(all_finished) + sum(all_internal)

shacl_step_one_percent = '{:.2f}%'.format(
shacl_step_durations[0]/shacl_total*100)
shacl_step_two_percent = '{:.2f}%'.format(
shacl_step_durations[1]/shacl_total*100)
shacl_step_three_percent = '{:.2f}%'.format(
shacl_step_durations[2]/shacl_total*100)
step_one_percent = '{:.2f}%'.format(
step_durations[0] / total * 100)
step_two_percent = '{:.2f}%'.format(
step_durations[1] / total * 100)
step_three_percent = '{:.2f}%'.format(
step_durations[2] / total * 100)

logging.info(
f"Using the OWL approach, steps 1/2/3 account for {owl_step_one_percent}/{owl_step_two_percent}/{owl_step_three_percent} of the total runtime.")
logging.info(
f"Using the SHACL approach, steps 1/2/3 account for {shacl_step_one_percent}/{shacl_step_two_percent}/{shacl_step_three_percent} of the total runtime.")
f"Steps 1/2/3 account for {step_one_percent}/{step_two_percent}/{step_three_percent} of the total runtime.")


if __name__ == "__main__":
