Implement FAIR Project Type for Validation (#28)
* added branch dev

* finished license paragraph in readme

* improved logging

* removed obsolete token

* optimized docker configuration

* fixed docker config

* fixed issues with docker compose (issue #3)

* refactored navigation

* styling improvements

* package update

* fixed bug in OWL approach

* prepared interface for verbalization

* removed unnecessary code

* added fair project type

* simplified access to stores; form simplification; navbar improvements

* migrated from rollup to vite

* updated docker config

* jsconfig fix

* updated .gitignore

* ui and code improvements

* mode is now also stored in localStorage

* restructured frontend

* restructured frontend

* improved componentization + refactoring

* minor

* Fix #8 by adding error handling to license retrieval

* Implement FAIR Project Type for Validation (#27)

* Add comments to existing SHACL project shapes #11

* Temporarily fix error when there is no result path #15

* Model FAIR principles as SHACL rules #11

* Fix semantic versioning regex

* Use match-case syntax #20

* Switch from short SHACL notation to more verbose one

* For files, make check in root directory explicit

* Extract the sh:or and sh:and components into node shapes

* Extend repository representation with homepage, tags, and DOIs in readme

* Introduce property shapes

* Split the shapes graph into several files.

* Fix selection and display of project types

* Extend repository representation for FAIR type (#13)

* Add check for valid version increment to repository representation (#13)

* Create separate methods for each repository property (#22)

* Limit repository representation to requirements of project type (#22)

* Remove unnecessary graph returns and process description literal earlier (#22)

* Make check for valid version increment less strict (#13)

* Visualize repository representation ontology (#21)

* Fix cut off letters at the end of words in diagram (#21)

* Adapt verbalized explanation for FAIR project type (#15)

* For sh:or,and,xone: use source shape instead of message (#15)

* Add URLs of IRIs in README.md (#21)

* Replace <repo> "has_default_branch" with <branch> "is_default_branch" (#13, #21, #22)

* Add minCount to qualifiedValueShapes (#11)

Previously, qualifiedValueShape only included qualifiedMinCount. This meant that a graph in which the specified path did not exist at all was still considered valid.

* Add missing minCount constraints (#11)

* Adapt property shapes to changed default branch representation (#11)

* Include number of violations in validation response (#19)

* Show share of fulfilled criteria, v1 (#19)

* Adjust button of progress bar (#19)

* Adapt checks for files in root directory of default branch (#11)

* Use properties from the Software Description Ontology (#24)

* Integration tests for validation of FAIR project type, part 1 (#18)

* Use camelCase for own properties (as in SDO) (#24)

* Replace "has_section" property with more specific ones from SDO (#24)

* Integration tests for validation of FAIR project type, part 2 (#18)

* Introduce mocking (#18)

* Remove OWL part (#25)

* Fix ZeroDivisionError in benchmark.py (#26)

* Adapt FAIR criteria as a result of tests with 26 repos (#14)

* Remove "using" as keyword for usage documentation because of false positives (#14)

---------

Co-authored-by: Leon Martin <Leon@Home>
Co-authored-by: lema <leon_martin@t-online.de>
Co-authored-by: Leon Martin <lema@work>
4 people authored Jan 15, 2024
1 parent facab13 commit ec074ea
Showing 24 changed files with 1,724 additions and 1,941 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -7,3 +7,4 @@ backend/data/benchmarks/
!backend/data/benchmarks/.gitkeep

.DS_Store
/.idea/
89 changes: 84 additions & 5 deletions README.md
@@ -12,6 +12,8 @@
<a href="#usage">Usage</a>
<a href="#repository-representation-ontology">Repository Representation Ontology</a>
<a href="#developer-information">Developer Information</a>
<a href="#license">License</a>
@@ -27,27 +29,104 @@ Thanks to Docker, only [Docker](https://www.docker.com/) and [Docker Compose](ht

## Usage

After cloning or downloading this repository, simply run `docker compose up` in a command line from the root folder of the repository to start the tool. The frontend can then be accessed it via [http://localhost:3000](http://localhost:3000). (If necessary, the backend can be accessed via [http://localhost:5000](http://localhost:5000).)
After cloning or downloading this repository, simply run `docker compose up` in a command line from the root folder of the repository to start the tool. The frontend can then be accessed via [http://localhost:3000](http://localhost:3000). (If necessary, the backend can be accessed via [http://localhost:5000](http://localhost:5000).)

The frontend currently provides two pages, namely the [Validation page](#the-validation-page) and the [Specification page](#the-specification-page) which can be selected using the navigation bar.

### The Validation Page

Here you can enter the names of the repositories you want to validate against the available project types. If you plan to validate private repositories or want to make multiple requests in short succession, make sure to also enter a GitHub access token, which can be generated in the settings of your GitHub profile (read access suffices).

When you have filled out the form, you can issue the validation of the specified repositories. For the validation, you can choose between the SHACL and the OWL approach using the provided switch. We recommend the SHACL approach due to the comprehensive explanations it provides in case the validation fails. If the validation succeeds a green symbol is presented, otherwise a red symbol is shown. You can view the explanations (one raw and a verbalized one) by pressing the button next to the red symbols.
When you have filled out the form, you can issue the validation of the specified repositories. If the validation succeeds, a green symbol is presented; otherwise, a red symbol is shown. You can view the explanations (a raw one and a verbalized one) by pressing the button next to the red symbols.
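
The validation can also be triggered directly against the backend. A minimal sketch of the request body, with field names taken from `backend/api.py` in this commit (the repository name, project type, and helper function are hypothetical examples, not part of the tool):

```python
import json

def build_validation_request(repo_name: str, repo_type: str,
                             access_token: str = "") -> str:
    # Field names mirror what the /validate endpoint reads from the request.
    return json.dumps({
        "accessToken": access_token,  # may be empty for public repositories
        "repoName": repo_name,        # "<user_or_organization>/<repository>"
        "repoType": repo_type,        # one of the available project types
    })

payload = build_validation_request("octocat/Hello-World", "FAIR")
# POST this payload to http://localhost:5000/validate; the JSON response
# contains repoName, returnCode, numberOfViolations, report, and verbalized.
print(payload)
```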

### The Specification Page

Here you can view the available project types and the quality constraints that are assigned to them. In the future, it is planned that the project types and criteria can be edited directly here. Currently, you have to edit the SHACL shapes graph or the ontology manually. If you want to change the criteria or add other project types, we strongly recommend editing the shapes graph and thereby using the SHACL approach because this is far easier than editing the ontology.
Here you can view the available project types and the quality constraints that are assigned to them. In the future, it is planned that the project types and criteria can be edited directly here. Currently, you have to edit the SHACL shapes graph manually.

## Repository Representation Ontology
A representation of the given repository is created for validation. Its individual components depend on the corresponding project type. The following visualization shows all possible nodes and edges of this ontology. IRIs (Internationalized Resource Identifiers) are depicted in blue, literals in yellow.

```mermaid
---
title: Ontology for GitHub repositories - maximum cardinality in round brackets
---
flowchart LR
%% NODE SECTION
%% IRIs and Literals that are directly linked to the repository node
repo([<b>Repository</b>]):::iri
visibility[Boolean]:::literal
topic[String]:::literal
description[String]:::literal
homepage[String]:::literal
mainLanguage[String]:::literal
release([<b>Release</b>]):::iri
validVersionIncrement[Boolean]:::literal
branch([<b>Branch</b>]):::iri
issue([<b>Issue</b>]):::iri
license([<b>License</b>]):::iri
readme([<b>Readme file</b>]):::iri
installationInstructions[String]:::literal
usageNotes[String]:::literal
purpose[String]:::literal
softwareRequirements[String]:::literal
citation[String]:::literal
%% Literals that can be reached from the other IRIs
tagName[String]:::literal
branchName[String]:::literal
isDefaultBranch[Boolean]:::literal
fileInRootDirectory[String]:::literal
issueState[String]:::literal
licenseName[String]:::literal
doiInReadme[Boolean]:::literal
%% LINK SECTION
%% Outgoing links of the repository node
repo -- "props:isPrivate (1)" --> visibility
repo -- "sd:keywords (*)" --> topic
repo -- "sd:description (1)" --> description
repo -- "sd:website (1)" --> homepage
repo -- "sd:programmingLanguage (1)" --> mainLanguage
repo -- "sd:hasVersion (*)" --> release
repo -- "props:versionsHaveValidIncrement (1)" --> validVersionIncrement
repo -- "props:hasBranch (*)" --> branch
repo -- "props:hasIssue (*)" --> issue
repo -- "sd:license (1)" --> license
repo -- "sd:readme (1)" --> readme
repo -- "sd:hasInstallationInstructions (1)" --> installationInstructions
repo -- "sd:hasUsageNotes (1)" --> usageNotes
repo -- "sd:hasPurpose (1)" --> purpose
repo -- "sd:softwareRequirements (1)" --> softwareRequirements
repo -- "sd:citation (1)" --> citation
%% Outgoing links of the other IRIs
release -- "sd:hasVersionId (1)" --> tagName
branch -- "sd:name (1)" --> branchName
branch -- "props:isDefaultBranch (1)" --> isDefaultBranch
branch -- "props:hasFileInRootDirectory (*)" --> fileInRootDirectory
issue -- "props:hasState (1)" --> issueState
license -- "sd:name (1)" --> licenseName
readme -- "props:containsDoi (1)" --> doiInReadme
%% STYLING
classDef literal fill:#FFEA85, stroke:#000
classDef iri fill:#00407A, color:white, stroke:#000
```
The IRIs mentioned have the following URL structure:
* Repository: `https://github.com/<user_or_organization_name>/<repository_name>`
* Release: `<repository_URL>/releases/tag/<tag_name>`
* Branch: `<repository_URL>/tree/<branch_name>`
* Issue: `<repository_URL>/issues/<issue_id>`
* License: `<repository_URL>/blob/<path_to_license_file>`
* Readme file: `<repository_URL>/blob/<path_to_readme_file>`
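The URL patterns above can be composed programmatically. A minimal sketch, assuming illustrative inputs (the function name and signature are our own, not part of the tool):

```python
def repository_iris(owner: str, repo: str, tag: str, branch: str,
                    issue_id: int, license_path: str,
                    readme_path: str) -> dict:
    """Build the IRIs described above for one repository (illustrative)."""
    base = f"https://github.com/{owner}/{repo}"
    return {
        "repository": base,
        "release": f"{base}/releases/tag/{tag}",
        "branch": f"{base}/tree/{branch}",
        "issue": f"{base}/issues/{issue_id}",
        "license": f"{base}/blob/{license_path}",
        "readme": f"{base}/blob/{readme_path}",
    }

iris = repository_iris("octocat", "Hello-World", "v1.0.0", "main",
                       42, "main/LICENSE", "main/README.md")
```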

## Developer Information

Instead of starting frontend and backend together via `docker compose up`, you can run them independently for easier debugging.
### Running the Backend

- Run `docker compose run --service-ports --entrypoint bash backend` to get a bash that is attached to the backend container.
- Run `./backend_api.py` to start the backend.
- Run `./api.py` to start the backend.

### Running the Frontend

@@ -60,7 +139,7 @@ Note that the frontend depends on the backend. The backend should therefore be s

To reproduce the performance benchmarks shown in the paper, perform the following steps:

- Create a file called `github_access_token` in the [backend](./backend/) folder. Then enter your GitHub access token and that file and save.
- Create a file called `git_access_token` in the [backend](./backend/) folder. Then enter your GitHub access token in that file and save.
- Run `docker compose run --service-ports --entrypoint bash backend` to get a bash that is attached to the backend container.
- Run `./benchmark.py` to run the benchmarks.
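
`benchmark.py` profiles each validator run with `cProfile` and later aggregates cumulative times from the `pstats` entries (index 3 of each stats tuple). A minimal, self-contained sketch of that pattern, with a stand-in for one of the profiled validation steps:

```python
import cProfile
import pstats

def create_repository_representation() -> int:
    # Stand-in workload; in the tool this would build the RDF graph.
    return sum(range(100_000))

profiler = cProfile.Profile()
profiler.enable()
create_repository_representation()
profiler.disable()

# pstats keys are (filename, line, function); entry[3] holds the
# cumulative time, the same field benchmark.py sums per step.
stats = pstats.Stats(profiler)
cumulative = {func: entry[3]
              for (_, _, func), entry in stats.stats.items()}
share = (cumulative["create_repository_representation"]
         / max(sum(cumulative.values()), 1e-12))
print(f"step share of total runtime: {share:.2%}")
```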

1 change: 0 additions & 1 deletion backend/Dockerfile
@@ -6,7 +6,6 @@ RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install --yes pytho
RUN apt-get update && apt-get install --yes python3-pip

RUN pip install -U rdflib
RUN pip install -U owlready2
RUN pip install -U PyGithub
RUN pip install -U markdown
RUN pip install -U bs4
19 changes: 9 additions & 10 deletions backend/api.py
@@ -3,7 +3,7 @@
import json
import logging

from flask import Flask, jsonify, request
from flask import Flask, jsonify, request, Response
from flask_cors import CORS

import validation_interface
@@ -16,28 +16,27 @@


@app.route("/", methods=['GET'])
def hello_world():
def hello_world() -> Response:
return jsonify({"response": "Hello, World!"})


@app.route("/project-type-specifications", methods=['GET'])
def repo_types():
return jsonify(validation_interface.get_project_type_specifcations())
def repo_types() -> Response:
return jsonify(validation_interface.get_project_type_specifications())


@app.route("/validate", methods=['POST'])
def validate():

def validate() -> Response:
request_data = json.loads(request.data)
github_access_token = request_data["accessToken"]
repo_name = request_data["repoName"]
repo_type = request_data["repoType"]
method = request_data["method"]

returncode, report = validation_interface.run_validator(github_access_token, repo_name, repo_type, method)
verbalized = verbalization_interface.run_verbalizer(report, repo_name, repo_type, method)
return_code, number_of_violations, report = validation_interface.run_validator(github_access_token, repo_name,
repo_type)
verbalized = verbalization_interface.run_verbalizer(report, repo_name, repo_type)

results = {"repoName": repo_name, "returnCode": returncode,
results = {"repoName": repo_name, "returnCode": return_code, "numberOfViolations": number_of_violations,
"report": report, "verbalized": verbalized}

return jsonify(results)
99 changes: 27 additions & 72 deletions backend/benchmark.py
@@ -10,7 +10,6 @@


def run_benchmark():

# retrieved 2022/03/22 from https://github.com/trending?since=monthly
trending_github_repos = [
"Anduin2017/HowToCook",
@@ -38,119 +37,75 @@ def run_benchmark():
with open("./git_access_token") as file:
github_access_token = file.readline().strip()

benchmark_scenarios = [(github_access_token, repo_name, "FinishedResearchProject") for repo_name in trending_github_repos] + [
(github_access_token, repo_name, "InternalDocumentation") for repo_name in trending_github_repos]
benchmark_scenarios = [(github_access_token, repo_name, "FinishedResearchProject") for repo_name in
trending_github_repos] + [(github_access_token, repo_name, "InternalDocumentation")
for repo_name in trending_github_repos]

for github_access_token, repo_name, repo_type in benchmark_scenarios:
file_name = f"{repo_name.split('/')[1]}-{repo_type}"

cmd_owl = ["./owl_validator.py", "--github_access_token", github_access_token,
"--repo_name", repo_name, "--expected_type", repo_type]

file_name = f"-{repo_name.split('/')[1]}-{repo_type}"

run(["python3", "-m", "cProfile", "-o",
f"data/benchmarks/OWL{file_name}", "-s", "cumulative"] + cmd_owl)

sleep(3)

cmd_shacl = ["./shacl_validator.py", "--github_access_token", github_access_token,
"--repo_name", repo_name, "--expected_type", repo_type]
cmd = ["./shacl_validator.py", "--github_access_token", github_access_token, "--repo_name", repo_name,
"--expected_type", repo_type]

run(["python3", "-m", "cProfile", "-o",
f"data/benchmarks/SHACL{file_name}", "-s", "cumulative"] + cmd_shacl)
f"data/benchmarks/{file_name}", "-s", "cumulative"] + cmd)

sleep(3)


def process_results():
all_finished = []
all_internal = []

all_owl_finished = []
all_shacl_finished = []
all_owl_internal = []
all_shacl_internal = []

shacl_step_durations = [0, 0, 0]
owl_step_durations = [0, 0, 0]

for result_file in glob.glob("./data/benchmarks/SHACL-*"):
stats = pstats.Stats(result_file)

for k, v in stats.stats.items():
_, _, function = k

if function == "test_repo_against_specs":
if "FinishedResearchProject" in result_file:
all_shacl_finished.append(v[3])
else:
all_shacl_internal.append(v[3])

elif function == "create_project_type_representation":
shacl_step_durations[0] += v[3]

elif function == "create_repository_representation":
shacl_step_durations[1] += v[3]

elif function == "run_validation":
shacl_step_durations[2] += v[3]
step_durations = [0, 0, 0]

for result_file in glob.glob("./data/benchmarks/OWL-*"):
for result_file in glob.glob("./data/benchmarks/*"):
stats = pstats.Stats(result_file)

for k, v in stats.stats.items():
_, _, function = k

if function == "test_repo_against_specs":
if function == "validate_repo_against_specs":
if "FinishedResearchProject" in result_file:
all_owl_finished.append(v[3])
all_finished.append(v[3])
else:
all_owl_internal.append(v[3])
all_internal.append(v[3])

elif function == "create_project_type_representation":
owl_step_durations[0] += v[3]
step_durations[0] += v[3]

elif function == "create_repository_representation":
owl_step_durations[1] += v[3]
step_durations[1] += v[3]

elif function == "run_validation":
owl_step_durations[2] += v[3]
step_durations[2] += v[3]

_, ax = plt.subplots(figsize=(6, 3))

ax.set(
ylabel='Seconds',
)

ax.boxplot([all_shacl_finished, all_owl_finished,
all_shacl_internal, all_owl_internal])
ax.boxplot([all_finished, all_internal])

ax.set_xticklabels(
["$SHACL, T_{F}$", "$OWL, T_{F}$", "$SHACL, T_{I}$", "$OWL, T_{I}$"])
["$T_{F}$", "$T_{I}$"])

plt.tight_layout(pad=0)

plt.savefig("./data/benchmarks/benchmark_results.pdf")

owl_total = sum(all_owl_finished) + sum(all_owl_internal)
shacl_total = sum(all_shacl_finished) + sum(all_shacl_internal)

owl_step_one_percent = '{:.2f}%'.format(
owl_step_durations[0]/owl_total*100)
owl_step_two_percent = '{:.2f}%'.format(
owl_step_durations[1]/owl_total*100)
owl_step_three_percent = '{:.2f}%'.format(
owl_step_durations[2]/owl_total*100)
total = sum(all_finished) + sum(all_internal)

shacl_step_one_percent = '{:.2f}%'.format(
shacl_step_durations[0]/shacl_total*100)
shacl_step_two_percent = '{:.2f}%'.format(
shacl_step_durations[1]/shacl_total*100)
shacl_step_three_percent = '{:.2f}%'.format(
shacl_step_durations[2]/shacl_total*100)
step_one_percent = '{:.2f}%'.format(
step_durations[0] / total * 100)
step_two_percent = '{:.2f}%'.format(
step_durations[1] / total * 100)
step_three_percent = '{:.2f}%'.format(
step_durations[2] / total * 100)

logging.info(
f"Using the OWL approach, steps 1/2/3 account for {owl_step_one_percent}/{owl_step_two_percent}/{owl_step_three_percent} of the total runtime.")
logging.info(
f"Using the SHACL approach, steps 1/2/3 account for {shacl_step_one_percent}/{shacl_step_two_percent}/{shacl_step_three_percent} of the total runtime.")
f"Steps 1/2/3 account for {step_one_percent}/{step_two_percent}/{step_three_percent} of the total runtime.")


if __name__ == "__main__":
