Repository of the Knowledge Flows in Interdisciplinary Research project of VU Network Institute.
It is also the main repository for the packages 'triplicator' and 'preprocessor'. For descriptions of these individual modules, please see their directories.
- Python 3.x
- pybtex
- unidecode
- sparqlwrapper
Convert .bib to .ttl:
my_bibtex_file = Bibtex_File('demo.bib')
my_bibtex_file.convert_to_ttl(desired_version_suffix='v0.1',
                              desired_source_bibliography_name='my-bib',
                              output_directory='output')
Retrieve articles by DOIs from Open Citations:
doi_list = ['10.1163/187607508X384689', '10.1017/S0954579416000572']
oc_query = Open_Citations_Query()
oc_query.retrieve_articles_by_dois(doi_list, show_progress_bar=True)
oc_query.write_results_to_csv('retrieved_articles.csv')
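DOI lists collected from several sources often contain resolver prefixes or duplicates that differ only in letter case. The following is a minimal sketch of cleaning such a list before passing it to the query. The helper `normalize_dois` is hypothetical and not part of this repository; it relies only on the fact that DOIs are case-insensitive.

```python
import re

# Matches common resolver prefixes such as 'https://doi.org/' or 'doi:'.
_DOI_PREFIX = re.compile(r'^(?:https?://(?:dx\.)?doi\.org/|doi:)\s*', re.IGNORECASE)


def normalize_dois(raw_dois):
    """Strip resolver prefixes and whitespace, and drop case-insensitive duplicates.

    Hypothetical helper for illustration; not provided by this project.
    """
    seen = set()
    cleaned = []
    for raw in raw_dois:
        doi = _DOI_PREFIX.sub('', raw.strip())
        key = doi.lower()  # DOIs are case-insensitive, so compare in lower case
        if doi and key not in seen:
            seen.add(key)
            cleaned.append(doi)  # keep the first spelling encountered
    return cleaned


doi_list = normalize_dois([
    'https://doi.org/10.1163/187607508X384689',
    'doi:10.1017/S0954579416000572',
    '10.1163/187607508x384689',  # duplicate that differs only in case
])
```

The resulting `doi_list` can then be used in place of the hand-written list in the example above.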
See the examples directory for more examples. Further documentation is provided in docstrings within the code.
Although short samples may be provided, the bibliographic databases from VU and UvA are not included in this directory for copyright reasons; they remain the property of their respective owners. Data gathered from OpenCitations, however, are made fully available.
The numbers in parentheses before the log entries denote story points.
- (24) Pure dataset enriched using Web of Science database
- (6) OpenCitations ID and Pure ID added to entry fields on LD-R
- (18) Instance NAMES are now identifiers (i.e., Pure IDs and OpenCitations IDs) instead of URI-safe versions of article titles. This ensures that instance names are always unique (e.g., no more duplicate entries because of a title such as 'Intro') and makes it easier to refer to entries in a citation network. Instance LABELS remain article titles
- (7) Very long test outputs (e.g., due to progress bars in logs) removed in order to make it easier to navigate source code
- (6) Strange document types no longer appear in LD-R
- (5) Field names in CSV and BIB parsing operations standardized: Now parsing operations of both bib and csv files result in the same field names
- (8) Merge functionality fully implemented and tested
- (5) Implemented method to retrieve all entries with matching DOIs from OpenCitations
- (5) All DOIs in bib files stored in a file
- (21) Enrich method for OpenCitations and Pure implemented
- (8) Implemented methods for one-statement conversion of bib and csv files to ttl
- (12) Added real-time reporting (e.g., with progress bars in console) and logging for all methods that take a long time to complete
- (8) The verbose_input parameter and its functionality removed from bib and csv parsing operations, so that it cannot significantly slow down the processes if someone activates it by mistake
- (12) CSV file cleaning script improved (using already-existing cleaning scripts) so that it can be imported into .ttl without issues
- (8) Flexible search (instead of exact match search) is implemented in SPARQL query functionality
- (5) Example scripts folder updated and revised
- (8) All functions in the project converted to their object-oriented equivalents
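The flexible search entry in the log above (as opposed to exact-match search) can be illustrated with a minimal sketch. It is not the project's actual query code: the function name `build_label_query` is hypothetical, and it assumes that article titles are stored as `rdfs:label` values. It only builds a query string and does not contact any endpoint.

```python
def build_label_query(search_term, exact=False):
    """Build a SPARQL SELECT query that matches labels exactly or by substring.

    Hypothetical helper for illustration; assumes titles are stored as rdfs:label.
    """
    # Escape backslashes and double quotes so the term is a valid SPARQL string literal.
    escaped = search_term.replace('\\', '\\\\').replace('"', '\\"')
    if exact:
        # Exact match: the label must equal the search term character for character.
        condition = f'FILTER(STR(?label) = "{escaped}")'
    else:
        # Flexible match: case-insensitive substring search.
        condition = f'FILTER(CONTAINS(LCASE(STR(?label)), LCASE("{escaped}")))'
    return (
        'PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n'
        'SELECT ?entry ?label WHERE {\n'
        '  ?entry rdfs:label ?label .\n'
        f'  {condition}\n'
        '}'
    )


query = build_label_query('network')
print(query)
```

A string built this way could be passed to the `setQuery` method of a `SPARQLWrapper` instance. The substring filter is convenient but can be slow on large graphs, since it has to inspect every label.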