JohannesHechler/ukhsa

ukhsa


  • demo pipeline for UKHSA job advert UKHSA01111

  • processes a custom version of the SD2011 synthetic dataset into 2 derivatives:

      • attributes (no disclosive columns)

      • identifiers (disclosive columns)
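
The split into two derivatives can be sketched as below. This is a minimal illustration and not the code in pipeline.py: the column names (`id`, `region`, `age`, `income`), the two column lists and the helper `select_columns` are assumptions made for the example. The real columns are listed in inputs/data_dictionary.csv.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

# Hypothetical column assignment; the real one lives in the repo's inputs.
ATTRIBUTE_COLUMNS = ["age", "income"]      # assumed non-disclosive
IDENTIFIER_COLUMNS = ["id", "region"]      # assumed disclosive


def select_columns(rows: List[Row], columns: List[str]) -> List[Row]:
    """Return new rows that keep only the requested columns, in the given order."""
    return [{col: row[col] for col in columns} for row in rows]


# Two made-up records standing in for the input data.
rows = [
    {"id": 1, "region": "A", "age": 57, "income": 800},
    {"id": 2, "region": "B", "age": 20, "income": None},
]

attributes = select_columns(rows, ATTRIBUTE_COLUMNS)
identifiers = select_columns(rows, IDENTIFIER_COLUMNS)
```

Keeping the column lists as data rather than hard-coding them in the function means the same helper produces both outputs.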

input data

  • source: SD2011 project
  • I added a random number of NULL values and an ID column for realism
  • see the data dictionary in inputs/data_dictionary.csv
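
One way such a modification could be done is sketched below. How the author actually injected NULLs is not documented here, so the per-cell probability `null_rate`, the fixed seed and the function name `add_id_and_nulls` are all assumptions made for illustration.

```python
import random
from typing import Any, Dict, List

Row = Dict[str, Any]


def add_id_and_nulls(rows: List[Row], null_rate: float, seed: int = 0) -> List[Row]:
    """Prepend a sequential 'id' column and blank out cells at random.

    Each original cell is replaced by None with probability `null_rate`
    (an assumed mechanism). The 'id' column itself is never nulled.
    """
    rng = random.Random(seed)  # seeded so the result is reproducible
    result = []
    for row_number, row in enumerate(rows, start=1):
        new_row: Row = {"id": row_number}
        for column, value in row.items():
            new_row[column] = None if rng.random() < null_rate else value
        result.append(new_row)
    return result


# Made-up records standing in for the SD2011 data.
source_rows = [{"age": 57, "income": 800}, {"age": 20, "income": 350}]
modified_rows = add_id_and_nulls(source_rows, null_rate=0.2, seed=42)
```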

flowchart


[image: pipeline flowchart]

desk instructions


  • open the file inputs/config.yml
  • update parameters as required
  • in particular review:
      • which columns to write to which outputs
      • which column to hash
  • review the output-column mapping in inputs/data_dictionary.csv
  • run the script pipeline.py
  • review the quality_metrics dictionary in memory
  • review the engineering log in sd2011.log
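
The steps above can be sketched as a small config-driven run. This is an illustration under assumptions, not the contents of pipeline.py or inputs/config.yml: the config keys (`outputs`, `hash_column`), the choice of SHA-256, the column names and the metric names are all hypothetical. The config is shown as a plain dictionary, roughly what a parsed YAML file might look like.

```python
import hashlib
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]

# Hypothetical stand-in for the parsed contents of inputs/config.yml.
config = {
    "outputs": {
        "attributes": ["age", "income"],
        "identifiers": ["id", "region"],
    },
    "hash_column": "id",
}


def hash_value(value: Any) -> Optional[str]:
    """SHA-256 hex digest of the value's string form; None stays None."""
    if value is None:
        return None
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


def run_pipeline(rows: List[Row], cfg: Dict[str, Any]) -> Tuple[Dict[str, List[Row]], Dict[str, Any]]:
    """Hash the configured column, split rows per output, and collect metrics."""
    hash_col = cfg["hash_column"]
    hashed = [{**row, hash_col: hash_value(row[hash_col])} for row in rows]
    outputs = {
        name: [{col: row[col] for col in columns} for row in hashed]
        for name, columns in cfg["outputs"].items()
    }
    all_columns = list(rows[0].keys()) if rows else []
    quality_metrics = {
        "rows_in": len(rows),
        "rows_out": {name: len(out) for name, out in outputs.items()},
        # NULLs are counted on the input, before hashing.
        "null_counts": {col: sum(row[col] is None for row in rows) for col in all_columns},
    }
    return outputs, quality_metrics


example_rows = [
    {"id": 1, "region": "A", "age": 57, "income": 800},
    {"id": 2, "region": "B", "age": 20, "income": None},
]
outputs, quality_metrics = run_pipeline(example_rows, config)
```

Note that an unsalted hash of a small, guessable value such as a sequential ID can be reversed by brute force, so a real pipeline would likely need a salted or keyed hash.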
