Question/topic for discussion:
Can the manual table creation (PDF -> CSV) be simplified? This would be valuable because it would make the code generalisable. For example, it may help with creating workflows for CPRD Gold, not just Aurum (#12).
Tasks:
Look into PDF table extraction tools that could assist this process.
Are these tools flexible, and how accurate are they? If they are error-prone and their output takes a long time to validate, it may be better to leave the process as is.
CPRD's data specifications (example) contain all the information we want for this pipeline, but they are in PDF. If the metadata tables are easy to extract into a machine-readable format, that's great, but it is not worth doing if it turns into a big, messy task. We could instead think about requesting the specifications in a different format.
HDRUK allow you to download a structural metadata CSV for CPRD Aurum (https://healthdatagateway.org/en/dataset/692). However, the file only contains 'column name', not 'field name' (the field name is the one that appears in the data files), and we do not know whether it is kept up to date.
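One way to judge whether an extraction tool is accurate enough is to compare its output against one of the existing hand-made CSVs. The sketch below is a minimal illustration using only the Python standard library. It assumes both tables are available as CSV text and share a key column, here called `field_name`. The function names, the column names and the sample rows are all hypothetical and are not taken from the pipeline or from any CPRD specification.

```python
import csv
import io


def load_rows(csv_text, key):
    """Parse CSV text into a dict of rows keyed by the given column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row[key].strip(): {
            k: (v or "").strip() for k, v in row.items() if k is not None
        }
        for row in reader
    }


def compare_tables(manual_csv, extracted_csv, key="field_name"):
    """Report where a tool-extracted table disagrees with the hand-made one."""
    manual = load_rows(manual_csv, key)
    extracted = load_rows(extracted_csv, key)

    # Rows the tool dropped, rows it invented, and rows whose cells differ.
    missing = sorted(set(manual) - set(extracted))
    extra = sorted(set(extracted) - set(manual))
    mismatched = sorted(
        name
        for name in set(manual) & set(extracted)
        if manual[name] != extracted[name]
    )

    # Share of hand-made rows that the tool reproduced exactly.
    matched = len(manual) - len(missing) - len(mismatched)
    accuracy = matched / len(manual) if manual else 0.0
    return {
        "missing": missing,
        "extra": extra,
        "mismatched": mismatched,
        "accuracy": accuracy,
    }


# Hypothetical sample data: the hand-made table and an imperfect extraction.
MANUAL = "field_name,type\npatid,text\npracid,integer\nyob,integer\n"
EXTRACTED = "field_name,type\npatid,text\npracid,integ er\n"

if __name__ == "__main__":
    print(compare_tables(MANUAL, EXTRACTED))
```

Running this kind of check on one or two specification tables would give a rough idea of how much manual validation a tool's output still needs, which is the deciding factor raised in the tasks above.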
Other information
Ideas doc attached:
PDFtocsv-ideas.docx