Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #35 from andrewsu/main
create DrugCentral_subset
Showing 6 changed files with 102 additions and 0 deletions.
# DrugCentral Creative Mode -- SUBSET

## Goal
A benchmark data set for returning drugs that treat an indication. Indications are drawn from DrugCentral. This benchmark is a very small subset of the full DrugCentral benchmark.

## Data Description
DrugCentral provides ~10k indications for ~3000 drugs. `get_indications.py` retrieves these indications, parses out chemical and disease identifiers, and removes cases where one disease has many known drugs. For this subset, five diseases are sampled at random from the full benchmark's `data_full.tsv`, and every indication for those five diseases is written to `data.tsv`.
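The "many known drugs" filter can be illustrated with a minimal Python sketch. This is not the code in `get_indications.py`; it only shows one plausible shape of the step, assuming indications are (CHEBI ID, UMLS disease ID) pairs. The helper name `drop_crowded_diseases` and its `max_drugs` cutoff are hypothetical, since the README does not say how many known drugs counts as "many".

```python
from collections import defaultdict


def drop_crowded_diseases(indications, *, max_drugs):
    """Keep only (drug, disease) pairs whose disease has at most `max_drugs` distinct drugs.

    `indications` is a list of (chebi_id, disease_umls_id) tuples.
    `max_drugs` is a hypothetical cutoff; the real threshold is not documented.
    """
    # First pass: collect the distinct drugs known for each disease.
    drugs_by_disease = defaultdict(set)
    for drug, disease in indications:
        drugs_by_disease[disease].add(drug)
    # Second pass: drop every pair whose disease exceeds the cutoff.
    return [
        (drug, disease)
        for drug, disease in indications
        if len(drugs_by_disease[disease]) <= max_drugs
    ]
```

Counting distinct drugs (a set) rather than rows means duplicate records for the same drug do not push a disease over the cutoff.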

## Benchmarks
This data set is used to create the following benchmarks:

### Treats
A creative-mode query (edge `knowledge_type` set to `inferred`) looking for small molecules (`biolink:SmallMolecule`) connected to the indication's disease via the `biolink:treats` predicate.

## Data Creation
Created from DrugCentral via mychem.info using `get_indications.py`.

| `_id` | `chebi` | `drug_umls` | `disease_umls` | `drug_name` | `disease_name` |
|---|---|---|---|---|---|
| ZZHLYYDVIOPZBE-UHFFFAOYSA-N | CHEBI:9725 | UMLS:C0041031 | UMLS:C0002994 | alimemazine,alimemazine tartrate,temaril,trimeprazine tartrate,trimeprazine,alimezine,methylpromazine,teralene | Angioedema |
| UREBDLICKHMUKA-DVTGEIKXSA-N | CHEBI:3077 | UMLS:C0005308 | UMLS:C0002994 | betamethasone,betadexamethasone,betametasone,betamethazone,bethamethasone | Angioedema |
| ITRJWOMZKQRYTA-RFZYENFJSA-N | CHEBI:3897 | UMLS:C0056391 | UMLS:C0002994 | cortisone acetate,cortone acetate,irisone acetate,adrenalex | Angioedema |
| UREBDLICKHMUKA-CXSFZGCWSA-N | CHEBI:41879 | UMLS:C0011777 | UMLS:C0002994 | dexamethasone,dexasone,dexmethsone | Angioedema |
| ALEXXDVDDISNDU-JZYPGELDSA-N | CHEBI:17609 | UMLS:C0063077 | UMLS:C0002994 | hydrocortisone acetate,cortisol acetate,hydroxycorticosterone acetate | Angioedema |
| VWQWXZAWFPZJDA-CGVGKPPMSA-N | CHEBI:5782 | UMLS:C0056387 | UMLS:C0002994 | hydrocortisone sodium succinate,hydrocortisone succinate,hydrocortisone hemisuccinate | Angioedema |
| JYGXADMDTFJGBT-VWUMJDOOSA-N | CHEBI:17650 | UMLS:C0020268 | UMLS:C0002994 | hydrocortisone,17-hydroxycorticosterone,acticort,cetacort,hydracort,hydrasson,hydrocortisyl,cortisol | Angioedema |
| CBHCDHNUZWWAPP-UHFFFAOYSA-N | CHEBI:135324 | UMLS:C0065954 | UMLS:C0002994 | mepazine,mepazin,mepazine base,meprazine,mesapin,nothiazine,pecazine,mepazine acetate | Angioedema |
| HTMIBDQKFHUPSX-UHFFFAOYSA-N | CHEBI:6823 | UMLS:C0066101 | UMLS:C0002994 | methdilazine,methdilazine hydrochloride,methdilazine HCl | Angioedema |
| VHRSUDSXCMQTMA-PJHHCJLFSA-N | CHEBI:6888 | UMLS:C0721647 | UMLS:C0002994 | methylprednisolone,medralone,metilbetasone | Angioedema |
| YNDXUCZADRHECN-JNQJZLCISA-N | CHEBI:71418 | UMLS:C2608734 | UMLS:C0002994 | nasacort,triamcinolone acetonide,aristicort,allernaze | Angioedema |
| OIGNJSKKLXVSLS-VWUMJDOOSA-N | CHEBI:8378 | UMLS:C0032950 | UMLS:C0002994 | prednisolone,prenolone,deltahydrocortisone,hydroretrocortin,hydroretrocortine,metacortandralone | Angioedema |
| XOFYZVNMUHMLCC-ZPOLXVRWSA-N | CHEBI:8382 | UMLS:C0032952 | UMLS:C0002994 | prednisone anhydrous,prednisone,1,2-dehydrocortisone,dehydrocortisone | Angioedema |
| ZGUGWUXLJSTTMA-UHFFFAOYSA-N | CHEBI:8459 | UMLS:C0033399 | UMLS:C0002994 | promazine,prazin,prazine,romtiazin,promazine hydrochloride,promazine HCl | Angioedema |
| XGMPVBXKDAHORN-RBWIMXSLSA-N | CHEBI:9669 | UMLS:C0137442 | UMLS:C0002994 | triamcinolone diacetate,aristocort diacetate,polcortolon | Angioedema |
| GFNANZIMVAIWHM-OBYCQNJPSA-N | CHEBI:9667 | UMLS:C0040864 | UMLS:C0002994 | triamcinolone,triamcinlon,triamcinolon,rodinolone | Angioedema |
| KDLRVYVGXIQJDK-AWPVFWJPSA-N | CHEBI:3745 | UMLS:C0008947 | UMLS:C0014013 | clindamycin hydrochloride hydrate,clindamycin,7-Chloro-7-deoxylincomycin,7-Chlorolincomycin,7-Deoxy-7(S)-chlorolincomycin,chlolincocin,clincin,clinimycin,Dalacin C,dalacine,clindamycin hydrochloride,clindamycin HCl | Empyema of pleura |
| UFUVLHLTWXBHGZ-MGZQPHGTSA-N | CHEBI:3746 | UMLS:C1119917 | UMLS:C0014013 | clindamycin phosphate,cleocin phosphate,clindagel,clindamycin-2-phosphate | Empyema of pleura |
| OHKOGUYZJXTSFX-KZFFXBSXSA-N | CHEBI:9587 | UMLS:C0040193 | UMLS:C0014013 | ticarcillin,ticarcillin disodium,ticarcillin sodium | Empyema of pleura |
| JCQLYHFGKNRPGE-RUKGUBFJSA-N | CHEBI:6359 | UMLS:C0719221 | UMLS:C0019151 | lactulose,bifiteral,cephulac,D-Lactulose,isolactose,lactulose hydrate | Hepatic encephalopathy |
| 057Y626693 | CHEBI:7507 | UMLS:C0027607 | UMLS:C0019151 | neomycin sulfate,neomycin,bykomycin,endomixin,fradiomycin sulfate,mycerin sulfate,neomycin sulphate | Hepatic encephalopathy |
| C0027607 | CHEBI:7507 | UMLS:C0027607 | UMLS:C0019151 | neomycin sulfate,neomycin,bykomycin,endomixin,fradiomycin sulfate,mycerin sulfate,neomycin sulphate | Hepatic encephalopathy |
| I16QD7X297 | CHEBI:7507 | UMLS:C0027607 | UMLS:C0019151 | neomycin sulfate,neomycin,bykomycin,endomixin,fradiomycin sulfate,mycerin sulfate,neomycin sulphate | Hepatic encephalopathy |
| NZCRJKRKKOLAOJ-XRCRFVBUSA-N | CHEBI:75246 | UMLS:C0073374 | UMLS:C0019151 | rifaximine,rifaximin,xifaxan,refaximin | Hepatic encephalopathy |
| BJOLKYGKSZKIGU-UHFFFAOYSA-N | CHEBI:4753 | UMLS:C0301366 | UMLS:C0271093 | echothiophate,ecothiopate,ecothiophate,phospholine,echothiophate iodide | Stargardt's disease |
| ZKHQWZAMYRWXGA-KQYNXXCUSA-N | CHEBI:15422 | UMLS:C0001480 | UMLS:C0428974 | adenosine triphosphate,ATP,triphosphoric acid adenosine ester,adenosine triphosphate disodium hydrate | Supraventricular arrhythmia |
| CJDRUOGAGYHKKD-RQBLFBSQSA-N | CHEBI:28462 | UMLS:C0001888 | UMLS:C0428974 | ajmaline hydrochloride,ajmaline HCl,aritmina,rauverid,ajmaline,(+)-Ajmaline,ajmalin,cardiorythmine,gilurytmal,raugalline | Supraventricular arrhythmia |
| NZLBHDRPUJLHCE-UHFFFAOYSA-N | CHEBI:135370 | UMLS:C1448442 | UMLS:C0428974 | aprindine,aprindin,aprinidine,aprindine hydrochloride,aprindine HCl,amidonal,fiboran | Supraventricular arrhythmia |
| IXLGLCQSNUMEGQ-PYJPINIGSA-N | CHEBI:135740 | UMLS:C0075765 | UMLS:C0428974 | detajmium,detajmium bitartrate,tachmalcor,detajmium bitartrate hydrate | Supraventricular arrhythmia |
| PQXGNJKJMFUPPM-UHFFFAOYSA-N | CHEBI:134732 | UMLS:C0059688 | UMLS:C0428974 | ethacizine,ethacizine hydrochloride,etacizin,ethacizin,ethacizine HCl | Supraventricular arrhythmia |
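
For scoring the Treats benchmark, the data set is most useful as a map from each disease to its expected drugs. A minimal Python sketch, assuming `data.tsv` is tab-separated with no quoting (as the `awk -F"\t"` and `join -t $'\t'` calls in the subset script imply); `expected_drugs_by_disease` is a hypothetical helper, not part of this repository. Collecting CHEBI IDs into a set also collapses the three neomycin rows (CHEBI:7507) that differ only in `_id`.

```python
import csv
from collections import defaultdict


def expected_drugs_by_disease(lines):
    """Map each disease_umls CURIE to the set of CHEBI IDs listed as treating it.

    `lines` is any iterable of text lines from data.tsv, e.g. an open file handle.
    """
    answers = defaultdict(set)
    # QUOTE_NONE: split on tabs only and keep any quote characters literally.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        answers[row["disease_umls"]].add(row["chebi"])
    return dict(answers)
```

Pass it a file handle opened with `newline=""` (as the `csv` module recommends); the result has one entry per sampled disease.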

```bash
#!/bin/bash
set -euo pipefail

# Use one collation order for both sort and join; join requires both inputs
# to be sorted on the join field with the same ordering.
export LC_ALL=C

FULL=../DrugCentral_creative/data_full.tsv

# get header
head -1 "$FULL" > data.tsv

# sample five random UMLS disease IDs (NR > 1 skips the header row, so the
# column name "disease_umls" can never be picked as a "disease")
awk -F'\t' 'NR > 1 {print $4}' "$FULL" | sort -u | shuf -n 5 | sort > selected_diseases.txt

# retrieve all records for those selected diseases
# (-k4,4 sorts on column 4 only, which is exactly the join key)
tail -n +2 "$FULL" | sort -t $'\t' -k4,4 > data_full_sorted.tsv
join -t $'\t' -1 1 -2 4 -o 2.1,2.2,2.3,2.4,2.5,2.6 selected_diseases.txt data_full_sorted.tsv >> data.tsv

# cleanup
rm data_full_sorted.tsv
```
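
The subset script draws its five diseases with an unseeded `shuf`, so every rerun produces a different subset. When a reproducible draw is wanted, the same sample-then-join logic can be sketched in Python. This is a minimal illustration, not part of the commit: `sample_subset` and its `seed` parameter are hypothetical, and `rows` is assumed to be the full TSV already split into fields, with the header as its first row.

```python
import random


def sample_subset(rows, n_diseases=5, seed=None):
    """Pick n_diseases distinct disease_umls IDs and keep every record for them.

    `rows` is a list of field lists whose first element is the header row;
    column index 3 holds disease_umls (the `$4` used by the awk call).
    Returns (header, selected_ids, subset_rows).
    """
    header, records = rows[0], rows[1:]
    # Unique disease IDs with the header excluded, in a fixed order so that
    # a given seed always yields the same draw.
    diseases = sorted({record[3] for record in records})
    rng = random.Random(seed)
    selected = sorted(rng.sample(diseases, n_diseases))
    chosen = set(selected)
    # Keep every record for a chosen disease, grouped by disease ID;
    # sorted() is stable, so input order is preserved within each disease.
    subset = sorted((r for r in records if r[3] in chosen), key=lambda r: r[3])
    return header, selected, subset
```

To reproduce the file layout, read the full data with `list(csv.reader(handle, delimiter="\t"))` and write `[header] + subset` back with `csv.writer(handle, delimiter="\t", lineterminator="\n")`.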

```
UMLS:C0002994
UMLS:C0014013
UMLS:C0019151
UMLS:C0271093
UMLS:C0428974
```

```json
{
  "message": {
    "query_graph": {
      "nodes": {
        "disease_umls": {
          "ids": [],
          "categories": [
            "biolink:DiseaseOrPhenotypicFeature"
          ]
        },
        "chebi": {
          "categories": [
            "biolink:SmallMolecule"
          ]
        }
      },
      "edges": {
        "e01": {
          "object": "disease_umls",
          "subject": "chebi",
          "predicates": [
            "biolink:treats"
          ],
          "knowledge_type": "inferred"
        }
      }
    }
  }
}
```
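
The query template leaves `disease_umls.ids` empty, so it has to be filled with a disease CURIE before it can be sent. A minimal Python sketch, assuming one query per disease ID taken from `selected_diseases.txt` (how the benchmark runner actually fills the template is not shown in this commit); `build_query` is a hypothetical helper. Deep-copying leaves the loaded template untouched, so it can be reused for every disease.

```python
import copy


def build_query(template, disease_id):
    """Return a copy of the TRAPI query template with disease_umls pinned to one CURIE.

    `template` is the parsed JSON (a dict); it is not modified.
    """
    query = copy.deepcopy(template)
    query["message"]["query_graph"]["nodes"]["disease_umls"]["ids"] = [disease_id]
    return query
```

After loading the template with `json.load`, `[build_query(template, d) for d in selected_ids]` gives one query per sampled disease.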