DanaFarber code test repo

1 Prepare Python environment

Python version: 3.8.0

pip install -r requirements.txt

2 Running

Task1.1 - Recursively find all FASTQ files in a directory and report each file name and the percent of sequences in that file that are greater than 30 nucleotides long.

python task1.py find_fastqs <fastq_directory>
# e.g.: python task1.py find_fastqs data/sample_files/fastq
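The approach for this task can be sketched as follows. This is a minimal illustration, not the code in task1.py: the function names, the set of accepted file extensions, and the assumption that every FASTQ record spans exactly four lines are all choices made here for the example.

```python
import gzip
from pathlib import Path

# Assumed set of FASTQ extensions; adjust to match the real data.
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def percent_longer_than(fastq_path, min_len=30):
    """Return the percent of reads in one FASTQ file longer than min_len."""
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    total = longer = 0
    with opener(fastq_path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of each 4-line record is the sequence
                total += 1
                if len(line.strip()) > min_len:
                    longer += 1
    return 100.0 * longer / total if total else 0.0


def find_fastqs(root):
    """Yield (path, percent) for every FASTQ file found under root."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.name.endswith(FASTQ_SUFFIXES):
            yield path, percent_longer_than(path)
```

Iterating over `find_fastqs("data/sample_files/fastq")` and printing each pair would give one line per file. Reading line by line keeps memory use flat even for large files.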

Task1.2 - Given a FASTA file with DNA sequences, find the 10 most frequent sequences and return each sequence with its count in the file.

python task1.py parse_fasta <fasta_file>
# e.g.: python task1.py parse_fasta data/sample_files/fasta/sample.fasta
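One way to count sequences is with `collections.Counter` from the standard library. The sketch below is illustrative and may differ from task1.py; `read_fasta_sequences` and `top_sequences` are hypothetical names, and sequences wrapped over several lines are assumed to belong to the preceding `>` header.

```python
from collections import Counter


def read_fasta_sequences(path):
    """Yield each sequence in a FASTA file, joining wrapped lines."""
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if chunks:  # emit the record that just ended
                    yield "".join(chunks)
                chunks = []
            elif line:
                chunks.append(line)
    if chunks:  # emit the final record
        yield "".join(chunks)


def top_sequences(path, n=10):
    """Return the n most frequent sequences as (sequence, count) pairs."""
    return Counter(read_fasta_sequences(path)).most_common(n)
```

`top_sequences("data/sample_files/fasta/sample.fasta")` would return a list of up to ten `(sequence, count)` tuples, most frequent first.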

Task1.3 - Given a chromosome and coordinates, write a program for looking up its annotation. Keep in mind you'll be doing this annotation millions of times. The output is an annotated file giving the gene name that each input position overlaps.

python task1.py annotation <chromosome_coordinate_file> <annotation_file> <output_file>
# e.g.: python task1.py annotation data/sample_files/annotate/coordinates_to_annotate.txt data/sample_files/gtf/hg19_annotations.gtf output
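Because the lookup runs millions of times, it helps to index the annotation once and answer each query with a binary search. The sketch below shows one such index; it is not necessarily the method used in task1.py. It assumes standard nine-column GTF lines with a `gene_name` attribute and 1-based inclusive coordinates, and `parse_gtf_line` and `IntervalIndex` are hypothetical names.

```python
import bisect
import re
from collections import defaultdict

GENE_NAME = re.compile(r'gene_name "([^"]+)"')


def parse_gtf_line(line):
    """Return (chrom, start, end, gene_name) from one GTF line, or None."""
    if line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return None
    match = GENE_NAME.search(fields[8])
    if match is None:
        return None
    return fields[0], int(fields[3]), int(fields[4]), match.group(1)


class IntervalIndex:
    """Per-chromosome intervals sorted by start, with a running maximum end."""

    def __init__(self, records):
        by_chrom = defaultdict(list)
        for chrom, start, end, gene in records:
            by_chrom[chrom].append((start, end, gene))
        self.index = {}
        for chrom, intervals in by_chrom.items():
            intervals.sort()
            starts = [iv[0] for iv in intervals]
            max_end, running = [], 0
            for _, end, _ in intervals:
                running = max(running, end)
                max_end.append(running)
            self.index[chrom] = (starts, max_end, intervals)

    def lookup(self, chrom, pos):
        """Return the sorted gene names whose interval contains pos."""
        if chrom not in self.index:
            return []
        starts, max_end, intervals = self.index[chrom]
        # Last interval whose start is <= pos.
        i = bisect.bisect_right(starts, pos) - 1
        hits = set()
        # Walk left only while some earlier interval can still reach pos.
        while i >= 0 and max_end[i] >= pos:
            if intervals[i][1] >= pos:
                hits.add(intervals[i][2])
            i -= 1
        return sorted(hits)
```

Building the index costs one sort per chromosome. Each query then needs a binary search plus a short backward walk, which stops as soon as the running maximum end falls below the query position. An interval tree is a common alternative when genes are deeply nested.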

Task2 - Report the mean target coverage for the intervals grouped by GC% bins.

python task2.py parse_coverage <coverage_file>
# e.g.: python task2.py parse_coverage data/Example.hs_intervals.txt
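The grouping step might look like the sketch below, which uses only the standard library. It rests on several assumptions that should be checked against the real file: the input is tab-separated with a header row, the GC column is named `%gc` and holds a fraction between 0 and 1, the coverage column is named `mean_coverage`, and bins are 10 percentage points wide. The function name is hypothetical and task2.py may work differently.

```python
import csv
from collections import defaultdict


def mean_coverage_by_gc_bin(path, gc_col="%gc", cov_col="mean_coverage",
                            bin_width=10):
    """Return {bin label: mean coverage} for intervals grouped by GC% bin."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            gc_percent = float(row[gc_col]) * 100  # assumes a 0-1 fraction
            # Clamp so that exactly 100% GC falls in the top bin.
            bin_start = min(int(gc_percent // bin_width) * bin_width,
                            100 - bin_width)
            sums[bin_start] += float(row[cov_col])
            counts[bin_start] += 1
    return {f"{b}-{b + bin_width}": sums[b] / counts[b] for b in sorted(sums)}
```

This averages the per-interval means without weighting by interval length. A length-weighted mean is a reasonable alternative if intervals vary widely in size.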

Task3.1 - Given a list of variant IDs, use the Ensembl API to retrieve information about alleles, locations, effects of variants in transcripts, and genes containing the transcripts.

python task3.py query_rsids <rs_id1,rs_id2>
# e.g.: python task3.py query_rsids rs56116432 or python task3.py query_rsids rs56116432,rs2332914
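A query of this kind could be sketched with the Ensembl REST service at `https://rest.ensembl.org`, using its `variation/:species/:id` and `vep/:species/id/:id` endpoints. The code below is an illustration and not the contents of task3.py. The helper names are hypothetical, and the response fields read in `summarize` (`mappings`, `allele_string`, `location`, `transcript_consequences`, and so on) are assumed from the service's typical JSON output and should be checked against a live response.

```python
import json
import urllib.request

SERVER = "https://rest.ensembl.org"


def build_urls(rs_id, species="human"):
    """Return the variation and VEP endpoint URLs for one variant ID."""
    return {
        "variation": f"{SERVER}/variation/{species}/{rs_id}",
        "vep": f"{SERVER}/vep/{species}/id/{rs_id}",
    }


def fetch_json(url, timeout=30):
    """GET a URL and decode the JSON body."""
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def summarize(variation, vep):
    """Collect locations, alleles, and per-transcript effects (assumed fields)."""
    mappings = [
        (m.get("location"), m.get("allele_string"))
        for m in variation.get("mappings", [])
    ]
    effects = []
    for entry in vep:
        for tc in entry.get("transcript_consequences", []):
            effects.append((
                tc.get("transcript_id"),
                tc.get("gene_id"),
                tc.get("gene_symbol"),
                tc.get("consequence_terms", []),
            ))
    return {"mappings": mappings, "effects": effects}


def query_rsids(rs_ids):
    """Query each ID in a comma-separated string such as 'rs1,rs2'."""
    results = {}
    for rs_id in rs_ids.split(","):
        urls = build_urls(rs_id.strip())
        results[rs_id.strip()] = summarize(
            fetch_json(urls["variation"]), fetch_json(urls["vep"])
        )
    return results
```

Calling `query_rsids("rs56116432,rs2332914")` would make two requests per ID. The public service applies rate limits, so a larger batch would need throttling or the POST variants of these endpoints.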

Task3.2 - Create a repository on GitHub and upload your code there. Make some minor changes to your code locally, and use a local Git installation to commit the changes to your GitHub repository.

This repository (spsc83/DanaFarber) is the result.

Cloud computing and SQL

My answer is in cloud_computing_and_SQL.md.
