
Commit

Merge remote-tracking branch 'origin/main' into lvreynoso/fix_long_read_byteranges
lvreynoso committed Mar 7, 2024
2 parents 31ad4ad + a459b4c commit 09459f6
Showing 9 changed files with 269 additions and 0 deletions.
10 changes: 10 additions & 0 deletions workflows/bulk-download/Dockerfile
@@ -0,0 +1,10 @@
# syntax=docker/dockerfile:1.4
FROM ubuntu:20.04

LABEL maintainer="CZ ID Team <idseq-tech@chanzuckerberg.com>"

RUN apt-get update && apt-get -y install python3 python3-pip zip
RUN ln -s /usr/bin/python3 /usr/bin/python

COPY requirements.txt requirements.txt
RUN pip3 install -r requirements.txt
4 changes: 4 additions & 0 deletions workflows/bulk-download/README.md
@@ -0,0 +1,4 @@
# CZ ID Bulk Downloads Pipeline

# Changelog

52 changes: 52 additions & 0 deletions workflows/bulk-download/manifest.yml
@@ -0,0 +1,52 @@
workflow_name: bulk-download
specification_language: WDL
description: Generate bulk downloads for the CZID web application
entity_inputs:
  files:
    name: Files
    description: Files to zip or concatenate together
    entity_type: file
    multivalue: True
  samples:
    name: Samples
    description: Optionally associate this bulk download with some samples
    entity_type: sample
    multivalue: True
    required: False
raw_inputs:
  bulk_download_type:
    name: Bulk Download Type
    description: Concatenate or zip files to create bulk download
    type: str
    values:
      - concatenate
      - zip
  download_display_name:
    name: Download Display Name
    description: User facing name for the download
    type: str
input_loaders:
  - name: files
    version: ">=0.0.1"
    inputs:
      files: ~
    outputs:
      files: ~
  - name: passthrough
    version: ">=0.0.1"
    inputs:
      bulk_download_type: ~
    outputs:
      bulk_download_type: action
  - name: czid_docker
    version: ">=0.0.1"
    outputs:
      docker_image_id: ~
output_loaders:
  - name: bulk_download
    version: ">=0.0.1"
    inputs:
      bulk_download_type: ~
      download_display_name: ~
    workflow_outputs:
      file: "bulk_download.file"
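
Reviewer note: the manifest restricts bulk_download_type to two values and, through the passthrough loader, hands it to the workflow under the name action. The sketch below illustrates that contract in Python. It is only an illustration under stated assumptions: build_wdl_inputs, ALLOWED_BULK_DOWNLOAD_TYPES, and PASSTHROUGH_MAPPING are hypothetical names, and this is not the platform's loader code.

```python
# Hypothetical illustration of the manifest contract; not the CZ ID loader code.

# Mirrors raw_inputs.bulk_download_type.values in manifest.yml.
ALLOWED_BULK_DOWNLOAD_TYPES = ("concatenate", "zip")

# Mirrors the passthrough loader: manifest input name -> WDL input name.
PASSTHROUGH_MAPPING = {"bulk_download_type": "action"}


def build_wdl_inputs(raw_inputs: dict) -> dict:
    """Validate the raw inputs, then rename them to the workflow's input names."""
    download_type = raw_inputs.get("bulk_download_type")
    if download_type not in ALLOWED_BULK_DOWNLOAD_TYPES:
        raise ValueError(
            f"bulk_download_type must be one of {ALLOWED_BULK_DOWNLOAD_TYPES}, "
            f"got {download_type!r}"
        )
    # Only inputs named in the mapping are forwarded to the workflow.
    return {
        wdl_name: raw_inputs[manifest_name]
        for manifest_name, wdl_name in PASSTHROUGH_MAPPING.items()
    }
```

In this sketch download_display_name is not forwarded, which matches the manifest: only the output loader lists it as an input.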
3 changes: 3 additions & 0 deletions workflows/bulk-download/requirements.txt
@@ -0,0 +1,3 @@
boto3
awscli
stream-unzip
65 changes: 65 additions & 0 deletions workflows/bulk-download/run.wdl
@@ -0,0 +1,65 @@
version 1.1

workflow bulk_download {
  input {
    String action
    Array[File] files
    String docker_image_id = "czid-bulk-download"
  }

  if (action == "concatenate") {
    call concatenate {
      input:
        files = files,
        docker_image_id = docker_image_id
    }
  }

  if (action == "zip") {
    call zip {
      input:
        files = files,
        docker_image_id = docker_image_id
    }
  }

  output {
    File? file = select_first([ concatenate.file, zip.file ])
  }
}

task concatenate {
  input {
    String docker_image_id
    Array[File] files
  }
  command <<<
    set -euxo pipefail
    cat ~{sep=" " files} > concatenated.txt
  >>>
  output {
    File file = "concatenated.txt"
  }
  runtime {
    docker: docker_image_id
  }
}

task zip {
  input {
    String docker_image_id
    Array[File] files
  }
  command <<<
    set -euxo pipefail

    # Don't store full path of original files in the .zip file
    zip --junk-paths result.zip ~{sep=" " files}
  >>>
  output {
    File file = "result.zip"
  }
  runtime {
    docker: docker_image_id
  }
}
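
Reviewer note: for readers less familiar with WDL, the workflow above runs exactly one of two tasks depending on action and returns whichever output is defined. A rough Python equivalent is sketched below, assuming local files and the standard library only. The function names concatenate, zip_files, and bulk_download are illustrative and not part of the commit, and the ValueError stands in for the failure that select_first is expected to raise when neither branch ran.

```python
import os
import shutil
import zipfile


def concatenate(files, out_path="concatenated.txt"):
    """Append each input to out_path in order, like `cat a b > concatenated.txt`."""
    with open(out_path, "wb") as out:
        for path in files:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return out_path


def zip_files(files, out_path="result.zip"):
    """Store each file under its basename only, like `zip --junk-paths`."""
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            archive.write(path, arcname=os.path.basename(path))
    return out_path


def bulk_download(action, files):
    """Dispatch on action, as the workflow's two `if` blocks do."""
    if action == "concatenate":
        return concatenate(files)
    if action == "zip":
        return zip_files(files)
    # Assumption: mirrors the workflow failing when no branch produced a file.
    raise ValueError(f"unsupported action: {action!r}")
```

Passing arcname=os.path.basename(path) is what drops the directory components, which is the behaviour the comment in the zip task describes.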
44 changes: 44 additions & 0 deletions workflows/bulk-download/test/host_filter_1.fastq
@@ -0,0 +1,44 @@
@M05295:617:000000000-KL64F:1:1101:3078:7376
TTTTGCCGTAACGGCTTTTTACCACAGCCAGCTTGCGGCGCAACACCTCCGCCAGAAAGTTGCCGTTGCCGCAGGCGGGTTCCAGAAAACGGCTCTCGATGCGCTCCGTCTCGCTCTTTACAAGGTCGCACATCGCCTTTACCTCC
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGDGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGDGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG8DF@FDEGGGGGGGGGGGGGGGCDGGFFFFEFFFFFF>/
@M05295:617:000000000-KL64F:1:1101:3125:11405
TCTTTGGTATACTGCAGTGCTTATATGCGGTTTGCTGATTTTTTCGGCGGCAGCTTGTGCAGGAACGATTCTTTCCTGCAATAACCGGCTGAAAAGAAAAAGGAAAAAGATACGCAAGGCGGCACTCTTGTCAACTATGTGCATTA
+
CCCCCGGGGGGGGFFFDFGGGAFGCFGGEGGGG>AECFGEFFGGCEFGGEGGGGGGGGGGGGGGFGGGGFFGGGGGGFGGGGGGFG:FGGGFGGGG7DFCGGGGGDFGEGGGGGGCGGGFGEGGGGGFGCGGGGFF5D9CD7<DF+
@M05295:617:000000000-KL64F:1:1101:2016:13202
CCACCAAATAACACTCAAGGACTTCAAATGTCGGAGAGTGTGAGATGTTCTTTGAAAATTGAATAACGAAACAACAAAGAGGAAATTAAAGATATCCAATTAAAGAAATTTAATGGGTAAAATACAATTTCAAACAATTCTTCTGT
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFFGGGGGGFGGGGGGGGGGGGGGFFFFFFFFFFFF
@M05295:617:000000000-KL64F:1:1101:2666:12975
CCTCTTTTTTTTGCAGAAGAGTACACAACTGCTTTATTTTATGCTAAAAGACCCCTGCCTACGCAAAGGCAGAGGTCCGATTTTTTCATAGTCTGGGGAGATAAAACAACTTTCCGATTTCACAGAATGCGCACGGCCTTCCAGAT
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFFFFFFFFFBBF
@M05295:617:000000000-KL64F:1:2112:3938:13885
CCTCTTTTTTTTGCAGAAGAGTACACAACTGCTTTATTTTATGCTAAAAGACCCCTGCCTACGCAAAGGCAGAGGTCCGATTTTTTCATAGTCTGGGGAGATAAAACAACTTTCCGATTTCACAGAATGCGCACGGCCTTCCAGAT
+
CCCCCGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGDEGGGGEGGCFGGGGGGEGGCGGGGGGFGG8FGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
@M05295:617:000000000-KL64F:1:2108:9015:16045
CCTCTTTTTTTTGCAGAAGAGTACACAACTGCTTTATTTTATGCTAAAAGACCCCTGCCTACGCAAAGGCAGAGGTCCGATTTTTTCATAGTCTGGGGAGATAAAACAACTTTCCGATTTCACAGAATGCGCACGGCCTTCCAGAT
+
CCCCCGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGDGGGGCGFGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFG
@M05295:617:000000000-KL64F:1:2106:11795:12187
CCTCTTTTTTTTGCAGAAGAGTACACAACTGCTTTATTTTATGCTAAAAGACCCCTGCCTACGCAAAGGCAGAGGTCCGATTTTTTCATAGTCTGGGGAGATAAAACAACTTTCCGATTTCACAGAATGCGCACGGCCTTCCAGAT
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGFGGGGGGGGGCGGGGGGGDGEFCFFDFFCGGGGGGGFGGGGGGGGGGFEF8FDCEGGGGGGGGGD>FFGGGGGGGGGGGFGGGGDGGGGGGGGGGGGGGG
@M05295:617:000000000-KL64F:1:2117:7228:7910
CCTCTTTTTTTTGCAGAAGAGTACACAACTGCTTTATTTTATGCTAAAAGACCCCTGCCTACGCAAAGGCAGAGGTCCGATTTTTTCATAGTCTGGGGAGATAAAACAACTTTCCGATTTCACAGAATGCGCACGGCCTTCCAGAT
+
CCCCCGGGGGGGGGGGGGGGGFFGGGGGGFFGGGGGGGGGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCDGGGGGGGGGGGDGGGGFGGAFGG7FGDGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGF
@M05295:617:000000000-KL64F:1:1119:25439:5751
CCTCTTTTTTTTGCAGAAGAGTACACAACTGCTTTATTTTATGCTAAAAGACCCCTGCCTACGCAAAGGCAGAGGTCCGATTTTTTCATAGTCTGGGGAGATAAAACAACTTTCCGATTTCACAGAATGCGCACGGCCTTCCAGAT
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGDGFGGGGGGGGGGGGGGGGGGGGGGGGEGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGF
@M05295:617:000000000-KL64F:1:2101:21157:16235
CCTCTTTTTTTTGCAGAAGAGTACACAACTGCTTTATTTTATGCTAAAAGACCCCTGCCTACGCAAAGGCAGAGGTCCGATTTTTTCATAGTCTGGGGAGATAAAACAACTTTCCGATTTCACAGAATGCGCACGGCCTTCCAGAT
+
CCCCCGGGGGGDGGGGFGGGGGGGGF@FGGGGG<FGGGGGGGGGFGGGGGGGFGEGDGGGGGGGGGCF@F<FFGGFGGGGG=@FGEGGGGFFA9E88CCEGGFAGFGGGGGFGG8CCCEEGGGGG??D??CGGD69DFGFF6DFGF
@M05295:617:000000000-KL64F:1:1101:1908:15400
CGCTCACATGAACGGAATAATACTCTCCCAAATATTCACTTCCCGCCCCATCTTTGTATACTTCCTCTGTTTCAAGATCATATGTTTGATATAAATACGCCTGTCCATCATTCTTCAGATCTACATAAGCAGCCCAGTCATCAATC
+
<8-AC@FCGFGGGGGCFFEGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGFGFGFGGCGFGGGFBFFFFBFBDDF9
44 changes: 44 additions & 0 deletions workflows/bulk-download/test/host_filter_2.fastq
@@ -0,0 +1,44 @@
@M05295:617:000000000-KL64F:1:1101:3078:7376
CTTTGCCCTCAGATTTGCTTTTGTACCAATTATAGCATATTTCCCGGTTAAATCCACAGATTTTTAGCTATTCGTTTCATCTCTTGAGCCGCTTGTCAAAAGGTACACTTTTTGGCAAGCCCTTCAAAGAGGTGGAACGAATGGCA
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGGGGGGGGGGGGGGGGGGFGFFFFDFFFFF5
@M05295:617:000000000-KL64F:1:1101:3125:11405
TCTTTGGTAAGTCCGAACAGATTTTATTCTACTCCTCGGGTGTTCTGAGCGATTGTTTGTGTTGAAAAGTCATTCAGGTCATAGTACCGCATTGCTTCTGTTCCGTCTTTGGCGATATACTCGACTAAAATGTAATGCTCGGTTAT
+
CCCCCFGFFFGF<FF@CFG8<<F<FGGFGGGGGGGGGGCFEGG7FGF9FC7@FGFAFGGGGGDCFEFGGGGGGGGFGGEGGGFGGGGGGGGGGGFGEGGC@EFGGFFGCGGFFGDEGGGGDFDGGGDCG?FFGBFFFFFBD>@ABD
@M05295:617:000000000-KL64F:1:1101:2016:13202
TTCAGTTCGGGCGGTTCCCCTCATATACCTATTTATTCAGTATATGATACATGGACTTGACTCCATGTGGATTGCTCCATTCGGACATCTACGGATCATATCGTGCTTGCCAATCCCCGTAGCTTTTCGCAGCTTACCACGTCCTT
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGFFGGGGDGDFFGGGGGGGFGFD?FFFFFFFF
@M05295:617:000000000-KL64F:1:1101:2666:12975
CCTTTTTTTTTTTTAACTGGAATCGACATTGATTTTTATATTCCGTCGGCAAGGCAGGTCGTTCAGGTGGCGTATTCCATTCAGGGGGATGCCTATGAGCGCGAAGTCGGAAATCTGAAAAAATTTGCAGCCACCACGACAGAAAC
+
CCCCCGGGGGGGGGGG9FAFGFGFGFGGGGGGGGGGGGGGGGGGGGGGDEGGGFGGDCGF:FGGGGGGGGFC=CFGGGGGFGGGGGGGGGGGGGFGGGGEEGDCC>EFGDEEGGGGGGGGFFDGGGGCDGG7DFGFFF5**@FFFA
@M05295:617:000000000-KL64F:1:2112:3938:13885
CCTTTTTTTTTTTTAACTGGAATCGACATTGATTTTTATATTCCGTCGGCAAGGCAGGTCGTTCAGGTGGCGTATTCCATTCAGGGGGATGCCTCTGAGCGCGAAGTCGGAAATCTGAAAAAATTTGCAGCCACCACGACAGAAAC
+
CCCCCGGGGGGGGGCCFF<FGFGGGFCFGGGGGGGGGGGEFGEFGGGGEG7FFDGFEGGG:CFEFFFGGGEGG+CFFGCGGGGGGGGD+?FFCF,AAF,,@B@F7C@D@CCEGGGFFGFGGDGGGGFFGGCFCFG6C>+;*7*AF5
@M05295:617:000000000-KL64F:1:2108:9015:16045
CCTTTTTTTTTTTTAACTGGAATCGACATTGATTTTTATATTCCGTCGGCAAGGCAGGTCGTTCAGGTGGCGTATTCCATTCAGGGGGATGCCTATGAGCGCGAAGTCGGAAATCTGAAAAAATTTGCAGCCACCACGACAGAAAC
+
CCCCCGGGGGGGGGGFEFG8F8AFGG:FFGFGGGGGGGGGGGGG,9BF:E:FD,@@::F8CCEFEG9FGC,7@CFDC9,CF<AFGFG:+84AAE,@9,E9CBCEE6+7BCEE6=6=F,EFGF>CDB:,CFF9CFFF677C57;>7*
@M05295:617:000000000-KL64F:1:2106:11795:12187
CCTTTTTTTTTTTTAACTGGAATCGACATTGATTTTTATATTCCGTCGGCAAGGCAGGTCGTTCAGGTGGCGTATTCCATTCAGGGGGATGCCTATGAGCGCGAAGTCGGAAATCTGAAAAAATTTGCAGCCACCACGACAGAAAC
+
CCCCCGGGGGGGGGGFGGG8EEFEFGGGGGG,FFGGGGGGGGGGFGG@CEFCDFGG77F@CFFGFGCFD@EFFDFEAEFGFGFGGCCC+@FFCFEFFFFFEGG>C6+@EEEC>DCFFGDGGDEGGGFGGGGFFD6CD66?7BFF5;
@M05295:617:000000000-KL64F:1:2117:7228:7910
CCTTTTTTTTTTTTAACTGGAATCGACATTGATTTTTATATTCCGTCGGCAAGGCAGGTCGTTCAGGTGGCGTATTCCATTCAGGGGGATGCCTATGAGCGCGAAGTCGGAAATCTGAAAAAATTTGCAGCCACCACGACAGAAAC
+
CCCCCGGGGGGGGGFE,C9,FFFGFCFGGGGGGFGGGFFCF<FFG8ECFGCFGGGGG7FFGGEF8FG<ECFGGGGGFGGF@9FEEGG+=F@8FF9F9AEEGGEGGGG<BFEEEFGFFGGGGGGGGGAFFFCFGGCCFCE5CGFF?8
@M05295:617:000000000-KL64F:1:1119:25439:5751
CCTTTTTTTTTTTTAACTGGAATCGACATTGATTTTTATATTCCGTCGGCAAGGCAGGTCGTTCAGGTGGCGTATTCCATTCAGGGGGATGCCTATGAGCGCGAAGTCGGAAATCTGAAAAAATTTGCAGCCACCACGACAGAAAC
+
CCCCCGGGGGGGGGGF9@FGGGFGGFEGGGGCFEGGGGFGGGGGGGGGGGDDGGGGEDGF=FE@FGGGGFEFGGGGGGGGGGCFGGECC@FGEGFDCDFDGGGGCE@FFCFGGGFCFGGGGGFGGFGGGFGCFGGGF>>CGGGGGD
@M05295:617:000000000-KL64F:1:2101:21157:16235
CCTTTTTTTTTTTTAACTGGAATCGACATTGATTTTTATATTCCGTCGGCAAGGCAGGTCGTTCAGGTGGCGTATTCCATTCAGGGGGATGCCTTTGAGCGCGAAGTCGGAAATCTGAAAAAATTTGCAGCCACCACGACAAAAAC
+
CCCCCGGGGGGGGG<<6@<C66CC8EDC@FC<FFF@FFF9FGGGGGFEGGDEEFC=CCF8FGGGGA?EE?8@F=FF9DE,FFBFDFEEECE8BF,E=,DFEDCEC>F8@CEGCDF:,,@+6=@F8FCGG6;+0=00*3**0**33:
@M05295:617:000000000-KL64F:1:1101:1908:15400
TGGTCCTCGCCGTTATGAATCTTACAAAGAAAGTATAGTTGACCATTCAGATATTTTATTGGATGAGAGGTATGCGGAAGCATGGGAATATAAGGACAATCCATTTATTTATGTATCTATCATAGGACCTATTTATGCAACGGGAA
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGDFGGGGGGGCFGGGDGFGGGGGGGGFFGGGGGGGGGGGGGGGGGGGGFGFGGGDGGGGGGGGGGGGDFGGGGGGGGGGGGGGGGGGGGG8DFGFFFFFFF5@A:
46 changes: 46 additions & 0 deletions workflows/bulk-download/test/test_wdl.py
@@ -0,0 +1,46 @@
import os
import zipfile
from test_util import WDLTestCase


def relpath(*args):
    """ helper to get a filepath relative to the current file """
    return os.path.join(os.path.dirname(__file__), *args)


class TestBulkDownloads(WDLTestCase):
    """Tests for bulk downloads"""
    wdl = relpath("..", "run.wdl")
    files = [relpath("host_filter_1.fastq"), relpath("host_filter_2.fastq")]

    def testConcatenate(self):
        """
        Test file concatenation
        """
        # Calculate expected concatenation
        concat_expected = ""
        for file in self.files:
            with open(file) as fp:
                concat_expected += fp.read()

        # Run task
        result = self.run_miniwdl(task="concatenate", task_input={"files": self.files})

        # Validate
        with open(result["outputs"]["concatenate.file"]) as fp_concat_observed:
            concat_observed = fp_concat_observed.read()
        self.assertEqual(concat_observed, concat_expected)

    def testZip(self):
        """
        Test zipping files into 1 .zip file
        """
        # Run task
        result = self.run_miniwdl(task="zip", task_input={"files": self.files})

        # Validate
        archive = zipfile.ZipFile(result["outputs"]["zip.file"], "r")
        for file in self.files:
            contents = archive.read(os.path.basename(file))
            with open(file) as fp:
                self.assertEqual(contents.decode("utf-8"), fp.read())
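
Reviewer note: the zip task drops directory components and testZip looks entries up by basename, so two inputs that share a basename would map to the same entry name inside the archive. The two fixtures here have distinct names, so the test is unaffected. A pre-check along the following lines could flag such inputs before zipping. It is a sketch only; duplicate_basenames is a hypothetical helper and not part of this commit.

```python
import os
from collections import Counter


def duplicate_basenames(paths):
    """Return the sorted basenames that occur more than once in paths."""
    counts = Counter(os.path.basename(path) for path in paths)
    return sorted(name for name, count in counts.items() if count > 1)
```

An empty result means every input would get a unique entry name once paths are junked.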
1 change: 1 addition & 0 deletions workflows/consensus-genome/manifest.yml
@@ -76,6 +76,7 @@ input_loaders:
      accession: ~
      reference_genome: ~
      sars_cov_2: ~
      creation_source: ~
    outputs:
      ref_accession_id: ~
      ercc_fasta: ~