Merge pull request #67 from gbouras13/dev
Dev
Showing 11 changed files with 296 additions and 43 deletions.
@@ -0,0 +1,59 @@

#
# hybracter
#

FROM --platform=linux/amd64 ubuntu:20.04
FROM staphb/unicycler:0.5.0

ENV DEBIAN_FRONTEND="noninteractive"

ARG LIBFABRIC_VERSION=1.18.1

# Install required packages and dependencies
RUN apt -y update \
    && apt -y install build-essential wget doxygen gnupg gnupg2 curl apt-transport-https software-properties-common \
    git vim gfortran libtool python3-venv ninja-build python3-pip \
    libnuma-dev python3-dev \
    && apt -y remove --purge --auto-remove cmake \
    && wget -O - https://apt.kitware.com/keys/kitware-archive-latest.asc 2>/dev/null\
    | gpg --dearmor - | tee /etc/apt/trusted.gpg.d/kitware.gpg >/dev/null \
    && apt-add-repository -y "deb https://apt.kitware.com/ubuntu/ jammy-rc main" \
    && apt -y update

# Build and install libfabric
RUN (if [ -e /tmp/build ]; then rm -rf /tmp/build; fi;) \
    && mkdir -p /tmp/build \
    && cd /tmp/build \
    && wget https://github.com/ofiwg/libfabric/archive/refs/tags/v${LIBFABRIC_VERSION}.tar.gz \
    && tar xf v${LIBFABRIC_VERSION}.tar.gz \
    && cd libfabric-${LIBFABRIC_VERSION} \
    && ./autogen.sh \
    && ./configure \
    && make -j 16 \
    && make install

#
# Install miniforge
#
RUN set -eux ; \
    curl -LO https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh ; \
    bash ./Miniforge3-* -b -p /opt/miniforge3 -s ; \
    rm -rf ./Miniforge3-*
ENV PATH /opt/miniforge3/bin:$PATH
#
# Install conda environment
#
ARG HYBRACTER_VERSION=0.7.1
RUN set -eux ; \
    mamba install -y -c conda-forge -c bioconda -c defaults \
    hybracter=${HYBRACTER_VERSION}=pyhdfd78af_0
ENV PATH /opt/miniforge3/bin:$PATH
RUN conda clean -af -y

RUN hybracter install --medaka
RUN hybracter test-hybrid --threads 8
RUN hybracter test-long --threads 16 --conda-create-envs-only
RUN rm -rf hybracter_out

@@ -0,0 +1,137 @@
`hybracter` creates a number of output files in different formats.

# Main Output

The main outputs are in the `FINAL_OUTPUT` directory.

This directory will include:

## Summary File

1. `hybracter_summary.tsv` file. This gives the summary statistics for your assemblies with the following columns:

| Sample | Complete (True or False) | Total_assembly_length | Number_of_contigs | Most_accurate_polishing_round | Longest_contig_length | Longest_contig_coverage | Number_circular_plasmids |
|--------|--------------------------|-----------------------|-------------------|-------------------------------|-----------------------|-------------------------|--------------------------|
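
As a rough illustration of how the summary file could be consumed downstream, the sketch below lists the samples flagged as complete. It assumes the file is tab-separated with a header row and that the relevant headers are literally `Sample` and `Complete`; the table above shows the second column as "Complete (True or False)", so the real header text may differ and should be checked. The example rows are made up.

```python
import csv
import io


def complete_samples(tsv_text, sample_col="Sample", complete_col="Complete"):
    """Return the names of samples whose completeness column reads 'True'.

    The column names are assumptions; pass different ones if the real
    header in hybracter_summary.tsv does not match.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row[sample_col]
        for row in reader
        if row[complete_col].strip().lower() == "true"
    ]


# Hypothetical two-sample summary, invented purely for illustration.
example = (
    "Sample\tComplete\tTotal_assembly_length\tNumber_of_contigs\n"
    "sampleA\tTrue\t2800000\t2\n"
    "sampleB\tFalse\t2750000\t14\n"
)

print(complete_samples(example))
```

To use it on a real run, read the text of `FINAL_OUTPUT/hybracter_summary.tsv` from disk and pass it in place of `example`.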

## Summary Assemblies

2. The `complete` and `incomplete` directories will contain the summary assemblies for all samples.

All samples that hybracter denotes as complete will have 5 outputs in the `complete` directory:

* `sample`_summary.tsv containing the summary statistics for that sample.
* `sample`_per_contig_stats.tsv containing the contig names, lengths, GC% and whether the contig is circular.
* `sample`_final.fasta containing the final assembly for that sample.
* `sample`_chromosome.fasta containing only the final chromosome(s) assembly for that sample.
* `sample`_plasmid.fasta containing only the final plasmid(s) assembly for that sample. Note this may be empty. If this is empty, then that sample had no plasmids.
* **Note** - there may be a number of non-circular "plasmid" contigs. Be careful assuming these are truly plasmids and check the Plassembler output in `supplementary_results`. These may be assembly artefacts that should be excluded, or indicate that your long- and short-read sets aren't well matched!

All samples that hybracter denotes as incomplete will have 3 outputs in the `incomplete` directory:

* `sample`_summary.tsv containing the summary statistics for that sample.
* `sample`_per_contig_stats.tsv containing the contig names, lengths, GC% and whether the contig is circular.
* `sample`_final.fasta containing the final assembly for that sample.
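
The file naming described above can be sketched as a small helper that builds the expected paths for a sample and reports any that are missing. This is only an illustration: the function names are hypothetical, and it assumes the `complete` and `incomplete` directories sit directly inside `FINAL_OUTPUT`, as the listing above suggests.

```python
from pathlib import Path

# File suffixes taken from the lists above.
COMPLETE_SUFFIXES = [
    "_summary.tsv",
    "_per_contig_stats.tsv",
    "_final.fasta",
    "_chromosome.fasta",
    "_plasmid.fasta",
]
# Incomplete samples only get the first three outputs.
INCOMPLETE_SUFFIXES = COMPLETE_SUFFIXES[:3]


def expected_outputs(final_output, sample, complete):
    """Build the expected per-sample output paths (hypothetical helper)."""
    subdir = "complete" if complete else "incomplete"
    suffixes = COMPLETE_SUFFIXES if complete else INCOMPLETE_SUFFIXES
    return [Path(final_output) / subdir / (sample + suffix) for suffix in suffixes]


def missing_outputs(final_output, sample, complete):
    """Return the expected paths that do not exist on disk."""
    return [p for p in expected_outputs(final_output, sample, complete) if not p.exists()]


for path in expected_outputs("FINAL_OUTPUT", "sampleA", complete=False):
    print(path)
```

Remember that an empty `_plasmid.fasta` is a valid result, so an existence check like this should not be extended to require non-empty files.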

# Other Outputs

## `supplementary_results` directory

The `supplementary_results` directory contains a number of supplementary results that you might find useful:

##### 1. `comparisons` directory

* This directory contains visual representations comparing the effect of each polishing round for each sample using a modified version of Ryan Wick's [compare_assemblies.py script](https://github.com/rrwick/Perfect-bacterial-genome-tutorial/blob/main/scripts/compare_assemblies.py). An example is below:

```
contig_1 37368-37398: ACCATTTTTGTTTTATTTTTTGTAAAGACAC
contig_1 37368-37397: ACCATTTTTGTTTTA-TTTTTGTAAAGACAC
                                     *
contig_1 43247-43277: CAACGTTGTTTTCCCTGAGCCTAAATAACCA
contig_1 43246-43276: CAACGTTGTTTTCCCCGAGCCTAAATAACCA
                                     *
contig_1 44658-44688: CTTGATCTTTATCTATGATTTCATTAATACT
contig_1 44657-44687: CTTGATCTTTATCTACGATTTCATTAATACT
                                     *
```

* If this file is empty, there are no differences between assemblies.
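
To make the comparison format easier to read, here is a minimal sketch of how an asterisk row relates to the two aligned sequences above it: a `*` sits under each column where the sequences differ. This is not the code from the compare_assemblies.py script, just an illustration with hypothetical function names, using the first pair of sequences from the example.

```python
def difference_markers(seq_a, seq_b):
    """Return a string with '*' under every column where the sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return "".join("*" if a != b else " " for a, b in zip(seq_a, seq_b))


def count_differences(seq_a, seq_b):
    """Count the differing columns between two aligned sequences."""
    return difference_markers(seq_a, seq_b).count("*")


before = "ACCATTTTTGTTTTATTTTTTGTAAAGACAC"
after = "ACCATTTTTGTTTTA-TTTTTGTAAAGACAC"

print(before)
print(after)
print(difference_markers(before, after))
```

Here the single difference is a one-base deletion (shown as `-`) in a homopolymer run, which is a typical change made by a polishing round.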

##### 2. `intermediate_chromosome_assemblies` directory

* This directory contains intermediate chromosome assemblies for all polishing rounds for each sample.

##### 3. `flye_individual_summaries` directory

* This directory contains individual sample summaries from Flye for all samples.

##### 4. `plassembler_individual_summaries` directory

* This directory contains individual sample summaries from Plassembler for each sample.

##### 5. `plassembler_all_assembly_summary` directory

* This directory contains individual sample summaries from Plassembler for all samples.

##### 6. `pyrodigal_mean_length_summaries` directory

* For `long`, this directory contains pyrodigal mean CDS length summary files for each polishing round for each sample.

##### 7. `pyrodigal_mean_length_summaries_plassembler` directory

* For `long`, this directory contains pyrodigal mean CDS length summary files for each polishing round for each sample for the plassembler assembled plasmids.
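
One plausible way to use per-round mean CDS lengths, which this page does not spell out, is to compare rounds and prefer the one with the longest mean, on the assumption that indel errors fragment genes and shorten predicted CDSs. The sketch below only illustrates that arithmetic; the round names, the CDS lengths and the selection rule are all assumptions, and it does not read the actual summary files or call pyrodigal.

```python
from statistics import mean


def mean_cds_length(cds_lengths):
    """Mean CDS length in bp, or 0.0 when no CDS was predicted."""
    return mean(cds_lengths) if cds_lengths else 0.0


def best_round(lengths_by_round):
    """Pick the round with the largest mean CDS length (assumed criterion).

    On ties, the round that appears first in the mapping is returned.
    """
    return max(lengths_by_round, key=lambda r: mean_cds_length(lengths_by_round[r]))


# Hypothetical CDS lengths (bp) per polishing round, invented for illustration.
rounds = {
    "pre_polish": [900, 300, 600],
    "medaka_round_1": [900, 900, 600],
}

for name, lengths in rounds.items():
    print(name, mean_cds_length(lengths))
print("best:", best_round(rounds))
```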

## `processing` directory

The `processing` directory will contain a number of intermediate directories whose information you might find useful:

##### 1. `flye` directory

* This directory will contain the Flye assembly output and associated intermediate files for each sample.

##### 2. `qc` directory

* This directory will contain the filtered, trimmed and contaminant-removed FASTQ reads (where applicable) for each sample.

##### 3. `plassembler` directory

* This directory will contain the Plassembler assembly output and associated intermediate files for each sample.

##### 4. `chrom_pre_polish` directory

* This directory will contain the pre-polished chromosome assemblies for complete isolates.

##### 5. `complete` and `incomplete` directories

* These directories will contain the medaka, polypolish and pypolca polishing and dnaapler reorientation intermediate files for each sample.

##### 6. `ale_out_files` directory

* For `hybrid`, this directory will contain intermediate ALE files for each assembly polishing round. These are internal to `hybracter` (so can be ignored).

##### 7. `ale_scores_complete` and `ale_scores_incomplete` directories

* These directories will contain ALE scores for each assembly polishing round.

## `stderr` directory

* This will contain log files for each program in `hybracter`.

## `versions` directory

* This will contain the specific versions used for each program in `hybracter`.

## `flags` directory

* This will contain flag files internal to `hybracter` (so can be ignored).

## `completeness` directory

* This will contain flag files used to determine completeness. These are internal to `hybracter` (so can be ignored).

## `benchmarks` directory

* This will contain benchmarking time and memory usage statistics for each program in `hybracter`.