Releases · gbouras13/hybracter

30 Jan 03:36

gbouras13

v0.11.2

bbed2fc

v0.11.2 Latest


Changes Medaka env to use pip - see this issue nanoporetech/medaka#547 - thanks William Shropshire for investigating and alerting me to this and Matthew Croxen for noticing in the first place
Using pip is 5-10x faster


21 Jan 00:09

gbouras13

v0.11.1

3d8db03

v0.11.1

Bug fix for --contaminants that was profoundly broken #115 thanks @nbat64
Bug fix to support dnaapler v1.1.0 with --db "dnaa,cog1474,repa" #116

Contributors

nbat64


04 Dec 07:22

gbouras13

v0.11.0

214833d

v0.11.0

When using --auto, replaces kmc with lrge, a much faster tool designed specifically for estimating genome size from long reads. It is very, very fast and robust. Thanks @mbhall88!
- If your input has more than 10000 long reads (it probably should!), lrge will run with its default settings. If it has fewer, lrge will run a (slightly) more computationally expensive all-vs-all mode using all input reads. In practice, if you have low read counts, you should probably treat downstream analysis (including lrge and hybracter) with caution anyway.
- According to the preprint (and my less exhaustive testing), lrge is more accurate and much faster than kmc, but I would still be careful using it on data with quality lower than Q15.
Nothing else changes - the estimated chromosome size used by Hybracter will still be 80% of the estimate, as it needs to account for plasmids.
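The read-count rule above can be sketched as follows. This is a minimal illustration, not Hybracter's or lrge's code. The helper names `count_fastq_reads` and `choose_lrge_mode` are hypothetical, and the returned strings are descriptive labels, not lrge options. It assumes standard 4-line FASTQ records. It also assumes that exactly 10000 reads falls into the all-vs-all branch, because the notes only say "more than" and "under".

```python
import gzip
from pathlib import Path

# From the release notes: inputs with more than 10000 long reads use lrge's
# default settings; smaller inputs fall back to an all-vs-all mode.
READ_THRESHOLD = 10_000


def count_fastq_reads(path: Path) -> int:
    """Count reads in a plain or gzipped FASTQ, assuming 4 lines per record."""
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        return sum(1 for _ in handle) // 4


def choose_lrge_mode(n_reads: int) -> str:
    """Return a descriptive label (hypothetical, not an lrge CLI option).

    Treating exactly 10000 reads as "all-vs-all" is an assumption.
    """
    return "default" if n_reads > READ_THRESHOLD else "all-vs-all"
```

For example, `choose_lrge_mode(count_fastq_reads(Path("sample1_long_read.fastq.gz")))` would report which branch a sample is expected to take.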
Adds r1041_e82_400bps_bacterial_methylation as an option for --medakaModel thanks to this issue.
- Note this won't work if you run hybracter on a Mac (as medaka v2 is not available)

Contributors

mbhall88


14 Nov 12:43

gbouras13

v0.10.1

4881a9b

v0.10.1

Adds retry functionality for dnaapler: if dnaapler errors, it is retried with 1 thread. For some genomes on some systems, using the default resources (8 threads, 16GB RAM) was observed to cause an error in dnaapler.
Thanks @richardstoeckl for implementing this
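The "retry with 1 thread" behaviour can be sketched generically with `subprocess`. This is a hypothetical helper, not the Snakemake rule Hybracter actually uses. The command is supplied by a caller-provided function of the thread count, so no particular dnaapler invocation is assumed.

```python
import subprocess
from typing import Callable, List


def run_with_thread_retry(build_cmd: Callable[[int], List[str]], threads: int = 8) -> int:
    """Run build_cmd(threads); if it exits non-zero, retry once with 1 thread.

    Returns the thread count that succeeded. If the single-thread attempt also
    fails, subprocess.CalledProcessError propagates to the caller.
    """
    try:
        subprocess.run(build_cmd(threads), check=True)
        return threads
    except subprocess.CalledProcessError:
        if threads == 1:
            raise  # nothing left to fall back to
        subprocess.run(build_cmd(1), check=True)
        return 1
```

A call might look like `run_with_thread_retry(lambda t: ["dnaapler", "all", "-i", "assembly.fasta", "-o", "out", "-t", str(t)])`. Those dnaapler arguments are illustrative only; check `dnaapler all --help` for the exact options.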
Thanks to some feedback from @oschwengers, the documentation now warns users that --auto is not recommended for low-quality long-read sets (e.g. R9 FAST/HAC or sub-Q15 reads). It tends to overestimate the chromosome size, because kmc counts more erroneous 21-mers than expected. Please specify a chromosome size for this type of data going forward.

Contributors

oschwengers and richardstoeckl


18 Oct 08:16

gbouras13

v0.10.0

9bf944a

v0.10.0

Updates Medaka to v2.0.1 and uses its --bacteria option by default.
This is based on the recommendations of Ryan Wick here, who found it improved assemblies, likely due to enhanced methylation error correction.
If you still want to specify a Medaka model, the flag --medaka_override has been added. You need to include this along with your model via --medakaModel. This is most likely useful for older R9 data.
Adds --extra_params_flye parameter if you want to specify extra commands for the Flye assembly step thanks @pdobbler #101
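The two options above can be sketched as simple argument builders. Everything here is an assumption for illustration, not Hybracter's actual rules:
- The argument spellings `-m` and `--bacteria` for `medaka_consensus`, and the `flye --nano-hq ... --out-dir ... --threads` invocation, are assumptions.
- Raising an error when `--medaka_override` lacks a model is a hypothetical design choice.

`shlex.split` is used so that a quoted `--extra_params_flye` string becomes separate tokens.

```python
import shlex
from typing import List, Optional


def medaka_model_args(medaka_override: bool, medaka_model: Optional[str]) -> List[str]:
    """Default to --bacteria; use a user model only with --medaka_override."""
    if medaka_override:
        if not medaka_model:
            # Hypothetical check: the override needs a model via --medakaModel.
            raise ValueError("--medaka_override requires --medakaModel")
        return ["-m", medaka_model]
    return ["--bacteria"]


def flye_command(reads: str, out_dir: str, threads: int, extra_params: str = "") -> List[str]:
    """Build an illustrative Flye command, appending any user-supplied extras."""
    base = ["flye", "--nano-hq", reads, "--out-dir", out_dir, "--threads", str(threads)]
    # shlex.split honours shell-style quoting, e.g. "--meta --iterations 2".
    return base + shlex.split(extra_params)
```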

Contributors

pdobbler


07 Oct 23:02

gbouras13

v0.9.1

abf66df

v0.9.1

Small change to the plassembler.yaml config and plassembler rules to prevent installation bugs: Unicycler v0.5.1 is now installed in a much simpler fashion via Bioconda, thanks @npbhavya. Installation should be a lot less fragile now.
The previous (crappy) workaround was needed because the Unicycler conda package for MacOS was not built/broken for v0.5.0 - thanks @mencian and @tcezard for fixing this in v0.5.1 (bioconda/bioconda-recipes#49602).

Contributors

tcezard, npbhavya, and mencian


18 Sep 01:28

gbouras13

v0.9.0

3bbae0b

v0.9.0

--auto for automatic estimation of chromosome size

Thanks to an issue and code from @richardstoeckl, Hybracter can now estimate the chromosome size for each sample when you pass --auto.
The implementation uses kmc. Specifically, Hybracter uses kmc to count the number of unique 21-mers that appear at least 10 times in your long-read FASTQ file. This works because, for an assembly of length L and a k-mer size of k, the total number of possible k-mers is (L - k) + 1. If L >> k, this is close to L, so it suffices as an estimate of the total assembly size.
The estimated chromosome size used by Hybracter will actually be 80% of the number of 21-mers found at least 10 times, as it needs to account for plasmids
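The counting idea can be illustrated in pure Python. This is only a sketch of the technique, not kmc and not Hybracter's code. It assumes canonical k-mer counting, so that a k-mer and its reverse complement count together, as kmc does by default. It also skips k-mers containing non-ACGT symbols, which is an assumption. For a 100 bp genome, (L - k) + 1 = 100 - 21 + 1 = 80 possible 21-mers.

```python
from collections import Counter
from typing import Iterable

K = 21                     # k-mer size described in the release notes
MIN_COUNT = 10             # keep only k-mers seen at least this many times
CHROMOSOME_FRACTION = 0.8  # Hybracter uses 80% of the count to allow for plasmids

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = frozenset("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def count_solid_kmers(reads: Iterable[str], k: int = K, min_count: int = MIN_COUNT) -> int:
    """Count distinct canonical k-mers occurring at least min_count times."""
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= _VALID:  # skip k-mers containing N or other symbols
                counts[canonical(kmer)] += 1
    return sum(1 for c in counts.values() if c >= min_count)


def estimate_chromosome_size(reads: Iterable[str], k: int = K, min_count: int = MIN_COUNT) -> int:
    """Apply the 80% rule to the solid k-mer count."""
    return int(count_solid_kmers(reads, k, min_count) * CHROMOSOME_FRACTION)
```

The `MIN_COUNT` threshold is why low coverage leads to underestimates: k-mers covered fewer than 10 times are simply dropped. Conversely, sequencing errors that happen to recur at least 10 times inflate the count.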
If you aren't sure whether you have enough data for assembly (i.e. coverage lower than 20x), be careful using --auto, because the actual assembly size will tend to be larger than the number of unique 21mers found at least 10 times. Therefore, the estimated chromosome size will almost certainly be an underestimate and may lead to Hybracter considering your assembly "complete" when in fact it isn't.
If you use --auto, you do not need to specify the chromosome length in the input. This means you don't need to specify -c with long-single or hybrid-single, and the input CSV sample sheet does not need a column with chromosome length.

e.g. for hybracter long you only need 2 columns with sample name and long-read FASTQ file path:

s_aureus_sample1,sample1_long_read.fastq.gz
p_aeruginosa_sample2,sample2_long_read.fastq.gz

and for hybracter hybrid you only need 4 columns with sample name, long-read FASTQ, and R1 and R2 short-read FASTQ file paths:

s_aureus_sample1,sample1_long_read.fastq.gz,sample1_SR_R1.fastq.gz,sample1_SR_R2.fastq.gz
p_aeruginosa_sample2,sample2_long_read.fastq.gz,sample2_SR_R1.fastq.gz,sample2_SR_R2.fastq.gz
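A header-less sheet in the --auto layouts above could be validated as follows. This is a hypothetical sketch, not Hybracter's parser. It only covers the two --auto layouts shown: 2 columns for long, 4 for hybrid. It makes no assumption about where the chromosome-size column goes in non-auto sheets.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Sample:
    name: str
    long_reads: str
    short_r1: Optional[str] = None
    short_r2: Optional[str] = None


def parse_auto_sample_sheet(lines: Iterable[str], hybrid: bool) -> List[Sample]:
    """Parse a header-less --auto sample sheet (2 columns for long, 4 for hybrid)."""
    expected = 4 if hybrid else 2
    samples: List[Sample] = []
    for row_num, row in enumerate(csv.reader(lines), start=1):
        if not row:  # csv.reader yields [] for blank lines
            continue
        if len(row) != expected:
            raise ValueError(f"row {row_num}: expected {expected} columns, got {len(row)}")
        if hybrid:
            name, long_reads, r1, r2 = row
            samples.append(Sample(name, long_reads, r1, r2))
        else:
            name, long_reads = row
            samples.append(Sample(name, long_reads))
    return samples
```

Passing an open file handle (e.g. `open("samples.csv", newline="")`) works the same way as a list of strings.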

Other changes

Hybracter v0.9.0 will automatically support the reorientation of archaeal chromosomes (thanks @richardstoeckl) to begin with the cog1474 Orc1/cdc6 gene.
--datadir can now also accept 2 paths separated by a comma, if you have long reads and short reads in separate directories e.g. --datadir "long_read_dir,short_read_dir" (#76).
--min_depth parameter added. Hybracter will error out if your QC'd long reads have a coverage lower than min_depth for a sample (#89).
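The --datadir and --min_depth behaviour can be sketched as below. These helpers are hypothetical and rest on two labelled assumptions. First, a single --datadir path is used for both read types. Second, depth is approximated as total QC'd long-read bases divided by the chromosome size. The release notes do not state Hybracter's exact coverage formula.

```python
from pathlib import Path
from typing import Tuple


def resolve_datadirs(datadir: str) -> Tuple[Path, Path]:
    """Split a --datadir value into (long-read dir, short-read dir).

    Assumption: a single path is used for both read types.
    """
    parts = [p.strip() for p in datadir.split(",")]
    if len(parts) == 1:
        return Path(parts[0]), Path(parts[0])
    if len(parts) == 2:
        return Path(parts[0]), Path(parts[1])
    raise ValueError("--datadir accepts one path or two comma-separated paths")


def check_min_depth(sample: str, total_bases: int, chromosome_size: int, min_depth: float) -> float:
    """Return the approximate depth, or raise if it is below min_depth.

    Assumption: depth = total long-read bases / chromosome size.
    """
    depth = total_bases / chromosome_size
    if depth < min_depth:
        raise RuntimeError(f"{sample}: long-read depth {depth:.1f}x is below --min_depth {min_depth}x")
    return depth
```

With `--datadir "long_read_dir,short_read_dir"`, a sheet entry such as `sample1_long_read.fastq.gz` would resolve to `Path("long_read_dir") / "sample1_long_read.fastq.gz"`.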


03 Sep 06:55

gbouras13

v0.8.0

14d35b3

v0.8.0

Adds --datadir that removes the need to add full paths in sample sheet (thanks @oschwengers)
Update medaka to v1.12.1 to support the newest models (#84)
- New default medaka model is r1041_e82_400bps_sup_v5.0.0
Adds --mac flag if you are running Hybracter on MacOS - it is now recommended to run Hybracter on Linux if you want the latest Medaka models.
- This is because ONT do not support bioconda install anymore and the latest version (v1.12.1) from pip doesn't work on Mac
- --mac will install and run Medaka v1.8.0 as in previous versions and use r1041_e82_400bps_sup_v4.2.0 as default
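The platform-dependent defaults listed above can be captured as a small lookup. The version and model strings come from these notes. The `running_on_macos` helper is hypothetical: Hybracter relies on the user passing --mac rather than detecting the platform. It simply shows how a wrapper script might warn when --mac looks necessary.

```python
import platform
from dataclasses import dataclass


@dataclass(frozen=True)
class MedakaDefaults:
    version: str
    model: str


def medaka_defaults(mac: bool) -> MedakaDefaults:
    """Medaka version and default model for v0.8.0, with or without --mac."""
    if mac:
        return MedakaDefaults("1.8.0", "r1041_e82_400bps_sup_v4.2.0")
    return MedakaDefaults("1.12.1", "r1041_e82_400bps_sup_v5.0.0")


def running_on_macos() -> bool:
    """Hypothetical helper: platform.system() returns "Darwin" on macOS."""
    return platform.system() == "Darwin"
```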

Contributors

oschwengers


04 Apr 21:54

gbouras13

v0.7.3

cbfbbe1

0.7.3

Enforce spades>=v3.15.2 in the plassembler.yaml environment
For some reason, the environment on Linux was being solved for v3.14.1, which caused an error with Unicycler within Plassembler for some samples (described in rrwick/Unicycler#318)


02 Apr 05:55

gbouras13

v0.7.2

3706a74

v0.7.2

Adds 'circular=True' to chromosome contig headers where Flye has marked these as circular, fixing a bug introduced in v0.7.0.
Thanks Nicole Lerminiaux for spotting this
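The header-tagging fix can be sketched from Flye's `assembly_info.txt`. This is a hypothetical illustration, not Hybracter's code. It assumes the tab-separated Flye layout `#seq_name length cov. circ. ...`, with `Y`/`N` in the `circ.` column. It also assumes the FASTA headers still carry Flye's contig names, whereas Hybracter renames chromosomes, so a real implementation would need that mapping.

```python
from typing import Iterable, Iterator, Set


def circular_contigs(assembly_info_lines: Iterable[str]) -> Set[str]:
    """Return names of contigs Flye flagged as circular ("Y" in column 4)."""
    circular: Set[str] = set()
    for line in assembly_info_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip the header row and blank lines
        fields = line.rstrip("\n").split("\t")
        if fields[3] == "Y":
            circular.add(fields[0])
    return circular


def tag_circular_headers(fasta_lines: Iterable[str], circular: Set[str]) -> Iterator[str]:
    """Append ' circular=True' to headers of circular contigs (once only)."""
    for line in fasta_lines:
        if line.startswith(">"):
            name = line[1:].split()[0]
            if name in circular and "circular=True" not in line:
                line = line.rstrip("\n") + " circular=True\n"
        yield line
```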

