Reformat error: #15

Open
qian-lab opened this issue Feb 27, 2025 · 1 comment

Comments

@qian-lab

Hello,

`mgikit reformat` panics with "The header is expected to ends with index information similar to *:N:0:********".
It seems to parse the MGI read header as an Illumina header; see the input FASTQ content below.
Could you help find the cause?

Thanks!

mgikit version 1.0, Linux CentOS 7.9
R1 FASTQ content:

@V350293461L4C001R00100002790/1
CTTAGGAGTCAGATAAGTCATTGGTT
+
CDCC?HBGD-EEEDECFA9FCDDF;C

$KitPath/mgikit reformat \
  -f $DataPath/$Lane/${FlowID}_${Lane}_read_1.fq.gz \
  -r $DataPath/$Lane/${FlowID}_${Lane}_read_2.fq.gz \
  --instrument $MachineID \
  --output ~/output \
  --sample-index 1

[2025-02-27T12:03:36Z INFO mgikit] Complete Command: /share/apps/software/mgikit/mgikit reformat -f /bak/mgiseq2000/R10040100200014/V350293461/L04/V350293461_L04_read_1.fq.gz -r /bak/mgiseq2000/R10040100200014/V350293461/L04/V350293461_L04_read_2.fq.gz --instrument R10040100200014 --output /share/home/qjb/output --sample-index 1
[2025-02-27T12:03:36Z INFO mgikit] Exection start time: 2025-02-27T20:03:36.683819769+08:00
[2025-02-27T12:03:36Z INFO mgikit] Paired ended read input was detected!
[2025-02-27T12:03:36Z INFO mgikit] Paired read or R1: /bak/mgiseq2000/R10040100200014/V350293461/L04/V350293461_L04_read_1.fq.gz
[2025-02-27T12:03:36Z INFO mgikit] Read with Barcode or R2: /bak/mgiseq2000/R10040100200014/V350293461/L04/V350293461_L04_read_2.fq.gz
[2025-02-27T12:03:36Z INFO mgikit] The same output directory will be used for reports.
[2025-02-27T12:03:36Z INFO mgikit] A directory is created: /share/home/qjb/output
[2025-02-27T12:03:36Z INFO mgikit] Output directory: /share/home/qjb/output
[2025-02-27T12:03:36Z INFO mgikit] Reports directory: /share/home/qjb/output
[2025-02-27T12:03:36Z INFO mgikit] Instrumnet: R10040100200014
[2025-02-27T12:03:36Z INFO mgikit] Run: 20250225155407
[2025-02-27T12:03:36Z INFO mgikit] Lane: L04
[2025-02-27T12:03:36Z INFO mgikit] Sample Barcode:
[2025-02-27T12:03:36Z INFO mgikit] Compression level: 1. (0 no compression but fast, 12 best compression but slow.)
[2025-02-27T12:03:36Z INFO mgikit] Read header and Output files: Illumina format.
[2025-02-27T12:03:36Z INFO mgikit] Output buffer size: 67108864
[2025-02-27T12:03:36Z INFO mgikit] Compression buffer size: 131072
[2025-02-27T12:03:36Z INFO mgikit] Detected flowcell from the header of the first read is V350293461.
thread 'main' panicked at src/lib.rs:3333:19:
The header is expected to ends with index information similar to *:N:0:********
stack backtrace:
0: 0x56173d44c1fc - <std::sys::backtrace::BacktraceLock::print::DisplayBacktrace as core::fmt::Display>::fmt::ha4a311b32f6b4ad8
1: 0x56173d39a393 - core::fmt::write::h1866771663f62b81
2: 0x56173d41e312 - std::io::Write::write_fmt::hb549e7444823135e
3: 0x56173d450e43 - std::sys::backtrace::BacktraceLock::print::hddd3a9918ce29aa7
4: 0x56173d451347 - std::panicking::rust_panic_with_hook::he21644cc2707f2c4
5: 0x56173d450ee5 - std::panicking::begin_panic_handler::{{closure}}::h42f7c414fed3cad9
6: 0x56173d450e79 - std::sys::backtrace::__rust_end_short_backtrace::ha26cf5766b4e8c65
7: 0x56173d450e6c - rust_begin_unwind
8: 0x56173d32d5af - core::panicking::panic_fmt::h74866b78e934b1c0
9: 0x56173d40974a - mgikit::reformat::hda303809b91f1cf6
10: 0x56173d35272f - mgikit::main::h15ab00e2ec8c6594
11: 0x56173d33bd33 - std::sys::backtrace::__rust_begin_short_backtrace::h89f0786b8e2a6ffd
12: 0x56173d355c30 - main
13: 0x2accdf85b555 - __libc_start_main
14: 0x56173d336d03 -
15: 0x0 -
Aborted (core dumped)

@deerhunter4

Hello,
I had a similar problem. The cause is that you are using the FASTQ files that the MGI sequencer writes during demultiplexing, whose read headers end in /1 or /2 and look like this:

@FT100052745L1C001R00100000441/1
CCTACGGGGGGCAGCAGTAGGGAATCTTCCGCAATGGACGAAAGTCTGACGGAGCAACGCCGCGTGAGTGATGAAGGTTTTCGGATCGTAAAGCTCTGTTGTTAGGGAAGAACAAGTACCGTTCGAATAGGGCGGTACCTTGACGGTACCTAACCAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGAATTATTGGGCGTAAAGGGCTCGCAGGCGGTTCCTTAAGTCTGATGTGAAAGCCCCCGGCTCAACCGGGGAGGGTCATTGG
+
IIIIIIIIIIIIGIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFIIIIIIIIIIIIIIIIIIIII=IIIIIIIIIIIIIIIIIIIIIIIIIDIII;IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@FT100052745L1C001R00100001173/1

You should instead use files generated by splitBarcode, whose read headers end with index information and look like this:

@FT100052745L1C001R00100000441 1:N:0:TCCGTTGAAT
CCTACGGGGGGCAGCAGTAGGGAATCTTCCGCAATGGACGAAAGTCTGACGGAGCAACGCCGCGTGAGTGATGAAGGTTTTCGGATCGTAAAGCTCTGTTGTTAGGGAAGAACAAGTACCGTTCGAATAGGGCGGTACCTTGACGGTACCTAACCAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGAATTATTGGGCGTAAAGGGCTCGCAGGCGGTTCCTTAAGTCTGATGTGAAAGCCCCCGGCTCAACCGGGGAGGGTCATTGG
+
IIIIIIIIIIIIGIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFIIIIIIIIIIIIIIIIIIIII=IIIIIIIIIIIIIIIIIIIIIIIIIDIII;IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@FT100052745L1C001R00100001173 1:N:0:TCCGTTGAAT
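
A quick way to tell which of the two header styles a file has is to look at its first header line. The sketch below is only an illustration: `header_style` is a hypothetical helper name, and the patterns are inferred from the two examples above and from the `*:N:0:********` pattern in the error message, not taken from the mgikit source.

```shell
# Hypothetical helper: classify a FASTQ header line by how it ends.
# The patterns are assumptions based on the examples in this thread.
header_style() {
    case "$1" in
        *" "[12]:[YN]:*:*) echo "index"   ;;  # e.g. "@ID 1:N:0:TCCGTTGAAT"
        */[12])            echo "mgi"     ;;  # e.g. "@ID/1"
        *)                 echo "unknown" ;;
    esac
}

# Usage (the file name is a placeholder):
# header_style "$(zcat samplename_1.fq.gz | head -n 1)"
```

If this reports `mgi` for your input, the headers probably need rewriting before `reformat` will accept them.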

So, before running the reformat command, you need to change the read headers in your files. For read_1 (forward) you can use:

zcat samplename_1.fq.gz | awk 'NR % 4 == 1 {sub(/\/1$/, " 1:N:0:TCCGTTGAAT")} {print}' | gzip > modified_samplename_1.fq.gz

and this for read_2 (reverse):

zcat samplename_2.fq.gz | awk 'NR % 4 == 1 {sub(/\/2$/, " 2:N:0:ACCGTTGAAT")} {print}' | gzip > modified_samplename_2.fq.gz

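The nucleotide sequence in " 1:N:0:TCCGTTGAAT" is only a placeholder and has no significance; the header just needs to end with index information in the *:N:0:******** form that the error message asks for.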
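
The two one-liners can also be folded into a single reusable function. This is a minimal sketch under assumptions: `add_index_suffix` is a hypothetical name, the function works on uncompressed FASTQ from stdin, and it assumes every header ends in `/1` or `/2` as in the examples in this thread.

```shell
# Hypothetical wrapper around the awk one-liners above.
# $1 = read number (1 or 2)
# $2 = placeholder index sequence (assumption: any string of bases works,
#      as stated in this thread)
# Reads uncompressed FASTQ on stdin, writes rewritten FASTQ on stdout.
add_index_suffix() {
    awk -v r="$1" -v idx="$2" '
        NR % 4 == 1 {
            # Header line: replace the trailing "/<read>" with " <read>:N:0:<index>".
            sub("/" r "$", " " r ":N:0:" idx)
        }
        { print }
    '
}

# Usage (file names are placeholders):
# zcat samplename_1.fq.gz | add_index_suffix 1 TCCGTTGAAT | gzip > modified_samplename_1.fq.gz
# zcat samplename_2.fq.gz | add_index_suffix 2 TCCGTTGAAT | gzip > modified_samplename_2.fq.gz
```

Passing the same placeholder for both reads is a choice made in this sketch; the original commands use two different placeholders, which is consistent with the sequence not mattering.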
