Cryptic error with kb-python 0.29.1 and the "nucleus" workflow #283

Open
jdidion opened this issue Jan 23, 2025 · 9 comments

@jdidion

jdidion commented Jan 23, 2025

Describe the issue
I tried to process a relatively small FASTQ pair with the nucleus workflow, and it fails with a cryptic error. This is using the latest version of kb-python (0.29.1) and a freshly built index. The same sample succeeds when run with the "standard" workflow. It also succeeds with either the "standard" or the "nucleus" workflow under kb-python 0.27.3.

What is the exact command that was run?

kb count -t 16 -m 255G -o . --verbose --h5ad --workflow nucleus -x 10XV3 --em \
  -i Homo_sapiens.GRCh38.ensembl109.Kallisto_29.kb_idx \
  -g Homo_sapiens.GRCh38.ensembl109.Kallisto_29.t2g.txt  \
  -c1 Homo_sapiens.GRCh38.ensembl109.Kallisto_29.cDNA.t2c.txt \
  -c2 Homo_sapiens.GRCh38.ensembl109.Kallisto_29.intron.t2c.txt \
  mysample.R1.fq.gz mysample.R2.fq.gz

Command output (with --verbose flag)

[2025-01-23 00:14:22,939]    INFO [count_lamanno] Using index Homo_sapiens.GRCh38.ensembl109.Kallisto_29.kb_idx to generate BUS file to . from
[2025-01-23 00:14:22,939]    INFO [count_lamanno]         mysample.R1.fq.gz
[2025-01-23 00:14:22,939]    INFO [count_lamanno]         mysample.R2.fq.gz
[2025-01-23 00:14:22,939]   DEBUG [count_lamanno] kallisto bus -i Homo_sapiens.GRCh38.ensembl109.Kallisto_29.kb_idx -o . -x 10XV3 -t 16 mysample.R1.fq.gz mysample.R2.fq.gz
[2025-01-23 00:14:23,040]   DEBUG [count_lamanno]
[2025-01-23 00:14:23,040]   DEBUG [count_lamanno] [bus] Note: Strand option was not specified; setting it to --fr-stranded for specified technology
[2025-01-23 00:15:31,587]   DEBUG [count_lamanno] [index] k-mer length: 31
[2025-01-23 00:16:47,145]   DEBUG [count_lamanno] terminate called after throwing an instance of 'std::length_error'
[2025-01-23 00:16:47,145]   DEBUG [count_lamanno] what():  vector::reserve
[2025-01-23 00:17:00,474]   ERROR [count_lamanno]
[bus] Note: Strand option was not specified; setting it to --fr-stranded for specified technology
[index] k-mer length: 31
terminate called after throwing an instance of 'std::length_error'
what():  vector::reserve
[2025-01-23 00:17:00,475]   ERROR [main] An exception occurred
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/kb_python/main.py", line 1926, in main
    COMMAND_TO_FUNCTION[args.command](parser, args, temp_dir=temp_dir)
  File "/opt/conda/lib/python3.10/site-packages/kb_python/main.py", line 676, in parse_count
    count_velocity(
  File "/opt/conda/lib/python3.10/site-packages/ngs_tools/logging.py", line 62, in inner
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/kb_python/count.py", line 2480, in count_velocity
    bus_result = kallisto_bus(
  File "/opt/conda/lib/python3.10/site-packages/kb_python/count.py", line 223, in kallisto_bus
    run_executable(command)
  File "/opt/conda/lib/python3.10/site-packages/kb_python/dry/__init__.py", line 25, in inner
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/kb_python/utils.py", line 203, in run_executable
    raise sp.CalledProcessError(p.returncode, ' '.join(command))
subprocess.CalledProcessError: Command '/opt/conda/lib/python3.10/site-packages/kb_python/bins/compiled/kallisto/kallisto bus -i Homo_sapiens.GRCh38.ensembl109.Kallisto_29.kb_idx -o . -x 10XV3 -t 16 mysample.R1.fq.gz mysample.R2.fq.gz' died with .
[2025-01-23 00:17:00,476]   DEBUG [main] Removing `./tmp` directory
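The CalledProcessError at the end of this traceback only reports that kallisto died (the signal name after "died with" is missing from the pasted log); the useful detail is the `std::length_error` / `vector::reserve` line on kallisto's stderr. Below is a minimal Python sketch of a wrapper that keeps stderr and turns this particular abort into a readable hint. It is not kb-python's `run_executable`; `describe_failure`, `run_checked`, and the hint text are hypothetical, and the hint simply reflects how this thread was resolved (rebuilding the index with the current `kb ref`).

```python
import signal
import subprocess
from typing import Sequence

# Hypothetical hint text, based on how this issue was resolved:
# the abort went away once the index was rebuilt with the current kb ref.
INDEX_HINT = (
    "kallisto aborted with std::length_error (vector::reserve); "
    "try rebuilding the index with the current 'kb ref' "
    "(use --workflow nac if you want nac counts)."
)


def describe_failure(returncode: int, stderr: str) -> str:
    """Summarize why a kallisto subprocess failed."""
    if returncode < 0:
        # On POSIX, subprocess reports "killed by signal N" as returncode -N.
        try:
            reason = f"killed by {signal.Signals(-returncode).name}"
        except ValueError:
            reason = f"killed by unknown signal {-returncode}"
    else:
        reason = f"exited with status {returncode}"
    if "vector::reserve" in stderr:
        return f"{reason}: {INDEX_HINT}"
    return reason


def run_checked(command: Sequence[str]) -> str:
    """Run a command and return stdout; on failure, raise with a stderr-aware message."""
    proc = subprocess.run(list(command), capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError(describe_failure(proc.returncode, proc.stderr))
    return proc.stdout
```

For example, `run_checked(["kallisto", "bus", "-i", "index.idx", ...])` would raise `RuntimeError("killed by SIGABRT: kallisto aborted ...")` for the crash above instead of an opaque "died with" message.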
@Yenaled
Collaborator

Yenaled commented Jan 23, 2025

A couple of things:

  1. Newer versions of kb-python use the latest version of kallisto. The latest version of kallisto uses a different index (so you need to run "kb ref" again).
  2. The nucleus workflow is not being actively maintained in the newer versions of kallisto/kb-python. Instead, we recommend running --workflow=nac --sum=total to generate total counts (spliced counts plus nascent RNA counts).
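To make the `--sum` options concrete, here is a small Python sketch of how the nac layers combine. It assumes, based on our reading of the kb-python documentation rather than anything stated in this thread, that the nac workflow produces mature, nascent, and ambiguous cells x genes matrices, and that `--sum=cell` is mature + ambiguous, `--sum=nucleus` is nascent + ambiguous, and `--sum=total` is mature + nascent + ambiguous. The function names and toy values are hypothetical.

```python
from typing import List

Matrix = List[List[int]]


def add(*matrices: Matrix) -> Matrix:
    """Element-wise sum of equally shaped cells x genes matrices."""
    return [[sum(values) for values in zip(*rows)] for rows in zip(*matrices)]


def summed_counts(mature: Matrix, nascent: Matrix, ambiguous: Matrix, mode: str) -> Matrix:
    """Combine nac layers the way we understand kb's --sum modes (assumption)."""
    if mode == "cell":
        return add(mature, ambiguous)
    if mode == "nucleus":
        return add(nascent, ambiguous)
    if mode == "total":
        return add(mature, nascent, ambiguous)
    raise ValueError(f"unsupported --sum mode: {mode!r}")


# Toy layers: 2 cells x 2 genes, values made up for illustration.
mature = [[3, 0], [1, 2]]
nascent = [[1, 4], [0, 1]]
ambiguous = [[0, 1], [2, 0]]
print(summed_counts(mature, nascent, ambiguous, "total"))  # [[4, 5], [3, 3]]
```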

@jdidion
Author

jdidion commented Jan 23, 2025

Thanks @Yenaled, I didn't realize the workflow name had changed. I'll try re-running with --workflow nac.

@jdidion
Author

jdidion commented Jan 23, 2025

The same error occurs when using the nac workflow:

Command:

kb count -t 16 -m 255G -o . --verbose --h5ad -x 10XV3 --em \
  --workflow nac \
  --sum nucleus \
  -i Homo_sapiens.GRCh38.ensembl109.Kallisto_29.kb_idx \
  -g Homo_sapiens.GRCh38.ensembl109.Kallisto_29.t2g.txt \
  -c1 Homo_sapiens.GRCh38.ensembl109.Kallisto_29.cDNA.t2c.txt \
  -c2 Homo_sapiens.GRCh38.ensembl109.Kallisto_29.intron.t2c.txt \
   mysample.R1.fq.gz  mysample.R2.fq.gz

Output:

[2025-01-23 01:58:57,754]    INFO [count_nac]         mysample.R1.fq.gz
[2025-01-23 01:58:57,754]    INFO [count_nac]         mysample.R2.fq.gz
[2025-01-23 01:58:57,754]   DEBUG [count_nac] kallisto bus -i Homo_sapiens.GRCh38.ensembl109.Kallisto_29.kb_idx -o . -x 10XV3 -t 16 mysample.R1.fq.gz mysample.R2.fq.gz
[2025-01-23 01:58:57,855]   DEBUG [count_nac]
[2025-01-23 01:58:57,855]   DEBUG [count_nac] [bus] Note: Strand option was not specified; setting it to --fr-stranded for specified technology
[2025-01-23 02:00:43,684]   DEBUG [count_nac] [index] k-mer length: 31
[2025-01-23 02:01:58,130]   DEBUG [count_nac] terminate called after throwing an instance of 'std::length_error'
[2025-01-23 02:01:58,130]   DEBUG [count_nac] what():  vector::reserve
[2025-01-23 02:02:14,671]   ERROR [count_nac]
[bus] Note: Strand option was not specified; setting it to --fr-stranded for specified technology
[index] k-mer length: 31
terminate called after throwing an instance of 'std::length_error'
what():  vector::reserve
[2025-01-23 02:02:14,671]   ERROR [main] An exception occurred
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/kb_python/main.py", line 1926, in main
    COMMAND_TO_FUNCTION[args.command](parser, args, temp_dir=temp_dir)
  File "/opt/conda/lib/python3.10/site-packages/kb_python/main.py", line 595, in parse_count
    count_nac(
  File "/opt/conda/lib/python3.10/site-packages/ngs_tools/logging.py", line 62, in inner
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/kb_python/count.py", line 1861, in count_nac
    bus_result = kallisto_bus(
  File "/opt/conda/lib/python3.10/site-packages/kb_python/count.py", line 223, in kallisto_bus
    run_executable(command)
  File "/opt/conda/lib/python3.10/site-packages/kb_python/dry/__init__.py", line 25, in inner
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/kb_python/utils.py", line 203, in run_executable
    raise sp.CalledProcessError(p.returncode, ' '.join(command))
subprocess.CalledProcessError: Command '/opt/conda/lib/python3.10/site-packages/kb_python/bins/compiled/kallisto/kallisto bus -i Homo_sapiens.GRCh38.ensembl109.Kallisto_29.kb_idx -o . -x 10XV3 -t 16 mysample.R1.fq.gz mysample.R2.fq.gz' died with .
[2025-01-23 02:02:15,325]   DEBUG [main] Removing `./tmp` directory

@Yenaled
Collaborator

Yenaled commented Jan 23, 2025

You need to regenerate the index with kb ref using the nac workflow.
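A minimal Python sketch of that two-step recipe, written as argument lists so they can be inspected or handed to `subprocess.run`. It assumes the `kb ref --workflow=nac` flags as we understand them from the kb-python docs (`-f1` for the cDNA FASTA, `-f2` for the nascent FASTA, `-c1`/`-c2` for the matching transcript-to-capture lists, genome FASTA and GTF as positional arguments); all file names are placeholders. The key point is that `kb count` must receive the index, t2g, and `-c1`/`-c2` files written by this same `kb ref` run, not the ones from an older index.

```python
import shlex
from typing import List


def kb_ref_nac(index: str, t2g: str, cdna_t2c: str, nascent_t2c: str,
               cdna_fa: str, nascent_fa: str, genome_fa: str, gtf: str) -> List[str]:
    """argv for building a nac index (all paths are placeholders)."""
    return ["kb", "ref", "--workflow=nac",
            "-i", index, "-g", t2g,
            "-c1", cdna_t2c, "-c2", nascent_t2c,
            "-f1", cdna_fa, "-f2", nascent_fa,
            genome_fa, gtf]


def kb_count_nac(index: str, t2g: str, cdna_t2c: str, nascent_t2c: str,
                 r1: str, r2: str, out_dir: str, technology: str = "10XV3") -> List[str]:
    """argv for a nac count producing a total (mature + nascent) matrix."""
    return ["kb", "count", "--workflow=nac",
            "-i", index, "-g", t2g,
            "-c1", cdna_t2c, "-c2", nascent_t2c,
            "--sum=total", "-x", technology, "-o", out_dir,
            r1, r2]


ref_cmd = kb_ref_nac("nac.idx", "t2g.txt", "cdna.t2c.txt", "nascent.t2c.txt",
                     "cdna.fa", "nascent.fa", "genome.fa.gz", "genes.gtf.gz")
# Reuse exactly the files written by kb ref above.
count_cmd = kb_count_nac("nac.idx", "t2g.txt", "cdna.t2c.txt", "nascent.t2c.txt",
                         "mysample.R1.fq.gz", "mysample.R2.fq.gz", "out")
print(shlex.join(ref_cmd))
print(shlex.join(count_cmd))
```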

@Yenaled
Collaborator

Yenaled commented Jan 23, 2025

@jdidion
Copy link
Author

jdidion commented Jan 23, 2025

Thanks. FYI, the documentation here hasn't been updated to reflect this change: https://www.kallistobus.tools/kb_usage/kb_ref/

@Yenaled
Collaborator

Yenaled commented Jan 23, 2025

That documentation is outdated, and new documentation is in the works. It currently lives at https://kallisto.readthedocs.io/en/latest/index.html, but it is not yet complete and the URL will likely change.

@jdidion
Author

jdidion commented Jan 24, 2025

Thanks @Yenaled that looks to be the fix.

I encountered another error that seems to be due to a difference in the nucleus vs nac workflow - the --em option no longer works with nac (it is incompatible with the -s option in bustools), and in fact it seems the --em option is now hidden in bustools. Do you know how bustools handles multi-mappers now if not with EM?

Thanks

@Yenaled
Copy link
Collaborator

Yenaled commented Jan 24, 2025

The --em algorithm is hidden because it's not really used (and there are no studies showing that it's actually effective, especially with data like UMIs where the counts are sparsely distributed). There's a --mm option that distributes counts uniformly amongst multimappers. In any case, the nac index type is mostly to distribute counts amongst nascent/mature count matrices. I don't think it's a good way to handle multimappers.
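A toy Python sketch of the uniform split described above: a deduplicated molecule (UMI) compatible with k genes contributes 1/k to each of them. It also assumes, as our understanding of bustools' default rather than something stated in this thread, that without the multimapping option such molecules are simply dropped. `gene_counts` is a hypothetical helper; bustools' real implementation works on equivalence classes and may differ in details.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def gene_counts(molecules: Iterable[Set[str]], mm: bool = False) -> Dict[str, float]:
    """Count deduplicated molecules per gene.

    Each element of `molecules` is the set of genes one molecule is compatible
    with. Unique molecules add 1 to their gene. Multimapping molecules are
    skipped unless mm=True, in which case each of the k genes gets 1/k.
    """
    counts: Dict[str, float] = defaultdict(float)
    for genes in molecules:
        if not genes or (len(genes) > 1 and not mm):
            continue
        share = 1.0 / len(genes)
        for gene in genes:
            counts[gene] += share
    return dict(counts)


molecules = [{"A"}, {"A", "B"}, {"B", "C"}, {"C"}]
print(gene_counts(molecules))                          # {'A': 1.0, 'C': 1.0}
print(sorted(gene_counts(molecules, mm=True).items()))  # [('A', 1.5), ('B', 1.0), ('C', 1.5)]
```

Unlike EM, the uniform split never reweights genes based on the other molecules in the cell, which is why it behaves predictably on sparse UMI counts.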

For your case, since you want to just do quantification with nascent transcripts included and handling multimappers, I'd just create an index with --workflow=nac, then run kb count (however you'd like, including the multimapping options if you want) against that index using --workflow=standard. You'll get a single count matrix with your quantifications. Hope that makes sense! :)
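A minimal Python sketch of the second step suggested here: a `--workflow=standard` count run against an index that was built with `kb ref --workflow=nac`. It assumes the index and t2g come from that nac `kb ref` run, that `-c1`/`-c2` are not needed in the standard workflow, and that kb count's `--mm` flag is how the multimapping option is passed; `kb_count_standard` and the file names are hypothetical placeholders.

```python
import shlex
from typing import List


def kb_count_standard(index: str, t2g: str, r1: str, r2: str, out_dir: str,
                      technology: str = "10XV3", threads: int = 16,
                      multimappers: bool = True) -> List[str]:
    """argv for a standard-workflow count against an index built with --workflow=nac."""
    cmd = ["kb", "count", "--workflow=standard",
           "-i", index, "-g", t2g,
           "-x", technology, "-t", str(threads), "-o", out_dir]
    if multimappers:
        # Distribute multimapping counts uniformly across compatible genes.
        cmd.append("--mm")
    cmd += [r1, r2]
    return cmd


cmd = kb_count_standard("nac.idx", "t2g.txt",
                        "mysample.R1.fq.gz", "mysample.R2.fq.gz", "out")
print(shlex.join(cmd))
```

The result is a single count matrix in which nascent and mature transcripts of the same gene both contribute to that gene's count.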
