Can we use IsoCon for an organism with a large polyploid genome? / PBS: job killed: mem 127935680kb exceeded limit 125829120kb #4
Comments
Hi Virgg, IsoCon is designed for targeted Iso-Seq data. That is why you see the long runtime and large memory consumption on a nontargeted dataset; for more details on why, see issue 3. However, IsoCon could work (both better and much faster, with less memory) if this dataset were batched into subsets of similar CCS reads (e.g., roughly similar lengths and sequences); see issue 2. This should be "fairly straightforward", either by CCS read-to-read alignment (minimap2 CCS all-vs-all) or by alignment to a reference. Given that you have reached 23 iterations in the correction phase, the sequences in … I might also add that we are working on a nontargeted version of IsoCon that will use minimap2 for alignments, which will reduce both runtime and memory usage. Best,
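Here is a rough illustration of the batching idea above. This minimal Python sketch groups reads by length only and does not cover the sequence-similarity half of the suggestion. The FASTA reader, the function names, and the 10% length tolerance are hypothetical choices for this example and are not part of IsoCon.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; handles sequences wrapped over several lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def batch_by_length(reads: Iterable[Tuple[str, str]],
                    tolerance: float = 0.1) -> List[List[str]]:
    """Hypothetical helper: sort reads by length and open a new batch whenever a
    read is more than `tolerance` (a fraction) longer than the shortest read
    already in the current batch. Returns lists of read headers."""
    batches: List[List[str]] = []
    current: List[str] = []
    batch_min = 0
    for name, seq in sorted(reads, key=lambda r: len(r[1])):
        if current and len(seq) > batch_min * (1 + tolerance):
            batches.append(current)
            current = []
        if not current:
            batch_min = len(seq)
        current.append(name)
    if current:
        batches.append(current)
    return batches
```

Each returned batch of read names would then be written to its own FASTA file and given to a separate IsoCon run. A length-only split is the crudest form of this; reads of similar length from unrelated genes still end up together.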
Minor correction: iteration 22 is the last complete one, as the file from iteration 23 seems to be truncated (judging by its file size). So statistical test with
Thanks for your answers. I'll let you know.
Hi Kristoffer, I ran … So, I tried to run: … I didn't investigate too much, but do you have an idea of the problem?
Hi Virgg, Yes, it should be stat_filter for running only the statistical test; this is an error in the documentation that I should fix. Regarding the runtime error in stat_filter: you are giving a FASTQ file to the
Otherwise, with a FASTA file of reads, you have to run
Best,
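The exact commands in the reply above are cut off. If what you need is a FASTA copy of a FASTQ read file, here is a minimal Python sketch of the conversion. It assumes unwrapped FASTQ (exactly four lines per record). The function name is hypothetical, and the quality values are simply dropped.

```python
from typing import Iterable, Iterator


def fastq_to_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines (header, sequence) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got {header!r}")
        try:
            seq = next(it).rstrip("\n")
            plus = next(it)
            next(it)  # quality line, not needed in FASTA
        except StopIteration:
            raise ValueError("truncated FASTQ record") from None
        if not plus.startswith("+"):
            raise ValueError("expected '+' separator line")
        yield ">" + header[1:]
        yield seq
```

In practice you would stream a file through it, writing each yielded line plus a newline to the output FASTA.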
Thank you! It's running; I'll let you know.
Hi Kristoffer, I have run: … which generated the files listed below and stopped with the error "IOError: truncated file".
uqvperlo@tinaroo1:.../isoconOutput2> ls -l
Do you have an idea what could have generated this pysam error?
Virgg
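Suppose the file pysam was reading when it raised "IOError: truncated file" is a BAM file. That is an assumption, since the thread does not say which file it was. One quick check is then whether the file ends with the 28-byte BGZF end-of-file marker defined in the SAM/BAM specification. Here is a minimal Python sketch; the function name is hypothetical.

```python
import os

# Standard 28-byte empty BGZF block that marks end-of-file (SAM/BAM spec).
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00"
    " 1b 00 03 00 00 00 00 00 00 00 00 00"
)


def has_bgzf_eof(path: str) -> bool:
    """Return True if the file at `path` ends with the BGZF EOF marker."""
    size = os.path.getsize(path)
    if size < len(BGZF_EOF):
        return False
    with open(path, "rb") as fh:
        fh.seek(size - len(BGZF_EOF))
        return fh.read() == BGZF_EOF
```

A missing marker strongly suggests the file was cut short, for example by a killed job. A present marker does not prove that the rest of the file is intact.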
Hi Virgg, This should not happen. At the very start of the run, IsoCon prints the current parameter settings. Can you confirm that the
The reason I ask is that the pysam error is generated in a code segment that is entered only if the … On another note, you can always resume IsoCon from the last
Hi again, Have you tried PacBio's Iso-Seq ToFU or ToFU2 pipeline for your dataset? I saw that both the ToFU and ToFU2 pipelines contain a separate program … Note: The precluster step in ToFU2 is much more sophisticated than preCluster in ToFU, as it separates reads on sequence similarity and not just on read length. However, I think either of them will give an improvement.
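Here is a toy Python sketch of greedy k-mer clustering. It illustrates the difference between splitting on read length and splitting on sequence similarity. It is not the ToFU2 precluster algorithm. The k-mer size, the Jaccard threshold, and the function names are arbitrary assumptions for the example.

```python
from typing import Dict, List, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two k-mer sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def greedy_cluster(reads: Dict[str, str], k: int = 15,
                   threshold: float = 0.3) -> List[List[str]]:
    """Visit reads longest-first; add each read to the first cluster whose
    representative (its first, longest read) shares enough k-mers with it,
    otherwise start a new cluster with this read as representative."""
    reps: List[Tuple[Set[str], List[str]]] = []
    for name in sorted(reads, key=lambda n: len(reads[n]), reverse=True):
        ks = kmers(reads[name], k)
        for rep_ks, members in reps:
            if jaccard(ks, rep_ks) >= threshold:
                members.append(name)
                break
        else:
            reps.append((ks, [name]))
    return [members for _, members in reps]
```

Two reads of identical length but unrelated sequence end up in different clusters, which a length-only split cannot do. Comparing every read with every representative is quadratic in the worst case, which is why real tools rely on indexing such as minimap2's minimizers instead.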
Hi Kristoffer, I have run
Yes, IsoCon has finished without error. My guess is that runtime will be greatly improved with preCluster, but the predictions will be fairly similar. However, running IsoCon on a nontargeted dataset with … The redundancy is because IsoCon won't statistically validate (and filter) transcripts that differ only in where their ends have been cut, with
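One way to picture the end-truncation redundancy described above: some predicted transcripts may be exact substrings of longer ones, and those could be collapsed in a post-processing step. The Python sketch below is a hypothetical helper, not something IsoCon does. It only handles exact containment, so truncated copies that also differ by sequencing errors are left alone.

```python
from typing import Dict


def collapse_contained(cands: Dict[str, str]) -> Dict[str, str]:
    """Keep only candidates that are not exact substrings of a longer kept one
    (exact duplicates are also dropped, keeping the first seen)."""
    kept: Dict[str, str] = {}
    # Longest first, so each sequence is checked against every longer kept one.
    for name in sorted(cands, key=lambda n: len(cands[n]), reverse=True):
        seq = cands[name]
        if not any(seq in longer for longer in kept.values()):
            kept[name] = seq
    return kept
```

The substring test against every kept sequence makes this quadratic, which is fine for a few thousand candidates but not for a whole nontargeted transcriptome.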
Thank you!
Hi,
I tried to use IsoCon for the sugarcane transcriptome (10 Gb genome, highly polyploid, 100-130 chromosomes, aneuploid with varying ploidy levels).
After 7 days with 24 CPUs, I got this error: the allocated memory (120 GB) was exceeded.
########################### Execution Started #############################
/var/spool/pbs/mom_priv/jobs/100062.tinmgmr1.SC: line 17: : command not found
/var/spool/pbs/mom_priv/jobs/100062.tinmgmr1.SC: line 20: : command not found
=>> PBS: job killed: mem 127935680kb exceeded limit 125829120kb
########################### Job Execution History #############################
JobName:IsoConSugar
SessionId:24450
ResourcesRequested:mem=120gb,ncpus=24,place=free,walltime=326:00:00
ResourcesUsed:cpupercent=2400,cput=1596:37:06,mem=127935680kb,ncpus=24,vmem=211531212kb,walltime=136:17:10
QueueUsed:Long
ExitStatus:271
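The numbers in the log above line up once the units are aligned. If PBS's "kb" is taken as KiB, the 125829120kb limit is exactly the requested 120 GiB. Here is a small Python check of that arithmetic; the variable and function names are illustrative.

```python
KIB_PER_GIB = 1024 * 1024  # assumes PBS "kb" means KiB, which the limit value bears out

limit_kb = 125_829_120  # "exceeded limit 125829120kb"
used_kb = 127_935_680   # "mem 127935680kb"
vmem_kb = 211_531_212   # "vmem=211531212kb"


def to_gib(kb: int) -> float:
    """Convert KiB to GiB."""
    return kb / KIB_PER_GIB


def hms_to_hours(hms: str) -> float:
    """Convert a PBS 'HHH:MM:SS' duration to hours."""
    h, m, s = (int(x) for x in hms.split(":"))
    return h + m / 60 + s / 3600


limit_gib = to_gib(limit_kb)            # exactly the mem=120gb request
used_gib = to_gib(used_kb)
over_gib = to_gib(used_kb - limit_kb)
vmem_gib = to_gib(vmem_kb)
walltime_days = hms_to_hours("136:17:10") / 24

print(f"limit {limit_gib:.1f} GiB, used {used_gib:.1f} GiB, "
      f"over by {over_gib:.1f} GiB, vmem {vmem_gib:.1f} GiB, "
      f"walltime {walltime_days:.1f} days")
# prints: limit 120.0 GiB, used 122.0 GiB, over by 2.0 GiB, vmem 201.7 GiB, walltime 5.7 days
```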
The files generated are:
4096 Apr 3 18:31 alignments ## empty directory
78263932 Apr 6 15:08 candidates_step_10.fa
78310448 Apr 6 20:41 candidates_step_11.fa
78330217 Apr 7 02:16 candidates_step_12.fa
78345513 Apr 7 07:50 candidates_step_13.fa
78355500 Apr 7 13:27 candidates_step_14.fa
78359544 Apr 7 19:00 candidates_step_15.fa
78364148 Apr 8 00:35 candidates_step_16.fa
78365950 Apr 8 06:10 candidates_step_17.fa
78366454 Apr 8 11:43 candidates_step_18.fa
78366454 Apr 8 17:17 candidates_step_19.fa
78366454 Apr 8 22:53 candidates_step_20.fa
78366454 Apr 9 04:27 candidates_step_21.fa
78366454 Apr 9 10:01 candidates_step_22.fa
77594624 Apr 9 10:01 candidates_step_23.fa
62809976 Apr 4 16:32 candidates_step_2.fa
67331407 Apr 4 23:02 candidates_step_3.fa
72215051 Apr 5 05:12 candidates_step_4.fa
75426509 Apr 5 11:05 candidates_step_5.fa
77103366 Apr 5 16:46 candidates_step_6.fa
77782120 Apr 5 22:22 candidates_step_7.fa
78103952 Apr 6 03:59 candidates_step_8.fa
78212789 Apr 6 09:35 candidates_step_9.fa
0 Apr 3 18:31 filtered_reads.fa
0 Apr 3 18:31 logfile.txt
So, can we use IsoCon for organisms with large polyploid genomes?
If so, do you have a way (a program) that we can use to finish the process without having to re-run everything and regenerate, in this case, these 23 .fa files?
Thank you,
Virgg
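On not regenerating the 23 files: the replies appear to point to resuming from the last complete candidates file. Here is a minimal Python sketch for picking that file from an `ls -l`-style listing like the one above. Following the file-size reasoning in the reply, it assumes that only the final file can have been cut off by the killed job, and it treats a size drop relative to the previous step as a sign of truncation. It also sorts steps numerically, because `ls` lists candidates_step_10.fa before candidates_step_2.fa. The function names are hypothetical, and the sketch does not show how to resume IsoCon itself.

```python
import re
from typing import List, Optional, Tuple

STEP_RE = re.compile(r"candidates_step_(\d+)\.fa$")


def parse_listing(lines: List[str]) -> List[Tuple[int, int]]:
    """Return (step, size_in_bytes) pairs sorted by step number.
    Each line is expected to start with the size and end with the file name."""
    steps = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        m = STEP_RE.search(parts[-1])
        if m:
            steps.append((int(m.group(1)), int(parts[0])))
    return sorted(steps)


def last_usable_step(steps: List[Tuple[int, int]]) -> Optional[int]:
    """Only the final file can have been cut off by the killed job, so drop it
    if it is smaller than the step before it; return the last remaining step."""
    if not steps:
        return None
    if len(steps) >= 2 and steps[-1][1] < steps[-2][1]:
        steps = steps[:-1]
    return steps[-1][0]
```

On the listing above this picks step 22, in line with the reply. Steps 18 to 22 all have the same size (78366454 bytes), which is consistent with the candidates having converged. Equal size alone does not prove identical content, though; comparing checksums would.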