-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mkVCF vcfcombine error #18
Comments
Can you provide the complete output? If you're running this on a SLURM hpc, then it would the "out" file. If you are running on a workstation in a bash terminal, then it would be what is output to the screen. This will help me understand the settings you are using and also allow me to identify the offending line of code. |
Hello Thanks for your help, |
These are the types of files that I was looking for. the You should have at least another out file from running mkVCF that would help me with your original issue. |
Further, dDocentHPC is compatible with the ddocent 2.7.8 dependencies and I do know there are unresolved issues with the 2.9.4 dependencies in dDocentHPC. That is probably why you're getting the mkVCF error. You can load the 2.7.8 dependencies here
|
Here are some output files from a test run, this has also the mkVCF output (the out files from above are from a run that did not reach mkVCF step). |
Thanks for this info, I will try v.2.7.8 now. (Sorry, I have overlooked this recommendation on Github) |
Hi, Related to this step: I assume it should be possible to provide an existing reference genome file to dDocentHPC, such as 'reference.fasta' in the classical dDocent script. I have provided a 'reference.fasta' file (also in the dDocentHPC format as 'reference.2.2.fasta') in the mkBAM folder and started the mkBAM analysis. mkBAM completes without error messages, but the -bam files are empty. How do I have to prepare an existing reference file for the mkBAM step. |
Installing ddocent 2.7.8 and bwa-meme seemed to solve the newest mkBAM issue. Will leave this open a little longer to confirm that mkVCF is working now |
Tried it till mkVCF and got this screen message. |
Getting back to the original error, which was not solved by the dDocent conda install package version,
vcfcombine ./raw.$CUTOFFS.vcf/raw.*.$CUTOFFS.vcf | sed -e 's/ \.\:/ \.\/\.\:/g' > TotalRawSNPs.$CUTOFFS.vcf My working version is from I hypothesize that the error is being triggered because there are no vcf files to combine, or they are all 0 bytes. To confirm, I would check to see if the sub vcf files were created by the solution aIf you do, indeed, have vcf files with data in CUTOFFS=2.2 #because these are the cutoffs you used in naming your ref
vcfcombine ./raw.$CUTOFFS.vcf/raw.*.$CUTOFFS.vcf | sed -e 's/ \.\:/ \.\/\.\:/g' > TotalRawSNPs.$CUTOFFS.vcf There are times where the sub vcf files are created in the dDocentHPC script, but they aren't combined into a single final file, but then the same line will run just fine on it's own. solution bif you don't have any vcf files or they are all empty, then the ls mapped.*.$CUTOFFS.bed | sed 's/mapped.//g' | sed 's/.bed//g' | shuf | $PARALLEL "freebayes -b cat.$CUTOFFS-RRG.bam -t mapped.{}.bed -v raw.{}.vcf -f reference.$CUTOFFS.fasta -p $PLOIDY -n $BEST_N_ALLELES -m $MIN_MAPPING_QUAL -q $MIN_BASE_QUAL -E $HAPLOTYPE_LENGTH --min-repeat-entropy $MIN_REPEAT_ENTROPY --min-coverage $MIN_COVERAGE -F $MIN_ALT_FRACTION -C $FREEBAYES_C -G $FREEBAYES_G -3 $FREEBAYES_3 -e $FREEBAYES_e -z $FREEBAYES_z -Q $FREEBAYES_Q -U $FREEBAYES_U -$ $FREEBAYES_DOLLAR --populations popmap.$CUTOFFS ${FREEBAYES_r}${FREEBAYES_report_monomorphic}${FREEBAYES_w}${FREEBAYES_V}${FREEBAYES_a}${FREEBAYES_no_partial_observations}" Check that you have a Check that you have If you are missing either then let me know. solution cIf you have the files necessary for # in the config file under mkVCF:
2 Minimum Mean Depth of Coverage Per Individual Limit analysis to contigs with at least the specified mean depth of coverage per individual
0.375 freebayes -F --min-alternate-fraction (decimal 0-1) There must be at least 1 individual with this fraction of alt reads to evaluate the position. If your individuals are barcoded, then use 0.2. If your data is pooled, then set based upon ~ 1/(numIndivids * ploidy) and average depth of coverage
20 freebayes -C --min-alternate-count (integer) Require at least this count of observations supporting an alternate allele within a single individual in order to evaluate the position. default: 2
20 freebayes -G --min-alternate-total (integer) Require at least this count of observations supporting an alternate allele within the total population in order to use the allele in analysis. default: 1
# this one might take too long to run if you change it
no freebayes --report-monomorphic (no|yes) Report even loci which appear to be monomorphic, and report allconsidered alleles, even those which are not in called genotypes. Loci which do not have any potential alternates have '.' for ALT.
Let me know how this goes |
And for what it's worth, here are the packages in the container I use to run dDocentHPC on an hpc: cbird@e3-w6420b-20:~$ crun conda list
# packages in environment at /opt/ddocent:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 0_gnu conda-forge
atk 2.36.0 0 conda-forge
atk-1.0 2.36.0 haf93ef1_0 conda-forge
bedtools 2.29.2 hc088bd4_0 bioconda
bwa 0.7.17 hed695b0_7 bioconda
bzip2 1.0.8 h516909a_2 conda-forge
ca-certificates 2020.4.5.1 hecc5488_0 conda-forge
cairo 1.16.0 hcf35c78_1003 conda-forge
cd-hit 4.8.1 hdbcaa40_0 bioconda
certifi 2020.4.5.1 py37hc8dfbb8_0 conda-forge
chrpath 0.16 h14c3975_1001 conda-forge
coreutils 8.31 h516909a_0 conda-forge
curl 7.68.0 h33f0ec9_1 conda-forge
dbus 1.13.6 he372182_0 conda-forge
ddocent 2.7.8 3 bioconda
expat 2.2.9 he1b5a44_2 conda-forge
fastp 0.20.0 hdbcaa40_0 bioconda
fontconfig 2.13.1 h86ecdb6_1001 conda-forge
freebayes 1.3.2 py37hc088bd4_0 bioconda
freetype 2.10.1 he06d7ca_0 conda-forge
fribidi 1.0.9 h516909a_0 conda-forge
gdk-pixbuf 2.38.2 h3f25603_3 conda-forge
gettext 0.19.8.1 hc5be6a0_1002 conda-forge
giflib 5.2.1 h516909a_2 conda-forge
glib 2.58.3 py37he00f558_1003 conda-forge
gnuplot 5.2.7 h0fb2448_3 conda-forge
gobject-introspection 1.58.2 py37h619baee_1004 conda-forge
graphite2 1.3.13 he1b5a44_1001 conda-forge
grep 2.14 h14c3975_3 bioconda
gst-plugins-base 1.14.5 h0935bb2_2 conda-forge
gstreamer 1.14.5 h36ae1b5_2 conda-forge
gtk2 2.24.32 h586f36d_1 conda-forge
harfbuzz 2.4.0 h9f30f68_3 conda-forge
htslib 1.9 h4da6232_3 bioconda
icu 64.2 he1b5a44_1 conda-forge
jpeg 9c h14c3975_1001 conda-forge
krb5 1.17.1 h2fd8d38_0 conda-forge
ld_impl_linux-64 2.34 h53a641e_0 conda-forge
libclang 9.0.1 default_hde54327_0 conda-forge
libcurl 7.68.0 hf7181ac_1 conda-forge
libdeflate 1.5 h516909a_0 conda-forge
libedit 3.1.20170329 hf8c457e_1001 conda-forge
libffi 3.2.1 he1b5a44_1007 conda-forge
libgcc-ng 9.2.0 h24d8f2e_2 conda-forge
libgd 2.2.5 h307a58e_1007 conda-forge
libgomp 9.2.0 h24d8f2e_2 conda-forge
libiconv 1.15 h516909a_1006 conda-forge
libllvm9 9.0.1 hc9558a2_0 conda-forge
libpng 1.6.37 hed695b0_1 conda-forge
libssh2 1.8.2 h22169c7_2 conda-forge
libstdcxx-ng 9.2.0 hdf63c60_2 conda-forge
libtiff 4.1.0 hc3755c2_3 conda-forge
libuuid 2.32.1 h14c3975_1000 conda-forge
libwebp 1.0.2 h56121f0_5 conda-forge
libxcb 1.13 h14c3975_1002 conda-forge
libxkbcommon 0.10.0 he1b5a44_0 conda-forge
libxml2 2.9.10 hee79883_0 conda-forge
lz4-c 1.8.3 he1b5a44_1001 conda-forge
mawk 1.3.4 h14c3975_2 bioconda
ncurses 6.1 hf484d3e_1002 conda-forge
nspr 4.25 he1b5a44_0 conda-forge
nss 3.47 he751ad9_0 conda-forge
openssl 1.1.1f h516909a_0 conda-forge
pango 1.42.4 h7062337_4 conda-forge
parallel 20200322 0 conda-forge
pcre 8.44 he1b5a44_0 conda-forge
pear 0.9.6 h98de208_5 bioconda
perl 5.26.2 h516909a_1006 conda-forge
pip 20.0.2 py_2 conda-forge
pixman 0.38.0 h516909a_1003 conda-forge
pthread-stubs 0.4 h14c3975_1001 conda-forge
python 3.7.6 h8356626_5_cpython conda-forge
python_abi 3.7 1_cp37m conda-forge
qt 5.12.5 hd8c4c69_1 conda-forge
rainbow 2.0.4 h14c3975_3 bioconda
readline 8.0 hf8c457e_0 conda-forge
samtools 1.9 h10a08f8_12 bioconda
sed 4.7 h1bed415_1000 conda-forge
seqtk 1.3 hed695b0_2 bioconda
setuptools 46.1.3 py37hc8dfbb8_0 conda-forge
sqlite 3.30.1 hcee41ef_0 conda-forge
tk 8.6.10 hed695b0_0 conda-forge
unzip 6.0 h516909a_0 conda-forge
vcflib 1.0.0_rc3 py37hc088bd4_0 bioconda
vcftools 0.1.16 he860b03_3 bioconda
wheel 0.34.2 py_1 conda-forge
xorg-kbproto 1.0.7 h14c3975_1002 conda-forge
xorg-libice 1.0.10 h516909a_0 conda-forge
xorg-libsm 1.2.3 h84519dc_1000 conda-forge
xorg-libx11 1.6.9 h516909a_0 conda-forge
xorg-libxau 1.0.9 h14c3975_0 conda-forge
xorg-libxdmcp 1.1.3 h516909a_0 conda-forge
xorg-libxext 1.3.4 h516909a_0 conda-forge
xorg-libxrender 0.9.10 h516909a_1002 conda-forge
xorg-libxt 1.2.0 h516909a_0 conda-forge
xorg-renderproto 0.11.1 h14c3975_1002 conda-forge
xorg-xextproto 7.3.0 h14c3975_1002 conda-forge
xorg-xproto 7.0.31 h14c3975_1007 conda-forge
xz 5.2.5 h516909a_0 conda-forge
zlib 1.2.11 h516909a_1006 conda-forge
zstd 1.4.4 h3b9ef0a_2 conda-forge |
I do have the cat-file: I do have a mapped bed file: The raw.2.2.vcf folder is indeed empty. |
There's a mawk script starting on line 2121 of dDocentHPC.bash that splits the bed file into N files based upon the number of threads available. If you don't have a set of bed files then I think that's where the issue is. In the present dDocentHPC.bash script, the line of code that deletes those bed files is commented out, so they should remain. You can try copying this mawk script on 2121 into a text editor, remove the tabs, then paste into your terminal and run
Get Outlook for Android<https://aka.ms/AAb9ysg>
…________________________________
From: ARW-UBT ***@***.***>
Sent: Wednesday, January 10, 2024 3:06:42 AM
To: cbirdlab/dDocentHPC ***@***.***>
Cc: Bird, Chris ***@***.***>; Comment ***@***.***>
Subject: Re: [cbirdlab/dDocentHPC] mkVCF vcfcombine error (Issue #18)
Check that you have a cat.2.2-RRG.bam file with >0 bytes.
Check that you have mapped.*.2.2.bed files with >0 bytes.
I do have the cat-file:
241M 9. Jan 16:13 cat.2.2-RRG.bam
I do have a mapped bed file:
362K 9. Jan 16:13 mapped.2.2.bed
but I am not sure whether ther should me more of them according to your wildcard.
The raw.2.2.vcf folder is indeed empty.
—
Reply to this email directly, view it on GitHub<#18 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ADBV4S5Y2LCHQG6NEE35TETYNZDZFAVCNFSM6AAAAABA7TMZZOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQOBUGM3DINRSHA>.
You are receiving this because you commented.Message ID: ***@***.***>
|
Additionally, to run the mawk script, you need to first set the variables
CUTOFFS=2.2
NUMProc=4 #this will cause 4 bed files to be made
and run the if then statement on 2111 (remove the tabs)
Get Outlook for Android<https://aka.ms/AAb9ysg>
…________________________________
From: Bird, Chris ***@***.***>
Sent: Wednesday, January 10, 2024 5:20:37 AM
To: cbirdlab/dDocentHPC ***@***.***>; cbirdlab/dDocentHPC ***@***.***>
Cc: Comment ***@***.***>
Subject: Re: [cbirdlab/dDocentHPC] mkVCF vcfcombine error (Issue #18)
There's a mawk script starting on line 2121 of dDocentHPC.bash that splits the bed file into N files based upon the number of threads available. If you don't have a set of bed files then I think that's where the issue is. In the present dDocentHPC.bash script, the line of code that deletes those bed files is commented out, so they should remain. You can try copying this mawk script on 2121 into a text editor, remove the tabs, then paste into your terminal and run
Get Outlook for Android<https://aka.ms/AAb9ysg>
________________________________
From: ARW-UBT ***@***.***>
Sent: Wednesday, January 10, 2024 3:06:42 AM
To: cbirdlab/dDocentHPC ***@***.***>
Cc: Bird, Chris ***@***.***>; Comment ***@***.***>
Subject: Re: [cbirdlab/dDocentHPC] mkVCF vcfcombine error (Issue #18)
Check that you have a cat.2.2-RRG.bam file with >0 bytes.
Check that you have mapped.*.2.2.bed files with >0 bytes.
I do have the cat-file:
241M 9. Jan 16:13 cat.2.2-RRG.bam
I do have a mapped bed file:
362K 9. Jan 16:13 mapped.2.2.bed
but I am not sure whether ther should me more of them according to your wildcard.
The raw.2.2.vcf folder is indeed empty.
—
Reply to this email directly, view it on GitHub<#18 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ADBV4S5Y2LCHQG6NEE35TETYNZDZFAVCNFSM6AAAAABA7TMZZOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQOBUGM3DINRSHA>.
You are receiving this because you commented.Message ID: ***@***.***>
|
In order to check if the thinned out data in my test run is causing some of the issues, I decided to use my old 150 bp paired end date from the previous dDocent analyses again (which were sucessfully analyzed by dDocent (and Stacks)) and to feed them into dDocentHPC:
The mkVCF step started and was qiute busy for > 20 minutes, but did stop with the error below. Now, however, without the messages related to the missing vcf or bed files (they are there). |
Oops! I just realized that the 33 vcf files in the raw folder were created, but are emty... |
That's progress that you have the vcf files, troubleshooting combining them together should be straight forward. I had a conversation with chatgpt and it's primary conclusion "that there's a mismatch between the vcfcombine executable and the version of the library (libvcflib.so.1) it is trying to use" seems reasonable. I suspect that As a quick test, you might try changing to the newer ddocent environment and try running the vcfcombine step again. If you look over the chatgpt transcript, it also suggests some solutions to resolve the issue in the older ddocent environment. I additionally tried to have chatgpt narrow down where the libvcflib.so.1 files are located in your dir. |
The vcf files in the raw.2.2.vcf directory are empty (0 Bytes), I assume the vcfcombine will not work. After that I would rerun vcfcombine from the newest dDocent environment. Does this make sense? |
Hello,
The dDocentHPC analyses completed almost to the end, but I get at the very end the following error message:
vcfcombine: symbol lookup error: /home/bt140047/mambaforge/envs/ddocent/bin/../lib/libvcflib.so.1: undefined symbol: wavefront_align
The ddocent conda env contains that vcflib package:
Name Version Build Channel
vcflib 1.0.9 h146fbdb_4 bioconda
And it seems that freebayes needs this version:
$ mamba repoquery whoneeds vcflib
Executing the query vcflib
Name Version Build Depends Channel
ddocent 2.9.4 hdfd78af_1 vcflib >=0.1.11 bioconda
freebayes 1.3.7 h1870644_0 vcflib >=1.0.9,<2.0a0 bioconda
Does anybody have a suggestion how to proceed?
Best regards,
The text was updated successfully, but these errors were encountered: