YaHS: yet another Hi-C scaffolding tool

Overview

YaHS is a scaffolding tool that uses Hi-C data. It relies on a new algorithm for contig joining detection which considers the topological distribution of Hi-C signals, aiming to distinguish real interaction signals from mapping noise. YaHS has been tested on a wide range of genome assemblies. Compared to other Hi-C scaffolding tools, it usually generates more contiguous scaffolds - especially with higher N90 and L90 statistics. It is also very fast - it takes less than 5 minutes to reconstruct the human genome from an assembly of 5,483 contigs with ~45X Hi-C data. See the poster presented at the Biodiversity Genomics 2021 conference for more information.

Installation

You need to have a C compiler, GNU make and zlib development files installed. Download the source code from this repo or with git clone https://github.com/c-zhou/yahs.git. Then type make in the source code directory to compile.
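For example:

git clone https://github.com/c-zhou/yahs.git
cd yahs
make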

Run YaHS

YaHS has two required inputs: a FASTA format file with contig sequences, which needs to be indexed (with samtools faidx, for example), and a BAM/BED/BIN file with the alignments of Hi-C reads to the contigs. A recommended way to generate the alignment file is to use the Arima Genomics mapping pipeline. It is also recommended to mark PCR/optical duplicates. Several tools are available for marking duplicates, such as bammarkduplicates2 from biobambam2 and MarkDuplicates from Picard. The resulting BAM file needs to be sorted by read name before being fed to YaHS; this can be done with samtools sort with the -n option. A preprocessing sketch is given below.

The BED format file can be generated from the BAM file (with bedtools bamtobed, for example), but do NOT forget to filter out the PCR/optical duplicates. The BED format is accepted mainly to keep consistent with other Hi-C scaffolding tools; there is no need to convert BAM to BED unless you want to compare YaHS to other tools.

The BIN format is a binary format specific to YaHS. If the input file is in BAM (.bam extension) or BED (.bed extension) format, the first step of YaHS is to convert it to BIN format (.bin extension). This saves running time, as multiple rounds of file IO are needed during the scaffolding process. If you have run YaHS and need to rerun it, the BIN file in the output directory can be reused to save some time - although it might be just a few minutes.
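
Here is a minimal preprocessing sketch, assuming the file names used below (the raw BAM comes from your Hi-C mapping pipeline):

# index the contig FASTA
samtools faidx contigs.fa
# mark PCR/optical duplicates, e.g. with bammarkduplicates2 from biobambam2
bammarkduplicates2 I=hic-to-contigs.raw.bam O=hic-to-contigs.mkdup.bam
# sort by read name before feeding the BAM to YaHS
samtools sort -n -@ 8 -o hic-to-contigs.bam hic-to-contigs.mkdup.bam
# optional: convert to BED, excluding marked duplicates (flag 1024)
# samtools view -b -F 1024 hic-to-contigs.bam | bedtools bamtobed -i stdin > hic-to-contigs.bed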

Here is an example of running YaHS:

yahs contigs.fa hic-to-contigs.bam

The outputs include several AGP format files and a FASTA format file. The *_inital_break_[0-9]{2}.agp files record the initial assembly error corrections. The *_r[0-9]{2}.agp and related *_r[0-9]{2}_break.agp files record the scaffolding results of each round. The *_scaffolds_final.agp and *_scaffolds_final.fa files contain the final scaffolding results.

There are some optional parameters.

With the -o option, you can specify the prefix of the output files. It is yahs.out by default. If the prefix includes a directory path, the directory must already exist.

With the -a option, you can specify an AGP format file; YaHS will use the scaffolds in the AGP file as the starting point for scaffolding.

With the -r option, you can specify a range of resolutions. It is 50000,100000,200000,500000,1000000,2000000,5000000,10000000,20000000 by default, and the upper limit is automatically adjusted to the genome size.

With the --no-contig-ec option, you can skip the initial assembly error correction step. This option is set automatically when -a is used.

With the --no-scaffold-ec option, YaHS will skip the scaffolding error check in each round, and no *_r[0-9]{2}_break.agp files will be produced.
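
For example, a run combining several of these options (file names assumed) might look like this:

yahs --no-contig-ec -o ./yahs_asm -r 500000,1000000,2000000,5000000 contigs.fa hic-to-contigs.bam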

Generate HiC contact maps

YaHS offers some auxiliary tools to help generate HiC contact maps for visualisation. A demo is provided in the bash script scripts/run_yahs.sh. To generate and visualise a HiC contact map, juicer_tools and Juicebox are required in addition to the juicer_pre tool provided by YaHS.

The first step is to convert the HiC alignment file (BAM/BED/BIN) to the file format required by juicer_tools, using the juicer_pre tool provided by YaHS. To save time, the BIN file already generated in the scaffolding step is the recommended input. Here is an example bash command:

(juicer_pre hic-to-contigs.bin scaffolds_final.agp contigs.fa.fai | sort -k2,2d -k6,6d -T ./ --parallel=8 -S32G | awk 'NF' > alignments_sorted.txt.part) && (mv alignments_sorted.txt.part alignments_sorted.txt)

The tool juicer_pre takes three positional parameters: the alignments of HiC reads to contigs, the scaffold AGP file and the contig FASTA index file. With the -o option, it will write the results to a file. Here, the output is directed to stdout because juicer_tools needs the file sorted by scaffold name.

For sorting, we use 8 threads, 32GB of memory and the current directory for temporary files. You might need to adjust these settings to match your machine.

The next step is to generate the HiC contact matrix using juicer_tools. Here is an example bash command:

(java -jar -Xmx32G juicer_tools_1.22.01.jar pre --threads 12 alignments_sorted.txt out.hic.part scaffolds_final.chrom.sizes) && (mv out.hic.part out.hic) && (rm alignments_sorted.txt)

The pre command of juicer_tools takes three positional parameters: the sorted alignment file generated in the first step, the output file name and the file of scaffold sizes. The scaffold sizes file should contain two columns - scaffold name and scaffold size - which can be taken from the first two columns of the scaffold FASTA index file.
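
For example, the scaffold sizes file can be built from the index of the final scaffold FASTA (file names assumed):

samtools faidx yahs.out_scaffolds_final.fa
cut -f1,2 yahs.out_scaffolds_final.fa.fai > scaffolds_final.chrom.sizes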

Finally, the output file out.hic can be loaded into Juicebox for visualisation. More information about juicer_tools and Juicebox can be found in their documentation.

Other tools

  • agp_to_fasta creates a FASTA file from an AGP file. It takes two positional parameters: the AGP file and the contig FASTA file. By default, the output is directed to stdout; you can write to a file with the -o option. It also allows changing the FASTA line width with the -l option, which is 60 by default. See the example below.
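
For example, with assumed file names:

agp_to_fasta yahs.out_scaffolds_final.agp contigs.fa -o scaffolds_final.fa -l 80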

Limitations

YaHS is still under development and has only been tested on genome assemblies from a limited number of species. You are welcome to use it and report failures. Any suggestions would be appreciated.

Comments
  • Creating .hic and .assembly for editing in juicebox

    Hi Chenxi,

    It looks like yahs will really speed up our scaffolding efforts - so far the scaffolded fastas are looking great. Awesome work! However, I'm having trouble creating the correct input files for our manual curation phase, editing the scaffolds in Juicebox. Our main goal is to relate the underlying contigs (especially when we use the yahs flag --no-contig-ec) to the assembled scaffolds and hic map.

    Using your provided juicebox_pre program and the Juicebox juicer pre, the resulting .hic and .assembly files are not correctly editable in Juicebox. I think SALSA users are having a similar issue https://github.com/marbl/SALSA/issues/154

    I think you can only create a draft assembly for editing with run-assembly-visualizer.sh (https://github.com/aidenlab/3d-dna/blob/master/visualize/run-asm-visualizer.sh). Normally our workflow looks like this. makeAgpFromFasta and agp2assembly.py are 3d-dna scripts. Matlock is provided by Phase Genomics - similar concept to your juicebox_pre to convert the alignments to alignments_sorted.txt.

    FA=yahs.out_scaffolds_final.fa
    makeAgpFromFasta.py $FA genome.agp
    agp2assembly.py genome.agp genome.assembly
    bwa index $FA
    bwa mem -5SP $FA *R1*fastq.gz *R2*fastq.gz | samblaster | samtools view -S -h -b -F 2316 > phasehic.aligned.bam
    matlock bam2 juicer phasehic.aligned.bam phasehic.links.txt
    sort -k2,2 -k6,6 phasehic.links.txt > phasehic.sorted.links.txt
    run-assembly-visualizer.sh -p false genome.assembly phasehic.sorted.links.txt
    

    It seems like I should be able to substitute the alignments_sorted.txt for phasehic.sorted.links.txt in our workflow, but alignments_sorted.txt is missing some columns. Maybe we just need to figure out how to fill these columns?

    [user@host yahs]$ head alignments_sorted.txt
    0	scaffold_1	100002074	0	1	scaffold_1	143542799	1
    0	scaffold_1	1000042	0	1	scaffold_1	1000223	1
    0	scaffold_1	100004260	0	1	scaffold_1	100004229	1
    0	scaffold_1	100004310	0	1	scaffold_1	100004310	1
    
    [user@host yahs]$ head phasehic.sorted.links
    0 scaffold_1 100002220 0 16 scaffold_1 100002439 1 1 - - 1  - - -
    0 scaffold_1 100002430 0 16 scaffold_1 100002441 1 1 - - 1  - - -
    0 scaffold_1 100002827 0 16 scaffold_1 96197983 1 1 - - 1  - - -
    0 scaffold_1 100002871 0 16 scaffold_1 100003104 1 1 - - 1  - - -
    

    Not sure if this is a very clear question. In short, can you provide any guidance on creating a .assembly and .hic file for the Juicebox run-assembly-visualizer.sh tool?

    Thank you! Amanda

    enhancement 
    opened by Astahlke 23
  • Starting scaffolding graph contruction

    Hi Chenxi,

    Thank you for writing such a handy tool.

    However, when I used yahs (version 1.2a) on a 1.1G genome, it caused a segmentation fault:

    [I::find_re_from_seqs] NO. restriction enzyme cutting sites found in sequences: 5043048
    [I::find_re_from_seqs] restriction enzyme cutting sites density: 0.004520
    [I::run_yahs] RAM total: 376.153GB
    [I::run_yahs] RAM limit: 356.828GB
    [I::contig_error_break] dist threshold for contig error break: 19830000
    [I::contig_error_break] performed 5 round assembly error correction. Made 52 breaks
    [I::print_asm_stats] assembly stats:
    [I::print_asm_stats] N50: 52262000 (n = 7)
    [I::print_asm_stats] N90: 6523145 (n = 30)
    [I::run_yahs] scaffolding round 1 resolution = 10000
    [I::run_scaffolding] starting norm estimation...
    [I::run_scaffolding] starting link estimation...
    [I::inter_link_norms] using noise level 0.001048956614
    [I::inter_link_norms] average link count: 4164.498389389475 1892474.000000000000 0.002200557783
    [I::run_scaffolding] starting scaffolding graph contruction...
    Segmentation fault (core dumped)

    What's the problem?

    Regards, PengjuZ

    opened by PengjuZ 12
  • error

    Hi,

    I am running the tool with the BAM output file from the Arima Genomics pipeline and a pseudohap style assembly from 10x Genomics. But this happens, and it seems to be a memory problem. Could you give me any suggestions, please?

    [I::main] dump hic links to binary file yahs.out.bin
    *** Error in `/work/tools/yahs/yahs': double free or corruption (fasttop): 0x0000000001fef610 ***
    ======= Backtrace: =========
    /lib64/libc.so.6(+0x81489)[0x2ab912b55489]
    /work/tools/yahs/yahs[0x412bb4]
    /work/tools/yahs/yahs[0x4317a8]
    /lib64/libc.so.6(__libc_start_main+0xf5)[0x2ab912af63d5]
    /work/tools/yahs/yahs[0x4021a9]
    ======= Memory map: ========
    00400000-00467000 r-xp 00000000 fe:1e55c 144125879188990391 /work/tools/yahs/yahs 00667000-00668000 r--p 00067000 fe:1e55c 144125879188990391 /work/tools/yahs/yahs 00668000-0066c000 rw-p 00068000 fe:1e55c 144125879188990391 /work/tools/yahs/yahs 0066c000-0066d000 rw-p 00000000 00:00 0 01997000-02006000 rw-p 00000000 00:00 0 [heap] 2ab912182000-2ab9121a4000 r-xp 00000000 fd:00 100667902 /usr/lib64/ld-2.17.so 2ab9121a4000-2ab9121a8000 rw-p 00000000 00:00 0 2ab9121cf000-2ab9121d4000 rw-p 00000000 00:00 0 2ab9123a3000-2ab9123a4000 r--p 00021000 fd:00 100667902 /usr/lib64/ld-2.17.so 2ab9123a4000-2ab9123a5000 rw-p 00022000 fd:00 100667902 /usr/lib64/ld-2.17.so 2ab9123a5000-2ab9123a6000 rw-p 00000000 00:00 0 2ab9123a6000-2ab9124a7000 r-xp 00000000 fd:00 100667917 /usr/lib64/libm-2.17.so 2ab9124a7000-2ab9126a6000 ---p 00101000 fd:00 100667917 /usr/lib64/libm-2.17.so 2ab9126a6000-2ab9126a7000 r--p 00100000 fd:00 100667917 /usr/lib64/libm-2.17.so 2ab9126a7000-2ab9126a8000 rw-p 00101000 fd:00 100667917 /usr/lib64/libm-2.17.so 2ab9126a8000-2ab9126bd000 r-xp 00000000 fd:00 100668244 /usr/lib64/libz.so.1.2.7 2ab9126bd000-2ab9128bc000 ---p 00015000 fd:00 100668244 /usr/lib64/libz.so.1.2.7 2ab9128bc000-2ab9128bd000 r--p 00014000 fd:00 100668244 /usr/lib64/libz.so.1.2.7 2ab9128bd000-2ab9128be000 rw-p 00015000 fd:00 100668244 /usr/lib64/libz.so.1.2.7 2ab9128be000-2ab9128d3000 r-xp 00000000 fd:00 100663369 /usr/lib64/libgcc_s-4.8.5-20150702.so.1 2ab9128d3000-2ab912ad2000 ---p 00015000 fd:00 100663369 /usr/lib64/libgcc_s-4.8.5-20150702.so.1 2ab912ad2000-2ab912ad3000 r--p 00014000 fd:00 100663369 /usr/lib64/libgcc_s-4.8.5-20150702.so.1 2ab912ad3000-2ab912ad4000 rw-p 00015000 fd:00 100663369 /usr/lib64/libgcc_s-4.8.5-20150702.so.1 2ab912ad4000-2ab912c96000 r-xp 00000000 fd:00 100667909 /usr/lib64/libc-2.17.so 2ab912c96000-2ab912e96000 ---p 001c2000 fd:00 100667909 /usr/lib64/libc-2.17.so 2ab912e96000-2ab912e9a000 r--p 001c2000 fd:00 100667909 /usr/lib64/libc-2.17.so 2ab912e9a000-2ab912e9c000 rw-p 001c6000 fd:00 100667909 /usr/lib64/libc-2.17.so 2ab912e9c000-2ab912ea1000 rw-p 00000000 00:00 0 2ab912ea1000-2ab912ea3000 r-xp 00000000 fd:00 100667915 /usr/lib64/libdl-2.17.so 2ab912ea3000-2ab9130a3000 ---p 00002000 fd:00 100667915 /usr/lib64/libdl-2.17.so 2ab9130a3000-2ab9130a4000 r--p 00002000 fd:00 100667915 /usr/lib64/libdl-2.17.so 2ab9130a4000-2ab9130a5000 rw-p 00003000 fd:00 100667915 /usr/lib64/libdl-2.17.so 2ab914000000-2ab914021000 rw-p 00000000 00:00 0 2ab914021000-2ab918000000 ---p 00000000 00:00 0 7ffcc042f000-7ffcc0453000 rw-p 00000000 00:00 0 [stack] 7ffcc056b000-7ffcc056d000 r-xp 00000000 00:00 0 [vdso] ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
    /var/spool/torque/mom_priv/jobs/5454962.apollo-acf.SC: line 72: 150539 Aborted (core dumped) $YAHS $ASSEMBLY_LOCATION/$SAMPLE/$SAMPLE.fasta $BAM_LOCATION/$BAM_FILE $OUTPUT_DIR/$LABEL_$SAMPLE

    bug 
    opened by memphisshuffle 9
  • agp_to_fasta fails on agp with header comments

    Hello! 👋🏼

    I was trying the agp_to_fasta tool included in the v1.1 yahs release (agp_to_fasta reports version 1.0, though), and it seems to fail on an AGP file that begins with header lines. I removed the header lines from the AGP and then the tool works perfectly.

    [user@host /lustre/fs5/vgl/scratch/labueg/temp]$ $STORE/programs/yahs-1.1/agp_to_fasta
    Usage: agp_to_fasta [options] <scaffolds.agp> <contigs.fa>
    Options:
        -l INT            line width [60]
        -o STR            output to file [stdout]
        --version         show version number
    [user@host /lustre/fs5/vgl/scratch/labueg/temp]$ $STORE/programs/yahs-1.1/agp_to_fasta mNycCou1_s1.agp mNycCou1_p1.fasta -o test
    [E::write_fasta_file_from_agp] sequence ?w?N not found
    [user@host /lustre/fs5/vgl/scratch/labueg/temp]$ head -n 1 mNycCou1_p1.fasta
    >ptg000001l_1
    [user@host /lustre/fs5/vgl/scratch/labueg/temp]$ head mNycCou1_s1.agp 
    ##agp-version	2.0
    # Organism:  
    # Platform:     
    # Model:        
    # Enzyme(s):    
    # BioSample:    
    # BioProject:   
    # Obj_Name	Obj_Start	Obj_End	PartNum	Compnt_Type	CompntId_GapLength	CompntStart_GapType	CompntEnd_Linkage	Orientation_LinkageEvidence
    Super-Scaffold_1	1	56613158	1	W	ptg000001l_1	1	56613158	+
    Super-Scaffold_12	1	936489	1	W	ptg000097l_1	158389	1094877	-
    [user@host /lustre/fs5/vgl/scratch/labueg/temp]$ grep -v "^#" mNycCou1_s1.agp > mNycCou1_s1.nocomment.agp
    [user@host /lustre/fs5/vgl/scratch/labueg/temp]$ $STORE/programs/yahs-1.1/agp_to_fasta mNycCou1_s1.nocomment.agp mNycCou1_p1.fasta -o test
    [I::main] Version: 1.0
    [I::main] CMD: /lustre/fs5/vgl/store/labueg/programs/yahs-1.1/agp_to_fasta -o test mNycCou1_s1.nocomment.agp mNycCou1_p1.fasta
    [I::main] Real time: 63.428 sec; CPU: 58.669 sec; Peak RSS: 2.815 GB
    

    Just wanted to bring this up in case it would be helpful for the tool to account for headers in AGPs, thank you for developing it!

    opened by abueg 8
  • What is RAM limit?

    Hi, YaHS says the RAM limit is too low. What is the RAM limit, and how can I change it?

    [I::run_yahs] RAM total: 125.555GB
    [I::run_yahs] RAM limit: 0.421GB
    [I::print_asm_stats] assembly stats:
    [I::print_asm_stats] N50: 136225 (n = 770)
    [I::print_asm_stats] N90: 23879 (n = 5328)
    [I::run_yahs] scaffolding round 1 resolution = 5000
    [I::run_scaffolding] starting norm estimation...
    [I::run_scaffolding] No enough memory. Try higher resolutions... End of scaffolding round.
    [I::run_scaffolding] RAM limit: 0.391GB
    [I::run_scaffolding] RAM required: 11.514GB

    Best, Kun

    opened by xiekunwhy 7
  • malloc(): memory corruption: ERROR while running yahs

    Hi,

    I have been trying to run yahs on a Slurm cluster. I keep getting the following error. Any idea what could cause it? Are there any requirements I missed when installing the tool?

    *** Error in `/home/aurli/yahs/yahs': malloc(): memory corruption: 0x00000000019f4cb0 ***
    ======= Backtrace: =========
    /lib64/libc.so.6(+0x82b36)[0x2ae0e06beb36]
    /lib64/libc.so.6(__libc_calloc+0xb4)[0x2ae0e06c2214]
    /home/aurli/yahs/yahs[0x41fc4d]
    /home/aurli/yahs/yahs[0x410a67]
    /home/aurli/yahs/yahs[0x411ead]
    /home/aurli/yahs/yahs[0x420723]
    /home/aurli/yahs/yahs[0x420ef1]
    /home/aurli/yahs/yahs[0x40631d]
    /lib64/libc.so.6(__libc_start_main+0xf5)[0x2ae0e065e555]
    /home/aurli/yahs/yahs[0x406b31]
    ======= Memory map: ========
    00400000-00426000 r-xp 00000000 00:2b 9246547079361141454 /domus/h1/aurli/yahs/yahs 00626000-00627000 r--p 00026000 00:2b 9246547079361141454 /domus/h1/aurli/yahs/yahs 00627000-00628000 rw-p 00027000 00:2b 9246547079361141454 /domus/h1/aurli/yahs/yahs 01323000-31cd6000 rw-p 00000000 00:00 0

    opened by aureliendejode 7
  • segmentation fault when scaffolding a plant genome

    Hi, @c-zhou

    I ran the latest yahs from GitHub to scaffold a 4.5G genome, but it ended with a segmentation fault. Here are the commands I used:

    # convert the merge_nodup of juicer output to bed
    utg000001l      68215   68365   A00301:339:HV7YCDSX2:3:2602:6045:36166/1        60      +
    utg000001l      68754   68904   A00301:339:HV7YCDSX2:3:2602:6045:36166/2        32      -
    utg000001l      68241   68391   A00234:846:HTWKCDSX2:1:1122:23809:13385/1       55      +
    utg000001l      68760   68910   A00234:846:HTWKCDSX2:1:1122:23809:13385/2       32      -
    utg000001l      68243   68393   A00234:846:HTWKCDSX2:1:1447:24704:2033/1        59      +
    utg000001l      68759   68909   A00234:846:HTWKCDSX2:1:1447:24704:2033/2        32      -
    utg000001l      68251   68401   A00234:846:HTWKCDSX2:1:1671:23637:36699/1       60      +
    utg000001l      68775   68925   A00234:846:HTWKCDSX2:1:1671:23637:36699/2       34      -
    utg000001l      68266   68416   A00301:340:HTWMWDSX2:3:2511:21052:6136/1        60      +
    
    # yahs
    /data/software/yahs/yahs --no-contig-ec -o yahs_Q30 contigs.fa yahs_Q30.bed
    
    # log
    
    [I::print_asm_stats] assembly stats:
    [I::print_asm_stats] N50: 1556949 (n = 730)
    [I::print_asm_stats] N90: 52585 (n = 5902)
    [I::run_yahs] scaffolding round 1 resolution = 10000
    [1]    48532 segmentation fault  /data/software/yahs/yahs/yahs --no-contig-ec -o yahs_Q30 contigs.fa
    
    
    opened by baozg 7
  • issue with large chromosomes :)

    Dear Chenxi,

    first of all, thank you for such a great tool. It worked great and very FAST for two of my projects. I had some issues in the beginning until I recognised that SALSA2 does not create a valid agp file, which cannot be used straightforwardly with yahs. But after solving this, a bird was scaffolded in minutes.

    I do have some issues with 2 other projects where the chromosomes are larger than uint32_t. Do you think you could provide an int64_t version of your code? That would be fantastic.

    Here is the stack trace of my problem:

    [I::run_yahs] N50: 1281118493 (n = 11)
    [I::run_yahs] N90: 7507670 (n = 157)
    =================================================================
    ==21583==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7fa54040d1f0 at pc 0x00000041addf bp 0x7ffc3b1e0a00 sp 0x7ffc3b1e09f8
    READ of size 8 at 0x7fa54040d1f0 thread T0
        #0 0x41adde in intra_link_mat_from_file /projects/dazzler/pippel/prog/yahs/link.c:458
        #1 0x43a69c in run_scaffolding /projects/dazzler/pippel/prog/yahs/yahs.c:146
        #2 0x43bede in run_yahs /projects/dazzler/pippel/prog/yahs/yahs.c:397
        #3 0x43d281 in main /projects/dazzler/pippel/prog/yahs/yahs.c:626
        #4 0x7fb60c488c04 in __libc_start_main (/lib64/libc.so.6+0x21c04)
        #5 0x402c78  (/lustre/projects/dazzler/pippel/prog/yahs/yahs.debug+0x402c78)
    
    0x7fa54040d1f0 is located 4784 bytes to the right of 96491328-byte region [0x7fa53a806800,0x7fa54040bf40)
    allocated by thread T0 here:
        #0 0x7fb60cdeb3b7 in __interceptor_calloc ../../.././libsanitizer/asan/asan_malloc_linux.cpp:154
        #1 0x4195ab in intra_link_mat_init /projects/dazzler/pippel/prog/yahs/link.c:284
        #2 0x41a6c7 in intra_link_mat_from_file /projects/dazzler/pippel/prog/yahs/link.c:432
        #3 0x43a69c in run_scaffolding /projects/dazzler/pippel/prog/yahs/yahs.c:146
        #4 0x43bede in run_yahs /projects/dazzler/pippel/prog/yahs/yahs.c:397
        #5 0x43d281 in main /projects/dazzler/pippel/prog/yahs/yahs.c:626
        #6 0x7fb60c488c04 in __libc_start_main (/lib64/libc.so.6+0x21c04)
    
    enhancement 
    opened by MartinPippel 7
  • Yahs run doesn't complete

    Hi

    I am trying to run yahs on a genome assembly generated with hifiasm. It looks like the run never completes. See the log below. Any thoughts on what could be causing it? The genome is indexed with samtools faidx and the BED was generated following the Arima Genomics guidelines.

    My command: yahs poa_with_HiC.asm.hic.hap1.p_ctg.fa paired_RG_duplicates_sorted_final.bed -e GATC,TNA,ANT,TA >./output_yahs.log 2>&1

    Thank you!

    [user@host yahs]$ cat output_yahs
    [I::find_re_from_seqs] NO. restriction enzyme cutting sites found in sequences: 553814228
    [I::find_re_from_seqs] restriction enzyme cutting sites density: 0.396177
    [I::main] dump hic links (BED) to binary file yahs.out.bin
    [I::dump_links_from_bed_file] 1 million records processed, 499999 read pairs
    [... similar progress messages for 2-160 million records ...]
    [I::dump_links_from_bed_file] 161 million records processed, 80499999 read pairs
    [I::dump_links_from_bed_file] dumped 80821099 read pairs from 161642198 records: 42809896 intra links + 38011203 inter links
    [I::run_yahs] RAM total: 2015.189GB
    [I::run_yahs] RAM limit: 1670.230GB
    [I::contig_error_break] dist threshold for contig error break: 9900000
    [I::contig_error_break] performed 2 round assembly error correction. Made 10 breaks
    [I::print_asm_stats] assembly stats:
    [I::print_asm_stats] N50: 79612346 (n = 6)
    [I::print_asm_stats] N90: 11622114 (n = 24)
    [I::print_asm_stats] N100: 2000 (n = 905)
    [I::run_yahs] scaffolding round 1 resolution = 10000
    [I::run_scaffolding] starting norm estimation...
    [I::run_scaffolding] starting link estimation...
    [I::inter_link_norms] using noise level 0.000
    [I::inter_link_norms] average link count: 981.501 3254199.000 0.000
    [I::run_scaffolding] starting scaffolding graph contruction...
    Full Command: /nfs4/ROOTS/Brunharo_Lab/poa_genome_assembly/yahs/yahs/yahs poa_with_HiC.asm.hic.hap1.p_ctg.fa paired_RG_duplicates_sorted_final.bed -e GATC,TNA,ANT,TA
    Memory (kb): 6942544
    # SWAP (freq): 0
    # Waits (freq): 4631
    CPU (percent): 66%
    Time (seconds): 253.25
    Time (hh:mm:ss.ms): 4:13.25
    System CPU Time (seconds): 3.34
    User CPU Time (seconds): 166.17

    opened by caiobrunharo 5
  • assembly N50 (12787427) too small. End of scaffolding

    Hi Chenxi,

    I sorted my BAM file by read names. The log file says:

    [I::dump_links_from_bam_file] 314 million records processed, 154274614 read pairs
    assembly N50 (12787427) too small. End of scaffolding.

    I think my genome contigs are relatively complete. I changed the parameters and inputs many times, but the log still says End of scaffolding.

    Best ! Guo Cheng

    [I::dump_links_from_bam_file] 314 million records processed, 154274614 read pairs
    [I::dump_links_from_bam_file] dumped 154542127 read pairs: 95763255 intra links + 19261793 inter links
    [I::run_yahs] RAM total: 187.400GB
    [I::run_yahs] RAM limit: 1.083GB
    [I::contig_error_break] dist threshold for contig error break: 1000000
    [I::contig_error_break] performed 2 round assembly error correction. Made 9 breaks
    [I::print_asm_stats] assembly stats:
    [I::print_asm_stats] N50: 10803692 (n = 16)
    [I::print_asm_stats] N90: 3852502 (n = 40)
    [I::run_yahs] scaffolding round 1 resolution = 10000
    [I::run_scaffolding] starting norm estimation...
    [I::run_scaffolding] starting link estimation...
    [I::inter_link_norms] using noise level 0.006897546529
    [I::inter_link_norms] average link count: 11438.389371032354 5037531.000000000000 0.002270634041
    [I::run_scaffolding] starting scaffolding graph contruction...
    [I::print_asm_stats] assembly stats:
    [I::print_asm_stats] N50: 11471531 (n = 16)
    [I::print_asm_stats] N90: 4097666 (n = 36)
    [I::run_yahs] scaffolding round 2 resolution = 20000
    [I::run_scaffolding] starting norm estimation...
    [I::run_scaffolding] starting link estimation...
    [I::inter_link_norms] using noise level 0.029375186223
    [I::inter_link_norms] average link count: 8939.121059218443 5723120.000000000000 0.001561931439
    [I::run_scaffolding] starting scaffolding graph contruction...
    [I::print_asm_stats] assembly stats:
    [I::print_asm_stats] N50: 11595593 (n = 16)
    [I::print_asm_stats] N90: 4205433 (n = 34)
    [I::run_yahs] scaffolding round 3 resolution = 50000
    [I::run_scaffolding] starting norm estimation...
    [I::run_scaffolding] starting link estimation...
    [I::inter_link_norms] using noise level 0.251675046921
    [I::inter_link_norms] average link count: 5536.808779269413 3736534.000000000000 0.001481803398
    [I::run_scaffolding] starting scaffolding graph contruction...
    [I::print_asm_stats] assembly stats:
    [I::print_asm_stats] N50: 11870607 (n = 15)
    [I::print_asm_stats] N90: 4312936 (n = 33)
    [I::run_yahs] scaffolding round 4 resolution = 100000
    [I::run_scaffolding] starting norm estimation...
    [I::run_scaffolding] starting link estimation...
    [I::inter_link_norms] using noise level 1.878855434932
    [I::inter_link_norms] average link count: 2203.180432421749 1871969.000000000000 0.001176932114
    [I::run_scaffolding] starting scaffolding graph contruction...
    [I::print_asm_stats] assembly stats:
    [I::print_asm_stats] N50: 11870607 (n = 15)
    [I::print_asm_stats] N90: 4205433 (n = 34)
    [I::run_yahs] scaffolding round 5 resolution = 200000
    [I::run_scaffolding] starting norm estimation...
    [I::run_scaffolding] starting link estimation...
    [I::inter_link_norms] using noise level 0.969795918367
    [I::inter_link_norms] average link count: 4.085832445017 1225.000000000000 0.003335373425
    [I::run_scaffolding] starting scaffolding graph contruction...
    [I::print_asm_stats] assembly stats:
    [I::print_asm_stats] N50: 12200854 (n = 15)
    [I::print_asm_stats] N90: 5874100 (n = 32)
    [I::run_yahs] scaffolding round 6 resolution = 500000
    [I::run_scaffolding] starting norm estimation...
    [I::run_scaffolding] starting link estimation...
    [I::inter_link_norms] using noise level 45.104278914056
    [I::inter_link_norms] average link count: 75.156734682509 69779.000000000000 0.001077068096
    [I::run_scaffolding] starting scaffolding graph contruction...
    [I::print_asm_stats] assembly stats:
    [I::print_asm_stats] N50: 12685561 (n = 15)
    [I::print_asm_stats] N90: 6205944 (n = 31)
    [I::run_yahs] scaffolding round 7 resolution = 1000000
    [I::run_scaffolding] starting norm estimation...
    [I::run_scaffolding] starting link estimation...
    [I::inter_link_norms] using noise level 172.238464437625
    [I::inter_link_norms] average link count: 25.031645569828 16459.000000000000 0.001520848507
    [I::run_scaffolding] starting scaffolding graph contruction...
    [I::print_asm_stats] assembly stats:
    [I::print_asm_stats] N50: 12787427 (n = 14)
    [I::print_asm_stats] N90: 6205944 (n = 31)
    [I::run_yahs] scaffolding round 8 resolution = 2000000
    [I::run_yahs] assembly N50 (12787427) too small. End of scaffolding.
    [I::main] writing FASTA file for scaffolds

    opened by guo-cheng 4
  • Support chromap pairs format?

    Hi Chenxi,

    Can yahs support the pairs format, or the BAM (missing some information) produced by chromap? It is much faster than bwa and gives similar results. juicer_tools can accept the pairs format for heatmaps.

    Best Zhigui

    opened by baozg 4
  • Contig was cutted continually

    Hi,

    I found that YaHS cuts some contigs repeatedly, like the following (utg261 in scaffold_28). Why not join these consecutive pieces, or add an option to join them?

    scaffold_28 1 22000 1 W utg261 13001 35000 +
    scaffold_28 22001 22200 2 N 200 scaffold yes proximity_ligation
    scaffold_28 22201 34200 3 W utg261 35001 47000 +
    scaffold_28 34201 34400 4 N 200 scaffold yes proximity_ligation
    scaffold_28 34401 454400 5 W utg261 47001 467000 +
    scaffold_28 454401 454600 6 N 200 scaffold yes proximity_ligation
    scaffold_28 454601 1549263 7 W utg413 9001 1103663 +
    scaffold_28 1549264 1549463 8 N 200 scaffold yes proximity_ligation
    scaffold_28 1549464 1849806 9 W utg6105 1 300343 -
    scaffold_28 1849807 1850006 10 N 200 scaffold yes proximity_ligation
    scaffold_28 1850007 1896624 11 W utg6020 378001 424618 -
    scaffold_28 1896625 1896824 12 N 200 scaffold yes proximity_ligation
    scaffold_28 1896825 2274824 13 W utg6020 1 378000 -
    scaffold_28 2274825 2275024 14 N 200 scaffold yes proximity_ligation
    scaffold_28 2275025 3467380 15 W utg675 1 1192356 +
    scaffold_28 3467381 3467580 16 N 200 scaffold yes proximity_ligation
    scaffold_28 3467581 3952820 17 W utg1310 1 485240 +

    Best, Kun

    opened by xiekunwhy 1
  • Running yahs by Hi-C library

    Hi

    I tried to scaffold my contig-level genome with my Hi-C data but failed with a segmentation fault. I think this error might be caused by the large Hi-C alignment, which is about 700GB in BAM format... So is there any way to run yahs on each Hi-C library separately and merge at some specific step to avoid the segmentation fault (core dumped) issue?

    Here are the command i used and error messages.

    $ yahs Combined_pseudohap.phased.filtered.0.arcs.fasta Pinetree_HiC.bwa_aln.bam >yahs.log 2>yahs.log2

    1756222 Segmentation fault (core dumped) yahs Combined_pseudohap.phased.filtered.0.arcs.fasta Pinetree_HiC.bwa_aln.bam > yahs.log 2> yahs.log2

    [I::dump_links_from_bam_file] dumped 1240693787 read pairs from 8033981476 records: 710480207 intra links + 530213580 inter links
    [I::run_yahs] RAM total: 1133.532GB
    [I::run_yahs] RAM limit: 3.019GB
    [I::contig_error_break] dist threshold for contig error break: 1000000

    Thank you!

    Sincerely, MJ

    opened by minjeongjj 1
  • Segmentation fault

    Hi

    I am trying to scaffold the genome of a plant species with YAHS. However, I am getting an error that I can't seem to work around. Here is the log file; any suggestions on how I can make this work? Thanks!

    [I::find_re_from_seqs] NO. restriction enzyme cutting sites found in sequences: 1184243796
    [I::find_re_from_seqs] restriction enzyme cutting sites density: 0.414708
    [I::main] dump hic links (BAM) to binary file yahs.out.bin
    [I::dump_links_from_bam_file] 1 million records processed, 499999 read pairs
    [... similar progress messages for 2-36 million records ...]
    [I::dump_links_from_bam_file] 37 million records processed, 18499999 read pairs
    [I::dump_links_from_bam_file] dumped 18744706 read pairs from 37489412 records: 7902407 intra links + 10842299 inter links
    [I::run_yahs] RAM total: 2015.189GB
    [I::run_yahs] RAM limit: 91.352GB
    [DEBUG::run_yahs] perform contig error break...
    [DEBUG::estimate_dist_thres_from_file] 18744706 read pairs processed, intra links: 7902407
    [I::contig_error_break] dist threshold for contig error break: 1000000
    [DEBUG::link_mat_from_file] 18744706 read pairs processed, intra links: 6736366
    [DEBUG::contig_error_break] number contig breaks in round 1: 213
    [DEBUG::link_mat_from_file] 18744706 read pairs processed, intra links: 6734440
    [DEBUG::contig_error_break] number contig breaks in round 2: 75
    [DEBUG::link_mat_from_file] 18744706 read pairs processed, intra links: 6733786
    [DEBUG::contig_error_break] number contig breaks in round 3: 31
    [DEBUG::link_mat_from_file] 18744706 read pairs processed, intra links: 6733580
    [DEBUG::contig_error_break] number contig breaks in round 4: 11
    [DEBUG::link_mat_from_file] 18744706 read pairs processed, intra links: 6733499
    [DEBUG::contig_error_break] number contig breaks in round 5: 4
    [DEBUG::link_mat_from_file] 18744706 read pairs processed, intra links: 6733491
    [DEBUG::contig_error_break] number contig breaks in round 6: 1
    [DEBUG::link_mat_from_file] 18744706 read pairs processed, intra links: 6733490
    [DEBUG::contig_error_break] number contig breaks in round 7: 0
    [I::contig_error_break] performed 7 round assembly error correction. Made 335 breaks
    [DEBUG::run_yahs] contig error break done
    [I::print_asm_stats] assembly stats:
    [I::print_asm_stats] N10: 25519767 (n = 8)
    [I::print_asm_stats] N20: 20357312 (n = 21)
    [I::print_asm_stats] N30: 16564000 (n = 37)
    [I::print_asm_stats] N40: 13303000 (n = 56)
    [I::print_asm_stats] N50: 10686634 (n = 80)
    [I::print_asm_stats] N60: 7661000 (n = 112)
    [I::print_asm_stats] N70: 5459000 (n = 157)
    [I::print_asm_stats] N80: 3688107 (n = 219)
    [I::print_asm_stats] N90: 1887573 (n = 324)
    [I::print_asm_stats] N100: 1000 (n = 1162)
    [I::run_yahs] scaffolding round 1 resolution = 10000
    [I::run_scaffolding] starting norm estimation...
    [DEBUG::intra_link_mat_from_file] 18744706 read pairs processed, 7687280 intra links
    [I::run_scaffolding] starting link estimation...
    [DEBUG::inter_link_mat_from_file] 18744706 read pairs processed, 11056760 inter links
    [DEBUG::inter_link_mat_from_file] within radius 1: 844
    [I::inter_link_norms] using noise level 0.000
    [I::inter_link_norms] average link count: 440.383 659526.000 0.001
    [I::run_scaffolding] starting scaffolding graph contruction...
    [DEBUG::run_yahs] perform scaffold error break
    *** Error in `/nfs4/genome_assembly/yahs/yahs/yahs': free(): invalid pointer: 0x0000000081006e6f ***
    ======= Backtrace: =========
    /usr/lib64/libc.so.6(+0x81329)[0x2b0348084329]
    /nfs4/genome_assembly/yahs/yahs/yahs[0x41aca7]
    /nfs4/genome_assembly/yahs/yahs/yahs[0x4217a3]
    /nfs4/genome_assembly/yahs/yahs/yahs[0x421e21]
    /nfs4/genome_assembly/yahs/yahs/yahs[0x4063c2]
    /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x2b0348025555]
    /nfs4/genome_assembly/yahs/yahs/yahs[0x406c43]
    ======= Memory map: ========
    00400000-00428000 r-xp 00000000 00:27 24861649 /nfs4/genome_assembly/yahs/yahs/yahs 00627000-00628000 r--p 00027000 00:27 24861649 /nfs4/genome_assembly/yahs/yahs/yahs 00628000-00629000 rw-p 00028000 00:27 24861649 /nfs4/genome_assembly/yahs/yahs/yahs 00ee4000-c55ee000 rw-p 00000000 00:00 0 [heap] 2b03478c7000-2b03478e9000 r-xp 00000000 08:03 20025425 /usr/lib64/ld-2.17.so 2b03478e9000-2b03478ef000 rw-p 00000000 00:00 0 2b0347ae8000-2b0347ae9000 r--p 00021000 08:03 20025425 /usr/lib64/ld-2.17.so 2b0347ae9000-2b0347aea000 rw-p 00022000 08:03 20025425 /usr/lib64/ld-2.17.so 2b0347aea000-2b0347aeb000 rw-p 00000000 00:00 0 2b0347aeb000-2b0347bec000 r-xp 00000000 08:03 16849272 /usr/lib64/libm-2.17.so 2b0347bec000-2b0347deb000 ---p 00101000 08:03 16849272 /usr/lib64/libm-2.17.so 2b0347deb000-2b0347dec000 r--p 00100000 08:03 16849272 /usr/lib64/libm-2.17.so 2b0347dec000-2b0347ded000 rw-p 00101000 08:03 16849272 /usr/lib64/libm-2.17.so 2b0347ded000-2b0347e03000 r-xp 00000000 00:2a 9055725375 /local/cluster/lib/libz.so.1.2.8 2b0347e03000-2b0348002000 ---p 00016000 00:2a 9055725375 /local/cluster/lib/libz.so.1.2.8 2b0348002000-2b0348003000 rw-p 00015000 00:2a 9055725375 /local/cluster/lib/libz.so.1.2.8 2b0348003000-2b03481c7000 r-xp 00000000 08:03 20025432 /usr/lib64/libc-2.17.so 2b03481c7000-2b03483c6000 ---p 001c4000 08:03 20025432 /usr/lib64/libc-2.17.so 2b03483c6000-2b03483ca000 r--p 001c3000 08:03 20025432 /usr/lib64/libc-2.17.so 2b03483ca000-2b03483cc000 rw-p 001c7000 08:03 20025432 /usr/lib64/libc-2.17.so 2b03483cc000-2b034ebd5000 rw-p 00000000 00:00 0 2b034ebd5000-2b034ebea000 r-xp 00000000 08:03 16849266 /usr/lib64/libgcc_s-4.8.5-20150702.so.1 2b034ebea000-2b034ede9000 ---p 00015000 08:03 16849266 /usr/lib64/libgcc_s-4.8.5-20150702.so.1 2b034ede9000-2b034edea000 r--p 00014000 08:03 16849266 /usr/lib64/libgcc_s-4.8.5-20150702.so.1 2b034edea000-2b034edeb000 rw-p 00015000 08:03 16849266 /usr/lib64/libgcc_s-4.8.5-20150702.so.1 2b0350000000-2b0350021000 rw-p 00000000 00:00 0 2b0350021000-2b0354000000 ---p 00000000 00:00 0 2b03f9ec9000-2b0409eca000 rw-p 00000000 00:00 0 2b0409eca000-2b050df1f000 rw-p 00000000 00:00 0 7ffc8e79c000-7ffc8e7c3000 rw-p 00000000 00:00 0 [stack] 7ffc8e7d2000-7ffc8e7d4000 r-xp 00000000 00:00 0 [vdso] ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]

    opened by badplantgeek 4
  • No reads in Hi-C contact matrices. This could be because the MAPQ filter is set too high (-q) or because all reads map to the same fragment

    Dear Authors,

    Thanks for the great tool. I am using this tool to assemble a de novo vertebrate genome. After running juicer pre -a -o out_JBAT hic-to-contigs.bin scaffolds_final.agp contigs.fa.fai >out_JBAT.log 2>&1 I got 5 files:
    out_JBAT.txt
    out_JBAT.liftover.agp
    out_JBAT.assembly
    out_JBAT.assembly.agp
    out_JBAT.log

    After running (java -jar -Xmx32G juicer_tools.1.9.9_jcuda.0.8.jar pre out_JBAT.txt out_JBAT.hic.part <(cat out_JBAT.log | grep PRE_C_SIZE | awk '{print $2" "$3}')) && (mv out_JBAT.hic.part out_JBAT.hic) I got the following error:

    Skipping PRE_C_SIZE: assembly 553400379
    java.lang.RuntimeException: No reads in Hi-C contact matrices. This could be because the MAPQ filter is set too high (-q) or because all reads map to the same fragment.
        at juicebox.tools.utils.original.Preprocessor$MatrixZoomDataPP.mergeAndWriteBlocks(Preprocessor.java:1650)
        at juicebox.tools.utils.original.Preprocessor$MatrixZoomDataPP.access$000(Preprocessor.java:1419)
        at juicebox.tools.utils.original.Preprocessor.writeMatrix(Preprocessor.java:832)
        at juicebox.tools.utils.original.Preprocessor.writeBody(Preprocessor.java:582)
        at juicebox.tools.utils.original.Preprocessor.preprocess(Preprocessor.java:346)
        at juicebox.tools.clt.old.PreProcessing.run(PreProcessing.java:116)
        at juicebox.tools.HiCTools.main(HiCTools.java:96)

    Could you please suggest a solution for this?

    Thank you Vinita

    opened by vinitamehlawat 2
  • Generate Hi-C contact maps error: [E::make_asm_dict_from_agp] sequence  not found

    Dear yahs developers,

    I'm trying to scaffold a de novo assembly following your pipeline.

    I'm at the step "Generate Hi-C contact maps" and trying to run the first command:

    /software/yahs/juicer pre ST.yash.out.bin ST.yash.out_scaffolds_final.agp STpurged.fa.fai

    [E::make_asm_dict_from_agp] sequence not found
    Segmentation fault

    My file sizes:
    1.3G ST.yash.out.bin
    32K ST.yash.out_scaffolds_final.agp

    STpurged.fa.fai is the index for the original de novo assembly that I want to scaffold.

    What could be the issue?

    Thank you Alex

    opened by alexjvr1 15
Releases(v1.2a.1.patch)
  • v1.2a.1.patch(Aug 4, 2022)

  • v1.2a.1(Jul 27, 2022)

  • v1.2a(Jul 8, 2022)

    In this release, the mapping quality score is added to the binary file, which makes BIN files generated with previous versions incompatible. If you see error messages indicating that the BIN file is not valid, you need to rerun YaHS with a BED or BAM file to regenerate the BIN file.

    This change mainly aims to reduce false-positive contig breaks in repetitive regions. When mapping quality filtering is applied, the HiC coverage in these regions is usually low, leading to excessive contig breaks. It is therefore highly recommended to keep low mapping quality reads in the YaHS input file. All reads are used in the contig error detection stage; in the scaffolding stage, the -q parameter (10 by default) is applied to select valid read pairs.
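
    For example, assuming an unfiltered, name-sorted BAM, you can keep all reads in the input and control only the scaffolding-stage filter with -q:

    yahs -q 10 contigs.fa hic-to-contigs.unfiltered.bam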

    Source code(tar.gz)
    Source code(zip)
  • v1.1(Jun 27, 2022)

  • 1.1a.2(May 5, 2022)

  • 1.1a.1(Apr 25, 2022)

    Pre-release for version 1.1. Bug fix for BAM input processing. Support for >4G scaffolds. Experimental implementation for restriction enzymes.

    Source code(tar.gz)
    Source code(zip)
  • v1.1a(Dec 11, 2021)

    Pre-release for version 1.1. Bug fix and new strategies for scaffolding graph pruning. This version should be more reliable in dealing with high-level background noise.

    Source code(tar.gz)
    Source code(zip)
  • v1.0(Oct 7, 2021)
