WFA2-lib: Wavefront alignment algorithm library v2

1. INTRODUCTION

1.1 What is WFA?

The wavefront alignment (WFA) algorithm is an exact gap-affine algorithm that takes advantage of homologous regions between the sequences to accelerate the alignment process. Unlike traditional dynamic programming algorithms, which run in quadratic time, the WFA runs in O(ns+s^2) time, proportional to the sequence length n and the alignment score s, using O(s^2) memory. Moreover, the WFA algorithm exhibits simple computational patterns that modern compilers can automatically vectorize for different architectures without adapting the code. To illustrate intuitively why the WFA algorithm is so interesting, look at the following figure. The left panel shows the cells computed by a classical dynamic-programming-based algorithm (like Smith-Waterman or Needleman-Wunsch). In contrast, the right panel shows the cells computed by the WFA algorithm to obtain the same result (i.e., the optimal alignment).
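
To make the wavefront idea concrete, here is a minimal sketch for plain edit distance only. It is not the library's implementation (which also handles gap-affine penalties and CIGAR recovery); the names wf_edit_distance, wf_valid, and WF_NONE are made up for this illustration. For each score s it keeps, per diagonal k, the furthest text offset reachable, extends along matches for free, and stops as soon as the final cell is reached.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define WF_NONE (-1000000000)  /* marks a diagonal that has not been reached */

/* Returns 1 if text offset h on diagonal k (k = h - v) lies inside the DP matrix. */
static int wf_valid(int h, int k, int plen, int tlen) {
  const int v = h - k;  /* pattern offset */
  return h >= 0 && h <= tlen && v >= 0 && v <= plen;
}

/* Edit distance between pattern and text (hypothetical helper, not part of WFA2-lib).
 * Returns -1 only if memory allocation fails. */
int wf_edit_distance(const char* pattern, const char* text) {
  assert(pattern != NULL && text != NULL);
  const int plen = (int)strlen(pattern);
  const int tlen = (int)strlen(text);
  const int max_s = plen + tlen;      /* the distance never exceeds this bound */
  const int k_final = tlen - plen;    /* diagonal of the bottom-right cell */
  const size_t width = (size_t)(2 * max_s + 3);
  int* a = malloc(width * sizeof(int));
  int* b = malloc(width * sizeof(int));
  if (a == NULL || b == NULL) { free(a); free(b); return -1; }
  for (size_t i = 0; i < width; ++i) a[i] = b[i] = WF_NONE;
  int* c = a + max_s + 1;             /* c[k]: furthest offset on diagonal k, score s */
  int* n = b + max_s + 1;             /* n[k]: same for score s+1 */
  int result = -1;
  c[0] = 0;
  for (int s = 0; s <= max_s; ++s) {
    /* Extend: follow runs of matching characters at no cost */
    for (int k = -s; k <= s; ++k) {
      int h = c[k];
      if (h == WF_NONE) continue;
      while (h < tlen && h - k < plen && pattern[h - k] == text[h]) ++h;
      c[k] = h;
    }
    /* Done once the final diagonal reaches the end of the text */
    if (k_final >= -s && k_final <= s && c[k_final] == tlen) { result = s; break; }
    if (s == max_s) break;
    /* Next: spend one more edit (insertion, mismatch, or deletion) */
    for (int k = -(s + 1); k <= s + 1; ++k) {
      const int ins = c[k - 1] + 1;   /* consume one text character */
      const int mis = c[k] + 1;       /* consume one character of each */
      const int del = c[k + 1];       /* consume one pattern character */
      int best = WF_NONE;
      if (wf_valid(ins, k, plen, tlen) && ins > best) best = ins;
      if (wf_valid(mis, k, plen, tlen) && mis > best) best = mis;
      if (wf_valid(del, k, plen, tlen) && del > best) best = del;
      n[k] = best;
    }
    int* tmp = c; c = n; n = tmp;     /* only two wavefronts are kept */
  }
  free(a);
  free(b);
  return result;
}
```

For example, wf_edit_distance("kitten","sitting") returns 3 after visiting only the diagonals touched by scores 0 to 3. Because only the current and next wavefronts are stored, this sketch computes the score but cannot recover the alignment itself.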

1.2 What is WFA2-lib?

The WFA2 library implements the WFA algorithm for different distance metrics and alignment modes. It supports various distance functions: indel, edit, gap-linear, gap-affine, and dual-cost gap-affine distances. The library allows computing only the score or the complete alignment (i.e., CIGAR) (see Alignment Scope). Also, the WFA2 library supports computing end-to-end alignments (a.k.a. global alignment) and ends-free alignments (including semi-global, glocal, and extension alignment) (see Alignment Span). In the case of long and noisy alignments, the library provides different low-memory modes that significantly reduce the memory usage of the naive WFA algorithm implementation. Beyond the exact-alignment modes, the WFA2 library implements heuristic modes that dramatically accelerate the alignment computation. Additionally, the library provides many other support functions to display and verify alignment results, control the overall memory usage, and more.

1.3 Getting started

Git clone and compile the library, tools, and examples.

$> git clone https://github.com/smarco/WFA2-lib
$> cd WFA2-lib
$> make clean all

1.4 Contents (where to go from here)

Section WFA2-lib features explores the most relevant options and features of the library. Then, the folder tools/ contains tools that can be used to execute and understand the WFA2 library capabilities. Additionally, the folder examples/ contains simple examples illustrating how to integrate the WFA2 code into any tool.

1.5 Important notes and clarifications

  • The WFA algorithm is an exact algorithm. If no heuristic is applied (e.g., band or adaptive pruning), the core algorithm is guaranteed to find the optimal solution (i.e., best alignment score). Since its first release, some authors have referred to the WFA as approximate or heuristic, which is NOT the case.

  • Given two sequences of length n, traditional dynamic-programming (DP) based methods (like Smith-Waterman or Needleman-Wunsch) compute the optimal alignment in O(n^2) time, using O(n^2) memory. In contrast, the WFA algorithm requires O(ns+s^2) time and O(s^2) memory (where s is the optimal alignment score). Therefore, the memory consumption of the WFA algorithm is not intrinsically higher than that of other methods. Most DP-based methods can use heuristics (like banded, X-drop, or Z-drop) to reduce the execution time and the memory usage at the expense of losing accuracy. Likewise, the WFA algorithm can also use heuristics to reduce the execution time and memory usage.

  • A note for the fierce competitors. I can understand that science and publishing have become a fierce competition these days. Many researchers want their methods to be successful and popular, seeking funding, tenure, or even fame. If you are going to benchmark the WFA using the least favourable configuration, careless programming, and a disadvantageous setup, please, go ahead. But remember, researchers like you have put a lot of effort into developing the WFA. We all joined this "competition" because we sought to find better methods that could truly help other researchers. So, try to be nice, tone down the marketing, and produce fair evaluations and honest publications.

2. USING WFA2-LIB IN YOUR PROJECT

2.1 Simple C example

This simple example illustrates how to align two sequences using the WFA2 library. First, include the WFA2 alignment headers.

#include "wavefront/wavefront_align.h"

Next, create and configure the WFA alignment object. The following example uses the default configuration and sets custom gap-affine penalties. Note that mismatch, gap-opening, and gap-extension must be positive values.

// Configure alignment attributes
wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
attributes.distance_metric = gap_affine;
attributes.affine_penalties.mismatch = 4;
attributes.affine_penalties.gap_opening = 6;
attributes.affine_penalties.gap_extension = 2;
// Initialize Wavefront Aligner
wavefront_aligner_t* const wf_aligner = wavefront_aligner_new(&attributes);

Finally, call the wavefront_align function.

char* pattern = "TCTTTACTCGCGCGTTGGAGAAATACAATAGT";
char* text    = "TCTATACTGCGCGTTTGGAGAAATAAAATAGT";
wavefront_align(wf_aligner,pattern,strlen(pattern),text,strlen(text)); // Align

Afterwards, we can use the library to display the alignment result (e.g., the alignment score and CIGAR).

// Display CIGAR & score
cigar_print_pretty(stderr,pattern,strlen(pattern),text,strlen(text),
                   &wf_aligner->cigar,wf_aligner->mm_allocator);
fprintf(stderr,"Alignment Score %d\n",wf_aligner->cigar.score);

At the end of the program, it is polite to release the memory used.

wavefront_aligner_delete(wf_aligner); // Free

To compile and run this example, you need to link against the WFA library (-lwfa).

$> gcc -O3 wfa_example.c -o wfa_example -lwfa
$> ./wfa_example

IMPORTANT. Once an alignment object is created, it is strongly recommended to reuse it to compute multiple alignments. Creating and destroying the alignment object for every alignment computed can have a significant overhead. Reusing the alignment object allows repurposing internal data structures, minimising the cost of memory allocations, and avoiding multiple alignment setups and precomputations.

2.2 Simple C++ example

The WFA2 library can be used from C++ code using the C++ bindings. This example is similar to the previous one but uses C++ bindings. First, include the C++ bindings and remember to use the WFA namespace.

#include "bindings/cpp/WFAligner.hpp"
using namespace wfa;

Configure and create the WFA alignment object. In this case, gap-affine distance using custom penalties and the standard memory-usage algorithm (i.e., standard WFA algorithm).

// Create a WFAligner
WFAlignerGapAffine aligner(4,6,2,WFAligner::Alignment,WFAligner::MemoryHigh);

Align two sequences (in this case, given as strings).

string pattern = "TCTTTACTCGCGCGTTGGAGAAATACAATAGT";
string text    = "TCTATACTGCGCGTTTGGAGAAATAAAATAGT";
aligner.alignEnd2End(pattern,text); // Align

Display the result of the alignment.

// Display CIGAR & score
string cigar = aligner.getAlignmentCigar();
cout << "CIGAR: " << cigar  << endl;
cout << "Alignment score " << aligner.getAlignmentScore() << endl;

IMPORTANT. Once an alignment object is created, it is strongly recommended to reuse it to compute multiple alignments. Creating and destroying the alignment object for every alignment computed can have a significant overhead. Reusing the alignment object allows repurposing internal data structures, minimising the cost of memory allocations, and avoiding multiple alignment setups and precomputations.

3. WFA2-LIB FEATURES

  • Exact alignment method that computes the optimal alignment score and/or alignment CIGAR.
  • Supports multiple distance metrics (i.e., indel, edit, gap-linear, gap-affine, and dual-cost gap-affine).
  • Allows performing end-to-end (a.k.a. global) and ends-free (e.g., semi-global, extension, overlap) alignment.
  • Implements low-memory modes to reduce and control memory consumption.
  • Supports various heuristic strategies to use on top of the core WFA algorithm.
  • WFA2-lib operates with plain ASCII strings. Although we mainly focus on aligning DNA/RNA sequences, the WFA algorithm and the WFA2-lib implementation work with any pair of strings. Moreover, these sequences do not have to be pre-processed (e.g., packed or profiled), nor must any table be precomputed (like the query profile used within some Smith-Waterman implementations).
  • Due to its simplicity, the WFA algorithm can be automatically vectorized for any SIMD-compliant CPU supported by the compiler. For this reason, the WFA2-lib implementation is independent of any specific ISA or processor model. Unlike other hardware-dependent libraries, we aim to offer a multiplatform pairwise-alignment library that can be executed on different processors and models (e.g., SSE, AVX2, AVX512, POWER-ISA, ARM, NEON, SVE, SVE2, RISCV-RVV, ...).

3.1 Distance Metrics

The WFA2 library implements the wavefront algorithm for the most widely used distance metrics. The practical alignment time can change depending on the distance function, although the computational complexity always remains proportional to the alignment score or distance. The WFA2 library offers the following distance metrics or functions:

  • Indel (or LCS). Produces alignments allowing matches, insertions, and deletions with unitary cost (i.e., {M,I,D} = {0,1,1}) but not mismatches. Also known as the longest common subsequence (LCS) problem. The LCS is defined as the longest subsequence common to both sequences, provided that the characters of the subsequence are not required to occupy consecutive positions within the original sequences.
    PATTERN    A-GCTA-GTGTC--AATGGCTACT-T-T-TCAGGTCCT
               |  ||| |||||    |||||||| | | |||||||||
    TEXT       AA-CTAAGTGTCGG--TGGCTACTATATATCAGGTCCT
    ALIGNMENT  1M1I1D3M1I5M2I2D8M1I1M1I1M1I9M
    // Configuration
    wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
    attributes.distance_metric = indel;
  • Edit (a.k.a. Levenshtein). Produces alignments allowing matches, mismatches, insertions, and deletions with unitary cost (i.e., {M,X,I,D} = {0,1,1,1}). Edit or Levenshtein distance between two sequences is the minimum number of single-character edits (i.e., insertions, deletions, or mismatches) required to transform one sequence into the other.
    PATTERN    AGCTA-GTGTCAATGGCTACT-T-T-TCAGGTCCT
               | ||| |||||  |||||||| | | |||||||||
    TEXT       AACTAAGTGTCGGTGGCTACTATATATCAGGTCCT
    ALIGNMENT  1M1X3M1I5M2X8M1I1M1I1M1I9M
    // Configuration
    wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
    attributes.distance_metric = edit;
  • Gap-linear (as in Needleman-Wunsch). Produces alignments allowing matches, mismatches, insertions, and deletions. Allows assigning a penalty (a.k.a. cost or weight) to each alignment operation. It computes the optimal alignment, minimizing the overall cost to transform one sequence into the other. Under the gap-linear model, the alignment score is computed based on {X,I}, where X corresponds to the mismatch penalty and the gap penalty is expressed as the function l(N)=N·I (given the length of the gap N and the gap penalty I).
    PATTERN    A-GCTA-GTGTC--AATGGCTACT-T-T-TCAGGTCCT
               |  ||| |||||    |||||||| | | |||||||||
    TEXT       AA-CTAAGTGTCGG--TGGCTACTATATATCAGGTCCT
    ALIGNMENT  1M1I1D3M1I5M2I2D8M1I1M1I1M1I9M
    // Configuration
    wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
    attributes.distance_metric = gap_linear;
    attributes.linear_penalties.mismatch = 6; // X > 0 
    attributes.linear_penalties.indel = 2;    // I > 0
  • Gap-affine (as in Smith-Waterman-Gotoh). Linear gap cost functions can lead to alignments populated with small gaps. In certain scenarios, like genomics or evolutionary studies, a single long gap (understood as a single event) is preferred. Under the gap-affine model, the alignment score is computed based on {X,O,E}, where X corresponds to the mismatch penalty and the gap penalty is expressed as the function g(N)=O+N·E (given the length of the gap N, the gap opening penalty O, and the gap extension penalty E).
    PATTERN    AGCTA-GTGTCAATGGCTACT---TTTCAGGTCCT
               | ||| |||||  ||||||||   | |||||||||
    TEXT       AACTAAGTGTCGGTGGCTACTATATATCAGGTCCT
    ALIGNMENT  1M1X3M1I5M2X8M3I1M1X9M
    // Configuration
    wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
    attributes.distance_metric = gap_affine;
    attributes.affine_penalties.mismatch = 6;      // X > 0
    attributes.affine_penalties.gap_opening = 4;   // O >= 0
    attributes.affine_penalties.gap_extension = 2; // E > 0
  • Dual-cost gap-affine distances. Also known as piece-wise gap-affine cost, this distance metric addresses some issues that the regular gap-affine distance has with long gaps. In a nutshell, the regular gap-affine distance can occasionally split long gaps by sporadic mismatches (often when aligning long and noisy sequences). Instead, many users would prefer to increase the gap-opening cost to produce a single long gap. For that, the dual-cost gap-affine distance (p=2) defines two affine cost functions (i.e., for short and long gaps). Then, the alignment score is computed based on {X,O1,E1,O2,E2}, where X corresponds to the mismatch penalty and the gap penalty is expressed as the function g(N)=min{O1+N·E1,O2+N·E2} (given the length of the gap N, the gap opening penalties O1 and O2, and the gap extension penalties E1 and E2).
    wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
    attributes.distance_metric = gap_affine_2p;
    attributes.affine2p_penalties.mismatch = 6;       // X > 0
    attributes.affine2p_penalties.gap_opening1 = 4;   // O1 >= 0
    attributes.affine2p_penalties.gap_extension1 = 2; // E1 > 0
    attributes.affine2p_penalties.gap_opening2 = 12;  // O2 >= 0
    attributes.affine2p_penalties.gap_extension2 = 1; // E2 > 0
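
The gap functions above can be checked by hand on the example CIGARs. The following sketch evaluates the penalty of a run-length CIGAR under the dual-cost gap-affine function g(N)=min{O1+N·E1,O2+N·E2}; the other metrics are special cases (gap-affine: O2=O1 and E2=E1; gap-linear: O1=O2=0 and E1=E2=I; edit: X=1 with unit gap-linear cost). The function cigar_penalty is a hypothetical helper, not a WFA2-lib call. It assumes each run of I or D is one gap, and it returns the penalty as a positive cost, which may differ from the sign convention the library uses when reporting scores.

```c
#include <assert.h>
#include <stdlib.h>

/* Penalty of a run-length CIGAR (e.g., "3M1X2I") under dual-cost gap-affine
 * penalties {x, o1, e1, o2, e2}. Hypothetical helper for illustration only.
 * Returns -1 if the CIGAR string is malformed. */
long cigar_penalty(const char* cigar, int x, int o1, int e1, int o2, int e2) {
  assert(cigar != NULL);
  long total = 0;
  const char* p = cigar;
  while (*p != '\0') {
    char* end;
    const long n = strtol(p, &end, 10);     /* run length */
    if (end == p || n <= 0) return -1;      /* missing or invalid length */
    switch (*end) {                         /* operation */
      case 'M':                             /* matches cost nothing */
        break;
      case 'X':                             /* n mismatches */
        total += n * x;
        break;
      case 'I':
      case 'D': {                           /* one gap of length n */
        const long g1 = o1 + n * e1;
        const long g2 = o2 + n * e2;
        total += (g1 < g2) ? g1 : g2;       /* g(N) = min{O1+N*E1, O2+N*E2} */
        break;
      }
      default:                              /* unknown operation or truncated string */
        return -1;
    }
    p = end + 1;
  }
  return total;
}
```

With the gap-affine example above (X=6, O=4, E=2), the CIGAR 1M1X3M1I5M2X8M3I1M1X9M has four mismatches (24) plus gaps of length 1 (4+2=6) and 3 (4+6=10), so cigar_penalty(cigar,6,4,2,4,2) gives 40. With the dual-cost penalties above, a gap of length 10 costs min{4+20,12+10}=22, so the second function takes over for long gaps.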

3.2 Alignment Scope

Depending on the use case, an application is often only required to compute the alignment score, not the complete alignment (i.e., CIGAR). As with traditional dynamic programming algorithms, the WFA algorithm requires less memory (i.e., O(s)) to compute only the alignment score. In turn, this results in slightly faster alignment executions. For this reason, the WFA2 library implements two different modes depending on the alignment scope: score-only and full-CIGAR alignment.

The score-only alignment mode computes only the alignment score. This mode utilizes only the front-wavefronts of the WFA algorithm to keep track of the optimal alignment score. As a result, it requires O(s) memory and, in practice, performs slightly faster than the standard full-CIGAR mode.

    wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
    attributes.alignment_scope = compute_score;

The full-CIGAR alignment mode computes the sequence of alignment operations (i.e., {'M','X','D','I'}) that transforms one sequence into the other (i.e., alignment CIGAR). The alignment score can be obtained as a by-product of the alignment process, evaluating the score of the alignment CIGAR. This mode requires O(s^2) memory (using the default memory mode, wavefront_memory_high) or less (using the low-memory modes).

    wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
    attributes.alignment_scope = compute_alignment;

3.3 Alignment Span

The WFA2 library allows computing alignments with different spans or shapes. Although there is certain ambiguity and confusion in the terminology, we have tried to generalize the different options available to offer flexible parameters that can capture multiple alignment scenarios. During the development of the WFA we decided to adhere to the classical approximate string matching terminology where we align a pattern (a.k.a. query or sequence) against a text (a.k.a. target, database, or reference).

  • End-to-end alignment. Also known as global alignment, this alignment mode forces aligning both sequences from beginning to end.
    PATTERN    AATTAATTTAAGTCTAGGCTACTTTCGGTACTTTGTTCTT
               ||||    ||||||||||||||||||||||||||   |||
    TEXT       AATT----TAAGTCTAGGCTACTTTCGGTACTTT---CTT
    wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
    attributes.alignment_form.span = alignment_end2end;
  • Ends-free alignment. This alignment mode allows leading and trailing insertions or deletions for "free" (i.e., no penalty/cost on the overall alignment score). Moreover, this alignment mode allows determining the maximum gap length allowed for free at the beginning and end of the sequences. Note that this mode does not implement local alignment as it does not allow free insertions and deletions at the beginning/end of the sequences at the same time. However, it allows many different configurations used across different analyses, methods, and tools.
    PATTERN    AATTAATTTAAGTCTAGGCTACTTTCGGTACTTTGTTCTT
                   |||||||||||||||||||||||||||||| ||   
    TEXT       ----AATTTAAGTCTAGGCTACTTTCGGTACTTTCTT---
    wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
    attributes.alignment_form.span = alignment_endsfree;
    attributes.alignment_form.pattern_begin_free = pattern_begin_free;
    attributes.alignment_form.pattern_end_free = pattern_end_free;
    attributes.alignment_form.text_begin_free = text_begin_free;
    attributes.alignment_form.text_end_free = text_end_free;
  • Other
  • Glocal alignment (a.k.a. semi-global or fitting). Alignment mode where the pattern is globally aligned and the text is locally aligned. Often due to the large size of one of the sequences (e.g., the text sequence being a genomic reference), this alignment mode forces one sequence (i.e., pattern) to align globally to a substring of the other (i.e., text).
    PATTERN    -------------AATTTAAGTCTAGGCTACTTTC---------------
                            ||||||||| ||||||||||||               
    TEXT       ACGACTACTACGAAATTTAAGTATAGGCTACTTTCCGTACGTACGTACGT
    wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
    attributes.alignment_form.span = alignment_endsfree;
    attributes.alignment_form.pattern_begin_free = 0;
    attributes.alignment_form.pattern_end_free = 0;
    attributes.alignment_form.text_begin_free = text_begin_free;
    attributes.alignment_form.text_end_free = text_end_free;

  • Extension alignment. Alignment mode where the start of both pattern and text sequences are forced to be aligned. However, the ends of both are free. This alignment mode is typically used within seed-and-extend algorithms.
    // Right extension
    wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
    attributes.alignment_form.span = alignment_endsfree;
    attributes.alignment_form.pattern_begin_free = 0;
    attributes.alignment_form.pattern_end_free = pattern_end_free;
    attributes.alignment_form.text_begin_free = 0;
    attributes.alignment_form.text_end_free = text_end_free;
    
    PATTERN    AATTTAAGTCTG-CTACTTTCACGCA-GCT----------
               ||||| |||||| ||||||||||| | | |          
    TEXT       AATTTCAGTCTGGCTACTTTCACGTACGATGACAGACTCT
    // Left extension
    wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
    attributes.alignment_form.span = alignment_endsfree;
    attributes.alignment_form.pattern_begin_free = pattern_begin_free;
    attributes.alignment_form.pattern_end_free = 0;
    attributes.alignment_form.text_begin_free = text_begin_free;
    attributes.alignment_form.text_end_free = 0;
    
    PATTERN    -------------AAACTTTCACGTACG-TGACAGTCTCT
                              ||||||||||||| |||||| ||||
    TEXT       AATTTCAGTCTGGCTACTTTCACGTACGATGACAGACTCT

  • Overlapped alignment (a.k.a. dovetail).
    // Overlapped (Right-Left)
    wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
    attributes.alignment_form.span = alignment_endsfree;
    attributes.alignment_form.pattern_begin_free = pattern_begin_free;
    attributes.alignment_form.pattern_end_free = 0;
    attributes.alignment_form.text_begin_free = 0;
    attributes.alignment_form.text_end_free = text_end_free;
    
    PATTERN    ACGCGTCTGACTGACTGACTAAACTTTCATGTAC-TGACA-----------------
                                   ||||||||| |||| |||||                 
    TEXT       --------------------AAACTTTCACGTACGTGACATATAGCGATCGATGACT
    // Overlapped (Left-Right)
    wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
    attributes.alignment_form.span = alignment_endsfree;
    attributes.alignment_form.pattern_begin_free = 0;
    attributes.alignment_form.pattern_end_free = pattern_end_free;
    attributes.alignment_form.text_begin_free = text_begin_free;
    attributes.alignment_form.text_end_free = 0;

    PATTERN    ----------------------ACGCGTCTGACTGACTACGACTACGACTGACTAGCAT
                                     ||||||||| || ||                      
    TEXT       ACATGCATCGATCAGACTGACTACGCGTCTG-CTAAC----------------------
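
To see what "free" ends mean for the score, the following sketch computes the glocal (fitting) edit distance with classical dynamic programming, not with the WFA. It is a simplified reference under stated assumptions: unit edit costs, the whole pattern must be aligned, and any amount of text may be skipped for free at both ends (the library instead lets you cap the free length through text_begin_free and text_end_free). The function name glocal_edit_distance is made up for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Edit distance of the best alignment of the whole pattern against any
 * substring of text (hypothetical reference helper, O(plen*tlen) time).
 * Returns -1 only if memory allocation fails. */
int glocal_edit_distance(const char* pattern, const char* text) {
  assert(pattern != NULL && text != NULL);
  const int plen = (int)strlen(pattern);
  const int tlen = (int)strlen(text);
  int* prev = malloc((size_t)(tlen + 1) * sizeof(int));
  int* curr = malloc((size_t)(tlen + 1) * sizeof(int));
  if (prev == NULL || curr == NULL) { free(prev); free(curr); return -1; }
  /* Free text prefix: the alignment may start at any text position at cost 0 */
  for (int j = 0; j <= tlen; ++j) prev[j] = 0;
  for (int i = 1; i <= plen; ++i) {
    curr[0] = i;  /* pattern prefix is NOT free: i deletions */
    for (int j = 1; j <= tlen; ++j) {
      const int sub = prev[j - 1] + (pattern[i - 1] != text[j - 1]);
      const int del = prev[j] + 1;      /* pattern character unmatched */
      const int ins = curr[j - 1] + 1;  /* text character unmatched */
      int best = sub;
      if (del < best) best = del;
      if (ins < best) best = ins;
      curr[j] = best;
    }
    int* tmp = prev; prev = curr; curr = tmp;
  }
  /* Free text suffix: the alignment may end at any text position */
  int best = prev[0];
  for (int j = 1; j <= tlen; ++j) {
    if (prev[j] < best) best = prev[j];
  }
  free(prev);
  free(curr);
  return best;
}
```

On the glocal example above (pattern AATTTAAGTCTAGGCTACTTTC against the 50-character text), this returns 1: the single C/A mismatch shown in the alignment, with the 13 leading and 15 trailing text characters skipped at no cost.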

3.4 Memory modes

The WFA2 library implements various memory modes: wavefront_memory_high, wavefront_memory_med, wavefront_memory_low. These modes allow regulating the overall memory consumption at the expense of execution time. The standard WFA algorithm, which explicitly stores all wavefronts in memory, corresponds to the mode wavefront_memory_high. The other modes progressively reduce memory usage at the expense of slightly longer alignment times. These memory modes can be used transparently with other alignment options and generate identical results. Note that this option does not affect the score-only alignment mode (it already uses a minimal memory footprint of O(s)).

  wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
  attributes.memory_mode = wavefront_memory_med;

3.5 Heuristic modes

The WFA algorithm can be combined with many heuristics to reduce the alignment time and memory used. As with other alignment methods, heuristics can result in suboptimal solutions and loss of accuracy. Moreover, some heuristics may drop the alignment if the sequences exceed certain divergence thresholds (i.e., x-drop/z-drop). Due to the popularity and efficiency of these methods, the WFA2 library implements many of these heuristics. Note that it is not about how little of the DP-matrix you compute, but about how good the resulting alignments are.

  • None (for comparison). If no heuristic is used, the WFA explores the cells of the DP-matrix in increasing score order (increasing scores correspond to colours from blue to red).

[Figure: Full-WFA]

  wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
  attributes.heuristic.strategy = wf_heuristic_none;
  • Banded alignment. Sets a fixed band in the diagonals preventing the wavefront from growing beyond those limits. It allows setting the minimum diagonal (i.e., min_k) and maximum diagonal (i.e., max_k).

[Figures: Banded(10,10) and Banded(10,150)]

  wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
  attributes.heuristic.strategy = wf_heuristic_banded_static;
  attributes.heuristic.min_k = -10;
  attributes.heuristic.max_k = +10;
  • Adaptive-Band alignment. Similar to the static-band heuristic, but it allows the band to move towards the diagonals closer to the end of the alignment. Unlike the static band, which is applied at every step, the adaptive-band heuristic allows configuring the number of steps between heuristic band cut-offs.

[Figures: Adaptive-Band(10,10,1) and Adaptive-Band(50,50,1)]

  wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
  attributes.heuristic.strategy = wf_heuristic_banded_adaptive;
  attributes.heuristic.min_k = -10;
  attributes.heuristic.max_k = +10;
  attributes.heuristic.steps_between_cutoffs = 1;
  • Adaptive-Wavefront alignment. This WFA heuristic removes outer diagonals that are extremely far behind compared to other ones in the same wavefront. Unlike other methods, the adaptive-wavefront reduction heuristic prunes based on the potential of the diagonal to lead to the optimal solution without previous knowledge of the error between the sequences.

[Figures: Adaptive-WF(10,50) and Adaptive-WF(10,50,10)]

  wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
  attributes.heuristic.strategy = wf_heuristic_wfadaptive;
  attributes.heuristic.min_wavefront_length = 10;
  attributes.heuristic.max_distance_threshold = 50;
  attributes.heuristic.steps_between_cutoffs = 1;
  • X-drop. [Under Testing] Implements the classical X-drop heuristic to abandon diagonals (or even alignments) that fall more than X from the previous best-observed score.
  wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
  attributes.heuristic.strategy = wf_heuristic_xdrop;
  attributes.heuristic.xdrop = 100;
  attributes.heuristic.steps_between_cutoffs = 100;
  • Z-drop. [Under Testing] Implements the Z-drop heuristic. It drops the diagonals (or even the alignment) if the score drops too fast in the diagonal direction.
  wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
  attributes.heuristic.strategy = wf_heuristic_zdrop;
  attributes.heuristic.zdrop = 100;
  attributes.heuristic.steps_between_cutoffs = 100;
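
As a rough illustration of how a band trades accuracy for work, the sketch below restricts an edit-distance wavefront computation to the diagonals [min_k, max_k]. It is only a toy model of the static-band idea under unit edit costs, not the library's code, and wf_edit_distance_banded, in_matrix, and WF_NONE are names invented here. Diagonals outside the band are never computed, so the result can be larger than the true distance, or missing altogether when the final diagonal lies outside the band.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define WF_NONE (-1000000000)  /* marks a diagonal that has not been reached */

/* Returns 1 if text offset h on diagonal k (k = h - v) lies inside the DP matrix. */
static int in_matrix(int h, int k, int plen, int tlen) {
  const int v = h - k;
  return h >= 0 && h <= tlen && v >= 0 && v <= plen;
}

/* Edit distance restricted to diagonals min_k..max_k (hypothetical helper).
 * Returns -1 if no alignment fits inside the band or if allocation fails. */
int wf_edit_distance_banded(const char* pattern, const char* text, int min_k, int max_k) {
  assert(pattern != NULL && text != NULL && min_k <= 0 && max_k >= 0);
  const int plen = (int)strlen(pattern);
  const int tlen = (int)strlen(text);
  const int max_s = plen + tlen;
  const int k_final = tlen - plen;
  const size_t width = (size_t)(2 * max_s + 3);
  int* a = malloc(width * sizeof(int));
  int* b = malloc(width * sizeof(int));
  if (a == NULL || b == NULL) { free(a); free(b); return -1; }
  for (size_t i = 0; i < width; ++i) a[i] = b[i] = WF_NONE;
  int* c = a + max_s + 1;  /* current wavefront, indexed by diagonal */
  int* n = b + max_s + 1;  /* next wavefront */
  int result = -1;
  c[0] = 0;
  for (int s = 0; s <= max_s; ++s) {
    /* Wavefront limits, clipped to the band */
    const int lo = (-s > min_k) ? -s : min_k;
    const int hi = (s < max_k) ? s : max_k;
    for (int k = lo; k <= hi; ++k) {  /* extend along matches */
      int h = c[k];
      if (h == WF_NONE) continue;
      while (h < tlen && h - k < plen && pattern[h - k] == text[h]) ++h;
      c[k] = h;
    }
    if (k_final >= lo && k_final <= hi && c[k_final] == tlen) { result = s; break; }
    if (s == max_s) break;
    const int nlo = (-(s + 1) > min_k) ? -(s + 1) : min_k;
    const int nhi = (s + 1 < max_k) ? s + 1 : max_k;
    for (int k = nlo; k <= nhi; ++k) {  /* one more edit, inside the band only */
      const int ins = c[k - 1] + 1;
      const int mis = c[k] + 1;
      const int del = c[k + 1];
      int best = WF_NONE;
      if (in_matrix(ins, k, plen, tlen) && ins > best) best = ins;
      if (in_matrix(mis, k, plen, tlen) && mis > best) best = mis;
      if (in_matrix(del, k, plen, tlen) && del > best) best = del;
      n[k] = best;
    }
    int* tmp = c; c = n; n = tmp;
  }
  free(a);
  free(b);
  return result;
}
```

For the pair XXXABCDEFGH / ABCDEFGHXXX the optimal alignment (distance 6) needs diagonal -3, so a band of (-3,+3) finds it, whereas a band of (-1,+1) can only return a worse score. This is the loss of accuracy mentioned above.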

4. REPORTING BUGS AND FEATURE REQUESTS

Feedback and bug reports are highly appreciated. Please report any issue or suggestion on GitHub or by email to the main developer ([email protected]).

5. LICENSE

WFA2-lib is distributed under the MIT licence.

6. AUTHORS

Santiago Marco-Sola ([email protected]) is the main developer and the person to whom you should address your complaints.

Andrea Guarracino and Erik Garrison have contributed to the design of new features and intensive testing of this library.

Miquel Moretó has contributed with fruitful technical discussions and tireless efforts seeking funding, so we could keep working on this project.

7. ACKNOWLEDGEMENTS

  • Baoxing Song and Buckler's lab for their interest and help promoting the WFA and pushing for the inclusion of new features.

  • Juan Carlos Moure and Antonio Espinosa for their collaboration and support of this project.

8. CITATION

Santiago Marco-Sola, Juan Carlos Moure, Miquel Moreto, Antonio Espinosa. "Fast gap-affine pairwise alignment using the wavefront algorithm." Bioinformatics, 2020.

Issues
  • Other Landau-Vishkin implementations

    Are you aware of any other competitive / comparable in speed implementation of the basic Landau-Vishkin algorithm that WFA extends?

    I tried looking for one, but couldn't find any, and since you also don't compare to them, this may not exist?

    opened by RagnarGrootKoerkamp 12
  • Segmentation fault (core dumped)

    Hi,

    I used the following code to align several pairs of sequences and got segfault on one pair of sequences. The sequences are attached. Please help. Thanks!

      // Configure alignment attributes
      wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
      attributes.distance_metric = edit;
      attributes.alignment_scope = compute_score;
      attributes.alignment_form.span = alignment_endsfree;
      attributes.alignment_form.pattern_begin_free = 0;
      attributes.alignment_form.pattern_end_free = 0;
      attributes.alignment_form.text_begin_free = 0;
      attributes.alignment_form.text_end_free = (text_length < pattern_length ? text_length : pattern_length) / 2;
      // Initialize Wavefront Aligner
      wavefront_aligner_t* const wf_aligner = wavefront_aligner_new(&attributes);
      // Align
      wavefront_align(wf_aligner,pattern,strlen(pattern),text,strlen(text));
      fprintf(stderr,"WFA-Alignment returns score %d\n",wf_aligner->cigar.score);
      // Free
      wavefront_aligner_delete(wf_aligner);
    

    sequences.zip

    bug 
    opened by haowenz 8
  • Unexpected alignment failures

    I have integrated WFA2-lib into an app, and I keep getting difficult to track alignment failures. After some successful alignments, wavefront_align() will just return -1. The exact point where this happens, depends on whether I build in debug or release mode and whether I turn on sanitizers or not. Release mode produces many hundred correct alignments, debug fails after a couple and with ASAN, it fails immediately.

    I suspect there is undefined behaviour somewhere in WFA2-lib.

    Here is a minimal example:

    extern "C" {
    #include <wavefront/wavefront_align.h>
    }
    
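    #include <cassert>      // assert
    #include <cstdio>       // fprintf, stderr
    #include <string>
    #include <string_view>  // std::string_view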
    
    void do_align(std::string_view const pattern, std::string_view const ref)
    {
      // Configure alignment attributes
      wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
      attributes.alignment_scope = compute_alignment;
      attributes.distance_metric = gap_affine;
      attributes.affine_penalties.mismatch = 3;
      attributes.affine_penalties.gap_opening = 5;
      attributes.affine_penalties.gap_extension = 1;
      attributes.alignment_form.span = alignment_endsfree;
      attributes.alignment_form.pattern_begin_free = 0;
      attributes.alignment_form.pattern_end_free = 0;
      attributes.alignment_form.text_begin_free = 1;
      attributes.alignment_form.text_end_free = 1;
      attributes.heuristic.strategy = wf_heuristic_wfadaptive;
      attributes.heuristic.min_wavefront_length = 10;
      attributes.heuristic.max_distance_threshold = 10;
      attributes.heuristic.steps_between_cutoffs = 1;
    
      wavefront_aligner_t * const wf_aligner = wavefront_aligner_new(&attributes);
    
      int res = wavefront_align(wf_aligner, pattern.data(), pattern.size(), ref.data(), ref.size());
    
      cigar_print_pretty(stderr, pattern.data(), pattern.size(), ref.data(), ref.size(),
                         &wf_aligner->cigar, wf_aligner->mm_allocator);
      fprintf(stderr,"Alignment Score %d\nResult:%d\n", wf_aligner->cigar.score, res);
    
      assert(res != -1);
      wavefront_aligner_delete(wf_aligner);
    }
    
    int main()
    {
        do_align("GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGTCAGGAGCCGAGCCGACGAGGTGGTGATGTTGGTCGGGCGTGATCCGGGGTGGCGTGACGAGGATGGCGGGGTGGTAGCGGGGGGGGGGGGGGGCGGGCGGGCGGGGGGGGGGGGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG",
                 "GGGGGGGGGGGGATTATACAGCAAATTTACTTAAAAATGTGTATTAGTCAGATTTTTAGTTACTCATGGGTAAATGCAATCCCTAATTAAGGGTGTGAAGTGAGTGCTGAAACTTGCTTAGGAAAAGAGGTGGAAAAATTGGATGGGAATTAAGCATAGAGGTACCACGAAGTATCTGAAATTGTTTGGTTATGTCTGTAGACAAATCAAATGCTTAAACAAAATAAACTGAAATTTTCAACACATGCACACACACAGTCCTCATACTTTTAGATTTTTAGTTTAAAAAATAAGT");
    }
    

    When building with these options: g++ wfabug.cpp -I ~/devel/WFA2-lib ~/devel/WFA2-lib/lib/libwfa.a it prints:

          ALIGNMENT 12M1X25I1M1X1M48D26I3M20I3M1X1M1X1M2X1M1X1M1X1M1X1M33D1M1X1M1X1M4I2M2X2M3X1M1X7M2X1M2X4M2X3M2X1M1X1M2X1M3X1M1X2M4X4M1X1M8I2M1X1M3X3M2X1M16D1M1X1M1X1M32I1M37D41I1M43D20I1M1I
          ALIGNMENT.COMPACT 1X25I1X48D26I20I1X1X2X1X1X1X33D1X1X4I2X3X1X2X2X2X2X1X2X3X1X4X1X8I1X3X2X16D1X1X32I37D41I43D20I1I
          PATTERN    GGGGGGGGGGGGG-------------------------GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG--------------------------GGG--------------------GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGTCAGGA----GCCGAGCCGACGAGGTGGTGATGTTGGTCGGGCGTGATCCGGGGTGGCGTGACGAGG--------ATGGCGGGGTGGTAGCGGGGGGGGGGGGGGGCGG--------------------------------GCGGGCGGGGGGGGGGGGCGGGGGGGGGGGGGGGGGGG-----------------------------------------GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG--------------------G-
                     ||||||||||||                          | |                                                                          |||                    ||| | |  | | | |                                 | | |    ||  ||   | |||||||  |  ||||  |||  | |  |   | ||    |||| |        || |   |||  |                | | |                                |                                                                              |                                                               | 
          TEXT       GGGGGGGGGGGGATTATACAGCAAATTTACTTAAAAATGTG------------------------------------------------TATTAGTCAGATTTTTAGTTACTCATGGGTAAATGCAATCCCTAATTAAGGGTGTGAAGTGAGTG---------------------------------CTGAAACTTGCTTAGGAAAAGAGGTGGAAAAATTGGATGGGAATTAAGCATAGAGGTACCACGAAGTATCTGAAATTGTTTGGTTAT----------------GTCTGTAGACAAATCAAATGCTTAAACAAAATAAACTG-------------------------------------AAATTTTCAACACATGCACACACACAGTCCTCATACTTTTAG-------------------------------------------ATTTTTAGTTTAAAAAATAAGT
    Alignment Score -553
    Result:0
    

    When building with the address sanitizer: g++ -fsanitize=address wfabug.cpp -I ~/devel/WFA2-lib ~/devel/WFA2-lib/lib/libwfa.a it prints:

          ALIGNMENT
          ALIGNMENT.COMPACT
          PATTERN    GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGTCAGGAGCCGAGCCGACGAGGTGGTGATGTTGGTCGGGCGTGATCCGGGGTGGCGTGACGAGGATGGCGGGGTGGTAGCGGGGGGGGGGGGGGGCGGGCGGGCGGGGGGGGGGGGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
                     ???????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????
          TEXT       GGGGGGGGGGGGATTATACAGCAAATTTACTTAAAAATGTGTATTAGTCAGATTTTTAGTTACTCATGGGTAAATGCAATCCCTAATTAAGGGTGTGAAGTGAGTGCTGAAACTTGCTTAGGAAAAGAGGTGGAAAAATTGGATGGGAATTAAGCATAGAGGTACCACGAAGTATCTGAAATTGTTTGGTTATGTCTGTAGACAAATCAAATGCTTAAACAAAATAAACTGAAATTTTCAACACATGCACACACACAGTCCTCATACTTTTAGATTTTTAGTTTAAAAAATAAGT
    Alignment Score -2147483648
    Result:-1
    a.out: wfabug.cpp:36: void do_align(std::string_view, std::string_view): Assertion `res != -1' failed.
    [1]    673339 IOT instruction (core dumped)  ./a.out
    
    bug 
    opened by h-2 5
  • Updated Makefile with option BUILD_MINIMAL

    Hi @smarco, I was having trouble syncing my old branch, so I started a new one. This updates the Makefile and adds the BUILD_MINIMAL option. This replaces the BUILD_CPP option in the previous pull request as I think it better describes the purpose of the option. If BUILD_MINIMAL is set, only the core C lib should be built without CPP bindings and extras. Perhaps this could instead be named BUILD_CLIB_ONLY?

    I also added the -fPIC flag which is currently needed for the python pywfa build. Thanks!

    opened by kcleal 3
  • Rust bindings and sign of returned score

    We're writing a rust wrapper around WFA to do some testing.

    It seems that there is an inconsistency in the sign of the returned cost:

    • for unit and lcs, the returned value is positive: the 'cost' of the alignment.
    • for other cases gap_linear and gap_affine and gap_affine_2p, the returned value is negative: the 'score' of the alignment.

    You can see this in my code here and here, respectively.

    I was trying to find some docs on this difference, but the readme does not mention this difference currently. Is it intentional?
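
    As a workaround on the caller's side, the two conventions reported above can be folded into a single non-negative cost. The following is a minimal sketch, not part of WFA2-lib: the `metric_t` enum and `reported_to_cost` are hypothetical names, and the sign rule is taken only from the behavior described in this report (penalties only, match score 0).

```c
#include <assert.h>

/* Hypothetical mirror of the metrics named in this report; NOT the library's own enum. */
typedef enum {
  METRIC_INDEL,
  METRIC_EDIT,
  METRIC_GAP_LINEAR,
  METRIC_GAP_AFFINE,
  METRIC_GAP_AFFINE_2P
} metric_t;

/* Convert the value reported by the aligner into a non-negative cost.
 * Assumption (from the report above): indel/edit return a positive cost,
 * the gap-based metrics return the negated cost. */
static int reported_to_cost(metric_t metric, int reported) {
  int cost;
  switch (metric) {
    case METRIC_INDEL:
    case METRIC_EDIT:
      cost = reported;   /* already a cost */
      break;
    default:
      cost = -reported;  /* negative score -> positive cost */
      break;
  }
  assert(cost >= 0 && "sign convention differs from the one assumed here");
  return cost;
}
```

    With this helper, `reported_to_cost(METRIC_GAP_AFFINE, -553)` and `reported_to_cost(METRIC_EDIT, 553)` both give 553.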

    opened by RagnarGrootKoerkamp 3
  • [questions] benchmarks; scores VS penalties

    Hi,

    this looks like a great project! I have two questions:

    1. Do you have benchmarks against libraries that implement NW? Like parasail and seqan? You say that auto-vectorisation of the library works well, do you have benchmarks for this? Do you assume that this is comparable to the speed-up that libraries with explicit vectorisation (inter-sequence batch vectorisation or anti-diagonals) get?
    2. You are currently only offering penalties (negative scores) and the match-"score" is fixed to 0. Is this a limitation of the algorithm, or would non-negative scores work as well? I think that would be a requirement to get Protein scoring matrixes to work?

    Thanks!

    opened by h-2 3
  • Meet in the middle

    For Dijkstra, a common technique to reduce explored states is to use meet in the middle, where you search both from the start and end and stop as you expand a state from both ends.

    WFA is basically just a more efficient implementation of Dijkstra, since it also expands in order of increasing g (distance from start). Thus, meet in the middle should just work here.

    All you'd need to do is run it from both sides to the middle of the string (or alternate and run both sides to equal score). It shouldn't be hard to detect when the two searches overlap. Probably a check that the M from the start and M from the end on the same diagonal add to more than the sequence length is sufficient.

    The benefit here is that now the runtime is 2*(s/2)^2 = s^2/2, so around twice as fast as the original, assuming the implementation doesn't slow down too much.
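
    The estimate above can be checked with a toy cell count. This sketch assumes edit-distance style wavefronts where the wavefront at score i spans diagonals -i..i (2i+1 cells), and it ignores the cost of extending matches and of detecting the overlap; `wavefront_cells` and `bidirectional_cells` are illustrative names, not library functions.

```c
#include <assert.h>

/* Cells touched by a one-sided search up to score s:
 * the wavefront at score i holds 2*i+1 diagonals (assumed toy model). */
static long wavefront_cells(long s) {
  assert(s >= 0);
  long total = 0;
  for (long i = 0; i <= s; ++i) total += 2 * i + 1;
  return total;
}

/* Cells touched when a forward and a reverse search split the score s
 * between them and meet in the middle. */
static long bidirectional_cells(long s) {
  assert(s >= 0);
  const long forward = s / 2;
  const long reverse = s - forward;
  return wavefront_cells(forward) + wavefront_cells(reverse);
}
```

    For s = 1000 this gives 1002001 cells for the one-sided search and 502002 for the two-sided one, close to the factor of two predicted above.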

    (For non-global alignment this may be more tricky. Not sure yet.)

    I don't know if it's worth the effort right now and how much a 2x speedup matters compared to other optimizations you may be thinking of, but anyway at some point this could be nice.

    opened by RagnarGrootKoerkamp 3
  • Build issue on centos

    Hi, thanks for the really nice library. I have no problems building on my Mac with clang using make BUILD_TOOLS=0 BUILD_EXAMPLES=0 clean all

    However, I run into a problem when building on CentOS, specifically when compiling the C++ code (old gcc version?):

    make clean all
    rm -rf bin build lib
    make --directory=tools/align_benchmark clean
    make[1]: Entering directory `/scratch/c.sbi8kc2/WFA2-lib/tools/align_benchmark'
    rm -rf ./build
    make[1]: Leaving directory `/scratch/c.sbi8kc2/WFA2-lib/tools/align_benchmark'
    make --directory=examples clean
    make[1]: Entering directory `/scratch/c.sbi8kc2/WFA2-lib/examples'
    rm -rf bin
    make[1]: Leaving directory `/scratch/c.sbi8kc2/WFA2-lib/examples'
    make --directory=alignment all
    make[1]: Entering directory `/scratch/c.sbi8kc2/WFA2-lib/alignment'
    gcc -Wall -g -O3 -march=native -I.. -c affine_penalties.c -o ../build/affine_penalties.o
    gcc -Wall -g -O3 -march=native -I.. -c affine2p_penalties.c -o ../build/affine2p_penalties.o
    gcc -Wall -g -O3 -march=native -I.. -c cigar.c -o ../build/cigar.o
    gcc -Wall -g -O3 -march=native -I.. -c score_matrix.c -o ../build/score_matrix.o
    make[1]: Leaving directory `/scratch/c.sbi8kc2/WFA2-lib/alignment'
    make --directory=bindings/cpp all
    make[1]: Entering directory `/scratch/c.sbi8kc2/WFA2-lib/bindings/cpp'
    g++ -Wall -g -O3 -march=native -I../.. -c WFAligner.cpp -o ../../build/cpp/WFAligner.o
    WFAligner.cpp:57:3: warning: identifier ‘nullptr’ is a keyword in C++11 [-Wc++0x-compat]
       this->wfAligner = nullptr;
       ^
    WFAligner.cpp: In constructor ‘wfa::WFAligner::WFAligner(wfa::WFAligner::AlignmentScope, wfa::WFAligner::MemoryModel)’:
    WFAligner.cpp:57:21: error: ‘nullptr’ was not declared in this scope
       this->wfAligner = nullptr;
                         ^
    make[1]: *** [../../build/cpp/WFAligner.o] Error 1
    make[1]: Leaving directory `/scratch/c.sbi8kc2/WFA2-lib/bindings/cpp'
    make: *** [bindings/cpp] Error 2
    

    I tried deleting the /bindings/cpp from the Makefile so SUBDIRS looks like:

    SUBDIRS=alignment \
            system \
            utils \
            wavefront
    

    I also removed this line from the Makefile:

    $(AR) $(AR_FLAGS) $(LIB_WFA_CPP) $(FOLDER_BUILD)/*.o $(FOLDER_BUILD_CPP)/*.o 2> /dev/null

    The build now works with make BUILD_TOOLS=0 BUILD_EXAMPLES=0 clean all

    Would it be possible to add a C++-free rule to the Makefile? Thanks again!

    opened by kcleal 3
  • Score is `-INF` when aligning two empty sequences

    When using BiWFA/ultralow memory mode and indel costs, aligning two empty sequences seems to return -2147483648 as the cost. This only happens when calling the library directly (via rust), and not when running align_benchmark as

    align_benchmark -a indel-wfa --input input  -o output --wfa-memory-mode ultralow
    

    where input simply contains

    >
    <
    

    I.e., this could very well be a bug on my side in the rust-c interface.
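
    Until the cause is known, a caller can guard against this value. The sketch below assumes that -2147483648 (INT_MIN) acts as a "no score" placeholder, which is inferred only from the numbers printed in these reports, and that a status of 0 means success; `alignment_score_usable` is a hypothetical helper, not a WFA2-lib function.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Returns true only when the alignment call reported success (assumed: 0)
 * and the score is not the INT_MIN placeholder seen in the reports. */
static bool alignment_score_usable(int status, int score) {
  assert(INT_MIN == (-2147483647 - 1) && "sketch assumes a 32-bit int");
  return status == 0 && score != INT_MIN;
}
```

    A wrapper could call this right after the alignment returns and report an error instead of forwarding the placeholder as a cost.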

    opened by RagnarGrootKoerkamp 2
  • ends-free match score not used

    I have trouble understanding the "ends-free" span mode. The match penalty (value <= 0) has no impact on the alignment and is not counted towards the returned score.

    Here is an example in which changing the match score has no impact: mismatch = 4, gap open = 6, gap extension = 2, pattern_begin_free = len(pattern)-1, pattern_end_free = 0, text_begin_free = 0, text_end_free = len(text)-1.

    9D1X8I      ALIGNMENT
    9D1X8I      ALIGNMENT.COMPACT
          PATTERN    CCCCCCCCCC--------
    
          TEXT       ---------TCTTTTTTT
    
    opened by wongs2 2
  • Wrong output for two ~5Mb long sequences

    I am using WFA2-lib as a library to align two MHC sequences. The edit distance should be 123502 but WFA2 gives 2164110 in very short time. The two sequences are MHC-00GRCh38 and MHC-CHM13 from file "MHC-61.agc" at this Zenodo record. The code calling WFA2 is here:

    https://github.com/lh3/lv89/blob/a16c30bba590e7de734edbdeeab55e4849e9ed20/main.c#L65-L71

    PS: the output is correct for two ~100k sequences.

    opened by lh3 2
  • Alignment with match score not optimal

    Using the latest v2.2 with a match score, the alignment is not optimal: it prefers a gap over a mismatch. The alignment result is "8D1I1M7I", but "7D1X1M7I" was expected.

    int main(int argc,char* argv[]) {
      // Pattern & Text
      char* pattern = "CCCCCCCCC";
      char* text    = "TCTTTTTTT";
      // Configure alignment attributes
      wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
      attributes.distance_metric = gap_affine;
      attributes.affine_penalties.match = -10;
      attributes.affine_penalties.mismatch = 4;
      attributes.affine_penalties.gap_opening = 6;
      attributes.affine_penalties.gap_extension = 2;
    
      attributes.alignment_form.span = alignment_endsfree;
      attributes.alignment_form.pattern_begin_free = strlen(pattern)-1;
      attributes.alignment_form.pattern_end_free = 0;
      attributes.alignment_form.text_begin_free = 0;
      attributes.alignment_form.text_end_free = strlen(text)-1;
      // Initialize Wavefront Aligner
      wavefront_aligner_t* const wf_aligner = wavefront_aligner_new(&attributes);
      // Align
      wavefront_align(wf_aligner,pattern,strlen(pattern),text,strlen(text));
      fprintf(stderr,"WFA-Alignment returns score %d\n",wf_aligner->cigar.score);
      // Display alignment
      fprintf(stderr,"  PATTERN  %s\n",pattern);
      fprintf(stderr,"  TEXT     %s\n",text);
      fprintf(stderr,"  SCORE (RE)COMPUTED %d\n",
          cigar_score_gap_affine(&wf_aligner->cigar,&attributes.affine_penalties));
      cigar_print_pretty(stderr,
          pattern,strlen(pattern),text,strlen(text),
          &wf_aligner->cigar,wf_aligner->mm_allocator);
      // Free
      wavefront_aligner_delete(wf_aligner);
    }
    

    The result:

    WFA-Alignment returns score 64
      PATTERN  CCCCCCCCC
      TEXT     TCTTTTTTT
      SCORE (RE)COMPUTED -40
          ALIGNMENT	8D1I1M7I
          ALIGNMENT.COMPACT	8D1I7I
          PATTERN    CCCCCCCC-C-------
                              |
          TEXT       --------TCTTTTTTT
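
    To compare the two candidate alignments by hand, the penalties of a CIGAR can be summed directly. This is a minimal sketch under the assumption that a gap of length n costs gap_opening + n * gap_extension and that free ends are not discounted; `penalties_t` and `cigar_penalty` are illustrative names and do not reproduce the library's cigar_score_gap_affine.

```c
#include <assert.h>
#include <ctype.h>

/* Illustrative penalty set (hypothetical type, mirrors the fields used above). */
typedef struct {
  int match;
  int mismatch;
  int gap_opening;
  int gap_extension;
} penalties_t;

/* Sum the penalties of a run-length CIGAR such as "8D1I1M7I".
 * Each I/D run is charged gap_opening + length * gap_extension, so two
 * adjacent runs of the same gap type would be charged two openings. */
static int cigar_penalty(const char* cigar, penalties_t p) {
  int total = 0;
  while (*cigar != '\0') {
    int length = 0;
    while (isdigit((unsigned char)*cigar)) {
      length = length * 10 + (*cigar - '0');
      ++cigar;
    }
    const char op = *cigar;
    assert(op != '\0' && "CIGAR ends with a bare number");
    ++cigar;
    switch (op) {
      case 'M': total += length * p.match; break;
      case 'X': total += length * p.mismatch; break;
      case 'I':
      case 'D': total += p.gap_opening + length * p.gap_extension; break;
      default: assert(0 && "unexpected CIGAR operation");
    }
  }
  return total;
}
```

    With match = -10, mismatch = 4, gap_opening = 6, and gap_extension = 2, the sketch gives 40 for "8D1I1M7I" (the magnitude of the recomputed -40 above) and 34 for "7D1X1M7I", so the expected alignment is indeed cheaper under this model.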
    
    opened by wongs2 4
  • how to use extension

    Hi, I'm using the code below to evaluate wfa-lib for extending from a given seed. Based on the docs, I thought I would have to extend left and then right in two separate calls, but that does not seem to be the case.

    In brief, given:

     
       //              0123456789012345678901234567890
       char *p =      "AAAAAAACCGGTTGGAAAAAAAAAAAAAAAAAA";
       //         0123456788901234
       char *t = "AAAAAAAAAAAACCGGTTGGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA";
    
                 int pattern_begin_free = 7;
                 int pattern_end_free = 0;
                 int text_begin_free = 11;
                 int text_end_free =0;
    

    I get the full alignment:

    
          PATTERN    -----AAAAAAACCGGTTGGAAAAAAAAAAAAAAAAAA---------------
                          |||||||||||||||||||||||||||||||||               
          TEXT       AAAAAAAAAAAACCGGTTGGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    
    

    So, for seed-and-extend, is something like this all that is needed, and will it indeed extend to the right as well?

    Also, is there an efficient way to get the start and end of the alignment in the pattern and text? I don't need the full CIGAR or backtrace, only the extents of the alignment.

    thanks in advance.

    here is the code I use for testing.

    #include <stdio.h>   // stderr
    #include <string.h>  // strlen
    #include "wavefront/wavefront_align.h"
    
    
    void wf_align(char *pattern, char *text, distance_metric_t distance_metric,
                 int match, /* 0 or less */
                 int mismatch, /* 1 or more */ 
                 int gap_open, /* 1 or more */
                 int gap_extend, /* 1 or more */
                 int pattern_begin_free,
                 int pattern_end_free,
                 int text_begin_free,
                 int text_end_free
       ) {
    
        wavefront_aligner_attr_t attr = wavefront_aligner_attr_default;
        attr.distance_metric = distance_metric; // use the caller's metric (main passes gap_affine)
        attr.affine_penalties.match = match;
        attr.affine_penalties.mismatch = mismatch;
        attr.affine_penalties.gap_opening = gap_open;
        attr.affine_penalties.gap_extension = gap_extend;
        attr.alignment_scope = compute_alignment;
    
        attr.heuristic.strategy = wf_heuristic_wfadaptive;
        attr.heuristic.min_wavefront_length = 10;
        attr.heuristic.max_distance_threshold = 50;
        attr.heuristic.steps_between_cutoffs = 1;
       
        attr.alignment_form.span = alignment_endsfree;
        // right extension
        attr.alignment_form.pattern_begin_free = pattern_begin_free;
        attr.alignment_form.pattern_end_free = pattern_end_free;
        attr.alignment_form.text_begin_free = text_begin_free;
        attr.alignment_form.text_end_free = text_end_free;
    
        // left extension
        /*
        attr.alignment_form.pattern_begin_free = pattern_begin_free;
        attr.alignment_form.pattern_end_free = 0;
        attr.alignment_form.text_begin_free = text_begin_free;
        attr.alignment_form.text_end_free = 0;
        */
    
        wavefront_aligner_t* const wf_aligner = wavefront_aligner_new(&attr);
    
        wavefront_align(wf_aligner, pattern, strlen(pattern), text, strlen(text));
        cigar_print_pretty(stderr,
          pattern,strlen(pattern),text,strlen(text),
          &wf_aligner->cigar,wf_aligner->mm_allocator);
    
        wavefront_aligner_delete(wf_aligner);
    }
    
    int main() {
       //              0123456789012345678901234567890
       char *p =      "AAAAAAACCGGTTGGAAAAAAAAAAAAAAAAAA";
       //         0123456788901234
       char *t = "AAAAAAAAAAAACCGGTTGGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA";
    
       wf_align(p, t, gap_affine, 0, 2, 2, 1, 7, 0, 11, 0);
    
    }
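
    On the question of getting only the start and end of the alignment: one option is to trim the free-end gaps from the alignment operations. The sketch below assumes the operations are available as a string with one character per column (M, X, I, D), where I consumes a text character and D consumes a pattern character, as in the pretty-printed output above; `extents_t` and `alignment_extents` are hypothetical helpers, and the approach still needs the full alignment rather than a score-only run.

```c
#include <assert.h>
#include <string.h>

/* Half-open ranges [begin, end) covered by the aligned core (hypothetical type). */
typedef struct {
  int pattern_begin, pattern_end;
  int text_begin, text_end;
} extents_t;

/* Advance pattern/text positions for one alignment column. */
static void consume_op(char op, int* pattern_pos, int* text_pos) {
  switch (op) {
    case 'M':
    case 'X': ++*pattern_pos; ++*text_pos; break;
    case 'I': ++*text_pos; break;     /* text character against a gap */
    case 'D': ++*pattern_pos; break;  /* pattern character against a gap */
    default: assert(0 && "unexpected alignment operation");
  }
}

static int is_gap(char op) { return op == 'I' || op == 'D'; }

/* Skip leading and trailing gap columns and report what remains. */
static extents_t alignment_extents(const char* ops) {
  const int n = (int)strlen(ops);
  int first = 0, last = n;
  while (first < n && is_gap(ops[first])) ++first;
  while (last > first && is_gap(ops[last - 1])) --last;
  extents_t e;
  int p = 0, t = 0;
  for (int i = 0; i < first; ++i) consume_op(ops[i], &p, &t);
  e.pattern_begin = p;
  e.text_begin = t;
  for (int i = first; i < last; ++i) consume_op(ops[i], &p, &t);
  e.pattern_end = p;
  e.text_end = t;
  return e;
}
```

    For the alignment shown above (5 I, 33 M, 15 I) this yields pattern [0, 33) and text [5, 38), as half-open ranges.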
    
    
    opened by brentp 3
  • left shift of negative value

    If you compile the latest https://github.com/armintoepfer/aligner-testbed with gcc 10.3.0 with ASan and UBSan enabled, the following runtime errors are reported:

    $ meson -Db_sanitize=address,undefined
    $ ninja
    $ ./at ../data/clr1.txt --rounds 1 --log-level INFO --miniwfa=false --ksw2=false
    | 20220421 10:31:22.817 | INFO | Number of sequence pairs : 1
    ../subprojects/wfa/wavefront/wavefront_extend.c:116:20: runtime error: load of misaligned address 0x7f567e1c877e for type 'uint64_t', which requires 8 byte alignment
    0x7f567e1c877e: note: pointer points here
     3f 3f 3f 3f 41 54  43 54 43 54 43 54 43 41  41 43 41 41 43 41 43 43  43 41 43 47 54 41 47 47  41 47
                 ^
    ../subprojects/wfa/wavefront/wavefront_extend.c:116:38: runtime error: load of misaligned address 0x7f567e1d6462 for type 'uint64_t', which requires 8 byte alignment
    0x7f567e1d6462: note: pointer points here
     21 21  21 21 41 41 54 43 54 43  41 43 47 43 54 43 41 41  43 43 41 43 41 41 43 41  41 43 47 47 41 47
                  ^
    ../subprojects/wfa/wavefront/wavefront_extend.c:124:13: runtime error: load of misaligned address 0x7f567e1c87b3 for type 'uint64_t', which requires 8 byte alignment
    0x7f567e1c87b3: note: pointer points here
     54  43 54 43 54 43 41 41 41  43 43 41 43 41 41 43 41  41 43 47 47 41 47 47 41  47 47 41 47 47 41 41
                  ^
    ../subprojects/wfa/wavefront/wavefront_extend.c:124:31: runtime error: load of misaligned address 0x7f567e1d649d for type 'uint64_t', which requires 8 byte alignment
    0x7f567e1d649d: note: pointer points here
     54 43 54 43 41 54 43  41 41 43 41 41 43 47 41  41 43 41 41 43 47 47 41  47 47 41 47 47 41 47 47  41
                 ^
    ../subprojects/wfa/wavefront/wavefront_backtrace.c:217:12: runtime error: left shift of negative value -1073741823
    ../subprojects/wfa/wavefront/wavefront_backtrace.c:186:12: runtime error: left shift of negative value -1073741824
    ../subprojects/wfa/wavefront/wavefront_backtrace.c:245:12: runtime error: left shift of negative value -1073741822
    | 20220421 10:31:37.533 | INFO | WFA2 C    : 0 / 14s 715ms
    

    This is on an x86 AMD EPYC 7702, built without -march.
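
    Both classes of report have a standard, portable remedy in C. This is a generic sketch, not the library's code or its eventual fix: an unaligned 8-byte load can go through memcpy, and a left shift of a possibly negative value can be done on the unsigned representation. The helper names are made up for illustration, and the conversion back to int32_t is implementation-defined (it wraps on common two's complement compilers).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read 8 bytes from an arbitrarily aligned address.
 * memcpy has no alignment requirement; compilers typically lower it
 * to a single load on targets that allow unaligned access. */
static uint64_t load_u64_unaligned(const void* address) {
  uint64_t value;
  memcpy(&value, address, sizeof value);
  return value;
}

/* Shift a signed 32-bit value left without shifting a negative operand:
 * the shift is performed on the unsigned bit pattern instead. */
static int32_t shift_left_i32(int32_t value, unsigned bits) {
  assert(bits < 32);
  return (int32_t)((uint32_t)value << bits);
}
```

    A comparison loop that reads sequence blocks as uint64_t could call `load_u64_unaligned` in place of dereferencing a cast pointer.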

    opened by armintoepfer 1
  • EXC_BAD_ACCESS

    I'm running into an issue that I can't reproduce if I just give it one sequence pair...

    wfa::WFAlignerGapAffine2Pieces aligner(4, 4, 2, 24, 1, wfa::WFAligner::Alignment, wfa::WFAligner::MemoryHigh);
    
    * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
        frame #0: 0x000000010026c28b libwfa.2.1.0.dylib`wavefronts_backtrace_del2_ext(wf_aligner=0x0000000128808010, score=22455, k=-34) at wavefront_backtrace.c:184:20
       181 	  if (score < 0) return WAVEFRONT_OFFSET_NULL;
       182 	  wavefront_t* const d2wavefront = wf_aligner->wf_components.d2wavefronts[score];
       183 	  if (d2wavefront != NULL &&
    -> 184 	      d2wavefront->lo <= k+1 &&
       185 	      k+1 <= d2wavefront->hi) {
       186 	    return BACKTRACE_PIGGYBACK_SET(d2wavefront->offsets[k+1],backtrace_D2_ext);
       187 	  } else {
    

    ASAN/UBSAN gives something else...

    ../subprojects/wfa/wavefront/wavefront_extend.c:116:20: runtime error: load of misaligned address 0x00010ce54c33 for type 'uint64_t', which requires 8 byte alignment
    0x00010ce54c33: note: pointer points here
     3f  3f 3f 3f 54 47 43 43 54  47 54 43 41 47 47 47 54  43 43 54 47 54 54 47 47  41 41 47 47 47 43 54
                  ^
    ../subprojects/wfa/wavefront/wavefront_extend.c:116:38: runtime error: load of misaligned address 0x00010ce5784a for type 'uint64_t', which requires 8 byte alignment
    0x00010ce5784a: note: pointer points here
     21 21  21 21 54 54 47 43 43 54  47 54 43 41 47 47 47 54  43 43 54 47 54 47 47 41  41 47 47 47 43 41
                  ^
    ../subprojects/wfa/wavefront/wavefront_extend.c:124:13: runtime error: load of misaligned address 0x00010d82ff8a for type 'uint64_t', which requires 8 byte alignment
    0x00010d82ff8a: note: pointer points here
     47 54  43 41 47 47 47 54 43 43  54 47 54 47 47 41 41 47  47 47 43 54 47 54 41 41  54 41 47 41 47 47
                  ^
    ../subprojects/wfa/wavefront/wavefront_extend.c:124:31: runtime error: load of misaligned address 0x00010d8305e4 for type 'uint64_t', which requires 8 byte alignment
    0x00010d8305e4: note: pointer points here
      47 54 43 41 47 47 47 54  43 43 54 47 54 47 47 41  41 47 47 47 43 41 54 54  54 43 41 54 41 47 47 47
    

    You can try to reproduce with https://github.com/armintoepfer/clr-align-challenge and then

    lldb -- ./cas ../data/long.txt
    
    opened by armintoepfer 12
  • Speed up WFA when two sequences differ greatly in length

    When patching gaps between two anchors, we sometimes need to align two sequences of vastly different lengths. Suppose we are aligning a 10bp sequence against a 100kb sequence. The active band [lo,hi] in the original WFA will grow from 1 to 100010 in size. The total number of iterations is about 100010^2/2. Based on the running time of WFA2-lib, I guess WFA2-lib behaves similarly. This is not necessary given that a wavefront can only consist of 10 cells in this example.
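
    The gap between the two behaviors can be illustrated with a toy count. This sketch assumes the band grows by one cell per step, optionally capped at a maximum width; `band_cells` is an illustrative name and the model is far simpler than either WFA2-lib or miniwfa.

```c
#include <assert.h>

/* Total cells visited over `steps` steps when the band holds
 * min(step, max_width) cells at each step (assumed toy model). */
static long long band_cells(long long steps, long long max_width) {
  assert(steps >= 0 && max_width >= 0);
  long long total = 0;
  for (long long step = 1; step <= steps; ++step) {
    total += (step < max_width) ? step : max_width;
  }
  return total;
}
```

    With 100010 steps, the uncapped band visits 5001050055 cells (about 100010^2/2), while a cap of 10 cells gives 1000055.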

    Generally, a key observation is that the active band is determined by the cells in the current stripe (the wfalm terminology). Sometimes we can decrease "hi" or increase "lo" when the wavefront hits the end of the query or the target. I added a hack to miniwfa to achieve that. For 10bp vs 100kb, WFA2-lib takes 44 sec while the modified miniwfa takes 0.2 sec. ksw2 only takes 0.01s mainly because WFA doesn't have an advantage on such examples and partly because my hack is inefficient. I think there should be a faster and cleaner solution but I haven't found that.

    bug feature discussion 
    opened by lh3 3
Owner
Santiago Marco-Sola
Lite.AI 🚀🚀🌟 is a user-friendly C++ lib for awesome🔥🔥🔥 AI models based on onnxruntime, ncnn or mnn. YOLOX, YoloV5, YoloV4, DeepLabV3, ArcFace, CosFace, Colorization, SSD

Lite.AI is a user-friendly C++ lib for awesome AI models based on onnxruntime, ncnn or mnn. YOLOX, YoloV5, YoloV4, DeepLabV3, ArcFace, CosFace, Colorization, SSD, etc.

Def++ 2k Aug 9, 2022
Lite.AI 🚀🚀🌟 is a user friendly C++ lib of 60+ awesome AI models. YOLOX🔥, YoloV5🔥, YoloV4🔥, DeepLabV3🔥, ArcFace🔥, CosFace🔥, RetinaFace🔥, SSD🔥, etc.

Lite.AI Introduction. Lite.AI is a simple and user-friendly C++ library of awesome AI models. It's a collection of personal

Def++ 1.9k Aug 3, 2022
copc-lib provides an easy-to-use interface for reading and creating Cloud Optimized Point Clouds

copc-lib copc-lib is a library which provides an easy-to-use reader and writer interface for COPC point clouds. This project provides a complete inter

Rock Robotic 18 Jun 15, 2022
JBDL: A JAX-Based Body Dynamics Algorithm Library for Robotics

JBDL: A JAX-Based Body Dynamics Algorithm Library for Robotics

Tencent Robotics X 19 Aug 1, 2022
An Efficient Implementation of Analytic Mesh Algorithm for 3D Iso-surface Extraction from Neural Networks

AnalyticMesh Analytic Marching is an exact meshing solution from neural networks. Compared to standard methods, it completely avoids geometric and top

Jiabao Lei 39 Jul 12, 2022
deploy yolox algorithm use deepstream

YOLOX (Megvii-BaseDetection) Deploy DeepStream. This project is based on https://github.com/Megvii-BaseDetection/YOLOX and https://zhuanlan.zhihu.com/

null 75 Jul 14, 2022
A C++ implementation of the MNN correction algorithm

C++ library for MNN correction Overview This library provides functionality for batch correction of arbitrary data via the use of mutual nearest neigh

Aaron Lun 1 Nov 24, 2021
The optical flow algorithm RAFT implemented with C++(Libtorch+TensorRT)

RAFT_CPP Attention: there are some bugs here that produce wrong output (the estimated optical flow values are inaccurate; a fix is in progress). Quick Start 0. Export RAFT onnx model: first load the trained model weights: pars

ChenJianqu 15 May 20, 2022
Compress life's valuable information using Huffman Coding algorithm!

Super Duper Compressor Compress and decompress files lossless using this amazing tool! No more spending your hard earned money to buy a brand new IBM

Matthew Ng 4 Mar 27, 2022
The dgSPARSE Library (Deep Graph Sparse Library) is a high performance library for sparse kernel acceleration on GPUs based on CUDA.

dgSPARSE Library Introduction The dgSPARSE Library (Deep Graph Sparse Library) is a high performance library for sparse kernel acceleration on GPUs bas

dgSPARSE 52 Jul 28, 2022
C-based/Cached/Core Computer Vision Library, A Modern Computer Vision Library

Build Status Travis CI VM: Linux x64: Raspberry Pi 3: Jetson TX2: Backstory I set to build ccv with a minimalism inspiration. That was back in 2010, o

Liu Liu 6.9k Jul 30, 2022
Edge ML Library - High-performance Compute Library for On-device Machine Learning Inference

Edge ML Library (EMLL) offers optimized basic routines like general matrix multiplications (GEMM) and quantizations, to speed up machine learning (ML) inference on ARM-based devices. EMLL supports fp32, fp16 and int8 data types. EMLL accelerates on-device NMT, ASR and OCR engines of Youdao, Inc.

NetEase Youdao 176 Jul 21, 2022
The Robotics Library (RL) is a self-contained C++ library for rigid body kinematics and dynamics, motion planning, and control.

Robotics Library The Robotics Library (RL) is a self-contained C++ library for rigid body kinematics and dynamics, motion planning, and control. It co

Robotics Library 592 Jul 29, 2022
A GPU (CUDA) based Artificial Neural Network library

Updates - 05/10/2017: Added a new example The program "image_generator" is located in the "/src/examples" subdirectory and was submitted by Ben Bogart

Daniel Frenzel 91 Jun 13, 2022
Header-only library for using Keras models in C++.

frugally-deep Use Keras models in C++ with ease Table of contents Introduction Usage Performance Requirements and Installation FAQ Introduction Would

Tobias Hermann 882 Aug 7, 2022
simple neural network library in ANSI C

Genann Genann is a minimal, well-tested library for training and using feedforward artificial neural networks (ANN) in C. Its primary focus is on bein

Lewis Van Winkle 1.3k Aug 4, 2022
oneAPI Deep Neural Network Library (oneDNN)

oneAPI Deep Neural Network Library (oneDNN) This software was previously known as Intel(R) Math Kernel Library for Deep Neural Networks (Intel(R) MKL-

oneAPI-SRC 2.9k Aug 8, 2022
A lightweight C library for artificial neural networks

Getting Started # acquire source code and compile git clone https://github.com/attractivechaos/kann cd kann; make # learn unsigned addition (30000 sam

Attractive Chaos 608 Aug 4, 2022