Exact(4)
In the prefix phase, a pair of reads is reported if the prefix of one of the reads exactly matches a substring of the other read at the given seed length.
For i = 6, we want to find the longest common substring starting at SA[6] = 10 (marked by an arrow) that exactly matches a substring starting at some position in the other sequence.
In the example below, for position i = 4 in S1 and with k = 2 mismatches, our approach would return the following k-mismatch common substring, starting at position j = 2 in S2: To obtain this k-mismatch common substring, our program would first determine the longest common substring for position i = 4 in S1 that exactly matches a substring in S2.
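The sentence above only describes the first step (finding the longest exact match). As a minimal sketch of how such a match could then be turned into a k-mismatch common substring, the function below extends a gap-free alignment from a given pair of start positions until the next mismatch would exceed k. The function name `extend_with_mismatches`, the 0-based indexing, and the test strings are assumptions made for illustration; they are not taken from the source.

```python
def extend_with_mismatches(s1: str, s2: str, i: int, j: int, k: int) -> int:
    """Length of the gap-free match starting at s1[i] and s2[j]
    that contains at most k mismatches (0-based positions)."""
    length = 0
    mismatches = 0
    while i + length < len(s1) and j + length < len(s2):
        if s1[i + length] != s2[j + length]:
            if mismatches == k:
                break  # the next mismatch would be number k + 1
            mismatches += 1
        length += 1
    return length


# Hypothetical sequences, chosen only to show the effect of k.
print(extend_with_mismatches("ACGTACGT", "ACGAACCT", 0, 0, 0))  # 3
print(extend_with_mismatches("ACGTACGT", "ACGAACCT", 0, 0, 1))  # 6
print(extend_with_mismatches("ACGTACGT", "ACGAACCT", 0, 0, 2))  # 8
```

With k = 0 the result is simply the length of the exact common prefix of the two suffixes, so the exact-match step is the special case of this extension.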
As explained in Section 2.2, for each position i in one sequence, kmacs searches for the longest substring starting at i that matches a substring in the second sequence.
Similar(56)
To do so, we first calculate for each position i in S1 the length $s_1(i)$ of the longest common substring starting at i matching a substring of S2, as is done in ACS.
Consider, e.g., position i = 2 in the first sequence of the above example. Here, the substring AT starting at position 2 in S1 is the longest substring starting at this position and matching a substring of S2, but this substring occurs at positions 1, 5 and 10 in S2.
Formally, the length of the longest substring starting at a position SA[i] and matching a substring of the respective other sequence is given as follows:
\[ s(\mathrm{SA}[i]) = \max\left( \min_{p_1(i) < x \le i} \mathrm{LCP}[x],\; \min_{i < y \le p_2(i)} \mathrm{LCP}[y] \right) \tag{4} \]
with $p_1$ and $p_2$ defined as above.
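The formula can be illustrated with a small, deliberately naive sketch. It assumes that both sequences are concatenated into one string with separator characters `#` and `$` that occur in neither sequence, that LCP[x] is the length of the common prefix of the suffixes at ranks x-1 and x, and that $p_2(i)$ is the smallest index greater than i whose suffix belongs to the other sequence (the mirror image of $p_1(i)$). The function names and the quadratic suffix sorting are choices made for this illustration, and positions are 0-based.

```python
def suffix_array(s: str) -> list[int]:
    # Naive construction by sorting suffixes; fine for short examples only.
    return sorted(range(len(s)), key=lambda p: s[p:])


def lcp_array(s: str, sa: list[int]) -> list[int]:
    # lcp[x] = length of the common prefix of suffixes sa[x-1] and sa[x].
    lcp = [0] * len(sa)
    for x in range(1, len(sa)):
        a, b, n = sa[x - 1], sa[x], 0
        while a + n < len(s) and b + n < len(s) and s[a + n] == s[b + n]:
            n += 1
        lcp[x] = n
    return lcp


def longest_match_lengths(s1: str, s2: str) -> tuple[list[int], list[int]]:
    """For every position of s1 (and of s2), the length of the longest
    substring starting there that also occurs in the other sequence."""
    s = s1 + "#" + s2 + "$"  # assumes '#' and '$' occur in neither input
    n1, n = len(s1), len(s)
    sa = suffix_array(s)
    lcp = lcp_array(s, sa)

    def owner(pos: int) -> int:
        if pos < n1:
            return 1
        if n1 < pos < n - 1:
            return 2
        return 0  # separator positions

    res1, res2 = [0] * len(s1), [0] * len(s2)
    for i in range(n):
        me = owner(sa[i])
        if me == 0:
            continue
        other = 3 - me
        up, cur = 0, n
        for x in range(i, 0, -1):  # min over p1(i) < x <= i
            cur = min(cur, lcp[x])
            if owner(sa[x - 1]) == other:
                up = cur
                break
        down, cur = 0, n
        for y in range(i + 1, n):  # min over i < y <= p2(i)
            cur = min(cur, lcp[y])
            if owner(sa[y]) == other:
                down = cur
                break
        best = max(up, down)
        if me == 1:
            res1[sa[i]] = best
        else:
            res2[sa[i] - n1 - 1] = best
    return res1, res2


print(longest_match_lengths("banana", "ananas"))
# ([0, 5, 4, 3, 2, 1], [5, 4, 3, 2, 1, 0])
```

The two inner loops correspond to the two minima in the formula: each one walks away from rank i, keeps the running minimum of the LCP values it passes, and stops at the first suffix that belongs to the other sequence.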
For a single sequence S and a position SA[i] in S, the enhanced suffix array of S can be used to find the length of the longest substring in S starting at a different position in S and matching a substring starting at SA[i].
To find the length of the longest substring starting at SA[i] in one sequence, matching a substring of the other sequence, and its occurrences there, we need to look up the largest integer $p_1(i)$ with $p_1(i) < i$, such that $\mathrm{SA}[p_1(i)]$ belongs to the other sequence.
To find possible additional matching positions, we consider all indices $p \le p_1(i)$ in descending order, as long as the following inequality holds:
\[ \mathrm{LCP}[p + 1] \ge \min_{p_1(i) < x \le i} \mathrm{LCP}[x] \]
For all such p that belong to the other sequence, the positions SA[p] are occurrences of longest substrings matching a substring starting at SA[i].
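The scan for additional occurrences can be sketched in the same naive style. The function below is an illustration under stated assumptions: the sequences are joined with separators `#` and `$` that occur in neither of them, positions are 0-based, and the scan is applied symmetrically in both directions from rank i (the paragraph above describes only the direction towards smaller indices). The name `longest_match_occurrences` is hypothetical.

```python
def longest_match_occurrences(s1: str, s2: str, i1: int) -> tuple[int, list[int]]:
    """Length of the longest substring starting at s1[i1] that occurs in s2,
    together with all of its start positions in s2 (0-based)."""
    s = s1 + "#" + s2 + "$"  # assumes '#' and '$' occur in neither input
    n1, n = len(s1), len(s)
    sa = sorted(range(n), key=lambda p: s[p:])  # naive suffix array
    lcp = [0] * n  # lcp[x]: common prefix length of suffixes sa[x-1], sa[x]
    for x in range(1, n):
        a, b, m = sa[x - 1], sa[x], 0
        while a + m < n and b + m < n and s[a + m] == s[b + m]:
            m += 1
        lcp[x] = m
    i = sa.index(i1)  # rank of the query suffix

    def scan(step: int) -> tuple[int, list[int]]:
        cur, length, hits, p = n, 0, [], i
        while 0 <= p + step < n:
            # LCP entry between rank p and the neighbouring rank p + step
            cur = min(cur, lcp[p if step == -1 else p + 1])
            if cur == 0 or (hits and cur < length):
                break  # the shared prefix became shorter than the match
            p += step
            if n1 < sa[p] < n - 1:  # suffix belongs to s2
                if not hits:
                    length = cur  # first hit fixes the match length
                hits.append(sa[p] - n1 - 1)
        return length, hits

    up_len, up_hits = scan(-1)
    down_len, down_hits = scan(+1)
    best = max(up_len, down_len)
    positions = []
    if best > 0:
        if up_len == best:
            positions += up_hits
        if down_len == best:
            positions += down_hits
    return best, sorted(positions)


# Hypothetical sequences: 'AT' is the longest match for s1[1:], found 3 times.
print(longest_match_occurrences("GATC", "ATTATGAT", 1))  # (2, [0, 3, 6])
```

Stopping as soon as the running minimum drops below the match length mirrors the inequality above: every suffix passed before that point still shares the full match with the suffix at rank i.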
According to Equation (4), we get the following:
\[ s(\mathrm{SA}[6]) = \max\{\min\{5, 3\}, \min\{1, 0\}\} = \max\{3, 0\} = 3. \]
Position 10 in S corresponds to position 3 in the original sequence S2, so we obtain $s_2(3) = 3$, i.e. the longest substring starting at position 3 in S2 that matches a substring of S1 has length 3 (the substring itself is 'ana').