This code implements **global sequence alignment** using a **dynamic programming...

September 1, 2025 at 11:55 AM

def global_align(seqA,seqB): # Initiating variables m, n = len(seqA), len(seqB) # Be careful with indexing problems, such as +-1 errors. When in doubt, print. S,trace = initiate_global_dp(m,n) # Fill in the rest of the dynamic programming matrix, and the trace for i in range(1,m+1): for j in range(1,n+1): match = S[i-1,j-1] + match_score(seqA[i-1],seqB[j-1]) delete = S[i-1,j] + match_score(seqA[i-1],'-') insert = S[i,j-1] + match_score('-',seqB[j-1]) S[i,j] = max(match,delete,insert) if match >= max(insert,delete): trace[i,j,:] = (-1,-1,0) elif delete >= insert: trace[i,j,:] = (-1,0,0) else: trace[i,j,:] = (0,-1,0) score_of_the_alignment = S[m,n] return S, trace, score_of_the_alignment

This code implements global sequence alignment using a dynamic programming (DP) approach. It is often used in bioinformatics for comparing biological sequences like DNA, RNA, or proteins. Below is a breakdown of what this code does:


1. Define Input:

The function global_align takes two sequences as input: seqA and seqB.

2. Set Up Variables:

  • m and n store the lengths of seqA and seqB respectively.
  • S is a dynamic programming matrix that stores the scores of alignments.
  • trace is a matrix to trace back the alignment decisions.

initiate_global_dp(m, n) is assumed to initialize two matrices S (score matrix) and trace. Typically:

  • S is initialized with gap penalties in the first row/column.
  • trace helps reconstruct the alignment later.

3. Dynamic Programming to Compute Scores:

The core of the function is the nested loops over i (index of seqA) and j (index of seqB).

  • For every position (i, j) in the alignment:

    • Compute three scores representing possible choices:
      • match: Align the characters seqA[i] and seqB[j].
      • delete: Insert a gap in seqB (delete a character from seqA).
      • insert: Insert a gap in seqA.
    • These scores are computed using a function match_score, which is assumed to take two characters and return a match/mismatch/gap penalty score.
  • Compute the maximum score among match, delete, and insert, and update S[i,j].


4. Traceback Matrix:

The trace matrix stores the direction of movement that led to the chosen score:

  • (-1, -1, 0) for a match or mismatch (diagonal).
  • (-1, 0, 0) for a gap in seqB (move vertically).
  • (0, -1, 0) for a gap in seqA (move horizontally).

This will later help in reconstructing the alignment.


5. Alignment Score:

The final alignment score is stored in S[m, n], which represents the optimal alignment score for the full sequences seqA and seqB.


6. Return Values:

The function returns:

  • S: The score matrix.
  • trace: The traceback matrix.
  • score_of_the_alignment: The final alignment score.

Key Assumptions:

  • initiate_global_dp(m, n) initializes S and trace.
  • match_score correctly scores matches, mismatches, and gaps.
  • Gaps are represented by '-'.

What This Code Does:

  • It calculates the optimal global sequence alignment score and traceback for two input sequences using dynamic programming.
Generate your own explanations
Download our vscode extension
Read other generated explanations

Built by @thebuilderjr
Sponsored by beam analytics
Read our terms and privacy policy
Forked from openai-quickstart-node