Longest Common Substring in Python

The longest common substring is a fundamental problem in strings. It attempts to locate the most extended continuous character sequence in all input strings given two or more strings. Numerous applications of this issue exist in bioinformatics, plagiarism detection, text analysis, and other fields.

Dynamic programming is widely used to solve the longest common substring issue efficiently. The method entails building a 2D table to hold the lengths of common suffixes for every pair of substrings that may be formed from the input strings. The table may be updated and characters compared to determine the longest common substring.

Each character in the strings is iterated over by the algorithm, which then fills the table appropriately. It records the longest common substring seen thus far, its maximum length, and its final location. Using this knowledge, the longest common substring may be recovered once all characters have been processed.

The dynamic programming technique has an O(m * n) time complexity, where m and n are the lengths of the input strings. Additionally, the space complexity is O(m * n).

Example1:

Let String1= aabbcdeef

       String2= abbcdefgh

The longest common substring is abbcde(i.e., aabbcdeef, abbcdefgh)

Example2:

Let String1= abcdefghij

       String2= defghij

The longest common substring is defghij (i.e. abcdefghij, defghij)

Example3:

Let String1= Sweet home is the best place

       String2= It is the best car to buy 

The longest common substring: is the best (i.e., Sweet home is the best place, It is the best car to buy)

Implementation Code in Python

def longest_common_substring(str1, sub):

    # Create a 2D matrix to store the lengths of the longest common suffixes of substrings.

    m = [[0] * (1 + len(sub)) for i in range(1 + len(str1))]

    longest, x_longest = 0, 0

    for x in range(1, 1 + len(str1)):

        for y in range(1, 1 + len(sub)):

            if str1[x - 1] == sub[y - 1]:

# If the characters match, update the matrix value and check for a longer common substring.

m[x][y] = m[x - 1][y - 1] + 1

if m[x][y] > longest:

longest = m[x][y]

x_longest = x

else:

# Change the matrix value to zero if the characters don't match.

m[x][y] = 0

    # Extract the longest common substring using the ending position and the length.

common_substring = str1[x_longest - longest: x_longest]

    return common_substring

# Example usage:

str1 = "Welcome to The world"

print("The given string is:")

print(str1)

# It prints the longest common substring

sub1 = "world"

print("The first sub-string is:")

print(sub1)

print("The longest common subsequence between '", str1, "' and '", sub1, "' is:")

print(longest_common_substring(str1, sub1))

# It prints the longest common substring

sub2 = "12345"

print("The second sub-string is:")

print(sub2)

print("The longest common subsequence between '", str1, "' and '", sub2, "' is:")

print(longest_common_substring(str1, sub2))

# It prints the longest common substring

Output:

Longest Common Substring in Python

Explanation:

Two input arguments, str1 (the primary string) and sub (the sub-part to look for in str1), are needed by the method longest_common_substring.

  • To hold the lengths of the longest regular suffixes of substrings, a 2D matrix called m is built. (len(str1) + 1) x (len(sub) + 1) are its dimensions.
  • Initial values for the variables longest and x_longest are 0. The length of the longest common substring and its placement inside the main string will be tracked using these variables.
  • Each character in str1 and sub is iterated by using two nested loops.
  • It determines if each character pair is identical (a match) to each other.
  • If the characters match, it modifies the value in matrix m according to the length of the common suffix that ends at str1[x - 1] and sub[y - 1] if the characters match.
  • A longer common substring is discovered if the revised length (m[x][y]) is longer than the most extended length currently available. Longest and x_longest are consequently adjusted appropriately.
  • Because a common substring must be continuous, the matrix value m[x][y] is set to zero if the characters don't match.
  • The ending location (x_longest) and length of the longest common substring (longest) are used to separate the longest common substring from the main string str1.
  • The common_substring is the result of the function.
  • The sample shows how to use different parts of the primary string str1 to access the longest_common_substring function.
  • The longest common substring between str1 and each sub will be identified by the function and printed.