Re: [Tutor] longest common substring

Andreas Perstinger Fri, 11 Nov 2011 05:12:46 -0800

On 2011-11-11 05:14, lina wrote:

def LongestCommonSubstring(S1, S2):
     M = [[0]*(1+len(S2)) for i in xrange(1+len(S1))] ## creat 4*5 matrix
     longest, x_longest = 0, 0
     for x in xrange(1,1+len(S1)):                 ## read each row
         for y in xrange(1,1+len(S2)):             ## read each coloumn
             if S1[x-1] == S2[y-1]:
                 M[x][y] = M[x-1][y-1]+1
                 if M[x][y]>  longest:
                     longest = M[x][y]
                     x_longest = x
                 else:
                     M[x][y] = 0
     return S1[x_longest-longest:x_longest]


That's still not the right version.

If you compare your version to the one at wikibooks (http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Longest_common_substring#Python), you'll see that the else-branch is wrongly indented (one level toodeep). It belongs to the first if-comparison:


if S1 ...
     M[x][y] ...
     if M[x][y] ...
        ...
else: ...

if __name__=="__main__":

     a=open("atom-pair_4.txt","r").readline().strip()

     b=open("atom-pair_8.txt","r").readline().strip()

     
print(LongestCommonSubstring(LongestCommonSubstring(a,a),LongestCommonSubstring(b,b)))

                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
??? What do you try to accomplish here ???

You call "LongestCommonSubstring" with identical strings, thus theresult must be the same string.

Why not

print(LongestCommonSubstring(a, b))

as you did the line below with "c" and "d"?

Further more I think that your problems start with bad data files. Inevery file there is just one very long line which looks like a stringrepresentation of a list of two-digits strings. This complicates furtherprocessing because you have to deal with all the unnecessary commas,blanks and single quotes between your numbers and the square brackets atthe beginning and the end of the line.

  $ python3 LongestCommonSubstring.py
2189
['
['82']

The results are wrong.
c, d are the string from file atom-pair_4,txt, exactly the same as a,
d is the same as b.

and even for (c,d) results are not correct, visually we can see some
similar groups, not mention the longest groups.

And even if you use the correct function from wikibooks I can anticipateanother problem :-)The implementation from wikibooks just returns the first commonsubstring which it finds in the first string:


>>> WikibooksLongestCommonSubstring("ABAB","BABA")
'ABA'
>>> WikibooksLongestCommonSubstring("BABA", "ABAB")
'BAB'

If there are more possible substrings with the same length (as in theexample above) only the first one is returned.

But in your example there are at least two different pathways (I'vefound three) which have the same length, as changing the order of theparameters will show you:


>>> WikibooksLongestCommonSubstring(c, d)
['61', '70', '61']
>>> WikibooksLongestCommonSubstring(d, c)
['83', '61', '83']

Bye, Andreas

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor

Re: [Tutor] longest common substring

Reply via email to