On 2011-11-11 05:14, lina wrote:
def LongestCommonSubstring(S1, S2):
     M = [[0]*(1+len(S2)) for i in xrange(1+len(S1))] ## creat 4*5 matrix
     longest, x_longest = 0, 0
     for x in xrange(1,1+len(S1)):                 ## read each row
         for y in xrange(1,1+len(S2)):             ## read each coloumn
             if S1[x-1] == S2[y-1]:
                 M[x][y] = M[x-1][y-1]+1
                 if M[x][y]>  longest:
                     longest = M[x][y]
                     x_longest = x
                 else:
                     M[x][y] = 0
     return S1[x_longest-longest:x_longest]

That's still not the right version.

If you compare your version to the one at wikibooks ( http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Longest_common_substring#Python ), you'll see that the else-branch is wrongly indented (one level too deep). It belongs to the first if-comparison:

if S1 ...
     M[x][y] ...
     if M[x][y] ...
        ...
else: ...


if __name__=="__main__":

     a=open("atom-pair_4.txt","r").readline().strip()

     b=open("atom-pair_8.txt","r").readline().strip()

     
print(LongestCommonSubstring(LongestCommonSubstring(a,a),LongestCommonSubstring(b,b)))
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
??? What do you try to accomplish here ???
You call "LongestCommonSubstring" with identical strings, thus the result must be the same string.
Why not

print(LongestCommonSubstring(a, b))

as you did the line below with "c" and "d"?

Further more I think that your problems start with bad data files. In every file there is just one very long line which looks like a string representation of a list of two-digits strings. This complicates further processing because you have to deal with all the unnecessary commas, blanks and single quotes between your numbers and the square brackets at the beginning and the end of the line.


  $ python3 LongestCommonSubstring.py
2189
['
['82']

The results are wrong.
c, d are the string from file atom-pair_4,txt, exactly the same as a,
d is the same as b.

and even for (c,d) results are not correct, visually we can see some
similar groups, not mention the longest groups.

And even if you use the correct function from wikibooks I can anticipate another problem :-) The implementation from wikibooks just returns the first common substring which it finds in the first string:

>>> WikibooksLongestCommonSubstring("ABAB","BABA")
'ABA'
>>> WikibooksLongestCommonSubstring("BABA", "ABAB")
'BAB'

If there are more possible substrings with the same length (as in the example above) only the first one is returned.

But in your example there are at least two different pathways (I've found three) which have the same length, as changing the order of the parameters will show you:

>>> WikibooksLongestCommonSubstring(c, d)
['61', '70', '61']
>>> WikibooksLongestCommonSubstring(d, c)
['83', '61', '83']

Bye, Andreas

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor

Reply via email to