On 2011-11-11 05:14, lina wrote:
def LongestCommonSubstring(S1, S2):
M = [[0]*(1+len(S2)) for i in xrange(1+len(S1))] ## creat 4*5 matrix
longest, x_longest = 0, 0
for x in xrange(1,1+len(S1)): ## read each row
for y in xrange(1,1+len(S2)): ## read each coloumn
if S1[x-1] == S2[y-1]:
M[x][y] = M[x-1][y-1]+1
if M[x][y]> longest:
longest = M[x][y]
x_longest = x
else:
M[x][y] = 0
return S1[x_longest-longest:x_longest]
That's still not the right version.
If you compare your version to the one at wikibooks (
http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Longest_common_substring#Python
), you'll see that the else-branch is wrongly indented (one level too
deep). It belongs to the first if-comparison:
if S1 ...
M[x][y] ...
if M[x][y] ...
...
else: ...
if __name__=="__main__":
a=open("atom-pair_4.txt","r").readline().strip()
b=open("atom-pair_8.txt","r").readline().strip()
print(LongestCommonSubstring(LongestCommonSubstring(a,a),LongestCommonSubstring(b,b)))
^^^^^^^^^^^^^^^^^^^^^^^^^^^
??? What do you try to accomplish here ???
You call "LongestCommonSubstring" with identical strings, thus the
result must be the same string.
Why not
print(LongestCommonSubstring(a, b))
as you did the line below with "c" and "d"?
Further more I think that your problems start with bad data files. In
every file there is just one very long line which looks like a string
representation of a list of two-digits strings. This complicates further
processing because you have to deal with all the unnecessary commas,
blanks and single quotes between your numbers and the square brackets at
the beginning and the end of the line.
$ python3 LongestCommonSubstring.py
2189
['
['82']
The results are wrong.
c, d are the string from file atom-pair_4,txt, exactly the same as a,
d is the same as b.
and even for (c,d) results are not correct, visually we can see some
similar groups, not mention the longest groups.
And even if you use the correct function from wikibooks I can anticipate
another problem :-)
The implementation from wikibooks just returns the first common
substring which it finds in the first string:
>>> WikibooksLongestCommonSubstring("ABAB","BABA")
'ABA'
>>> WikibooksLongestCommonSubstring("BABA", "ABAB")
'BAB'
If there are more possible substrings with the same length (as in the
example above) only the first one is returned.
But in your example there are at least two different pathways (I've
found three) which have the same length, as changing the order of the
parameters will show you:
>>> WikibooksLongestCommonSubstring(c, d)
['61', '70', '61']
>>> WikibooksLongestCommonSubstring(d, c)
['83', '61', '83']
Bye, Andreas
_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor