On Fri, Nov 11, 2011 at 9:10 PM, Andreas Perstinger <andreas.perstin...@gmx.net> wrote: > On 2011-11-11 05:14, lina wrote: >> >> def LongestCommonSubstring(S1, S2): >> M = [[0]*(1+len(S2)) for i in xrange(1+len(S1))] ## creat 4*5 matrix >> longest, x_longest = 0, 0 >> for x in xrange(1,1+len(S1)): ## read each row >> for y in xrange(1,1+len(S2)): ## read each coloumn >> if S1[x-1] == S2[y-1]: >> M[x][y] = M[x-1][y-1]+1 >> if M[x][y]> longest: >> longest = M[x][y] >> x_longest = x >> else: >> M[x][y] = 0 >> return S1[x_longest-longest:x_longest] > > That's still not the right version. > > If you compare your version to the one at wikibooks ( > http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Longest_common_substring#Python > ), you'll see that the else-branch is wrongly indented (one level too deep). > It belongs to the first if-comparison: > > if S1 ... > M[x][y] ... > if M[x][y] ... > ... > else: ...
Thanks, I was so careless and most important failure to give a deep understanding. > > >> if __name__=="__main__": >> >> a=open("atom-pair_4.txt","r").readline().strip() >> >> b=open("atom-pair_8.txt","r").readline().strip() >> >> >> print(LongestCommonSubstring(LongestCommonSubstring(a,a),LongestCommonSubstring(b,b))) > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^ > ??? What do you try to accomplish here ??? > You call "LongestCommonSubstring" with identical strings, thus the result > must be the same string. Yes, the results is the same string, actually I want to remove the distractions of "," and as you noticed, the blanks and single quotes between numbers and the square brackets at the two terminal of the line. a=open("atom-pair_4.txt","r").readline().strip() read so many unnecessary and I have a trouble of removing those distractions. > Why not > > print(LongestCommonSubstring(a, b)) > > as you did the line below with "c" and "d"? > > Further more I think that your problems start with bad data files. In every > file there is just one very long line which looks like a string > representation of a list of two-digits strings. This complicates further > processing because you have to deal with all the unnecessary commas, blanks > and single quotes between your numbers and the square brackets at the > beginning and the end of the line. > > >> $ python3 LongestCommonSubstring.py >> 2189 >> [' >> ['82'] >> >> The results are wrong. >> c, d are the string from file atom-pair_4,txt, exactly the same as a, >> d is the same as b. >> >> and even for (c,d) results are not correct, visually we can see some >> similar groups, not mention the longest groups. > > And even if you use the correct function from wikibooks I can anticipate > another problem :-) > The implementation from wikibooks just returns the first common substring > which it finds in the first string: > >>>> WikibooksLongestCommonSubstring("ABAB","BABA") > 'ABA' >>>> WikibooksLongestCommonSubstring("BABA", "ABAB") > 'BAB' > > If there are more possible substrings with the same length (as in the > example above) only the first one is returned. You are right, I did not think of this parts before. and actually the initiative wish was to find possible paths, I mean, possible substrings, all possible substrings. not the longest one, but at least bigger than 3. > > But in your example there are at least two different pathways (I've found > three) which have the same length, as changing the order of the parameters > will show you: > >>>> WikibooksLongestCommonSubstring(c, d) > ['61', '70', '61'] >>>> WikibooksLongestCommonSubstring(d, c) > ['83', '61', '83'] > > Bye, Andreas Thanks for your time, Best regards, > > _______________________________________________ > Tutor maillist - Tutor@python.org > To unsubscribe or change subscription options: > http://mail.python.org/mailman/listinfo/tutor > _______________________________________________ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor