lina wrote:

> On Wed, Nov 2, 2011 at 12:14 AM, Peter Otten <__pete...@web.de> wrote:
>> lina wrote:
>>
>>>> sorted(new_dictionary.items())
>>>
>>> Thanks, it works, but there is still a minor question,
>>>
>>> can I sort based on the general numerical value?
>>>
>>> namely not:
>>> :
>>> :
>>> 83ILE 1
>>> 84ALA 2
>>> 8SER 0
>>> 9GLY 0
>>> :
>>> :
>>>
>>> rather 8 9 ...83 84,
>>>
>>> Thanks,
>>
>> You need a custom key function for that one:
>>
>>>>> import re
>>>>> def gnv(s):
>> ...     parts = re.split(r"(\d+)", s)
>> ...     parts[1::2] = map(int, parts[1::2])
>> ...     return parts
>> ...
>>>>> items = [("83ILE", 1), ("84ALA", 2), ("8SER", 0), ("9GLY", 0)]
>>>>> sorted(items, key=lambda pair: (gnv(pair[0]), pair[1]))
>> [('8SER', 0), ('9GLY', 0), ('83ILE', 1), ('84ALA', 2)]
>
>
> Thanks, I can follow the procedure and get the exact results, but
> still don't understand this part
>
> parts = re.split(r'"(\d+)",s)
>
> r"(\d+)", sorry,
>
>>>> items
> [('83ILE', 1), ('84ALA', 2), ('8SER', 0), ('9GLY', 0)]
>
>
>>>> parts = re.split(r"(\d+)",items)
> Traceback (most recent call last):
>   File "<pyshell#78>", line 1, in <module>
>     parts = re.split(r"(\d+)",items)
>   File "/usr/lib/python3.2/re.py", line 183, in split
>     return _compile(pattern, flags).split(string, maxsplit)
> TypeError: expected string or buffer

I was a bit lazy and hoped you would accept the gnv() function as a black
box... Here's a step-through:

re.split() takes a pattern that describes where to split, and the string to
split. In the following example the pattern is the character "_":

>>> import re
>>> re.split("_", "alpha_beta___gamma")
['alpha', 'beta', '', '', 'gamma']

You can see that this simple form works just like
"alpha_beta___gamma".split("_"), and finds an empty string between two
adjacent "_". If you want both "_" and "___" to work as a single separator
you can change the pattern to "_+", where the "+" means one or more of the
previous:

>>> re.split("_+", "alpha_beta___gamma")
['alpha', 'beta', 'gamma']

If we want to keep the separators, we can wrap the whole expression in
parens:

>>> re.split("(_+)", "alpha_beta___gamma")
['alpha', '_', 'beta', '___', 'gamma']

Now for the step that is not quite obvious: we can change the separator so
that it matches any run of digits. Regular expressions have two ways to
spell "any digit": [0-9] or \d:

>>> re.split("([0-9]+)", "alpha1beta123gamma")
['alpha', '1', 'beta', '123', 'gamma']

I chose the other, \d (which will also accept non-ASCII digits):

>>> re.split(r"(\d+)", "alpha1beta123gamma")
['alpha', '1', 'beta', '123', 'gamma']

At this point we are sure that the list contains a sequence of
non-integer str, integer str, ..., non-integer str, the first and the last
always being a (possibly empty) non-integer str.

>>> parts = re.split(r"(\d+)", "alpha1beta123gamma")

So

>>> parts[1::2]
['1', '123']

will always give us the parts that can be converted to an integer:

>>> parts
['alpha', '1', 'beta', '123', 'gamma']
>>> parts[1::2] = map(int, parts[1::2])
>>> parts
['alpha', 1, 'beta', 123, 'gamma']

We need to do the conversion because strings won't sort the way we like:

>>> sorted(["2", "20", "10"])
['10', '2', '20']
>>> sorted(["2", "20", "10"], key=int)
['2', '10', '20']

We now have the complete gnv() function

>>> def gnv(s):
...     parts = re.split(r"(\d+)", s)
...     parts[1::2] = map(int, parts[1::2])
...     return parts
...
and can successfully sort a simple list of strings like

>>> values = ["83ILE", "84ALA", "8SER", "9GLY"]
>>> sorted(values, key=gnv)
['8SER', '9GLY', '83ILE', '84ALA']

The sorted() function calls gnv() internally for every item in the list and
uses the results to determine the order of the items. Back when
sorted()/list.sort() did not yet have the key argument you had to do this
manually with "decorate sort undecorate":

>>> decorated = [(gnv(item), item) for item in values]
>>> decorated
[(['', 83, 'ILE'], '83ILE'), (['', 84, 'ALA'], '84ALA'), (['', 8, 'SER'], '8SER'), (['', 9, 'GLY'], '9GLY')]
>>> decorated.sort()
>>> decorated
[(['', 8, 'SER'], '8SER'), (['', 9, 'GLY'], '9GLY'), (['', 83, 'ILE'], '83ILE'), (['', 84, 'ALA'], '84ALA')]
>>> undecorated = [item for key, item in decorated]
>>> undecorated
['8SER', '9GLY', '83ILE', '84ALA']

For your actual data

>>> items
[('83ILE', 1), ('84ALA', 2), ('8SER', 0), ('9GLY', 0)]

you need to extract the first element from an (x, y) pair

>>> def first_gnv(item):
...     return gnv(item[0])
...
>>> first_gnv(("83ILE", 1))
['', 83, 'ILE']

but what if there are items with the same x? In that case the result
depends on the order of the input, because the sort is stable and leaves
items with equal keys in their original relative order:

>>> sorted([("83ILE", 1), ("83ILE", 2)], key=first_gnv)
[('83ILE', 1), ('83ILE', 2)]
>>> sorted([("83ILE", 2), ("83ILE", 1)], key=first_gnv)
[('83ILE', 2), ('83ILE', 1)]

Let's take y into account, too:

>>> def first_gnv(item):
...     return gnv(item[0]), item[1]
...
>>> sorted([("83ILE", 1), ("83ILE", 2)], key=first_gnv)
[('83ILE', 1), ('83ILE', 2)]
>>> sorted([("83ILE", 2), ("83ILE", 1)], key=first_gnv)
[('83ILE', 1), ('83ILE', 2)]

We're done!

>>> sorted(items, key=first_gnv)
[('8SER', 0), ('9GLY', 0), ('83ILE', 1), ('84ALA', 2)]

(If you look back into my previous post, can you find the first_gnv()
function?)
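
As for the traceback in your message: re.split() expects a single string as
its second argument, and items is a list of tuples, hence the TypeError. The
split has to be applied to one string at a time, which is what the key
function does for you. Here is a minimal sketch of the whole thing as a
script rather than an interactive session. It only reuses gnv(), first_gnv()
and the sample items from this thread; the try/except is there purely to
show the failing call next to the working one.

```python
import re


def gnv(s):
    # Split into alternating non-digit / digit runs; the parentheses in the
    # pattern make re.split() keep the digit runs in the result.
    parts = re.split(r"(\d+)", s)
    # The digit runs sit at the odd indices; convert them to int so that
    # they compare as numbers instead of as strings.
    parts[1::2] = map(int, parts[1::2])
    return parts


def first_gnv(item):
    # item is an (x, y) pair: order by the natural key of x, then by y.
    return gnv(item[0]), item[1]


items = [("83ILE", 1), ("84ALA", 2), ("8SER", 0), ("9GLY", 0)]

# Passing the whole list to re.split() is what raised the TypeError.
try:
    re.split(r"(\d+)", items)
except TypeError as exc:
    print("TypeError:", exc)

# Passing one string at a time works.
keys = [gnv(name) for name, count in items]
print(keys)

result = sorted(items, key=first_gnv)
print(result)
```

The exact wording of the TypeError message differs between Python versions,
so don't be surprised if it does not match the one in your traceback.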