> I have a list of strings that contains slightly more than a
> million items. Each item is a string of 8 capital letters like so:
>
> ['MIBMMCCO', 'YOWHHOY', ...]
>
> I need to check and see if the letters 'OFHCMLIP' are one of the items in the
> list but there is no way to tell in what order the letters will appear. So I
> can't just search for the string 'OFHCMLIP'. I just need to locate any strings
> that are made up of those letters no matter their order.
>
> I suppose I could loop over the list and loop over each item using a bunch of
> if statements exiting the inner loop as soon as I find a letter is not in the
> string, but there must be a better way.
>
> I'd appreciate hearing about a better way to attack this.
>
> thanks, Jim
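The question above is essentially an anagram search. One common way to do it is to sort each item's letters and compare them with the sorted target, because two strings are rearrangements of each other exactly when their sorted letters match. This technique does not come from this thread. What follows is only a sketch, and the function name `find_rearrangements` and the sample list are hypothetical.

```python
def find_rearrangements(items, target):
    """Return the items whose letters are a rearrangement of target."""
    # Sorting puts letters in a canonical order, so every permutation
    # of target produces the same key.
    key = sorted(target)
    # The cheap length check skips wrong-length items before sorting.
    return [item for item in items
            if len(item) == len(target) and sorted(item) == key]


# Hypothetical sample data modelled on the strings in the question.
farm = ['MIBMMCCO', 'YOWHHOY', 'OFHCMLIP', 'PILMCHFO', 'OFHCMLIPZ']
print(find_rearrangements(farm, 'OFHCMLIP'))  # ['OFHCMLIP', 'PILMCHFO']
```

This makes one pass over the list and sorts one short string per item, so it scales linearly with the list size. Unlike a set comparison, it also respects repeated letters.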
If I only had to do this once, over only a million items (given today's CPU power), I'd probably do something like the following, using sets. I couldn't tell from your text whether you wanted each entry to contain all of the letters in 'OFHCMLIP' or only some subset of them. So, here's a script that reports both a partial match and an exact match.

Note that I added a 9-character string, too, because you had a 7-character string as your second sample. That is mostly to point out that the 9-character string satisfies the exact match even though it sports an extra character. The "exact" test only checks that every letter of the needle appears somewhere in the entry. For 8-letter entries and a needle of 8 distinct letters, that is the same as the entry being a rearrangement of the needle.

    farm = ['MIBMMCCO', 'YOWHHOY', 'OFHCMLIP', 'OFHCMLIPZ', 'FHCMLIP', 'NEGBQJKR']
    needle = set('OFHCMLIP')
    for haystack in farm:
        # Letters of the needle that also occur in this entry.
        partial = needle.intersection(haystack)
        # True when every needle letter occurs in the entry
        # (equivalent to needle.issubset(haystack)).
        exact = needle.intersection(haystack) == needle
        print(haystack, exact, ''.join(sorted(partial)))

Running it prints:

    MIBMMCCO False CIMO
    YOWHHOY False HO
    OFHCMLIP True CFHILMOP
    OFHCMLIPZ True CFHILMOP
    FHCMLIP False CFHILMP
    NEGBQJKR False

On the other hand, there are probably lots of papers on how to do this much more efficiently.

-Martin

-- 
Martin A. Brown
http://linux-ip.net/

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
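Here are two possible refinements of the set approach, offered as a sketch rather than as anything from the thread. First, sets ignore how many times a letter occurs. For a needle with repeated letters, a multiset comparison with `collections.Counter` is stricter. Second, suppose many different needles will be checked against the same list (an assumption about the use case). In that case, you can build a dictionary keyed on each item's sorted letters once, and each lookup then becomes a single dictionary access. The helper names `contains_all`, `build_index` and `lookup` are hypothetical.

```python
from collections import Counter, defaultdict


def contains_all(haystack, needle):
    """True if haystack has at least as many of each letter as needle."""
    # Counter subtraction keeps only positive counts, so an empty
    # result means no letter of needle is missing from haystack.
    return not (Counter(needle) - Counter(haystack))


def build_index(items):
    """Map each item's sorted letters to the list of items sharing them."""
    index = defaultdict(list)
    for item in items:
        index[''.join(sorted(item))].append(item)
    return index


def lookup(index, needle):
    """Return the items that are exact rearrangements of needle."""
    # dict.get does not trigger the defaultdict factory, so a miss
    # returns [] without adding a key to the index.
    return index.get(''.join(sorted(needle)), [])


farm = ['MIBMMCCO', 'YOWHHOY', 'OFHCMLIP', 'OFHCMLIPZ', 'FHCMLIP', 'NEGBQJKR']
print(contains_all('MIBMMCCO', 'MMM'))   # True: three M's present
print(contains_all('MIBMCCOO', 'MMM'))   # False: only two M's
index = build_index(farm)                # built once, reused per query
print(lookup(index, 'PILMCHFO'))         # ['OFHCMLIP']
```

Building the index costs one pass over the list. After that, each query costs only sorting the needle, which pays off only when there are many queries.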