> I have a list of strings that contains slightly more than a 
> million items. Each item is a string of 8 capital letters like so:
>
> ['MIBMMCCO', 'YOWHHOY', ...]
>
> I need to check and see if the letters 'OFHCMLIP' are one of the items in the
> list but there is no way to tell in what order the letters will appear. So I
> can't just search for the string 'OFHCMLIP'. I just need to locate any strings
> that are made up of those letters no matter their order.
>
> I suppose I could loop over the list and loop over each item using a bunch of
> if statements exiting the inner loop as soon as I find a letter is not in the
> string, but there must be a better way.
>
> I'd appreciate hearing about a better way to attack this.
>
> thanks,  Jim

If I only had to do this once, over only a million items (given 
today's CPU power), I'd probably do something like the below 
using sets.  I couldn't tell from your text whether you wanted 
all of the letters in 'OFHCMLIP' to appear in each entry or whether 
it was enough for some subset to be present.  So, here's a script 
that reports both the partial match and the exact match.

Note, I added a 9-character string, too, because you had a 
7-character string as your second sample -- mostly to point out 
that the 9-character string satisfies the exact match even though 
it sports an extra character: the test only checks that every 
letter of 'OFHCMLIP' is present.

  farm = ['MIBMMCCO', 'YOWHHOY', 'OFHCMLIP', 'OFHCMLIPZ', 'FHCMLIP', 'NEGBQJKR']
  needle = set('OFHCMLIP')
  for haystack in farm:
      # letters of the needle that also occur in this entry
      partial = needle.intersection(haystack)
      # True when every needle letter occurs (extra letters are allowed)
      exact = partial == needle
      print(haystack, exact, ''.join(sorted(partial)))

On the other hand, there are probably lots of papers on how to do 
this much more efficiently.
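
If what you want is strictly "the same letters in any order" (so that 
the 9-character entry is rejected and repeated letters are counted), 
one possible sketch is to compare letter counts instead of sets.  The 
helper name is_anagram and the extra sample 'PILMCHFO' are mine, made 
up for illustration; they are not from your data.

```python
from collections import Counter

def is_anagram(candidate, target):
    """Return True when candidate uses exactly the letters of target, in any order."""
    # Cheap length check first, then compare per-letter counts.
    return len(candidate) == len(target) and Counter(candidate) == Counter(target)

# 'PILMCHFO' is a hypothetical entry added so there is a reordered match.
farm = ['MIBMMCCO', 'YOWHHOY', 'OFHCMLIP', 'OFHCMLIPZ', 'FHCMLIP', 'PILMCHFO']
matches = [item for item in farm if is_anagram(item, 'OFHCMLIP')]
print(matches)
```

This is still a single pass over the list, so it should be fine for a 
one-off search over a million items.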

-Martin

MIBMMCCO False CIMO
YOWHHOY False HO
OFHCMLIP True CFHILMOP
OFHCMLIPZ True CFHILMOP
FHCMLIP False CFHILMP
NEGBQJKR False 


-- 
Martin A. Brown
http://linux-ip.net/
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
