On 07/25/2018 07:29 PM, Martin A. Brown wrote:

I have a list of strings that contains slightly more than a
million items. Each item is a string of 8 capital letters like so:

['MIBMMCCO', 'YOWHHOY', ...]

I need to check and see if the letters 'OFHCMLIP' are one of the items in the
list but there is no way to tell in what order the letters will appear. So I
can't just search for the string 'OFHCMLIP'. I just need to locate any strings
that are made up of those letters no matter their order.

I suppose I could loop over the list and loop over each item using a bunch of
if statements exiting the inner loop as soon as I find a letter is not in the
string, but there must be a better way.

I'd appreciate hearing about a better way to attack this.

thanks,  Jim

If I only had to do this once, over only a million items, then
(given today's CPU power) I'd probably do something like the below
using sets.  I couldn't tell from your text whether you wanted to
see all of the letters of 'OFHCMLIP' in each entry or whether it was
enough that some subset were present.  So, here's a script that
reports both a partial match and an exact match.

Note, I made a 9-character string, too, because you had a 7-character
string as your second sample -- mostly to point out that the
9-character string counts as an exact match although it sports an
extra character, because the set comparison only checks that every
letter of the needle is present.

Sorry, that was a typo, they are all 8 characters in length.

   farm = ['MIBMMCCO', 'YOWHHOY', 'OFHCMLIP', 'OFHCMLIPZ', 'FHCMLIP',
           'NEGBQJKR']
   needle = set('OFHCMLIP')
   for haystack in farm:
       # letters of the needle that also occur in this entry
       partial = needle.intersection(haystack)
       # "exact" means every needle letter is present (extras are ignored)
       exact = partial == needle
       print(haystack, exact, ''.join(sorted(partial)))
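
If every entry really is 8 letters and the goal is to find entries that
use exactly the letters of 'OFHCMLIP' in any order, one common option is
to compare sorted letters instead of sets.  The following is only a
sketch under that assumption; is_anagram and the extra sample entry
'PILMCHFO' are made up for illustration and are not from the thread.

```python
def is_anagram(candidate, target_key):
    """Return True if candidate has exactly the letters in target_key."""
    # Sorting puts the letters in a canonical order, so two strings with
    # the same letters (including repeats) compare equal.
    return sorted(candidate) == target_key


farm = ['MIBMMCCO', 'YOWHHOY', 'OFHCMLIP', 'OFHCMLIPZ', 'FHCMLIP',
        'PILMCHFO']

# Sort the target once rather than once per entry.
target_key = sorted('OFHCMLIP')

matches = [item for item in farm if is_anagram(item, target_key)]
print(matches)
```

Unlike the set comparison, this rejects 'OFHCMLIPZ' (extra letter) and
would also reject an entry that repeats one letter and omits another.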

On the other hand, there are probably lots of papers on how to do
this much more efficiently.
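
As one example of what "more efficiently" could look like when the same
list has to be searched many times: build a dictionary once, keyed by
each entry's sorted letters, and then each query is a single lookup.
This is a sketch of a well-known technique, not something taken from any
particular paper; build_index and lookup are hypothetical names.

```python
from collections import defaultdict


def build_index(words):
    """Group words by their sorted-letter signature."""
    index = defaultdict(list)
    for word in words:
        index[''.join(sorted(word))].append(word)
    return index


def lookup(index, letters):
    """Return all indexed words made of exactly these letters."""
    # .get() avoids creating an empty entry for letters that are absent.
    return index.get(''.join(sorted(letters)), [])


farm = ['MIBMMCCO', 'OFHCMLIP', 'OFHCMLIPZ', 'PILMCHFO', 'NEGBQJKR']
index = build_index(farm)
print(lookup(index, 'OFHCMLIP'))
```

Building the index costs one pass over the list; after that, the cost of
a query no longer depends on how many items the list holds.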

-Martin

Thanks for your help. Steven came up with a solution that works well for me.

Regards,  Jim


_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor