Re: [Tutor] Printing regular expression match

Srinivas Iyyer Sat, 03 Dec 2005 17:29:02 -0800

Hi Danny, 
thanks for your email. 

In the example I've shown, there are no odd elements
except for character case.

In the real case I have a list of 100 gene names for
humans.
Human gene names are conventionally written in
upper case (e.g. DDX3X).  However, in NCBI's gene_info
dataset the gene names are reported in lowercase (e.g.
ddx3x).  I want to extract the rest of the information
for DDX3X from NCBI's file (the dataset is in
tab-delimited format).

My approach was: if I can define that DDX3X is identical
to ddx3x, then I want to print the matching line from the
other list (NCBI's gene_info dataset).
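
A minimal sketch of that approach, not tested against the real file. It
assumes the gene symbol sits in the third tab-separated column of each
line; the column index, the function name lines_for_genes, and the
sample rows are all assumptions made for illustration.

```python
def lines_for_genes(gene_names, lines, symbol_column=2):
    """Return the lines whose symbol column matches one of gene_names,
    ignoring character case.  symbol_column is an assumed position."""
    # Normalise the small list once so each lookup is a cheap set test.
    wanted = set(name.upper() for name in gene_names)
    matches = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > symbol_column and fields[symbol_column].upper() in wanted:
            matches.append(line)
    return matches


# Made-up rows in a tab-delimited layout (hypothetical data).
sample = [
    "tax1\tid1\tddx3x\tfirst description\n",
    "tax1\tid2\tabc1\tsecond description\n",
]
for line in lines_for_genes(["DDX3X"], sample):
    print(line.rstrip("\n"))
```

Because the 100 names go into a set, the large file is read only once
and nothing has to be removed from it.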

I guess I misunderstood your suggestion.  In that
case, why do I have to drop something from list b
(which is over 150K lines)?  If I can create a sublist
of the matching elements in b (a small list of 100), that
would be easier.  That is my opinion.
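
For the small a/b example quoted below, one possible way to build such
a sublist is sketched here.  The helper name and the choice of
str.startswith() instead of a regex are assumptions for illustration,
not necessarily what Danny had in mind.

```python
def does_name_match_some_prefix(word, prefixes):
    """Return True if word starts with any of the prefixes,
    ignoring character case; otherwise return False."""
    lowered = word.lower()
    for prefix in prefixes:
        if lowered.startswith(prefix.lower()):
            return True
    return False


a = ['apple', 'boy', 'boy', 'apple']
b = ['Apple', 'BOY', 'APPLE-231']

# Loop over b once, so each element of b appears at most once
# even though a contains duplicates.
kept = [word for word in b if does_name_match_some_prefix(word, a)]
print(kept)
```

Looping over b rather than over a is what avoids printing the same
element several times.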

-srini

--- Danny Yoo <[EMAIL PROTECTED]> wrote:

> 
> 
> On Sat, 3 Dec 2005, Srinivas Iyyer wrote:
> > >>> a
> > ['apple', 'boy', 'boy', 'apple']
> >
> > >>> b
> > ['Apple', 'BOY', 'APPLE-231']
> >
> > >>> for i in a:
> >     pat = re.compile(i, re.IGNORECASE)
> >     for m in b:
> >         if pat.match(m):
> >             print(m)
> 
> 
> Hi Srinivas,
> 
> We may want to change the problem so that it's less focused on "print"ing
> results directly.  We can rephrase the question as a list "filtering"
> operation: we want to keep the elements of b that satisfy a certain
> criterion.
> 
> 
> Let's give a name to that criterion now:
> 
> ######
> def doesNameMatchSomePrefix(word, prefixes):
>     """Returns True if the input word is matched by some prefix in
>     the input list of prefixes.  Otherwise, returns False."""
>     # ... fill me in
> 
> ######
> 
> 
> Can you write doesNameMatchSomePrefix()?  In fact, you might not even need
> regexes to write an initial version of it.
> 
> 
> 
> If you can write that function, then what you're asking:
>
> > I do not want Python to print both elements from lists a and b.  I just
> > want only the elements in the list B.
>
> should not be so difficult: it'll be a straightforward loop across b,
> using that helper function.
> 
> 
> 
> (Optimization can be done to make doesNameMatchSomePrefix() fast, but you
> probably should concentrate on correctness first.  If you're interested in
> doing something like this for a large number of prefixes, you might be
> interested in:
>
>     http://hkn.eecs.berkeley.edu/~dyoo/python/ahocorasick/
>
> which has more details and references to specialized modules that attack
> the problem you've shown us so far.)
> 
> 
> Good luck!
> 
> 


_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
