On Mon, 1 Feb 2010 16:30:02 +0100
Norman Khine <nor...@khine.net> wrote:

> On Mon, Feb 1, 2010 at 1:19 PM, Kent Johnson <ken...@tds.net> wrote:
> > On Mon, Feb 1, 2010 at 6:29 AM, Norman Khine <nor...@khine.net> wrote:
> >
> >> thanks, what about the whitespace problem?
> >
> > \s* will match any amount of whitespace including newlines.
> 
> thank you, this worked well.
> 
> here is the code:
> 
> ###
> import re
> file=open('producers_google_map_code.txt', 'r')
> data =  repr( file.read().decode('utf-8') )
> 
> block = re.compile(r"""openInfoWindowHtml\(.*?\\ticon: myIcon\\n""")
> b = block.findall(data)
> block_list = []
> for html in b:
>       namespace = {}
>       t = re.compile(r"""<strong>(.*)<\/strong>""")
>       title = t.findall(html)
>       for item in title:
>               namespace['title'] = item
>       u = re.compile(r"""a href=\"\/(.*)\">En savoir plus""")
>       url = u.findall(html)
>       for item in url:
>               namespace['url'] = item
>       g = re.compile(r"""GLatLng\((\-?\d+\.\d*)\,\\n\s*(\-?\d+\.\d*)\)""")
>       lat = g.findall(html)
>       for item in lat:
>               namespace['LatLng'] = item
>       block_list.append(namespace)
> 
> ###
> 
> can this be made better?

The three regex patterns are constants: they can be compiled once, outside the loop.
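
A minimal sketch of what I mean, not a drop-in replacement. The names TITLE_RE, URL_RE and extract_records are mine, the sample blocks are made up, and the patterns are simplified (non-greedy groups, matching plain text rather than repr() output). Note that the re module caches compiled patterns anyway, so the gain is mostly in readability:

```python
import re

# Compiled once at module level rather than on every loop iteration.
# Patterns are simplified versions of the quoted ones (non-greedy groups,
# plain text rather than repr() output); adjust them to the real data.
TITLE_RE = re.compile(r"<strong>(.*?)</strong>")
URL_RE = re.compile(r'a href="/(.*?)">En savoir plus')


def extract_records(blocks):
    """Build one dict per block, reusing the precompiled patterns."""
    block_records = []
    for html in blocks:
        record = {}
        for item in TITLE_RE.findall(html):
            record['title'] = item  # last match wins, as in the original
        for item in URL_RE.findall(html):
            record['url'] = item
        block_records.append(record)
    return block_records


# Hypothetical sample blocks, made up for illustration.
blocks = [
    '<strong>Ferme A</strong> <a href="/ferme-a">En savoir plus</a>',
    '<strong>Ferme B</strong> <a href="/ferme-b">En savoir plus</a>',
]
records = extract_records(blocks)
print(records)
```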

You may also rename b to blocks, and find a more accurate name for 
block_list; e.g. block_records, where record = set of (named) fields.

A short description and/or example of the overall and partial data formats can 
greatly help later review, since regex patterns alone are hard to decode.

The definition of "namespace" would be clearer, imo, in a single line:
    namespace = {title:t, url:url, lat:g}
This also reveals a kind of name confusion, doesn't it?
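
One way to untangle the names, as a rough sketch only: keep distinct names for the patterns and for the matched values, and build the record in one dict literal. The helper first_match, the function make_record and the sample string are hypothetical, and I assume each block contains at most one match per field (the helper returns the first match, whereas the quoted loop keeps the last one):

```python
import re

# Pattern names end in _RE so they cannot be confused with matched values.
TITLE_RE = re.compile(r"<strong>(.*?)</strong>")
URL_RE = re.compile(r'a href="/(.*?)">En savoir plus')
# \s* absorbs the newline and indentation between the two coordinates.
LATLNG_RE = re.compile(r"GLatLng\((-?\d+\.\d*),\s*(-?\d+\.\d*)\)")


def first_match(pattern, text):
    """Return the first findall() result for pattern in text, or None."""
    found = pattern.findall(text)
    return found[0] if found else None


def make_record(html):
    """Build the whole record in a single dict literal."""
    return {
        'title': first_match(TITLE_RE, html),
        'url': first_match(URL_RE, html),
        'LatLng': first_match(LATLNG_RE, html),
    }


# Hypothetical sample block, made up for illustration.
sample = ('GLatLng(43.5,\n   1.25) <strong>Ferme A</strong> '
          '<a href="/ferme-a">En savoir plus</a>')
record = make_record(sample)
print(record)
```

A missing field then shows up as None instead of being silently absent from the dict, which may or may not be what you want.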


Denis




________________________________

la vita e estrany

http://spir.wikidot.com/