On Mon, 1 Feb 2010 16:30:02 +0100 Norman Khine <nor...@khine.net> wrote:
> On Mon, Feb 1, 2010 at 1:19 PM, Kent Johnson <ken...@tds.net> wrote:
> > On Mon, Feb 1, 2010 at 6:29 AM, Norman Khine <nor...@khine.net> wrote:
> >
> >> thanks, what about the whitespace problem?
> >
> > \s* will match any amount of whitespace including newlines.
>
> thank you, this worked well.
>
> here is the code:
>
> ###
> import re
> file=open('producers_google_map_code.txt', 'r')
> data = repr( file.read().decode('utf-8') )
>
> block = re.compile(r"""openInfoWindowHtml\(.*?\\ticon: myIcon\\n""")
> b = block.findall(data)
> block_list = []
> for html in b:
>     namespace = {}
>     t = re.compile(r"""<strong>(.*)<\/strong>""")
>     title = t.findall(html)
>     for item in title:
>         namespace['title'] = item
>     u = re.compile(r"""a href=\"\/(.*)\">En savoir plus""")
>     url = u.findall(html)
>     for item in url:
>         namespace['url'] = item
>     g = re.compile(r"""GLatLng\((\-?\d+\.\d*)\,\\n\s*(\-?\d+\.\d*)\)""")
>     lat = g.findall(html)
>     for item in lat:
>         namespace['LatLng'] = item
>     block_list.append(namespace)
>
> ###
>
> can this be made better?

The 3 regex patterns are constants: they can be moved out of the loop, so
that each one is compiled only once.

You may also rename b to blocks, and find a more accurate name for
block_list; e.g. block_records, where record = set of (named) fields.

A short description and/or example of the overall and partial data formats
can greatly help later review, since regex patterns alone are hard to decode.

The definition of "namespace" would be clearer imo in a single line:
    namespace = {title:t, url:url, lat:g}
This also reveals a kind of name confusion, doesn't it? (t and g name
compiled patterns, while url names a list of matches.)

Denis
________________________________

la vita e estrany

http://spir.wikidot.com/
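
A minimal sketch of the suggestions above, not a drop-in replacement for the
quoted script. It assumes Python 3 and matches against the raw text instead
of its repr(), so tabs and newlines are covered by \s* and re.DOTALL rather
than literal \\t and \\n. The names BLOCK_RE, TITLE_RE, URL_RE, LATLNG_RE,
first_match and extract_records are hypothetical, and SAMPLE is an invented
fragment that only illustrates the kind of input the patterns expect. Unlike
the quoted loops, which keep the last match per block, this keeps the first.

```python
import re

# Patterns are constants, so they are compiled once, outside any loop.
BLOCK_RE = re.compile(r"openInfoWindowHtml\(.*?icon: myIcon", re.DOTALL)
TITLE_RE = re.compile(r"<strong>(.*?)</strong>")
URL_RE = re.compile(r'a href="/(.*?)">En savoir plus')
LATLNG_RE = re.compile(r"GLatLng\((-?\d+\.\d*),\s*(-?\d+\.\d*)\)")

# Invented example of one block (assumption: real data looks roughly like this).
SAMPLE = '''
marker.openInfoWindowHtml('<strong>Ferme A</strong><a href="/ferme-a">En savoir plus</a>');
var point = new GLatLng(45.5,
    -3.25);
    icon: myIcon
'''


def first_match(pattern, text):
    """Return the first match's group (or tuple of groups), or None."""
    found = pattern.search(text)
    if found is None:
        return None
    groups = found.groups()
    return groups[0] if len(groups) == 1 else groups


def extract_records(data):
    """Build one record (a dict of named fields) per block found in data."""
    block_records = []
    for html in BLOCK_RE.findall(data):
        # The record is defined in a single place, one field per line.
        block_records.append({
            "title": first_match(TITLE_RE, html),
            "url": first_match(URL_RE, html),
            "LatLng": first_match(LATLNG_RE, html),
        })
    return block_records
```

Calling extract_records(SAMPLE) returns a list with one dict per block; a
field that has no match in its block is set to None instead of being left
out of the dict.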