On Tue, Oct 16, 2012 at 6:40 AM, eryksun <eryk...@gmail.com> wrote: > On Mon, Oct 15, 2012 at 1:17 PM, Norman Khine <nor...@khine.net> wrote: >> >> i made an update: https://gist.github.com/3891927 which works based on >> some of the recommendations posted here. >> >> any suggestions for improvement? > > I can't make any specific recommendations about using BeautifulSoup > since I haven't used it in a long time. Here are a few general > suggestions. > > PEP 8 recommends using 4-space indents instead of tabs, if possible. > > Move constant definitions out of loops. For example, there's no point > to repeatedly building the dict literal and assigning it to "headers" > for each page_no. Also, "headers" isn't used. You need to make a > Request object: > > http://docs.python.org/library/urllib2#urllib2.Request > > > Don't put unnecessary parentheses (round brackets) around assigned > expressions. For example, replace > > req = ('%s%s' % (origin_site, page_no)) > ... > assoc_link = (tag.string) > > with > > req = '%s%s' % (origin_site, page_no) > ... > assoc_link = tag.string > > I'd also move the "%s" into origin_site: > > origin_site = ('http://typo3.nimes.fr/index.php?' > 'id=annuaire_assos&theme=0&rech=&num_page=%s') > > req = origin_site % page_no > > The parentheses around the string literal are necessary because it's > split over two lines. (I use Gmail's webmail interface, which line > wraps at 69 columns -- sometimes; it tries to be clever.) > > Use "continue" to avoid unnecessary nesting: > > for page_no in pages: > ... > try: > doc = urllib2.urlopen(req) > except urllib2.URLError, e: > continue > soup = BeautifulSoup.BeautifulSoup(doc) > for row in soup.findAll('tr', {'class': 'menu2'}): > assoc_data = [] > ... > for i in soup.findAll('a', {'href': '#'}): > if 'associations' not in i.attrMap['onclick']: > continue > req = i.attrMap['onclick'].split("'")[1] > try: > doc = urllib2.urlopen(req) > except urllib2.URLError, e: > continue > soup = BeautifulSoup.BeautifulSoup(doc) > > > Don't test "if emails != []". Use "if emails" instead. A non-empty > list is always True, and an empty list is always False. > > Finally, in several places you create an empty list and populate it > with a for loop. For simple cases like these, using comprehensions and > generator expressions executes faster, yet it's still easy to code and > easy to understand. For example, you can replace the following: > > assoc_data = [] > for assoc_theme in soup.findAll('u'): > assoc_data.append(assoc_theme.renderContents()) > for assoc_name in soup.findAll('td', {'width': '70%'}): > assoc_data.append(assoc_name.renderContents()) > > with something like this: > > assoc_data = [assoc_theme.renderContents() > for assoc_theme in soup.findAll('u')] > assoc_data.extend(assoc_name.renderContents() > for assoc_name in soup.findAll('td', {'width': '70%'})) > > At least to me this is just as readable as the original, and the > generated code is more efficient. If, however, BeautifulSoup is the > limiting factor here, the efficiency gain will be chump change in the > big picture. Still, in a simple case like this it's hard to argue > against using a comprehension or generator expression. In cases that > use complex expressions for the value and conditions, I think it makes > more sense to use a for loop, which is easier to read and debug.
thanks, i made the changes https://gist.github.com/3891927 -- %>>> "".join( [ {'*':'@','^':'.'}.get(c,None) or chr(97+(ord(c)-83)%26) for c in ",adym,*)&uzq^zqf" ] ) _______________________________________________ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor