> anchors = soup.findAll('a', { 'name' : re.compile('^A.*$')})
> for x in anchors:
>    print x
>    x = x.next
>    while getattr(x, 'name') != 'a':
>      print x

> And get into endless loops. I can't help thinking there are simple and 
> obvious ways to do this, probably many, but as a rank beginner, they are 
> escaping me.


Hi Jon,

Whenever I hear "infinite loop", I look for "while" loops.  There's 
something funky with the while loop in the above code.

     while getattr(x, 'name') != 'a':
         print x

If we assume that the test of the while loop ever succeeds, then there's a 
problem: when does the test ever fail?  Nothing in the body of the while 
loop changes x, so the test keeps giving the same answer forever.  So that 
part doesn't quite work.  So, for the moment, strip out the while loop.  
Let's simplify the behavior so that it only shows the anchors:

############################################################
anchors = soup.findAll('a', { 'name' : re.compile('^A.*$')})
for anchor in anchors:
     print anchor
############################################################

This shouldn't cause any infinite loops.
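
For comparison, here is a small sketch of a while loop whose test can 
eventually fail, because the body moves x along on every pass.  The Node 
class is a made-up stand-in for a parsed element (it is not part of 
BeautifulSoup); it only has a 'name' and a 'next' so that the loop has 
something to walk over.

```python
class Node(object):
    """Hypothetical stand-in for a parsed element: a name plus a link
    to whatever comes next.  Not a BeautifulSoup class."""

    def __init__(self, name, next=None):
        self.name = name
        self.next = next


# Build a tiny chain: a -> b -> i -> a
chain = Node('a', Node('b', Node('i', Node('a'))))

x = chain.next
seen = []
while x is not None and x.name != 'a':
    seen.append(x.name)
    x = x.next    # without this line x never changes, so the test never fails

print(seen)
```

The "x is not None" check is there so the loop also stops if it runs off 
the end of the chain without ever meeting another 'a'.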



From your question, it sounds like you want to get a list of the sibling 
elements after each particular anchor.  The documentation at:

http://www.crummy.com/software/BeautifulSoup/documentation.html#nextSibling%20and%20previousSibling

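doesn't make this as clear as I'd like, but what you want is probably not 
the 'next' attribute of an object, but its 'nextSibling' attribute.  
'next' moves to whatever was parsed next, which can be a child of the 
current element; 'nextSibling' stays at the same level of the tree.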


You might find the following definitions helpful:

#############################################################
from BeautifulSoup import NavigableString


def get_siblings_to_next_anchor(anchor):
     """Anchor Tag -> element list

     Given an anchor element, returns all the nextSibling elements up to
     (but not including) the next anchor as a list of either Tags or
     NavigableStrings."""

     elt = anchor.nextSibling
     results = []
     # Stop when we run out of siblings or reach the next anchor.
     while (elt is not None) and (not is_anchor(elt)):
         results.append(elt)
         elt = elt.nextSibling
     return results


def is_anchor(elt):
     """element -> boolean
     Returns true if the element is an anchor Tag."""

     # Plain text nodes are never anchors, so rule them out first.
     if isinstance(elt, NavigableString):
         return False
     else:
         return elt.name == 'a'
#############################################################

They should help you get the results you want.
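
If you'd like to try the idea without BeautifulSoup at hand, the same 
sibling walk can be sketched with the standard library's xml.dom.minidom, 
whose nodes also have a 'nextSibling' attribute.  This is a substitute for 
illustration only, not BeautifulSoup code: the sample markup and the two 
function names below are made up, and minidom wants well-formed XML, which 
real-world HTML often is not.

```python
from xml.dom import minidom

# Made-up sample markup: two anchors with some siblings after each one.
SAMPLE = ('<div><a name="A1">first</a>text one<b>bold</b>'
          '<a name="A2">second</a>text two</div>')


def is_anchor_node(node):
    """Return True if node is an <a> element (text nodes are not)."""
    return node.nodeType == node.ELEMENT_NODE and node.tagName == 'a'


def siblings_to_next_anchor(anchor):
    """Collect the siblings after anchor, stopping before the next <a>."""
    results = []
    node = anchor.nextSibling
    while node is not None and not is_anchor_node(node):
        results.append(node)
        node = node.nextSibling    # advance, so the loop can end
    return results


doc = minidom.parseString(SAMPLE)
anchors = doc.getElementsByTagName('a')
for anchor in anchors:
    # toxml() turns each collected node back into markup for display.
    print([n.toxml() for n in siblings_to_next_anchor(anchor)])
```

The first anchor should give you the text node and the <b> element that 
follow it; the second should give you only the trailing text.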


Good luck!
