By the way, my guess as to why this is working for me the way it does is
that the statement

out_list = part_list

is actually linking these two objects, making them one. My intention had
been to just assign values from one to the other, but I think I have done
far more than that. In this case, if that is true, then it has worked out
well for me, giving me a feedback loop through the data. However, I can see
that it could also be a pitfall if this behavior is not clearly understood.

Am I right? Am I way off base? Either way, I could use some elaboration
about it.

--Bill

On Mon, Jan 24, 2011 at 23:56, Bill Allen <walle...@gmail.com> wrote:

> This is a bit embarrassing, but I have crafted a bit of code that does
> EXACTLY what I want, but I am now a bit baffled as to precisely why. I have
> written a function to do a bit of webscraping by following links for a
> project at work. If I leave the code as is, it behaves like it is
> recursively passing through the data tree, which is what I want. However,
> if I change it only slightly, it makes only one pass through the top level
> data. What I do not understand is why it ever behaves as if it is recursive,
> as the function is only called once.
>
> If I comment out out_list=[] and let out_list=part_list be used, the
> following parses through the whole tree of data as if recursive. If I use
> out_list=[] and comment out out_list=part_list, it only processes the top
> level of the data tree.
>
> The function is called only once as: Exploded_BOM_List =
> get_BOM(first_num) in which I pass it a single part number to start with.
> The webscraping bit goes to a particular webpage about that part where it
> then picks up more part numbers and repeats the process.
>
> So can anyone help me understand why this actually works? Certainly no
> complaints here about it, but I would like to better understand why that
> change alters the behavior so profoundly. All the print statements are just
> so I could follow the data flow while working on this.
> By following the data flow,
> I am finding that part_list is actually having values added to it during the
> time the function is running. Problem is, I don't see clearly why that
> should be so.
>
> def get_BOM(part_list):
>     x=re.compile('part='+'.*?'+'>')
>     BOM_List = []
>
>     # out_list = []
>     out_list = part_list
>     print("called get_BOM")
>     pass_num = 0
>     for part_num in part_list:
>         mypath = "http://xxx.xxx.xxx.xxx/cgi-bin/search/part-url.cgi?part=" + part_num
>         mylines = urllib.urlopen(mypath).readlines()
>         print("pass number ", pass_num)
>         print(mypath)
>         print("PL:",part_list)
>         for item in mylines:
>             if "http://" in item:
>                 if "part=" in item:
>                     xstring=str(x.findall(item)).strip('"[\'part=>\']"')
>                     BOM_List.append(xstring)
>         print("BL:",BOM_List)
>         for bom_item in BOM_List:
>             if bom_item not in out_list:
>                 out_list.append(bom_item)
>         print("OL:",out_list)
>         pass_num += 1
>     return(out_list)
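
The guess about out_list = part_list can be checked without touching the
network. What follows is only a minimal sketch, not the real scraper: the
dictionary fake_children, the function walk, its alias flag, and the part
numbers "A" through "D" are all made up for illustration and stand in for the
pages that get_BOM fetches. It assumes the behavior comes from two things:
assignment binds a second name to the same list rather than copying it, and a
for loop over a list also visits items appended while the loop is running.

```python
# Hypothetical stand-in for the scraped pages: part -> parts listed on its page.
fake_children = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": [],
    "D": [],
}

def walk(part_list, alias=True):
    # alias=True mirrors "out_list = part_list": two names, one list object.
    # alias=False mirrors "out_list = []": a separate, initially empty list.
    out_list = part_list if alias else []
    for part_num in part_list:
        for child in fake_children[part_num]:
            if child not in out_list:
                # With alias=True this append also lengthens part_list,
                # so the outer loop goes on to visit the new part.
                out_list.append(child)
    return out_list

parts = ["A"]
aliased = walk(parts, alias=True)
print(aliased is parts)  # True: the same object under two names
print(aliased)           # ['A', 'B', 'C', 'D']

fresh = walk(["A"], alias=False)
print(fresh)             # ['B', 'C']: only the top level was processed
```

If a copy of the values is what was intended, out_list = list(part_list) or
out_list = part_list[:] makes a new list, but then the loop no longer sees the
appended parts and the "recursive" effect goes away.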
_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor