OK, I have definitely verified this to myself. The following works perfectly and is a little easier to understand. In this version, I am simply modifying my part_list while the for loop iterates over it, which produces the effect of a loop whose list keeps growing as the code runs. So I am convinced that I had previously made out_list and part_list refer to the same list (assignment by reference, in effect) rather than copying it by value, as I mistakenly thought when I first wrote the code. That explains it. It was a silly mistake, born of still being new to Python and thinking in terms of another language I know that typically assigns by value instead. It had not occurred to me initially that it was possible to modify a list while iterating over it in this way. I do not think most languages would allow this.
Question: is it possible to copy values from one object to another in such a way that they are not just references to one another?

Sorry about asking questions and then answering them. Things became clearer with each question I asked.

import re
import urllib  # Python 2; Python 3 moved urlopen to urllib.request

def get_BOM(part_list):
    # Matches the "part=...>" fragment of each link on a part's page.
    x = re.compile('part=' + '.*?' + '>')
    BOM_List = []
    pass_num = 0
    # part_list grows inside this loop, so newly found parts are visited too.
    for part_num in part_list:
        mypath = "http://172.25.8.13/cgi-bin/search/part-url.cgi?part=" + part_num
        mylines = urllib.urlopen(mypath).readlines()
        for item in mylines:
            if "http://" in item:
                if "part=" in item:
                    xstring = str(x.findall(item)).strip('"[\'part=>\']"')
                    BOM_List.append(xstring)
        for bom_item in BOM_List:
            if bom_item not in part_list:
                part_list.append(bom_item)
        pass_num += 1
    return part_list

On Tue, Jan 25, 2011 at 00:05, Bill Allen <walle...@gmail.com> wrote:

> By the way, my guess as to why this is working for me the way it does is
> that the statement
>
> out_list = part_list
>
> is actually linking these two objects, making them one. My intention had
> been to just assign values from one to the other, but I think I have done
> far more than that. In this case, if that is true, then it has worked out
> well for me, giving me a feedback loop through the data. However, I can see
> that it could also be a pitfall if this behavior is not clearly understood.
> Am I right? Am I way off base? Either way, I could use some elaboration
> about it.
>
> --Bill
>
> On Mon, Jan 24, 2011 at 23:56, Bill Allen <walle...@gmail.com> wrote:
>
>> This is a bit embarrassing, but I have crafted a bit of code that does
>> EXACTLY what I want, but I am now a bit baffled as to precisely why. I have
>> written a function to do a bit of webscraping by following links for a
>> project at work. If I leave the code as is, it behaves like it is
>> recursively passing through the data tree, which is what I want. However,
>> if I change it only slightly, it makes only one pass through the top-level
>> data.
>> What I do not understand is why it ever behaves as if it is recursive,
>> since the function is only called once.
>>
>> If I comment out out_list=[] and let out_list=part_list be used, the
>> following parses through the whole tree of data as if recursive. If I use
>> out_list=[] and comment out out_list=part_list, it only processes the top
>> level of the data tree.
>>
>> The function is called only once, as Exploded_BOM_List =
>> get_BOM(first_num), in which I pass it a single part number to start with.
>> The webscraping bit goes to a particular webpage about that part, where it
>> then picks up more part numbers and repeats the process.
>>
>> So can anyone help me understand why this actually works? Certainly no
>> complaints here about it, but I would like to better understand why this
>> change alters the behavior so profoundly. All the print statements are just
>> so I could follow the data flow while working on this. By following the data
>> flow, I am finding that part_list is actually having values added to it
>> while the function is running. Problem is, I don't see clearly why that
>> should be so.
>>
>> def get_BOM(part_list):
>>     x = re.compile('part=' + '.*?' + '>')
>>     BOM_List = []
>>
>>     # out_list = []
>>     out_list = part_list
>>     print("called get_BOM")
>>     pass_num = 0
>>     for part_num in part_list:
>>         mypath = "http://xxx.xxx.xxx.xxx/cgi-bin/search/part-url.cgi?part=" + part_num
>>         mylines = urllib.urlopen(mypath).readlines()
>>         print("pass number ", pass_num)
>>         print(mypath)
>>         print("PL:", part_list)
>>         for item in mylines:
>>             if "http://" in item:
>>                 if "part=" in item:
>>                     xstring = str(x.findall(item)).strip('"[\'part=>\']"')
>>                     BOM_List.append(xstring)
>>         print("BL:", BOM_List)
>>         for bom_item in BOM_List:
>>             if bom_item not in out_list:
>>                 out_list.append(bom_item)
>>         print("OL:", out_list)
>>         pass_num += 1
>>     return out_list
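A minimal sketch of both behaviors discussed in this thread. To keep it runnable offline, it uses a hypothetical in-memory dict, FAKE_LINKS, in place of the web pages, and the part numbers are made up. `out_list = part_list` only binds a second name to the same list, so appending through either name grows the list the `for` loop is walking, and the loop goes on to visit the new items. `list(part_list)`, `part_list[:]` and `copy.copy(part_list)` each build an independent (shallow) copy instead, and `copy.deepcopy` also copies any nested lists.

```python
import copy

# Hypothetical stand-in for the web pages: each part number maps to the
# part numbers its page links to (the real code scrapes these over HTTP).
FAKE_LINKS = {
    "A100": ["B200", "B201"],
    "B200": ["C300"],
    "B201": ["C300", "C301"],
}


def explode_aliased(part_list):
    out_list = part_list              # no copy: both names refer to ONE list
    for part_num in part_list:        # so this loop also sees appended items
        for child in FAKE_LINKS.get(part_num, []):
            if child not in out_list:
                out_list.append(child)
    return out_list


def explode_copied(part_list):
    out_list = list(part_list)        # a new list; part_list is untouched
    for part_num in part_list:        # only the original items are visited
        for child in FAKE_LINKS.get(part_num, []):
            if child not in out_list:
                out_list.append(child)
    return out_list


start = ["A100"]
result = explode_aliased(start)
print(result)                    # ['A100', 'B200', 'B201', 'C300', 'C301']
print(result is start)           # True: the caller's list grew in place
print(explode_copied(["A100"]))  # ['A100', 'B200', 'B201']

# Ways to copy a list instead of aliasing it (all three are shallow copies):
parts = ["A100", [1, 2]]
copies = [list(parts), parts[:], copy.copy(parts)]
deep = copy.deepcopy(parts)      # also copies the nested [1, 2]
parts.append("Z900")
print(len(copies[0]))            # 2: the copy itself did not grow
parts[1].append(3)
print(copies[0][1])              # [1, 2, 3]: shallow copies share nested objects
print(deep[1])                   # [1, 2]
```

Appending while a `for` loop walks a list works because the list iterator checks the list's current length at every step. Inserting or removing items ahead of the loop position is where this pattern goes wrong. For that reason, an explicit queue such as `collections.deque` plus a `set` of part numbers already seen is a common, more explicit alternative.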
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor