By the way, my guess as to why this works the way it does is that the
statement

out_list = part_list

is actually linking the two names, so that out_list and part_list refer to
one and the same list.  My intention had been just to copy the values from
one to the other, but I think I have done far more than that.  If that is
true, it has worked out well for me here, giving me a feedback loop through
the data: items appended to out_list also show up in part_list while the for
loop is still iterating over it.  However, I can see that it could also be a
pitfall if this behavior is not clearly understood.  Am I right?  Am I way
off base?  Either way, I could use some elaboration on it.
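
To make the guess concrete, here is a minimal sketch of what I think is going on. The part numbers are made up for illustration and are not from my real data. It only checks whether plain assignment gives me a second name for the same list, compared with making a copy through list().

```python
# Hypothetical part numbers, for illustration only.
part_list = ["A100"]

out_list = part_list            # binds a second name to the SAME list object
out_list.append("B200")         # so this append is visible through part_list too

same_object = out_list is part_list   # identity check: one list, two names

copy_list = list(part_list)     # an independent copy (part_list[:] also works)
copy_list.append("C300")
copy_is_separate = "C300" not in part_list   # the original is left untouched

print(same_object)
print(part_list)
print(copy_is_separate)
```

If the guess is right, the append through out_list should be visible in part_list, while the append to the copy should not.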


--Bill





On Mon, Jan 24, 2011 at 23:56, Bill Allen <walle...@gmail.com> wrote:

> This is a bit embarrassing, but I have crafted a bit of code that does
> EXACTLY what I want, but I am now a bit baffled as to precisely why.  I have
> written a function to do a bit of web scraping by following links for a
> project at work.  If I leave the code as is, it behaves as if it is
> recursively passing through the data tree, which is what I want.  However,
> if I change it only slightly, it makes only one pass through the top-level
> data.  What I do not understand is why it ever behaves as if it is recursive,
> as the function is only called once.
>
> If I comment out out_list=[] and let out_list=part_list be used, the
> following parses through the whole tree of data as if recursive.  If I use
> out_list=[] and comment out out_list=part_list, it only processes the top
> level of the data tree.
>
> The function is called only once as:  Exploded_BOM_List =
> get_BOM(first_num)  in which I pass it a single part number to start with.
> The webscraping bit goes to a particular webpage about that part where it
> then picks up more part numbers and repeats the process.
>
> So can anyone help me understand why this actually works?  Certainly no
> complaints here about it, but I would like to better understand why this
> one change alters the behavior so profoundly.  All the print statements are
> just so I could follow the data flow while working on this.  By following
> the data flow, I am finding that part_list is actually having values added
> to it during the time the function is running.  Problem is, I don't see
> clearly why that should be so.
>
> import re
> import urllib  # Python 2; on Python 3, urlopen lives in urllib.request
>
> def get_BOM(part_list):
>     x=re.compile('part='+'.*?'+'>')
>     BOM_List = []
>
> #    out_list = []
>     out_list = part_list
>     print("called get_BOM")
>     pass_num = 0
>     for part_num in part_list:
>         mypath = "http://xxx.xxx.xxx.xxx/cgi-bin/search/part-url.cgi?part=" + part_num
>         mylines = urllib.urlopen(mypath).readlines()
>         print("pass number ", pass_num)
>         print(mypath)
>         print("PL:",part_list)
>         for item in mylines:
>             if "http://" in item:
>                 if "part=" in item:
>                     xstring=str(x.findall(item)).strip('"[\'part=>\']"')
>                     BOM_List.append(xstring)
>                     print("BL:",BOM_List)
>         for bom_item in BOM_List:
>             if bom_item not in out_list:
>                 out_list.append(bom_item)
>                 print("OL:",out_list)
>         pass_num += 1
>     return(out_list)
>
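
The loop behavior described above can be reproduced without any web access. This is only a sketch under stated assumptions: FAKE_LINKS is a hypothetical stand-in for the pages being scraped, and walk() and its alias flag are illustrative names, not part of the original function. It compares the out_list = part_list case with the out_list = [] case.

```python
# Hypothetical stand-in for the web pages: each part maps to the parts it links to.
FAKE_LINKS = {
    "TOP": ["A", "B"],
    "A": ["C"],
    "B": [],
    "C": [],
}

def walk(part_list, alias=True):
    # alias=True mimics out_list = part_list; alias=False mimics out_list = []
    out_list = part_list if alias else []
    visited = []
    # A for loop over a list walks it by index and re-checks the length on
    # every step, so items appended during the loop are visited as well.
    for part_num in part_list:
        visited.append(part_num)
        for child in FAKE_LINKS.get(part_num, []):
            if child not in out_list:
                out_list.append(child)  # with alias=True this also grows part_list
    return visited, out_list

aliased = walk(["TOP"], alias=True)
separate = walk(["TOP"], alias=False)
print(aliased)
print(separate)
```

With the alias, the single call visits every part in the fake tree, because the loop keeps finding newly appended entries. With a separate out_list, only the starting part is visited.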
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
