On 02/06/15 08:27, Alan Gauld wrote:
The following is a sample of the test code, as well as the url/posts of the pages as produced by the Firefox/Firebug process.
I'm not really answering your question but addressing some issues in your code...
    execfile('/apps/parseapp2/ascii_strip.py')
    execfile('dir_defs_inc.py')
I'm not sure what these do, but usually it's better to import the files as modules and then call their functions directly.
    appDir="/apps/parseapp2/"
    # data output filename
    datafile="unlvDept.dat"
    # global var for the parent/child list json
    plist={}
    cname="unlv.lwp"
    #----------------------------------------
    if __name__ == "__main__":
        # main app
It makes testing (and reuse) easier if you put the main code in a function called main() and then just call that here. Also your code could be broken up into smaller functions which again will make testing and debugging easier.
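As a rough sketch of that layout: the function names below (fetch_page, extract_icsid) are made up for illustration, the page is canned HTML rather than a real curl call, and a simple regex stands in for the XPath lookup so the example runs on its own.

```python
import re


def fetch_page():
    """Stand-in for the curl call; returns canned HTML so the sketch runs offline."""
    return '<form><input type="hidden" id="ICSID" value="abc123"></form>'


def extract_icsid(html):
    """Return the value of the ICSID input, or "" if it is missing."""
    match = re.search(r'id="ICSID"[^>]*value="([^"]*)"', html)
    return match.group(1) if match else ""


def main():
    """Run the steps in order and return an exit status instead of exiting mid-way."""
    html = fetch_page()
    icsid = extract_icsid(html)
    if not icsid:
        print("no ICSID found")
        return 1
    print("ICSID: " + icsid)
    return 0


if __name__ == "__main__":
    main()
```

Each small function can now be called and checked on its own from the interpreter or from a test.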
    #
    # get the input struct, parse it, determine the level
    #
    cmd="echo '' > "+datafile
    proc=subprocess.Popen(cmd, shell=True,stdout=subprocess.PIPE)
    res=proc.communicate()[0].strip()
It's easier and more efficient/reliable to create the file directly from Python. Calling the subprocess module each time starts up extra processes. Also, you store the result but never use it...
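A minimal sketch of doing that in plain Python. The helper name reset_file is an assumption, and the example writes into a temporary directory so it does not touch any real files.

```python
import os
import tempfile


def reset_file(path):
    """Create or truncate path, leaving a single newline like echo '' > path does."""
    with open(path, "w") as handle:
        handle.write("\n")


# Demonstration in a scratch directory; in the script the paths would be
# datafile and cname.
workdir = tempfile.mkdtemp()
datafile = os.path.join(workdir, "unlvDept.dat")
cname = os.path.join(workdir, "unlv.lwp")
for name in (datafile, cname):
    reset_file(name)
```

No shell is started, and there is no unused result left lying around.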
    cmd="echo '' > "+cname
    proc=subprocess.Popen(cmd, shell=True,stdout=subprocess.PIPE)
    res=proc.communicate()[0].strip()
See above
    cmd='curl -vvv '
    cmd=cmd+'-A "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.11) Gecko/2009061118 Fedora/3.0.11-1.fc9 Firefox/3.0.11"'
    cmd=cmd+' --cookie-jar '+cname+' --cookie '+cname+' '
    cmd=cmd+'-L "http://www.lonestar.edu/class-search.htm"'
You build up strings like this many times, but it's very inefficient. There are several better options:
1) create a list of substrings, then use join() to convert the list to a string.
2) use a triple-quoted string to create the string once only.
And since you are mostly passing them to Popen, look at the docs to see how to pass a list of args instead of one large string. It's more secure and generally better practice.
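Roughly, option 1 and the list-of-args form could look like the sketch below. The names USER_AGENT, parts, args and run are chosen here for illustration; run() is only defined, not called, so nothing touches the network.

```python
import subprocess

USER_AGENT = ("Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.11) "
              "Gecko/2009061118 Fedora/3.0.11-1.fc9 Firefox/3.0.11")
cname = "unlv.lwp"
url = "http://www.lonestar.edu/class-search.htm"

# Option 1: collect the pieces in a list and join them once.
parts = [
    "curl -vvv",
    '-A "%s"' % USER_AGENT,
    "--cookie-jar", cname,
    "--cookie", cname,
    '-L "%s"' % url,
]
cmd = " ".join(parts)

# List of args for Popen: one list item per argument, no shell quoting needed.
args = ["curl", "-vvv", "-A", USER_AGENT,
        "--cookie-jar", cname, "--cookie", cname,
        "-L", url]


def run(arg_list):
    """Run the command without shell=True and return its stripped stdout."""
    proc = subprocess.Popen(arg_list, stdout=subprocess.PIPE)
    return proc.communicate()[0].strip()
```

With the list form the user-agent string needs no embedded quotes, because it is passed to curl as a single argument.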
    cmd='curl -vvv '
    cmd=cmd+'-A "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.11) Gecko/2009061118 Fedora/3.0.11-1.fc9 Firefox/3.0.11"'
    cmd=cmd+' --cookie-jar '+cname+' --cookie '+cname+' '
    cmd=cmd+'-L "https://campus.lonestar.edu/classsearch.htm"'
    #initial page
    cmd='curl -vvv '
    cmd=cmd+'-A "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.11) Gecko/2009061118 Fedora/3.0.11-1.fc9 Firefox/3.0.11"'
    cmd=cmd+' --cookie-jar '+cname+' --cookie '+cname+' '
    cmd=cmd+'-L "https://my.unlv.nevada.edu/psc/lvporprd/EMPLOYEE/HRMS/c/COMMUNITY_ACCESS.CLASS_SEARCH.GBL"'
    proc=subprocess.Popen(cmd, shell=True,stdout=subprocess.PIPE)
    res2=proc.communicate()[0].strip()
    print res2
    sys.exit()
Since this is unconditional, you always exit here, so nothing after it ever gets executed. Could this be the cause of your problem?
    # s contains HTML not XML text
    d = libxml2dom.parseString(res2, html=1)
    #-----------Form------------
    selpath="//input[@id='ICSID']//attribute::value"
    sel_ = d.xpath(selpath)
    if (len(sel_) == 0):
        sys.exit()
    val=""
    ndx=0
    for a in sel_:
        val=a.textContent.strip()
        print val
    #sys.exit()
    if(val==""):
        sys.exit()
    #build the 1st post
    ddd=1
    post=""
This does nothing since you immediately replace it with the next line.
    post="ICAJAX=1"
    post=post+"&ICAPPCLSDATA="
    post=post+"&ICAction=DERIVED_CLSRCH_SSR_EXPAND_COLLAPS%24149%24%241"
    post=post+"&ICActionPrompt=false"
    post=post+"&ICAddCount="
    post=post+"&ICAutoSave=0"
    post=post+"&ICBcDomData=undefined"
    post=post+"&ICChanged=-1"
    post=post+"&ICElementNum=0"
    post=post+"&ICFind="
    post=post+"&ICFocus="
    post=post+"&ICNAVTYPEDROPDOWN=0"
    post=post+"&ICResubmit=0"
    post=post+"&ICSID="+urllib.quote(val)
    post=post+"&ICSaveWarningFilter=0"
    post=post+"&ICStateNum="+str(ddd)
    post=post+"&ICType=Panel"
    post=post+"&ICXPos=0"
    post=post+"&ICYPos=114"
    post=post+"&ResponsetoDiffFrame=-1"
    post=post+"&SSR_CLSRCH_WRK_SSR_OPEN_ONLY$chk$3=N"
    post=post+"&SSR_CLSRCH_WRK_SUBJECT$0=ACC"
    post=post+"&TargetFrameName=None"
Since these are almost all hard-coded strings (only ICSID and ICStateNum vary), you might as well have hard-coded the final string and saved a lot of processing (and code space).
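One possible way to write that, as a sketch: the fixed text lives in a single template and only the two varying fields are filled in. POST_TEMPLATE and build_post are names invented here, and str.format() is used rather than the % operator because the literal already contains % escapes such as %24. The import fallback lets it run on Python 2 or 3.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2

# Adjacent string literals are joined by the compiler, so this is one constant.
POST_TEMPLATE = (
    "ICAJAX=1&ICAPPCLSDATA="
    "&ICAction=DERIVED_CLSRCH_SSR_EXPAND_COLLAPS%24149%24%241"
    "&ICActionPrompt=false&ICAddCount=&ICAutoSave=0"
    "&ICBcDomData=undefined&ICChanged=-1&ICElementNum=0"
    "&ICFind=&ICFocus=&ICNAVTYPEDROPDOWN=0&ICResubmit=0"
    "&ICSID={icsid}"
    "&ICSaveWarningFilter=0"
    "&ICStateNum={state_num}"
    "&ICType=Panel&ICXPos=0&ICYPos=114&ResponsetoDiffFrame=-1"
    "&SSR_CLSRCH_WRK_SSR_OPEN_ONLY$chk$3=N"
    "&SSR_CLSRCH_WRK_SUBJECT$0=ACC"
    "&TargetFrameName=None"
)


def build_post(icsid, state_num):
    """Fill in the session id (URL-quoted) and the state counter."""
    return POST_TEMPLATE.format(icsid=quote(icsid), state_num=state_num)


post = build_post("a+b=", 1)
```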
    cmd='curl -vvv '
    cmd=cmd+'-A "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.11) Gecko/2009061118 Fedora/3.0.11-1.fc9 Firefox/3.0.11"'
    cmd=cmd+' --cookie-jar '+cname+' --cookie '+cname+' '
    cmd=cmd+'-e "https://my.unlv.nevada.edu/psc/lvporprd/EMPLOYEE/HRMS/c/COMMUNITY_ACCESS.CLASS_SEARCH.GBL?&"
This looks awfully similar to the code up above. Could you have reused the command? Maybe with some parameters: check out string formatting operations, e.g. 'This string takes %s as a parameter' % 'a string'
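For instance, the repeated curl command could be written once as a template and filled in per request. This is only a sketch: CURL_TEMPLATE and curl_command are hypothetical names, and the dictionary form of % formatting is used so the cookie file can appear twice.

```python
USER_AGENT = ("Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.11) "
              "Gecko/2009061118 Fedora/3.0.11-1.fc9 Firefox/3.0.11")

CURL_TEMPLATE = ('curl -vvv -A "%(agent)s" '
                 '--cookie-jar %(cookies)s --cookie %(cookies)s '
                 '-L "%(url)s"')


def curl_command(url, cookies, agent=USER_AGENT):
    """Return the shell command string for fetching url with the given cookie file."""
    return CURL_TEMPLATE % {"agent": agent, "cookies": cookies, "url": url}


# The same function serves every page; only the parameters change.
first = curl_command("http://www.lonestar.edu/class-search.htm", "unlv.lwp")
second = curl_command("https://campus.lonestar.edu/classsearch.htm", "unlv.lwp")
```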
I'll stop here, it's all getting a bit repetitive. Which is, in itself, a sign that you need to create some functions.

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos