Thank you very much! I had forgotten that URLs on Unix servers are case-sensitive.
Also, I changed my 'for' statements to your suggestion, tweaked the exception
code a little, and it's working.

So, there are obviously several ways to open files. Do you have a standard
practice, or does it depend on the file format? I will eventually be working
with Excel and possibly MSSQL tables.

Thanks again for your help.

Roy

On Thu, Dec 3, 2009 at 3:46 AM, Christian Witts <[email protected]> wrote:

> Roy Hinkelman wrote:
>
>> Your list is great. I've been lurking for the past two weeks while I
>> learned the basics. Thanks.
>>
>> I am trying to loop through 2 files and scrape some data, and the loops
>> are not working. The script is not getting past the first URL from
>> state_list, as the test print shows.
>>
>> If someone could point me in the right direction, I'd appreciate it.
>>
>> I would also like to know the difference between open() and csv.reader().
>> I had similar issues with csv.reader() when opening these files.
>>
>> Any help greatly appreciated.
>>
>> Roy
>>
>> Code:
>>
>> # DOWNLOAD USGS MISSING FILES
>>
>> import mechanize
>> import BeautifulSoup as B_S
>> import re
>> # import urllib
>> import csv
>>
>> # OPEN FILES
>> # LOOKING FOR THESE SKUs
>> _missing = open('C:\\Documents and Settings\\rhinkelman\\Desktop\\working DB files\\missing_topo_list.csv', 'r')
>> # IN THESE STATES
>> _states = open('C:\\Documents and Settings\\rhinkelman\\Desktop\\working DB files\\state_list.csv', 'r')
>> # IF NOT FOUND, LIST THEM HERE
>> _missing_files = []
>> # APPEND THIS FILE WITH META
>> _topo_meta = open('C:\\Documents and Settings\\rhinkelman\\Desktop\\working DB files\\topo_meta.csv', 'a')
>>
>> # OPEN PAGE
>> for each_state in _states:
>>     each_state = each_state.replace("\n", "")
>>     print each_state
>>     html = mechanize.urlopen(each_state)
>>     _soup = B_S.BeautifulSoup(html)
>>     # SEARCH THRU PAGE AND FIND ROW CONTAINING META MATCHING SKU
>>     _table = _soup.find("table", "tabledata")
>>     print _table #test
>>
>> This is returning 'None'.
>>
> If you take a look at the webpage you open up, you will notice there are
> no tables. Are you certain you are using the correct URLs for this?
>
>>     for each_sku in _missing:
>>
> The for loop `for each_sku in _missing:` will only iterate once. You can
> either pre-read it into a list / dictionary / set (whichever you prefer)
> or change it to
>
> _missing_filename = 'C:\\Documents and Settings\\rhinkelman\\Desktop\\working DB files\\missing_topo_list.csv'
> for each_sku in open(_missing_filename):
>     # carry on here
>
>>         each_sku = each_sku.replace("\n","")
>>         print each_sku #test
>>         try:
>>             _row = _table.find('tr', text=re.compile(each_sku))
>>         except (IOError, AttributeError):
>>             _missing_files.append(each_sku)
>>             continue
>>         else:
>>             _row = _row.previous
>>             _row = _row.parent
>>             _fields = _row.findAll('td')
>>             _name = _fields[1].string
>>             _state = _fields[2].string
>>             _lat = _fields[4].string
>>             _long = _fields[5].string
>>             _sku = _fields[7].string
>>
>>             _topo_meta.write(_name + "|" + _state + "|" + _lat + "|" + _long + "|" + _sku + "||")
>>             print x + ': ' + _name
>>
>> print "Missing Files:"
>> print _missing_files
>> _topo_meta.close()
>> _missing.close()
>> _states.close()
>>
>> The message I am getting is:
>>
>> Code:
>>
>> http://libremap.org/data/state/Colorado/drg/
>> None
>> 33087c2
>> Traceback (most recent call last):
>>   File "//Dc1/Data/SharedDocs/Roy/_Coding Vault/Python code samples/usgs_missing_file_META.py", line 34, in <module>
>>     _row = _table.find('tr', text=re.compile(each_sku))
>> AttributeError: 'NoneType' object has no attribute 'find'
>>
>> And the files look like:
>>
>> Code:
>> state_list
>> http://libremap.org/data/state/Colorado/drg/
>> http://libremap.org/data/state/Connecticut/drg/
>> http://libremap.org/data/state/Pennsylvania/drg/
>> http://libremap.org/data/state/South_Dakota/drg/
>>
>> missing_topo_list
>> 33087c2
>> 34087b2
>> 33086b7
>> 34086c2
>>
> ------------------------------------------------------------------------
>
>> _______________________________________________
>> Tutor maillist - [email protected]
>> To unsubscribe or change subscription options:
>> http://mail.python.org/mailman/listinfo/tutor
>>
> Hope the comments above help in your endeavours.
>
> --
> Kind Regards,
> Christian Witts
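On the open() versus csv.reader() question from the top of the thread:
open() yields the raw lines of a file (trailing newlines included), while
csv.reader() wraps any iterable of lines and splits each one into a list of
fields, handling quoting and embedded commas. A minimal sketch, using
in-memory io.StringIO objects as stand-ins for the real files:

```python
import csv
import io

# open() on a file yields raw lines, newline included; StringIO
# behaves the same way, so newlines must be stripped by hand.
raw = io.StringIO("33087c2\n34087b2\n")
lines = [line.strip() for line in raw]

# csv.reader() splits each line into fields and honours quoting,
# so the embedded comma does not break the second row apart.
rows = list(csv.reader(io.StringIO('name,state\n"Ajo, East",AZ\n')))
```

For plain one-value-per-line files like missing_topo_list, open() plus
strip() is enough; csv.reader() earns its keep once rows carry multiple,
possibly quoted, fields.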
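Christian's point about the inner loop only iterating once comes down to file
objects being one-shot iterators: once the first pass over _missing reaches
end-of-file, every later `for` over the same object sees nothing. A small
sketch of the behaviour (StringIO standing in for the open file) and of the
pre-read-into-a-list fix:

```python
import io

# A file object (StringIO here) is a one-shot iterator.
f = io.StringIO("33087c2\n34087b2\n")
first_pass = [line.strip() for line in f]    # consumes the stream
second_pass = [line.strip() for line in f]   # already at EOF: empty

# Pre-reading into a list allows any number of passes.
skus = first_pass
third_pass = [sku for sku in skus]
```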
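The AttributeError in the traceback shows the same symptom from the other
side: BeautifulSoup's find() returns None when no matching tag exists, and
the next method call on that None then blows up. Checking for None before
using the result is an alternative to wrapping the lookup in try/except. A
sketch, with a hypothetical stub class standing in for a BeautifulSoup tag:

```python
class StubTable:
    """Hypothetical stand-in for a BeautifulSoup tag (for illustration)."""
    def findAll(self, name):
        return ["<tr>...</tr>"]

def rows_for(table):
    # find() returns None on no match; guard before calling methods on it,
    # instead of letting AttributeError propagate.
    if table is None:
        return []
    return table.findAll("tr")
```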
_______________________________________________
Tutor maillist - [email protected]
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
