Your list is great. I've been lurking for the past two weeks while I learned the basics. Thanks.
I am trying to loop through 2 files and scrape some data, and the loops are not working. The script is not getting past the first URL from state_list, as the test print shows. If someone could point me in the right direction, I'd appreciate it.

I would also like to know the difference between open() and csv.reader(). I had similar issues with csv.reader() when opening these files.

Any help greatly appreciated.

Roy

Code:

# DOWNLOAD USGS MISSING FILES

import mechanize
import BeautifulSoup as B_S
import re
# import urllib
import csv

# OPEN FILES
# LOOKING FOR THESE SKUs
_missing = open('C:\\Documents and Settings\\rhinkelman\\Desktop\\working DB files\\missing_topo_list.csv', 'r')
# IN THESE STATES
_states = open('C:\\Documents and Settings\\rhinkelman\\Desktop\\working DB files\\state_list.csv', 'r')
# IF NOT FOUND, LIST THEM HERE
_missing_files = []
# APPEND THIS FILE WITH META
_topo_meta = open('C:\\Documents and Settings\\rhinkelman\\Desktop\\working DB files\\topo_meta.csv', 'a')

# OPEN PAGE
for each_state in _states:
    each_state = each_state.replace("\n", "")
    print each_state
    html = mechanize.urlopen(each_state)
    _soup = B_S.BeautifulSoup(html)
    # SEARCH THRU PAGE AND FIND ROW CONTAINING META MATCHING SKU
    _table = _soup.find("table", "tabledata")
    print _table #test This is returning 'None'
    for each_sku in _missing:
        each_sku = each_sku.replace("\n","")
        print each_sku #test
        try:
            _row = _table.find('tr', text=re.compile(each_sku))
        except (IOError, AttributeError):
            _missing_files.append(each_sku)
            continue
        else:
            _row = _row.previous
            _row = _row.parent
            _fields = _row.findAll('td')
            _name = _fields[1].string
            _state = _fields[2].string
            _lat = _fields[4].string
            _long = _fields[5].string
            _sku = _fields[7].string
            _topo_meta.write(_name + "|" + _state + "|" + _lat + "|" + _long + "|" + _sku + "||")
            print x +': ' + _name

print "Missing Files:"
print _missing_files
_topo_meta.close()
_missing.close()
_states.close()

The message I am getting is:

Code:

>>>
http://libremap.org/data/state/Colorado/drg/
None
33087c2
Traceback (most recent call last):
  File "//Dc1/Data/SharedDocs/Roy/_Coding Vault/Python code samples/usgs_missing_file_META.py", line 34, in <module>
    _row = _table.find('tr', text=re.compile(each_sku))
AttributeError: 'NoneType' object has no attribute 'find'

And the files look like:

Code:

state_list
http://libremap.org/data/state/Colorado/drg/
http://libremap.org/data/state/Connecticut/drg/
http://libremap.org/data/state/Pennsylvania/drg/
http://libremap.org/data/state/South_Dakota/drg/

missing_topo_list
33087c2
34087b2
33086b7
34086c2
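
For reference, a minimal sketch of two points raised in the question: what iterating over open() gives compared with csv.reader(), and how a file object behaves when it is the inner loop of two nested loops. It is not the original script. It is written in Python 3 syntax (the script above is Python 2), it uses io.StringIO as an in-memory stand-in for the real CSV files, and all function and constant names are hypothetical. It does not address why _soup.find() returns None, which depends on the HTML of the page.

```python
import csv
import io

# Hypothetical in-memory stand-ins for state_list.csv and
# missing_topo_list.csv, so the sketch runs without the real files.
STATES_TEXT = (
    "http://libremap.org/data/state/Colorado/drg/\n"
    "http://libremap.org/data/state/Connecticut/drg/\n"
)
SKUS_TEXT = "33087c2\n34087b2\n"


def raw_lines(text):
    """Iterate a file object directly: each item is one string,
    still carrying its trailing newline."""
    return list(io.StringIO(text))


def csv_rows(text):
    """Wrap the file object in csv.reader: each item is a list of
    field strings, with the line ending already removed."""
    return list(csv.reader(io.StringIO(text)))


def nested_with_file(states_text, skus_text):
    """Nested loops over two file objects. A file object can only be
    read through once, so the inner loop has nothing left to yield
    after the first outer iteration."""
    states = io.StringIO(states_text)
    skus = io.StringIO(skus_text)
    pairs = []
    for state in states:
        for sku in skus:
            pairs.append((state.strip(), sku.strip()))
    return pairs


def nested_with_list(states_text, skus_text):
    """Same loops, but the SKUs are read into a list once, and a list
    can be iterated as many times as needed."""
    states = io.StringIO(states_text)
    skus = [line.strip() for line in io.StringIO(skus_text)]
    pairs = []
    for state in states:
        for sku in skus:
            pairs.append((state.strip(), sku))
    return pairs


if __name__ == "__main__":
    print(raw_lines(SKUS_TEXT))
    print(csv_rows(SKUS_TEXT))
    print(nested_with_file(STATES_TEXT, SKUS_TEXT))
    print(nested_with_list(STATES_TEXT, SKUS_TEXT))
```

With these two-line inputs, nested_with_file() pairs SKUs only with the first state, while nested_with_list() pairs every SKU with every state. A csv.reader() wrapped around an open file is read through once in the same way, which may explain the similar behaviour mentioned above.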
_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor