Bill wrote:

> Thanks for your reply. This was my first attempt; when running through
> IDLE I get the following error:
>
> Traceback (most recent call last):
>   File "C:\Users\Bill\Desktop\TXT_Output\email_extraction_script.py", line 27, in <module>
>     traverse_dirs(working_dir)
>   File "C:\Users\Bill\Desktop\TXT_Output\email_extraction_script.py", line 20, in traverse_dirs
>     if match:
> UnboundLocalError: local variable 'match' referenced before assignment
>
> My code is as follows:
> for line in lines:
>     match = re.search(r"\b[^\<][A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4} [^\>]\b", line)
> if match:
>     print(match.group(0))
>     otext = match.group(0) + ",\n"
>     output_file.write(otext)

The indentation of 'if match' is wrong; the way you wrote it, that line is
executed once after the for loop has finished, but you want it inside the
loop so that it runs for every line. You are lucky that the first file you
encountered was empty, so the loop body never ran and the match variable was
never set ;) Otherwise there would have been no exception, only missing
output, and the error would have been harder to find.

Random remarks:

> def traverse_dirs(wdir):
>     grabline = 0
>     for f in os.listdir('.'):

The listdir() argument should probably be wdir instead of '.'.

>         if os.path.isfile(f) == True:

The idiomatic way to spell this is

    if os.path.isfile(f):

>             content = open(f)
>             lines = content.readlines()
>             for line in lines:

The readlines() call will put the whole file into a potentially huge list.
You don't need to do this for your application. Instead iterate over the
file directly:

    content = open(f)
    for line in content:

That keeps memory consumption low, and the data processing can start
immediately.

PS: The way you wrote it, your program will process a single directory. If
you want to look into subdirectories you should read up on os.walk() as
already suggested. You will end up with something like

    for path, dirs, files in os.walk(wdir):
        for name in files:
            f = os.path.join(path, name)
            content = open(f)
            ...
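Putting the remarks above together, a minimal sketch might look like the following. It is not your script: find_emails and EMAIL_RE are names I made up, the pattern is a simplified stand-in for yours, and the with blocks and errors="replace" are my own additions so that files get closed and undecodable bytes don't abort the scan. Like your re.search() call, it only reports the first address on each line.

```python
import os
import re

# Assumption: a simplified pattern for illustration, not the one from the
# original script (the [^\<] and [^\>] guards are left out).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}")


def find_emails(wdir):
    """Yield the first address found on each line of every file below wdir."""
    for path, dirs, files in os.walk(wdir):
        for name in files:
            f = os.path.join(path, name)
            # errors="replace" keeps undecodable bytes from aborting the scan
            with open(f, errors="replace") as content:
                for line in content:  # iterate lazily, no readlines()
                    match = EMAIL_RE.search(line)
                    if match:  # inside the loop, so match is always bound
                        yield match.group(0)
```

Writing the results is then left to the caller, for example by looping over find_emails(working_dir) and calling output_file.write(address + ",\n") for each address. Keeping the output file outside wdir avoids scanning it along with the input files.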