Erik Rose gave a good talk today at PyCon about a parsing library he's working on called Parsimonious. You could maybe look into what he's doing there, and see if that helps you any... Follow him on Twitter at @erikrose to see when his session's video is up. His session was named "Parsing Horrible Things in Python". On Mar 11, 2012 9:48 PM, "Robert Sjoblom" <robert.sjob...@gmail.com> wrote:
> > You haven't shown us the critical part: how are you getting the lines
> > in the first place?
>
> Ah, yes --
>
> with open(address, "r", encoding="cp1252") as instream:
>     for line in instream:
>
> > (Also, you shouldn't shadow built-ins like list as you do above, unless
> > you know what you are doing. If you have to ask "what's shadowing?", you
> > don't :)
>
> Maybe I should have said list_name.append() instead; sorry for that.
>
> >> This, however, turned out to be unacceptably slow; this file is 1.1M
> >> lines, and it takes roughly a minute to go through. I have 450 of
> >> these files; I don't have the luxury to let it run for 8 hours.
> >
> > Really? And how many hours have you spent trying to speed this up? Two?
> > Three? Seven? And if it takes people two or three hours to answer your
> > question, and you another two or three hours to read it, it would have
> > been faster to just run the code as given :)
>
> Yes, for one set of files. Since I don't know how many sets of ~450
> files I'll have to run this over, I think that asking for help was a
> rather acceptable loss of time. I work on other parts while waiting
> anyway, or try and find out on my own as well.
>
> > - if you need to stick with Python, try this:
> >
> > # untested
> > results = []
> > fp = open('filename')
> > for line in fp:
> >     if key in line:
> >         # Found key, skip the next line and save the following.
> >         _ = next(fp, '')
> >         results.append(next(fp, ''))
>
> Well, that's certainly faster, but not fast enough.
> Oh well, I'll continue looking for a solution -- because even with the
> speedup it's unacceptable. I'm hoping against hope that I only have to
> run it against the last file of each batch of files, but if it turns
> out that I don't, I'm in for some exciting days of finding stuff out.
> Thanks for all the help though; it's much appreciated!
>
> How do you approach something like this, when someone tells you "we
> need you to parse these files.
> We can't tell you how they're
> structured, so you'll have to figure that out yourself."? It's just so
> much text that it's hard to get a grasp on the structure, and
> there's so much information contained in there as well; this is just
> the first part of what I'm afraid will be many. I'll try not to bother
> this list too much, though.
> --
> best regards,
> Robert S.
> _______________________________________________
> Tutor maillist - Tutor@python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor
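For what it's worth, the "skip one line, save the next" scan quoted above can also be done in a single pass over the file's text rather than line by line, which cuts out most of the per-line Python overhead. This is only a rough sketch, not tested against your data: the key string, the layout (one line to skip after the key line, then the line to keep), and the cp1252 encoding are all assumptions taken from the thread.

```python
def scan(path, key, encoding="cp1252"):
    """Return the line two lines after each line containing `key`."""
    results = []
    with open(path, "r", encoding=encoding) as instream:
        text = instream.read()  # one read instead of 1.1M line iterations

    pos = text.find(key)
    while pos != -1:
        eol = text.find("\n", pos)         # end of the key line
        if eol == -1:
            break                          # key was on the last line
        skip = text.find("\n", eol + 1)    # end of the line we skip
        if skip == -1:
            break                          # nothing after the skipped line
        end = text.find("\n", skip + 1)    # end of the line we want
        if end == -1:
            end = len(text)                # wanted line is the last line
        results.append(text[skip + 1:end])
        pos = text.find(key, end)          # resume search after that line
    return results
```

`str.find` runs in C, so for a file that fits in memory this tends to be noticeably faster than iterating lines in Python, at the cost of holding the whole file in RAM at once.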