There is linear thinking and then there is more linear thinking. As Alan, Mats and others said, there are often choices we can make and many approaches.
If your goal is to solve a problem NOW, right where you are, any method you can find is great. If your goal is to solve it repeatedly, in many possible places, or more efficiently, or if you are trying to learn more about a particular language or method like Python, then you have constraints to consider. The linear solution might be to solve the entire problem in one way. That may be Python, or it may be some UNIX tool like AWK. The flexible solutions may include doing it in stages, perhaps switching tools along the way.

So for the logfile problem, some see two paradigms. One is to read in the entire file, no matter how big. The other is to read in no more than a line at a time. Bogus choice. In the logfile example there are many other choices. If you can recognize the beginning of a region and then the end, sure, you can buffer lines. But how about a solution where you simply read the entire file (a line at a time) while writing a second file containing only the lines you might need? When done, if that file is empty, move on. If not, open that smaller file, perhaps reading it all in at once, and process that small file containing perhaps a few error logs. Or, heck, maybe each error region was written into a different file and you process them one at a time with even less complex code. Unless efficiency is paramount, many schemes like these can result in a better division of labor, and be easier to understand and perhaps even to code. And some of these schemes may have other advantages as well.

Here is yet another weird idea: REWIND. If the log file is a real file, you can wait till you reach the end condition that tells you that you also need earlier lines. Make a note of your position in the file and calculate an estimate, perhaps a very generous one, of how far back in the file you need to go. Say you rewind a thousand bytes and start reading lines again, perhaps discarding the first one as likely to be incomplete.
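The two-pass scheme above can be sketched in a few lines of Python. The file names and the notion that candidate lines contain the word ERROR are assumptions for illustration, not part of any real log format:

```python
import os

def filter_pass(log_path, small_path, keep=lambda line: "ERROR" in line):
    """First pass: copy only the candidate lines into a much smaller file."""
    with open(log_path) as src, open(small_path, "w") as dst:
        for line in src:              # never holds more than one line in memory
            if keep(line):
                dst.write(line)
    # Second pass: if nothing survived, move on; otherwise the small
    # file is safe to read all at once.
    if os.path.getsize(small_path) == 0:
        return []
    with open(small_path) as f:
        return f.readlines()
```

The division of labor is the point: the first loop knows nothing about error regions, and whatever processes the small file never has to care how big the original log was.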
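The rewind idea can also be sketched. This sketch assumes the file is opened in binary mode (so we can seek to arbitrary byte offsets; text-mode files restrict seek/tell) and uses a fixed, generous lookback:

```python
def lines_before(f, lookback=1024):
    """Recover lines just before the current position in a binary-mode file.

    Call this when the main scan hits the line that tells you earlier
    context is needed. The fixed lookback is a generous guess; real code
    might retry with a bigger value if the result looks short.
    """
    here = f.tell()                       # note our position in the file
    start = max(0, here - lookback)
    f.seek(start)
    lines = f.read(here - start).splitlines(keepends=True)
    if start > 0:
        lines = lines[1:]                 # first line is likely partial: toss it
    f.seek(here)                          # reset the file pointer and continue
    return lines
```

With a huge log and few errors, the main scan pays nothing for this until an error actually appears, which is the whole attraction over constantly rolling a buffer of the last hundred lines.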
You can read all these lines into a buffer, figure out which lines you need starting from the end, do what you want with them, toss the buffer, reset the file pointer, and continue! That solution is not linear at all. If you have a huge log file with few (or no) errors to process, this may even be faster than a scheme which buffers the last hundred lines and is constantly rolling those lines over for naught.

I am not criticizing any approach, but suggesting that one good approach to problems is to not close in on one particular solution prematurely. Be open to other solutions and even think outside that box. There may be many ways to look back.

As for the UNIX tools, one nice thing about them was using them in a pipeline, where each step made some modification, and often that merely enabled the next step to make its own. The solution did not depend on one tool doing everything. Even within Python, you can find a way to combine many modules to get a job done rather than building it from scratch.

Which leads to the question of how you would design a log file if you knew you needed to be able to search it efficiently for your particular application. I would offer something like HTML as an example. To some extent, the design of many elements looks like this:

<BODY> ... </BODY>

The idea is that you can write code that starts saving info when it reaches the first tag, and when it reaches the second tag that ends it, you make a decision. If you are in the right region, process it. If not, toss it and just move on. Of course, this scheme does not actually work for many of the tags in HTML: many tags make the close optional and tolerate not having one, and some things may be nested within each other. But when it comes to log files, suppose some line says:

***

and that marks the start of any error, and you then wait till the end of the region to see which error it was, on a line like:

#ERRNO: 26

Then you can ask your code to ignore lines till it sees the first marker.
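That ignore/buffer/process cycle is a small state machine. A minimal sketch, assuming the hypothetical *** and #ERRNO: markers above:

```python
def error_regions(lines, start_marker="***", end_prefix="#ERRNO:"):
    """Yield (errno, buffered_lines) for each marked region in a log.

    Assumes a line of '***' opens a region and a '#ERRNO: n' line closes
    it; both markers are illustrative, not a real log format.
    """
    buffering = False
    buffer = []
    for line in lines:
        if not buffering:
            if line.strip() == start_marker:
                buffering = True          # first marker seen: start saving
                buffer = []
        elif line.startswith(end_prefix):
            errno = int(line[len(end_prefix):])
            yield errno, buffer           # second marker: make the decision
            buffering = False             # ... then go back to ignoring
        else:
            buffer.append(line)
```

A caller interested in only one kind of error can toss everything else cheaply: `[b for n, b in error_regions(f) if n == 26]`.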
Buffer any subsequent lines till you recognize the second marker, then process the buffer. Then go back to ignoring till ...

A real logfile may contain many sections for many purposes. They may even have interleaved lines from different processes, to the point where your design may require all lines to start with something unique, like the process ID of the writer. That would make parsing such a file very hard, perhaps requiring multiple passes somewhat as described above. So sometimes a better design is multiple log files that can be merged if needed. Of course, if you must use what exists, ....

-----Original Message-----
From: Tutor <tutor-bounces+avigross=verizon....@python.org> On Behalf Of Mats Wichmann
Sent: Monday, December 24, 2018 10:15 AM
To: tutor@python.org
Subject: Re: [Tutor] look back comprehensively

On 12/24/18 2:14 AM, Alan Gauld via Tutor wrote:
> On 24/12/2018 05:25, Asokan Pichai wrote:
>
>> That said, sometimes text processing at the shell can precede or even
>> replace some of these. Of course that assumes Unix/Linux OS.
>
> In fact for most log file analysis I still use [ng]awk.
> Its hard to beat the simplicity of regex based event handling for
> slicing text files.

Sure... there's nothing wrong with using "the appropriate tool for the job" rather than making everything look like a Python problem.

There's Perl, too... the ultimate log analysis toolkit. If only I could read what I wrote a week later :)

_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor