Hi All,

I used your solution, but found a strange issue with deque:
I am using Python 2.6.6:

    >>> import collections
    >>> d = collections.deque('abcdefg')
    >>> print 'Deque:', d
      File "<stdin>", line 1
        print 'Deque:', d
                     ^
    SyntaxError: invalid syntax
    >>> print ('Deque:', d)
    Deque: deque(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
    >>> print d
      File "<stdin>", line 1
        print d
              ^
    SyntaxError: invalid syntax
    >>> print (d)
    deque(['a', 'b', 'c', 'd', 'e', 'f', 'g'])

In Python 2.6 the print statement normally works as print "Solution";
however, after importing collections I have to call it as
print("Solution"). Is this a known issue?

Please let me know.

Thanks,

On Mon, Dec 10, 2018 at 10:30 PM <tutor-requ...@python.org> wrote:

> Today's Topics:
>
>    1. Re: Increase performance of the script (Peter Otten)
>    2. Re: Increase performance of the script (Steven D'Aprano)
>    3. Re: Increase performance of the script (Steven D'Aprano)
>
>
>
> ---------- Forwarded message ----------
> From: Peter Otten <__pete...@web.de>
> To: tutor@python.org
> Date: Sun, 09 Dec 2018 21:17:53 +0100
> Subject: Re: [Tutor] Increase performance of the script
> Asad wrote:
>
> > Hi All,
> >
> > I have the following code to search for an error and print the
> > solution.
> >
> > /A/B/file1.log size may vary from 5 MB to 5 GB
> >
> > f4 = open (r" /A/B/file1.log  ", 'r' )
> > string2=f4.readlines()
>
> Do not read the complete file into memory. Read one line at a time and
> keep only those lines around that you may have to look at again.
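
For reference, a minimal sketch of what "one line at a time" could look like. The function name find_first_error and the in-memory sample are hypothetical and only stand in for the real log; the pattern is the one from the original script.

```python
from __future__ import print_function

import io
import re

# Pattern taken from the original script; everything else is illustrative.
ERROR_PATTERN = re.compile(r'"error(.)*13?"')


def find_first_error(lines):
    """Return (line_number, line) for the first matching line, or None.

    `lines` can be any iterable of strings, including an open file, so
    only one line has to be held in memory at a time.
    """
    for number, line in enumerate(lines, 1):
        if ERROR_PATTERN.search(line):
            return number, line.rstrip("\n")
    return None


# Hypothetical sample standing in for the real log file.
sample = io.StringIO(
    u'Calling rdbms/admin\n'
    u'some output\n'
    u'"error 13"\n'
)
result = find_first_error(sample)
print(result)
```

Passing an open file object instead of the sample should behave the same way, since iterating over a file yields one line per step rather than loading everything as readlines() does.
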
>
> > for i in range(len(string2)):
> >     position=i
> >     lastposition =position+1
> >     while True:
> >         if re.search('Calling rdbms/admin',string2[lastposition]):
> >             break
> >         elif lastposition==len(string2)-1:
> >             break
> >         else:
> >             lastposition += 1
>
> You are trying to find a group of lines. The way you do it for a file of
> the structure
>
> foo
> bar
> baz
> end-of-group-1
> ham
> spam
> end-of-group-2
>
> you find the groups
>
> foo
> bar
> baz
> end-of-group-1
>
> bar
> baz
> end-of-group-1
>
> baz
> end-of-group-1
>
> ham
> spam
> end-of-group-2
>
> spam
> end-of-group-2
>
> That looks like a lot of redundancy which you can probably avoid. But
> wait...
>
> >     errorcheck=string2[position:lastposition]
> >     for i in range ( len ( errorcheck ) ):
> >         if re.search ( r'"error(.)*13?"', errorcheck[i] ):
> >             print "Reason of error \n", errorcheck[i]
> >             print "script \n" , string2[position]
> >             print "block of code \n"
> >             print errorcheck[i-3]
> >             print errorcheck[i-2]
> >             print errorcheck[i-1]
> >             print errorcheck[i]
> >             print "Solution :\n"
> >             print "Verify the list of objects belonging to Database "
> >             break
> >     else:
> >         continue
> >     break
>
> you throw away almost all the hard work to look for the line containing
> those four lines? It looks like you only need the "error...13" lines,
> the three lines that precede it and the last "Calling..." line
> occurring before the "error...13".
>
> > The problem I am facing is a performance issue: it takes some minutes
> > to print out the solution. Please advise if there can be performance
> > enhancements to this script.
>
> If you want to learn the Python way you should try hard to write your
> scripts without a single
>
> for i in range(...):
>     ...
>
> loop. This style is usually the last resort. It may work for small
> datasets, but as soon as you have to deal with large files performance
> dives. Even worse, these loops tend to make your code hard to debug.
>
> Below is a suggestion for an implementation of what your code seems to be
> doing that only remembers the four recent lines and works with a single
> loop. If that saves you some time use that time to clean the scripts you
> have lying around from occurrences of "for i in range(....): ..." ;)
>
>
> from __future__ import print_function
>
> import re
> import sys
> from collections import deque
>
>
> def show(prompt, *values):
>     print(prompt)
>     for value in values:
>         print(" {}".format(value.rstrip("\n")))
>
>
> def process(filename):
>     tail = deque(maxlen=4)  # the last four lines
>     script = None
>     with open(filename) as instream:
>         for line in instream:
>             tail.append(line)
>             if "Calling rdbms/admin" in line:
>                 script = line
>             elif re.search('"error(.)*13?"', line) is not None:
>                 show("Reason of error:", tail[-1])
>                 show("Script:", script)
>                 show("Block of code:", *tail)
>                 show(
>                     "Solution",
>                     "Verify the list of objects belonging to Database"
>                 )
>                 break
>
>
> if __name__ == "__main__":
>     filename = sys.argv[1]
>     process(filename)
>
>
>
> ---------- Forwarded message ----------
> From: "Steven D'Aprano" <st...@pearwood.info>
> To: tutor@python.org
> Date: Mon, 10 Dec 2018 09:43:20 +1100
> Subject: Re: [Tutor] Increase performance of the script
> On Sun, Dec 09, 2018 at 03:45:07PM +0530, Asad wrote:
> > Hi All,
> >
> > I have the following code to search for an error and print the
> > solution.
> >
> > /A/B/file1.log size may vary from 5 MB to 5 GB
> [...]
>
> > The problem I am facing is a performance issue: it takes some minutes
> > to print out the solution. Please advise if there can be performance
> > enhancements to this script.
>
> How many minutes is "some"? If it takes 2 minutes to analyse a 5GB file,
> that's not bad performance. If it takes 2 minutes to analyse a 5MB file,
> that's not so good.
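
To make that comparison concrete, here is the arithmetic behind the two cases as a small sketch. The helper name throughput_mb_per_s is hypothetical, and 1 GB is assumed to be 1024 MB.

```python
from __future__ import division, print_function


def throughput_mb_per_s(size_mb, seconds):
    """Average processing rate in megabytes per second."""
    return size_mb / seconds


two_minutes = 2 * 60

# 5 GB in two minutes (assuming 1 GB = 1024 MB).
fast = throughput_mb_per_s(5 * 1024, two_minutes)
# 5 MB in two minutes.
slow = throughput_mb_per_s(5, two_minutes)

print("5 GB in 2 min: {0:.1f} MB/s".format(fast))
print("5 MB in 2 min: {0:.3f} MB/s".format(slow))
```

The two rates differ by a factor of about a thousand, which is why the actual file size and run time matter when judging the script.
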
>
>
>
> --
> Steve
>
>
>
> ---------- Forwarded message ----------
> From: "Steven D'Aprano" <st...@pearwood.info>
> To: tutor@python.org
> Date: Mon, 10 Dec 2018 11:00:58 +1100
> Subject: Re: [Tutor] Increase performance of the script
> On Sun, Dec 09, 2018 at 03:45:07PM +0530, Asad wrote:
> > Hi All,
> >
> > I have the following code to search for an error and print the
> > solution.
>
> Please tidy your code before asking for help optimizing it. We're
> volunteers, not being paid to work on your problem, and your code is too
> hard to understand.
>
> Some comments:
>
> > f4 = open (r" /A/B/file1.log  ", 'r' )
> > string2=f4.readlines()
>
> You have a variable "f4". Where are f1, f2 and f3?
>
> You have a variable "string2", which is a lie, because it is not a
> string, it is a list.
>
> I will be very surprised if the file name you show is correct. It has a
> leading space, and two trailing spaces.
>
> > for i in range(len(string2)):
> >     position=i
>
> Poor style. In Python, you almost never need to write code that iterates
> over the indexes (this is not Pascal). You don't need the assignment
> position=i. Better:
>
>     for position, line in enumerate(lines):
>         ...
>
> >     lastposition =position+1
>
> Poorly named variable. You call it "last position", but it is actually
> the NEXT position.
>
> >     while True:
> >         if re.search('Calling rdbms/admin',string2[lastposition]):
>
> Unnecessary use of regex, which will be slow. Better:
>
>     if 'Calling rdbms/admin' in line:
>         break
>
> >             break
> >         elif lastposition==len(string2)-1:
> >             break
>
> If you iterate over the lines, you don't need to check for the end of
> the list yourself.
>
>
> A better solution is to use the *accumulator* design pattern to collect
> a block of lines for further analysis:
>
>     # Untested.
>     with open(filename, 'r') as f:
>         block = []
>         inside_block = False
>         for line in f:
>             line = line.strip()
>             if inside_block:
>                 if line == "End of block":
>                     inside_block = False
>                     process(block)
>                     block = []  # Reset to collect the next block.
>                 else:
>                     block.append(line)
>             elif line == "Start of block":
>                 inside_block = True
>         # At the end of the loop, we might have a partial block.
>         if block:
>             process(block)
>
>
> Your process() function takes a single argument, the list of lines which
> makes up the block you care about.
>
> If you need to know the line numbers, it is easy to adapt:
>
>     for line in f:
>
> becomes:
>
>     for linenumber, line in enumerate(f):
>         # The next line is not needed in Python 3.
>         linenumber += 1  # Adjust to start line numbers at 1 instead of 0
>
> and:
>
>     block.append(line)
>
> becomes
>
>     block.append((linenumber, line))
>
>
> If you re-write your code using this accumulator pattern, using ordinary
> substring matching and equality instead of regular expressions whenever
> possible, I expect you will see greatly improved performance (as well as
> being much, much easier to understand and maintain).
>
>
>
> --
> Steve
>
> _______________________________________________
> Tutor maillist  -  Tutor@python.org
> https://mail.python.org/mailman/listinfo/tutor
>

--
Asad Hasan
+91 9582111698
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor