Suresh Krishna wrote:
it works (i think), but since this is my very first python program, i would really appreciate feedback on how the program could be improved..

First of all, welcome, new Python programmer,


My main trouble with the program is lack of structure, some lines deal with individual lines of the input file, while other deal with complete citation-records. I think the program would improve if this is more clearly expressed in the code.

One of the reasons that you got the current structure may be due to the use of the 'for line in bibfile' loop, which gives you a new line at only one point in the program. Such a for-loop is good for simple line processing (for each line do ...), but your program has grown beyond that imho. For this reason I'd like to propose to throw out the for-loop, and use a while-loop with three parts instead.


For simplicity, I left out the handling of the records (which we can discuss later perhaps)


inbibfile = open(InFileName,'r')

finished = False
while not finished:
    # Read & skip blank lines
    while True:
        line = inbibfile.readline()
        if len(line) == 0:
            finished = True
            break  # end of file detected
        if len(line.rstrip()) > 0:  # Line contains non-white-space
            break
        # else it was a blank line, skip, read next line

    if finished:
        break

    # line contains the first non-whitespace line of a new record
    # read the entire record until the next empty line or EOF
    current_entry = [line]
    while True:
        line = inbibfile.readline()
        if len(line) == 0:
            finished = True
            break  # Found EOF
        if len(line.rstrip()) > 0:
            current_entry.append(line)
        else: # Found an empty line, finished with current record
            break

    # finished reading (empty line or EOF encountered)
    ## Do something with current_entry

    # if not finished, do the next record

inbibfile.close()



You can go further by introducing a few functions:

inbibfile = open(InFileName,'r')

finished = False
while not finished:
    finished, line = read_blank_lines(inbibfile)
    if finished:
        break

    # line contains the first non-whitespace line of a new record
    current_entry = make_new_record(line)
    finished, current_entry = read_record_lines(current_entry, inbibfile)

    # finished reading (empty line or EOF encountered)
    ## Do something with current_entry

inbibfile.close()

and the functions then contain the details of each step.


A few comments about your current program:


InFileName= "original_library.ris";

In Python, you don't end a line with a ';'.
Also, 'InFileName' is a constant, which is normally given an all-uppercase name.

OUTDUPBIBFILE=open(OutDupFileName,'w')
current_entry=[]
numduplicates=0

One of the most difficult things to do in programming is achieving consistency. Above you have 3 global variables, and you use several different way of writing their names. You should try to write them the same. The big advantage of consistent names is that just by looking at a name you know what kind of thing it is, which reduces the chance of making errors in your program.


(Don't feel bad about it, this seems a very simple and obvious problem but in fact it is very hard to achieve, especially for larger programs and projects with several people).

>        if keyvalue not in current_keys: #is a unique entry
current_keys.append(keyvalue) #append current key to list of keys
            current_entry.append(line) #add the blank line to current entry
OUTBIBFILE.writelines(current_entry) #write out to new bib file without duplicates
            current_entry=[] #clear current entry for next one
            current_keyval=[] #clear current key

Look at the code of each line above, then look at the comment at the same line. What does the comment tell you that the code didn't?

>        continue  #dont write out successive blanks or initial blanks
> elif current_entry and line.isspace(): #reached a blank that demarcates end of current entry > keyvalue=''.join(current_keyval) #generated a key based on certain fields
>     elif len(line)>2: #not a blank, so more stuff in currrent entry

Now do the same here.


You see the difference?

In the first lines you repeat your code in words. In the second lines, you tell what you aim to achieve at that line, ie WHY do you do what you are doing rather than just WHAT does the line do (since that is already defined in the code). In general, there is no need to comment each line. You may also assume that the reader knows Python (otherwise there is little value for him reading the code).

Try to make 'blocks' of code with one or more empty lines in between (see my example code), and write a comment what that block as a whole aims to do.


Hope I didn't critize you too much. For a first Python program it is quite nice.

Sincerely,
Albert
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor

Reply via email to