Re: difflib and intelligent file differences

2009-03-26 Thread hayes . tyler
Thanks for all of your suggestions. Turns out Marco's first version was really the one I needed.

Thanks again,
t.

On Mar 26, 12:37 pm, Marco Mariani wrote:
> Marco Mariani wrote:
>
>> If the lines are really sorted, all you really need is a merge,
>
> For the archives, and for huge files where

Re: difflib and intelligent file differences

2009-03-26 Thread Marco Mariani
Marco Mariani wrote:
> If the lines are really sorted, all you really need is a merge,

For the archives, and for huge files where /usr/bin/diff or difflib are not appropriate, here it is.

    #!/usr/bin/env python
    import sys

    def run(filea, fileb):
        p = 3
        while True:
            if p&1:
                a =

Re: difflib and intelligent file differences

2009-03-26 Thread Steven D'Aprano
On Thu, 26 Mar 2009 07:00:52 -0700, hayes.tyler wrote:
> Hello All:
>
> I am starting to work on a file comparison script where I have to
> compare the contents of two large files.
...
> (and this is why I'm thinking of using
> Python's difflib to work on it)
...
> Any suggestions where to start

Re: difflib and intelligent file differences

2009-03-26 Thread Scott David Daniels
hayes.ty...@gmail.com wrote:
> Any suggestions where to start?

Start by reading the docs on the difflib module, perform some of the examples, and attempt to solve it yourself. Once you get in trouble, show a clear example of what you think went wrong.

--Scott David Daniels
scott.dani...@acm.org

Re: difflib and intelligent file differences

2009-03-26 Thread Marco Mariani
Dave Angel wrote:
> If the lines are really sorted, all you really need is a merge,

D'oh. Right. The posted code works on unsorted files. The sorted case is even simpler as you pointed out.

-- http://mail.python.org/mailman/listinfo/python-list

Re: Re: difflib and intelligent file differences

2009-03-26 Thread Dave Angel
If the lines are really sorted, all you really need is a merge, where you read one line from each source, and if equal, read another from each. If one source is less, output the lesser line with an appropriate tag, and refresh that one from its source. Stop when either source has run out, and t
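For the archives, here is a minimal sketch of the merge described above. It is not the code posted elsewhere in this thread. The function name `merge_diff` and the `<` / `>` tags are made up for illustration, and it assumes both files are sorted in the same order that Python uses when comparing strings. The message is cut off after "run out", so flushing whatever is left in the other source is my assumption about the natural last step.

```python
import io


def merge_diff(filea, fileb):
    """Yield (tag, line) for each line found in only one of two sorted files.

    Hypothetical helper: '<' tags a line only in filea, '>' a line only in fileb.
    """
    a = filea.readline()
    b = fileb.readline()
    while a and b:
        # Compare without the trailing newline so a final line that lacks
        # one still matches its counterpart in the other file.
        ka = a.rstrip("\n")
        kb = b.rstrip("\n")
        if ka == kb:
            # Equal: read another line from each source.
            a = filea.readline()
            b = fileb.readline()
        elif ka < kb:
            # filea holds the lesser line: output it and refresh from filea.
            yield ("<", ka)
            a = filea.readline()
        else:
            # fileb holds the lesser line: output it and refresh from fileb.
            yield (">", kb)
            b = fileb.readline()
    # Assumption: once one source has run out, everything still left in
    # the other source has no match and is reported as well.
    while a:
        yield ("<", a.rstrip("\n"))
        a = filea.readline()
    while b:
        yield (">", b.rstrip("\n"))
        b = fileb.readline()


if __name__ == "__main__":
    left = io.StringIO("apple\nbanana\ncherry\n")
    right = io.StringIO("banana\ncherry\ndate\n")
    for tag, line in merge_diff(left, right):
        print(tag, line)
```

Only one line from each file is held in memory at a time, which is the point of the merge for huge inputs. If the files are not actually sorted the output will be wrong without any warning.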

Re: difflib and intelligent file differences

2009-03-26 Thread Dave Angel
First comment, have you looked at the standard module difflib? There's a sample program diff.py located in tools\scripts that may do what you need already. It finds the differences in context, and displays them in a way that's frequently intuitive, showing you what's been changed, and
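As a rough illustration of what difflib gives you out of the box, here is a small sketch using `difflib.unified_diff` from the standard library. It is not the diff.py sample script itself. The helper name `show_changes` and the labels `old.txt` / `new.txt` are placeholders chosen for this example.

```python
import difflib


def show_changes(old_lines, new_lines):
    """Return unified-diff lines for two lists of newline-terminated strings."""
    return list(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile="old.txt",  # placeholder label, not a real file
            tofile="new.txt",    # placeholder label, not a real file
        )
    )


if __name__ == "__main__":
    old = ["alpha\n", "beta\n", "gamma\n"]
    new = ["alpha\n", "beta2\n", "gamma\n"]
    # Removed lines are prefixed with '-', added lines with '+',
    # and unchanged context lines with a single space.
    print("".join(show_changes(old, new)), end="")
```

Note that difflib works on whole sequences held in memory, so this suits moderately sized inputs better than very large files.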

Re: difflib and intelligent file differences

2009-03-26 Thread hayes . tyler
On Mar 26, 11:10 am, Marco Mariani wrote:
> Marco Mariani wrote:
>
>>     while True:
>>         a = filea.readline()
>>         b = fileb.readline()
>>         if not (a or b):
>>             break
>
> BTW, watch out for this break. It might not be what you want :-/

HA! Just found it :P

Re: difflib and intelligent file differences

2009-03-26 Thread Marco Mariani
Marco Mariani wrote:
>     while True:
>         a = filea.readline()
>         b = fileb.readline()
>         if not (a or b):
>             break

BTW, watch out for this break. It might not be what you want :-/

Re: difflib and intelligent file differences

2009-03-26 Thread Marco Mariani
hayes.ty...@gmail.com wrote:
> My first thought is to do a sweep, where the first sweep takes one line from f1, travels f2, if found, deletes it from a tmp version of f2, and then on to the second line, and so on. If not found, it writes to a file. At the end, if there are also lines still in f1 t
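One way to get the effect of the sweep described above without rescanning f2 for every line of f1 is to count the lines of f2 once and cross them off as they are matched. This is only a sketch and is not the code posted in this thread. The name `unmatched_lines` is hypothetical, and it assumes the second file's lines fit in memory. Because the quoted message is cut off, treating the lines left over in f2 as the second result is my reading of where it was heading.

```python
from collections import Counter


def unmatched_lines(lines1, lines2):
    """Pair up equal lines from two unsorted sequences.

    Returns (only_in_1, only_in_2). Duplicate lines are matched one for one,
    like deleting each found line from a temporary copy of the second file.
    """
    remaining = Counter(lines2)  # stands in for the "tmp version of f2"
    only_in_1 = []
    for line in lines1:
        if remaining[line] > 0:
            # Found in f2: cross one copy off so it cannot match again.
            remaining[line] -= 1
        else:
            # Not found: this is what the sweep would write to a file.
            only_in_1.append(line)
    # Whatever still has a positive count was never matched by f1.
    only_in_2 = list(remaining.elements())
    return only_in_1, only_in_2


if __name__ == "__main__":
    f1 = ["a", "b", "b", "c"]
    f2 = ["b", "c", "d"]
    print(unmatched_lines(f1, f2))
```

Each lookup is a dictionary access, so the cost grows with the total number of lines, not with the product of the two file lengths as in the nested sweep.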