Re: binary file compare...

2009-04-18 Thread Piet van Oostrum
> Adam Olsen (AO) wrote: >AO> The Wayback Machine has 150 billion pages, so 2**37. Google's index >AO> is a bit larger at over a trillion pages, so 2**40. A little closer >AO> than I'd like, but that's still 56294995000 to 1 odds of having >AO> *any* collisions between *any* of the file

Re: binary file compare...

2009-04-17 Thread Steven D'Aprano
On Fri, 17 Apr 2009 11:19:31 -0700, Adam Olsen wrote: > Actually, *cryptographic* hashes handle that just fine. Even for files > with just a 1 bit change the output is totally different. This is known > as the Avalanche Effect. Otherwise they'd be vulnerable to attacks. > > Which isn't to say
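Adam's avalanche-effect claim is easy to check: flip a single bit of the input and count how many of SHA-256's 256 output bits change. Below is a minimal sketch that uses only `hashlib`. The helper `bit_difference` and the sample string are illustrative choices, not taken from the thread. For a well-behaved cryptographic hash, roughly half of the output bits should differ.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bits that differ between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


original = b"The quick brown fox jumps over the lazy dog"
# Flip the lowest bit of the first byte ('T' becomes 'U').
flipped = bytes([original[0] ^ 0x01]) + original[1:]

d1 = hashlib.sha256(original).digest()
d2 = hashlib.sha256(flipped).digest()

changed = bit_difference(d1, d2)
print(f"{changed} of {len(d1) * 8} output bits changed")
```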

Re: binary file compare...

2009-04-17 Thread Lawrence D'Oliveiro
In message , Nigel Rantor wrote: > Adam Olsen wrote: > >> The chance of *accidentally* producing a collision, although >> technically possible, is so extraordinarily rare that it's completely >> overshadowed by the risk of a hardware or software failure producing >> an incorrect result. > > Not

Re: binary file compare...

2009-04-17 Thread Adam Olsen
On Apr 17, 9:59 am, SpreadTooThin wrote: > You know this is just insane. I'd be satisfied with a CRC16 or > something in the situation I'm in. > I have two large files, one local and one remote. Transferring every > byte across the internet to be sure that the two files are identical > is just n
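For the local/remote case, each side can compute a checksum locally, so only the small result has to cross the network. Below is a minimal sketch of a running CRC-32 built on the standard `zlib.crc32`. The standard library has no CRC16, so CRC-32 stands in for it here, and `crc32_of_file` is a hypothetical helper name. A CRC catches accidental corruption but offers no protection against deliberate tampering.

```python
import zlib


def crc32_of_file(path: str, chunk_size: int = 1 << 20) -> int:
    """Return the CRC-32 of a file, reading it in chunks to bound memory use."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)  # feed the running value back in
    return crc & 0xFFFFFFFF  # force an unsigned result on every Python version
```

Run the same function on both machines and compare the two integers. If they differ, the files definitely differ. If they match, the files are very probably identical.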

Re: binary file compare...

2009-04-17 Thread Adam Olsen
On Apr 17, 9:59 am, norseman wrote: > The more complicated the math the harder it is to keep a higher form of > math from checking (or improperly displacing) a lower one.  Which, of > course, breaks the rules.  Commonly called improper thinking. A number > of math teasers make use of that. Of cou

Re: binary file compare...

2009-04-17 Thread Adam Olsen
On Apr 17, 5:30 am, Tim Wintle wrote: > On Thu, 2009-04-16 at 21:44 -0700, Adam Olsen wrote: > > The Wayback Machine has 150 billion pages, so 2**37.  Google's index > > is a bit larger at over a trillion pages, so 2**40.  A little closer > > than I'd like, but that's still 56294995000 to 1 od

Re: binary file compare...

2009-04-17 Thread SpreadTooThin
On Apr 17, 4:54 am, Nigel Rantor wrote: > Adam Olsen wrote: > > On Apr 16, 11:15 am, SpreadTooThin wrote: > >> And yes he is right CRCs hashing all have a probability of saying that > >> the files are identical when in fact they are not. > > > Here's the bottom line.  It is either: > > > A) Sever

Re: binary file compare...

2009-04-17 Thread norseman
Adam Olsen wrote: On Apr 16, 11:15 am, SpreadTooThin wrote: And yes he is right CRCs hashing all have a probability of saying that the files are identical when in fact they are not. Here's the bottom line. It is either: A) Several hundred years of mathematics and cryptography are wrong. The

Re: binary file compare...

2009-04-17 Thread Tim Wintle
On Thu, 2009-04-16 at 21:44 -0700, Adam Olsen wrote: > The Wayback Machine has 150 billion pages, so 2**37. Google's index > is a bit larger at over a trillion pages, so 2**40. A little closer > than I'd like, but that's still 56294995000 to 1 odds of having > *any* collisions between *any* o

Re: binary file compare...

2009-04-17 Thread Nigel Rantor
Adam Olsen wrote: On Apr 16, 11:15 am, SpreadTooThin wrote: And yes he is right CRCs hashing all have a probability of saying that the files are identical when in fact they are not. Here's the bottom line. It is either: A) Several hundred years of mathematics and cryptography are wrong. The

Re: binary file compare...

2009-04-17 Thread Nigel Rantor
Adam Olsen wrote: On Apr 16, 4:27 pm, "Rhodri James" wrote: On Thu, 16 Apr 2009 10:44:06 +0100, Adam Olsen wrote: On Apr 16, 3:16 am, Nigel Rantor wrote: Okay, before I tell you about the empirical, real-world evidence I have could you please accept that hashes collide and that no matter ho

Re: binary file compare...

2009-04-16 Thread Adam Olsen
On Apr 16, 4:27 pm, "Rhodri James" wrote: > On Thu, 16 Apr 2009 10:44:06 +0100, Adam Olsen wrote: > > On Apr 16, 3:16 am, Nigel Rantor wrote: > >> Okay, before I tell you about the empirical, real-world evidence I have > >> could you please accept that hashes collide and that no matter how many

Re: binary file compare...

2009-04-16 Thread Adam Olsen
On Apr 16, 11:15 am, SpreadTooThin wrote: > And yes he is right CRCs hashing all have a probability of saying that > the files are identical when in fact they are not. Here's the bottom line. It is either: A) Several hundred years of mathematics and cryptography are wrong. The birthday problem
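The birthday-problem estimate the thread keeps returning to can be worked out directly. For n outputs of a b-bit hash that behaves like a uniform random function, the chance of any accidental collision is approximately n(n-1)/2^(b+1) while that value is small. Below is a sketch that applies the formula to the thread's figure of about 2**40 documents. It assumes random-looking hashes and no deliberate attack, and `collision_probability` is a hypothetical helper.

```python
import math


def collision_probability(n: int, bits: int) -> float:
    """Approximate chance that n random `bits`-bit hashes contain a collision.

    Uses the small-p birthday bound p ~= n*(n-1) / 2**(bits+1).
    """
    return n * (n - 1) / 2 ** (bits + 1)


n = 2 ** 40  # roughly the Google index size quoted in the thread
for bits in (128, 256):  # MD5-sized and SHA-256-sized digests
    p = collision_probability(n, bits)
    print(f"{bits}-bit hash: p ~= 2**{math.log2(p):.1f}")
```

With 128 bits, this gives odds of roughly 1 in 2**49.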

Re: binary file compare...

2009-04-16 Thread Rhodri James
On Thu, 16 Apr 2009 10:44:06 +0100, Adam Olsen wrote: On Apr 16, 3:16 am, Nigel Rantor wrote: Okay, before I tell you about the empirical, real-world evidence I have could you please accept that hashes collide and that no matter how many samples you use the probability of finding two files th

Re: binary file compare...

2009-04-16 Thread Adam Olsen
On Apr 16, 8:59 am, Grant Edwards wrote: > On 2009-04-16, Adam Olsen wrote: > > I'm afraid you will need to back up your claims with real files. > > Although MD5 is a smaller, older hash (128 bits, so you only need > > 2**64 files to find collisions), > > You don't need quite that many to have a

Re: binary file compare...

2009-04-16 Thread SpreadTooThin
On Apr 16, 3:16 am, Nigel Rantor wrote: > Adam Olsen wrote: > > On Apr 15, 12:56 pm, Nigel Rantor wrote: > >> Adam Olsen wrote: > >>> The chance of *accidentally* producing a collision, although > >>> technically possible, is so extraordinarily rare that it's completely > >>> overshadowed by the

Re: binary file compare...

2009-04-16 Thread Grant Edwards
On 2009-04-16, Adam Olsen wrote: > The chance of *accidentally* producing a collision, although > technically possible, is so extraordinarily rare that it's > completely overshadowed by the risk of a hardware or software > failure producing an incorrect result. Not when

Re: binary file compare...

2009-04-16 Thread Nigel Rantor
Adam Olsen wrote: On Apr 16, 3:16 am, Nigel Rantor wrote: Adam Olsen wrote: On Apr 15, 12:56 pm, Nigel Rantor wrote: Adam Olsen wrote: The chance of *accidentally* producing a collision, although technically possible, is so extraordinarily rare that it's completely overshadowed by the risk

Re: binary file compare...

2009-04-16 Thread Adam Olsen
On Apr 16, 3:16 am, Nigel Rantor wrote: > Adam Olsen wrote: > > On Apr 15, 12:56 pm, Nigel Rantor wrote: > >> Adam Olsen wrote: > >>> The chance of *accidentally* producing a collision, although > >>> technically possible, is so extraordinarily rare that it's completely > >>> overshadowed by the

Re: binary file compare...

2009-04-16 Thread Nigel Rantor
Adam Olsen wrote: On Apr 15, 12:56 pm, Nigel Rantor wrote: Adam Olsen wrote: The chance of *accidentally* producing a collision, although technically possible, is so extraordinarily rare that it's completely overshadowed by the risk of a hardware or software failure producing an incorrect resu

Re: binary file compare...

2009-04-16 Thread Adam Olsen
On Apr 15, 12:56 pm, Nigel Rantor wrote: > Adam Olsen wrote: > > The chance of *accidentally* producing a collision, although > > technically possible, is so extraordinarily rare that it's completely > > overshadowed by the risk of a hardware or software failure producing > > an incorrect result.

Re: binary file compare...

2009-04-15 Thread Nigel Rantor
Adam Olsen wrote: The chance of *accidentally* producing a collision, although technically possible, is so extraordinarily rare that it's completely overshadowed by the risk of a hardware or software failure producing an incorrect result. Not when you're using them to compare lots of files. Tr

Re: binary file compare...

2009-04-15 Thread Adam Olsen
On Apr 15, 11:04 am, Nigel Rantor wrote: > The fact that two md5 hashes are equal does not mean that the sources > they were generated from are equal. To do that you must still perform a > byte-by-byte comparison which is much less work for the processor than > generating an md5 or sha hash. > > I
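Nigel's point and the case for hashing can be combined when there are many files rather than two. First group the files by size. Then hash only the files whose sizes collide. Finally, confirm each equal digest with a real byte comparison, so a hash collision can never produce a false positive. Below is a minimal sketch of that strategy using `hashlib` and `filecmp`. The names `find_duplicates` and `_sha256` are hypothetical.

```python
import filecmp
import hashlib
import os
from collections import defaultdict


def _sha256(path, chunk_size=1 << 20):
    """SHA-256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.digest()


def find_duplicates(paths):
    """Return groups of paths whose contents are byte-for-byte identical."""
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for p in same_size:
            by_hash[_sha256(p)].append(p)
        for candidates in by_hash.values():
            # Equal digests are only candidates: confirm byte by byte.
            while len(candidates) > 1:
                first, rest = candidates[0], candidates[1:]
                same = [first] + [q for q in rest
                                  if filecmp.cmp(first, q, shallow=False)]
                if len(same) > 1:
                    groups.append(same)
                candidates = [q for q in rest if q not in same]
    return groups
```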

Re: binary file compare...

2009-04-15 Thread SpreadTooThin
On Apr 15, 8:04 am, Grant Edwards wrote: > On 2009-04-15, Martin wrote: > > > > > Hi, > > > On Mon, Apr 13, 2009 at 10:03 PM, Grant Edwards wrote: > >> On 2009-04-13, SpreadTooThin wrote: > > >>> I want to compare two binary files and see if they are the same. > >>> I see the filecmp.cmp functi

Re: binary file compare...

2009-04-15 Thread Nigel Rantor
Grant Edwards wrote: We all rail against premature optimization, but using a checksum instead of a direct comparison is premature unoptimization. ;) And more than that, will provide false positives for some inputs. So, basically it's a worse-than-useless approach for determining if two files

Re: binary file compare...

2009-04-15 Thread Nigel Rantor
Martin wrote: On Wed, Apr 15, 2009 at 11:03 AM, Steven D'Aprano wrote: The checksum does look at every byte in each file. Checksumming isn't a way to avoid looking at each byte of the two files, it is a way of mapping all the bytes to a single number. My understanding of the original question

Re: binary file compare...

2009-04-15 Thread Grant Edwards
On 2009-04-15, Martin wrote: > On Wed, Apr 15, 2009 at 11:03 AM, Steven D'Aprano > I'd still say rather burn CPU cycles than development hours (if I got > the question right), _Hours_? Calling the file compare module takes _one_line_of_code_. Implementing a file compare
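Grant's "one line of code" refers to the standard `filecmp` module. The important detail is `shallow=False`. With the default `shallow=True`, files whose `os.stat()` signatures (file type, size, modification time) match are reported as equal without their contents ever being read. The wrapper name `same_contents` below is only for illustration.

```python
import filecmp


def same_contents(path_a: str, path_b: str) -> bool:
    # shallow=False: do not trust matching stat signatures; compare contents.
    return filecmp.cmp(path_a, path_b, shallow=False)
```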

Re: binary file compare...

2009-04-15 Thread Grant Edwards
On 2009-04-15, Martin wrote: > Hi, > > On Mon, Apr 13, 2009 at 10:03 PM, Grant Edwards wrote: >> On 2009-04-13, SpreadTooThin wrote: >> >>> I want to compare two binary files and see if they are the same. >>> I see the filecmp.cmp function but I don't get a warm fuzzy feeling >>> that it is doin

Re: binary file compare...

2009-04-15 Thread Martin
On Wed, Apr 15, 2009 at 11:03 AM, Steven D'Aprano wrote: > The checksum does look at every byte in each file. Checksumming isn't a > way to avoid looking at each byte of the two files, it is a way of > mapping all the bytes to a single number. My understanding of the original question was a way t

Re: binary file compare...

2009-04-15 Thread Steven D'Aprano
On Wed, 15 Apr 2009 07:54:20 +0200, Martin wrote: >> Perhaps I'm being dim, but how else are you going to decide if two >> files are the same unless you compare the bytes in the files? > > I'd say checksums, just about every download relies on checksums to > verify you do have indeed the same fil
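This is what download checksums do in practice, and, as Steven says, computing one still reads every byte of the file. Below is a minimal sketch that verifies a file against a published SHA-256 hex digest using `hashlib`. The helper names are hypothetical. Python 3.11 and later also provide `hashlib.file_digest` for the chunked loop.

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file; every byte is read exactly once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published_checksum(path: str, expected_hex: str) -> bool:
    # Published digests are sometimes upper-case or carry stray whitespace.
    return file_sha256(path) == expected_hex.strip().lower()
```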

Re: binary file compare...

2009-04-14 Thread Martin
Hi, On Mon, Apr 13, 2009 at 10:03 PM, Grant Edwards wrote: > On 2009-04-13, SpreadTooThin wrote: > >> I want to compare two binary files and see if they are the same. >> I see the filecmp.cmp function but I don't get a warm fuzzy feeling >> that it is doing a byte by byte comparison of two files

Re: binary file compare...

2009-04-14 Thread Adam Olsen
On Apr 13, 8:39 pm, Grant Edwards wrote: > On 2009-04-13, Peter Otten <__pete...@web.de> wrote: > > > But there's a cache. A change of file contents may go > > undetected as long as the file stats don't change: > > Good point.  You can fool it if you force the stats to their > old values after you

binary file compare...

2009-04-14 Thread SpreadTooThin
I want to compare two binary files and see if they are the same. I see the filecmp.cmp function but I don't get a warm fuzzy feeling that it is doing a byte by byte comparison of two files to see if they are the same. What should I be using if not filecmp.cmp?
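If you want an explicit byte-by-byte comparison rather than trusting `filecmp`, it only takes a few lines. Below is a minimal sketch: it rejects files of different sizes first, then compares fixed-size chunks and stops at the first difference. The name `files_identical` and the 64 KiB chunk size are arbitrary choices. `filecmp.cmp(a, b, shallow=False)` does a similar chunked comparison internally.

```python
import os


def files_identical(path_a: str, path_b: str, chunk_size: int = 64 * 1024) -> bool:
    """Byte-by-byte comparison that stops at the first difference."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False  # different sizes can never be identical
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                return False
            if not a:  # both files exhausted at the same point
                return True
```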

Re: binary file compare...

2009-04-13 Thread Grant Edwards
On 2009-04-13, Peter Otten <__pete...@web.de> wrote: > But there's a cache. A change of file contents may go > undetected as long as the file stats don't change: Good point. You can fool it if you force the stats to their old values after you modify a file and you don't clear the cache.
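The sketch below reproduces the caching pitfall Peter and Grant describe. It changes a file's contents without changing its size, restores its timestamps with `os.utime`, and then compares again before and after `filecmp.clear_cache()` (available since Python 3.4). It assumes the filesystem restores timestamps exactly. Whether the stale result appears is a CPython implementation detail, so the code prints it rather than relying on it.

```python
import filecmp
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    a = os.path.join(d, "a.bin")
    b = os.path.join(d, "b.bin")
    for path in (a, b):
        with open(path, "wb") as f:
            f.write(b"hello")
    print(filecmp.cmp(a, b, shallow=False))  # True: contents compared

    # Change b's contents without changing its size, then restore its
    # timestamps so its os.stat() signature looks untouched.
    st = os.stat(b)
    with open(b, "wb") as f:
        f.write(b"jello")
    os.utime(b, ns=(st.st_atime_ns, st.st_mtime_ns))

    stale = filecmp.cmp(a, b, shallow=False)  # may be served from the cache
    filecmp.clear_cache()                     # discard cached outcomes
    fresh = filecmp.cmp(a, b, shallow=False)  # re-reads both files
    print(stale, fresh)
```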

Re: binary file compare...

2009-04-13 Thread Dave Angel
SpreadTooThin wrote: On Apr 13, 2:37 pm, Grant Edwards wrote: On 2009-04-13, Grant Edwards wrote: On 2009-04-13, SpreadTooThin wrote: I want to compare two binary files and see if they are the same. I see the filecmp.cmp function but I don't get a warm fuzzy feeling that i

Re: binary file compare...

2009-04-13 Thread Steven D'Aprano
On Mon, 13 Apr 2009 15:03:32 -0500, Grant Edwards wrote: > On 2009-04-13, SpreadTooThin wrote: > >> I want to compare two binary files and see if they are the same. I see >> the filecmp.cmp function but I don't get a warm fuzzy feeling that it >> is doing a byte by byte comparison of two files t

Re: binary file compare...

2009-04-13 Thread Peter Otten
Grant Edwards wrote: > On 2009-04-13, Grant Edwards wrote: >> On 2009-04-13, SpreadTooThin wrote: >> >>> I want to compare two binary files and see if they are the same. >>> I see the filecmp.cmp function but I don't get a warm fuzzy feeling >>> that it is doing a byte by byte comparison of two

Re: binary file compare...

2009-04-13 Thread SpreadTooThin
On Apr 13, 2:37 pm, Grant Edwards wrote: > On 2009-04-13, Grant Edwards wrote: > > > > > On 2009-04-13, SpreadTooThin wrote: > > >> I want to compare two binary files and see if they are the same. > >> I see the filecmp.cmp function but I don't get a warm fuzzy feeling > >> that it is doing a by

Re: binary file compare...

2009-04-13 Thread Grant Edwards
On 2009-04-13, Grant Edwards wrote: > On 2009-04-13, SpreadTooThin wrote: > >> I want to compare two binary files and see if they are the same. >> I see the filecmp.cmp function but I don't get a warm fuzzy feeling >> that it is doing a byte by byte comparison of two files to see if they >> are t

Re: binary file compare...

2009-04-13 Thread SpreadTooThin
On Apr 13, 2:03 pm, Grant Edwards wrote: > On 2009-04-13, SpreadTooThin wrote: > > > I want to compare two binary files and see if they are the same. > > I see the filecmp.cmp function but I don't get a warm fuzzy feeling > > that it is doing a byte by byte comparison of two files to see if they

Re: binary file compare...

2009-04-13 Thread Grant Edwards
On 2009-04-13, SpreadTooThin wrote: > I want to compare two binary files and see if they are the same. > I see the filecmp.cmp function but I don't get a warm fuzzy feeling > that it is doing a byte by byte comparison of two files to see if they > are they same. Perhaps I'm being dim, but how el

Re: binary file compare...

2009-04-13 Thread SpreadTooThin
On Apr 13, 2:00 pm, Przemyslaw Kaminski wrote: > SpreadTooThin wrote: > > I want to compare two binary files and see if they are the same. > > I see the filecmp.cmp function but I don't get a warm fuzzy feeling > > that it is doing a byte by byte comparison of two files to see if they > > are they

Re: binary file compare...

2009-04-13 Thread Przemyslaw Kaminski
SpreadTooThin wrote: > I want to compare two binary files and see if they are the same. > I see the filecmp.cmp function but I don't get a warm fuzzy feeling > that it is doing a byte by byte comparison of two files to see if they > are they same. > > What should I be using if not filecmp.cmp? W

Re: File Compare with difflib.context_diff

2009-03-20 Thread JanC
JohnV wrote: > I have a txt file that gets appended with data over a time event. The > data comes from an RFID reader and is dumped to the file by the RFID > software. I want to poll that file several times over the time period > of the event to capture the current data in the RFID reader. > > W

Re: File Compare with difflib.context_diff

2009-03-19 Thread JohnV
Here is the latest version of the code: currentdata_file = r"C:\Users\Owner\Desktop\newdata.txt" # the latest download from the clock lastdata_file = r"C:\Users\Owner\Desktop\mydata.txt" # the prior download from the clock output_file = r"C:\Users\Owner\Desktop\out.txt" # will hold delta clock dat

Re: File Compare with difflib.context_diff

2009-03-18 Thread JohnV
The below code does the trick with one small problem left to be solved import shutil import string currentdata_file = r"C:\Users\Owner\Desktop\newdata.txt" # the current download from the clock lastdata_file = r"C:\Users\Owner\Desktop\mydata.txt" # the prior download from the clock output_file =

Re: File Compare with difflib.context_diff

2009-03-18 Thread Gabriel Genellina
On Wed, 18 Mar 2009 21:02:42 -0200, Emile van Sebille wrote: JohnV wrote: > What I want to do is compare the old data (let's say it is saved to a file called 'lastdata.txt') with the new data (let's say it is saved to a file called 'currentdata.txt') and save the new appended data to a va

Re: File Compare with difflib.context_diff

2009-03-18 Thread Emile van Sebille
JohnV wrote: > What I want to do is compare the old data (let's say it is saved to a file called 'lastdata.txt') with the new data (let's say it is saved to a file called 'currentdata.txt') and save the new appended data to a variable You may get away with something like: (untested) newdata=op
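Because the RFID file is only ever appended to, the new data is simply whatever follows the old contents, and no diff is needed. Below is a minimal sketch of that idea. It is not necessarily what Emile's truncated reply shows. `appended_text` is a hypothetical helper, and it assumes the file is never rewritten or truncated between polls.

```python
def appended_text(old: str, new: str) -> str:
    """Return the text appended to `old` to produce `new`.

    Assumes the file only grows; raises ValueError otherwise so that a
    rewritten or truncated file is not silently misread.
    """
    if not new.startswith(old):
        raise ValueError("new data does not extend the old data")
    return new[len(old):]


print(repr(appended_text("tag1\ntag2\n", "tag1\ntag2\ntag3\n")))  # 'tag3\n'
```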

Re: File Compare with difflib.context_diff

2009-03-18 Thread JohnV
Maybe something like this will work though I am not sure of my quotes and what to import import shutil f = open(r'C:\Users\Owner\Desktop\mydata.txt', 'r') read_data1 = f.read() f.close() shutil.copy('C:\Users\Owner\Desktop\newdata.txt', 'C:\Users\Owner\Desktop\out.txt') file = open(r'C:\Users\O
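The worry about quotes is justified. In a normal (non-raw) string, the `\n` in `...Desktop\newdata.txt` is a newline escape, so the path is silently corrupted. In Python 3, the `\U` in `'C:\Users...'` is also a syntax error. Below is a short sketch showing the difference and one alternative, `pathlib.PureWindowsPath`. The paths are just the ones from the post.

```python
from pathlib import PureWindowsPath

cooked = 'Desktop\newdata.txt'  # "\n" here becomes a newline character
raw = r'Desktop\newdata.txt'    # raw string: backslashes kept literally
print(len(cooked), len(raw))    # 18 19

# Building paths from parts avoids escape problems entirely.
desktop = PureWindowsPath(r"C:\Users\Owner\Desktop")
print(desktop / "newdata.txt")  # C:\Users\Owner\Desktop\newdata.txt
```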

Re: File Compare with difflib.context_diff

2009-03-18 Thread Chris Rebert
On Wed, Mar 18, 2009 at 2:30 PM, JohnV wrote: > I have a txt file that gets appended with data over a time event.  The > data comes from an RFID reader and is dumped to the file by the RFID > software.  I want to poll that file several times over the time period > of the event to capture the curre

File Compare with difflib.context_diff

2009-03-18 Thread JohnV
I have a txt file that gets appended with data over a time event. The data comes from an RFID reader and is dumped to the file by the RFID software. I want to poll that file several times over the time period of the event to capture the current data in the RFID reader. When I read the data I wan
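For a file that only grows, another option is to remember how many bytes were already read and `seek()` past them on the next poll, instead of comparing two copies. Below is a minimal sketch. `read_new_data` is a hypothetical helper, and it assumes the RFID software appends in place rather than replacing the file. Binary mode keeps the offsets exact.

```python
import os


def read_new_data(path: str, offset: int) -> tuple[bytes, int]:
    """Return (bytes appended since `offset`, new offset)."""
    if os.path.getsize(path) < offset:
        offset = 0  # file was truncated or replaced: start over
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    return data, offset + len(data)
```

Keep the returned offset between polls, for example in a variable or a small state file, and pass it back on the next call.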

Re: File compare

2005-10-14 Thread PyPK
But what if, in case 1, the number of keys in f1 > f2 and, in case 2, the number of keys in f1 < f2? Shouldn't we get 1.1 in case 1 and 0.9 in case 2? It errors out with a KeyError.

Re: File compare

2005-10-14 Thread Magnus Lycka
PyPK wrote: > I have two files > file1 in format > > 'AA' 1 T T > 'AB' 1 T F > > file2 same as file1 > > 'AA' 1 T T > 'AB' 1 T T > > Also the compare should be based on id. So it should look for line > starting with id 'AA' (for example) and then match the line so if in > second case. S

Re: File compare

2005-10-12 Thread PyPK
Not for homework. But anyway thanks much...

Re: File compare

2005-10-12 Thread Larry Bates
Sounds a little like "homework", but I'll help you out. There are lots of ways, but this works. import sys class fobject: def __init__(self, inputfilename): try: fp=open(inputfilename, 'r') self.lines=fp.readlines() except IOError: print "Una

Re: File compare

2005-10-12 Thread PyPK
Note that the code I wrote won't do the compare based on id, which is what I am looking for; it just does a direct file-to-file compare.

File compare

2005-10-12 Thread PyPK
I have two files file1 in format 'AA' 1 T T 'AB' 1 T F file2 same as file1 'AA' 1 T T 'AB' 1 T T Also the compare should be based on id. So it should look for line starting with id 'AA' (for example) and then match the line so if in second case. so this is what I am looking for: 1. read
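Comparing by id rather than line by line means loading each file into a dict keyed on the id column and then using set operations on the keys. This also avoids the KeyError that appears when one file has ids the other lacks. Below is a minimal sketch. The assumptions are whitespace-separated columns with the quoted id first, and the request as stated is truncated, so "match" is taken to mean "all remaining fields equal". `load_records` and `compare_by_id` are hypothetical helpers.

```python
def load_records(path: str) -> dict[str, list[str]]:
    """Map each record id (first column, quotes stripped) to its other fields."""
    records = {}
    with open(path) as f:
        for line in f:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            records[fields[0].strip("'\"")] = fields[1:]
    return records


def compare_by_id(path1: str, path2: str) -> dict[str, list[str]]:
    r1, r2 = load_records(path1), load_records(path2)
    common = r1.keys() & r2.keys()
    return {
        "only_in_1": sorted(r1.keys() - r2.keys()),
        "only_in_2": sorted(r2.keys() - r1.keys()),
        "different": sorted(k for k in common if r1[k] != r2[k]),
    }
```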