Re: 24 Million entries and I need to what?

2013-12-28 Thread Levi Pearson
On Fri, Dec 27, 2013 at 11:12 PM, Joshua Marsh jos...@themarshians.com wrote: On Fri, Dec 27, 2013 at 7:54 PM, Levi Pearson levipear...@gmail.com wrote: Is helping someone further along the wrong path really helpful? It sounded to me like he was experimenting, which is great. I don't feel

Re: 24 Million entries and I need to what?

2013-12-28 Thread S. Dale Morrey
Yep, the experiment itself is a bad idea. Still, if you had 25 million distinct entries, each 32 bytes long, and you needed the fastest lookup and comparison time possible, what would you use? FYI, putting 24 million distinct entries into a single directory (even /tmp) completely corrupted my file
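One generic answer (a sketch, not something proposed in the thread): store the 25 million 32-byte entries as a single sorted flat blob and binary-search it. At roughly 800 MB it can sit in RAM or be mmap'd, and no filesystem directory ever sees millions of files. Names below are illustrative:

```python
RECORD = 32  # each entry is exactly 32 bytes

class FixedWidthIndex:
    """Binary search over one sorted blob of fixed-width records."""
    def __init__(self, blob: bytes):
        assert len(blob) % RECORD == 0
        self.blob = blob
        self.n = len(blob) // RECORD

    def __contains__(self, key: bytes) -> bool:
        lo, hi = 0, self.n
        while lo < hi:
            mid = (lo + hi) // 2
            rec = self.blob[mid * RECORD:(mid + 1) * RECORD]
            if rec < key:
                lo = mid + 1
            elif rec > key:
                hi = mid
            else:
                return True
        return False

# toy demo with a three-entry blob
entries = sorted([b'a' * 32, b'b' * 32, b'c' * 32])
idx = FixedWidthIndex(b''.join(entries))
print(b'b' * 32 in idx)  # True
print(b'z' * 32 in idx)  # False
```

Lookups cost about log2(25,000,000) ≈ 25 comparisons, with no per-entry object overhead.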

Re: 24 Million entries and I need to what?

2013-12-28 Thread Levi Pearson
On Sat, Dec 28, 2013 at 1:59 AM, S. Dale Morrey sdalemor...@gmail.com wrote: Yep the experiment itself is a bad idea. Still if you had 25 million distinct entries each 32 bytes long and you needed the fastest lookup and comparison time possible, what would you use? Well, that would depend on

Re: 24 Million entries and I need to what?

2013-12-28 Thread S. Dale Morrey
This won't make sense unless you have some background with distributed, redundant monitoring setups, but... Years ago I was working for a company, and I wrote an alerting system that did a URL-shortening trick so alerts could be sent over SMS. The server needed to be simplistic and resilient with
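The original shortener's scheme isn't shown in the preview; as a hedged illustration of the general trick, base62-encoding a numeric alert ID produces short, SMS-friendly codes:

```python
import string

# digits + lowercase + uppercase = 62 URL-safe characters
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase

def encode(n: int) -> str:
    """Turn a non-negative integer into a compact base62 token."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, r = divmod(n, 62)
        out.append(ALPHABET[r])
    return ''.join(reversed(out))

def decode(s: str) -> int:
    """Inverse of encode: token back to the original integer."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n

print(encode(125))  # '21'
print(decode(encode(123456789)) == 123456789)  # True
```

Six base62 characters cover 62^6 ≈ 56 billion distinct alerts, which keeps the short URL well inside an SMS message.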

Re: 24 Million entries and I need to what?

2013-12-28 Thread Levi Pearson
On Sat, Dec 28, 2013 at 4:52 AM, S. Dale Morrey sdalemor...@gmail.com wrote: This won't make sense unless you have some background with distributed redundant monitoring setups, but... Years ago I was working for a company and I wrote an alerting system that did a url shortening trick so

Re: 24 Million entries and I need to what?

2013-12-28 Thread S. Dale Morrey
I wrote an import script to get all 24 million entries into an H2 database as well as a MySQL DB, both local, both in server mode. Checking 1000 randomly generated hashes takes 129 ms on H2. (I couldn't get a figure for a single lookup; it returned too quickly.) By comparison, MySQL takes ~50 ms per
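A rough way to reproduce the "1000 random probes" style of measurement, with an in-memory Python set standing in for the H2/MySQL table (the 100,000-row size is an arbitrary stand-in for the 24 million rows, so the absolute numbers will differ):

```python
import hashlib
import os
import time

# build a toy table of hex digests; a set plays the role of the database
table = {hashlib.sha256(os.urandom(16)).hexdigest() for _ in range(100_000)}

# 1000 randomly generated probe hashes, as in the post
probes = [hashlib.sha256(os.urandom(16)).hexdigest() for _ in range(1000)]

t0 = time.perf_counter()
hits = sum(p in table for p in probes)
elapsed_ms = (time.perf_counter() - t0) * 1000
print(f"{hits} hits, {elapsed_ms:.2f} ms for 1000 lookups")
```

Random 256-bit probes essentially never hit the table, so the run measures pure lookup cost; a per-lookup figure falls out by dividing the total by 1000, which sidesteps the "returned too quickly" problem.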

Re: 24 Million entries and I need to what?

2013-12-28 Thread Steve Meyers
On 12/27/13 8:19 PM, S. Dale Morrey wrote: Well, Levi, you would be quite correct if the intent were to actually seek a collision across 256 bits of space. That is not what I'm going for here. In my mind, detecting a collision would be evidence of a flawed implementation of the hashing algorithm
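The birthday bound makes the "a collision would mean a flawed implementation" point quantitative: for n random b-bit values, the collision probability is approximately n(n-1)/2^(b+1), which for 24 million 256-bit hashes is vanishingly small:

```python
def collision_probability(n: int, bits: int) -> float:
    """Birthday-bound approximation: p ~= n(n-1) / 2^(bits+1)."""
    return n * (n - 1) / 2 ** (bits + 1)

p = collision_probability(24_000_000, 256)
print(p)  # on the order of 1e-63
```

So any collision observed among 24 million entries is, for all practical purposes, proof of a bug rather than chance.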

Re: 24 Million entries and I need to what?

2013-12-28 Thread Nicholas Stewart
I've done similar testing (looking for collisions within SHA1) with millions of strings and their hashes. I didn't actually expect to find any collisions, but I wanted to try it anyway. In the process I realized that Ruby's Hash wouldn't work for this project because of memory limitations (if I
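A common workaround for that in-memory hash-table limit (a sketch, not necessarily Nicholas's actual approach): write the digests out to disk, sort them externally (e.g. with GNU sort), then scan adjacent lines for duplicates, so memory use stays constant regardless of row count:

```python
import hashlib

def find_adjacent_duplicates(sorted_lines):
    """Yield any value that appears twice in a row in sorted input."""
    prev = None
    for line in sorted_lines:
        if line == prev:
            yield line
        prev = line

# small in-process demo; at scale the sort would happen on disk
digests = sorted(hashlib.sha1(str(i).encode()).hexdigest()
                 for i in range(1000))
dups = list(find_adjacent_duplicates(digests))
print(dups)  # [] -- no SHA-1 collisions among these inputs
```

After sorting, any collision must occupy adjacent lines, so one sequential pass finds them all.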

Re: 24 Million entries and I need to what?

2013-12-28 Thread Andy Bradford
Thus said S. Dale Morrey on Fri, 27 Dec 2013 01:59:04 -0700: I'd like to do this with posix tools, but I'm thinking I may have to write my own app to slurp it up into a table of some sort. A database is a possibility I guess, but the latency seems like it might be higher than some
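On the database-latency worry: an indexed lookup is typically microseconds once the index is warm. A small standard-library SQLite sketch gives a feel for it (sizes and names are illustrative; the thread's actual databases were H2 and MySQL):

```python
import hashlib
import sqlite3
import time

# load hex digests into an indexed table; PRIMARY KEY gives a B-tree index
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hashes (h TEXT PRIMARY KEY)")
rows = [(hashlib.sha256(str(i).encode()).hexdigest(),)
        for i in range(100_000)]
conn.executemany("INSERT INTO hashes VALUES (?)", rows)

# time a single indexed lookup for a value known to be present
probe = hashlib.sha256(b"42").hexdigest()
t0 = time.perf_counter()
found = conn.execute("SELECT 1 FROM hashes WHERE h = ?", (probe,)).fetchone()
elapsed_us = (time.perf_counter() - t0) * 1e6
print(found, f"{elapsed_us:.1f} us")
```

The point is that the index, not the client/server round trip, dominates lookup cost; an embedded or in-process store removes most of the latency a networked database adds.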

Re: 24 Million entries and I need to what?

2013-12-28 Thread Andy Bradford
Thus said S. Dale Morrey on Sat, 28 Dec 2013 01:59:50 -0700: Still if you had 25 million distinct entries each 32 bytes long and you needed the fastest lookup and comparison time possible, what would you use? For static data, CDB, hands down: http://cr.yp.to/cdb.html It has a hard limit
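CDB itself is Bernstein's C library; for a quick feel of the same on-disk constant-key/value lookup pattern from Python's standard library (an analogue only, not CDB):

```python
import dbm.dumb
import os
import tempfile

# build a small on-disk key/value store, then reopen it read-only,
# mirroring CDB's write-once, read-many usage pattern
path = os.path.join(tempfile.mkdtemp(), "hashes")
with dbm.dumb.open(path, "c") as db:
    db[b"deadbeef"] = b"seen"

with dbm.dumb.open(path, "r") as db:
    value = db[b"deadbeef"]
print(value)  # b'seen'
```

Like CDB, the store lives in ordinary files and survives process restarts; unlike CDB, `dbm.dumb` is not optimized for millions of keys, so it only illustrates the access pattern.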

Re: 24 Million entries and I need to what?

2013-12-28 Thread Sasha Pachev
Levi is of course right: to check the correctness of an implementation of SHA1, we need at the very least to compare the output on some random set of known/trusted/correctly computed input/output pairs. However, the exercise of computing and comparing a lot of hashes within some reasonable time is
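Comparing against known/trusted pairs can start from the published FIPS 180 test vectors; for example, with Python's hashlib:

```python
import hashlib

# FIPS 180 known-answer vectors for SHA-1: a correct implementation
# must reproduce these digests exactly
VECTORS = {
    b"abc": "a9993e364706816aba3e25717850c26c9cd0d89d",
    b"": "da39a3ee5e6b4b0d3255bfef95601890afd80709",
}

for msg, expected in VECTORS.items():
    assert hashlib.sha1(msg).hexdigest() == expected
print("all SHA-1 known-answer vectors pass")
```

Known-answer tests catch an incorrect implementation directly, whereas hashing millions of random inputs and looking for collisions only exercises speed, not correctness.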