Re: new coreutil? shuffle - randomize file contents

2005-07-16 Frederik Eaton
Also, each external symbol (function, macro, variable) should have a comment explaining what it does. Currently I'm at a bit of a loss trying to figure out what things do, so my comments will be limited.
+#ifndef _CHECKSUM_H
+#define _CHECKSUM_H 1
+
+#include <sys/types.h>
+#include

Re: new coreutil? shuffle - randomize file contents

2005-07-16 David Feuer
On Sat, Jul 16, 2005 at 07:01:53AM -0700, Frederik Eaton wrote: If it's the pseudorandomness, I think mentioning that is redundant, and the same thing I said about not wanting implementation in the API applies - a good pseudorandom number generator should be externally indistinguishable from

Re: new coreutil? shuffle - randomize file contents

2005-07-15 Jim Meyering
Frederik Eaton [EMAIL PROTECTED] wrote: Attached is a second patch, which contains a ChangeLog entry and some formatting changes as requested by Jim. Can you update your patch to be relative to coreutils-CVS, http://savannah.gnu.org/cvs/?group=coreutils, rather than to the aging 5.2.1? Also,

Re: new coreutil? shuffle - randomize file contents

2005-07-15 Paul Eggert
Thanks for working on this. You've gotten further than anyone else has! Some quick comments: Frederik Eaton [EMAIL PROTECTED] writes: Is there a script for making a patch with all the right files excluded by the way? Not yet. That's on the list of things to do. The fix will be to remove

Re: new coreutil? shuffle - randomize file contents

2005-07-15 Frederik Eaton
Hi, Attached is a third patch. Is there a script for making a patch with all the right files excluded by the way? cvs diff produces a huge amount of unrelated output because of files that are both in the repository and touched by configure, and it doesn't list new files. And diff doesn't seem to

Re: new coreutil? shuffle - randomize file contents

2005-07-14 Frederik Eaton
Attached is a second patch, which contains a ChangeLog entry and some formatting changes as requested by Jim. On Tue, Jun 07, 2005 at 08:47:14AM +0200, Jim Meyering wrote: Frederik Eaton [EMAIL PROTECTED] wrote: Here is a preliminary patch for basic shuffling functionality in 'sort', with

Re: new coreutil? shuffle - randomize file contents

2005-06-07 Jim Meyering
Frederik Eaton [EMAIL PROTECTED] wrote: Here is a preliminary patch for basic shuffling functionality in 'sort', with same-keys-sort-together behavior. It adds two options: -R to compare based on a hash of key, and --seed to specify salt for the hash. If --seed is not given then the default is

Re: new coreutil? shuffle - randomize file contents

2005-06-06 Frederik Eaton
Here is a preliminary patch for basic shuffling functionality in 'sort', with same-keys-sort-together behavior. It adds two options: -R to compare based on a hash of key, and --seed to specify salt for the hash. If --seed is not given then the default is to read from /dev/random or /dev/urandom.
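A minimal C sketch of the comparison described above, under stated assumptions: the whole line is the key, the hash is a 64-bit FNV-1a over the seed bytes followed by the line (swapped in for the md5 used by the actual patch; FNV-1a is fast but not cryptographically strong), and the names `salted_hash`, `compare_by_hash` and `sort_by_salted_hash` are hypothetical. Because equal lines hash equally and the comparator falls back to `strcmp` on hash collisions, identical keys always end up adjacent and, for a fixed seed, the output does not depend on the input order.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Seed used by the comparator. qsort passes no context pointer, so the
   seed lives in a static variable (hypothetical; not thread-safe). */
static uint64_t current_seed;

/* 64-bit FNV-1a over the 8 seed bytes followed by the key bytes.
   Stand-in for md5: fast, but not cryptographically strong. */
static uint64_t salted_hash(uint64_t seed, const char *key)
{
    uint64_t h = 0xcbf29ce484222325ULL;            /* FNV offset basis */
    for (int i = 0; i < 8; i++) {
        h ^= (seed >> (8 * i)) & 0xffu;            /* mix in the salt */
        h *= 0x100000001b3ULL;                     /* FNV prime */
    }
    for (const unsigned char *p = (const unsigned char *) key; *p; p++) {
        h ^= *p;
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Order lines by salted hash; equal lines hash equally, so they end up
   adjacent. strcmp breaks the (rare) tie between different lines. */
static int compare_by_hash(const void *a, const void *b)
{
    const char *x = *(const char *const *) a;
    const char *y = *(const char *const *) b;
    uint64_t hx = salted_hash(current_seed, x);
    uint64_t hy = salted_hash(current_seed, y);
    if (hx != hy)
        return hx < hy ? -1 : 1;
    return strcmp(x, y);
}

/* Hypothetical entry point: "shuffle" lines by sorting on salted hashes. */
static void sort_by_salted_hash(const char **lines, size_t n, uint64_t seed)
{
    current_seed = seed;
    qsort(lines, n, sizeof *lines, compare_by_hash);
}
```

With the same seed, sorting {"g", ..., "a"} and {"a", ..., "g"} yields the same sequence, which is the property the md5sum comparison in the 2005-06-05 message demonstrates; a different seed gives a different order.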

Re: new coreutil? shuffle - randomize file contents

2005-06-05 Frederik Eaton
So, the prototype runs a little slower than I had expected; it's currently using md5 hashes. I could also look into CRC or something faster (but less secure, for those concerned). Anyway, here is a sample:
$ time ./sort -R /usr/share/dict/words > /dev/null
./sort -R /usr/share/dict/words

Re: new coreutil? shuffle - randomize file contents

2005-06-05 Frederik Eaton
$ print -l g f e d c b a | ./sort -R | md5sum
dda0a6660319917afd6ed021f27fb452 -
$ print -l a b c d e f g | ./sort -R | md5sum
dda0a6660319917afd6ed021f27fb452 -
By the way, this wouldn't actually be the default behavior, you'd have to specify an explicit seed and have it be the same each

Re: new coreutil? shuffle - randomize file contents

2005-06-04 Davis Houlton
On Saturday 04 June 2005 01:11, Philip Rowlands wrote: Extend sort. In extending sort, would the O(n) shuffle algorithm be implemented? Or would the existing O(n log n) mergesort logic be used via keys? Though I am not a sort maintainer, and am probably the least qualified to pass assumption

Re: new coreutil? shuffle - randomize file contents

2005-06-04 Alfred M. Szmidt
If you intend to make shuffle part of coreutils someday, then you could use the GNU womb repository on Savannah. You'd need to get proper papers from [EMAIL PROTECTED] though, and if you add code that was written by someone else we'd need papers from them too. But this would make putting

Re: new coreutil? shuffle - randomize file contents

2005-06-04 Paul Eggert
Frederik Eaton [EMAIL PROTECTED] writes: How about this: Put an upper limit on the number of samples that your adversary will be able to try before the earth blows up. But that's not how adversarial attacks work. They work by exploiting flaws in your pseudorandom number generator. Thus, for

Re: new coreutil? shuffle - randomize file contents

2005-06-04 David Feuer
On Sat, Jun 04, 2005 at 04:29:50PM -0700, Paul Eggert wrote: were truly random. The application to high-stakes poker games should be obvious. snip (I agree that all this is overkill for non-adversarial applications.) Aside from shuffling cards (which should rarely if ever involve more than

Re: new coreutil? shuffle - randomize file contents

2005-06-04 Frederik Eaton
On Sat, Jun 04, 2005 at 04:29:50PM -0700, Paul Eggert wrote: Frederik Eaton [EMAIL PROTECTED] writes: How about this: Put an upper limit on the number of samples that your adversary will be able to try before the earth blows up. But that's not how adversarial attacks work. They work by

Re: new coreutil? shuffle - randomize file contents

2005-06-03 Paul Eggert
Philip Rowlands [EMAIL PROTECTED] writes: I'm still interested to read what Paul considers to be the difficulties of such an implementation? Suppose you're randomizing an input file of 10 million lines. And suppose you want to approximate a truly random key by using a 128-bit random key for
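The rest of this message is cut off, so what follows is only a back-of-the-envelope check of the sizes involved, not a reconstruction of Paul's argument. It assumes n = 10^7 lines with b = 128-bit keys and compares the raw key material with the entropy of one uniformly random permutation (via Stirling's approximation) and with the birthday bound on key collisions.

```latex
\begin{aligned}
n &= 10^{7}, \qquad b = 128 \\
\text{key material} &= n\,b = 1.28\times10^{9}\ \text{bits} \approx 160\ \text{MB} \\
\log_2 n! &\approx \frac{n\ln n - n}{\ln 2}
  \approx \frac{1.612\times10^{8} - 10^{7}}{0.6931}
  \approx 2.18\times10^{8}\ \text{bits} \approx 27\ \text{MB} \\
\Pr[\text{two keys collide}] &\le \binom{n}{2}\,2^{-b}
  \approx \frac{5\times10^{13}}{3.4\times10^{38}} \approx 1.5\times10^{-25}
\end{aligned}
```

So 128-bit keys make ties vanishingly unlikely at this size, but they consume roughly six times as many random bits as the log2 n! needed to select a single permutation.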

Re: new coreutil? shuffle - randomize file contents

2005-06-03 Frederik Eaton
On Thu, Jun 02, 2005 at 11:31:26PM -0700, Paul Eggert wrote: Philip Rowlands [EMAIL PROTECTED] writes: I'm still interested to read what Paul considers to be the difficulties of such an implementation? Suppose you're randomizing an input file of 10 million lines. And suppose you want

Re: new coreutil? shuffle - randomize file contents

2005-06-02 James Youngman
On Wed, Jun 01, 2005 at 06:52:08PM -0700, Frederik Eaton wrote: So, what is the current state of things? Who is in charge of accepting patches? The coreutils maintainers, who are all subscribed to this list I think. So, you're asking in the right place. Are we decided that a 'shuffle'

Re: new coreutil? shuffle - randomize file contents

2005-06-02 Philip Rowlands
On Thu, 2 Jun 2005, James Youngman wrote: I think the consensus is that the functionality belongs in sort. Beyond that things are a bit less clear. However, Paul put forward a proposed usage which adapts the current -k option (see

Re: new coreutil? shuffle - randomize file contents

2005-06-02 Jim Meyering
Frederik Eaton [EMAIL PROTECTED] wrote: So, what is the current state of things? Who is in charge of accepting patches? Are we decided that a 'shuffle' command but no 'sort -R' facility would be best, or that it would be good to have both, or is it still in question whether either would be

Re: new coreutil? shuffle - randomize file contents

2005-06-02 Frederik Eaton
James I think the consensus is that the functionality belongs in sort. James Beyond that things are a bit less clear. However, Paul put forward a James proposed usage which adapts the current -k option (see James http://lists.gnu.org/archive/html/bug-coreutils/2005-05/msg00179.html). James Nobody

Re: new coreutil? shuffle - randomize file contents

2005-06-02 Philip Rowlands
On Thu, 2 Jun 2005, Frederik Eaton wrote: Phil Is it that the app must guarantee all lines of a Phil non-seekable stdin must have an equal chance of any sort order? See my comment to James above. I think one need not make this guarantee, since only a tiny fraction of possible sort orders will be

Re: new coreutil? shuffle - randomize file contents

2005-06-02 David Feuer
There seems to be some sloppy thinking regarding efficiency and uniform randomness. Regarding uniform randomness, the infamous Oleg of comp.lang.{scheme,functional} writes: Furthermore, if we have a sequence of N elements and associate with each element a key -- a random number uniformly
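The quoted message is truncated, but the general concern about attaching random keys drawn from a finite range can be checked by brute force. A minimal C sketch, assuming N = 3 elements, keys uniform over {0, 1, 2}, and ties broken by input order (a stable sort); `tally_keyed_permutations` and `perm_rank` are hypothetical helper names. The 3^3 = 27 equally likely keyings cannot be split evenly among the 3! = 6 permutations, because 27 is not a multiple of 6.

```c
enum { NELEM = 3, NKEYS = 3, NPERM = 6 };   /* NPERM = 3! */

/* Lexicographic rank (0..NPERM-1) of a permutation of {0..NELEM-1},
   computed from its Lehmer code in Horner form. */
static int perm_rank(const int p[NELEM])
{
    int rank = 0;
    for (int i = 0; i < NELEM; i++) {
        int smaller = 0;
        for (int j = i + 1; j < NELEM; j++)
            if (p[j] < p[i])
                smaller++;
        rank = rank * (NELEM - i) + smaller;
    }
    return rank;
}

/* Enumerate all NKEYS^NELEM equally likely keyings; for each, sort the
   element indices by key (stable, so ties keep input order) and count
   how often each resulting permutation appears. */
static void tally_keyed_permutations(long counts[NPERM])
{
    int total = 1;
    for (int i = 0; i < NELEM; i++)
        total *= NKEYS;
    for (int r = 0; r < NPERM; r++)
        counts[r] = 0;

    for (int code = 0; code < total; code++) {
        int key[NELEM], order[NELEM];
        int c = code;
        for (int i = 0; i < NELEM; i++) {     /* decode base-NKEYS digits */
            key[i] = c % NKEYS;
            c /= NKEYS;
            order[i] = i;
        }
        /* insertion sort by key; strict '>' keeps equal keys in place */
        for (int i = 1; i < NELEM; i++) {
            int cur = order[i], j = i - 1;
            while (j >= 0 && key[order[j]] > key[cur]) {
                order[j + 1] = order[j];
                j--;
            }
            order[j + 1] = cur;
        }
        counts[perm_rank(order)]++;
    }
}
```

The tally comes out as 10 for the original order, 1 for the reversed order and 4 for each of the other four permutations, so ties on short keys bias the result toward the input order. Wider keys shrink the bias, but exact uniformity requires at least that N! divide M^N for keys drawn from M values.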

Re: new coreutil? shuffle - randomize file contents

2005-06-02 Frederik Eaton
Phil Is it that the app must guarantee all lines of a Phil non-seekable stdin must have an equal chance of any sort order? See my comment to James above. I think one need not make this guarantee, since only a tiny fraction of possible sort orders will be able to be tried by the user.

Re: new coreutil? shuffle - randomize file contents

2005-06-01 Frederik Eaton
So, what is the current state of things? Who is in charge of accepting patches? Are we decided that a 'shuffle' command but no 'sort -R' facility would be best, or that it would be good to have both, or is it still in question whether either would be accepted? Frederik --

Re: new coreutil? shuffle - randomize file contents

2005-05-31 Davis Houlton
On Monday 30 May 2005 23:02, Frederik Eaton wrote: I hope that you aren't proposing an algorithm which is similar to card-shuffling. That would be exactly like merge-sorting on a key hash - i.e. no more efficient. Agreed! The algorithm implemented is a slight variation on Knuth's shuffle
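Davis's variation is not shown in the thread, so this is only a sketch of the textbook Fisher-Yates (Knuth) shuffle in C, assuming an in-memory array of lines. It does one swap per element, so it runs in O(n) time. The generator is xorshift64*, a small non-cryptographic stand-in chosen to keep the example self-contained; `xorshift64star`, `uniform_below` and `shuffle_lines` are hypothetical names. `uniform_below` uses rejection sampling so that every index is exactly equally likely, which a plain `% bound` would not guarantee.

```c
#include <stddef.h>
#include <stdint.h>

/* xorshift64* generator: small stand-in PRNG, not cryptographically
   secure. The state must be nonzero. */
static uint64_t xorshift64star(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    *state = x;
    return x * 0x2545F4914F6CDD1DULL;
}

/* Uniform integer in [0, bound), bound > 0. Values at or above the
   largest multiple of bound are rejected to avoid modulo bias. */
static uint64_t uniform_below(uint64_t *state, uint64_t bound)
{
    uint64_t limit = UINT64_MAX - UINT64_MAX % bound;
    uint64_t r;
    do
        r = xorshift64star(state);
    while (r >= limit);
    return r % bound;
}

/* Fisher-Yates: walk from the end, swapping each slot with a uniformly
   chosen slot at or before it. Every order is equally likely given a
   uniform source of indices. */
static void shuffle_lines(const char **lines, size_t n, uint64_t *state)
{
    for (size_t i = n; i > 1; i--) {
        size_t j = (size_t) uniform_below(state, i);   /* 0 <= j < i */
        const char *tmp = lines[i - 1];
        lines[i - 1] = lines[j];
        lines[j] = tmp;
    }
}
```

Usage: seed a nonzero `uint64_t state` (from a user-supplied seed or /dev/urandom, say) and call `shuffle_lines(lines, n, &state)`; the same seed reproduces the same order. Because the output is a deterministic function of a 64-bit state, at most 2^64 of the n! possible orders are reachable, which is the seed-size concern raised elsewhere in the thread.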

Re: new coreutil? shuffle - randomize file contents

2005-05-30 Frederik Eaton
On Wed, May 25, 2005 at 10:58:41AM +0100, James Youngman wrote: On Tue, May 24, 2005 at 09:55:35AM -0700, Paul Eggert wrote: That way, you could use, e.g.: sort -k 2,2 -k R which would mean sort by the 2nd field, but if there are ties then sort the ties randomly. sort -R would

Re: new coreutil? shuffle - randomize file contents

2005-05-30 Frederik Eaton
I'm not following exactly - in part I think it is premature to discuss implementation details now. And as for the idea to put shuffle functionality in a separate command, this and other issues were discussed at length in the previous thread which starts here:

Re: new coreutil? shuffle - randomize file contents

2005-05-30 Davis Houlton
Hi Frederik! I guess we're both a little confused :) My question is why would I sort AND shuffle in the same command? Are we talking sort the whole data set and shuffle a subset? I guess I'm having a hard time thinking why I would randomize via key--not saying that there aren't reasons, I'm

Re: new coreutil? shuffle - randomize file contents

2005-05-30 Frederik Eaton
On Mon, May 30, 2005 at 09:25:45AM +, Davis Houlton wrote: Hi Frederik! I guess we're both a little confused :) My question is why would I sort AND shuffle in the same command? Are we talking sort the whole data set and shuffle a subset? I guess I'm having a hard time thinking why I would

Re: new coreutil? shuffle - randomize file contents

2005-05-25 James Youngman
On Tue, May 24, 2005 at 09:55:35AM -0700, Paul Eggert wrote: That way, you could use, e.g.: sort -k 2,2 -k R which would mean sort by the 2nd field, but if there are ties then sort the ties randomly. sort -R would be short for sort -k R. Perhaps this approach avoids the problems that
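A minimal C sketch of the proposed `sort -k 2,2 -k R` semantics: compare the second field first, and only when those tie, compare a seeded hash of the whole line. Assumptions: fields are runs of non-blank characters with leading blanks skipped (a simplification of the real `sort -k` field rules), the hash is an FNV-1a variant with the seed folded into the offset basis rather than anything the patch used, and `field2`, `seeded_fnv1a` and `sort_field2_random` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static uint64_t tie_seed;   /* hypothetical user seed; not thread-safe */

/* Find the second blank-separated field (simplified field rules).
   Stores its start in *start and returns its length. */
static size_t field2(const char *s, const char **start)
{
    while (*s == ' ' || *s == '\t') s++;           /* leading blanks */
    while (*s && *s != ' ' && *s != '\t') s++;     /* field 1 */
    while (*s == ' ' || *s == '\t') s++;           /* separator */
    *start = s;
    size_t len = 0;
    while (s[len] && s[len] != ' ' && s[len] != '\t') len++;
    return len;
}

/* FNV-1a variant with the seed XORed into the offset basis. */
static uint64_t seeded_fnv1a(uint64_t seed, const char *s)
{
    uint64_t h = 0xcbf29ce484222325ULL ^ seed;
    for (; *s; s++) {
        h ^= (unsigned char) *s;
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Primary key: field 2 (bytewise). Tie-break: seeded hash of the line,
   then strcmp so the order stays total. */
static int cmp_field2_then_random(const void *a, const void *b)
{
    const char *x = *(const char *const *) a;
    const char *y = *(const char *const *) b;
    const char *fx, *fy;
    size_t lx = field2(x, &fx), ly = field2(y, &fy);
    int c = memcmp(fx, fy, lx < ly ? lx : ly);
    if (c != 0)
        return c;
    if (lx != ly)
        return lx < ly ? -1 : 1;
    uint64_t hx = seeded_fnv1a(tie_seed, x), hy = seeded_fnv1a(tie_seed, y);
    if (hx != hy)
        return hx < hy ? -1 : 1;
    return strcmp(x, y);
}

static void sort_field2_random(const char **lines, size_t n, uint64_t seed)
{
    tie_seed = seed;
    qsort(lines, n, sizeof *lines, cmp_field2_then_random);
}
```

Lines whose second fields differ keep their ordinary order; only the ties are permuted, and the final `strcmp` keeps identical lines together, so changing the seed reshuffles the ties without disturbing the primary key order.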

Re: new coreutil? shuffle - randomize file contents

2005-05-25 Frederik Eaton
On Tue, May 24, 2005 at 11:25:48AM +0100, [EMAIL PROTECTED] wrote: James Youngman wrote: Davis Houlton writes:- I recently had to write a shuffle utility for a personal project and was wondering if it would make a candidate for the coreutils suite. It seems like the kind of utility the

Re: new coreutil? shuffle - randomize file contents

2005-05-24 James Youngman
On Mon, May 23, 2005 at 08:02:19PM +, Davis Houlton wrote: On Monday 23 May 2005 16:35, you wrote: So, I think that shuffle is a good idea. Great! As I wasn't sure if this was a good idea or not, right now the functionality is quite minimal. I agree that it needs to be expanded, and

Re: new coreutil? shuffle - randomize file contents

2005-05-24 P
James Youngman wrote: Davis Houlton writes:- I recently had to write a shuffle utility for a personal project and was wondering if it would make a candidate for the coreutils suite. It seems like the kind of utility the toolbox could use (maybe under section 3. Output of entire files).

RE: new coreutil? shuffle - randomize file contents

2005-05-24 Lemley James - jlemle
I'm just a lurker, so my opinion doesn't count for much. Certainly I don't expect everyone to be a programmer in order to be able to shuffle their playlist, but perhaps an example needs to be added to the sort man-page stating how easy it is to accomplish with tools that are likely already

Re: new coreutil? shuffle - randomize file contents

2005-05-24 Bob Proulx
Lemley James - jlemle wrote: Certainly I don't expect everyone to be a programmer in order to be able to shuffle their playlist, but perhaps an example needs to be added to the sort man-page stating how easy it is to accomplish with tools that are likely already installed on your system

Re: new coreutil? shuffle - randomize file contents

2005-05-24 Paul Eggert
[EMAIL PROTECTED] writes: Logically the only difference from sort is the low level ordering algorithm. So I vote for an extra arg to sort: --sort=random. More generally, sort could pretend that every line had an extra field called R whose contents are random. That way, you could use, e.g.:

Re: new coreutil? shuffle - randomize file contents

2005-05-24 Davis Houlton
On Tuesday 24 May 2005 15:33, Frederik Eaton wrote: reason to expand the functionality of 'sort'. But in my opinion a more important reason is that the set of commands that one runs on a unix system comprise a language, which is a very important language from a user's perspective, and if

Re: new coreutil? shuffle - randomize file contents

2005-05-23 James Youngman
Davis Houlton writes:- I recently had to write a shuffle utility for a personal project and was wondering if it would make a candidate for the coreutils suite. It seems like the kind of utility the toolbox could use (maybe under section 3. Output of entire files). This behaviour was proposed