[PATCH] split: --chunks option

2009-11-25 Thread Chen Guo
Hi all, This is mostly a step towards multithreaded sort the unix way, but as Padraig mentioned, has its other uses. Parsing and I/O are not my strong suits, so I have a couple of questions: Are there more appropriate functions than open and pread to use here? I usually see wrapper func

Re: [PATCH] split: --chunks option

2009-11-26 Thread Pádraig Brady
Chen Guo wrote: > Hi all, > This is mostly a step towards multithreaded sort the unix way, but as > Padraig mentioned, has its other uses. Thanks again for looking at this. > Parsing and I/O are not my strong suits, so I have a couple of questions: > > Are there more appropriate functio

Fw: [PATCH] split: --chunks option

2009-11-26 Thread Chen Guo
I replied to my own email and forgot to CC the mailing list last night: Hi all, I knew there were bugs still, but I didn't know they were such embarrassing ones. This is what happens when the biggest file you test on is 14 Kb, I guess. Apologies for any inconveniences. One thing I f

Re: [PATCH] split: --chunks option

2009-11-28 Thread Chen Guo
Hi Padraig, > I do think --number is more general than --chunk as it allows you to specify > only 1 number > to get the behaviour described above. Also I notice that FreeBSDs split > recently > got a '-n chunk_count' option, so it would be good to maintain compat with > that > if possible. >

Re: [PATCH] split: --chunks option

2009-11-29 Thread Pádraig Brady
Chen Guo wrote: > Hi Padraig, > >> I do think --number is more general than --chunk as it allows you to specify > >> only 1 number >> to get the behaviour described above. Also I notice that FreeBSDs split >> recently >> got a '-n chunk_count' option, so it would be good to maintain compat with

Re: [PATCH] split: --chunks option

2009-12-14 Thread Chen Guo
Hi all, I'm about to implement everything we discussed. One thing I want to double check is, since we have --number-lines=x/y and --number-bytes=x/y as the long options, obviously we can't just use -n for the short option. So would it be acceptable to do -nb=x/y and -nl=x/y? This seems th

Re: [PATCH] split: --chunks option

2009-12-14 Thread Pádraig Brady
On 14/12/09 21:04, Chen Guo wrote: Hi all, I'm about to implement everything we discussed. One thing I want to double check is, since we have --number-lines=x/y and --number-bytes=x/y as the long options, obviously we can't just use -n for the short option. So would it be acceptable t

Re: [PATCH] split: --chunks option

2009-12-14 Thread Jim Meyering
Pádraig Brady wrote: > On 14/12/09 21:04, Chen Guo wrote: >> Hi all, >> I'm about to implement everything we discussed. One thing I want to >> double check is, since we have --number-lines=x/y and --number-bytes=x/y as >> the long options, obviously we can't just use -n for the short option.

Re: [PATCH] split: --chunks option

2009-12-15 Thread Chen Guo
Hi Jim, > I hope the end-of-term business went well. Pretty well, thanks for asking > It's good to make long option names consistent between tools, > and to avoid long, common prefixes like "--number-". > Have you considered --bytes and --lines, like tail has? Unfortunately split already uses

Re: [PATCH] split: --chunks option

2009-12-15 Thread Jim Meyering
Chen Guo wrote: ... >> It's good to make long option names consistent between tools, >> and to avoid long, common prefixes like "--number-". >> Have you considered --bytes and --lines, like tail has? > Unfortunately split already uses the long options --bytes and --lines. One possibility is to st

Re: [PATCH] split: --chunks option

2009-12-15 Thread Chen Guo
Hi Jim, > One possibility is to stick with the existing long option names, > but let an argument of the form "K/N" evoke the new semantics: > > --bytes=K/N extract the K'th of N portions (byte-oriented) > --lines=K/N extract the K'th of N portions (line-oriented) > While this little split

Re: [PATCH] split: --chunks option

2009-12-15 Thread Jim Meyering
Hi Chen, >> One possibility is to stick with the existing long option names, >> but let an argument of the form "K/N" evoke the new semantics: >> >> --bytes=K/N extract the K'th of N portions (byte-oriented) >> --lines=K/N extract the K'th of N portions (line-oriented) >> > While this little
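
The K/N semantics proposed above ("extract the K'th of N portions", byte-oriented) come down to computing one byte range per chunk. A minimal sketch in C follows. The helper name chunk_bounds is hypothetical and not taken from the patch, and the sketch assumes that every chunk gets file_size / N bytes and the last chunk absorbs the remainder; the thread does not confirm that the patch divides the file this way.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the patch: compute the half-open
   byte range [*start, *end) of the K'th of N chunks (K is 1-based) of a
   file of FILE_SIZE bytes.  Assumption: every chunk gets FILE_SIZE / N
   bytes and the last chunk also absorbs the remainder.  */
static void
chunk_bounds (uint64_t file_size, uint64_t k, uint64_t n,
              uint64_t *start, uint64_t *end)
{
  assert (n > 0 && 1 <= k && k <= n);
  uint64_t chunk_size = file_size / n;
  *start = (k - 1) * chunk_size;
  *end = (k == n) ? file_size : k * chunk_size;
}
```

Under that assumption, a 10-byte input split with N=3 gives the ranges [0,3), [3,6) and [6,10), so a reader of the K'th chunk only needs to seek to *start and copy *end - *start bytes.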

Re: [PATCH] split: --chunks option

2009-12-15 Thread Chen Guo
> --bytes=/N split the input into N roughly equal portions (byte-oriented) > --lines=/N split the input into N roughly equal portions (line-oriented) > > then, assuming the BSD option works this way, you could document -n N > like this: > > -n N  equivalent to --bytes=/N I thin

Re: [PATCH] split: --chunks option

2009-12-15 Thread Pádraig Brady
On 15/12/09 08:12, Jim Meyering wrote: Chen Guo wrote: ... It's good to make long option names consistent between tools, and to avoid long, common prefixes like "--number-". Have you considered --bytes and --lines, like tail has? Unfortunately split already uses the long options --bytes and -

Re: [PATCH] split: --chunks option

2009-12-15 Thread Chen Guo
Hey guys, Alright as of now everything that we originally talked about has been implemented; tests were done on a 980K ASCII file while limiting buffer size to 2 bytes (will test on binary later). Everything works great. As of now, I've got the following syntax: -bN or --bytes=N is origi

Re: [PATCH] split: --chunks option

2009-12-15 Thread Pádraig Brady
On 15/12/09 22:46, Chen Guo wrote: > Hey guys, Alright as of now everything that we originally talked about has been implemented; tests were done on a 980K ASCII file while limiting buffer size to 2 bytes (will test on binary later). Everything works great Great stuff. > As of now, I've got

Re: [PATCH] split: --chunks option

2009-12-16 Thread Andreas Schwab
Pádraig Brady writes: > For reference bash and ksh support ansi c quoting like $'\0' > so you could specify -t $'\0'. Except that $'\0' is identical to '' in every context. > Also more generally one could do: -t $(printf '\0'), though I wouldn't > depend on those being available, and also passi

Re: [PATCH] split: --chunks option

2009-12-16 Thread Chen Guo
Hi all, > Pádraig Brady writes: > > > For reference bash and ksh support ansi c quoting like $'\0' > > so you could specify -t $'\0'. Since this is only bash and ksh, I'm guessing it'd be best to write our own parsing, to support something like -t\n or -t\0 for shells that don't support ansi C

Re: [PATCH] split: --chunks option

2009-12-20 Thread Chen Guo
Hi guys, Below is the source code portion of the patch. It's ended up a lot bigger than what I thought it would be; split.c has almost doubled in size. Feedback is welcome, especially suggestions wrt parsing or --help output. I've also taken a couple liberties with the -t option... When -

Re: [PATCH] split: --chunks option

2009-12-20 Thread Andreas Schwab
Chen Guo writes: > +/* Parse eol character for -t option. > + TODO: support octal and hex escape sequences? */ The TODO looks obsolete, since you are indeed supporting them. > + case '?': > +eol = '\?'; > +break; I don't think this should be supported, it is only part of
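
For readers following the -t discussion, here is a rough sketch in C of what parsing a separator argument with single-letter, octal and hex escapes can look like. It is not the patch's eol_parse(): the name parse_eol_sketch, the set of accepted escapes and the -1 error convention are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, not the patch's eol_parse().  Translate ARG into a
   single byte value (0..255).  Accepts a literal one-character argument,
   the escapes \n \t \r \\, octal escapes such as \0 or \101, and hex
   escapes such as \x41.  Returns -1 if ARG is not recognized.  */
static int
parse_eol_sketch (const char *arg)
{
  assert (arg != NULL);

  /* A plain, unescaped argument must be exactly one character long.  */
  if (arg[0] != '\\')
    return (arg[0] != '\0' && arg[1] == '\0') ? (unsigned char) arg[0] : -1;

  /* Single-letter escapes.  */
  if (arg[1] != '\0' && arg[2] == '\0')
    switch (arg[1])
      {
      case 'n': return '\n';
      case 't': return '\t';
      case 'r': return '\r';
      case '\\': return '\\';
      default: break;
      }

  /* Numeric escapes: \xHH is hexadecimal, anything else is octal.  */
  int base = 8;
  const char *digits = arg + 1;
  if (*digits == 'x')
    {
      base = 16;
      digits++;
    }
  if (*digits == '\0')
    return -1;

  int value = 0;
  for (; *digits != '\0'; digits++)
    {
      int d;
      if ('0' <= *digits && *digits <= '9')
        d = *digits - '0';
      else if ('a' <= *digits && *digits <= 'f')
        d = *digits - 'a' + 10;
      else if ('A' <= *digits && *digits <= 'F')
        d = *digits - 'A' + 10;
      else
        return -1;
      if (d >= base)
        return -1;
      value = value * base + d;
      if (value > 255)
        return -1;
    }
  return value;
}
```

Doing the parsing inside the program, as suggested earlier in the thread, means -t '\0' can select a NUL separator even from shells that lack $'...' quoting.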

Re: [PATCH] split: --chunks option

2010-01-03 Thread Chen Guo
Hi all, hope everyone had happy holidays. Here's the patch in its entirety. Let me know if anything's not satisfactory. I should note that I went easy on the tests because the other split tests didn't seem all too comprehensive themselves. Please let me know if I need to be more exhaustive. >From

Re: [PATCH] split: --chunks option

2010-01-03 Thread Pádraig Brady
On 03/01/10 20:37, Chen Guo wrote: Hi all, hope everyone had happy holidays. Here's the patch in its entirety. Let me know if anything's not satisfactory. Thanks for all that. The first thing that hits me is that "round robin" might be better as a parameter rather than an option. I'll review

Re: [PATCH] split: --chunks option

2010-01-06 Thread Pádraig Brady
On 04/01/10 00:54, Pádraig Brady wrote: On 03/01/10 20:37, Chen Guo wrote: Hi all, hope everyone had happy holidays. Here's the patch in its entirety. Let me know if anything's not satisfactory. Thanks for all that. The first thing that hits me is that "round robin" might be better as a param

Re: [PATCH] split: --chunks option

2010-01-07 Thread Chen Guo
Hi Padraig, I was going over your previous e-mail again: > TODO: check why `seq 10 | split -r4` doesn't work This is what I'm getting, which looks like what I'd expect. Do you want to paste what you get, and we can compare? c...@chen-netbook:~$ seq 10 | split -r4 c...@chen-netbook:~$ ls |

Re: [PATCH] split: --chunks option

2010-01-07 Thread Pádraig Brady
On 07/01/10 20:11, Chen Guo wrote: Hi Padraig, I was going over your previous e-mail again: TODO: check why `seq 10 | split -r4` doesn't work This is what I'm getting, which looks like what I'd expect. Do you want to paste what you get, and we can compare? I mustn't have the latest p
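
The round-robin mode being compared here (`seq 10 | split -r4`) deals input lines out to the output files in turn. A minimal sketch of that mapping in C, assuming plain modular distribution; the helper name rr_target is hypothetical and the patch's own lines_rr() may be organized differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from the patch: return the 1-based index
   of the output file that receives input line LINE_NO (also 1-based) when
   lines are dealt out round-robin across N files.  */
static size_t
rr_target (size_t line_no, size_t n)
{
  assert (n > 0 && line_no > 0);
  return (line_no - 1) % n + 1;
}
```

With this mapping, ten input lines over four files land as 1,5,9 in the first file, 2,6,10 in the second, 3,7 in the third and 4,8 in the fourth. Unlike byte- or line-count chunking, this needs no knowledge of the input size, which is why it suits piped input.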

Re: [PATCH] split: --chunks option

2010-01-09 Thread Pádraig Brady
On 10/01/10 00:09, Chen Guo wrote: Hi Padraig, I went ahead and corrected the errors you caught, as well as moved over eol_parse() to gl/lib/unescape.c. Calls from ptx, pr, and printf, as you said, probably should be in a separate patch. Also I noticed gl/lib/randperm.h didn't have th

Re: [PATCH] split: --chunks option

2010-02-05 Thread Pádraig Brady
I got a bit of time for the review last night... This was your last interface change for this: -b, --bytes=SIZE put SIZE bytes per output file\n\ + -b, --bytes=/N generate N output files\n\ + -b, --bytes=K/N print Kth of N chunks of file\n\ -C, --line-bytes=SIZE

Re: [PATCH] split: --chunks option

2010-02-05 Thread Jim Meyering
Pádraig Brady wrote: > I got a bit of time for the review last night... > > This was your last interface change for this: > > -b, --bytes=SIZE put SIZE bytes per output file\n\ > + -b, --bytes=/N generate N output files\n\ > + -b, --bytes=K/N print Kth of N chunks of f

Re: [PATCH] split: --chunks option

2010-02-06 Thread Chen Guo
> > $ time yes | head -n1000 | ./split-fwrite -n r/1/1 | wc -l > > 1000 > > > > real 0m1.568s > > user 0m1.486s > > sys 0m0.072s > > > > $ time yes | head -n1000 | ./split-write -n r/1/1 | wc -l > > 1000 > > > > real 0m50.988s > > user 0m7.548s > > sys 0m43.250s

bug#7401: [PATCH] split: --chunks option

2010-11-14 Thread Pádraig Brady
On 05/02/10 12:40, Pádraig Brady wrote: > I got a bit of time for the review last night... > > This was your last interface change for this: > > -b, --bytes=SIZE put SIZE bytes per output file\n\ > + -b, --bytes=/N generate N output files\n\ > + -b, --bytes=K/N print

bug#7401: [PATCH] split: --chunks option

2010-11-16 Thread Chen Guo
Holy crap Padraig, thorough :-) > merged lines_chunk_extract() into lines_chunk_split(). > merged lines_rr_extract() into lines_rr(). Great job merging these. I'd felt the previous patch was running on the long side. One suggestion: -From: Chen Guo +From: Padraig Brady I don't feel comfortable

bug#7401: [PATCH] split: --chunks option

2010-11-16 Thread Pádraig Brady
On 16/11/10 19:03, Chen Guo wrote: > Holy crap Padraig, thorough :-) > >> merged lines_chunk_extract() into lines_chunk_split(). >> merged lines_rr_extract() into lines_rr(). > > Great job merging these. I'd felt the previous patch was running on > the long side. > > One suggestion: > -From: Che

bug#7401: [PATCH] split: --chunks option

2010-11-17 Thread Jim Meyering
Pádraig Brady wrote: > On 05/02/10 12:40, Pádraig Brady wrote: >> I got a bit of time for the review last night... ... >> Here is stuff I intend TODO before checking in: >> s/pread()/dd::skip()/ or at least add pread to bootstrap.conf >> fix info docs for reworked interface >> try to refactor du