Re: Patch: --range switch implemented

2001-11-19 Thread Hrvoje Niksic

[ Note for Wget list readers: this discusses the `--range' option
  submitted to the patch list. ]

Herold Heiko <[EMAIL PROTECTED]> writes:

> Also, possibly I missed something, does the download start at byte 0
> (like most programmers etc. would expect) or at byte 1 (like most users
> would expect) ? In other words, to download the first half of a 10 byte
> file --range 0-4 or --range 1-5 ?

It's 0-4, as documented by Alex.

But you've still raised an interesting question.  I would actually be
happiest with using a closed-open interval for the range, but I didn't
want to propose that because it would be confusing to most
non-programmer (and some programmer) users.  The closed-open interval
means that --range=0-5 retrieves the first five bytes, i.e. bytes
numbered 0-4, or 1-5, depending on how you count.

Although closed-open intervals are confusing at first, they have some
very nice properties:

* You can get the interval size simply by subtracting the endpoints.

* You can easily specify touching but non-overlapping intervals.  For
  example, if you wanted to download the file in 1k chunks, you could
  do this:

  --range=0-1024
  --range=1024-2048
  --range=2048-3072
  ...

  Or, even better:

  --range=0-1k
  --range=1k-2k
  --range=2k-3k
  ...

  There is no way to do the latter with closed-closed endpoints, where
  you have to add and subtract one at a number of places for things to
  work.
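
To make the chunking arithmetic above concrete, here is a minimal Python
sketch. The function names are hypothetical and are not part of Wget or of
the submitted patch; they only compare the two endpoint conventions.

```python
def half_open_chunks(total, chunk):
    """Split [0, total) into touching closed-open (start, end) ranges."""
    return [(start, min(start + chunk, total))
            for start in range(0, total, chunk)]


def closed_chunks(total, chunk):
    """Same split with closed-closed (inclusive) endpoints; note the -1."""
    return [(start, min(start + chunk, total) - 1)
            for start in range(0, total, chunk)]


print(half_open_chunks(3072, 1024))  # [(0, 1024), (1024, 2048), (2048, 3072)]
print(closed_chunks(3072, 1024))     # [(0, 1023), (1024, 2047), (2048, 3071)]

# With closed-open ranges the size is simply end - start.
print([end - start for start, end in half_open_chunks(3072, 1024)])
```

In the closed-open version each end point is reused as the next start
point; in the closed-closed version every end point needs an adjustment.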

It has been said (and I whole-heartedly agree) that closed-closed
intervals are more natural in 1-based counting, whereas closed-open
intervals are more fit for 0-based counting.  I would really like Wget
to use 0-based closed-open interval specification, but I'm still
afraid the users would have problems understanding it, and that's why I
didn't propose it.

I would appreciate other people's comments on this.



RE: Patch: --range switch implemented

2001-11-19 Thread Herold Heiko

>From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]]
>
>[ Note for Wget list readers: this discusses the `--range' option
>  submitted to the patch list. ]
>
>Herold Heiko <[EMAIL PROTECTED]> writes:
>
>> Also, possibly I missed something, does the download start at byte 0



>But you've still raised an interesting question.  I would actually be
>happiest with using a closed-open interval for the range, but I didn't

>  --range=0-1k
>  --range=1k-2k
>  --range=2k-3k
>  ...
>
>  There is no way to do the latter with closed-closed endpoints, where
>  you have to add and subtract one at a number of places for things to
>  work.
>
>It has been said (and I whole-heartedly agree) that closed-closed
>intervals are more natural in 1-based counting, whereas closed-open
>intervals are more fit for 0-based counting.  I would really like Wget
>to use 0-based closed-open interval specification, but I'm still
>afraid the users would have problems understanding it, and that's why I
>didn't propose it.
>
>I would appreciate other people's comments on this.

IMHO.

This would make the whole thing easy for
a) people who don't read the manual and are lucky enough to do the
correct thing for the wrong reasons
b) people who do read the manual, or at least check it when something
doesn't work as expected.

It would make things more difficult for people who think they know
what they are doing, don't check the manual, and complain wildly if
things don't work; in other words, lazy sysadmins and programmers ;-)

Personally I'd be happy either way, but you'll never be able to make
everybody happy. Choose what you prefer, document it well with good
examples, and put something on the wget --help page (I think the most
used reference source anyway) which makes people *think* and check the
manual (if it seems strange to them). In the style of my previous post,
if you choose a closed-open interval, something like
NOTE whole file = 0..size !!!
(yeah, triple exclamation marks, sign of a diseased mind - should make
people think, fair enough).

Heiko

-- 
-- PREVINET S.p.A.            [EMAIL PROTECTED]
-- Via Ferretto, 1            ph  x39-041-5907073
-- I-31021 Mogliano V.to (TV) fax x39-041-5907087
-- ITALY



Re: Patch: --range switch implemented

2001-11-19 Thread Hrvoje Niksic

Herold Heiko <[EMAIL PROTECTED]> writes:

> Personally I'd be happy either way, but you'll never be able to make
> everybody happy. Choose what you prefer

I'd love to choose what I prefer, but I'd like to avoid my wild
preferences ruining it for everyone else.  :-)  Thanks for the
support, though.

> In the style of my previous post, if you choose a closed-open
> interval, something like NOTE whole file = 0..size !!!  (yeah
> triple exclamation marks, sign of a diseased mind - should make
> people think fair enough).

Actually, ".." sounds more like closed-closed to me.  :-)  (That must
be Pascal childhood rearing its ugly head.)  How about supporting
both?  For example:

1.
--range=1..size # wimps, or:
--range=0..size-1   # slightly less wimpy, but less "consistent"

2.
--range=0-size  # real (wo)men

3.
--range=0+size  # both

Or, to pick another example, say you want to download the second
kilobyte of a file:

--range=1025..2048
--range=1024..2047

--range=1024-2048   # my preferred version

--range=1024+1024   # also cool
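
As an illustration only, the three spellings above could be normalized to
one internal form. The Python sketch below maps each to a 0-based
closed-open (start, end) pair. The function `parse_range` and its
`dotdot_one_based` flag are hypothetical; they are not from the patch, and
whether `..` should count from 0 or 1 is exactly the open question.

```python
import re

# Two numbers joined by "..", "-" or "+" (hypothetical syntax from the thread).
_SPEC = re.compile(r"^(\d+)(\.\.|-|\+)(\d+)$")


def parse_range(spec, dotdot_one_based=False):
    """Return a 0-based closed-open (start, end) pair for a range spec.

    A..B  closed-closed, counting from 0 or from 1 (selectable)
    A-B   closed-open, 0-based
    A+N   N bytes starting at 0-based offset A
    """
    m = _SPEC.match(spec)
    if m is None:
        raise ValueError("bad range: %r" % spec)
    a, op, b = int(m.group(1)), m.group(2), int(m.group(3))
    if op == "..":
        start, end = (a - 1, b) if dotdot_one_based else (a, b + 1)
    elif op == "-":
        start, end = a, b
    else:  # op == "+"
        start, end = a, a + b
    if start < 0 or end < start:
        raise ValueError("bad range: %r" % spec)
    return start, end


# All four spellings of "the second kilobyte" give the same pair.
print(parse_range("1025..2048", dotdot_one_based=True))  # (1024, 2048)
print(parse_range("1024..2047"))                         # (1024, 2048)
print(parse_range("1024-2048"))                          # (1024, 2048)
print(parse_range("1024+1024"))                          # (1024, 2048)
```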



Re: Patch: --range switch implemented

2001-11-19 Thread Hrvoje Niksic

Herold Heiko <[EMAIL PROTECTED]> writes:

> However, off the top of my head I can't remember many occasions where
> 0-n means closed-open

There are.  (And note that it's n-m in the general case, not just
0-n.)  Off the top of my head, the Java string subscripts, Lisp
array-related functions, Python slices, various Emacs functions, etc.
The Python example is easy to demonstrate:

>>> range(0, 10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Also:

>>> a = [0, 1, 2, 3, 4, 5]
>>> a[2:4]
[2, 3]

This makes perfect sense to me, but not everyone would agree.  The
nicest thing about it is that it allows this:

>>> a[0:3] + a[3:]
[0, 1, 2, 3, 4, 5]

I.e. you can construct the original interval by appending the
"touching" subintervals.  This is nice for downloads because it allows
you to download 0-2k, 2k-5k, etc., without the one overlapping byte.
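
The same point applied to bytes, as a small sketch: `data` is stand-in
content rather than a real download, and `inclusive` is a hypothetical
helper that mimics closed-closed endpoints.

```python
data = bytes(range(256)) * 20  # 5120 bytes of stand-in "file" content

# Closed-open: each end point is reused as the next start point.
parts = [data[0:2048], data[2048:5120]]
print(b"".join(parts) == data)  # True


def inclusive(buf, first, last):
    """Return bytes first..last, both end points included."""
    return buf[first:last + 1]


# Closed-closed with a reused end point: byte 2048 is fetched twice.
overlapping = inclusive(data, 0, 2048) + inclusive(data, 2048, 5119)
print(len(overlapping) - len(data))  # 1
```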

Common Lisp:
[1]> (setq a '(0 1 2 3 4 5))
(0 1 2 3 4 5)
[2]> (subseq a 2 4)
(2 3)

Perl avoids the potential confusion by having its SUBSTR take offset
(0-based) and length, which is clear to everyone.

> while there are at least Pascal and Perl where 0..n

The Pascal reference is to 1..n, not 0..n.  Which is one point you
seem to have missed: IMHO [start, end] makes more sense with intervals
that start with 1, and [start, end) makes more sense with intervals
that start with 0.

> 0..10 #11 bytes including first one, like Perl, Pascal
> 1-10  #10 bytes including first one

1-10 is what I meant by "the Pascal way" because most Pascal programs
use 1-based arrays.  Again, assuming we want to download 16 bytes, the
three options are, in my order of preference:

1 .. 16  # end-closed 1-based, Pascal-like
0 .. 15  # end-closed 0-based, Perl-like
0 .. 16  # end-open 0-based, Python-like

Maybe we should support all 3, but document only one in --help?  That
way most users will not notice the "complexity".  Also, the first
option could well be ignored since 1-based arrays are for wimps.  :-)



Re: Patch: --range switch implemented

2001-11-19 Thread Andre Pang

On Mon, Nov 19, 2001 at 05:07:31PM +0100, Hrvoje Niksic wrote:

> Or, to pick another example, say you want to download the second
> kilobyte of a file:
> 
> --range=1025..2048
> --range=1024..2047

I haven't been following that closely, but how are you going to
tell what the user really wants to do if he gives either of those
two statements?  If you define .. as being closed-closed and
inclusive, I'm confused how 1025..2048 will be interpreted the
same as 1024..2047.  Are we doing implicit kb rounding?


-- 
#ozone/algorithm <[EMAIL PROTECTED]>  - trust.in.love.to.save



Re: Patch: --range switch implemented

2001-11-19 Thread Hrvoje Niksic

Andre Pang <[EMAIL PROTECTED]> writes:

> On Mon, Nov 19, 2001 at 05:07:31PM +0100, Hrvoje Niksic wrote:
> 
>> Or, to pick another example, say you want to download the second
>> kilobyte of a file:
>> 
>> --range=1025..2048
>> --range=1024..2047
> 
> I haven't been following that closely, but how are you going to
> tell what the user really wants to do if he gives either of those
> two statements?

Only one of those statements will be a valid way of downloading the
second kilobyte of a file.  The question is, which one.

The first one assumes the first byte in the file is "1", the second
one assumes it's "0".  Both are inclusive.



Re: Patch: --range switch implemented

2001-11-19 Thread Daniel Stenberg

On Mon, 19 Nov 2001, Hrvoje Niksic wrote:

> >> --range=1025..2048
> >> --range=1024..2047
>
> Only one of those statements will be a valid way of downloading the
> second kilobyte of a file.  The question is, which one.
>
> The first one assumes the first byte in the file is "1", the second one
> assumes it's "0".  Both are inclusive.

I vote for the second alternative. Being a (non-Python) programmer, a reader
of RFC 2616, and user/author of the curl --range option, which uses the HTTP
header syntax...

Then again, both versions could be supported if they just use different
syntaxes.

-- 
  Daniel Stenberg - http://daniel.haxx.se - +46-705-44 31 77
   ech`echo xiun|tr nu oc|sed 'sx\([sx]\)\([xoi]\)xo un\2\1 is xg'`ol




Re: Patch: --range switch implemented

2001-11-19 Thread Hrvoje Niksic

Daniel Stenberg <[EMAIL PROTECTED]> writes:

> Then again, both versions could be supported if they just use
> different syntaxes.

Please note that there is a third version which Andre elided.  We're
deciding for one or more of:

--range=1025..2048
--range=1024..2047

--range=1024..2048   # my preferred version

On the one hand, there's no harm in supporting them all, but there's
enough overengineering in Wget as it is.  I'd like to avoid more.

Compatibility with rfc2616 is a good point, though.  Maybe it's best
to simply stick to 1024-2047 then.
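
For reference, the HTTP/1.1 `Range' header takes 0-based inclusive byte
positions, so a closed-open pair only needs its end point decremented
before being sent. A minimal Python sketch, with a hypothetical helper
name that is not Wget code:

```python
def http_range_header(start, end):
    """Build a Range header value from a 0-based closed-open (start, end).

    The header uses inclusive byte positions, so the last byte
    requested is end - 1.
    """
    if not 0 <= start < end:
        raise ValueError("empty or negative range")
    return "bytes=%d-%d" % (start, end - 1)


print(http_range_header(1024, 2048))  # bytes=1024-2047
```

Whatever syntax the user types, this is the form that goes over the wire.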



Re: Patch: --range switch implemented

2001-11-19 Thread Andre Pang

On Mon, Nov 19, 2001 at 08:19:08PM +0100, Hrvoje Niksic wrote:

> >> --range=1025..2048
> >> --range=1024..2047
> > 
> > I haven't been following that closely, but how are you going to
> > tell what the user really wants to do if he gives either of those
> > two statements?
> 
> Only one of those statements will be a valid way of downloading the
> second kilobyte of a file.  The question is, which one.

Oh!  That makes it a bit easier :).  I vote for 1025..2048, which
I believe is standard across all functional programming
languages.  The [ and ] also correlate nicely to the mathematical
interpretations.  (Compare [5..10] with (6..9), for instance).

The majority of end-user utilities seem to like counting from 1,
whereas programming languages tend to start counting from 0.  A
quick look at the GNU text utilities (e.g. tail -c) suggests they use
1-counting rather than 0-counting.

Basically, I think that from an end-user perspective, they're
used to dealing with 1-counting, not 0-counting.


-- 
#ozone/algorithm <[EMAIL PROTECTED]>  - trust.in.love.to.save



Re: Patch: --range switch implemented

2001-11-19 Thread Andre Pang

On Mon, Nov 19, 2001 at 08:33:15PM +0100, Hrvoje Niksic wrote:

> Compatibility with rfc2616 is a good point, though.  Maybe it's best
> to simply stick to 1024-2047 then.

Compatibility with curl is even more important :).  In light of
that, I vote for 1024-2047.  No point having two file retrieval
utilities do something different, just because "it's more
correct".


-- 
#ozone/algorithm <[EMAIL PROTECTED]>  - trust.in.love.to.save



Re: Patch: --range switch implemented

2001-11-19 Thread Hrvoje Niksic

Andre Pang <[EMAIL PROTECTED]> writes:

>> >> --range=1025..2048
>> >> --range=1024..2047
>> > 
>> > I haven't been following that closely, but how are you going to
>> > tell what the user really wants to do if he gives either of those
>> > two statements?
>> 
>> Only one of those statements will be a valid way of downloading the
>> second kilobyte of a file.  The question is, which one.
> 
> Oh!  That makes it a bit easier :).  I vote for 1025..2048, which
> I believe is standard across all functional programming
> languages.  The [ and ] also correlate nicely to the mathematical
> interpretations.  (Compare [5..10] with (6..9), for instance).

This is again a misunderstanding.  Both of the above are inclusive,
they only disagree on whether the first byte is 0 or 1.  You have
elided parts of my posting that explain that.

Note that I never proposed an interval open on both sides, but only on
the right side.  And you deleted that proposal.  Again, the
alternatives are:

1025..2048   starting with one, end-closed
1024..2047   starting with zero, end-closed
1024..2048   starting with zero, end-open

> The majority of end-user utilities seem to like counting from 1,
> whereas programming languages tend to start counting from 0.  A
> quick look at the GNU text utilities (e.g. tail -c) suggests they use
> 1-counting rather than 0-counting.

That makes sense for counting lines.  Bytes are almost always indexed
from zero.

> Basically, I think that from an end-user perspective, they're used
> to dealing with 1-counting, not 0-counting.

If you're *counting* something, sure.  But we're not implementing `wc
-c' here -- we are referring to specific bytes, or rather to an
interval.  0-based indexing makes much more sense to me.



Re: Patch: --range switch implemented

2001-11-20 Thread Vladi Belperchinov-Shabanski

hi!

  Here is my opinion (in case anyone is really interested :))

  all ranges 0-based,
  support a few syntaxes:

  --range=0..1024    -- closed-closed
  --range=0-1024     -- closed-open
  --range=1024+2048  -- take 3..4 K's :) i.e. get 2k starting at pos 1024

  (well, the last one could be like --range=2048@1024 just for fun)

  Implementing all cases is trivial, and I cannot see why we shouldn't have them all.

P! Vladi.

-- 
Vladi Belperchinov-Shabanski <[EMAIL PROTECTED]> <[EMAIL PROTECTED]>
Personal home page at http://www.biscom.net/~cade
DataMax Ltd. http://www.datamax.bg
Too many hopes and dreams won't see the light...



Re: Patch: --range switch implemented

2001-11-20 Thread Hrvoje Niksic

Vladi Belperchinov-Shabanski <[EMAIL PROTECTED]> writes:

>   Here is my opinion (in case anyone is really interested :))
> 
>   all ranges 0-based,
>   support a few syntaxes:
> 
>   --range=0..1024    -- closed-closed
>   --range=0-1024     -- closed-open
>   --range=1024+2048  -- take 3..4 K's :) i.e. get 2k starting at pos 1024

I agree with you completely; my preferences also lie in that
direction.  Except for one thing: HTTP/1.1 `Range' header uses x-y to
mean closed-closed, 0-based.  We are of course not required to use the
same syntax, but it would be nice not to confuse things.

So what would be a nice alternative syntax for closed-open?  0:1024?
Hyphen is easier to type, though.  Damn, sometimes it's so hard to
win.  :-)

>   (well last one could be like --range=2048@1024 just for fun)

:-)



RE: Patch: --range switch implemented

2001-11-20 Thread Herold Heiko

>From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]]
>
>Vladi Belperchinov-Shabanski <[EMAIL PROTECTED]> writes:
>
>>   Here is my IMO (in case someone is really interested in:))
...

>So what would be a nice alternative syntax for closed-open?  0:1024?
>Hyphen is easier to type, though.  Damn, sometimes it's so hard to
>win.  :-)

Don't forget you need a symbol for the start->size syntax, too ... +
would be perfect,

--range 4096+1k
or --range 4095+1k (shudder)

Maybe what we really need is not a different syntax for every kind of
range definition but a default syntax (whichever symbol you use) and
number modifiers... for example, for the first kb, and suppose we want
to accommodate end users:

default 1-1024 or 1:1024 or 1..1024

or ]0-1023] (same with : or ..)
or [1-1024] (same with : or ..)
or [0-1024[ (same with : or ..)
or 0+1024 or 0+1k
or [1+1024 or [1+1k
or ]0+1024 or ]0+1k

you get the point, sorry but I'm in a hurry, possibly I got some braces
wrong.
Heiko



Re: Patch: --range switch implemented

2001-11-20 Thread Hrvoje Niksic

Herold Heiko <[EMAIL PROTECTED]> writes:

> Don't forget you need a symbol for the start->size syntax, too ... +
> would be perfect,

Yes.  That's +, as implemented in the original patch.  No one is
disputing that one.

> --range 4096+1k
> or --range 4095+1k (shudder)

Did you mean 4097 here?



RE: Patch: --range switch implemented

2001-11-20 Thread Herold Heiko

>From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]]
>
>Herold Heiko <[EMAIL PROTECTED]> writes:
>
>> Don't forget you need a symbol for the start->size syntax, too ... +
>> would be perfect,
>
>Yes.  That's +, as implemented in the original patch.  No one is
>disputing that one.
>
>> --range 4096+1k
>> or --range 4095+1k (shudder)
>
>Did you mean 4097 here?
>

Yes in fact, for the 1-based syntax.
Heiko
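
To spell out the off-by-one under discussion, here is a small Python
sketch of the start+size form. The helpers `parse_size` and
`offset_plus_length` are hypothetical, and the `k` suffix is assumed to
mean 1024 bytes; neither is taken from the patch.

```python
def parse_size(text):
    """Parse a byte count with an optional 'k' suffix (assumed 1k = 1024)."""
    text = text.strip().lower()
    if text.endswith("k"):
        return int(text[:-1]) * 1024
    return int(text)


def offset_plus_length(spec, one_based=False):
    """Turn 'START+SIZE' into a 0-based closed-open (start, end) pair."""
    offset, length = spec.split("+", 1)
    first = parse_size(offset) - (1 if one_based else 0)
    return first, first + parse_size(length)


# The fifth kilobyte: 4096+1k counting from 0, 4097+1k counting from 1.
print(offset_plus_length("4096+1k"))                  # (4096, 5120)
print(offset_plus_length("4097+1k", one_based=True))  # (4096, 5120)
```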