Re: Bug in ETA code on x64

2006-03-29 Thread Greg Hurrell

On 28/03/2006, at 20:43, Tony Lewis wrote:


Hrvoje Niksic wrote:


The cast to int looks like someone was trying to remove a warning and
botched operator precedence in the process.


I can't see any good reason to use , here. Why not write the line  
as:

  eta_hrs = eta / 3600; eta %= 3600;


Because that's not equivalent. The sequence (comma) operator , has  
two operands: first the left operand is evaluated, then the right.  
The result has the type and value of the right operand. Note that a  
comma in a list of initializers or arguments is not an operator,  
but simply a punctuation mark!


Cheers,
Greg






Re: Bug in ETA code on x64

2006-03-29 Thread Hrvoje Niksic
Greg Hurrell [EMAIL PROTECTED] writes:

 On 28/03/2006, at 20:43, Tony Lewis wrote:

 Hrvoje Niksic wrote:

 The cast to int looks like someone was trying to remove a warning and
 botched operator precedence in the process.

 I can't see any good reason to use , here. Why not write the line
 as:
   eta_hrs = eta / 3600; eta %= 3600;

 Because that's not equivalent.

Well, it should be, because the comma operator has lower precedence
than the assignment operator (see http://tinyurl.com/evo5a,
http://tinyurl.com/ff4pp and numerous other locations).

I'd still like to know where Thomas got his version of progress.c
because it seems that the change has introduced the bug.


regex support RFC

2006-03-29 Thread Mauro Tortonesi


hrvoje and i have recently been talking about adding regex support to 
wget. we were considering adding a new --filter option which, by 
supporting regular expressions, would allow more powerful ways of 
filtering the urls to download.


for instance, the new option could allow filtering on domain names, 
file names and url paths. in the following case --filter is used to 
prevent any download from the www-*.yoyodyne.com domain and to restrict 
downloads to .gif files:


wget -r --filter=-domain:www-*.yoyodyne.com --filter=+file:\.gif$ 
http://yoyodyne.com


(notice that --filter interprets every given rule as a regex).

i personally think the --filter option would be a great new feature for 
wget, and i have already started working on its implementation, but we 
still have a few open questions.


for instance, the syntax for --filter presented above is basically the 
following:


--filter=[+|-][file|path|domain]:REGEXP

is it consistent? is it flawed? is there a more convenient one?

please notice that supporting multiple comma-separated regexps in a 
single --filter option:


--filter=[+|-][file|path|domain]:REGEXP1,REGEXP2,...

would significantly complicate the implementation and usage of --filter, 
as it would require escaping of the , character. also notice that the 
current filtering options like -A/R are somewhat broken, as they do not 
allow the use of the , character in filtering rules.


we also have to reach consensus on the filtering algorithm. for 
instance, should we simply require that a url pass all the filtering 
rules to allow its download (just like the current -A/R behaviour), or 
should we instead adopt a short-circuit algorithm that applies all rules 
in the same order in which they were given on the command line and 
immediately allows the download of a url if it passes the first allow 
match? should we also support apache-like deny-from-all and 
allow-from-all policies? and what would be the best syntax to trigger 
the usage of these policies?


i am looking forward to reading your opinions on this topic.


P.S.: the new --filter option would replace and extend the old -D, -I/X
  and -A/R options, which will be deprecated but still supported.

--
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi  http://www.tortonesi.com

University of Ferrara - Dept. of Eng.http://www.ing.unife.it
GNU Wget - HTTP/FTP file retrieval tool  http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net
Ferrara Linux User Group http://www.ferrara.linux.it


Re: regex support RFC

2006-03-29 Thread Jim Wright
what definition of regexp would you be following?  or would this be
making up something new?  I'm not quite understanding the comment about
the comma and needing escaping for literal commas.  this is true for any
character in the regexp language, so why the special concern for comma?

I do like the [file|path|domain]: approach.  very nice and flexible.
(and would be a huge help to one specific need I have!)  I suggest also
including an "any" option as a shortcut for putting the same pattern in
all three options.

Jim



On Wed, 29 Mar 2006, Mauro Tortonesi wrote:

 
 [...]
 


Re: regex support RFC

2006-03-29 Thread Hrvoje Niksic
Mauro Tortonesi [EMAIL PROTECTED] writes:

 for instance, the syntax for --filter presented above is basically the
 following:

 --filter=[+|-][file|path|domain]:REGEXP

I think there should also be a url type for filtering on the entire
URL.  People have been asking for that kind of thing a lot over the
years.


Re: regex support RFC

2006-03-29 Thread Hrvoje Niksic
Jim Wright [EMAIL PROTECTED] writes:

 what definition of regexp would you be following?  or would this be
 making up something new?

It wouldn't be new; Mauro is definitely referring to regexps as
normally understood.  The regexp APIs found on today's Unix systems
might be usable, but unfortunately they are not available on Windows.
They also lack support for the very useful non-greedy matching
quantifier (the ? modifier to the * operator) introduced by Perl 5
and supported by most of today's major regexp implementations: Python,
Java, Tcl, etc.

One idea was to use PCRE, bundling it with Wget for the sake of
Windows and systems without PCRE.  Another (http://tinyurl.com/elp7h)
was to use and bundle Emacs's regex.c, the version of GNU regex
shipped with GNU Emacs.  It is small (one source file) and offers
Unix-compatible basic and extended regexps, but also supports the
non-greedy quantifier and non-capturing groups.

See the message and the related discussion at http://tinyurl.com/mdwhx
for more about this topic.

 I'm not quite understanding the comment about the comma and needing
 escaping for literal commas.

Supporting PATTERN1,PATTERN2,... would require a way to quote the
comma character.  But there is little reason for a special comma
syntax, since one can always use (PATTERN1|PATTERN2|...).

Being unable to have a comma in the pattern is a shortcoming in the
current -R/-A options.

 I do like the [file|path|domain]: approach.  very nice and flexible.

Thanks.