> Delivered-To: marty...@venck.us
> Received: by 10.100.109.13 with SMTP id h13cs81395anc;
>         Sun, 9 Jan 2011 14:09:58 -0800 (PST)
> Received: by 10.42.176.2 with SMTP id bc2mr3471559icb.517.1294610998158;
>         Sun, 09 Jan 2011 14:09:58 -0800 (PST)
> Return-Path: <owner-tech+m23...@openbsd.org>
> Received: from shear.ucar.edu (lists.openbsd.org [192.43.244.163])
>         by mx.google.com with ESMTP id
f5si189964icu.33.2011.01.09.14.09.56;
>         Sun, 09 Jan 2011 14:09:57 -0800 (PST)
> Received-SPF: pass (google.com: manual fallback record for domain of
owner-tech+m23...@openbsd.org designates 192.43.244.163 as permitted sender)
client-ip=192.43.244.163;
> Authentication-Results: mx.google.com; spf=pass (google.com: manual fallback
record for domain of owner-tech+m23...@openbsd.org designates 192.43.244.163
as permitted sender) smtp.mail=owner-tech+m23...@openbsd.org; dkim=pass (test
mode) header.i=@gmail.com
> Received: from openbsd.org (localhost.ucar.edu [127.0.0.1])
>       by shear.ucar.edu (8.14.3/8.14.3) with ESMTP id p09M9ZNr021431;
>       Sun, 9 Jan 2011 15:09:35 -0700 (MST)
> Received: from mail-qw0-f49.google.com (mail-qw0-f49.google.com
[209.85.216.49])
>       by shear.ucar.edu (8.14.3/8.14.3) with ESMTP id p09M8Qo7030584
>       for <tech@openbsd.org>; Sun, 9 Jan 2011 15:08:27 -0700 (MST)
> Received: by qwj9 with SMTP id 9so18728829qwj.8
>       for <tech@openbsd.org>; Sun, 09 Jan 2011 14:08:26 -0800 (PST)
> DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
h=domainkey-signature:received:received:date:from:to:cc:subject
:in-reply-to:message-id:references:user-agent:mime-version :content-type;
bh=h/uSItba1mLjBNAbFHi24e9uIaFGrvE/zyzh9me7yYw=;
b=XL2MRc5mQPtKcLk0xcUbEGn9qdUnrl6SeDWu1vs3q6WxC/wpfV2rgJanu5AiWpbf49
wtVrJKYpS9dy1CelkCIVzlg+RPUqnv32G3++umRivu257SvjpOOslVewDoQhQWObt2Ge
rOSbU0xTYAxK1Ed/VpudyiJchjG2z31yWyH+E=
> DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
h=date:from:to:cc:subject:in-reply-to:message-id:references
:user-agent:mime-version:content-type;
b=vy/Cr6z1/nOya7/ja1dbvBm4ImoKOvIBVNnuY9F9pk7/nk4nX79tDrpqDzmhuQAxtM
NUIjTiMhF4Iqu1yDpP9CxDFrltopI0N2v6zlVhte2HPPW3pumugdFX8PZAuyr8+H+Nxi
TDNAfVeIOfemJkkd/DClpRrGh+GwyqEa8ab+Q=
> Received: by 10.224.11.20 with SMTP id r20mr26905601qar.51.1294610906191;
Sun, 09 Jan 2011 14:08:26 -0800 (PST)
> Received: from mini.t3rl.org (cpe-74-66-19-205.nyc.res.rr.com
[74.66.19.205])
>       by mx.google.com with ESMTPS id nb15sm16014404qcb.2.2011.01.09.14.08.24
(version=TLSv1/SSLv3 cipher=RC4-MD5); Sun, 09 Jan 2011 14:08:25 -0800 (PST)
> Date: Sun, 9 Jan 2011 17:08:22 -0500 (EST)
> From: Ted Unangst <ted.unan...@gmail.com>
> To: Ted Unangst <ted.unan...@gmail.com>
> cc: tech@openbsd.org
> Subject: Re: http gzip support for ftp
> In-Reply-To: <alpine.bso.2.00.1101091616360.14...@zvav.g3ey.bet>
> Message-ID: <alpine.bso.2.00.1101091707210.14...@zvav.g3ey.bet>
> References: <alpine.bso.2.00.1101091616360.14...@zvav.g3ey.bet>
> User-Agent: Alpine 2.00 (BSO 1167 2008-08-23)
> MIME-Version: 1.0
> Content-Type: TEXT/PLAIN; charset=US-ASCII
> List-Help: <mailto:majord...@openbsd.org?body=help>
> List-Owner: <mailto:tech-ow...@openbsd.org>
> List-Post: <mailto:tech@openbsd.org>
> List-Subscribe: <mailto:majord...@openbsd.org?body=sub%20tech>
> List-Unsubscribe: <mailto:majord...@openbsd.org?body=unsub%20tech>
> X-Loop: tech@openbsd.org
> Precedence: list
> Sender: owner-t...@openbsd.org
>
> On Sun, 9 Jan 2011, Ted Unangst wrote:
>
> > Downloading things can go a lot faster if the server and client support
> > http compression.  This is easily added to the ftp program's http
support.
>
> poopers, I hard coded gzipped to 1 by accident.  3rd time's a charm.
> Please test, as there are obviously lots of different web servers out
> there.
>
> Index: Makefile
> ===================================================================
> RCS file: /home/tedu/cvs/src/usr.bin/ftp/Makefile,v
> retrieving revision 1.25
> diff -u -r1.25 Makefile
> --- Makefile  5 May 2009 19:35:30 -0000       1.25
> +++ Makefile  9 Jan 2011 21:14:51 -0000
> @@ -17,8 +17,8 @@
>
>  CPPFLAGS+= -DINET6
>
> -LDADD+=      -ledit -lcurses -lutil -lssl -lcrypto
> -DPADD+=      ${LIBEDIT} ${LIBCURSES} ${LIBUTIL}
> +LDADD+=      -ledit -lcurses -lutil -lssl -lcrypto -lz
> +DPADD+=      ${LIBEDIT} ${LIBCURSES} ${LIBUTIL} ${LIBZ}
>  LDSTATIC= ${STATIC}
>
>  #COPTS+= -Wall -Wconversion -Wstrict-prototypes -Wmissing-prototypes
> Index: fetch.c
> ===================================================================
> RCS file: /home/tedu/cvs/src/usr.bin/ftp/fetch.c,v
> retrieving revision 1.103
> diff -u -r1.103 fetch.c
> --- fetch.c   25 Aug 2010 20:32:37 -0000      1.103
> +++ fetch.c   9 Jan 2011 22:07:12 -0000
> @@ -63,6 +63,7 @@
>  #ifndef SMALL
>  #include <openssl/ssl.h>
>  #include <openssl/err.h>
> +#include <zlib.h>
>  #else /* !SMALL */
>  #define SSL void
>  #endif /* !SMALL */
> @@ -167,6 +168,80 @@
>       return (epath);
>  }
>
> +#ifndef SMALL

Please put this at the end of the file, to avoid yet-another #ifndef
SMALL block.

> +static size_t
> +chunked_read(FILE *fin, SSL *ssl, unsigned char *buf, size_t amt,
> +    int chunked, int gzipped)
> +{
> +     static int chunksize;
> +     static int zinit;
> +     static z_stream zctx;
> +     static unsigned char zbuf[4096];

Are you sure the states of these variables are clean after each
url_get() invocation?  Please don't forget that FTP is able to fetch
multiple URLs, i.e.: ftp http://foo http://bar.

> +     size_t chunkbuflen;
> +     char *chunkbuf;
> +     size_t len, zlen, zamt;
> +     int rv;
> +
> +     if (chunked && !chunksize) {

RFC 2616 sec. 3.6.1 says that the chunked encoding ends with a
zero-sized chunk.  However, if we still have data to decompress,
this is going to eat a line per each call, till it dies.

This should take into account zctx.avail_in, to let the last
chunk decompress properly.  I.e.:
        chunked && !chunksize && zctx.avail_in == 0

Here's a test case:
        http://venckus.iv.lt/~martynas/zero

> +             chunkbuf = ftp_readline(fin, ssl, &chunkbuflen);
> +             if (!chunkbuf) {
> +                     warnx("no chunk size");
> +                     return 0;
> +             }
> +             chunksize = strtol(chunkbuf, NULL, 16);
> +     }
> +     if (gzipped) {
> +             if (zinit == -1) {
> +                     zinit = 0;
> +                     return 0;
> +             }
> +             if (zctx.avail_in == 0) {
> +                     zamt = sizeof(zbuf);
> +                     if (chunked && zamt > chunksize)
> +                             zamt = chunksize;
> +                     zlen = ftp_read(fin, ssl, zbuf, zamt);
> +                     if (chunked) {
> +                             chunksize -= zlen;
> +                             /* eat empty line after chunk */
> +                             if (chunksize == 0)
> +                                     ftp_readline(fin, ssl, &chunkbuflen);
> +                     }
> +                     zctx.avail_in = zlen;
> +                     zctx.next_in = zbuf;
> +             }
> +             if (!zinit) {
> +                     rv = inflateInit2(&zctx, 15 + 32);
> +                     zinit = 1;
> +             }
> +             zctx.next_out = buf;
> +             zctx.avail_out = amt;
> +             if ((rv = inflate(&zctx, Z_NO_FLUSH)) != Z_OK) {
> +                     if (rv == Z_STREAM_END) {
> +                             inflateEnd(&zctx);
> +                             zinit = -1;
> +                     } else {
> +                             warnx("inflate failed %d", rv);
> +                             return 0;
> +                     }
> +             }
> +             len = zctx.next_out - buf;
> +     } else if (chunked) {
> +             if (amt > chunksize)
> +                     amt = chunksize;
> +             len = ftp_read(fin, ssl, buf, amt);
> +             chunksize -= len;
> +             if (chunksize == 0) /* eat empty line after chunk */
> +                     ftp_readline(fin, ssl, &chunkbuflen);

This code duplicates, perhaps use a separate function to get a
single chunk?  Dunno, don't have a strong opinion about this.

> +     } else {
> +             len = ftp_read(fin, ssl, buf, amt);
> +     }
> +
> +     return len;
> +}
> +#endif /* SMALL */
> +
> +
>  /*
>   * Retrieve URL, via the proxy in $proxyvar if necessary.
>   * Modifies the string argument given.
> @@ -195,6 +270,8 @@
>       const char *scheme;
>       int ishttpsurl = 0;
>       SSL_CTX *ssl_ctx = NULL;
> +     int chunked = 0;
> +     int gzipped = 0;
>  #endif /* !SMALL */
>       SSL *ssl = NULL;
>       int status;
> @@ -607,9 +684,11 @@
>  #endif /* !SMALL */
>               ftp_printf(fin, ssl, "GET /%s %s\r\nHost: ", epath,
>  #ifndef SMALL
> -                     restart_point ? "HTTP/1.1\r\nConnection: close" :
> +                     "HTTP/1.1\r\nConnection: close"
> +#else
> +                     "HTTP/1.0"
>  #endif /* !SMALL */
> -                     "HTTP/1.0");
> +                     );
>               if (strchr(host, ':')) {
>                       char *h, *p;
>
> @@ -638,6 +717,7 @@
>               if (restart_point)
>                       ftp_printf(fin, ssl, "\r\nRange: bytes=%lld-",
>                               (long long)restart_point);
> +             ftp_printf(fin, ssl, "\r\nAccept-Encoding: gzip");
>  #else /* !SMALL */
>               if (port && strcmp(port, "80") != 0)
>                       ftp_printf(fin, ssl, ":%s", port);
> @@ -753,6 +833,17 @@
>  #ifndef SMALL
>                       if (restart_point)
>                               filesize += restart_point;
> +#define TRANSENC "Transfer-Encoding: "
> +             } else if (strncasecmp(cp, TRANSENC, sizeof(TRANSENC)-1) == 0) {
> +                     cp += sizeof(TRANSENC) - 1;
> +                     if (strcasecmp(cp, "chunked") == 0)
> +                             chunked = 1;
> +#define CONTENTENC "Content-Encoding: "
> +             } else if (strncasecmp(cp, CONTENTENC, sizeof(CONTENTENC) - 1)
> +                 == 0) {
> +                     cp += sizeof(CONTENTENC) - 1;
> +                     if (strcasecmp(cp, "gzip") == 0)

Perhaps we should consider x-gzip here, too?

RFC 2616 sec. 3.5:
        Use of program names for the identification of encoding formats
        is not desirable and is discouraged for future encodings. Their
        use here is representative of historical practice, not good
        design. For compatibility with previous implementations of HTTP,
        applications SHOULD consider "x-gzip" and "x-compress" to be
        equivalent to "gzip" and "compress" respectively.

> +                             gzipped = 1;
>  #endif /* !SMALL */
>  #define LOCATION "Location: "
>               } else if (isredirect &&
> @@ -853,7 +944,13 @@
>       len = 1;
>       oldinti = signal(SIGINFO, psummary);
>       while (len > 0) {
> -             len = ftp_read(fin, ssl, buf, 4096);
> +#ifndef SMALL
> +             if (chunked || gzipped)
> +                     len = chunked_read(fin, ssl, buf, 4096,
> +                         chunked, gzipped);
> +             else
> +#endif
> +                     len = ftp_read(fin, ssl, buf, 4096);

Since chunked_read handles the "!chunked && !gzipped" case itself,
I think it would be simpler to just:

+ #ifdef SMALL
+ len = ftp_read(fin, ssl, buf, 4096);
+ #else
+ len = chunked_read(fin, ssl, buf, 4096, chunked, gzipped);
+ #endif

>               bytes += len;
>               for (cp = buf, wlen = len; wlen > 0; wlen -= i, cp += i) {
>                       if ((i = write(out, cp, wlen)) == -1) {

As much as I like it, I'm not sure we'll be able to enable this by
default, though.  There are quite some misconfigured servers out
there, like:
        http://elinks.cz/download/elinks-0.11.7.tar.gz

This one incorrectly sends:
        Content-Encoding: gzip

While it doesn't actually compress it (twice).  So the client
decompresses it, and you end up actually having decompressed TAR
called 'elinks-0.11.7.tar.gz'.

Reply via email to