Picot Chappell <[EMAIL PROTECTED]> writes:

> Why doesn't wget assume that files, which don't declare content
> type, are text/html files?

Good question.  I don't know; perhaps such brokenness never occurred
to me, and I don't remember anyone reporting it until now.

> I'm looking into patching http.c, so that if type isn't defined it
> gets set to text/html.  Has this been done for 1.8.1 already?  If
> so, can someone pass that patch along to me?
>
> Also, if I do this, will it cause horrible wget hiccups?

I don't think it will make a difference, except to improve the user
experience in the case you describe.  Pages served with a proper
Content-Type will not be affected, and that's what truly matters.

Here is a patch that should implement what you need.  Please let me
know if it works for you.

2002-04-16  Hrvoje Niksic  <[EMAIL PROTECTED]>

        * http.c (gethttp): If Content-Type is not given, assume
        text/html.

Index: src/http.c
===================================================================
RCS file: /pack/anoncvs/wget/src/http.c,v
retrieving revision 1.90
diff -u -r1.90 http.c
--- src/http.c  2002/04/14 05:19:27     1.90
+++ src/http.c  2002/04/16 00:14:57
@@ -1308,10 +1308,12 @@
        }
     }
 
-  if (type && !strncasecmp (type, TEXTHTML_S, strlen (TEXTHTML_S)))
+  /* If content-type is not given, assume text/html.  This is because
+     of the multitude of broken CGI's that "forget" to generate the
+     content-type.  */
+  if (!type || 0 == strncasecmp (type, TEXTHTML_S, strlen (TEXTHTML_S)))
     *dt |= TEXTHTML;
   else
-    /* We don't assume text/html by default.  */
     *dt &= ~TEXTHTML;
 
   if (opt.html_extension && (*dt & TEXTHTML))
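
For illustration, the decision made by the patched condition can be
sketched in isolation.  This is only a sketch under stated assumptions:
is_html_type is a hypothetical helper that does not exist in wget, and
TEXTHTML_S is assumed here to expand to "text/html".

```c
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Assumed value of wget's TEXTHTML_S macro. */
#define TEXTHTML_S "text/html"

/* Hypothetical helper, not part of wget.  Returns 1 if a response
   with this Content-Type value would be treated as HTML under the
   patched condition, 0 otherwise.  A NULL type stands for a response
   that carried no Content-Type header at all.  */
static int
is_html_type (const char *type)
{
  /* Same test as in the patch: a missing type counts as HTML;
     otherwise compare the leading characters case-insensitively.  */
  return !type || 0 == strncasecmp (type, TEXTHTML_S, strlen (TEXTHTML_S));
}
```

With this helper, is_html_type (NULL) and
is_html_type ("Text/HTML; charset=utf-8") both return 1, while
is_html_type ("image/png") returns 0.  Because only the leading
characters are compared, parameters following the media type do not
affect the result.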
