on the opt.method would be 'restored' to POST by RESTORE_POST_DATA.
Regards,
Gijs
On 01-05-13 22:16, Giuseppe Scrivano wrote:
hi Gijs,
Gijs van Tulder gvtul...@gmail.com writes:
Giuseppe Scrivano wrote:
what about this patch? Any comment?
Another suggestion: why not save the original
a simple fix. See the attached patch.
Regards,
Gijs
From d2e6e16b3062cc0e6b3c13fd04e3654ed2dbdb6e Mon Sep 17 00:00:00 2001
From: Gijs van Tulder gvtul...@gmail.com
Date: Sun, 21 Apr 2013 22:36:50 +0200
Subject: [PATCH] Remove old reference to opt.post_data.
---
src/ChangeLog |5 +
src/http.c
/resources/warc-implementation-guidelines-v1
commit b54fb8feb9dfb2a111d15f1b759de61217d5251e
Author: Gijs van Tulder gvtul...@gmail.com
Date: Fri Apr 12 23:37:45 2013 +0200
warc: Follow the guidelines for metadata records
Do not use the same UUID for the manifest and arguments records
/private/wgzip.c#314
diff --git a/src/ChangeLog b/src/ChangeLog
index 8e1213f..65d636d 100644
--- a/src/ChangeLog
+++ b/src/ChangeLog
@@ -1,3 +1,8 @@
+2013-03-31 Gijs van Tulder gvtul...@gmail.com
+
+ * warc.c: Correctly write the field length in the skip length field
+ of .warc.gz files
Ruehsen fixed this in his version of the patch.)
The attached patch uses number_to_string to fix the printf in
warc_write_cdx_record.
Regards,
Gijs
From 21fc9f0dd9c71e2dc3aea29be4e16f14620d12a5 Mon Sep 17 00:00:00 2001
From: Gijs van Tulder gvtul...@gmail.com
Date: Sat, 24 Nov 2012 12:44:14
.
Regards,
Gijs
commit 66c0595f5440b36afb7307d4cab3d6430254183b
Author: Gijs van Tulder gvtul...@gmail.com
Date: Mon Nov 12 22:03:30 2012 +0100
Fix for invalid WARC Content-Length header on some platforms.
diff --git a/src/ChangeLog b/src/ChangeLog
index ec78fe8..3901d94 100644
--- a/src/ChangeLog
(rec_existing->url, url) == 0)
The attached patch makes this change. The deduplication works better.
Regards,
Gijs
From 807b98d7d9289765c9f210336d2dbf294d663f99 Mon Sep 17 00:00:00 2001
From: Gijs van Tulder gvtul...@gmail.com
Date: Wed, 30 May 2012 23:00:04 +0200
Subject: [PATCH] warc: Fix segfault
Hi,
There's a problem if you combine --output-document with --recursive or
--page-requisites. --output-document breaks the recursion.
First you get a warning:
WARNING: combining -O with -r or -p will mean that all downloaded
content will be placed in the single file you specified.
That
:28:11 +
@@ -1,3 +1,8 @@
+2012-04-11 Gijs van Tulder gvtul...@gmail.com
+
+ * bootstrap.conf (gnulib_modules): Include module `regex'.
+ * configure.ac: Check for PCRE library.
+
2012-03-25 Ray Satiro raysat...@yahoo.com
* configure.ac: Fix build under mingw when OpenSSL is used
Hi,
Here is a patch that adds the --acceptregex and --rejectregex options.
With these options it would be possible to do two things:
1. You can match complete urls, instead of just the directory prefix or
the file name suffix (which you can do with --accept and
--include-directories).
2. You
Ángel González wrote:
I really like PCRE, but I think the default should be POSIX regex
Certainly. (I'm not sure if it's even worth adding the PCRE option.
Matching URLs can't be that hard, can it?)
How are the interactions between --{accept,reject}regex and
--{accept,reject}?
The regex
18:13:27 +
+++ src/ChangeLog 2012-04-01 20:35:28 +
@@ -1,3 +1,7 @@
+2012-04-01 Gijs van Tulder gvtul...@gmail.com (tiny change)
+
+ * html-url.c: Prevent crash on incomplete STYLE tag.
+
2012-03-29 From: Tim Ruehsen tim.rueh...@gmx.de (tiny change)
* utils.c (library): Include sys
-31 23:16:33 +
@@ -1,3 +1,9 @@
+2012-02-01 Gijs van Tulder gvtul...@gmail.com
+
+ * warc.c: Fix large file support with ftello, fseeko.
+ * warc.h: Fix large file support.
+ * http.c: Fix large file support.
+
2012-01-27 Gijs van Tulder gvtul...@gmail.com
* retr.c (fd_read_body
+1,8 @@
+2012-01-27 Gijs van Tulder gvtul...@gmail.com
+
+ * retr.c (fd_read_body): Fix a memory leak with chunked responses.
+ * http.c (skip_short_body): Fix the same memory leak.
+
2012-01-09 Gijs van Tulder gvtul...@gmail.com
* init.c: Disable WARC compression if zlib is disabled
regards,
Thank you for a wonderful utility,
--
Evgeniy
=== modified file 'ChangeLog'
--- ChangeLog 2011-12-12 20:30:39 +
+++ ChangeLog 2012-01-09 13:40:01 +
@@ -1,3 +1,7 @@
+2012-01-09 Gijs van Tulder gvtul...@gmail.com
+
+ * configure.ac: Always try to use libz, even without SSL
lovely. I am going to push it soon with some small adjustments.
That's good to hear.
There's one other small adjustment that you may want to make, see the
attached patch. One of the WARC functions uses the basename function,
which causes problems on OS X. Including libgen.h and strdup-ing
Hi,
I think there is a memory leak in the GnuTLS part of wget. When
downloading multiple files from a HTTPS server, wget with GnuTLS uses a
lot of memory.
Perhaps an explanation for this can be found in src/http.c. The gethttp
calls ssl_init for each download:
/* Initialize the SSL
Hi David,
David H. Lipman wrote:
I have seen WARC mentioned but have not seen a definition.
WARC (Web ARChive, ISO 28500:2009) [1] is a file format for storing web
resources. It is used for making archives of web sites. The Internet
Archive, for example, uses it as the file format for
can you please send a complete diff against the current development
tree version?
Here's the diff of the WARC additions (1.9MB zipped) to revision 2565:
http://dl.dropbox.com/u/365100/wget_warc-20110926-complete.patch.bz2
Thanks,
Gijs
Hi.
It's been a while since we've discussed the WARC addition to Wget. Is
there anything I can help with?
Gijs
Giuseppe Scrivano writes:
The implementation makes use of the open source WARC Tools library
(Apache License 2.0):
http://code.google.com/p/warc-tools/
how much code is really needed from that library? I wonder if we can
avoid this dependency at all.
The library comes with some
makes use of the open source WARC Tools library
(Apache License 2.0):
http://code.google.com/p/warc-tools/
I look forward to your response.
Kind regards,
Gijs van Tulder