When I try the URL you gave using curl and no special arguments, I get this:
C:\Users\Karl>curl -vvv "http://lucene.jugem.jp/?eid=39"
* About to connect() to lucene.jugem.jp port 80 (#0)
*   Trying 210.172.160.170... connected
* Connected to lucene.jugem.jp (210.172.160.170) port 80 (#0)
> GET /?eid=39 HTTP/1.1
> User-Agent: curl/7.21.7 (i386-pc-win32) libcurl/7.21.7 OpenSSL/1.0.0c zlib/1.2.5 librtmp/2.3
> Host: lucene.jugem.jp
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Wed, 09 Jan 2013 08:47:52 GMT
< Server: Apache/2.0.59 (Unix)
< Vary: User-Agent,Host,Accept-Encoding
< Last-Modified: Tue, 08 Jan 2013 07:58:33 GMT
< Accept-Ranges: bytes
< Content-Length: 22594
< Cache-Control: private
< Pragma: no-cache
< Connection: close
< Content-Type: text/html

There's no 302 from here. Are you trying to crawl through a proxy? If
so, that might be where the problem lies.

Karl

On Wed, Jan 9, 2013 at 3:40 AM, Karl Wright <[email protected]> wrote:
> It sounds like the httpclient upgrade definitely broke something. We
> should open a ticket.
>
> But first, can you confirm what connector this is? Is it the web
> connector? If so, I am puzzled because the web connector has always
> logged any 302 return, but then queued a second document which it
> subsequently fetches.
>
> Karl
>
> On Wed, Jan 9, 2013 at 2:10 AM, Shinichiro Abe
> <[email protected]> wrote:
>> Hi,
>>
>> I'm using trunk code and crawling a web site with seeds that include
>> http://lucene.jugem.jp/?eid=39 (koji's blog -- I don't obey robots.txt).
>> When I look at Simple History, it shows a 302 result code for the fetch
>> activity and doesn't ingest the document.
>>
>> When I used MCF 1.0.1 in the same situation, Simple History showed a 200
>> result code and MCF could ingest documents.
>>
>> Why does trunk show a 302 status? Is it related to the httpclient upgrade?
>>
>> Thanks in advance,
>> Shinichiro Abe
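
For anyone who wants to repeat the curl check from a script, here is a minimal sketch in Python. It is not MCF code and says nothing about how the web connector or httpclient behaves; it only issues one GET and reports the status code and any Location header without following the redirect. The function name fetch_status and its user_agent parameter are my own inventions for illustration. Since the response above carries "Vary: User-Agent", it may be worth comparing results for different User-Agent values, but that is only a guess.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status(url, user_agent=None, timeout=10):
    """Issue a single plain-HTTP GET and return (status, location).

    Hypothetical helper for illustration only. http.client never follows
    redirects on its own, so a 302 is reported as-is instead of being
    silently resolved to the redirect target.
    """
    parts = urlsplit(url)
    # Rebuild the request target (path plus query string), e.g. "/?eid=39".
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query

    headers = {"Accept": "*/*"}
    if user_agent:
        # Assumption: the server may answer differently per User-Agent.
        headers["User-Agent"] = user_agent

    conn = http.client.HTTPConnection(
        parts.hostname, parts.port or 80, timeout=timeout
    )
    try:
        conn.request("GET", path, headers=headers)
        resp = conn.getresponse()
        # Location is None unless the server sent a redirect target.
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

Calling fetch_status("http://lucene.jugem.jp/?eid=39") from the crawler machine, with and without the proxy in the path, would show whether the 302 comes from the site itself or from something in between.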
