I'm using the web connector.

> Are you trying to crawl through a proxy?

No. I just set that URL as a seed, without a proxy. (Also, I didn't obey robots.txt.)

Using curl, I get the same result as you. Could you reproduce that?

Shinichiro

On 2013/01/09, at 17:49, Karl Wright wrote:

> When I try the URL you gave using curl and no special arguments, I get this:
>
> C:\Users\Karl>curl -vvv "http://lucene.jugem.jp/?eid=39"
> * About to connect() to lucene.jugem.jp port 80 (#0)
> *   Trying 210.172.160.170... connected
> * Connected to lucene.jugem.jp (210.172.160.170) port 80 (#0)
>> GET /?eid=39 HTTP/1.1
>> User-Agent: curl/7.21.7 (i386-pc-win32) libcurl/7.21.7 OpenSSL/1.0.0c zlib/1.2.5 librtmp/2.3
>> Host: lucene.jugem.jp
>> Accept: */*
>>
> < HTTP/1.1 200 OK
> < Date: Wed, 09 Jan 2013 08:47:52 GMT
> < Server: Apache/2.0.59 (Unix)
> < Vary: User-Agent,Host,Accept-Encoding
> < Last-Modified: Tue, 08 Jan 2013 07:58:33 GMT
> < Accept-Ranges: bytes
> < Content-Length: 22594
> < Cache-Control: private
> < Pragma: no-cache
> < Connection: close
> < Content-Type: text/html
>
> There's no 302 from here.
>
> Are you trying to crawl through a proxy? If so, that might be where
> the problem lies.
>
> Karl
>
> On Wed, Jan 9, 2013 at 3:40 AM, Karl Wright <[email protected]> wrote:
>> It sounds like the httpclient upgrade definitely broke something. We
>> should open a ticket.
>>
>> But first, can you confirm what connector this is? Is it the web
>> connector? If so, I am puzzled because the web connector has always
>> logged any 302 return, but then queued a second document which it
>> subsequently fetches.
>>
>> Karl
>>
>> On Wed, Jan 9, 2013 at 2:10 AM, Shinichiro Abe
>> <[email protected]> wrote:
>>> Hi,
>>>
>>> I'm using trunk code and crawling a web site with seeds that include
>>> http://lucene.jugem.jp/?eid=39 (koji's blog --I don't obey robots.txt).
>>> When I look at Simple History, it shows a 302 result code for the fetch
>>> activity and the document is not ingested.
>>>
>>> When I used MCF 1.0.1 in the same situation, Simple History showed a 200
>>> result code and MCF could ingest documents.
>>>
>>> Why does trunk show a 302 status? Is it related to the httpclient upgrade?
>>>
>>> Thanks in advance,
>>> Shinichiro Abe
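
For anyone who wants to repeat the curl check from a script, here is a minimal sketch in Python. It fetches a URL once with redirects disabled and proxies bypassed, and returns the raw status code together with any Location header, so a 302 shows up as a 302 instead of being followed silently. The function name fetch_status and the User-Agent strings passed to it are hypothetical. The response above carries "Vary: User-Agent", so comparing two User-Agent values is one thing that may be worth trying, but nothing in this thread confirms that the User-Agent is what triggers the 302.

```python
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following 3xx responses so the raw status stays visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None tells urllib not to follow; it then raises HTTPError.
        return None


def fetch_status(url, user_agent, timeout=10.0):
    """Return (status_code, Location header or None) without following redirects.

    Hypothetical helper for illustration; it is not part of MCF or curl.
    """
    opener = urllib.request.build_opener(
        _NoRedirect,
        urllib.request.ProxyHandler({}),  # empty mapping: ignore any proxy settings
    )
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status, response.headers.get("Location")
    except urllib.error.HTTPError as err:
        # With redirects disabled, 3xx responses (and 4xx/5xx) arrive here.
        return err.code, err.headers.get("Location")
```

Calling fetch_status("http://lucene.jugem.jp/?eid=39", "curl/7.21.7") and then again with whatever User-Agent the crawler is configured to send would show whether the two requests get different status codes.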
