Hi All,
I'm using Nutch 1.7 (trunk) and writing a plugin to index to HBase (using
Nutch 2.1 is not an option - I had to use 1.7 and write an indexer myself).
I believe I'm well on my way, but I had a few questions.
So my first step in the process was to make sure that the NutchDocument
held the fi
I am not 100% sure if this would work, but you can try passing
-Djsse.enableSNIExtension=false to the fetch command.
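As a sketch of one way to apply that property: the stock Nutch 1.x launch script passes the NUTCH_OPTS environment variable through to the JVM, so something like the following should work (the segment path in the comment is illustrative, not from the original mail):

```shell
# Assumption: bin/nutch forwards NUTCH_OPTS to the JVM
# (true for the stock Nutch 1.x bin/nutch script).
export NUTCH_OPTS="$NUTCH_OPTS -Djsse.enableSNIExtension=false"

# Then run the fetch as usual, e.g.:
# bin/nutch fetch crawl/segments/20130527123456
```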
On Mon, May 27, 2013 at 3:22 PM, Tejas Patil wrote:
> Which version of Java are you using? This problem is seen with Java 7
> [0]. Downgrading to Java 6 might help you.
>
> [0]
The parser has already filtered out the unwanted URLs as per the old regex
rules, so "update" will not get those URLs.
Run "bin/nutch parse -all -force" to reparse the segments with the new
regexes, and then retry what you did earlier, i.e. update -> generate ->
fetch etc.
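Spelled out as a command sequence (a sketch only; `-all`/`-force` imply the 2.x-style CLI, and the exact arguments depend on your Nutch version and crawl layout):

```shell
# Reparse every segment, overwriting the old parse data (-force)
# so the new regex-urlfilter rules take effect:
bin/nutch parse -all -force

# Then resume the normal cycle with the refreshed parse output:
bin/nutch updatedb
bin/nutch generate
bin/nutch fetch -all
```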
On Mon, May 27, 2013 at 2:4
Which version of Java are you using? This problem is seen with Java 7 [0].
Downgrading to Java 6 might help you.
[0] :
http://stackoverflow.com/questions/7615645/ssl-handshake-alert-unrecognized-name-error-since-upgrade-to-java-1-7-0
On Mon, May 27, 2013 at 8:33 AM, Eyeris Rodriguez Rueda wrote:
Hi Markus,
A similar problem was posted some time ago:
http://lucene.472066.n3.nabble.com/NegativeArraySizeException-and-quot-problem-advancing-port-rec-quot-during-fetching-tt3994633.html#a3996554
Sebastian
On 05/27/2013 11:06 AM, Markus Jelsma wrote:
> Hi,
>
> For some reason the fetcher som
Hi all,
Im tring to crawl a web site and I get one error with ssl, this is the
exception javax.net.ssl.SSLProtocolException, specifically
2013-05-27 10:46:06,226 INFO fetcher.Fetcher - fetch of
https://cubatravel.cidi.uci.cu:8443/ failed with:
javax.net.ssl.SSLProtocolException: handshake alert:
I had previously excluded some URLs from a Nutch crawl, to limit its scope
during testing, by including the appropriate regex in the
regex-urlfilter.txt file. I would now like to lift those restrictions and
have edited regex-urlfilter.txt to allow more URLs. However, after
executing
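For reference, a sketch of what such scope-limiting rules in conf/regex-urlfilter.txt typically look like (the domain is illustrative, not from the original mail). Rules are checked top to bottom and the first match wins; `+` accepts a URL, `-` rejects it:

```
# Accept only URLs under the test host (illustrative):
+^https?://([a-z0-9-]+\.)*example\.com/
# Reject everything else:
-.
```

Lifting the restriction then amounts to broadening the `+` pattern (or restoring the stock file's final `+.` catch-all) and reparsing/recrawling so the change takes effect.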
Hi,
For some reason the fetcher sometimes produces corrupt, unreadable segments.
It then exits with an exception like "problem advancing post" or "negative
array size exception", etc.
java.lang.RuntimeException: problem advancing post rec#702
at org.apache.hadoop.mapred.Task$ValuesIterat
Dear List,
One thing I should add is that it works fine with SOLR:
bin/nutch solrindex http://127.0.0.1:8983/solr/ -reindex
will load the index up nicely
Thanks again for your suggestions.
Regards,
Nicholas W.
On Thu, May 23, 2013 at 10:47 AM, Nicholas W <4...@log1.net> wrote:
> Dear List,
>