Re: Cassandra and Nutch 2.X not coding in UTF8

2014-09-08 Thread cervenkovab
So it is not a bug, it is a feature :). Thank you for the explanation, I thought I had something bad configured. Best Barbora

RE: Nutch not crawling deep enough into directory structure

2014-09-08 Thread Mattmann, Chris A (3980)
Hi Paul, Try expanding your last parameter (which is the # of crawling rounds). Also make sure to check these properties: name: db.ignore.internal.links; value: false; description: If true, when adding new links to a page, links from the same host are ignored. This is an effective way to limit the size of the l
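To check which value a property such as db.ignore.internal.links currently has, the configuration file can be read programmatically. The following is only a sketch: it assumes the usual Hadoop-style layout (a `<configuration>` root holding `<property>` elements with `<name>` and `<value>` children), and the inline SAMPLE text and the helper name `read_properties` are illustrative, not taken from the thread.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            # A missing or empty <value> is reported as an empty string.
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


# Illustrative stand-in for the contents of a site configuration file.
SAMPLE = """<configuration>
  <property>
    <name>db.ignore.internal.links</name>
    <value>false</value>
  </property>
</configuration>"""

props = read_properties(SAMPLE)
print(props.get("db.ignore.internal.links", "(not set in this file)"))
```

A property that is absent from the file being read falls back to whatever the shipped defaults say, so "(not set in this file)" does not mean the property is unset overall.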

Nutch not crawling deep enough into directory structure

2014-09-08 Thread Paul Rogers
Hi Guys, Reposting this since I think it got lost in the tail end of the last post. I have a web site serving a series of documents (PDFs) and am using Nutch 1.8 to index them in Solr. The base url is http://localhost/ and the documents are stored in a series of directories in the directory http

Re: Cassandra and Nutch 2.X not coding in UTF8

2014-09-08 Thread Lewis John Mcgibbney
Hi cervenkovab, This is an inherent design choice we made whilst developing the gora-cassandra module into what it is now. Ultimately we store all data as a byte array. CQLSH subsequently returns the data exactly as it is stored within Cassandra. Therefore no decoding is done on the client side before the data is presented t
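Since the values are stored as raw bytes, they can be decoded on the client side after reading them. The sketch below assumes the value is shown as a 0x-prefixed hex string and that the stored bytes are UTF-8; the helper name `decode_blob` and the sample value are illustrative only.

```python
def decode_blob(hex_text, encoding="utf-8"):
    """Turn a hex blob string such as '0x50c5...' back into readable text."""
    cleaned = hex_text.strip()
    if cleaned.lower().startswith("0x"):
        cleaned = cleaned[2:]  # drop the hex prefix
    return bytes.fromhex(cleaned).decode(encoding)


# Illustrative value: the UTF-8 bytes of a Czech word.
print(decode_blob("0x50c599c3ad6c69c5a1"))  # prints "Příliš"
```

If the bytes were written in a different encoding, pass it as the second argument instead of relying on the UTF-8 default.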

Re: Permission to edit a wiki page

2014-09-08 Thread Lewis John Mcgibbney
Hi Jorge, On Sat, Sep 6, 2014 at 5:45 PM, wrote: > > > I’ve written a blog post on how to index the inline and outlinks of a > Webpage using Nutch 1.x (currently 1.9) If possible and if this would help > make easier for people understanding how to extend nutch to their needs I > would like to ad

Re: Nutch 1.7 fetch happening in a single map task.

2014-09-08 Thread Meraj A. Khan
AFAIK, the script does not go by the mode you set, but by the presence of the *nutch*.job file in the directory one level above the script itself, i.e. ../*.job. Can you please check if you have the Hadoop job file at the appropriate location? On Mon, Sep 8, 2014 at 9:22 AM, Simon Z wrote: > Thank you
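The check described above can be sketched as follows. This is a Python illustration of the idea, not the script's actual code; the function name `detect_run_mode` and the glob pattern `*nutch*.job` are assumptions based on the description in this message.

```python
from pathlib import Path


def detect_run_mode(script_dir):
    """Return 'distributed' if a nutch .job file sits one level above script_dir."""
    parent = Path(script_dir).resolve().parent
    # Assumed pattern: any file whose name contains "nutch" and ends in ".job".
    has_job = any(parent.glob("*nutch*.job"))
    return "distributed" if has_job else "local"


# Report the mode that would be chosen for a script in the current directory.
print(detect_run_mode("."))
```

If this reports "local" on a cluster node, the job file is probably missing from the directory above the script, which matches the symptom of the fetch running in a single map task.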

Re: Nutch 1.7 fetch happening in a single map task.

2014-09-08 Thread Simon Z
Thank you very much, Meraj, for your reply. I also thought it's a typo. I had set the numFetchers via numSlaves, and the echo of generator showed that numFetchers is 8 (numTasks=`expr $numSlaves \* 2`, that is 4 by 2), but the output of generator showed that the run mode is "local" and generate exact on