So it is not a bug, it is a feature :). Thank you for the explanation, I
thought I had misconfigured something.
Best
Barbora
Hi Paul,
Try increasing your last parameter (which is the number of crawling rounds).
Also make sure to check these properties:
db.ignore.internal.links
false
If true, when adding new links to a page, links from
the same host are ignored. This is an effective way to limit the
size of the l
Hi Guys
Reposting this since I think it got lost in the tail end of the last post.
I have a web site serving a series of documents (PDFs) and am using Nutch
1.8 to index them in Solr. The base URL is http://localhost/ and the
documents are stored in a series of directories in the directory
http
Hi cervenkovab,
This is an inherent design choice we made whilst developing the
gora-cassandra module into what it is now.
Ultimately we store all data as a Byte Array. CQLSH subsequently gets data
as it is within Cassandra. Therefore no decoding is done on the client side
before the data is presented t
Hi Jorge,
On Sat, Sep 6, 2014 at 5:45 PM, wrote:
>
>
> I’ve written a blog post on how to index the inlinks and outlinks of a
> Webpage using Nutch 1.x (currently 1.9). If possible, and if this would help
> make it easier for people to understand how to extend Nutch to their needs, I
> would like to ad
AFAIK, the script does not go by the mode you set, but by the presence of the
*nutch*.job file in the directory one level above the script itself, i.e.
../*.job.
Can you please check if you have the Hadoop job file at the appropriate
location?
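To make the check concrete, here is a minimal sketch of the kind of test described above. It is not copied from the Nutch crawl script; the function name detect_mode and the idea of passing the script directory as an argument are assumptions made for illustration only.

```shell
#!/bin/sh
# Hypothetical helper (not the actual Nutch script): print "distributed"
# if a *nutch*.job file exists one level above the given script directory,
# otherwise print "local".
detect_mode() {
  script_dir="$1"
  # If nothing matches, the glob stays literal and the -f test fails.
  for job_file in "$script_dir"/../*nutch*.job; do
    if [ -f "$job_file" ]; then
      echo "distributed"
      return 0
    fi
  done
  echo "local"
}
```

Calling detect_mode with the directory that holds your crawl script should tell you which mode a check of this kind would pick on your installation.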
On Mon, Sep 8, 2014 at 9:22 AM, Simon Z wrote:
> Thank you
Thank you very much, Meraj, for your reply. I also thought it was a typo.
I had set the numFetchers via numSlaves, and the echo of generator showed
that numFetcher is 8 (numTasks=`expr $numSlaves \* 2`, that is, 4 times 2),
but the output of generator showed that the run mode is "local" and
generate exact on
7 matches