Dear all,
I was able to get Nutch 1.2 working previously. I have now done a clean install of
Nutch 1.2, and I strictly followed the instructions below:
http://wiki.apache.org/nutch/NutchTutorialPre1.3
But now I have encountered the problem below. Why is it so? Do we need to set up
Tomcat in order
Hi,
With large map output the task tracker can time out (no progress update during
merge). Using io.sort.factor I can tune the merge phase to proceed a bit
faster. Yet it can still time out when the cluster is very busy, etc. I've
increased the task timeout, but now it also takes longer to get
Dear all,
Just to update, I have solved my problem. Apparently, we also need to edit the
file conf/crawl-urlfilter.txt, besides conf/regex-urlfilter.txt.
Can we amend this page? http://wiki.apache.org/nutch/NutchTutorialPre1.3
I am sure many others encounter the same problem as me.
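
For readers hitting the same issue, here is a minimal sketch of how rule files such as conf/crawl-urlfilter.txt and conf/regex-urlfilter.txt are commonly understood to behave: each rule is a '+' (accept) or '-' (reject) followed by a regular expression, the first matching rule decides, and a URL that matches no rule is rejected. The class UrlFilterSketch is a hypothetical stand-in written for illustration, not Nutch's own filter class, and example.com is a placeholder domain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration only; not Nutch's actual filter class.
class UrlFilterSketch {

    // One rule: '+' means accept, '-' means reject, followed by a regex.
    private static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    // Parses rule lines, skipping blank lines and '#' comments.
    UrlFilterSketch(List<String> lines) {
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            char sign = trimmed.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + trimmed);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(trimmed.substring(1))));
        }
    }

    // The first rule whose regex is found in the URL decides; no match means reject.
    boolean accepts(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        UrlFilterSketch filter = new UrlFilterSketch(Arrays.asList(
                "# skip URLs containing characters that suggest a query",
                "-[?*!@=]",
                "# accept hosts in the placeholder domain",
                "+^http://([a-z0-9]*\\.)*example\\.com/"));
        System.out.println(filter.accepts("http://www.example.com/page.html"));
        System.out.println(filter.accepts("http://www.example.com/index?qid=1"));
    }
}
```

Under these assumptions, a domain that is accepted in one file but still rejected in the other would be dropped, which may be why both files need editing.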
Hi Gabriele,
At first this seems like a plausible argument; however, my question concerns
what Nutch would do if we wished to change the Solr core to index to.
If we removed this functionality from the crawldb there would be no way to
determine what Nutch was to fetch and what it wasn't.
On Sat, Jul 16, 2011 at 1:29 PM, lewis john mcgibbney
lewis.mcgibb...@gmail.com wrote:
Hi Gabriele,
At first this seems like a plausible argument,
Indeed, I think it could be a FAQ. Shall I add it to nutch wiki?
however my question concerns
what Nutch would do if we wished to change
Please feel free to add this to the wiki, as it is a question that will
undoubtedly arise in the future.
Lewis
On Sat, Jul 16, 2011 at 12:37 PM, Gabriele Kahlout gabri...@mysimpatico.com
wrote:
On Sat, Jul 16, 2011 at 1:29 PM, lewis john mcgibbney
lewis.mcgibb...@gmail.com wrote:
Hi Gabriele,
What you are describing could be done with Nutch 2.0 by adding a SOLR
backend to GORA. SOLR would be used to store the webtable and, provided that
you set up the schema accordingly, you could index the appropriate fields for
searching. I think there were plans to add SOLR as a GORA backend.
Hi Tamanjit,
Thank you for your help. I tried your suggestion, but it crawls every normal URL
except URLs of this type:
answers.yahoo.com/question/index;_ylt=AtKz1xss1AS6RGeAQTFz1kyf5HNG;_ylv=3?qid=20110715030336AAzXnNs
I also tried this suggestion by
Further to this, I have been working on a JIRA ticket for this [1].
If you can, please test it. I will also test it shortly, and hopefully we can
get this committed soon.
Thank you
[1] https://issues.apache.org/jira/browse/NUTCH-672
On Tue, Jul 12, 2011 at 9:36 PM, lewis john mcgibbney
Hello,
I did not understand ParseData.parseData -
In ParseData there are getContentMeta and getParseMeta
There is also a getMeta(String string) - it appears that there is no
setter for this.
There is also setParseMeta, but it appears content meta is not settable.
Best Regards,
C.B.
On
I've used crawl to ensure the config is correct and I don't get any errors,
so I must be doing something wrong with the individual steps, but I can't
see what.
Hello,
You could put the features into ParseData by calling
parseData.getParseMeta().set(features, valueOfFeatures);
When you want to use it, call parseData.getParseMeta().get(features) to
get it out, the same way you would use a Java Map.
No need to call a setter method. :-)
Regards,
Joey
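
The pattern described above can be sketched in a self-contained way. The classes Meta and ParseDataStandIn below are hypothetical stand-ins that only mimic the two calls quoted in the message (getParseMeta().set(...) and getParseMeta().get(...)); they are not the Nutch classes, and the string values are placeholders.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration only; not the real Nutch classes.
class ParseMetaSketch {

    // A map-like metadata holder with set/get, as described in the message.
    static final class Meta {
        private final Map<String, String> values = new HashMap<>();

        void set(String name, String value) {
            values.put(name, value);
        }

        String get(String name) {
            return values.get(name);
        }
    }

    // Exposes its metadata object through a getter only. No setter is
    // needed because callers mutate the returned object directly.
    static final class ParseDataStandIn {
        private final Meta parseMeta = new Meta();

        Meta getParseMeta() {
            return parseMeta;
        }
    }

    public static void main(String[] args) {
        ParseDataStandIn parseData = new ParseDataStandIn();

        // Store a value under a key, as one would with a Map.
        parseData.getParseMeta().set("features", "valueOfFeatures");

        // Read it back later through the same getter.
        System.out.println(parseData.getParseMeta().get("features"));
    }
}
```

The point of the sketch is that the getter returns a mutable object, so values written through it stay visible on later reads.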