date:20110716

Cannot crawl problem

2011-07-16 Thread Kelvin

Dear all, I was able to get Nutch 1.2 working previously. I have now done a clean install of Nutch 1.2, strictly following the instructions at http://wiki.apache.org/nutch/NutchTutorialPre1.3 but I have encountered the problem below. Why is that? Do we need to set up Tomcat in order

Fetcher thread time out

2011-07-16 Thread Markus Jelsma

Hi, With large map output the task tracker can time out (no progress update during merge). Using io.sort.factor I can tune the merge phase to proceed a bit faster, yet it can still time out when the cluster is very busy, etc. I've increased the task timeout, but now it also takes longer to get

Re: Cannot crawl problem

2011-07-16 Thread Kelvin

Dear all, Just to update, I have solved my problem. Apparently, we also need to edit the file conf/crawl-urlfilter.txt, besides conf/regex-urlfilter.txt. Can we amend this page: http://wiki.apache.org/nutch/NutchTutorialPre1.3 ? I am sure many others will encounter the same problem as I did.

Re: Isn't there redudant/wasteful duplication between nutch crawldb and solr index?

2011-07-16 Thread lewis john mcgibbney

Hi Gabriele, At first this seems like a plausible argument; however, my question concerns what Nutch would do if we wished to change the Solr core to index to. If we removed this functionality from the crawldb there would be no way to determine what Nutch was to fetch and what it wasn't.

Re: Isn't there redudant/wasteful duplication between nutch crawldb and solr index?

2011-07-16 Thread Gabriele Kahlout

On Sat, Jul 16, 2011 at 1:29 PM, lewis john mcgibbney lewis.mcgibb...@gmail.com wrote: Hi Gabriele, At first this seems like a plausible argument, Indeed, I think it could be a FAQ. Shall I add it to the Nutch wiki? however my question concerns what Nutch would do if we wished to change

Re: Isn't there redudant/wasteful duplication between nutch crawldb and solr index?

2011-07-16 Thread lewis john mcgibbney

Please feel free to add this to the wiki as it is a question that will undoubtedly arise in the future. Lewis On Sat, Jul 16, 2011 at 12:37 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote: On Sat, Jul 16, 2011 at 1:29 PM, lewis john mcgibbney lewis.mcgibb...@gmail.com wrote: Hi

Re: Isn't there redudant/wasteful duplication between nutch crawldb and solr index?

2011-07-16 Thread Julien Nioche

Gabriele, What you are describing could be done with Nutch 2.0 by adding a SOLR backend to GORA. SOLR would be used to store the webtable and, provided that you set up the schema accordingly, you could index the appropriate fields for searching. I think there were plans to add SOLR as a GORA backend.

Re: Is it possible to crawl yahoo answer?

2011-07-16 Thread Kelvin

Hi Tamanjit, Thank you for your help. I tried your suggestion, but it crawls every normal URL except URLs of this type: answers.yahoo.com/question/index;_ylt=AtKz1xss1AS6RGeAQTFz1kyf5HNG;_ylv=3?qid=20110715030336AAzXnNs I also tried this suggestion by
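
A possible explanation, offered only as an assumption because the rest of this message is cut off: the stock Nutch URL filter files ship with the rule `-[?*!@=]`, which skips URLs containing characters typical of query strings, and the first matching rule in the file decides the outcome. The URL quoted above contains both `?` and `=`. The sketch below is not Nutch code. `UrlFilterSketch` and `accepts` are hypothetical names, the `http://` prefix and the `yahoo.com` accept rule are added for illustration, and matching with `Matcher.find()` is assumed to approximate what the real filter does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in, NOT Nutch's own filter class: it only imitates the
// "first matching rule decides" behaviour described in the filter files.
class UrlFilterSketch {
    private final List<Boolean> signs = new ArrayList<>();
    private final List<Pattern> patterns = new ArrayList<>();

    // Each rule is "+regex" (accept) or "-regex" (reject), as in the filter files.
    UrlFilterSketch(String... rules) {
        for (String rule : rules) {
            signs.add(rule.charAt(0) == '+');
            patterns.add(Pattern.compile(rule.substring(1)));
        }
    }

    // True if the first rule whose regex is found in the URL is a "+" rule.
    // A URL that matches no rule at all is rejected.
    boolean accepts(String url) {
        for (int i = 0; i < patterns.size(); i++) {
            if (patterns.get(i).matcher(url).find()) {
                return signs.get(i);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The "http://" prefix is an assumption; the message shows the URL without a scheme.
        String question = "http://answers.yahoo.com/question/index;"
                + "_ylt=AtKz1xss1AS6RGeAQTFz1kyf5HNG;_ylv=3?qid=20110715030336AAzXnNs";
        String acceptYahoo = "+^http://([a-z0-9]*\\.)*yahoo.com/";

        UrlFilterSketch stock = new UrlFilterSketch("-[?*!@=]", acceptYahoo);
        UrlFilterSketch relaxed = new UrlFilterSketch(acceptYahoo);

        System.out.println("stock rules accept it:   " + stock.accepts(question));
        System.out.println("relaxed rules accept it: " + relaxed.accepts(question));
    }
}
```

If this assumption holds for a given setup, removing or narrowing the `-[?*!@=]` line in the filter file actually in use would let such URLs through, at the cost of also admitting other query-string URLs.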

Re: running tests from the command line

2011-07-16 Thread lewis john mcgibbney

Further to this, I have been working on a JIRA ticket for this [1]. If you could, please test it. I will also test shortly, and hopefully we can get this committed soon. Thank you [1] https://issues.apache.org/jira/browse/NUTCH-672 On Tue, Jul 12, 2011 at 9:36 PM, lewis john mcgibbney

Re: modifying parse implementation

2011-07-16 Thread Cam Bazz

Hello, I did not understand ParseData.parseData. In ParseData there are getContentMeta and getParseMeta. There is also a getMeta(String string); it appears that there is no setter for this. There is also setParseMeta, but it appears content meta is not settable. Best Regards, C.B. On

Re: skipping invalid segments nutch 1.3

2011-07-16 Thread Leo Subscriptions

I've used crawl to ensure the config is correct and I don't get any errors, so I must be doing something wrong with the individual steps, but I can't see what.

Re: modifying parse implementation

2011-07-16 Thread Joye

Hello, You could put the features into ParseData by calling parseData.getParseMeta().set(features, valueOfFeatures); When you want to use it, call parseData.getParseMeta().get(features) to get it out, the same way you would use a Java Map. There is no need to call the setter method. :-) Regards, Joey
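
The pattern in this reply can be sketched as follows: because the getter hands back a mutable metadata object, values can be stored on it directly and no setter is needed. This is a minimal illustration, not Nutch code. `MetaSketch`, `ParseDataSketch` and `ParseMetaDemo` are hypothetical stand-ins that only mirror the shape of the calls quoted above, and the key and value strings are placeholders.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins, NOT the Nutch classes: they only mirror the shape of
// the quoted calls getParseMeta().set(...) and getParseMeta().get(...).
class MetaSketch {
    private final Map<String, String> values = new HashMap<>();

    void set(String name, String value) {
        values.put(name, value);
    }

    // Returns null when the name was never set, like Map.get.
    String get(String name) {
        return values.get(name);
    }
}

class ParseDataSketch {
    private final MetaSketch parseMeta = new MetaSketch();

    // No setter: callers mutate the object returned by this getter.
    MetaSketch getParseMeta() {
        return parseMeta;
    }
}

class ParseMetaDemo {
    public static void main(String[] args) {
        ParseDataSketch parseData = new ParseDataSketch();

        // Store a value while parsing (placeholder key and value).
        parseData.getParseMeta().set("features", "valueOfFeatures");

        // Read it back later from the same object.
        System.out.println(parseData.getParseMeta().get("features"));
    }
}
```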

12 matches
