nutch and mail

2011-06-24 Thread Alexey Tsoy
When Nutch writes data into a MySQL database, I get this error: java.io.IOException: java.sql.BatchUpdateException: Data truncation: Data too long for column 'id' at row 1. The id column holds the inverted URL (the standard key). Has anyone encountered this problem? Thanks! MySQL version: 5.1

Problem in search

2011-06-24 Thread Jefferson
My problem is with search. I crawled the page http://en.wikipedia.org/wiki/Albert_Einstein. When I open http://localhost:8080/nutch-1.1/ and type Adolf Hitler, it returns a result, which is fine. When I type phenomena, it returns 0 results, which is not. My config files and logs are attached. Thanks

Re: Depth-first crawling

2011-06-24 Thread Gabriele Kahlout
In the wiki there is an example of how you can write your own parser plugin (class). Essentially, by implementing the interface you will receive the page text from Nutch, and there you start counting the terms (re-invent the wheel, copy-paste from Solr code, or find a library to do that). On Thu, Jun
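The "re-invent the wheel" route mentioned above could look roughly like the following. This is a minimal sketch in plain Java, not Nutch or Solr code: the class name `TermCounter`, the method `countTerms`, and the naive tokenization (lowercase, split on anything that is not a letter or digit) are all assumptions made for illustration. A real parser plugin would call something like this from inside its implementation of the Nutch interface, which is not shown here.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Nutch: counts term occurrences in page text.
public class TermCounter {

    // Returns a map from lowercased term to the number of times it occurs.
    // Assumption: a "term" is any maximal run of letters or digits.
    public static Map<String, Integer> countTerms(String text) {
        Map<String, Integer> counts = new HashMap<>();
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        for (String token : tokens) {
            if (token.isEmpty()) {
                continue; // split() yields an empty token when the text starts with a separator
            }
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
                countTerms("Einstein developed relativity. Relativity changed physics.");
        System.out.println("relativity: " + counts.get("relativity"));
    }
}
```

A library analyzer would handle stemming, stop words, and non-Latin scripts far better than this split; the sketch only shows the shape of the counting step.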

Re: Problem in search

2011-06-24 Thread lewis john mcgibbney
Hi Jefferson, I cannot access either your nutch-site or nutch-default, but I see from your log that your http.content.limit is: INFO http.Http - http.content.limit = 65536. It is a fairly large page, so maybe this is the cause. I'm sorry, I don't have access to my Linux workstation so I can't test this myself; can you

Re: Problem in search

2011-06-24 Thread Markus Jelsma
That might be it. The page says Content-Length: 340671. Hi Jefferson, I cannot access either your nutch-site or nutch-default, but I see from your log that your http.content.limit is: INFO http.Http - http.content.limit = 65536. It is a fairly large page, so maybe this is the cause. I'm sorry, I don't
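To make the suspected cause concrete: if only the first 65536 bytes of a 340671-byte page are kept, any term that first appears after that offset never reaches the index. The sketch below illustrates that effect in plain Java. It is not Nutch's fetcher code; the class `ContentLimitDemo` and its methods are hypothetical, and the rule that a negative limit means "no truncation" follows the description of http.content.limit in nutch-default.xml.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration, not Nutch code: shows how a byte limit hides late terms.
public class ContentLimitDemo {

    // Keeps at most `limit` bytes of the content; a negative limit disables truncation.
    public static byte[] truncate(byte[] content, int limit) {
        if (limit >= 0 && content.length > limit) {
            return Arrays.copyOf(content, limit);
        }
        return content;
    }

    // Returns true if the term occurs in the (possibly truncated) content.
    public static boolean containsTerm(byte[] content, String term) {
        return new String(content, StandardCharsets.UTF_8).contains(term);
    }

    public static void main(String[] args) {
        // Build a fake page whose only occurrence of "phenomena" lies beyond byte 65536.
        StringBuilder page = new StringBuilder();
        while (page.length() < 70000) {
            page.append("filler ");
        }
        page.append("phenomena");
        byte[] raw = page.toString().getBytes(StandardCharsets.UTF_8);

        System.out.println("limit 65536: " + containsTerm(truncate(raw, 65536), "phenomena"));
        System.out.println("no limit:    " + containsTerm(truncate(raw, -1), "phenomena"));
    }
}
```

If this is indeed the cause, raising http.content.limit in nutch-site.xml (or setting it to a negative value) and re-crawling the page should make the missing terms searchable.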

Re: Problem in search

2011-06-24 Thread Jefferson
Done. Now I have another problem: I type phenomena and it returns this: - Albert Einstein - Wikipedia, the free encyclopedia Albert Einstein From Wikipedia, the free encyclopedia Jump ... - What might be happening? Thanks for the help. My configuration files are below:

Apache Nutch 1.3 tutorial now on Wiki

2011-06-24 Thread lewis john mcgibbney
Hi all, With permission from the author I managed to adapt a blog entry for the above which can be found here. At this stage I would ask for anyone interested to make changes/improvements/etc. Once we can verify the integrity and accuracy of the entry it would be nice to rebuild the website with

Re: Problem in search

2011-06-24 Thread lewis john mcgibbney
Can you expand on this? I don't understand your description of the problem. On Fri, Jun 24, 2011 at 12:52 PM, Jefferson jeff151520...@msn.com wrote: Done. Now I have another problem: I type phenomena and it returns this: - Albert Einstein - Wikipedia, the free encyclopedia Albert