Re: Nutch 2.x : readdb command dump

2013-01-16 Thread kiran chitturi
On Thu, Jan 17, 2013 at 12:39 AM, Lewis John Mcgibbney < lewis.mcgibb...@gmail.com> wrote: > Hi Kiran, > > On Wednesday, January 16, 2013, kiran chitturi > wrote: > > > > Can I make changes to this parameter ALL_FIELDS and then try to dump the > > fields based on the user input? This command mig

Re: Nutch 2.x : readdb command dump

2013-01-16 Thread Lewis John Mcgibbney
Hi Kiran, On Wednesday, January 16, 2013, kiran chitturi wrote: > > Can I make changes to this parameter ALL_FIELDS and then try to dump the > fields based on the user input? This command might look like './bin/nutch > readdb -dump baseUrl $OUTPUT'. I assume by $OUTPUT you mean the field to pas

Re: confirm subscribe to user@nutch.apache.org

2013-01-16 Thread cocofan
On Wed, Jan 16, 2013, at 12:53 PM, user-h...@nutch.apache.org wrote: > Hi! This is the ezmlm program. I'm managing the > user@nutch.apache.org mailing list. > > I'm working for my owner, who can be reached > at user-ow...@nutch.apache.org. > > To confirm that you would like > >coco...@mail

Re: Nutch - ElasticSearch example

2013-01-16 Thread Stanislav Orlenko
One more thing: I did not manage to get Nutch 2.1 working with the newest version of ElasticSearch (0.20.2), although I am sure that this version of Nutch works fine with elasticsearch-0.19.10. If I remember correctly, Nutch 2.1 cannot discover the ElasticSearch cluster and exits on timeout. On Wed, Jan 16, 201

RE: Wrong ParseData in segment

2013-01-16 Thread Markus Jelsma
Hi Sebastian, Makes sense, I'll be sure to modify the parser plugins. Perhaps it would be worth trying to make sure a single thread uses a single instance. I don't know why it works the way it does. Judging from the thread you pointed to, it's intended behaviour. On the other hand, reusing parser plug

Re: Nutch 2.x : readdb command dump

2013-01-16 Thread kiran chitturi
Hi Lewis, Thanks for your suggestions. I am looking at WebTableReader to make changes, particularly at line 319 [0]. There the query fields are set and the parameter ALL_FIELDS from WebPage is passed. Can I make changes to this parameter ALL_FIELDS and then try to dump the fields based on the use

Re: Wrong ParseData in segment

2013-01-16 Thread Sebastian Nagel
Hi Markus, > However, I assumed the plugins were already in a thread-safe environment > because each > FetcherThread instance has its own instance of ParseUtil. I had similar assumptions, but the debug output to investigate my problem is straightforward (the numbers are object hash codes): 2013-

Re: Nutch 2.x : readdb command dump

2013-01-16 Thread Lewis John Mcgibbney
Hi Kiran, For this I think you are looking at diving further into the Gora API and codebase. As you can see around line 232 [0], the Query is set and executed based on the key. What you wish to do would possibly involve setting fields via the Gora Query API. There are some other useful methods i

Re: nutch/util/NodeWalker class is not thread safe

2013-01-16 Thread Sebastian Nagel
Hi, > Any ideas if this can cause problems Yes, it can definitely cause problems. I've just observed such a problem in our custom plugin which traverses the DOM tree to extract nodes by CSS3 selectors. > and how to make it thread safe? That's hard if not impossible. The inner states (current node

nutch/util/NodeWalker class is not thread safe

2013-01-16 Thread alxsss
Hello, I use the NodeWalker class at src/java/org/apache/nutch/util/NodeWalker.java in one of our plugins. I noticed this comment above the class definition: "//Currently this class is not thread safe. It is assumed that only one thread will be accessing the NodeWalker at any given time."
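
A minimal sketch of the usual way to live with a walker that is not thread safe: create a new walker inside each call, so its traversal state (here a stack of pending nodes) is never shared between threads. SimpleWalker below is a hypothetical stand-in written only for this example, not Nutch's NodeWalker, and the sketch assumes each thread also works on its own DOM document.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class WalkerPerCall {

    // Hypothetical stateful walker (NOT Nutch's NodeWalker): the stack is
    // mutable traversal state, which is why one instance must not be shared.
    static final class SimpleWalker {
        private final Deque<Node> stack = new ArrayDeque<>();

        SimpleWalker(Node root) {
            stack.push(root);
        }

        boolean hasNext() {
            return !stack.isEmpty();
        }

        Node nextNode() {
            Node current = stack.pop();
            NodeList children = current.getChildNodes();
            // Push children in reverse so they are visited in document order.
            for (int i = children.getLength() - 1; i >= 0; i--) {
                stack.push(children.item(i));
            }
            return current;
        }
    }

    // The walker is a local variable: every call (and therefore every
    // thread) gets its own instance and its own traversal state.
    static List<String> elementNames(Node root) {
        SimpleWalker walker = new SimpleWalker(root);
        List<String> names = new ArrayList<>();
        while (walker.hasNext()) {
            Node node = walker.nextNode();
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                names.add(node.getNodeName());
            }
        }
        return names;
    }

    // Helper for the example only: parse a small XML string into a DOM.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<a><b><c/></b><d/></a>");
        System.out.println(elementNames(doc));
    }
}
```

The point of the design is that nothing stateful is stored in a field of the plugin: keeping a walker (or any per-document object) in an instance or class variable is what makes it visible to more than one thread.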

RE: Wrong ParseData in segment

2013-01-16 Thread Markus Jelsma
Sebastian! I thought about that too, since I do sometimes use class variables in some parse plugins, such as storing the Parse object. However, I assumed the plugins were already in a thread-safe environment because each FetcherThread instance has its own instance of ParseUtil. I'll modify the

Re: Wrong ParseData in segment

2013-01-16 Thread Sebastian Nagel
Hi Markus, right now I have seen this problem in a small test set of 20 documents: - various document types (HTML, PDF, XLS, zip, doc, ods) - small and quite large docs (up to 12 MB) - local docs via protocol-file - fetcher.parse = true - Nutch 1.4, local mode Somehow metadata from one doc slip

Re: Nutch - ElasticSearch example

2013-01-16 Thread Lewis John Mcgibbney
It should be added that currently this functionality is available only on the 2.x branch, courtesy of Ferdy. Lewis On Wednesday, January 16, 2013, Stanislav Orlenko wrote: > Hi > bin/nutch elasticindex $elasticClusterName -reindex > it is enough for me > use "bin/nutch elasticindex" for the help o

Re: Nutch - ElasticSearch example

2013-01-16 Thread Stanislav Orlenko
Hi, bin/nutch elasticindex $elasticClusterName -reindex is enough for me. Use "bin/nutch elasticindex" for the help output. On Wed, Jan 16, 2013 at 10:32 AM, Anand Bhagwat wrote: > Hi, > I am looking for some documentation around Nutch / ElasticSearch > integration. Please let me know if you guy

Nutch - ElasticSearch example

2013-01-16 Thread Anand Bhagwat
Hi, I am looking for some documentation on Nutch / ElasticSearch integration. Please let me know if you have any examples. Regards, Anand.