On Thu, Jan 17, 2013 at 12:39 AM, Lewis John Mcgibbney <
lewis.mcgibb...@gmail.com> wrote:
> Hi Kiran,
>
> On Wednesday, January 16, 2013, kiran chitturi
> wrote:
> >
> > Can I make changes to this parameter ALL_FIELDS and then try to dump the
> > fields based on the user input? This command mig
Hi Kiran,
On Wednesday, January 16, 2013, kiran chitturi
wrote:
>
> Can I make changes to this parameter ALL_FIELDS and then try to dump the
> fields based on the user input? This command might look like './bin/nutch
> readdb -dump baseUrl $OUTPUT'.
I assume by $OUTPUT you mean the field to pas
On Wed, Jan 16, 2013, at 12:53 PM, user-h...@nutch.apache.org wrote:
> Hi! This is the ezmlm program. I'm managing the
> user@nutch.apache.org mailing list.
>
> I'm working for my owner, who can be reached
> at user-ow...@nutch.apache.org.
>
> To confirm that you would like
>
>coco...@mail
One more thing: I did not manage to connect Nutch 2.1 to the newest version
of Elasticsearch (0.20.2), although I am sure that this version of Nutch
works fine with elasticsearch-0.19.10.
If I remember correctly, Nutch 2.1 cannot discover the Elasticsearch cluster and
exits with a timeout.
On Wed, Jan 16, 201
Hi Sebastian,
Makes sense, I'll be sure to modify the parser plugins. Perhaps it would be
worth trying to make sure a single thread uses a single instance. I don't know
why it works the way it does. Judging from the referenced thread, it's intended
behaviour.
On the other side, reusing parser plug
Hi Lewis,
Thanks for your suggestions. I am looking at WebTableReader to make
changes, particularly at line 319 [0]. There the query fields are set and
the parameter ALL_FIELDS from webpage is passed.
Can I make changes to this parameter ALL_FIELDS and then try to dump the
fields based on the use
Hi Markus,
> However, I assumed the plugins were already in a thread-safe environment
> because each
> FetcherThread instance has its own instance of ParseUtil.
I had similar assumptions, but the debug output to investigate my problem is
straightforward
(the numbers are object hash codes):
2013-
Hi Kiran,
For this I think you are looking at diving further into the Gora API and
codebase.
As you can see around line 232 [0], the Query is set and executed based on
the key.
What you wish to do would possibly encompass setting fields via the Gora
Query API. There are some other useful methods i
Hi,
> Any ideas if this can cause problems
Yes, it can definitely cause problems. I've just observed such a problem
in our custom plugin which traverses the DOM tree to extract nodes by CSS3
selectors.
> and how to make it thread safe?
That's hard if not impossible. The inner states
(current node
Hello,
I use this class NodeWalker at src/java/org/apache/nutch/util/NodeWalker.java
in one of our plugins. I noticed this comment
"//Currently this class is not thread safe. It is assumed that only one
thread will be accessing the NodeWalker at any given time."
above the class definition.
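
A minimal sketch of one common way to live with a walker that is not thread
safe: never store it in a field, and build a fresh one inside each call so its
traversal state stays confined to the calling thread. SimpleWalker and
ElementNameCollector below are hypothetical stand-ins written for this
illustration; they are not Nutch's NodeWalker or any Nutch plugin, and only the
JDK's org.w3c.dom API is used.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical stateful walker (NOT Nutch's NodeWalker): the stack below is
// mutable traversal state, so one instance must not be shared between threads.
final class SimpleWalker {
    private final Deque<Node> pending = new ArrayDeque<>();

    SimpleWalker(Node root) {
        pending.push(root);
    }

    boolean hasNext() {
        return !pending.isEmpty();
    }

    Node nextNode() {
        Node current = pending.pop();
        NodeList children = current.getChildNodes();
        // Push children in reverse so they are visited in document order.
        for (int i = children.getLength() - 1; i >= 0; i--) {
            pending.push(children.item(i));
        }
        return current;
    }
}

// Hypothetical helper standing in for plugin code that traverses a DOM tree.
final class ElementNameCollector {
    // No walker field here: each call creates its own walker, so the
    // traversal state is only ever touched by the calling thread.
    static List<String> elementNames(Node root) {
        SimpleWalker walker = new SimpleWalker(root);
        List<String> names = new ArrayList<>();
        while (walker.hasNext()) {
            Node node = walker.nextNode();
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                names.add(node.getNodeName());
            }
        }
        return names;
    }

    // Parses a small XML string into a DOM document for the example.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse example XML", e);
        }
    }
}
```

This only confines the walker's own state. It assumes each thread also works on
its own DOM document, since DOM implementations are generally not safe to share
across threads either.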
Sebastian!
I thought about that too, since I do sometimes use class variables in some parse
plugins, such as storing the Parse object. However, I assumed the plugins were
already in a thread-safe environment because each FetcherThread instance has
its own instance of ParseUtil.
I'll modify the
Hi Markus,
right now I have seen this problem in a small test set of 20 documents:
- various document types (HTML, PDF, XLS, zip, doc, ods)
- small and quite large docs (up to 12 MB)
- local docs via protocol-file
- fetcher.parse = true
- Nutch 1.4, local mode
Somehow metadata from a one doc slip
It should be added that currently this functionality is available only on
the 2.x branch courtesy of Ferdy.
Lewis
On Wednesday, January 16, 2013, Stanislav Orlenko
wrote:
> Hi
> bin/nutch elasticindex $elasticClusterName -reindex
> it is enough for me
> use "bin/nutch elasticindex" for the help o
Hi
bin/nutch elasticindex $elasticClusterName -reindex
it is enough for me
use "bin/nutch elasticindex" for the help output
On Wed, Jan 16, 2013 at 10:32 AM, Anand Bhagwat wrote:
> Hi,
> I am looking for some documentation around Nutch / ElasticSearch
> integration. Please let me know if you guy
Hi,
I am looking for some documentation around Nutch / ElasticSearch
integration. Please let me know if you guys have some examples for the same.
Regards,
Anand.