I used an existing ThaiAnalyzer from the Lucene package.
OK, I renamed lucene.analysis.th.* to nutch.analysis.th.*, compiled, and
placed all class files in a jar, analysis-th.jar (do I need to bundle the
ngp file in the jar as well?)
1. You don't have to refactor the lucene analyzer.
Hi guys,
I have a few questions regarding the way nutch indexes and the best way a
recrawl can be implemented.
1. Why does Nutch have to create a new index every time it indexes,
when it could just merge into the old existing index? I try to change the
value in the IndexMerger cla
[
http://issues.apache.org/jira/browse/NUTCH-339?page=comments#action_12454045 ]
Sami Siren commented on NUTCH-339:
--
I am running with 300 threads, and in parsing mode
thread dump shows:
191 threads waiting on condition
at java.lang.Thread.slee
[
http://issues.apache.org/jira/browse/NUTCH-339?page=comments#action_12453989 ]
Andrzej Bialecki commented on NUTCH-339:
-
Ah, we are getting somewhere ... fetchQueues.totalSize=0 means that all input
entries from the queues have been p
[
http://issues.apache.org/jira/browse/NUTCH-339?page=comments#action_12453975 ]
Sami Siren commented on NUTCH-339:
--
Perhaps that exception is just a consequence of something else, like this:
2006-11-27 07:35:09,434 INFO fetcher.Fetcher2 - -a
new field's data is also stored as metadata: the value is assigned during the
parse process, and then during indexing it reads the metadata field value and
adds it to the index. Looks like I will have to run parse and index again.
Thanks much.
On 11/28/06, Gal Nitzan <[EMAIL PROTECTED]> wrote:
Hi,
You
[
http://issues.apache.org/jira/browse/NUTCH-407?page=comments#action_12453934 ]
Chris A. Mattmann commented on NUTCH-407:
-
I'm not entirely sure what the right answer to this is. One thing that I do know
is that a colleague at my own wor
[
http://issues.apache.org/jira/browse/NUTCH-407?page=comments#action_12453932 ]
Alan Tanaman commented on NUTCH-407:
In our team we feel that this patch would have been beneficial in practical
terms. In the context of the enterprise intell
Hi,
You do not mention whether the new field's data is stored as metadata. Is the
value created during parse, or is it added only during the index
phase?
If your new field is created during the parse process, then you could delete
only the parse folders and run the parse process i.e. (del
Hi All,
Is it possible to update the index without refetching everything? I have
changed the logic of one of my plugins (which also sets a custom field in the
index) - and I would like this field to get updated without refetching
everything - is it doable?
Thanks,
[
http://issues.apache.org/jira/browse/NUTCH-233?page=comments#action_12453919 ]
Sean Dean commented on NUTCH-233:
-
Could I suggest that this change, from ".*(/.+?)/.*?\1/.*?\1/" to
".*(/[^/]+)/[^/]+\1/[^/]+\1/" be committed to at least trunk fo
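
For illustration only, here is a small Java sketch that runs the two patterns quoted above against a few made-up URLs. The class name, the `hits` helper, and the example.com URLs are all hypothetical, and the sketch assumes the filter accepts a match anywhere in the URL (`find()` semantics); the comment itself does not say how the rule is applied.

```java
import java.util.regex.Pattern;

public class RepeatedSegmentDemo {
    // Pattern quoted in the comment as the current rule.
    static final Pattern OLD = Pattern.compile(".*(/.+?)/.*?\\1/.*?\\1/");
    // Pattern quoted in the comment as the proposed replacement.
    static final Pattern NEW = Pattern.compile(".*(/[^/]+)/[^/]+\\1/[^/]+\\1/");

    // Assumption: a rule "hits" when the pattern is found anywhere in the URL.
    static boolean hits(Pattern p, String url) {
        return p.matcher(url).find();
    }

    public static void main(String[] args) {
        // Hypothetical URLs chosen to show where the two patterns agree and differ.
        String[] urls = {
            "http://example.com/a/b/a/c/a/",   // "/a" recurs with one segment in between
            "http://example.com/a/x/y/a/z/a/", // "/a" recurs, but with two segments in one gap
            "http://example.com/a/b/c/"        // no repeated segment
        };
        for (String url : urls) {
            System.out.println(url + " old=" + hits(OLD, url) + " new=" + hits(NEW, url));
        }
    }
}
```

In this sketch the replacement pattern only flags a path segment that recurs three times with exactly one other segment between occurrences, while the original also flags repeats separated by longer stretches of the path. Replacing `.+?` and `.*?` with `[^/]+` also keeps each piece from spanning a slash, which leaves the regex engine fewer ways to backtrack.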
[
http://issues.apache.org/jira/browse/NUTCH-339?page=comments#action_12453820 ]
Andrzej Bialecki commented on NUTCH-339:
-
This looks weird; if anything, it seems to be caused by a bug in Hadoop. Are
you able to run "readseg -dump" on