Hello,
After playing around with Nutch for a few months, I was trying to implement
the Thai language analyzer for Nutch.
I downloaded the Subversion version and compiled it using ant - everything was fine.
Next - I didn't see any tutorial for Thai - but I did see one for Chinese at
http://issues.apach
Hi,
I need to modify the Nutch Indexer class because it would be very
useful for me to add some fields to the generated Lucene index. I experimented
and found out that it is possible to add fields to the Document with
doc.addField() in the reduce function. My point is that for those fields
I need the h
Hi,
My configuration was as suggested by Dennis Kubes in the nutch-user forum, but I
still have the problem.
I think the problem was fixed for the http protocol with NUTCH-344 and the
configuration:
http.max.delays
30
but setting the configuration:
fetcher.max.crawl.delay
30
don
I think you should learn JavaCC, then understand NutchAnalysis.jj;
then the Thai problem will be resolved soon.
Just try it.
On 11/7/06, sanjeev <[EMAIL PROTECTED]> wrote:
Hello,
After playing around with Nutch for a few months, I was trying to implement
the Thai language analyzer for Nutch.
Downlo
Javier P. L. wrote:
Hi,
I need to modify the Nutch Indexer class because it would be very
useful for me to add some fields to the generated Lucene index. I experimented
and found out that it is possible to add fields to the Document with
doc.addField() in the reduce function. My point is that for tho
[ http://issues.apache.org/jira/browse/NUTCH-389?page=all ]
Enis Soztutar updated NUTCH-389:
Attachment: urlTokenizer-improved.diff
This is an improvement and a minor bug fix over the previous url tokenizer.
This version first replaces characters, which
[
http://issues.apache.org/jira/browse/NUTCH-393?page=comments#action_12447787 ]
Enis Soztutar commented on NUTCH-393:
-
Also, IndexingException is caught by the Indexer, in which case the whole
document is not added to the writer (the funct
porting clustering-carrot2 plugin to carrot2 v2.0
-
Key: NUTCH-397
URL: http://issues.apache.org/jira/browse/NUTCH-397
Project: Nutch
Issue Type: Improvement
Reporter: Doğacan Güney
[ http://issues.apache.org/jira/browse/NUTCH-397?page=all ]
Doğacan Güney updated NUTCH-397:
Attachment: clustering-carrot2-lib.tar.gz
carrot2-nutch-plugin.patch
clustering.patch
> porting clustering-carrot2 plugin to carr
[
http://issues.apache.org/jira/browse/NUTCH-393?page=comments#action_12447939 ]
Eelco Lempsink commented on NUTCH-393:
--
I'm not sure I agree with that. After running a document through a set of
filters you'd expect all filters to have run. If not,
map-reduce very slow when crawling on single server
---
Key: NUTCH-398
URL: http://issues.apache.org/jira/browse/NUTCH-398
Project: Nutch
Issue Type: Bug
Components: fetcher
Affec
Could you give me more details on how to use JavaCC, and what to use it for?
Am I supposed to compile the file? I did and got some errors and warnings.
For Chinese I have to modify NutchAnalysis.jj and add some tokens - no?
For Thai I read in one of Otis Gospodnetic's posts that I have to add the
tag to
Oh btw - I followed the Chinese tutorial and was able to compile, and
everything was fine.
Lemme just test if it is working properly - however, I didn't make any
changes to NutchAnalysis.jj
I need more information please.
Thanks a bunch.
--
View this message in context:
http://www.nabble.com/i
Hi sanjeev and Kauu,
I want to support Hindi, a language widely spoken in India.
Can you guide me on what else I need to modify? I think there is no support for
searching and indexing Hindi.
I want to work on this, but I need some information on what
to modify and where exactly
[
http://issues.apache.org/jira/browse/NUTCH-398?page=comments#action_12448033 ]
nutch.newbie commented on NUTCH-398:
FYI
It's more of a Hadoop bug...
http://issues.apache.org/jira/browse/HADOOP-206
Seems like the bug is not highly prioriti
[
http://issues.apache.org/jira/browse/NUTCH-398?page=comments#action_12448053 ]
Uros Gruber commented on NUTCH-398:
---
Did anyone try using a single machine, not in local mode but with Nutch
acting as a single node? Maybe this is a workaround ti
Arun
I'm sure there is (or must be) a patch for Hindi too.
I saw something on the forum about the Marathi language.
However, there is no documentation anywhere for these things.
I'm assuming that in the pluggable architecture of Nutch the support for one
language is the
same as for any other