Implement Thai language analyzer in Nutch

2006-11-07 Thread sanjeev
Hello, after playing around with Nutch for a few months, I was trying to implement the Thai language analyzer for Nutch. I downloaded the Subversion version and compiled it using Ant; everything was fine. Next: I didn't see any tutorial for Thai, but I did see one for Chinese at http://issues.apach

Modifying Nutch Indexer

2006-11-07 Thread Javier P. L.
Hi, I need to modify the Nutch Indexer class because it would be very useful for me to add some fields to the generated Lucene index. While experimenting, I found out that it is possible to add fields to the Document with doc.addField() in the reduce function. My point is that for those fields I need the h

Re: Fetcher freezes

2006-11-07 Thread Aisha
Hi, my configuration was as Dennis Kubes suggested in the nutch-user forum, but I still have the problem. I think the problem was fixed for the http protocol with NUTCH-344 and the setting http.max.delays = 30, but setting fetcher.max.crawl.delay = 30 don

Re: Implement Thai language analyzer in Nutch

2006-11-07 Thread kauu
I think you should learn JavaCC, then understand NutchAnalysis.jj; then the Thai problem will be resolved soon. Just try it. On 11/7/06, sanjeev <[EMAIL PROTECTED]> wrote: Hello, after playing around with Nutch for a few months, I was trying to implement the Thai language analyzer for Nutch. Downlo

Re: Modifying Nutch Indexer

2006-11-07 Thread Enis Soztutar
Javier P. L. wrote: Hi, I need to modify the Nutch Indexer class because it would be very useful for me to add some fields to the generated Lucene index. While experimenting, I found out that it is possible to add fields to the Document with doc.addField() in the reduce function. My point is that for tho
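In Nutch, fields are usually added at index time through indexing-filter plugins (such as index-basic) rather than by editing Indexer.reduce() directly. The Java sketch below models that pattern without Nutch on the classpath. The FieldFilter interface, the LanguageFieldFilter class and the "lang" metadata key are hypothetical, and a Map stands in for the Lucene Document; it does not reproduce Nutch's actual IndexingFilter signature.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldFilterSketch {

    /** Hypothetical stand-in for an indexing filter: receives the document fields and page metadata. */
    interface FieldFilter {
        Map<String, String> filter(Map<String, String> doc, Map<String, String> pageMeta);
    }

    /** Hypothetical filter: copies a "lang" metadata entry into its own index field, if present. */
    static class LanguageFieldFilter implements FieldFilter {
        public Map<String, String> filter(Map<String, String> doc, Map<String, String> pageMeta) {
            String lang = pageMeta.get("lang");
            if (lang != null) {
                doc.put("lang", lang);
            }
            return doc;
        }
    }

    /** Runs every filter in order, as an indexer would before writing the document. */
    static Map<String, String> applyAll(List<FieldFilter> filters,
                                        Map<String, String> doc,
                                        Map<String, String> pageMeta) {
        for (FieldFilter f : filters) {
            doc = f.filter(doc, pageMeta);
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("url", "http://example.com/");
        Map<String, String> meta = new LinkedHashMap<>();
        meta.put("lang", "th");
        List<FieldFilter> filters = new ArrayList<>();
        filters.add(new LanguageFieldFilter());
        System.out.println(applyAll(filters, doc, meta)); // {url=http://example.com/, lang=th}
    }
}
```

Keeping each extra field in its own filter means the indexer loop itself never changes; new fields are added by registering another filter.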

[jira] Updated: (NUTCH-389) a url tokenizer implementation for tokenizing index fields : url and host

2006-11-07 Thread Enis Soztutar (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-389?page=all ] Enis Soztutar updated NUTCH-389: Attachment: urlTokenizer-improved.diff. This is an improvement and a minor bug fix over the previous URL tokenizer. This version first replaces characters, which
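The idea behind a URL tokenizer for the url and host fields is that a URL should be indexed as its separate parts rather than as one opaque string. The Java sketch below is a simplified illustration of that idea, assuming that every character which is not a letter or digit acts as a separator; it is not the algorithm in urlTokenizer-improved.diff, whose details are cut off above, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class UrlTokenizerSketch {

    /**
     * Splits a URL or host name into lowercase tokens, treating every character
     * that is not a letter or digit (':', '/', '.', '?', '=', '-', ...) as a separator.
     */
    static List<String> tokenize(String url) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < url.length(); i++) {
            char c = url.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString()); // flush the last token
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("http://issues.apache.org/jira/browse/NUTCH-389"));
        // [http, issues, apache, org, jira, browse, nutch, 389]
    }
}
```

With tokens like these, a query for "apache" or "nutch" can match the url field, which a single untokenized URL string would not allow.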

[jira] Commented: (NUTCH-393) Indexer doesn't handle null documents returned by filters

2006-11-07 Thread Enis Soztutar (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-393?page=comments#action_12447787 ] Enis Soztutar commented on NUTCH-393: Also, IndexingException is caught by the Indexer, in which case the whole document is not added to the writer (the funct

[jira] Created: (NUTCH-397) porting clustering-carrot2 plugin to carrot2 v2.0

2006-11-07 Thread JIRA
porting clustering-carrot2 plugin to carrot2 v2.0. Key: NUTCH-397, URL: http://issues.apache.org/jira/browse/NUTCH-397, Project: Nutch, Issue Type: Improvement, Reporter: Doğacan Güney

[jira] Updated: (NUTCH-397) porting clustering-carrot2 plugin to carrot2 v2.0

2006-11-07 Thread JIRA
[ http://issues.apache.org/jira/browse/NUTCH-397?page=all ] Doğacan Güney updated NUTCH-397: Attachment: clustering-carrot2-lib.tar.gz, carrot2-nutch-plugin.patch, clustering.patch. > porting clustering-carrot2 plugin to carr

[jira] Commented: (NUTCH-393) Indexer doesn't handle null documents returned by filters

2006-11-07 Thread Eelco Lempsink (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-393?page=comments#action_12447939 ] Eelco Lempsink commented on NUTCH-393: I'm not sure I agree with that. After running a document through a set of filters, you'd expect all filters to have run. If not,
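NUTCH-393 is about what the indexer should do when a filter returns null instead of a document. One possible policy is to treat null as "drop this document": stop the chain and skip writing it, rather than passing null to the next filter. The Java sketch below shows that short-circuit policy using plain strings as documents and java.util.function.UnaryOperator as the filter type; these are stand-ins, not Nutch's IndexingFilters class, and the rejecting filter is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

public class FilterChainSketch {

    /**
     * Applies filters in order. A filter returning null means "drop this document":
     * the chain stops immediately and the caller receives null, so it can skip
     * writing the document instead of failing with a NullPointerException.
     */
    static String runFilters(List<UnaryOperator<String>> filters, String doc) {
        for (UnaryOperator<String> f : filters) {
            doc = f.apply(doc);
            if (doc == null) {
                return null; // remaining filters are not run
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        List<UnaryOperator<String>> filters = Arrays.<UnaryOperator<String>>asList(
                d -> d + "+basic",
                d -> d.contains("spam") ? null : d, // hypothetical filter that rejects a document
                d -> d + "+more");
        System.out.println(runFilters(filters, "doc1"));     // doc1+basic+more
        System.out.println(runFilters(filters, "spam-doc")); // null
    }
}
```

This sketch deliberately picks short-circuiting; the comments above debate whether all filters should still run, which would mean continuing the loop and only checking for null at the end.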

[jira] Created: (NUTCH-398) map-reduce very slow when crawling on single server

2006-11-07 Thread AJ Chen (JIRA)
map-reduce very slow when crawling on single server. Key: NUTCH-398, URL: http://issues.apache.org/jira/browse/NUTCH-398, Project: Nutch, Issue Type: Bug, Components: fetcher, Affec

Re: Implement Thai language analyzer in Nutch

2006-11-07 Thread sanjeev
Could you give me more details on how to use JavaCC and what to use it for? Am I supposed to compile the file? I did, and got some errors and warnings. For Chinese I have to modify NutchAnalysis.jj and add some tokens, right? For Thai, I read in one of Otis Gospodnetic's posts that I have to add the tag to
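Thai is written without spaces between words, so a tokenizer that splits on whitespace or on Unicode letter ranges alone sees a whole phrase as one token; word segmentation needs a dictionary. The JDK's java.text.BreakIterator word instance can do dictionary-based Thai segmentation in JDKs that ship the Thai locale data. The Java sketch below shows only that segmentation step, outside Nutch; wiring it into NutchAnalysis.jj or a Nutch analyzer plugin is not shown, and the class and method names are hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ThaiSegmenterSketch {

    /** Splits text into word tokens using the JDK's locale-sensitive word BreakIterator. */
    static List<String> segment(String text, Locale locale) {
        BreakIterator words = BreakIterator.getWordInstance(locale);
        words.setText(text);
        List<String> tokens = new ArrayList<>();
        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE; start = end, end = words.next()) {
            String token = text.substring(start, end);
            if (!token.trim().isEmpty()) { // skip whitespace-only segments
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "phasa thai" ("Thai language"), written with Unicode escapes and no space between words.
        String thai = "\u0E20\u0E32\u0E29\u0E32\u0E44\u0E17\u0E22";
        // The exact split depends on the JDK's Thai dictionary data.
        System.out.println(segment(thai, Locale.forLanguageTag("th")));
    }
}
```

If the running JDK lacks Thai break data, the whole string comes back as a single token, which is exactly the whitespace-tokenizer behaviour this is meant to avoid.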

Re: Implement Thai language analyzer in Nutch

2006-11-07 Thread sanjeev
Oh btw, I followed the Chinese tutorial and was able to compile, and everything was fine. Let me just test whether it is working properly. However, I didn't make any changes to NutchAnalysis.jj. I need more information, please. Thanks a bunch.

Re: Implement Thai language analyzer in Nutch

2006-11-07 Thread Arun Kaundal
Hi sanjeev and kauu, I want to support Hindi, a language widely spoken in India. Can you tell me what else I need to modify? I think there is currently no support for searching and indexing Hindi. I want to work on this, but I need some information on what to modify and where exactly
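Unlike Thai, Hindi separates words with spaces, so the subtle part is deciding which characters belong to a word. Devanagari occupies the Unicode block U+0900 to U+097F, and its vowel signs and virama are combining marks (categories Mn and Mc), not letters; a grammar or tokenizer that only accepts letters would cut Hindi words apart. Presumably a token definition in a JavaCC grammar like NutchAnalysis.jj would need to cover that range, marks included. The plain-Java sketch below illustrates the character rule only; it is not JavaCC and not Nutch code, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class DevanagariTokenizerSketch {

    /**
     * Word characters: letters, digits, and combining marks (Unicode categories Mn and Mc).
     * Devanagari vowel signs and the virama are combining marks, so a letters-only test
     * would split a Hindi word in the middle.
     */
    static boolean isWordChar(int cp) {
        int type = Character.getType(cp);
        return Character.isLetterOrDigit(cp)
                || type == Character.NON_SPACING_MARK
                || type == Character.COMBINING_SPACING_MARK;
    }

    /** Splits text into runs of word characters, iterating by code point. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (isWordChar(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "namaste duniya" ("hello world") in Devanagari, written with Unicode escapes.
        String hindi = "\u0928\u092E\u0938\u094D\u0924\u0947 \u0926\u0941\u0928\u093F\u092F\u093E";
        System.out.println(tokenize(hindi).size() + " tokens"); // 2 tokens
    }
}
```

With a letters-only rule, the same input would break into five fragments at the virama and vowel signs instead of two words.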

[jira] Commented: (NUTCH-398) map-reduce very slow when crawling on single server

2006-11-07 Thread nutch.newbie (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-398?page=comments#action_12448033 ] nutch.newbie commented on NUTCH-398: FYI, it's more of a Hadoop bug: http://issues.apache.org/jira/browse/HADOOP-206. It seems the bug is not highly prioriti

[jira] Commented: (NUTCH-398) map-reduce very slow when crawling on single server

2006-11-07 Thread Uros Gruber (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-398?page=comments#action_12448053 ] Uros Gruber commented on NUTCH-398: Has anyone tried using a single machine, not in local mode but with Nutch acting as a single node? Maybe this is a workaround ti

Re: Implement Thai language analyzer in Nutch

2006-11-07 Thread sanjeev
Arun, I'm sure there is (or must be) a patch for Hindi too. I saw something on the forum about the Marathi language. It's just that there is no documentation anywhere for these things. I'm assuming that, in Nutch's pluggable architecture, support for one language works the same way as for any other