Re: Re: [Nutch-dev] RE: A problem about Chinese word segment

2005-03-16 Thread Jason Tang
Weird! Nutch supports searching for Chinese characters. Can you print your query string in search.jsp? NOTE: the page should be encoded in UTF-8. /Jack === At 2005-03-17, 13:49:00 you wrote: === >I have added Chinese stopwords in String[] STOP_WORDS in NutchAnalysis.jj. >My problem is N
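
The UTF-8 note above can be illustrated with a small, language-neutral sketch. It is not Nutch or search.jsp code; the sample term and the variable names are assumptions made for illustration. It shows that a query sent as UTF-8 only round-trips when the receiving side also decodes it as UTF-8. With a mismatched charset, the decoded string no longer equals the original term and so could not match what was indexed.

```python
from urllib.parse import quote, unquote

# Hypothetical search term, chosen only for illustration.
query = "中文"

# What a browser would send from a page encoded in UTF-8:
# each character becomes three percent-encoded bytes.
encoded = quote(query, encoding="utf-8")

# Decoding with the matching charset restores the original term.
right = unquote(encoded, encoding="utf-8")

# Decoding the same bytes as ISO-8859-1 yields six unrelated characters.
wrong = unquote(encoded, encoding="latin-1")

print(encoded)
print(right == query)
print(wrong == query)
```

Printing the query string on the server side, as suggested above, is a quick way to see which of the two cases applies.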

Re: [Nutch-dev] RE: A problem about Chinese word segment

2005-03-16 Thread cao yuzhong
I have added Chinese stopwords in String[] STOP_WORDS in NutchAnalysis.jj. My problem is that Nutch returns nothing when I use any Chinese keywords, even though I can find these Chinese keywords in the index files (using Luke). From: "Jason Tang" <[EMAIL PROTECTED]> Reply-To: [EMAIL PROTECTED] To: "

Re: [Nutch-dev] RE: A problem about Chinese word segment

2005-03-16 Thread Jason Tang
Hi cao, I think the character "的" is a stopword in Chinese. I think NutchAnalysis.jj should load a different stopwords file depending on the language. /Jack === At 2005-03-17, 10:27:40 you wrote: === >No answer for this? >Any tips are appreciated. > >>From: "cao yuzhong" <[
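
The idea of language-dependent stopwords can be sketched as follows. This is only an illustration of the suggestion, not how NutchAnalysis.jj works; the table `STOP_WORDS`, the function `remove_stop_words`, and the sample word lists are all hypothetical.

```python
# Hypothetical per-language stopword sets; a real setup would load
# these from one file per language.
STOP_WORDS = {
    "en": {"the", "a", "of"},
    "zh": {"的"},
}


def remove_stop_words(tokens, lang):
    """Drop tokens that are stopwords for the given language code.

    Unknown languages fall back to an empty stopword set, so no
    tokens are removed for them.
    """
    stops = STOP_WORDS.get(lang, set())
    return [t for t in tokens if t not in stops]


# "的" is removed only when the text is treated as Chinese.
print(remove_stop_words(["我", "的", "书"], "zh"))
print(remove_stop_words(["the", "的"], "en"))
```

Keeping the sets separate avoids dropping a token that is a stopword in one language but meaningful in another.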

[Nutch-dev] RE: A problem about Chinese word segment

2005-03-16 Thread cao yuzhong
No answer for this? Any tips are appreciated. From: "cao yuzhong" <[EMAIL PROTECTED]> Reply-To: [EMAIL PROTECTED] To: [EMAIL PROTECTED] CC: [EMAIL PROTECTED] Subject: A problem about Chinese word segment Date: Tue, 15 Mar 2005 05:16:30 + Hi all, Now, Nutch-0.6 simply treats a Chinese character as

[Nutch-dev] Re: Starting a non-profit organisation running Nutch with a thousand or more sponsored servers

2005-03-16 Thread Stefan Groschupf
Have you collected these offers somewhere? Check the SourceForge mail archive. --- SF email is sponsored by - The IT Product Guide Read honest & candid reviews on hundreds of IT Products from real users. Discover which products truly live up to

[Nutch-dev] Re: Starting a non-profit organisation running Nutch with a thousand or more sponsored servers

2005-03-16 Thread Michael Wechner
Stefan Groschupf wrote: Let's do some calculation: 2 billion pages (Google has 8 billion): 100 kilobytes * 2 000 000 000 = 186.264515 terabytes per month; 1 * 100 MBit per month = 33.1776 TB; 186 / 33 = 5.6. The cheapest offer for 100 MBit I found was 1000 USD per month. So you pay 6000 USD per month

[Nutch-dev] Re: Starting a non-profit organisation running Nutch with a thousand or more sponsored servers

2005-03-16 Thread Stefan Groschupf
Let's do some calculation: 2 billion pages (Google has 8 billion): 100 kilobytes * 2 000 000 000 = 186.264515 terabytes per month; 1 * 100 MBit per month = 33.1776 TB; 186 / 33 = 5.6. The cheapest offer for 100 MBit I found was 1000 USD per month. So you pay 6000 USD per month just crawling without an
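
The arithmetic in this post can be reproduced with a short sketch. The page count, page size, link capacity and price are taken from the post; the binary unit sizes and the 30-day month in the cross-check are assumptions, and the variable names are illustrative only.

```python
import math

KIB = 1024          # bytes per kilobyte, binary units assumed
TIB = 1024 ** 4     # bytes per terabyte, binary units assumed

pages = 2_000_000_000
page_size_bytes = 100 * KIB

# Total crawl volume per month; matches the post's 186.264515 figure.
crawl_tib = pages * page_size_bytes / TIB

# Monthly capacity of one 100 MBit link, taken as given from the post.
link_tb_per_month = 33.1776

# Links needed, rounded up because a fraction of a link cannot be rented.
links = math.ceil(crawl_tib / link_tb_per_month)
monthly_cost_usd = links * 1000   # 1000 USD per link, from the post

# Cross-check under stated assumptions: 100 * 10**6 bit/s over 30 days,
# in decimal terabytes. This gives 32.4 TB, close to the post's figure.
decimal_link_tb = 100e6 / 8 * 30 * 86400 / 1e12

print(round(crawl_tib, 6), links, monthly_cost_usd)
```

Rounding up the number of links is what turns the ratio of about 5.6 into the 6000 USD per month quoted in the post.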

[Nutch-dev] [jira] Updated: (NUTCH-10) extension points are defined multiple times

2005-03-16 Thread Stefan Grroschupf (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-10?page=history ] Stefan Grroschupf updated NUTCH-10: --- Attachment: extensionpoint_patch_withplugin.txt Sorry, Murphy's law: I'm not that familiar with Subversion, and it works a bit differently. :-( Anyway the p

[Nutch-dev] Re: [jira] Updated: (NUTCH-10) extension points are defined multiple times

2005-03-16 Thread Stefan Groschupf
John, I'm very sorry, Murphy's law ... and maybe the new Subversion. Find a new file in the JIRA. Stefan Hi, Stefan, The patch does not seem to include the code of nutch-extensionpoints. Or am I missing something? Thanks, John On Wed, Mar 16, 2005 at 08:09:21PM +0100, Stefan Grroschupf (JIR

[Nutch-dev] Re: [jira] Updated: (NUTCH-10) extension points are defined multiple times

2005-03-16 Thread John X
Hi, Stefan, The patch does not seem to include the code of nutch-extensionpoints. Or am I missing something? Thanks, John On Wed, Mar 16, 2005 at 08:09:21PM +0100, Stefan Grroschupf (JIRA) wrote: > [ http://issues.apache.org/jira/browse/NUTCH-10?page=history ] > > Stefan Grroschupf updated

[Nutch-dev] Starting a non-profit organisation running Nutch with a thousand or more sponsored servers

2005-03-16 Thread Michael Wechner
Hi, I was recently thinking that it would be fun to start a non-profit organization in order to run Nutch as a really "transparent and open" search engine, very similar to, for instance, Google, but really focusing only on the search. Thanks to all Nutch devs the software is there or it's getting th

[Nutch-dev] [jira] Updated: (NUTCH-10) extension points are defined multiple times

2005-03-16 Thread Stefan Grroschupf (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-10?page=history ] Stefan Grroschupf updated NUTCH-10: --- Attachment: extenionPointPatch.txt patch that changes the plugin xml and creates a new nutch-extensionpoints plugin > extension points are defined mult

[Nutch-dev] [jira] Created: (NUTCH-10) extension points are defined multiple times

2005-03-16 Thread Stefan Grroschupf (JIRA)
extension points are defined multiple times
---
Key: NUTCH-10
URL: http://issues.apache.org/jira/browse/NUTCH-10
Project: Nutch
Type: Bug
Reporter: Stefan Grroschupf
Priority: Minor
Attachments: extenionPointPatch.txt

[Nutch-dev] extension points are defined multiple times

2005-03-16 Thread Stefan Groschupf
Hi developers, since we are in the process of organizing our code, I would love to see an issue fixed that I have seen as a minor bug for a long time. There are some extension points defined to extend the Nutch core; however, a plugin can define its own extension points as well to make the plugin exten

[Nutch-dev] API Docs link not working

2005-03-16 Thread Alonso Andres
The "API Docs" link (pointing to http://incubator.apache.org/nutch/apidocs/index.html) in the Documentation section of the Nutch site is not working. Where can I find these docs? Thanks

Re: [Nutch-dev] Re: a query-lang plugin

2005-03-16 Thread Stefan Groschupf
=> When I'm querying with the lang field, this doesn't work with nutch-0.6. So where is it? I'm going to make a Lucene Fuzzy Query plugin (to deal with "approximate:word" queries), so if it already exists, could you please tell me. We are talking about the language query filter plugin, right? On my local

Re: [Nutch-dev] Re: a query-lang plugin

2005-03-16 Thread Christophe Noel
Hello, Thanks a lot Kelvin Tan... Thanks to your tip I found it was just a silly thing (something in plugin.xml was not correct). Stefan: == There is already a lang query filter plugin within Nutch. === => When I'm querying with the lang field, this doesn't work with nutch-0.6. So where is it?

[Nutch-dev] Re: a query-lang plugin

2005-03-16 Thread Stefan Groschupf
There is already a lang query filter plugin within Nutch. It works great. Just make sure that the language detection plugin is loaded during indexing and that the 'lang' query filter is loaded in Tomcat as well. Please note that we are talking about 2 different nutch-default.xml you need to chang

[Nutch-dev] Re: a query-lang plugin

2005-03-16 Thread Kelvin Tan
Christophe, try running your query (from your db directory) using: bin/nutch net.nutch.searcher.Query What does it say? If there's a discrepancy between that and what's happening on Tomcat, then it means you need to sync your changes between Tomcat and Nutch home. On Wed, 16 Mar 2005 12:16:29 +0100,

[Nutch-dev] a query-lang plugin

2005-03-16 Thread Christophe Noel
Hello, As there is a lang field, I decided to make a query-lang plugin... but that doesn't work. Here is my procedure:
1. cp -R query-site query-lang
2. change every "site" into "lang" (class name, parameters, etc.)
3. change all build.xml files
4. remove all things about indexing
5. compile wit