Weird! Nutch does support searching for Chinese characters.
Can you print your query string in search.jsp?
NOTE: the page should be encoded in UTF-8.
/Jack
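One possible reason the page encoding matters, shown as a minimal Python sketch (not Nutch or JSP code): if the bytes of a UTF-8 query are decoded with a different charset, the Chinese keyword no longer matches what is in the index. The sample query and the choice of ISO-8859-1 as the wrong charset are assumptions for illustration.

```python
# Hypothetical query string; any Chinese keyword behaves the same way.
query = "中文"

# Bytes as a browser would send them from a UTF-8 encoded page.
utf8_bytes = query.encode("utf-8")

# Decoding with the wrong charset (assumed here: ISO-8859-1) garbles the text.
wrong = utf8_bytes.decode("iso-8859-1")

# Decoding with UTF-8 recovers the original keyword.
right = utf8_bytes.decode("utf-8")

print(repr(wrong))
print(repr(right))
```

Printing the query string on the search page, as suggested above, would show which of the two cases applies.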
=== At 2005-03-17, 13:49:00 you wrote: ===
I have added Chinese stopwords in String[] STOP_WORDS in NutchAnalysis.jj.
My problem is that Nutch returns nothing when I search with any Chinese keyword,
even though I can find these Chinese keywords in the index files (using
Luke).
From: "Jason Tang" <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
To: "
Hi cao
I think the character "的" is a stopword in Chinese.
I think NutchAnalysis.jj should load a different stopwords file depending on
the language.
/Jack
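The idea of choosing a stopword list per language could be sketched as follows. This is a minimal Python illustration, not the NutchAnalysis.jj grammar: the `STOP_WORDS` table, its contents, and the `remove_stop_words` helper are hypothetical names made up for this example.

```python
# Hypothetical per-language stopword table (contents are illustrative only).
STOP_WORDS = {
    "en": frozenset({"the", "a", "of"}),
    "zh": frozenset({"的", "了", "是"}),
}


def remove_stop_words(tokens, lang):
    """Drop tokens that are stopwords for the given language code.

    Unknown languages fall back to an empty stopword set, so nothing is removed.
    """
    stop = STOP_WORDS.get(lang, frozenset())
    return [t for t in tokens if t not in stop]


print(remove_stop_words(["我", "的", "书"], "zh"))
print(remove_stop_words(["the", "book"], "en"))
```

With a single shared list, a token such as "的" would either be dropped for every language or kept for every language; keying the list on the detected language avoids that.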
=== At 2005-03-17, 10:27:40 you wrote: ===
No answer for this?
Any tips are appreciated.
From: "cao yuzhong" <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
CC: [EMAIL PROTECTED]
Subject: A problem about Chinese word segment
Date: Tue, 15 Mar 2005 05:16:30 +
Hi, all
Now, Nutch-0.6 simply treats a Chinese character as
have you collected these offers somewhere?
Check the source-forge mail archive.
---
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to
Stefan Groschupf wrote:
Let's do some calculations:
2 billion pages (Google has 8 billion):
100 kilobytes * 2 000 000 000 = 186.264515 terabytes per month
1 * 100 MBit per month = 33.1776 TB
186 / 33 = 5.6
The cheapest offer for 100 MBit I found was 1000 USD per month.
So you pay 6000 USD per month just crawling without an
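The estimate above can be reproduced with a short Python sketch. The 12.8 MB/s link rate and the 30-day month are assumptions, chosen only because they reproduce the 33.1776 TB figure; the message does not say how that number was derived. All variable names are illustrative.

```python
import math

# Figures taken from the message.
PAGES = 2_000_000_000      # 2 billion pages
AVG_PAGE_KB = 100          # 100 kilobytes per page
LINK_COST_USD = 1000       # cheapest 100 MBit offer, per month

# Total crawl volume, using binary units (1 TB = 1024**4 bytes).
crawl_tb = PAGES * AVG_PAGE_KB * 1024 / 1024**4

# Assumption: a 100 MBit link moves about 12.8 MB/s for a 30-day month.
LINK_MB_PER_S = 12.8
SECONDS_PER_MONTH = 30 * 24 * 3600
link_tb_per_month = LINK_MB_PER_S * SECONDS_PER_MONTH / 1_000_000

# Whole links are needed, so round the ratio up.
links_needed = math.ceil(crawl_tb / link_tb_per_month)
monthly_cost_usd = links_needed * LINK_COST_USD

print(f"crawl volume: {crawl_tb:.2f} TB")
print(f"one link:     {link_tb_per_month:.4f} TB per month")
print(f"links needed: {links_needed}, cost: {monthly_cost_usd} USD per month")
```

Rounding the ratio of about 5.6 up to whole links is what gives six links and the 6000 USD per month quoted in the message.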
[ http://issues.apache.org/jira/browse/NUTCH-10?page=history ]
Stefan Grroschupf updated NUTCH-10:
---
Attachment: extensionpoint_patch_withplugin.txt
Sorry, Murphy's law: I'm not that familiar with Subversion, and it works a bit
differently. :-(
Anyway the p
John,
I'm very sorry, Murphy's law
... and maybe the new Subversion.
You will find a new file in the JIRA.
Stefan
Hi, Stefan,
The patch does not seem to include the code of nutch-extensionpoints.
Or am I missing something? Thanks,
John
On Wed, Mar 16, 2005 at 08:09:21PM +0100, Stefan Grroschupf (JIRA) wrote:
> [ http://issues.apache.org/jira/browse/NUTCH-10?page=history ]
>
> Stefan Grroschupf updated
Hi
I was recently thinking that it would be fun to start a non-profit
organization
in order to run Nutch as a really "transparent and open" search engine, very
similar to, for instance, Google, but really focusing only on search.
Thanks to all Nutch devs the software is there or it's getting th
[ http://issues.apache.org/jira/browse/NUTCH-10?page=history ]
Stefan Grroschupf updated NUTCH-10:
---
Attachment: extenionPointPatch.txt
patch that changes the plugin xml and creates a new nutch-extensionpoints plugin
extension points are defined multiple times
---
Key: NUTCH-10
URL: http://issues.apache.org/jira/browse/NUTCH-10
Project: Nutch
Type: Bug
Reporter: Stefan Grroschupf
Priority: Minor
Attachments: extenionPointPatch.txt
Hi developers,
since we are in the process of organizing our code, I would love to see
an issue fixed that I have seen as a minor bug for a long time.
There are some extension points defined to extend the Nutch core;
however, a plugin can define its own extension points as well to make
the plugin exten
The link "API Docs" (pointing to
http://incubator.apache.org/nutch/apidocs/index.html) in the Documentation
section of the Nutch site is not working. Where can I find these docs?
Thanks
=> When I'm querying with the lang field, this doesn't work with nutch-0.6.
So where is it?
I'm going to make a Lucene Fuzzy Query plugin (to deal with
"approximate:word" queries), so if one already exists, please
let me know.
We are talking about the language query filter plugin, right?
On my local
Hello,
Thanks a lot, Kelvin Tan... Thanks to your tip I found it was just a
silly thing (something in plugin.xml was not correct).
Stefan :
==
There is already a lang query filter plugin within Nutch.
===
=> When I'm querying with the lang field, this doesn't work with nutch-0.6.
So where is it?
There is already a lang query filter plugin within Nutch.
It works great.
Just make sure that the language detection plugin is loaded during
indexing and that the 'lang' query filter is loaded in Tomcat as
well.
Please note that we talk about 2 different nutch-default.xml you need
to chang
Christophe, try running your query (from your db directory) using:
bin/nutch net.nutch.searcher.Query
What does it say?
If there's a discrepancy between that and what's happening on Tomcat, then it
means you need to sync your changes between Tomcat and Nutch home.
On Wed, 16 Mar 2005 12:16:29 +0100,
Hello,
As there is a lang field, I decided to make a query-lang plugin... but
that doesn't work.
Here is my procedure :
1. cp -R query-site query-lang
2. change every "site" into "lang" (class name, parameters, etc.)
3. change all build.xml files
4. remove all things about indexing
5. compile wit