timizePhrase. The other change
> I mentioned previously was adding the PorterStemFilter
> to NutchDocumentAnalysis.tokenStream.
>
> If anyone is interested in the changes, let me know
> and I'll send them to you. Or maybe it's worth slapping
> onto the Wiki.
>
>
>>
>>
>> Otis
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>
>>
>>
>> - Original Message
>> > From: RanjithStar
>> > To: nutch-user@lucene.apache.org
>> > Sent: Wednesday, December 17, 2008
> > To: nutch-user@lucene.apache.org
> > Sent: Wednesday, December 17, 2008 2:29:07 AM
> > Subject: Re: Stemming issues
> >
> >
> > Hi,
> > Thanks for your reply. I can do stemming. So 'flowers' will be stemmed as
> > 'flower' and Lucene will i
> To: nutch-user@lucene.apache.org
> Sent: Wednesday, December 17, 2008 2:29:07 AM
> Subject: Re: Stemming issues
>
>
> Hi,
> Thanks for your reply. I can do stemming. So 'flowers' will be stemmed as
> 'flower' and Lucene will index it as 'flower
lto:ranjith2...@gmail.com]
Sent: 17 December 2008 08:29
To: nutch-user@lucene.apache.org
Subject: Re: Stemming issues
Hi,
Thanks for your reply. I can do stemming. So 'flowers' will be stemmed as
'flower' and Lucene will index it as 'flower' itself. But the problem is, i
Hi,
Thanks for your reply. I can do stemming. So 'flowers' will be stemmed as
'flower' and Lucene will index it as 'flower' itself. But the problem is, if
I search for 'flowers', it won't give any result. How can we tackle this?
If we perform a search
Hi,
Yes, if you want flowers to match flower you will want to apply stemming. You
can use the Snowball for English. I don't have any code handy, but you can see
how it's done if you look at Lucene's unit test for Snowball Analyzer.
Otis
--
Sematext -- http://sematext.com/ -
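The advice in this reply (apply the same stemming to the query as to the indexed text) can be illustrated with a small, self-contained Java sketch. This is not Nutch or Lucene code: the class name StemmedIndexSketch and its naive plural-stripping stem() are assumptions made for illustration, standing in for a real Porter or Snowball stemmer.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index (hypothetical; not part of Nutch or Lucene). */
final class StemmedIndexSketch {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Naive plural stripper standing in for a real Porter/Snowball stemmer. */
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (w.length() > 3 && w.endsWith("s") && !w.endsWith("ss")) {
            return w.substring(0, w.length() - 1);
        }
        return w;
    }

    /** Index time: store the stemmed form of every token. */
    void add(int docId, String text) {
        for (String token : text.split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(stem(token), k -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Query time: stem the term with the same function before the lookup. */
    Set<Integer> search(String term) {
        return postings.getOrDefault(stem(term), Collections.emptySet());
    }

    /** Lookup without query-side stemming, to show the failure mode. */
    Set<Integer> searchUnstemmed(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }
}
```

With one document containing "flowers" and another containing "flower", search("flowers") finds both, while searchUnstemmed("flowers") finds nothing because only "flower" was stored. That is the symptom described in the question above.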
ow can it be accomplished? Also, which stemmer can I use? snowball?
Can anyone please add a code snippet showing how to use it?
--
View this message in context:
http://www.nabble.com/Stemming-issues-tp21035261p21035261.html
Sent from the Nutch - User mailing list archive at Nabble.com.
I managed to connect Nutch 0.9 to my stemming machine. Don't know if
my approach would work on 0.8.1
On Wed, Oct 29, 2008 at 10:56 PM, jcze <[EMAIL PROTECTED]> wrote:
>
> Hi, i'm using nutch 0.8.1, I'm lost about the stemming of nutch, tried the
> wiki on MultiLingua
Hi, I'm using Nutch 0.8.1 and I'm lost about stemming in Nutch. I tried the
wiki page on MultiLingual Support because it said it could stem words,
but I'm lost because it says I need to modify the IndexSegment
class, which I couldn't find. =(
Anyway, I tried the stemm
> From: Sathyam Y <[EMAIL PROTECTED]>
> To: nutch-user@lucene.apache.org
> Sent: Thursday, May 8, 2008 12:16:07 PM
> Subject: Stemming / Summary issue
>
> I am trying to integrate PorterStemming with Nutch and was able to
> successfully follow the changes suggested a
I am trying to integrate PorterStemming with Nutch and was able to
successfully follow the changes suggested at
http://wiki.apache.org/nutch/Stemming?highlight=%28stemming%29
The search results are working well with stemmed words, but I am
having difficulty getting correct summaries. I am
I am trying to integrate PorterStemming with Nutch and was able to
successfully follow the changes suggested at
http://wiki.apache.org/nutch/Stemming?highlight=%28stemming%29
The search results are working well with stemmed words, but I am
having difficulty getting correct summaries
All,
I am trying to integrate PorterStemming with Nutch and was able to
successfully follow the changes suggested at
http://wiki.apache.org/nutch/Stemming?highlight=%28stemming%29
The search results are working well with stemmed words, but I am having
difficulty getting correct
Hi list,
I'm trying to get stemming working on nutch-1.0-dev using the
instructions found on the wiki for version 0.8 (
http://wiki.apache.org/nutch/Stemming ). I've set up everything pretty
much how it was outlined in the walkthrough, but I'm getting errors
when I try to
".
Specifically, I'd forgotten to make the change (as they have on the
wiki) to my nutch-default.xml, in the value for plugin.includes
replacing "query-(basic|site|url)" with "query-(stemmer|site|url)".
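A quick way to sanity-check such a change is to test the expression against the plugin ids you expect to load. The sketch below assumes that the plugin.includes value is treated as a regular expression matched against whole plugin ids; the class PluginIncludesCheck is hypothetical and only uses java.util.regex, not any Nutch API.

```java
import java.util.regex.Pattern;

/** Hypothetical helper; mimics matching a plugin id against an include regex. */
final class PluginIncludesCheck {
    /** Returns true if the whole plugin id matches the include expression. */
    static boolean included(String includesRegex, String pluginId) {
        return Pattern.matches(includesRegex, pluginId);
    }
}
```

With the fragment "query-(stemmer|site|url)", the id "query-stemmer" matches and "query-basic" does not, so the basic query filter would no longer be selected by that part of the expression.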
Howie Wang wrote:
It sounds like the query parser is not stemming
It sounds like the query parser is not stemming for you. Make sure
that the new stemming query filter is activated in the
Nutch directory under your app server. Check the nutch-*.xml files
under WEB-INF/classes to make sure that your new query filter is
included.
Howie
> Date:
First of all, a question on stemming. We've tried applying the patches from
the main wiki ( http://wiki.apache.org/nutch/Stemming ) and that seems to
work fine for the most part. We are seeing one kind of strange result
though. If we index a series of pages (web crawl of 2 of our sites
Doğacan Güney wrote:
On 6/28/07, Robert Young <[EMAIL PROTECTED]> wrote:
Hi,
Are the Nutch Stemming modifications available as a patch? I can't
seem to find anything on issue.apache.org
There is some sort of stemming for German and French languages
(available as plugin anal
On 6/28/07, Robert Young <[EMAIL PROTECTED]> wrote:
Hi,
Are the Nutch Stemming modifications available as a patch? I can't
seem to find anything on issue.apache.org
There is some sort of stemming for German and French languages
(available as plugin analysis-de and analysis-fr). I
Hi,
Are the Nutch Stemming modifications available as a patch? I can't
seem to find anything on issue.apache.org
Thanks
Rob
Hello Ronny,
Tuesday, June 19, 2007, 14:38, you wrote:
NR> Isn't English default for nutch?
It should be, as I understand it, but I do not see any stemming for English.
--
Best regards,
Scam mailto:[EMAIL PROTECTED]
Isn't English default for nutch?
Regards,
Ronny
-Original Message-
From: Scam [mailto:[EMAIL PROTECTED]
Sent: 19 June 2007 11:54
To: shinta himura
Subject: Re[2]: Problems stemming
Hello shinta,
Monday, June 18, 2007, 23:23, you wrote:
sh> I resolved this problem :
Thank
Hello shinta,
Monday, June 18, 2007, 23:23, you wrote:
sh> I resolved this problem :
Thank you for the answer! But your solution is for the FR and DE languages.
I need to turn on analysis for English, and I do not see
analysis-en at all in the plugins directory. Maybe you have some
ideas about
I am also interested in the stemming part :-)
Regards,
Ronny
-Original Message-
From: Scam [mailto:[EMAIL PROTECTED]
Sent: 18 June 2007 18:04
To: shinta himura
Subject: Re: Problems stemming
Hello shinta,
Wednesday, June 13, 2007, 12:36, you wrote:
sh> I have some problems w
> nutch-user@lucene.apache.org
> Subject: Re: Problems stemming
>
> Hello shinta,
>
> Wednesday, June 13, 2007, 12:36, you wrote:
>
> sh> I have some problems with Nutch's stemmer. I don't manage to get
> sh> it working. I use Nutch 0.9. Could you explain
tch-default.xml.
I have the same problem. Analysis plugin is included
(analysis-(fr|en|de)) but stemming does not work at all.
--
Best regards,
Scam mailto:[EMAIL PROTECTED]
Hi,
I am having problems with Nutch's stemmer and can't get it working. I
use Nutch 0.9. Could you explain what I have to do to activate it? I
have already added the necessary plugins to conf/nutch-default.xml.
Regards,
Damien.
: Re: exact matches and stemming
Maybe you could store both the stemmed word and the original one in your
index, although that will increase the size of your index.
Another possibility would be to develop a WildcardQuery plugin or a
FuzzyQuery plugin, because Lucene comes with these capabilities,
Maybe you could store both the stemmed word and the original one in your
index, although that will increase the size of your index.
Another possibility would be to develop a WildcardQuery plugin or a
FuzzyQuery plugin, because Lucene comes with these capabilities, and avoid
the stemming task. But it is
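The first suggestion (keep both the original and the stemmed form) can be sketched as two parallel term maps, one per "field". This is a toy illustration only, not Nutch or Lucene code: DualFieldIndexSketch, its method names, and the naive plural-stripping stem() are all assumptions, and a real setup would index two fields with different analyzers.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy index with an exact field and a stemmed field (hypothetical). */
final class DualFieldIndexSketch {
    private final Map<String, Set<Integer>> exact = new HashMap<>();
    private final Map<String, Set<Integer>> stemmed = new HashMap<>();

    /** Naive plural stripper standing in for a real stemmer. */
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (w.length() > 3 && w.endsWith("s") && !w.endsWith("ss")) {
            return w.substring(0, w.length() - 1);
        }
        return w;
    }

    /** Stores every whitespace-separated token twice: as written and stemmed. */
    void add(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            exact.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            stemmed.computeIfAbsent(stem(token), k -> new TreeSet<>()).add(docId);
        }
    }

    /** Exact search: only documents containing the term as written. */
    Set<Integer> searchExact(String term) {
        return exact.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    /** Stemmed search: documents containing any form with the same stem. */
    Set<Integer> searchStemmed(String term) {
        return stemmed.getOrDefault(stem(term), Collections.emptySet());
    }
}
```

The cost is the one mentioned above: every term is stored twice, so the index grows, but exact and stemmed queries can then be served from the same index.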
Hello,
I want to use the FrenchAnalyzer for stop-word and stemming handling,
but I still want to be able to do exact searches. The problem is that the
FrenchAnalyzer removes characters from the terms at indexing time, so it
isn't possible to get only exact matches from an index in
pawns a great deal of controversy as
> to whether it can be considered "intelligent" or just very good at
> smalltalk. :)
>
>> Having said that, I'd like to make my users search experience as good as
>> possible. To do that, I need to solve two little "problems
ing and spawns a great deal of controversy as
to whether it can be considered "intelligent" or just very good at
smalltalk. :)
Having said that, I'd like to make my users search experience as good as
possible. To do that, I need to solve two little "problems" :
-
Hi everyone!
I'm using version 7.2 of Nutch and I'm very happy with it. Want to send a
big thumbs up for you guys behind it!
Having said that, I'd like to make my users' search experience as good as
possible. To do that, I need to solve two little problems :
- Stemming in my
at we have to replace the accented chars before indexing, and also to
do it on the query string.
Could you explain in detail how to use the AccentReplacer class?
- Second point: I want to do French stemming. Could somebody help me?
Thanks in advance.
Aîcha
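The idea in this message (replace accented characters before indexing, and do the same on the query string) can be sketched with the JDK alone. This is not the AccentReplacer class asked about, whose API is not shown in the thread; AccentFoldingSketch and fold() are hypothetical names, and the approach (Unicode decomposition, then dropping combining marks) is one assumed way to do it.

```java
import java.text.Normalizer;

/** Hypothetical accent folder; not the AccentReplacer class from the thread. */
final class AccentFoldingSketch {
    /** Decomposes characters (NFD) and removes the combining marks. */
    static String fold(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }
}
```

The important part is to call the same function in both places: on each token before it is indexed and on each query term before the lookup, so that accented and unaccented spellings meet on the same folded form.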
We could, although other than readability, it won't make any difference.
[EMAIL PROTECTED] wrote:
Hi, Matthew
I think we should use fieldName instead of field, or not...
===stemming code begin===
public TokenStream tokenStream(String field, Reader r
Hi, Matthew
I think we should use fieldName instead of field, or not...
===stemming code begin===
public TokenStream tokenStream(String field, Reader reader) {
Analyzer analyzer;
if ("anchor".equals(field)) {
analyzer = ANCHO
Howie,
Thanks for all the help configuring your stemming addon for version
0.8. I compared query-basic and query-stemmer and the only new feature
that was added is a "host" boost. I made the changes and everything
works perfectly.
I uploaded the code to the wiki for both version
his version
of stemming everything works, and pagination is implemented too.
In my opinion, the best way forward is to build on Eugen's code. I think
Jerome Charron is also interested in that code because of the
highlighting of results.
What is your opinion on this?
to Eugene: Can You
Hi,
I think we should wait until Eugen can share his code. In his version
of stemming everything works, and pagination is implemented too.
In my opinion, the best way forward is to build on Eugen's code. I think
Jerome Charron is also interested in that code because of the
highlighting of re
" doesn't, even though that's the
word that's actually on the page).
I tried a different approach and removed the query-stemmer value from
nutch-site.xml to attempt to disable the plugin. I reran the crawl and it
didn't load the plugin. However, it still had the same stemming
functionalit
an the crawl and
it didn't load the plugin. However, it still had the same stemming
functionality. I'm guessing this is due to editing the main files such
as CommonGrams.java and NutchDocumentAnalyzer.java. Should I attempt to
copy the needed methods into StemmerQueryFilter.java and try
ay, someone recently told me that they
were able to put all the stemming code into an indexing
filter without touching any of the main code. All they
did was to copy some of the code that is being done
in NutchDocumentAnalyzer and CommonGrams into
their custom index filter. Haven't tried it myse
gine "interview", the stemming takes place and the
page with the word "interviews" is returned.
However, if I type in the word "interviews" no page is returned. (The
page with the word interviews on it should be returned).
Any ideas??
Matt
Dima Mazmanov wrote:
Hi,
d.equals("title")) {
ts = new LowerCaseFilter(ts);
return new PorterStemFilter(ts);
} else {
return ts;
P.S.
Maybe I am missing something, because I can't get my latest Nutch build
to crawl.
Regards
Alexey.
Hi, .
I've gotten a couple of questions offlist about stemm
Hi, Dima
Thanks for your contribution. I'll try it this Sunday.
> Hi, .
>
> I've gotten a couple of questions offlist about stemming
> so I thought I'd just post here with my changes. Sorry that
> some of the changes are in the main code and not in a plug
Hi, .
I've gotten a couple of questions offlist about stemming
so I thought I'd just post here with my changes. Sorry that
some of the changes are in the main code and not in a plugin. It
seemed more efficient to put them in the main analyzer. It
would be nice if later rel
otels" miss out of the results.
Also, because of stemming, my fielded search on custom fields has
stopped working. It was implemented along the lines of
http://wiki.apache.org/nutch/WritingPluginExample
If I search for "rating:3" it gets modified to "rate 3" and hence I
don
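One way to picture a fix for the "rating:3" symptom is a query rewrite step that stems free-text terms but passes clauses on selected fields through untouched. The sketch below is an assumption-laden illustration, not the Nutch query filter API: FieldAwareQueryStemmer, the RAW_FIELDS set, and the naive stem() are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical query rewriter that leaves selected fielded clauses alone. */
final class FieldAwareQueryStemmer {
    /** Fields whose clauses must not be stemmed (illustrative list). */
    private static final Set<String> RAW_FIELDS = new HashSet<>(Arrays.asList("rating"));

    /** Naive plural stripper standing in for a real stemmer. */
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (w.length() > 3 && w.endsWith("s") && !w.endsWith("ss")) {
            return w.substring(0, w.length() - 1);
        }
        return w;
    }

    /** Stems bare terms and fielded values, except on RAW_FIELDS. */
    static String rewrite(String query) {
        StringBuilder out = new StringBuilder();
        for (String clause : query.trim().split("\\s+")) {
            if (clause.isEmpty()) {
                continue;
            }
            int colon = clause.indexOf(':');
            String rewritten;
            if (colon > 0) {
                String field = clause.substring(0, colon);
                String value = clause.substring(colon + 1);
                // Keep the field name intact; stem the value only for other fields.
                rewritten = RAW_FIELDS.contains(field) ? clause : field + ":" + stem(value);
            } else {
                rewritten = stem(clause);
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(rewritten);
        }
        return out.toString();
    }
}
```

Under these assumptions, rewrite("rating:3 hotels") keeps "rating:3" as written and only turns "hotels" into "hotel", instead of producing something like "rate 3".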
I am using the code as given at
http://www.nabble.com/RE%3A-Nutch-does-not-use-stemmers--p249520.html
Deactivate the basic query filter and it should work.
Jérôme
--
http://motrech.free.fr/
http://www.frutch.org/
I am using the code as given at
http://www.nabble.com/RE%3A-Nutch-does-not-use-stemmers--p249520.html
On 6/29/06, Jérôme Charron <[EMAIL PROTECTED]> wrote:
Yes, that's what stemming is supposed to do.
But take a look at your query (that I have cut and paste in my previous
mail): bot
Yeah, that page had both hotel and hotels, but shouldn't it have returned
all pages that contain hotel or hotels or both? That's what stemming is
supposed to do.
Yes, that's what stemming is supposed to do.
But take a look at your query (that I have cut and paste in my previous
mail): both
Yeah, that page had both hotel and hotels, but shouldn't it have returned
all pages that contain hotel or hotels or both? That's what stemming is
supposed to do.
I have 2 pages that contain 'groves' and no page containing 'grove', I
get no result when stemmer plugin is e
I need stemming in my search engine based on Nutch 0.7.2, the stemming
query is being created but I am not getting appropriate results.
If I search for hotel, I get 11 results, but if I search for hotels, I
get 1 result.
You got one result that contains both hotel and hotels ... no
Hey,
I need stemming in my search engine based on Nutch 0.7.2, the stemming
query is being created but I am not getting appropriate results.
If I search for hotel, I get 11 results, but if I search for hotels, I
get 1 result.
Any thoughts?
I have implemented stemming using the code in the mail
Eugen Kochuev wrote:
P.P.S Why not to develop efficient technique to fight near-duplicates
and SE spam? This is absolutely necessary if build Internet search
Why not, indeed? ;) The answer is that it is very difficult. There are
simple methods that Nutch uses (MD5 and "text profile"), but g
Hi,Eugen
I think that is the right way.
---
Regards,
Alexey
> P.P.S Why not to develop efficient technique to fight near-duplicates
> and SE spam? This is absolutely necessary if build Internet search
> engine based on nutch. Another "must have" is variable refetch time
> for pages (this
Hi,Eugene
Thanks a lot!
---
Regards
Alexey
> Sorry for the delay answering you. I will definitely share my code
> with nutch community, but currently I'm on vacation, away from my
> sources, so I will share them as soon as my vacation ends ;-)
Alexey,
Sorry for the delay answering you. I will definitely share my code
with nutch community, but currently I'm on vacation, away from my
sources, so I will share them as soon as my vacation ends ;-)
P.S. Nutch is great and I hope that my efforts will help to make it
better.
P.P.S Why not t
Hi,Jerome
I think that the best way is to ask Eugene to share his code. I hope
he will comply with our request... :)
I want to believe that his answer will be positive! If not, then I
will share my "BAD code" with you.
---
Regards
Alexey
na> I don't know.
na> Could you please send me off li
I succeeded in implementing Russian stemming for nutch 0.8. Here's the
example
http://j1.lan23.net:8080/?query=%D1%81%D0%B0%D0%B9%D1%82&hitsPerPage=10
Everything is working fine, including highlighting.
Eugen, don't you want to share your code with the community?
;-)
--
http://m
What is my mistake in wrapping Lucene's Russian analyzer? As I
understand it, Lucene works well with Russian (I read about it on the
Lucene users and developers mailing lists).
I don't know.
Could you please send me off list your code.
Jérôme
Hi, Eugen!
Could You help me with russian stemming and highlighting!
How did You do that?
---
Regards
Alexey
Hi, Eugen!
Could You help me with russian stemming and highlighting!
How did You do that?
---
Regards
Alexey
Alexey,
I succeeded in implementing Russian stemming for nutch 0.8. Here's the
example http://j1.lan23.net:8080/?query=%D1%81%D0%B0%D0%B9%D1%82&hitsPerPage=10
Everything is working fine, including highlighting.
--
Best regards,
Eugen mailto:[EMAIL PROTECTED]
to Jerome
What is my mistake in wrapping Lucene's Russian analyzer? As I
understand it, Lucene works well with Russian (I read about it on the
Lucene users and developers mailing lists).
---
Regards
Alexey
erstand how it works.
Yes, I am trying to wrap Lucene's Russian stemming into Nutch.
Russian is, in my opinion, the most powerful "big language" in the
world: one thing can be said in many ways (using only stemming of
the words or so...).
---
Regards
Alexey
> For example: The page contain next words (in a different
> forms)(text is in russian):
Russian? I don't see a Russian Analyzer in the Trunk.
Did you write your own analyzer for Russian?
Are you using org.apache.lucene.analysis.ru.RussianAnalyzer?
The source code of RussianAnalyzer looks very c
different
forms)(text is in russian):
- fish (different forms),
- sea,
- mission (only in main form),
- electricity,
- aquarium (different forms),
- lighting (different forms).
1)with stemming
- fish (main form and not) - find (stemming works)
- sea - can't find
- mission (onl
in",
"durch", "wegen", "wird"
};
// From src/java/org/apache/lucene/analysis/de/GermanAnalyzer.java
// of Lucene 1.4.3 distribution. This could be slightly out of date.
You'd have to either modify the source code in:
src/plugin/analysis-de/src/java/org/apache/nutch/analysis/de/GermanAnalyzer.java
to use the constructor that takes the word list or the file name of the word
list,
I think.
> When I use the trunk version, should I change some code as shown on the wiki
> MultiLingual support page? As I understand it, everything in the
> trunk version has already been done for stemming plugin integration without
> code changes.
I believe Jérôme has implemented these code changes into the Trunk.
-kuro
Actually, I could not find the stopwords file. Could you help me with this?
If you have simply wrapped a Lucene analyzer (like the fr and de analyzers),
the default stop word list is inside the analyzer code (take a look at the
analyzer source).
Jérôme
--
http://motrech.free.fr/
http://www.frutch.org
When I disable stemming (with the same index), it can find those
words (of course, it finds only the form of the word that appears in
the query).
Just a silly question: do you build your index with the analyzers turned on?
(was the document's language correctly guessed and the
to Jerome
> Check that these words are not in the stopword list of your analyzer.
Actually, I could not find the stopwords file. Could you help me with this?
I am sure that words such as mission, sea, ocean, building,
electricity, etc. couldn't be in the stopwords file. (at my previous
questio
to Jerome
> Check that these words are not in the stopword list of your analyzer.
Actually, I could not find the stopwords file. Could you help me with this?
I am sure that words such as mission, sea, ocean, building,
electricity, etc. couldn't be in the stopwords file. (at my previous
quest
Thanks!
to Jerome
> Check that these words are not in the stopword list of your analyzer.
Those words aren't in the stopword list. It couldn't find them at all.
When I disable stemming (with the same index), it can find those
words (of course, it finds only the form of th
1) It can find some words but not others (I am sure that the "missing"
words are present in the index, because when I disable stemming it finds
them)
Check that these words are not in the stopword list of your analyzer.
2) the queries with positive results have some strange thing - it
First of all, I use the trunk version.
The problems are as follows:
1) It can find some words but not others (I am sure that the "missing"
words are present in the index, because when I disable stemming it finds
them)
2) the queries with positive results have some strange thing - it
finds not a
The analysis-xx plugin provides the stemming function to the analyzer
used for indexing but it does not provide the same stemming function
to the query analyzer.
In the trunk, the analysis plugins are used for both document analysis
and query analysis.
The right thing to do is to make the
Here's my understanding of the current state of analyzer, which might
be wrong.
The analysis-xx plugin provides the stemming function to the analyzer
used for indexing but it does not provide the same stemming function
to the query analyzer.
This means that instead of typing the complete
Is there any way to set up stemming? I made the necessary changes in
the includes by adding analysis-(de|fr|ru), but it seems to me that there
is a problem with the "search query module", because Nutch can't find
words that are present in the index (even with the right form of
the words). What is my mistake?
> Could anybody help me with adding stemming for the Russian language?
As suggested by Matthias, you can use the lucene stemming package and wrap
it in a NutchAnalyzer.
See the analysis-fr and analysis-de sample plugins in Nutch.
A description of the internal mechanism is available on the nutch w
Hi,
there is a Lucene package for Russian stemming:
http://lucene.apache.org/java/docs/api/org/apache/lucene/analysis/ru/package-summary.html
We once implemented the corresponding German stemming module:
http://wiki.apache.org/nutch/German
But this is for an old nutch version. Maybe this
Could anybody help me with adding stemming for the Russian language?
Thanks
2) How do I get stemming to work on Nutch?
Here's what I did:
http://www.nutchhacks.com/ftopic873.php
Hi,
I wonder if you could help me out. I just have a few general questions which
I'm really stuck on:
1) What does the ontology plugin do?
2) How do I get stemming to work on Nutch?
3) I've seen some literature on Wordnet and just wondered if there's a way
to get that working on
Hi,
I wonder if you could help me out. I just have a few general questions which
I'm really stuck on:
1) What does the ontology plugin do?
2) How do I get stemming to work on Nutch?
3) I've seen some literature on Wordnet and just wondered if there's a way
to get that work
85 matches