Anders Nielsen wrote:
>Can't you just keep 2 fields, one with the stemmed version of the text used
>for indexing purposes (indexed but not stored) and a second field with the
>original text (un-indexed but stored)? Then when you know you got a match on
>the nth term in the stemmed version, you ca
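A minimal sketch of the two-field idea quoted above, in plain Python rather than against the Lucene API. The names `toy_stem`, `build_fields` and `original_forms` are hypothetical, and `toy_stem` is only a stand-in for a real stemmer. It assumes both fields are produced from the same token stream, so position n in the stemmed field lines up with position n in the original field.

```python
import re

def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strips a few common suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def build_fields(text):
    """Return (stemmed, original) token lists that share the same positions."""
    original = re.findall(r"[A-Za-z0-9]+", text)         # would be stored, not indexed
    stemmed = [toy_stem(t.lower()) for t in original]    # would be indexed, not stored
    return stemmed, original

def original_forms(stemmed, original, query_stem):
    """Map every match on a stem back to the original word at the same position."""
    return [original[i] for i, s in enumerate(stemmed) if s == query_stem]

stemmed, original = build_fields("Indexing indexed documents")
matches = original_forms(stemmed, original, "index")
print(matches)
```

The lookup only works if the two token lists stay aligned, so any filter that drops tokens (stop words, for example) would have to be applied to both fields in the same way.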
October 2001 03:44
To: [EMAIL PROTECTED]
Subject: RE: Token retrieval question
From what I remember, Lucene indices are structures like:
...>
where for every TERM there is a list of DOCs in which it appears and the
respective POSitions in that DOC.
Our problem is that TERM, usually, is a non-word (or stem). For display
purposes, having a real word as the representative f
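To make the structure described above concrete, here is a small sketch of a positional inverted index: for every term, the documents it appears in and its positions within each. This is only an illustration of the shape described in the prose, not Lucene's actual on-disk layout. The function name `build_positional_index` and the sample documents are made up for the example, and tokenization is a plain whitespace split.

```python
from collections import defaultdict

def build_positional_index(docs):
    """docs: {doc_id: text}. Returns {term: {doc_id: [positions]}}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            # One posting list per (term, doc); positions are appended in order.
            index[term].setdefault(doc_id, []).append(pos)
    return index

docs = {
    1: "token retrieval question",
    2: "retrieval of a token by token position",
}
index = build_positional_index(docs)
print(index["token"])
```

Note that the keys of this index are whatever the analyzer emits. With a stemming analyzer they would be stems, which is the display problem raised in the message.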
Hi,
This is a nice discussion :)
> >
> Yes, I see that. One additional problem that I need to solve for my
> application is that I need to map from stemmed forms of the terms to at
> least one un-stemmed form. Ideally it would be all un-stemmed forms, but
> I can live with the first one. I r
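One way to get from a stemmed form back to un-stemmed forms is to record the mapping while the text is being analyzed. The sketch below is an assumption about how that could look, not something taken from the thread: `toy_stem` is a hypothetical stand-in for a real stemmer, and `build_stem_map` keeps every distinct surface form in first-seen order, so the first entry can serve as the single representative.

```python
def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strips a few common suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def build_stem_map(tokens, stem=toy_stem):
    """Return {stem: [distinct un-stemmed forms, in first-seen order]}."""
    stem_map = {}
    for token in tokens:
        forms = stem_map.setdefault(stem(token), [])
        if token not in forms:
            forms.append(token)
    return stem_map

tokens = ["searching", "tokens", "searched", "searches", "token", "searching"]
stem_map = build_stem_map(tokens)

# All un-stemmed forms for a stem, and the first one seen as representative.
print(stem_map["search"], stem_map["search"][0])
```

Keeping only the first form needs one string per stem. Keeping all forms costs more memory but covers the "ideally all un-stemmed forms" case.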
Excellent! This is a good confirmation of my direction.
I have a question for the list: are there any votes out there for
including this kind of "stem reversal" into Lucene, or does it more
properly belong outside of Lucene, in the application using it?
(I'm leaving the text below for easy refe
Doug Cutting wrote:
>>From: Dmitry Serebrennikov [mailto:[EMAIL PROTECTED]]
>>
>>Doug, thanks for posting these. I may end up going in this direction in
>>the next few days and will use this as a blueprint. Maybe I'll end up
>>putting in the first pass implementation and then you can
>>la
> From: Dmitry Serebrennikov [mailto:[EMAIL PROTECTED]]
>
> Doug, thanks for posting these. I may end up going in this direction in
> the next few days and will use this as a blueprint. Maybe I'll end up
> putting in the first pass implementation and then you can later further
> tune it
You can count me in on this :)
--- Doug Cutting <[EMAIL PROTECTED]> wrote:
> Right now, Lucene does not have good support for what you're doing. Lucene
> as it stands is designed to support basic search, not other statistical text
> processing. However, there are two features that I would
Doug, thanks for posting these. I may end up going in this direction in
the next few days and will use this as a blueprint. Maybe I'll end up
putting in the first pass implementation and then you can later further
tune it when you get to it.
Question on term numbers, though: what would be an a
Right now, Lucene does not have good support for what you're doing. Lucene
as it stands is designed to support basic search, not other statistical text
processing. However, there are two features that I would like to add to
Lucene that would help you.
1. Seekable TermDocs.
This would let you ef
I'm actually working on exactly the same problem. Just yesterday, I
implemented a new query (called CooccuranceQuery) that, given a list of
terms, acts as a BooleanQuery with all of the terms being required and
then reports back a list of other terms in the index with a count of how
many docum
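The behaviour described for CooccuranceQuery can be sketched roughly as follows. This is not the poster's implementation and it does not use Lucene: it is a linear scan over in-memory documents, with a hypothetical function name `cooccurring_terms`. It also assumes that the reported count is the number of matching documents in which each other term appears, since the description is cut off at that point.

```python
from collections import Counter

def cooccurring_terms(docs, required):
    """docs: {doc_id: [terms]}. Count the other terms in docs that contain all required terms."""
    required = set(required)
    counts = Counter()
    for terms in docs.values():
        terms = set(terms)                    # count each term once per document
        if required <= terms:                 # every required term is present
            counts.update(terms - required)   # tally only the *other* terms
    return counts

docs = {
    1: ["lucene", "index", "stem"],
    2: ["lucene", "index", "query"],
    3: ["lucene", "query"],
    4: ["index", "stem"],
    5: ["lucene", "index", "stem", "token"],
}
result = cooccurring_terms(docs, ["lucene", "index"])
print(result.most_common())
```

A real implementation would presumably walk the index's posting lists instead of scanning every document, but the result has the same shape: a term-to-document-count table for the matching set.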