Hi Steve,
I'm not sure what's wrong with the above (have you tried each of the two nested
SpanNot clauses independently?), but here's another thing to try:
Your query works. And as it turns out, if I don't commit the same
embarrassing lower case / upper case inconsistency over and over
Hi,
I was looking at SpanNotQuery to see if I could make do without the
position increment gaps. A search requirement that's causing me some
trouble to implement is when two terms are supposed to be on the same
L_2, yet on different L_3's (L_3's are hierarchically below L_2).
With the
Hi Rene,
On 03/17/2010 at 11:17 AM, Rene Hackl-Sommer wrote:
<SpanNot fieldName="MyField">
  <Include>
    <!-- Gets all the matching spans within L_2 boundaries and includes
         them -->
    <SpanNot>
      <Include>
        <SpanNear slop="2147483647" inOrder="false">
          <SpanTerm>t293</SpanTerm>
          <SpanTerm>t4979</SpanTerm>
        </SpanNear>
Hi Guys,
Thanks for the input! I am now going to put in some work to see how
things fare.
Should I post the question about substituting int with long on
lucene-dev again, if need arises?
Thanks again,
Rene
On 15.03.2010 23:04, Steven A Rowe wrote:
Hi Rene,
Have you seen
Sure. I'd start a new thread though, referencing this one and outlining why
none of the solutions you tried worked.
Erick
On Tue, Mar 16, 2010 at 4:35 AM, Rene Hackl-Sommer rene.a.ha...@gmx.de wrote:
Hi Guys,
Thanks for the input! I am now going to put in some work to see how things
Hello,
I am working on a use case that is very demanding regarding the number
of token positions. For one special field in the index, I need to
represent different hierarchy levels, like this:
<MyField>
  <Level_1>
    <Level_2>
      <Level_3>
Please note that I need to do this with Lucene, not an XML search
Is your entire corpus a single document? Because I'm having trouble
imagining a single document where this would be a problem, unless
your increment gap is huge. The term positions are relative to
a single document...
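
To make the question concrete, the number of positions one document consumes can be estimated with 64-bit arithmetic and compared against the 32-bit limit. This is only a sketch: the class and method names are hypothetical, and the counts and gap sizes in `main` are made-up illustration values, not figures from this thread.

```java
final class PositionBudget {
    /**
     * Estimates how many positions one document spans.
     * counts[i] is the number of sibling elements at level i and gaps[i] the
     * position gap between those siblings, ordered from the innermost level
     * outwards. All inputs are assumptions supplied by the caller.
     */
    static long positionsNeeded(long tokensPerLeaf, long[] counts, long[] gaps) {
        long span = tokensPerLeaf;
        for (int i = 0; i < counts.length; i++) {
            // n siblings of width span, with n - 1 gaps between them
            span = Math.addExact(Math.multiplyExact(counts[i], span),
                                 Math.multiplyExact(counts[i] - 1, gaps[i]));
        }
        return span;
    }

    /** The highest position used is positions - 1; it must fit in a Java int. */
    static boolean fitsInInt(long positions) {
        return positions - 1 <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 10 tokens per innermost element, 1000 elements on
        // each of the two inner levels, 10 on the outer level, growing gaps.
        long[] counts = {1000, 1000, 10};
        long[] gaps = {100, 100_000, 10_000_000};
        long needed = positionsNeeded(10, counts, gaps);
        System.out.println(needed + " positions, fits in int: " + fitsInInt(needed));
    }
}
```

With nested gaps the total grows multiplicatively, so even moderate per-level counts can exceed the int range within one document.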
You say that your levels have fewer than 1,000 elements each. With
an increment
Is your entire corpus a single document? Because I'm having trouble
imagining a single document where this would be a problem, unless
your increment gap is huge. The term positions are relative to
a single document...
It is getting pretty huge, yes (see below). The term positions are also
Hi Rene,
Why can't you use a different field for each of the Level_X's, i.e.
MyLevel1Field, MyLevel2Field, MyLevel3Field?
On 03/15/2010 at 9:59 AM, Rene Hackl-Sommer wrote:
Search in MyField: Terms T1 and T2 on Level_2 and T3,
T4, and T5 on Level_3, which should both be in the
same
I was wondering about Steven's approach too; have you considered it?
I don't know the internals well enough to say whether you could go to a
64-bit quantity for term positions, but I suspect it would be *very*
involved. Perhaps people more familiar with the code could comment.
How big is your corpus?
Hi Steve,
Why can't you use a different field for each of the Level_X's, i.e.
MyLevel1Field, MyLevel2Field, MyLevel3Field?
Well, the hierarchical structure needs to be maintained. As hundreds of
Level_X entities can be found on levels 2 and 3, I need to be able to
tell for instance
Hi Erick,
What about indexing
the triplets with a small increment gap between? That is:
...
gets indexed as:
level1-1/level2-1/level3-1 +gap 100
level1-1/level2-1/level3-2 +gap 100
level1-1/level2-2/level3-3 +gap 100
level1-1/level2-2/level3-4
If I understand this correctly, the field
Not quite what I had in mind, more like
level1-1/level2-1/level3-1/Term1 level1-1/level2-1/level3-1/Term2
level1-1/level2-1/level3-2/Term3 level1-1/level2-1/level3-2/Term4
With an increment gap of 100 and an analyzer that splits on slashes, the term
positions would be something like:
term term
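
As a rough illustration of that numbering, here is a simplified model of position assignment, not Lucene's actual analyzer code. It assumes each slash-separated path is indexed as a separate value of the same field, that every token advances the position by 1, and that the gap is added between consecutive values. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SlashPositions {
    /** Returns "position:token" for every token, in index order. */
    static List<String> positions(List<String> values, int gap) {
        List<String> out = new ArrayList<>();
        int next = 0; // position the next token will receive
        for (int v = 0; v < values.size(); v++) {
            if (v > 0) {
                next += gap; // assumed: gap is inserted between consecutive values
            }
            for (String token : values.get(v).split("/")) {
                out.add(next + ":" + token);
                next++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList(
            "level1-1/level2-1/level3-1/Term1",
            "level1-1/level2-1/level3-1/Term2");
        for (String entry : positions(values, 100)) {
            System.out.println(entry);
        }
    }
}
```

Under these assumptions Term1 lands at position 3 and Term2 at position 107, so a span query with a slop well below the gap cannot match across two values.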
Hi Rene,
Have you seen SpanNotQuery?:
http://lucene.apache.org/java/3_0_1/api/core/org/apache/lucene/search/spans/SpanNotQuery.html
For a document that looks like:
<Level_1 id="1">
  <Level_2 id="1">
    <Level_3 id="1">T1 T2 T3</Level_3>
    <Level_3 id="2">T4 T5 T6</Level_3>
    <Level_3 id="3">T7 T8 T9</Level_3>