flex landing), so... we just need a way to pass this Bits down to
the low-level scorers that actually pull a postings list. But we should only
do this if the filter is not sparse.
Also, the filter must be inverted and ORed with the deleted docs.
This can result in enormous perf gains for searches d
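Here is a minimal sketch of the combination described above. It uses java.util.BitSet as a stand-in for the flex Bits interface and a made-up 1% density threshold for "not sparse", since the message gives no threshold. FilterAsSkipDocs and its methods are hypothetical names, not Lucene API.

```java
import java.util.BitSet;

// Sketch only: java.util.BitSet stands in for the flex Bits interface, and all
// names here are hypothetical. A doc is skipped if the filter rejects it OR it is deleted.
final class FilterAsSkipDocs {

    // Hypothetical cut-off for "not sparse": the filter must accept at least 1% of docs.
    static final double MIN_DENSITY = 0.01;

    static boolean isDenseEnough(BitSet filter, int maxDoc) {
        return maxDoc > 0 && (double) filter.cardinality() / maxDoc >= MIN_DENSITY;
    }

    // skip = NOT(filter) OR deleted, over doc IDs [0, maxDoc).
    static BitSet toSkipDocs(BitSet filter, BitSet deleted, int maxDoc) {
        BitSet skip = new BitSet(maxDoc);
        skip.or(filter);
        skip.flip(0, maxDoc);   // invert: docs the filter does not accept
        skip.or(deleted);       // then also skip deleted docs
        skip.clear(maxDoc, Math.max(maxDoc, skip.length())); // ignore bits beyond maxDoc
        return skip;
    }
}
```

A low-level scorer would then treat the returned set exactly like deleted docs while walking postings. For sparse filters, the sketch leaves the decision to the caller via isDenseEnough.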
seful and I'm hoping to add smart
dtd-driven query entry into Luke.
> contrib/xml-query-parser: NumericRangeQuery and -Filter support
> ---
>
> Key: LUCENE-2306
> URL: https://issues.
[https://issues.apache.org/jira/browse/LUCENE-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
Uwe Schindler updated LUCENE-2306:
--
Summary: contrib/xml-query-parser: NumericRangeQuery and -Filter support
(was: contrib/xml
Yeah... and now I am trying with multiple filters. Thanks
--
View this message in context:
http://old.nabble.com/Lucene-Filter-tp27756577p27778081.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com
Maybe now it's also running correctly with the filter?
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> -Original Message-
> From: Dyutiman [mailto:dyutiman.chaudh...@gmail.com]
> Sent: Wednesday, March 03, 2010 2:34 PM
Oho... actually I hadn't checked that part of my code at all.
Thanks a lot for pointing this out to me. The search is running perfectly
now.
Thanks
Dyutiman
--
View this message in context:
http://old.nabble.com/Lucene-Filter-tp27756577p27768251.html
--
From: Dyutiman
To: java-dev@lucene.apache.org
Sent: Wed, 3 March, 2010 11:40:29
Subject: Re: Lucene Filter
Thanks Erick,
I tried Luke and it seems that my index is fine (see the attached screenshot
http://old.nabble.com/file/p27767115/luke.JPG luke.JPG).
That means I did something w
SearchUtil.java). If you could check it once, that would be very
helpful.
thanks again
Dyutiman
--
View this message in context:
http://old.nabble.com/Lucene-Filter-tp27756577p27767115.html
Taking a quick glance at the code, I don't see anything
obviously wrong as far as the problem you describe goes.
What happens if you just add a required clause to your query
string rather than use a Filter? Something like
+sentiment:positive? If you do that, query.toString is
your f
code
c, etc.
Cure this by moving the new Document() creation inside the while loop.
If this doesn't help, please show your indexing and
searching code.
HTH
Erick
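Here is a plain-Java simulation of the indexing bug Erick describes. FakeDoc is a hypothetical stand-in, not Lucene's Document API, and "indexing" is modeled as snapshotting a document's field values at the moment it is added.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Lucene Document: an ordered list of (name, value) fields.
final class FakeDoc {
    private final List<String[]> fields = new ArrayList<>();

    void add(String name, String value) {
        fields.add(new String[] {name, value});
    }

    // Returns a fresh list of the values currently stored under `name`.
    List<String> values(String name) {
        List<String> out = new ArrayList<>();
        for (String[] f : fields) {
            if (f[0].equals(name)) out.add(f[1]);
        }
        return out;
    }
}

final class IndexLoop {
    // Buggy: one doc is created before the loop, so fields pile up across rows.
    static List<List<String>> buggy(List<String> sentiments) {
        List<List<String>> index = new ArrayList<>();
        FakeDoc doc = new FakeDoc();
        for (String s : sentiments) {
            doc.add("sentiment", s);
            index.add(doc.values("sentiment")); // snapshot = what gets indexed now
        }
        return index;
    }

    // Fixed: a new doc per row, i.e. the creation moved inside the loop.
    static List<List<String>> fixed(List<String> sentiments) {
        List<List<String>> index = new ArrayList<>();
        for (String s : sentiments) {
            FakeDoc doc = new FakeDoc();
            doc.add("sentiment", s);
            index.add(doc.values("sentiment"));
        }
        return index;
    }

    // Number of indexed docs that would match sentiment:<value>.
    static int countMatching(List<List<String>> index, String value) {
        int n = 0;
        for (List<String> doc : index) {
            if (doc.contains(value)) n++;
        }
        return n;
    }
}
```

In this simulation, a sentiment:positive filter matches all three documents produced by the buggy loop but only one produced by the fixed loop.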
On Tue, Mar 2, 2010 at 9:35 AM, Dyutiman wrote:
Hi,
I am new to this forum and to Lucene as well. I'm having an issue while
trying to filter my Lucene results.
While creating the index I create a field called sentiment whose possible
values are 'positive', 'negative' and 'neutral'. I am indexing
the help here!
> Automaton Query/Filter (scalable regex)
> ---
>
> Key: LUCENE-1606
> URL: https://issues.apache.org/jira/browse/LUCENE-1606
> Project: Lucene - Java
> Issue Type: New Feat
revision 91.
mmit this to the flex branch tomorrow.
The only differences from Uwe's patch will be:
* ensure the barred-O (ø) is correct in Anders' name in NOTICE.txt
* remove the unused instance variable in the enum, as it is unused and
irrelevant for FilteredTermsEnum
e to start looking at committing this to flex so we
do not have to work with huge patches?
+1
see our test cases exercise the
important bits (not .toString or .toDot or other things, but those
work too).
If you have concerns or think it is confusing, I will do my best to
figure out ways to simplify or improve it from here.
's looks good; my change was only adding the method param and removing the
access to the now-private tenum.
ery,
> Filter filter, int n, Sort sort)
>
>
> Key: LUCENE-1271
> URL: https://issues.apache.org/jira/browse/LUCENE-1271
> P
help a bit when seeking.
Instead, a char[] is reused, and nextString() etc. return a boolean indicating whether more
solutions exist.
I think it's actually more readable in a way. It needs a bit more reorganizing, but
I need a break from this enum.
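To illustrate the reused-buffer style described above (this is not the actual enum code), here is a hypothetical enumerator over a finite set of fixed-length strings, where each position may take any char in a given range. It fills one char[] in place and signals exhaustion through a boolean return value instead of allocating a new String per solution.

```java
// Hypothetical class: enumerates, in sorted order, every string whose i-th char lies in
// [lo[i], hi[i]]. One char[] buffer is reused for every solution, and nextString()
// returns false once no more solutions exist.
final class RangeStringEnum {
    private final char[] lo;
    private final char[] hi;
    private final char[] buf;          // reused; valid only after nextString() returned true
    private boolean started = false;
    private boolean exhausted = false;

    RangeStringEnum(char[] lo, char[] hi) {
        if (lo.length != hi.length) throw new IllegalArgumentException("lo/hi length mismatch");
        this.lo = lo.clone();
        this.hi = hi.clone();
        this.buf = new char[lo.length];
    }

    boolean nextString() {
        if (exhausted) return false;
        if (!started) {
            started = true;
            for (int i = 0; i < lo.length; i++) {
                if (lo[i] > hi[i]) {   // an empty range means there are no solutions at all
                    exhausted = true;
                    return false;
                }
                buf[i] = lo[i];
            }
            return true;               // the smallest solution
        }
        // Odometer step: bump the rightmost position that can still grow, reset the ones after it.
        for (int i = buf.length - 1; i >= 0; i--) {
            if (buf[i] < hi[i]) {
                buf[i]++;
                return true;
            }
            buf[i] = lo[i];
        }
        exhausted = true;
        return false;
    }

    char[] buffer() { return buf; }
    int length() { return buf.length; }
}
```

Callers copy out the buffer only when they need to keep a solution, for example new String(e.buffer(), 0, e.length()).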
. and it simplifies code.
d by state number for caching transitions, instead of a
hashmap.
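Here is a sketch of the idea. It assumes states are numbered 0..n-1 with 0 as the initial state. Each state's transitions (char ranges) live in arrays indexed directly by the state number, so stepping the DFA is an array lookup plus a short scan instead of a HashMap lookup keyed on a state object. ArrayDfa is a hypothetical class, not the automaton library's API.

```java
import java.util.Arrays;

// Hypothetical DFA: per-state transition data stored in arrays indexed by state number.
final class ArrayDfa {
    private final boolean[] accept;
    private final int[][] min, max, dest;   // [state][k] = k-th transition of that state
    private final int[] count;              // number of transitions per state

    ArrayDfa(int numStates) {
        accept = new boolean[numStates];
        min = new int[numStates][];
        max = new int[numStates][];
        dest = new int[numStates][];
        count = new int[numStates];
        for (int s = 0; s < numStates; s++) {
            min[s] = new int[2];
            max[s] = new int[2];
            dest[s] = new int[2];
        }
    }

    void setAccept(int state, boolean a) { accept[state] = a; }

    void addTransition(int from, char lo, char hi, int to) {
        int n = count[from];
        if (n == min[from].length) {        // grow this state's arrays
            min[from] = Arrays.copyOf(min[from], 2 * n);
            max[from] = Arrays.copyOf(max[from], 2 * n);
            dest[from] = Arrays.copyOf(dest[from], 2 * n);
        }
        min[from][n] = lo;
        max[from][n] = hi;
        dest[from][n] = to;
        count[from] = n + 1;
    }

    // Destination state for char c, or -1 if this state has no transition on c.
    int step(int state, char c) {
        for (int k = 0; k < count[state]; k++) {
            if (c >= min[state][k] && c <= max[state][k]) return dest[state][k];
        }
        return -1;
    }

    boolean run(CharSequence s) {
        int state = 0;
        for (int i = 0; i < s.length() && state != -1; i++) state = step(state, s.charAt(i));
        return state != -1 && accept[state];
    }
}
```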
compat. I added
it.
you get a chance to review it, I can create a new version of the flex branch
patch for this issue... this would resolve one of my "big 3 complaints" about
the complexity of the code.
put this nextValidUTF16String in UnicodeUtil and
also use it in SegmentReader.LegacyTermEnum to replace the "hack", just in case
someone else wrote an enum like mine.
+1
= new TermRef(t.text());
{code}
Instead, it could read something like: tr = new
TermRef(UnicodeUtil.nextValidUTF16String(t.text()));
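For reference, here is a standalone sketch of what a nextValidUTF16String helper could do. It follows the rules discussed in this thread: an unpaired lead surrogate is completed with the lowest trail surrogate or bumped, and an unpaired trail surrogate goes to \uE000. It returns the input unchanged when it is already well-formed; otherwise it returns the smallest well-formed string that sorts after it in UTF-16 code-unit order. This is an illustration, not the code that went into UnicodeUtil.

```java
// Hypothetical standalone helper, not Lucene's UnicodeUtil implementation.
final class Utf16Fix {
    static String nextValidUTF16String(String s) {
        StringBuilder sb = new StringBuilder(s);
        for (int i = 0; i < sb.length(); i++) {
            char ch = sb.charAt(i);
            if (Character.isHighSurrogate(ch)) {
                if (i + 1 == sb.length()) {
                    // unpaired lead at the end: smallest completion is lead + \uDC00
                    sb.append('\uDC00');
                    return sb.toString();
                }
                char next = sb.charAt(i + 1);
                if (Character.isLowSurrogate(next)) {
                    i++;                              // valid pair, skip the trail
                    continue;
                }
                if (next < '\uDC00') {
                    // raise the following char to the lowest trail surrogate
                    sb.setCharAt(i + 1, '\uDC00');
                    sb.setLength(i + 2);
                } else if (ch < '\uDBFF') {
                    // next is above the trail range: every pair with this lead sorts lower
                    sb.setCharAt(i, (char) (ch + 1));
                    sb.setCharAt(i + 1, '\uDC00');
                    sb.setLength(i + 2);
                } else {
                    // lead is already the last one: jump past all surrogates
                    sb.setCharAt(i, '\uE000');
                    sb.setLength(i + 1);
                }
                return sb.toString();
            }
            if (Character.isLowSurrogate(ch)) {
                // unpaired trail: bump to \uE000, the lowest "upper BMP" char
                sb.setCharAt(i, '\uE000');
                sb.setLength(i + 1);
                return sb.toString();
            }
        }
        return s; // already well-formed
    }
}
```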
.
So I think I no longer absolutely hate the Unicode handling code in this enum.
(validUTF16String), and pervert it slightly into nextValidUTF16String.
All the tests pass using this on trunk and flex, and I think it reads much
more easily.
dealt with it already)
// in this case we have to bump to \uE000 (the lowest possible "upper BMP")
unpaired L -> \uE000
edit: sorry for the many edits :)
e 4/5:
// an unpaired low surrogate. this is invalid when not preceded by lead surrogate
// (and if there was one, the above rules would have dealt with it already)
// in this case we have to bump to \uE000 (the lowest possible "upper BMP")
il an
accept state or a loop).
s (Author: markrmil...@gmail.com):
Sorry - haven't been paying a lot of attention to all of the Unicode
issues/talk lately.
Could you briefly explain cleanupPosition? What's the case where a seek position
cannot be converted to UTF-8?
or FilteredTermsEnum (see his branch patch, I think it's easier there).
If you have ideas on how we can simplify any of this in trunk for easier
readability (instead of just adding absurd amounts of comments as I did), I'd
be very interested.
n the position of testing anyway - else I'll look
like a moron when I +1 this thing ;)
bq. If you save this test setup,
I'll save it for sure.
here).
onfused. Please,
>>> can anybody help me figure out which part of the code you are working on? How
>>> should I participate in the work? Thank you!
>>>
>>> On Sat, Dec 5, 2009 at 1:02 PM, Uwe Schindler (JIRA) wrote:
ndard corpus). I think the benches you have
already done are probably plenty good for benefits testing.
going to take more time.
ss
I think you are right about the partial dump. I am indexing the full dump now
(at least I think). I will look at it too, at least for curiosity's sake.
of things.
bq. More interesting to see the benefits...
Right, but I'm not really testing for benefits - more for correctness and no
loss of performance. I think the benches you have already done are probably
plenty good for benefits testing.
panel]
>>>
>>> Uwe Schindler updated LUCENE-1606:
>>> --
>>>
>>> Attachment: (was: LUCENE-1606-flex.patch)
rsian corpus I mentioned
with nearly 500k terms...
o hear it's doing so well on such a "small" index as Wikipedia, as
I would think the automaton overhead would make it slower (although this can
probably be optimized away).
*xxx***
are testing
I'm not sure at the moment - but it's Wikipedia dumps, so I'd guess it's rather
high actually. It is hitting the standard analyzer going in (mainly because I
didn't think about changing it when building the indexes). And the queries are
getting hit with the lowercase fil
instead of rewrite but
with reverted LUCENE-2110, which was stupid.
orks on String.
btw, how many unique terms does the field you are testing have? This is where it starts
to help with ?, when you have a ton of unique terms.
But I am glad you are testing with a (hopefully) smaller number of unique terms; this is
probably more common.
is used (as I think Robert has mentioned).
So far I haven't seen any anomalies in time taken or anything of that nature.
very strange things like seeking forward
and backwards and returning all sorts of strange statuses.
Will think about one tomorrow.
gain wrong patch.
work for today, I am
exhausted like the enums.
LUCENE-2110.
Robert: Can you test performance again and compare with old?
your patch the performance is the same.
But the code is much simpler and easier to read... great work.
the nextSeekTerm method
to be more straightforward.
Robert: Sorry, it would be better to test this one *g*
with the old and new flex patch, I do
not want to commit 2110 before.
Uwe, I will run a benchmark on both versions!
, as soon as 2110 is
committed, I will upload a new patch. But it's hard to differentiate between all the
modified files.
Robert: Can you do performance tests with the old and new flex patch? I do not
want to commit 2110 before.
used in both modes without
any concern that it will ever hurt performance.
, for experimenting
or whatever.
Fantastic commenting man - this whole patch is
pretty darn thorough.
what else needs to be done here, please review if you can.
to invalid locations when walking through the DFA,
because these will be replaced by U+FFFD,
and terms could be skipped, or we go backwards, creating a loop.
That's why I spent so much time on this.
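Here is a small check along these lines, using only the JDK. A candidate seek target containing an unpaired surrogate does not survive a UTF-8 round trip, because the encoder substitutes a replacement. The term actually sought therefore differs from the one the DFA produced. SeekTargetCheck is a hypothetical helper, and the JDK's substitution is its own, not Lucene's conversion code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: detects seek targets that would be altered by UTF-8 conversion.
final class SeekTargetCheck {
    // True if encoding to UTF-8 and decoding back yields the same string.
    static boolean roundTripsThroughUtf8(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8).equals(s);
    }

    // Same question without encoding: every surrogate must be part of a lead+trail pair.
    static boolean isValidUtf16(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++;               // well-formed pair
                } else {
                    return false;      // lead without trail
                }
            } else if (Character.isLowSurrogate(c)) {
                return false;          // trail without lead
            }
        }
        return true;
    }
}
```

A target that fails either check has to be cleaned up (moved to the next valid string) before seeking. Otherwise the replacement can land the enum before or after the intended position.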
to valid UTF-8.
{code}
If you have ideas on how to make this nicer, I am happy to hear them.
lect this.
Currently I cheat and take advantage of this property (in trunk) to make the
code simpler.
now \u can be in the index, and I can seek to
it (it won't get replaced with \uFFFD).
Yes, \u should be untouched now (though I haven't verified -- actually
I'll go add it to the test we already have for \u).
I need
>to know, otherwise it will either skip \u terms, or go into a loop.
nt for the entire query. This can be determined from
the state/transitions of the path being evaluated, but it's not a one-liner!
branch to put it back...
) to make this determination, thanks for the
idea!
ek itself vs next() Lucene"
eek Lucene,
based on how costly nextString() is...
f I were to do this, then that would kill the TermRef comparison speedup,
because then no matter how much I optimize "my seek" nextString(), it needs to
do the Unicode conversion, which we have seen is expensive across many terms.
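Some background on the comparison speedup: comparing terms as raw UTF-8 bytes (treated as unsigned) gives Unicode code point order, while String.compareTo compares UTF-16 code units. The two orders agree except when a supplementary character (stored as a surrogate pair) is compared with a BMP character in U+E000..U+FFFF. That is exactly where the surrogate handling in this enum matters. TermOrder is a hypothetical helper, not the TermRef API.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: byte-wise term comparison, as a byte-based term reference could do it.
final class TermOrder {
    // Unsigned lexicographic comparison of two UTF-8 byte arrays (= code point order).
    static int compareUtf8(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF;   // treat bytes as unsigned
            int y = b[i] & 0xFF;
            if (x != y) return x - y;
        }
        return a.length - b.length;
    }

    static int compareAsUtf8(String a, String b) {
        return compareUtf8(a.getBytes(StandardCharsets.UTF_8), b.getBytes(StandardCharsets.UTF_8));
    }
}
```

For example, U+E000 sorts after U+1F600 under String.compareTo (0xE000 > 0xD83D), but before it as UTF-8 bytes (0xEE < 0xF0).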
rch through the indexed terms... and not doing a scan when it
determines the term you're seeking to is within the same index block.
But I don't think this'll impact your tests with a large suffix since each seek
will jump way ahead to a new index bloc
.
iteratively, in case someone builds some monster automaton from a 2 page regexp
or something like that.
[https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
Robert Muir updated LUCENE-1606:
Attachment: (was: LUCENE-1606.patch)
a finite language (in the wildcard case, no *), we should not
do the next() call.
But more benchmarking is needed, with more patterns, especially on the flex branch,
to determine if this heuristic is best.
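Here is one way to make the finite-language determination. It uses an explicit queue instead of recursion, so a very large automaton cannot overflow the stack. The language is finite iff there is no cycle among states that are both reachable from the start and able to reach an accept state. Transitions are given as plain adjacency lists, since char labels do not matter for this test, and Kahn's topological sort detects the cycle. FiniteCheck is a hypothetical helper, not the automaton library's own method.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: edges[s] lists destination states of s; state 0 is the start.
final class FiniteCheck {
    static boolean isFinite(int[][] edges, boolean[] accept) {
        int n = edges.length;
        boolean[] reach = bfs(edges, new int[] {0}, n);

        // Reverse graph, to find states that can reach an accept state.
        List<List<Integer>> rev = new ArrayList<>();
        for (int i = 0; i < n; i++) rev.add(new ArrayList<>());
        List<Integer> acceptStates = new ArrayList<>();
        for (int s = 0; s < n; s++) {
            for (int t : edges[s]) rev.get(t).add(s);
            if (accept[s]) acceptStates.add(s);
        }
        int[][] revArr = new int[n][];
        for (int i = 0; i < n; i++) revArr[i] = rev.get(i).stream().mapToInt(Integer::intValue).toArray();
        boolean[] live = bfs(revArr, acceptStates.stream().mapToInt(Integer::intValue).toArray(), n);

        // Kahn's algorithm on the useful states: anything left over lies on a cycle.
        int[] indeg = new int[n];
        int useful = 0;
        for (int s = 0; s < n; s++) {
            if (!reach[s] || !live[s]) continue;
            useful++;
            for (int t : edges[s]) if (reach[t] && live[t]) indeg[t]++;
        }
        ArrayDeque<Integer> queue = new ArrayDeque<>();
        for (int s = 0; s < n; s++) if (reach[s] && live[s] && indeg[s] == 0) queue.add(s);
        int removed = 0;
        while (!queue.isEmpty()) {
            int s = queue.poll();
            removed++;
            for (int t : edges[s]) {
                if (reach[t] && live[t] && --indeg[t] == 0) queue.add(t);
            }
        }
        return removed == useful;
    }

    // Iterative breadth-first search from the given start states.
    private static boolean[] bfs(int[][] edges, int[] starts, int n) {
        boolean[] seen = new boolean[n];
        ArrayDeque<Integer> queue = new ArrayDeque<>();
        for (int s : starts) {
            if (!seen[s]) { seen[s] = true; queue.add(s); }
        }
        while (!queue.isEmpty()) {
            int s = queue.poll();
            for (int t : edges[s]) {
                if (!seen[t]) { seen[t] = true; queue.add(t); }
            }
        }
        return seen;
    }
}
```

A loop on a dead (non-accepting, dead-end) state does not make the language infinite, which is why only the "useful" states are considered.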
ntation here, I'm hoping we can come up with better
ideas that work well on average.
One problem is, what is an "average" regular expression or wildcard query :)
Well, the seeks need to be done anyway... so you can't work around that. The
only question is whether a wasted next() was done before each, I guess...
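To show what "only seeks, no next()" can look like for a finite language, here is a sketch that intersects a sorted list of candidate strings with a sorted term dictionary. The candidates could be, for example, the expansion of a wildcard with no * over a small alphabet. TreeSet.ceiling stands in for seeking to the first term >= the target. Real term enums expose a different API, and the class is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

// Hypothetical helper: seek-only intersection of a finite candidate list with a term dictionary.
final class SeekOnlyIntersect {
    static List<String> intersect(NavigableSet<String> terms, List<String> sortedCandidates) {
        List<String> hits = new ArrayList<>();
        int i = 0;
        while (i < sortedCandidates.size()) {
            String target = sortedCandidates.get(i);
            String term = terms.ceiling(target);       // "seek": first term >= target
            if (term == null) break;                   // ran off the end of the dictionary
            if (term.equals(target)) {
                hits.add(term);
                i++;
            } else {
                // landed past the target: drop every candidate that sorts before this term
                while (i < sortedCandidates.size() && sortedCandidates.get(i).compareTo(term) < 0) i++;
            }
        }
        return hits;
    }
}
```

When a seek lands past its target, the candidates sorting before the landed term are skipped without issuing further seeks.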
nd compute the next place to go... (and create a few objects along the way)
ant rework of this maybe should take place in flex
(although I still think this is an improvement for trunk already), to fully
take advantage of it.
nd it gives me abcdaa back, I'll do the same thing
again.
The reason is, somewhere down the line there could be
abcdaa1234 :)
's "close").
That said, seeking on trunk is a lot more costly than seeking on flex, because
trunk has to make a new [cloned] SegmentTermEnum for each seek.
time ago, perhaps we should re-test to see if it's
appropriate?
e next XXX1234 term to try to
seek to (and we should never use next() on the enum)?