Re: multiterm numbers regexp search

2014-12-15 Thread Valentin Popov
Thanks, will try. > On Dec 15, 2014, at 21:02, Allison, Timothy B. wrote: > > If you can't change the analyzer, you can programmatically build a > MultiPhraseQuery (you'd have to fill in the alternatives ... not a great > option) or a SpanNearQuery composed of span-wrapped RegexpQueries (re

Lucene DocValuesField, SortedDocValuesField usage for filtering and sorting

2014-12-15 Thread Piotr Idzikowski
Hello. I am going to switch to the newest (4.10.2) version of Lucene and I'd like to make some optimizations in my index and code. I would like to use DocValuesField to get values but also for filtering and sorting. So here I have some questions: If I'd like to use a range filter (FieldCacheRangeFilter) I

RE: multiterm numbers regexp search

2014-12-15 Thread Allison, Timothy B.
If you can't change the analyzer, you can programmatically build a MultiPhraseQuery (you'd have to fill in the alternatives ... not a great option) or a SpanNearQuery composed of span-wrapped RegexpQueries (rewrites are taken care of for you). You might also want to look into using the ComplexP

Benchmark testing Lucene index

2014-12-15 Thread Vijay B
We have our index located on NFS. While benchmark testing, we noticed that the first query takes a lot of time, while the same query completes quickly the second time. One of the reasons for this could be fscache. To eliminate the effect of caching, before starting we plan to umount and mount the NFS filesystem o

MMapdirectory

2014-12-15 Thread Vijay B
> > Finally we are seeing great improvement once we switch to 64-bit java and > MMapDirectory. Our Test run (multiple requests) used to take 26 minutes on > 32-bit and is now improved to 10 minutes on 64-bit java. > > We load stored documents from lucene and pass the documents to a third > party li

Re: multiterm numbers regexp search

2014-12-15 Thread Valentin Popov
Mike, thanks. The problem is that we can't change the analyzer, as the bank needs to search not only for card numbers for compliance, and the existing storage already holds hundreds of millions of emails. My thinking is to make a multiterm regexp search query, or a search for a combination of regexp queries with some distance betwee

Re: multiterm numbers regexp search

2014-12-15 Thread Michael Sokolov
You probably don't want to use StandardAnalyzer: maybe try WhitespaceAnalyzer, but you'll need to enhance your regex a little to deal with punctuation since WA may give you tokens like: 5106-7922-9469-8422. "5106-7922-9469-8422" etc -Mike On 12/15/14 3:45 AM, Valentin Popov wrote: I have
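
A rough sketch of the kind of punctuation-tolerant pattern Mike describes, written with plain java.util.regex rather than Lucene so it runs on its own. The class name, method name, and the exact pattern are assumptions made for illustration; Lucene's RegexpQuery uses a different regexp syntax, so the pattern would need to be translated before being used in a query. Like a Lucene regexp, `matches()` here has to cover the whole token.

```java
import java.util.regex.Pattern;

public class CardTokenPattern {
    // Hypothetical pattern: optional surrounding punctuation (quotes, a
    // trailing period), then 16 digits starting with 51-55, with an
    // optional dash between each group of four digits.
    static final Pattern MASTERCARD_TOKEN = Pattern.compile(
        "\\p{Punct}*5[1-5][0-9]{2}-?[0-9]{4}-?[0-9]{4}-?[0-9]{4}\\p{Punct}*");

    // True when the whole whitespace-delimited token looks like a card number.
    static boolean looksLikeCardToken(String token) {
        return MASTERCARD_TOKEN.matcher(token).matches();
    }

    public static void main(String[] args) {
        // Sample tokens as a whitespace tokenizer might emit them.
        String[] tokens = {
            "5106-7922-9469-8422.",
            "\"5106-7922-9469-8422\"",
            "5106792294698422",
            "4106-7922-9469-8422"
        };
        for (String token : tokens) {
            System.out.println(token + " -> " + looksLikeCardToken(token));
        }
    }
}
```

This only checks the shape of a token; it does not validate that the digits form a real card number.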

Re: Facet Result Order

2014-12-15 Thread patel mrugesh
Hi Shai, Thanks for the reply. "refreshed" meaning if I come to the facet page after closing it, the order changes for facets having the same count. I have already mentioned sample data in my first post. Thanks again, Mrugesh On Sunday, 14 December 2014 6:56 PM, Shai Erera wrote:

Re: multiterm numbers regexp search

2014-12-15 Thread Valentin Popov
Nope, this is for a compliance request for a banking system; have a look at PCI DSS. @wmartinusa, please do not add to the traffic if you have nothing to say about the subject. > On Dec 15, 2014, at 11:54, wmartinusa wrote: > > Sounds crooked. R u a criminal? > > > Sent from my LG Optimus G

multiterm numbers regexp search

2014-12-15 Thread Valentin Popov
I need to find MasterCard numbers with a regular expression. I’m using Query query = new RegexpQuery(new Term("body", "5{1}<1-5>{1}<0-9>{14}"), RegExp.ALL) to search for numbers in the email’s body, and StandardAnalyzer is used for body indexing. So a number like 5106792294698422 will be indexed as it