Lucene Searching

2011-06-22 Thread Pranav goyal
Hi all, I am in a fix regarding Lucene search. I know a little bit about Lucene and have successfully created an index and run a lot of queries against it. My main worry is that whenever I search for, let's say, "000" it doesn't give me any result, while if I search for "0341" it'll give me a hit. Ev

Re: Lucene Searching

2011-06-22 Thread Pranav goyal
I can always use * or ?, but that is not what I am asking about here. I just want to get everything which has 341 in it. How do I do that without * or ? On Wed, Jun 22, 2011 at 1:00 PM, Pranav goyal wrote: > Hi all, > > I am in a fix regarding lucene search. I know a little bit about lucene and > have successfu
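
One way to match "341" inside a longer value without query-time wildcards is to index overlapping character n-grams, so that each substring of a chosen length becomes a term of its own. The thread does not say this approach was used; the sketch below is plain Java rather than Lucene code, and the names `NGramSketch` and `ngrams` are hypothetical. Lucene's contrib analyzers include n-gram tokenizers that serve a similar purpose.

```java
import java.util.ArrayList;
import java.util.List;

public class NGramSketch {

    // Returns every substring of length n, in order of position.
    // Hypothetical helper for illustration; not part of Lucene.
    static List<String> ngrams(String text, int n) {
        List<String> grams = new ArrayList<String>();
        for (int i = 0; i + n <= text.length(); i++) {
            grams.add(text.substring(i, i + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Sample value taken from a later message in this thread.
        List<String> grams = ngrams("0341-000-000-DR", 3);

        // If each gram were indexed as a term, an exact term lookup
        // would find the document without a leading wildcard.
        System.out.println(grams.contains("341"));
        System.out.println(grams.contains("000"));
    }
}
```

The trade-off is a larger index, since every value contributes many terms, in exchange for avoiding leading-wildcard scans at query time.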

Re: why is query picking up extra result

2011-06-22 Thread Ian Lea
[20110601 TO ] is the way to do it, because these are string fields being checked with string comparisons. If your numbers are variable length you will need to pad them. Or look at NumericField and NumericRangeQuery. Faster and better. -- Ian. On Tue, Jun 21, 2011 at 6:34 PM, Hiller,
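
The padding advice can be illustrated in plain Java. This is a sketch of why string comparison needs fixed-width values, not Lucene code: `PaddedKeys`, `pad`, and `inRange` are hypothetical names, and the upper bound "99999999" is an assumed stand-in for the open-ended range in the message.

```java
import java.util.Locale;

public class PaddedKeys {

    // Left-pads a non-negative number with zeros to a fixed width so that
    // lexicographic order matches numeric order. Hypothetical helper.
    static String pad(long value, int width) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values need a different encoding");
        }
        String padded = String.format(Locale.ROOT, "%0" + width + "d", value);
        if (padded.length() > width) {
            throw new IllegalArgumentException("value does not fit in " + width + " digits");
        }
        return padded;
    }

    // Inclusive range check using plain string comparison, which is how
    // a range over string terms behaves.
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        // Unpadded: "9" sorts after "10" as a string.
        System.out.println("9".compareTo("10") > 0);

        // Padded to the same width: "0009" sorts before "0010".
        System.out.println(pad(9, 4).compareTo(pad(10, 4)) < 0);

        // Fixed-width dates such as yyyyMMdd already compare correctly.
        System.out.println(inRange(pad(20110622, 8), "20110601", "99999999"));
    }
}
```

NumericField and NumericRangeQuery, as the reply notes, avoid the need for manual padding altogether.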

Re: I have seen this exception on some posts around but don't see the cause/solution(RamDirectory)..

2011-06-22 Thread Ian Lea
At a guess you are trying to open a searcher on a RAMDirectory that doesn't yet contain anything. Files only get written when stuff is added to an index and the writer is closed or committed. -- Ian. On Tue, Jun 21, 2011 at 11:43 PM, Hiller, Dean x66079 wrote: > Anyone know how to do a simpl

Re: Lucene Searching

2011-06-22 Thread Ian Lea
What does Luke show as being indexed for that field? Other useful tips at http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2BAC8_incorrect_hits.3F If that field is numeric you could use a NumericField - gets rid of problems with leading zeros. If by "I just want to get eve

Re: any optimizations I can make on this code

2011-06-22 Thread Ian Lea
So, you are reading 100 million records from somewhere and are writing each record to one of 1 million indexes? Really 1 million, with an average of 100 docs in each? 17 hours doesn't sound too bad to me. Before worrying about lucene performance you should double check everything else - in general

Re: questions about fieldCache

2011-06-22 Thread Bernd Fehling
OK, after some sorting fieldCache has some entries and also all other caches. Next I called optimize which started a new searcher. All caches are cleared, _except_ fieldCache. I then started a GC with jconsole and the logfile reported "Full GC". The heap reduced its size but the fieldCache is _stil

Re: ComplexPhraseQueryParser with multiple fields

2011-06-22 Thread Ahmet Arslan
> Which of the solutions did you find to work better? > Can you please say which package should I change it to if I > choose to do it > that way? I think changing package name of ComplexQueryParser is easier. This way you can use existing patch directly. Plus, do you mind voting https://issues.a

A useful resource for Lucene related questions?

2011-06-22 Thread Seth Rosen
Hey guys, I've been a big fan of Lucene and have been working with it for some time. While trying to use Lucene I often found myself scouring the web, javadoc, source code, and this mailing list for answers to many of the questions I had. I wished there was a site where I could find solutions to c

[Announce] Solr 3.2 with RankingAlgorithm

2011-06-22 Thread Nagendra Nagarajayya
Hi! I would like to announce the availability of Solr 3.2 with RankingAlgorithm. Please download and give the new version a try. This version of RankingAlgorithm exposes a Lucene-compatible API, so almost all Solr features should work as is. Note: NRT support will be available by next

questions about searching lucene 3.2

2011-06-22 Thread Bob Rhodes
Hi all, I have some questions about searching 3.2. I have just upgraded from 2.4.1 to 3.2. I am using the standard analyzer to create the index and to search, and one of the fields is called "querytext" and it has content like this among other things: phoneNumber="(904) 555-1212". I've tried many d

[Announce] RankingAlgorithm exposed as Lucene 3.2.0 Api

2011-06-22 Thread Nagendra Nagarajayya
Hi! I would like to announce the release of RankingAlgorithm exposed as Lucene 3.2.0 api and would like to invite you to try it out. Since RankingAlgorithm is exposed as the Lucene API, no code changes are needed. Just download the new lucene-core-3.2.0.jar and the rankingalgorithm.jar, drop

IndexWriter.optimize not using it breaks my test case :(

2011-06-22 Thread Hiller, Dean x66079
I read that in a lot of cases IndexWriter.optimize does not have to be called. I then deleted it and my junit test case broke because results were coming back in the query that were not supposed to be coming back :(. I think everything is single tested. Maybe I should write a more raw junit te

RE: I have seen this exception on some posts around but don't see the cause/solution(RamDirectory)..

2011-06-22 Thread Hiller, Dean x66079
That was it, thanks!!! Dean -Original Message- From: Ian Lea [mailto:ian@gmail.com] Sent: Wednesday, June 22, 2011 2:35 AM To: java-user@lucene.apache.org Subject: Re: I have seen this exception on some posts around but don't see the cause/solution(RamDirectory).. At a guess you are

RE: questions about searching lucene 3.2

2011-06-22 Thread Bob Rhodes
Here is a follow-up. This is a larger example of some of the text I'm searching in my index: The quoted name/value pairs are in the index. " middleName="D", zip="", lastName="ADAMSON", street="00 SOME ST", addAssociates="true", state="CA", city="ROCHESTER", source="SOMESOURCE", person_user="TES

Suggestion: make some more TokenFilters KeywordAttribute aware

2011-06-22 Thread Sujit Pal
Hello, I am currently in need of a LowerCaseFilter and StopFilter that will recognize KeywordAttribute, similar to the way PorterStemFilter currently does (on trunk). Specifically, when KeywordAttribute.isKeyword() is true for a term, the filters should not lowercase it or remove it, respectively. This can be ach

field sorted searches with unbounded hit count

2011-06-22 Thread Tim Eck
For the searches I want to run on my index I want to return all matching documents (as opposed to N top hits). My first naïve approach was just to use Searcher.search(query, filter, Integer.MAX_VALUE, sort) – that is, pass Integer.MAX_VALUE for the number of possible docs to return. That unfort

Re: highlighting performance

2011-06-22 Thread Itamar Syn-Hershko
I'm not intimately familiar with FVH myself, but that sounds reasonable. Tests usually don't lie. I'd definitely like to see a patched version that avoids that! Itamar. On 22/06/2011 05:29, Michael Sokolov wrote: OK - it seems as if there is a blow-up in FieldPhraseList if a document has a la

NumericField with many, many values

2011-06-22 Thread Nick Pellow
Hi, I have a use-case where a single Document in Lucene contains a single NumericField that could potentially have 100s of 1000s of values. Values are being added to a document instance like so: List fields = // get fields, possibly 100s of 1000s with the same name, but a diffe

how to approach phrase queries and term grouping

2011-06-22 Thread Jason Guild
Hi All: I am new to Lucene and my project is to provide specialized search for a set of booklets. I am using Lucene Java 3.1. The basic idea is to run queries to find out which booklets and page numbers match, in order to help people know where to look for information in the (rather large

Re: Suggestion: make some more TokenFilters KeywordAttribute aware

2011-06-22 Thread Simon Willnauer
On Wed, Jun 22, 2011 at 8:53 PM, Sujit Pal wrote: > Hello, > > I am currently in need of a LowerCaseFilter and StopFilter that will > recognize KeywordAttribute, similar to the way PorterStemFilter > currently does (on trunk). Specifically, in case the term is a > KeywordAttribute.isKeyword(), it

Re: questions about searching lucene 3.2

2011-06-22 Thread Simon Willnauer
As far as I understand you have 2 different problems. 1. Searching a 2.4 index with 3.2 code using the standard analyzer: in this case you should either reindex or pass Version.LUCENE_24 to the StandardAnalyzer constructor, which should help here. 2. Searching a string with parentheses with the query parser: you s
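
The reply is cut off before it says how to handle the parentheses, so the following is an assumption rather than the author's advice: a common approach is to backslash-escape characters that the query parser treats as syntax. Lucene's QueryParser provides a static `escape(String)` method for this; the plain-Java sketch below only illustrates the idea, and `QueryEscapeSketch` and its character list are hypothetical.

```java
public class QueryEscapeSketch {

    // Characters treated here as query syntax. This list is an assumption
    // made for illustration; check the query parser documentation for the
    // version in use.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    // Prefixes each special character with a backslash.
    static String escape(String input) {
        StringBuilder sb = new StringBuilder(input.length() + 8);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Phone number format quoted in the original question.
        System.out.println(escape("(904) 555-1212"));
    }
}
```

Escaping only stops the parser from reading the parentheses as grouping. Whether the escaped text then matches still depends on how the analyzer tokenized the field at index time.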

Re: Lucene Searching

2011-06-22 Thread Pranav goyal
I tried it and it worked, although there is one peculiarity. When I search for Item_1 : it gives me 110 hits but when I use *Item_1* it gives me 0 hits. What mistake am I making here? Also when I search for *341* it is giving me correct results, i.e. 0341-000-000-DR, but it's not working for a