Re: When does Query Parser do its analysis ?

2012-02-01 Thread Doron Cohen
> > In my particular case I add album catalogsno to my index as a keyword > field, but of course if the catalog number contains a space, as they often > do (e.g. cad 6), there is a mismatch. I've now changed my indexing to index > the value as 'cad6', removing spaces. Now if the query sent to the quer
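
A minimal sketch of the space-stripping idea described above, assuming the same normalization is applied both to the value at index time and to the user's input at query time. The class and method names (`CatalogNo`, `normalize`) are hypothetical and not part of Lucene; this only illustrates the string handling, not the indexing itself.

```java
// Hypothetical helper, not a Lucene API: removes all whitespace from a
// catalog number so that "cad 6" and "cad6" map to the same keyword value.
final class CatalogNo {

    // Strip every run of whitespace; case is left untouched.
    static String normalize(String raw) {
        return raw.replaceAll("\\s+", "");
    }

    public static void main(String[] args) {
        // The same call would be used for the indexed value and the query value.
        System.out.println(normalize("cad 6"));
        System.out.println(normalize("cad6"));
    }
}
```

Applying one function on both sides keeps the keyword field and the query term comparable, which is the mismatch the message describes.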

Re: When does Query Parser do its analysis ?

2012-02-01 Thread Paul Taylor
On 01/02/2012 22:03, Robert Muir wrote: On Wed, Feb 1, 2012 at 4:32 PM, Paul Taylor wrote: So it seems like it just broke the text up at spaces, and does text analysis within getFieldQuery(), but how can it make the assumption that text should only be broken at whitespace ? you are right, see

Re: When does Query Parser do its analysis ?

2012-02-01 Thread Chris Hostetter
: So it seems like it just broke the text up at spaces, and does text analysis : within getFieldQuery(), but how can it make the assumption that text should : only be broken at whitespace ? whitespace is a significant metacharacter to the Queryparser - it is used to distinguish multiple clauses
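
To illustrate the point about whitespace separating clauses before any analysis happens, here is a rough, self-contained sketch. It is not the QueryParser's actual grammar; `ClauseSplitter` is a hypothetical name, and the only rules modelled are "split on whitespace" and "keep a double-quoted phrase together". Each resulting chunk is what would then be handed to analysis separately.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not Lucene code: splits a query string into
// clauses on whitespace, treating a double-quoted phrase as one clause.
final class ClauseSplitter {

    static List<String> split(String query) {
        List<String> clauses = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;   // toggle phrase mode, keep the quote
                current.append(c);
            } else if (Character.isWhitespace(c) && !inQuotes) {
                if (current.length() > 0) {   // whitespace ends the clause
                    clauses.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            clauses.add(current.toString());
        }
        return clauses;
    }

    public static void main(String[] args) {
        System.out.println(split("dug up"));
        System.out.println(split("\"cad 6\" up"));
    }
}
```

Under this model, an unquoted `cad 6` reaches the analyzer as two separate chunks, while a quoted `"cad 6"` stays together.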

Re: When does Query Parser do its analysis ?

2012-02-01 Thread Robert Muir
On Wed, Feb 1, 2012 at 4:32 PM, Paul Taylor wrote: > > So it seems like it just broke the text up at spaces, and does text analysis > within getFieldQuery(), but how can it make the assumption that text should > only be broken at whitespace ? you are right, see this bug report: https://issues.apa

Re: Phrase Queries vs. SpanTermQueries exact phrases vs. stop words

2012-02-01 Thread Doron Cohen
> int gap = (pp[pp.length-1] - pp[0]) - (pp.length - 1); > > Don't want to cause an IndexOutOfBoundsException Right... that's what I meant with "(boundary cases)"...

RE: Phrase Queries vs. SpanTermQueries exact phrases vs. stop words

2012-02-01 Thread Paul Allan Hill
> Doron wrote: > > int gap = (pp[pp.length] - pp[0]) - (pp.length - 1); int gap = (pp[pp.length-1] - pp[0]) - (pp.length - 1); Don't want to cause an IndexOutOfBoundsException -Paul
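
A small sketch of the corrected gap computation, assuming `pp` holds the phrase term positions in increasing order. The class name `PhraseGap` and the choice to return 0 for a null or empty array are assumptions made here to cover the boundary cases mentioned in the thread; they are not taken from the original code.

```java
// Hypothetical helper: computes how many extra positions lie between the
// first and last phrase terms beyond what a contiguous phrase would need.
final class PhraseGap {

    static int gap(int[] pp) {
        // Boundary case (assumption): nothing to measure, so no extra gap.
        if (pp == null || pp.length == 0) {
            return 0;
        }
        // Last valid index is pp.length - 1; pp[pp.length] would be out of bounds.
        return (pp[pp.length - 1] - pp[0]) - (pp.length - 1);
    }

    public static void main(String[] args) {
        System.out.println(gap(new int[] {0, 1, 2})); // contiguous positions
        System.out.println(gap(new int[] {0, 2, 3})); // one skipped position
    }
}
```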

When does Query Parser do its analysis ?

2012-02-01 Thread Paul Taylor
So I subclass QueryParser and give it the query dug up; debugging shows it calls getFieldQuery(String field, String queryText, boolean quoted) twice, once with queryText=dug and once with queryText=up. But when I run it with the query dúg up, the first call is queryText=dúg even though the

RE: Phrase Queries vs. SpanTermQueries exact phrases vs. stop words

2012-02-01 Thread Paul Allan Hill
Thanks for the discussion, I really appreciate you pointing out that the > Code here ignores PhraseQuery (PQ) 's positions: And by "here" you mean my original code not your suggestion. > To accommodate for this, the overall extra gap can be added to the slope: > int gap = (pp[pp.length] -

Re: lucene-3.0.3

2012-02-01 Thread Sethi, Parampreet
Hi Prasad, I was looking through the documentation a few days ago and found helpful information in the Lucene FAQ. Here are the links: http://wiki.apache.org/lucene-java/LuceneFAQ#How_can_I_index_PDF_documents.3F http://wiki.apache.org/lucene-java/LuceneFAQ#How_can_I_index_file_formats_like_OpenDocument

RE: lucene-3.0.3

2012-02-01 Thread Prasad KVSH
Hi, Please find below our requirement and what we are trying to accomplish. Our client is looking for an extended search engine that searches for the given text inside documents (PDF, Msg, Excel, XML, Word, TXT etc.) and returns the list of file names where it finds the text. Using the return list we

RE: lucene-3.0.3

2012-02-01 Thread Prasad KVSH
Hi, We have added all the files, including PDF/Word/Excel/Txt files, but the search only finds matches in the text files. How do we strip text from the other document types (PDF/HTML/XML/MSword/PPT/XLS)? Thanks, Prasad K.V.S.H. * Project Manager * PACIFIC COAST STEEL (Pinnacle) Project Ness T

Join between indexes

2012-02-01 Thread Arnon Mazza
Assume we have a Lucene index over which several types of analyses are performed.   Assume that the conclusions of some analysis require that new tokens be added to existing documents in the index. For example, a repeating pattern p (sequence of words) that appears in a large part of the documen

Re: lucene-3.0.3

2012-02-01 Thread Erick Erickson
What did you try and what exceptions did you get? You might review: http://wiki.apache.org/solr/UsingMailingLists Best Erick On Wed, Feb 1, 2012 at 8:54 AM, Prasad KVSH wrote: > It will be great if you provide some working examples on this. We tried > to deploy solr.war but getting exceptions.

RE: lucene-3.0.3

2012-02-01 Thread Prasad KVSH
It will be great if you provide some working examples on this. We tried to deploy solr.war but getting exceptions. Thanks Prasad -Original Message- From: Ian Lea [mailto:ian@gmail.com] Sent: Wednesday, February 01, 2012 7:22 PM To: java-user@lucene.apache.org Subject: Re: lucene-3.0.

RE: lucene-3.0.3

2012-02-01 Thread Prasad KVSH
Hi Karthik, I appreciate your quick response. I guess the next question is how to strip the text from PDF/HTML/XML/MSword/PPT/XLS and where it will be stored for indexing. What are the other scenarios (like adding files, deleting files) where we need to execute indexfiles.class? Thanks Prasa

Re: lucene-3.0.3

2012-02-01 Thread Ian Lea
You could also take a look at Solr. From http://lucene.apache.org/solr/features.html * Easy ways to pull in data from databases and XML files from local disk and HTTP sources * Rich Document Parsing and Indexing (PDF, Word, HTML, etc) using Apache Tika Sounds just what you need. -- Ian. O

Re: lucene-3.0.3

2012-02-01 Thread KARTHIK SHIVAKUMAR
Hi >>lucene-3.0.3 can be used for searching a text from Lucene's primary job is to do text search, be it PDF/HTML/XML/MSword/PPT/XLS. You need plugin code to do 2 things: 1) Strip text from any of the documents (PDF/HTML/XML/MSword/PPT/XLS) 2) Index this processed text us

lucene-3.0.3

2012-02-01 Thread Prasad KVSH
Hi, lucene-3.0.3 can be used for searching a text from PDF, xlsx, docx, doc, xls, msg, TXT files. Is there any common function to accomplish this? Please help me on this. Thanks Prasad

Re: Does Fuzzy Search scores the same as Exact Match

2012-02-01 Thread Paul Taylor
On 28/01/2012 11:22, Uwe Schindler wrote: -Original Message- From: Paul Taylor [mailto:paul_t...@fastmail.fm] Sent: Saturday, January 28, 2012 10:33 AM To: 'java-user@lucene.apache.org' Subject: Does Fuzzy Search scores the same as Exact Match All things being equal does a fuzzy match gi

RE: Lucene 2.9.4 Wildcard Search, Boost and Sorting

2012-02-01 Thread Lutz Fechner
Thanks for the quick response. Will try to do it this way: Query q = null; MultiFieldQueryParser par = new MultiFieldQueryParser(Version.LUCENE_29, searchFields, analyzer, boosts); par.setMultiTermRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRIT

RE: Lucene appears to use memory maps after unmapping them

2012-02-01 Thread Uwe Schindler
Hi one addition: In the coming Lucene 3.6 there are more safety checks in MMapDirectory so the SIGSEGV is more unlikely (it tracks cloned index input in a thread safe list on close). But this only *helps* to find the issue, but does not guarantee that your JVM crashes, sorry. As Robert and Mike

Re: Lucene appears to use memory maps after unmapping them

2012-02-01 Thread Michael McCandless
On Tue, Jan 31, 2012 at 9:42 PM, Trejkaz wrote: > So when we close() our own TextIndex wrapper class, it would call > decRef() - but if another thread is still using the index, this call > to decRef() wouldn't actually close the reader. IMO, this wouldn't > really satisfy the meaning of "close" f

Re: Why read past EOF

2012-02-01 Thread Michael McCandless
Right, you have to ensure (by using the "right" IndexDeletionPolicy) that no commit is ever removed until all readers open against that commit have been closed. "Normally" the filesystem ensures this for us (protects still-open files from being deleted), but NFS (unfortunately!) lacks such semanti

RE: Lucene 2.9.4 Wildcard Search, Boost and Sorting

2012-02-01 Thread Uwe Schindler
Hi, all MultiTermQueries are constant score by default since Lucene 2.9; you can change that back to scoring mode: WildcardQuery.setRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE) This slows down the query immensely or throws TooManyClauses exceptions if too many terms match the wildcar

Lucene 2.9.4 Wildcard Search, Boost and Sorting

2012-02-01 Thread Lutz Fechner
Hi, I have an issue with Lucene 2.9.4 and sorting of wildcard queries. If I set a boost to some documents during indexing like this: doc.setBoost(1000.00); and execute a query like this: PRODUCT_GROUP:2020* I don't get results with a high boost value returned before the documents with no b

Re: upgrading from 3.0.3 to 3.5.0

2012-02-01 Thread Ian Lea
The javadocs for ParallelReader say that all indexes must have same number of docs and all be created and modified the same way. Doesn't sound like your shards. I think you need to create a MultiReader on top of the readers for your individual shards and pass that to the IndexSearcher constructor

Re: Find similar documents of different types

2012-02-01 Thread Ian Lea
I'm not clear exactly what you are asking but I think you will have to build your TermQuery instances one at a time and that sounds fine, if it does what you want and is sufficiently fast. -- Ian. On Tue, Jan 31, 2012 at 1:34 PM, Pedro Lacerda wrote: > For the first strategy i'm using MoreLike

Re: upgrading from 3.0.3 to 3.5.0

2012-02-01 Thread Ganesh
Thanks Ian. >>The deprecation warning in the javadocs says "Please pass an ExecutorService >>to IndexSearcher, instead" so I'd do that. I may need to use IndexSearcher(Reader, ExecutorService). I have sharded my index. Say if i have 10 indexes then i will have 10 IndexSearchers. How to use this

Re: upgrading from 3.0.3 to 3.5.0

2012-02-01 Thread Ian Lea
> I am upgrading from 3.0.3 to 3.5.0. > > 1) NumberTools is deprecated. I am converting long to string and storing it > in Index. Now this is deprecated. If i replace this API with NumericUtils / > NumericField, will it work for existing index? Whether i need to rebuild the > index? You will ne

Re: Apache Lucene file search

2012-02-01 Thread Ian Lea
I suggest you look at Solr instead of lucene. http://lucene.apache.org/solr/ -- Ian. On Wed, Feb 1, 2012 at 7:40 AM, Dheeraj Kv wrote: > Hi > I learnt about Lucene from google and i thought of implementing it my > company. > I don't want to use Lucene as a web search application. I hav

Re: using character '%' in queries (Lucene v3.1.0)

2012-02-01 Thread Ian Lea
And have you used Luke to see exactly what is being indexed, as Erick suggested? See http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2BAC8_incorrect_hits.3F. for other things to check. -- Ian. On Wed, Feb 1, 2012 at 6:43 AM, Gal Mainzer wrote: > I tried to use escapin

Apache Lucene file search

2012-02-01 Thread Dheeraj Kv
Hi, I learnt about Lucene from Google and I thought of implementing it in my company. I don't want to use Lucene as a web search application. I have a large backup storage which consists of HTML files, doc files and PDF files. I need to search inside a file as well as search for file names

upgrading from 3.0.3 to 3.5.0

2012-02-01 Thread Ganesh
Hello all, I am upgrading from 3.0.3 to 3.5.0. 1) NumberTools is deprecated. I am converting long to string and storing it in the index. If I replace this API with NumericUtils / NumericField, will it work for the existing index, or do I need to rebuild the index? 2) I am u

RE: too many boolean clauses

2012-02-01 Thread Uwe Schindler
I would recommend using TermsFilter (http://goo.gl/BC9eQ, possibly wrapped by a ConstantScoreQuery). You must do the query building by hand, the query *parser* cannot do that: TermsFilter tf = new TermsFilter(); // it is in lucene-queries.jar tf.addTerm(new Term("id", val1)); tf.addTerm(new Term("id"