Lucene Query Parser

2011-11-28 Thread Romiko Derbynew
Hi Guys, I am using Lucene with Neo4j. Currently I have queries working well with a combination of exact and fuzzy matches in one query. However, we want a report that first takes the ranking and boosting as the highest priority, but then we want to sort by first name and last name, and

Re: Do duplicate documents affect term scoring?

2011-11-28 Thread Ian Lea
Lucene won't be aware that you've got duplicate documents, but scoring does take account of the number of documents in which search terms appear. See http://lucene.apache.org/java/3_5_0/scoring.html and the javadocs for org.apache.lucene.search.Similarity. Only you can say whether or not you need to worry

Re: Lucene Query Parser

2011-11-28 Thread Ian Lea
Just use one of the search() methods that does sorting and specify an array of sort fields with SortField.SCORE first, then your name fields. But be aware that complex real-world textual queries and docs rarely produce identical scores. You could post-process the results and group them into
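The ordering described above (score first, then the name fields) can be sketched in plain Java as a post-processing step. This is only an illustration under stated assumptions: `Hit` is a hypothetical stand-in for a search result, not a Lucene class, and the score banding (treating scores within the same `bandWidth` as ties) is one assumed way to cope with scores that are rarely identical, not something the thread prescribes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for one search result; this is not a Lucene class.
final class Hit {
    final float score;
    final String firstName;
    final String lastName;

    Hit(float score, String firstName, String lastName) {
        this.score = score;
        this.firstName = firstName;
        this.lastName = lastName;
    }
}

final class ScoreThenNameSort {
    // Assumption: scores falling in the same band of width bandWidth count as ties.
    static int band(float score, float bandWidth) {
        return (int) Math.floor(score / bandWidth);
    }

    // Returns a sorted copy: highest score band first, then last name, then first name.
    static List<Hit> sort(List<Hit> hits, float bandWidth) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator
                .comparingInt((Hit h) -> band(h.score, bandWidth))
                .reversed() // only the band comparison is reversed
                .thenComparing((Hit h) -> h.lastName)
                .thenComparing((Hit h) -> h.firstName));
        return sorted;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit(0.91f, "Bob", "Smith"));
        hits.add(new Hit(0.93f, "Alice", "Jones"));
        hits.add(new Hit(0.52f, "Aaron", "Adams"));
        hits.add(new Hit(0.95f, "Alan", "Jones"));
        for (Hit h : sort(hits, 0.1f)) {
            System.out.println(h.lastName + ", " + h.firstName + " (" + h.score + ")");
        }
    }
}
```

Inside Lucene itself the same ordering would normally be requested through a `Sort` built from `SortField.SCORE` followed by the name fields, as the reply says; the band width here is an arbitrary illustrative value.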

Re: Lucene on Android: indexing, searching and highlighting

2011-11-28 Thread Ian Lea
As far as I'm aware recent versions of Lucene, including the highlighter, should work out of the box. I'd guess that highlighting would be the most resource-intensive and therefore troublesome bit. I'm not aware of any sample code showing Lucene working on Android, but from my very limited

Re: Lucene index inside of a web app?

2011-11-28 Thread Ian Lea
What is LUCENE_INDEX_DIRECTORY? Some static string in your app? Lucene knows nothing about your app, JSP, or what app server you are using. It requires a file system path and it is up to you to provide that. I always use a full path since I prefer to store indexes outside the app and it avoids

Re: Taxonomy indexer debug

2011-11-28 Thread Mihai Caraman
All packages used: core3.4, queries3.4, facet3.5. Once every 3 minutes I *refreshTax* and once per day I *reopenEveryting*. *InitWriters()* writer = new ThreadedIndexWriter taxWriter = new LuceneTaxonomyWriter // because the reader can't start if it doesn't have a valid taxIndex directory

Re: Taxonomy indexer debug

2011-11-28 Thread Doron Cohen
The sequence of operations seems logical; I don't immediately see why this does not work. Could you minimize this to a small stand-alone program that does not work as expected? This would allow us to recreate the problem here and debug it. It is interesting that facet 3.5 is used with core 3.4 and queries

Re: Taxonomy indexer debug

2011-11-28 Thread Mihai Caraman
"Could you minimize this to a small stand-alone program that does not work as expected?" This will be hard, because the bug only appears after a couple of days or more, and I'm starting to think that it is triggered by high data volumes. I'll try to minimize the code and serve more data to

Re: Taxonomy indexer debug

2011-11-28 Thread Doron Cohen
Could you minimize this to a small stand-alone program that does not work as expected? This will be hard, because of the bug only appearing after a couple of days or more and i'm starting to think that it is triggered by high data volumes. I'll try to minimize the code and serve more data

BigInteger usage in numeric Trie range queries

2011-11-28 Thread Jason Rutherglen
Even though the NumericRangeQuery.new* methods do not support BigInteger, the underlying recursive algorithm supports any sized number. Has this been explored?

Re: Lucene index inside of a web app?

2011-11-28 Thread Ian Lea
Using a static string is fine - it just wasn't clear from your original post what it was. I usually use a full path read from a properties file so that I can change it without a recompile, have different settings on test/live/whatever systems, etc. Works for me, but isn't the only way to do it.
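Reading the index location from a properties file, as described above, might look roughly like the sketch below. The property key `lucene.index.dir` and the class name are made up for illustration, and the check that the value is an absolute path is an added assumption reflecting the "full path" preference, not a Lucene requirement.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

final class IndexPathConfig {
    // Hypothetical property key; use whatever name suits your application.
    static final String KEY = "lucene.index.dir";

    // Loads properties from the given reader and returns the configured index path.
    static Path indexPath(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException("Could not read index configuration", e);
        }
        String value = props.getProperty(KEY);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing property: " + KEY);
        }
        Path path = Paths.get(value.trim());
        // Assumption: insist on a full path so the location never depends on
        // the app server's working directory.
        if (!path.isAbsolute()) {
            throw new IllegalStateException(KEY + " must be an absolute path: " + value);
        }
        return path;
    }
}
```

In an application the reader would typically come from the properties file on each system (test, live, and so on), and the resulting path would then be handed to whatever opens the index directory.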

RE: Lucene index inside of a web app?

2011-11-28 Thread Uwe Schindler
You can store the index in the WEB-INF directory; just use something like: ServletContext.getRealPath("/WEB-INF/data/myIndexName"); - Uwe Schindler, H.-H.-Meier-Allee 63, D-28213 Bremen, http://www.thetaphi.de, eMail: u...@thetaphi.de

Scoring a document using LDA topics

2011-11-28 Thread Stephen Thomas
List, I am trying to incorporate the Latent Dirichlet Allocation (LDA) topic model into Lucene. Briefly, the LDA model extracts topics (distributions over words) from a set of documents, and then represents each document with topic vectors. For example, documents could be represented as: d1 = (0,

Re: Scoring a document using LDA topics

2011-11-28 Thread Sujit Pal
Hi Stephen, We are doing something similar. We store the topics as a multifield of (d,z) pairs on each document, with the z's (scores) stored as payloads for each d (topic). We have had to build a custom Similarity which implements the scorePayload function. So to find docs for a given d (topic), we
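The payload idea above can be sketched without Lucene on the classpath: each topic weight z is packed into a 4-byte payload at index time and unpacked when scoring. Everything here is an illustrative assumption, including the class name, the big-endian float layout, and the simplified `scorePayload` signature, which only mirrors the idea of Lucene's scoring hook rather than its exact parameters.

```java
import java.nio.ByteBuffer;

final class TopicPayload {
    // Packs a topic weight into 4 bytes (ByteBuffer is big-endian by default).
    static byte[] encode(float weight) {
        return ByteBuffer.allocate(4).putFloat(weight).array();
    }

    // Reads a 4-byte float back out of a payload starting at offset.
    static float decode(byte[] payload, int offset) {
        return ByteBuffer.wrap(payload, offset, 4).getFloat();
    }

    // Hypothetical scoring hook: uses the stored weight as the score factor,
    // and falls back to a neutral 1.0 when no usable payload is present.
    static float scorePayload(byte[] payload, int offset, int length) {
        if (payload == null || length < 4) {
            return 1.0f;
        }
        return decode(payload, offset);
    }
}
```

A custom Similarity would do the equivalent decoding inside its own scorePayload override, so that a query on topic d ranks documents by their stored z.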

RE: fuzzy prefix search

2011-11-28 Thread meghana
Hi Uwe, I need to do something similar. Can you please tell me how I can pass an integer in my fuzzy search query? For example, I am searching with q=major~0.6 and I want to match only terms that start with the prefix maj. How can I pass an integer to do that? Thanks. Uwe Schindler wrote: Hi, You can pass an

Re: Lucene index inside of a web app?

2011-11-28 Thread okayndc
Awesome. Thanks guys! On Mon, Nov 28, 2011 at 12:19 PM, Uwe Schindler u...@thetaphi.de wrote: You can store the index in the WEB-INF directory; just use something like: ServletContext.getRealPath("/WEB-INF/data/myIndexName");

Analysers for newspaper pages...

2011-11-28 Thread Dawn Zoë Raison
Hi folks, I'm researching the best options to use for analysing/storing newspaper pages in our online archive, and wondered if anyone has any good hints or tips on good practice for this type of media? I'm currently thinking along the lines of using a customised StandardAnalyzer (no stop

RE: fuzzy prefix search

2011-11-28 Thread Uwe Schindler
Hi Meghana, You can only do that by directly instantiating the FuzzyQuery, not via parsed queries. Uwe
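In Lucene 3.x, FuzzyQuery can be constructed directly with a Term, a minimum similarity, and an integer prefix length, which is what the reply above refers to. To show what the prefix length does, here is a simplified plain-Java sketch of prefix-constrained fuzzy matching. It is not Lucene's implementation: the class name, the similarity formula, and the >= comparison are assumptions chosen to resemble the usual behaviour.

```java
final class PrefixFuzzyMatch {
    // Standard Levenshtein edit distance using two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Similarity is 0 unless the candidate shares the first prefixLength characters
    // of the query; otherwise only the remaining suffixes are compared fuzzily.
    static float similarity(String query, String candidate, int prefixLength) {
        int p = Math.min(prefixLength, query.length());
        if (!candidate.startsWith(query.substring(0, p))) {
            return 0f;
        }
        String querySuffix = query.substring(p);
        String candidateSuffix = candidate.substring(p);
        int denominator = p + Math.min(querySuffix.length(), candidateSuffix.length());
        if (denominator == 0) {
            return querySuffix.equals(candidateSuffix) ? 1f : 0f;
        }
        return 1f - (float) editDistance(querySuffix, candidateSuffix) / denominator;
    }

    static boolean matches(String query, String candidate, float minSimilarity, int prefixLength) {
        return similarity(query, candidate, prefixLength) >= minSimilarity;
    }
}
```

With the query major, a minimum similarity of 0.6 and a prefix length of 3, this sketch accepts majors but rejects mayor, because mayor does not start with maj. With a prefix length of 0, mayor is accepted.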

RE: Analysers for newspaper pages...

2011-11-28 Thread Steven A Rowe
Hi Dawn, I assume that when you refer to the impact of stop words, you're concerned about query-time performance? You should consider the possibility that performance without removing stop words is good enough that you won't have to take any steps to address the issue. That said, there are

Re: Analysers for newspaper pages...

2011-11-28 Thread Dawn Zoë Raison
Hi Steve, On 28/11/2011 19:43, Steven A Rowe wrote: I assume that when you refer to the impact of stop words, you're concerned about query-time performance? You should consider the possibility that performance without removing stop words is good enough that you won't have to take any steps

Re: Analysers for newspaper pages...

2011-11-28 Thread Ian Lea
You can easily use just the CommonGrams stuff from Solr in your pure Lucene project. There are a couple of useful docs on stop words and common grams et al at http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-1

Re: PorterStemFilter causes wildcard searches to not work

2011-11-28 Thread SBS
I am applying the PorterStemFilter at both indexing and search time. As for schema, I have 3 fields: title, subtitle and notes. When the user enters a query string of "a*itis", my software turns this into an actual Lucene query of "title: a*itis OR subtitle: a*itis OR notes: a*itis" and I