date:20111128

Re: PorterStemFilter causes wildcard searches to not work

2011-11-28 Thread SBS

I am applying the PorterStemFilter at both indexing and search time. As for the schema, I have 3 fields: title, subtitle and notes. When the user enters the query string a*itis, my software turns it into the actual Lucene query title:a*itis OR subtitle:a*itis OR notes:a*itis, and I get
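
A likely cause, sketched under assumptions: wildcard terms are not analyzed at query time (Lucene's QueryParser, for example, does not pass them through the analyzer), so a pattern written against the unstemmed word is compared with the stemmed terms that PorterStemFilter put in the index. The toyStem method below is a hypothetical stand-in that implements only one Porter step-1a rule (drop a final "s"), not the full algorithm, and the class name is illustrative. It shows how a*itis can stop matching once the indexed term loses its ending.

```java
import java.util.regex.Pattern;

class WildcardVsStem {
    // Toy stand-in for ONE Porter step-1a rule: drop a final "s" unless the word ends in "ss".
    // The real PorterStemFilter applies many more rules.
    static String toyStem(String word) {
        if (word.endsWith("s") && !word.endsWith("ss")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    // Lucene-style wildcard semantics: '*' matches any run of characters, '?' exactly one.
    static boolean wildcardMatches(String pattern, String term) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.matches(regex.toString(), term);
    }

    public static void main(String[] args) {
        String original = "arthritis";
        String indexed = toyStem(original); // the term this toy stemmer would leave in the index
        System.out.println(indexed + " matched by a*itis? " + wildcardMatches("a*itis", indexed));
        System.out.println(original + " matched by a*itis? " + wildcardMatches("a*itis", original));
    }
}
```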

Re: Analysers for newspaper pages...

2011-11-28 Thread Ian Lea

You can easily use just the CommonGrams stuff from Solr in your pure Lucene project. There are a couple of useful docs on stop words, common grams et al. at http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-1 and http://www.hathitrust.org/blogs/large-scale-search
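
The idea behind common grams, as a rough plain-Java sketch (not the Solr/Lucene CommonGramsFilter code): every token is kept, and whenever a token or its right-hand neighbour is in the common-word set, the pair is also emitted as one joined token. A phrase query containing a stop word can then match on the rarer bigram instead of the very long postings list of the stop word itself. The "_" separator and the class name are illustrative choices.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CommonGramsSketch {
    // Emits every unigram, plus a "left_right" bigram wherever either side is a common word.
    static List<String> commonGrams(List<String> tokens, Set<String> commonWords) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String current = tokens.get(i);
            out.add(current);
            if (i + 1 < tokens.size()) {
                String next = tokens.get(i + 1);
                if (commonWords.contains(current) || commonWords.contains(next)) {
                    out.add(current + "_" + next);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> common = new HashSet<>(Arrays.asList("of", "the"));
        System.out.println(commonGrams(Arrays.asList("out", "of", "the", "blue"), common));
    }
}
```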

Re: Analysers for newspaper pages...

2011-11-28 Thread Dawn Zoë Raison

Hi Steve, On 28/11/2011 19:43, Steven A Rowe wrote: I assume that when you refer to "the impact of stop words," you're concerned about query-time performance? You should consider the possibility that performance without removing stop words is good enough that you won't have to take any steps

RE: Analysers for newspaper pages...

2011-11-28 Thread Steven A Rowe

Hi Dawn, I assume that when you refer to "the impact of stop words," you're concerned about query-time performance? You should consider the possibility that performance without removing stop words is good enough that you won't have to take any steps to address the issue. That said, there are

RE: "fuzzy prefix" search

2011-11-28 Thread Uwe Schindler

Hi Meghana, You can only do that by directly instantiating the FuzzyQuery, not via parsed queries. Uwe
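
For context: in Lucene 3.x, FuzzyQuery has a constructor FuzzyQuery(Term term, float minimumSimilarity, int prefixLength), so the prefix length is a constructor argument rather than part of the major~0.6 query syntax. The plain-Java sketch below only illustrates what a prefix length does conceptually: the first prefixLength characters must match exactly, and only the rest is compared by edit distance. The maxEdits accept rule and the class name are hypothetical simplifications, not Lucene's similarity formula.

```java
class FuzzyPrefixSketch {
    // Classic dynamic-programming Levenshtein edit distance, keeping two rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Hypothetical accept rule: exact prefix first, then at most maxEdits edits on the remainder.
    static boolean matches(String queryTerm, String indexTerm, int prefixLength, int maxEdits) {
        int p = Math.min(prefixLength, queryTerm.length());
        if (!indexTerm.startsWith(queryTerm.substring(0, p))) {
            return false; // terms outside the prefix are never even compared
        }
        return editDistance(queryTerm.substring(p), indexTerm.substring(p)) <= maxEdits;
    }
}
```

With prefixLength 3, "mayor" is rejected for the query term "major" even though the two words are only one edit apart, which is the effect the prefix is meant to have.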

Analysers for newspaper pages...

2011-11-28 Thread Dawn Zoë Raison

Hi folks, I'm researching the best options to use for analysing/storing newspaper pages in our online archive, and wondered if anyone has any good hints or tips on good practice for this type of media? I'm currently thinking along the lines of using a customised StandardAnalyser (no stop wor

Re: Lucene index inside of a web app?

2011-11-28 Thread okayndc

Awesome. Thanks guys! On Mon, Nov 28, 2011 at 12:19 PM, Uwe Schindler wrote: > You can store the index in WEB_INF directory, just use something: > ServletContext.getRealPath("/WEB-INF/data/myIndexName");

RE: "fuzzy prefix" search

2011-11-28 Thread meghana

Hi Uwe, I need to do something similar... Can you please tell me how I can pass an integer in my fuzzy search query? For example, I am searching with q=major~0.6 and I want the fuzzy matching to apply only after the prefix "maj". How can I pass an integer to do that? Thanks. Uwe Schindler wrote > > Hi, > > You can pass

Re: Scoring a document using LDA topics

2011-11-28 Thread Sujit Pal

Hi Stephen, We are doing something similar, and we store as a multifield with each document as (d,z) pairs where we store the z's (scores) as payloads for each d (topic). We have had to build a custom similarity which implements the scorePayload function. So to find docs for a given d (topic), we
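
A minimal sketch of the storage side described above, assuming each (topic, weight) pair becomes one token whose payload holds the weight as four bytes. Lucene 3.x ships PayloadHelper.encodeFloat/decodeFloat for this kind of round trip; the ByteBuffer version here is a dependency-free stand-in, and the "topicId|weight" token format and class name are hypothetical. A custom Similarity's scorePayload would presumably decode these bytes back into the weight at query time.

```java
import java.nio.ByteBuffer;

class TopicPayloadSketch {
    // Encode a topic weight as a 4-byte big-endian IEEE-754 float, i.e. the payload bytes.
    static byte[] encodeWeight(float weight) {
        return ByteBuffer.allocate(4).putFloat(weight).array();
    }

    // Decode the weight from payload bytes starting at the given offset.
    static float decodeWeight(byte[] bytes, int offset) {
        return ByteBuffer.wrap(bytes, offset, 4).getFloat();
    }

    // Term text of a hypothetical "topicId|weight" token.
    static String termOf(String token) {
        return token.substring(0, token.indexOf('|'));
    }

    // Payload of a hypothetical "topicId|weight" token.
    static byte[] payloadOf(String token) {
        return encodeWeight(Float.parseFloat(token.substring(token.indexOf('|') + 1)));
    }
}
```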

Scoring a document using LDA topics

2011-11-28 Thread Stephen Thomas

List, I am trying to incorporate the Latent Dirichlet Allocation (LDA) topic model into Lucene. Briefly, the LDA model extracts topics (distributions over words) from a set of documents, and then represents each document as a topic vector. For example, documents could be represented as: d1 = (0,
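
Once each document is a topic vector, one straightforward way to score it against a query (itself inferred as a topic distribution) is the cosine similarity of the two vectors. This is a plain-Java sketch with made-up vectors, not how Stephen or Lucene does it; inside Lucene the same number would have to be produced by custom scoring.

```java
class TopicCosineSketch {
    // Cosine similarity between two topic vectors of equal length.
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("dimension mismatch");
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) return 0.0; // an all-zero vector matches nothing
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] query = {0.0, 0.8, 0.2}; // illustrative values only
        double[] d1 = {0.0, 0.7, 0.3};
        double[] d2 = {0.9, 0.0, 0.1};
        System.out.printf("d1=%.3f d2=%.3f%n", cosine(query, d1), cosine(query, d2));
    }
}
```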

RE: Lucene index inside of a web app?

2011-11-28 Thread Uwe Schindler

You can store the index in the WEB-INF directory; just use something like: ServletContext.getRealPath("/WEB-INF/data/myIndexName");

Re: Lucene index inside of a web app?

2011-11-28 Thread Ian Lea

Using a static string is fine - it just wasn't clear from your original post what it was. I usually use a full path read from a properties file so that I can change it without a recompile, have different settings on test/live/whatever systems, etc. Works for me, but isn't the only way to do it.
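
A minimal sketch of the properties-file approach, using only java.util.Properties and java.io.File. The property key index.dir, the resource name and the class name are hypothetical; the resulting absolute File is what you would then hand to Lucene (for example via FSDirectory.open in 3.x).

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

class IndexLocation {
    // Hypothetical property key; pick whatever fits your configuration.
    static final String KEY = "index.dir";

    // Resolve the index directory from loaded properties, falling back to a default path.
    // A relative value resolves against the JVM's working directory, which inside an
    // app server is rarely what you want, so configure a full path.
    static File resolve(Properties props, String defaultPath) {
        String path = props.getProperty(KEY, defaultPath).trim();
        return new File(path).getAbsoluteFile();
    }

    // Load properties from a classpath resource such as "/search.properties" (illustrative name).
    // A missing resource simply yields empty properties.
    static Properties load(String resource) throws IOException {
        Properties props = new Properties();
        try (InputStream in = IndexLocation.class.getResourceAsStream(resource)) {
            if (in != null) {
                props.load(in);
            }
        }
        return props;
    }
}
```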

Re: Lucene index inside of a web app?

2011-11-28 Thread okayndc

Hi, Thanks for your response. Yes, LUCENE_INDEX_DIRECTORY is a static string which contains the file system path of the index (for example, c:\\index). Is this good practice? If not, what should the full path to an index look like? Thanks On Mon, Nov 28, 2011 at 4:54 AM, Ian Lea wrote: > W

BigInteger usage in numeric Trie range queries

2011-11-28 Thread Jason Rutherglen

Even though the NumericRangeQuery.new* methods do not support BigInteger, the underlying recursive algorithm supports any sized number. Has this been explored?
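
To make the question concrete, here is a sketch of the trie range split carried over to BigInteger. It paraphrases the loop structure of the long-based range splitting in Lucene 3.x NumericUtils (splitLongRange); it is not that code. valSize is the bit width and precisionStep the number of bits dropped per level. Each emitted range is widened to the full-precision values it stands for, so it is easy to check that the pieces tile [lower, upper] exactly. Non-negative bounds are assumed; signed values would first need an order-preserving mapping to non-negative integers. The class and field names are hypothetical.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

class BigTrieSplit {
    // One sub-range: matches every value v with lo <= v <= hi; 'shift' is the trie level.
    static final class Range {
        final BigInteger lo, hi;
        final int shift;

        Range(BigInteger lo, BigInteger hi, int shift) {
            this.lo = lo;
            this.hi = hi;
            this.shift = shift;
        }

        @Override
        public String toString() {
            return "[" + lo + ".." + hi + "]@" + shift;
        }
    }

    // Split [lower, upper] (0 <= lower <= upper < 2^valSize) into trie sub-ranges.
    static List<Range> split(BigInteger lower, BigInteger upper, int valSize, int precisionStep) {
        List<Range> out = new ArrayList<>();
        for (int shift = 0; ; shift += precisionStep) {
            BigInteger diff = BigInteger.ONE.shiftLeft(shift + precisionStep);
            BigInteger mask = BigInteger.ONE.shiftLeft(precisionStep)
                    .subtract(BigInteger.ONE).shiftLeft(shift);
            boolean hasLower = lower.and(mask).signum() != 0;  // lower not aligned at this level
            boolean hasUpper = !upper.and(mask).equals(mask);   // upper not aligned at this level
            BigInteger nextLower = (hasLower ? lower.add(diff) : lower).andNot(mask);
            BigInteger nextUpper = (hasUpper ? upper.subtract(diff) : upper).andNot(mask);
            if (shift + precisionStep >= valSize || nextLower.compareTo(nextUpper) > 0) {
                // Lowest precision reached, or nothing is left for a coarser level.
                out.add(widen(lower, upper, shift));
                return out;
            }
            if (hasLower) out.add(widen(lower, lower.or(mask), shift));
            if (hasUpper) out.add(widen(upper.andNot(mask), upper, shift));
            lower = nextLower;
            upper = nextUpper;
        }
    }

    // A term at 'shift' ignores the low 'shift' bits, so it covers them from all-0 to all-1.
    private static Range widen(BigInteger lo, BigInteger hi, int shift) {
        BigInteger lowBits = BigInteger.ONE.shiftLeft(shift).subtract(BigInteger.ONE);
        return new Range(lo.shiftRight(shift).shiftLeft(shift),
                hi.shiftRight(shift).shiftLeft(shift).or(lowBits), shift);
    }
}
```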

Re: Taxonomy indexer debug

2011-11-28 Thread Doron Cohen

> > Could you minimize this to a small stand-alone program that does not work > > as expected? > > This will be hard, because of the bug only appearing after a couple of days > or more and i'm starting to think that it is triggered by high data > volumes. I'll try to minimize the code and serve mor

Re: Taxonomy indexer debug

2011-11-28 Thread Mihai Caraman

> Could you minimize this to a small stand-alone program that does not work as expected? This will be hard, because the bug only appears after a couple of days or more, and I'm starting to think that it is triggered by high data volumes. I'll try to minimize the code and serve more data to i

Re: Taxonomy indexer debug

2011-11-28 Thread Doron Cohen

The sequence of operations seems logical; I don't immediately see why this does not work. Could you minimize this to a small stand-alone program that does not work as expected? This will allow us to recreate the problem here and debug it. It is interesting that facet 3.5 is used with core 3.4 and queries 3.4

Re: Taxonomy indexer debug

2011-11-28 Thread Mihai Caraman

All packages used: core3.4, queries3.4, facet3.5. Once every 3 minutes I *refreshTax* and once per day I *reopenEveryting*. *InitWriters()* writer = new ThreadedIndexWriter taxWriter = new LuceneTaxonomyWriter // because the reader can't start if it doesn't have a valid taxIndex directory taxWriter.c

Re: Lucene index inside of a web app?

2011-11-28 Thread Ian Lea

What is LUCENE_INDEX_DIRECTORY? Some static string in your app? Lucene knows nothing about your app, JSP, or what app server you are using. It requires a file system path and it is up to you to provide that. I always use a full path since I prefer to store indexes outside the app and it avoids

Re: Lucene on Android: indexing, searching and highlighting

2011-11-28 Thread Ian Lea

As far as I'm aware, recent versions of Lucene, including the highlighter, should work out of the box. I'd guess that highlighting would be the most resource-intensive and therefore most troublesome bit. I'm not aware of any sample code showing Lucene working on Android, but from my very limited experi

Re: Lucene Query Parser

2011-11-28 Thread Ian Lea

Just use one of the search() methods that do sorting and specify an array of sort fields with SortField.SCORE first, then your name fields. But be aware that complex real-world textual queries and docs rarely produce identical scores. You could post-process the results and group them into "good
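
In Lucene 3.x the ready-made score entry for such a sort array is SortField.FIELD_SCORE, followed by SortFields for the name fields. The plain-Java comparator below mirrors that ordering on already-collected hits, and it also shows the caveat above: the name fields only change the order when two scores are exactly equal. The Hit type and its fields are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ScoreThenNameSort {
    // Hypothetical hit record: a score plus the two name fields used to break ties.
    static final class Hit {
        final float score;
        final String firstName, lastName;

        Hit(float score, String firstName, String lastName) {
            this.score = score;
            this.firstName = firstName;
            this.lastName = lastName;
        }

        @Override
        public String toString() {
            return score + " " + firstName + " " + lastName;
        }
    }

    // Score descending first; first name, then last name, only break exact score ties.
    static final Comparator<Hit> ORDER =
            Comparator.comparingDouble((Hit h) -> h.score).reversed()
                    .thenComparing((Hit h) -> h.firstName)
                    .thenComparing((Hit h) -> h.lastName);

    // Returns a sorted copy, leaving the input list untouched.
    static List<Hit> sorted(List<Hit> hits) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(ORDER);
        return copy;
    }
}
```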

Re: Do duplicate documents affect term scoring?

2011-11-28 Thread Ian Lea

Lucene won't be aware that you've got duplicate documents, but scoring does take account of the number of documents in which search terms appear. See http://lucene.apache.org/java/3_5_0/scoring.html and the javadocs for oal.search.Similarity. Only you can say whether or not you need to worry abou
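
A worked example of why duplicates can matter: in Lucene 3.x, DefaultSimilarity computes idf as 1 + ln(numDocs / (docFreq + 1)), and every indexed copy of a document counts toward both numbers. The figures below are made up; they only show that duplicating the documents that contain a term lowers that term's idf, and hence its weight, relative to terms that occur only in unique documents.

```java
class IdfDuplicates {
    // Same shape as DefaultSimilarity.idf in Lucene 3.x: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        int numDocs = 1000, docFreq = 9; // illustrative: 9 of 1000 docs contain the term
        System.out.printf("unique:     idf=%.4f%n", idf(docFreq, numDocs));
        // Suppose each of those 9 docs is indexed twice: 9 extra docs, all containing the term.
        System.out.printf("duplicated: idf=%.4f%n", idf(docFreq * 2, numDocs + docFreq));
    }
}
```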

Lucene Query Parser

2011-11-28 Thread Romiko Derbynew

Hi Guys, I am using Lucene with Neo4j. Currently I have queries working well with a combination of Exact and Fuzzy matches in one query. However, we desire a report that first takes the ranking and boosting as the highest priority, but then we want to sort by first name and last name, and alwa
