Re: Searching for a strict prefix

2010-05-24 Thread Ian Lea
StandardAnalyzer should work fine. Mark the field as indexed; there is no need to store it unless you want to retrieve it for display. Query via QueryParser using "tagname: updateC*" or programmatically via PrefixQuery. That said, I'm not sure exactly what you mean by "strict prefix". If you mean that the

Re: About loading lazily

2010-05-24 Thread Grant Ingersoll
I'd also add that the Document keeps a pointer to the spot in storage where that value can be loaded from. It can result in a performance saving in the typical search use case where one is displaying just "metadata" fields on a page, but not the full content. In this case, the full content pag

Re: Arrange terms[i]

2010-05-24 Thread Grant Ingersoll
On May 20, 2010, at 5:15 AM, manjula wijewickrema wrote: > Hi, > > I wrote a program to get the frequencies and terms of an indexed document. > The output comes as follows: > > > If I print : +tfv[0] > > Output: > > array terms are:{title: capabl/1, code/2, frequenc/1, lucen/4, over/1, > samp

Applying term frequency thresholds on indexing time

2010-05-24 Thread Xaida
Hi guys! Is there a way to define a threshold on the terms I want to store in the index (before they are indexed)? I need to store only the terms with the highest frequencies. I did it with term vectors and a cutoff ratio that cuts off the least frequently occurring terms, but all this is, of course, work
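To make the idea of a frequency cutoff concrete, here is a minimal plain-Java sketch. It is not Lucene code and not the poster's implementation: `TermFrequencyCutoff`, `countTerms`, `keepFrequent`, and the rule "keep a term if its count is at least `cutoffRatio` times the highest count" are all assumptions chosen for illustration. The tokens are assumed to be already analyzed (lowercased, stemmed) before counting.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, independent of Lucene: counts tokens and drops the
// rarely occurring ones before anything is handed to an indexer.
final class TermFrequencyCutoff {

    // Counts how often each token occurs, preserving first-seen order.
    static Map<String, Integer> countTerms(List<String> tokens) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    // Keeps only terms whose count is >= cutoffRatio * (highest count).
    // The ratio-based rule is an assumption; other cutoffs are possible.
    static Map<String, Integer> keepFrequent(Map<String, Integer> counts, double cutoffRatio) {
        int max = 0;
        for (int count : counts.values()) {
            max = Math.max(max, count);
        }
        double threshold = cutoffRatio * max;
        Map<String, Integer> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= threshold) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }
}
```

Only the surviving terms would then be joined back into the text that is passed to the indexer, so the filtering happens before indexing rather than through term vectors afterwards.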

Re: Searching for a strict prefix

2010-05-24 Thread Shlomy Reinstein
Hi, Thanks. By "strict prefix", I meant a prefix of the name (case-insensitive). What you suggest ("tagname: updateC*") was the first thing I tried, but it happens to work only partially. In my case, I have a lot of names beginning with "m_sz", e.g. "m_szComment", "m_szName". Trying a query like "

Re: Searching for a strict prefix

2010-05-24 Thread Ian Lea
I bet it's that underscore in m_sz. Different analyzers do different things with different punctuation characters. I can never remember which does exactly what - it'll be in the javadocs or Lucene In Action or somewhere on the web. You can check what exactly has been indexed by using Luke - alwa

Re: Searching for a strict prefix

2010-05-24 Thread Shlomy Reinstein
Hi, Thanks for the help. BTW, if anyone is interested: I tried the same with KeywordAnalyzer - added a field with value "m_szName", and tried to find it using "FieldName:m_szN*" but failed. Someone in the Lucene IRC channel showed me why - QueryParser, by default, lowercases all expanded terms (e.
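The mismatch described here can be reproduced without Lucene. The following is a minimal plain-Java sketch, not Lucene code: `PrefixCaseDemo`, `matchPrefix`, and `lowercaseExpanded` are hypothetical names, and the list of strings stands in for terms indexed verbatim (as KeywordAnalyzer would leave them). It only illustrates why a lowercased prefix cannot match a mixed-case indexed term.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical stand-in for a term index that stores values verbatim.
final class PrefixCaseDemo {

    // Returns the indexed terms that start with the prefix (case-sensitive,
    // the way an exact term comparison behaves).
    static List<String> matchPrefix(List<String> indexedTerms, String prefix) {
        return indexedTerms.stream()
                .filter(term -> term.startsWith(prefix))
                .collect(Collectors.toList());
    }

    // Mimics a query parser that lowercases a wildcard/prefix term
    // before expanding it against the index.
    static String lowercaseExpanded(String prefix) {
        return prefix.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        List<String> terms = List.of("m_szName", "m_szComment");
        // The prefix becomes "m_szn", which no verbatim term starts with.
        System.out.println(matchPrefix(terms, lowercaseExpanded("m_szN"))); // prints []
        // With the original case preserved, the term is found.
        System.out.println(matchPrefix(terms, "m_szN")); // prints [m_szName]
    }
}
```

In the Lucene releases of that period, QueryParser has a `setLowercaseExpandedTerms(boolean)` setter that controls this behaviour; check the javadocs for the version in use before relying on it.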

Re: CFP for Lucene Revolution Conference, Boston, MA October 7 & 8 2010

2010-05-24 Thread Grant Ingersoll
I should add that talks on Mahout, Tika, Nutch, etc. are also encouraged. -Grant On May 17, 2010, at 8:43 AM, Grant Ingersoll wrote: > Lucene Revolution Call For Participation - Boston, Massachusetts October 7 & > 8, 2010 > > The first US conference dedicated to Apache Lucene and Solr is comi

Using synonyms with Lucene without WordNet

2010-05-24 Thread Larry Hendrix
Does anyone know of any classes available that allow you to define and use your own synonyms when searching with Lucene? I read some about WordNet but it seems those synonyms are predefined English words. The application I am working with searches for the names of contacts and companies. I wou
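As a rough illustration of user-defined synonyms for names, here is a minimal plain-Java sketch. It does not use any Lucene class: `SynonymExpansion`, `expand`, and the example mapping are hypothetical, and the map is assumed to be keyed by lowercased terms. It shows only the lookup step, where one query term is expanded into the set of alternatives that would be searched.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, independent of Lucene: expands a term using a
// caller-supplied synonym table instead of a predefined dictionary.
final class SynonymExpansion {

    // Returns the lowercased term followed by any synonyms registered for it.
    // Assumes the keys of the synonym map are already lowercased.
    static Set<String> expand(String term, Map<String, List<String>> synonyms) {
        String key = term.toLowerCase(Locale.ROOT);
        Set<String> alternatives = new LinkedHashSet<>();
        alternatives.add(key);
        alternatives.addAll(synonyms.getOrDefault(key, List.of()));
        return alternatives;
    }
}
```

For example, with a table that maps "bill" to "william", `expand("Bill", table)` yields both forms, which could then be combined into a single OR query.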

Re: Using synonyms with Lucene without WordNet

2010-05-24 Thread Simon Willnauer
Larry, you should look at the SynonymFilter in Lucene Contrib Analysis. simon On Mon, May 24, 2010 at 9:40 PM, Larry Hendrix wrote: > Does anyone know of any classes available that allow you to define and use > your own synonyms when searching with Lucene? I read some about WordNet but > it

Re: Applying term frequency thresholds on indexing time

2010-05-24 Thread Erick Erickson
Why do you want to calculate this? This is done for you by the indexing process and taken into account when searching. You're asking for a solution before defining the problem, which makes it very hard to help. See: http://people.apache.org/~hossman/#xyproblem Best Erick On Mon, May 24, 2010 at