Re: Does someone know how to sort the hits list by a specified document field?

2007-11-05 Thread Marcus Herou
Hi. Just add a Sort object to the search. Sort sort = new Sort(sortField, !ascending); Hits hits = searcher.search(query, sort); Kindly //Marcus On 11/5/07, jackxin [EMAIL PROTECTED] wrote: Does someone know how to sort the hits list by a specified document field? Even if the field is

Re: Group by in Lucene ?

2007-11-05 Thread Grant Ingersoll
Solr has an issue outstanding right now that implements something that may be close to what you want. They are calling it Field Collapsing. See https://issues.apache.org/jira/browse/SOLR-236 -Grant On Nov 5, 2007, at 12:57 AM, Marcus Herou wrote: Hi. I have a situation where I'm

blank space before special characters

2007-11-05 Thread Leire Urcelay
Hello, I have the following problem with my Lucene index. When indexing fields containing special characters (like ), a blank space is inserted before the special character. For example: the content L&apos;article is indexed as L &apos; (with a blank space between 'L' and '&amp;'). Is there

Re: Pointers on Messaging Server and Lucene.

2007-11-05 Thread Grant Ingersoll
Can you explain more of what you are trying to do? Lucene just works with text; it is up to you to extract the text from whatever format it is in. That being said, you can try searching the archives of this list as a starting point. You might also check out Aperture

RE : blank space before special characters

2007-11-05 Thread Leire Urcelay
Sorry, I made a mistake in my previous email. The field L&apos;article is indexed as L &apos;article. The blank space is inserted between 'L' and '&apos;article'. Thanks, Leire -----Original Message----- From: Leire Urcelay [mailto:[EMAIL PROTECTED] Sent: Monday, 5 November 2007 13:02 To:

Re: How do we limit the growth of a Lucene Index?

2007-11-05 Thread Grant Ingersoll
You could search this list about distributing your indexes, etc. RemoteSearchable may be handy, but you will probably have to build some infrastructure around it for handling failover, etc. (would make for a nice contribution) How often do you think archived data will need to be accessed?

Re: How do we limit the growth of a Lucene Index?

2007-11-05 Thread Marcus Herou
As you suggest, you could either roll the index on the local machine or on a remote one, gzip the content fields in the archive index, and provide a GzipReader when you need to search old results. If money is of the essence then the best solution probably is to have 1 good box with fast SCSI disks which
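The gzip idea above can be sketched with the JDK alone. This is a minimal illustration, not Marcus's implementation: the class name FieldCompressor and its two methods are hypothetical, and how the compressed bytes would be stored in or read back from a Lucene field is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: gzips the text of a content field before archiving
// and restores it when an old result has to be shown again.
final class FieldCompressor {

    // Compress the field text (encoded as UTF-8) into a gzip byte array.
    static byte[] compress(String content) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(content.getBytes(StandardCharsets.UTF_8));
        }
        // Closing the gzip stream above writes the trailer, so the buffer is complete here.
        return buffer.toByteArray();
    }

    // Decompress a gzip byte array back into the original field text.
    static String decompress(byte[] compressed) throws IOException {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

Compression only pays off for fields that are stored and fairly large; short fields can grow slightly because of the gzip header.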

Reuse TermDocs

2007-11-05 Thread Mike Streeton
Can TermDocs be reused, i.e. can you do the following? TermDocs docs = reader.termDocs(); docs.seek(term1); int i = 0; while (docs.next()) { i++; } docs.seek(term2); int j = 0; while (docs.next()) { j++; } Reuse does seem to work but I get ArrayIndexOutOfBoundsExceptions from BitVector if I

Re: Group by in Lucene ?

2007-11-05 Thread Marcus Herou
Thanks. They seem to have got real far in the dev cycle on this. Seems like it will hit the road in Solr 1.3. However, I would really like this feature to be developed for Core Lucene; how do I start that process? Develop it yourself, you would say :) I'm serious, isn't it a really cool and useful

FuzzyQuery using termDocs() for context filtering

2007-11-05 Thread Timo Nentwig
Hi! Imagine an index holding documents in different languages and countries. Language+country is what I call a context, and I build and hold a QueryFilter for each context. When performing a fuzzy search, FilteredTermEnum doesn't care about any contexts at all (well, how should it :). It builds

Re: Can changes on an index be visible to an open IndexSearcher without reopening it?

2007-11-05 Thread Michael McCandless
Unfortunately, no. Once open, the IndexReader/IndexSearcher searches a frozen point-in-time snapshot of the index as it existed when it was first opened. You'll have to open a new searcher in order to see the changes. However, there is work underway now to add a reopen method to IndexReader
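The snapshot behaviour described above can be modelled without Lucene. The sketch below is a toy, and every class in it (ToyIndex, ToySearcher, SearcherHolder) is hypothetical: it only shows why a searcher opened earlier misses later additions, and how swapping in a newly opened searcher makes them visible.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for an index: an append-only list of document strings.
final class ToyIndex {
    private final List<String> docs = new ArrayList<>();

    synchronized void add(String doc) { docs.add(doc); }

    // Immutable copy of the documents as they exist right now.
    synchronized List<String> snapshot() { return List.copyOf(docs); }
}

// Toy stand-in for a searcher: it sees only the documents present when it was opened.
final class ToySearcher {
    private final List<String> frozen;

    ToySearcher(ToyIndex index) { this.frozen = index.snapshot(); }

    // Count documents in the frozen snapshot that contain the given word.
    int count(String word) {
        int n = 0;
        for (String doc : frozen) {
            if (doc.contains(word)) n++;
        }
        return n;
    }
}

// Holds the current searcher and replaces it with a freshly opened one on demand.
final class SearcherHolder {
    private final ToyIndex index;
    private volatile ToySearcher current;

    SearcherHolder(ToyIndex index) {
        this.index = index;
        this.current = new ToySearcher(index);
    }

    ToySearcher get() { return current; }

    // Open a new searcher so that later calls to get() see recent changes.
    void refresh() { current = new ToySearcher(index); }
}
```

With a real IndexSearcher the same swap applies, with the extra step of closing the old searcher once searches that are still using it have finished.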

Re: Reuse TermDocs

2007-11-05 Thread Yonik Seeley
On 11/5/07, Mike Streeton [EMAIL PROTECTED] wrote: Can TermDocs be reused, i.e. can you do the following? TermDocs docs = reader.termDocs(); docs.seek(term1); int i = 0; while (docs.next()) { i++; } docs.seek(term2); int j = 0; while (docs.next()) { j++; } Reuse does seem to work

Re: RE : blank space before special characters

2007-11-05 Thread Erick Erickson
There are several issues here. 1) How are you getting the entity reference? You must be encoding the stream (or getting it encoded for you). So the first thing I'd do is un-encode it. 2) After that, it's a question of what Filters/Analyzers you're using. Take a look at ISOLatin1AccentFilter. I'm
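The "un-encode it" step in point 1 could look roughly like the sketch below, which runs before the text is handed to an analyzer. It is an assumption-laden illustration: the class EntityDecoder is hypothetical, and it handles only the five predefined XML entities plus numeric references, not the full HTML entity table.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: turns entity references such as &apos; back into characters.
final class EntityDecoder {
    // The five entities predefined by XML; anything else is left untouched.
    private static final Map<String, String> NAMED = Map.of(
            "amp", "&", "lt", "<", "gt", ">", "quot", "\"", "apos", "'");

    // Matches &name;, &#123; and &#x1F; style references.
    private static final Pattern ENTITY =
            Pattern.compile("&(#x[0-9A-Fa-f]{1,6}|#[0-9]{1,7}|[A-Za-z]+);");

    static String decode(String text) {
        Matcher m = ENTITY.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String body = m.group(1);
            String replacement;
            if (body.startsWith("#x")) {
                replacement = fromCodePoint(Integer.parseInt(body.substring(2), 16), m.group());
            } else if (body.startsWith("#")) {
                replacement = fromCodePoint(Integer.parseInt(body.substring(1)), m.group());
            } else {
                // Unknown names are kept exactly as they appeared.
                replacement = NAMED.getOrDefault(body, m.group());
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Convert a numeric reference, falling back to the original text if it is not a valid code point.
    private static String fromCodePoint(int codePoint, String original) {
        return Character.isValidCodePoint(codePoint)
                ? new String(Character.toChars(codePoint))
                : original;
    }
}
```

The decoder makes a single pass, so doubly encoded input such as &amp;apos; comes out as &apos; and is not decoded a second time.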

Related items

2007-11-05 Thread Cool Coder
Hello Group, I have a requirement in my project where I need to display related items for any selected item in the group. I am not sure whether this is possible. Let me tell you that all our documents are indexed and for any document selected by user, we need to display

Re: Group by in Lucene ?

2007-11-05 Thread Grant Ingersoll
On Nov 5, 2007, at 7:49 AM, Marcus Herou wrote: Thanks. They seem to have got real far in the dev cycle on this. Seems like it will hit the road in Solr 1.3. However I would really like this feature to be developed for Core Lucene, how do I start that process? Develop it yourself you

Re: How do we limit the growth of a Lucene Index?

2007-11-05 Thread Sandeep Mahendru
Hi Marcus, Thanks for providing these suggestions. I will work on these directions with my team. Regards, Sandeep. On 11/5/07, Marcus Herou [EMAIL PROTECTED] wrote: As you suggest you could either roll the index on the local machine or remote and gzip the content fields on the archive

Re: How do we limit the growth of a Lucene Index?

2007-11-05 Thread Sandeep Mahendru
Hi Grant, Thanks for providing these suggestions. I will work on these directions with my team. Regards, Sandeep. On 11/5/07, Grant Ingersoll [EMAIL PROTECTED] wrote: You could search this list about distributing your indexes, etc. RemoteSearchable may be handy, but you will probably have

Re: Pointers on Messaging Server and Lucene.

2007-11-05 Thread DURGA DEEP
We have an e-mail server / Calendar Server / Address book etc. And we are planning to use Lucene for searching through the respective stores. I am aware that I have to convert everything into a format acceptable to Lucene. But I would like to see if anyone out there has done this previously

Needs TermFrequency from the index

2007-11-05 Thread Sure
Hi All, We are trying to fetch the TermFreq from the Lucene index, using IndexReader.getTermFreqVectors(). But the problem here is, if none of the fields in the index is vectorized, then the above function call returns null. In my index, none of the fields are vectorized. Without re-creating the

How to generate TermFreqVector from an existing index

2007-11-05 Thread Shailendra Mudgal
Hi All, I have an index that does not have the termFreqVector stored in it. I do not want to recreate the index, as it is a big index and took a long time to create. Is there another way to generate the termFreqVector for all the documents from the available info? Any help will be