Re: Scalability of Lucene indexes

2005-02-19 Thread Praveen Peddi
We are doing exactly the same thing, though we haven't tested with that many documents. The most we have tested so far is 3 million documents with a 3 GB index. I would be interested in seeing how you keep the replicated indices in sync. Our approach was to run the indexer on each server independently.

Re: Lucene in the Humanities

2005-02-18 Thread Praveen Peddi
Good work Erik (even though the UI could be made prettier). We use Lucene, so I have some knowledge of it. I could see the features you are using with Lucene (like paging, highlighting, different kinds of phrases). Overall, good stuff. Praveen - Original Message - From: "Erik Hatcher" <[EMA

Re: Best way to find if a document exists, using Reader ...

2005-01-14 Thread Praveen Peddi
From: "Morus Walter" <[EMAIL PROTECTED]> To: "Lucene Users List" Sent: Friday, January 14, 2005 8:37 AM Subject: Re: Best way to find if a document exists, using Reader ... Praveen Peddi writes: Does it make sense to call docFreq or termDocs (whichever is fas

Best way to find if a document exists, using Reader ...

2005-01-14 Thread Praveen Peddi
aveen ****** Praveen Peddi Sr Software Engg, Context Media, Inc. email:[EMAIL PROTECTED] Tel: 401.854.3475 Fax: 401.861.3596 web: http://www.contextmedia.com ** Context Media- "The Leader in Enterprise Content Integration"

Re: sorting on a field that can have null values

2004-12-29 Thread Praveen Peddi
Hi, Sorry for the late response. I didn't check the reply till now. I think sorting on a field that doesn't exist for every doc is throwing a NullPointerException for me (if it's of type string). FYI: I am using my own comparator for strings (see below for the code). I am sure something is wrong in
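
The failure described above (a string comparator hitting a missing field value) can be illustrated outside Lucene. The following is a minimal sketch in plain Java, not the comparator from the original message and not Lucene's sort API; the class name `NullSafeTitleSort` is hypothetical. It assumes that missing values should simply sort last.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of Lucene: shows a null-tolerant,
// case-insensitive ordering for field values that may be absent.
final class NullSafeTitleSort {
    // nullsLast wraps the case-insensitive comparator so that a null
    // (missing field) is ordered after every real value instead of
    // triggering a NullPointerException.
    static final Comparator<String> NULLS_LAST_IGNORE_CASE =
            Comparator.nullsLast(String.CASE_INSENSITIVE_ORDER);

    // Returns a sorted copy; the input list is left untouched.
    static List<String> sorted(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(NULLS_LAST_IGNORE_CASE);
        return copy;
    }
}
```

Whether missing values belong first or last is an application decision; `Comparator.nullsFirst` gives the opposite placement.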

sorting on a non english based locale field

2004-12-29 Thread Praveen Peddi
e's code gets the locale from SortField but I don't have access to SortField in this comparator. Any ideas? Should StringIgnoreCaseSortComparator simply be given the locale at instantiation time? Praveen
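
One way to read the question above is a comparator that receives its locale through the constructor. The sketch below uses `java.text.Collator` from the JDK; it is an assumption about how such a comparator could look, not the `StringIgnoreCaseSortComparator` from the message, and the class name `LocaleStringComparator` is hypothetical.

```java
import java.text.Collator;
import java.util.Comparator;
import java.util.Locale;

// Hypothetical comparator that is told its locale at construction time.
final class LocaleStringComparator implements Comparator<String> {
    private final Collator collator;

    LocaleStringComparator(Locale locale) {
        // Collator applies the locale's alphabetical ordering rules.
        collator = Collator.getInstance(locale);
        // SECONDARY strength treats upper and lower case as equal
        // while still distinguishing accented letters.
        collator.setStrength(Collator.SECONDARY);
    }

    @Override
    public int compare(String a, String b) {
        return collator.compare(a, b);
    }
}
```

A comparator built this way would be created once per locale, for example `new LocaleStringComparator(Locale.FRENCH)`, and reused for every comparison in that sort.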

sorting on a field that can have null values (resend)

2004-12-21 Thread Praveen Peddi
ene.search.Hits;(Searcher.java:41) If it's a bug in Lucene, will it be fixed in the next release? Any suggestions would be appreciated. Praveen

sorting on a field that can have null values

2004-12-20 Thread Praveen Peddi
tions would be appreciated. Praveen

Re: Lucene appreciation

2004-12-16 Thread Praveen Peddi
The product looks great. Are you indexing separately by reading info from all the sites, or just issuing a federated search to all job sites? I am impressed by the speed. It's surely faster than Dice and all other job search sites. I understand it's in beta but adding an advanced search option

Re: Opinions: Using Lucene as a thin database

2004-12-14 Thread Praveen Peddi
range query so you will have to say my_numeric_filed:[80 TO ??] but this would not work in the above-mentioned example, or am I missing something? Regards, Akmal. On Tue, 14.12.2004 at 16:07, Praveen Peddi wrote: We use Lucene for a similar purpose too, except that we index and store quite a few fields. In fact

Re: Opinions: Using Lucene as a thin database

2004-12-14 Thread Praveen Peddi
We use Lucene for a similar purpose too, except that we index and store quite a few fields. In fact, I also update documents partially, as people suggested. I store all the indexed fields so I don't have to rebuild the whole document from scratch when updating part of it. The reason we do this is due to

Re: sorting tokenized field

2004-12-13 Thread Praveen Peddi
ng tokenized field On Dec 13, 2004, at 2:22 PM, Praveen Peddi wrote: If it's not added to the release code already, is there any reason for it not being added? As noted, there is a performance issue with sorting by tokenized fields. It would seem far more advisable for you to simply add another fi

Re: sorting tokenized field

2004-12-13 Thread Praveen Peddi
ne. Aviran http://www.aviransplace.com -Original Message- From: Praveen Peddi [mailto:[EMAIL PROTECTED] Sent: Monday, December 13, 2004 10:48 AM To: lucenelist Subject: Fw: sorting tokenized field Hi all, I am forwarding the same email I sent before. Just wanted to try my luck again :). Thanks in

Fw: sorting tokenized field

2004-12-13 Thread Praveen Peddi
Hi all, I am forwarding the same email I sent before. Just wanted to try my luck again :). Thanks in advance. Praveen - Original Message - From: "Praveen Peddi" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Friday, December 10, 2

Re: sorting tokenized field

2004-12-10 Thread Praveen Peddi
OTECTED] Sent: Friday, December 10, 2004 13:53 PM To: Lucene Users List Subject: Re: sorting tokenized field On Dec 10, 2004, at 1:40 PM, Praveen Peddi wrote: I read that tokenized fields cannot be sorted. In order to sort a tokenized field, either the application has to duplicate the field with dif

Re: sorting tokenized field

2004-12-10 Thread Praveen Peddi
ssage - From: "Erik Hatcher" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Friday, December 10, 2004 1:53 PM Subject: Re: sorting tokenized field On Dec 10, 2004, at 1:40 PM, Praveen Peddi wrote: I read that the tokenised fields cannot be sort

sorting tokenized field

2004-12-10 Thread Praveen Peddi
this functionality built into lucene? Praveen

Re: partial updating of lucene

2004-12-09 Thread Praveen Peddi
But I don't need anything that Limo or Luke is doing, if all my fields are stored in the index (isStored() will be true for all fields). right? Praveen - Original Message - From: "Luke Francl" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Thursday, December 09, 20

Re: partial updating of lucene

2004-12-09 Thread Praveen Peddi
- From: "Erik Hatcher" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Thursday, December 09, 2004 10:00 AM Subject: Re: partial updating of lucene On Dec 9, 2004, at 9:48 AM, Praveen Peddi wrote: But when I am searching, it only searches

Re: partial updating of lucene

2004-12-09 Thread Praveen Peddi
retrieved in step 1. On Wed, 8 Dec 2004 17:53:26 -0500, Praveen Peddi <[EMAIL PROTECTED]> wrote: Hi all, I have a question about updating the lucene document. I know that there is no API to do that now. So this is what I am doing in order to update the document with the field "title"

partial updating of lucene

2004-12-08 Thread Praveen Peddi
, the search works fine before and after updating. Praveen

Lucene Vs Ixiasoft

2004-12-08 Thread Praveen Peddi
Does anyone know about the Ixiasoft server? It's an XML repository/search engine. If anyone knows about it, do they also know how it compares to Lucene? Which is faster? Praveen

Re: Numeric Range Restrictions: Queries vs Filters

2004-11-23 Thread Praveen Peddi
Chris's RangeFilter does not cache anything, whereas QueryFilter does. Would it be better to add caching functionality to RangeFilter as well, or does it not make any difference? Praveen - Original Message - From: "Erik Hatcher" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROT

Re: False Locking Conflict?

2004-11-19 Thread Praveen Peddi
If you have more than one Lucene application running on the same machine, do they all share the same temp file? At least, I had this problem when I ran my application in 2 different instances of WebLogic on the same machine. Praveen - Original Message - From: "Otis Gospodnetic" <[EMAIL PROTECTED]>

Using Shared directory as lucene index in cluster

2004-10-14 Thread Praveen Peddi

Re: sorting and score ordering

2004-10-13 Thread Praveen Peddi
Use SortField.FIELD_SCORE as the first element in the SortField[] when you pass it to the sort method. Praveen - Original Message - From: "Chris Fraschetti" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Wednesday, October 13, 2004 3:19 PM Subject: Re: sorting and scor

Making lucene work in weblogic cluster

2004-10-08 Thread Praveen Peddi
ng to figure out which one is the best and how to solve the above problems. If you guys have any ideas, please share them. I would appreciate any help with making Lucene clusterable (both indexing and searching). Praveen

Re: Memory usage: IndexSearcher & Sort

2004-10-01 Thread Praveen Peddi
Hello all, is this patch going to be part of the 1.4.2 release? If so, does anyone know when this release is due? I am currently using 1.4 final and wanted to migrate to 1.4.1. But after learning that there is a memory leak in 1.4.1 sorting, I have decided to wait until the next release. Praveen

Re: displaying 'pages' of search results...

2004-09-22 Thread Praveen Peddi
AIL PROTECTED]> Sent: Wednesday, September 22, 2004 2:53 AM Subject: displaying 'pages' of search results... > Hi > > Can u share the searcher.search(query, hitCollector); [light weight paging > api ] > > Code on the form ,may be somebody like me need's

Re: displaying 'pages' of search results...

2004-09-21 Thread Praveen Peddi
The way we do it is: get all the document ids, cache them, and then fetch the first 50 documents, the second 50, and so on. We wrote a lightweight paging API on top of Lucene. We call searcher.search(query, hitCollector); our HitCollectorImpl implements the collect method and collects only the document id.
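
The paging scheme described above can be sketched in plain Java. This is an illustration under assumptions, not the paging API from the message: the class `DocIdPager` and its methods are hypothetical, and `collect` here takes only a document id (in Lucene 1.4 the `HitCollector.collect` callback also receives a score, which this sketch ignores).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical pager: records document ids during the search and
// hands them back one page at a time.
final class DocIdPager {
    private final List<Integer> docIds = new ArrayList<>();

    // Called once per hit; only the id is kept, so the cache stays small.
    void collect(int docId) {
        docIds.add(docId);
    }

    // Returns the ids on a zero-based page, or an empty list past the end.
    List<Integer> page(int pageIndex, int pageSize) {
        if (pageIndex < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0 and pageSize > 0");
        }
        long from = (long) pageIndex * pageSize; // long avoids int overflow
        if (from >= docIds.size()) {
            return Collections.emptyList();
        }
        int to = (int) Math.min(from + pageSize, docIds.size());
        return new ArrayList<>(docIds.subList((int) from, to));
    }
}
```

Only the ids on the requested page then need to be resolved to full documents, which is what keeps the first page cheap even for large result sets.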

Re: problem with SortField[] in search method (newbie)

2004-09-15 Thread Praveen Peddi
Does it mean you indexed all "not null" fields? I think you should change your code so that you always index the fields you want to sort on. In any case, it looks like some of your documents have a shortName that is not null and not indexed. If you do not have any non-indexed shortNames in the index, I don't t

Re: Moving from a single server to a cluster

2004-09-08 Thread Praveen Peddi
We went through the same scenario as you. We recently made our application clusterable and I wrote our own version of a JDBC directory (similar to the SQLDirectory posted by someone) with our own caching. It was great for searching, but indexing became a real bottleneck. So we have decided to move

Re: Lucene for Indian Languages

2004-08-23 Thread Praveen Peddi
In fact, the CJK analyzer also works well with Indian languages. Since CJKAnalyzer treats multi-byte characters as a special case, it works with most Asian multi-byte characters. I introduced CJKAnalyzer for Japanese text search and we also tested with Hindi and Telugu. All our search test

Re: lucene and ejb applications

2004-08-20 Thread Praveen Peddi
In fact, we do exactly the same thing. A session bean method called search() delegates to a POJO SearchService. We lazy-load the IndexSearcher, cache it in memory, and invalidate that object when someone else modifies the index. This trick works wonderfully for us. The search has become faster after caching
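
The lazy-load, cache, and invalidate pattern described above can be sketched generically. This is not the SearchService from the message: `CachedSearcherHolder` is a hypothetical name, and the searcher type is left as a type parameter so the sketch runs without Lucene on the classpath.

```java
import java.util.function.Supplier;

// Hypothetical holder: opens the searcher on first use, reuses it
// afterwards, and drops it when the index is known to have changed.
final class CachedSearcherHolder<S> {
    private final Supplier<S> opener; // how to open a fresh searcher
    private S cached;                 // null until first use or after invalidate()

    CachedSearcherHolder(Supplier<S> opener) {
        this.opener = opener;
    }

    // Lazily opens the searcher; synchronized so that concurrent
    // callers never open two instances at once.
    synchronized S get() {
        if (cached == null) {
            cached = opener.get();
        }
        return cached;
    }

    // Called by whoever modifies the index; the next get() reopens.
    synchronized void invalidate() {
        cached = null;
    }
}
```

A real implementation would also need to close the old searcher once in-flight searches have finished with it; that bookkeeping is left out here.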

merge factor and minMergeDocs

2004-07-23 Thread Praveen Peddi
=100 in both cases). I am confident that my indexing time used to vary with change in the merge factor before (with lucene 1.3 RC3 I think). Praveen

Re: Large index files

2004-07-23 Thread Praveen Peddi
Yes, Lucene may create a new file when you add a document, but depending on the merge factor, minMergeDocs, optimize, and many other variables, it will merge the multiple files into a single file. You may not always have a single file, but in most cases there will be very few files. Praveen - Original Message -

No change in the indexing time after increasing the merge factor

2004-07-20 Thread Praveen Peddi
wrong? Praveen

Re: Problems indexing Japanese with CJKAnalyzer

2004-07-15 Thread Praveen Peddi
If it's a web application, you have to call request.setCharacterEncoding("UTF-8") before reading any parameters. Also make sure the HTML page encoding is specified as "UTF-8" in the meta tag. Most web app servers decode the request parameters using the system's default encoding. If you call the above method, I th
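
Why the request encoding matters can be shown without a servlet container. The sketch below is an assumption-laden illustration in plain Java: it decodes the same UTF-8 bytes once as UTF-8 and once as ISO-8859-1, which stands in for a server falling back to a single-byte default. The class name `RequestEncodingDemo` is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: the same parameter bytes decoded two ways.
final class RequestEncodingDemo {
    // Correct path: bytes sent as UTF-8 are decoded as UTF-8.
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Faulty path: a single-byte default turns every byte into its
    // own character, so multi-byte Japanese text becomes garbage.
    static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }
}
```

With the wrong charset, the analyzer never sees the original Japanese characters, so the indexed and queried terms cannot match no matter which analyzer is used.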

weird error in weblogic due to lucene

2004-07-13 Thread Praveen Peddi
this problem before? Is Lucene capable of handling 500K documents? Why would Lucene undeploy the application <000204> <149401> <149404> <000205> <149404> Any help is appreciated. Thanks Praveen

snowball analyzer and default analyzers in lucene core

2004-07-02 Thread Praveen Peddi

Compile errors in FrenchAnalyzer

2004-07-02 Thread Praveen Peddi
I get compile-time errors with FrenchAnalyzer in the constructor that takes a file name and in the method setStemExclusionTable: "Unhandled exception type IOException". How do I fix these errors? Should I just throw IOException, or catch the exception in the method and ignore it? I am using Lucene 1.4 final. Pr

languages lucene can support

2004-07-01 Thread Praveen Peddi
ts released only today :)). What's the fix for it? Praveen

Re: Sorting and tokenization

2004-07-01 Thread Praveen Peddi
e Users List" <[EMAIL PROTECTED]> Sent: Thursday, July 01, 2004 10:24 AM Subject: Re: Sorting and tokenization > Hi, > > You just need to have another title field that is not tokenized - for > sorting purposes. > > Best, > John > > On Thu, 2004-07-01 at 15:15, Praveen

Sorting and tokenization

2004-07-01 Thread Praveen Peddi
title). So if we make it untokenized we may lose an important piece of functionality. My question is: is there any way I can sort the objects by title while keeping title tokenized? Thanks in advance. Praveen

Do we really need CJKAnalyzer to search japanese characters

2004-06-28 Thread Praveen Peddi

Do we really need CJKAnalyzer to search japanese characters

2004-06-25 Thread Praveen Peddi