We are doing exactly the same thing, though we didn't test with so many documents.
The most we have tested so far is 3 million documents with a 3GB file size.
I would be interested in seeing how you maintained replicated indices that are
in sync. The way we did it was to run the indexer on each server independently.
Good work Erik (even though the UI could be made prettier). We use Lucene, so I
have some knowledge of it. I could see the features you are using with
Lucene (like paging, highlighting, different kinds of phrases). Overall,
good stuff.
Praveen
- Original Message -
From: "Erik Hatcher" <[EMA
From: "Morus Walter" <[EMAIL PROTECTED]>
To: "Lucene Users List"
Sent: Friday, January 14, 2005 8:37 AM
Subject: Re: Best way to find if a document exists, using Reader ...
Praveen Peddi writes:
Does it make sense to call docFreq or termDocs (whichever is fas
aveen
******
Praveen Peddi
Sr Software Engg, Context Media, Inc.
email:[EMAIL PROTECTED]
Tel: 401.854.3475
Fax: 401.861.3596
web: http://www.contextmedia.com
**
Context Media- "The Leader in Enterprise Content Integration"
Hi,
Sorry for the late response. I didn't check the replies until now.
I think sorting on a field that doesn't exist for every doc is throwing a
NullPointerException for me (if it's of type string). FYI: I am using my own
comparator for strings (see below for the code). I am sure something is wrong in
e's code gets the
locale from SortField but I don't have access to SortField in this comparator.
Any ideas? Should StringIgnoreCaseSortComparator simply be given the locale
at instantiation time?
Praveen
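One way to picture the two concerns raised above (missing field values and a locale fixed at construction) is the plain-Java sketch below. It is not the poster's StringIgnoreCaseSortComparator and does not use any Lucene interface; the class name NullSafeIgnoreCaseComparator is hypothetical, and sorting missing values last is an assumed policy.

```java
import java.text.Collator;
import java.util.Comparator;
import java.util.Locale;

// Hypothetical comparator, not part of Lucene: the locale is supplied once
// at construction, and null (missing) values sort after all real values
// instead of causing a NullPointerException.
class NullSafeIgnoreCaseComparator implements Comparator<String> {
    private final Collator collator;

    NullSafeIgnoreCaseComparator(Locale locale) {
        collator = Collator.getInstance(locale);
        // SECONDARY strength treats upper and lower case as equal.
        collator.setStrength(Collator.SECONDARY);
    }

    @Override
    public int compare(String a, String b) {
        if (a == null && b == null) return 0;
        if (a == null) return 1;   // assumed policy: missing values go last
        if (b == null) return -1;
        return collator.compare(a, b);
    }
}
```

A caller would create it once per locale, for example new NullSafeIgnoreCaseComparator(Locale.ENGLISH), and reuse it for every comparison.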
ene.search.Hits;(Searcher.java:41)
If it's a bug in Lucene, will it be fixed in the next release? Any suggestions would
be appreciated.
Praveen
The product looks great. Are you indexing separately by reading info from
all the sites, or just issuing a federated search to all job sites? I am
impressed by the speed. It's surely faster than Dice and all other job search
sites. I understand it's in beta version but adding an advanced search option
range query so you will have to say
my_numeric_field:[80 TO ??]
but this would not work in the above-mentioned example, or am I missing something?
regards
Akmal
On Tue, 14.12.2004 at 16:07, Praveen Peddi wrote:
We also use Lucene for a similar purpose, except that we index and store
quite
a few fields. In fact
We also use Lucene for a similar purpose, except that we index and store quite
a few fields. In fact I also update partial documents as people suggested. I
store all the indexed fields so I don't have to build the whole document
again when updating part of a document. The reason we do this is due to
ng tokenized field
On Dec 13, 2004, at 2:22 PM, Praveen Peddi wrote:
If it's not already in the release code, is there any reason it has
not been added?
As noted, there is a performance issue with sorting by tokenized fields.
It would seem far more advisable for you to simply add another fi
ne.
Aviran
http://www.aviransplace.com
-Original Message-
From: Praveen Peddi [mailto:[EMAIL PROTECTED]
Sent: Monday, December 13, 2004 10:48 AM
To: lucenelist
Subject: Fw: sorting tokenized field
Hi all,
I am forwarding the same email I sent before. Just wanted to try my luck again
:).
Thanks in
Hi all,
I am forwarding the same email I sent before. Just wanted to try my luck again
:).
Thanks in advance.
Praveen
- Original Message -
From: "Praveen Peddi" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Friday, December 10, 2
OTECTED]
Sent: Friday, December 10, 2004 13:53 PM
To: Lucene Users List
Subject: Re: sorting tokenized field
On Dec 10, 2004, at 1:40 PM, Praveen Peddi wrote:
I read that tokenized fields cannot be sorted. In order to sort a
tokenized field, either the application has to duplicate the field with
dif
ssage -
From: "Erik Hatcher" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Friday, December 10, 2004 1:53 PM
Subject: Re: sorting tokenized field
On Dec 10, 2004, at 1:40 PM, Praveen Peddi wrote:
I read that the tokenised fields cannot be sort
this functionality built into
lucene?
Praveen
But I don't need anything that Limo or Luke is doing if all my fields are
stored in the index (isStored() will be true for all fields), right?
Praveen
- Original Message -
From: "Luke Francl" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Thursday, December 09, 20
-
From: "Erik Hatcher" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Thursday, December 09, 2004 10:00 AM
Subject: Re: partial updating of lucene
On Dec 9, 2004, at 9:48 AM, Praveen Peddi wrote:
But when I am searching, it only searches
retrieved in step 1.
On Wed, 8 Dec 2004 17:53:26 -0500, Praveen Peddi
<[EMAIL PROTECTED]> wrote:
Hi all,
I have a question about updating the lucene document. I know that there
is no API to do that now. So this is what I am doing in order to update
the document with the field "title"
, the search
works fine before and after updating.
Praveen
Does anyone know about the Ixiasoft server? It's an XML repository/search engine. If
anyone knows about it, do you also know how it compares to Lucene?
Which is faster?
Praveen
Chris's RangeFilter does not cache anything, whereas QueryFilter does
caching. Is it better to add the caching functionality to RangeFilter as well,
or does it not make any difference?
Praveen
- Original Message -
From: "Erik Hatcher" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROT
If you have more than one Lucene application running on the same machine,
do they all share the same temp file? At least I had this problem when I ran my
application in 2 different instances of WebLogic on the same machine.
Praveen
- Original Message -
From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
.
Praveen
Use SortField.FIELD_SCORE as the first element in the SortField[] when you
pass it to the sort method.
Praveen
- Original Message -
From: "Chris Fraschetti" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Wednesday, October 13, 2004 3:19 PM
Subject: Re: sorting and scor
ng to figure out which one is the best
and how to solve the above problems.
If you have any ideas, please share them. I would appreciate any help regarding
making Lucene clusterable (both indexing and searching).
Praveen
Hello all,
Is this patch going to be part of the 1.4.2 release? If so, does anyone know
when this release is due? I am currently using 1.4 final and wanted to
migrate to 1.4.1. But after learning that there is a memory leak in 1.4.1
sorting, I have decided to wait until the next release.
Praveen
AIL PROTECTED]>
Sent: Wednesday, September 22, 2004 2:53 AM
Subject: displaying 'pages' of search results...
> Hi
>
> Can you share the searcher.search(query, hitCollector); [lightweight paging
> API]
>
> code on the forum? Maybe somebody like me needs
The way we do it is: get all the document ids, cache them, and then get the
first 50 documents, the second 50, and so on. We wrote a lightweight paging API on top
of Lucene. We call searcher.search(query, hitCollector); our
HitCollectorImpl implements the collect method and collects only the document
id.
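The paging step described above can be sketched in plain Java roughly as follows. This is not the poster's paging API: the class name DocIdPager is hypothetical, and it assumes the collector has already produced an int[] of matching document ids in the desired order.

```java
import java.util.Arrays;

// Hypothetical helper: caches every matching document id from one search
// and hands the ids out one page at a time.
class DocIdPager {
    private final int[] docIds;   // ids in the order the collector recorded them
    private final int pageSize;

    DocIdPager(int[] docIds, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.docIds = docIds.clone();
        this.pageSize = pageSize;
    }

    // Number of pages needed to show every id (the last page may be short).
    int pageCount() {
        return (docIds.length + pageSize - 1) / pageSize;
    }

    // Returns the ids on the given zero-based page; empty if past the end.
    int[] page(int pageIndex) {
        if (pageIndex < 0) {
            throw new IllegalArgumentException("pageIndex must not be negative");
        }
        long from = (long) pageIndex * pageSize;
        if (from >= docIds.length) {
            return new int[0];
        }
        int to = (int) Math.min(from + pageSize, docIds.length);
        return Arrays.copyOfRange(docIds, (int) from, to);
    }
}
```

With a page size of 50, page(0) gives the first 50 ids and page(1) the second 50; only the ids on the requested page would then need to be loaded as full documents.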
Does it mean you indexed all "not null" fields? I think you should change
your code so that you always index the fields you want to sort on.
In any case, it looks like some of your documents have a shortName that is not null
and not indexed. If you do not have any non-indexed shortNames in the index,
I don't t
We went through the same scenario as yours. We recently made our application
clusterable, and I wrote our own version of a JDBC directory (similar to the
SQLDirectory posted by someone) with our own caching. It was great for
searching, but indexing had become a real bottleneck. So we have decided to
move
In fact, CJKAnalyzer also works well with Indian languages. Since CJKAnalyzer
treats multi-byte characters as a special case, it works with most
Asian multi-byte characters. I introduced CJKAnalyzer for Japanese text
search and we also tested with Hindi and Telugu. All our search
test
In fact we do exactly the same thing. A session bean method called search()
delegates to a POJO SearchService. We lazy-load the IndexSearcher, cache it in
memory, and invalidate that object when someone else modifies the index. This
trick works wonderfully for us. The search has become faster after caching
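The lazy-load, cache, and invalidate pattern described above might look roughly like the plain-Java sketch below. It is not the poster's SearchService: the class name LazyCachedSearcher is hypothetical, the type parameter T stands in for the searcher type, and using a version number that changes on every index modification is an assumption about how staleness is detected.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical holder: opens the searcher on first use, keeps it in memory,
// and reopens it when the index version changes or invalidate() is called.
// Closing the replaced searcher is left out of this sketch.
class LazyCachedSearcher<T> {
    private final Supplier<T> opener;         // opens a fresh searcher
    private final LongSupplier indexVersion;  // assumed to change on every index update
    private T cached;
    private long cachedVersion;

    LazyCachedSearcher(Supplier<T> opener, LongSupplier indexVersion) {
        this.opener = opener;
        this.indexVersion = indexVersion;
    }

    // Returns the cached searcher, reopening it first if it is missing or stale.
    synchronized T get() {
        long current = indexVersion.getAsLong();
        if (cached == null || current != cachedVersion) {
            cached = opener.get();
            cachedVersion = current;
        }
        return cached;
    }

    // Drops the cached searcher so the next get() opens a new one.
    synchronized void invalidate() {
        cached = null;
    }
}
```

The code that modifies the index can either bump the version or call invalidate() directly; both force the next search to open a fresh searcher.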
=100 in both cases). I am
confident that my indexing time used to vary with changes in the merge factor before
(with Lucene 1.3 RC3, I think).
Praveen
Yes, Lucene may create a new file when you add a document, but based on the merge
factor, minMergeDocs, optimize and many other variables, it will merge the
multiple files into a single file. You may not always have a single
file, but in most cases there will be very few files.
Praveen
- Original Message -
wrong?
Praveen
If it's a web application, you have to call request.setCharacterEncoding("UTF-8")
before reading any parameters. Also make sure the HTML page encoding is
specified as "UTF-8" in the meta tag. Most web app servers decode the request
parameters using the system's default encoding. If you call the above
method, I th
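To see why the decoding charset matters, here is a small plain-Java sketch that decodes the same percent-encoded parameter value two ways. It uses java.net.URLDecoder rather than the servlet API, and the class name ParamDecodingDemo is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class: decodes a percent-encoded parameter value
// with a caller-chosen charset.
class ParamDecodingDemo {
    static String decode(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }
}
```

For the value "%E6%97%A5" (the UTF-8 bytes of one Japanese character), decoding as UTF-8 yields that single character, while decoding as ISO-8859-1 yields three unrelated characters, which is the kind of corruption a wrong default encoding produces.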
this problem before? Is Lucene capable of handling 500K documents?
Why would Lucene undeploy the application
<000204>
<149401>
<149404>
<000205>
<149404>
Any help is appreciated.
Thanks
Praveen
I get compile-time errors with FrenchAnalyzer, in the constructor that takes a file name and
in the method setStemExclusionTable:
Unhandled exception type IOException
How do I fix these errors? Should I just throw IOException, or catch the exception in
the method and ignore it?
I am using lucene 1.4 final.
Pr
ts released only today :)). What's the fix
for it?
Praveen
e Users List" <[EMAIL PROTECTED]>
Sent: Thursday, July 01, 2004 10:24 AM
Subject: Re: Sorting and tokenization
> Hi,
>
> You just need to have another title field that is not tokenized - for
> sorting purposes.
>
> Best,
> John
>
> On Thu, 2004-07-01 at 15:15, Praveen
title). So if we make it untokenized we may lose an important functionality.
My question is: is there any way I can sort the objects by title while
keeping title tokenized?
Thanks in advance.
Praveen
47 matches