optimization issue

2007-10-16 Thread Melanie Langlois
Hi, It looks like my index is not really optimized, because in the index directory I can see 28 .cfs files (see below). Basically, I see files modified on October 3rd that are only 3 KB, while the most recent files are dated October 15. There are only two big files (_fdm.cfs and _uz1.cfs) of 34 and 29

Re: use lucene as datastore?

2007-10-16 Thread Chris Lu
No experience with this, but there are two points I can think of: 1) you can use a compressed field to store the text; 2) use the hash code of the path as the key. -- Chris Lu - Instant Scalable Full-Text Search On Any Database/Application site: http://www.dbsight.net demo: http
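To make the second suggestion concrete, here is a minimal sketch of deriving a stable key from a file path. It is not from the thread: the class and method names are hypothetical, and it swaps in a SHA-256 digest rather than `String.hashCode()`, on the assumption that a 32-bit hash would collide too easily across a large file tree.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: turns a file path into a fixed-length hex key
// that could be stored in an untokenized "key" field.
final class PathKeySketch {

    static String pathKey(String path) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(path.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so negative bytes print as two hex digits.
                hex.append(String.format("%02x", b & 0xFF));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    public static void main(String[] args) {
        // The same path always yields the same 64-character key.
        System.out.println(pathKey("/data/projects/report.txt"));
    }
}
```

The key depends only on the path string, so a renamed or moved file gets a new key and the old entry has to be deleted separately.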

use lucene as datastore?

2007-10-16 Thread argh
Hi, I'm adding Lucene to an existing project where a daemon monitors a frequently updated file system tree containing lots of expensive-to-parse files for changes in order to keep cached metadata up to date about each file. (File writes unfortunately cannot be routed to allow for more efficient

ApacheCon

2007-10-16 Thread Grant Ingersoll
If you are planning on attending ApacheCon in Atlanta, let us know if you are interested in attending the Birds of a Feather meeting by expressing that interest at: http://wiki.apache.org/apachecon/BirdsOfaFeatherUs07 -Grant -- Grant Ingersoll http://lucene.granti

Re: Fieldable and Document class.

2007-10-16 Thread Erick Erickson
There is no "update in place" functionality in Lucene. You can use IndexModifier, which (under the covers) does a delete/add. How dynamic is "pretty dynamic"? Updating one doc/hour? Updating 1,000 docs a second? Best Erick On 10/16/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: > > HI F

Fieldable and Document class.

2007-10-16 Thread Durga . Tirunagari
Hi Folks, How do I handle a change in the name or the location of a file that was already indexed? Is re-indexing the file the only way out? Most of the files we are planning to index are pretty dynamic. Thank you _Durga

Re: Lucene

2007-10-16 Thread Grant Ingersoll
Yes, with a little bit of work, as there is nothing out of the box for it. If you store term vectors (or re-analyze the document) you can use the sample code from my ApacheCon 2005 talk (http://www.cnlp.org/apachecon2005/, which also covers how to use TermVectors) OR you can try implement
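As a rough illustration of the "re-analyze the document" route, the sketch below counts terms in plain text and returns the most frequent ones. It is not the sample code from the talk and uses no Lucene classes: the class name, the lowercase-and-split tokenization (ASCII word characters only), and the alphabetical tie-breaking are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: counts terms in a text and returns the n most frequent.
final class TopTermsSketch {

    static List<String> topTerms(String text, int n) {
        Map<String, Integer> counts = new HashMap<>();
        // Naive analysis: lowercase, then split on runs of non-word characters.
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Highest count first; ties broken alphabetically so the result is stable.
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    public static void main(String[] args) {
        System.out.println(topTerms("lucene index lucene search index lucene", 2));
        // prints [lucene, index]
    }
}
```

A real setup would presumably run the same analyzer used at index time and drop stop words before counting.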

Lucene

2007-10-16 Thread Jae Joo
Hi, Does Lucene have a function to return the top 5 most frequent keywords in an article? Thanks, Jae

Re: Customized search with Lucene?

2007-10-16 Thread Doron Cohen
Where and how do you store this type of info: If user U1 searches for query Q7, boost doc D5 by B17. If user U2 searches for query Q3, boost doc D15 by B2. That seems like a lot of info, and it must be persistent. Perhaps o.a.l.search.function can help - assuming you have this info available at search time, and

Chinese test resources wanted

2007-10-16 Thread Ivan Vasilev
Hi Guys, We just implemented multi-language support in our application. We tested it with some files whose content was copied and pasted from some Chinese sites, and everything seems to work correctly, but we need to test it more thoroughly. Any suggestions on where to get some testing resources and

Re: Lucene ID and scoring.

2007-10-16 Thread Erick Erickson
Can't answer the second question, but the answer to the first is "no". Not only are Lucene IDs internally generated, but they change when you delete/optimize. Would caching filters help? Especially if you pre-computed them at, say, warm up? What problem are you trying to solve anyway? A statemen

Re: Number of terms

2007-10-16 Thread sandeep chawla
Thanks a lot, but one question: the IndexOutput class doesn't have a writeFloat method. How do you write a float to the index? Shall I create a public method writeFloat as public void writeFloat(float f) { writeByte((byte)(f >>32); writeByte((byte)(f >>16); writeByte((byte)(f >>8); writeB
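A note on the snippet in this message: Java's shift operators are not defined for `float`, so `f >> 16` would not compile. One common approach is to convert the float to its 32-bit integer representation first and then emit the four bytes. The sketch below shows that round trip with plain byte arrays; the class and method names are hypothetical, and wiring it into `IndexOutput` (for example through an existing integer-writing method, if your version has one) is left as an assumption.

```java
// Hypothetical helper: serializes a float as four big-endian bytes and back.
final class FloatBytesSketch {

    static byte[] floatToBytes(float f) {
        // Reinterpret the float as an int so that bit shifts are legal.
        int bits = Float.floatToIntBits(f);
        return new byte[] {
            (byte) (bits >>> 24),
            (byte) (bits >>> 16),
            (byte) (bits >>> 8),
            (byte) bits
        };
    }

    static float bytesToFloat(byte[] b) {
        // Mask each byte to avoid sign extension when widening to int.
        int bits = ((b[0] & 0xFF) << 24)
                 | ((b[1] & 0xFF) << 16)
                 | ((b[2] & 0xFF) << 8)
                 |  (b[3] & 0xFF);
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        float original = 0.31622776f;
        float restored = bytesToFloat(floatToBytes(original));
        System.out.println(original == restored); // prints true
    }
}
```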

Re: Number of terms

2007-10-16 Thread Karl Wettin
On 16 Oct 2007, at 13:07, sandeep chawla wrote: While calculating the lengthNorm, there is a precision loss. http://lucene.apache.org/java/docs/scoring.html#Score%20Boosting How to avoid the precision loss? You replace the use of bytes with floats when storing the norms (DocumentsWriter) in the f
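To see where the precision goes when a norm is squeezed into one byte, here is a self-contained sketch of an 8-bit float encoding. It is modeled on my understanding of how the default Similarity packs norms, but the exact bit layout in any given Lucene version is an assumption, and the class and method names are hypothetical. The length norm is taken as 1/sqrt(numTerms).

```java
// Hypothetical sketch: one-byte float encoding that keeps only the top few
// bits of a float, illustrating why distinct length norms can collapse.
final class NormPrecisionSketch {

    private static final int ZERO_POINT = (63 - 15) << 3; // smallest encodable pattern

    static byte encodeNorm(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> 21; // drop the 21 low mantissa bits
        if (small < ZERO_POINT) {
            // Zero or negative maps to 0; tiny positive values round up to 1.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= ZERO_POINT + 0x100) {
            return (byte) -1; // overflow: clamp to the largest code
        }
        return (byte) (small - ZERO_POINT);
    }

    static float decodeNorm(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xFF) << 21;
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        for (int numTerms : new int[] {4, 9, 10}) {
            float exact = lengthNorm(numTerms);
            float stored = decodeNorm(encodeNorm(exact));
            System.out.println(numTerms + " terms: exact=" + exact + " stored=" + stored);
        }
        // 4 terms survive exactly (0.5); 9 and 10 terms both come back as 0.3125.
    }
}
```

Under this layout, fields of 9 and 10 terms become indistinguishable after encoding, which is the kind of loss that storing norms as full floats would avoid, at four times the storage per norm.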

Number of terms

2007-10-16 Thread sandeep chawla
Hi, While calculating the lengthNorm, there is a precision loss. http://lucene.apache.org/java/docs/scoring.html#Score%20Boosting How to avoid the precision loss? Thanks Sandeep -- SANDEEP CHAWLA House No- 23 10th main BTM 1st Stage Bangalore Mobile: 91-9986150603