Incremental Updating

2006-08-25 Thread neils
Hi, I have two applications on a Windows machine. One is the search engine, where the index can be searched. The second application runs once a day and updates the index (deletions/additions). My question: the index is already opened (IndexReader) by the first application. Is there a pro

A problem on performance

2006-08-25 Thread luan xl
I have nearly 4 million Chinese documents, each ranging in size from 1 KB to 300 KB. I use org.apache.lucene.analysis.cn.ChineseAnalyzer as the analyzer for the text. The index has four fields: content (tokenized, not stored); title (tokenized and stored); path (stored only); date (stored only). Fo

Re: what do i get with FieldCache.DEFAULT.getStrings(...);

2006-08-25 Thread Chris Hostetter
FieldCache was designed with sorting in mind, where there can only be a single indexed Term for each doc (otherwise, how would you sort a doc that had two Terms "a" and "z"?). I'm actually surprised you are getting any values out instead of an Exception. If you index your Field as UN_TOKENIZED y
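
To make the "single Term per doc" point concrete, here is a simplified, stdlib-only model of how a string cache can be filled from an inverted field: walk the field's terms in sorted order and write each term into the slot of every document that contains it. This is an assumption-level sketch of the general approach, not Lucene's own code; FieldCacheModel, the postings map, and the terms "dt" and "lat" (what some tokenizing analyzer might produce from dt.|lat) are hypothetical.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

final class FieldCacheModel {
    // Hypothetical model of uninverting one field: postings maps each indexed term
    // to the ids of the documents containing it. A SortedMap iterates terms in
    // sorted order, as a term enumeration would.
    static String[] getStrings(int maxDoc, SortedMap<String, int[]> postings) {
        String[] values = new String[maxDoc];
        for (Map.Entry<String, int[]> entry : postings.entrySet()) {
            for (int doc : entry.getValue()) {
                // No check for an existing value: a later (larger) term silently
                // overwrites an earlier one for the same document.
                values[doc] = entry.getKey();
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Tokenized field: doc 0's value became two terms (hypothetical analyzer output).
        SortedMap<String, int[]> tokenized = new TreeMap<>();
        tokenized.put("dt", new int[] {0});
        tokenized.put("lat", new int[] {0});
        System.out.println(getStrings(1, tokenized)[0]); // lat

        // UN_TOKENIZED field: the whole value is a single term.
        SortedMap<String, int[]> untokenized = new TreeMap<>();
        untokenized.put("dt.|lat", new int[] {0});
        System.out.println(getStrings(1, untokenized)[0]); // dt.|lat
    }
}
```

In this model a tokenized field keeps only the last term in sorted order, while an UN_TOKENIZED field keeps the complete value.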

RE: Sharing Documents between Lucene and DotLucene

2006-08-25 Thread George Aroush
Hi, I am the developer and maintainer of Lucene.Net. DotLucene is the old name; Lucene.Net is the official name. You can find out more about Lucene.Net by visiting this link: http://incubator.apache.org/lucene.net/ I am not sure what you mean by "marshall Document objects from Java to C#". Howe

Re: controlled vocabulary

2006-08-25 Thread Dedian Guo
Hi Xin, in my understanding, a document in Lucene is a collection of fields, where a field is a name/value pair that can be indexed, stored, or both. That is a flat structure. If you want to index a deep tree structure, such as complex objects, and keep those relationships insi

Sharing Documents between Lucene and DotLucene

2006-08-25 Thread d rj
Hello, I am wondering whether anyone has found good strategies for sharing search records between a Linux-based server using Lucene and a Windows-based client using DotLucene. I do all the indexing on the server (i.e., the master index lives on the server), and I would lik

Re: Will storing docs affect Lucene's search performance?

2006-08-25 Thread Grant Ingersoll
It is in the HEAD version in SVN. See http://wiki.apache.org/jakarta-lucene/SourceRepository for info on checking out from SVN. On Aug 25, 2006, at 10:44 AM, Rupinder Singh Mazara wrote: Where can I find information on which version/tag to check out to get the lazy-loading variety of l

Re: controlled vocabulary

2006-08-25 Thread Zhao, Xin
Now I have second thoughts about one MeSH term per document. The scoring formula (and Hits too) is based on documents, right? Does that mean we shouldn't have more than one document per indexed object? For example, I am trying to index a publication; for some of the information, like title, abstr

Re: Test new query parser?

2006-08-25 Thread Mark Miller
I have received a few inquiries about my new query parser. I apologize for making that announcement a little prematurely. My current implementation only allows simple mixing of proximity queries with boolean queries; complex mixing would produce an incorrect search. A reply to my first email ma

Re: controlled vocabulary

2006-08-25 Thread Zhao, Xin
Hi, Rupinder, Our algorithm is a little different from what PubMed does. We have scoring for each mesh term, which will affect the search result. What do you think the difference would be for these two: document.addField(Field.Keyword("mesh", "")); and document.addField( new Field( "mesh", "

Re: controlled vocabulary

2006-08-25 Thread Rupinder Singh Mazara
Hi Xin, then perhaps you can change it to Field.Index.TOKENIZED. But I was not aware that PubMed boosts MeSH terms; they broadly classify terms as major and minor. If you plan to use this simple classification, consider adding the major terms twice to the document. Zhao, Xin wrote
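
A rough sketch of what "adding the major terms twice" buys, assuming each MeSH heading is indexed as a single term and that scoring uses the classic DefaultSimilarity term-frequency factor tf = sqrt(freq). MeshTermFrequency and the example headings are hypothetical. With norms enabled, the extra values also lengthen the field and so lower its length normalization; with NO_NORMS that effect does not apply.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class MeshTermFrequency {
    // Hypothetical: the "mesh" values for one document, with each major term added twice.
    static List<String> meshValues(List<String> majorTerms, List<String> minorTerms) {
        List<String> values = new ArrayList<>();
        for (String term : majorTerms) {
            values.add(term);
            values.add(term); // second copy raises this term's frequency in the field
        }
        values.addAll(minorTerms);
        return values;
    }

    // Term-frequency factor in the style of DefaultSimilarity: sqrt(freq).
    static double tf(List<String> values, String term) {
        return Math.sqrt(Collections.frequency(values, term));
    }

    public static void main(String[] args) {
        List<String> values = meshValues(Arrays.asList("Neoplasms"), Arrays.asList("Humans", "Mice"));
        System.out.println(tf(values, "Neoplasms")); // 1.4142135623730951
        System.out.println(tf(values, "Humans"));    // 1.0
    }
}
```

So a major term gets roughly a 1.41x tf factor over a minor one, a coarse two-level weighting rather than a per-term score.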

Re: controlled vocabulary

2006-08-25 Thread Zhao, Xin
Hi Rupinder, my understanding is that Field.Index.NO_NORMS disables both index-time boosting and field-length normalization. But I do need index-time boosting to store the score of each MeSH term. Have I missed anything? Thank you very much for your help, Xin - Original Message
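
For reference, the norm that NO_NORMS omits is, as we read the Similarity documentation of this Lucene era, one value per field per document that folds boosts and length normalization together (lengthNorm shown as in DefaultSimilarity):

```latex
\mathrm{norm}(d, f) = \mathrm{boost}(d) \cdot \mathrm{lengthNorm}(f) \cdot \prod_{g \in d,\ \mathrm{name}(g) = \mathrm{name}(f)} \mathrm{boost}(g),
\qquad
\mathrm{lengthNorm}(f) = \frac{1}{\sqrt{\mathrm{numTerms}(f)}}
```

If this reading is right, two things follow. With NO_NORMS the norm is effectively 1, so boost(d) and every boost(g) are lost. And even with norms enabled, all "mesh" fields of one document share a single norm (the product of their boosts, encoded into one byte), so separate per-MeSH-term scores cannot be kept apart through field boosts alone.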

Re: controlled vocabulary

2006-08-25 Thread Rupinder Singh Mazara
Hi Xin, take a look at this. You can add multiple fields with the name "mesh": for (int i = 0; i < meshList.size(); i++) { meshTerm = meshList.get(i); document.add(new Field("mesh", meshTerm.semanticWebConceptId, Field.Store.YES, Field.Index.NO_NORMS)); } When querying this index

Re: Will storing docs affect Lucene's search performance?

2006-08-25 Thread Rupinder Singh Mazara
Where can I find information on which version/tag to check out to get the lazy-loading variety of Lucene? Grant Ingersoll wrote: Large stored fields can affect performance when you are iterating over your hits (assuming you are not interested in the value of the stored field at that point

Re: what do i get with FieldCache.DEFAULT.getStrings(...);

2006-08-25 Thread Chris Lu
I'm not sure of the solution, but FieldCache.DEFAULT.getStrings() returns a String[] with one String per document. It seems your field is analyzed into multiple String values. Chris Lu

Re: Lucene vs Database Search

2006-08-25 Thread Chris Lu
Performance-wise, Lucene is much faster for full-text search. If you only search by "Employee ID" or do exact matches on names, the database's search can already do a good job. As for index maintenance, you should have an updated_at column for each record and select the latest recor
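
A minimal sketch of the updated_at approach, assuming the indexing job keeps a high-water mark (the latest updated_at it has already indexed) between runs. Row, changedSince, and nextMark are hypothetical names, and an in-memory list stands in for the table; against a real database the filter would be a query along the lines of WHERE updated_at > ?. Each returned row would then be re-indexed (old document for that id deleted, new one added).

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

final class IncrementalSync {
    // Hypothetical stand-in for a database row that has an updated_at column.
    static final class Row {
        final long id;
        final Instant updatedAt;
        Row(long id, Instant updatedAt) { this.id = id; this.updatedAt = updatedAt; }
    }

    // Rows modified strictly after the last sync; these are the ones to re-index.
    static List<Row> changedSince(List<Row> table, Instant lastSync) {
        List<Row> changed = new ArrayList<>();
        for (Row row : table) {
            if (row.updatedAt.isAfter(lastSync)) changed.add(row);
        }
        return changed;
    }

    // New high-water mark: latest updated_at among changed rows, or the old mark if none.
    static Instant nextMark(List<Row> changed, Instant lastSync) {
        Instant mark = lastSync;
        for (Row row : changed) {
            if (row.updatedAt.isAfter(mark)) mark = row.updatedAt;
        }
        return mark;
    }

    public static void main(String[] args) {
        Instant lastSync = Instant.parse("2006-08-24T00:00:00Z");
        List<Row> table = new ArrayList<>();
        table.add(new Row(1, Instant.parse("2006-08-23T12:00:00Z")));
        table.add(new Row(2, Instant.parse("2006-08-25T08:00:00Z")));
        table.add(new Row(3, Instant.parse("2006-08-25T09:30:00Z")));
        List<Row> changed = changedSince(table, lastSync);
        for (Row row : changed) System.out.println("re-index row " + row.id); // rows 2 and 3
        System.out.println("next mark: " + nextMark(changed, lastSync)); // 2006-08-25T09:30:00Z
    }
}
```

Deleted rows never appear in such a query, so deletions need separate tracking (for example a soft-delete flag). A strict comparison can also miss a row written in the same instant as the mark; overlapping runs slightly and tolerating a re-index is one simple guard.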

Re: controlled vocabulary

2006-08-25 Thread Zhao, Xin
Hi, thank you for your reply. I had considered the first two solutions before. If we use one document per MeSH term, that would be 26 documents for each item digested (we actually need the top 25 generated MeSH terms); would it be a problem to have that many documents? If we apply field name

what do i get with FieldCache.DEFAULT.getStrings(...);

2006-08-25 Thread Martin Braun
Hello, I am using FieldCache.DEFAULT.getStrings in combination with my own HitCollector (I loop through all results and count the number of occurrences of each field value in the results). My problem is that I have field values like dt.|lat or ger.|eng., and it seems that only the last token of the field
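
A minimal sketch of the counting loop, assuming the field is indexed UN_TOKENIZED so the cache holds the whole value (e.g. dt.|lat) for each document, and assuming '|' always separates the individual values. The values array stands in for what FieldCache.DEFAULT.getStrings returns, and hits for the document ids gathered in a HitCollector's collect method; FieldValueCounter is a hypothetical name.

```java
import java.util.Map;
import java.util.TreeMap;

final class FieldValueCounter {
    // values: one String per document id (the shape of a FieldCache string array).
    // hits: document ids collected for the query.
    static Map<String, Integer> count(String[] values, int[] hits) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int doc : hits) {
            String value = values[doc];
            if (value == null) continue; // document has no value for this field
            // Assumed format: sub-values separated by '|', e.g. "dt.|lat".
            for (String part : value.split("\\|")) {
                counts.merge(part, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] values = {"dt.|lat", "ger.|eng.", "dt.|eng.", null};
        int[] hits = {0, 1, 2, 3};
        System.out.println(count(values, hits)); // {dt.=2, eng.=2, ger.=1, lat=1}
    }
}
```

Splitting after the lookup keeps one cache entry per document, which is what the cache expects, while still counting each language code separately.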

Index Stat Functions

2006-08-25 Thread Mag Gam
Hi all, I am trying to get some stats on my index, such as: 1) when it was created; 2) the size of the index in MB; 3) whether I can get the size and date of each file in the index. For example: if I index 100 files, is it possible for me to get their name, size, and last-modification date (
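
Items 2 and 3 can be approximated from the file system when the index lives in a directory on local disk (FSDirectory), since the index size is just the sum of its files' sizes. This stdlib-only sketch assumes that setup; IndexStats and the default path "index" are hypothetical. It covers the index's own files only and does not attempt item 1. Per-source-document details, such as the name, size, and modification date of the 100 indexed files, are not recoverable this way unless they were stored as fields at indexing time.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

final class IndexStats {
    // Total size of the regular files directly inside the index directory.
    static long sizeInBytes(File indexDir) {
        long total = 0;
        File[] files = indexDir.listFiles();
        if (files == null) return 0; // not a directory, or it could not be read
        for (File f : files) {
            if (f.isFile()) total += f.length();
        }
        return total;
    }

    static double sizeInMB(File indexDir) {
        return sizeInBytes(indexDir) / (1024.0 * 1024.0);
    }

    // One tab-separated line per index file: name, size in bytes, last-modified date.
    static List<String> describeFiles(File indexDir) {
        List<String> lines = new ArrayList<>();
        File[] files = indexDir.listFiles();
        if (files == null) return lines;
        for (File f : files) {
            if (f.isFile()) {
                lines.add(f.getName() + "\t" + f.length() + "\t" + new Date(f.lastModified()));
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : "index"); // hypothetical default path
        System.out.printf("%.2f MB%n", sizeInMB(dir));
        for (String line : describeFiles(dir)) System.out.println(line);
    }
}
```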

Re: Upgrade from 1.4.3 to 1.9.1. Any problems with using existing index files?

2006-08-25 Thread Michael McCandless
We are upgrading from Lucene 1.4.3 to 1.9.1 and have many customers with large existing index files. In our testing we have reused large indexes created with 1.4.3 in 1.9.1 without incident. We have looked through the changelog and the code and can't see any reason there should be any problems

Re: Index-Format difference between 1.4.3 and 2.0

2006-08-25 Thread Gopikrishnan Subramani
Not sure if it helps, but I have been using Luke (Web Start version) from its website for quite some time now to inspect and manipulate my indexes built with Lucene 2.0. I may not be a power user of Luke in that sense, but I haven't found any issues using the basic features. Gopi On 8/25/

Re: Index-Format difference between 1.4.3 and 2.0

2006-08-25 Thread lude
Hi Andrzej, a month ago you mentioned a new Lucene 2.0-compatible version of Luke. Does it exist somewhere? Thanks, lude On 7/20/06, Andrzej Bialecki <[EMAIL PROTECTED]> wrote: lude wrote: >> As Luke was released with Lucene 1.9 > > Where did you get this information? From all I know Lu