Hi,
I have two applications on a Windows machine. One is the search engine,
where the index can be searched.
The second application runs once a day and updates the index
(deletions/additions).
My question:
The index is already opened (IndexReader) by the first application. Is there
a pro
I have nearly 4 million Chinese documents, each ranging in size from 1 KB to
300 KB, so I use org.apache.lucene.analysis.cn.ChineseAnalyzer as the
analyzer for the text. The index has four fields:
content - tokenized, not stored
title - tokenized and stored
path - stored only
date - stored only
Fo
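The four-field layout described above can be sketched with the Lucene 1.9/2.0 Field API. This is a minimal sketch, not the original poster's code: the class name, method names, directory path, and sample values are all illustrative assumptions, and it needs lucene-core plus the contrib analyzers jar on the classpath.

```java
import org.apache.lucene.analysis.cn.ChineseAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class BuildChineseIndex {
    // Adds one document with the four-field layout described in the mail:
    // content tokenized/not stored, title tokenized/stored, path and date stored only.
    static void addDoc(IndexWriter writer, String content, String title,
                       String path, String date) throws Exception {
        Document doc = new Document();
        doc.add(new Field("content", content, Field.Store.NO,  Field.Index.TOKENIZED));
        doc.add(new Field("title",   title,   Field.Store.YES, Field.Index.TOKENIZED));
        doc.add(new Field("path",    path,    Field.Store.YES, Field.Index.NO));
        doc.add(new Field("date",    date,    Field.Store.YES, Field.Index.NO));
        writer.addDocument(doc);
    }

    public static void main(String[] args) throws Exception {
        // "index" directory name and the sample values are illustrative only.
        IndexWriter writer = new IndexWriter("index", new ChineseAnalyzer(), true);
        addDoc(writer, "正文内容", "标题", "C:\\docs\\a.txt", "20060825");
        writer.close();
    }
}
```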
FieldCache was designed with searching in mind, where there can only be a
single indexed Term for each doc (otherwise how would you sort a doc that
had two Terms "a" and "z"?). I'm actually surprised you are getting any
values out instead of an Exception.
If you index your Field as UN_TOKENIZED y
Hi,
I am the developer and maintainer of Lucene.Net.
DotLucene is the old name, Lucene.Net is the official name. You can find
out more about Lucene.Net by visiting this link:
http://incubator.apache.org/lucene.net/
I am not sure what you mean by "marshall Document objects from Java to C#".
Howe
Hi, Xin, in my understanding, a document in Lucene is a collection of
fields, while a field is a pair of keyword and value, though it
can be indexed or stored or both. That is a flat structure. If you want to
index a deep tree structure such as complex objects and keep those
relationship insi
Hello-
I am just wondering if anyone has encountered any good strategies for
sharing search records between a Linux based server using Lucene and a
Windows based client using DotLucene.
I am doing all the indexing on the server ( i.e. the master index is
contained on the server) and I would lik
It is on the HEAD version in SVN.
See http://wiki.apache.org/jakarta-lucene/SourceRepository for info
on checking out from SVN.
On Aug 25, 2006, at 10:44 AM, Rupinder Singh Mazara wrote:
Where can I find information on which version / tag to check out so as to
get the lazy-loading variety of l
Now I have second thoughts about one mesh term per document. The scoring
formula (hits too) is based on documents, right? Does that mean we
shouldn't have more than one document for each object indexed?
For example, I try to index a publication; for some of the information,
like title, abstr
I have received a few inquires about my new query parser. I apologize
for making that announcement a little premature. My current
implementation only allows simple mixing of proximity queries with
boolean queries...complex mixing would result in an incorrect search. A
reply to my first email ma
Hi, Rupinder,
Our algorithm is a little different from what PubMed does. We have scoring
for each mesh term, which will affect the search result.
What do you think the difference would be for these two:
document.add(Field.Keyword("mesh", ""));
and
document.add( new Field( "mesh", "
Hi Xin
then perhaps you can change it to Field.Index.TOKENIZED, but I was
not aware that PubMed boosts mesh terms; they broadly classify terms as
major and minor. If you plan to use this simple system of classification,
consider adding the major terms twice to the document?
Zhao, Xin wrote
Hi, Rupinder,
My understanding is that Field.Index.NO_NORMS disables index-time boosting
and field-length normalization at the same time. But I do need index-time
boosting to store the scoring of each mesh term. Have I missed anything?
Thank you very much for your help,
Xin
hi Xin
take a look at this: you can add multiple fields with the name
"mesh":
for (int i = 0; i < meshList.size(); i++) {
    // cast to whatever element type meshList actually holds
    MeshTerm meshTerm = (MeshTerm) meshList.get(i);
    document.add(new Field("mesh", meshTerm.semanticWebConceptId,
            Field.Store.YES, Field.Index.NO_NORMS));
}
when querying this index
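The query side of the multi-valued "mesh" field can be sketched as follows. This is a hedged sketch assuming the Lucene 1.9/2.0 search API (not code from the original mail); the class name, index path, and concept id are placeholders, and it needs lucene-core on the classpath.

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class MeshQuery {
    // Fields added repeatedly under one name behave like one multi-valued
    // field, so this matches a document if ANY of its "mesh" values equals
    // the given concept id.
    public static Hits findByMesh(String indexPath, String conceptId) throws Exception {
        IndexSearcher searcher = new IndexSearcher(indexPath);
        Query q = new TermQuery(new Term("mesh", conceptId));
        return searcher.search(q);
    }
}
```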
Where can I find information on which version / tag to check out so as to
get the lazy-loading variety of Lucene?
Grant Ingersoll wrote:
Large stored fields can affect performance when you are iterating over
your hits (assuming you are not interested in the value of the stored
field at that point
Not sure of the solution though. But FieldCache.DEFAULT.getStrings()
is returning a String[], with one String for each document. Seems your
field is analyzed into multiple String values.
Chris Lu
---
Lucene Search on Any Databases/Applications
h
Performance wise, Lucene search is much faster for full-text search.
If you only do "Employee ID" search, or exact match of Names,
database's search can do a good job already.
If it's regarding index maintenance, you should have an updated_at
column for each record, and select the latest recor
Hi,
Thank you for your reply. I had thought about the first two solutions
before. If we apply one doc for each MeSH term, it would be 26 docs for each
item digested (we actually need the top 25 MeSH terms generated); would it
be a problem if there are too many documents? If we apply field name
Hello,
I am using FieldCache.DEFAULT.getStrings in combination with my own
HitCollector (I loop through all results and count the number of
occurrences of a field value in the results).
My problem is that I have field values like dt.|lat or ger.|eng, and it
seems that only the last token of the field
Hi All,
I am trying to get some stats on my Index such as:
1) When it was created
2) Size in MB of the index
3) If I can get the size, the date of each file in the index. For example: I
index 100 files; is it possible for me to get their name, size, and the date
of the last modification of that file (
We are upgrading from Lucene 1.4.3 to 1.9.1, and have many customers
with large existing index files. In our testing we have reused large
indexes created in 1.4.3 in 1.9.1 without incident. We have looked
through the changelog and the code and can't see any reason there should
be any problems
Not sure if it helps, but I have been using Luke (the webstart version) from
its website for quite some time now for inspecting and manipulating my
indexes built using Lucene 2.0. I may not be a power user of Luke in that
sense, but I haven't found any issues using the basic features.
Gopi
On 8/25/
Hi Andrzej,
a month ago you mentioned a new Lucene 2.0-compatible version of Luke.
Does it exist somewhere?
Thanks
lude
On 7/20/06, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
lude wrote:
>> As Luke was released with a Lucene-1.9
>
> Where did you get this information? From all I know Lu