: It seems analyzers never get called for UnTokenized fields (no luck
: either using PerFieldAnalyzer).
The label "tokenized" is somewhat misleading .. it assumes that your
analyzer will do some tokenizing (which it doesn't have to do in the case
of the KeywordAnalyzer). The best thing
But what if that word is also present in other fields?
Does "docFreq" only look at that particular field?
On 6/13/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
Look at the TermEnum class... iterate over the terms in your field, and
docFreq is the number of docs with that term.
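A small sketch may make the "docs with that term" idea concrete. This is a toy model in plain Java, not Lucene's API: the class DocFreqSketch and its methods add and docFreq are hypothetical names. It assumes a term is a (field, text) pair, so the same word in two fields is counted as two separate terms.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: this is NOT Lucene's API. It mimics the idea that a term
// is a (field, text) pair and that docFreq counts the documents containing it.
class DocFreqSketch {
    // key "field:text" -> ids of the documents that contain that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Record that document docId contains the given text in the given field.
    void add(int docId, String field, String text) {
        postings.computeIfAbsent(field + ":" + text, k -> new HashSet<>()).add(docId);
    }

    // Number of distinct documents that contain text in this one field.
    int docFreq(String field, String text) {
        Set<Integer> docs = postings.get(field + ":" + text);
        return docs == null ? 0 : docs.size();
    }

    public static void main(String[] args) {
        DocFreqSketch index = new DocFreqSketch();
        index.add(1, "title", "lucene");
        index.add(1, "body", "lucene");
        index.add(2, "body", "lucene");
        System.out.println(index.docFreq("title", "lucene")); // prints 1
        System.out.println(index.docFreq("body", "lucene"));  // prints 2
    }
}
```

In this model, a count across several fields is simply the sum of the per-field values.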
: Date:
Hi,
I just looked at the log of my indexing program and saw that after adding 4.5
million documents (16 GB of text) to a newly created index, it took 7 hours (!)
to carry out the optimization (indexWriter.optimize()). I am running the
indexing program on a (3.2 GHz, 1 GB RAM) desktop computer wit
While studying PrefixQuery, I found a problem.
For example, take the search string: test(*)
This matches testX, testX...X, but does not match test on its own.
Is this a real problem?
For those with the luxury of a large store of historical queries it's
interesting to note Google's approach to this.
Not some fancy spell checker - just mining searcher behaviour patterns.
Google's Bosworth describes this approach approx 13 minutes into this podcast:
http://www.itconversations.c
Thanks for your replies.
> -Original Message-
> From: Chris Hostetter [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, June 13, 2006 9:13 AM
> To: java-user@lucene.apache.org
> Subject: Re: How can I tell Lucene to also use analyzer for
> Keyword fields
>
>
> : It seems analyzers are never
The query should be test*
The brackets will be eliminated by the analyzer
Aviran
http://www.aviransplace.com
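To illustrate what a query like test* is expected to match, here is a minimal sketch in plain Java. It is not Lucene code: PrefixMatchSketch and matchPrefix are hypothetical names, and it assumes a prefix query behaves like a plain "starts with" test against the indexed terms. Under that assumption the bare term test matches too, since every string starts with itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only, NOT Lucene's API: prefix matching modelled as
// String.startsWith over a list of already-indexed terms.
class PrefixMatchSketch {
    // Return every term that begins with the prefix, in the original order.
    static List<String> matchPrefix(List<String> terms, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (term.startsWith(prefix)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("testX", "test", "testing", "tent");
        System.out.println(matchPrefix(terms, "test")); // prints [testX, test, testing]
    }
}
```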
-Original Message-
From: Flik Shen [mailto:[EMAIL PROTECTED]
Sent: Tuesday, June 13, 2006 6:07 AM
To: java-user@lucene.apache.org
Subject: about PrefixQuery Matching
When I s
Hi Hoss,
Thanks for your quick answer. One of the problems left with the date is
this:
A document (in our case an xml that has many metadata) can have more
than one date, each date with 2 attributes:
Eg:
00-00-1886
In the date index I have for every in the input xml a document
with fields: t
Hi,
I am new to Lucene but feel quite comfortable using the API.
I retrieve the Meta tags and the body from HTML files and their
respective Title and Description from the database and then index
documents.
I use Query class to parse the search query. I get the results and I
display the Title an
: But what if that word is also present in other fields?
: Does "docFreq" only look at that particular field?
docFreq tells you the frequency of a term; a term is a field and a value
-- if you want the counts of a value across multiple fields, you'll have
to add them up yourself. (or make a
: A document (in our case an xml that has many metadata) can have more
: than one date, each date with 2 attributes:
: 00-00-1886
:
: In the date index I have for every in the input xml a document
: with fields: type (document |other), date, art (birthday | deportation |
: death...). For example
Hey Chris,
Thanks for the response.
Chris Hostetter wrote:
: Question is two fold. One, here is the layout I was thinking:
my rule of thumb: if a field is going to contain less than a few dozen
bytes (i.e. a date, an email address, etc.) you might as well store it ...
it will make your life ea
: I will have millions of entries in my index. Would storing them cause
: any performance issues?
only testing will tell ... but generally speaking I don't think stored
fields affect query performance very much -- just disk usage.
: >another important thing you should consider is field norms: they don
We keep getting JVM crashes on 1.4.3. I found in the archive that setting a
JVM parameter solved the problem for a few users. We've tried that and it
has not worked. Here are our JVM parameters:
-Xms512m -Xmx1024m -XX:PermSize=256m
We're running Tomcat 5.5.16. Any ideas?
If it's an
We had a similar problem. We discovered that the eden/from space was basically
running out of memory; we made two changes, and that seems to have helped:
1. Reduce [Max]PermSize to 128M
2. Use the concurrent garbage collector
Good luck.
-h
--- Ross Rankin <[EMAIL PROTECTED]> wrote:
> We keep gettin
Very nice idea. This is the basis of most of the work on
word-sense-disambiguation (e.g. is it "run" as in baseball,
"run" as in stock, or "run" as in stocking? or is "John Smith"
CEO of GM or "John Smith" lover of Pocahontas?). TF/IDF's
not a bad way to compute this, either, though there
are d
Java apps shouldn't throw these kinds of seg faults.
Sounds like a problem with memory. Especially if you can't
reproduce the error in the same location. Double especially
if you have the same problems elsewhere under heavy
memory load. I had all kinds of problems with seg faults
in the JVM unt
Hi there,
I'm new to Lucene, and so far I only know how to index a corpus and run a query. I
thought I could count the number of times a word appears in the whole corpus with a
simple query, but it seems not to be so easy. Does anybody know how to do it?
Many Thanks!
Sergi Fernandez.
Hi all.
I'm writing a wrapper component around Lucene (using Avalon) and I'd like
to know the common API usage.
How should I bootstrap the index? Should I create the IndexSearcher when I
initialize the component?
How long should I keep the IndexWriter open? For one document: should I
create
Hi Sergi,
Take a look at TermEnum and TermDocs in the API. You will have to
iterate over these, summing as you go.
You could also, during indexing, store these counts external to Lucene
as you come across the term during the Analysis phase.
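The "summing as you go" idea can be sketched in plain Java. This is a toy model, not the TermEnum/TermDocs API: TermCountSketch, countWords and totalOccurrences are hypothetical names, and the whitespace split with lower-casing is only a stand-in for whatever analyzer is really in use. Each document yields a per-word count, and the corpus total is the sum of those counts.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only, NOT Lucene's API: each document is reduced to a map of
// word -> occurrences, and the corpus total is the sum over all documents.
class TermCountSketch {
    // Count how often each whitespace-separated, lower-cased word occurs in one text.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Sum the per-document counts of one (lower-case) word across the corpus.
    static int totalOccurrences(List<String> corpus, String word) {
        int total = 0;
        for (String doc : corpus) {
            total += countWords(doc).getOrDefault(word, 0);
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> corpus = Arrays.asList("to be or not to be", "to index a corpus");
        System.out.println(totalOccurrences(corpus, "to")); // prints 3
    }
}
```

Keeping such counts in a map while documents are being indexed corresponds to the second suggestion above, storing the counts outside Lucene.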
Sergi Fernandez wrote:
Hi there,
I'm new in Luc
Ross Rankin wrote:
We keep getting JVM crashes on 1.4.3. I found in the archive that setting a
JVM parameter solved the problem for a few users. We've tried that and it
has not worked. Here are our JVM parameters:
Why not try a new JVM?
Either a newer Sun... or a JDK, or a Blackdown...
In o
It may well have to do with this HotSpot bug:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6407471
Note, the bug only appears when you invoke java with the "-server"
command line option.
Kieran
Dan Armbrust wrote:
Ross Rankin wrote:
We keep getting JVM crashes on 1.4.3. I found in t
Hi there,
I'm just starting up with Lucene after reading bits and pieces from
Gospodnetic and Hatcher's "Lucene in Action" (and noticing the API has
changed for 2.0.0).
My question is this: is there a way to detect whether or not the index
exists? I'm currently developing a web application that
Try IndexReader static method indexExists:
http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexReader.html#indexExists(java.lang.String)
Kent Fitch
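For readers who want to see what such an existence check amounts to, here is a rough stand-in in plain Java. It is not the IndexReader.indexExists implementation: IndexExistsSketch and looksLikeIndex are hypothetical names, and the sketch assumes that an index directory of this era contains a file named segments, which is the file named in the FileNotFoundException quoted elsewhere in this thread. The layout may differ in other Lucene versions, so the real IndexReader method is the safer choice.

```java
import java.io.File;

// Rough stand-in only, NOT Lucene's implementation. Assumption: a directory
// holds an index if it contains a regular file named "segments".
class IndexExistsSketch {
    // True if dir contains a regular file called "segments".
    static boolean looksLikeIndex(File dir) {
        return new File(dir, "segments").isFile();
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        System.out.println(dir + " looks like an index: " + looksLikeIndex(dir));
    }
}
```

A web application could call a check like this at startup and show a "no index yet" page instead of failing when a searcher is opened.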
Well, I just tried it (opening an IndexSearcher) and got this exception...
java.io.FileNotFoundException: C:\blank\segments (The system cannot find the
file specified)
The directory c:\blank exists, but is empty. So, it seems you can just catch
the exception and infer that your admin users aren'
Hi, I'm new to Lucene.
I want to add full-text search to my website to search articles, images
and BBS topics. I'm not sure whether to use a single index for all of these
types, or to create 3 indexes, one for each type.
If I use only one index, do I have to add a 'type' field to identify
document
Hi,
Kent's suggestion worked (in fact, I had looked for such a method in
other classes of the API -- forgot to look in IndexReader).
It works just as expected :)
Thanks again
On 6/13/06, Erick Erickson <[EMAIL PROTECTED]> wrote:
Well, I just tried it (opening an IndexSearcher) and got this ex
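Hey buddy,
It depends on how you plan to organize your index. With multiple indexes you
have to deal with merging: for example, if you want to search both the full
text and the titles, you end up with search results from two indexes, so what
do you merge them by? Also, merging the results yourself is a foolish idea; you
have to let Lucene do the merging for you, because of the speed of the
algorithm. This is the main problem with multiple indexes: how to merge the
results from each partition. With a single partition, you can of course put
everything related into one document, and searching is no problem; the
difficulty lies in "updating". Lucene has no update operation: it first deletes
the doc and then adds it again. If the doc is fairly complex, you have to redo
the indexing for that doc, and if full-text extraction is also involved, the
process can take a very long time. For example, using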
I am sorry for my stupid question. Thanks. :-)
Regards,
On 6/13/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
: But what if that word is also present in other fields?
: Does "docFreq" only look at that particular field?
docFreq tells you the frequency of a term, a term is a field a