There is an issue in JIRA, see http://issues.apache.org/jira/browse/LUCENE-645
So I guess you're not the only one.
/Ronnie
Quoting Mark Miller <[EMAIL PROTECTED]>:
> Am I the only one that gets back a string missing the final character
> when using the highlighter and the null fragmenter? I always have to add
> the last character of what I have asked to be highlighted to what the
> highlighter returns when trying to hit highlight an entire
> document... anyone else?
Tomi NA wrote:
On 8/12/06, Mark Miller <[EMAIL PROTECTED]> wrote:
I've made a nice little archive application with Lucene. I made it to
handle our largest need: 2.5 million docs or so on a single server. Now
the powers that be say: let's use it for a 30+ million document archive
on a single server!
On 8/12/06, Mark Miller <[EMAIL PROTECTED]> wrote:
I've made a nice little archive application with Lucene. I made it to
handle our largest need: 2.5 million docs or so on a single server. Now
the powers that be say: let's use it for a 30+ million document archive
on a single server! (each doc size maybe 10k max... as small as 1 or 2k)
Am I the only one that gets back a string missing the final character
when using the highlighter and the null fragmenter? I always have to add
the last character of what I have asked to be highlighted to what the
highlighter returns when trying to hit highlight an entire
document... anyone else?
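Roughly what I'm doing, for anyone who wants to reproduce it (a minimal
sketch against the Lucene 2.0 contrib highlighter; query, text, and the
"contents" field name are placeholders):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.NullFragmenter;
import org.apache.lucene.search.highlight.QueryScorer;

// Highlight the whole document as a single fragment.
Highlighter highlighter = new Highlighter(new QueryScorer(query));
highlighter.setTextFragmenter(new NullFragmenter());
String highlighted = highlighter.getBestFragment(
    new StandardAnalyzer(), "contents", text);
// Workaround: the result comes back one character short, so
// re-append the final character of the original text.
if (highlighted != null && highlighted.length() == text.length() - 1) {
    highlighted = highlighted + text.charAt(text.length() - 1);
}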
I've made a nice little archive application with Lucene. I made it to
handle our largest need: 2.5 million docs or so on a single server. Now
the powers that be say: let's use it for a 30+ million document archive
on a single server! (each doc size maybe 10k max... as small as 1 or
2k) Please t
I'd do neither. You can look at other analyzers; WhitespaceAnalyzer comes
to mind, it breaks on whitespace and leaves all special characters in. There
are several to choose from.
And, if you are indexing other fields and want them handled differently, use
a PerFieldAnalyzerWrapper.
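For example (a sketch; the field name and index path are made up):

import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

// StandardAnalyzer for most fields, WhitespaceAnalyzer where special
// characters must survive tokenization.
PerFieldAnalyzerWrapper analyzer =
    new PerFieldAnalyzerWrapper(new StandardAnalyzer());
analyzer.addAnalyzer("partNumber", new WhitespaceAnalyzer());

IndexWriter writer = new IndexWriter("/path/to/index", analyzer, true);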
Finally, you migh
: ... right, thanks, now I see what you mean. In other words, IndexReader
: provides the ability to read/iterate terms and docs, but caching the term
: values per doc is for a higher layer - this way keeping IndexReader simpler
: and maintainable. So I guess Oliver can continue with the change as h
: we have recently noticed that doing a locale sensitive sort on a field that
: is missing from some docs causes an NPE inside the call to Collator#compare
: at FieldSortedHitQueue line 320 (Lucene 2.0 src):
: From looking at the standard String, float and int sorting and reading
: LUCENE-406 I
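A possible interim workaround is to guarantee the sort field exists on
every document at index time, e.g. (a sketch only; "title" and rawTitle
are made-up names):

import org.apache.lucene.document.Field;

// Index an empty placeholder instead of omitting the field, so the
// locale-sensitive sort never sees a missing (null) value.
String title = (rawTitle == null) ? "" : rawTitle;
doc.add(new Field("title", title, Field.Store.NO, Field.Index.UN_TOKENIZED));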
This is in the FAQ:
http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-75566820ee94a425c7e2950ac61d24e405fbd914
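In short, the contrib highlighter package does it. A minimal sketch
(assuming StandardAnalyzer was used at index time; "contents" and
documentText are placeholders):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;

Query query = new QueryParser("contents", new StandardAnalyzer())
    .parse("search keyword");
Highlighter highlighter = new Highlighter(new QueryScorer(query));
// Wraps matching terms in <B>...</B> by default.
String fragment = highlighter.getBestFragment(
    new StandardAnalyzer(), "contents", documentText);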
Quoting kevin <[EMAIL PROTECTED]>:
> Hi,
> how do I highlight the search keyword in Lucene's search results? Please
> advise, thanks!
Hi,
how do I highlight the search keyword in Lucene's search results? Please
advise, thanks!
How do you index your documents? Are you releasing old resources? Can you use a
profiler to see which objects stay referenced?
I've experienced the same problem when indexing XML files which were parsed with
Xalan, and the memory leak in that case was in Xalan. Switching to Saxon solved
the problem for us.
Large stored fields can affect performance when you are iterating
over your hits for a results display (assuming you are not interested
in the value of the stored field at that point in time), since all
fields are loaded when getting the Document. The SVN trunk has a
version of lazy loading.
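On the trunk the usage looks roughly like this (a sketch based on the
FieldSelector API as it later shipped; reader and docId are placeholders,
and the exact trunk API may differ):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.FieldSelector;
import org.apache.lucene.document.FieldSelectorResult;

FieldSelector selector = new FieldSelector() {
    public FieldSelectorResult accept(String fieldName) {
        // Defer reading the big stored field until it is asked for.
        return "body".equals(fieldName)
            ? FieldSelectorResult.LAZY_LOAD
            : FieldSelectorResult.LOAD;
    }
};
Document doc = reader.document(docId, selector);
// The value of "body" is only read from the index here:
String body = doc.getFieldable("body").stringValue();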
Hello, can anyone help?
We're experiencing the following issue on a Windows intranet website:
Following a Tomcat restart, our application has Lucene creating a single
new index in a RAMDirectory, followed by continuous creation of additional
index entries as new content is published.
During the
Hello all!
How can I tokenize money values?
Example: $25000, u$s45000, etc.,
so that I can search for "$25000" or "$250*".
I think the StandardTokenizer class is responsible for tokenizing the
content of the field based on the grammar generated by JavaCC; the
question is: I hav
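One approach (a sketch; the "price" field and the index path are made up)
is to sidestep StandardTokenizer for that field: WhitespaceAnalyzer leaves
"$25000" intact as one token, and "$250*" becomes a PrefixQuery on the raw
token:

import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.PrefixQuery;

IndexWriter writer =
    new IndexWriter("/path/to/index", new WhitespaceAnalyzer(), true);
Document doc = new Document();
doc.add(new Field("price", "$25000", Field.Store.YES, Field.Index.TOKENIZED));
writer.addDocument(doc);
writer.close();

// Prefix search for "$250*" against the untouched token.
PrefixQuery query = new PrefixQuery(new Term("price", "$250"));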
On Aug 11, 2006, at 1:23 AM, Martin Braun wrote:
Hello Adrian,
I am indexing some text in a java object that is "%772B" with the
standard analyser and Lucene 2.
Should I be able to search for this with the same text as the
query, or
do I need to do any escaping of characters?
Besides Luk
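One quick way to check is to print what StandardAnalyzer actually emits
for that string (a sketch using the Lucene 2.0 token API):

import java.io.StringReader;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

// See how "%772B" is tokenized before worrying about query-side escaping.
TokenStream stream = new StandardAnalyzer()
    .tokenStream("field", new StringReader("%772B"));
Token token;
while ((token = stream.next()) != null) {
    System.out.println(token.termText());
}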
On Friday 11 August 2006 15:07, Prasenjit Mukherjee wrote:
> I have a requirement (to use the highlighter) to store the doc content
> somewhere, and I am not allowed to use an RDBMS. I am thinking of using
> Lucene's Field with (Field.Store.YES and Field.Index.NO) to store the
> doc content. Will it have any negative effect on my search performance?
I have a requirement (to use the highlighter) to store the doc content
somewhere, and I am not allowed to use an RDBMS. I am thinking of using
Lucene's Field with (Field.Store.YES and Field.Index.NO) to store the
doc content. Will it have any negative effect on my search performance?
I think I hav
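For reference, the two-field setup might look like this (a sketch; the
field names are made up):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

Document doc = new Document();
// Searchable copy: indexed, not stored.
doc.add(new Field("contents", text, Field.Store.NO, Field.Index.TOKENIZED));
// Highlighter copy: stored verbatim, never indexed.
doc.add(new Field("contents_raw", text, Field.Store.YES, Field.Index.NO));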
SVN Head does. Has not been released yet.
See http://issues.apache.org/jira/browse/LUCENE-545
and
http://issues.apache.org/jira/browse/LUCENE-609
for some of the issues with it.
On Aug 11, 2006, at 8:19 AM, Dragon Fly wrote:
Mike, which version of Lucene supports lazy loading? Thanks.
Jason is right. I think (though I'm not an expert on Lucene either) your
newly added document can't recreate terms for the field with an analyzer,
because the field text is empty.
There is a very hairy solution: hack IndexReader and FieldInfosWriter and
use addIndexes().
Lucene is "only" a fulltext search library, n
Mike, which version of Lucene supports lazy loading? Thanks.
From: Michael McCandless <[EMAIL PROTECTED]>
Reply-To: java-user@lucene.apache.org
To: java-user@lucene.apache.org
Subject: Re: Field compression too slow
Date: Fri, 11 Aug 2006 06:59:58 -0400
I can share the data.. but it would be q
Dear All,
How can I get the number of hits in a document from a DASL query result? I am
using the following syntax:
\n" +
"http://jakarta.apache.org/slide/\";>" +
"" +
"" +
"" +
"" +
"" +
"" +
"" +
"" +
"" +
""+scope+""+
"infinity" +
"" +
"" +
"" +
//Content Bas
On Fri, Aug 11, 2006 at 01:22:26PM +0200, Simon Willnauer wrote:
> Sure you can do this.
> You index your document with the keywords assigned to the document and
> search with a BooleanQuery to get all documents having the keywords
> 1,2,...n-1,n. Just be aware that there are limitations to boolean
Sure you can do this.
You index your document with the keywords assigned to the document and
search with a BooleanQuery to get all documents having the keywords
1,2,...n-1,n. Just be aware that there are limitations to boolean
queries in Lucene; see setMaxClauseCount(), which can be very memory
consuming.
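For example (a sketch; the "keywords" field and the keywords array are
placeholders):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

// Raise the clause limit if the keyword list is large (costs memory).
BooleanQuery.setMaxClauseCount(10000);

BooleanQuery query = new BooleanQuery();
for (int i = 0; i < keywords.length; i++) {
    // SHOULD = match documents containing any of the keywords.
    query.add(new TermQuery(new Term("keywords", keywords[i])),
              BooleanClause.Occur.SHOULD);
}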
On Fri, Aug 11, 2006 at 08:11:31PM +1000, Jason Polites wrote:
> Yes you could use Lucene for this, but it may be overkill for your
> requirement. If I understand you correctly, all you need to do is find
> documents which match "any" of the words in your list? Do you need to rank
> the results? I
I can share the data... but it would be quicker for you to just pull
out some random text from anywhere you like.
OK, I hear you. I'll pull together some test data ... thanks.
Also... upon reflection I'm not certain using compression inside the
index is really a valuable process without lazy loading.
Hi,
I have seen previous discussions on the implementation of BM25 in
Lucene, and still do not know the current progress on this. Could
anybody give me some guidance, such as what work has been done
or where to start working on it? Thanks!
Jianhan
Yes you could use Lucene for this, but it may be overkill for your
requirement. If I understand you correctly, all you need to do is find
documents which match "any" of the words in your list? Do you need to rank
the results? If not, it's probably easier just to create your own inverted
index of the keywords.
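For example, a bare-bones version with no Lucene at all (a sketch):

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.StringTokenizer;

public class SimpleInvertedIndex {
    // word -> set of ids of documents containing it
    private final Map index = new HashMap();

    public void add(String docId, String text) {
        StringTokenizer words = new StringTokenizer(text.toLowerCase());
        while (words.hasMoreTokens()) {
            String word = words.nextToken();
            Set docs = (Set) index.get(word);
            if (docs == null) {
                docs = new HashSet();
                index.put(word, docs);
            }
            docs.add(docId);
        }
    }

    // Documents matching "any" of the given words.
    public Set anyOf(String[] words) {
        Set result = new HashSet();
        for (int i = 0; i < words.length; i++) {
            Set docs = (Set) index.get(words[i].toLowerCase());
            if (docs != null) {
                result.addAll(docs);
            }
        }
        return result;
    }
}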
: What I don't know is how I can make fieldNorm return the same value
: for both documents, and at the same time have this value be bigger than if
: the query only found one of the words, and smaller than finding three of
: three...
...
: I subclass DefaultSimilarity and set it to IndexSearcher
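For instance, flattening lengthNorm so fieldNorm comes out identical for
both documents, and letting coord() scale with the number of matched terms
(a sketch against the Lucene 2.0 Similarity API; searcher is a placeholder):

import org.apache.lucene.search.DefaultSimilarity;

searcher.setSimilarity(new DefaultSimilarity() {
    // Same norm regardless of field length, so fieldNorm is
    // identical for both documents.
    public float lengthNorm(String fieldName, int numTerms) {
        return 1.0f;
    }

    // Score grows with the fraction of query terms matched: two of
    // three beats one of three, three of three beats two of three.
    public float coord(int overlap, int maxOverlap) {
        return (float) overlap / maxOverlap;
    }
});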
I think we've moved well beyond the point where anyone can offer you
suggestions based purely on a description of the problem.
As I mentioned in my last post, can you post some code that demonstrates
the problem (i.e.: writes some arbitrary docs, opens a searcher, does a
query that returns N resul
Hello!
I have an assignment which will require searching documents for keywords or
keyphrases.
For instance, I have a database of keywords/keyphrases which might contain
several million items. Now I need to find whether a document contains any of
the keywords/phrases listed in that database.
I was th