Hi,
Thanks for your suggestion. I thought about the same thing, but somehow it
didn't seem like such a good idea... Now that I think about it, it would incur
the same I/O load (in terms of flushing many megabytes to disk) as optimizing
in memory with the FSDirectory.
Another weird thing we observed
Can phrase queries be nested the same way boolean queries can be nested?
I want a user query to be translated into a boolean query (say, x AND
(y OR z)), and I want those terms to be within a certain distance of
each other (approximately within the same sentence, so the slop would
be
On Apr 6, 2006, at 8:47 AM, Michael Dodson wrote:
Can phrase queries be nested the same way boolean queries can be
nested?
Yes... using SpanNearQuery instead of PhraseQuery.
I want a user query to be translated into a boolean query (say, x
AND (y OR z)), and I want those terms to be
The XMLQueryParser in the contrib section also handles
Spans (as well as a few other Lucene queries/filters
not represented by the standard QueryParser).
Here's an example of a complex query from the JUnit
test
<?xml version="1.0" encoding="UTF-8"?>
<SpanOr fieldName="contents">
<SpanNear slop="8">
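For the earlier x AND (y OR z) within-a-slop example, the nested span form would look something like the following (the element nesting is inferred from the snippet above; exact attribute names should be checked against the XMLQueryParser JUnit tests):

```xml
<SpanNear slop="8">
  <SpanTerm fieldName="contents">x</SpanTerm>
  <SpanOr fieldName="contents">
    <SpanTerm fieldName="contents">y</SpanTerm>
    <SpanTerm fieldName="contents">z</SpanTerm>
  </SpanOr>
</SpanNear>
```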
Hello,
How can I configure Lucene to handle numeric range searches? (This question
has been asked 100 times, I'm sure.)
I've tried the suggestions on the SearchNumericalFields wiki page. This
seems to work for simple queries. Searching for line:[1 TO 10] gives me
lines 1 through 10 of the
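The trick behind the SearchNumericalFields advice is that Lucene range queries compare terms as strings, so numbers must be left-padded with zeros at index time for lexicographic order to agree with numeric order. A minimal sketch (the width of 10 is an arbitrary choice; it only has to exceed the widest value you will index):

```java
public class PadNumbers {
    // Pad to a fixed width so string comparison agrees with numeric comparison.
    public static String pad(long n) {
        return String.format("%010d", n);
    }

    public static void main(String[] args) {
        // Unpadded, "2" sorts after "10"; padded, the order is numeric.
        System.out.println("2".compareTo("10") > 0);          // true
        System.out.println(pad(2).compareTo(pad(10)) < 0);    // true
    }
}
```

A range query over the padded field then looks like line:[0000000001 TO 0000000010].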
Hi,
Just wondering if there is any way to search two indexes with relations,
as in a relational database. For example, index1 has the fields
pid and content; index2 has the fields cid, record, and
pid. I want to search keyword1 in content and keyword2 in record, and
they should
I wrote:
It looks like StopAnalyzer tokenizes by letter, and doesn't handle
apostrophes. So, the input "I don't know" produces these tokens:
don
t
know
Is that right?
It's not right. StopAnalyzer does tokenize letter by letter, but 't'
is a stopword, so the tokens are:
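That behavior can be sketched in plain Java: runs of letters become tokens (so the apostrophe splits don't into don and t), everything is lowercased, and the stop filter then drops "t". Only a small subset of StopAnalyzer's default English stopword list is used here, but the result for this input is the same:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StopSketch {
    // A tiny subset of StopAnalyzer's default English stopword list.
    static final Set<String> STOP = new HashSet<>(Arrays.asList("t", "s", "a", "the"));

    public static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        // LetterTokenizer-style: runs of letters are tokens; anything else splits.
        for (String tok : text.toLowerCase().split("[^a-z]+")) {
            if (!tok.isEmpty() && !STOP.contains(tok)) {
                out.add(tok);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("I don't know")); // [i, don, know]
    }
}
```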
Hi
We are in the process of upgrading Lucene from 1.2 to 1.9.
There used to be 2 methods in DateField.java in 1.2
public static String MIN_DATE_STRING()
public static String MAX_DATE_STRING()
This basically gave the minimum and the maximum dates we could index
Hi
I need to access min and max values of a particular field in the index, as
soon as a searcher is initialized. I don't need it later. Looking at old
newsgroup mails, I found a few recommendations.
One was to keep the min and max fields external to the index. But this will
not work
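If the values can be pulled from the index itself at searcher startup, finding min and max is a single linear scan over a doc-indexed array with nulls for documents that lack the field. A plain-Java sketch of that scan (the array stands in for what FieldCache.getStrings(reader, field) hands back):

```java
public class FieldMinMax {
    // values[doc] holds the field's value for that doc, or null if absent,
    // mirroring the shape of a FieldCache string array.
    public static String[] minMax(String[] values) {
        String min = null, max = null;
        for (String v : values) {
            if (v == null) continue; // doc has no value for this field
            if (min == null || v.compareTo(min) < 0) min = v;
            if (max == null || v.compareTo(max) > 0) max = v;
        }
        return new String[] { min, max };
    }

    public static void main(String[] args) {
        String[] vals = { null, "berlin", "athens", null, "zagreb" };
        String[] mm = minMax(vals);
        System.out.println(mm[0] + " " + mm[1]); // athens zagreb
    }
}
```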
Ideally, I'd love to see an article explaining both in detail: the index
structure as well as the merge algorithm...
From: Prasenjit Mukherjee [mailto:[EMAIL PROTECTED]]
Sent: Tue 3/28/2006 11:57 PM
To: java-user@lucene.apache.org
Subject: Data structure of a
On Donnerstag 06 April 2006 19:50, John Smith wrote:
I have not drilled down into the implementation details too much, but
what was the reason for getting rid of these methods in Lucene 1.9?
There is no limit on the given dates in DateTools (within the limits of
what Java's Calendar/Date
I firmly believe that clustering support should be a part of Lucene. We've
tried implementing it ourselves and so far have been unsuccessful. We tried
storing Lucene indices in a database that is the back-end repository for our
app in a clustered environment and could not overcome the
Seeing this worries me: we'll see users creating XML strings, then
parsing them to get the desired query. I've seen this a lot with
QueryParser, but it would be even more gross to see folks do this
with the XML syntax. So, here's my community service message for the
day if you're
Thank you
JS
--- Yonik Seeley [EMAIL PROTECTED] wrote:
On 4/6/06, John Smith [EMAIL PROTECTED]
wrote:
// inherit javadocs
public String[] getStrings (IndexReader reader,
String field)
The string array I get back, is it guaranteed
that the first non-null value I encounter in
What about using lucene just for searching (i.e., no stored fields
except maybe one ID primary key field), and using an RDBMS for
storing the actual documents? This way you're using lucene for what
lucene is best at, and using the database for what it's good at. At
least up to a point -- RDBMSs
I think it's a good idea. For an enterprise-level application, Lucene appears
too file-system- and byte-sequence-centric a technology. Just my opinion.
The Directory API is just too low-level.
I'd be OK with an RDBMS-based Directory implementation I could take and use.
But generally, I
Hi all,
I'm still new to Lucene. I'm in the last year of my bachelor's degree in
Computer Science. My final thesis is about indexing and searching with Lucene
1.4.3. I've read the "Space Optimizations for Total Ranking" paper. My main
question is:
1. What search
Dear all
I got a java.lang.NullPointerException at
java.io.StringReader.<init>(StringReader.java:33) when running the
following code:
for (int i = 0; i < theHits.length(); i++)
{
Document doc = theHits.doc(i);
String contents = doc.get("contents"); // null if "contents" was not stored
TokenStream tokenStream =
Hi -
Is there a fast way (not easy, but speedy) of getting the count of
documents that match a query?
I need the count, and don't need the docs at this point. If I had a
simple query, (e.g. book) I can use docFreq(), and it's lightning
fast. If I just run it as a query it's much slower. I'm
: I need the count, and don't need the docs at this point. If I had a
: simple query, (e.g. book) I can use docFreq(), and it's lightning
: fast. If I just run it as a query it's much slower. I'm just
: wondering if I did a custom scorer / similarity / hitcollector, how
: much faster than a query
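A custom HitCollector is the usual answer here: Searcher.search(Query, HitCollector) hands each matching doc id to a callback without building a Hits object or loading documents, so counting is just an increment. The pattern, sketched with stand-in types rather than Lucene's own classes:

```java
public class CountSketch {
    // Minimal stand-in for Lucene's HitCollector callback.
    interface Collector {
        void collect(int doc, float score);
    }

    // Stand-in for Searcher.search(query, collector): the searcher pushes
    // each matching doc id into the callback; no result object is built.
    static void search(int[] matchingDocs, Collector c) {
        for (int doc : matchingDocs) {
            c.collect(doc, 1.0f);
        }
    }

    public static void main(String[] args) {
        final int[] count = { 0 };
        search(new int[] { 3, 8, 21 }, (doc, score) -> count[0]++);
        System.out.println(count[0]); // 3
    }
}
```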
Fisheye wrote:
HashSet terms = new HashSet();
query.rewrite(reader).extractTerms(terms);
OK, but this delivers every term, not just the list of words the Levenshtein
similarity actually matched.
I asked a similar thing in the past about term highlighting in general,
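One workaround is to re-filter the extracted terms yourself by edit distance against the original fuzzy term; what threshold to apply is up to you, mirroring FuzzyQuery's minimum-similarity idea. A plain-Java Levenshtein sketch:

```java
public class EditDistance {
    // Classic dynamic-programming Levenshtein distance.
    public static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i; // deletions
        for (int j = 0; j <= b.length(); j++) d[0][j] = j; // insertions
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost); // substitution
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("book", "back")); // 2
    }
}
```

Keeping only extracted terms whose distance to the query term falls under your chosen threshold approximates the fuzzy match set.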
Marvin Humphrey wrote:
I wrote:
It looks like StopAnalyzer tokenizes by letter, and doesn't handle
apostrophes. So, the input "I don't know" produces these tokens:
don
t
know
Is that right?
It's not right. StopAnalyzer does tokenize letter by letter, but 't' is
a stopword, so
On Apr 6, 2006, at 4:23 PM, Daniel Noll wrote:
Marvin Humphrey wrote:
I wrote:
It looks like StopAnalyzer tokenizes by letter, and doesn't
handle apostrophes. So, the input "I don't know" produces these
tokens:
don
t
know
Is that right?
It's not right. StopAnalyzer does
Hi all.
I have a situation where a Document is constructed with a bunch of
strings and a couple of readers. An error may occur while reading from
the readers, and in these situations, we want to remove the reader and
then try to index the same document again.
I've made a test case which