Hi,
With Limo 0.5, can I find out whether a certain word from some Document is
indexed or not?
With Regards,
Chandrashekhar V Deshmukh
On Monday 22 November 2004 05:02, Kauler, Leto S wrote:
Hi Lucene list,
We have the need for analysed and 'not analysed/not tokenised' clauses
within one query. Imagine an unparsed query like:
+title:Hello World +path:Resources\Live\1
In the above example we would want the first clause
Kauler, Leto S writes:
Would anyone have any suggestions on how this could be done? I was
thinking maybe the QueryParser would have to be changed/extended to
accept a separator other than colon :, something like = for example
to indicate this clause is not to be tokenised.
I suggested
On Nov 22, 2004, at 12:36 AM, Luke Francl wrote:
Well that really depends on how big your index is and what they search
for, now doesn't it? ;)
Everything is relative.
-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Sun 11/21/2004 2:52 PM
To: Lucene Users List
On Nov 22, 2004, at 2:56 AM, Morus Walter wrote:
Kauler, Leto S writes:
Would anyone have any suggestions on how this could be done? I was
thinking maybe the QueryParser would have to be changed/extended to
accept a separator other than colon :, something like = for
example
to indicate this
Erik Hatcher writes:
If your query isn't entered by users, you shouldn't use query parser in
most cases anyway.
I'd go even further and say in all cases.
If you use lucene as a search server you have to provide the query somehow.
E.g. we have a PHP application that sends queries to a
Hi,
(First of all: what is the plural of index in English; indexes or indices?)
I want to search several indexes (indices?).
For that, I parse a new query using QueryParser or MultiFieldQueryParser.
Then I search my indexes using the MultiSearcher class.
Ok, but the
PDF(s) can definitely slow things down, depending on their size.
If there are a few larger PDF documents, that time is definitely possible.
Luke
----- Original Message -----
From: Miguel Angel [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Saturday, November 20, 2004 11:25 AM
Subject: How much
As I understand it, optimization is when you merge several segments into one,
allowing for faster queries.
The FAQs and API have further details.
http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi?file=chapter.indexing&toc=faq#q24
Luke
----- Original Message -----
From: Miguel Angel [EMAIL
On Nov 22, 2004, at 9:18 AM, Cocula Remi wrote:
(First of all: what is the plural of index in English; indexes or
indices?)
We used "indexes" in Lucene in Action. It's a bit ambiguous in English,
but "indexes" sounds less formal and is acceptable.
For that, I parse a new query using QueryParser
Hi,
I had requested help on an issue we have been facing with the "Too many
open files" exception garbling the search indexes and crashing the
search on the web site.
As a suggestion, you had asked us to look at the articles on O'Reilly
Network which had specific context around this exact
On Nov 22, 2004, at 9:17 AM, Morus Walter wrote:
Erik Hatcher writes:
If your query isn't entered by users, you shouldn't use query parser
in
most cases anyway.
I'd go even further and say in all cases.
If you use lucene as a search server you have to provide the query
somehow.
E.g. we have an
On Mon, 2004-11-22 at 02:27, Chandrashekhar wrote:
Hi,
With Limo 0.5, can I find out whether a certain word from some Document is
indexed or not?
This feature doesn't exist as such.
You could search for it and if results come up, then the word is in the
documents it returns.
I'll add
If you are going to compare scores across multiple indices, I'd suggest
considering one of the patches here:
http://issues.apache.org/bugzilla/show_bug.cgi?id=31841
Chuck
-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Monday, November 22, 2004 6:30 AM
I did the following test:
I created a RAM folder on my Red Hat box and copied c. 1 GB of indexes
there.
I expected the queries to run much quicker.
In reality it was sometimes even slower (sic!).
Lucene has its own RAM disk functionality. If I implement it, would it
bring any benefits?
Thanks in
For the Lucene book I wrote some test cases that compare FSDirectory
and RAMDirectory. What I found was that with certain settings
FSDirectory was almost as fast as RAMDirectory. Personally, I would
push FSDirectory and hope that the OS and the Filesystem do their share
of work and caching for
Hello again,
I've modified DateFilter to filter out document IDs as suggested. All seems to
be running well until I tried a specific test case. All my documents have IDs
in the 400,000 range. If I set my lower limit to 5, nothing comes back. After
examining the code, I found the issue to be at
It sounds like you need to pad your numbers with leading zeroes, i.e.
use the same type of encoding as is required by RangeQuery. If you
query with 05 instead of 5, do you get what you expect? If all your
document IDs are fixed length, then string comparison will be
isomorphic to integer
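A small sketch of why the padding matters, written in plain Python rather than against the Lucene API. The width of 6 and the helper name pad_id are assumptions made up for illustration; the only point is that fixed-width, zero-padded strings compare in the same order as the integers they encode, while unpadded strings do not.

```python
def pad_id(n: int, width: int = 6) -> str:
    """Encode a non-negative integer as a fixed-width, zero-padded string."""
    if n < 0:
        raise ValueError("padding only preserves order for non-negative ids")
    s = str(n)
    if len(s) > width:
        raise ValueError("id does not fit in the chosen width")
    return s.zfill(width)


ids = [400123, 5, 42, 400001]

# Unpadded strings compare character by character, so "400123" sorts before "5"
# and a lower bound of "5" excludes every id in the 400,000 range.
unpadded_in_range = [i for i in ids if str(i) >= "5"]

# Padded strings compare in the same order as the integers themselves.
padded_in_range = [i for i in ids if pad_id(i) >= pad_id(5)]
```

With a lower bound of 5, the unpadded comparison keeps only the id 5 itself, which matches the "nothing comes back" symptom for ids in the 400,000 range.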
Is there a way to auto-generate a uid in Lucene? Even just a way to
query the highest uid and let the application add one to it would do.
Thanks.
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail:
Hi folks:
Is there an indexing benchmark somewhere? I see a search
benchmark on the lucene home site.
Thanks
-John
What would the purpose of an auto-generated UID be?
But no, Lucene does not generate UIDs for you. Documents are numbered
internally by their insertion order. This number changes, however,
when documents are deleted in the middle and the index is optimized.
Erik
On Nov 22, 2004, at
If you are on Linux, the number of file handles for a session is much lower than
that for the whole machine. ulimit -n will tell you. There are instructions
on the web for changing this setting; it involves /etc/security/limits.conf
and setting the values for nofile.
(bulkadm is my user)
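The same per-session limit can also be read from inside a process. This is only a sketch using Python's standard resource module (Unix only); it reports the limit and does not change anything in limits.conf.

```python
import resource

# RLIMIT_NOFILE is the per-process limit on open file descriptors,
# the same value that `ulimit -n` reports for the current shell session.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit: {soft}, hard limit: {hard}")
```

The soft limit is the one a process actually runs into; it can be raised up to the hard limit without root privileges.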
According to the Lucene homepage, Lucene 1.4.2 was released
on October 1, 2004
However, the dist on www.apache.org does not have a copy of
Lucene 1.4.2
http://www.apache.org/dist/jakarta/lucene/binaries/
Where can I download Lucene 1.4.2?
-Sean
In the same Lucene News section where the announcement about 1.4.2 is
listed, there is a link that says "Binary and source distributions are
available here." ...
http://cvs.apache.org/dist/jakarta/lucene/v1.4.2/
I got really confused yesterday after I already had the binary version
and I was
In my test, I have 12900 documents. Each document is small: a few
discrete fields (Keyword type) and 1 Text field containing only 1
sentence.
With both mergeFactor and maxMergeDocs being 1000:
using RAMDirectory, the indexing job took about 9.2 seconds
not using RAMDirectory, the indexing job
Otis Gospodnetic wrote:
For the Lucene book I wrote some test cases that compare FSDirectory
and RAMDirectory. What I found was that with certain settings
FSDirectory was almost as fast as RAMDirectory. Personally, I would
push FSDirectory and hope that the OS and the Filesystem do their share
Click the "here" link on the Lucene home page.
We Lucene committers have been very very lame and have not published
the binary distribution appropriately for the mirrors to pick up. One
of these days we'll correct this, but for now you can click the link
from the announcement on the home page.
Just to clarify. I have a Field 'uid' whose value is a unique integer. I
use it as a key to the document stored externally. I don't mean Lucene's
internal document number.
I was wondering if there is a method to query the highest value of a field,
perhaps something like:
On Nov 22, 2004, at 4:39 PM, aurora wrote:
Just to clarify. I have a Field 'uid' whose value is a unique
integer. I use it as a key to the document stored externally. I don't
mean Lucene's internal document number.
I was wondering if there is a method to query the highest value of a
field,
Just to clarify. I have a Field 'uid' whose value is a unique integer.
I use it as a key to the document stored externally. I don't mean
Lucene's internal document number.
I was wondering if there is a method to query the highest value of a
field, perhaps something like:
Not exactly sure what you're trying to do. You can easily generate a number
when you index each Document and insert it in a uid field (which is, BTW, what
I do), and if you base it on a timestamp plus some characteristic of the
document (which is also what I do), it should always be unique.
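A rough sketch of that idea in plain Python, not the poster's actual code. The function name make_uid, the choice of a SHA-1 content digest as the "characteristic of the document", and the 20-digit padding are all assumptions for illustration.

```python
import hashlib
import time
from typing import Optional


def make_uid(content: str, now_ns: Optional[int] = None) -> str:
    """Build a uid from a timestamp plus a digest of the document content."""
    # Nanosecond timestamp, zero-padded to 20 digits so uids sort by time as strings.
    ts = time.time_ns() if now_ns is None else now_ns
    # Short content digest to separate documents indexed at the same instant.
    digest = hashlib.sha1(content.encode("utf-8")).hexdigest()[:12]
    return f"{ts:020d}-{digest}"
```

Two uids collide only if both the timestamp and the digest prefix match, so in practice this avoids having to query the index for the highest existing uid.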
I'm sorry, I wasn't involved in the original conversation but maybe I
can jump in with some info that will help.
The number of files depends on the merge factor, number of segments, and
number of indexed fields in your index. It also depends on whether you
are using compound files or not (this
A useful resource for increasing the number of file handles on various
operating systems is the Volano Report:
http://www.volano.com/report/
I had requested help on an issue we have been facing with the "Too many
open files" exception garbling the search indexes and crashing the
search on the
It seems that, when compared to other datastores, Lucene starts to
fall down. For example, Lucene doesn't perform online index
optimizations, so if you add 10 documents you have to run optimize()
again, and this isn't exactly a fast operation.
I'm wondering about the potential for a generic
Is it possible to combine Lucene and multi-dimensional scaling in some way?
(NOTE: numbers in [] indicate Footnotes)
I'm rather new to Lucene (and this list), so if I'm grossly
misunderstanding things, forgive me.
One of my main needs as I investigate Search technologies is to restrict
results based on Ranges of numeric values. Looking over the archives of
this list,
Of course, not only did I manage to forget to include the attachment, but
when I sent a reply with the code, mail.apache.org rejected it because it
was a ZIP file.
So let's see how mail.apache.org feels about 6 separate text files.
: Date: Mon, 22 Nov 2004 18:25:24 -0800 (PST)
: Subject: