Hi all!
I'm having a problem with searching dates. I created two documents with the
same date, 08/27/2002, in a lastModified field and then tried searching the
range lastModified:[20020827 TO 20020827] (other, wider ranges don't seem
to help). My understanding is that this should return my two doc
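
One thing worth checking with a range like the one above: a term range compares strings, so the value stored in the field has to be in the same yyyyMMdd form as the bounds in the query. Below is a minimal sketch of that comparison in plain Java. It assumes the field is indexed as a plain string; the class and method names (DateKeys, toKey, inRange) are made up for illustration and are not Lucene API.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

class DateKeys {
    // Hypothetical helper: renders a date as yyyyMMdd so that
    // string order is the same as chronological order.
    static String toKey(LocalDate date) {
        return date.format(DateTimeFormatter.BASIC_ISO_DATE);
    }

    // Inclusive range check done by string comparison,
    // which is how a [lower TO upper] term range behaves.
    static boolean inRange(String key, String lower, String upper) {
        return key.compareTo(lower) >= 0 && key.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        String key = toKey(LocalDate.of(2002, 8, 27));
        System.out.println(key + " in range: " + inRange(key, "20020827", "20020827"));
    }
}
```

If the indexed value is in any other encoding, the same comparison fails even though the dates are equal, which would look exactly like an empty result.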
I did a quick search and it looks like you can pick up
the Java JVM from IBM at no cost. They say it passes
Sun's compatibility tests. They have versions 1.3.1
and 1.4.1 on the site:
http://www-106.ibm.com/developerworks/java/jdk/
-Tom
--- Tomcat Programmer <[EMAIL PROTECTED]> wrote:
>
> I s
I saw an article from IBM somewhere, talking about how
you go about giving options to the JVM to use all the
non-reserved memory segments (on AIX which has
segmented memory) and this would allow more than a 2GB
heap. The point of that statement is that it sounds
like IBM's JVM can do it. I'm not
Hi Eric,
Thanks for the replies, and your consideration on this
problem. In my case, I use the non-static method
because I want to set some properties (most
importantly the default operator to AND) for the query
parser. Looking at the code snippet provided, I guess the
only thing the query parser
Are there ways to build transient indexes in memory in
less than 1 second from the first query results?
--- petite_abeille <[EMAIL PROTECTED]> wrote:
> On Nov 13, 2003, at 22:32, Jie Yang wrote:
>
> > I am trying to optimize the 500 OR
> > terms so that it does not do a full 2 million
> docs
On Nov 13, 2003, at 22:32, Jie Yang wrote:
I am trying to optimize the 500 OR
terms so that it does not search the full 2 million docs
but only the 1000 returned.
Would it be beneficial to move the first result set into its own
(transient) index to perform the second part of your query?
PA.
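
A related way to picture restricting the OR terms to the first result set, without building a second index, is with bit sets of document ids. The sketch below is plain Java and only shows the set logic for A AND (B OR C OR ...). It is not how Lucene evaluates the query, and the class and method names (FilterSketch, restrictToFirstResult) are hypothetical.

```java
import java.util.BitSet;
import java.util.List;

class FilterSketch {
    // Returns the doc ids that matched A and also matched at least one OR term.
    // The input bit sets are left unmodified.
    static BitSet restrictToFirstResult(BitSet matchesA, List<BitSet> orTermMatches) {
        BitSet union = new BitSet();
        for (BitSet term : orTermMatches) {
            union.or(term);              // B OR C OR ...
        }
        BitSet result = (BitSet) matchesA.clone();
        result.and(union);               // A AND (B OR C OR ...)
        return result;
    }

    public static void main(String[] args) {
        BitSet a = new BitSet();
        a.set(1); a.set(4); a.set(7);
        BitSet b = new BitSet();
        b.set(4); b.set(9);
        BitSet c = new BitSet();
        c.set(7);
        System.out.println(restrictToFirstResult(a, List.of(b, c)));
    }
}
```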
the one by NaturalBridge might, but it is not cheap.
Herb...
-Original Message-
From: Doug Cutting [mailto:[EMAIL PROTECTED]
Sent: Thursday, November 13, 2003 4:57 PM
To: Lucene Users List
Subject: Re: Two possible solutions on Parallel Searching
I don't know of a Java implementation wh
Jie Yang wrote:
In this case, using a single RAMDirectory would probably
allow me to run parallel searches without worrying
about disk access. Has anyone tried a
RAMDirectory of 5G in size?
I don't know of a Java implementation which lets you have a heap larger
than 2GB. In my experience,
Jie Yang wrote:
--- Erik Hatcher <[EMAIL PROTECTED]> wrote:
Well, not quite. The user normally enters a search string
A that normally returns 1000 out of 2 million docs. I
then append A with 500 OR conditions... A AND (B or C
or ... or x500).
Are you adding the same 500 terms to each query? Or even
In this case, using a single RAMDirectory would probably
allow me to run parallel searches without worrying
about disk access. Has anyone tried a
RAMDirectory of 5G in size?
--- Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> Multiple threads against the same index or multiple
> indices - n
you're doing TREC-style query expansion using automatic relevance feedback?
Herb
-Original Message-
From: Jie Yang [mailto:[EMAIL PROTECTED]
Sent: Thursday, November 13, 2003 4:33 PM
To: Lucene Users List
Subject: Re: Query Filters on term A in query "A AND (B OR C OR D)"
Well, not
--- Erik Hatcher <[EMAIL PROTECTED]> wrote:
> Are we talking about that query being entered by the
> user and you
> handing it just like that to QueryParser? If so,
> then QueryFilter
> won't help.
Well, not quite. The user normally enters a search string
A that normally returns 1000 out of 2 mill
How challenging would it be to add something to QueryParser to allow you to specify
that you want to use filters?
I have a similar case where I do a search for term1 AND term2 AND
links:http???www?url?com?dir*
If Lucene would use order of operations or in some way do the first two searches
fi
Multiple threads against the same index or multiple indices - no
advantage - think about the mechanical parts involved (disk head).
Multiple threads against indices on different disks (not just
partitions!) - yes, that would be faster.
Reading the index from the disk is the bottleneck, not the CPU
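
For the case where the indices really are on different disks, the idea of one thread per index can be sketched in plain Java as below. This is only an illustration under stated assumptions: IndexSearch is a hypothetical stand-in for "something that searches one index" (it is not a Lucene type), and the results are simply concatenated in index order rather than merged by score.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelSearchSketch {
    // Hypothetical stand-in for a searcher bound to one index on one disk.
    interface IndexSearch {
        List<String> search(String query);
    }

    // Runs the query against every index on its own thread,
    // then concatenates the per-index results in index order.
    static List<String> searchAll(List<IndexSearch> indexes, String query) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, indexes.size()));
        try {
            List<Future<List<String>>> pending = new ArrayList<>();
            for (IndexSearch index : indexes) {
                pending.add(pool.submit(() -> index.search(query)));
            }
            List<String> merged = new ArrayList<>();
            for (Future<List<String>> future : pending) {
                try {
                    merged.addAll(future.get());   // blocks until that index is done
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException(e);
                } catch (ExecutionException e) {
                    throw new RuntimeException(e.getCause());
                }
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        IndexSearch disk1 = q -> List.of(q + "@disk1");
        IndexSearch disk2 = q -> List.of(q + "@disk2");
        System.out.println(searchAll(List.of(disk1, disk2), "foo"));
    }
}
```

With all indices on one disk the threads would just compete for the same disk head, which is the point made above.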
On Thursday, November 13, 2003, at 04:07 PM, Jie Yang wrote:
Erik, just to make sure I understand you right: in an
example query: ZipCode:CA10927 AND Gender:Male
Are we talking about that query being entered by the user and you
handing it just like that to QueryParser? If so, then QueryFilter
--- Erik Hatcher <[EMAIL PROTECTED]> wrote:
> On Thursday, November 13, 2003, at 03:28 PM, Dan
> Quaroni wrote:
> > To my knowledge the answer is No, Lucene performs
> each query
> > separately and
> > then performs the joins after it has all the
> results. This is
> > actually a
> > rather se
I guess I was wrong, then... But I have 262 indexes with a combined 130 or
so million documents and at times the memory usage for a single query
exceeds 1.3 gigs with me only taking the top 25 of the hits. We pushed the
jvm to 1.6 gigs and it seems to be doing OK, but if it's not the results
from
Dan Quaroni wrote:
name:Bob's Discount Furniture AND state:California AND city:San Diego
Now, that query is going to retrieve EVERY Bob's discount furniture, EVERY
company in California, and EVERY company in San Diego and then join them. That
makes the memory requirements for this query far higher t
multiple threads on a single disk will likely result in significantly slower
searching, possibly an order of magnitude or more slowdown depending on many factors
such as available RAM, etc.
Herb...
-Original Message-
From: Jie Yang [mailto:[EMAIL PROTECTED]
Sent: Thursday, November 13,
On Thursday, November 13, 2003, at 03:28 PM, Dan Quaroni wrote:
To my knowledge the answer is No, Lucene performs each query separately and
then performs the joins after it has all the results. This is actually a
rather serious problem when it comes to searches in large indexes where a
single
To my knowledge the answer is No, Lucene performs each query separately and
then performs the joins after it has all the results. This is actually a
rather serious problem when it comes to searches in large indexes where a
single field is very important but has very low uniqueness.
For example,
Can anyone clarify a bit more on the issue below? I
can't seem to find any hints on this list.
Much Thanks..
> > Again, I am still a bit curious and want to find
> > out whether Lucene does (or will in the future) pre-filter
> > on "AND join conditions". For example, A AND (B OR
> > C OR D). if
--- Doug Cutting <[EMAIL PROTECTED]> wrote:
> First, note that the approaches you describe will
> only improve
> performance if you have multiple CPUs and/or
> multiple disks holding the
> indexes.
>
> Second, MultiSearcher is currently implemented to
> search indexes
> serially, not each in a
On Thu, Nov 13, 2003 at 10:18:39AM -0800, Doug Cutting wrote:
> Dror Matalon wrote:
> >Is there a reason why RODirectory shouldn't just be rolled into Lucene?
> >
> >http://www.csita.unige.it/software/free/lucene/
>
> This just looks like a version of FSDirectory with lock files disabled.
> I th
Dror Matalon wrote:
Is there a reason why RODirectory shouldn't just be rolled into Lucene?
http://www.csita.unige.it/software/free/lucene/
This just looks like a version of FSDirectory with lock files disabled.
I think it would be better to just make it easier to disable lock
files. Currently
William W wrote:
If I have two indexes and use the MultiSearcher will it be faster than
only one index with all the documents ?
No, in fact it would be slower. However it could be faster if (a)
someone contributes a parallel version of MultiSearcher and (b) you're
either running on a multiple-
On Nov 13, 2003, at 19:00, Dror Matalon wrote:
I've been experimenting with it and it seems to work as advertised. It
has the advantage of not requiring *any* write capability in /tmp or
anywhere else.
There is a system property to turn off the lock files altogether.
PA.
Is there a reason why RODirectory shouldn't just be rolled into Lucene?
http://www.csita.unige.it/software/free/lucene/
I've been experimenting with it and it seems to work as advertised. It
has the advantage of not requiring *any* write capability in /tmp or
anywhere else.
Regards,
Dror
On T
Hi Folks,
If I have two indexes and use the MultiSearcher will it be faster than only
one index with all the documents ?
Thanks,
William.
From: Doug Cutting <[EMAIL PROTECTED]>
Reply-To: "Lucene Users List" <[EMAIL PROTECTED]>
To: Lucene Users List <[EMAIL PROTECTED]>
Subject: Re: Two possible
Kevin A. Burton wrote:
When I first read this changelog entry:
> 2. Changed file locking to place lock files in
>System.getProperty("java.io.tmpdir"), where all users are
>permitted to write files. This way folks can open and correctly
>lock indexes which are read-only to them.
I
First, note that the approaches you describe will only improve
performance if you have multiple CPUs and/or multiple disks holding the
indexes.
Second, MultiSearcher is currently implemented to search indexes
serially, not each in a separate thread. To implement multi-threaded
searching one c
I had a thought on my earlier post on "Poor
Performance when searching for 500+ terms".
The problem is how to improve the performance when
searching for 500+ OR search terms. i.e. enter a
search string of :
W1 OR W2 OR W3 OR .. OR w500.
I thought I could rewrite the MultiSearcher class s
> I am not using RAMDirectory due to the large size of
> the index file. The index generated on hard disk is 1.57G
> for 1 million documents; each document has on average 500
> terms. I am using Field.UnStored(fieldName, terms), so
> I believe I am not storing the documents, just the
> index. (is that rig
I suggest checking the list archive. Doug has explained the reasons
behind the current design several times.
Otis
--- "Wilton, Reece" <[EMAIL PROTECTED]> wrote:
> I agree it's a bit of a strange design.
>
> It seems that there should be one class that handles all
> modifications
> of the index
Because Lucene has to first find the segment that the specified
document is in, and this is done via IndexReaders, not IndexWriters.
More about this in the Lucene book.
Otis
--- Dror Matalon <[EMAIL PROTECTED]> wrote:
> Which begs the question: why do you need to use an IndexReader rather
> tha
No, sorry.
Otis
--- Ralf Bierig <[EMAIL PROTECTED]> wrote:
> Does Lucene implement Latent Semantic Indexing? Examples?
>
> Ralf
>
Lucene does not implement a vector space model.
Otis
--- [EMAIL PROTECTED] wrote:
> Hi,
>
> does Lucene implement a Vector Space Model? If yes, does anybody have
> an
> example of how to use it?
>
> Cheers,
> Ralf
>
Thanks Julian
I am not using RAMDirectory due to the large size of
the index file. The index generated on hard disk is 1.57G
for 1 million documents; each document has on average 500
terms. I am using Field.UnStored(fieldName, terms), so
I believe I am not storing the documents, just the
index. (is that
On Nov 11, 2003, at 21:02, Bruce Ritchie wrote:
Just a note that LSI is encumbered by US patents 4,839,853 and
5,301,109. It would be wise to make sure that any implementation is
either blessed by the patent holders or does not infringe on the
patents.
Since when did developers turn into armchai
On Thursday, November 13, 2003, at 07:16 AM, Hackl, Rene wrote:
Yes and yes. Users range from Information Professionals to "naive" end
users.
If there's a string like "N-(t-Butyl)-N-(3,5-dinitrobenzoyl)-nitroxyl",
users can be expected to search for "dinitro", "3,5-dinitro", "nitrobenz",
etc.
Each
On Nov 13, 2003, at 15:09, Thomas Krämer wrote:
I am not familiar with intellectual property law, but it sounds
somewhat strange to me that it is possible to patent an abstract idea
of how to extract information from data.
The process of "Spreading Cream Cheese On Bagels" (C) (R) (TM) has
been
Hi Bruce,
I am not familiar with intellectual property law, but it sounds somewhat
strange to me that it is possible to patent an abstract idea of how to
extract information from data.
I can understand that it is forbidden to reuse/modify the source code of a
given implementation of LSI, but why s
I suggest that you use a special tokenizer that breaks chemical names into their
constituent parts and index them as if they were words.
Herb
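
The splitting step suggested above could look roughly like the sketch below. It is plain Java rather than a Lucene Tokenizer, and the splitting rule (break on anything that is not a letter or digit) is an assumption made for illustration. As noted elsewhere in the thread, brackets and hyphens sometimes belong to the name, and a part like "dinitrobenzoyl" would need chemistry-specific rules to be broken down further. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class ChemicalNameTokens {
    // Lowercases the name and splits it on every run of characters
    // that are not ASCII letters or digits (hyphens, brackets, commas).
    static List<String> split(String name) {
        List<String> parts = new ArrayList<>();
        for (String part : name.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!part.isEmpty()) {   // a leading delimiter yields an empty first piece
                parts.add(part);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("rab-oof-13-foonyl-naphthalene"));
        System.out.println(split("N-(t-Butyl)-N-(3,5-dinitrobenzoyl)-nitroxyl"));
    }
}
```

Each part would then be indexed as its own term, so a search for "foonyl" matches without a leading wildcard.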
-Original Message-
From: Hackl, Rene [mailto:[EMAIL PROTECTED]
Sent: Thursday, November 13, 2003 7:17 AM
To: 'Lucene Users List'
Subject: Re:
>> documents contain very long strings for chemical substances, users are
>> interested in certain parts of the string e.g. find all documents that
>> comprise "*foo*" be it "1-foo-bar" or "rab-oof-13-foonyl-naphthalene").
> So you're saying you want users to be able to search for "of-13" and
> m
On Wednesday, November 12, 2003, at 11:52 PM, Tomcat Programmer wrote:
When using the QueryParser class, the parse method
will throw a TokenMgrError when there is a syntax
error even as simple as a missing quote at the end of
a phrase query. According to the javadoc, you should
never see this clas
Hello,
Since there are a lot of Term objects in your Query, your application must
spend a lot of time collecting information about those Terms.
1/ Do you use RAMDirectory? Loading the whole Directory into memory will
increase speed - your index must not be too big though
2/ You are probably not
On Thursday, November 13, 2003, at 03:22 AM, Hackl, Rene wrote:
documents contain very long strings for chemical substances, users are
interested in certain parts of the string e.g. find all documents that
comprise "*foo*" be it "1-foo-bar" or "rab-oof-13-foonyl-naphthalene").
So you're saying you
> If you can figure out how to tell Lucene what the parts of strings are
> when you create the index, it should be easy to do this.
Well, sometimes different kinds of brackets, hyphens and punctuation marks
would inherently belong to strings, sometimes not. The whole collection as such
is ra
On Thu, Nov 13, 2003 at 09:22:57AM +0100, Hackl, Rene wrote:
> Hi John,
>
> Indeed, the RCO index is ok for prefix-style wildcards. But it doesn't work
> for _simultaneous_ left and right truncation ("*oba*"). I have no idea about
> how often this kind of search is actually employed, but in this p
Hi John,
Indeed, the RCO index is ok for prefix-style wildcards. But it doesn't work
for _simultaneous_ left and right truncation ("*oba*"). I have no idea about
how often this kind of search is actually employed, but in this particular
context it is really needed (I sketched this before on this l
50 matches