RE: constant scoring queries

2005-05-10 Thread Robert Engels
I did nearly the exact same thing in my "derived" Lucene. But in order to limit modifications to the Lucene core, I created a QueryCache class, and have derived versions of Prefix and Range query consult the class, passing in the IndexReader and query to see if there is a cached result. I also
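
A minimal sketch of the caching idea described in this message, not the poster's actual QueryCache: results are looked up by reader first and by query second. The class name, the generic parameters `R` (standing in for IndexReader) and `Q` (standing in for the query), and the use of a `BitSet` as the cached result are all assumptions made for illustration. The query type is assumed to implement `equals` and `hashCode`.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Supplier;

/** Hypothetical per-reader query cache; R stands in for IndexReader, Q for the query. */
class QueryCacheSketch<R, Q> {
    // Weak keys: once a reader is no longer referenced, its cached results can be collected.
    private final Map<R, Map<Q, BitSet>> cache = new WeakHashMap<>();

    /** Returns the cached doc set for (reader, query), computing it at most once. */
    synchronized BitSet get(R reader, Q query, Supplier<BitSet> compute) {
        Map<Q, BitSet> perReader = cache.computeIfAbsent(reader, r -> new HashMap<>());
        return perReader.computeIfAbsent(query, q -> compute.get());
    }
}
```

Keying on the reader means a cached result is never served against a different view of the index, which is the reason the message passes both the IndexReader and the query.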

RE: constant scoring queries

2005-05-11 Thread Robert Engels
Would there be any way to "rewrite" the cached queries as documents are added? By this I mean, if a user runs an "expensive" range query that gets cached, then another user adds a document that should be included in the cached query, the addDocument() method would "update" the cached query. I think

optimized reopen?

2005-05-11 Thread Robert Engels
Is there any way to optimize the closing & reopening of an Index? Since the IndexReader.open() opens a MultiReader if there are multiple segments, it seems a reopen() method could be implemented, which detects which segments are the same as the current open index, and then passes those SegementRea

RE: optimized reopen?

2005-05-11 Thread Robert Engels
that sub-readers can be shared like this right now... the difficulty may lie in deleted-docs. -Yonik On 5/11/05, Robert Engels <[EMAIL PROTECTED]> wrote: > Is there any way to optimize the closing & reopening of an Index? > > Since the IndexReader.open() opens a MultiReader is

RE: optimized reopen?

2005-05-11 Thread Robert Engels
sub-readers can be shared like this right now... the difficulty may lie in deleted-docs. -Yonik On 5/11/05, Robert Engels <[EMAIL PROTECTED]> wrote: > Is there any way to optimize the closing & reopening of an Index? > > Since the IndexReader.open() opens a MultiReader is

RE: One Byte is Seven bits too many? - A Design suggestion

2005-05-22 Thread Robert Engels
I have always thought that the norms should be an interface, rather than fixed, as there are many uses of lucene where norms are not necessary, and the memory overhead is substantial. -Original Message- From: Arvind Srinivasan [mailto:[EMAIL PROTECTED] Sent: Sunday, May 22, 2005 7:05 PM To

RE: major searching performance improvement

2005-05-25 Thread Robert Engels
essage- From: Doug Cutting [mailto:[EMAIL PROTECTED] Sent: Wednesday, May 25, 2005 4:20 PM To: java-dev@lucene.apache.org Subject: Re: major searching performance improvement Robert Engels wrote: > Attached are files that dramatically improve the searching performance > (2x improvement on s

RE: major searching performance improvement

2005-05-25 Thread Robert Engels
My understanding of memory mapped files is that the file is assigned an address range in the virtual address space, and using the MMU/paging facilities the file is mapped into that range. Java cannot work with memory pointers directly, so there are at minimum some JNI calls that are made when

RE: Lucene vs. Ruby/Odeum

2005-06-01 Thread Robert Engels
nformation here? Robert Engels -Original Message- From: Daniel Naber [mailto:[EMAIL PROTECTED] Sent: Wednesday, June 01, 2005 5:07 PM To: java-dev@lucene.apache.org Subject: Re: Lucene vs. Ruby/Odeum On Tuesday 17 May 2005 04:41, Otis Gospodnetic wrote: > http://www.zedshaw.com/p

RE: Lucene vs. Ruby/Odeum

2005-06-01 Thread Robert Engels
One more thing, his statement that "why returning 20 documents would perform any better than returning all of them" (paraphrased), shows complete ignorance of proper Lucene usage. R -Original Message----- From: Robert Engels [mailto:[EMAIL PROTECTED] Sent: Wednesday, June 01, 2005

RE: Lucene vs. Ruby/Odeum

2005-06-01 Thread Robert Engels
I think I am going to start a new Blog - "Zed's an Idiot". -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: Wednesday, June 01, 2005 6:39 PM To: java-dev@lucene.apache.org Subject: Re: Lucene vs. Ruby/Odeum On Jun 1, 2005, at 6:07 PM, Daniel Naber wrote: > On Tuesd

RE: Lucene vs. Ruby/Odeum

2005-06-01 Thread Robert Engels
y and positive by educating folks in proper Lucene usage and responding in kind regardless of the mistakes, attitudes, or flame- bait we may encounter. Erik On Jun 1, 2005, at 7:48 PM, Robert Engels wrote: > I think I am going to start a new Blog - "Zed's an Idiot". > >

RE: Lucene vs. Ruby/Odeum

2005-06-02 Thread Robert Engels
ore experiments with different JVM's and memory settings: http://www.zedshaw.com/projects/ruby_odeum/odeum_lucene_part2.html On Jun 2, 2005, at 12:27 AM, Robert Engels wrote: > I read all of Zed's posts on the subject and I feel he certainly > presents a > strong anti-Java Most

RE: Lucene vs. Ruby/Odeum

2005-06-02 Thread Robert Engels
nt JVM's and memory settings: http://www.zedshaw.com/projects/ruby_odeum/odeum_lucene_part2.html On Jun 2, 2005, at 12:27 AM, Robert Engels wrote: > I read all of Zed's posts on the subject and I feel he certainly > presents a > strong anti-Java Most definitely an anti-Java

RE: Search deadlocking under load

2005-07-13 Thread Robert Engels
I had posted an NioFile and caching system that greatly increases the parallelism of Lucene. On some platforms (Windows), however, the low-level NioChannel is not completely thread-safe, so it can still block, although the code has some work-arounds for this problem. You can never achieve "100% pa

RE: lucene API

2005-08-17 Thread Robert Engels
I think you should leave the API as is, and write a custom XML writer for lucene search results. The request is trivial since you can simply pass the single string. I would not write wrapper beans just to use the built-in serialization support. The custom XML writer will be MUCH, MUCH faster, as y

RE: lucene API

2005-08-18 Thread Robert Engels
ce APIs use a far simpler "consumer" interface. Robert Engels -Original Message- From: Maros Ivanco [mailto:[EMAIL PROTECTED] Sent: Thursday, August 18, 2005 11:40 AM To: java-dev@lucene.apache.org Subject: RE: lucene API -"Robert Engels" <[EMAIL PROTECTED]>

RE: Lucene does NOT use UTF-8.

2005-08-28 Thread Robert Engels
Sorry, but I think you are barking up the wrong tree... and your tone is quite bizarre. My personal OPINION is that your "script" language is an abomination, and anyone that develops in it is clearly hurting the advancement of all software - but that is another story, and doesn't matter much to the

RE: Lucene does NOT use UTF-8.

2005-08-29 Thread Robert Engels
I think the VInt should be the number of bytes to be stored using the UTF-8 encoding. It is trivial to use the String methods identified before to do the conversion. The String(char[]) constructor allocates a new char array. For performance, you can use the actual CharSet encoding classes - avoiding all of

RE: Lucene does NOT use UTF-8.

2005-08-30 Thread Robert Engels
That method should easily be changed to: public final String readString() throws IOException { int length = readVInt(); return new String(readBytes(length), "UTF-8"); } readBytes() could reuse the same array if it was large enough. Then only the single char[] is created in the String code. -Ori
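
A self-contained sketch of the format proposed in this message: a VInt holding the UTF-8 byte count, followed by the bytes. The class and method names are hypothetical and this is not Lucene's own I/O code. The VInt layout (7 data bits per byte, low-order group first, high bit set when more bytes follow) is assumed here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class Utf8StringIo {
    /** Writes a non-negative int, 7 bits per byte, low-order first; high bit means "more follows". */
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    static int readVInt(DataInputStream in) throws IOException {
        byte b = in.readByte();
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = in.readByte();
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    /** The length prefix is the UTF-8 byte count, not the char count. */
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeVInt(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    static String readString(DataInputStream in) throws IOException {
        byte[] utf8 = new byte[readVInt(in)];
        in.readFully(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    /** Writes a string and reads it back; illustrates that the two methods agree. */
    static String roundTrip(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, s);
        try {
            return readString(new DataInputStream(new ByteArrayInputStream(out.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the prefix counts bytes, the reader can pull the whole string with a single bulk read and skip it without decoding. This sketch allocates a fresh array per call; the array reuse mentioned in the message is left out for brevity.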

RE: Lucene does NOT use UTF-8.

2005-08-30 Thread Robert Engels
I think you guys are WAY overcomplicating things, or you just don't know enough about the Java class libraries. If you use the java.nio.charset.CharsetEncoder class, then you can reuse the byte[] array, and then it is a simple write of the length, and a blast copy of the required number of bytes t

RE: Lucene does NOT use UTF-8.

2005-08-30 Thread Robert Engels
A bit more clarity... Using CharBuffer and ByteBuffer allows for easy reuse and expansion. You also need to use the CharsetDecoder class. -Original Message- From: Robert Engels [mailto:[EMAIL PROTECTED] Sent: Tuesday, August 30, 2005 12:40 PM To: java-dev@lucene.apache.org

RE: Lucene does NOT use UTF-8.

2005-08-30 Thread Robert Engels
Not true. You do not need to pre-scan it. When you use a CharsetEncoder, it will write the bytes to a buffer (expanding as needed). At the end of the encoding you can get the actual number of bytes needed. The pseudo-code is: use CharsetEncoder to write String to ByteBuffer; write VInt using ByteBu
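
The encoding step can be sketched as follows, assuming a reusable `ByteBuffer` that is grown whenever the encoder reports overflow. The class name and the growth policy (doubling) are illustrative assumptions; only `java.nio.charset` calls from the JDK are used. After `encode` returns, `remaining()` on the buffer is the exact UTF-8 byte count, obtained without pre-scanning the string.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

class ReusableUtf8Encoder {
    private final CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
    private ByteBuffer bytes = ByteBuffer.allocate(16); // reused across calls, grown on demand

    /** Encodes s into the reusable buffer; on return, remaining() is the UTF-8 length. */
    ByteBuffer encode(String s) {
        CharBuffer chars = CharBuffer.wrap(s);
        encoder.reset();
        bytes.clear();
        while (true) {
            CoderResult result = encoder.encode(chars, bytes, true);
            if (result.isUnderflow()) {
                result = encoder.flush(bytes); // all input consumed; emit any pending output
            }
            if (result.isUnderflow()) {
                break;
            }
            if (result.isOverflow()) {
                grow(); // output buffer full: enlarge it and continue where we stopped
            } else {
                throw new IllegalArgumentException("unencodable input: " + result);
            }
        }
        bytes.flip();
        return bytes;
    }

    /** Doubles the buffer capacity, keeping the bytes written so far. */
    private void grow() {
        ByteBuffer bigger = ByteBuffer.allocate(bytes.capacity() * 2);
        bytes.flip();
        bigger.put(bytes);
        bytes = bigger;
    }
}
```

The trade-off is the one debated in the thread: the whole encoded string is buffered before its length is known, in exchange for a single pass over the characters.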

RE: Lucene does NOT use UTF-8.

2005-08-30 Thread Robert Engels
PROTECTED] Subject: Re: Lucene does NOT use UTF-8. On 8/30/05, Robert Engels <[EMAIL PROTECTED]> wrote: > > Not true. You do not need to pre-scan it. What I previously wrote, with emphasis on key words added: "one has to *either* buffer the entire string, *or* pre-scan it."

RE: Eliminating norms ... completley

2005-10-07 Thread Robert Engels
I did exactly this in my custom lucene, since the array of a byte per document is extremely wasteful in a lot of applications. I just changed the code to return null from getNorms() and modified the callers to treat a null array as always 1 for any document. -Original Message- From: Chris
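
A small sketch of the convention described here, where a null norms array means every document has a norm of 1. The `NormDecoder` interface is a hypothetical stand-in for whatever turns a stored norm byte into a float; it is not a Lucene API.

```java
class NormsSketch {
    /** Hypothetical stand-in for decoding a stored norm byte into a float factor. */
    interface NormDecoder {
        float decode(byte b);
    }

    /** A null norms array means "norms omitted": every document gets a norm of 1. */
    static float norm(byte[] norms, int doc, NormDecoder decoder) {
        return norms == null ? 1.0f : decoder.decode(norms[doc]);
    }
}
```

With this convention an index without norms pays no per-document memory cost, at the price of a null check in each caller.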

RE: Eliminating norms ... completley

2005-10-10 Thread Robert Engels
Doesn't this cause a problem for highly interactive and large indexes, since every update to the index requires rewriting the norms and constructing a new array? How expensive is maintaining the norms on disk, at least with regard to index merging? -Original Message- From:

RE: Are Non-consecutive Document IDs feasible?

2005-10-11 Thread Robert Engels
Just add another field to the document that is your "external" document identifier, which is what the request is essentially asking for - another layer of indirection between identifiers and physical locations in the index. -Original Message- From: Shane O'Sullivan [mailto:[EMAIL PROTECTED] Se

RE: [jira] Commented: (LUCENE-414) Java NIO patch against Lucene 1.9

2005-10-26 Thread Robert Engels
specified > Environment: Operating System: All > Platform: All > Reporter: Chris Lamprecht > Assignee: Lucene Developers > Attachments: MemoryLRUCache.java, NioFile.java, nio-lucene-1.9.patch > > Robert Engels previously submitted a patch against Lucene 1.4 for a Jav

RE: [jira] Commented: (LUCENE-414) Java NIO patch against Lucene 1.9

2005-10-26 Thread Robert Engels
Message- From: Doug Cutting [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 26, 2005 4:51 PM To: java-dev@lucene.apache.org Subject: Re: [jira] Commented: (LUCENE-414) Java NIO patch against Lucene 1.9 Robert Engels wrote: > The reason for using Nio and not IO is IO requires multiple file hand

RE: bytecount as String and prefix length

2005-10-31 Thread Robert Engels
All of the JDK source is available via download from Sun. -Original Message- From: Marvin Humphrey [mailto:[EMAIL PROTECTED] Sent: Monday, October 31, 2005 6:31 PM To: java-dev@lucene.apache.org Subject: Re: bytecount as String and prefix length I wrote... > I think I'll take a crack at

RE: word count in a doc

2005-10-31 Thread Robert Engels
it is the frequency(). -Original Message- From: jacky [mailto:[EMAIL PROTECTED] Sent: Monday, October 31, 2005 9:49 PM To: java-dev@lucene.apache.org Subject: word count in a doc hi, e.g in test.txt: hello world, hello friend. When i search word "hello", can i get the count 2 by l

RE: Faking index merge by modifying segments file?

2005-11-01 Thread Robert Engels
Problem is the terms need to be sorted in a single segment. -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] Sent: Tuesday, November 01, 2005 1:52 AM To: java-dev@lucene.apache.org Subject: Faking index merge by modifying segments file? Hello, I spent most of today ta

RE: Faking index merge by modifying segments file?

2005-11-01 Thread Robert Engels
The solution we came up with is (I think) a bit better, since it does not require any copying of files. Since MultiSegmentReader already does the segment/document # offsetting, and a segment does not change after it is written, we created a reopen() method that reopens an existing index, (knowing which segm
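
The segment-reuse idea can be sketched roughly as below. This is an illustration under stated assumptions, not the patch discussed in the thread: segments are identified by name, `R` stands in for a per-segment reader, and `openSegment` is a hypothetical callback that opens a reader for a segment not seen before. Refreshing deletions and closing readers for dropped segments are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class ReopenSketch {
    /**
     * Builds the reader list for the current segment names, reusing any reader that is
     * already open for an unchanged segment and opening only the new ones.
     */
    static <R> List<R> reopen(Map<String, R> openBySegment, List<String> currentSegments,
                              Function<String, R> openSegment) {
        List<R> readers = new ArrayList<>();
        for (String name : currentSegments) {
            R existing = openBySegment.get(name);
            readers.add(existing != null ? existing : openSegment.apply(name));
        }
        return readers;
    }
}
```

Since a written segment never changes, only the segments added since the last open cost anything to load.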

RE: Faking index merge by modifying segments file?

2005-11-02 Thread Robert Engels
: Faking index merge by modifying segments file? Hello, --- Robert Engels <[EMAIL PROTECTED]> wrote: > Problem is the terms need to be sorted in a single segment. Are you referring to Term Dictionary (.tis and .tii files as described at http://lucene.apache.org/java/docs/fileformats.ht

RE: Put field in database vs. Lucene

2005-11-02 Thread Robert Engels
If you do not put them in Lucene, performing any sort of AND search will be VERY difficult, and/or VERY slow. -Original Message- From: Mario Alejandro M. [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 02, 2005 4:08 PM To: Lucene Developers List Subject: Put field in database vs. Lucen

Capitalized Method Names?

2005-11-25 Thread Robert Engels
I noticed that there are a few capitalized method names in FastCharStream. Is there a reason for this? It does not follow Java naming conventions.

Re: "Advanced" query language

2005-12-02 Thread Robert Engels
I don't see the value in this. Whatever is generating the xml could just as easily create/instantiate the query objects. I would much rather see the query parser migrated to an internal parser (that would be easier to maintain), and develop a syntax that allowed easier use of the most common/p

NioFile cache performance

2005-12-08 Thread Robert Engels
I finally got around to writing a testcase to verify the numbers I presented. The following testcase and results are for the lowest level disk operations. On my machine reading from the cache, vs. going to disk (even when the data is in the OS cache) is 30%-40% faster. Since Lucene makes ext
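
For readers unfamiliar with the approach, an in-process block cache with least-recently-used eviction can be sketched with `LinkedHashMap` in access order. This is not the MemoryLRUCache from the posted patch; the class name and the entry-count limit are assumptions, and the sketch is not thread-safe on its own.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU cache: evicts the least recently accessed entry past maxEntries. */
class BlockLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BlockLruCache(int maxEntries) {
        super(16, 0.75f, true); // true = iterate in access order, oldest access first
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

A cache hit here is a map lookup inside the JVM, which is the path the message compares against a read that goes through the operating system.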

RE: NioFile cache performance

2005-12-08 Thread Robert Engels
ugh for most applications. I will attempt to get some performance numbers using/not using NioFile performing actual Lucene queries. -Original Message----- From: Robert Engels [mailto:[EMAIL PROTECTED] Sent: Thursday, December 08, 2005 10:37 AM To: Lucene-Dev Subject: NioFile cache perf

RE: NioFile cache performance

2005-12-08 Thread Robert Engels
I modified MemoryLRUCache to use the attached ConcurrentHashMap.java and ran under 1.4.2_10: filesize is 4194304; non-cached time = 11140, avg = 0.01114; non-cached threaded (3 threads) time = 35485, avg = 0.011828; cached time = 6109, avg = 0.006109; cache hits 996138; cache misses 386

RE: NioFile cache performance

2005-12-09 Thread Robert Engels
, 2005 7:07 AM To: java-dev@lucene.apache.org Subject: Re: NioFile cache performance John Haxby wrote: > Robert Engels wrote: > >> Using a 4mb file (so I could be "guarantee" the disk data would be in >> the OS cache as well), the test shows the following results. >

RE: NioFile cache performance

2005-12-09 Thread Robert Engels
y, December 09, 2005 4:24 AM To: java-dev@lucene.apache.org Subject: Re: NioFile cache performance Robert Engels wrote: > Using a 4mb file (so I could be "guarantee" the disk data would be in > the OS cache as well), the test shows the following results. Which OS? If it

RE: "Advanced" query language

2005-12-19 Thread Robert Engels
Why not just write a Quark Storage component based upon lucene, so that you get XPath/Query compliance, and leave the custom (and simple) XML based persistence mechanism for lucene queries as proprietary? -Original Message- From: Joaquin Delgado [mailto:[EMAIL PROTECTED] Sent: Monday, De

RE: JE Directory/XA Transactions

2005-12-29 Thread Robert Engels
I think JE transactions are held completely in memory, so this may be an issue - although I have not reviewed your implementation yet... :) -Original Message- From: Andi Vajda [mailto:[EMAIL PROTECTED] Sent: Thursday, December 29, 2005 5:14 PM To: java-dev@lucene.apache.org Subject: Re: JE

RE: indexreader refresh

2006-01-04 Thread Robert Engels
I proposed and posted a patch for this long ago. Only thing missing would be some sort of reference counting for segments (rather than the 'stayopen' flag). /** * reopens the IndexReader, possibly reusing the segments for greater efficiency. The original IndexReader instance * is closed, a

RE: Save to database...

2006-01-05 Thread Robert Engels
There are implementations in the contrib that do not need to retrieve the entire index from the db in order to query (they store blocks of files in the db, instead of blocks on disk). I also developed an implementation that did not use blocks but rather a custom index persistence mechanism. There can be se

RE: Save to database...

2006-01-05 Thread Robert Engels
ncy table and a TermPosition table (at the least)... Has this been done before? -Original Message----- From: Robert Engels [mailto:[EMAIL PROTECTED] Sent: Thursday, January 05, 2006 4:36 PM To: java-dev@lucene.apache.org Subject: RE: Save to database... There are impl in the contrib that do not n

RE: Save to database...

2006-01-05 Thread Robert Engels
'sync' mode would prevent this (at the cost of performance). -Original Message- From: Andi Vajda [mailto:[EMAIL PROTECTED] Sent: Thursday, January 05, 2006 10:32 AM To: java-dev@lucene.apache.org; [EMAIL PROTECTED] Subject: RE: Save to database... On Thu, 5 Jan 2006, Robert Engels w

RE: [jira] Commented: (LUCENE-140) docs out of order

2006-01-10 Thread Robert Engels
Possibly "virus scanner" software interfering with the writing/renaming/copying of the index files??? -Original Message- From: Doug Cutting (JIRA) [mailto:[EMAIL PROTECTED] Sent: Tuesday, January 10, 2006 11:29 AM To: java-dev@lucene.apache.org Subject: [jira] Commented: (LUCENE-140) docs

RE: [jira] Created: (LUCENE-487) Database as a lucene index target

2006-01-11 Thread Robert Engels
Since no code has been posted, I'll just ask the question... Does your implementation use the Blob "seek" functions when reading and writing, or does it read/write the blob in its entirety? If it is the latter, your solution is only acceptable for the smallest of Lucene indexes. If it is the fo

RE: [jira] Created: (LUCENE-487) Database as a lucene index target

2006-01-11 Thread Robert Engels
On 1/11/06, Robert Engels <[EMAIL PROTECTED]> wrote: > > Since no code has been posted, I'll just ask the question... > > Does your implementation use the Blob "seek" functions when reading and > writing, or does it read/write the blob in its entirety. > >

RE: Filter

2006-01-26 Thread Robert Engels
I think the interface I proposed is simpler and handles more cases easily. interface SearchFilter { boolean include(int doc); } It seems your interface requires that the SearchFilter know all of the query results beforehand. I am not sure this works well with the partial result sets that Luc
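
To illustrate why a per-document callback fits partial result sets, here is a sketch in which the filter is consulted only for documents actually visited. `SearchFilter` is the interface from the message; the `collect` helper and its parameters are hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.List;

class SearchFilterSketch {
    /** The interface as proposed in the message: one decision per candidate document. */
    interface SearchFilter {
        boolean include(int doc);
    }

    /** Hypothetical collector: keeps the first maxHits matching doc ids that pass the filter. */
    static List<Integer> collect(int[] matchingDocs, SearchFilter filter, int maxHits) {
        List<Integer> hits = new ArrayList<>();
        for (int doc : matchingDocs) {
            if (hits.size() == maxHits) {
                break; // stop early: later documents are never shown to the filter
            }
            if (filter.include(doc)) {
                hits.add(doc);
            }
        }
        return hits;
    }
}
```

The filter never needs the full result set up front; it answers one document at a time, so the search can stop as soon as enough hits are gathered.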

RE: How do I send search query to Multiple search Indexes ?

2006-02-03 Thread Robert Engels
Read the book "Lucene in Action". -Original Message- From: Vikas Khengare [mailto:[EMAIL PROTECTED] Sent: Friday, February 03, 2006 12:14 AM To: java-user@lucene.apache.org; java-dev@lucene.apache.org; java-commits@lucene.apache.org Subject: How do I send search query to Multiple search In

RE: [jira] Created: (LUCENE-529) TermInfosReader and other + instance ThreadLocal => transient/odd memory leaks => OutOfMemoryException

2006-03-22 Thread Robert Engels
There is only a single TermInfosReader per index. In order to share this instance with multiple threads, and avoid the overhead of creating new enumerators for each request, the enumerator for the thread is stored in a thread local. Normally, in a server application, threads are pooled, so new t

RE: [jira] Created: (LUCENE-529) TermInfosReader and other + instance ThreadLocal => transient/odd memory leaks => OutOfMemoryException

2006-03-22 Thread Robert Engels
There was a small mistake - there is a single TermInfosReader per segment. -Original Message- From: Robert Engels [mailto:[EMAIL PROTECTED] Sent: Wednesday, March 22, 2006 11:37 AM To: java-dev@lucene.apache.org Subject: RE: [jira] Created: (LUCENE-529) TermInfosReader and other + instance

query parsing

2006-03-22 Thread Robert Engels
Using lucene 1.4.3, if I use the query +cat AND -dog it parses to +cat -dog and works correctly. If I use (+cat) AND (-dog) it parses to +(+cat) +(-dog) and returns no results. Is this a known issue?

RE: [jira] Created: (LUCENE-529) TermInfosReader and other + instance ThreadLocal => transient/odd memory leaks => OutOfMemoryException

2006-03-22 Thread Robert Engels
his issue locally in the code and it works. Regards Andy -----Original Message- From: Robert Engels [mailto:[EMAIL PROTECTED] Sent: 22 March 2006 17:46 To: java-dev@lucene.apache.org Subject: RE: [jira] Created: (LUCENE-529) TermInfosReader and other + instance ThreadLocal => transient/o

RE: query parsing

2006-03-22 Thread Robert Engels
apache.org Subject: Re: query parsing On Mittwoch 22 März 2006 18:49, Robert Engels wrote: > If I use > > (+cat) AND (-dog) > > it parses to > > +(+cat) +(-dog) > > and returns no results. > > Is this a known issue? Basically yes. QueryParser is known to exhibit stra

RE: [jira] Updated: (LUCENE-529) TermInfosReader and other + instance ThreadLocal => transient/odd memory leaks => OutOfMemoryException

2006-03-23 Thread Robert Engels
Your testcase is invalid. Reduce the size by 10, increase the repeat by 10, (SAME amount of memory use), and it works fine. The reason it works in the one case is that you use new WeakReference(new Array()) - since the array cannot be referenced, it is immediately GC'd. You should have notic

RE: [jira] Updated: (LUCENE-529) TermInfosReader and other + instance ThreadLocal => transient/odd memory leaks => OutOfMemoryException

2006-03-23 Thread Robert Engels
The only other thing that may be causing your problem is the use of finalize(). This can interfere with the GC's ability to collect objects. I am not sure why the finalize() is used in the Lucene ThreadLocal handling. It doesn't seem necessary to me. -Original Message- From: Robert E

RE: [jira] Updated: (LUCENE-529) TermInfosReader and other + instance ThreadLocal => transient/odd memory leaks => OutOfMemoryException

2006-03-23 Thread Robert Engels
is the same version as in 1.5.0_06 and the issue was initially found using 1.5.0_06. The issue is for real. You can blame ThreadLocal but it does what it says on the tin. Regards Andy -Original Message- From: Robert Engels [mailto:[EMAIL PROTECTED] Sent: 23 March 2006 16:05 To: java-dev

RE: [jira] Updated: (LUCENE-529) TermInfosReader and other + instance ThreadLocal => transient/odd memory leaks => OutOfMemoryException

2006-03-24 Thread Robert Engels
essage----- From: Robert Engels [mailto:[EMAIL PROTECTED] Sent: 23 March 2006 18:50 To: java-dev@lucene.apache.org Subject: RE: [jira] Updated: (LUCENE-529) TermInfosReader and other + instance ThreadLocal => transient/odd memory leaks => OutOfMemoryException The testcase is still not correct - at

RE: Question about RemoteSearchable, RMI and queries in parallel

2006-04-12 Thread Robert Engels
I think you may need a much more advanced design - with change detection, parallel query execution, and index modification. A lot of it depends on your semantics of a search - does it mean that the results are 'almost right' at a moment in time, or are pending index changes made first before any quer

RE: Question about RemoteSearchable, RMI and queries in parallel

2006-04-12 Thread Robert Engels
can new as many IndexSearchers as we need. However given the fact that there is only one RemoteSearchable instance running on Server A, how can I run multiple queries on Server B (different indices) at the same time w/o affecting each other. On 4/12/06, Robert Engels <[EMAIL PROTECTED]> wrote: >

RE: lucene search sentence

2006-04-27 Thread Robert Engels
Ask the question on the lucene users list, not the dev-list. And, Read a book. Read the javadoc. Read the samples. -Original Message- From: Anton Feldmann [mailto:[EMAIL PROTECTED] Sent: Thursday, April 27, 2006 10:05 AM To: java-dev@lucene.apache.org; java-user@lucene.apache.org Subject:

RE: GData, updateable IndexSearcher

2006-04-27 Thread Robert Engels
Doug, can you please elaborate on this? I thought each segment maintained its own list of deleted documents (since segments are WRITE ONCE, and when that segment is merged or optimized it would "go away" anyway, as the deleted documents are removed). In my reopen() implementation, I check to see if

RE: 2.0 release

2006-04-27 Thread Robert Engels
What about making IndexReader & IndexWriter interfaces? Or creating interfaces for these (IReader & IWriter?), and making all of the classes use the interfaces? -Original Message- From: Doug Cutting [mailto:[EMAIL PROTECTED] Sent: Thursday, April 27, 2006 5:20 PM To: java-dev@lucene.apache

SegmentReader changes?

2006-04-29 Thread Robert Engels
I think one of two things needs to happen to the SegmentReader class. Either make the 'segment' variable protected, or make the initialize() method protected. Without this, subclassing SegmentReader is impossible, since there is no way for the derived class to know what segment it is working w

RE: GData, updateable IndexSearcher

2006-05-01 Thread Robert Engels
fyi, using my reopen() implementation (which rereads the deletions) on a 135mb index, with 5000 iterations: open & close time using new reader = 585609; open & close time using reopen = 27422. More than 20x faster. Important in a highly interactive/incremental updating index. -Original Message---

RE: GData, updateable IndexSearcher

2006-05-01 Thread Robert Engels
Re: GData, updateable IndexSearcher Can you post your code? - Original Message ---- From: Robert Engels <[EMAIL PROTECTED]> To: java-dev@lucene.apache.org; jason rutherglen <[EMAIL PROTECTED]> Sent: Monday, May 1, 2006 11:33:06 AM Subject: RE: GData, updateable IndexSearcher

refresh segments for deleted documents?

2006-05-01 Thread Robert Engels
I implemented the IndexReader.reopen(). My original implementation did not "refresh" the deleted documents, and it seemed to work. The latest impl does re-read the deletions. BUT, on inspecting the IndexReader code, I am not sure this is necessary??? When a document is deleted, IndexReader marks

RE: GData, updateable IndexSearcher

2006-05-01 Thread Robert Engels
: GData, updateable IndexSearcher Thanks for the code and performance metric Robert. Have you had any issues with the deleted segments as Doug has been describing? - Original Message From: Robert Engels <[EMAIL PROTECTED]> To: java-dev@lucene.apache.org; jason rutherglen <[EMAIL

RE: refresh segments for deleted documents?

2006-05-01 Thread Robert Engels
? Robert Engels wrote: > Doug, can you comment on exactly why the 'deletions' need to be re-read? > Doesn't seem necessary to me. A common idiom is to use one IndexReader for searches, and a separate for deletions. For example, one might do something like: 1. Open IndexReade

RE: SegmentReader changes?

2006-05-01 Thread Robert Engels
TECTED] Sent: Monday, May 01, 2006 5:44 PM To: java-dev@lucene.apache.org Subject: Re: SegmentReader changes? Robert Engels wrote: > In implementing the 'reopen()' method SegmentReader needs to be subclassed > in order to support 'refreshing' the deleted documents. Why

RE: SegmentReader changes?

2006-05-01 Thread Robert Engels
ization methods called by the static factory method are private. -Original Message- From: Doug Cutting [mailto:[EMAIL PROTECTED] Sent: Monday, May 01, 2006 6:03 PM To: java-dev@lucene.apache.org Subject: Re: SegmentReader changes? Robert Engels wrote: > Correct - changing SegmentRe

RE: SegmentReader changes?

2006-05-01 Thread Robert Engels
ut warning. I'd love to see a good patch that adds an IndexReader.reopen() method and I hope you are not discouraged from writing one. Doug Robert Engels wrote: > I can submit a patch to add the IndexReader.reopen() method. > > BUT, I think the requested change to SegmentReader

MemoryIndex

2006-05-02 Thread Robert Engels
Along the lines of Lucene-550, what about having a MemoryIndex that accepts multiple documents, then writes the index once at the end in the Lucene file format (so it could be merged) during close? When adding documents using an IndexWriter, a new segment is created for each document, and then the

Why ThreadLocal?

2006-05-04 Thread Robert Engels
While reviewing the code for bug 436 (http://issues.apache.org/jira/browse/LUCENE-436): why are we using a ThreadLocal for the enumeration at all? Since terms() and terms(Term t) return new instances anyway, why not just have them clone the needed data structures? Seems like the code could be much

RE: Multiple threads searching in Lucene and the synchronized issue. -- solution attached.

2006-05-09 Thread Robert Engels
I am interested in the exact performance difference in ms per query when removing the synchronized block. I can see that after a while when using your code, the JIT will probably inline the 'non-reading' path. Even then... I would not think that 2 lines of synchronized code would contribute much when

RE: Multiple threads searching in Lucene and the synchronized issue. -- solution attached.

2006-05-09 Thread Robert Engels
I think your basic problem is that you are using multiple IndexSearchers? And creating new instances during runtime? If so, you will be reading the index information far too often. This is not a good configuration. -Original Message- From: yueyu lin [mailto:[EMAIL PROTECTED] Sent: Tuesday,

RE: Multiple threads searching in Lucene and the synchronized issue. -- solution attached.

2006-05-09 Thread Robert Engels
I am fairly certain his code is ok, since it rechecks the initialized state in the synchronized block before initializing. Worst case, during the initial checks when the initialization is occurring there may be some unneeded checking, but after that, the code should perform better since it will ne

RE: Multiple threads searching in Lucene and the synchronized issue. -- solution attached.

2006-05-09 Thread Robert Engels
ing a function local variable, m_indexTerms and in JDK1.5.06, it seems ok. Whether it will break in other environments, I still don't know about it. On 5/10/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: > > On 5/9/06, Robert Engels <[EMAIL PROTECTED]> wrote: > > I am fairl

RE: Multiple threads searching in Lucene and the synchronized issue. -- solution attached.

2006-05-09 Thread Robert Engels
e the byte-codes. When I'm using a function local variable, m_indexTerms and in JDK1.5.06, it seems ok. Whether it will break in other environments, I still don't know about it. On 5/10/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: > > On 5/9/06, Robert Engels <[EMAIL PROT

RE: Multiple threads searching in Lucene and the synchronized issue. -- solution attached.

2006-05-10 Thread Robert Engels
For what it's worth... That is not my understanding. My understanding is that volatile just ensures the JIT always accesses the var in order - prevents some compiler optimizations - whereas synchronized needs to acquire the lock. (There were discussions regarding having volatile create synchronize
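
For reference, a sketch of the lazy-initialization pattern being debated in this thread (check, lock, recheck). Under the memory model of Java 5 and later, a write to a `volatile` field happens-before any later read of that field, which is what makes this form safe; on earlier JVMs it is not guaranteed. The class name and the `Supplier` loader are hypothetical, and the loader is assumed never to return null.

```java
import java.util.function.Supplier;

class LazyIndexSketch<T> {
    private final Supplier<T> loader;
    // volatile: a thread that sees a non-null reference also sees the fully built object (Java 5+).
    private volatile T value;

    LazyIndexSketch(Supplier<T> loader) {
        this.loader = loader;
    }

    T get() {
        T local = value;          // single volatile read on the fast path
        if (local == null) {
            synchronized (this) {
                local = value;    // recheck under the lock
                if (local == null) {
                    local = loader.get();
                    value = local;
                }
            }
        }
        return local;
    }
}
```

After initialization, callers take only the volatile read and never enter the synchronized block.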

RE: Taking a step back

2006-05-10 Thread Robert Engels
I agree with almost all of what you said. The file format issue, however, is a non-issue. If you want interoperability between systems, do it via remote invocation and IIOP, or some HTTP interface. This is far easier to control, especially through version change cycles - otherwise all platforms

RE: Multiple threads searching in Lucene and the synchronized issue. -- solution attached.

2006-05-10 Thread Robert Engels
synchronized issue. -- solution attached. On 5/10/06, Robert Engels <[EMAIL PROTECTED]> wrote: > I think you could use a volatile primitive boolean to control whether or not > the index needs to be read, and also mark the index data volatile and it > SHOULD PROBABLY work. No, that sti

RE: Taking a step back

2006-05-10 Thread Robert Engels
What about the case where a "bug" is found that necessitates a file format change? Obviously this should be VERY rare given adequate testing, but it seems difficult to make a hard and fast rule that X.0 should be able to ALWAYS read X.N. -Original Message- From: Doug Cutting [mailto:[EMA

RE: Taking a step back

2006-05-11 Thread Robert Engels
Exactly. If people don't get the REAL value of Java by now, they are probably not going to ever get it. Weighing ALL of the pros/cons, developing modern software in anything else is just silly. But, arguing this is akin to discussing religion... -Original Message- From: Doug Cutting [mailt

RE: Taking a step back

2006-05-11 Thread Robert Engels
ursday, May 11, 2006 12:08 PM To: java-dev@lucene.apache.org Subject: Re: Taking a step back On May 10, 2006, at 8:02 AM, Robert Engels wrote: > The file format issue, however, is a non-issue. If you want > interoperability > between systems do it via remote invocation and IIOP, or some

RE: Taking a step back

2006-05-11 Thread Robert Engels
r thing that you are describing, I believe. Otis - Original Message From: Robert Engels <[EMAIL PROTECTED]> To: java-dev@lucene.apache.org Sent: Thursday, May 11, 2006 1:37:17 PM Subject: RE: Taking a step back I disagree with that a bit. I have found that certain languages len

RE: LUCENE-436

2006-05-12 Thread Robert Engels
As stated many times, it is SIGNIFICANT if using RAMDirectories to hold entire indexes. If not, then it is not such a big deal. Rather than using FixedThreadLocal, a more involved solution using a runtime property to determine which thread local impl to use is possible. In lieu of that, RAMDirecto

RE: Lucene Index comparison..

2006-05-12 Thread Robert Engels
I think more detail is needed. -Original Message- From: Krishnan, Ananda [mailto:[EMAIL PROTECTED] Sent: Thursday, May 11, 2006 10:46 PM To: java-dev@lucene.apache.org Cc: [EMAIL PROTECTED] Subject: RE: Lucene Index comparison.. Hi, I will explain my problem in a bit more detail

RE: LUCENE-436

2006-05-12 Thread Robert Engels
esn't seem to have helped after all.. Robert Engels wrote: > As stated many times, it is SIGNIFICANT if using RAMDirectories to hold > entire indexes. If not, then it is not such a big deal. > > Rather than using FixedThreadLocal, a more involved solution using a runtime > propert

Nio File Caching & Performance Test

2006-05-12 Thread Robert Engels
I finally got around to making the NioFSDirectory with caching 1.9 compliant. I also produced a performance test case. Below are the results on my machine: read random = 586391 read same = 68578 nio read random = 72766 nio max mem = 203292672 nio memory = 102453248 nio hits = 14974713 nio misses =
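
The hit and miss counters reported above suggest a block cache sitting in front of the file reads. A minimal sketch of that general idea follows; it is not the NioFSDirectory code from the message. The class `BlockCache`, its loader callback, and the LRU eviction policy are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongFunction;

public class BlockCache {
    private final int maxBlocks;
    private final LongFunction<byte[]> loader; // hypothetical: reads one block from disk
    private final LinkedHashMap<Long, byte[]> blocks;
    private long hits;
    private long misses;

    public BlockCache(int maxBlocks, LongFunction<byte[]> loader) {
        this.maxBlocks = maxBlocks;
        this.loader = loader;
        // Access-ordered map: the eldest entry is the least recently used block.
        this.blocks = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > BlockCache.this.maxBlocks;
            }
        };
    }

    public synchronized byte[] read(long blockNo) {
        byte[] block = blocks.get(blockNo);
        if (block != null) {
            hits++;
            return block;
        }
        misses++;
        block = loader.apply(blockNo);     // fall through to the underlying store
        blocks.put(blockNo, block);        // may evict the least recently used block
        return block;
    }

    public synchronized long hits() { return hits; }

    public synchronized long misses() { return misses; }
}
```

A single lock around the map keeps the sketch short; a real cache under heavy concurrent search load would likely want finer-grained locking.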

RE: Nio File Caching & Performance Test

2006-05-15 Thread Robert Engels
To: java-dev@lucene.apache.org Subject: Re: Nio File Caching & Performance Test On May 12, 2006, at 3:38 PM, Robert Engels wrote: > I finally got around to making the NioFSDirectory with caching 1.9 > compliant. I also produced a performance test case. How does this implementation comp

RE: Nio File Caching & Performance Test

2006-05-16 Thread Robert Engels
Doug Cutting wrote: > Robert Engels wrote: > >> The most important statistic is that the reading via the local cache, vs. >> going to the OS (where the block is cached) is 3x faster (22344 vs. >> 68578). >> With random reads, when the block may not be in the OS cache, it

FieldsReader synchronized access vs. ThreadLocal ?

2006-05-16 Thread Robert Engels
In SegmentReader, currently the access to FieldsReader.doc(n) is synchronized (which it must be). Does it not make sense to use a ThreadLocal implementation similar to the TermInfosReader? It seems that in a highly multi-threaded server this synchronized method could lead to significant blocking
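
The ThreadLocal alternative proposed here can be sketched roughly as follows: each thread gets its own stateful cursor, so no lock is needed on the read path. The names `PerThreadReader` and `Cursor` are hypothetical, the "seek" is simulated, and this is not how SegmentReader or FieldsReader are actually written.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class PerThreadReader {
    /** Hypothetical stand-in for a stateful, non-thread-safe reader holding a file position. */
    static final class Cursor {
        private long position;

        String doc(int n) {
            position = n * 16L;            // pretend to seek to the record
            return "doc-" + n + "@" + position;
        }
    }

    private final AtomicInteger cursorsCreated = new AtomicInteger();

    // One Cursor per thread, created lazily on that thread's first access.
    private final ThreadLocal<Cursor> cursors = ThreadLocal.withInitial(() -> {
        cursorsCreated.incrementAndGet();
        return new Cursor();
    });

    /** No synchronized keyword: the mutable state touched here belongs to the calling thread. */
    public String doc(int n) {
        return cursors.get().doc(n);
    }

    public int cursorsCreated() {
        return cursorsCreated.get();
    }
}
```

The cost of this design is one cursor (and, in a real reader, one open stream or clone) per searching thread, which trades memory and file handles for the removal of lock contention.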

RE: Nio File Caching & Performance Test

2006-05-16 Thread Robert Engels
sage From: Yonik Seeley <[EMAIL PROTECTED]> To: java-dev@lucene.apache.org; [EMAIL PROTECTED] Sent: Tuesday, 16 May, 2006 6:10:07 PM Subject: Re: Nio File Caching & Performance Test On 5/16/06, Robert Engels <[EMAIL PROTECTED]> wrote: > SO, I would like to use a memory mapped

non indexed field searching?

2006-05-16 Thread Robert Engels
I know I (and others) have brought this up before, but maybe now with the lazy field loading (seemingly due to larger documents being stored) it is time to revisit. It seems that maybe a query could be separated into Filter and Query clauses (similar to how the query optimizer works in Nutch).

RE: Hacking Luke for bytecount-based strings

2006-05-16 Thread Robert Engels
While you're at it, why not rewrite Luke in Perl as well... Seems like a great use of your time. -Original Message- From: Marvin Humphrey [mailto:[EMAIL PROTECTED] Sent: Tuesday, May 16, 2006 11:36 PM To: java-dev@lucene.apache.org Cc: Andrzej Bialecki Subject: Hacking Luke for bytecount-
