Re: Nasty NIO behavior makes NIOFSDirectory silently close channel

2010-01-29 Thread Jason Rutherglen
Defaulting NIOFSDir could account for some of the recent speed improvements users have been reporting in Lucene 2.9. So removing it as a default could reverse those and people could then report Lucene 3.X has slowed... On Thu, Jan 28, 2010 at 5:24 AM, Michael McCandless wrote: > Bummer. > > So t

Re: Uwe's question

2010-02-26 Thread Jason Rutherglen
Let's go to JUnit 4 if possible... Does it provide method level testing? (i.e. one doesn't need to execute every test method just to check the results of one method) On Thu, Feb 25, 2010 at 8:15 PM, Shai Erera wrote: > Ok this seems a discussion related to JUnit 4, so I'll port what I've said >

Re: Uwe's question

2010-02-26 Thread Jason Rutherglen
> I might convert an old-style test case if I was > working in it, but that's probably a personal preference. > I've never tried to learn a command-line invocation of a test > case for a single test method, I've always just used the IDE > to run individual methods

Re: Query modifier

2010-03-30 Thread Jason Rutherglen
David, I totally agree with this idea. On Tue, Mar 30, 2010 at 9:58 AM, David Smiley (@MITRE.org) wrote: > > I observed this problem when I started using Lucene (ages ago) and it's a > shame this situation persists.  In summary, it would be tremendously useful > if Query objects were fully mutabl

Re: GData, updateable IndexSearcher

2006-04-26 Thread jason rutherglen
ginal Message From: Doug Cutting <[EMAIL PROTECTED]> To: [email protected] Sent: Wednesday, April 26, 2006 11:27:44 AM Subject: Re: GData, updateable IndexSearcher jason rutherglen wrote: > Interesting, does this mean there is a plan for incrementally updateable > IndexSearchers

Re: GData, updateable IndexSearcher

2006-04-27 Thread jason rutherglen
dnesday, April 26, 2006 1:44:08 PM Subject: Re: GData, updateable IndexSearcher jason rutherglen wrote: > I was thinking you implied that you knew of someone who had customized their > own, but it was a closed source solution. And if so then you would know how > that project fared.

Re: GData, updateable IndexSearcher

2006-05-01 Thread jason rutherglen
I wanted to post a quick hack to see if it is along the correct lines. A few of the questions regard whether to reuse existing MultiReaders or simply strip out only the SegmentReaders. I do a compare on the segment name and made it public. Thanks! public static IndexReader reopen(IndexRead

Re: GData, updateable IndexSearcher

2006-05-01 Thread jason rutherglen
Can you post your code? - Original Message From: Robert Engels <[EMAIL PROTECTED]> To: [email protected]; jason rutherglen <[EMAIL PROTECTED]> Sent: Monday, May 1, 2006 11:33:06 AM Subject: RE: GData, updateable IndexSearcher fyi, using my reopen() implementation (w

Re: GData, updateable IndexSearcher

2006-05-01 Thread jason rutherglen
Thanks for the code and performance metric Robert. Have you had any issues with the deleted segments as Doug has been describing? - Original Message From: Robert Engels <[EMAIL PROTECTED]> To: [email protected]; jason rutherglen <[EMAIL PROTECTED]> Sent: Monday, Ma

Re: GData Server - Lucene storage

2006-06-02 Thread jason rutherglen
Yonik, It might be interesting to merge using BDB into Solr, as an option to provide better realtime updates. Perhaps the replication could be used as well in place of rsync? I don't have any experience with BDB replication, anyone have thoughts on the matter? Jason - Original Message -

Re: GData Server - Lucene storage

2006-06-02 Thread jason rutherglen
Is it possible to turn off directory locking with BDB? How is the performance compared to regular FSDirectory for queries? - Original Message From: Andi Vajda <[EMAIL PROTECTED]> To: [email protected]; jason rutherglen <[EMAIL PROTECTED]> Sent: Friday, June 2, 2006

LUCENE-528 and 565

2006-08-15 Thread jason rutherglen
What about using this http://issues.apache.org/jira/browse/LUCENE-528 to solve this http://issues.apache.org/jira/browse/LUCENE-565 Where the batching is performed in another index that is then merged into the existing one. This is something I have been looking for. Is 528 ok to use?

IndexReader.reopen discussion

2006-08-15 Thread jason rutherglen
There was this discussion regarding adding a reopen method to IndexReader however it seems to have dropped off the map. Robert Engels submitted some code however it was not a patch. http://www.gossamer-threads.com/lists/lucene/java-dev/34898?search_string=reopen I would submit something but h

Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

2006-08-22 Thread jason rutherglen
Yes I am including this patch as it is very useful for increasing the efficiency of updates as you described. I will be conducting more tests and will post any results. Yes a patch for IndexWriter will be useful so that the entirety of this build will work. Thanks! - Original Message ---

Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

2006-08-29 Thread jason rutherglen
The documents reached disk as a close was performed on the NewIndexModifier and the index size grows; it seems like the deleteable files registers the documents as deleted though, so a search returns nothing and an optimize deletes all of the documents. Maybe the new documents have the same docid a

Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

2006-09-06 Thread jason rutherglen
Sounds interesting Marvin, I would be willing to test out what you create. I am trying to create a rapidly updating index and it sounds like this may help that. I've noticed even using a ramdisk that the whole merging process is quite slow. Maybe also because of the locking that occ

IndexReader.reopen FieldCache

2006-09-07 Thread jason rutherglen
Robert Engels, I implemented the reopen code you posted, works well, thanks. One thing I am curious about, are you able to reuse the FieldCache? From what I am seeing, it is being rebuilt after a commit, which makes the next query slow. Any ideas on this? Thanks, Jason

Formatting comments

2009-01-05 Thread Jason Rutherglen
Michael M, What program are you using on Mac OS X to format/word wrap your comments in what looks to be about 40 characters? -J

Re: Realtime Search

2009-01-05 Thread Jason Rutherglen
+1 Agreed, the initial version should use RAMDirectory in order to keep things simple and to benchmark against other MemoryIndex like index representations. On Fri, Dec 26, 2008 at 10:20 AM, Doug Cutting wrote: > Michael McCandless wrote: > >> So then I think we should start with approach #2 (bu

Re: Realtime Search

2009-01-08 Thread Jason Rutherglen
Based on our discussions, it seems best to get realtime search going in small steps. Below are some possible steps to take. Patch #1: Expose an IndexWriter.getReader method that returns the current reader and shares the write lock Patch #2: Implement a realtime ram index class Patch #3: Implement

Re: Realtime Search

2009-01-09 Thread Jason Rutherglen
or multiple transactions at once on IndexWriter outside of the realtime transactions seems to require a lot of refactoring. On Fri, Jan 9, 2009 at 5:39 AM, Michael McCandless < [email protected]> wrote: > > Jason Rutherglen wrote: > > Patch #1: Expose an IndexWriter.getRea

Re: Realtime Search

2009-01-09 Thread Jason Rutherglen
crash recovery), which should fit "above" Lucene nicely, what else is needed for realtime beyond the single-transaction support Lucene already provides?" What we have described above (exposing IR via IW) will be sufficient and realtime will live above it. On Fri, Jan 9, 2009 at 1

Re: Realtime Search

2009-01-09 Thread Jason Rutherglen
I think the IW integrated IR needs a rule regarding the behavior of IW.flush and IR.flush. There will need to be a flush lock that is shared between the IW and IR. The lock is acquired at the beginning of a flush and released immediately after a successful or unsuccessful call. We will need to shar
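
The shared flush lock described in this message can be sketched as follows. This is a minimal illustration, not code from any Lucene patch: the class name SharedFlushLock and its methods are hypothetical, and the flush work itself is passed in as a Runnable so the example stays self-contained. The lock is taken at the start of a flush and released in a finally block, so it is freed whether the flush succeeds or throws.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: a single lock object that both the writer and the
// reader would hold a reference to. Not taken from Lucene source.
final class SharedFlushLock {
    private final ReentrantLock lock = new ReentrantLock();
    private final AtomicInteger successfulFlushes = new AtomicInteger();

    // Runs the flush body under the shared lock. The lock is released in
    // finally, so a failed flush cannot leave it held.
    void flush(Runnable body) {
        lock.lock();
        try {
            body.run();
            successfulFlushes.incrementAndGet(); // only reached on success
        } finally {
            lock.unlock();
        }
    }

    // True while any thread is inside flush().
    boolean isHeld() {
        return lock.isLocked();
    }

    int successfulFlushes() {
        return successfulFlushes.get();
    }
}
```

In this sketch a writer-side flush and a reader-side flush would both call flush(...) on the same SharedFlushLock instance, which serializes them.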

Re: Realtime Search

2009-01-12 Thread Jason Rutherglen
[email protected]> wrote: > > Jason Rutherglen wrote: > > Patch #1: Expose an IndexWriter.getReader method that returns the current >> reader and shares the write lock >> > > I tentatively like this approach so far... > > That reader is opened using In

Re: Realtime Search

2009-01-12 Thread Jason Rutherglen
Grant, Do you have a proposal in mind? It would help to suggest something like some classes and methods to help understand an alternative to what is being discussed. -J On Fri, Jan 9, 2009 at 12:05 PM, Grant Ingersoll wrote: > I realize we aren't adding read functionality to the Writer, but it

Re: Realtime Search

2009-01-29 Thread Jason Rutherglen
he same as reopen/clone (because it will call reopen on presumably the latest IR). On Sat, Jan 24, 2009 at 4:29 AM, Michael McCandless < [email protected]> wrote: > Jason Rutherglen wrote: > > > "But I think for realtime we don't want to be using IW's

Re: Realtime Search

2009-01-30 Thread Jason Rutherglen
the behavior of IW.updateDocument? LUCENE-1314 is in and we've agreed IR.reopen causes an IW.flush so I'll continue the LUCENE-1516 patch. On Fri, Jan 30, 2009 at 6:04 AM, Michael McCandless < [email protected]> wrote: > Jason Rutherglen wrote: > > > > We'd

BitVector.get bounds checking

2009-02-03 Thread Jason Rutherglen
A simple way to make BitVector faster would be to turn the get method bounds checking into an assertion. This is similar to OpenBitSet.fastGet.
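
As a rough sketch of the idea, the class below contrasts a checked get with an assertion-only fastGet. SimpleBitVector is a hypothetical stand-in written for this example; it is not Lucene's BitVector or OpenBitSet source. The point is only that the range test in fastGet runs when the JVM is started with -ea and costs nothing otherwise.

```java
// Hypothetical stand-in for a bit vector; not Lucene's BitVector source.
final class SimpleBitVector {
    private final byte[] bits;
    private final int size;

    SimpleBitVector(int size) {
        this.size = size;
        this.bits = new byte[(size >> 3) + 1];
    }

    void set(int bit) {
        if (bit < 0 || bit >= size) {
            throw new IndexOutOfBoundsException("bit=" + bit + " size=" + size);
        }
        bits[bit >> 3] |= (byte) (1 << (bit & 7));
    }

    // Checked variant: an explicit range test on every call.
    boolean get(int bit) {
        if (bit < 0 || bit >= size) {
            throw new IndexOutOfBoundsException("bit=" + bit + " size=" + size);
        }
        return (bits[bit >> 3] & (1 << (bit & 7))) != 0;
    }

    // Assertion variant: the range test only runs when assertions are
    // enabled (-ea); callers are trusted to pass a valid index otherwise.
    boolean fastGet(int bit) {
        assert bit >= 0 && bit < size : "bit=" + bit + " size=" + size;
        return (bits[bit >> 3] & (1 << (bit & 7))) != 0;
    }
}
```

The trade-off is that with assertions disabled an out-of-range index is no longer reported by the bit vector itself, so this only suits callers that already guarantee valid indexes.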

Re: Porting benchmark suite

2009-02-09 Thread Jason Rutherglen
I'm planning to work on incorporating Mike's Python scripts into the Java benchmark code. I'd like to keep track of overall suggestions for improvements to contrib/benchmark. Perhaps I should open an issue so people can post suggestions? This way I can look at them and code them up (as I'll forget

Re: Porting benchmark suite

2009-02-10 Thread Jason Rutherglen
Mon, Feb 9, 2009 at 11:02 AM, Grant Ingersoll wrote: > > On Feb 9, 2009, at 12:24 PM, Jason Rutherglen wrote: > > I'm planning to work on incorporating Mike's Python scripts into the >> Java benchmark code. I'd like to keep track of overall suggestions >> for

Move deletes to a top level boolean AND NOT query

2009-02-19 Thread Jason Rutherglen
Is anyone working on this? I can't find a patch. I'll start one unless someone has something to post.

Re: Move deletes to a top level boolean AND NOT query

2009-02-19 Thread Jason Rutherglen
them down to each > TermQuery (LUCENE-1536) via random-access API (if the filter can support > it); that can give much better performance. > > However I need to redo these tests once LUCENE-1345 is in. > > Mike > > Jason Rutherglen wrote: > > Is anyone working on

Re: IndexWriter.rollback() logic

2009-02-23 Thread Jason Rutherglen
Howdy An, Commit means the changes are committed, there's no rollback at that point. Also in the future please post your questions to [email protected] Take care, Jason On Mon, Feb 23, 2009 at 3:52 PM, An Hong wrote: > A question about IndexWriter.rollback() logic. Its javadoc says

Re: GSoC 09 project ideas...

2009-03-18 Thread Jason Rutherglen
Hi Z.S., I'll update LUCENE-1313 after LUCENE-1516 is committed. I can post the basic new patch I have for LUCENE-1313 (heavily simplified compared to the previous patches), however it will assume LUCENE-1516. The other area that will need to be addressed is standard benchmarking for different r

InstantiatedIndex

2009-03-27 Thread Jason Rutherglen
Hi Karl, I'm thinking InstantiatedIndex needs to implement either clone of all the index data or needs to be able to accept a non-optimized reader, or both. I forget what the obstacles are to implementing the non-optimized reader option? Do you think there are advantages or disadvantages when co

Future projects

2009-04-01 Thread Jason Rutherglen
Now that LUCENE-1516 is close to being committed perhaps we can figure out the priority of other issues: 1. Searchable IndexWriter RAM buffer 2. Finish up benchmarking and perhaps implement passing filters to the SegmentReader level 3. Deleting by doc id using IndexWriter With 1) I'm interested

Re: Future projects

2009-04-02 Thread Jason Rutherglen
4) An additional possibly contrib module is caching the results of TermQueries. In looking at the TermQuery code would we need to cache the entire docs and freqs as arrays which would be a memory hog? On Wed, Apr 1, 2009 at 4:05 PM, Jason Rutherglen wrote: > Now that LUCENE-1516 is close

Re: Future projects

2009-04-02 Thread Jason Rutherglen
On Wed, Apr 1, 2009 at 7:05 PM, Jason Rutherglen > wrote: > > Now that LUCENE-1516 is close to being committed perhaps we can > > figure out the priority of other issues: > > > > 1. Searchable IndexWriter RAM buffer > > I think first priority is to get a good

Re: Future projects

2009-04-02 Thread Jason Rutherglen
ToButNotNext, or simply implementing a commitable version of LUCENE-1536? On Thu, Apr 2, 2009 at 1:40 AM, Michael McCandless < [email protected]> wrote: > On Wed, Apr 1, 2009 at 7:05 PM, Jason Rutherglen > wrote: > > Now that LUCENE-1516 is close to being committed perhaps we can &g

Re: Future projects

2009-04-02 Thread Jason Rutherglen
o Lucene? On Thu, Apr 2, 2009 at 12:59 PM, Michael McCandless < [email protected]> wrote: > On Thu, Apr 2, 2009 at 2:07 PM, Jason Rutherglen > wrote: > > I'm interested in merging cached bitsets and field caches. While this > may > > be something related to

Re: Future projects

2009-04-02 Thread Jason Rutherglen
segment merging such that the user can decide what information they want to record about the merged SRs (I'm pretty sure there isn't a way to do this with MergePolicy?) On Thu, Apr 2, 2009 at 2:41 PM, Michael McCandless < [email protected]> wrote: > On Thu, Apr 2, 2009 at

IndexWriter.addIndexesNoOptimize(IndexReader[] readers)

2009-04-02 Thread Jason Rutherglen
This seems like something that's tenable? It would be useful for merging ram indexes to disk where if a directory is passed, the directory may be changed.

Re: Future projects

2009-04-03 Thread Jason Rutherglen
chael McCandless < [email protected]> wrote: > On Thu, Apr 2, 2009 at 5:56 PM, Jason Rutherglen > wrote: > >> I think I need to understand better why delete by Query isn't > > viable in your situation... > > > > The delete by query is a separa

Re: Future projects

2009-04-03 Thread Jason Rutherglen
om> wrote: > On Wed, Apr 1, 2009 at 7:05 PM, Jason Rutherglen > wrote: > > Now that LUCENE-1516 is close to being committed perhaps we can > > figure out the priority of other issues: > > > > 1. Searchable IndexWriter RAM buffer > > I think first priority

Re: Future projects

2009-04-03 Thread Jason Rutherglen
I looked at the IndexWriter code in regards to creating a realtime reader, with the many flexible indexing classes I'm unsure of how one would get a frozenish IndexInput of the byte slices, given the byte slices are attached to different threads? On Fri, Apr 3, 2009 at 2:42 PM, Jason Ruthe

Re: Future projects

2009-04-06 Thread Jason Rutherglen
, RAMBufferTermEnum, RAMBufferTermDocs, RAMBufferTermPositions would be implemented that can read from the ram buffer. I don't think the current field cache API would like growing arrays? Something hopefully LUCENE-831 will support. On Sat, Apr 4, 2009 at 4:46 AM, Michael McCandless < luc.

Re: Future projects

2009-04-07 Thread Jason Rutherglen
merge sort (across the N thread states) I'm confused about why a merge sort is required? On Tue, Apr 7, 2009 at 1:45 AM, Michael McCandless < [email protected]> wrote: > On Mon, Apr 6, 2009 at 6:43 PM, Jason Rutherglen > wrote: > >> The realtime reader woul

Re: Lucene 2.9 status (to port to Lucene.Net)

2009-04-16 Thread Jason Rutherglen
LUCENE-1313 relies on LUCENE-1516 which is in trunk. If you have other questions George, feel free to ask. On Thu, Apr 16, 2009 at 8:04 AM, George Aroush wrote: > Thanks Mike. > > A quick follow up question. What's the status of > http://issues.apache.org/jira/browse/LUCENE-1313? Can this wor

Re: vacation

2009-04-16 Thread Jason Rutherglen
Enjoy, I just got back from mine, tropical Minneapolis. On Thu, Apr 16, 2009 at 7:45 AM, Michael McCandless < [email protected]> wrote: > Just as a heads up, since we have so many neat Lucene improvements "in > flight": tomorrow I leave for a week long vacation, in a nice warm > place tha

Re: Future projects

2009-04-22 Thread Jason Rutherglen
Hey Michael, You're in San Jose? Feel free to come by one of these days on our pizza days. Also, can you post what you have of LUCENE-1231? I got a lot more familiar with IndexWriter internals with LUCENE-1516 and could take a good whack at getting LUCENE-1231 integrated. Cheers! Jason On Sun,

Re: [Lucene-java Wiki] Update of "LuceneAtApacheConUs2009" by MichaelBusch

2009-04-28 Thread Jason Rutherglen
Michael, I updated the wiki under "New Features in Lucene". I can give a presentation on realtime search in Lucene. -J On Mon, Apr 27, 2009 at 10:11 PM, Michael Busch wrote: > I'm happy to give more than one talk, on the other hand I don't want to > prevent others from presenting. So if anyon

Re: dbsight

2009-04-30 Thread Jason Rutherglen
Hi Mike, You may want to ask your question on [email protected] -J On Thu, Apr 30, 2009 at 11:59 AM, Michael Masters wrote: > Hello Everyone, > > I just started to use lucene recently. Great project BTW. I was > wondering if anyone has suggested making an open source version of > dbsi

Re: Getting an IndexReader from a committed IndexWriter

2009-05-14 Thread Jason Rutherglen
Hi Shay, I think IndexWriter.getReader from LUCENE-1516 in trunk is what you're talking about? It pools readers internally so there's no need to call IndexReader.reopen, one simply calls IW.getReader to get new readers containing recent updates. -J BTW I replied to the message on java-u...@lucen

Re: Getting an IndexReader from a committed IndexWriter

2009-05-14 Thread Jason Rutherglen
ndexReader reflecting the latest committed state (or closed > state) of the IndexWriter. The problem with the getReader method is the > fact > that it is tied to the IndexWriter instance. > > > Jason Rutherglen-2 wrote: > > > > Hi Shay, > > > > I think IndexWr

Re: Lucene's default settings & back compatibility

2009-05-18 Thread Jason Rutherglen
Yeah makes sense, getting in depth with Lucene, and then seeing real world usage, most users still do use the defaults. I think I will try to help with this by writing some wiki pages on new features. Probably this OldSettings/NewSettings model is a good start for a wiki page? Our current wiki FAQ i

Re: Lucene's default settings & back compatibility

2009-05-21 Thread Jason Rutherglen
I'm having trouble visualizing the various methods people are talking about. It seems like we could open an issue and post patches with code illustrating what each person is talking about? On Thu, May 21, 2009 at 10:02 AM, Michael McCandless < [email protected]> wrote: > Actually, we sta

NRT getReader turnaround on large segments

2009-05-28 Thread Jason Rutherglen
An interesting discussion came up. How do we handle IW.getReader turnaround time on large new segments?

Re: NRT getReader turnaround on large segments

2009-05-28 Thread Jason Rutherglen
g to SegmentInfos. > > So, while such warming is happening, if getReader() is called, the > returned reader will still read the old segments. > > Mike > > On Thu, May 28, 2009 at 3:06 PM, Jason Rutherglen > wrote: > > An interesting discussion came up. How do we handle

Re: NRT getReader turnaround on large segments

2009-05-28 Thread Jason Rutherglen
> Right. If you play w/ this please report back on how it goes! > > Mike > > On Thu, May 28, 2009 at 3:38 PM, Jason Rutherglen > wrote: > > And warming a segment in mergeMiddle doesn't block the addition of new > > segments. > > > > On Thu, Ma

IndexReaderFactory for IndexWriter LUCENE-1516

2009-05-28 Thread Jason Rutherglen
Some folks may want to have IW.getReader (LUCENE-1516) use custom readers underneath, we can have IW support an IndexReaderFactory?

More efficient loading of terms dictionary

2009-05-28 Thread Jason Rutherglen
This is for Marvin who previously mentioned loading the term dictionary directly from the filesystem (rather than load every Nth term into Java heap) which could improve latency of opening new readers. Were you able to take this idea any further?

Re: ReadOnly IndexReaders

2009-05-29 Thread Jason Rutherglen
Yeah! On Fri, May 29, 2009 at 2:21 PM, Grant Ingersoll wrote: > Does it make sense to add isReadOnly() to IndexReader such that one can > easily introspect whether a Reader is read only? > > -Grant > > - > To unsubscribe, e-mail:

Re: ReadOnly IndexReaders

2009-06-01 Thread Jason Rutherglen
Currently there's ReadOnlyMultiSegmentReader and ReadOnlySegmentReader; calling instanceof on an IndexReader against them is the current hacked, package-protected way of finding out if a reader is read only. I wrote code before which checked by calling instanceof on both, which seemed a bit strange. On Sat

Re: EnwikiDocMaker

2009-06-03 Thread Jason Rutherglen
I saw a weird error related to the xerces, I think it was a class version problem. I'll try it again though to make sure. On Wed, Jun 3, 2009 at 5:58 AM, Shai Erera wrote: > Then perhaps as part of 1595 I can change it to use Java's XML parser, and > test the Enwiki file. If all goes well, we m

Re: [jira] Commented: (LUCENE-1673) Move TrieRange to core

2009-06-09 Thread Jason Rutherglen
> I wonder if we could handle this by adding a setting in FieldInfo? Do we have an issue open that allows any metadata on a per field basis? This seems like something flexible indexing will require? On Tue, Jun 9, 2009 at 10:15 AM, Michael McCandless (JIRA) wrote: > >[ > https://issues.apach

Re: Some thoughts around the use of reader.isDeleted and hasDeletions

2009-06-09 Thread Jason Rutherglen
> I searched the code and was surprised to see isDeleted and hasDeletions are not called from any search code. It was weeded out over time, MatchAllDocsQuery for example used to call it. I think it was to offer users (who are using isDeleted) a way to access deleted docs without a performance hit.

Payloads and TrieRangeQuery

2009-06-09 Thread Jason Rutherglen
At the SF Lucene User's group, Michael Busch mentioned using payloads with TrieRangeQueries. Is this something that's being worked on? I'm interested in what sort of performance benefits there would be to this method?

Re: Payloads and TrieRangeQuery

2009-06-10 Thread Jason Rutherglen
less.com> wrote: > Use them how? (Sounds interesting...). > > Mike > > On Tue, Jun 9, 2009 at 10:32 PM, Jason > Rutherglen wrote: > > At the SF Lucene User's group, Michael Busch mentioned using > > payloads with TrieRangeQueries. Is this something that'

Re: Lucene memory usage

2009-06-10 Thread Jason Rutherglen
d >terms, and is slurped into the arrays on init. > > This is a sizable RAM savings over what's done now because you save 2 > objects, 3 pointers, 2 longs, 2 ints (I think), per indexed term. > > Mike > > On Wed, Jun 10, 2009 at 2:02 PM, Jason > Rutherglen wrote: &

Re: Lucene memory usage

2009-06-10 Thread Jason Rutherglen
ikemccandless.com> wrote: > On Wed, Jun 10, 2009 at 4:13 PM, Jason > Rutherglen wrote: > > Great! If I understand correctly it looks like RAM savings? Will > > there be an improvement in lookup speed? (We're using binary > > search here?). > > Yes, sizable

Re: Lucene memory usage

2009-06-10 Thread Jason Rutherglen
ael McCandless < [email protected]> wrote: > On Wed, Jun 10, 2009 at 7:23 PM, Jason > Rutherglen wrote: > > Cool! Sounds like with LUCENE-1458 we can experiment with some > > of these things. Does CSF become just another codec? > > I believe LUCENE-1458 currently only mak

MMap certain files, leave the rest to the regular dir

2009-06-10 Thread Jason Rutherglen
On the topic of MMaping files. Would a Directory implementation that transparently MMaps only certain files be interesting? It could MMap files that are accessed frequently (term dict, postings), as opposed to files such as docstores that are accessed less frequently. This could be built using LUCE
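
One way to picture the selection step is a small helper that decides, per file name, whether a file should be memory-mapped. The sketch below is illustrative only: MMapFileSelector is a hypothetical name, it does not implement or extend Lucene's Directory, and the extensions used in the usage note (tis, frq, prx for term dictionary and postings, fdt for stored fields) are assumptions about which files a caller would pick.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: chooses mmap vs. regular I/O by file extension.
// A real Directory wrapper would consult this when opening an input.
final class MMapFileSelector {
    private final Set<String> mmapExtensions;

    MMapFileSelector(String... extensions) {
        this.mmapExtensions = new HashSet<>(Arrays.asList(extensions));
    }

    // Returns true when the file's extension is in the mmap set.
    // Files without an extension fall back to regular I/O.
    boolean shouldMMap(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return false;
        }
        return mmapExtensions.contains(fileName.substring(dot + 1));
    }
}
```

For example, new MMapFileSelector("tis", "frq", "prx") would route "_0.tis" to the mmap path and "_0.fdt" to the regular directory.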

Re: Lucene memory usage

2009-06-11 Thread Jason Rutherglen
:43 AM, Michael McCandless < [email protected]> wrote: > On Wed, Jun 10, 2009 at 9:24 PM, Jason > Rutherglen wrote: > > I read over the LUCENE-1458 comments again. Interesting. I think > > the most compelling argument is that the various files we're > >

Re: Lucene memory usage

2009-06-11 Thread Jason Rutherglen
Maybe we can put together our requested IO operations and submit them for inclusion in NIO Java 7? http://openjdk.java.net/projects/nio/ On Thu, Jun 11, 2009 at 12:21 PM, Jason Rutherglen < [email protected]> wrote: > Makes sense. > > Currently MMapDirectory doesn'

Re: Lucene memory usage

2009-06-11 Thread Jason Rutherglen
umably doesn't run bytes through the IO cache? Granted it's slower on most platforms, but could this be fixed in future Java releases? On Thu, Jun 11, 2009 at 12:50 PM, Michael McCandless < [email protected]> wrote: > On Thu, Jun 11, 2009 at 3:21 PM, Jason > Ruthe

Re: madvise(ptr, len, MADV_SEQUENTIAL)

2009-06-15 Thread Jason Rutherglen
? Would there need to be a hint in the FileChannel.map method? -J On Mon, Jun 15, 2009 at 12:36 AM, Alan Bateman wrote: > Jason Rutherglen wrote: > >> Is there going to be a way to do this in the new Java IO APIs? >> > Good question, as it has come up a few times and is needed

Re: madvise(ptr, len, MADV_SEQUENTIAL)

2009-06-16 Thread Jason Rutherglen
> >> Subject: Re: madvise(ptr, len, MADV_SEQUENTIAL) > >> > >> Lucene could really make use of this method. When a segment merge > >> takes place, we can read & write many GB of data, which without > >> madvise on many OSs would effectively flush the

Execute a testcase method via ant?

2009-06-16 Thread Jason Rutherglen
Doesn't look like this is possible today, though could be handy?

Re: madvise(ptr, len, MADV_SEQUENTIAL)

2009-06-16 Thread Jason Rutherglen
penjdk.java.net > >> Subject: Re: madvise(ptr, len, MADV_SEQUENTIAL) > >> > >> Lucene could really make use of this method. When a segment merge > >> takes place, we can read & write many GB of data, which without > >> madvise on many OSs would effec

Re: madvise(ptr, len, MADV_SEQUENTIAL)

2009-06-16 Thread Jason Rutherglen
Sorry, not portable, but POSIX_FADV_WILLNEED is which can be used with posix_fadvise. On Tue, Jun 16, 2009 at 8:12 PM, Jason Rutherglen < [email protected]> wrote: > Perhaps we'd also like to request readahead be included in JDK7? > > http://linux.die.net/man/2/reada

Re: madvise(ptr, len, MADV_SEQUENTIAL)

2009-06-16 Thread Jason Rutherglen
read & write many GB of data, which without > madvise on many OSs would effectively flush the IO cache (thus hurting > our search performance). > > Mike > > On Mon, Jun 15, 2009 at 6:01 PM, Jason > Rutherglen wrote: > > Thanks Alan. > > > > I cross posted th

Re: Lucene 2.9 Again

2009-06-18 Thread Jason Rutherglen
> I pretty much find any excuse to go and write stuff in Python There's Scala... On Thu, Jun 18, 2009 at 2:37 AM, Michael McCandless < [email protected]> wrote: > On Wed, Jun 17, 2009 at 4:13 PM, Mark Miller wrote: > > Michael Busch wrote: > >> > >> Everyone who is unhappy with the relea

Re: madvise(ptr, len, MADV_SEQUENTIAL)

2009-06-18 Thread Jason Rutherglen
platforms: http://www.gnu.org/software/hello/manual/gnulib/madvise.html On Wed, Jun 17, 2009 at 2:19 AM, Alan Bateman wrote: > Jason Rutherglen wrote: > >> Alan, >> >> Do you think something like FileDescriptor.setAdvise (mirroring >> posix_fadvise) makes sense?

Re: caching an indexreader

2009-06-19 Thread Jason Rutherglen
On the topic of RAM consumption, it seems like field caches could return estimated RAM usage (given they're arrays of standard Java types)? There are methods of calculating per platform (I believe relatively accurately). On Fri, Jun 19, 2009 at 12:11 PM, Michael McCandless < [email protected]

Parallelize tests

2009-06-20 Thread Jason Rutherglen
I was looking at how to parallelize the tests, seems like this ANT command would work, is there an open issue to do this? http://ant.apache.org/manual/CoreTasks/parallel.html

Improving TimeLimitedCollector

2009-06-23 Thread Jason Rutherglen
As we're revamping collectors, weights, and scorers, perhaps we can push time limiting into the individual subscorers? Currently on a boolean query, we're timing out the query at the top level which doesn't work well if the subqueries exceed the time limit.

Re: Execute a testcase method via ant?

2009-06-23 Thread Jason Rutherglen
> Is that what you were looking for? > > On Wed, Jun 17, 2009 at 2:20 AM, Jason Rutherglen < > [email protected]> wrote: > >> Doesn't look like this is possible today, though could be handy? >> > >

Re: Improving TimeLimitedCollector

2009-06-24 Thread Jason Rutherglen
need to build a TimeOutQueryWrapper that will wrap a Query, and implement > > the timeout logic, but that's get complicated. > > > > I think the Collector approach makes the most sense to me, since it's the > > only object I fully control in the search proce

Re: Improving TimeLimitedCollector

2009-06-24 Thread Jason Rutherglen
This would be good however how would we obtain the thread? I believe this would require using a ThreadLocalish type of system which could be quite slow (to obtain the thread and lookup in the hashmap). One implementation I looked at before was to add this check in IndexReader.isDeleted (by overrid

Re: [jira] Commented: (LUCENE-1709) Parallelize Tests

2009-06-25 Thread Jason Rutherglen
Mark Miller wrote: > JUnit also supports parallelizing tasks, but its only in the very latest > release. I'd check out that. There are generally more issues than just > firing off multiple tests at once. > > -- > - Mark > > http://www.lucidimagination.com >

Re: addIndexesNoOptimize

2009-07-06 Thread Jason Rutherglen
> MergePolicy expects to receive SegmentInfo instances I ran into this implementing LUCENE-1589. On Mon, Jul 6, 2009 at 3:18 AM, Michael McCandless < [email protected]> wrote: > On Mon, Jul 6, 2009 at 2:18 AM, John Wang wrote: > > > Currently, addIndexesNoOptimize(Directory[] dir) is

Re: Execute a testcase method via ant?

2009-07-06 Thread Jason Rutherglen
> > Mike > > On Tue, Jun 23, 2009 at 7:42 PM, Jason > Rutherglen wrote: > > More like ant test -Dtestcase=TestSort -Dtestmethod=testMultiSort > > > > or > > > > ant test -Dtestcase=TestSort.testMultiSort > > > > I Googled a lot for "an

addIndexes* blocks addDocuments calls

2009-07-14 Thread Jason Rutherglen
For replicating and general system performance, it would be good to offer a way to addIndexes* without blocking the addition of more docs. This seems doable somehow?

Re: addIndexes* blocks addDocuments calls

2009-07-14 Thread Jason Rutherglen
uld > actually rollback all changes, ie, remove what was done by addIndexes > but retroactively preserve any segments created by other methods > (flushing, other addIndexes calls, etc.). > > Mike > > On Tue, Jul 14, 2009 at 3:08 PM, Jason > Rutherglen wrote: > > For re

Throttling merges

2009-07-18 Thread Jason Rutherglen
It may be useful to allow users to throttle merges. A callback that IW passes into SegmentMerger would suffice where individual SM methods make use of the callback. I suppose this could slow down overall merging by adding a potentially useless method call. However if merging typically consumes IO r
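
A callback of the kind suggested here might look like the sketch below. Everything in it is an assumption made for illustration: the MergeThrottle interface, the RateLimitingThrottle class, and the idea of reporting bytes written are hypothetical and do not correspond to an existing IndexWriter or SegmentMerger API. The throttle compares how long the bytes written so far should have taken at the target rate with how long they actually took, and sleeps for the difference.

```java
// Hypothetical callback a merger could invoke after each chunk it writes.
interface MergeThrottle {
    void bytesWritten(long bytes) throws InterruptedException;
}

// Hypothetical rate limiter: keeps the average write rate at or below
// bytesPerSecond by sleeping inside the callback.
final class RateLimitingThrottle implements MergeThrottle {
    private final long bytesPerSecond;
    private final long startNanos = System.nanoTime();
    private long totalBytes = 0;

    RateLimitingThrottle(long bytesPerSecond) {
        this.bytesPerSecond = bytesPerSecond;
    }

    // Pure helper: milliseconds to pause so that totalBytes would have
    // taken at least totalBytes / bytesPerSecond seconds overall.
    static long pauseMillis(long totalBytes, long bytesPerSecond, long elapsedMillis) {
        long expectedMillis = totalBytes * 1000L / bytesPerSecond;
        return Math.max(0L, expectedMillis - elapsedMillis);
    }

    @Override
    public synchronized void bytesWritten(long bytes) throws InterruptedException {
        totalBytes += bytes;
        long elapsedMillis = (System.nanoTime() - startNanos) / 1_000_000L;
        long pause = pauseMillis(totalBytes, bytesPerSecond, elapsedMillis);
        if (pause > 0) {
            Thread.sleep(pause); // slow the merge down to the target rate
        }
    }
}
```

When the merge is already running below the target rate, the callback reduces to a little arithmetic and returns without sleeping.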

Re: Throttling merges

2009-07-20 Thread Jason Rutherglen
> (reader, merging, writer, etc.), and then somehow add > throttling in there. > > Mike > > On Sat, Jul 18, 2009 at 10:37 AM, Jason > Rutherglen wrote: >> It may be useful to allow users to throttle merges. A callback >> that IW passes into SegmentMerger would suff

Re: addIndexes* blocks addDocuments calls

2009-07-21 Thread Jason Rutherglen
k all changes, ie, remove what was done by addIndexes > but retroactively preserve any segments created by other methods > (flushing, other addIndexes calls, etc.). > > Mike > > On Tue, Jul 14, 2009 at 3:08 PM, Jason > Rutherglen wrote: >> For

getTermInfosIndexDivisor deprecated?

2009-07-22 Thread Jason Rutherglen
It's a get method but the UnsupportedOperationException says "Please pass termInfosIndexDivisor up-front when opening IndexReader"? I did pass it in. Writing a test case for Solr that checks it.

ShingleFilter + StopWords?

2009-07-27 Thread Jason Rutherglen
I'd like to enable ShingleFilter to only create shingles for a set of (stop) words (rather than for all N tokens).

Re: ShingleFilter + StopWords?

2009-07-27 Thread Jason Rutherglen
ems fairly common? On Mon, Jul 27, 2009 at 1:22 PM, Steven A Rowe wrote: > Hi Jason, > > On 7/27/2009 at 3:15 PM, Jason Rutherglen wrote: >> I'd like to enable ShingleFilter to only create shingles for a set of >> (stop) words (rather than for all N tokens). > > For
