Re: SegmentReader with custom setting of deletedDocs, single reusable FieldsReader

2008-06-25 Thread Yonik Seeley
On Wed, Jun 25, 2008 at 6:29 AM, Michael McCandless <[EMAIL PROTECTED]> wrote: > We've also discussed at one point creating an IndexReader impl that searches > the RAM buffer that DocumentsWriter writes to when adding documents. I > think it's easier than it sounds, on first glance, because Docume

Re: SegmentReader with custom setting of deletedDocs, single reusable FieldsReader

2008-06-25 Thread Yonik Seeley
, then you can easily search for a document and retrieve its stored fields in order to re-index it with changes (and still maintain decent performance). -Yonik > On Wed, Jun 25, 2008 at 8:41 AM, Yonik Seeley <[EMAIL PROTECTED]> wrote: >> >> On Wed, Jun 25, 2008 at 6:29 AM,

Re: per-field similarity

2008-06-25 Thread Yonik Seeley
On Wed, Jun 25, 2008 at 2:19 PM, Chris Hostetter <[EMAIL PROTECTED]> wrote: > : Might also consider passing in more optional context when retrieving > : the similarity for a field (such as a Query, if searching). > : Something like Similarity.getSimilarity(String field, Query q). > > i assume you m

Re: per-field similarity

2008-06-25 Thread Yonik Seeley
On Wed, Jun 25, 2008 at 5:06 PM, Chris Hostetter <[EMAIL PROTECTED]> wrote: > Hmmm... that seems like it would be confusing: particularly since in the > IndexWriter case the "Query" param would never make sense. changing > IndexWriter.getSimilarity to take a "String fieldName" and changing > Searc

Re: SegmentReader with custom setting of deletedDocs, single reusable FieldsReader

2008-06-27 Thread Yonik Seeley
On Fri, Jun 27, 2008 at 2:43 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: > We could instead keep these SegmentReaders open and reuse them for > applying deletes. We discussed caching SegmentReaders in the original buffered deletes issue too http://issues.apache.org/jira/browse/LUCENE-565 >

Re: SegmentReader with custom setting of deletedDocs, single reusable FieldsReader

2008-06-29 Thread Yonik Seeley
On Sun, Jun 29, 2008 at 9:42 AM, Jason Rutherglen <[EMAIL PROTECTED]> wrote: > IndexReader.document as it is is really a lame duck. The > IndexReader.document call being synchronized at the top level drags down the > performance of systems that store data in Lucene. A single file descriptor > for

Re: readVInt, what is it for?

2008-07-02 Thread Yonik Seeley
The frequency is tracked at index time. It's simply a read at query time. See TermDocs. If you really want to understand more about the code internals of Lucene, I'd suggest stepping through more example queries with a debugger. -Yonik On Wed, Jul 2, 2008 at 8:49 PM, blazingwolf7 <[EMAIL PROTEC

Re: readVInt, what is it for?

2008-07-02 Thread Yonik Seeley
cked during index time? I index my > file earlier. Later I will open the index and perform a search. Shouldn't > the frequency of each term in each document found be calculated during > the searching process? > > > Yonik Seeley wrote: >> >> The frequency is tracked

Re: Class in Lucene that Perform Search

2008-07-02 Thread Yonik Seeley
On Wed, Jul 2, 2008 at 10:30 PM, blazingwolf7 <[EMAIL PROTECTED]> wrote: > What I am missing is that I fail to locate the class that performs the actual > comparison to determine if a query matches any term in a document. You need to understand the inverted index format. Documents that match a term

Re: Class in Lucene that Perform Search

2008-07-03 Thread Yonik Seeley
to add an extra value into it so that I can retrieve the > information during the searching process. Thank Look at payloads first. What problem are you trying to solve? Someone may have an easier approach for you if payloads don't work. -Yonik > > Yonik Seeley wrote: >>

Re: [jira] Created: (LUCENE-1328) FileNotFoundException in

2008-07-07 Thread Yonik Seeley
On Mon, Jul 7, 2008 at 5:03 PM, Yajun <[EMAIL PROTECTED]> wrote: > > I'm adding tons of logging, hopefully it will give me some information. Try capturing the directory contents before you take a snapshot... something like ls -l index > index/ls.txt Then if a missing file turns up, you can compar

test

2008-07-09 Thread Yonik Seeley

Re: test

2008-07-10 Thread Yonik Seeley
ame yesterday when posting to java-users. >> But looking at nabble my posts are there. >> >>karl >> >> 10 jul 2008 kl. 03.23 skrev Yonik Seeley: >> >>> sorry for the noise... this is just a test to java-dev. >>> I'm unable to post t

Re: Commit while addIndexes is in progress

2008-07-11 Thread Yonik Seeley
On Fri, Jul 11, 2008 at 2:38 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: > > Hmm, I think we should. > > What should it "mean" when you call commit(), while another thread is in the > middle of addIndexes? Seems like either all or none of the segments in addIndexes should be committed. > We

Re: Commit while addIndexes is in progress

2008-07-11 Thread Yonik Seeley
On Fri, Jul 11, 2008 at 3:27 PM, Ning Li <[EMAIL PROTECTED]> wrote: > We should also disallow concurrent addIndexes, right? Hmmm, the current implementation looks like it currently won't work correctly (docWriter.resumeAllThreads() being called while another thread is calling addIndexes, etc

Re: IndexReader.acquire()/release() ?

2008-07-18 Thread Yonik Seeley
On Fri, Jul 18, 2008 at 4:23 PM, Mark Miller <[EMAIL PROTECTED]> wrote: > but there is still something I don't like about that API... Perhaps that it's just a piece of the puzzle? It doesn't seem sufficient by itself to allow multiple threads to easily share a reader. But it does seem like it

Re: IndexReader.acquire()/release() ?

2008-07-18 Thread Yonik Seeley
Although I do wonder if incRef() and decRef() aren't more suitable names. Just make those methods public, with the caveat that one should not call them on a closed reader. They are expert level APIs after all. On Fri, Jul 18, 2008 at 4:45 PM, Yonik Seeley <[EMAIL PROTECTED]> wrot
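For readers following the incRef()/decRef() naming discussion, here is a minimal sketch of the reference-counting pattern being described. RefCountedSketch is a hypothetical stand-in written for this note, not Lucene's IndexReader, and it assumes (as the caveat in the message says) that callers never call incRef() after the count has reached zero.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a shared reader; not Lucene's IndexReader.
final class RefCountedSketch {
    // The creator holds the first reference.
    private final AtomicInteger refCount = new AtomicInteger(1);
    private volatile boolean closed = false;

    // Take an extra reference. Caller must know the reader is still open.
    void incRef() {
        refCount.incrementAndGet();
    }

    // Drop a reference; the last one out closes the reader.
    void decRef() {
        if (refCount.decrementAndGet() == 0) {
            closed = true; // a real reader would release its files here
        }
    }

    int getRefCount() {
        return refCount.get();
    }

    boolean isClosed() {
        return closed;
    }

    public static void main(String[] args) {
        RefCountedSketch reader = new RefCountedSketch();
        reader.incRef();                       // a search thread takes a reference
        reader.decRef();                       // the owner drops its reference
        System.out.println(reader.isClosed()); // still held by the search thread
        reader.decRef();                       // the search thread finishes
        System.out.println(reader.isClosed());
    }
}
```

The sketch leaves the "do not call on a closed reader" rule to the caller, which is why such methods would be expert level.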

Re: performance optimizations

2008-07-23 Thread Yonik Seeley
Making well reasoned arguments about specific patches would be helpful. Also, the complexity vs speed trade-offs are different for a core library like Lucene where performance is one of the primary features. -Yonik On Wed, Jul 23, 2008 at 4:01 PM, robert engels <[EMAIL PROTECTED]> wrote: > I hope t

Re: Sort suggestion

2008-07-29 Thread Yonik Seeley
The problem isn't sorting per se... the problem is quickly retrieving the sort value for a document. For that, we currently have the FieldCache; that's what takes up the memory. There are more memory efficient ways, but they just haven't been implemented yet. -Yonik On Tue, Jul 29, 2008 at 3:

Re: [jira] Updated: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2008-07-30 Thread Yonik Seeley
disclaimer: this is just for fun; differences should be in the noise in any complex system, and I'm not suggesting any code changes. Actually, with 32 bit registers, x<0 should be faster than x==-1 by one cycle. If it doesn't test faster, then it's because of some optimizations that could be p

Re: [jira] Updated: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2008-07-30 Thread Yonik Seeley
e looked in the past and couldn't find anything. If anyone knows of anything, it would be very cool to have though. -Yonik > Mike > > Yonik Seeley wrote: > >> disclaimer: this is just for fun; differences should be in the >> noise in any complex system, and I

Re: [jira] Updated: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2008-07-30 Thread Yonik Seeley
On Wed, Jul 30, 2008 at 3:17 PM, Stephen Green <[EMAIL PROTECTED]> wrote: > Might the description here: > > http://weblogs.java.net/blog/kohsuke/archive/2008/03/deep_dive_into.html > > help? Sweet! Thanks! -Yonik

Re: [jira] Updated: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2008-07-30 Thread Yonik Seeley
Stephen pointed out, I can now see that the difference does survive. -Yonik > Yonik Seeley wrote: > >> disclaimer: this is just for fun; differences should be in the >> noise in any complex system, and I'm not suggesting any code changes. >> >> Actually, with

Re: release 2.4 soon?

2008-08-18 Thread Yonik Seeley
On Mon, Aug 18, 2008 at 5:38 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: > There are a number of good changes on the trunk, and it's been 7 months > since we released 2.3.0, and we wanted to do more frequent releases so > what do you all think of releasing 2.4 soon? +1 -Yonik

solr2: Onward and Upward

2008-09-03 Thread Yonik Seeley
If you've considered Solr in the past, but for some reason it didn't meet your needs, we'd love to hear from you over on solr-dev. We're starting to do some forward looking architecture work on the next major version of Solr, so let us know what ideas you have and what you'd like to see! solr-dev

Re: Realtime Search for Social Networks Collaboration

2008-09-03 Thread Yonik Seeley
On Wed, Sep 3, 2008 at 3:20 PM, Jason Rutherglen <[EMAIL PROTECTED]> wrote: > I am wondering > if there are social networks (or anyone else) out there who would be > interested in collaborating with Apache on realtime search to get it > to the point it can be used in production. Good timing Jason,

Re: Moving SweetSpotSimilarity out of contrib

2008-09-03 Thread Yonik Seeley
On Wed, Sep 3, 2008 at 4:55 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: >> I suspect any attempts at "bundling" Lucene code may snowball until you've >> rebuilt Solr. > > Yeah I guess it is... though Solr includes the whole webapp too, whereas I > think there's a natural bundle that wouldn't

Re: Realtime Search for Social Networks Collaboration

2008-09-04 Thread Yonik Seeley
On Wed, Sep 3, 2008 at 6:50 PM, Jason Rutherglen <[EMAIL PROTECTED]> wrote: > I also think it's got a > lot of things now which makes integration difficult to do properly. I agree, and that's why the major bump in version number rather than minor - we recognize that some features will need some am

Re: Realtime Search for Social Networks Collaboration

2008-09-06 Thread Yonik Seeley
There's a good percent of the Solr community that is looking to add everything you are (from a functional point of view). Some of the other little things that we haven't considered (like a remote Java API) sound cool... no reason not to add that also. We're also planning on adding alternatives to

Re: Realtime Search for Social Networks Collaboration

2008-09-08 Thread Yonik Seeley
On Mon, Sep 8, 2008 at 12:33 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: > I'm also trying to make time to explore the approach of creating an > IndexReader impl. that searches IndexWriter's RAM buffer. That seems like it could possibly be the best performing approach in the long run. > I t

Re: Realtime Search for Social Networks Collaboration

2008-09-08 Thread Yonik Seeley
On Mon, Sep 8, 2008 at 3:56 PM, Ning Li <[EMAIL PROTECTED]> wrote: > On Mon, Sep 8, 2008 at 2:43 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote: >> But, how would you maintain a static view of an index...? >> >> IndexReader r1 = indexWriter.getCurrentI

Re: Realtime Search for Social Networks Collaboration

2008-09-08 Thread Yonik Seeley
On Mon, Sep 8, 2008 at 3:04 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: > Right, getCurrentIndex would return a MultiReader that includes > SegmentReader for each segment in the index, plus a "RAMReader" that > searches the RAM buffer. That RAMReader is a tiny shell class that would > basica

Re: Realtime Search for Social Networks Collaboration

2008-09-09 Thread Yonik Seeley
On Tue, Sep 9, 2008 at 5:28 AM, Michael McCandless <[EMAIL PROTECTED]> wrote: > Yonik Seeley wrote: >> What about something like term freq? Would it need to count the >> number of docs after the local maxDoc or is there a better way? > > Good question... > > I th

Re: Realtime Search for Social Networks Collaboration

2008-09-09 Thread Yonik Seeley
On Tue, Sep 9, 2008 at 11:42 AM, Ning Li <[EMAIL PROTECTED]> wrote: > On Tue, Sep 9, 2008 at 10:02 AM, Yonik Seeley <[EMAIL PROTECTED]> wrote: >> Yeah, I think the underlying RandomAccessFile might do the right >> thing, but IndexInput isn't required to see any ch

Re: Realtime Search for Social Networks Collaboration

2008-09-09 Thread Yonik Seeley
On Tue, Sep 9, 2008 at 12:41 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: > Yonik Seeley wrote: >> OR, if all writes are append-only, perhaps we don't ever need to >> invalidate the read buffer and would just need to remove the current >> logic that caches

Re: Realtime Search for Social Networks Collaboration

2008-09-09 Thread Yonik Seeley
On Tue, Sep 9, 2008 at 12:45 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: > Yonik Seeley wrote: >> No, it would essentially be a change in the semantics that all >> implementations would need to support. > > Right, which is you are allowed to open an IndexInput on a

update NOTICE/LICENSE when new jars are added

2008-09-11 Thread Yonik Seeley
I've finished my audit of the jars we include. There were missing elements for junit, stax (now removed), and stax-utils. In the future we should take care of updating LICENSE/NOTICE immediately when a new jar is added. -Yonik

Re: update NOTICE/LICENSE when new jars are added

2008-09-11 Thread Yonik Seeley
Oops, meant for this to go to solr-dev... sorry. On Thu, Sep 11, 2008 at 11:52 AM, Yonik Seeley <[EMAIL PROTECTED]> wrote: > I've finished my audit of the jars we include. There were missing > elements for junit, stax (now removed), and stax-utils. > In the future we should t

email archives

2008-09-12 Thread Yonik Seeley
On Fri, Sep 12, 2008 at 6:01 AM, Michael McCandless (JIRA) <[EMAIL PROTECTED]> wrote: > Spinoff from here: > > > http://mail-archives.apache.org/mod_mbox/lucene-java-user/200809.mbox/%3Cba72f77f0809111418l29cf215dnd45bf679832d7d42%40mail.gmail.com%3E I had to go to another email archive to fin

Re: [VOTE] Release Lucene 2.4.0

2008-10-01 Thread Yonik Seeley
On Wed, Oct 1, 2008 at 8:26 AM, Michael McCandless <[EMAIL PROTECTED]> wrote: > > OK new artifacts are here: > >http://people.apache.org/~mikemccand/staging-area/lucene2.4take2 > > Please VOTE to release those as 2.4.0. +1 AIUI, Apache only officially releases source code, so any binaries ar

Re: [VOTE] Release Lucene 2.4.0

2008-10-01 Thread Yonik Seeley
On Wed, Oct 1, 2008 at 5:25 PM, Doug Cutting <[EMAIL PROTECTED]> wrote: > Yonik Seeley wrote: >> >> AIUI, Apache only officially releases source code, so any binaries are >> artifacts or derivatives of a release, not really the release itself. > > Unless those

Re: [VOTE] Release Lucene 2.4.0

2008-10-07 Thread Yonik Seeley
On Sun, Oct 5, 2008 at 1:07 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: > Let's start a new VOTE to release these artifacts (derived from svn rev > 701827) as Lucene 2.4.0: > > http://people.apache.org/~mikemccand/staging-area/lucene2.4take4 > > Here's my +1. +1 -Yonik

Re: Adding dependency to servlet-api

2008-11-05 Thread Yonik Seeley
On Wed, Nov 5, 2008 at 5:16 AM, mark harwood <[EMAIL PROTECTED]> wrote: > Just checked Solr (forgot about that obvious precedent!) and they have it in > trunk/lib and an entry in trunk/notice.txt which reads: > > " Includes software from other Apache Software Foundation projects, > including, bu

Re: Adding dependency to servlet-api

2008-11-05 Thread Yonik Seeley
> http://www.thetaphi.de > eMail: [EMAIL PROTECTED] > >> -Original Message- >> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik >> Seeley >> Sent: Wednesday, November 05, 2008 4:08 PM >> To: java-dev@lucene.apache.org >> Subjec

Re: OpenBitSet.trimTrailingZeros() doesn't free memory

2008-11-22 Thread Yonik Seeley
On Sat, Nov 22, 2008 at 11:35 AM, Timo Nentwig <[EMAIL PROTECTED]> wrote: > IMHO it doesn't make much sense that trimTrailingZeros() doesn't shrink the > array. Sure the arraycopy() will take some extra time and simply adjusting > wlen still has the benefit that it will probably speed up the bit se

Re: Java logging in Lucene

2008-12-08 Thread Yonik Seeley
On Sat, Dec 6, 2008 at 11:52 AM, Shai Erera <[EMAIL PROTECTED]> wrote: > On the performance side, I don't expect to see any different performance > than what we have today, since checking if infoStream != null should be > similar to logger.isLoggable (or the equivalent methods from SLF4J). I'm lee

Re: [jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches

2008-12-09 Thread Yonik Seeley
On Tue, Dec 9, 2008 at 9:23 AM, Mark Miller <[EMAIL PROTECTED]> wrote: > Great, because that's prob the main optimization spot we have. I also made > things a bit difficult with the 50 merge factor. I'll try a 10 later. It's useful to report the number of segments in the index too. Even with high

Re: [jira] Created: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

2008-12-11 Thread Yonik Seeley
On Thu, Dec 11, 2008 at 11:24 AM, David Kaelbling <[EMAIL PROTECTED]> wrote: > Will https://issues.apache.org/jira/browse/LUCENE-1486 let people > include NOT inside phrases? My customers would like to have queries > like "copyright !mycompany"~2, that find any copyright clause except > their own.

Re: [jira] Created: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

2008-12-11 Thread Yonik Seeley
On Thu, Dec 11, 2008 at 11:33 AM, Yonik Seeley <[EMAIL PROTECTED]> wrote: > On Thu, Dec 11, 2008 at 11:24 AM, David Kaelbling > <[EMAIL PROTECTED]> wrote: >> Will https://issues.apache.org/jira/browse/LUCENE-1486 let people >> include NOT inside phrases? My custom

Re: 2.9/3.0 plan & Java 1.5

2008-12-13 Thread Yonik Seeley
Parametrization of return types should be fully back compatible. Parameterization of input parameters would be run-time compatible (due to type erasure), but not compile-time compatible. -Yonik On Sat, Dec 13, 2008 at 5:07 PM, Michael McCandless wrote: > > Grant Ingersoll wrote: > >> IIRC, we al
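A small sketch of the type-erasure half of this point, for anyone who wants to see why generified input parameters stay run-time compatible. ErasureSketch and its count method are hypothetical names made up for illustration, not Lucene API; the sketch only shows that the compiled method is still found under its raw List signature, and does not attempt to reproduce the compile-time incompatibility.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; none of these names come from Lucene.
final class ErasureSketch {
    // A method whose input parameter has been generified.
    static int count(List<String> names) {
        return names.size();
    }

    // Both parameterizations share one runtime class after erasure.
    static boolean sameRuntimeClass() {
        return new ArrayList<String>().getClass() == new ArrayList<Integer>().getClass();
    }

    // The method is still found under its erased (raw List) signature,
    // which is what already-compiled callers link against.
    static boolean erasedSignatureExists() {
        try {
            Method m = ErasureSketch.class.getDeclaredMethod("count", List.class);
            return m.getParameterTypes()[0] == List.class;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());
        System.out.println(erasedSignatureExists());
    }
}
```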

Re: To clone or have a pluggable docidbitset for IndexReader

2008-12-16 Thread Yonik Seeley
On Tue, Dec 16, 2008 at 10:00 AM, Michael McCandless wrote: > Is your need for IndexReader.clone entirely driven by needing a fast way to > swap in your own deleted docs? Could this be done with a FilteredIndexReader subclass that keeps track of additional deletions? -Yonik

Re: Welcome Uwe Schindler as Contrib committer!

2008-12-19 Thread Yonik Seeley
On Fri, Dec 19, 2008 at 5:47 PM, Mark Miller wrote: > Congrats Uwe! Welcome. Can't wait to see what kind of magic you can work > with trie and local lucene! (if indeed there is magic to be worked there) +1 -Yonik

Re: [jira] Commented: (LUCENE-1482) Replace infoSteram by a logging framework (SLF4J)

2009-01-09 Thread Yonik Seeley
On Fri, Jan 9, 2009 at 3:31 PM, Shalin Shekhar Mangar wrote: > If we forget the bytecode modification for a moment, how much cost does this > add to Lucene when used by a real application with slf4j logging? (e.g. Solr > uses the jdk adapter and no-op adapter cannot be used) AFAIK, the infostream

Re: Filesystem based bitset

2009-01-10 Thread Yonik Seeley
Can we please let this thread die. -Yonik - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org

TrieRange

2009-02-06 Thread Yonik Seeley
I've taken a quick peek at TrieUtils/TrieRangeQuery - nice work Uwe! One general comment is that it seems like TrieUtils tries to do a little too much for users (and in the process covers up some functionality). For example, whether to encode values in two fields and exactly what those fields are

Re: TrieRange

2009-02-06 Thread Yonik Seeley
On Fri, Feb 6, 2009 at 6:18 PM, Uwe Schindler wrote: >> Encoding a slice per character makes the code simpler, but increases >> the size of the index... but perhaps not enough to worry about in >> practice? > > This is correct. For 2bit and 4bit there is a lot of overhead by this, but > there is n

Re: TrieRange

2009-02-06 Thread Yonik Seeley
On Fri, Feb 6, 2009 at 6:18 PM, Uwe Schindler wrote > The encoding of the values > into two different field names does the trick for the whole range query. > Removing the code that generates the field in exactly that way would remove > the idea behind TrieRangeFilter. Allowing the ability to spec

reopen bug cloning norms for all fields

2009-02-06 Thread Yonik Seeley
After an NPE and a quick gander at SegmentReader, it looks like it's trying to clone the norms for *all* fields, regardless of whether they are indexed. The following is the simplest patch that seems to fix things, but I'm wondering if other changes might be warranted (like where fieldNormsChanged is se

Re: reopen bug cloning norms for all fields

2009-02-06 Thread Yonik Seeley
It looks like this was introduced just recently in LUCENE-1314. I've just committed this fix along with test modifications that fail w/o this patch. -Yonik On Fri, Feb 6, 2009 at 10:15 PM, Yonik Seeley wrote: > After an NPE and a quick gander at SegmentReader, it looks like it'

Re: TrieRange

2009-02-07 Thread Yonik Seeley
On Sat, Feb 7, 2009 at 6:04 AM, Uwe Schindler wrote: > An optimization might be to remove > the lower 0 bits from the string, but it would not be needed. The strings > are unique for one precision (no difference between 0-bits there or not). Yes, one would certainly want to remove trailing bits t

Re: TrieRange

2009-02-07 Thread Yonik Seeley
On Sat, Feb 7, 2009 at 6:04 AM, Uwe Schindler wrote: > The field names could be changed, sure, the small performance optimization > is in TrieRangeFilter: The splitting of the range is done in a way, to not > seek back and forward in the Term list, just in forward direction. This is > only possibl

Re: TrieRange

2009-02-07 Thread Yonik Seeley
On Sat, Feb 7, 2009 at 6:04 AM, Uwe Schindler wrote: >> I understand how it works and how one would need to configure it such >> that it be sortable if needed - but my point was really much more >> about allowing people to do things differently if needed. > > Propose an API to generate the documen

Re: TrieRange

2009-02-07 Thread Yonik Seeley
Actually, I think we can totally do away with "variants" of 2,4,8 bits and make it completely generic, able to support any slice size from 1 to 63 bits. I'll work up some prototype code and post it in the original TrieUtils JIRA issue. -Yonik On Sat, Feb 7, 2009 at 9:58 AM, Yoni

Re: TrieRange

2009-02-07 Thread Yonik Seeley
On Sat, Feb 7, 2009 at 12:26 PM, Uwe Schindler wrote: >> To optimize index space, one would want to "right justify" the encoded >> number for any bit range to minimize variation on the left - this >> plays into lucene's prefix compression. The prototype code I just posted in JIRA does this. For

Re: TrieRange

2009-02-07 Thread Yonik Seeley
On Sat, Feb 7, 2009 at 12:29 PM, Uwe Schindler wrote: > This is only a minimal optimization, suitable for very large indexes. The > problem is: if you have many terms in highest precision (a lot of different > double values), seeking is more costly if you jump from higher to lower > precisions.

Re: [VOTE] Release 2.4.1, take 2

2009-03-05 Thread Yonik Seeley
+1 I plugged this RC into Solr 1.3 and everything looks good. Signatures also look good (after importing KEYS.txt... it would be easier to verify if you uploaded your public key to pgp.mit.edu though) -Yonik http://www.lucidimagination.com On Wed, Mar 4, 2009 at 5:27 PM, Michael McCandless wro

Re: [VOTE] Release 2.4.1, take 2

2009-03-05 Thread Yonik Seeley
On Thu, Mar 5, 2009 at 3:37 PM, Michael McCandless wrote: > Yonik Seeley wrote: > >> Signatures also look good (after importing KEYS.txt... it would be >> easier to verify if you uploaded your public key to pgp.mit.edu >> though) > > I did upload it (back for 2.4.0).

Re: Modularization

2009-03-23 Thread Yonik Seeley
On Mon, Mar 23, 2009 at 11:10 AM, Michael McCandless wrote: >   4. Move contrib/* under src/java/*, updating the javadocs to state >       back compatibility promises per class/package. - contrib has always had a lower bar and stuff was committed under that lower bar - there should be no blanket

Re: Empty Sink Tokenizer

2009-03-31 Thread Yonik Seeley
On Tue, Mar 31, 2009 at 12:26 PM, Grant Ingersoll wrote: > What's the benefit of collation? AFAIK, the main reason is to handle multi-valued fields. The need to sort partially stems from the fact that the Document class does not explicitly handle multi-valued fields. Solr must also sort/hash the

Re: Greetings and questions about patches

2009-04-22 Thread Yonik Seeley
On Wed, Apr 22, 2009 at 9:33 PM, Erick Erickson wrote: > So, according to the coverage report, there are two methods that > are never executed by the unit tests (actually 4, 2 that operate on > ints and 2 that operate on longs), isPowerOfTwo and > nextHighestPowerOfTwo. nextHighestPowerOfTwo is es
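For context on the two untested helpers named above, here is a minimal sketch of the usual bit tricks behind such methods. These are written from scratch for illustration and are not copied from Lucene's utility class, whose exact behavior (for example on zero or negative inputs) may differ.

```java
// Hypothetical re-implementation for illustration; not Lucene's code.
final class PowerOfTwoSketch {
    // A positive power of two has exactly one bit set, so clearing the
    // lowest set bit with v & (v - 1) leaves zero.
    static boolean isPowerOfTwo(int v) {
        return v > 0 && (v & (v - 1)) == 0;
    }

    // Rounds v up to the next power of two (v itself if it already is one).
    // Smears the highest set bit of v - 1 into all lower bits, then adds 1.
    static int nextHighestPowerOfTwo(int v) {
        v--;
        v |= v >> 1;
        v |= v >> 2;
        v |= v >> 4;
        v |= v >> 8;
        v |= v >> 16;
        v++;
        return v;
    }

    public static void main(String[] args) {
        System.out.println(isPowerOfTwo(64));          // one bit set
        System.out.println(isPowerOfTwo(6));           // two bits set
        System.out.println(nextHighestPowerOfTwo(17)); // rounds up to 32
    }
}
```

A unit test for such helpers only needs a handful of values around the powers of two, which is presumably why covering them is cheap.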

Re: Another possible optimization - now in DocIdSetIterator

2009-04-24 Thread Yonik Seeley
On Fri, Apr 24, 2009 at 6:20 PM, eks dev wrote: > just do not forget to use -1 < doc instead of -1 != doc Perhaps doc >=0 instead of doc != -1? The crux of it is that status flags (result positive, negative, or zero) are set by many operations - hence a compare/test operation can often be elimina

Re: Score calculation with new by-segment collection

2009-04-30 Thread Yonik Seeley
On Thu, Apr 30, 2009 at 4:44 PM, Earwin Burrfoot wrote: > Did I miss something, or when trunk switched to collecting on > SegmentReaders we've lost proper scores? > I mean, before score depended on TF calculated across all the index, > and now it depends on TF for a given segment (yup, unless I mi

Re: Build failed in Hudson: Lucene-trunk #828

2009-05-15 Thread Yonik Seeley
On Fri, May 15, 2009 at 1:06 PM, Michael McCandless wrote: > Otherwise, I only know of one other intermittent failure for > TestStressIndexing2.testRandomIWReader, which is sometimes it fails to > close all files it had opened... haven't gotten to the bottom of that > one yet. Is there a JIRA iss

Re: Lucene's default settings & back compatibility

2009-05-18 Thread Yonik Seeley
On Mon, May 18, 2009 at 5:06 PM, Michael McCandless wrote: > * StopFilter should enable position increments by default Is this one an actual improvement in the general case? A query of "foo bar" then wouldn't match a document with "foo and bar", but a query of "foo the bar" would. -Yonik
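The "foo bar" versus "foo the bar" behavior can be simulated without Lucene. The sketch below is a toy model only: StopPositionsSketch, its two-word stop list, and its exact-phrase matcher are assumptions made for illustration, and it assumes the same stop filtering is applied to both the query and the document.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Toy model of stop-word removal with and without position gaps.
final class StopPositionsSketch {
    static final Set<String> STOP = new HashSet<>(Arrays.asList("and", "the"));

    // Maps position -> term. With keepGaps, a removed stop word still
    // consumes a position, leaving a hole in the sequence.
    static Map<Integer, String> analyze(String text, boolean keepGaps) {
        Map<Integer, String> out = new LinkedHashMap<>();
        int pos = -1;
        for (String tok : text.toLowerCase().split("\\s+")) {
            if (STOP.contains(tok)) {
                if (keepGaps) {
                    pos++;
                }
                continue;
            }
            pos++;
            out.put(pos, tok);
        }
        return out;
    }

    // Exact phrase match: every query term must appear in the document
    // at the same relative position.
    static boolean phraseMatches(String query, String doc, boolean keepGaps) {
        Map<Integer, String> q = analyze(query, keepGaps);
        Map<Integer, String> d = analyze(doc, keepGaps);
        if (q.isEmpty()) {
            return false;
        }
        int firstQ = q.keySet().iterator().next();
        for (int start : d.keySet()) {
            int shift = start - firstQ;
            boolean ok = true;
            for (Map.Entry<Integer, String> e : q.entrySet()) {
                if (!e.getValue().equals(d.get(e.getKey() + shift))) {
                    ok = false;
                    break;
                }
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String doc = "foo and bar";
        System.out.println(phraseMatches("foo bar", doc, true));
        System.out.println(phraseMatches("foo the bar", doc, true));
        System.out.println(phraseMatches("foo bar", doc, false));
    }
}
```

In this model, with gaps kept, "foo bar" misses the document while "foo the bar" hits it; with gaps dropped, "foo bar" matches.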

Re: Lucene's default settings & back compatibility

2009-05-19 Thread Yonik Seeley
On Tue, May 19, 2009 at 8:50 AM, Robert Muir wrote: > in my tests the problem seemed to boil down to iteration of a sparse > openbitset... so maybe the filter approach is still an option but when # > docs is small some other doc id set impl is used? Directly using the BooleanQuery skips any inter

Re: Lucene's default settings & back compatibility

2009-05-19 Thread Yonik Seeley
Selecting backward compatibility vs latest and greatest could be done w/o Settings (a simple static int containing the version number to act like). It seems like the Settings debate should be based on its own merits. -Yonik

Re: Lucene's default settings & back compatibility

2009-05-19 Thread Yonik Seeley
On Tue, May 19, 2009 at 2:04 PM, Michael McCandless wrote: > On Tue, May 19, 2009 at 9:34 AM, Yonik Seeley > wrote: > >> Selecting backward compatibility vs latest and greatest could be done >> w/o Settings (a simple static int containing the version number to act >>

Re: Re(opening) (Multi)SegmentReaders

2009-05-19 Thread Yonik Seeley
On Mon, May 18, 2009 at 8:06 AM, Michael McCandless wrote: > Yonik is there anything in Solr that might not like this change? Yep, there is :-) Should be very easy to work around though. -Yonik

Re: Lucene's default settings & back compatibility

2009-05-19 Thread Yonik Seeley
On Tue, May 19, 2009 at 2:29 PM, Shai Erera wrote: > Is this the time and place to re-raise a previous discussion about moving > SweetSpotSimilarity to core and move to use it? SweetSpotSimilarity wouldn't make a good default. It's a flat topped hill that falls suddenly off on either side. Shor

Re: Lucene's default settings & back compatibility

2009-05-19 Thread Yonik Seeley
On Tue, May 19, 2009 at 4:33 PM, Michael McCandless wrote: > On Tue, May 19, 2009 at 2:27 PM, Yonik Seeley > wrote: >> On Tue, May 19, 2009 at 2:04 PM, Michael McCandless >> wrote: >>> On Tue, May 19, 2009 at 9:34 AM, Yonik Seeley >>> wrote: >>>

Re: Lucene's default settings & back compatibility

2009-05-20 Thread Yonik Seeley
On Wed, May 20, 2009 at 7:22 AM, Michael McCandless wrote: > So I think you're suggesting something like this: when you use Lucene, > if you want "latest and greatest" defaults, do nothing. > > If instead you want defaults to match a particular past minor release, > you must call (say) LuceneVersi

Re: Lucene's default settings & back compatibility

2009-05-20 Thread Yonik Seeley
On Wed, May 20, 2009 at 11:46 AM, Mark Miller wrote: > Marvin Humphrey wrote: >> >> Yeesh, that's evil.  :( >> >> It will be sweet, sweet justice if one of your own projects gets infected >> by >> the kind of action-at-a-distance bug you're so blithely unconcerned about > > Heh. Thats a bit over t

Re: Adding clear() to Document

2009-05-20 Thread Yonik Seeley
On Wed, May 20, 2009 at 3:27 PM, Shai Erera wrote: > I noticed Document does not have a clear() method, to remove all the Fields > set on it. Document's state is so simple (a List and a boost), reuse doesn't seem worth it. What if, instead, we allowed the List to be passed in via Document's con

Re: Adding clear() to Document

2009-05-20 Thread Yonik Seeley
er applications, the number of fields may be much larger than in the > current benchmark impls, where it becomes even more important. > > Passing a list of Fields will save the Field allocations (assuming the app > caches them on the outside) but still require Document allocation. Wh

Re: Lucene's default settings & back compatibility

2009-05-20 Thread Yonik Seeley
On Wed, May 20, 2009 at 4:31 PM, Shai Erera wrote: > A personal example - I wrote an Analyzer which includes lots of code (lots > of TokenFilters, Tokenizers etc.). Then I see that the whole TokenStream API > is deprecated and will be replaced. Yeah, that one is going to be causing some headaches

Re: svn commit: r777525 - /lucene/java/trunk/src/java/org/apache/lucene/util/AttributeSource.java

2009-05-22 Thread Yonik Seeley
Why do stuff like this? Null params are almost never valid unless documented... I dislike cluttering up code with validity checks, slightly penalizing users who use the APIs correctly. I recognize that I may be in the minority though. But in this specific instance, the caller will get an immedia

Re: Lucene's default settings & back compatibility

2009-05-22 Thread Yonik Seeley
I'm not a lawyer, so I dislike trying to nail down every detail in writing and trying to solve future problems in the abstract. Lucene has never really been 100% back compatible... we've just tried to keep it that way... it's more of a mindset than a reality, and I'm wary of changing that mindset too

Re: Lucene's default settings & back compatibility

2009-05-22 Thread Yonik Seeley
On Fri, May 22, 2009 at 1:22 PM, Michael McCandless wrote: > (That said, unrelated to this discussion, I would actually like to > record per-segment which version of Lucene wrote the segment; this > would be very helpful when debugging issues like LUCENE-1474 where I > need to know if the segments

valid scores for sorting

2009-05-26 Thread Yonik Seeley
I'm attempting to switch Solr to use the new Collector framework to get per-segment sorting and have been hitting some issues. The latest is a function query log(val) which produces both NaN and -Infinity values, which kill the TopScoreDocCollector (invalid docids are produced). results = {org.apa

Re: valid scores for sorting

2009-05-26 Thread Yonik Seeley
On Tue, May 26, 2009 at 9:52 AM, Shai Erera wrote: > We've decided in 1575 to pre-populate HitQueue with sentinel values with > score = Float.NEG_INF, as we assumed these scores will not be produced. TSDC > instantiates HitQueue with pre-filling turned on. > > Is NEG_INF a valid score for you? It
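A minimal sketch of why log(val) scores sit badly with a queue pre-filled with -Infinity sentinels. It assumes the collector decides whether a hit is competitive with a plain `>` comparison. The class name and the `beats` helper are hypothetical illustrations, not Lucene code. Only standard Java floating-point behaviour is shown: log(0) is -Infinity, log of a negative number is NaN, and neither compares greater than the sentinel.

```java
public class SentinelScoreDemo {

    // Hypothetical stand-in for "does this hit beat the weakest queue entry?"
    static boolean beats(float score, float weakest) {
        return score > weakest;
    }

    public static void main(String[] args) {
        // Sentinel value as described in the thread: score = -Infinity.
        float sentinel = Float.NEGATIVE_INFINITY;

        float logOfZero = (float) Math.log(0.0);      // -Infinity
        float logOfNegative = (float) Math.log(-1.0); // NaN

        // An ordinary score displaces the sentinel.
        System.out.println(beats(1.0f, sentinel));          // true

        // -Infinity ties with the sentinel, so it never displaces it.
        System.out.println(beats(logOfZero, sentinel));     // false

        // Every ordered comparison involving NaN is false.
        System.out.println(beats(logOfNegative, sentinel)); // false

        // Float.compare imposes a total order in which NaN sorts above all values.
        System.out.println(Float.compare(logOfNegative, sentinel) > 0); // true
    }
}
```

Under that assumption, a document scoring NaN or -Infinity would never replace a sentinel, so sentinel entries could survive into the final results. Whether that is the exact failure path in TopScoreDocCollector is not confirmed by the excerpts here.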

Re: valid scores for sorting

2009-05-26 Thread Yonik Seeley
On Tue, May 26, 2009 at 12:16 PM, Michael McCandless wrote: > What other issues are you hitting? I hit an NPE when using old-style sort comparators. The main thing is that I didn't anticipate having to rewrite all the custom sort comparators Solr has. There are multiple test cases still failing..

Re: valid scores for sorting

2009-05-26 Thread Yonik Seeley
FYI, the upgrading work is going on in https://issues.apache.org/jira/browse/SOLR- -Yonik http://www.lucidimagination.com On Tue, May 26, 2009 at 12:29 PM, Yonik Seeley wrote: > On Tue, May 26, 2009 at 12:16 PM, Michael McCandless > wrote: >> What other issues are you hittin

Re: valid scores for sorting

2009-05-26 Thread Yonik Seeley
On Tue, May 26, 2009 at 1:44 PM, Michael McCandless wrote: > On Tue, May 26, 2009 at 12:29 PM, Yonik Seeley > wrote: >> On Tue, May 26, 2009 at 12:16 PM, Michael McCandless >> wrote: >>> What other issues are you hitting? >> >> I hit an NPE when using old-

Re: valid scores for sorting

2009-05-26 Thread Yonik Seeley
On Tue, May 26, 2009 at 2:22 PM, Michael McCandless wrote: > Hmm -- IndexSearcher tries to detect when SortComparatorSource is > used, and drive the search with the toplevel reader, so that code is > not supposed to be reached.  Do you remember what tickled it? Solr's search code is now using the

upcoming Solr release on Lucene 2.9-dev

2009-05-27 Thread Yonik Seeley
We're aiming for a Solr release in the next few weeks (as usual, we're 6 months behind when we wanted to make the release). The catch is that Solr depends on Lucene 2.9, and there have been a *lot* of changes. We're currently on r779312 (upgraded today). I'll add a note to Solr to warn users fro

Re: upcoming Solr release on Lucene 2.9-dev

2009-05-28 Thread Yonik Seeley
On Thu, May 28, 2009 at 5:35 AM, Michael McCandless wrote: > The IndexWriter diagnostics (LUCENE-1654: recording Lucene version, > Java/OS version, etc into each segment created) also bumped the index > file format. > > And LUCENE-1623 (fixing back-compat issue w/ field names that have > non-ascii

Re: upcoming Solr release on Lucene 2.9-dev

2009-05-28 Thread Yonik Seeley
On Thu, May 28, 2009 at 2:56 AM, Shai Erera wrote: > If by changes you also mean deprecated features, then take a look at > LUCENE-1614 - if you have your own Scorers/DISIs, you might want to > implement the new methods, since the current ones are deprecated. Yes, we have our own Scorers, but cha

Re: NRT getReader turnaround on large segments

2009-05-28 Thread Yonik Seeley
On Thu, May 28, 2009 at 3:30 PM, Michael McCandless wrote: > This is exactly why we added IndexReaderWarmer -- it pre-warms a newly > merged segment before committing to SegmentInfos. > > So, while such warming is happening, if getReader() is called, the > returned reader will still read the old s

Re: NRT getReader turnaround on large segments

2009-05-28 Thread Yonik Seeley
On Thu, May 28, 2009 at 4:18 PM, Michael McCandless wrote: > Newly added docs are still free to make new segments, and be reopened, > while this warming is taking place. > > So, getReader() will wait for newly added/deleted docs to be flushed & > reopened, but will not wait for any running merges

Re: ReadOnly IndexReaders

2009-05-30 Thread Yonik Seeley
On Sat, May 30, 2009 at 1:27 PM, Mark Miller wrote: > Is there a valid use case? That was my question too... I really can't think of one. Maybe we should leave it out until there is actually a need for it. -Yonik http://www.lucidimagination.com --
