On Wed, Jun 25, 2008 at 6:29 AM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
> We've also discussed at one point creating an IndexReader impl that searches
> the RAM buffer that DocumentsWriter writes to when adding documents. I
> think it's easier than it sounds, on first glance, because Docume
, then
you can easily search for a document and retrieve its stored fields
in order to re-index it with changes (and still maintain decent
performance).
-Yonik
> On Wed, Jun 25, 2008 at 8:41 AM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>>
>> On Wed, Jun 25, 2008 at 6:29 AM,
On Wed, Jun 25, 2008 at 2:19 PM, Chris Hostetter
<[EMAIL PROTECTED]> wrote:
> : Might also consider passing in more optional context when retrieving
> : the similarity for a field (such as a Query, if searching).
> : Something like Similarity.getSimilarity(String field, Query q).
>
> i assume you m
On Wed, Jun 25, 2008 at 5:06 PM, Chris Hostetter
<[EMAIL PROTECTED]> wrote:
> Hmmm... that seems like it would be confusing: particularly since in the
> IndexWriter case the "Query" param would never make sense. changing
> IndexWriter.getSimilarity to take a "String fieldName" and changing
> Searc
On Fri, Jun 27, 2008 at 2:43 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
> We could instead keep these SegmentReaders open and reuse them for
> applying deletes.
We discussed caching SegmentReaders in the original buffered deletes issue too
http://issues.apache.org/jira/browse/LUCENE-565
>
On Sun, Jun 29, 2008 at 9:42 AM, Jason Rutherglen
<[EMAIL PROTECTED]> wrote:
> IndexReader.document as it is is really a lame duck. The
> IndexReader.document call being synchronized at the top level drags down the
> performance of systems that store data in Lucene. A single file descriptor
> for
The frequency is tracked at index time. It's simply a read at query
time. See TermDocs.
If you really want to understand more about the code internals of
Lucene, I'd suggest stepping through more example queries with a
debugger.
-Yonik
On Wed, Jul 2, 2008 at 8:49 PM, blazingwolf7 <[EMAIL PROTEC
cked during index time? I index my
> file earlier. Later I will open the index and perform a search. Shouldn't
> the frequency of each term in each document found be calculated during
> the searching process?
>
>
> Yonik Seeley wrote:
>>
>> The frequency is tracked
On Wed, Jul 2, 2008 at 10:30 PM, blazingwolf7 <[EMAIL PROTECTED]> wrote:
> What I am missing is that I fail to locate the class that performs the actual
> comparison to determine if a query matches any term in a document.
You need to understand the inverted index format. Documents that
match a term
to add an extra value into it so that I can retrieve the
> information during the searching process. Thank
Look at payloads first.
What problem are you trying to solve? Someone may have an easier
approach for you if payloads don't work.
-Yonik
>
> Yonik Seeley wrote:
>>
On Mon, Jul 7, 2008 at 5:03 PM, Yajun <[EMAIL PROTECTED]> wrote:
>
> I'm adding tons of logging, hopefully it will give me some information.
Try capturing the directory contents before you take a snapshot...
something like
ls -l index > index/ls.txt
Then if a missing file turns up, you can compar
ame yesterday when posting to java-users.
>> But looking at nabble my posts are there.
>>
>>karl
>>
>> On 10 Jul 2008 at 03:23, Yonik Seeley wrote:
>>
>>> sorry for the noise... this is just a test to java-dev.
>>> I'm unable to post t
On Fri, Jul 11, 2008 at 2:38 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
>
> Hmm, I think we should.
>
> What should it "mean" when you call commit(), while another thread is in the
> middle of addIndexes?
Seems like either all or none of the segments in addIndexes should be committed.
> We
On Fri, Jul 11, 2008 at 3:27 PM, Ning Li <[EMAIL PROTECTED]> wrote:
> We should also disallow concurrent addIndexes, right?
Hmmm, it looks like the current implementation won't
work correctly (docWriter.resumeAllThreads() being called while
another thread is calling addIndexes, etc
On Fri, Jul 18, 2008 at 4:23 PM, Mark Miller <[EMAIL PROTECTED]> wrote:
> but there is still something I don't like about that API...
Perhaps that it's just a piece of the puzzle? It doesn't seem
sufficient by itself to allow multiple threads to easily share a
reader. But it does seem like it
Although I do wonder if incRef() and decRef() aren't more suitable
names. Just make those methods public, with the caveat that one
should not call them on a closed reader. They are expert level APIs
after all.
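
The incRef()/decRef() idea above can be sketched with a small, self-contained reference counter. This is only an illustration of the pattern under stated assumptions: the class name RefCountedResource and its methods are hypothetical and are not Lucene's IndexReader API. The creator holds the first reference, the last decRef() closes the resource, and incRef() on a closed resource is rejected, which is the caveat mentioned above.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a shared reader; this is NOT Lucene's IndexReader.
class RefCountedResource {
    // The creator holds the first reference.
    private final AtomicInteger refCount = new AtomicInteger(1);
    private volatile boolean closed = false;

    /** Takes an extra reference; calling this on a closed resource is an error. */
    void incRef() {
        while (true) {
            int current = refCount.get();
            if (current <= 0) {
                throw new IllegalStateException("resource already closed");
            }
            if (refCount.compareAndSet(current, current + 1)) {
                return;
            }
        }
    }

    /** Releases one reference; the last release closes the resource. */
    void decRef() {
        int remaining = refCount.decrementAndGet();
        if (remaining == 0) {
            closed = true; // a real reader would release files here
        } else if (remaining < 0) {
            throw new IllegalStateException("too many decRef calls");
        }
    }

    boolean isClosed() { return closed; }

    int getRefCount() { return refCount.get(); }
}
```

Each thread that wants to share the resource would call incRef() before use and decRef() in a finally block afterwards.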
On Fri, Jul 18, 2008 at 4:45 PM, Yonik Seeley <[EMAIL PROTECTED]> wrot
Making well reasoned arguments about specific patches would be helpful.
Also, the complexity vs speed trade-offs are different for a core
library like Lucene, where performance is one of the primary features.
-Yonik
On Wed, Jul 23, 2008 at 4:01 PM, robert engels <[EMAIL PROTECTED]> wrote:
> I hope t
The problem isn't sorting per se... the problem is quickly retrieving
the sort value for a document. For that, we currently have the
FieldCache, and that's what takes up the memory. There are more memory
efficient ways, but they just haven't been implemented yet.
-Yonik
On Tue, Jul 29, 2008 at 3:
disclaimer: this is just for fun... differences should be in the
noise in any complex system, and I'm not suggesting any code changes.
Actually, with 32 bit registers, x<0 should be faster than x==-1 by
one cycle. If it doesn't test faster, then it's because of some
optimizations that could be p
e looked in the past and couldn't find anything.
If anyone knows of anything, it would be very cool to have though.
-Yonik
> Mike
>
> Yonik Seeley wrote:
>
>> disclaimer: this is just for fun... differences should be in the
>> noise in any complex system, and I
On Wed, Jul 30, 2008 at 3:17 PM, Stephen Green <[EMAIL PROTECTED]> wrote:
> Might the description here:
>
> http://weblogs.java.net/blog/kohsuke/archive/2008/03/deep_dive_into.html
>
> help?
Sweet! Thanks!
-Yonik
Stephen pointed out, I can now see that the
difference does survive.
-Yonik
> Yonik Seeley wrote:
>
>> disclaimer: this is just for fun... differences should be in the
>> noise in any complex system, and I'm not suggesting any code changes.
>>
>> Actually, with
On Mon, Aug 18, 2008 at 5:38 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
> There are a number of good changes on the trunk, and it's been 7 months
> since we released 2.3.0, and we wanted to do more frequent releases so
> what do you all think of releasing 2.4 soon?
+1
-Yonik
If you've considered Solr in the past, but for some reason it didn't
meet your needs, we'd love to hear from you over on solr-dev. We're
starting to do some forward looking architecture work on the next
major version of Solr, so let us know what ideas you have and what
you'd like to see!
solr-dev
On Wed, Sep 3, 2008 at 3:20 PM, Jason Rutherglen
<[EMAIL PROTECTED]> wrote:
> I am wondering
> if there are social networks (or anyone else) out there who would be
> interested in collaborating with Apache on realtime search to get it
> to the point it can be used in production.
Good timing Jason,
On Wed, Sep 3, 2008 at 4:55 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
>> I suspect any attempts at "bundling" Lucene code may snowball until you've
>> rebuilt Solr.
>
> Yeah I guess it is... though Solr includes the whole webapp too, whereas I
> think there's a natural bundle that wouldn't
On Wed, Sep 3, 2008 at 6:50 PM, Jason Rutherglen
<[EMAIL PROTECTED]> wrote:
> I also think it's got a
> lot of things now which makes integration difficult to do properly.
I agree, and that's why the major bump in version number rather than
minor - we recognize that some features will need some am
There's a good percent of the Solr community that is looking to add
everything you are (from a functional point of view). Some of the
other little things that we haven't considered (like a remote Java
API) sound cool... no reason not to add that also. We're also
planning on adding alternatives to
On Mon, Sep 8, 2008 at 12:33 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
> I'm also trying to make time to explore the approach of creating an
> IndexReader impl. that searches IndexWriter's RAM buffer.
That seems like it could possibly be the best performing approach in
the long run.
> I t
On Mon, Sep 8, 2008 at 3:56 PM, Ning Li <[EMAIL PROTECTED]> wrote:
> On Mon, Sep 8, 2008 at 2:43 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>> But, how would you maintain a static view of an index...?
>>
>> IndexReader r1 = indexWriter.getCurrentI
On Mon, Sep 8, 2008 at 3:04 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
> Right, getCurrentIndex would return a MultiReader that includes
> SegmentReader for each segment in the index, plus a "RAMReader" that
> searches the RAM buffer. That RAMReader is a tiny shell class that would
> basica
On Tue, Sep 9, 2008 at 5:28 AM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
> Yonik Seeley wrote:
>> What about something like term freq? Would it need to count the
>> number of docs after the local maxDoc or is there a better way?
>
> Good question...
>
> I th
On Tue, Sep 9, 2008 at 11:42 AM, Ning Li <[EMAIL PROTECTED]> wrote:
> On Tue, Sep 9, 2008 at 10:02 AM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>> Yeah, I think the underlying RandomAccessFile might do the right
>> thing, but IndexInput isn't required to see any ch
On Tue, Sep 9, 2008 at 12:41 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
> Yonik Seeley wrote:
>> OR, if all writes are append-only, perhaps we don't ever need to
>> invalidate the read buffer and would just need to remove the current
>> logic that caches
On Tue, Sep 9, 2008 at 12:45 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
> Yonik Seeley wrote:
>> No, it would essentially be a change in the semantics that all
>> implementations would need to support.
>
> Right, which is you are allowed to open an IndexInput on a
I'm finished my audit of the jars we include. There were missing
elements for junit, stax (now removed), and stax-utils.
In the future we should take care of updating LICENSE/NOTICE
immediately when a new jar is added.
-Yonik
Oops, meant for this to go to solr-dev... sorry.
On Thu, Sep 11, 2008 at 11:52 AM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> I'm finished my audit of the jars we include. There were missing
> elements for junit, stax (now removed), and stax-utils.
> In the future we should t
On Fri, Sep 12, 2008 at 6:01 AM, Michael McCandless (JIRA)
<[EMAIL PROTECTED]> wrote:
> Spinoff from here:
>
>
> http://mail-archives.apache.org/mod_mbox/lucene-java-user/200809.mbox/%3Cba72f77f0809111418l29cf215dnd45bf679832d7d42%40mail.gmail.com%3E
I had to go to another email archive to fin
On Wed, Oct 1, 2008 at 8:26 AM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
>
> OK new artifacts are here:
>
> http://people.apache.org/~mikemccand/staging-area/lucene2.4take2
>
> Please VOTE to release those as 2.4.0.
+1
AIUI, Apache only officially releases source code, so any binaries ar
On Wed, Oct 1, 2008 at 5:25 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:
> Yonik Seeley wrote:
>>
>> AIUI, Apache only officially releases source code, so any binaries are
>> artifacts or derivatives of a release, not really the release itself.
>
> Unless those
On Sun, Oct 5, 2008 at 1:07 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
> Let's start a new VOTE to release these artifacts (derived from svn rev
> 701827) as Lucene 2.4.0:
>
> http://people.apache.org/~mikemccand/staging-area/lucene2.4take4
>
> Here's my +1.
+1
-Yonik
On Wed, Nov 5, 2008 at 5:16 AM, mark harwood <[EMAIL PROTECTED]> wrote:
> Just checked Solr (forgot about that obvious precedent!) and they have it in
> trunk/lib and an entry in trunk/notice.txt which reads:
>
> " Includes software from other Apache Software Foundation projects,
> including, bu
> http://www.thetaphi.de
> eMail: [EMAIL PROTECTED]
>
>> -Original Message-
>> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
>> Seeley
>> Sent: Wednesday, November 05, 2008 4:08 PM
>> To: java-dev@lucene.apache.org
>> Subjec
On Sat, Nov 22, 2008 at 11:35 AM, Timo Nentwig <[EMAIL PROTECTED]> wrote:
> IMHO it doesn't make much sense that trimTrailingZeros() doesn't shrink the
> array. Sure the arraycopy() will take some extra time and simply adjusting
> wlen still has the benefit that it will probably speed up the bit se
On Sat, Dec 6, 2008 at 11:52 AM, Shai Erera <[EMAIL PROTECTED]> wrote:
> On the performance side, I don't expect to see any different performance
> than what we have today, since checking if infoStream != null should be
> similar to logger.isLoggable (or the equivalent methods from SLF4J).
I'm lee
On Tue, Dec 9, 2008 at 9:23 AM, Mark Miller <[EMAIL PROTECTED]> wrote:
> Great, because that's prob the main optimization spot we have. I also made
> things a bit difficult with the 50 merge factor. I'll try a 10 later.
It's useful to report the number of segments in the index too. Even
with high
On Thu, Dec 11, 2008 at 11:24 AM, David Kaelbling
<[EMAIL PROTECTED]> wrote:
> Will https://issues.apache.org/jira/browse/LUCENE-1486 let people
> include NOT inside phrases? My customers would like to have queries
> like "copyright !mycompany"~2, that find any copyright clause except
> their own.
On Thu, Dec 11, 2008 at 11:33 AM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On Thu, Dec 11, 2008 at 11:24 AM, David Kaelbling
> <[EMAIL PROTECTED]> wrote:
>> Will https://issues.apache.org/jira/browse/LUCENE-1486 let people
>> include NOT inside phrases? My custom
Parameterization of return types should be fully back compatible.
Parameterization of input parameters would be run-time compatible (due
to type erasure), but not compile-time compatible.
-Yonik
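
A small sketch of the distinction drawn above, using made-up method names (getFieldNames and countFields are hypothetical, not Lucene APIs). It shows a pre-generics style caller continuing to work against a parameterized return type, a raw argument still being accepted at run time because of erasure, and, in a comment, the kind of call that stops compiling once an input parameter is parameterized.

```java
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {
    // Parameterized return type: callers written against the raw type still compile.
    static List<String> getFieldNames() {
        List<String> names = new ArrayList<>();
        names.add("title");
        return names;
    }

    // Parameterized input: the erased signature is still countFields(List).
    static int countFields(List<String> names) {
        return names.size();
    }

    // Simulates a caller written before generics were added to the API.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static int legacyCaller() {
        List raw = getFieldNames(); // raw-typed caller, compiles unchanged
        // A caller that was already generified differently no longer compiles:
        // countFields(new ArrayList<Object>()); // error: List<Object> is not a List<String>
        return countFields(raw);    // raw argument: unchecked warning only
    }

    // After erasure, both parameterizations share one runtime class.
    static boolean sameRuntimeClass() {
        return new ArrayList<String>().getClass() == new ArrayList<Integer>().getClass();
    }
}
```

Because the erased signatures are unchanged, already-compiled callers keep linking; only recompilation against the new input types can surface errors.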
On Sat, Dec 13, 2008 at 5:07 PM, Michael McCandless
wrote:
>
> Grant Ingersoll wrote:
>
>> IIRC, we al
On Tue, Dec 16, 2008 at 10:00 AM, Michael McCandless
wrote:
> Is your need for IndexReader.clone entirely driven by needing a fast way to
> swap in your own deleted docs?
Could this be done with a FilteredIndexReader subclass that keeps
track of additional deletions?
-Yonik
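
One way to picture the suggestion above is a wrapper that delegates to an underlying reader and overlays its own set of extra deletions. The sketch below is deliberately independent of Lucene: DocSource, NoDeletes and ExtraDeletesView are hypothetical names invented for illustration, and a real implementation would also have to filter term and doc enumeration, which is omitted here.

```java
import java.util.BitSet;

// Hypothetical minimal view of a reader; not a Lucene interface.
interface DocSource {
    int maxDoc();
    boolean isDeleted(int docId);
}

// Trivial underlying source with no deletions of its own.
class NoDeletes implements DocSource {
    private final int maxDoc;
    NoDeletes(int maxDoc) { this.maxDoc = maxDoc; }
    public int maxDoc() { return maxDoc; }
    public boolean isDeleted(int docId) { return false; }
}

// Wrapper that tracks additional deletions without touching the wrapped source.
class ExtraDeletesView implements DocSource {
    private final DocSource in;
    private final BitSet extraDeletes;

    ExtraDeletesView(DocSource in) {
        this.in = in;
        this.extraDeletes = new BitSet(in.maxDoc());
    }

    /** Marks a document as deleted in this view only. */
    void delete(int docId) { extraDeletes.set(docId); }

    public int maxDoc() { return in.maxDoc(); }

    /** A doc is deleted if either the overlay or the wrapped source says so. */
    public boolean isDeleted(int docId) {
        return extraDeletes.get(docId) || in.isDeleted(docId);
    }
}
```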
On Fri, Dec 19, 2008 at 5:47 PM, Mark Miller wrote:
> Congrats Uwe! Welcome. Can't wait to see what kind of magic you can work
> with trie and local lucene! (if indeed there is magic to be worked there)
+1
-Yonik
On Fri, Jan 9, 2009 at 3:31 PM, Shalin Shekhar Mangar
wrote:
> If we forget the bytecode modification for a moment, how much cost does this
> add to Lucene when used by a real application with slf4j logging? (e.g. Solr
> uses the jdk adapter and no-op adapter cannot be used)
AFAIK, the infostream
Can we please let this thread die.
-Yonik
-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
I've taken a quick peek at TrieUtils/TrieRangeQuery - nice work Uwe!
One general comment is that it seems like TrieUtils tries to do a
little too much for users (and in the process covers up some
functionality).
For example, whether to encode values in two fields and exactly what
those fields are
On Fri, Feb 6, 2009 at 6:18 PM, Uwe Schindler wrote:
>> Encoding a slice per character makes the code simpler, but increases
>> the size of the index... but perhaps not enough to worry about in
>> practice?
>
> This is correct. For 2bit and 4bit there is a lot of overhead by this, but
> there is n
On Fri, Feb 6, 2009 at 6:18 PM, Uwe Schindler wrote:
> The encoding of the values
> into two different field names does the trick for the whole range query.
> Removing the code that generates the field in exactly that way would remove
> the idea behind TrieRangeFilter.
Allowing the ability to spec
After an NPE and a quick gander at SegmentReader, it looks like it's
trying to clone the norms for *all* fields, regardless of if they are
indexed.
The following is the simplest patch that seems to fix things, but I'm
wondering if other changes might be warranted (like where
fieldNormsChanged is se
It looks like this was introduced just recently in LUCENE-1314.
I've just committed this fix along with test modifications that fail
w/o this patch.
-Yonik
On Fri, Feb 6, 2009 at 10:15 PM, Yonik Seeley
wrote:
> After an NPE and a quick gander at SegmentReader, it looks like it'
On Sat, Feb 7, 2009 at 6:04 AM, Uwe Schindler wrote:
> An optimization might be to remove
> the lower 0 bits from the string, but it would not be needed. The strings
> are unique for one precision (no difference between 0-bits there or not).
Yes, one would certainly want to remove trailing bits t
On Sat, Feb 7, 2009 at 6:04 AM, Uwe Schindler wrote:
> The field names could be changed, sure, the small performance optimization
> is in TrieRangeFilter: The splitting of the range is done in a way, to not
> seek back and forward in the Term list, just in forward direction. This is
> only possibl
On Sat, Feb 7, 2009 at 6:04 AM, Uwe Schindler wrote:
>> I understand how it works and how one would need to configure it such
>> that it be sortable if needed - but my point was really much more
>> about allowing people to do things differently if needed.
>
> Propose an API to generate the documen
Actually, I think we can totally do away with "variants" of 2,4,8 bits
and make it completely generic, able to support any slice size from 1
to 63 bits.
I'll work up some prototype code and post it in the original TrieUtils
JIRA issue.
-Yonik
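
To make "any slice size from 1 to 63 bits" concrete, here is a minimal sketch of one possible reading: for a 64-bit value, each precision level drops another sliceBits low-order bits. This is an assumption-laden illustration, not the prototype posted to JIRA; the name precisionLevels is hypothetical, and sign handling and encoding the levels into index terms are left out.

```java
class SliceDemo {
    /**
     * Returns the value at each precision level. Level i has the lowest
     * i * sliceBits bits shifted away, so level 0 is the full value.
     */
    static long[] precisionLevels(long value, int sliceBits) {
        if (sliceBits < 1 || sliceBits > 63) {
            throw new IllegalArgumentException("sliceBits must be in 1..63");
        }
        int levels = (64 + sliceBits - 1) / sliceBits; // ceil(64 / sliceBits)
        long[] out = new long[levels];
        for (int i = 0; i < levels; i++) {
            int shift = i * sliceBits; // always below 64 for i < levels
            out[i] = value >>> shift;  // unsigned shift keeps the high bits only
        }
        return out;
    }
}
```

With a slice size of 4 this yields 16 levels, with 8 it yields 8, and a size that does not divide 64 evenly simply leaves a shorter top slice.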
On Sat, Feb 7, 2009 at 9:58 AM, Yoni
On Sat, Feb 7, 2009 at 12:26 PM, Uwe Schindler wrote:
>> To optimize index space, one would want to "right justify" the encoded
>> number for any bit range to minimize variation on the left - this
>> plays into lucene's prefix compression.
The prototype code I just posted in JIRA does this. For
On Sat, Feb 7, 2009 at 12:29 PM, Uwe Schindler wrote:
> This is only a minimal optimization, suitable for very large indexes. The
> problem is: if you have many terms in highest precision (a lot of different
> double values), seeking is more costly if you jump from higher to lower
> precisions.
+1
I plugged this RC into Solr 1.3 and everything looks good.
Signatures also look good (after importing KEYS.txt... it would be
easier to verify if you uploaded your public key to pgp.mit.edu
though)
-Yonik
http://www.lucidimagination.com
On Wed, Mar 4, 2009 at 5:27 PM, Michael McCandless
wro
On Thu, Mar 5, 2009 at 3:37 PM, Michael McCandless
wrote:
> Yonik Seeley wrote:
>
>> Signatures also look good (after importing KEYS.txt... it would be
>> easier to verify if you uploaded your public key to pgp.mit.edu
>> though)
>
> I did upload it (back for 2.4.0).
On Mon, Mar 23, 2009 at 11:10 AM, Michael McCandless
wrote:
> 4. Move contrib/* under src/java/*, updating the javadocs to state
> back compatibility promises per class/package.
- contrib has always had a lower bar and stuff was committed under
that lower bar - there should be no blanket
On Tue, Mar 31, 2009 at 12:26 PM, Grant Ingersoll wrote:
> What's the benefit of collation?
AFAIK, the main reason is to handle multi-valued fields.
The need to sort partially stems from the fact that the Document class
does not explicitly handle multi-valued fields.
Solr must also sort/hash the
On Wed, Apr 22, 2009 at 9:33 PM, Erick Erickson wrote:
> So, according to the coverage report, there are two methods that
> are never executed by the unit tests (actually 4, 2 that operate on
> ints and 2 that operate on longs), isPowerOfTwo and
> nextHighestPowerOfTwo. nextHighestPowerOfTwo is es
On Fri, Apr 24, 2009 at 6:20 PM, eks dev wrote:
> just do not forget to use -1 < doc instead of -1 != doc
Perhaps doc >= 0 instead of doc != -1?
The crux of it is that status flags (result positive, negative, or
zero) are set by many operations - hence a compare/test operation can
often be elimina
On Thu, Apr 30, 2009 at 4:44 PM, Earwin Burrfoot wrote:
> Did I miss something, or when trunk switched to collecting on
> SegmentReaders we've lost proper scores?
> I mean, before score depended on TF calculated across all the index,
> and now it depends on TF for a given segment (yup, unless I mi
On Fri, May 15, 2009 at 1:06 PM, Michael McCandless
wrote:
> Otherwise, I only know of one other intermittent failure for
> TestStressIndexing2.testRandomIWReader, which is sometimes it fails to
> close all files it had opened... haven't gotten to the bottom of that
> one yet.
Is there a JIRA iss
On Mon, May 18, 2009 at 5:06 PM, Michael McCandless
wrote:
> * StopFilter should enable position increments by default
Is this one an actual improvement in the general case?
A query of "foo bar" then wouldn't match a document with "foo and
bar", but a query of "foo the bar" would.
-Yonik
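
The effect described above can be reproduced with a toy model of positions. The sketch is not Lucene's StopFilter or PhraseQuery: the stop word set, the analyze and phraseMatches helpers, and the keepGaps flag are all hypothetical, and it assumes the same analysis is applied to both the query and the document with exact (zero slop) phrase matching.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class Token {
    final String term;
    final int pos;
    Token(String term, int pos) { this.term = term; this.pos = pos; }
}

class PositionDemo {
    static final Set<String> STOP = new HashSet<>(Arrays.asList("and", "the"));

    /** Drops stop words; with keepGaps a removed word still consumes a position. */
    static List<Token> analyze(String text, boolean keepGaps) {
        List<Token> out = new ArrayList<>();
        int pos = -1;
        for (String word : text.split(" ")) {
            if (STOP.contains(word)) {
                if (keepGaps) pos++;
                continue;
            }
            pos++;
            out.add(new Token(word, pos));
        }
        return out;
    }

    /** Exact phrase match: same terms at the same relative positions. */
    static boolean phraseMatches(String query, String doc, boolean keepGaps) {
        List<Token> q = analyze(query, keepGaps);
        List<Token> d = analyze(doc, keepGaps);
        if (q.isEmpty()) return false;
        Map<Integer, String> byPos = new HashMap<>();
        for (Token t : d) byPos.put(t.pos, t.term);
        for (Token start : d) {
            int offset = start.pos - q.get(0).pos;
            boolean ok = true;
            for (Token t : q) {
                if (!t.term.equals(byPos.get(t.pos + offset))) { ok = false; break; }
            }
            if (ok) return true;
        }
        return false;
    }
}
```

With keepGaps set, "foo bar" fails against "foo and bar" because the document has bar two positions after foo, while "foo the bar" matches since its own removed stop word leaves the same gap. Without gaps, both queries match.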
On Tue, May 19, 2009 at 8:50 AM, Robert Muir wrote:
> in my tests the problem seemed to boil down to iteration of a sparse
> openbitset... so maybe the filter approach is still an option but when #
> docs is small some other doc id set impl is used?
Directly using the BooleanQuery skips any inter
Selecting backward compatibility vs latest and greatest could be done
w/o Settings (a simple static int containing the version number to act
like). It seems like the Settings debate should be based on its own
merits.
-Yonik
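
A rough sketch of what "a simple static int containing the version number to act like" might look like. Everything here is hypothetical: the class CompatVersion, its constants, and the example default are invented for illustration and do not correspond to any existing Lucene class.

```java
// Hypothetical sketch; not an actual Lucene class.
class CompatVersion {
    static final int V_2_4 = 24;
    static final int V_2_9 = 29;

    // Defaults to "latest and greatest"; an application that wants older
    // behavior sets this once at startup.
    static volatile int actAs = V_2_9;

    /** Example of a default that changed in a (hypothetical) later version. */
    static boolean newDefaultEnabled() {
        return actAs >= V_2_9;
    }
}
```

Code that needs a version-dependent default consults the static value, so no Settings object has to be passed around. The trade-off is that the setting is global to the JVM.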
On Tue, May 19, 2009 at 2:04 PM, Michael McCandless
wrote:
> On Tue, May 19, 2009 at 9:34 AM, Yonik Seeley
> wrote:
>
>> Selecting backward compatibility vs latest and greatest could be done
>> w/o Settings (a simple static int containing the version number to act
>>
On Mon, May 18, 2009 at 8:06 AM, Michael McCandless
wrote:
> Yonik is there anything in Solr that might not like this change?
Yep, there is :-) Should be very easy to work around though.
-Yonik
On Tue, May 19, 2009 at 2:29 PM, Shai Erera wrote:
> Is this the time and place to re-raise a previous discussion about moving
> SweetSpotSimilarity to core and move to use it?
SweetSpotSimilarity wouldn't make a good default. It's a flat topped
hill that falls suddenly off on either side. Shor
On Tue, May 19, 2009 at 4:33 PM, Michael McCandless
wrote:
> On Tue, May 19, 2009 at 2:27 PM, Yonik Seeley
> wrote:
>> On Tue, May 19, 2009 at 2:04 PM, Michael McCandless
>> wrote:
>>> On Tue, May 19, 2009 at 9:34 AM, Yonik Seeley
>>> wrote:
>>>
On Wed, May 20, 2009 at 7:22 AM, Michael McCandless
wrote:
> So I think you're suggesting something like this: when you use Lucene,
> if you want "latest and greatest" defaults, do nothing.
>
> If instead you want defaults to match a particular past minor release,
> you must call (say) LuceneVersi
On Wed, May 20, 2009 at 11:46 AM, Mark Miller wrote:
> Marvin Humphrey wrote:
>>
>> Yeesh, that's evil. :(
>>
>> It will be sweet, sweet justice if one of your own projects gets infected
>> by
>> the kind of action-at-a-distance bug you're so blithely unconcerned about
>
> Heh. Thats a bit over t
On Wed, May 20, 2009 at 3:27 PM, Shai Erera wrote:
> I noticed Document does not have a clear() method, to remove all the Fields
> set on it.
Document's state is so simple (a List and a boost), reuse doesn't seem worth it.
What if, instead, we allowed the List to be passed in via Document's
con
er applications, the number of fields may be much larger than in the
> current benchmark impls, where it becomes even more important.
>
> Passing a list of Fields will save the Field allocations (assuming the app
> caches them on the outside) but still require Document allocation. Wh
On Wed, May 20, 2009 at 4:31 PM, Shai Erera wrote:
> A personal example - I wrote an Analyzer which includes lots of code (lots
> of TokenFilters, Tokenizers etc.). Then I see that the whole TokenStream API
> is deprecated and will be replaced.
Yeah, that one is going to be causing some headaches
Why do stuff like this? Null params are almost never valid unless
documented... I dislike cluttering up code with validity checks,
slightly penalizing users who use the APIs correctly. I recognize
that I may be in the minority though.
But in this specific instance, the caller will get an immedia
I'm not a lawyer, so I dislike trying to nail down every detail in
writing and try to solve future problems in the abstract.
Lucene has never really been 100% back compatible... we've just tried
to keep it that way... it's more of a mindset than a reality, and I'm
wary of changing that mindset too
On Fri, May 22, 2009 at 1:22 PM, Michael McCandless
wrote:
> (That said, unrelated to this discussion, I would actually like to
> record per-segment which version of Lucene wrote the segment; this
> would be very helpful when debugging issues like LUCENE-1474 where I
> need to know if the segments
I'm attempting to switch Solr to use the new Collector framework to
get per-segment sorting and have been hitting some issues.
The latest is a function query log(val) which produces both NaN and
-Infinity values, which kill the TopScoreDocCollector (invalid docids
are produced).
results = {org.apa
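
For reference, a tiny sketch of how log(val) yields these values in Java and why they are awkward for score comparisons. The helper names are hypothetical, and the "beats the current worst" check is only an assumed stand-in for what a top-N collector might do. It is not taken from TopScoreDocCollector.

```java
class ScoreEdgeCases {
    /** Models a function query score of log(val). */
    static float logScore(double val) {
        return (float) Math.log(val);
    }

    /** Assumed style of check: does this hit beat the current worst entry? */
    static boolean beats(float score, float worst) {
        return score > worst;
    }
}
```

Math.log(0.0) is -Infinity and Math.log of a negative number is NaN. A -Infinity score never compares greater than a -Infinity placeholder, and every ordered comparison involving NaN is false, so under this assumed check neither kind of hit would ever displace such a placeholder.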
On Tue, May 26, 2009 at 9:52 AM, Shai Erera wrote:
> We've decided in 1575 to pre-populate HitQueue with sentinel values with
> score = Float.NEG_INF, as we assumed these scores will not be produced. TSDC
> instantiates HitQueue with pre-filling turned on.
>
> Is NEG_INF a valid score for you?
It
On Tue, May 26, 2009 at 12:16 PM, Michael McCandless
wrote:
> What other issues are you hitting?
I hit an NPE when using old-style sort comparators.
The main thing is that I didn't anticipate having to rewrite all the
custom sort comparators Solr has.
There are multiple test cases still failing..
FYI, the upgrading work is going on in
https://issues.apache.org/jira/browse/SOLR-
-Yonik
http://www.lucidimagination.com
On Tue, May 26, 2009 at 12:29 PM, Yonik Seeley
wrote:
> On Tue, May 26, 2009 at 12:16 PM, Michael McCandless
> wrote:
>> What other issues are you hittin
On Tue, May 26, 2009 at 1:44 PM, Michael McCandless
wrote:
> On Tue, May 26, 2009 at 12:29 PM, Yonik Seeley
> wrote:
>> On Tue, May 26, 2009 at 12:16 PM, Michael McCandless
>> wrote:
>>> What other issues are you hitting?
>>
>> I hit an NPE when using old-
On Tue, May 26, 2009 at 2:22 PM, Michael McCandless
wrote:
> Hmm -- IndexSearcher tries to detect when SortComparatorSource is
> used, and drive the search with the toplevel reader, so that code is
> not supposed to be reached. Do you remember what tickled it?
Solr's search code is now using the
We're aiming for a Solr release in the next few weeks (as usual, we're
6 months behind when we wanted to make the release).
The catch is that Solr depends on Lucene 2.9, and there have been a
*lot* of changes. We're currently on r779312 (upgraded today).
I'll add a note to Solr to warn users fro
On Thu, May 28, 2009 at 5:35 AM, Michael McCandless
wrote:
> The IndexWriter diagnostics (LUCENE-1654: recording Lucene version,
> Java/OS version, etc into each segment created) also bumped the index
> file format.
>
> And LUCENE-1623 (fixing back-compat issue w/ field names that have
> non-ascii
On Thu, May 28, 2009 at 2:56 AM, Shai Erera wrote:
> If by changes you also mean deprecated features, then take a look at
> LUCENE-1614 - if you have your own Scorers/DISIs, you might want to
> implement the new methods, since the current ones are deprecated.
Yes, we have our own Scorers, but cha
On Thu, May 28, 2009 at 3:30 PM, Michael McCandless
wrote:
> This is exactly why we added IndexReaderWarmer -- it pre-warms a newly
> merged segment before committing to SegmentInfos.
>
> So, while such warming is happening, if getReader() is called, the
> returned reader will still read the old s
On Thu, May 28, 2009 at 4:18 PM, Michael McCandless
wrote:
> Newly added docs are still free to make new segments, and be reopened,
> while this warming is taking place.
>
> So, getReader() will wait for newly added/deleted docs to be flushed &
> reopened, but will not wait for any running merges
On Sat, May 30, 2009 at 1:27 PM, Mark Miller wrote:
> Is there a valid use case?
That was my question too... I really can't think of one.
Maybe we should leave it out until there is actually a need for it.
-Yonik
http://www.lucidimagination.com