2.3.0 announcement draft

2008-01-23 Thread Michael Busch
Hi Team, I just uploaded the release files to the mirrors and it should be available for download within the next 24 hours. So tomorrow I'll send an announcement to java-user and [EMAIL PROTECTED], and also add a news entry to the website. I prepared a draft and picked a few "highlights" of this

Re: [VOTE] Release Lucene 2.3.0 Take 2

2008-01-23 Thread Michael Busch
Thanks everyone for voting! We have more than 3 positive votes from PMC members. Therefore I just published the release artifacts to the mirrors. Lucene 2.3.0 should be available for download within the next 24 hours. Thanks everyone for the hard work! -Michael Yonik Seeley wrote: > On Jan 20, 2

Build failed in Hudson: Lucene-trunk #348

2008-01-23 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/348/changes Changes: [buschmi] Update version number for nightly maven snapshots. [buschmi] Add 2.3.0 release to doap file. -- [...truncated 3319 lines...] [javac] assertTrue(gotException)

Re: [ANNOUNCE] New Build Server

2008-01-23 Thread Grant Ingersoll
Yes, I will fix the new build tomorrow, hopefully, if not, by this weekend. Worst case is we miss a few nights of nightly builds, but given that a new release is available, people should have their hands full. -Grant On Jan 23, 2008, at 9:38 PM, Michael Busch wrote: Grant Ingersoll wrot

Re: Back Compatibility

2008-01-23 Thread Grant Ingersoll
Yes, I agree these are what it is about (despite the divergence into locking). As I see it, the question is about whether we should try to do major releases on the order of a year, rather than the current 2+ year schedule, and also how best to handle bad behavior when producing tokens that pr

Re: [ANNOUNCE] New Build Server

2008-01-23 Thread Michael Busch
Grant Ingersoll wrote: > We need to fix the JUnit thing first. I am just not sure how to best > handle it while in mid-release. OK, the release is almost done. We have five binding votes and the vote lasted about 72 hours now. I'll publish the artifacts soon and announce 2.3.0 tomorrow and updat

Re: [ANNOUNCE] New Build Server

2008-01-23 Thread Grant Ingersoll
We need to fix the JUnit thing first. I am just not sure how to best handle it while in mid-release. -Grant On Jan 23, 2008, at 6:56 PM, Michael Busch wrote: Hi Nigel, thanks for your work! It looks like we have to update these two files: - /www/lucene.apache.org/java/docs/api/.htaccess -

[jira] Commented: (LUCENE-1121) Use nio.transferTo when copying large blocks of bytes

2008-01-23 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12561886#action_12561886 ] Mark Miller commented on LUCENE-1121: - Here are some more results from a windows xp an
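A minimal sketch of the technique LUCENE-1121 is about: a plain file-to-file copy with java.nio's FileChannel.transferTo. It is written against Java 7+ for brevity, whereas Lucene at the time targeted Java 1.4. The class name ChannelCopy is hypothetical and this is not the patch attached to the issue. Because transferTo may move fewer bytes than requested, the copy loops until the whole range has been transferred.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: copies a file using FileChannel.transferTo. */
final class ChannelCopy {

    /** Copies source to target (overwriting it) and returns the number of bytes copied. */
    static long copy(Path source, Path target) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long position = 0;
            while (position < size) {
                // transferTo reports how many bytes it actually moved; it may be fewer than asked.
                long n = in.transferTo(position, size - position, out);
                if (n <= 0) {
                    break; // source shrank underneath us; stop instead of spinning
                }
                position += n;
            }
            return position;
        }
    }
}
```

Callers can compare the returned count with the source size to detect a short copy.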

Re: Site

2008-01-23 Thread Michael Busch
Yonik Seeley wrote: > On Jan 23, 2008 12:09 PM, Michael Busch <[EMAIL PROTECTED]> wrote: > > It doesn't actually work for some of the file types (like > style-sheets) though, so I manually do a dos2unix on those to avoid > extra commits. > Strange... why doesn't it work for style-sheets? -Micha

Re: [ANNOUNCE] New Build Server

2008-01-23 Thread Michael Busch
Hi Nigel, thanks for your work! It looks like we have to update these two files: - /www/lucene.apache.org/java/docs/api/.htaccess - /www/lucene.apache.org/java/docs/nightly/api/.htaccess to redirect to the new location: http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/. I'll make th

[ANNOUNCE] New Build Server

2008-01-23 Thread Nigel Daley
http://lucene.zones.apache.org:8080/hudson/ has been retired. A new Apache-wide Hudson build server is now up at http://hudson.zones.apache.org/hudson/ Please adjust your bookmarks. All the old builds have been moved over so that history is maintained. Please let me know if you see any probl

Re: Back Compatibility

2008-01-23 Thread DM Smith
Top posting because this is a response to the thread as a whole. It appears that this thread has identified some different reasons for "needing" to break compatibility: 1) A current behavior is now deemed bad or wrong. Examples: the silent truncation of large documents or an analyzer that wor

Re: Back Compatibility

2008-01-23 Thread Michael McCandless
Right. But, that can, and should, be done outside of the Lucene core. Mike robert engels wrote: You must get the write lock before opening the reader if you want transactional consistency and are performing updates. No other way to do it. Otherwise: A opens reader. B opens reader. A per

Re: Back Compatibility

2008-01-23 Thread robert engels
The statement upon rereading seems much stronger than intended. You are correct, but I think the number of users that become contributors is still far less than the number of users. The only abandonment of the users was from the standpoint of maintaining a legacy API. The users are free to

Re: Back Compatibility

2008-01-23 Thread robert engels
I don't think I can say that this needs to happen now either. :) An interesting question to answer would be: If Lucene did not exist, and given all of the knowledge we have, we decided to create a Java based search engine, would the API look like it does today? The answer may be yes. I dou

Re: Back Compatibility

2008-01-23 Thread robert engels
You must get the write lock before opening the reader if you want transactional consistency and are performing updates. No other way to do it. Otherwise: A opens reader. B opens reader. A performs query, decides an update is needed based on results. B performs query, decides an update is needed
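A toy model of the read-decide-update race described in this message, assuming an in-memory map stands in for the index and a ReentrantLock stands in for write.lock (both are illustrative assumptions, not Lucene classes). Holding the lock from the read through the write means B cannot act on a view that A is about to invalidate, so no update is lost.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical toy: read-decide-update while holding a single write lock. */
final class LockedUpdate {
    private final Map<String, Integer> index = new HashMap<>();  // stand-in for the index
    private final ReentrantLock writeLock = new ReentrantLock(); // stand-in for write.lock

    /** Reads the current value and writes value + 1, holding the lock across both steps. */
    void incrementConsistently(String key) {
        writeLock.lock(); // acquired BEFORE reading, as the message argues
        try {
            int seen = index.getOrDefault(key, 0); // "open reader" and query
            index.put(key, seen + 1);              // update decided from what was read
        } finally {
            writeLock.unlock();
        }
    }

    int get(String key) {
        writeLock.lock();
        try {
            return index.getOrDefault(key, 0);
        } finally {
            writeLock.unlock();
        }
    }

    /** Runs several threads doing read-decide-update and returns the final count. */
    static int runConcurrently(int threads, int perThread) throws InterruptedException {
        LockedUpdate u = new LockedUpdate();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    u.incrementConsistently("count");
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        return u.get("count");
    }
}
```

Moving the lock() call to after the read would reintroduce exactly the A/B interleaving the message describes.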

RE: Back Compatibility

2008-01-23 Thread Steven A Rowe
Hi robert, On 01/23/2008 at 4:55 PM, robert engels wrote: > If the users are "just dropping in a new version" they are not > contributing to the community... I think just the opposite, they are > parasites. I reject your characterization of passive users as "parasites"; I suspect that you intend

Re: Back Compatibility

2008-01-23 Thread Michael McCandless
robert engels wrote: I think you are incorrect. I would guess the number of people/organizations using Lucene vs. contributing to Lucene is much greater. The contributors work in head (should IMO). The users can select a particular version of Lucene and code their apps accordingly. They

[jira] Commented: (LUCENE-1145) DisjunctionSumScorer small tweak

2008-01-23 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12561836#action_12561836 ] Eks Dev commented on LUCENE-1145: - Well, I do not know how it behaves on earlier jvm-s and

Re: Back Compatibility

2008-01-23 Thread Michael McCandless
Chris Hostetter wrote: : I do like the idea of a static/system property to match legacy : behavior. For example, the bugs around how StandardTokenizer : mislabels tokens (eg LUCENE-1100), this would be the perfect solution. : Clearly those are silly bugs that should be fixed, quickly, with

Re: Back Compatibility

2008-01-23 Thread Michael McCandless
robert engels wrote: Thanks. So all writers still need to get the write lock, before opening the reader in order to maintain transactional consistency. I don't understand what you mean by "before opening the reader"? A writer acquires the write.lock before opening. Readers do not, un

Re: Back Compatibility

2008-01-23 Thread robert engels
I think you are incorrect. I would guess the number of people/organizations using Lucene vs. contributing to Lucene is much greater. The contributors work in head (should IMO). The users can select a particular version of Lucene and code their apps accordingly. They can also back-port fea

Re: Back Compatibility

2008-01-23 Thread Chris Hostetter
: I guess I don't see the back-porting as an issue. Only those that want to need : to do the back-porting. Head moves on... I view it as a potential risk to the overall productivity of the community. If upgrading from A to B is easy, people (in general) won't spend a lot of time/effort backport

Re: Back Compatibility

2008-01-23 Thread robert engels
I guess I don't see the back-porting as an issue. Only those that want to need to do the back-porting. Head moves on... On Jan 23, 2008, at 2:00 PM, Chris Hostetter wrote: : I do like the idea of a static/system property to match legacy : behavior. For example, the bugs around how Standard

Re: Back Compatibility

2008-01-23 Thread robert engels
Thanks. So all writers still need to get the write lock, before opening the reader in order to maintain transactional consistency. Was there performance testing done on the lockless commits with heavy contention? I would think that reading the directory to find the latest segments file wo
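For context on the question about reading the directory to find the latest segments file: with lockless commits each commit writes a new segments_N file, and a reader lists the directory and picks the highest generation. A minimal sketch, assuming N is the generation written in base 36 (Character.MAX_RADIX), as in Lucene 2.x file naming. The class and method names are hypothetical and this is not Lucene's SegmentInfos code. The cost is one directory listing plus a linear scan of the file names.

```java
import java.util.List;

/** Hypothetical helper: finds the newest commit point among segments_N file names. */
final class LatestSegments {
    private static final String PREFIX = "segments_";

    /** Returns the highest generation N among "segments_N" names, or -1 if there is none. */
    static long latestGeneration(List<String> fileNames) {
        long max = -1;
        for (String name : fileNames) {
            // "segments.gen" and per-segment files such as "_0.cfs" do not match this prefix.
            if (!name.startsWith(PREFIX)) {
                continue;
            }
            try {
                long gen = Long.parseLong(name.substring(PREFIX.length()), Character.MAX_RADIX);
                max = Math.max(max, gen);
            } catch (NumberFormatException e) {
                // Unexpected suffix: not a commit file, skip it.
            }
        }
        return max;
    }

    /** Builds the file name for a generation, assuming base-36 encoding. */
    static String fileNameFor(long gen) {
        return PREFIX + Long.toString(gen, Character.MAX_RADIX);
    }
}
```

Comparing parsed generations rather than raw file names matters: lexicographically "segments_10" (generation 36) sorts before "segments_z" (generation 35).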

Re: Back Compatibility

2008-01-23 Thread Chris Hostetter
: I do like the idea of a static/system property to match legacy : behavior. For example, the bugs around how StandardTokenizer : mislabels tokens (eg LUCENE-1100), this would be the perfect solution. : Clearly those are silly bugs that should be fixed, quickly, with this : back-compatible mode t
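A minimal sketch of the "static/system property to match legacy behavior" idea, assuming a hypothetical property name and a made-up stand-in rule. This is not StandardTokenizer's grammar or the fix for LUCENE-1100. By default the corrected type is reported; setting the property restores the old label.

```java
/**
 * Hypothetical back-compat switch driven by a system property.
 * The property name and the dotted-word rule below are illustrative stand-ins.
 */
final class LegacyMode {
    static final String PROPERTY = "example.legacyTokenTypes"; // hypothetical name

    /** Read on every call so it can be toggled in tests; real code might cache it once. */
    static boolean useLegacy() {
        return Boolean.getBoolean(PROPERTY); // true only if the property equals "true" (ignoring case)
    }

    /** Toy typing rule: dotted words get the corrected type unless legacy mode is on. */
    static String tokenType(String token) {
        boolean dotted = token.indexOf('.') > 0;
        if (!dotted) {
            return "<ALPHANUM>";
        }
        return useLegacy() ? "<ACRONYM>" : "<HOST>";
    }
}
```

Users who depend on the old output would start the JVM with -Dexample.legacyTokenTypes=true until they can reindex.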

Re: Back Compatibility

2008-01-23 Thread Michael McCandless
robert engels wrote: I guess I don't understand what a commit lock is, or what its purpose is. It seems the write lock is all that is needed. The commit.lock was used to guard access to the "segments" file. A reader would acquire the lock (blocking out other readers and writers) when r

[jira] Commented: (LUCENE-1121) Use nio.transferTo when copying large blocks of bytes

2008-01-23 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12561789#action_12561789 ] Michael McCandless commented on LUCENE-1121: That's interesting ... I'll test

Shipping JUnit

2008-01-23 Thread Grant Ingersoll
With the new migration of Hudson to a common "zones" machine, we can no longer rely on JUnit to be in ANT lib for testing purposes. I would like to add a lib directory to the trunk and add our junit dependency to it, plus change the build file appropriately to use that jar file. It also e

[jira] Commented: (LUCENE-1121) Use nio.transferTo when copying large blocks of bytes

2008-01-23 Thread Doug Cutting (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12561778#action_12561778 ] Doug Cutting commented on LUCENE-1121: -- For Hadoop, we've seen significant performanc

Re: Site

2008-01-23 Thread Grant Ingersoll
Site is restored. You probably did the native thing, Yonik, but we recently created a new site directory at a higher level (same level as trunk), and that is probably where the problem is. -Grant On Jan 23, 2008, at 12:17 PM, Yonik Seeley wrote: On Jan 23, 2008 12:09 PM, Michael Busch <

Re: Back Compatibility

2008-01-23 Thread robert engels
I guess I don't understand what a commit lock is, or what its purpose is. It seems the write lock is all that is needed. If you still need a write lock, then what is the purpose of "lockless" commits? You can get consistency if all writers get the write lock before performing any read.

Re: Unique doc ids

2008-01-23 Thread Nadav Har'El
Hi Michael, On Tue, Jan 22, 2008, Michael Busch wrote about "Unique doc ids": > the question of how to delete with IndexWriter using doc ids is >... > mapping from the dynamic doc ids to the new unique ones. We would also > have to store a reverse mapping (UID -> ID) in the index - we could use >

[jira] Commented: (LUCENE-1145) DisjunctionSumScorer small tweak

2008-01-23 Thread Paul Elschot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12561752#action_12561752 ] Paul Elschot commented on LUCENE-1145: -- When I wrote it, using the queueSize variable

[jira] Updated: (LUCENE-1147) add option to CheckIndex to only check certain segments

2008-01-23 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1147: --- Attachment: LUCENE-1147.patch Attached patch. All tests pass. I'll commit after 2.

[jira] Created: (LUCENE-1147) add option to CheckIndex to only check certain segments

2008-01-23 Thread Michael McCandless (JIRA)
add option to CheckIndex to only check certain segments --- Key: LUCENE-1147 URL: https://issues.apache.org/jira/browse/LUCENE-1147 Project: Lucene - Java Issue Type: Improvement

Re: Back Compatibility

2008-01-23 Thread Michael McCandless
robert engels wrote: Maybe I don't understand lockless commits then. I just don't think you can enforce transactional consistency without either 1) locking, or 2) optimistic collision detection. I could be wrong here, but this has been my experience. By effectively removing the locking req

Re: Site

2008-01-23 Thread Yonik Seeley
On Jan 23, 2008 12:09 PM, Michael Busch <[EMAIL PROTECTED]> wrote: > Hi Grant, > > thanks for taking care of this! I'm using forrest-0.8. > > Hmm, in your commit it looks like that not only a few lines of each file > were modified, but EVERY line. I built the docs on Windows, so maybe > it's a prob

Re: Site

2008-01-23 Thread Michael Busch
Hi Grant, thanks for taking care of this! I'm using forrest-0.8. Hmm, in your commit it looks like that not only a few lines of each file were modified, but EVERY line. I built the docs on Windows, so maybe it's a problem with the line endings. I think we should set the eol-style to native for al
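For the manual fix-up mentioned in this thread (a dos2unix pass over generated files), a minimal Java sketch, assuming the files use an ASCII-compatible encoding such as UTF-8. The class name is hypothetical. It only collapses CRLF pairs to LF and leaves lone CR bytes alone. Setting svn:eol-style to native, as suggested in this message, addresses the cause; this only cleans up after the fact.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: dos2unix-style CRLF -> LF conversion. */
final class LineEndings {

    /** Drops the CR of every CRLF pair; other bytes are copied unchanged. */
    static byte[] toUnix(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(in.length);
        for (int i = 0; i < in.length; i++) {
            boolean crBeforeLf = in[i] == '\r' && i + 1 < in.length && in[i + 1] == '\n';
            if (!crBeforeLf) {
                out.write(in[i]);
            }
        }
        return out.toByteArray();
    }

    /** Rewrites a file in place; returns true if anything changed. */
    static boolean normalizeFile(Path file) throws IOException {
        byte[] original = Files.readAllBytes(file);
        byte[] fixed = toUnix(original);
        // toUnix only ever removes bytes, so equal length means nothing changed.
        if (fixed.length == original.length) {
            return false;
        }
        Files.write(file, fixed);
        return true;
    }
}
```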

Re: Back Compatibility

2008-01-23 Thread robert engels
Maybe I don't understand lockless commits then. I just don't think you can enforce transactional consistency without either 1) locking, or 2) optimistic collision detection. I could be wrong here, but this has been my experience. By effectively removing the locking requirement, I think you
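Of the two options named in this message, the second, optimistic collision detection, can be sketched as follows, assuming a toy in-memory store where every commit bumps a generation number (the class is hypothetical, not a Lucene API). A writer remembers the snapshot it read, and its commit succeeds only if no other commit landed in between; otherwise it re-reads and retries.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.IntUnaryOperator;

/** Hypothetical toy store demonstrating optimistic concurrency with a commit generation. */
final class OptimisticStore {

    /** Immutable committed state: a generation number plus the stored value. */
    static final class Snapshot {
        final long generation;
        final int value;

        Snapshot(long generation, int value) {
            this.generation = generation;
            this.value = value;
        }
    }

    private final AtomicReference<Snapshot> current = new AtomicReference<>(new Snapshot(0, 0));

    Snapshot read() {
        return current.get();
    }

    /** Commits newValue only if nothing was committed since readAt; false signals a collision. */
    boolean tryCommit(Snapshot readAt, int newValue) {
        Snapshot next = new Snapshot(readAt.generation + 1, newValue);
        // compareAndSet compares references: each commit installs a new Snapshot object.
        return current.compareAndSet(readAt, next);
    }

    /** Read-decide-update loop that retries on collision; returns the number of attempts. */
    int update(IntUnaryOperator change) {
        int attempts = 0;
        while (true) {
            attempts++;
            Snapshot s = read();
            if (tryCommit(s, change.applyAsInt(s.value))) {
                return attempts;
            }
        }
    }
}
```

The A/B scenario from earlier in the thread maps directly: both read generation 0, A commits first, and B's commit is rejected instead of silently overwriting A's change.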

Re: [VOTE] Release Lucene 2.3.0 Take 2

2008-01-23 Thread Yonik Seeley
On Jan 20, 2008 9:34 PM, Michael Busch <[EMAIL PROTECTED]> wrote: > Please vote to officially release the release artifacts located at > http://people.apache.org/~buschmi/staging_area/lucene-2.3.0/ as Lucene > 2.3.0. +1 -Yonik

Re: Site

2008-01-23 Thread Grant Ingersoll
OK, I think I fixed it. I think we need to remove the .svn under /www/l.a.o/java/docs such that the crontab just does it. I made the mistake of doing an svn up on the site directory, when I should have done the export. -Grant On Jan 23, 2008, at 10:50 AM, Grant Ingersoll wrote: I confir

Re: Another DisjunctionSumScorer micro-tweak / simplification?

2008-01-23 Thread eks dev
I have attached patch to LUCENE-1145 just in case. As far as my testing was correct, it is a bit faster, well, at least not slower. - Original Message From: Yonik Seeley <[EMAIL PROTECTED]> To: java-dev@lucene.apache.org Sent: Wednesday, 23 January, 2008 2:59:52 PM Subject: Re: Anothe

[jira] Updated: (LUCENE-1145) DisjunctionSumScorer small tweak

2008-01-23 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eks Dev updated LUCENE-1145: Attachment: DSSQueueSizeOptimization.patch Simplification of the DisjunctionSumScorer. - removed cached f

Re: Site

2008-01-23 Thread Grant Ingersoll
I confirm it is screwed up! Not sure what was done before. Michael, can you fix? Thanks, Grant On Jan 23, 2008, at 9:31 AM, Grant Ingersoll wrote: I think I may have screwed up the site by committing the Apachecon news... Why were there local diffs on the site? -Grant

Re: Back Compatibility

2008-01-23 Thread Yonik Seeley
On Jan 23, 2008 9:53 AM, Mark Miller <[EMAIL PROTECTED]> wrote: > Also, as he mentioned, we really need a good distributed system that > allows for index partitioning. That's the ticket to more enterprise > adoption. Could be Solr's work though... Yes, we're working on that :-) -Yonik

Re: Back Compatibility

2008-01-23 Thread Mark Miller
That's where Robert is confusing me as well. To have XA support you just need to be able to define a transaction, atomically commit, or roll back. You also need a consistent state after any of these operations. LUCENE-1044 seems to guarantee that, and so isn't it more like finishing up needed wor

Site

2008-01-23 Thread Grant Ingersoll
I think I may have screwed up the site by committing the Apachecon news... Why were there local diffs on the site? -Grant - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

Forrest version

2008-01-23 Thread Grant Ingersoll
Hey Michael, I noticed my generation of the site had a lot of diffs from what is checked in, even though I only edited one page. What version of Forrest are you using? -Grant

Re: Another DisjunctionSumScorer micro-tweak / simplification?

2008-01-23 Thread Yonik Seeley
On Jan 23, 2008 8:50 AM, eks dev <[EMAIL PROTECTED]> wrote: > this value gets practically maintained on two places, any reason for that? I > would suggest to use scorerDocQueue.size() uniformly as this method gets > definitely inlined. You're probably right... in the presence of proper inlinin

Re: Back Compatibility

2008-01-23 Thread Michael McCandless
Robert, besides LUCENE-1044 (syncing on commit), what is the Lucene core missing in order for you (or, someone) to build XA compliance on top of it? Ie, you can open a writer with autoCommit=false and no changes are committed until you close it. You can abort the session by calling writer.abort

Re: Unique doc ids

2008-01-23 Thread Yonik Seeley
On Jan 23, 2008 6:34 AM, Michael McCandless <[EMAIL PROTECTED]> wrote: >writer.freezeDocIDs(); >try { > get docIDs from somewhere & call writer.deleteByDocID >} finally { > writer.unfreezeDocIDs(); >} Interesting idea, but would require the IndexWriter to flush the buffer
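A toy model of the freezeDocIDs()/deleteByDocID()/unfreezeDocIDs() proposal quoted in this message. It assumes doc IDs are simply positions in a list and that a compact() step, standing in for a merge, renumbers them. These methods were only proposed in this thread and do not exist in IndexWriter, and the toy does not model the buffer flush Yonik raises. While frozen, deletes only mark documents and compaction is deferred until the last unfreeze, so IDs obtained earlier stay valid.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy writer illustrating the proposed freeze/unfreeze of doc IDs. */
final class FreezableWriter {
    private final List<String> docs = new ArrayList<>();     // list index = current doc ID
    private final List<Boolean> deleted = new ArrayList<>();
    private int freezeCount = 0;
    private boolean compactionPending = false;

    int addDocument(String doc) {
        docs.add(doc);
        deleted.add(Boolean.FALSE);
        return docs.size() - 1;
    }

    void freezeDocIDs() {
        freezeCount++;
    }

    void unfreezeDocIDs() {
        if (freezeCount == 0) {
            throw new IllegalStateException("not frozen");
        }
        freezeCount--;
        if (freezeCount == 0 && compactionPending) {
            compact(); // run the deferred renumbering now that nobody holds IDs
        }
    }

    void deleteByDocID(int docID) {
        if (freezeCount == 0) {
            throw new IllegalStateException("doc IDs may shift; call freezeDocIDs() first");
        }
        deleted.set(docID, Boolean.TRUE);
        compactionPending = true;
    }

    /** Stand-in for a merge: removes deleted docs, shifting later doc IDs down. Deferred while frozen. */
    void compact() {
        if (freezeCount > 0) {
            compactionPending = true;
            return;
        }
        for (int i = docs.size() - 1; i >= 0; i--) {
            if (deleted.get(i)) {
                docs.remove(i);
                deleted.remove(i);
            }
        }
        compactionPending = false;
    }

    String document(int docID) {
        return docs.get(docID);
    }

    int maxDoc() {
        return docs.size();
    }
}
```

The try/finally shape from the quoted proposal is what guarantees unfreezeDocIDs() runs even if a delete throws.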

Another DisjunctionSumScorer micro-tweak / simplification?

2008-01-23 Thread eks dev
method next() is the only place where DisjunctionSumScorer uses scorerDocQueue.size() method, on other places cached variable is used: private int queueSize = -1; // used to avoid size() method calls on scorerDocQueue I have tested if these are really mirrored with: assert(queueSize == scorerD
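To illustrate the simplification under discussion, here is a simplified disjunction iterator in the spirit of DisjunctionSumScorer, borrowing the field name minimumNrMatchers. It assumes each sub-scorer is reduced to a strictly increasing int[] of doc IDs and omits scoring entirely; the class is hypothetical, not Lucene's code. Its next() consults the priority queue's own size() when deciding whether enough sub-iterators remain, so there is no separately cached queueSize field to keep in sync.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical simplified disjunction: a doc matches if at least minimumNrMatchers lists contain it. */
final class SimpleDisjunction {

    /** Cursor over one strictly increasing int[]; stands in for a sub-scorer. */
    private static final class Cursor {
        final int[] docs;
        int pos;

        Cursor(int[] docs) {
            this.docs = docs;
        }

        int doc() {
            return docs[pos];
        }

        boolean advance() {
            return ++pos < docs.length;
        }
    }

    private final PriorityQueue<Cursor> queue =
            new PriorityQueue<>((a, b) -> Integer.compare(a.doc(), b.doc()));
    private final int minimumNrMatchers;
    private int currentDoc = -1;

    SimpleDisjunction(List<int[]> subs, int minimumNrMatchers) {
        if (minimumNrMatchers < 1) {
            throw new IllegalArgumentException("minimumNrMatchers must be >= 1");
        }
        this.minimumNrMatchers = minimumNrMatchers;
        for (int[] s : subs) {
            if (s.length > 0) {
                queue.add(new Cursor(s));
            }
        }
    }

    /** Advances to the next matching doc; returns false when exhausted. */
    boolean next() {
        // queue.size() is the single source of truth; no cached counter to keep in sync.
        while (queue.size() >= minimumNrMatchers) {
            currentDoc = queue.peek().doc();
            int nrMatchers = 0;
            // Pop every cursor positioned on currentDoc, re-inserting those with docs left.
            while (!queue.isEmpty() && queue.peek().doc() == currentDoc) {
                Cursor c = queue.poll();
                nrMatchers++;
                if (c.advance()) {
                    queue.add(c);
                }
            }
            if (nrMatchers >= minimumNrMatchers) {
                return true;
            }
        }
        currentDoc = -1;
        return false;
    }

    int doc() {
        return currentDoc;
    }

    /** Collects all matching doc IDs. */
    static List<Integer> matches(List<int[]> subs, int minimumNrMatchers) {
        SimpleDisjunction d = new SimpleDisjunction(subs, minimumNrMatchers);
        List<Integer> out = new ArrayList<>();
        while (d.next()) {
            out.add(d.doc());
        }
        return out;
    }
}
```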

Re: Unique doc ids

2008-01-23 Thread Grant Ingersoll
On Jan 23, 2008, at 6:34 AM, Michael McCandless wrote: At first it might be optional, +1 There are still applications that don't require a UID, or are static for long enough periods of time that the Lucene internal id is sufficient, so I would hate to impose this on those apps. I thin

Re: Back Compatibility

2008-01-23 Thread Michael McCandless
Catching up here... Re the fracturing when Maven went from v1 -> v2: I think Lucene is a totally different animal. Maven is an immense framework; Lucene is a fairly small "core" set of APIs. I think for these "core" type packages it's very important to keep drop-in compatibility as long as poss

Re: Unique doc ids

2008-01-23 Thread Michael McCandless
Michael, Couldn't we add deleteByQuery to IndexWriter without adding the UID field? Would that be "enough" to make IndexReader read-only (ie, do we still really need to delete by docID from IndexWriter?). If we still need that ... maybe we could extend IndexWriter so that you can hold

Re: Unique doc ids

2008-01-23 Thread Michael Busch
Paul Elschot wrote: > Michael, > > How would IndexWriter.addIndexes() work with unique doc ids? Hi Paul, it would probably be a limitation of this design. The only way I can think of right now to ensure that during an addIndexes() the UIDs don't change is an API in IndexWriter like setMinUID(lon

Re: Unique doc ids

2008-01-23 Thread Michael Busch
Terry Yang wrote: > Hi, Michael, > Your idea is good! But I have a question and thanks for your help! > Hi Terry, > Can you explain more about how you store a reverse UID-->ID? How do you guarantee > UID > can be mapped to the correct dynamic ID? I mean if a docid =5 and then for > some reason changed t
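Terry's question is how a reverse UID -> ID mapping stays correct when doc IDs change. A toy sketch, assuming IDs only change when a merge drops deleted documents and that the reverse map is simply rebuilt afterwards. The class is hypothetical; the thread does not specify how the mapping would actually be stored in the index.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical toy: stable UIDs on top of doc IDs that shift when deleted docs are merged away. */
final class UidMapping {
    private long[] uidByDocId;                                    // forward: doc ID -> UID
    private final Map<Long, Integer> docIdByUid = new HashMap<>(); // reverse: UID -> doc ID

    UidMapping(long[] initialUids) {
        uidByDocId = initialUids.clone();
        rebuildReverse();
    }

    private void rebuildReverse() {
        docIdByUid.clear();
        for (int docId = 0; docId < uidByDocId.length; docId++) {
            docIdByUid.put(uidByDocId[docId], docId);
        }
    }

    long uid(int docId) {
        return uidByDocId[docId];
    }

    /** Returns the current doc ID for a UID, or -1 if the document is gone. */
    int docId(long uid) {
        Integer d = docIdByUid.get(uid);
        return d == null ? -1 : d;
    }

    /** Simulates a merge that drops deleted docs: survivors keep their UID but get compacted doc IDs. */
    void compact(boolean[] deleted) {
        int live = 0;
        for (boolean d : deleted) {
            if (!d) {
                live++;
            }
        }
        long[] next = new long[live];
        int j = 0;
        for (int i = 0; i < uidByDocId.length; i++) {
            if (!deleted[i]) {
                next[j++] = uidByDocId[i];
            }
        }
        uidByDocId = next;
        rebuildReverse(); // every renumbering must refresh the reverse map
    }
}
```

Rebuilding the whole reverse map is the simplest correct choice here; an incremental update would only need to touch documents after the first deletion.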

[jira] Commented: (LUCENE-1121) Use nio.transferTo when copying large blocks of bytes

2008-01-23 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12561609#action_12561609 ] Michael McCandless commented on LUCENE-1121: Sun's JVM, 1.4 (on the Windows XP

Re: Unique doc ids

2008-01-23 Thread Paul Elschot
Michael, How would IndexWriter.addIndexes() work with unique doc ids? Regards, Paul Elschot On Tuesday 22 January 2008 12:07:16, Michael Busch wrote: > Hi Team, > > the question of how to delete with IndexWriter using doc ids is > currently being discussed on java-user > (http://www.gossamer-