[jira] Commented: (LUCENE-2075) Share the Term -> TermInfo cache across threads

2009-11-18 Thread Paul Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779641#action_12779641 ] Paul Smith commented on LUCENE-2075: bq. This cache impl should be able to suppor

[jira] Commented: (LUCENE-1935) Generify PriorityQueue

2009-10-01 Thread Paul Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12761408#action_12761408 ] Paul Smith commented on LUCENE-1935: thanks Uwe, I thought I would regret as

[jira] Commented: (LUCENE-1935) Generify PriorityQueue

2009-10-01 Thread Paul Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12761395#action_12761395 ] Paul Smith commented on LUCENE-1935: I shall perhaps regret asking this, but is t

[jira] Commented: (LUCENE-1749) FieldCache introspection API

2009-07-24 Thread Paul Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735242#action_12735242 ] Paul Smith commented on LUCENE-1749: You know what would be absolute icing on

[jira] Commented: (LUCENE-1741) Make MMapDirectory.MAX_BBUF user configureable to support chunking the index files in smaller parts

2009-07-13 Thread Paul Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730551#action_12730551 ] Paul Smith commented on LUCENE-1741: An algorithm is nice if there are no spec

[jira] Commented: (LUCENE-1342) 64bit JVM crashes on Linux

2008-11-23 Thread Paul Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650051#action_12650051 ] Paul Smith commented on LUCENE-1342: yeah, it's definitely a Sun bug, not

[jira] Updated: (LUCENE-1342) 64bit JVM crashes on Linux

2008-11-23 Thread Paul Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Smith updated LUCENE-1342: --- Attachment: hs_err_pid27882.log hs_err_pid21301.log 2 crash dumps attached

[jira] Commented: (LUCENE-1342) 64bit JVM crashes on Linux

2008-11-18 Thread Paul Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648768#action_12648768 ] Paul Smith commented on LUCENE-1342: java version "1.6.0_10" Java(

[jira] Commented: (LUCENE-1372) Proposal: introduce more sensible sorting when a doc has multiple values for a term

2008-09-04 Thread Paul Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628513#action_12628513 ] Paul Smith commented on LUCENE-1372: bq. I'm not following this argument. W

[jira] Commented: (LUCENE-1372) Proposal: introduce more sensible sorting when a doc has multiple values for a term

2008-09-04 Thread Paul Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628480#action_12628480 ] Paul Smith commented on LUCENE-1372: Having a Document sorted last because it

[jira] Commented: (LUCENE-1282) Sun hotspot compiler bug in 1.6.0_04/05 affects Lucene

2008-07-13 Thread Paul Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613208#action_12613208 ] Paul Smith commented on LUCENE-1282:  Can anyone comment as to whether the JRE 1.

[jira] Commented: (LUCENE-1282) Sun hotspot compiler bug in 1.6.0_04/05 affects Lucene

2008-05-14 Thread Paul Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596964#action_12596964 ] Paul Smith commented on LUCENE-1282: Throwing up an idea here for consideration.

[jira] Commented: (LUCENE-1282) Sun hotspot compiler bug in 1.6.0_04/05 affects Lucene

2008-05-11 Thread Paul Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12595946#action_12595946 ] Paul Smith commented on LUCENE-1282: Another workaround might be to use 

maven snapshots available for 2.3?

2007-08-08 Thread Paul Smith
ppeared to build and test ok. I'm happy to pitch in here. cheers, Paul Smith - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-966) A faster JFlex-based replacement for StandardAnalyzer

2007-07-26 Thread Paul Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515882 ] Paul Smith commented on LUCENE-966: --- We did pretty much the same thing here at Aconex, The tokenization

Re: Lucene upload to Maven 2 repository

2007-06-18 Thread Paul Smith
On 19/06/2007, at 9:58 AM, Michael Busch wrote: Paul Smith wrote: Any chance of adding source jars as artifacts too? Makes the Maven Eclipse plugin rather nice. I appreciate the effort in organizing the artifacts (particularly the older versions). cheers, Paul In German we have

Re: Lucene upload to Maven 2 repository

2007-06-18 Thread Paul Smith
*sigh*, with attachment this time: lucene_pom.2.patch Description: Binary data On 19/06/2007, at 11:42 AM, Paul Smith wrote: Enhanced version of previous patch. Now compiles and executes all unit tests (although some of them are failing for me) mvn -f lucene-core.pom.xml test you can still do a

Re: Lucene upload to Maven 2 repository

2007-06-18 Thread Paul Smith
Enhanced version of previous patch. Now compiles and executes all unit tests (although some of them are failing for me) mvn -f lucene-core.pom.xml test you can still do a package (including source distro) and skip the tests mvn -f lucene-core.pom.xml -Dmaven.test.skip=true package assem

Re: Lucene upload to Maven 2 repository

2007-06-18 Thread Paul Smith
lucene_pom.patch Description: Binary data Attached is a quick patch for the lucene-core pom so that it does compile and package successfully: mvn -f lucene-core.pom.xml package Ends up with a binary jar in the target/ sub-folder mvn assembly:assembly Creates a source distribution in the target folde

Re: Lucene upload to Maven 2 repository

2007-06-18 Thread Paul Smith
h/.m2/repository/org/apache/lucene/lucene-parent/@version@/ [EMAIL PROTECTED]@.pom Am I missing something ? Paul On 19/06/2007, at 10:15 AM, Michael Busch wrote: Paul Smith wrote: I might try and grab the trunk and see if I can work out what's needed to do that.. Paul That'

Re: Lucene upload to Maven 2 repository

2007-06-18 Thread Paul Smith
I'm just kidding, of course! I'll try to take a look at that. However, making these artifacts was already a lot of work and I'm not sure how soon I can work on the source artifacts. I might try and grab the trunk and see if I can work out what's needed to do that.. Paul --

Re: Lucene upload to Maven 2 repository

2007-06-18 Thread Paul Smith
On 19/06/2007, at 6:14 AM, Michael Busch wrote: Hello, looking at JIRA and the email archives I find several people asking us to upload Lucene to the Maven2 repository. Currently there are only the artifacts from Lucene core 1.9.1 and 2.0.0 in the repository. 1.9.1 is even incomplete, as

Re: How to handle servlet-api.jar in build?

2007-06-12 Thread Paul Smith
On 12/06/2007, at 7:07 PM, mark harwood wrote: Thanks for the pointers Paul. I just don't think you can 'package' up a distribution that includes these jars in your distribution. Clearly the binary distribution need not bundle servlet-api.jar - a demo.war file is all that is needed. Howe

Re: How to handle servlet-api.jar in build?

2007-06-12 Thread Paul Smith
On 12/06/2007, at 5:09 PM, markharw00d wrote: As part of the documentation push I was considering putting together an updated demo web app which showed a number of things (indexing, search, highlighting, XML Query templates etc) and was wondering what that might mean to the build system if

Re: Tests, Contribs, and Releases

2007-05-16 Thread Paul Smith
To answer your question, though, I don't see any reason not to make the changes to make the current process more repeatable. Yeah, mod'ing the ant process now is going to be simpler to catch the current problem. Still, I'd check the Gump stuff for Lucene, because I'd be surprised that wo

Re: Tests, Contribs, and Releases

2007-05-16 Thread Paul Smith
want to jump at without careful thought, but might be worth considering. I used to be anti-maven, but since version 2, and since Curt Arnold has been setting up the log4j build environment for maven, I've been quite impressed with its capability. cheers, Paul Smith On 17/0

Re: Large scale sorting

2007-04-09 Thread Paul Smith
A memory saving optimization would be to not load the corresponding String[] in the string index (as discussed previously), but there is currently no way to tell the FieldCache that the strings are unneeded. The String values are only needed for merging results in a MultiSearcher. Yep, which hap

Re: Large scale sorting

2007-04-09 Thread Paul Smith
In our application, we have to sync up the index pretty frequently, the warm-up of the index is killing it. Yep, it speeds up the first sort, but at the cost of making all the others slower (maybe significantly so). That's obviously not ideal but could make use of sorts in larger index

Re: Large scale sorting

2007-04-09 Thread Paul Smith
Now, if we could use integers to represent the sort field values, which is typically the case for most applications, maybe we can afford to have the sort field values stored in the disk and do disk lookup for each document matched? The look up of the sort field value will be as simple as

Re: Large scale sorting

2007-04-09 Thread Paul Smith
On 10/04/2007, at 4:18 AM, Doug Cutting wrote: Paul Smith wrote: Disadvantages to this approach: * It's a lot more I/O intensive I think this would be prohibitive. Queries matching more than a few hundred documents will take several seconds to sort, since random disk accesse

Large scale sorting

2007-04-06 Thread Paul Smith
this is controversial. But, if we wish Lucene to go beyond where it is now, I think we need to start thinking about this particular problem sooner rather than later. Happy Easter to all, Paul Smith

[jira] Commented: (LUCENE-833) Indexing of Subversion Repositories.

2007-03-15 Thread Paul Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12481324 ] Paul Smith commented on LUCENE-833: --- You should try Fisheye! It uses Lucene internally. http://www.cenqua.com

Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects)

2006-12-17 Thread Paul Smith
read with interest! :) cheers, Paul Smith

Re: Is it safe to remove the throw from FastCharStream.refill() ?

2006-10-04 Thread Paul Smith
On 05/10/2006, at 3:34 PM, Doron Cohen wrote: If I read the JIRA issue right, it looks as if this is fixed in Lucene 2.0.1. Is it? If so, where can I download 2.0.1? No 2.0.1 was released (yet). This issue is fixed in the "svn head". Nightly builds that include this (and oth

Re: Is it save to remove the throw from FastCharStream.refill() ?

2006-10-03 Thread Paul Smith
end of the stream for tokenization point of view. I would love to get rid of it, but I think it will break a lot of behaviour. cheers, Paul Smith On 04/10/2006, at 11:48 AM, George Aroush wrote: Hi folks, Over at Lucene.Net, we are trying to determine if it's safe to do the foll

[jira] Commented: (LUCENE-675) Lucene benchmark: objective performance test for Lucene

2006-09-20 Thread Paul Smith (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-675?page=comments#action_12436443 ] Paul Smith commented on LUCENE-675: --- From a strict performance point of view, a standard set of important, but don't forget other language

[jira] Commented: (LUCENE-675) Lucene benchmark: objective performance test for Lucene

2006-09-20 Thread Paul Smith (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-675?page=comments#action_12436437 ] Paul Smith commented on LUCENE-675: --- If you're looking for freely available text in bulk, what about: http://www.gutenberg.org/wiki/Main_Page >

Re: svn commit: r437897 [1/2] - in /lucene/java/trunk: ./ src/java/org/apache/lucene/index/ src/java/org/apache/lucene/store/ src/test/org/apache/lucene/store/

2006-08-28 Thread Paul Smith
, Paul Smith

[jira] Commented: (LUCENE-388) [PATCH] IndexWriter.maybeMergeSegments() takes lots of CPU resources

2006-08-14 Thread Paul Smith (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-388?page=comments#action_12427975 ] Paul Smith commented on LUCENE-388: --- This is where some tracing logging code would be useful. Maybe a YourKit memory snapshot to see what's going on..

[jira] Commented: (LUCENE-388) [PATCH] IndexWriter.maybeMergeSegments() takes lots of CPU resources

2006-08-13 Thread Paul Smith (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-388?page=comments#action_12427818 ] Paul Smith commented on LUCENE-388: --- geez, yep definitely don't put this in, my patch was only a 'suggestion' to highlight how it fixes the ro

Re: Lucene and Java 1.5

2006-05-30 Thread Paul Smith
nst it. Before you make any decision, I'd sit down and plan what events you'll actually want to log and at what level. Good planning will make the Lucene library very useful. You can then decide how you're going to log them. cheers, Paul Smith

Re: Tree based BitSet (aka IntegerSet, DocSet...)

2006-01-28 Thread Paul Smith
No, I'm pretty sure it wouldn't, so long as you don't look at this code, lest you become "tainted" ... ;-) Isn't that where the phrase "I have no recollection of that Senator" comes in handy? :) Paul

Re: Tree based BitSet (aka IntegerSet, DocSet...)

2006-01-28 Thread Paul Smith
it's of no use except for academic study. Pity. Would that preclude re-implementing the same algorithm in new source code? I'm not clear on whether that violates the license. cheers, Paul Smith

Re: JIRA html problems?

2006-01-26 Thread Paul Smith
looks a bit b0rk3n to me as well. Maybe some text being displayed isn't being escaped properly causing HTML mayhem? Paul Smith On 27/01/2006, at 8:12 AM, Yonik Seeley wrote: I've been getting bad HTML out of JIRA lately: http://issues.apache.org/jira/browse/LUCENE Anyone els

Re: "Advanced" query language

2006-01-02 Thread Paul Smith
On 03/01/2006, at 11:08 AM, markharw00d wrote: I thought you said you "didn't really want to have to design a general API for parsing XML as part of this project" ? :) Having grown tired of messing with my own solution I tried using commons Digester with my example XML but ran into iss

Re: "Advanced" query language

2005-12-21 Thread Paul Smith
Hey all, I haven't been paying real close attention to this thread, but if any of you are looking for something that has _easy_ Object->XML->Object you should seriously try XStream (http://xstream.codehaus.org).. Simplest/easiest api I've seen. BSD licensed too (Apache friendly). One c

Re: NioFile cache performance

2005-12-08 Thread Paul Smith
  Most of the CPU time is actually used during the synchronization with multiple threads. I hacked together a version of MemoryLRUCache that used a ConcurrentHashMap from JDK 1.5, and it was another 50% faster ! At a minimum, if the ReadWriteLock class was modified to use the 1.5 facilities some si
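
As a rough illustration of the idea in the snippet above (replacing a lock-guarded cache with a ConcurrentHashMap), here is a minimal sketch. The class name ConcurrentCacheSketch and its crude clear-when-full eviction are assumptions made for this example; this is not the MemoryLRUCache from the thread, and it does not preserve LRU ordering.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a bounded cache backed by ConcurrentHashMap, so that
// readers do not contend on a single lock. Not the thread's MemoryLRUCache.
public class ConcurrentCacheSketch<K, V> {
    private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<K, V>();
    private final int maxEntries;

    public ConcurrentCacheSketch(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    // Lock-free read; returns null on a cache miss.
    public V get(K key) {
        return map.get(key);
    }

    // Stores the value unless another thread already cached one for this key,
    // and returns whichever value ended up in the cache.
    public V put(K key, V value) {
        if (map.size() >= maxEntries) {
            map.clear(); // crude eviction for the sketch; a real cache would evict selectively
        }
        V previous = map.putIfAbsent(key, value);
        return previous != null ? previous : value;
    }

    public int size() {
        return map.size();
    }
}
```

The trade-off in this sketch is that it gives up strict LRU eviction in exchange for uncontended reads; whether that suits a given workload would need measuring.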

[jira] Commented: (LUCENE-467) Use Float.floatToRawIntBits over Float.floatToIntBits

2005-11-17 Thread Paul Smith (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-467?page=comments#action_12357925 ] Paul Smith commented on LUCENE-467: --- If you can create a patch against 1.4.3 there is a reasonable possibility that I could create a 1.4.3 Lucene+ThisPatch jar and re-index

[jira] Commented: (LUCENE-467) Use Float.floatToRawIntBits over Float.floatToIntBits

2005-11-16 Thread Paul Smith (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-467?page=comments#action_12357839 ] Paul Smith commented on LUCENE-467: --- I probably didn't make my testing framework as clear as I should. Yourkit was setup to use method sampling (waking up ev

Re: Float.floatToRawIntBits

2005-11-16 Thread Paul Smith
On 17/11/2005, at 10:21 AM, Chris Lamprecht wrote: 1. Run profiler 2. Sort methods by CPU time spent 3. Optimize 4. Repeat :) Umm, well I know I could make it quicker, it's just whether it still _works_ as expected Maintaining the contract means I'll need to develop some good junit

Re: Float.floatToRawIntBits

2005-11-16 Thread Paul Smith
On 17/11/2005, at 9:24 AM, Doug Cutting wrote: In general I would not take this sort of profiler output too literally. If floatToRawIntBits is 5x faster, then you'd expect a 16% improvement from using it, but my guess is you'll see far less. Still, it's probably worth switching & measuri
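
The 16% figure quoted above can be checked with a short calculation: if a step takes a fraction f of the run and becomes s times faster, the expected saving is f * (1 - 1/s). The sketch below assumes f = 0.20, the rough share reported elsewhere in this thread, and s = 5; the class and method names are made up for this example. It also shows where the two JDK methods differ: Float.floatToIntBits collapses every NaN to the canonical pattern 0x7fc00000, while Float.floatToRawIntBits skips that check.

```java
public class RawBitsSketch {
    // Expected whole-run saving when a step taking `fraction` of the time
    // becomes `speedup` times faster (hypothetical helper for illustration).
    static double overallSaving(double fraction, double speedup) {
        return fraction * (1.0 - 1.0 / speedup);
    }

    // True when both conversions return the same bit pattern for f.
    static boolean sameBits(float f) {
        return Float.floatToIntBits(f) == Float.floatToRawIntBits(f);
    }

    public static void main(String[] args) {
        // Assumed inputs: 20% of the run, 5x faster.
        System.out.println("expected saving: " + overallSaving(0.20, 5.0));

        // For ordinary (non-NaN) values the two methods agree.
        System.out.println("1.5f same bits: " + sameBits(1.5f));

        // A NaN with a non-canonical payload is canonicalized by floatToIntBits.
        float oddNaN = Float.intBitsToFloat(0x7fc00001);
        System.out.println("canonical NaN bits: "
                + Integer.toHexString(Float.floatToIntBits(oddNaN)));
    }
}
```

As the quoted reply cautions, this is an upper bound taken from profiler shares; the measured gain may well be smaller.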

Re: Float.floatToRawIntBits

2005-11-16 Thread Paul Smith
I can confirm this takes ~ 20% of an overall Indexing operation (see attached link from YourKit). http://people.apache.org/~psmith/luceneYourkit.jpg Mind you, the whole "signalling via IOException" in the FastCharStream is a way bigger overhead, although I agree much harder to f

Re: Considering lucene

2005-10-02 Thread Paul Smith
On 01/10/2005, at 6:30 AM, Erik Hatcher wrote: On Sep 30, 2005, at 1:26 AM, Paul Smith wrote: This requirement is almost exactly the same as my requirement for the log4j project I work on where I wanted to be able to index every row in a text log file to be its own Document. It

Re: Considering lucene

2005-09-29 Thread Paul Smith
e fly XPath like queries using Lucene which apparently works very well, but I'm not sure it scales to massive documents such as log files (and your requirements). cheers, Paul Smith On 30/09/2005, at 3:17 PM, <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> wrote: Hi, My na

Re: Map-Reduce

2005-08-04 Thread Paul Smith
On 05/08/2005, at 4:10 AM, Doug Cutting wrote: Doug Cutting wrote: Perhaps we need to factor Nutch into two projects, one with NDFS and MapReduce and the other with the search-specific code. This falls almost exactly on package lines. The packages org.apache.nutch.{io,ipc,fs,ndfs,mapre

Map-Reduce

2005-08-03 Thread Paul Smith
from nutch a shared library? I would love to hear anyone's thoughts on the matter. cheers, Paul Smith [1] http://wiki.apache.org/nutch-data/attachments/Presentations/attachments/oscon05.pdf [2] http://labs.google.com/papers/mapre

Re: [Performance]: IndexWriter again...

2005-05-16 Thread Paul Smith
On 16/05/2005, at 5:00 PM, Paul Elschot wrote: On Monday 16 May 2005 08:24, Paul Smith wrote: something very odd is going on with my attachments... sorry for the spam. It's usually easier to open a bug in bugzilla and post the code and the concerns there. The only disadvantage of bugzilla is

Re: [Performance]: IndexWriter again...

2005-05-15 Thread Paul Smith
something very odd is going on with my attachments... sorry for the spam. On 16/05/2005, at 4:22 PM, Paul Smith wrote: I'm not even going to say anything this time :-$ On 16/05/2005, at 4:17 PM, Paul Smith wrote: Silly me, here's the patch with the extra code NOT commented ou

Re: [Performance]: IndexWriter again...

2005-05-15 Thread Paul Smith
I'm not even going to say anything this time :-$ On 16/05/2005, at 4:17 PM, Paul Smith wrote: Silly me, here's the patch with the extra code NOT commented out... Oh my, how embarrassing... :) Paul On 16/05/2005, at 4:15 PM, Paul S

Re: [Performance]: IndexWriter again...

2005-05-15 Thread Paul Smith
Silly me, here's the patch with the extra code NOT commented out... Oh my, how embarrassing... :) Paul On 16/05/2005, at 4:15 PM, Paul Smith wrote:

[Performance]: IndexWriter again...

2005-05-15 Thread Paul Smith
confuse more people than it helps. I would really appreciate anyone's thoughts on this, I'll be very happy to be proven wrong because it will just help me understand more of Lucene. I would hope that speeding up indexing would benefit everyone? Particularly the large scale sites out there. cheers, Paul Smith IndexWriter.patch Description: Binary data

Re: [Performanc]

2005-04-29 Thread Paul Smith
e comment on the CPU profile I sent in? If there was a way of optimizing that loop, then it could mean a reasonable improvement in indexing speed. cheers, Paul Smith

Re: ParallelReader

2005-04-29 Thread Paul Smith
earching inside the content index in this case. Should it go into the core or in contrib? +1 to core... (non-binding of course). Paul Smith