RE: Results (Re: Survey: Lucene and Java 1.4 vs. 1.5)

2006-06-17 Thread Tatu Saloranta
--- Robert Engels [EMAIL PROTECTED] wrote: I think you should port Lucene to MS-DOS... If your app can't move beyond MS-DOS, then you stick with version 1.9 (or 2.0 in this case). If you can't innovate and move forward, you die. Java has a GREAT history of supporting prior versions.

Re: GData Server - Lucene storage

2006-06-02 Thread Tatu Saloranta
--- Simon Willnauer [EMAIL PROTECTED] wrote: ... Using the client thread as the indexing thread might just cause some performance drawback but that's considerable for ... Actually, I would not even assume that: handing tasks over between threads causes a context switch and more cache misses. In

Re: Phrase IDF and collection frequency !

2006-05-16 Thread Tatu Saloranta
--- ABDOU Samir [EMAIL PROTECTED] wrote: Hi, Are there any ideas on how to compute the document frequency and collection frequency of phrases? Tokenize your input as phrases (instead of words), and you'll get this the same way you normally get stats for single-word tokens (Terms)? I did
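A minimal sketch of the tokenize-as-phrases idea, written in plain Java rather than through Lucene's analyzer API. It assumes whitespace tokenization and treats every word bigram as a "phrase", so document frequency (number of documents containing the phrase) and collection frequency (total occurrences) come out of the same counting used for single terms. The class and method names are hypothetical.

```java
import java.util.*;

class PhraseStats {
    // Splits on whitespace (a simplifying assumption) and emits word bigrams,
    // so each two-word phrase is counted like a single term.
    static List<String> bigrams(String text) {
        String[] words = text.toLowerCase().trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < words.length; i++) {
            out.add(words[i] + " " + words[i + 1]);
        }
        return out;
    }

    final Map<String, Integer> docFreq = new HashMap<>();  // docs containing the phrase
    final Map<String, Integer> collFreq = new HashMap<>(); // total occurrences

    void addDocument(String text) {
        List<String> phrases = bigrams(text);
        for (String p : phrases) {
            collFreq.merge(p, 1, Integer::sum); // every occurrence counts
        }
        for (String p : new HashSet<>(phrases)) {
            docFreq.merge(p, 1, Integer::sum); // at most once per document
        }
    }

    public static void main(String[] args) {
        PhraseStats s = new PhraseStats();
        s.addDocument("new york is big and new york is busy");
        s.addDocument("I love new york");
        System.out.println(s.docFreq.get("new york"));  // 2
        System.out.println(s.collFreq.get("new york")); // 3
    }
}
```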

Re: [jira] Commented: (LUCENE-510) IndexOutput.writeString() should write length in bytes

2006-05-08 Thread Tatu Saloranta
--- Marvin Humphrey (JIRA) [EMAIL PROTECTED] wrote: ... It also slows Lucene down -- indexing takes around a 20% speed hit. It would be possible to submit a patch which had a smaller impact on performance, but this one is already over 700 lines long, and its goal is to achieve standard
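To illustrate what "length in bytes" means, here is a hypothetical sketch (not Lucene's IndexOutput) that prefixes each string with its UTF-8 byte count, written as a variable-length int with 7 payload bits per byte and the high bit as a continuation flag. It uses standard UTF-8 from java.nio.charset; whether the patch under discussion encodes text exactly this way is not confirmed here.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class ByteLengthStrings {
    // Variable-length int: low-order 7-bit groups first, high bit set on
    // every byte except the last.
    static void writeVInt(ByteArrayOutputStream out, int i) {
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80);
            i >>>= 7;
        }
        out.write(i);
    }

    // The prefix is the UTF-8 byte count, not the char count.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeVInt(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // Reads one length-prefixed string starting at pos[0] and advances pos[0].
    static String readString(byte[] buf, int[] pos) {
        int len = 0, shift = 0, b;
        do {
            b = buf[pos[0]++] & 0xFF;
            len |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        String s = new String(buf, pos[0], len, StandardCharsets.UTF_8);
        pos[0] += len;
        return s;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, "caf\u00e9"); // 4 chars, but 5 UTF-8 bytes
        byte[] buf = out.toByteArray();
        System.out.println(buf[0]); // 5
        System.out.println(readString(buf, new int[] {0}));
    }
}
```

With a byte-count prefix, a reader can skip or bulk-copy the string without decoding it first, which is the usual argument for this layout.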

Re: 2.0 release

2006-05-06 Thread Tatu Saloranta
--- Maxim Patramanskij [EMAIL PROTECTED] wrote: Currently, buffer sizes for BufferedIndexInput and BufferedIndexOutput are equal and have a constant size of 1024 bytes. When using a database for index persistence, this slows performance down considerably because of the relatively small buffer size. With
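The cost of a small buffer can be estimated with simple arithmetic: reading a file sequentially takes roughly ceil(length / bufferSize) refills. The sketch below assumes, hypothetically, that each refill against a database-backed store is a separate round trip, which is why the count matters more there than on local disk. The class name is illustrative only.

```java
class BufferRefills {
    // Refills needed to read `length` bytes sequentially when each refill
    // fetches at most `bufferSize` bytes from the underlying store.
    static long refills(long length, int bufferSize) {
        return (length + bufferSize - 1) / bufferSize; // ceiling division
    }

    public static void main(String[] args) {
        long oneMiB = 1L << 20;
        System.out.println(refills(oneMiB, 1024));      // 1024
        System.out.println(refills(oneMiB, 64 * 1024)); // 16
    }
}
```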

Re: this == that

2006-05-02 Thread Tatu Saloranta
--- jian chen [EMAIL PROTECTED] wrote: I am wondering if interning Strings is really that critical for performance. The biggest bottleneck is still disk. So, maybe we can use String.equals(...) instead of ==. I would bet big bucks on it saving a significant amount of time, even with
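The trade-off being debated: == on String is a single reference comparison, but it is only correct when both sides are the canonical (interned) instance, whereas equals(...) compares characters and works for any two strings. A small demonstration of the semantics (it does not measure speed):

```java
class InternDemo {
    public static void main(String[] args) {
        String a = "field"; // string literals are interned automatically
        String b = new StringBuilder("fie").append("ld").toString(); // new object at runtime
        System.out.println(a == b);          // false: different objects
        System.out.println(a.equals(b));     // true: same characters
        System.out.println(a == b.intern()); // true: intern() returns the canonical instance
    }
}
```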

Re: storing term text internally as byte array and bytecount as prefix, etc.

2006-05-02 Thread Tatu Saloranta
--- jian chen [EMAIL PROTECTED] wrote: Plus, as open source and open standard advocates, we don't want to be like Micros$ft, who claims to use industry-standard XML as the next-generation Word file format. However, it is very hard to write your own Word reader, because their Word file

Re: (byte)((i & 0x7f) | 0x80) == (byte)(i | 0x80)

2006-04-26 Thread Tatu Saloranta
--- Yonik Seeley [EMAIL PROTECTED] wrote: On 4/26/06, Charlie [EMAIL PROTECTED] wrote: writeByte((byte)((i & 0x7f) | 0x80)); writeByte((byte)(i | 0x80)); Yes, these two lines are equivalent. It's fairly likely that the JVM already does this optimization for you though... at least
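The equivalence holds because the (byte) cast keeps only bits 0-7: both expressions copy bits 0-6 from i and force bit 7 to 1, and the & 0x7f mask only clears bits that the cast discards anyway. A quick Java check over every low-16-bit pattern plus a sparse sweep of the full int range (a sanity test, not a proof); the class and method names are illustrative:

```java
class VIntByteCheck {
    static byte masked(int i)   { return (byte) ((i & 0x7f) | 0x80); }
    static byte unmasked(int i) { return (byte) (i | 0x80); }

    // The cast keeps bits 0-7; both forms set bit 7 and copy bits 0-6 from i.
    static boolean equivalent(int i) { return masked(i) == unmasked(i); }

    public static void main(String[] args) {
        boolean allEqual = true;
        for (int i = 0; i <= 0xFFFF; i++) {
            allEqual &= equivalent(i); // every low-16-bit pattern
        }
        for (long x = Integer.MIN_VALUE; x <= Integer.MAX_VALUE; x += 9973) {
            allEqual &= equivalent((int) x); // sparse sweep of the whole int range
        }
        System.out.println(allEqual); // true
    }
}
```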

Driver about ACID requirements for Lucene (Re: [jira] Commented: (LUCENE-555))

2006-04-25 Thread Tatu Saloranta
--- dan (JIRA) [EMAIL PROTECTED] wrote: while ( myopicEngineerStillDoesntGetIt) { case(1) { A small business running MySQL has a travelling case(2) { Same scenario. How does team Lucene respond? If you Dan, do us all a favor and please figure out the difference between a

Re: Benchmarking results

2006-04-04 Thread Tatu Saloranta
The times for KinoSearch and Lucene are 5-run ... is due to cache reassignment.) Therefore, the same command was issued on the command line 6 times, separated by semicolons. The first iter was discarded, and the rest were averaged. ... The maximum memory consumption was measured
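The timing method described above (run the same command six times, drop the first run, average the remaining five) reduces to a small helper. This is a sketch with made-up input numbers, not the benchmark's actual data; the original measurements were separate command-line invocations rather than an in-process loop.

```java
import java.util.Arrays;

class WarmupAverage {
    // Discards the first (warm-up) measurement and averages the rest.
    static double averageAfterWarmup(double[] timings) {
        if (timings.length < 2) {
            throw new IllegalArgumentException("need at least one run after warm-up");
        }
        return Arrays.stream(timings, 1, timings.length).average().getAsDouble();
    }

    public static void main(String[] args) {
        // Hypothetical timings in seconds; the first run is slower because caches are cold.
        double[] secs = {10.0, 4.0, 5.0, 3.0, 4.0, 4.0};
        System.out.println(averageAfterWarmup(secs)); // 4.0
    }
}
```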

[jira] Commented: (LUCENE-520) Ability to abort hit collection

2006-03-16 Thread Tatu Saloranta (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-520?page=comments#action_12370726 ] Tatu Saloranta commented on LUCENE-520: --- Quick note regarding exceptions: an easy way to remove most of the runtime exception overhead is to just construct a shared
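A sketch of the shared-exception idea, assuming the common Java technique for cheap exceptions: override fillInStackTrace() so no stack walk happens, and reuse one preallocated instance. The HitSink interface and search loop are hypothetical stand-ins, not Lucene's HitCollector API.

```java
class AbortableCollection {
    // Thrown to stop collection early. Skipping fillInStackTrace() avoids the
    // expensive stack walk, and one shared instance avoids per-search allocation.
    static final class AbortCollection extends RuntimeException {
        static final AbortCollection INSTANCE = new AbortCollection();
        private AbortCollection() { super("hit collection aborted"); }
        @Override
        public synchronized Throwable fillInStackTrace() { return this; } // no stack walk
    }

    // Hypothetical callback shape, loosely modelled on a hit collector.
    interface HitSink { void collect(int doc, float score); }

    // Feeds docs to the sink until it throws AbortCollection; returns how many were delivered.
    static int search(int numDocs, HitSink sink) {
        int delivered = 0;
        try {
            for (int doc = 0; doc < numDocs; doc++) {
                sink.collect(doc, 1.0f);
                delivered++;
            }
        } catch (AbortCollection e) {
            // Expected control flow: the sink asked to stop.
        }
        return delivered;
    }

    public static void main(String[] args) {
        final int limit = 10;
        int[] seen = {0};
        int n = search(1_000_000, (doc, score) -> {
            if (++seen[0] > limit) throw AbortCollection.INSTANCE;
        });
        System.out.println(n); // 10
    }
}
```

The trade-off is that the shared instance carries no meaningful stack trace, so it should only be used for this kind of control flow, never for reporting real errors.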

Re: 1.9 RC1

2006-02-19 Thread Tatu Saloranta
--- Nadav Har'El [EMAIL PROTECTED] wrote: Dan Armbrust [EMAIL PROTECTED] wrote on 17/02/2006 08:50:53 PM: ... So I'm not sure the solution is to change the semantics of the existing constructor, but I think Lucene definitely needs a new constructor or convenience function that will do the

Optimizing/minimizing memory usage of memory-based indexes

2006-02-10 Thread Tatu Saloranta
I am building a simple classifier system, using Lucene essentially to efficiently and incrementally calculate term frequencies. (Due to input variations, I am currently creating a separate index for each attribute, although I guess I could (should?) just use a different field for each attribute.) Now,
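For comparison, the bookkeeping itself (term frequencies kept separately per attribute) can be sketched without any index, using one map keyed by field name. That layout corresponds to the one-field-per-attribute option rather than one index per attribute. This is a hypothetical plain-Java illustration; it does not show how Lucene stores or exposes these counts.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class FieldTermCounts {
    // field name -> (term -> frequency), updated incrementally as text arrives
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    void add(String field, String text) {
        Map<String, Integer> terms = counts.computeIfAbsent(field, f -> new HashMap<>());
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) terms.merge(t, 1, Integer::sum); // skip empty leading token
        }
    }

    int freq(String field, String term) {
        return counts.getOrDefault(field, Collections.emptyMap()).getOrDefault(term, 0);
    }

    public static void main(String[] args) {
        FieldTermCounts c = new FieldTermCounts();
        c.add("title", "Red shoes, red hat");
        c.add("color", "red");
        System.out.println(c.freq("title", "red")); // 2
        System.out.println(c.freq("color", "red")); // 1
        System.out.println(c.freq("size", "red"));  // 0
    }
}
```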