from:"Dmitry Serebrennikov"

Re: strange behaviour in CompoundFileReader fileModified and touchFile

2004-10-01 Thread Dmitry Serebrennikov

Bernhard Messer wrote: Dmitry, Bernhard Messer wrote: hi, CompoundFileReader class contains some code where I can't follow the idea behind it. Maybe somebody else can switch on the light for me, so I can see the track. There are 2 public methods which definitely don't work as expected. I know, ex

Re: strange behaviour in CompoundFileReader fileModified and touchFile

2004-09-30 Thread Dmitry Serebrennikov

Bernhard Messer wrote: hi, CompoundFileReader class contains some code where I can't follow the idea behind it. Maybe somebody else can switch on the light for me, so I can see the track. There are 2 public methods which definitely don't work as expected. I know, extending Directory forces one to

Re: optimized disk usage when creating a compound index

2004-08-12 Thread Dmitry Serebrennikov

Hi Christoph, I agree that your approach achieves better disk usage than deleting segments as they are being merged into the compound file, chiefly because most indexes have one or two large files and the rest are small. I have not reviewed your latest code yet (it's a bit hard without a checke

Re: optimized disk usage when creating a compound index

2004-08-09 Thread Dmitry Serebrennikov

[EMAIL PROTECTED] wrote: Hi Dmitry, Thanks for looking into the code. Dmitry Serebrennikov <[EMAIL PROTECTED]> wrote on 08.08.2004, I'm sorry for jumping into this late, but my impression was that the files being deleted were of the new segment, not the files for segments being m

Re: optimized disk usage when creating a compound index

2004-08-08 Thread Dmitry Serebrennikov

Christoph Goller wrote: Bernhard Messer wrote: Hi Christoph, just reviewed the TestCompoundFile.java and you were absolutely right when saying that the test will fail on Windows. Now the test is changed in a way that a second file with identical data is created. This file can be used in the test

Re: possible SegmentMerger optimization

2004-08-08 Thread Dmitry Serebrennikov

Bernhard Messer wrote: Dmitry, yep, you're right Dmitry. Switching the compound file on/off would be the trick to simulate the same behavior I described. I did some tests on that and found that it works perfectly. Great! I'm glad that helps with your issue. By the way, I like what you did with reducing

Re: possible SegmentMerger optimization

2004-08-07 Thread Dmitry Serebrennikov

Bernhard Messer wrote: hi developers, maybe there is a small but effective possibility to optimize the SegmentMerger class when the compound file option is enabled, which is the default since Lucene 1.4. The current implementation creates and writes the compound index file every time the merge() meth

Re: IndexWriter.getUseCompoundFile is confusing

2004-08-07 Thread Dmitry Serebrennikov

Daniel Naber wrote: Hi, I open an index with create=false so I can use addIndexes() on that index. I want to use the existing setting for useCompoundFile of that index. But getUseCompoundFile() will always return true, as it just returns what one has set with setUseCompoundFile() or the default.

Re: Deleting a document with an IndexWriter open

2004-07-20 Thread Dmitry Serebrennikov

Doug Cutting wrote: Then you need to ensure that the index has no deletions, and optimize it if it has any, to remove them. This is probably most safely done as the first step, rather than the last. Good point. I didn't think about this. I'm not sure this method has many advantages ove

Re: Deleting a document with an IndexWriter open

2004-07-19 Thread Dmitry Serebrennikov

Doug Cutting wrote: Dmitry Serebrennikov wrote: So here's a modified sequence of operations, perhaps a bit more efficient than proposed by Christoph: 1) Open an IndexReader for searching - S. Keep it open until the transaction is committed. 2) Open a second IndexReader for deletions -

Re: Deleting a document with an IndexWriter open

2004-07-16 Thread Dmitry Serebrennikov

Another solution that works well in some applications is to rely on document number. This number will remain the same for the life of an IndexReader. This number is also always larger for documents added later. So given two documents with the same ID, the one with the highest document number is
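
The document-number rule described above can be sketched in plain Java. This is only an illustration under stated assumptions: the array `ids` is a hypothetical stand-in for "the ID stored in the document with this number" as seen through a single IndexReader, and the class and method names are invented for the example rather than taken from Lucene.

```java
import java.util.HashMap;
import java.util.Map;

public class LatestByDocNumber {
    /**
     * ids[docNum] is the application-level ID of the document with that
     * document number (hypothetical input, not a Lucene structure).
     * Returns, for each ID, the highest document number carrying it.
     */
    public static Map<String, Integer> latestDocPerId(String[] ids) {
        Map<String, Integer> latest = new HashMap<>();
        for (int docNum = 0; docNum < ids.length; docNum++) {
            // Documents added later have larger numbers, so overwriting
            // on every hit leaves the highest number for each ID.
            latest.put(ids[docNum], docNum);
        }
        return latest;
    }

    public static void main(String[] args) {
        String[] ids = {"a", "b", "a", "c", "b"};
        System.out.println(latestDocPerId(ids));
    }
}
```

Because the numbers are only stable for the life of one IndexReader, a map built this way would presumably have to be rebuilt whenever the reader is reopened.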

Re: Term vectors: .tvf format question

2004-06-14 Thread Dmitry Serebrennikov

Doug Cutting wrote: So term-number-based vectors would be small and fast to use if all you're using is a single, optimized index, but very slow to use with unoptimized indexes and multiple indexes. That seems like a bad situation, so, unless someone figures out another way, we're stuck with t

Re: multiple fields to be indexed

2004-06-14 Thread Dmitry Serebrennikov

jitender ahuja wrote: Hi all, I am trying to do a search on multiple fields of which some are indexed. Now, a query can be posed to search in the indexed fields but the Hits class object can have only one query object and the query class similarly can have only one field on which the query is perfo

Re: IndexReader.getCurrentVersion() and IndexReader.lastModified()

2004-06-02 Thread Dmitry Serebrennikov

Well, I know I didn't think of this case back when we were discussing this change. As a recap, the issue was mainly that on some architectures, the clock was not granular enough to detect updates reliably, so some test cases were failing some of the time. You are right, Bernhard, we didn't cons

Re: suggestions for a student project

2004-05-27 Thread Dmitry Serebrennikov

Drew Farris wrote: On Thu, 2004-05-27 at 16:17, Dmitry Serebrennikov wrote: How about adding binary fields capability to Lucene stored fields? This could be fun and useful, especially if there is a need for this for the project you are directly working on. Myself and Doug have discussed this

Re: suggestions for a student project

2004-05-27 Thread Dmitry Serebrennikov

jitender ahuja wrote: Hi, Can anyone tell that for a masters student's project what should be the areas of lucene that need to be developed. I am doing the project at an orgn. For that I took the data from tables and indexed it, but except for own stop words list could do nothing new. Just u

Re: stored field compression

2004-05-14 Thread Dmitry Serebrennikov

Doug Cutting wrote: Dmitry Serebrennikov wrote: Actually, I was thinking of something simpler... Something like a special case where one could supply binary data directly into a stored field. Something like: public class Field { public static Field Binary(String name, byte[] value

Re: stored field compression

2004-05-14 Thread Dmitry Serebrennikov

Doug Cutting wrote: Dmitry Serebrennikov wrote: A different approach would be to just allow binary data in fields. That way applications can compress and decompress as they see fit, plus they would be able to store numerical and other data more efficiently. That's an interesting idea.

Re: stored field compression

2004-05-14 Thread Dmitry Serebrennikov

Doug Cutting wrote: Doug Cutting wrote: A more elaborate approach would be to lazily decompress fields when values are accessed. Another big advantage of this approach (as reminded by Peter Cipollone) is that it will make indexing faster, as decompression will be avoided when merging. Doug

Re: incorrect OO in lucene source?

2004-04-20 Thread Dmitry Serebrennikov

Doug Cutting wrote: Robert Engels wrote: Lucene is often cited as an excellent example of OO design. That is kind, but the primary goal of Lucene is to provide functionality, not to use "correct" OO design. The two are not always in accord. Hear, hear! Shouldn't 'Filter' just be an interfa

Re: Too many open files

2004-03-04 Thread Dmitry Serebrennikov

You can tell how many file handles you are allowed to open by your OS (looks like some flavor of Unix from the paths that you have included). One way to reduce the number of files Lucene opens is to use compound indexes (where each index segment uses a single file). Look for this flag on Index

Re: New FieldSortedHitQueue uses Java 1.4 feature

2004-03-03 Thread Dmitry Serebrennikov

Jamie M wrote: --- Jamie M <[EMAIL PROTECTED]> wrote: --- Doug Cutting <[EMAIL PROTECTED]> wrote: Eric Isakson wrote: I hadn't heard any discussion about bumping up to Java 1.4 in the Lucene 1.4 release. Was this just overlooked or are we planning to drop support for p

Re: New FieldSortedHitQueue uses Java 1.4 feature

2004-03-03 Thread Dmitry Serebrennikov

I know in our environment we can't use 1.4 yet. We are integrating with another application that does not support 1.4. They will soon, but the 1.3-based versions will still be in production for quite some time. I wouldn't go as far as giving a -1 for the move, but please keep 1.3 working. I it i

Re: Lucene 1.1 index with Luncene 1.3

2004-02-26 Thread Dmitry Serebrennikov

I'm not sure about all of the changes since then, but I am (mostly) the one responsible for the "compound files" change that does away with the f1, f2, etc files you speak of. That change is backwards compatible in that it is optional and needs to be turned on by the application (set a flag on

Re: Important: Contributor License Agreement

2004-01-30 Thread Dmitry Serebrennikov

From the message it sounds like this agreement has been around for a while, but this is the first I've ever heard of the CLA. Could be just me, wouldn't be the first time :). Anyway, does anyone know where one can find this agreement? I've looked on ~jim's URL (below) but the link to the CLA th

Re: Compound file index issue

2004-01-15 Thread Dmitry Serebrennikov

Bruce Ritchie wrote: Red Hat Linux release 9 (Shrike) under their 2.4.20-24.9smp kernel and Sun's 1.4.2_03 SDK. We did disable lucene's file based locking mechanism since our app is not setup to allow multiple different JVM's to write to a single index (one index per JVM) and we've had no end o

Re: Compound file index issue

2004-01-14 Thread Dmitry Serebrennikov

Hello Bruce, I've seen a couple of other people reporting similar issues when switching to the latest 1.3 tree. However, usually the problems come from lack of synchronization of multiple threads doing reads and writes against the index. But it's definitely possible that the compound index for

Re: Small optimization in IndexSearcher

2004-01-14 Thread Dmitry Serebrennikov

My understanding is that you are not supposed to insert anything into the queue when it is full (i.e. it is not designed to handle this). Rather (and this is the reason for the if statement you are asking about), one is supposed to check for the full queue condition and remove elements of least
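
The check-before-insert pattern described above can be sketched as follows. This is a minimal illustration that uses `java.util.PriorityQueue` from the JDK as a stand-in, not Lucene's own PriorityQueue class; the class name `TopScores` and its methods are made up for the example.

```java
import java.util.PriorityQueue;

public class TopScores {
    private final int maxSize;
    // Min-heap: the head is always the lowest score currently kept.
    private final PriorityQueue<Float> queue = new PriorityQueue<>();

    public TopScores(int maxSize) {
        this.maxSize = maxSize;
    }

    /** Offers a score; returns true if it was kept among the top maxSize. */
    public boolean offer(float score) {
        if (queue.size() < maxSize) {
            queue.add(score);          // still room, just insert
            return true;
        }
        Float least = queue.peek();
        if (least != null && score > least) {
            queue.poll();              // queue is full: drop the least element first
            queue.add(score);
            return true;
        }
        return false;                  // full, and not better than the current least
    }

    /** Lowest score currently kept, or null if nothing has been kept. */
    public Float least() {
        return queue.peek();
    }

    public int size() {
        return queue.size();
    }
}
```

The caller never inserts into a full queue; it either rejects the candidate or makes room first, which keeps the queue size bounded at `maxSize`.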

Re: suggestion for a CustomDirectory

2003-12-05 Thread Dmitry Serebrennikov

Doug Cutting wrote: fp235-5 wrote: Dmitry's patch improved this part a lot and in my case reduced by 10-15% the overall time. Sadly it has never been included in the source and could have been useful for all kind of users. If you read the thread associated with that patch you'll see that it

Re: lucene 1.3 RC3 compiled with gcj

2003-12-03 Thread Dmitry Serebrennikov

Andi Vajda wrote: I got the latest lucene to compile and run its demo on Linux (redhat 9) and Mac OS X Panther 10.3.1 using gcc 3.3.2's gcj, compiling to native executables. The demo seems to run. I didn't run the unittests since I didn't attempt to compile junit with gcj yet. Performance seems wa

Re: VOTE: BooleanQuery$TooManyClauses

2003-12-01 Thread Dmitry Serebrennikov

I haven't been following this thread closely because I don't use the wildcard queries, which I think are the main source of this exception, but I agree with Doug's choice below - option 3 at least until 2.0. This is assuming that option 0 - do nothing, is not an option at this point. Dmitry. D

Re: CompoundFileReader

2003-11-19 Thread Dmitry Serebrennikov

+1 Looks like a great optimization. Dmitry. Christoph Goller wrote: Dmitry Serebrennikov wrote: I put those in mostly to assure myself that I got things right. I think the key question is whether it is possible to read part of another file. If not, I think that's fine. If yes, I think tha

Re: improve performance of "AND-queries"

2003-10-23 Thread Dmitry Serebrennikov

I am not familiar enough with the query parser syntax, but is the * a wildcard? If so, that's what is causing the extra delay. If you want to speed this type of query up, the QueryFilter should be probably created on the wildcard query, not on the most restrictive one. Lucene's internals are so

Re: TermVector once again

2003-10-22 Thread Dmitry Serebrennikov

I will also help with what I can, but I can't promise too much time on this. Dmitry. Damian Gajda wrote: Hello, My name is Damian and I am working on a University project on text analysis. I would like to use lucene as part of this project. I have already used it successfully in implementing sear

Re: 1.3 RC2 / timestamp

2003-10-22 Thread Dmitry Serebrennikov

[EMAIL PROTECTED] wrote: Unfortunately, I can't do anything on the timestamp problem before Friday since I am at a customer's office. So RC2 has to come out without my contribution, but I think that's ok since it is only a RC. I am not completely sure if including the new timestamp/version num

Re: 1.3 RC2

2003-10-21 Thread Dmitry Serebrennikov

+1 Doug, I didn't have a chance to apply your patches and test them yet. From just a quick review, it looks like everything should be fine. Now that the changes are in CVS, I can more easily check them out. But if the tests pass, I'm not worried. :) Dmitry. Doug Cutting wrote: Should I go ah

Re: CompoundFileReader

2003-10-18 Thread Dmitry Serebrennikov

haviour of throwing an exception if the seek index is out of bound is required? It's not part of the contract of the other implementations of InputStream. Maybe I am missing something here. Dmitry Serebrennikov wrote: Dear Christoph, Sounds like an excellent enhancement. From a quick look, it app

Re: File timestamps

2003-10-16 Thread Dmitry Serebrennikov

Can we make the number into a variable? It basically represents timestamp resolution of a directory implementation. It might even be possible to compute it automatically. Dmitry. Hani Suleiman wrote: I guessed (wild, wild guess not based on reality or any sort of investigation) that it's theo

Re: CompoundFileReader

2003-10-16 Thread Dmitry Serebrennikov

Dear Christoph, Sounds like an excellent enhancement. From a quick look, it appears that you are right and everything should work just fine but use less memory. One question: have you tried the other test cases also or just the TestCompoundFile. There are quite a few conditions that TestCompoun

Re: Index locked for write

2003-10-05 Thread Dmitry Serebrennikov

No one seems to have responded to this yet, so I'll give a first shot. There were a few changes that I am aware of that occurred in the locking area. One of Lucene's locking rules is that IndexWriter cannot co-exist with a "deleting" IndexReader. An IndexReader becomes "deleting" when delete me

Re: Testcase failure on OSX

2003-10-01 Thread Dmitry Serebrennikov

Hani Suleiman wrote: HFS is the standard FS on all OSX installs. You can find maybe 0.01% of users who are unix savvy enough to use UFS, but everyone else will be on HFS, with all its limitations. The same problem happens on linux too (but more intermittently). I think this problem will also b

Re: Testcase failure on OSX

2003-10-01 Thread Dmitry Serebrennikov

Hani Suleiman wrote: It feels wrong, but I can't figure out any negative ramifications. There are two cases, either the file is modified in the past, in which case the current time should be used, or it's modified in the future, so the current mod time + X should be used. Basically you can cover
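
The two cases described above can be sketched as a single function. This is only an illustration under assumptions: `resolution` stands for the "X" in the message (the timestamp granularity, e.g. 1000 ms on a filesystem with one-second resolution), and the class and method names are hypothetical, not the actual touchFile code.

```java
public class TouchTime {
    /**
     * Picks the timestamp to write when touching a file.
     *
     * @param lastModified the file's current modification time, in ms
     * @param now          the current clock time, in ms
     * @param resolution   assumed timestamp granularity "X", in ms
     */
    public static long nextModified(long lastModified, long now, long resolution) {
        if (lastModified < now) {
            // File was modified in the past: the current time is already newer.
            return now;
        }
        // File carries the current or a future timestamp:
        // move past it by one resolution step.
        return lastModified + resolution;
    }
}
```

A caveat on the first branch: if the filesystem rounds `now` down to the same coarse value as `lastModified`, the touch may still be invisible, so a real implementation would probably need to compare the values after rounding.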

Re: Testcase failure on OSX

2003-09-30 Thread Dmitry Serebrennikov

Steve Rowe wrote: Why not just add 1000ms to the current time when touching the files (instead of waiting for a second to pass)? I tried it on my Linux box (e2fs) via the java File.setLastModified() method, and it allowed the file to have a timestamp in the future. Wouldn't this be a much smal

Re: Testcase failure on OSX

2003-09-30 Thread Dmitry Serebrennikov

Hani Suleiman wrote: The count for FSDirectory on OSX is around 700-800. The problem is that the filesystem timestamps on OSX have abysmal resolution (nothing finer than 1 whole second!) Wow! Write once, debug everywhere... Is this an OSX "feature" or is this a feature of a particular disk /

Re: Testcase failure on OSX

2003-09-30 Thread Dmitry Serebrennikov

Sounds like the same error that I fixed in the RAMDirectory last week. I didn't think that FSDirectory would have this problem because I thought that the OS filesystem would take care of this. Apparently it does not. Hani, please take a look at the RAMDirectory touchFile method (line 148). It's

CVS commits

2003-09-25 Thread Dmitry Serebrennikov

I've committed all changes, and it seems to have succeeded, but I have not seen any commit e-mails yet. Is this normal? The cvs update now shows that I don't have any differences with repository, so I figured this is good. Regarding CHANGES.txt, should I update it manually and check it in as w

Re: 1.3 release

2003-09-25 Thread Dmitry Serebrennikov

Doug Cutting wrote: Dmitry Serebrennikov wrote: Sounds good. I tried to get some progress on committing last night, but I got bogged down in trying to figure out ssh. I was just trying to avoid having to type in my password with every cvs command. It seems that ssh is the way to do that, but

Re: per-field Analyzer (was Re: some requests)

2003-09-25 Thread Dmitry Serebrennikov

Erik Hatcher wrote: Cool... I just worked up a simple test case and committed it. I added javadoc comments. My personal opinion is that pointing folks to Lucene's own test cases is the best way to show examples, and also to promote test driven development a bit more - so I didn't put an exam

Re: 1.3 release

2003-09-25 Thread Dmitry Serebrennikov

Doug Cutting wrote: Erik Hatcher wrote: +1 to a 1.3 release. I think we should do another RC as soon as Dmitry's changes are committed. Then, if no issues pop up in the next week or so, quickly follow it with a final release. Does that sound like a good plan? I'm happy to make the releases

Re: PATCH: IndexReaderDelete (Bugzilla Bug 12588), again

2003-09-23 Thread Dmitry Serebrennikov

Ok, the patch proposed by Christoph has another problem. It ends up opening IndexReader twice, but closes it only once. This leaves files open, which shows up when running TestIndexReader with FSDirectory. It's just brackets on the if that were missing. I'm attaching an updated patch and a modifie

Re: PATCH: IndexReaderDelete (Bugzilla Bug 12588)

2003-09-23 Thread Dmitry Serebrennikov

Hello Otis, I've been looking at the junit failure in the TestIndexReader and I think I know what's going on. Christoph's patches are the right solution, but the intermittent failures that you've observed (and which I see as well) are due, I think, to the fact that timestamps from System.curr

Re: file handle changes

2003-09-23 Thread Dmitry Serebrennikov

Doug Cutting wrote: Dmitry Serebrennikov wrote: Doug, I've really considered keeping everything at the Directory level, as you suggested. This would have been preferred, I agree, but I really couldn't find a way to reconcile this approach with the other two goals I had: (a) keep spe

Re: idea for reducing file handle use

2003-09-23 Thread Dmitry Serebrennikov

Doug Cutting wrote: Dmitry Serebrennikov wrote: Ok, I am working on a version that would limit the changes to the Directory class, but this directory would have to make certain assumptions about the names of the files (whereas right now it doesn't care). It would have to differentiat

Re: file handle changes

2003-09-23 Thread Dmitry Serebrennikov

Bruce, PA, and possibly others Thanks for giving the file handle patch a try. I'm very glad that it's working for you. I wonder if either one of you has any scripts / data to monitor performance of your Lucene instance. If so, I would be very curious to know if you have seen any performance imp

Re: FSInputStream strangeness

2003-09-20 Thread Dmitry Serebrennikov

Here's a quick program to demonstrate what I mean: private void demo_FSInputStreamBug(FSDirectory fsdir, String file) throws IOException { // Setup the test file - we need more than 1024 bytes OutputStream os = fsdir.createFile(file); for(int i=0; i<2000; i++) {

FSInputStream strangeness

2003-09-20 Thread Dmitry Serebrennikov

Greetings, I'm developing some test cases to verify that my filehandle fix works, and in the process I am comparing the behavior of my InputStream classes with the FSInputStreams. I'm finding that FSInputStream behaves in strange ways sometimes, and I'm wondering if I should emulate its behavi

How do we pass arguments to JUnit test cases?

2003-09-19 Thread Dmitry Serebrennikov

I'm trying to setup a JUnit test case that needs a file path in order to create an FSDirectory. (RAMDirectory is ok for some things, but I'd like to verify that the files are written correctly and can be read on subsequent runs of the program). Does Lucene have any standard policy for doing thi

Re: idea for reducing file handle use

2003-09-18 Thread Dmitry Serebrennikov

Doug Cutting wrote: Dmitry, It would be cleaner if this could be done entirely as a Directory implementation. I know some folks who've implemented a filesystem-within-a-file solution for this problem that they're very happy with. It is a Directory, and requires no changes to Lucene. I'll a

Re: idea for reducing file handle use

2003-09-18 Thread Dmitry Serebrennikov

Doug Cutting wrote: Dmitry, It would be cleaner if this could be done entirely as a Directory implementation. I know some folks who've implemented a filesystem-within-a-file solution for this problem that they're very happy with. It is a Directory, and requires no changes to Lucene. I'll a

Re: Revival of Dmitry's Term Vector patches

2003-09-18 Thread Dmitry Serebrennikov

Otis Gospodnetic wrote: Dmitry and others, One of the relatively frequently asked for features is 'conceptual search', or 'search by similarity', etc. Lucene does not store term vectors in its index, so such searches cannot be supported. However, almost two years ago, Dmitry provided a large set

idea for reducing file handle use

2003-09-18 Thread Dmitry Serebrennikov

Greetings, Luceeners! Looks like lots of good stuff is happening with the code as of late. It's great to see this momentum! Here's some more action coming your way... - We all love Lucene, but most would agree that it tends to use a very

Re: Adding lock timeouts to write.lock

2003-08-14 Thread Dmitry Serebrennikov

There is another change around the locks that I have made in our copy of Lucene that has been working really well. Perhaps something like this already exists in the main code base (I haven't checked lately), but the idea was this: most of the problems that we were getting with locks were due to a

Re: Optimizing SegmentTermEnum (and friends)

2003-02-25 Thread Dmitry Serebrennikov

JUnit now works (it was not enough to have it in the lib dir, it actually had to be on the classpath). The patch passes the unit tests. I will think of the .prev optimization - about how to incorporate it. Also, my co-worker has run some gc test on our entire application, and the patch reduces t

Re: Optimizing SegmentTermEnum (and friends)

2003-02-25 Thread Dmitry Serebrennikov

Doug Cutting wrote: Dmitry Serebrennikov wrote: I typed ant test-unit and it said $ ant test-unit Buildfile: build.xml init: javacc_check: compile: demo: test: test-unit: BUILD SUCCESSFUL Total time: 4 seconds Does that mean the tests passed? Kind of quick... I think it didn't really work

Re: Optimizing SegmentTermEnum (and friends)

2003-02-25 Thread Dmitry Serebrennikov

Doug Cutting wrote: Dmitry Serebrennikov wrote: 1) Since I do not need the intermediate terms, it makes sense to try to have a method that skips to the right term without creating the intermediate Term objects. I have done a version of this yesterday and ended up seeing a factor of 2

Re: Optimizing SegmentTermEnum (and friends)

2003-02-25 Thread Dmitry Serebrennikov

Thanks for your reply, Doug. See below. Doug Cutting wrote: Dmitry Serebrennikov wrote: 1) Since I do not need the intermediate terms, it makes sense to try to have a method that skips to the right term without creating the intermediate Term objects. I have done a version of this yesterday

Optimizing SegmentTermEnum (and friends)

2003-02-25 Thread Dmitry Serebrennikov

Greetings, I've been running Lucene (inside of our application) in OptimizeIt to see if I can improve garbage generation and performance metrics of our app. Let me say right away that Lucene is great! :) The more experience I have with it the more I find how well it performs. However, perhaps

Re: FSDirectory patch for file renaming

2003-02-17 Thread Dmitry Serebrennikov

Matt Tucker wrote: Otis, Yesterday, we had a data set on a particular Windows box that would consistently bomb out with the rename error. That's what finally motivated me to make the patch. After applying the patch the rebuild worked perfectly. I also removed the renameTo method call and tested

Re: FSDirectory patch for file renaming

2003-02-17 Thread Dmitry Serebrennikov

Matt Tucker wrote: Advantages to this fix: * Makes indexing more reliable, especially under VM's where the renameTo method doesn't always work. * Uses no additional memory or resources when File.renameTo works normally. Disadvantages: * It always sucks to add workarounds to bugs in other libra

Re: Should Token be immutable?

2003-01-06 Thread Dmitry Serebrennikov

Otis Gospodnetic wrote: Ah, sorry about bringing up performance, I mixed that with another thread. Anyhow, I still think that setPosition offers a nice feature that some people may want to use. It was on a to do list for a while, and it was there because people requested it, so even though Lucen

Re: Should Token be immutable?

2003-01-06 Thread Dmitry Serebrennikov

Agree with Otis. -1 Otis Gospodnetic wrote: It sounds to me that having the ability to do that that point 13. in CHANGES states is more important than trying to only slightly decrease the number of temporary objects instantiated. By the way, have you observed or measured the difference in perfo

Re: How do I get TermPositions for a given document?

2002-10-24 Thread Dmitry Serebrennikov

sound great. Just add it to the contributions area in a project called TermPositions or something more clever if you have a better name. Let me know if you have any problems adding it as others may have time to help out. --Peter On Wednesday, October 23, 2002, at 03:28 PM, Dmitry Serebrennikov

Re: How do I get TermPositions for a given document?

2002-10-23 Thread Dmitry Serebrennikov

On Wed, 23 Oct 2002 16:16:58 Dmitry Serebrennikov wrote: Spencer, Dave wrote: I have an IndexReader and I want to get a TermPositions obj for a given document. Right now it seems that it only works the other way - you can only get TermPositions for a term, or globally for all terms. Basica

Re: How do I get TermPositions for a given document?

2002-10-23 Thread Dmitry Serebrennikov

Spencer, Dave wrote: I have an IndexReader and I want to get a TermPositions obj for a given document. Right now it seems that it only works the other way - you can only get TermPositions for a term, or globally for all terms. Basically I want to know the positions of all the words in a given doc

Re: Question: using boost for sorting

2002-10-16 Thread Dmitry Serebrennikov

Also, please consider that some applications may require multiple Similarity implementations in the same index. For example, I would like to be able to sort by relevance on most searches, but sometimes allow users to request that results were ordered by price or by some other field. I think it

Re: Are score values always between 0 and 1?

2002-10-16 Thread Dmitry Serebrennikov

he picture? Thanks again. Dmitry. Doug Cutting wrote: > Dmitry Serebrennikov wrote: > >> I know that the FAQ says that they are, but in at least one instance >> in my index it appears to be equal to 1.94something. Are the scores >> guaranteed to be between 0 and 1 >

Question: using boost for sorting

2002-10-14 Thread Dmitry Serebrennikov

Greetings Everyone, I'm thinking of trying to build something that manipulates a query score in order to achieve a sort order other than the default relevance sort. The idea is to create a new type of query: SortingQuery( Query query, String sortByField ) It would run the sub-query and return

Are score values always between 0 and 1?

2002-10-14 Thread Dmitry Serebrennikov

Greetings, I know that the FAQ says that they are, but in at least one instance in my index it appears to be equal to 1.94something. Are the scores guaranteed to be between 0 and 1, and if not, what would it take to make them such? Thanks. Dmitry.

Re: Reading terms performance

2002-09-05 Thread Dmitry Serebrennikov

Martin Sevigny wrote: >Lucene developers, > >If an application using Lucene wants to read the list of values for a >field, it must use (I think) the IndexReader.terms() method. But this >method is costly, because it returns all values for all fields, although >we could want only the values of a f

Re: [Bug 12137] New: - Can '*' or '?' symbol be used as the firstcharacter of a search?

2002-08-29 Thread Dmitry Serebrennikov

A SMOP? :) (+1 on the idea) Peter Carlson wrote: > I think this is a great idea. > > --Peter > > On Thursday, August 29, 2002, at 03:40 PM, Doug Cutting wrote: > >> Did my suggestion not make sense? >> >> I think we can make everyone happy here. By adding a parameter to >> the existing query

Re: Modifying document with unstored fields

2002-08-26 Thread Dmitry Serebrennikov

Victor Hadianto wrote: >On Mon, 26 Aug 2002 10:47, Dmitry Serebrennikov wrote: > > >>Victor Hadianto wrote: >>Yes, generally, there are two answers -- either make all fields stored >>or use some other database for the storage of the "master" documents.

Re: Modifying document with unstored fields

2002-08-25 Thread Dmitry Serebrennikov

Victor Hadianto wrote: >On Thu, 22 Aug 2002 23:14, Otis Gospodnetic wrote: > > >>That's the very top question/answer in Lucene FAQ at jGuru: >>http://www.jguru.com/faq/Lucene >> >> > >Hi Otis, > >Yep I realise that, but I think you haven't read my question closely. My >problem is not simpl

Re: Term Vector support part 2

2002-08-12 Thread Dmitry Serebrennikov

Maurits van Wijland wrote: >Hi Dmitry, > >I feel like a predator asking this... but is the term vector >code available for the latest release of Lucene? >Are you working on this by any chance? >Some of us here are really looking forward to this >great addition! > >regards, > >Maurits > > > > N

Re: batch indexing

2002-08-09 Thread Dmitry Serebrennikov

Doug Cutting wrote: > [ I've moved this discussion to lucene-dev. -drc ] > > Dmitry Serebrennikov wrote: > >> I was just thinking about doing something similar, but after looking >> at your code I thought couldn't the same thing be done by >> mani

Re: [ANN] NLucene 1.2b released

2002-07-12 Thread Dmitry Serebrennikov

So is this really a one-for-one port to C# or is it some sort of a wrapper around the lucene's original jar file? If it's C#, I'd be curious about any performance comparisons between the Java and the C# versions. Dmitry.

Re: Term Positions

2002-07-08 Thread Dmitry Serebrennikov

This is supported by the same code that does the document term vectors. The code is not currently in Lucene but has been working for a long time. I anticipate having some free time to integrate and submit this code within the next two weeks. Dmitry. none none wrote: >hi, >is there any way to h

Re: Code to provide vector representation for documents

2002-07-05 Thread Dmitry Serebrennikov

>Has the code that Dmitry wrote that Doug describes below been submitted yet? > >I know we were waiting for v 1.2, but that's done now. > >Thanks > >--Peter > > >-----Referenced email --- > >Dmitry Serebrennikov [[EMAIL PROTECTED]] has implemented a subs

Re: COMMENT REQUESTED: Lucene 1.2 Final Release

2002-06-25 Thread Dmitry Serebrennikov

I remember seeing something like that when I used an older version of Ant to compile Lucene. What Ant version were you using, Lex? Dmitry. Lex Lawrence wrote: > Maybe I'm off my rocker, or maybe I've overlooked something. > Regardless, if nobody else noticed anything I'll just drop it gracefu

Re: VOTE: Possible features for next release

2002-05-23 Thread Dmitry Serebrennikov

> > > >2. I see a lot of "problems" when searching and updating on the same index. Maybe it is >just me, but what I discovered is: > a) It is not possible to "update" a document; it is only possible to delete and re-add, >that means open a Reader, do a delete, close the reader, open a writer, add the >doc

Re: Adding a TermExpansionQuery

2002-05-16 Thread Dmitry Serebrennikov

Great idea! I would suggest the following considerations: - this should be implemented as an interface that can support multiple implementations (such as something based on a simple lookup table and also something based on a WordNet-style synonym database) - different implementations might be pr

Re: Serializable RAMDirectory

2002-05-02 Thread Dmitry Serebrennikov

Is there a good reason for setting serialVersionUID to 1 rather than to the output of serialver? FYI, serialver is a program supplied with all JDKs that I know of. It produces a number using the same algorithm that Java uses internally to create the serialVersionUID when none is given. The alg.

Re: InputStream handling problem

2002-04-26 Thread Dmitry Serebrennikov

Roman Rokytskyy wrote: >>I'm sorry, I should have been more specific. The file handle is only in >>the picture when FSInputStream is cloned. From what I can tell after a >>quick look, InputStream is responsible for buffering and it delegates to >>subclasses (via a call to readInternal) to refill

Re: InputStream handling problem

2002-04-25 Thread Dmitry Serebrennikov

Roman Rokytskyy wrote: >>Yes, I forgot about that one. It's even more interesting than that! The >>stream objects that Doug coded are not java.io streams. They are >>wrappers on top of those. Each clone maintains its own seek offset. >>Essentially, they share the same OS file handle but present

Re: InputStream handling problem

2002-04-25 Thread Dmitry Serebrennikov

Roman Rokytskyy wrote: >>So, I think Otis is right - it's really not a "problem", besides being >>an interesting design problem that is. There is an issue of whether it >>is a good practice to make use of OS-specific behavior in this way. >>Obviously, the portability suffers. I'm not sure if ther

Re: InputStream handling problem

2002-04-24 Thread Dmitry Serebrennikov

Otis Gospodnetic wrote: > >I don't know, are you sure that what you are seeing really is a >problem, that it is wrong to get rid of a file for which there is >interest? >It sounds logical, but maybe Doug wrote something that we can't find >that makes this an okay thing to do. >If this is a bug I

Re: Disk I/O in Lucene

2002-04-09 Thread Dmitry Serebrennikov

It might also be an interesting project to implement an NIODirectory, which would be just like the current FileDirectory, but would use the nio API. Lucene's abstraction of a Directory should make this possible, I think. Dmitry. Peter Carlson wrote: >Hi, > >Lucene does not use the new nio A

Re: Document Scoring

2002-04-04 Thread Dmitry Serebrennikov

> > > >Now, the class IndexSearcher, which extends Searcher, refers back to the >IndexReader. Here is the puzzling part: > >The method docFreq(Term t) and the method maxDoc() are both declared abstract in >IndexReader. > >Faced with this question, I was obliged to check out how it was done

Re: Lucene Sandbox now official

2002-04-03 Thread Dmitry Serebrennikov

Peter Carlson wrote: >Hi all, > > >The distinction between a contribution and a project is a little fuzzy, but >projects would be for functionality outside the scope of Lucene's current >API, where contributions would be able to be integrated into the current API >(i.e. Analyzers, queryParsers, .

Re: Action Item Vote Request

2002-03-26 Thread Dmitry Serebrennikov

Peter Carlson wrote: >Hi Brian, > >I have a suggestion that would be a little different, but I think would >accomplish the same function. I was looking at the current Jakarta CVS >repositories and it looks like we might create a sublevel under >Jakarta-lucene. > >Right now the CVS repository loo

Re: Action Item Vote Request

2002-03-26 Thread Dmitry Serebrennikov

freeze (unless bugs) >To move into a beta or release candidate stage, you must have a vote by the >committers (+3 votes). > >I am willing to help with the build process. I will work with Doug to help >handle these activities. This also would include updating the web site. > >I am +1. > +1, but I won't be able to help here at this time. Dmitry Serebrennikov


