Similarity class and searchPayloads

2011-06-08 Thread Alex vB
Hello everybody, I am just curious about the following case. Currently, I create a boolean AND query which loads payloads. In some cases it occurs that Lucene loads payloads but does not return hits. Therefore, I assume that payloads are directly loaded with each doc ID from the posting list before

Lucene query processing

2011-04-26 Thread Alex vB
Hello everybody, As far as I know Lucene processes documents DAAT. Depending on the query either the intersection or union is calculated. For the intersection only documents occurring in all posting lists are scored. In the union case every document is scored which makes it a more expensive
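The intersection/union distinction described in this message can be illustrated with a minimal sketch of document-at-a-time (DAAT) traversal over two posting lists. This is not Lucene's implementation: the class name `DaatSketch`, the method names, and the representation of a posting list as a sorted `int[]` of unique doc IDs are all assumptions made for illustration only.

```java
import java.util.Arrays;

// Hypothetical illustration of DAAT merging; not Lucene code.
// Each posting list is assumed to be a sorted array of unique doc IDs.
class DaatSketch {

    // Intersection (AND): always advance the list with the smaller current
    // doc ID; only doc IDs present in both lists are emitted for scoring.
    static int[] intersect(int[] a, int[] b) {
        int[] out = new int[Math.min(a.length, b.length)];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                out[n++] = a[i];
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i++;
            } else {
                j++;
            }
        }
        return Arrays.copyOf(out, n);
    }

    // Union (OR): every doc ID from either list is emitted once, so more
    // documents reach the scoring step than in the intersection case.
    static int[] union(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0, j = 0, n = 0;
        while (i < a.length || j < b.length) {
            int next;
            if (j >= b.length || (i < a.length && a[i] < b[j])) {
                next = a[i++];          // only list a has this doc ID
            } else if (i >= a.length || b[j] < a[i]) {
                next = b[j++];          // only list b has this doc ID
            } else {
                next = a[i];            // doc ID occurs in both lists
                i++;
                j++;
            }
            out[n++] = next;
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        int[] a = {1, 3, 5, 7, 9};
        int[] b = {3, 4, 5, 9, 12};
        System.out.println(Arrays.toString(intersect(a, b))); // [3, 5, 9]
        System.out.println(Arrays.toString(union(a, b)));     // [1, 3, 4, 5, 7, 9, 12]
    }
}
```

In this toy input the intersection yields three candidates for scoring while the union yields seven, which is the cost difference the message refers to.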

Re: New codecs keep Freq skip/omit Pos

2011-04-23 Thread Alex vB
Hi Robert, the adapted codec is running but it seems to be incredibly slow. Will take some time ;) Here are some performance results: Indexing scheme Index Size

Re: New codecs keep Freq skip/omit Pos

2011-04-22 Thread Alex vB
I also indexed once with Lucene 3.0. Are those sizes really exactly the same? Standard 4.0, W Freq, W Pos: 28.1 GB; Standard 4.0, W/O Freq, W/O Pos: 6.2 GB; Standard 3.0, W Freq, W Pos: 28.1 GB; Standard 3.0, W/O Freq, W/O Pos: 6.2 GB. Regards Alex

New codecs keep Freq skip/omit Pos

2011-04-21 Thread Alex vB
Hello everybody, I am currently testing several new Lucene 4.0 codec implementations to compare them with my own solution. The difference is that I am only indexing frequencies and not positions. I would like to have this for the other codecs. I know there was already a post on this topic

Lucene 4.0 Payloads

2011-03-17 Thread Alex vB
Hello everybody, I am currently experimenting with Lucene 4.0 and would like to add payloads. Payload should only be added once per term on the first position. My current code looks like this: public final boolean incrementToken() throws java.io.IOException { String term =

Early Termination

2011-03-15 Thread Alex vB
Hi, is Lucene capable of any early termination techniques during query processing? On the forum I only found some information about TimeLimitedCollector. Are there more implementations? Regards Alex

How are stored Fields/Payloads loaded

2011-02-28 Thread Alex vB
Hello everybody, I am currently unsure how stored data is written to and loaded from the index. I want to store some binary data for every term of a document, but only once and not for every position! Therefore I am not sure if Payloads or stored Fields are the better solution (Or the not implemented

Storing payloads without term-position and frequency

2011-02-02 Thread Alex vB
Hello everybody, I am currently using Lucene 3.0.2 with payloads. I store extra information in the payloads about the term like frequencies and therefore I don't need frequencies and term positions stored normally by Lucene. I would like to set f.setOmitTermFreqAndPositions(true) but then I am

Could not find implementing class

2011-01-25 Thread Alex vB
Hello everybody, I used a small indexing example from Lucene in Action and can compile and run the program in Eclipse. If I try to compile and run it from the console I get this error: java.lang.IllegalArgumentException: Could not find implementing class for

Re: Could not find implementing class

2011-01-25 Thread Alex vB
Hello Alexander, isn't it enough to add the classpath through -cp? If I don't use -cp I can't compile my project. I thought that after compiling without errors all sources are correctly added. In Eclipse I added the Lucene sources the same way (which works) and I also tried using the jar file. Therefore

RE: Could not find implementing class

2011-01-25 Thread Alex vB
Hello Uwe, I recompiled some classes manually in the Lucene sources. Now it's running fine! Something went wrong there. Thank you very much! Best regards Alex

Indexing large XML dumps

2011-01-03 Thread Alex vB
Hello everybody, I am currently indexing Wikipedia dumps and creating an index for versioned document collections. So far everything is working fine, but I never thought that single Wikipedia articles would reach a size of around 2 GB! One article for example has 2 versions with an

Re: Implementing indexing of Versioned Document Collections

2010-11-16 Thread Alex vB
Hello Pulkit, thank you for your answer and excuse my late reply. I am currently working on the payload stuff and have implemented my own Analyzer and TokenFilter for adding custom payloads. As far as I understand I can add a payload for every term occurrence and write this into the posting

Re: Implementing indexing of Versioned Document Collections

2010-11-16 Thread Alex vB
Hi again, my Payloads are working fine as I figured out now (haven't seen the nextPosition method). I really have problems with adding the bitvectors. Currently I am creating them during tokenization. Therefore, as already mentioned, they are only completely created when all fields are tokenized

Implementing indexing of Versioned Document Collections

2010-11-09 Thread Alex vB
Hello everybody, I would like to implement the paper "Compact Full-Text Indexing of Versioned Document Collections" [1] by Torsten Suel in Lucene for my diploma thesis. The basic idea is to create a two-level index structure. On the first level a document is identified by document ID with a

Detailed file handling on hard disk

2010-09-03 Thread Alex vB
Hello everybody, I read the paper "Performance of Compressed Inverted List Caching in Search Engines" (http://www2008.org/papers/pdf/p387-zhangA.pdf) and now I am unsure how Lucene implements its structure on the hard disk. I am using Windows as OS and therefore I implemented FSDirectory based on