Hello everybody,
I am just curious about the following case.
Currently, I create a boolean AND query which loads payloads.
In some cases Lucene loads payloads but does not return any hits.
Therefore, I assume that payloads are loaded directly with each doc ID from
the posting list before
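To make the case concrete, the query is built roughly like this (a sketch using the payload query API, PayloadTermQuery as in 3.x; the field and terms are just examples):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.payloads.MaxPayloadFunction;
import org.apache.lucene.search.payloads.PayloadTermQuery;

public class PayloadAndQuery {
    // Builds a boolean AND of two payload-scoring term queries.
    public static Query build() {
        BooleanQuery query = new BooleanQuery();
        query.add(new PayloadTermQuery(new Term("body", "lucene"), new MaxPayloadFunction()), Occur.MUST);
        query.add(new PayloadTermQuery(new Term("body", "payload"), new MaxPayloadFunction()), Occur.MUST);
        return query;
    }
}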
Hello everybody,
As far as I know, Lucene processes documents document-at-a-time (DAAT). Depending on
the query, either the intersection or the union of the posting lists is calculated.
For the intersection, only documents occurring in all posting lists are scored. In the
union case, every document is scored, which makes it a more expensive
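To illustrate the difference, here is a minimal sketch of the intersection (AND) case over two posting lists, written simply as sorted doc-ID arrays (nothing Lucene-specific, just the general idea):

import java.util.ArrayList;
import java.util.List;

public class DaatIntersection {
    // Returns the doc IDs present in both sorted posting lists.
    // Only these candidates would be scored in the AND (intersection) case.
    public static List<Integer> intersect(int[] postingsA, int[] postingsB) {
        List<Integer> hits = new ArrayList<Integer>();
        int i = 0, j = 0;
        while (i < postingsA.length && j < postingsB.length) {
            if (postingsA[i] == postingsB[j]) {
                hits.add(postingsA[i]);   // document occurs in all lists, so score it
                i++; j++;
            } else if (postingsA[i] < postingsB[j]) {
                i++;                      // advance the list that is behind
            } else {
                j++;
            }
        }
        return hits;
    }
}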
Hi Robert,
the adapted codec is running but it seems to be incredibly slow. Will take
some time ;)
Here are some performance results:

Indexing scheme                     Index Size
Standard 4.0  W Freq    W Pos       28.1 GB
Standard 4.0  W/O Freq  W/O Pos      6.2 GB
Standard 3.0  W Freq    W Pos       28.1 GB
Standard 3.0  W/O Freq  W/O Pos      6.2 GB

I also indexed one time with Lucene 3.0. Are those sizes really completely
the same?
Regards
Alex
Hello everybody,
I am currently testing several new Lucene 4.0 codec implementations to
compare them with my own solution.
The difference is that I am only indexing frequencies and not positions. I
would like to have this for the other codecs as well. I know there was already a
post on this topic
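For reference, the field setup I mean looks roughly like this (a sketch against the 4.0 field API, FieldType/IndexOptions, as I understand it; the field name is just an example):

import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.FieldInfo.IndexOptions;

public class FreqOnlyField {
    // Builds a field that is indexed with frequencies but without positions.
    public static Field create(String text) {
        FieldType type = new FieldType(TextField.TYPE_NOT_STORED);
        type.setIndexOptions(IndexOptions.DOCS_AND_FREQS); // no positions, no offsets
        type.freeze();
        return new Field("body", text, type);
    }
}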
Hello everybody,
I am currently experimenting with Lucene 4.0 and would like to add payloads.
The payload should only be added once per term, at its first position. My current
code looks like this:
public final boolean incrementToken() throws java.io.IOException {
String term =
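A sketch of how the complete filter might look against the 4.0 attribute API (the payload bytes are placeholders, and the per-term bookkeeping is just one possible way to do it):

import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
import org.apache.lucene.util.BytesRef;

public final class FirstOccurrencePayloadFilter extends TokenFilter {
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    private final PayloadAttribute payloadAtt = addAttribute(PayloadAttribute.class);
    private final Set<String> seen = new HashSet<String>();

    public FirstOccurrencePayloadFilter(TokenStream input) {
        super(input);
    }

    @Override
    public final boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
            return false;
        }
        String term = termAtt.toString();
        if (seen.add(term)) {
            // first occurrence of this term: attach the payload
            payloadAtt.setPayload(new BytesRef(new byte[] { 1 })); // placeholder bytes
        } else {
            payloadAtt.setPayload(null); // later occurrences carry no payload
        }
        return true;
    }

    @Override
    public void reset() throws IOException {
        super.reset();
        seen.clear();
    }
}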
Hi,
does Lucene support any early-termination techniques during query
processing?
On the forum I only found some information about TimeLimitedCollector. Are
there more implementations?
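For completeness, one variant I can imagine is a collector that gives up after a fixed number of hits per segment; a rough sketch against the 3.0 Collector API (note it keeps the first N documents in index order per segment, not the top N by score; I think the built-in class is actually called TimeLimitingCollector in 3.x/4.x):

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.CollectionTerminatedException;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

// Stops collecting after maxHits documents per segment by throwing
// CollectionTerminatedException, which the IndexSearcher catches before
// continuing with the next segment.
public class EarlyTerminatingCollector extends Collector {
    private final Collector delegate;
    private final int maxHits;
    private int count;

    public EarlyTerminatingCollector(Collector delegate, int maxHits) {
        this.delegate = delegate;
        this.maxHits = maxHits;
    }

    @Override
    public void setScorer(Scorer scorer) throws IOException {
        delegate.setScorer(scorer);
    }

    @Override
    public void collect(int doc) throws IOException {
        if (count++ >= maxHits) {
            throw new CollectionTerminatedException();
        }
        delegate.collect(doc);
    }

    @Override
    public void setNextReader(IndexReader reader, int docBase) throws IOException {
        count = 0; // per-segment budget
        delegate.setNextReader(reader, docBase);
    }

    @Override
    public boolean acceptsDocsOutOfOrder() {
        return delegate.acceptsDocsOutOfOrder();
    }
}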
Regards
Alex
Hello everybody,
I am currently unsure how stored data is written to and loaded from the index.
For every term of a document I want to store some binary data, but only once
and not for every position!
Therefore I am not sure whether payloads or stored fields are the better solution
(or the not implemented
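For the stored-field variant, what I have in mind is roughly this (a sketch against the 3.0 API; the field name and the encoding of the per-term data are placeholders):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class BinaryStoredFieldExample {
    // Stores one binary blob per document (e.g. the per-term data for the
    // whole document, encoded once), retrievable without positions or payloads.
    public static Document withTermData(Document doc, byte[] encodedTermData) {
        doc.add(new Field("termdata", encodedTermData, Field.Store.YES));
        return doc;
    }

    public static byte[] readTermData(Document storedDoc) {
        return storedDoc.getBinaryValue("termdata");
    }
}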
Hello everybody,
I am currently using Lucene 3.0.2 with payloads. I store extra information
about the term in the payloads, such as frequencies, and therefore I don't need
the frequencies and term positions that Lucene normally stores. I would like to
set f.setOmitTermFreqAndPositions(true), but then I am
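For reference, the setting I mean (a minimal sketch of the 3.0 field setup; as far as I understand, payloads are written per position, so omitting positions would drop them as well):

import org.apache.lucene.document.Field;

public class OmitFreqAndPositionsExample {
    public static Field makeField(String text) {
        Field f = new Field("body", text, Field.Store.NO, Field.Index.ANALYZED);
        // Drops frequencies AND positions for this field. Payloads are written
        // per position, so omitting positions means the payloads are gone too.
        f.setOmitTermFreqAndPositions(true);
        return f;
    }
}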
Hello everybody,
I used a small indexing example from Lucene in Action and can compile and
run the program under Eclipse. If I compile and run it from the console, I get
this error:
java.lang.IllegalArgumentException: Could not find implementing class for
Hello Alexander,
isn't it enough to add the classpath through -cp? If I don't use -cp I can't
compile my project. I thought that if it compiles without errors, all sources are
correctly added. In Eclipse I added the Lucene sources the same way (which
works) and I also tried using the jar file. Therefore
Hello Uwe,
I recompiled some classes in the Lucene sources manually. Now it's running fine!
Something must have gone wrong there.
Thank you very much!
Best regards
Alex
Hello everybody,
I am currently indexing Wikipedia dumps and creating an index for versioned
document collections. So far everything is working fine, but I never thought
that single Wikipedia articles would reach a size of around 2 GB!
One article, for example, has 2 versions with an
Hello Pulkit,
thank you for your answer and excuse me for my late reply. I am currently
working on the payload stuff and have implemented my own Analyzer and
TokenFilter for adding custom payloads. As far as I understand, I can add a
payload for every term occurrence and write it into the posting
Hi again,
my payloads are working fine, as I figured out now (I hadn't seen the
nextPosition method). But I really have problems with adding the bitvectors.
Currently I am creating them during tokenization. Therefore, as already
mentioned, they are only completely created when all fields have been tokenized
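For reference, reading the payloads back per position looks roughly like this for me (a sketch against the 3.0.x TermPositions API):

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermPositions;

public class PayloadReader {
    // Walks the postings of one term and reads the payload at each position.
    public static void dumpPayloads(IndexReader reader, String field, String text) throws IOException {
        TermPositions tp = reader.termPositions(new Term(field, text));
        try {
            while (tp.next()) {
                int freq = tp.freq();
                for (int i = 0; i < freq; i++) {
                    tp.nextPosition();
                    if (tp.isPayloadAvailable()) {
                        byte[] payload = tp.getPayload(new byte[tp.getPayloadLength()], 0);
                        // ... decode the payload for doc tp.doc() here ...
                    }
                }
            }
        } finally {
            tp.close();
        }
    }
}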
Hello everybody,
I would like to implement the paper Compact Full-Text Indexing of Versioned
Document Collections [1] by Torsten Suel for my diploma thesis in Lucene.
The basic idea is to create a two-level index structure. On the first level
a document is identified by its document ID with a
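To make the structure concrete, this is roughly how I currently picture the first level (my own reading of the idea, not code from the paper): each document-level posting carries a bitvector over the document's versions marking which versions contain the term.

import java.util.BitSet;

public class TwoLevelPosting {
    // First level: the term occurs in this (versioned) document.
    public final int docId;
    // Second level: bit i is set if version i of the document contains the term.
    public final BitSet versions;

    public TwoLevelPosting(int docId, int versionCount) {
        this.docId = docId;
        this.versions = new BitSet(versionCount);
    }
}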
Hello everybody,
I read the paper http://www2008.org/papers/pdf/p387-zhangA.pdf (Performance
of Compressed Inverted List Caching in Search Engines) and now I am unsure
how Lucene implements its structures on the hard disk. I am using Windows as
OS and therefore I implemented FSDirectory based on
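Concretely, I mean the Directory layer (a minimal sketch; FSDirectory.open picks a concrete implementation for the platform, and SimpleFSDirectory or MMapDirectory can also be instantiated directly):

import java.io.File;
import java.io.IOException;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class DirectorySetup {
    public static Directory openIndex(File indexDir) throws IOException {
        // Lets Lucene pick a concrete FSDirectory implementation for the platform...
        return FSDirectory.open(indexDir);
        // ...or force one explicitly, e.g.: return new org.apache.lucene.store.SimpleFSDirectory(indexDir);
    }
}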