The code snippet you posted is the implementation of MatchAllQuery; it only
gives you the live doc IDs in the specified segment. If you want to get
extra information about a term, e.g. freq or payload, you need to do some
calculation. The good thing is that the FST is sorted, so you can maintain a list of
TermsEnu
It's not hard to implement one. Store the term values of your document as
payloads. Then create your own Query and override the score function with
your cosine similarity logic.
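As a rough illustration of the cosine similarity logic mentioned above, here is a minimal plain-Java sketch. It does not use any Lucene API: the class name CosineSketch and the representation of a document as a map from term to weight (the values you would store as payloads) are assumptions made only for this example.

```java
import java.util.Map;

/** Hypothetical helper for illustration; not part of Lucene. */
final class CosineSketch {
    private CosineSketch() {}

    /** Cosine similarity between two sparse term-weight vectors. */
    static double cosine(Map<String, Double> query, Map<String, Double> doc) {
        double dot = 0.0;
        // Only terms present in both vectors contribute to the dot product.
        for (Map.Entry<String, Double> e : query.entrySet()) {
            Double w = doc.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w;
            }
        }
        double norms = norm(query) * norm(doc);
        // An all-zero (or empty) vector has no direction; treat similarity as 0.
        return norms == 0.0 ? 0.0 : dot / norms;
    }

    /** Euclidean length of a sparse vector. */
    private static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double w : v.values()) {
            sum += w * w;
        }
        return Math.sqrt(sum);
    }
}
```

Inside a custom scorer, the document side of this computation would come from the payloads read for each matching term rather than from an in-memory map.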
The problem here is that you need to watch out for performance, especially for
terms that have a very high DF. It may decrease your
JavaCC or ANTLR would be a good choice.
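For a flat expression like the one in the quoted question, a regular expression may be enough before reaching for a parser generator. The following plain-Java sketch is only an assumption about what the desired tokens are (clauses and operators, in order); the class name LogicalSplitSketch is hypothetical. It does not handle parentheses, quoting, or operator precedence, which is where JavaCC or ANTLR become the better fit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only. */
final class LogicalSplitSketch {
    // An operator is the word AND or OR surrounded by whitespace.
    private static final Pattern OPERATOR = Pattern.compile("\\s+(AND|OR)\\s+");

    private LogicalSplitSketch() {}

    /** Splits the input into clauses and operators, preserving their order. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = OPERATOR.matcher(input);
        int start = 0;
        while (m.find()) {
            tokens.add(input.substring(start, m.start()).trim()); // clause before the operator
            tokens.add(m.group(1));                               // the operator itself
            start = m.end();
        }
        tokens.add(input.substring(start).trim());                // trailing clause
        return tokens;
    }
}
```

Calling LogicalSplitSketch.tokenize on the string from the question yields the clauses with the AND and OR operators between them.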
On Tue, Jul 23, 2013 at 4:19 AM, dheerajjoshim wrote:
> Greetings,
>
> I am looking for a way to tokenize the String based on logical operators
>
> Below String needs to be tokenized as
> *arg1:aaa,bbb AND arg2:ccc OR arg3:ddd,eee,fff*
>
> Token 1: arg1:aaa,bbb
Hi, Rune:
Per your requirement, you can generate a separate field for the document
before sending the document to Lucene. Let's say the name is score_field. The
content of this field looks like this:
Doc 1#score_field:
Lucene:0.7 is:0 ...
Doc 2#score_field:
Lucene:0.5 is:0 ...
Store the field with
Try facets with Lucene.
On Wed, Feb 19, 2014 at 5:58 AM, manalgandhi wrote:
> Hi,
>
> Say the docValue of a particular field is known. Is it possible to get the
> list of docId that match the DocValue from the index?
>
> I'm using Lucene 4.6.0.
>
> Regards,
> Manal
According to your description and the files you attached, are you using the
Oracle JDK and OpenJDK on Windows and Linux respectively?
Could you check the GC distribution with a profiler? I suspect your Linux
JDK settings cause more GC, which slows down your indexing process.
You can try to increase your JVM Heap
Hmmm, the reason I asked this question concerns the implementation of
CharTermAttribute.
It seems the tokenizer sets the token read from the reader into it, and the
following TokenStream can also get this instance. My concern is a
multi-threaded environment: another thread can also change the conte
Hi, thanks for the reply. Could you elaborate on "The AttributeFactory creates
a new one for every new TokenStream instance."? I can only find an
implementation like this:
private static Class<? extends AttributeImpl> getClassForInterface(Class<? extends Attribute> attClass) {
final WeakReference<Class<? extends AttributeImpl>> ref =
attClassImplMap.get(attCla
Thanks, Uwe. I missed it.
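The distinction in the quoted reply (the implementation class is cached, not the instance) can be sketched in plain Java. This is not Lucene's code: ClassCacheSketch, TermAttr and TermAttrImpl are hypothetical names, and the "interface name plus Impl" lookup is only an assumption chosen for the illustration.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical illustration; not Lucene's AttributeFactory. */
final class ClassCacheSketch {
    /** Hypothetical attribute interface with per-instance mutable state. */
    interface TermAttr { StringBuilder buffer(); }

    static final class TermAttrImpl implements TermAttr {
        private final StringBuilder buffer = new StringBuilder();
        @Override public StringBuilder buffer() { return buffer; }
    }

    // The cache maps an interface to its implementation CLASS, never to an instance.
    private static final Map<Class<?>, WeakReference<Class<?>>> IMPL_CLASSES = new WeakHashMap<>();

    private ClassCacheSketch() {}

    private static synchronized Class<?> implClassFor(Class<?> attClass) throws ClassNotFoundException {
        WeakReference<Class<?>> ref = IMPL_CLASSES.get(attClass);
        Class<?> clazz = (ref == null) ? null : ref.get();
        if (clazz == null) {
            // Assumed naming convention for this sketch: interface name + "Impl".
            clazz = Class.forName(attClass.getName() + "Impl", true, attClass.getClassLoader());
            IMPL_CLASSES.put(attClass, new WeakReference<>(clazz));
        }
        return clazz;
    }

    /** Every call returns a fresh object of the cached implementation class. */
    static <T> T newInstance(Class<T> attClass) {
        try {
            Object impl = implClassFor(attClass).getDeclaredConstructor().newInstance();
            return attClass.cast(impl);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("No implementation for " + attClass.getName(), e);
        }
    }
}
```

Two calls to newInstance(TermAttr.class) return two distinct objects of the same class, so state written through one is not visible through the other.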
On Sun, Nov 4, 2012 at 3:04 PM, Uwe Schindler wrote:
> As explained in my first eMail, the class of the implementation is cached,
> not the instance. The factory returns a new instance of the cached class.
>
> Uwe
>
>
>
> lukai schrieb:
>
Terms terms = fields.terms(...);
TermsEnum termsEnum = terms.iterator(null);
termsEnum.seekExact(...);
DocsAndPositionsEnum docsPosEnum = termsEnum.docsAndPositions(...);
You can get the information from "docsPosEnum".
On Thu, Dec 6, 2012 at 2:28 AM, wrote:
> Hi all,
> I am new with Lucene.
> I try to understand
Wrap your own parser,
e.g. org/apache/lucene/queryparser/classic/QueryParser.jj.
On Fri, Dec 7, 2012 at 1:47 PM, Wu, Stephen T., Ph.D. wrote:
> I’ve been trying to do semi-structured queries & query parsing. In other
> words, you could have XML snippets mixed in with plain terms, e.g. a query
I have implemented WAND in Solr for our own project. It can improve
performance a lot. For your reference:
http://dl.acm.org/citation.cfm?id=956944
But it requires changing the index a little bit.
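To give a feel for the idea, here is a simplified, in-memory sketch of WAND-style top-k retrieval in plain Java. It is not the Solr implementation mentioned above and uses no Lucene API: WandSketch and Postings are hypothetical names, and the per-term score upper bound (maxScore) is the kind of extra information that would have to be added to the index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical in-memory illustration of WAND-style pruning. */
final class WandSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** One term's postings: sorted doc ids, per-doc scores, and their maximum. */
    static final class Postings {
        final int[] docs;
        final double[] scores;
        final double maxScore;
        int pos = 0;

        Postings(int[] docs, double[] scores) {
            this.docs = docs;
            this.scores = scores;
            this.maxScore = Arrays.stream(scores).max().orElse(0.0);
        }
        int doc() { return pos < docs.length ? docs[pos] : NO_MORE_DOCS; }
        double score() { return scores[pos]; }
        void advance(int target) { while (doc() < target) pos++; }
    }

    private WandSketch() {}

    /** Returns the ids of the k best-scoring documents, best first. */
    static List<Integer> topK(List<Postings> terms, int k) {
        List<Integer> result = new ArrayList<>();
        if (k <= 0) return result;
        // Min-heap of {score, doc}; its head is the current entry threshold.
        PriorityQueue<double[]> heap =
                new PriorityQueue<>(Comparator.comparingDouble((double[] e) -> e[0]));
        List<Postings> cursors = new ArrayList<>(terms);
        while (true) {
            cursors.sort(Comparator.comparingInt(Postings::doc));
            double threshold = heap.size() < k ? Double.NEGATIVE_INFINITY : heap.peek()[0];
            // Pivot: first cursor where the summed upper bounds can beat the threshold.
            double bound = 0.0;
            int pivot = -1;
            for (int i = 0; i < cursors.size(); i++) {
                if (cursors.get(i).doc() == NO_MORE_DOCS) break;
                bound += cursors.get(i).maxScore;
                if (bound > threshold) { pivot = i; break; }
            }
            if (pivot < 0) break; // no remaining document can enter the top k
            int pivotDoc = cursors.get(pivot).doc();
            if (cursors.get(0).doc() == pivotDoc) {
                // All leading cursors sit on the pivot document: score it fully.
                double score = 0.0;
                for (Postings p : cursors) {
                    if (p.doc() == pivotDoc) { score += p.score(); p.advance(pivotDoc + 1); }
                }
                if (heap.size() < k) {
                    heap.add(new double[] {score, pivotDoc});
                } else if (score > heap.peek()[0]) {
                    heap.poll();
                    heap.add(new double[] {score, pivotDoc});
                }
            } else {
                // Documents before the pivot cannot beat the threshold: skip them.
                for (int i = 0; i < pivot; i++) cursors.get(i).advance(pivotDoc);
            }
        }
        List<double[]> hits = new ArrayList<>(heap);
        hits.sort(Comparator.comparingDouble((double[] e) -> -e[0]));
        for (double[] h : hits) result.add((int) h[1]);
        return result;
    }
}
```

The saving comes from the skip branch: documents that appear only in low-impact terms are never scored once the heap is full.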
Thanks,
On Tue, Dec 11, 2012 at 6:19 AM, Matthew Willson wrote:
> Hi all
>
> I'm currently benchmark
Do we have any plan to decouple the indexing process?
Lucene was designed for search, but judging by the questions people ask on
this list, the use cases sometimes go beyond search functionality. For example, we might want to
customize our scoring function based on payloads. Sometimes I don't need to
store TF/IDF information.
It works, thanks Yonik.
On Sun, Dec 16, 2012 at 10:34 PM, Yonik Seeley wrote:
> On Mon, Dec 17, 2012 at 12:58 AM, lukai wrote:
> > Hi, guys:
> > Does a queryplugin implementation impact caching? I have implemented a new
> > query parser which just take the input qu
Store the term value as payload, and score with it.
On Mon, Mar 4, 2013 at 10:10 AM, Sharon Tam wrote:
> Hi,
>
> I have generated my own term-frequency vector representations of documents
> and would like to be able to query these with term-frequency vector queries
> instead of a text-string que
I have implemented WAND with Solr/Lucene. So far there is no performance
issue. There is no native support for this functionality; you need to
implement it yourself.
On Fri, Mar 15, 2013 at 10:09 AM, Lin Ma wrote:
> Hello guys,
>
> Supposing I have one million documents, and each document ha
16 matches