Re: TermDocs

2013-07-09 Thread lukai
The code snippet you posted is the implementation of MatchAllQuery; it only gives you the live doc IDs in the specified segment. If you want extra information about a term, e.g. freq or payload, you need to do some calculation. The good thing is that the FST is sorted, so you can maintain a list of TermsEnu

Re: raw cosine similarity

2013-07-21 Thread lukai
It's not hard to implement one. Store the term values of your documents as payloads. Then create your own Query and override the score function with your cosine similarity logic. The catch is that you need to watch performance, especially for terms that have a very high DF (document frequency). It may decrease your
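The similarity logic itself can be sketched without any Lucene types. The following is a minimal, hypothetical helper (the class name `CosineSketch` and the `Map`-based vector representation are assumptions, not Lucene API) that computes the raw cosine between two sparse term-weight vectors. Wiring it into a custom Query and reading the weights from payloads is not shown.

```java
import java.util.Map;

/** Hypothetical helper, not part of Lucene: raw cosine similarity over sparse term-weight maps. */
final class CosineSketch {
    private CosineSketch() {}

    /** Returns dot(a, b) / (|a| * |b|), or 0.0 if either vector has zero length. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        // Walk the smaller map so the dot product touches as few entries as possible.
        Map<String, Double> small = a.size() <= b.size() ? a : b;
        Map<String, Double> large = (small == a) ? b : a;

        double dot = 0.0;
        for (Map.Entry<String, Double> e : small.entrySet()) {
            Double other = large.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }

        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // an empty or all-zero vector has no direction
        }
        return dot / (normA * normB);
    }

    /** Euclidean length of a sparse vector. */
    static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double w : v.values()) {
            sum += w * w;
        }
        return Math.sqrt(sum);
    }
}
```

In a real scorer the document norm would probably be precomputed at index time, since recomputing it per hit is part of the cost the post warns about for high-DF terms.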

Re: Tokenize String using Operators(Logical Operator, : operator etc)

2013-07-23 Thread lukai
JavaCC or ANTLR would be your choice. On Tue, Jul 23, 2013 at 4:19 AM, dheerajjoshim wrote: > Greetings, > > I am looking for a way to tokenize the String based on Logical operators > > Below String needs to be tokenized as > *arg1:aaa,bbb AND arg2:ccc OR arg3:ddd,eee,fff* > > Token 1: arg1:aaa,bbb
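For a flat expression like the one quoted, a plain regex split may be enough before reaching for a grammar. The sketch below is not a JavaCC or ANTLR parser; it is a swapped-in, much simpler technique. The class name `OperatorTokenizerSketch` is hypothetical, and because the expected token list is cut off after Token 1, returning the operators as their own tokens is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: splits a flat expression on whitespace-delimited AND / OR. */
final class OperatorTokenizerSketch {
    // Assumption: operators are upper-case and surrounded by whitespace.
    private static final Pattern OPERATOR = Pattern.compile("\\s+(AND|OR)\\s+");

    private OperatorTokenizerSketch() {}

    /** Returns clauses and operators in order, e.g. [clause, AND, clause, OR, clause]. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = OPERATOR.matcher(input);
        int start = 0;
        while (m.find()) {
            tokens.add(input.substring(start, m.start()).trim()); // clause before the operator
            tokens.add(m.group(1));                               // the operator itself
            start = m.end();
        }
        tokens.add(input.substring(start).trim());                // trailing clause
        return tokens;
    }
}
```

This does no validation and cannot handle parentheses, quoting, or precedence; those are the cases where a generated parser becomes the better choice.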

Re: Adding custom weights to individual terms

2014-02-13 Thread lukai
Hi, Rune: Per your requirement, you can generate a separate field for the document before sending the document to Lucene. Let's say the name is: score_field. The content of this field looks like this: Doc 1#score_field: Lucene:0.7 is:0 ... Doc 2#score_field: Lucene:0.5 is:0 ... Store the field with
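The `term:weight` layout described above can be read back with a few lines of plain Java. This is only a sketch of the parsing step; the class name `ScoreFieldSketch` is hypothetical, and since the post is cut off before it says how the field is stored, nothing here touches the index.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: parses "term:weight term:weight ..." into an ordered map. */
final class ScoreFieldSketch {
    private ScoreFieldSketch() {}

    static Map<String, Double> parse(String fieldContent) {
        Map<String, Double> weights = new LinkedHashMap<>();
        for (String token : fieldContent.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty input yields a single empty token
            }
            // Use the last ':' so a term that itself contains ':' still parses.
            int sep = token.lastIndexOf(':');
            if (sep <= 0 || sep == token.length() - 1) {
                throw new IllegalArgumentException("Expected term:weight, got: " + token);
            }
            String term = token.substring(0, sep);
            double weight = Double.parseDouble(token.substring(sep + 1));
            weights.put(term, weight);
        }
        return weights;
    }
}
```

For example, `ScoreFieldSketch.parse("Lucene:0.7 is:0")` yields a map with `Lucene` mapped to 0.7 and `is` mapped to 0.0.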

Re: Lucene - Get docId or document by using the DocValue

2014-02-19 Thread lukai
Try facets with Lucene. On Wed, Feb 19, 2014 at 5:58 AM, manalgandhi wrote: > Hi, > > Say the docValue of a particular field is known. Is it possible to get the > list of docId that match the DocValue from the index? > > I'm using Lucene 4.6.0. > > Regards, > Manal > > > > -- > View this messag

Re: [Suggestions Required] 110 Concurrency users indexing on Lucene dont finish in 200 ms.

2014-02-21 Thread lukai
According to your description and the files you attached, are you using the Oracle JDK and OpenJDK on Windows and Linux respectively? Could you check the GC distribution with a profiler? I suspect your Linux JDK settings cause more GC, which slows down your indexing process. You can try to increase your JVM heap

Re: Questions about lucene TokenStream

2012-11-04 Thread lukai
Hmmm, the reason I asked this question has to do with the implementation of CharTermAttribute. It seems the tokenizer sets the token read from the reader into it, and the following TokenStream can also get this instance. My concern is a multi-threaded environment: another thread can also change the conte

Re: Questions about lucene TokenStream

2012-11-04 Thread lukai
Hi, thanks for the reply. Could you elaborate on "The AttributeFactory creates a new one for every new TokenStream instance."? Because I can only find the implementation like this: private static Class<? extends AttributeImpl> getClassForInterface(Class<? extends Attribute> attClass) { final WeakReference<Class<? extends AttributeImpl>> ref = attClassImplMap.get(attCla

Re: Questions about lucene TokenStream

2012-11-04 Thread lukai
thanks, Uwe. I missed it. On Sun, Nov 4, 2012 at 3:04 PM, Uwe Schindler wrote: > As explained in my first eMail, the class of the implementation is cached, > not the instance. The factory returns a new instance of the cached class. > > Uwe > > > > lukai schrieb: >
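The distinction Uwe describes (the implementation class is cached, each call returns a fresh instance) can be illustrated outside Lucene. The sketch below is a simplified stand-in, not Lucene's AttributeFactory: `TermAttr`, `TermAttrImpl`, and `CachingFactorySketch` are hypothetical names, and the explicit `register` step replaces whatever lookup the real factory performs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical attribute interface, standing in for something like CharTermAttribute. */
interface TermAttr {
    void set(String value);
    String get();
}

/** Hypothetical implementation holding per-instance mutable state. */
final class TermAttrImpl implements TermAttr {
    private String value = "";
    @Override public void set(String value) { this.value = value; }
    @Override public String get() { return value; }
}

/** Caches interface -> implementation CLASS; every create() call builds a NEW instance. */
final class CachingFactorySketch {
    private final Map<Class<?>, Class<?>> implClasses = new ConcurrentHashMap<>();

    <T> void register(Class<T> iface, Class<? extends T> impl) {
        implClasses.put(iface, impl);
    }

    <T> T create(Class<T> iface) {
        Class<?> impl = implClasses.get(iface); // only the Class object is shared
        if (impl == null) {
            throw new IllegalArgumentException("No implementation registered for " + iface.getName());
        }
        try {
            // A fresh object per call, so callers never share mutable state.
            return iface.cast(impl.getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + impl.getName(), e);
        }
    }
}
```

Two calls to `create(TermAttr.class)` return distinct objects, so setting a value on one does not affect the other. That is why a shared class cache by itself does not create the cross-thread interference raised earlier in the thread.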

Re: Lucene 4.0.0 - find term position.

2012-12-06 Thread lukai
terms = fields.terms(...); termsEnum = terms.iterator(null); termsEnum.seekExact(...); DocsAndPositionsEnum docsPosEnum = termsEnum.docsAndPositions(...); You can get the information from "docsPosEnum". On Thu, Dec 6, 2012 at 2:28 AM, wrote: > Hi all, > I am new with Lucene. > I try to understand

Re: Semi-structured queries

2012-12-07 Thread lukai
Wrap your own parser, e.g. org/apache/lucene/queryparser/classic/QueryParser.jj. On Fri, Dec 7, 2012 at 1:47 PM, Wu, Stephen T., Ph.D. wrote: > I’ve been trying to do semi-structured queries & query parsing. In other > words, you could have XML snippets mixed in with plain terms, e.g. a query

Re: Long query optimisation: using some terms for scoring only

2012-12-11 Thread lukai
I implemented WAND in Solr for our own project. It can improve performance a lot. For your reference: http://dl.acm.org/citation.cfm?id=956944 But it requires changing the index a little. Thanks, On Tue, Dec 11, 2012 at 6:19 AM, Matthew Willson wrote: > Hi all > > I'm currently benchmark

Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-12-12 Thread lukai
Do we have any plan to decouple the indexing process? Lucene was designed for search, but judging by the questions people ask in this thread, it sometimes goes beyond search functionality. For example, we might want to customize our scoring function based on payloads. Sometimes I don't need to store TF/IDF information.

Re: About query result cache.

2012-12-16 Thread lukai
Works, thanks Yonik. On Sun, Dec 16, 2012 at 10:34 PM, Yonik Seeley wrote: > On Mon, Dec 17, 2012 at 12:58 AM, lukai wrote: > > Hi, guys: > > Does the query plugin implementation impact caching? I have implemented a > new > > query parser which just takes the input qu

Re: Querying with Term Frequency Vectors

2013-03-04 Thread lukai
Store the term value as payload, and score with it. On Mon, Mar 4, 2013 at 10:10 AM, Sharon Tam wrote: > Hi, > > I have generated my own term-frequency vector representations of documents > and would like to be able to query these with term-frequency vector queries > instead of a text-string que

Re: potential query performance issue

2013-03-15 Thread lukai
I implemented WAND with Solr/Lucene. So far there is no performance issue. There is no native support for this functionality; you need to implement it yourself. On Fri, Mar 15, 2013 at 10:09 AM, Lin Ma wrote: > Hello guys, > > Supposing I have one million documents, and each document ha