Re: Extending solr analysis in index time

Ahmet Arslan Mon, 12 Jan 2015 10:50:39 -0800

Hi Ali,

Reading your example, I think your use case looks like TFIDFSimilarity with the
idf component replaced by your "importance weight". The tf component stays the
same.


https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html

I also suggest asking this on the Lucene mailing list. Someone familiar with the
similarity package can give more insight on this.
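
As a rough illustration of the idea (a sketch only, not Lucene's actual API),
the toy Python function below keeps a tf component and puts a per-term
importance weight where idf would normally go. The name weighted_tf_score, the
dictionaries, and the use of sqrt(freq) for tf (as Lucene's DefaultSimilarity
does) are assumptions made for this example.

```python
import math


def weighted_tf_score(term_freqs, importance_weights):
    """Sum sqrt(tf) * weight over the important terms of one document field.

    term_freqs: hypothetical mapping of term -> raw frequency in the field.
    importance_weights: hypothetical mapping of term -> importance weight,
    used here in place of idf.
    """
    score = 0.0
    for term, weight in importance_weights.items():
        freq = term_freqs.get(term, 0)      # terms absent from the field count as 0
        score += math.sqrt(freq) * weight   # tf component times importance weight
    return score


# "hello" occurs 4 times and has weight 2.0; "world" has no weight and is ignored.
example = weighted_tf_score({"hello": 4, "world": 1}, {"hello": 2.0})
```

In a real custom similarity the weight lookup would have to happen inside the
similarity class itself; this sketch only shows the arithmetic.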

Ahmet



On Monday, January 12, 2015 6:54 PM, Jack Krupansky <jack.krupan...@gmail.com> 
wrote:
Could you clarify what you mean by "Lucene reverse index"? That's not a
term I am familiar with.

-- Jack Krupansky


On Mon, Jan 12, 2015 at 1:01 AM, Ali Nazemian <alinazem...@gmail.com> wrote:

> Dear Jack,
> Thank you very much.
> Yes, I was thinking of a function query for sorting, but I have two problems
> in this case: 1) a function query does the processing at query time, which I
> don't want; 2) I also want to have the score field for retrieving and showing
> to users.
>
> Dear Alexandre,
> Here is some more explanation about the business behind the question:
> I am going to provide a field for each document; let's refer to it as
> "document_score". I am going to fill this field based on information
> that can be extracted from the Lucene reverse index. Assume I have a list of
> terms, called important terms, and I am going to extract the term frequency
> of each term in this list for each document. To be honest, I
> want to use the term frequency for calculating "document_score".
> "document_score" should be stored, since I am going to retrieve this field
> for each document. I also want to sort on "document_score" if the user
> prefers that.
> I hope I have conveyed my point.
> Best regards.
>
>
> On Mon, Jan 12, 2015 at 12:53 AM, Jack Krupansky <jack.krupan...@gmail.com>
> wrote:
>
> > Won't function queries do the job at query time? You can add or multiply
> > the tf*idf score by a function of the term frequency of arbitrary terms,
> > using the tf, mul, and add functions.
> >
> > See:
> > https://cwiki.apache.org/confluence/display/solr/Function+Queries
> >
> > -- Jack Krupansky
> >
> > On Sun, Jan 11, 2015 at 10:55 AM, Ali Nazemian <alinazem...@gmail.com>
> > wrote:
> >
> > > Dear Jack,
> > > Hi,
> > > I think you misunderstood my need. I don't want to change the default
> > > scoring behavior of Lucene (tf-idf). I just want to have another field to
> > > sort on for some specific queries (not all of the search business);
> > > however, I am aware of Lucene payloads.
> > > Thank you very much.
> > >
> > > On Sun, Jan 11, 2015 at 7:15 PM, Jack Krupansky <jack.krupan...@gmail.com>
> > > wrote:
> > >
> > > > You would do that with a custom similarity (scoring) class. That's an
> > > > expert feature. In fact a SUPER-expert feature.
> > > >
> > > > Start by completely familiarizing yourself with how TF*IDF similarity
> > > > already works:
> > > >
> > > > http://lucene.apache.org/core/4_10_3/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
> > > >
> > > > And to use your custom similarity class in Solr:
> > > >
> > > > https://cwiki.apache.org/confluence/display/solr/Other+Schema+Elements#OtherSchemaElements-Similarity
> > > >
> > > >
> > > > -- Jack Krupansky
> > > >
> > > > On Sun, Jan 11, 2015 at 9:04 AM, Ali Nazemian <alinazem...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi everybody,
> > > > >
> > > > > I am going to add some analysis to Solr at index time. Here is what I
> > > > > have in mind:
> > > > > Suppose I have two different fields in the Solr schema, field "a" and
> > > > > field "b". I am going to use the created reverse index in such a way
> > > > > that some terms are considered important, and tell Lucene to calculate
> > > > > a value based on the frequency of these terms in each document. For
> > > > > example, let the word "hello" be considered an important word with a
> > > > > weight of "2.0". Suppose the term frequency of this word is 3 in field
> > > > > "a" and 6 in field "b" for document 1. Therefore the score value would
> > > > > be 2*3+(2*6)^2. I want to calculate this score based on these fields
> > > > > and put it in the index for retrieval. My question is: how can I do
> > > > > such a thing? At first I considered using the term component to
> > > > > calculate this value from outside and put it back into the Solr index,
> > > > > but it seems that is not efficient enough.
> > > > >
> > > > > Thank you very much.
> > > > > Best regards.
> > > > >
> > > > > --
> > > > > A.Nazemian
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > A.Nazemian
> > >
> >
>
>
>
> --
> A.Nazemian
>
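
For reference, the score from the original question (weight 2.0 for "hello",
term frequency 3 in field "a" and 6 in field "b") can be worked through with a
small Python sketch. The function name document_score and the dictionary
inputs are hypothetical, and summing the expression over several important
terms is an assumption; the thread only gives the single-term case.

```python
def document_score(weights, tf_a, tf_b):
    """Compute weight*tf_a + (weight*tf_b)^2, summed over the important terms.

    weights: term -> importance weight (hypothetical input).
    tf_a, tf_b: term -> term frequency in field "a" and field "b" of one document.
    """
    total = 0.0
    for term, weight in weights.items():
        linear_part = weight * tf_a.get(term, 0)          # e.g. 2 * 3
        squared_part = (weight * tf_b.get(term, 0)) ** 2  # e.g. (2 * 6)^2
        total += linear_part + squared_part
    return total


# Values taken from the example in the question.
score = document_score({"hello": 2.0}, {"hello": 3}, {"hello": 6})
print(score)  # 2*3 + (2*6)^2 = 150.0
```

A value computed this way outside Solr could then be written to a stored
"document_score" field when the document is indexed, which is the approach the
question describes as possibly not efficient enough.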
