I agree this is very verbose. I didn't even realize you could index a
multidimensional array into a multi-value field until now. Knowing this, it
makes sense to support matrix creation directly from multi-value arrays.
I'll add this when I get some time.
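
Until then, the repetitive valueAt() lines can be generated rather than
typed by hand. The following is a minimal Python sketch, not part of Solr:
the helper name build_distance_expr is hypothetical, and the collection
name "test" and field name "feature" are taken from the expression quoted
below. It only assembles a string from the functions already used in this
thread (search, col, valueAt, matrix, transpose, distance, cosine).

```python
def build_distance_expr(n_docs, collection="test", field="feature"):
    """Build the let() expression from this thread for n_docs documents.

    Hypothetical helper: it does not talk to Solr, it only returns the
    streaming expression as text.
    """
    if n_docs < 2:
        raise ValueError("need at least two documents to compute distances")

    names = [f"f{i + 1}" for i in range(n_docs)]
    lines = [f'    s=search({collection},q="*",fl="{field}"),']

    # One valueAt() per document: index i selects the i-th document's vector.
    for i, name in enumerate(names):
        lines.append(f"    {name}=valueAt(col(s, {field}),{i}),")

    # Rows of the matrix are documents; transpose so that distance()
    # compares documents as columns, as in the quoted expression.
    lines.append(f"    m=transpose(matrix({','.join(names)})),")
    lines.append("    d=distance(m,cosine())")
    return "let(\n" + "\n".join(lines) + "\n)"


# Prints the three-document expression in the same shape as the one below.
print(build_distance_expr(3))
```

The resulting string can then be sent to the stream handler in place of
the hand-written expression.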




Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, Apr 29, 2021 at 10:46 AM FAVORY , XAVIER <[email protected]>
wrote:

> Hi Joel,
>
> Thank you for pointing me to that part of the documentation. valueAt() is
> exactly what I needed here.
> However, as you point out, there seems to be no way to directly get the
> matrix from a multidimensional array.
> As a consequence, my streaming expression is very verbose and quite long
> for my purpose (I perform this over a thousand documents), but it actually
> works when done that way (and I get rid of the extra queries to get the
> ids from a text search, for instance):
>
> let(
>     s=search(test,q="*",fl="feature"),
>     f1=valueAt(col(s, feature ),0),
>     f2=valueAt(col(s, feature ),1),
>     f3=valueAt(col(s, feature ),2),
>     m=transpose(matrix(f1,f2,f3)),
>     d=distance(m,cosine())
> )
>
>
> Thank you again,
> Best,
>
> Xavier
>
> On Thu, 29 Apr 2021 at 16:04, Joel Bernstein <[email protected]> wrote:
>
> > That's interesting, it seems like you've indexed a matrix into a field.
> >
> > If that's the case I think you'll need to access the arrays using the
> > index as described here:
> >
> >
> > https://solr.apache.org/guide/8_8/vector-math.html#getting-values-by-index
> >
> > Then you can create a matrix from the arrays.
> >
> > I guess we need to add a way to materialize the matrix directly from a
> > multidimensional array.
> >
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> >
> > On Tue, Apr 27, 2021 at 6:00 PM FAVORY , XAVIER <[email protected]>
> > wrote:
> >
> > > Hello everyone,
> > >
> > > I am currently trying to create a system for performing distance
> > > computation of different documents based on some pre-computed numerical
> > > feature vector.
> > >
> > > > I set up Solr (cloud) 8.7 and I am using streaming expressions. I have
> > > > documents as such, with the feature field being pfloat with multiValued
> > > > set to True:
> > >
> > >       {
> > >         "id":"1",
> > >         "feature":[
> > >           0.1,
> > >           0.5,
> > >           0.6,
> > >           1.7],
> > >       ,
> > >       {
> > >         "id":"2",
> > >         "feature":[
> > >           0.5,
> > >           0.1,
> > >           0.7,
> > >           0.9],
> > >       },
> > >       {
> > >         "id":"3",
> > >         "feature":[
> > >          -0.5,
> > >           0.9,
> > >           1.5,
> > >           0.2],
> > >       },
> > >
> > > I want to create a matrix so I can then use the distance() function to
> > > compute the distances for the columns of a matrix. The documentation
> > > > provides an example of what I am interested in, by defining the vectors
> > > > on the fly:
> > >
> > > let(a=array(20, 30, 40),
> > >     b=array(21, 29, 41),
> > >     c=array(31, 40, 50),
> > >     d=matrix(a, b, c),
> > >     c=distance(d))
> > >
> > > By transposing the matrix I can easily perform the distance between the
> > > rows, so I can get what I want.
> > >
> > > > However, now I want to extract the numerical features from a feature
> > > > field indexed in Solr. The documentation explains how to create a matrix
> > > > from numerical values stored in some fields:
> > >
> > > let(
> > >     a=random(collection1, q="market:A", rows="5000", fl="price_f"),
> > >     b=random(collection1, q="market:B", rows="5000", fl="price_f"),
> > >     c=random(collection1, q="market:C", rows="5000", fl="price_f"),
> > >     d=random(collection1, q="market:D", rows="5000", fl="price_f"),
> > >     e=col(a, price_f),
> > >     f=col(b, price_f),
> > >     g=col(c, price_f),
> > >     h=col(d, price_f),
> > >     i=matrix(e, f, g, h),
> > >     j=sumRows(i))
> > >
> > > However, in my case, I already have an array of float values for each
> > > document. So I try to do it that way:
> > >
> > > let(
> > >     s1=search(test,q="id:1",fl="feature"), f1=col(s1, feature),
> > >     s2=search(test,q="id:2",fl="feature"), f2=col(s2, feature),
> > >     s3=search(test,q="id:3",fl="feature"), f3=col(s3, feature),
> > >     m=matrix(f1,f2,f3)
> > > )
> > >
> > > But I get this error:
> > >
> > > {
> > >   "result-set": {
> > >     "docs": [
> > >       {
> > >         "EXCEPTION": "Failed to evaluate expression matrix(f1,f2,f3) -
> > > Numeric value expected but found type java.util.ArrayList for value
> > > [0.1,0.5,0.6,1.7]",
> > >         "EOF": true,
> > >         "RESPONSE_TIME": 5
> > >       }
> > >     ]
> > >   }
> > > }
> > >
> > > > When I inspect what I get as f3, I see that I have an array of arrays,
> > > > which is why I think it is failing here to create the matrix. I've been
> > > > searching a lot on how to create a matrix from float vectors stored in a
> > > > field of my documents, and I still cannot find any solution. What I could
> > > > do is extract the vectors, create them on the fly, and construct the
> > > > vectors and matrix, but I would like to be able to do it in one request.
> > > > Moreover, I find it really curious that I cannot directly create the
> > > > matrix on the results of a normal search. For instance, I would prefer to
> > > > do something like this:
> > >
> > > > let(s=search(test,q="*",fl="feature,id"), m=col(s,feature))
> > >
> > > which returns:
> > >
> > > {
> > >   "result-set": {
> > >     "docs": [
> > >       {
> > >         "m": [
> > >           [
> > >             0.1,
> > >             0.5,
> > >             0.6,
> > >             1.7
> > >           ],
> > >           [
> > >             0.5,
> > >             0.1,
> > >             0.7,
> > >             0.9
> > >           ],
> > >           [
> > >             -0.5,
> > >             0.9,
> > >             1.5,
> > >             0.2]
> > >           ]
> > >         ]
> > >       },
> > >       {
> > >         "EOF": true,
> > >         "RESPONSE_TIME": 3
> > >       }
> > >     ]
> > >   }
> > > }
> > >
> > > > and be able to use the matrix I obtain here. But again, I was not able
> > > > to perform matrix operations on "m".
> > >
> > > Does anyone know any elegant way to create a matrix from my numerical
> > > vectors stored in my feature field?
> > >
> > >
> > > Thank you.
> > > --
> > > Xavier Favory
> > > Music Technology Group
> > > Universitat Pompeu Fabra
> > >
> >
>
>
> --
> Xavier Favory
> Music Technology Group
> Universitat Pompeu Fabra
>
