Hello everyone,
I am trying to build a system that computes distances between documents
based on pre-computed numerical feature vectors.
I set up Solr (cloud) 8.7 and I am using streaming expressions. My
documents look like this, where the feature field is a pfloat with
multiValued set to true:
{
"id":"1",
"feature":[
0.1,
0.5,
0.6,
1.7]
},
{
"id":"2",
"feature":[
0.5,
0.1,
0.7,
0.9]
},
{
"id":"3",
"feature":[
-0.5,
0.9,
1.5,
0.2]
}
I want to create a matrix so I can then use the distance() function to
compute the distances for the columns of a matrix. The documentation
provides an example of what I am interested in, by defining the vectors on
the fly:
let(a=array(20, 30, 40),
b=array(21, 29, 41),
c=array(31, 40, 50),
d=matrix(a, b, c),
c=distance(d))
By transposing the matrix I can just as easily compute the distances
between its rows, which is what I want.
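To make the goal concrete, here is a minimal Python sketch, computed outside Solr, of the result I am after: the pairwise distances between the three feature vectors shown earlier. It assumes Euclidean distance, which as far as I understand is what distance() uses by default. The helper name pairwise_distances is made up for illustration.

```python
import math

# Feature vectors of documents 1, 2 and 3 from the example above.
features = {
    "1": [0.1, 0.5, 0.6, 1.7],
    "2": [0.5, 0.1, 0.7, 0.9],
    "3": [-0.5, 0.9, 1.5, 0.2],
}


def pairwise_distances(vectors):
    """Map every (id_a, id_b) pair to the Euclidean distance between them."""
    ids = list(vectors)
    return {
        (a, b): math.dist(vectors[a], vectors[b])  # requires Python 3.8+
        for a in ids
        for b in ids
    }


distances = pairwise_distances(features)
```

The dictionary plays the role of the distance matrix: it is symmetric and has zeros on the diagonal.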
However, now I want to extract the numerical features from a feature field
indexed in Solr. The documentation explains how to create a matrix from
numerical values stored in some fields:
let(
a=random(collection1, q="market:A", rows="5000", fl="price_f"),
b=random(collection1, q="market:B", rows="5000", fl="price_f"),
c=random(collection1, q="market:C", rows="5000", fl="price_f"),
d=random(collection1, q="market:D", rows="5000", fl="price_f"),
e=col(a, price_f),
f=col(b, price_f),
g=col(c, price_f),
h=col(d, price_f),
i=matrix(e, f, g, h),
j=sumRows(i))
However, in my case, each document already holds an array of float
values. So I tried the following:
let(
s1=search(test,q="id:1",fl="feature"), f1=col(s1, feature),
s2=search(test,q="id:2",fl="feature"), f2=col(s2, feature),
s3=search(test,q="id:3",fl="feature"), f3=col(s3, feature),
m=matrix(f1,f2,f3)
)
But I get this error:
{
"result-set": {
"docs": [
{
"EXCEPTION": "Failed to evaluate expression matrix(f1,f2,f3) - Numeric value expected but found type java.util.ArrayList for value [0.1,0.5,0.6,1.7]",
"EOF": true,
"RESPONSE_TIME": 5
}
]
}
}
When I inspect f3, I see that it is an array of arrays, which is why I
think matrix() fails here. I have searched a lot for a way to create a
matrix from float vectors stored in a field of my documents, and I still
cannot find a solution. What I could do is extract the vectors, then build
the arrays and the matrix on the fly, but I would like to be able to do it
in one request. Moreover, I find it curious that I cannot create the matrix
directly from the results of a normal search. For instance, I would prefer
to do something like this:
let(s=search(test,q="*",fl="feature,id"), m=col(s,feature))
which returns:
{
"result-set": {
"docs": [
{
"m": [
[
0.1,
0.5,
0.6,
1.7
],
[
0.5,
0.1,
0.7,
0.9
],
[
-0.5,
0.9,
1.5,
0.2
]
]
},
{
"EOF": true,
"RESPONSE_TIME": 3
}
]
}
}
and be able to use the matrix I obtain here. But again, I was not able to
perform matrix operations on "m".
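For completeness, the workaround I mention above (extracting the vectors and building the arrays on the fly) could look roughly like the Python sketch below. It only builds the expression string from vectors that were already fetched in a previous request; it does not talk to Solr. The helper name build_distance_expression and the variable names v0, v1, ... are made up for illustration, and I am assuming that transpose() can be applied to the matrix so that distance() compares the original vectors.

```python
def build_distance_expression(vectors):
    """Build a let() expression that computes distances between the vectors.

    Each vector becomes an array(); the arrays form the rows of a matrix,
    which is then transposed because distance() works on columns.
    """
    assignments = []
    names = []
    for i, vec in enumerate(vectors):
        name = f"v{i}"  # hypothetical variable name
        values = ", ".join(repr(float(x)) for x in vec)
        assignments.append(f"{name}=array({values})")
        names.append(name)
    assignments.append(f"m=transpose(matrix({', '.join(names)}))")
    assignments.append("d=distance(m)")
    return "let(" + ",\n    ".join(assignments) + ")"


# Vectors as they would come back from a first, plain search request.
expr = build_distance_expression([
    [0.1, 0.5, 0.6, 1.7],
    [0.5, 0.1, 0.7, 0.9],
    [-0.5, 0.9, 1.5, 0.2],
])
print(expr)
```

This works around the problem but needs two round trips, which is exactly what I would like to avoid.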
Does anyone know an elegant way to create a matrix from the numerical
vectors stored in my feature field?
Thank you.
--
Xavier Favory
Music Technology Group
Universitat Pompeu Fabra