Hi guys! I bumped into a couple of issues when trying to sort a stream or calculate metrics on a Float field that contains values without a decimal part (e.g. 1.0, 0.0, etc.).
1. Issues with sorting. Consider this expression:

> sort(
>   list(
>     tuple(a=val(1.0)),
>     tuple(a=val(2.0)),
>     tuple(a=val(3.0))
>   ),
>   by="a desc"
> )

It executes the sort just fine and returns:

> "docs": [
>   {"a": 3},
>   {"a": 2},
>   {"a": 1}
> ]

The only minor issue at this point is that the float values have lost their original type and come back as integers; I'll get back to this later. Now let's run a simple calculation over the same stream and try to sort it:

> sort(
>   select(
>     list(
>       tuple(a=val(1.0)),
>       tuple(a=val(2.0)),
>       tuple(a=val(3.0))
>     ),
>     div(a, 2) as a
>   ),
>   by="a desc"
> )

This expression returns "EXCEPTION": "java.lang.Long cannot be cast to java.lang.Double". It fails because the div() function returns different data types for different tuples. If you execute just the select expression:

> select(
>   list(
>     tuple(a=val(1.0)),
>     tuple(a=val(2.0)),
>     tuple(a=val(3.0))
>   ),
>   div(a, 2) as a
> )

it returns tuples where the field "a" has mixed Long and Double data types:

> "docs": [
>   {"a": 0.5},
>   {"a": 1},
>   {"a": 1.5}
> ]

This is why sort stumbles over it. I think the root cause lies in the RecursiveEvaluator#normalizeOutputType method, which returns a Long if a BigDecimal value has zero scale:

> } else if(value instanceof BigDecimal){
>   BigDecimal bd = (BigDecimal)value;
>   if(bd.signum() == 0 || bd.scale() <= 0 || bd.stripTrailingZeros().scale() <= 0){
>     try{
>       return bd.longValueExact();
>     } catch(ArithmeticException e){
>       // value was too big for a long, so use a double which can handle scientific notation
>     }
>   }
>   return bd.doubleValue();
> }

I consider this a major bug: even when your source stream contains only Float/Double values, applying any arithmetic operation might produce a value without a decimal part, which is then converted to a Long and breaks sorting. Can you confirm that this is a bug, so that I'll create a ticket?
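To make the type flip easy to reproduce outside of Solr, here is a small standalone sketch (my own class and method names, not Solr code) of the BigDecimal branch quoted above, applied to the results of div(a, 2) for a = 1.0, 2.0, 3.0:

```java
import java.math.BigDecimal;

public class NormalizeDemo {

    // Standalone re-implementation of the BigDecimal branch of
    // RecursiveEvaluator#normalizeOutputType, for illustration only.
    static Number normalize(BigDecimal bd) {
        if (bd.signum() == 0 || bd.scale() <= 0 || bd.stripTrailingZeros().scale() <= 0) {
            try {
                // "1.0" has a zero fractional part, so this succeeds and yields a Long
                return bd.longValueExact();
            } catch (ArithmeticException e) {
                // value too big for a long; fall through to double
            }
        }
        return bd.doubleValue();
    }

    public static void main(String[] args) {
        for (double a : new double[] {1.0, 2.0, 3.0}) {
            BigDecimal quotient = BigDecimal.valueOf(a).divide(BigDecimal.valueOf(2));
            Number n = normalize(quotient);
            System.out.println(n + " -> " + n.getClass().getSimpleName());
        }
        // prints:
        // 0.5 -> Double
        // 1 -> Long
        // 1.5 -> Double
    }
}
```

The middle tuple (2.0 / 2 = 1.0) comes back as a Long while its neighbours stay Doubles, which is exactly the mixed-type stream that breaks the sort.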
The fact that Streaming Expressions engine heavily relies on the assumption that a stream will contain numeric values of the same type leads to subtle issues with calculating metrics. Consider this expression: > rollup( > list( > tuple(a=val(1.1), g=1), > tuple(a=val(2), g=1), > tuple(a=val(3.1), g=1) > ), > over="g", > min(a), > max(a), > sum(a), > avg(a) > ) (I showed earlier how you can get a stream of mixed types) It returns: > { > "max(a)": 2, > "avg(a)": 0.6666666666666666, > "min(a)": 2, > "sum(a)": 2, > "g": "1" > } As you can see the results are wrong for all metrics. All metrics considered only Long values from the source stream. In my case, it was value '2'. This happens because the implementation of all metrics holds separate containers for Long and Double values. For example MaxMetric#getValue: public Number getValue() { > if(longMax == Long.MIN_VALUE) { > return doubleMax; > } else { > return longMax; > } > } If a stream contained at least one Long among Doubles, the value of the longMax container would be returned. I consider this a severe design flaw and would like to get your perspective on this. Should I file a bug or I miss something? Can I expect that this will be fixed at some point? My ENV: solr-impl 7.7.1 5bf96d32f88eb8a2f5e775339885cd6ba84a3b58 - ishan - 2019-02-23 02:39:07 Thank you in advance! -- Best Regards, Alex Chornyi