Hi Gilles,

On Wed, May 29, 2019 at 11:18 PM Gilles Sadowski <[email protected]>
wrote:

> Hello.
>
> On Wed, May 29, 2019 at 12:24, Marco Neumann <[email protected]> wrote:
> >
> > I am evaluating the use of the Apache Commons Math Median class for
> > querying large data sets in another Apache project, Apache Jena.
> >
> > In my preliminary performance tests I was surprised to find that a simple
> > implementation of a median function with Arrays.sort() and a programmatic
> > selection of the median value yields much faster results
> > than Median().evaluate() or DescriptiveStatistics.getPercentile(50).
>
> :-(
>

No worries, I still consider Apache Commons Math a very valuable effort.


>
> > Since we only use this function for arrays of confirmed numbers
>
> What is a "confirmed number"?
>

That should probably read 'programmatically confirmed "numbers"' rather
than "confirmed number". I am not dealing with NaN or infinite values in
the sort at the moment.
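
A minimal sketch of such a sort-based median, for illustration only. The
class and method names are hypothetical, the input is assumed to contain
only finite values, and averaging the two middle values for even-length
input is an assumption rather than the exact code used in the tests:

```java
import java.util.Arrays;

public class SortMedian {
    // Median via a full sort. Assumes the caller has already excluded
    // NaN and infinite values, as described above.
    public static double median(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("empty input");
        }
        double[] sorted = values.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        if (sorted.length % 2 == 1) {
            return sorted[mid]; // odd length: the single middle element
        }
        // even length: mean of the two middle elements (assumed convention)
        return (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    public static void main(String[] args) {
        System.out.println(median(new double[] {5, 1, 4, 2, 3})); // 3.0
        System.out.println(median(new double[] {4, 1, 3, 2}));    // 2.5
    }
}
```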


> > is there a
> > particular benefit in using Apache Commons Math for this task or are we
> > better advised to use our own implementation here?
>
> There is ongoing work to refactor the "o.a.c.m.stat.descriptive" package
> of "Commons Math".  The new code will be in "Commons Statistics".[1]
> Your observation is an interesting data point for this task; could you
> please file a report in JIRA[2] and/or mention on the "dev" ML?
>

I can certainly file a report and will do so tomorrow. I am looking forward
to the results of the work on the new stats package!

Best,
Marco

> Thanks,
> Gilles
>
> [1] http://commons.apache.org/proper/commons-statistics/
> [2] https://issues.apache.org/jira/projects/STATISTICS/issues/STATISTICS-15?filter=allopenissues
>
> >
> > Thank You
>

-- 
Marco Neumann
KONA
