Yes, they are. As I am working with decision trees (iterating over subsets),
I will not have more than (say) 30 distinct values for the strings.
Thank you! I will take a look at your link. Milan already suggested
PooledDataArrays
(here: https://groups.google.com/forum/#!topic/julia-stats/RCSA2179NLo ).
I
Are the strings coming from a finite collection?
Does https://mlbasejl.readthedocs.org/en/latest/datapre.html#label-processing help?
It is probably an implementation of "convert my data to integers (via a
dictionary) and continue working with integer Arrays".
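
For reference, the "convert to integers via a dictionary" idea can be sketched by hand as below. This is only a minimal illustration, not the MLBase or PooledDataArrays implementation; the function name encode_labels and the sample strings are made up for the example, and the syntax is current Julia rather than the 2015 release.

```julia
# Hypothetical helper: map each distinct string to an integer code.
function encode_labels(values::Vector{String})
    codes = Dict{String,Int}()   # label => integer code
    levels = String[]            # integer code => label (for decoding)
    encoded = Vector{Int}(undef, length(values))
    for (i, v) in enumerate(values)
        # get! assigns the next free code the first time a label is seen
        encoded[i] = get!(codes, v) do
            push!(levels, v)
            length(levels)
        end
    end
    return encoded, levels
end

# Made-up sample data
encoded, levels = encode_labels(["red", "green", "red", "blue"])
```

After this, encoded is a plain Vector{Int} that can be used wherever an integer Array is expected, and levels[code] recovers the original string.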
On Tuesday, March 31, 2015 at 12:40
I also note that when I use DataFrames & DataArrays the performance is
much, much worse compared to when I use Arrays.
I did not take notes, but the performance difference (between indexing and
looping over the DataArray) was marginal for DataArrays.
This was the reason why I changed the whole c
On Monday, March 30, 2015 at 09:46 -0500, Tim Holy wrote:
> Nope, that's the best way to do it. Yet another illustration that it's
> trivial to write code that beats any attempt to cleverly mix together
> library functions.
>
> I'd get rid of the @inbounds, though, it doesn't serve any purpose
Nope, that's the best way to do it. Yet another illustration that it's trivial
to write code that beats any attempt to cleverly mix together library
functions.
I'd get rid of the @inbounds, though, it doesn't serve any purpose for a
function that runs a long loop. Inlining too much can actually
After some thought, I defined a custom function (looping over the array)
for the mean.
This is a bit cumbersome, but fine given that it is about 3 times faster and
allocates only 288 bytes of memory (compared to 1 GB!) in the example
below.
If anyone knows how to do this faster I would appr
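
The poster's example is not reproduced in this excerpt, so the following is only a guess at the kind of function meant: a mean over a subset of an array, computed in one loop so that no temporary copy of the subset is allocated. The name subset_mean and the Bool mask argument are assumptions made for illustration. Following the advice above, @inbounds is left out.

```julia
# Hypothetical sketch: mean of x[i] over the positions where keep[i] is true.
# An indexed version such as mean(x[keep]) would first copy the selected
# elements into a new array; this loop only keeps two scalars.
function subset_mean(x::AbstractVector{<:Real}, keep::AbstractVector{Bool})
    total = 0.0
    n = 0
    for i in eachindex(x, keep)   # errors if the two arrays do not line up
        if keep[i]
            total += x[i]
            n += 1
        end
    end
    return total / n              # NaN when nothing is selected
end

# Made-up sample data
m = subset_mean([1.0, 2.0, 3.0, 4.0], [true, false, true, false])
```

Whether this matches the reported speedup depends on the data and the Julia version, so it is worth timing both variants on the real arrays.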