Re: [julia-users] fastest way to sum columns in a dataframe

2014-03-02 Thread John Myles White
I’m still a little shaky about what’s stored in `levels`. In general, the secrets to better performance are: (1) Index by column only once, extracting each column as a single chunk, then loop over that column. (2) Make sure you look at the `.data` components of a DataArray, rather than doing simple indexi
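
Point (1) can be illustrated in plain Julia. This is a minimal sketch under stated assumptions, not the thread's code: a `Dict{Symbol,Any}` stands in for the DataFrame (so each lookup is only known to return `Any`), the function names are hypothetical, and the DataArrays-specific `.data` field from point (2) is not shown. The first function does one table lookup per element; the second looks each column up once, asserts its concrete type, and then loops over plain `Float64` vectors.

```julia
# Hypothetical stand-in for a DataFrame: column name => column vector,
# stored with an abstract value type (Any), as a loosely typed table would.
table = Dict{Symbol,Any}(
    :a => [1.0, 2.0, 3.0],
    :b => [10.0, 20.0, 30.0],
    :c => [100.0, 200.0, 300.0],
)

# Slow pattern (for contrast): one table lookup per element, and the
# compiler only knows that each lookup returns `Any`.
function rowsum_per_element(table, names, i)
    s = 0.0
    for name in names
        s += table[name][i]
    end
    return s
end

# Faster pattern: look each column up once, assert its concrete type,
# then run a tight loop over plain Float64 vectors.
function rowsums_by_column(table, names)
    cols = Vector{Float64}[table[name]::Vector{Float64} for name in names]
    n = length(first(cols))
    out = zeros(n)
    for col in cols
        for i in 1:n
            out[i] += col[i]
        end
    end
    return out
end

rowsums_by_column(table, [:a, :b, :c])  # [111.0, 222.0, 333.0]
```

The type assertion `::Vector{Float64}` fails loudly if a column has a different type, which is usually preferable to silently running slow, generic code.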

2014-03-02 Thread Jason Solack
So I'm guessing I'm doing something wrong; this is much slower. I've simplified my code a little to help show what is going on: for c1 = levels[1], c2 = levels[2], c3 = levels[3], c4 = levels[4], c5 = levels[5], c6 = levels[6], c7 = levels[7], c8 = levels[8], c9 = levels[9], c

2014-03-02 Thread Jason Solack
I will give this a shot. Thank you for the reply and for all your work on Julia/DataFrames. It's much appreciated!

On Sunday, March 2, 2014 12:13:22 PM UTC-5, John Myles White wrote:
> I’m a little fuzzy still, but I think the answer is probably still that the problem you’re hitting is the ind

2014-03-02 Thread John Myles White
I’m a little fuzzy still, but I think the answer is probably still that the problem you’re hitting is that indexing into the DataFrame doesn’t give the compiler enough information to know that the return type of the index is always a Float64. So you’ll want to try some of the tricks described in the thread I
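
One widely used trick for this kind of inference problem is a "function barrier"; it is not necessarily the one described in the thread referenced above, whose link is cut off here. The sketch below assumes, as before, a hypothetical `Dict{Symbol,Any}` standing in for the DataFrame, and the function names are illustrative. The outer function only knows that each lookup returns `Any`, but every call to the inner kernel dispatches on the runtime type, so the hot loop is compiled for concrete `Vector{Float64}` arguments.

```julia
# Inner kernel: when called with concrete argument types such as
# Vector{Float64}, Julia compiles a specialized, type-stable version.
function add_column!(out::Vector{Float64}, col::Vector{Float64})
    # eachindex(out, col) also checks that both vectors have matching indices
    @inbounds for i in eachindex(out, col)
        out[i] += col[i]
    end
    return out
end

# Outer function: `table[name]` is only known to be `Any` at compile time,
# but each call to `add_column!` dispatches on the value's runtime type,
# so the loop itself runs on concrete Float64 vectors (the "barrier").
function rowsums_with_barrier(table::Dict{Symbol,Any}, names)
    n = length(table[first(names)])
    out = zeros(n)
    for name in names
        add_column!(out, table[name])
    end
    return out
end

table = Dict{Symbol,Any}(:x => [1.0, 2.0], :y => [0.5, 0.25])
rowsums_with_barrier(table, [:x, :y])  # [1.5, 2.25]
```

Running `@code_warntype` (from `InteractiveUtils`, loaded by default in the REPL) on the outer and inner functions shows where values are inferred only as `Any`.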

2014-03-02 Thread Jason Solack
The DataFrame contains floats, and I'd ultimately like to have an array of length nrow(data) holding the row-wise sum of those 13 columns (the column combination changes with each iteration). Is that enough detail? I've implemented the entire algorithm in C++, and at this point Julia is a bit slower, but I hav
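
For the goal stated here (one sum per row over a changing set of float columns), a sketch under the assumption that the float columns have been copied once into a plain `Matrix{Float64}`, outside the iteration loop, might look like this. The function name and the small data matrix are hypothetical. The loop keeps the row index innermost because Julia arrays are column-major, so each column is read contiguously.

```julia
# Hypothetical setup: float columns copied once into a plain matrix
# (rows = observations, columns = variables).
data = [1.0  2.0  3.0  4.0;
        5.0  6.0  7.0  8.0;
        9.0 10.0 11.0 12.0]

# Row-wise sum of the selected columns; the result has one entry per row.
# The inner loop runs over rows, reading each column contiguously.
function selected_rowsums(data::Matrix{Float64}, cols::AbstractVector{Int})
    out = zeros(size(data, 1))
    for j in cols
        for i in axes(data, 1)
            out[i] += data[i, j]
        end
    end
    return out
end

selected_rowsums(data, [1, 3, 4])  # [8.0, 20.0, 32.0]

# A shorter, allocation-heavier equivalent using Base functions:
alt = vec(sum(view(data, :, [1, 3, 4]); dims = 2))
```

The explicit loop avoids the temporary index vector and the intermediate matrix that `sum(...; dims = 2)` returns, which matters when the call sits inside millions of iterations.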

2014-03-02 Thread John Myles White
Hi Jason, can you give a few more details about what your objects are? What is `data`? What is `levels`? In general, the performance problems with DataFrames are actually performance issues with DataArrays, which keep type inference from working well. We still haven’t agreed on the right solution, but this thr

[julia-users] fastest way to sum columns in a dataframe

2014-03-02 Thread Jason Solack
Hello everyone, I am running several million iterations over a DataFrame, and I need to perform several computations over various combinations of columns. The first is a simple sum of 13 columns, which appears to be a slow point of execution. Right now I'm doing something like this:
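
(The poster's code is not preserved in this excerpt.) For a workload like the one described, with millions of iterations each summing a different set of columns, one common way to cut per-iteration cost is to allocate the output vector once and overwrite it in place on every pass. The sketch below is not the poster's code; it assumes the data is already a `Matrix{Float64}`, and `rowsums!`, `combos`, and the `sum(out)` step are hypothetical placeholders for the real column sets and follow-up computation.

```julia
# Hypothetical in-place kernel: overwrite `out` with the row-wise sum of
# the columns listed in `cols`, without allocating a new array.
function rowsums!(out::Vector{Float64}, data::Matrix{Float64}, cols)
    fill!(out, 0.0)
    for j in cols, i in axes(data, 1)   # outer loop: columns; inner: rows
        out[i] += data[i, j]
    end
    return out
end

data = [1.0 2.0 3.0;
        4.0 5.0 6.0]
combos = ([1, 2], [2, 3], [1, 2, 3])   # stand-in for the changing column sets

out = Vector{Float64}(undef, size(data, 1))  # allocated once, reused below
totals = Float64[]
for cols in combos
    rowsums!(out, data, cols)
    push!(totals, sum(out))   # placeholder for the per-iteration computation
end

totals  # [12.0, 16.0, 21.0]
```

The trailing `!` follows the Julia convention for functions that mutate an argument; callers that need to keep a result across iterations should `copy(out)` first.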