Great, thank you Jacob, I will try it out!
Do you have a writeup on differences in the way you read CSV files and the
way it is currently done in Julia? Would love to know more!
Perhaps obvious, but for completeness: reading the data using readcsv or
readdlm does not much improve the metrics I
Thank you Tim and Jiahao for your responses. Sorry, I did not mention in my
OP that I was using Version 0.3.10-pre+1 (2015-05-30 11:26 UTC) Commit
80dd75c* (1 day old release-0.3).
I tried other releases as Tim suggested:
On Version 0.4.0-dev+5121 (2015-05-31 12:13 UTC) Commit bfa8648* (0 days
Facebook's Kaggle competition has a dataset with ~7.6e6 rows and 9 columns
(mostly strings):
https://www.kaggle.com/c/facebook-recruiting-iv-human-or-bot/data
Loading the dataset in R using read.csv takes 5 minutes and the resulting
dataframe takes 0.6GB (RStudio takes a total of 1.6GB memory
Ah OK, sorry for misunderstanding the question.
Yes, there are no methods yet to compute intervals for predicted values.
Sorry again!
On Wednesday, April 1, 2015 at 2:06:07 PM UTC-4, tshort wrote:
>
> I think the question was for prediction intervals. I don't see that in
> GLM, yet.
>
> On Wed,
GLM.jl has prediction methods for linear and generalized linear models.
They take the corresponding model and features as input. For
implementation details, please see
[glm]
https://github.com/JuliaStats/GLM.jl/blob/a7fb0057a7bc835d819e842c6f42f14601840a1b/src/glmfit.jl#L249
and
[lm]
https://githu
Thank you Gunner, Tim, and James!
These are great solutions and many times faster than my implementation.
On Thursday, March 26, 2015 at 10:17:10 AM UTC-4, James Fairbanks wrote:
>
> Since you mentioned a test set and a training set, you might want to use
> MLBase.jl which has reusable tools fo
Hi,
I have an array of 100 elements. I want to split it randomly into 70
elements (test set) and 30 elements (train set).
using StatsBase  # provides sample
N = 100
A = rand(N);
n = convert(Int, ceil(N*0.7))
testindex = sample(1:size(A,1), n, replace=false)
testA = A[testindex];
How can I get the train set?
I could loop through testA and A to g
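
One way to avoid the loop, sketched here under a few assumptions: a current Julia release (1.x), and only the standard library's Random module instead of the `sample` call above. The idea is to shuffle the indices once and slice the permutation, so the two sets are disjoint by construction. The names `perm`, `trainindex`, and `trainA` are illustrative and not from the original post.

```julia
using Random  # standard library; provides randperm

N = 100
A = rand(N)
n = ceil(Int, 0.7 * N)        # size of the test set, as in the post

perm = randperm(N)            # random permutation of 1:N
testindex  = perm[1:n]        # first n shuffled indices
trainindex = perm[n+1:end]    # the remaining N - n indices

testA  = A[testindex]
trainA = A[trainindex]
```

If `testindex` has already been drawn some other way, `setdiff(1:N, testindex)` should give the complementary indices without an explicit loop.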
`convert` does not seem to work with symbols. I get an error (same for
String and UTF8String as well).
Test commands:
testdf = DataFrame(A = [1,2,3], B=[2,3,4])
string(names(testdf)[1]) # works
convert(String, names(testdf)[1]) # throws error
convert(ASCIIString, names(testdf)[1]) # throws error
c
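
For reference, a small sketch of the symbol-to-string conversions that do work. It assumes a current Julia release (1.x), where `String` replaces the `ASCIIString` and `UTF8String` types mentioned above, so behavior on 0.3 may differ. The symbol `:A` is a stand-in for `names(testdf)[1]`.

```julia
sym = :A                      # stand-in for a DataFrame column name

s1 = string(sym)              # generic conversion via printing
s2 = String(sym)              # String constructor accepts a Symbol
back = Symbol(s1)             # round trip back to a Symbol

# No convert method exists for Symbol -> String, so this call throws.
threw = try
    convert(String, sym)
    false
catch err
    err isa MethodError
end
```

The distinction is that `convert` is reserved for conversions Julia may apply implicitly, while constructors such as `String(sym)` and functions such as `string(sym)` are the explicit route.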