Re: [R] R, Big Data and books

2014-09-11 Thread Angel Rodriguez
September 2014 18:10 To: Angel Rodriguez Subject: Re: [R] R, Big Data and books Angel, Thanks for the references. I am not seeing the name or link to the book that is free and recommended by Johns Hopkins' Department of Statistics. Can you please let me know the name of that book? Thanks

Re: [R] R, Big Data and books

2014-09-11 Thread Mohan Radhakrishnan
This too: Applied Predictive Modeling by Max Kuhn and Kjell Johnson. Thanks, Mohan On Tue, Sep 9, 2014 at 10:03 PM, Subba Rao raspb...@tanucoo.com wrote: Hi, I am interested in R programming in the Big Data space. Are there any books that would be a good starting point for this career path?

Re: [R] R, Big Data and books

2014-09-10 Thread Angel Rodriguez
From an email list: R is well known in the world of Big Data and is increasing in popularity. A number of very useful resources are available for anyone undertaking data mining in R. For example, Luis Torgo has just published a book called Data Mining with R: Learning with Case Studies

[R] R, Big Data and books

2014-09-09 Thread Subba Rao
Hi, I am interested in R programming in the Big Data space. Are there any books that would be a good starting point for this career path? Predictive analysis is also an area I am interested in. Thank you in advance for any information and help. Subba Rao

Re: [R] big data?

2014-08-07 Thread Spencer Graves
Thanks to all who replied. For the record, I will summarize here what I tried and what I learned: Mike Harwood suggested the ff package. David Winsemius suggested data.table and colbycol. Peter Langfelder suggested sqldf. sqldf::read.csv.sql allowed me to create an SQL

Re: [R] big data?

2014-08-07 Thread Spencer Graves
correcting a typo (400 MB, not GB; thanks to David Winsemius for reporting it). Spencer ### Thanks to all who replied. For the record, I will summarize here what I tried and what I learned: Mike Harwood suggested the ff package. David Winsemius suggested

Re: [R] big data?

2014-08-06 Thread Mike Harwood
The read.table.ffdf function in the ff package can read in delimited files and store them to disk as individual columns. The ffbase package provides additional data management and analytic functionality. I have used these packages on 15 GB files of 18 million rows and 250 columns. On
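A minimal sketch of that workflow, assuming a tab-delimited file named big.txt with a header row; the chunk sizes and the column name price are illustrative, not tuned:

    library(ff)
    library(ffbase)

    # Read the delimited file in chunks and store the columns on disk, not in RAM
    big <- read.table.ffdf(file = "big.txt", header = TRUE, sep = "\t",
                           first.rows = 100000, next.rows = 500000)

    dim(big)          # dimensions are available without loading the data
    x <- big$price[]  # '[]' pulls one (hypothetical) column into ordinary memory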

[R] big data?

2014-08-05 Thread Spencer Graves
What tools do you like for working with tab delimited text files up to 1.5 GB (under Windows 7 with 8 GB RAM)? Standard tools for smaller data sometimes grab all the available RAM, after which CPU usage drops to 3% ;-) The bigmemory project won the 2010 John Chambers

Re: [R] big data?

2014-08-05 Thread Peter Langfelder
Have you tried read.csv.sql from package sqldf? Peter On Tue, Aug 5, 2014 at 10:20 AM, Spencer Graves spencer.gra...@structuremonitoring.com wrote: What tools do you like for working with tab delimited text files up to 1.5 GB (under Windows 7 with 8 GB RAM)? Standard tools for
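A minimal sketch of that suggestion, assuming a tab-delimited file named big.txt; the column year and the cutoff in the WHERE clause are illustrative. Because the filtering happens in SQLite, only the matching rows ever reach R:

    library(sqldf)

    # Inside the sql argument the file is referred to as the table 'file'
    dat <- read.csv.sql("big.txt",
                        sql    = "select * from file where year >= 2010",
                        header = TRUE, sep = "\t")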

Re: [R] big data?

2014-08-05 Thread David Winsemius
On Aug 5, 2014, at 10:20 AM, Spencer Graves wrote: What tools do you like for working with tab delimited text files up to 1.5 GB (under Windows 7 with 8 GB RAM)? ?data.table::fread Standard tools for smaller data sometimes grab all the available RAM, after which CPU usage drops
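A minimal sketch of the fread route, assuming a tab-delimited file named big.txt; the column names passed to select are illustrative, and select requires a reasonably recent data.table:

    library(data.table)

    # fread detects the tab separator itself; 'select' reads only the named columns
    DT <- fread("big.txt", select = c("id", "value"))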

[R] Big Data reading subsample csv

2012-08-16 Thread Tudor Medallion
Hello, I'm most grateful for your time to read this. I have an uber-size 30GB file of 6 million records and 3000 (mostly categorical) columns in csv format. I want to bootstrap subsamples for multinomial regression, but it's proving difficult even with the 64GB of RAM in my machine and twice

Re: [R] Big Data reading subsample csv

2012-08-16 Thread jim holtman
Why not put this into a database? Then you can easily extract the records you want by specifying the record numbers. You pay the one-time expense of creating the database, but then have much faster access to the data on subsequent runs. On Thu, Aug 16, 2012 at 9:44 AM, Tudor Medallion

Re: [R] Big Data reading subsample csv

2012-08-16 Thread Greg Snow
The read.csv.sql function in the sqldf package may make this approach quite simple. On Thu, Aug 16, 2012 at 10:12 AM, jim holtman jholt...@gmail.com wrote: Why not put this into a database? Then you can easily extract the records you want by specifying the record numbers. You pay the one
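A minimal sketch of the database route, assuming the csv has already been loaded once into a SQLite table named big (for example with chunked dbWriteTable calls); the file name, table name, and sample size are illustrative:

    library(RSQLite)

    con  <- dbConnect(SQLite(), "big.sqlite")
    rows <- sample(6e6, 10000)   # record numbers for one subsample
    sub  <- dbGetQuery(con, sprintf("SELECT * FROM big WHERE rowid IN (%s)",
                                    paste(rows, collapse = ",")))
    dbDisconnect(con)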

Re: [R] Big data and column correspondence problem

2011-07-27 Thread murilofm
Thanks Daniel, that helped me. Based on your suggestions I built this final code: library(foreign) library(gdata) AA = c(4,4,4,2,2,6,8,9) A1 = c(3,3,11,5,5,7,11,12) A2 = c(3,3,7,3,5,7,11,12) A = cbind(AA, A1, A2) BB = c(2,2,4,6,6) B1 =c(5,11,7,13,NA) B2 =c(4,12,11,NA,NA) B3

Re: [R] Big data and column correspondence problem

2011-07-27 Thread Daniel Malter
If A has more columns than in your example, you could always try to only merge those columns of A with B that are relevant for the merging. You could then cbind the result of the merging back together with the rest of A as long as the merged data preserved the same order as in A. Alternatively,

Re: [R] Big data and column correspondence problem

2011-07-27 Thread murilofm
Daniel, thanks for the help. I finally made it, doing the merging separately. Daniel Malter wrote: On a different note: how are you matching if AA has multiple matches in BB? About that, all I have to do is check whether, for any of the BB which matches with AA, the indicator equals 1. If

[R] Big data and column correspondence problem

2011-07-26 Thread murilofm
Greetings, I've been struggling for some time with a problem concerning a big database that I have to deal with. I'll try to illustrate my problem with an example, since the database is really big. Suppose I have the following data: AA = c(4,4,4,2,2,6,8,9) A1 = c(3,3,5,5,5,7,11,12) A2 = c(3,3,5,5,5,7,11,12) A =

Re: [R] Big data and column correspondence problem

2011-07-26 Thread Daniel Malter
For question (a), do: which(AA %in% BB) Question (b) is very ambiguous to me. It makes little sense for your example because all values of BB are in AA. Therefore I am wondering whether you meant in question (a) that you want to find all values in BB that are in AA. That's not the same thing. I am
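A tiny illustration of that asymmetry, using the example vectors posted earlier in the thread:

    AA <- c(4, 4, 4, 2, 2, 6, 8, 9)
    BB <- c(2, 2, 4, 6, 6)

    which(AA %in% BB)   # positions in AA whose value occurs in BB: 1 2 3 4 5 6
    which(BB %in% AA)   # positions in BB whose value occurs in AA: 1 2 3 4 5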

Re: [R] Big data and column correspondence problem

2011-07-26 Thread murilofm
Daniel, thanks for the answer. I will try to make myself a little bit clearer. Going step by step (using a loop through the lines of 'A') I would have: 1. AA[1] is 4. So I would have to compare A1[1] = 20 and A2[1] = 3 with B[3, 2:4], i.e. B1 B2 B3 = 7 11 NA, because BB[3] = 4. Since there is

Re: [R] Big data and column correspondence problem

2011-07-26 Thread Petr PIKAL
Hi Re: [R] Big data and column correspondence problem Daniel, thanks for the answer. I will try to make myself a little bit clearer. Going step by step (using a loop through the lines of 'A') I would have: I am not sure you have succeeded in clarifying. 1. AA[1] is 4. So I

Re: [R] Big data and column correspondence problem

2011-07-26 Thread Daniel Malter
This is much clearer. So here is what I think you want to do, in theory and in practice. Theory: Check if AA[i] is in BB. If AA[i] is in BB, then take the row where BB[j] == AA[i] and check whether A1 and A2 are in B1 to B3. Is that right? Only if both are do you want the indicator to take 1. Here
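A minimal sketch of that logic, using the example vectors posted in this thread; B3 was cut off above, so its values here are hypothetical, and this is only one reading of the rule, not the poster's final code:

    AA <- c(4, 4, 4, 2, 2, 6, 8, 9)
    A1 <- c(3, 3, 5, 5, 5, 7, 11, 12)
    A2 <- c(3, 3, 5, 5, 5, 7, 11, 12)
    A  <- cbind(AA, A1, A2)

    BB <- c(2, 2, 4, 6, 6)
    B1 <- c(5, 11, 7, 13, NA)
    B2 <- c(4, 12, 11, NA, NA)
    B3 <- c(3, NA, NA, NA, NA)   # hypothetical values
    B  <- cbind(BB, B1, B2, B3)

    ind <- sapply(seq_len(nrow(A)), function(i) {
      j <- which(B[, "BB"] == A[i, "AA"])   # rows of B whose BB matches AA[i]
      if (length(j) == 0) return(0L)
      vals <- B[j, c("B1", "B2", "B3")]
      as.integer(A[i, "A1"] %in% vals && A[i, "A2"] %in% vals)
    })
    cbind(A, ind)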

Re: [R] Big data (over 2GB) and lmer

2010-10-23 Thread Douglas Bates
On Thu, Oct 21, 2010 at 2:00 PM, Ben Bolker bbol...@gmail.com wrote: Michal Figurski figurski at mail.med.upenn.edu writes: I have a data set of roughly 10 million records, 7 columns. It is only about 500 MB as a csv, so it fits in memory. It's painfully slow to do anything with it, but

Re: [R] big data and lmer

2010-10-22 Thread Jay Emerson
Though bigmemory, ff, and other big data solutions (databases, etc.) can help easily manage massive data, their data objects are not natively compatible with all the advanced functionality of R. Exceptions include lm and glm (both ff and bigmemory support this via Lumley's biglm package), kmeans,
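A minimal sketch of the chunked-regression idea using Lumley's biglm directly on a plain csv; the file name, formula, and chunk size are illustrative, and factor columns would need extra care because every chunk must carry consistent levels:

    library(biglm)

    con   <- file("big.csv", open = "r")
    chunk <- read.csv(con, nrows = 100000)      # first chunk (includes the header)
    cols  <- names(chunk)
    fit   <- biglm(y ~ x1 + x2, data = chunk)   # hypothetical response and predictors
    repeat {
      chunk <- tryCatch(read.csv(con, nrows = 100000, header = FALSE, col.names = cols),
                        error = function(e) NULL)   # reading past the end raises an error
      if (is.null(chunk)) break
      fit <- update(fit, chunk)                 # fold the next chunk into the fit
    }
    close(con)
    summary(fit)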

[R] Big data (over 2GB) and lmer

2010-10-21 Thread Michal Figurski
Dear R-helpers, I have a data set of roughly 10 million records, 7 columns. It is only about 500 MB as a csv, so it fits in memory. It's painfully slow to do anything with it, but it's possible. I also have another dataset of covariates that I would like to explore - with about 4 GB of

Re: [R] Big data (over 2GB) and lmer

2010-10-21 Thread Ben Bolker
Michal Figurski figurski at mail.med.upenn.edu writes: I have a data set of roughly 10 million records, 7 columns. It is only about 500 MB as a csv, so it fits in memory. It's painfully slow to do anything with it, but it's possible. I also have another dataset of covariates that I

[R] big data

2010-09-08 Thread André de Boer
Hello, I searched the internet but I didn't find an answer to the following problem: I want to do a glm on a csv file consisting of 25 columns and 4 million rows. Not all the columns are relevant. My problem is to read the data into R, manipulate the data and then do a glm. I've tried with:

Re: [R] big data

2010-09-08 Thread Dirk Eddelbuettel
On 8 September 2010 at 13:26, André de Boer wrote: | I searched the internet but I didn't find an answer to the following problem: | I want to do a glm on a csv file consisting of 25 columns and 4 million rows. | Not all the columns are relevant. My problem is to read the data into R, | manipulate the

Re: [R] big data

2010-09-08 Thread Greg Snow
Intermountain Healthcare greg.s...@imail.org 801.408.8111 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of André de Boer Sent: Wednesday, September 08, 2010 5:27 AM To: r-help@r-project.org Subject: [R] big data Hello, I searched
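A minimal sketch of one way to attack the question above: drop the irrelevant columns at read time via colClasses so that only the needed ones occupy RAM, then fit an ordinary glm. The column positions, names, and model family are illustrative:

    # "NULL" in colClasses skips a column entirely while reading
    cc <- rep("NULL", 25)                                  # the file has 25 columns
    cc[c(1, 3, 7)] <- c("integer", "numeric", "numeric")   # hypothetical relevant columns

    dat <- read.csv("big.csv", colClasses = cc)
    fit <- glm(y ~ x1 + x2, family = binomial, data = dat) # hypothetical names and family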

[R] big data file versus ram memory

2008-12-18 Thread Mauricio Calvao
Hi there, I am new to R and would like to ask some questions which might not make perfect sense. Anyhow, here they are: 1) I would like very much to use R for processing some big data files (around 1.7 GB or more) for spatial analysis, wavelets, and power spectra estimation; is this possible

Re: [R] big data file versus ram memory

2008-12-18 Thread Stephan Kolassa
Hi Mauricio, Mauricio Calvao wrote: 1) I would like very much to use R for processing some big data files (around 1.7 GB or more) for spatial analysis, wavelets, and power spectra estimation; is this possible with R? Within IDL, such a big data set seems to be tractable... There are some

Re: [R] big data file versus ram memory

2008-12-18 Thread David Winsemius
On Dec 18, 2008, at 3:07 PM, Stephan Kolassa wrote: Hi Mauricio, Mauricio Calvao wrote: 1) I would like very much to use R for processing some big data files (around 1.7 GB or more) for spatial analysis, wavelets, and power spectra estimation; is this possible with R? Within IDL, such