Re: [R] Cleaning data

2017-09-26 Thread Jim Lemon
Hi Bayan, Your question seems to imply that the "age" column contains floating point numbers, e.g. df height weight age 170 72 21.5 ... If this is so, you will only find an integer in diff(age) if two adjacent numbers happen to have the same decimal fraction _and_ the subtraction

Re: [R] Cleaning data

2017-09-26 Thread Eric Berger
Hi Bayan, In your code, 'a' is a vector and is.integer(a) is a logical of length 1 - most likely FALSE if even one element of a is not an integer. (Since R will coerce all the elements of a to the same type.) You need to decide whether something "close enough" to an integer is to be considered an
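The point about is.integer() can be illustrated with a minimal sketch. The data frame, the tolerance tol, and the choice to drop the later row of each pair are all assumptions made for illustration; the thread does not show the real data or say which row should go.

```r
# Hypothetical data; the thread does not show the real data frame.
df <- data.frame(height = c(170, 165, 180, 175),
                 weight = c(72, 60, 85, 70),
                 age    = c(21.5, 22.5, 24.1, 30.0))

d <- diff(df$age)          # differences between consecutive ages

# is.integer() tests the storage type of the whole vector, not its values,
# so it returns a single FALSE for a double vector.
type_check <- is.integer(d)

tol <- 1e-8                              # assumed meaning of "close enough"
near_int <- abs(d - round(d)) < tol      # TRUE where the difference is nearly whole

# Assumption: drop the later row (i + 1) of each offending pair.
cleaned <- df[!c(FALSE, near_int), ]
```

Comparing against round(d) with a tolerance is one way to make "close enough to an integer" explicit; the appropriate tolerance depends on how the ages were recorded.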

[R] Cleaning data

2017-09-26 Thread bayan sardini
Hi, I want to clean my data frame based on the age column: I want to delete the rows where the difference between consecutive elements, (i+1) - i, is an integer. I used a <- diff(df$age) for(i in a){if(is.integer(a) == true){df <- df[-a,] }} but it doesn’t work. Any ideas? Thanks in advance, Bayan

Re: [R] Cleaning

2015-11-11 Thread Ashta
Sarah, Thank you very much. For the other variables I was trying to do the same job in different way because it is easier to list it Example test < which(dat$var1 !="BAA" | dat$var1 !="FAG" ) { dat <- dat[-test,]} and I did not get the right result. What am I missing here? On

Re: [R] Cleaning

2015-11-11 Thread Sarah Goslee
Please keep replies on the list so others may participate in the conversation. If you have a character vector containing the potential values, you might look at %in% for one approach to subsetting your data. Var1 %in% myvalues Sarah On Wed, Nov 11, 2015 at 7:10 PM, Ashta
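A short sketch of the %in% approach follows. The data frame dat and the vector myvalues are hypothetical stand-ins, since the thread does not show the real data. The last lines also show why a condition of the form x != "A" | x != "B" cannot be used to pick rows: it is TRUE for every non-missing value.

```r
# Hypothetical data loosely modelled on the frequency table in the thread.
dat <- data.frame(Var1 = c(":", "]", "MSN", "YYZ", "\\", "+", "MSN"),
                  stringsAsFactors = FALSE)

myvalues <- c("MSN", "YYZ")                 # assumed list of valid codes

kept    <- dat[dat$Var1 %in% myvalues, , drop = FALSE]     # rows to keep
dropped <- dat[!(dat$Var1 %in% myvalues), , drop = FALSE]  # garbage rows

# A value cannot equal both "BAA" and "FAG", so at least one of the two
# inequalities always holds and the OR is TRUE everywhere.
always_true <- dat$Var1 != "BAA" | dat$Var1 != "FAG"
```

Negating the whole %in% test, as in the dropped line, is usually what is meant when excluding a set of values.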

Re: [R] Cleaning

2015-11-11 Thread Boris Steipe
If what you posted here is what you typed, your syntax is wrong. I strongly advise you to consult the two links here: http://adv-r.had.co.nz/Reproducibility.html http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example ... and please read the posting guide and don't

Re: [R] Cleaning

2015-11-11 Thread Sarah Goslee
On Wed, Nov 11, 2015 at 8:44 PM, Ashta wrote: > Hi Sarah, > > I used the following to clean my data, the program crashed several times. > > test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,] > > What is the difference between these two > > test <- dat[dat$Var1 %in% "YYZ" |
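The snippet is cut off before the answer, so the following is only a sketch of one documented difference between the two forms, not necessarily the point made in the reply: == propagates NA, while %in% returns FALSE for missing values. The data frame is hypothetical.

```r
# Hypothetical data with one missing code.
dat <- data.frame(Var1 = c("YYZ", NA, "MSN", "+"),
                  n    = 1:4,
                  stringsAsFactors = FALSE)

# With ==, the NA element yields NA in the logical index,
# which produces an all-NA row in the result.
by_eq <- dat[dat$Var1 == "YYZ" | dat$Var1 == "MSN", ]

# With %in%, the NA element yields FALSE and the row is simply omitted.
by_in <- dat[dat$Var1 %in% c("YYZ", "MSN"), ]
```

When the column has no missing values, both forms select the same rows.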

[R] Cleaning

2015-11-11 Thread Ashta
Hi all, I have a data frame with a huge number of rows and columns. When I looked at the data, I found several garbage values that need to be cleaned. As a sample, here is the frequency distribution of one variable:
  Var1  Freq
1 :     3
2 ]     6
3 MSN   1040
4 YYZ   300
5 \\    4
6 +

Re: [R] Cleaning

2015-11-11 Thread Sarah Goslee
Hi, On Wed, Nov 11, 2015 at 6:51 PM, Ashta wrote: > Hi all, > > I have a data frame with huge rows and columns. > > When I looked at the data, it has several garbage values need to be > > cleaned. For a sample I am showing you the frequency distribution > of one variables >

Re: [R] Cleaning

2015-11-11 Thread Ashta
Hi Sarah, I used the following to clean my data, but the program crashed several times. test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,] What is the difference between these two? test <- dat[dat$Var1 %in% "YYZ" | dat$Var1 %in% "MSN" ,] On Wed, Nov 11, 2015 at 6:38 PM, Sarah Goslee

[R] Cleaning up workspace

2013-10-16 Thread Prof J C Nash (U30A)
In order to have a clean workspace at the start of each chapter of a book I'm knitting, I've written a little script as follows: # chapclean.R # This cleans up the R workspace ilist <- c(".GlobalEnv", "package:stats", "package:graphics", "package:grDevices", "package:utils", "package:datasets",

Re: [R] Cleaning up workspace

2013-10-16 Thread Duncan Murdoch
This has been reported before on the bug list (https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=15481). The message is coming from the methods package, but I don't know if it's a bug or ignorable. Duncan Murdoch On 16/10/2013 11:03 AM, Prof J C Nash (U30A) wrote: In order to have a

Re: [R] Cleaning up messy Excel data

2012-03-03 Thread John C Nash
From: holtman jholt...@gmail.com To: Greg Snow 538...@gmail.com Cc: r-help r-help@r-project.org Subject: Re: [R] Cleaning up messy Excel data Unfortunately they only know

Re: [R] Cleaning up messy Excel data

2012-03-03 Thread Greg Snow
Sometimes we adapt to our environment, sometimes we adapt our environment to us. I like fortune(108). I actually was suggesting that you add a tool to your toolbox, not limit it. In my experience (and I don't expect everyone else's to match) data manipulation that seems easier in Excel than R is

Re: [R] Cleaning up messy Excel data

2012-03-03 Thread John Kane
Seconded John Kane Kingston ON Canada -Original Message- From: rolf.tur...@xtra.co.nz Sent: Sat, 03 Mar 2012 13:46:42 +1300 To: 538...@gmail.com Subject: Re: [R] Cleaning up messy Excel data On 03/03/12 12:41, Greg Snow wrote: SNIP It is possible to do the right thing

Re: [R] Cleaning up messy Excel data

2012-03-02 Thread Greg Snow
Try sending your clients a data set (data frame, table, etc) as an MS Access data table instead. They can still view the data as a table, but will have to go to much more effort to mess up the data, more likely they will do proper edits without messing anything up (mixing characters in with

Re: [R] Cleaning up messy Excel data

2012-03-02 Thread Jim Lemon
Unfortunately, a lot of people who use MS Office don't have or know how to use MS Access. Where I work now (as in the past) I have to tie someone to their chair, give them a few pokes with the cattle prod and then show them that a CSV file will load straight into Excel before I can convince

Re: [R] Cleaning up messy Excel data

2012-03-02 Thread Rolf Turner
On 03/03/12 12:41, Greg Snow wrote: SNIP It is possible to do the right thing in Excel, but Excel does not encourage (let alone force) you to do the right thing, but makes it easy to do the wrong thing. SNIP Fortune! cheers, Rolf Turner

Re: [R] Cleaning up messy Excel data

2012-03-02 Thread jim holtman
Unfortunately they only know how to use Excel and Word. They are not folks who use a computer every day. Many of them run factories or warehouses and asking them to use something like Access would not happen in my lifetime (I have retired twice already). I don't have any problems with them

Re: [R] Cleaning up messy Excel data

2012-03-01 Thread jim holtman
But there are some important reasons to use Excel. In my work there are a lot of people that I have to send the equivalent of a data.frame to who want to look at the data and possibly slice/dice the data differently and then send back to me updates. These folks do not know how to use R, but do

Re: [R] Cleaning up messy Excel data

2012-02-29 Thread John Kane
: noahsilver...@ucla.edu Sent: Tue, 28 Feb 2012 13:27:13 -0800 To: r-help@r-project.org Subject: [R] Cleaning up messy Excel data Unfortunately, some data I need to work with was delivered in a rather messy Excel file. I want to import into R and clean up some things so that I can do my

Re: [R] Cleaning up messy Excel data

2012-02-29 Thread Rolf Turner
On 01/03/12 04:43, John Kane wrote: (mydata <- as.factor(c(1,2,3, 2, 5, 2))) str(mydata) newdata <- as.character(mydata) newdata[newdata==2] <- 0 newdata <- as.numeric(newdata) str(newdata) We really need to keep Excel (and other spreadsheets) out of people's hands. Amen, bro'!!! cheers,
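The quoted code converts the factor to character before converting to numeric. A small sketch of why that intermediate step matters, using made-up values: calling as.numeric() directly on a factor returns the internal level codes rather than the labels.

```r
# Hypothetical factor whose labels look like numbers.
f <- factor(c("1", "2", "3", "2", "5", "2"))

codes  <- as.numeric(f)                 # level codes: 1 2 3 2 4 2
values <- as.numeric(as.character(f))   # the labels as numbers: 1 2 3 2 5 2
```

The two results differ at the fifth element because "5" is the fourth level of the factor.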

[R] Cleaning up messy Excel data

2012-02-28 Thread Noah Silverman
Unfortunately, some data I need to work with was delivered in a rather messy Excel file. I want to import into R and clean up some things so that I can do my analysis. Pulling in a CSV from Excel is the easy part. My current challenge is dealing with some text mixed in the values. i.e.

Re: [R] Cleaning up messy Excel data

2012-02-28 Thread jim holtman
First of all when reading in the CSV file, use 'as.is = TRUE' to prevent the changing to factors. Now that things are character in that column, you can use some pattern expressions (gsub, regex, ...) to search for and change your data. E.g., sub(.*, 0, yourCol) should do it for you. On Tue,
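A minimal sketch of this advice, under loud assumptions: the archive does not show what the stray text looked like or the exact pattern passed to sub(), so the "<2" entry, the column names, and the pattern "^<.*" below are all hypothetical.

```r
# Hypothetical CSV content; "<2" stands in for the unknown messy entries.
csv_text <- "id,value\n1,3.2\n2,<2\n3,5.1\n"

# as.is = TRUE keeps text columns as character instead of factor.
raw <- read.csv(text = csv_text, as.is = TRUE)

# Replace any entry that starts with "<" by "0", then convert to numeric.
raw$value_clean <- as.numeric(sub("^<.*", "0", raw$value))
```

In R 4.0.0 and later, read.csv() no longer creates factors by default, so as.is = TRUE is harmless but mostly matters on older versions.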

Re: [R] Cleaning up messy Excel data

2012-02-28 Thread Robert Baer
-Original Message- From: Noah Silverman Sent: Tuesday, February 28, 2012 3:27 PM To: r-help Subject: [R] Cleaning up messy Excel data Unfortunately, some data I need to work with was delivered in a rather messy Excel file. I want to import into R and clean up some things so that I can

Re: [R] Cleaning up messy Excel data

2012-02-28 Thread Noah Silverman
That's exactly what I need. Thank You!! -- Noah Silverman UCLA Department of Statistics 8117 Math Sciences Building Los Angeles, CA 90095 On Feb 28, 2012, at 1:42 PM, jim holtman wrote: First of all when reading in the CSV file, use 'as.is = TRUE' to prevent the changing to factors. Now

Re: [R] Cleaning up messy Excel data

2012-02-28 Thread Stephen Sefick
Just replace that value with zero. If you provide some reproducible code I could probably give you a solution. ?dput good luck, Stephen On 02/28/2012 03:27 PM, Noah Silverman wrote: Unfortunately, some data I need to work with was delivered in a rather messy Excel file. I want to import

Re: [R] Cleaning date columns

2011-03-10 Thread natalie.vanzuydam
Dear Bill, Thanks very much for the reply and for the code. I have amended my personal details for future posts. I was wondering if there were any good books or tutorials for writing code similar to what you have provided above? Best wishes, Natalie Van Zuydam - Natalie Van Zuydam PhD

[R] Cleaning date columns

2011-03-09 Thread Newbie19_02
Hi Everyone, I have the following problem: data <- structure(list(prochi = c("IND1", "IND1", "IND1", "IND2", "IND2", "IND2", "IND2", "IND3", "IND4", "IND5"), date_admission = structure(c(6468, 6470, 7063, 9981, 9983, 14186, 14372, 5129, 9767, 11168), class = "Date")), .Names = c("prochi", "date_admission"), row.names

Re: [R] Cleaning date columns

2011-03-09 Thread Bill.Venables
Subject: [R] Cleaning date columns Hi Everyone, I have the following problem: data <- structure(list(prochi = c("IND1", "IND1", "IND1", "IND2", "IND2", "IND2", "IND2", "IND3", "IND4", "IND5"), date_admission = structure(c(6468, 6470, 7063, 9981, 9983, 14186, 14372, 5129, 9767, 11168), class = "Date")), .Names = c

[R] cleaning up a vector

2010-10-01 Thread mlarkin
I calculated a large vector. Unfortunately, I have some measurement error in my data and some of the values in the vector are erroneous. I ended up with some Infs and NaNs in the vector. I would like to filter out the Inf and NaN values and only keep the values in my vector that range from 1 to

Re: [R] cleaning up a vector

2010-10-01 Thread Henrique Dallazuanna
Try this: x[is.finite(x)] On Fri, Oct 1, 2010 at 2:51 PM, mlar...@rsmas.miami.edu wrote: I calculated a large vector. Unfortunately, I have some measurement error in my data and some of the values in the vector are erroneous. I ended up with some Infs and NaNs in the vector. I would like

Re: [R] cleaning up a vector

2010-10-01 Thread Erik Iverson
Mike, Small, reproducible examples are always useful for the rest of us. x <- c(0, NA, NaN, 1, 10, 20, 21, Inf) x[!is.na(x) & x >= 1 & x <= 20] Is that what you're looking for? mlar...@rsmas.miami.edu wrote: I calculated a large vector. Unfortunately, I have some measurement error in my data
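The two suggestions in this thread can be compared side by side. This sketch reuses the example vector from the reply; the bounds 1 and 20 are taken from that example, since the upper limit in the original question is cut off.

```r
x <- c(0, NA, NaN, 1, 10, 20, 21, Inf)

# is.finite() is FALSE for NA, NaN, Inf and -Inf, so this drops all of them
# but keeps out-of-range values such as 0 and 21.
finite_only <- x[is.finite(x)]

# Adding the range test keeps only finite values between 1 and 20 inclusive.
in_range <- x[is.finite(x) & x >= 1 & x <= 20]
```

Using is.finite() inside the range test also guards against -Inf, which !is.na() alone would not exclude from a one-sided comparison.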

Re: [R] cleaning up a vector

2010-10-01 Thread Peter Langfelder
On Fri, Oct 1, 2010 at 10:51 AM, mlar...@rsmas.miami.edu wrote: I calculated a large vector. Unfortunately, I have some measurement error in my data and some of the values in the vector are erroneous. I ended up with some Infs and NaNs in the vector. I would like to filter out the Inf and

Re: [R] cleaning up a vector

2010-10-01 Thread Marc Schwartz
On Oct 1, 2010, at 12:51 PM, mlar...@rsmas.miami.edu wrote: I calculated a large vector. Unfortunately, I have some measurement error in my data and some of the values in the vector are erroneous. I ended up with some Infs and NaNs in the vector. I would like to filter out the Inf and NaN

Re: [R] cleaning up a vector

2010-10-01 Thread Henrique Dallazuanna
Complementing: findInterval(x[is.finite(x)], 1:20) On Fri, Oct 1, 2010 at 2:55 PM, Henrique Dallazuanna www...@gmail.com wrote: Try this: x[is.finite(x)] On Fri, Oct 1, 2010 at 2:51 PM, mlar...@rsmas.miami.edu wrote: I calculated a large vector. Unfortunately, I have some measurement

[R] Cleaning a time series

2008-05-23 Thread tolga . i . uzuner
Dear R Users, Was wondering if anyone can give me pointers to functionality in R that can help clean a time series ? For example, some kind of package/functionality which identifies potential errors and takes some action, such as replacement by some suitable value (carry-forward, average of

Re: [R] Cleaning a time series

2008-05-23 Thread Gabor Grothendieck
The zoo package has six na.* routines for carrying values forward, etc. library(zoo) ?zoo describes them. Also see the vignettes. On Fri, May 23, 2008 at 6:55 AM, [EMAIL PROTECTED] wrote: Dear R Users, Was wondering if anyone can give me pointers to functionality in R that can help clean
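To show what "carrying values forward" means without requiring any package, here is a base R sketch of last observation carried forward. It is a swapped-in illustration, not the zoo implementation; the helper name locf and the sample series are hypothetical.

```r
# Hypothetical helper: replace each NA with the most recent non-missing value.
locf <- function(v) {
  idx <- cumsum(!is.na(v))      # how many non-missing values seen so far
  c(NA, v[!is.na(v)])[idx + 1]  # leading NAs stay NA (idx is 0 there)
}

y <- c(NA, 1.2, NA, NA, 3.4, NA)   # made-up series with gaps
filled <- locf(y)
```

For real time series work, the zoo routines mentioned in the reply also handle time indexes and offer other fill strategies, so they are likely the better tool.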

[R] Cleaning up memory in R

2008-05-14 Thread Anh Tran
I'm trying to work on a large dataset and after each segment of the run, I need a command to flush the memory. I tried gc() and rm(list=ls()) but they don't seem to help. gc() does not do anything besides showing the memory usage. I'm using the package BSgenome from BioC. Thanks a bunch -- Regards,

Re: [R] Cleaning up memory in R

2008-05-14 Thread Duncan Murdoch
On 5/14/2008 3:59 PM, Anh Tran wrote: I'm trying to work on a large dataset and after each segment of run, I need a command to flush the memory. I tried gc() and rm(list=ls()) but they don't seem to help. gc() does not do anything beside showing the memory usage. How do you know it does
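The reply is cut off, so the following is only a general sketch of how rm() and gc() interact, not a reconstruction of the advice given. The object name big is hypothetical. gc() does trigger a collection as well as printing usage, but it can only reclaim objects that no longer have any references.

```r
big <- numeric(1e6)     # hypothetical large object (about 8 MB of doubles)

before <- gc()          # triggers a collection and returns a usage matrix
rm(big)                 # remove the only reference to the object
after  <- gc()          # memory held by 'big' is now eligible for reclaiming

# The "used" column for Vcells can be compared before and after.
vcells_before <- before["Vcells", "used"]
vcells_after  <- after["Vcells", "used"]
```

Whether the operating system reports the freed memory as available again depends on the platform and the allocator, so a process monitor may not show an immediate drop.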

[R] Cleaning database: grep()? apply()?

2007-11-13 Thread Jonas Malmros
Dear R users, I have a huge database and I need to adjust it somewhat. Here is a very small excerpt from the database:
CODE  NAME                      DATE  DATA1
4813  ADVANCED TELECOM          1987  0.013
3845  ADVANCED THERAPEUTIC SYS

Re: [R] Cleaning database: grep()? apply()?

2007-11-13 Thread jim holtman
Here is how to whittle it down for the first two parts of your question. I am not exactly sure what you are after in the third part. Is it that you want specific DATEs or do you want the ratio of the DATE[max]/DATE[min]? x <- read.table(textConnection(CODENAME