Abou,
I am not trying to be negative. Assuming you are a professor of Statistics,
your request seems odd, as what you are asking about is routine in much
statistical work, where you want to build a model using just part of
your data and need to reserve some to check if you
Hi Abou,
One way is to shuffle the original data frame using sample() and
split the result into three equal parts.
I was going to provide example code, but Avi's response popped up and
I kind of agree with him.
Jim
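
Jim's shuffle-and-split suggestion could look roughly like the sketch below. The data frame `dat`, its columns, the seed, and the group labels are made up for illustration; they are not Abou's attached data, and this is only one of several reasonable ways to do it.

```r
# Hypothetical stand-in for the attached data: 12 rows, column "Data"
set.seed(2021)
dat <- data.frame(ID = 1:12, Data = round(rnorm(12, mean = 50, sd = 10), 1))

# Shuffle the rows: sample(n) returns a random permutation of 1..n
shuffled <- dat[sample(nrow(dat)), ]

# Assign the shuffled rows to groups 1, 2, 3 in turn, then split
group <- rep(1:3, length.out = nrow(shuffled))
parts <- split(shuffled, group)

sapply(parts, nrow)   # group sizes
```

If the number of rows is not a multiple of three, `rep(..., length.out = )` simply makes the first group(s) one row larger.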
On Fri, Sep 3, 2021 at 11:31 AM AbouEl-Makarim Aboueissa
wrote:
>
> Dear
Sorry, please forget about it. I believe that I was very serious when I
posted my question.
with thanks
abou
__
*AbouEl-Makarim Aboueissa, PhD*
*Professor, Statistics and Data Science*
*Graduate Coordinator*
*Department of Mathematics and Statistics*
*University of Southern
What is stopping you, Abou?
Some of us here start wondering if we have better things to do than homework
for others. Help is supposed to come after they try and encounter issues that we
can help with.
So think about your problem. You supplied data in a file that is NOT in CSV
format but is in
Dear All:
How can I split a column of data *randomly* into three groups? Please see the
attached data. I need to split column #2, titled "Data"
with many thanks
abou
__
*AbouEl-Makarim Aboueissa, PhD*
*Professor, Statistics and Data Science*
*Graduate Coordinator*
*Department of
Hi Eliza
This seems to work:
plot(BFA3[,1],BFA3[,4],
pch=16, xlab = "", ylab = "",col=(BFA3[,2]==BFA3[,3])+2,axes=FALSE)
but I have no idea what you are trying to do with the
as.numeric(as.Date(...))
business.
Jim
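
For readers puzzled by `col=(BFA3[,2]==BFA3[,3])+2`: a logical vector is coerced to 0/1 in arithmetic, so adding 2 yields colour index 3 where the two columns agree and 2 where they differ. A tiny sketch with made-up vectors `a` and `b` standing in for the two columns:

```r
# Hypothetical stand-ins for BFA3[, 2] and BFA3[, 3]
a <- c(1, 2, 3, 4)
b <- c(1, 0, 3, 0)

# TRUE becomes 1 and FALSE becomes 0, so the result is 3 or 2
cols <- (a == b) + 2
cols
```

The numbers are looked up in the current `palette()`, so the actual colours depend on the R version and any palette changes.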
On Fri, Sep 3, 2021 at 8:44 AM Eliza Botto wrote:
>
> Dear useRs,
>
> For
Dear useRs,
For the following dataset,
dput(BFA3)
structure(c(17532, 17533, 17534, 17535, 17536, 17537, 17538,
17539, 17540, 17541, 17542, 17543, 17544, 17545, 17546, 17547,
17548, 17549, 17550, 17551, 17552, 17553, 17554, 17555, 17556,
17557, 17558, 17559, 17560, 17561, 17562, 17563, 17564,
Thanks, that is perfect!
On Thu, Sep 2, 2021 at 7:02 PM Deepayan Sarkar
wrote:
>
> On Thu, Sep 2, 2021 at 9:26 PM Enrico Schumann
> wrote:
> >
> > On Thu, 02 Sep 2021, Luigi Marongiu writes:
> >
> > > Hello, is it possible to show only the header (that is: `'data.frame':
> > > x obs. of y
Hello,
I believe, though I have no references, that str was meant for interactive
use, not for use in a script or package. If this is the case, then it
should be rare to have to output to an object such as a character vector.
As for my solution, it is far from perfect, I try to avoid
On 02/09/2021 3:20 p.m., Greg Minshall wrote:
Andrew,
x[] <- lapply(x, function(xx) {
    xx[is.nan(xx)] <- NA_real_
    xx
})
is different from
x <- lapply(x, function(xx) {
    xx[is.nan(xx)] <- NA_real_
    xx
})
indeed, the two are different -- but some ignorance of mine is
Regardless of whether you use the lower-level split function, or the
higher-level aggregate function, or the tidyverse group_by function, the key is
learning how to create the column that is the same for all records
corresponding to the time interval of interest.
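
A minimal sketch of that idea, assuming a data frame called `discharge` with a Date column `sampdate` and a numeric column `cfs` (the names appear elsewhere in this thread; the values below are invented):

```r
# Invented example data; only the column names come from the thread
discharge <- data.frame(
  sampdate = as.Date(c("2021-01-05", "2021-01-20", "2021-02-03", "2021-02-17")),
  cfs      = c(100, 120, 90, 110)
)

# The key step: a column that is identical for every record in the same month
discharge$month <- format(discharge$sampdate, "%Y-%m")

# Any grouping tool can now use it; here aggregate() returns mean and sd per month
monthly <- aggregate(cfs ~ month, data = discharge,
                     FUN = function(v) c(mean = mean(v), sd = sd(v)))
monthly
```

Because the function returns two values, `monthly$cfs` is a two-column matrix; `monthly$cfs[, "mean"]` extracts the monthly means.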
If you convert the sampdate to
On Thu, 2 Sep 2021, Andrew Simmons wrote:
You could use 'split' to create a list of data frames, and then apply a
function to each to get the means and sds.
cols <- "cfs" # add more as necessary
S <- split(discharge[cols], format(discharge$sampdate, format = "%Y-%m"))
means <-
Andrew,
> x[] <- lapply(x, function(xx) {
>     xx[is.nan(xx)] <- NA_real_
>     xx
> })
>
> is different from
>
> x <- lapply(x, function(xx) {
>     xx[is.nan(xx)] <- NA_real_
>     xx
> })
indeed, the two are different -- but some ignorance of mine is exposed.
i wonder, can you explain why
You could use 'split' to create a list of data frames, and then apply a
function to each to get the means and sds.
cols <- "cfs" # add more as necessary
S <- split(discharge[cols], format(discharge$sampdate, format = "%Y-%m"))
means <- do.call("rbind", lapply(S, colMeans, na.rm = TRUE))
sds
On Thu, 2 Sep 2021, Rich Shepard wrote:
If I correctly understand the output of as.POSIXlt each date and time
element is separate, so input such as 2016-03-03 12:00 would now be 2016 03
03 12 00 (I've not read how the elements are separated). (The TZ is not
important because all data are either
On Mon, 30 Aug 2021, Richard O'Keefe wrote:
x <- rnorm(samples.per.day * 365)
length(x)
[1] 105120
Reshape the fake data into a matrix where each row represents one
24-hour period.
m <- matrix(x, ncol=samples.per.day, byrow=TRUE)
Richard,
Now I understand the need to keep the date and
On Thu, 2 Sep 2021, Enrico Schumann wrote:
There is no column 'ht'.
Enrico,
New eyeballs caught my change in variable name that I kept missing.
Thanks very much,
Rich
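
A quick way to catch this kind of name mismatch before converting a column is to inspect `names()` first. The sketch below uses an invented data frame whose measurement column is called `elev`; that name is purely hypothetical and is not the actual column name in Rich's file.

```r
# Invented data: the measurement column is NOT called 'ht'
stage <- data.frame(sampdate = c("2021-09-01", "2021-09-02"),
                    elev     = c("3.25", "3.31"),
                    stringsAsFactors = FALSE)

names(stage)           # check what the columns are really called
is.null(stage$ht)      # TRUE: asking for a missing column silently gives NULL

# Fail early with a clear message if the expected column is absent
stopifnot("elev" %in% names(stage))
stage$sampdate <- as.Date(stage$sampdate)
stage$elev     <- as.numeric(stage$elev)
```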
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
On Thu, Sep 2, 2021 at 9:26 PM Enrico Schumann wrote:
>
> On Thu, 02 Sep 2021, Luigi Marongiu writes:
>
> > Hello, is it possible to show only the header (that is: `'data.frame':
> > x obs. of y variables:` part) of the str function?
> > Thank you
>
> Perhaps one more solution. You could limit
On Thu, 02 Sep 2021, Rich Shepard writes:
> The first three commands in the script are:
> stage <- read.csv('../data/water/gauge-ht.dat', header
> = TRUE, sep = ',', stringsAsFactors = FALSE)
> stage$sampdate <- as.Date(stage$sampdate)
> stage$ht <- as.numeric(stage$ht, length = 6)
>
> Running
Thanks for the interesting method Rui. So that is a way to do a redirect of
output not to a sinkfile but to an in-memory variable as a textConnection.
Of course, one has to wonder why the makers of str thought it would be too
inefficient to have an option that returns the output in a form that
The first three commands in the script are:
stage <- read.csv('../data/water/gauge-ht.dat', header = TRUE, sep = ',',
stringsAsFactors = FALSE)
stage$sampdate <- as.Date(stage$sampdate)
stage$ht <- as.numeric(stage$ht, length = 6)
Running the script produces this error:
source('stage.R')
On Thu, 02 Sep 2021, Luigi Marongiu writes:
> Hello, is it possible to show only the header (that is: `'data.frame':
> x obs. of y variables:` part) of the str function?
> Thank you
Perhaps one more solution. You could limit the number
of list components to be printed, though it will leave
a
Luigi,
If you are sure you are looking at something like a data.frame, and all you
want to know is how many rows and how many columns are in it, then str() is
perhaps too detailed a tool.
The functions nrow() and ncol() tell you what you want and you can get both
together with dim(). You can, of
Thank you!
On Thu, Sep 2, 2021 at 4:17 PM Andrew Simmons wrote:
>
> It seems like you might've missed one more thing, you need the brackets next
> to 'x' to get it to work.
>
>
> x[] <- lapply(x, function(xx) {
>     xx[is.nan(xx)] <- NA_real_
>     xx
> })
>
> is different from
>
> x <-
It seems like you might've missed one more thing, you need the brackets
next to 'x' to get it to work.
x[] <- lapply(x, function(xx) {
    xx[is.nan(xx)] <- NA_real_
    xx
})
is different from
x <- lapply(x, function(xx) {
    xx[is.nan(xx)] <- NA_real_
    xx
})
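
The difference between the two forms can be seen in a small sketch. The data frame and the helper name `clean` are invented for illustration; the point is only that `x[] <-` fills the existing data frame, while `x <-` replaces it with the plain list that lapply() returns.

```r
x <- data.frame(a = c(1, NaN, 3), b = c(NaN, 5, 6))

# Hypothetical helper: replace NaN by NA in one column and return the column
clean <- function(xx) {
  xx[is.nan(xx)] <- NA_real_
  xx
}

# With brackets: the columns are replaced, the object stays a data frame
y <- x
y[] <- lapply(y, clean)
class(y)
dim(y)

# Without brackets: the result is whatever lapply() returns, a plain list
w <- lapply(x, clean)
class(w)
dim(w)    # NULL, which matches the 'dim(df)' output reported in this thread
```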
Also, if all of your data is
Sorry,
still I don't get it:
```
> dim(df)
[1] 302 626
> # clean
> df <- lapply(x, function(xx) {
+ xx[is.nan(xx)] <- NA
+ xx
+ })
> dim(df)
NULL
```
On Thu, Sep 2, 2021 at 3:47 PM Andrew Simmons wrote:
>
> You removed the second line 'xx' from the function, put it back and it should
> work
Hello,
In the particular case you have, to change to NA based on condition, use
`is.na<-`.
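
As a small illustration of the replacement form of is.na() (a sketch on an invented vector `v`, not the code from this message, which is cut off below):

```r
v <- c(1, NaN, 3, NA)

# 'is.na<-' sets to NA every element selected by the right-hand side
is.na(v) <- is.nan(v)
v

# The same idea applied column by column to a data frame
d <- data.frame(p = c(NaN, 2), q = c(3, NaN))
d[] <- lapply(d, function(col) {
  is.na(col) <- is.nan(col)
  col
})
d
```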
Here is some test data, 3 times the same df.
set.seed(2021)
df3 <- df2 <- df1 <- data.frame(
x = c(0, 0, 1, 2, 3),
y = c(1, 2, 3, 0, 0),
z = rbinom(5, 1, prob = c(0.25, 0.75)),
a =
Hi
you could operate with whole data frame (sometimes)
head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2
You removed the second line, 'xx', from the function; put it back and it
should work.
On Thu, Sep 2, 2021, 09:45 Luigi Marongiu wrote:
> `data[sapply(data, is.nan)] <- NA` is a nice compact command, but I
> still get NaN when using the summary function; for instance, one of the
> columns gives:
>
`data[sapply(data, is.nan)] <- NA` is a nice compact command, but I
still get NaN when using the summary function; for instance, one of the
columns gives:
```
Min. : NA
1st Qu.: NA
Median : NA
Mean :NaN
3rd Qu.: NA
Max. : NA
NA's :110
```
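
One possible reading of this output, assuming the column holds nothing but missing values (the excerpt shows 110 NA's but not the column length): the NaN in the Mean row is not a leftover value in the data. The mean of zero remaining numbers is itself NaN. A sketch with an invented column:

```
# Invented column consisting only of NaN
col <- rep(NaN, 110)
col[is.nan(col)] <- NA

any(is.nan(col))          # no NaN left in the data
mean(col, na.rm = TRUE)   # mean of an empty set of numbers is NaN
summary(col)
```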
I tried to implement the second solution but:
```
df
Hello,
it is possible to select the columns of a dataframe in sequence with:
```
for(i in 1:ncol(df)) {
  df[ , i]
}
# or
for(i in 1:ncol(df)) {
  df[i]
}
```
And change all values with, for instance:
```
for(i in 1:ncol(df)) {
  df[ , i] <- df[ , i] + 10
}
```
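
The same column-by-column change can be written without an explicit loop. This is a sketch on an invented two-column data frame, using the `df[] <- lapply(...)` idiom discussed elsewhere in this thread:

```
df <- data.frame(a = 1:3, b = c(0.5, 1.5, 2.5))

# Apply a function to every column and keep the data frame structure
df[] <- lapply(df, function(col) col + 10)
df
```

For a purely numeric data frame, plain arithmetic such as `df + 10` also works on the whole object at once.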
Is it possible to apply a
Hello,
I would use something like:
x <- c(1:5, NaN) |> sample(100, replace = TRUE) |> matrix(10, 10) |>
as.data.frame()
x[] <- lapply(x, function(xx) {
    xx[is.nan(xx)] <- NA_real_
    xx
})
This prevents attributes from being changed in 'x', but accomplishes the
same thing as you have
Hi
what about
data[sapply(data, is.nan)] <- NA
Cheers
Petr
> -Original Message-
> From: R-help On Behalf Of Luigi Marongiu
> Sent: Thursday, September 2, 2021 3:18 PM
> To: r-help
> Subject: [R] How to globally convert NaN to NA in dataframe?
>
> Hello,
> I have some NaN values in
Hello,
I have some NaN values in some elements of a dataframe that I would
like to convert to NA.
The command `df1$col[is.nan(df1$col)]<-NA` works one column at a time.
Is there a way to modify all instances in the whole dataframe at
once?
I have seen from
Thank you! Better than dim() anyway.
Best regards
Luigi
On Thu, Sep 2, 2021 at 1:31 PM Rui Barradas wrote:
>
> Hello,
>
> Not perfect but works for data.frames:
>
>
> header_str <- function(x){
>   capture.output(str(x))[[1]]
> }
> header_str(iris)
> header_str(AirPassengers)
> header_str(1:10)
Hello,
Not perfect but works for data.frames:
header_str <- function(x){
  capture.output(str(x))[[1]]
}
header_str(iris)
header_str(AirPassengers)
header_str(1:10)
Hope this helps,
Rui Barradas
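
A brief sketch of how this helper behaves, using the built-in `iris` data: capture.output() returns the printed lines of str() as a character vector, and the first element is the header line. For comparison, dim() gives the same two numbers directly.

```r
header_str <- function(x) {
  # str() prints to the console; capture.output() collects the lines instead
  capture.output(str(x))[[1]]
}

h <- header_str(iris)
h                      # the "'data.frame': ... obs. of ... variables:" line

dim(iris)              # rows and columns as a numeric vector
```

For objects that are not data frames, the first line of str() is not a header but the whole summary, which is presumably why the function is described as "not perfect".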
Às 12:02 de 02/09/21, Luigi Marongiu escreveu:
Hello, is it possible to show only the header
Hello, is it possible to show only the header (that is: `'data.frame':
x obs. of y variables:` part) of the str function?
Thank you
--
Best regards,
Luigi
On Wed, 1 Sep 2021 19:29:32 -0400
Duncan Murdoch wrote:
> I don't know the header of your foo() method, but let's suppose foo()
> is
>
>   foo <- function(x, data, ...) {
>     UseMethod("foo")
>   }
>
> with
>
>   foo.formula <- function(x, data, ...) {
>     # do something with the
Hello,
With the new data, here are two ways.
The first with a for loop. I find it simple and readable.
for(b in unique(B[,1])){
  A[which(A[,1] == b), 2] <- B[which(B[,1] == b), 2]
}
na <- is.na(A[,2])
A[!na, 2]
sum(!na) # [1] 216
sum(A[,1] %in% B[,1]) # [1] 216
# Another way,
Thank you, Eric. Very useful.
From: Eric Berger
Sent: Wednesday, September 1, 2021 12:31 PM
To: cag...@gmail.com
Cc: R mailing list
Subject: Re: [R] how to install npsm package
Instructions can be found at https://github.com/kloke/npsm
On Wed, Sep 1, 2021 at 6:27 PM
Dear useRs,
I'm having trouble combining geom_boxplot and geom_point with jitter.
It is difficult to explain but the code and result should make it clear
(the example dataset is long so I copy it at the end of the email):
p <- ggplot(my_data, aes(x = Diet, y = value, color = Software))
p
Thank you.
el
On 02/09/2021 00:41, Bill Dunlap wrote:
z <- tibble(Code=c("NA","NZ",NA), Name=c("Namibia","New Zealand","?"))
z
# A tibble: 3 x 2
  Code  Name
1 NA    Namibia
2 NZ    New Zealand
3       ?
subset(z, Code=="NA")
# A tibble: 1 x 2
  Code  Name
1 NA    Namibia
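
The same distinction between the string "NA" and a missing value can be reproduced in base R. This sketch uses a plain data frame instead of a tibble so that it runs without extra packages; the object names `namibia` and `missing_code` are invented.

```r
z <- data.frame(Code = c("NA", "NZ", NA),
                Name = c("Namibia", "New Zealand", "?"),
                stringsAsFactors = FALSE)

# Comparing with the string "NA" matches Namibia only;
# subset() drops rows where the comparison itself is NA
namibia <- subset(z, Code == "NA")

# A truly missing code has to be tested with is.na()
missing_code <- z[is.na(z$Code), ]

namibia
missing_code
```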