Re: [R] Calculate daily means from 5-minute interval data

2021-09-05 Thread Jeff Newmiller
This problem nearly always boils down to using meta knowledge about the file. Having informal TZ info in the file is very helpful, but PST is not necessarily a uniquely-defined time zone specification so you have to draw on information outside of the file to know that these codes correspond to
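A minimal sketch of one way to act on that advice, assuming the file carries a zone code next to each local timestamp and assuming, from knowledge outside the file, that PDT means UTC-7 and PST means UTC-8. The column names `stamp` and `tzcd` and the two sample rows are hypothetical, not taken from the USGS file.

```r
# Hypothetical extract: two readings that share the same local clock time
# on the day of the fall time change, distinguished only by the zone code.
raw <- data.frame(
  stamp = c("2020-11-01 01:15", "2020-11-01 01:15"),
  tzcd  = c("PDT", "PST"),
  stringsAsFactors = FALSE
)

# Assumption from outside the file: offsets from UTC, in hours.
offset_hours <- c(PDT = -7, PST = -8)

# Parse the clock time as if it were UTC, then remove the offset
# to obtain an unambiguous instant for each row.
naive <- as.POSIXct(raw$stamp, format = "%Y-%m-%d %H:%M", tz = "UTC")
raw$utc <- naive - unname(offset_hours[raw$tzcd]) * 3600

format(raw$utc, "%Y-%m-%d %H:%M", tz = "UTC")
```

With the offsets applied, the two 1:15am readings become distinct instants one hour apart, so they can be sorted and grouped without ambiguity.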

Re: [R] Calculate daily means from 5-minute interval data

2021-09-05 Thread Bill Dunlap
What is the best way to read (from a text file) timestamps from the fall time change, where there are two 1:15am's? E.g., here is an extract from a US Geological Survey web site giving data on the river through our county on 2020-11-01, when we changed from PDT to PST,

Re: [R] Calculate daily means from 5-minute interval data

2021-09-04 Thread Rich Shepard
On Fri, 3 Sep 2021, Jeff Newmiller wrote: The fact that your projects are in a single time zone is irrelevant. I am not sure how you can be so confident in saying it does not matter whether the data were recorded in PDT or PST, since if it were recorded in PDT then there would be a day in March

Re: [R] Calculate daily means from 5-minute interval data

2021-09-04 Thread Jeff Newmiller
On Fri, 3 Sep 2021, Rich Shepard wrote: On Thu, 2 Sep 2021, Jeff Newmiller wrote: Regardless of whether you use the lower-level split function, or the higher-level aggregate function, or the tidyverse group_by function, the key is learning how to create the column that is the same for all

Re: [R] Calculate daily means from 5-minute interval data

2021-09-03 Thread Rich Shepard
On Thu, 2 Sep 2021, Jeff Newmiller wrote: Regardless of whether you use the lower-level split function, or the higher-level aggregate function, or the tidyverse group_by function, the key is learning how to create the column that is the same for all records corresponding to the time interval of

Re: [R] Calculate daily means from 5-minute interval data

2021-09-03 Thread Rich Shepard
On Thu, 2 Sep 2021, Jeff Newmiller wrote: Regardless of whether you use the lower-level split function, or the higher-level aggregate function, or the tidyverse group_by function, the key is learning how to create the column that is the same for all records corresponding to the time interval of

Re: [R] Calculate daily means from 5-minute interval data

2021-09-02 Thread Jeff Newmiller
Regardless of whether you use the lower-level split function, or the higher-level aggregate function, or the tidyverse group_by function, the key is learning how to create the column that is the same for all records corresponding to the time interval of interest. If you convert the sampdate to
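The grouping-column idea can be sketched as follows, using base R only. The data frame contents are made up, and the fixed UTC-8 zone (`Etc/GMT+8`) is an assumption chosen for illustration rather than something stated in the message.

```r
# Hypothetical sample in the layout discussed in the thread:
# a date column, a time column, and discharge in cfs.
discharge <- data.frame(
  sampdate = c("2020-08-26", "2020-08-26", "2020-08-27", "2020-08-27"),
  samptime = c("09:30", "09:35", "09:30", "09:35"),
  cfs      = c(136000, 126000, 130000, 120000),
  stringsAsFactors = FALSE
)

# Build one timestamp, then derive a grouping column that is identical
# for every record in the interval of interest (here, one calendar day).
discharge$stamp <- as.POSIXct(
  paste(discharge$sampdate, discharge$samptime),
  format = "%Y-%m-%d %H:%M",
  tz = "Etc/GMT+8"  # assumed fixed UTC-8 offset (POSIX sign convention)
)
discharge$day <- format(discharge$stamp, "%Y-%m-%d")

# Any of split, aggregate, or group_by can consume the grouping column;
# aggregate is used here.
daily_mean <- aggregate(cfs ~ day, data = discharge, FUN = mean)
daily_sd   <- aggregate(cfs ~ day, data = discharge, FUN = sd)
```

Changing the format string to `"%Y-%m"` would give one group per month instead of one per day.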

Re: [R] Calculate daily means from 5-minute interval data

2021-09-02 Thread Rich Shepard
On Thu, 2 Sep 2021, Andrew Simmons wrote: You could use 'split' to create a list of data frames, and then apply a function to each to get the means and sds. cols <- "cfs" # add more as necessary S <- split(discharge[cols], format(discharge$sampdate, format = "%Y-%m")) means <-

Re: [R] Calculate daily means from 5-minute interval data

2021-09-02 Thread Andrew Simmons
You could use 'split' to create a list of data frames, and then apply a function to each to get the means and sds. cols <- "cfs" # add more as necessary S <- split(discharge[cols], format(discharge$sampdate, format = "%Y-%m")) means <- do.call("rbind", lapply(S, colMeans, na.rm = TRUE)) sds
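The quoted code is cut off at the standard deviations. The following is a sketch of how the same `split` approach might be finished, not a reconstruction of the original message: the sample data are hypothetical, the grouping is daily rather than monthly, and the `vapply` call for the standard deviations is an assumption.

```r
# Hypothetical data: sampdate is already a Date, cfs is discharge.
discharge <- data.frame(
  sampdate = as.Date(c("2020-08-26", "2020-08-26", "2020-09-01", "2020-09-01")),
  cfs      = c(100, 200, 300, 500)
)

cols <- "cfs"  # add more numeric columns as necessary

# One data frame per day; use format = "%Y-%m" for one per month instead.
S <- split(discharge[cols], format(discharge$sampdate, format = "%Y-%m-%d"))

# colMeans handles the means; base R has no column-wise sd function,
# so sd is applied to each column with vapply.
means <- do.call("rbind", lapply(S, colMeans, na.rm = TRUE))
sds   <- do.call("rbind", lapply(S, function(d) {
  vapply(d, sd, numeric(1), na.rm = TRUE)
}))
```

Both results are matrices with one row per group and one column per entry in `cols`.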

Re: [R] Calculate daily means from 5-minute interval data

2021-09-02 Thread Rich Shepard
On Thu, 2 Sep 2021, Rich Shepard wrote: If I correctly understand the output of as.POSIXlt each date and time element is separate, so input such as 2016-03-03 12:00 would now be 2016 03 03 12 00 (I've not read how the elements are separated). (The TZ is not important because all data are either

Re: [R] Calculate daily means from 5-minute interval data

2021-09-02 Thread Rich Shepard
On Mon, 30 Aug 2021, Richard O'Keefe wrote: x <- rnorm(samples.per.day * 365) length(x) [1] 105120 Reshape the fake data into a matrix where each row represents one 24-hour period. m <- matrix(x, ncol=samples.per.day, byrow=TRUE) Richard, Now I understand the need to keep the date and

Re: [R] Calculate daily means from 5-minute interval data [RESOLVED]

2021-09-01 Thread Rich Shepard
On Tue, 31 Aug 2021, Jeff Newmiller wrote: Never use stringsAsFactors on uncleaned data. For one thing you give a factor to as.Date and it tries to make sense of the integer representation, not the character representation. Jeff, Oops! I had changed it in a previous version of the script and

Re: [R] Calculate daily means from 5-minute interval data

2021-09-01 Thread Rich Shepard
On Wed, 1 Sep 2021, Richard O'Keefe wrote: You have missed the point. The issue is not the temporal distance, but the fact that the data you have are NOT the raw instrumental data and are NOT subject to the limitations of the recording instruments. The data you get from the USGS is not the raw

Re: [R] Calculate daily means from 5-minute interval data

2021-08-31 Thread Richard O'Keefe
I wrote: > > By the time you get the data from the USGS, you are already far past the > > point > > where what the instruments can write is important. Rich Shepard replied: > The data are important because they show what's happened in that period of > record. Don't physicians take a medical

Re: [R] Calculate daily means from 5-minute interval data

2021-08-31 Thread Jeff Newmiller
Never use stringsAsFactors on uncleaned data. For one thing you give a factor to as.Date and it tries to make sense of the integer representation, not the character representation. library(dplyr) dta <- read.csv( text = "sampdate,samptime,cfs 2020-08-26,09:30,136000 2020-08-26,09:35,126000
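A minimal base R sketch of reading the text columns as character and converting the date explicitly. Only the two data rows visible in the message are used; the rest of the original example (including its dplyr steps) is not shown here and is not reproduced.

```r
# Two rows copied from the message; the remaining rows are not shown.
dta <- read.csv(
  text = "sampdate,samptime,cfs
2020-08-26,09:30,136000
2020-08-26,09:35,126000",
  stringsAsFactors = FALSE  # keep text columns as character until cleaned
)

# Convert from the character representation, with an explicit format.
dta$sampdate <- as.Date(dta$sampdate, format = "%Y-%m-%d")
str(dta)
```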

Re: [R] Calculate daily means from 5-minute interval data

2021-08-31 Thread Rich Shepard
On Sun, 29 Aug 2021, Jeff Newmiller wrote: The general idea is to create a "grouping" column with repeated values for each day, and then to use aggregate to compute your combined results. The dplyr package's group_by/summarise functions can also do this, and there are also proponents of the

Re: [R] Calculate daily means from 5-minute interval data

2021-08-31 Thread Rich Shepard
On Tue, 31 Aug 2021, Richard O'Keefe wrote: By the time you get the data from the USGS, you are already far past the point where what the instruments can write is important. Richard, The data are important because they show what's happened in that period of record. Don't physicians take a

Re: [R] Calculate daily means from 5-minute interval data

2021-08-30 Thread Richard O'Keefe
By the time you get the data from the USGS, you are already far past the point where what the instruments can write is important. (Obviously an instrument can be sufficiently broken that it cannot write anything.) The data for Rogue River that I just downloaded include this comment: # Data for

Re: [R] Calculate daily means from 5-minute interval data

2021-08-30 Thread Bert Gunter
I do not wish to express any opinion on what should be done or how. But... 1. I assume that when data are missing, they are missing -- i.e. simply not present in the data. So there will be possibly several/many in succession missing rows of data corresponding to those times, right? (Apologies for

Re: [R] Calculate daily means from 5-minute interval data

2021-08-30 Thread Avi Gross via R-help
On Tue, 31 Aug 2021, Richard O'Keefe wrote: > I made up fake data in order to avoid showing untested code. It's not > part of the process I was recommending. I expect data recorded every N > minutes to use NA when something is missing, not

Re: [R] Calculate daily means from 5-minute interval data

2021-08-30 Thread Rich Shepard
On Tue, 31 Aug 2021, Richard O'Keefe wrote: I made up fake data in order to avoid showing untested code. It's not part of the process I was recommending. I expect data recorded every N minutes to use NA when something is missing, not to simply not be recorded. Well and good, all that means is

Re: [R] Calculate daily means from 5-minute interval data

2021-08-30 Thread Richard O'Keefe
I made up fake data in order to avoid showing untested code. It's not part of the process I was recommending. I expect data recorded every N minutes to use NA when something is missing, not to simply not be recorded. Well and good, all that means is that reshaping the data is not a trivial call

Re: [R] Calculate daily means from 5-minute interval data

2021-08-30 Thread Rich Shepard
On Mon, 30 Aug 2021, Richard O'Keefe wrote: Why would you need a package for this? samples.per.day <- 12*24 That's 12 5-minute intervals per hour and 24 hours per day. Generate some fake data. Richard, The problem is that there are days with fewer than 12 recorded values for various

Re: [R] Calculate daily means from 5-minute interval data

2021-08-30 Thread Richard O'Keefe
It is not clear to me who Jeff Newmiller's comment about periodicity is addressed to. The original poster, for asking for daily summaries? A summary of what I wrote: - daily means and standard deviations are a very poor choice for river flow data - if you insist on doing that anyway, no fancy

Re: [R] Calculate daily means from 5-minute interval data

2021-08-29 Thread Jeff Newmiller
IMO assuming periodicity is a bad practice for this. Missing timestamps happen too, and there is no reason to build a broken analysis process. On August 29, 2021 7:09:01 PM PDT, Richard O'Keefe wrote: >Why would you need a package for this? >> samples.per.day <- 12*24 > >That's 12 5-minute
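One way to check for missing timestamps before relying on a fixed number of readings per day, sketched with hypothetical data. The 5-minute nominal step comes from the thread; the sample timestamps and variable names are assumptions.

```r
# Hypothetical 5-minute series with one reading (09:40) absent from the file.
stamp <- as.POSIXct(
  c("2020-08-26 09:30", "2020-08-26 09:35",
    "2020-08-26 09:45", "2020-08-26 09:50"),
  format = "%Y-%m-%d %H:%M", tz = "UTC"
)

# Gaps show up as steps longer than the nominal 5 minutes.
step_min  <- as.numeric(diff(stamp), units = "mins")
gap_after <- stamp[c(step_min > 5, FALSE)]  # last reading before each gap

# Count readings per day; a complete day would have 12 * 24 = 288.
per_day <- table(format(stamp, "%Y-%m-%d"))
```

Grouping on a date column derived from the timestamps, rather than on row position, keeps the daily statistics correct even when `per_day` shows incomplete days.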

Re: [R] Calculate daily means from 5-minute interval data

2021-08-29 Thread Richard O'Keefe
Why would you need a package for this? > samples.per.day <- 12*24 That's 12 5-minute intervals per hour and 24 hours per day. Generate some fake data. > x <- rnorm(samples.per.day * 365) > length(x) [1] 105120 Reshape the fake data into a matrix where each row represents one 24-hour period. >
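The message is cut off after the reshape step. A sketch of how the matrix approach might continue is given below; the `rowMeans` and `apply` calls are an assumption about the natural next step, not a quotation, and the approach only holds when every day has exactly 288 readings.

```r
set.seed(1)                    # only so the fake data are reproducible
samples.per.day <- 12 * 24     # 288 five-minute readings per day
x <- rnorm(samples.per.day * 365)

# One row per 24-hour period; assumes no reading is missing or duplicated.
m <- matrix(x, ncol = samples.per.day, byrow = TRUE)

daily.mean <- rowMeans(m)      # one mean per row (day)
daily.sd   <- apply(m, 1, sd)  # one standard deviation per row (day)
```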

Re: [R] Calculate daily means from 5-minute interval data

2021-08-29 Thread Rich Shepard
On Sun, 29 Aug 2021, Andrew Simmons wrote: I would suggest something like: Thanks, Andrew. Stay well, Rich

Re: [R] Calculate daily means from 5-minute interval data

2021-08-29 Thread Rich Shepard
On Sun, 29 Aug 2021, Rui Barradas wrote: Hope this helps, Rui, Greatly! I'll study it carefully so I fully understand the process. Many thanks. Stay well, Rich

Re: [R] Calculate daily means from 5-minute interval data

2021-08-29 Thread Rui Barradas
Hello, I forgot in my previous answer, sorry for the duplicated mails. The function in my previous mail has a na.rm argument, defaulting to FALSE, pass na.rm = TRUE to remove the NA's. agg <- aggregate(cfs ~ date, df1, fun, na.rm = TRUE) Or simply change the default. I prefer to set
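The function from the previous mail is not visible in this listing, so the sketch below uses a stand-in `fun` with the same `na.rm` argument. The data are hypothetical, and the `na.action = na.pass` argument is an addition not present in the quoted call.

```r
# Hypothetical data with one missing discharge value.
df1 <- data.frame(
  date = as.Date(c("2020-08-26", "2020-08-26", "2020-08-26",
                   "2020-08-27", "2020-08-27")),
  cfs  = c(100, NA, 300, 400, 600)
)

# Stand-in for the function from the earlier mail, which is not shown here.
fun <- function(x, na.rm = FALSE) {
  c(mean = mean(x, na.rm = na.rm), sd = sd(x, na.rm = na.rm))
}

# The formula interface drops NA rows by default (na.action = na.omit);
# na.pass keeps them so that na.rm = TRUE inside fun does the removal.
agg <- aggregate(cfs ~ date, df1, fun, na.rm = TRUE, na.action = na.pass)
```

Because `fun` returns two values, `agg$cfs` is a matrix with columns `mean` and `sd`, one row per date.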

Re: [R] Calculate daily means from 5-minute interval data

2021-08-29 Thread Rich Shepard
On Sun, 29 Aug 2021, Jeff Newmiller wrote: You may find something useful on handling timestamp data here: https://jdnewmil.github.io/ Jeff, I'll certainly read those articles. Many thanks, Rich

Re: [R] Calculate daily means from 5-minute interval data

2021-08-29 Thread Rui Barradas
Hello, You have date and hour in two separate columns, so to compute daily stats part of the work is already done. (Were they in the same column you would have to extract the date only.) # convert to class "Date" df1$date <- as.Date(df1$date) # function to compute the stats required # it's

Re: [R] Calculate daily means from 5-minute interval data

2021-08-29 Thread Andrew Simmons
Hello, I would suggest something like: date <- seq(as.Date("2020-01-01"), as.Date("2020-12-31"), 1) time <- sprintf("%02d:%02d", rep(0:23, each = 12), seq.int(0, 55, 5)) x <- data.frame( date = rep(date, each = length(time)), time = time ) x$cfs <- stats::rnorm(nrow(x))

Re: [R] Calculate daily means from 5-minute interval data

2021-08-29 Thread Rich Shepard
On Sun, 29 Aug 2021, Rui Barradas wrote: I forgot in my previous answer, sorry for the duplicated mails. The function in my previous mail has a na.rm argument, defaulting to FALSE, pass na.rm = TRUE to remove the NA's. agg <- aggregate(cfs ~ date, df1, fun, na.rm = TRUE) Or simply change

Re: [R] Calculate daily means from 5-minute interval data

2021-08-29 Thread Jeff Newmiller
You may find something useful on handling timestamp data here: https://jdnewmil.github.io/ On August 29, 2021 9:23:31 AM PDT, Jeff Newmiller wrote: >The general idea is to create a "grouping" column with repeated values for >each day, and then to use aggregate to compute your combined

Re: [R] Calculate daily means from 5-minute interval data

2021-08-29 Thread Rich Shepard
On Sun, 29 Aug 2021, Jeff Newmiller wrote: The general idea is to create a "grouping" column with repeated values for each day, and then to use aggregate to compute your combined results. The dplyr package's group_by/summarise functions can also do this, and there are also proponents of the

Re: [R] Calculate daily means from 5-minute interval data

2021-08-29 Thread Rich Shepard
On Sun, 29 Aug 2021, Eric Berger wrote: Provide dummy data (e.g. 5-10 lines), say like the contents of a csv file, and calculate by hand what you'd like to see in the plot. (And describe what the plot would look like.) Eric, Mea culpa! I extracted a set of sample data and forgot to include

Re: [R] Calculate daily means from 5-minute interval data

2021-08-29 Thread Jeff Newmiller
The general idea is to create a "grouping" column with repeated values for each day, and then to use aggregate to compute your combined results. The dplyr package's group_by/summarise functions can also do this, and there are also proponents of the data.table package which is high performance

Re: [R] Calculate daily means from 5-minute interval data

2021-08-29 Thread Eric Berger
Hi Rich, Your request is a bit open-ended but here's a suggestion that might help get you an answer. Provide dummy data (e.g. 5-10 lines), say like the contents of a csv file, and calculate by hand what you'd like to see in the plot. (And describe what the plot would look like.) It sounds like

[R] Calculate daily means from 5-minute interval data

2021-08-29 Thread Rich Shepard
I have a year's hydraulic data (discharge, stage height, velocity, etc.) from a USGS monitoring gauge recording values every 5 minutes. The data files contain 90K-93K lines and plotting all these data would produce a solid block of color. What I want are the daily means and standard deviation