This enables the user to produce quantiles that are equivalent to those in various statistics package. There is a type argument that allows one to choose between the various sample quantile methods. type=7 gives identical results to the current R function quantile.default(). In our revised function, type=8 is the default following the recommendation of Hyndman and Fan (American Statistician, 1996).
We suggest the attached function replaces the current quantile.default() function in R as it provides additional functionality without increasing computation time or affecting ease-of-use. If backwards-compatibility is important, we are happy to set type=7 as the default. However, we prefer type=8. For moderate to large sample sizes, the difference is negligble.
The code and associated documentation is as close as possible to what already exists.
Regards, Rob Hyndman
__________________________________________________ Professor Rob J Hyndman Department of Econometrics & Business Statistics Monash University, VIC 3800, Australia. http://www-personal.buseco.monash.edu.au/~hyndman/
quantile.default <- function(x, probs = seq(0, 1, 0.25), na.rm = FALSE, names = TRUE, type = 8, ...) { if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) stop("Missing values and NaN's not allowed if `na.rm' is FALSE") if (any((p.ok <- !is.na(probs)) & (probs < 0 | probs > 1))) stop("probs outside [0,1]") if (na.p <- any(!p.ok)) { o.pr <- probs probs <- probs[p.ok] } type <- as.integer(type) if (is.na(type) || (type < 1 | type > 9)) stop("type outside range [1,9]") np <- length(probs) n <- length(x) if (n > 0 && np > 0) { if (type <= 3) { ## Types 1, 2 and 3 are discontinuous sample qs. if (type == 3) nppm <- n * probs - .5 # n * probs + m; m = -0.5 else nppm <- n * probs # m = 0 j <- floor(nppm) switch(type, h <- ifelse(nppm > j, 1, 0), # type 1 h <- ifelse(nppm > j, 1, 0.5), # type 2 h <- ifelse((nppm == j) && ((j %% 2) == 0), 0, 1)) # type 3 } else { ## Types 4 through 9 are continuous sample qs. switch(type - 3, {a <- 0; b <- 1}, # type 4 a <- b <- 0.5, # type 5 a <- b <- 0, # type 6 a <- b <- 1, # type 7 a <- b <- 1 / 3, # type 8 a <- b <- 3 / 8) # type 9 nppm <- a + probs * (n + 1 - a - b) # n*probs + m j <- floor(nppm) # m = a + probs*(1 - a - b) h <- nppm - j } x <- sort(x, partial = unique(c(1,j[j>0 & j<=n],(j+1)[j>0 & j<n],n))) x <- c(x[1], x[1], x, x[n], x[n]) qs <- (1 - h) * x[j + 2] + h * x[j + 3] } else { qs <- rep(as.numeric(NA), np) } if (names && np > 0) { dig <- max(2, getOption("digits")) names(qs) <- paste(if (np < 100) formatC(100 * probs, format = "fg", wid = 1, digits = dig) else format(100 * probs, trim = TRUE, digits = dig), "%", sep = "") } if (na.p) { o.pr[p.ok] <- qs names(o.pr) <- rep("", length(o.pr)) names(o.pr)[p.ok] <- names(qs) o.pr } else qs }
\name{quantile} \title{Sample Quantiles} \alias{quantile} \alias{quantile.default} \description{ The generic function \code{quantile} produces sample quantiles corresponding to the given probabilities. The smallest observation corresponds to a probability of 0 and the largest to a probability of 1. } \usage{ quantile(x, \dots)
\method{quantile}{default}(x, probs = seq(0, 1, 0.25), na.rm = FALSE, names = TRUE, type = 8, ...) } \arguments{ \item{x}{numeric vector whose sample quantiles are wanted.} \item{probs}{numeric vector of probabilities with values in \eqn{[0,1]}{[0,1]}.} \item{na.rm}{logical; if \code{TRUE} any \code{NA} or \code{NaN} is removed from \code{x} before the quantiles are computed. If \code{FALSE} the presence of \code{NA} or \code{NaN} in \code{x} aborts the function.} \item{names}{logical; if \code{TRUE}, the result has a \code{\link[base]{names}} attribute. Set to FALSE for efficiency when \code{probs} is long.} \item{type}{an integer between 1 and 9 selecting one of the nine quantile algorithms detailed below to be used.} \item{\dots}{further arguments passed to or from other methods} } \value{ A vector of the same length as \code{probs} is returned; if \code{names = TRUE}, it has a \code{\link[base]{names}} attribute. \code{\link[base]{NA}} and \code{\link[base]{NaN}} values in \code{probs} are propagated to the result. } \details{ \code{quantile} returns estimates of underlying distribution quantiles based on one or two order statistics from the supplied elements in \code{x} at probabilities in \code{probs}. One of the nine quantile algorithms discussed in Hyndman and Fan (1996), selected by \code{type}, is employed. Sample quantiles of type \eqn{i} are defined by \deqn{Q_{i}(p) = (1 - \gamma)x_{j} + \gamma x_{j+1}}{Q[i](p) = (1 - gamma) x[j] + gamma x[j+1],} where \eqn{1 \le i \le 9}{1 <= i <= 9}, \eqn{ } \eqn{\frac{j - m}{n} \le p < \frac{j - m + 1}{n}}{(j - m) / n <= p < (j - m + 1) / n}, \eqn{ } \eqn{x_{j}}{x[j]} is the \eqn{j}th order statistic, \eqn{n} is the sample size, and \eqn{m} is a constant determined by the sample quantile type. Here \eqn{\gamma}{gamma} depends on \eqn{j = \lfloor np + m\rfloor}{j = floor(np + m),} and \eqn{g = np + m - j.} For the continuous sample quantile types (4 through 9), the sample quantiles can be obtained by linear interpolation between the \eqn{k}th order statistic and \eqn{p(k)}: \deqn{p(k) = \frac{k - \alpha} {n - \alpha - \beta + 1}}{p(k) = (k - alpha) / (n - alpha - beta + 1),} where \eqn{\alpha}{alpha} and \eqn{\beta}{beta} are constants determined by the type. Further, \eqn{m = \alpha + p \left( 1 - \alpha - \beta \right)}{m = alpha + p(1 - alpha - beta),} and \eqn{\gamma = g = np + m - j}{gamma = g = np + m - j.} \strong{Discontinuous sample quantile types 1, 2, and 3} \describe{ \item{Type 1}{Inverse of empirical distribution function.} \item{Type 2}{Similar to type 1 but with averaging at discontinuities.} \item{Type 3}{SAS definition: nearest even order statistic.} } \strong{Continuous sample quantile types 4 through 9} \describe{ \item{Type 4}{\eqn{p(k) = \frac{k}{n}}{p(k) = k / n}. That is, linear interpolation of the empirical cdf.} \item{Type 5}{\eqn{p(k) = \frac{k - 0.5}{n}}{p(k) = (k - 0.5) / n}. That is a piecewise linear function where the knots are the values midway through the steps of the empirical cdf. This is popular amongst hydrologists.} \item{Type 6}{\eqn{p(k) = \frac{k}{n + 1}}{p(k) = k / (n + 1)}. Thus \eqn{p(k) = \mbox{E}[F(x_{k})]}{p(k) = E[F(x[k])]}. This is used by Minitab and by SPSS.} \item{Type 7}{\eqn{p(k) = \frac{k - 1}{n - 1}}{p(k) = (k - 1) / (n - 1)}. In this case, \eqn{p(k) = \mbox{mode}[F(x_{k})]}{p(k) = mode[F(x[k])]}. This is used by S-Plus.} \item{Type 8}{\eqn{p(k) = \frac{k - \frac{1}{3}}{n + \frac{1}{3}}}{p(k) = (k - 1/3) / (n + 1/3)}. Then \eqn{p(k) \approx \mbox{median}[F(x_{k})]}{p(k) =~ median[F(x[k])]}. The resulting quantile estimates are approximately median-unbiased regardless of the distribution of \code{x}.} \item{Type 9}{\eqn{p(k) = \frac{k - \frac{3}{8}}{n + \frac{1}{4}}}{p(k) = (k - 3/8) / (n + 1/4)}. The resulting quantile estimates are approximately unbiased if \code{x} is normally distributed.} } Hyndman and Fan (1996) recommend type 8 which is the default method for this function. } \references{ Hyndman, R. J. and Fan, Y. (1996) Sample quantiles in statistical packages, \emph{American Statistician}, \bold{50}, 361-365. } \seealso{ \code{\link[stats]{ecdf}} for empirical distributions of which quantile is the \dQuote{inverse}; \code{\link[graphics]{boxplot.stats}} and \code{\link[stats]{fivenum}} for computing \dQuote{versions} of quartiles, etc. } \examples{ quantile(x <- rnorm(1001))# Extremes & Quartiles by default ### Compare different methods p <- c(0.1,0.5,1,2,5,10,50)/100 quantile(x, p, type=1) quantile(x, p, type=2) quantile(x, p, type=3) quantile(x, p, type=4) quantile(x, p, type=5) quantile(x, p, type=6) quantile(x, p, type=7) quantile(x, p, type=8) quantile(x, p, type=9) } \keyword{univar} \author{Ivan Frohne and Rob J Hyndman}
______________________________________________ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-devel