[ https://issues.apache.org/jira/browse/ARROW-12994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17360045#comment-17360045 ]
Nic Crane edited comment on ARROW-12994 at 6/9/21, 1:50 PM:
------------------------------------------------------------

[~pachamaltese] - this test fails locally for you and me but not on the CI, because the result depends on the system timezone. The docs for the `tz` argument of `strptime` say "A character string specifying the time zone to be used for the conversion. System-specific (see as.POSIXlt), but "" is the current time zone". The default timezone on the CI is UTC, so the tests pass there, whereas you and I are both in non-UTC timezones, so we get local failures.

I have submitted a PR which specifies tz="UTC", which should fix this.


was (Author: thisisnic):
[~pachamaltese] - this test fails locally for you and me but not on the CI, because the result depends on the system timezone. The docs for the `tz` argument of `strptime` say "A character string specifying the time zone to be used for the conversion. System-specific (see as.POSIXlt), but "" is the current time zone". The default timezone on the CI is UTC, so the tests pass there, whereas you and I are both in non-UTC timezones, so we get local failures.

I was going to suggest updating the call to strptime to include tz = "UTC"; however, this argument is not currently supported in the NSE function 'strptime'. I think the C++ layer did not yet support it when the R binding was written.

> [R] stringr tests: 4 hours of difference between arrow and strptime
> -------------------------------------------------------------------
>
>                 Key: ARROW-12994
>                 URL: https://issues.apache.org/jira/browse/ARROW-12994
>             Project: Apache Arrow
>          Issue Type: Task
>          Components: R
>    Affects Versions: 4.0.1
>            Reporter: Mauricio 'Pachá' Vargas Sepúlveda
>            Priority: Major
>
> Here's the problem I detected while triaging tickets.
> This was run locally after merging from apache/arrow at commit 8773b9d and
> re-building both the Arrow library and the Arrow R package.
> {code:r}
> library(arrow)
> #> See arrow_info() for available features
> #>
> #> Attaching package: 'arrow'
> #> The following object is masked from 'package:utils':
> #>
> #>     timestamp
> library(dplyr)
> #>
> #> Attaching package: 'dplyr'
> #> The following objects are masked from 'package:stats':
> #>
> #>     filter, lag
> #> The following objects are masked from 'package:base':
> #>
> #>     intersect, setdiff, setequal, union
> library(testthat)
> #>
> #> Attaching package: 'testthat'
> #> The following object is masked from 'package:dplyr':
> #>
> #>     matches
> #> The following object is masked from 'package:arrow':
> #>
> #>     matches
> tstring <- tibble(x = c("08-05-2008", NA))
> tstamp <- tibble(x = c(strptime("08-05-2008", format = "%m-%d-%Y"), NA))
> expect_equal(
>   tstring %>%
>     Table$create() %>%
>     mutate(
>       x = strptime(x, format = "%m-%d-%Y")
>     ) %>%
>     collect(),
>   tstamp,
>   check.tzone = FALSE
> )
> #> Error: `%>%`(...) not equal to `tstamp`.
> #> Component "x": Mean absolute difference: 14400
> {code}
> We can see that the dates differ by exactly 4 hours by removing the
> expectation:
> {code:r}
> library(arrow)
> #> See arrow_info() for available features
> #>
> #> Attaching package: 'arrow'
> #> The following object is masked from 'package:utils':
> #>
> #>     timestamp
> library(dplyr)
> #>
> #> Attaching package: 'dplyr'
> #> The following objects are masked from 'package:stats':
> #>
> #>     filter, lag
> #> The following objects are masked from 'package:base':
> #>
> #>     intersect, setdiff, setequal, union
> library(testthat)
> #>
> #> Attaching package: 'testthat'
> #> The following object is masked from 'package:dplyr':
> #>
> #>     matches
> #> The following object is masked from 'package:arrow':
> #>
> #>     matches
> tstring <- tibble(x = c("08-05-2008", NA))
> tstamp <- tibble(x = c(strptime("08-05-2008", format = "%m-%d-%Y"), NA))
> tstring %>%
>   Table$create() %>%
>   mutate(
>     x = strptime(x, format = "%m-%d-%Y")
>   ) %>%
>   collect()
> #> # A tibble: 2 x 1
> #>   x
> #>   <dttm>
> #> 1 2008-08-04 20:00:00
> #> 2 NA
> tstamp
> #> # A tibble: 2 x 1
> #>   x
> #>   <dttm>
> #> 1 2008-08-05 00:00:00
> #> 2 NA
> {code}
> _Created on 2021-06-07 by the [reprex package|https://reprex.tidyverse.org]
> (v2.0.0)_



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
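
The timezone dependence described in the comment can be reproduced in base R alone, without arrow. The sketch below parses the same string once in a UTC-4 zone and once in UTC. "America/New_York" is an assumed example zone (the thread does not say which zone the reporter is in); it is chosen only because it is 4 hours behind UTC in August, which matches the 14400-second difference in the report.

```r
# Minimal sketch: the same date string parsed in two timezones.
# "America/New_York" is an assumption for illustration, not taken from the ticket.
fmt <- "%m-%d-%Y"

# Midnight on 2008-08-05 in a UTC-4 zone (what a local strptime() call would give
# on a machine whose current timezone is US Eastern).
local_parse <- as.POSIXct(strptime("08-05-2008", format = fmt, tz = "America/New_York"))

# Midnight on 2008-08-05 in UTC (what a machine running in UTC would give).
utc_parse <- as.POSIXct(strptime("08-05-2008", format = fmt, tz = "UTC"))

# Difference between the two instants, in seconds.
offset_secs <- as.numeric(local_parse) - as.numeric(utc_parse)
offset_secs
#> [1] 14400
```

Passing an explicit tz on the base R side of a comparison, rather than relying on the default "", should make the expected value independent of the machine running the test.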