Re: [Rd] [External] Re: 1954 from NA

2021-05-24 Thread Avi Gross via R-devel
I was thinking about how one does things in a language that is properly object-oriented versus R that makes various half-assed attempts at being such. Clearly in some such languages you can make an object that is a wrapper that allows you to save an item that is the main payload as well as anyth

Re: [Rd] 1954 from NA

2021-05-24 Thread Avi Gross via R-devel
Adrian, This is an aside. I note in many machine-learning algorithms they actually do something along the lines being discussed. They may take an item like a paragraph of words or an email message and add thousands of columns with each one being a Boolean specifying if a particular word is

[Rd] FW: 1954 from NA

2021-05-24 Thread Avi Gross via R-devel
Adrian, Agreed. To do what you said hundreds of columns of data by doubling it is indeed a pain just to get what you want. There are straightforward ways especially if you use tidyverse packages rather than base R. Just a warning, this message is a tad long for anyone not interested to sk

[Rd] Locking of base environment in R 4.1.0 breaks simple assignment of .First() (etc) from Rprofile.site

2021-05-24 Thread Jake Elmstedt
Commits 80162 and 80163 lock the base environment and namespace during startup, leading to an error when attempting to directly assign anything from within Rprofile.site. While this is intentional and good, the help file has not been updated to reflect this change. Startup.Rd ( Description, paragr

Re: [Rd] 1954 from NA

2021-05-24 Thread Nicholas Tierney
Hi all, When first hearing about ALTREP I've wondered how it might be able to be used to store special missing value information - how can we learn more about implementing ALTREP classes? The idea of carrying around a "meaning of my NAs" vector, as Gabe said, would be very interesting! I've done

Re: [Rd] 1954 from NA

2021-05-24 Thread Gabriel Becker
Hi All, So there is a not particularly active, but closely curated (i.e., everything on there should be good in terms of principled examples) GitHub organization of ALTREP examples: https://github.com/ALTREP-examples. Currently there are two examples by Luke (including a package version of the memory

Re: [Rd] 1954 from NA

2021-05-24 Thread Adrian Dușa
On Mon, May 24, 2021 at 5:47 PM Gabriel Becker wrote: > Hi Adrian, > > I had the same thought as Luke. It is possible that you can develop an > ALTREP that carries around the tagging information you're looking for in a > way that is more persistent (in some cases) than R-level attributes and > mo

Re: [Rd] 1954 from NA

2021-05-24 Thread Gabriel Becker
Hi Adrian, I had the same thought as Luke. It is possible that you can develop an ALTREP that carries around the tagging information you're looking for in a way that is more persistent (in some cases) than R-level attributes and more hidden than additional user-visible columns. The downsides to t

Re: [Rd] 1954 from NA

2021-05-24 Thread Adrian Dușa
On Mon, May 24, 2021 at 4:40 PM Bertram, Alexander via R-devel < r-devel@r-project.org> wrote: > Dear Adrian, > SPSS and other packages handle this problem in a very similar way to what I > described: they store additional metadata for each variable. You can see > this in the way that SPSS organiz

Re: [Rd] [External] Re: 1954 from NA

2021-05-24 Thread Greg Minshall
luke, > PLEASE DO NOT DO THIS! very happy to withdraw my offered alternative! cheers, Greg

Re: [Rd] 1954 from NA

2021-05-24 Thread Adrian Dușa
Hi Taras, On Mon, May 24, 2021 at 4:20 PM Taras Zakharko wrote: > Hi Adrian, > > Have a look at vctrs package — they have low-level primitives that might > simplify your life a bit. I think you can get quite far by creating a > custom type that stores NAs in an attribute and utilizes vctrs proxy

Re: [Rd] 1954 from NA

2021-05-24 Thread Bertram, Alexander via R-devel
Dear Adrian, SPSS and other packages handle this problem in a very similar way to what I described: they store additional metadata for each variable. You can see this in the way that SPSS organizes its file format: each "variable" has additional metadata that indicate how specific values of the va
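
[Editor's illustration] The per-variable metadata approach described in the snippet above can be sketched roughly as follows. This is a minimal sketch in Python, not SPSS's actual file layout and not code from the thread; the `Variable` class, its field names, and the example codes (-1, -2) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """A column of ordinary values plus side metadata about missing codes.

    Hypothetical structure for illustration only: the values stay plain
    numbers, and the metadata says which of them count as missing and why.
    """
    name: str
    values: list
    missing_codes: dict = field(default_factory=dict)  # code -> reason

    def is_missing(self):
        # A value is missing if the metadata declares its code as missing.
        return [v in self.missing_codes for v in self.values]

    def reason(self, i):
        # Reason for the i-th value, or None if the value is not missing.
        return self.missing_codes.get(self.values[i])


# Hypothetical survey variable with two declared kinds of missingness.
income = Variable(
    name="income",
    values=[52000, -1, 48000, -2],
    missing_codes={-1: "refused", -2: "don't know"},
)
```

The point of the design is that the reason for missingness lives in one small dictionary per variable rather than in a second full-length column, so it does not depend on how any numeric routine treats NaN bits.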

Re: [Rd] 1954 from NA

2021-05-24 Thread Taras Zakharko
Hi Adrian, Have a look at vctrs package — they have low-level primitives that might simplify your life a bit. I think you can get quite far by creating a custom type that stores NAs in an attribute and utilizes vctrs proxy functionality to preserve these attributes across different operations.

Re: [Rd] [External] Re: 1954 from NA

2021-05-24 Thread luke-tierney
On Mon, 24 May 2021, Adrian Dușa wrote: On Mon, May 24, 2021 at 2:11 PM Greg Minshall wrote: [...] if you have 500 columns of possibly-NA'd variables, you could have one column of 500 "bits", where each bit has one of N values, N being the number of explanations the corresponding column has f

Re: [Rd] 1954 from NA

2021-05-24 Thread Adrian Dușa
Dear Alex, Thanks for piping in, I am learning with each new message. The problem is clear, the solution escapes me though. I've already tried the attributes route: it is going to triple the data size: along with the additional (logical) variable that specifies which level is missing, one also nee

Re: [Rd] 1954 from NA

2021-05-24 Thread Adrian Dușa
On Mon, May 24, 2021 at 2:11 PM Greg Minshall wrote: > [...] > if you have 500 columns of possibly-NA'd variables, you could have one > column of 500 "bits", where each bit has one of N values, N being the > number of explanations the corresponding column has for why the NA > exists. > The mere

Re: [Rd] 1954 from NA

2021-05-24 Thread Bertram, Alexander via R-devel
Dear Adrian, I just wanted to pipe in and underscore Tomas's point: the payload bits of IEEE 754 floating point values are no place to store data that you care about or need to keep. That is not only related to the R APIs, but also how processors handle floating point values and signaling and non-s
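
[Editor's illustration] For context on the thread subject: as far as I understand R's sources, NA_real_ is an IEEE 754 NaN whose low 32-bit word is set to 1954. The sketch below, in Python rather than R or C, builds such a bit pattern and reads the low word back. The helper names are hypothetical, and the code deliberately makes no claim about what the payload looks like after arithmetic, since that is the platform-dependent behaviour the snippet above warns about.

```python
import math
import struct

# Assumption: 1954 is the low-word marker R uses for NA_real_.
NA_LOW_WORD = 1954


def make_nan_with_low_word(low_word: int) -> float:
    """Build a double with all exponent bits set and the given low 32 bits."""
    bits = (0x7FF00000 << 32) | (low_word & 0xFFFFFFFF)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def low_word_of(x: float) -> int:
    """Return the low 32 bits of the double's bit pattern."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return bits & 0xFFFFFFFF


na_like = make_nan_with_low_word(NA_LOW_WORD)

# The value is a NaN as far as ordinary floating point code is concerned.
is_nan = math.isnan(na_like)

# Arithmetic on it still yields a NaN, but which payload bits survive
# is up to the hardware and compiler, so it is not checked here.
after_arithmetic = na_like + 1.0
```

Reading `low_word_of(na_like)` immediately after construction gives back the marker; whether `low_word_of(after_arithmetic)` does is exactly what cannot be relied on.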

Re: [Rd] 1954 from NA

2021-05-24 Thread Adrian Dușa
On Mon, May 24, 2021 at 1:31 PM Tomas Kalibera wrote: > [...] > > For the reasons I explained, I would be against such a change. Keeping the > data on the side, as also recommended by others on this list, would allow > you for a reliable implementation. I don't want to support fragile package > c

Re: [Rd] 1954 from NA

2021-05-24 Thread Greg Minshall
Adrian, > If it was only one column then your solution is neat. But with 5-600 > variables, each of which can contain multiple missing values, to > double this number of variables just to describe NA values seems to me > excessive. Not to mention we should be able to quickly convert / > import /

Re: [Rd] 1954 from NA

2021-05-24 Thread Adrian Dușa
Hmm... If it was only one column then your solution is neat. But with 5-600 variables, each of which can contain multiple missing values, to double this number of variables just to describe NA values seems to me excessive. Not to mention we should be able to quickly convert / import / export from o

Re: [Rd] 1954 from NA

2021-05-24 Thread Tomas Kalibera
On 5/24/21 11:46 AM, Adrian Dușa wrote: > On Sun, May 23, 2021 at 10:14 PM Tomas Kalibera wrote: > > [...] > > Good, but unfortunately the delineation between computation and > non-computation is not always transparent. Even if an operation > d

Re: [Rd] 1954 from NA

2021-05-24 Thread Adrian Dușa
On Sun, May 23, 2021 at 10:14 PM Tomas Kalibera wrote: > [...] > > Good, but unfortunately the delineation between computation and > non-computation is not always transparent. Even if an operation doesn't > look like "computation" on the high-level, it may internally involve > computation - so, r