Timothy,
In reply to what you wrote about a benchmark suggesting that some storage 
formats may make code run slower: that is not a surprise, given what you chose 
to benchmark.

You are using a logical variable in a numeric context when you have 
code like:
    log(a2+0.01)
To do the calculation, depending on the internals, a2 has to be converted to 
at least an integer, or perhaps a floating-point value, such as 1L or 1.0, 
before 0.01 can be added to it.

You are doing the equivalent of:
    log(as.integer(a2)+0.01)

or perhaps:
    log(as.double(a2)+0.01)

The result is some extra work in THAT context. Note that I am NOT saying R 
calls one of those primitive functions, just that the final code does such 
conversions, perhaps at the assembler level or lower.
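
A quick way to see the implicit conversion is to check the storage type of 
the result. This is a small sketch that assumes a short, hypothetical a2 
rather than the vector from the benchmark:

```r
# Sketch: arithmetic on a logical vector coerces it to a number first.
# This a2 is a short hypothetical stand-in, not the benchmark vector.
a2 <- c(TRUE, FALSE, TRUE)

x <- a2 + 0.01            # logical + double gives a double
typeof(a2)                # "logical"
typeof(x)                 # "double"

# The implicit coercion gives the same values as an explicit one
identical(log(a2 + 0.01), log(as.double(a2) + 0.01))   # TRUE

# Adding an integer instead keeps the result integer
typeof(a2 + 1L)           # "integer"
```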

But consider the opposite context, such as an if(...) statement:
    if(a2) {do_this} else {do_that}

If a2 is already stored as a logical, the test happens rapidly. If a2 is 
something more complex that can be evaluated in steps into a logical, it takes 
those steps. As I showed earlier, if a2 were 1 or 666 it would evaluate as 
non-zero, be converted to TRUE, and the statement would choose do_this; if it 
were 0 it would evaluate to FALSE and do do_that.
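
A minimal sketch of that behaviour, using a hypothetical helper pick_branch() 
and single values rather than a vector (if() expects a condition of length 
one, and R 4.2.0 and later signal an error when it is longer):

```r
# Sketch: how if() treats a single non-logical condition.
# pick_branch() is a hypothetical helper, used only for illustration.
pick_branch <- function(cond) {
  if (cond) "do_this" else "do_that"
}

pick_branch(TRUE)   # "do_this"
pick_branch(1)      # "do_this": a non-zero number counts as TRUE
pick_branch(666)    # "do_this"
pick_branch(0)      # "do_that": zero counts as FALSE

# An NA condition cannot be resolved either way; this is the error
# quoted in this thread's subject line.
msg <- tryCatch(pick_branch(NA), error = function(e) conditionMessage(e))
msg
```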

So the right storage format depends on how you want to use it and how much 
storage space you are willing to use. Depending on the machine and 
architecture, a logical value may be stored in anything from a bit to a byte 
to multiple bytes, and at a lower level it may be expanded as needed to fit 
into a fixed-size register on the CPU. In some cases, the natural storage 
format is the one that can be used immediately with no boxing or unboxing. But 
as always, there are tradeoffs between the cycles used (execution time) and 
other resources, such as memory.

In a few cases, it may oddly pay to make two or more copies of a vector in 
different storage formats and then use the best one for each subsequent 
calculation. Logical might turn out to be a great choice for indexing into a 
list, vector, matrix, or data.frame; integer may be great for mathematics such 
as multiplication with a structure that only contains integers; a double 
version may suit calculations with floating-point numbers; and there may even 
be a case for character or complex versions.
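
As a minimal sketch of keeping more than one copy, with made-up vectors 
flags_dbl, flags_int, flags_lgl and values: the logical copy is the one that 
indexes as intended, and R itself stores each logical element in 4 bytes, the 
same as an integer, while a double takes 8.

```r
# Sketch: the same 0/1 data held in three storage modes.
# All names and values here are hypothetical examples.
flags_dbl <- c(1, 1, 0, 1, 0, 0, 1)   # double
flags_int <- as.integer(flags_dbl)    # integer, e.g. for integer arithmetic
flags_lgl <- as.logical(flags_dbl)    # logical, e.g. for indexing

values <- c(10, 20, 30, 40, 50, 60, 70)

values[flags_lgl]       # 10 20 40 70: keeps elements where the flag is TRUE
values[flags_dbl]       # 10 10 10 10: 1 means "position 1", 0 is dropped

sum(flags_int * 5L)     # 20: the arithmetic stays integer

# Memory: compare a million-element copy in each mode
big <- rep(c(0, 1), 5e5)
dbl_bytes <- as.numeric(object.size(big))
int_bytes <- as.numeric(object.size(as.integer(big)))
lgl_bytes <- as.numeric(object.size(as.logical(big)))
lgl_bytes == int_bytes  # TRUE: 4 bytes per element each
dbl_bytes > int_bytes   # TRUE: doubles take 8 bytes per element
```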

But from a novice perspective, performance is not usually a big concern, 
especially for small amounts of data. The only reason this is being discussed 
is that what went wrong might be hard to figure out without a lot more 
information, while the simple work-around might either work fine or tell us 
more about what might be wrong.

-----Original Message-----
From: Ebert,Timothy Aaron <teb...@ufl.edu>
To: Bert Gunter <bgunter.4...@gmail.com>
Cc: R-help <r-help@r-project.org>
Sent: Thu, Jan 27, 2022 2:27 pm
Subject: Re: [R] Error in if (fraction <= 1) { : missing value where TRUE/FALSE 
needed

You did not claim it is faster, but if one is writing large programs or has 
huge quantities of data then thinking about execution time could be useful.

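# install microbenchmark only if it is not already available
if(!require(microbenchmark)){install.packages("microbenchmark")}
library(microbenchmark)

# a 0/1 numeric vector and its logical copy
a1<-c(1,1,0,1,0,0,0,1,1,0,0,0,0,1,1,1,0,1,1,1,1,0,0,0,1,0,1,0,1,0,0,0,1,1,0,0,1,0,0,0,0,1,1,0,0,1)
a2<-as.logical(a1)

# time the same calculation on the numeric and on the logical vector
microbenchmark(
  numeric = log(a1+0.01),
  logical = log(a2+0.01),
  times=100000
)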

On my system, running the code shows that there is an overhead cost if the 
logical has to be converted. In this simple code it was a sometimes significant 
but always trivial cost of about 0.1 microseconds. I tried a few other bits of 
code, and the mean and minimum times were always smaller when performing 
numeric operations on a numeric variable. However, the range of times for a 
numeric operation on a numeric variable seems to be greater. I don't understand 
why.

Tim

-----Original Message-----
From: Bert Gunter <bgunter.4...@gmail.com> 
Sent: Thursday, January 27, 2022 1:17 PM
To: Ebert,Timothy Aaron <teb...@ufl.edu>
Cc: PIKAL Petr <petr.pi...@precheza.cz>; R-help <r-help@r-project.org>
Subject: Re: [R] Error in if (fraction <= 1) { : missing value where TRUE/FALSE 
needed


I did not claim it is faster -- and in fact I doubt that it makes any real 
difference. Just simpler, imo. I also think that the logical vector would serve 
equally well in most situations where arithmetic 0/1 coding is used -- even 
arithmetic ops and comparisons:
> TRUE + 2
[1] 3
> TRUE > .5
[1] TRUE
(?'+' has details)
I would appreciate someone responding with a nontrivial counterexample to this 
claim if they have one, other than the sort of thing shown in the ?logical 
example involving conversion to character:

## logical interpretation of particular strings
charvec <- c("FALSE", "F", "False", "false",    "fAlse", "0",
            "TRUE",  "T", "True",  "true",    "tRue",  "1")
as.logical(charvec)

## factors are converted via their levels, so string conversion is used
as.logical(factor(charvec))
as.logical(factor(c(0,1)))  # "0" and "1" give NA

(I mean of course purely internal R code, not export of data to an external 
application).
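
A small sketch of both points, using short hypothetical vectors: a logical 
works directly in sums, means, and indexing, while text codes "0"/"1" (and 
factors built from them) need to go through a number before becoming logical.

```r
# Sketch 1: a logical vector used directly in arithmetic summaries.
# a1 and a2 are short hypothetical stand-ins for the vectors in this thread.
a1 <- c(1, 1, 0, 1, 0, 0, 0, 1)
a2 <- as.logical(a1)

sum(a2)                    # 4: the number of TRUE values (an integer)
mean(a2)                   # 0.5: the proportion of TRUE values
which(a2)                  # 1 2 4 8: positions of the TRUE values
identical(a1 > 0.5, a2)    # TRUE: a comparison already yields a logical

# Sketch 2: "0"/"1" are not recognised logical strings, so convert to a
# number first.
codes <- c("1", "0", "1")
as.logical(codes)                 # NA NA NA
as.logical(as.integer(codes))     # TRUE FALSE TRUE

# For a factor, go through the level labels; as.integer(f) alone would
# return the internal level codes 1, 2, 2 and give TRUE TRUE TRUE.
f <- factor(c(0, 1, 1))
as.logical(as.integer(as.character(f)))   # FALSE TRUE TRUE
```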


Bert Gunter

"The trouble with having an open mind is that people keep coming along and 
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Thu, Jan 27, 2022 at 8:12 AM Ebert,Timothy Aaron <teb...@ufl.edu> wrote:
>
> One could use the microbenchmark package to compare which approach is faster, 
> assuming the dataset is large enough that the outcome will make a measurable 
> difference.
>



______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
