Abstraction in R Programming

Abstraction and Its Types

People who have used R for a while are probably familiar with passing functions as arguments to other functions, for example to sapply or lapply. Fewer write functions of their own that return functions. This is unfortunate, because doing so unlocks another level of abstraction and can reduce both the amount and the complexity of the code a task requires. The examples below show how R programmers can use lexical closures to encapsulate both data and behavior.

Implementation in R

Simple Example: Adding Numbers

To start with a simple example, suppose you want a function that adds 3 to its argument. You might write something like this:

add_3 <- function(y) { 3 + y }

This function works as expected:

> add_3(1:10)
[1]  4  5  6  7  8  9 10 11 12 13

Now, suppose you need another function that adds 8 to its argument. Instead of writing a new function similar to add_3, a better approach is to create a function that generates these functions dynamically. Here’s how you can do that:

add_x <- function(x) {
  function(y) { x + y }
}

Calling add_x with an argument returns a new function that performs the desired operation:

add_3 <- add_x(3)
add_8 <- add_x(8)

> add_3(1:10)
[1]  4  5  6  7  8  9 10 11 12 13
> add_8(1:10)
[1]  9 10 11 12 13 14 15 16 17 18

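If you look closely at the definition of add_x, you may wonder how the returned function knows about x when it is called later. The answer is R’s lexical scoping: a function looks up its free variables in the environment where it was defined. The inner function is defined during a call to add_x, so its enclosing environment is the environment of that call, where x is bound to the argument that was passed in. Each call to add_x creates a fresh environment, which is why add_3 and add_8 remember different values of x.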
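To make the captured value visible, the environment of each closure can be inspected directly. The following is a minimal sketch using base R's environment() function; add_x is repeated from above so that the block runs on its own.

```r
# Same generator as in the article
add_x <- function(x) {
  function(y) { x + y }
}

add_3 <- add_x(3)
add_8 <- add_x(8)

# Each returned function carries the environment of the add_x call that made it
environment(add_3)$x   # 3
environment(add_8)$x   # 8

# The two closures do not share an environment
identical(environment(add_3), environment(add_8))   # FALSE
```

The value of x lives in that environment rather than in the function body, which is why printing add_3 shows only function(y) { x + y }.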

Advanced Example: Bootstrapping with Containers

Now, let’s look at a more practical example. Suppose you’re performing some complex bootstrapping, and for efficiency, you pre-allocate vectors to store results. Here’s a straightforward implementation for a single vector:

nboot <- 100
bootmeans <- numeric(nboot)
data <- rnorm(1000)  # Example dataset

for (i in seq_len(nboot)) {
  # Resample the data with replacement and store the mean of the resample
  bootmeans[i] <- mean(sample(data, length(data), replace = TRUE))
}

> mean(data)
[1] -0.0024
> mean(bootmeans)
[1] -0.0018

However, if you need to track multiple statistics, each requiring a unique index variable, this process can become tedious and error-prone. Using closures, you can abstract away the bookkeeping. Here’s a function that creates a pre-allocated container:

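make_container <- function(n) {
  x <- numeric(n)  # pre-allocated storage, private to this container
  i <- 1           # next free position in x

  function(value = NULL) {
    if (is.null(value)) {
      # Called with no argument: return the whole vector
      return(x)
    } else {
      # <<- modifies the x and i defined in make_container, not local copies
      x[i] <<- value
      i <<- i + 1
    }
  }
}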

Calling make_container with a size n returns a function that manages the container. Called with no argument (so that value keeps its default of NULL), that function returns the entire vector. Called with a value, it stores the value in the next free position and advances the internal index:

nboot <- 100
bootmeans <- make_container(nboot)
data <- rnorm(1000)

for (i in seq_len(nboot)) {
  # The loop variable only counts iterations; the container tracks its own position
  bootmeans(mean(sample(data, length(data), replace = TRUE)))
}

> mean(data)
[1] -0.0024
> mean(bootmeans())
[1] -0.0019
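One caveat: as written, the container does not check its bounds. Storing more than n values silently grows the vector, which defeats the pre-allocation, and slots that have not been filled yet come back as zeros. A possible hardened variant is sketched below. The name make_checked_container and the choice to return only the filled part are assumptions for illustration, not part of the original code.

```r
# Hypothetical variant of make_container with bounds checking
make_checked_container <- function(n) {
  x <- numeric(n)
  i <- 1

  function(value = NULL) {
    if (is.null(value)) {
      # Return only the slots that have been filled so far
      return(x[seq_len(i - 1)])
    }
    # Accept exactly one number and refuse to write past the end
    stopifnot(is.numeric(value), length(value) == 1)
    if (i > n) {
      stop("container is full (capacity ", n, ")")
    }
    x[i] <<- value
    i <<- i + 1
    invisible(NULL)
  }
}

store <- make_checked_container(2)
store(1.5)
store(2.5)
store()                                      # 1.5 2.5
overflow <- try(store(3.5), silent = TRUE)   # third value does not fit
inherits(overflow, "try-error")              # TRUE
```

Whether an overflow should stop with an error or grow the vector depends on the use case; the sketch simply makes the failure visible.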

This approach simplifies the management of multiple containers and ensures that indexing is handled internally.
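As a sketch of what managing several containers might look like, the loop below tracks the bootstrap mean and median side by side. The second container, bootmedians, and the call to set.seed are additions for illustration; make_container is repeated so that the example runs on its own.

```r
make_container <- function(n) {
  x <- numeric(n)
  i <- 1

  function(value = NULL) {
    if (is.null(value)) {
      return(x)
    } else {
      x[i] <<- value
      i <<- i + 1
    }
  }
}

set.seed(1)   # assumption: fixed seed so that the run is repeatable
nboot <- 100
data <- rnorm(1000)

# One container per statistic; each keeps its own storage and index
bootmeans <- make_container(nboot)
bootmedians <- make_container(nboot)

for (b in seq_len(nboot)) {
  resample <- sample(data, length(data), replace = TRUE)
  bootmeans(mean(resample))
  bootmedians(median(resample))
}

length(bootmeans())     # number of stored means
length(bootmedians())   # number of stored medians
sd(bootmeans())         # bootstrap estimate of the standard error of the mean
```

Because each closure has its own x and i, the calling code needs no separate index variable for either statistic.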
