A data structure is a specific way of organizing and storing data in a computer so that it can be used efficiently. The goal is to optimize time and space complexity for various tasks. In R programming, data structures are tools designed to handle and manipulate collections of values.
R’s native data structures are often categorized based on their dimensionality (1D, 2D, or nD) and whether they are homogeneous (all elements must be of the same type) or heterogeneous (elements can have different types). This classification covers the data structures most commonly used for data analysis in R.
The most essential data structures used in R include:
- Vectors
- Lists
- Data frames
- Matrices
- Arrays
- Factors
- Tibbles
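To make the homogeneous/heterogeneous distinction concrete, the following minimal sketch compares an atomic vector with a list. The variable names mixed_vec and mixed_list are illustrative only.

```r
# Atomic vectors are homogeneous: mixed inputs are coerced to one common type
mixed_vec <- c(1, "a", TRUE)
vec_type <- typeof(mixed_vec)               # "character"

# Lists are heterogeneous: each element keeps its own type
mixed_list <- list(1, "a", TRUE)
list_types <- sapply(mixed_list, typeof)    # "double" "character" "logical"

print(vec_type)
print(list_types)
```

Because of this coercion, a vector that mixes numbers and strings silently becomes a character vector, whereas a list preserves each element as it was given.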
R Vectors
Vectors in R are analogous to arrays in other programming languages and are used to store multiple values of the same type. A key difference is that R uses 1-based indexing (indexing starts at 1, not 0). Vectors can hold numeric, character, or logical values.
Creating a Vector
A vector is a fundamental data structure in R, representing a one-dimensional array. The c() function is the most common method for creating a vector.
# R program to create Vectors
# Using c() function to create a numeric vector
A <- c(10, 20, 30, 40)
cat('Using c() function:', A, '\n')
# Using seq() function for generating a sequence
# length.out defines the number of values in the sequence
B <- seq(5, 25, length.out = 6)
cat('Using seq() function:', B, '\n')
# Using colon (:) to create a sequence
C <- 3:9
cat('Using colon operator:', C)
Output:
Using c() function: 10 20 30 40
Using seq() function: 5 9 13 17 21 25
Using colon operator: 3 4 5 6 7 8 9
Types of R Vectors
Numeric Vectors: Numeric vectors store numbers, which can be integers or doubles.
# R program to create Numeric Vectors
# Creating a vector using c()
num1 <- c(1.5, 2.5, 3.5)
# Display type of vector
typeof(num1)
# Using L to specify integers
num2 <- c(10L, 20L, 30L)
typeof(num2)
Output:
[1] "double"
[1] "integer"
Character Vectors: Character vectors store strings and can also include numbers treated as characters.
# R program to create Character Vectors
# Creating a character vector
charVec <- c("apple", "orange", "42", "75")
typeof(charVec)
Output:
[1] "character"
Logical Vectors: Logical vectors store Boolean values (TRUE, FALSE, or NA).
# R program to create Logical Vectors
# Creating a logical vector
logVec <- c(TRUE, FALSE, TRUE, NA)
typeof(logVec)
Output:
[1] "logical"
Length of a Vector
The length of a vector is the count of its elements. Use the length() function to find it.
# R program to find vector lengths
# Numeric vector
numVec <- c(3, 6, 9, 12)
length(numVec)
# Character vector
charVec <- c("car", "bike", "train")
length(charVec)
# Logical vector
logVec <- c(TRUE, TRUE, FALSE, NA)
length(logVec)
Output:
[1] 4
[1] 3
[1] 4
Accessing Vector Elements
Access elements using the subscript operator []. R uses 1-based indexing.
# R program to access vector elements
# Creating a vector
vec <- c(8, 15, 23, 42, 57)
# Accessing an element by index
cat('Single element:', vec[3], '\n')
# Accessing multiple elements using c()
cat('Multiple elements:', vec[c(1, 5)], '\n')
Output:
Single element: 23
Multiple elements: 8 57
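Besides positive positions, the subscript operator also accepts ranges, negative indices, and logical conditions. The sketch below reuses the vector from the example above; the result variable names are illustrative.

```r
vec <- c(8, 15, 23, 42, 57)

# A range of positions
middle <- vec[2:4]          # 15 23 42

# A negative index drops that position instead of selecting it
without_first <- vec[-1]    # 15 23 42 57

# A logical condition keeps only the elements for which it is TRUE
above_20 <- vec[vec > 20]   # 23 42 57

print(middle)
print(without_first)
print(above_20)
```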
Modifying a Vector
Assign new values to one or more positions using the subscript operator [].
Example:
# R program to modify a vector
# Creating a vector
vec <- c(5, 10, 15, 20)
# Modify specific elements
vec[2] <- 12
vec[4] <- 25
cat('Modified vector:', vec, '\n')
# Modify using a range
vec[1:3] <- c(7, 14, 21)
cat('Modified using range:', vec, '\n')
Output:
Modified vector: 5 12 15 25
Modified using range: 7 14 21 25
Deleting a Vector
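Assigning NULL to a vector discards its elements; the variable then holds NULL.
# R program to delete a vector
# Creating a vector
vec <- c(4, 8, 12, 16)
# Replacing the vector's contents with NULL
vec <- NULL
# cat() prints nothing for NULL, so deparse() is used to show it as text
cat('Deleted vector:', deparse(vec), '\n')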
Output:
Deleted vector: NULL
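Note that a variable set to NULL is still defined. If the goal is to remove the variable itself, base R's rm() function can be used. The sketch below illustrates the difference; still_bound and bound_after_rm are illustrative names.

```r
vec <- c(4, 8, 12, 16)

# After assigning NULL the name still exists, it just holds NULL
vec <- NULL
still_bound <- exists("vec", inherits = FALSE)      # TRUE
holds_null <- is.null(vec)                          # TRUE

# rm() removes the name from the current environment
rm(vec)
bound_after_rm <- exists("vec", inherits = FALSE)   # FALSE

print(c(still_bound, holds_null, bound_after_rm))
```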
Sorting a Vector
Use the sort() function to order the elements of a vector. It sorts in ascending order by default; set decreasing = TRUE for descending order.
# R program to sort a vector
# Creating a vector
vec <- c(9, 3, 7, 1, 5)
# Sort in ascending order
asc <- sort(vec)
cat('Ascending order:', asc, '\n')
# Sort in descending order
desc <- sort(vec, decreasing = TRUE)
cat('Descending order:', desc)
Output:
Ascending order: 1 3 5 7 9
Descending order: 9 7 5 3 1
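When a second vector has to follow the same ordering, the related base R function order() is often more convenient than sort(), because it returns the positions rather than the sorted values. A minimal sketch, with the words vector made up for illustration:

```r
vec <- c(9, 3, 7, 1, 5)
words <- c("nine", "three", "seven", "one", "five")

# order() returns the indices that would sort the vector
idx <- order(vec)             # 4 2 5 3 1

# The same indices can reorder any vector of the same length
sorted_vec <- vec[idx]        # 1 3 5 7 9
sorted_words <- words[idx]    # "one" "three" "five" "seven" "nine"

print(idx)
print(sorted_vec)
print(sorted_words)
```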
R – List
A list in R is a generic object consisting of an ordered collection of objects. It is a one-dimensional, heterogeneous data structure, which means it can store different types of data, such as vectors, matrices, characters, and even functions.
A list is essentially a vector but allows for elements of varying data types. You can create a list using the list() function, and its indexing starts from 1 instead of 0.
Creating a List
To create a list in R, you use the list() function. For example, let’s create a list to store details about books, such as their IDs, titles, and total count.
Example:
# R program to create a List
# Numeric vector for book IDs
bookId <- c(101, 102, 103, 104)
# Character vector for book titles
bookTitle <- c("R Programming", "Data Science", "Machine Learning", "AI")
# Total number of books
totalBooks <- 4
# Combine all attributes into a list
bookList <- list(bookId, bookTitle, totalBooks)
print(bookList)
Output:
[[1]]
[1] 101 102 103 104
[[2]]
[1] "R Programming" "Data Science" "Machine Learning" "AI"
[[3]]
[1] 4
Naming List Components
Assigning names to list components makes it easier to access them.
Example:
# Named list for a person’s details
personDetails <- list(name = "Aarav", age = 30, city = "Mumbai")
# Printing the named list
print(personDetails)
Output:
$name
[1] "Aarav"
$age
[1] 30
$city
[1] "Mumbai"
Accessing R List Components
1. Access Components by Names: Using the $ operator, you can directly access components by their names.
Example:
# List with named components
bookList <- list(
"ID" = c(101, 102, 103, 104),
"Titles" = c("R Programming", "Data Science", "Machine Learning", "AI"),
"Total Books" = 4
)
# Access Titles
cat("Accessing titles using $ operator:\n")
print(bookList$Titles)
Output:
Accessing titles using $ operator:
[1] "R Programming" "Data Science" "Machine Learning" "AI"
2. Access Components by Indices: You can also access components using indices. Use double brackets [[ ]] to extract a component itself, and then single brackets [ ] to select elements inside the extracted vector.
Example:
# Accessing by indices
cat("Accessing titles using indices:\n")
print(bookList[[2]])
cat("Accessing a specific title using indices:\n")
print(bookList[[2]][3])
cat("Accessing the last book ID using indices:\n")
print(bookList[[1]][4])
Output:
Accessing titles using indices:
[1] "R Programming" "Data Science" "Machine Learning" "AI"
Accessing a specific title using indices:
[1] "Machine Learning"
Accessing the last book ID using indices:
[1] 104
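The difference between single and double brackets on a list is easy to miss, so here is a small sketch of it. It uses a shortened version of bookList; the variable names sub_list and component are illustrative.

```r
bookList <- list(
  ID = c(101, 102),
  Titles = c("R Programming", "Data Science")
)

# Single brackets return a list that contains the selected component
sub_list <- bookList["Titles"]
class(sub_list)       # "list"

# Double brackets return the component itself
component <- bookList[["Titles"]]
class(component)      # "character"

# Only the extracted component can be indexed element by element
second_title <- component[2]   # "Data Science"
print(second_title)
```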
Modifying Components of a List
You can modify the components of a list by accessing them and replacing their values.
Example:
# Modify components of a list
cat("Before modifying the list:\n")
print(bookList)
# Modify total books count
bookList$`Total Books` <- 5
# Add a new book ID and title
bookList[[1]][5] <- 105
bookList[[2]][5] <- "Deep Learning"
cat("After modifying the list:\n")
print(bookList)
Output:
Before modifying the list:
$ID
[1] 101 102 103 104
$Titles
[1] "R Programming" "Data Science" "Machine Learning" "AI"
$`Total Books`
[1] 4
After modifying the list:
$ID
[1] 101 102 103 104 105
$Titles
[1] "R Programming" "Data Science" "Machine Learning" "AI" "Deep Learning"
$`Total Books`
[1] 5
Concatenation of Lists
You can concatenate two lists using the c() function.
Example:
# Original list
bookList <- list(
"ID" = c(101, 102, 103, 104),
"Titles" = c("R Programming", "Data Science", "Machine Learning", "AI")
)
# New list with book prices
bookPrices <- list("Prices" = c(500, 600, 700, 800))
# Concatenate lists
mergedList <- c(bookList, bookPrices)
cat("After concatenating the lists:\n")
print(mergedList)
Output:
After concatenating the lists:
$ID
[1] 101 102 103 104
$Titles
[1] "R Programming" "Data Science" "Machine Learning" "AI"
$Prices
[1] 500 600 700 800
Adding Items to a List
You can append items to a list using the append() function.
Example:
# Original list
myNumbers <- list(10, 20, 30)
# Append a new number
myNumbers <- append(myNumbers, 40)
# Print updated list
print(myNumbers)
Output:
[[1]]
[1] 10
[[2]]
[1] 20
[[3]]
[1] 30
[[4]]
[1] 40
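For named lists, new components can also be added by assigning to a name that does not exist yet, and append() accepts an after argument to control the position. The sketch below assumes a hypothetical person list; all of its field names are made up for illustration.

```r
person <- list(name = "Aarav", age = 30)

# Assigning to a new name adds a component at the end
person$city <- "Mumbai"
person[["languages"]] <- c("R", "Python")

# append() with after = 1 inserts right after the first component
person <- append(person, list(active = TRUE), after = 1)

component_names <- names(person)
print(component_names)   # "name" "active" "age" "city" "languages"
```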
Deleting Components of a List
To delete components, access them and use a negative index.
Example:
# List with named components
bookList <- list(
"ID" = c(101, 102, 103, 104),
"Titles" = c("R Programming", "Data Science", "Machine Learning", "AI"),
"Total Books" = 4
)
cat("Before deletion:\n")
print(bookList)
# Delete the "Total Books" component
bookList <- bookList[-3]
cat("After deleting 'Total Books':\n")
print(bookList)
Output:
Before deletion:
$ID
[1] 101 102 103 104
$Titles
[1] "R Programming" "Data Science" "Machine Learning" "AI"
$`Total Books`
[1] 4
After deleting 'Total Books':
$ID
[1] 101 102 103 104
$Titles
[1] "R Programming" "Data Science" "Machine Learning" "AI"
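A component can also be removed by name, which avoids having to know its position: assigning NULL to a list component deletes it. A minimal sketch using a shortened bookList:

```r
bookList <- list(
  "ID" = c(101, 102),
  "Titles" = c("R Programming", "Data Science"),
  "Total Books" = 2
)

# Assigning NULL to a component removes it from the list
bookList[["Total Books"]] <- NULL

remaining <- names(bookList)
print(remaining)   # "ID" "Titles"
```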
R – Array
Arrays are fundamental data storage structures defined with a specific number of dimensions. They are used to allocate space in contiguous memory locations.
In R Programming, one-dimensional arrays are called vectors, where their single dimension is their length. Two-dimensional arrays are referred to as matrices, which consist of a defined number of rows and columns. Arrays in R hold elements of the same data type. Vectors serve as inputs to create arrays, specifying their dimensions.
Creating an Array
In R, arrays can be created using the array() function. The function takes a vector of elements and a vector of dimensions as inputs to create the desired array.
Syntax:
array(data, dim = c(nrow, ncol, nmat), dimnames = names)
Components:
nrow: Number of rows.
ncol: Number of columns.
nmat: Number of matrices with dimensions nrow * ncol.
dimnames: Defaults to NULL. Alternatively, a list can be provided containing names for each component of the array dimensions.
Uni-Dimensional Array
A vector, a one-dimensional array, has its length as its dimension. It can be created using the c() function.
Example:
vec <- c(10, 20, 30, 40, 50)
print(vec)
# Displaying the length of the vector
cat("Length of the vector: ", length(vec))
Output:
[1] 10 20 30 40 50
Length of the vector: 5
Multi-Dimensional Array
A matrix is a two-dimensional array, defined by rows and columns of the same data type. Arrays with more than two dimensions, such as the 2 x 3 x 2 array below, are created using the array() function.
Example:
# Create a 2 x 3 x 2 array with values from 15 to 26
mat <- array(15:26, dim = c(2, 3, 2))
print(mat)
Output:
, , 1
[,1] [,2] [,3]
[1,] 15 17 19
[2,] 16 18 20
, , 2
[,1] [,2] [,3]
[1,] 21 23 25
[2,] 22 24 26
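The shape of an array can be checked with dim(), and a single element is addressed with one index per dimension. The sketch below rebuilds the same 2 x 3 x 2 array under the illustrative name arr.

```r
arr <- array(15:26, dim = c(2, 3, 2))

# dim() reports rows, columns, and number of matrices
arr_dim <- dim(arr)       # 2 3 2

# length() counts all elements across every dimension
arr_len <- length(arr)    # 12

# One index per dimension: [row, column, matrix]
elem_a <- arr[2, 3, 1]    # 20
elem_b <- arr[1, 1, 2]    # 21

print(arr_dim)
print(c(arr_len, elem_a, elem_b))
```

The array is filled column by column, one matrix after another, which is why the first matrix ends at 20 and the second starts at 21.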
Naming Array Dimensions
You can assign names to rows, columns, and matrices using vectors for better readability.
Example:
rows <- c("Row1", "Row2")
columns <- c("Col1", "Col2", "Col3")
matrices <- c("Matrix1", "Matrix2")
named_array <- array(1:12, dim = c(2, 3, 2),
dimnames = list(rows, columns, matrices))
print(named_array)
Output:
, , Matrix1
Col1 Col2 Col3
Row1 1 3 5
Row2 2 4 6
, , Matrix2
Col1 Col2 Col3
Row1 7 9 11
Row2 8 10 12
Accessing Arrays
You can access elements of arrays using indices for each dimension. Names or positions can be used.
Example:
vec <- c(5, 10, 15, 20, 25)
cat("Vector:", vec, "\n")
cat("Second element:", vec[2], "\n")
Output:
Vector: 5 10 15 20 25
Second element: 10
Accessing Matrices in an Array
Example:
rows <- c("A", "B")
columns <- c("X", "Y", "Z")
matrices <- c("M1", "M2")
multi_array <- array(1:12, dim = c(2, 3, 2),
dimnames = list(rows, columns, matrices))
# Accessing first matrix by name
cat("Matrix M1\n")
print(multi_array[, , "M1"])
# Accessing second matrix by index
cat("Matrix 2\n")
print(multi_array[, , 2])
Output:
Matrix M1
X Y Z
A 1 3 5
B 2 4 6
Matrix 2
X Y Z
A 7 9 11
B 8 10 12
Accessing Specific Rows and Columns
Example:
cat("First row of Matrix 1\n")
print(multi_array[1, , "M1"])
cat("Second column of Matrix 2\n")
print(multi_array[, 2, 2])
Output:
First row of Matrix 1
X Y Z
1 3 5
Second column of Matrix 2
 A  B
 9 10
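A single element can be picked out the same way, by giving one name or position for each of the three dimensions. The sketch below recreates multi_array so that it runs on its own; by_name and by_position are illustrative names.

```r
multi_array <- array(1:12, dim = c(2, 3, 2),
                     dimnames = list(c("A", "B"),
                                     c("X", "Y", "Z"),
                                     c("M1", "M2")))

# By dimension names: row "A", column "Y", matrix "M2"
by_name <- multi_array["A", "Y", "M2"]    # 9

# By position: row 2, column 3, matrix 1
by_position <- multi_array[2, 3, 1]       # 6

print(c(by_name, by_position))
```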
Modifying Arrays
Adding Elements to Arrays: New elements can be added at specific positions or appended to the array.
Example:
vec <- c(1, 2, 3, 4)
# Adding an element using c()
vec <- c(vec, 5)
cat("After appending an element:\n")
print(vec)
# Using append() to add after the 2nd element
vec <- append(vec, 10, after = 2)
cat("After using append:\n")
print(vec)
Output:
After appending an element:
[1] 1 2 3 4 5
After using append:
[1] 1 2 10 3 4 5
Removing Elements
Elements can be removed using logical conditions or indices.
Example:
vec <- c(1, 2, 3, 4, 5, 6)
vec <- vec[vec != 4] # Removing element with value 4
print(vec)
Output:
[1] 1 2 3 5 6
R – Matrices
Matrices are two-dimensional, homogeneous data structures arranged in rows and columns.
Creating a Matrix
To create a matrix, use the matrix() function:
matrix(data, nrow, ncol, byrow, dimnames)
Parameters:
- data: Elements to include.
- nrow: Number of rows.
- ncol: Number of columns.
- byrow: Logical (TRUE for row-wise, FALSE for column-wise order).
- dimnames: Names of rows and columns.
Example:
A <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3, byrow = TRUE)
rownames(A) <- c("a", "b", "c")
colnames(A) <- c("c", "d", "e")
print(A)
Output:
c d e
a 1 2 3
b 4 5 6
c 7 8 9
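The effect of the byrow parameter is easiest to see side by side. The following sketch fills the same data both ways; by_row and by_col are illustrative names.

```r
# byrow = TRUE fills the matrix one row at a time
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)

# The default (byrow = FALSE) fills one column at a time
by_col <- matrix(1:6, nrow = 2)

first_row_by_row <- by_row[1, ]   # 1 2 3
first_row_by_col <- by_col[1, ]   # 1 3 5

print(first_row_by_row)
print(first_row_by_col)
```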
Special Matrices
1. Constant Matrix:
matrix(5, 3, 3)
2. Diagonal Matrix:
diag(c(5, 3, 3), 3, 3)
3. Identity Matrix:
diag(1, 3, 3)
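As a quick check on what these three calls produce, the sketch below stores each result and inspects it. The names const_mat, diag_mat, and id_mat are illustrative.

```r
const_mat <- matrix(5, 3, 3)          # every entry is 5
diag_mat <- diag(c(5, 3, 3), 3, 3)    # 5, 3, 3 on the diagonal, 0 elsewhere
id_mat <- diag(1, 3, 3)               # 1 on the diagonal, 0 elsewhere

all_fives <- all(const_mat == 5)      # TRUE
diag_values <- diag(diag_mat)         # 5 3 3

# Multiplying by the identity matrix leaves a matrix unchanged
unchanged <- all(id_mat %*% diag_mat == diag_mat)   # TRUE

print(all_fives)
print(diag_values)
print(unchanged)
```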
Accessing Matrix Elements
Use [row, col] notation:
- Access rows:
matrix[1:2, ]
- Access columns:
matrix[, 1:2]
- Access specific element:
matrix[1, 2]
- Access submatrices:
matrix[1:3, 1:2]
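The access patterns above can be tried on a concrete matrix. The sketch below uses an illustrative 3 x 3 matrix named m and also shows one detail worth knowing: selecting a single row returns a plain vector unless drop = FALSE is passed.

```r
m <- matrix(1:9, nrow = 3, ncol = 3)

first_two_rows <- m[1:2, ]     # 2 x 3 matrix
first_two_cols <- m[, 1:2]     # 3 x 2 matrix
single_elem <- m[1, 2]         # 4 (row 1, column 2)

# A single row is simplified to a vector, so it has no dimensions
row_vec <- m[1, ]
dim(row_vec)                   # NULL

# drop = FALSE keeps the result as a 1 x 3 matrix
row_mat <- m[1, , drop = FALSE]
dim(row_mat)                   # 1 3

print(single_elem)
print(row_mat)
```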
Modifying Matrix Elements
Direct assignment:
Example:
# Create a matrix
matrix <- matrix(1:9, nrow = 3, ncol = 3)
# Modify the element in the 3rd row and 3rd column
matrix[3, 3] <- 30
# Print the modified matrix
print(matrix)
Output:
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 30
Matrix Concatenation
Add a Row: You can add a row to the matrix using the rbind() function.
Example:
# Create a new row
new_row <- c(10, 11, 12)
# Add the new row to the matrix
# (deparse.level = 0 keeps the new row from being labeled "new_row")
matrix <- rbind(matrix, new_row, deparse.level = 0)
# Print the updated matrix
print(matrix)
Output:
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 30
[4,] 10 11 12
Add a Column: You can add a column to the matrix using the cbind() function.
# Create a new column
new_col <- c(13, 14, 15, 16)
# Add the new column to the matrix
# (deparse.level = 0 keeps the new column from being labeled "new_col")
matrix <- cbind(matrix, new_col, deparse.level = 0)
# Print the updated matrix
print(matrix)
Output:
[,1] [,2] [,3] [,4]
[1,] 1 4 7 13
[2,] 2 5 8 14
[3,] 3 6 30 15
[4,] 10 11 12 16
Accessing Submatrices in R:
You can access specific parts of a matrix (submatrices) using the colon : operator in R.
Example:
# R program to demonstrate accessing submatrices
# Create a 4x4 matrix
M = matrix(
c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160),
nrow = 4,
ncol = 4,
byrow = TRUE
)
cat("The 4x4 matrix:\n")
print(M)
cat("\nAccessing the first two rows and the first three columns:\n")
print(M[1:2, 1:3])
Output:
The 4x4 matrix:
[,1] [,2] [,3] [,4]
[1,] 10 20 30 40
[2,] 50 60 70 80
[3,] 90 100 110 120
[4,] 130 140 150 160
Accessing the first two rows and the first three columns:
[,1] [,2] [,3]
[1,] 10 20 30
[2,] 50 60 70
Modifying Elements of an R-Matrix:
You can modify elements in a matrix by directly assigning new values.
Example:
# R program to demonstrate modifying elements in a matrix
# Create a 3x3 matrix
M = matrix(
c(2, 4, 6, 8, 10, 12, 14, 16, 18),
nrow = 3,
ncol = 3,
byrow = TRUE
)
cat("Original matrix:\n")
print(M)
# Change the element in the 2nd row and 2nd column
M[2, 2] = 100
cat("\nMatrix after modification:\n")
print(M)
Output:
Original matrix:
[,1] [,2] [,3]
[1,] 2 4 6
[2,] 8 10 12
[3,] 14 16 18
Matrix after modification:
[,1] [,2] [,3]
[1,] 2 4 6
[2,] 8 100 12
[3,] 14 16 18
Matrix Concatenation in R:
Concatenation merges rows or columns of a matrix.
Adding Rows Using rbind():
# R program to demonstrate adding a row
# Create a 2x3 matrix
M = matrix(
c(1, 2, 3, 4, 5, 6),
nrow = 2,
ncol = 3,
byrow = TRUE
)
cat("Original matrix:\n")
print(M)
# Define a new row
new_row = c(7, 8, 9)
# Add the new row (deparse.level = 0 keeps the row unlabeled)
M_updated = rbind(M, new_row, deparse.level = 0)
cat("\nMatrix after adding a new row:\n")
print(M_updated)
Output:
Original matrix:
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
Matrix after adding a new row:
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
Adding Columns Using cbind():
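# R program to demonstrate adding a column
# (sketch consistent with the output below)
# Create a 2x2 matrix
M = matrix(
c(10, 20, 30, 40),
nrow = 2,
ncol = 2,
byrow = TRUE
)
cat("Original matrix:\n")
print(M)
# Define a new column
new_col = c(50, 60)
# Add the new column (deparse.level = 0 keeps the column unlabeled)
M_updated = cbind(M, new_col, deparse.level = 0)
cat("\nMatrix after adding a new column:\n")
print(M_updated)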
Output:
Original matrix:
[,1] [,2]
[1,] 10 20
[2,] 30 40
Matrix after adding a new column:
[,1] [,2] [,3]
[1,] 10 20 50
[2,] 30 40 60
Deleting Rows and Columns:
You can delete rows or columns by using negative indices.
Deleting a Row:
# R program to demonstrate row deletion
# Create a 3x3 matrix
M = matrix(
c(5, 10, 15, 20, 25, 30, 35, 40, 45),
nrow = 3,
ncol = 3,
byrow = TRUE
)
cat("Matrix before row deletion:\n")
print(M)
# Delete the 2nd row
M_updated = M[-2, ]
cat("\nMatrix after deleting the 2nd row:\n")
print(M_updated)
Output:
Matrix before row deletion:
[,1] [,2] [,3]
[1,] 5 10 15
[2,] 20 25 30
[3,] 35 40 45
Matrix after deleting the 2nd row:
[,1] [,2] [,3]
[1,] 5 10 15
[2,] 35 40 45
Deleting a Column:
# R program to demonstrate column deletion
# Create a 3x3 matrix
M = matrix(
c(2, 4, 6, 8, 10, 12, 14, 16, 18),
nrow = 3,
ncol = 3,
byrow = TRUE
)
cat("Matrix before column deletion:\n")
print(M)
# Delete the 3rd column
M_updated = M[, -3]
cat("\nMatrix after deleting the 3rd column:\n")
print(M_updated)
Output:
Matrix before column deletion:
[,1] [,2] [,3]
[1,] 2 4 6
[2,] 8 10 12
[3,] 14 16 18
Matrix after deleting the 3rd column:
[,1] [,2]
[1,] 2 4
[2,] 8 10
[3,] 14 16