Author: Pooja Kotwani

Exporting Data from scripts in R Programming

Exporting Data in detail

When a program terminates, all data held in memory is lost. To make data persistent, we write it to files. This allows data to be transferred across systems and avoids re-entering large datasets. Files can be stored in formats such as .txt or .csv, or in online/cloud storage. R provides straightforward functions for exporting data to these file types.

Exporting Data to a Text File

Text files are a common format for data storage. R provides methods like write.table() to export data frames or matrices to text files.

1. write.table(): The write.table() function writes a data frame or matrix to a text file.

Syntax:

write.table(x, file, append = FALSE, sep = " ", dec = ".", row.names = TRUE, col.names = TRUE)

Parameters:

x: Data frame or matrix to be written.
file: File name as a string.
sep: Field separator (e.g., \t for tab-separated values).
dec: Decimal separator (default is .).
row.names: Logical or character vector for row names.
col.names: Logical or character vector for column names.

Example:

# Creating a data frame
employee_data <- data.frame(
  "Employee" = c("John", "Emma", "Liam"),
  "Department" = c("HR", "IT", "Finance"),
  "Age" = c(29, 34, 41)
)

# Exporting the data frame to a text file
write.table(employee_data,
            file = "employee_data.txt",
            sep = "\t",
            row.names = TRUE,
            col.names = NA)

Output:

""    "Employee"    "Department"    "Age"
"1"    "John"        "HR"             29
"2"    "Emma"        "IT"             34
"3"    "Liam"        "Finance"        41

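A quick way to check an export is to read the file back. The sketch below is illustrative rather than part of the original example: it assumes a temporary file (created with tempfile()) in place of "employee_data.txt" so that nothing is left in the working directory, and it drops the row names to keep the round trip simple.

```r
# Illustrative round trip: export with write.table(), then re-import.
employee_data <- data.frame(
  Employee = c("John", "Emma", "Liam"),
  Department = c("HR", "IT", "Finance"),
  Age = c(29, 34, 41),
  stringsAsFactors = FALSE
)

# Assumption: a temporary file stands in for "employee_data.txt"
out_file <- tempfile(fileext = ".txt")

write.table(employee_data,
            file = out_file,
            sep = "\t",
            row.names = FALSE)   # omit the row-number column

# Read the file back; header = TRUE restores the column names
round_trip <- read.table(out_file, header = TRUE, sep = "\t",
                         stringsAsFactors = FALSE)

# Compare the re-imported values with the original data frame
print(all.equal(employee_data, round_trip))
```

If the comparison reports differences, the separator or header settings used for reading most likely do not match those used for writing.
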
write_tsv(): The write_tsv() function from the readr package writes a data frame to a tab-separated file.

Syntax:

write_tsv(x, file)

Example:

# Importing the readr package
library(readr)

# Creating a data frame
student_data <- data.frame(
  "Name" = c("Alice", "Bob", "Charlie"),
  "Grade" = c("A", "B", "A+"),
  "Age" = c(20, 22, 21)
)

# Exporting the data frame using write_tsv()
# (the output location is given with `file`; the older `path` argument is deprecated)
write_tsv(student_data, file = "student_data.txt")

Output:

Name    Grade    Age
Alice   A        20
Bob     B        22
Charlie A+       21

Exporting Data to a CSV File

CSV files are widely used for storing tabular data. R provides multiple methods for exporting data to .csv files.

write.table(): The write.table() function can also export data to CSV files by specifying sep = ",".

Example:

# Creating a data frame
product_data <- data.frame(
  "Product" = c("Laptop", "Phone", "Tablet"),
  "Price" = c(1000, 500, 300),
  "Stock" = c(50, 200, 150)
)

# Exporting the data frame to a CSV file
write.table(product_data,
            file = "product_data.csv",
            sep = ",",
            row.names = FALSE)

Output (character values are quoted by default; pass quote = FALSE to write them bare):

"Product","Price","Stock"
"Laptop",1000,50
"Phone",500,200
"Tablet",300,150

write.csv()

The write.csv() function simplifies exporting data to CSV files, using a comma as the default separator.

Example:

# Creating a data frame
city_data <- data.frame(
  "City" = c("New York", "Los Angeles", "Chicago"),
  "Population" = c(8419600, 3980400, 2716000),
  "Area" = c(468.9, 503, 227.3)
)

# Exporting the data frame to a CSV file
write.csv(city_data, file = "city_data.csv")

Output:

"","City","Population","Area"
"1","New York",8419600,468.9
"2","Los Angeles",3980400,503
"3","Chicago",2716000,227.3

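The first column in the output above holds the row names. As a small sketch (the temporary file is an assumption made here so the example does not touch the working directory), passing row.names = FALSE leaves that column out:

```r
# Illustrative: suppress the row-number column when writing a CSV.
city_data <- data.frame(
  City = c("New York", "Los Angeles", "Chicago"),
  Population = c(8419600, 3980400, 2716000),
  Area = c(468.9, 503, 227.3)
)

# Assumption: a temporary file is used instead of "city_data.csv"
out_file <- tempfile(fileext = ".csv")
write.csv(city_data, file = out_file, row.names = FALSE)

# Inspect the raw lines that were written: one header line plus three rows
csv_lines <- readLines(out_file)
print(csv_lines)
```
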
write.csv2(): The write.csv2() function is similar to write.csv() but uses a semicolon (;) as the field separator and a comma as the decimal mark, the convention in many European locales.

Example:

# Creating a data frame
sales_data <- data.frame(
  "Month" = c("January", "February", "March"),
  "Sales" = c(15000.50, 17000.75, 16000.30)
)

# Exporting the data frame to a CSV file
write.csv2(sales_data, file = "sales_data.csv")

Output:

"";"Month";"Sales"
"1";"January";15000,5
"2";"February";17000,75
"3";"March";16000,3

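A file written with write.csv2() is read back most easily with its counterpart, read.csv2(), which expects the same semicolon separator and decimal comma. The following is a minimal sketch; the temporary file and the sales_back name are assumptions made for the illustration.

```r
# Illustrative: write with write.csv2() and re-import with read.csv2().
sales_data <- data.frame(
  Month = c("January", "February", "March"),
  Sales = c(15000.50, 17000.75, 16000.30)
)

# Assumption: a temporary file is used instead of "sales_data.csv"
out_file <- tempfile(fileext = ".csv")
write.csv2(sales_data, file = out_file, row.names = FALSE)

# read.csv2() uses sep = ";" and dec = "," by default
sales_back <- read.csv2(out_file)

# The Sales column is parsed as numeric again, not as text
str(sales_back)
```

Reading the same file with read.csv() instead would treat each line as a single text field, because the comma is not the separator here.
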
write_csv(): The write_csv() method from the readr package exports data to CSV files.

Example:

# Importing the readr package
library(readr)

# Creating a data frame
book_data <- data.frame(
  "Title" = c("R for Data Science", "Python Crash Course", "The Art of R Programming"),
  "Author" = c("Hadley Wickham", "Eric Matthes", "Norman Matloff"),
  "Price" = c(35.99, 29.99, 45.00)
)

# Exporting the data frame using write_csv()
# (the output location is given with `file`; the older `path` argument is deprecated)
write_csv(book_data, file = "book_data.csv")

Output:

Title,Author,Price
R for Data Science,Hadley Wickham,35.99
Python Crash Course,Eric Matthes,29.99
The Art of R Programming,Norman Matloff,45

December 13, 2025

How To Import Data from a File in R Programming

Import Data from a File in detail

Data is a collection of facts and can exist in multiple formats. To analyze data using the R programming language, it first needs to be imported. R allows importing data from various file types such as text files, CSV, and other delimiter-separated files. Once imported, users can manipulate, analyze, and generate reports from the data.

Importing Data from Files into R

This guide demonstrates how to import different file formats into R programming.

Importing CSV Files

Method 1: Using read.csv()

The read.csv() function is a straightforward method for importing CSV files.

read.csv(file_path, header = TRUE, sep = ",")

Arguments:

file_path: The file’s location.
header: TRUE (default) to indicate column headings.
sep: The separator for values in each row (default is a comma ,).

Example:

# Specify file path
file_path <- "data.csv"

# Read the CSV file
content <- read.csv(file_path)

# Print file contents
print(content)

Output:

ID Name   Role Age
1  1  Alex  Dev  30
2  2  Sam   QA   25
3  3  Emma  HR   28

Method 2: Using read.table()

Another way to import CSV files is by using read.table().

# Import CSV using read.table()
data <- read.table("C:/data/records.csv", header = TRUE, sep = ",")

# Print file contents
print(data)

Output:

Col1 Col2 Col3
1  101  A1   B1
2  202  A2   B2
3  303  A3   B3

Importing Data from a Text File

read.table() can also be used for importing text files.

Syntax:

read.table("file.txt", header = FALSE)  # use header = TRUE if the first row holds column names

Example:

# Read text file
data <- read.table("C:/data/records.txt", header = FALSE)

# Print file contents
print(data)

Output:

V1  V2  V3
1 200  A1  B1
2 300  A2  B2
3 400  A3  B3

Importing Data from a Delimited File

The read.delim() function imports delimited files. Its default separator is a tab; set sep to read files whose values are separated by another symbol such as |, $, or ,.

Syntax:

read.delim("file.txt", sep="|", header=TRUE)

Example:

# Read a delimited file
data <- read.delim("C:/data/info.txt", sep="|", header=TRUE)

# Print file contents
print(data)

Output:

   ID Name Salary
1 101 John   1500
2 102 Lily   2000
3 103  Raj   2500
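To try read.delim() without an existing file, the sketch below first writes a small pipe-delimited file and then imports it. The temporary file and the sample rows (which reuse the values shown in the output) are assumptions made for the illustration.

```r
# Illustrative: create a pipe-delimited file, then import it with read.delim().
sample_lines <- c("ID|Name|Salary",
                  "101|John|1500",
                  "102|Lily|2000",
                  "103|Raj|2500")

# Assumption: a temporary file stands in for "info.txt"
in_file <- tempfile(fileext = ".txt")
writeLines(sample_lines, in_file)

# sep = "|" overrides the default tab separator
data <- read.delim(in_file, sep = "|", header = TRUE)

# The result is a data frame with one column per field
str(data)
```
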

Importing XML Files

To import XML files, use the XML package.

XML File Sample:

<RECORDS>
  <EMPLOYEE>
    <ID>1</ID>
    <NAME>Adam</NAME>
    <SALARY>5000</SALARY>
  </EMPLOYEE>
  <EMPLOYEE>
    <ID>2</ID>
    <NAME>Sophia</NAME>
    <SALARY>6000</SALARY>
  </EMPLOYEE>
</RECORDS>

Example:

# Load XML package
library("XML")

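# Parse XML file
doc <- xmlParse(file = "C:/data/employees.xml")

# Convert the parsed document to a data frame (one row per EMPLOYEE node)
data <- xmlToDataFrame(doc)

# Print the data frame
print(data)

Output:

  ID   NAME SALARY
1  1   Adam   5000
2  2 Sophia   6000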

Importing SPSS Files

SPSS .sav files can be imported using the haven package.

Syntax:

read_sav("file.sav")

Example:

# Load haven package
library("haven")

# Read SPSS file
data <- read_sav("C:/data/survey.sav")

# Print data
print(data)

Output:

ID   Age  Response  Score
1  1   23   Agree     4.5
2  2   30   Neutral   3.0
3  3   27   Disagree  2.5

Importing Data in R Script
Data Handling in detail

R offers several functions to import data from various file formats into your working environment. This guide demonstrates how to import data into R using different file formats.

Importing Data in R

To illustrate, we will use a sample dataset in two formats: .csv and .txt. Let’s dive into the methods for importing data.

Reading a CSV (Comma-Separated Values) File

Method 1: Using read.csv()

The read.csv() function is a simple way to import CSV files. The example below uses the following arguments:
- file: The path to the CSV file. Passing file.choose() here opens a dialog box for selecting the file interactively.
- header: Indicates whether the first row contains column names. Use TRUE if it does or FALSE otherwise.
Example:
```
# Import and store the dataset in data1
data1 <- read.csv(file.choose(), header = TRUE)

# Display the data
print(data1)
```
Output:
```
Name    Age Department
1 John    25   IT
2 Alice   30   HR
3 Robert  28   Finance
```
Method 2: Using read.table()

The read.table() function requires you to specify the delimiter using the sep parameter. For CSV files, use sep=",".

Example:
```
# Import and store the dataset in data2
data2 <- read.table(file.choose(), header = TRUE, sep = ",")

# Display the data
print(data2)
```
Output:
```
Name    Age Department
1 John    25   IT
2 Alice   30   HR
3 Robert  28   Finance
```
Reading a Tab-Delimited (.txt) File

Method 1: Using read.delim()

This function is designed for tab-delimited files (its default separator is a tab). The example below uses the following arguments:
- file: The path to the file. Passing file.choose() here opens a file selection dialog.
- header: Indicates whether the first row contains column names.
Example:
```
# Import and store the dataset in data3
data3 <- read.delim(file.choose(), header = TRUE)

# Display the data
print(data3)
```
Output:
```
Product Price Quantity
1  Apples  100       50
2 Bananas   50      120
3 Oranges   75       80
```
Method 2: Using read.table()

For tab-delimited files, use sep="\t" to specify the delimiter.

Example:
```
# Import and store the dataset in data4
data4 <- read.table(file.choose(), header = TRUE, sep = "\t")

# Display the data
print(data4)
```
Output:
```
Product Price Quantity
1  Apples  100       50
2 Bananas   50      120
3 Oranges   75       80
```
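Because file.choose() needs an interactive session, the two methods can also be compared in a script by passing a path directly. The sketch below is illustrative: the temporary file and its rows (taken from the output shown above) are assumptions, and the comparison simply checks that both calls return the same data frame.
```r
# Illustrative: read.delim() and read.table(sep = "\t") on the same file.
in_file <- tempfile(fileext = ".txt")   # assumption: stand-in for a real .txt file
writeLines(c("Product\tPrice\tQuantity",
             "Apples\t100\t50",
             "Bananas\t50\t120",
             "Oranges\t75\t80"),
           in_file)

# Method 1: read.delim() assumes tab-separated values
via_delim <- read.delim(in_file, header = TRUE)

# Method 2: read.table() with the separator given explicitly
via_table <- read.table(in_file, header = TRUE, sep = "\t")

# Check whether both methods produced the same result
print(all.equal(via_delim, via_table))
```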
Using RStudio to Import Data

You can also import data interactively using RStudio. Follow these steps:
1. In the Environment tab, click Import Dataset.
2. Choose the file format (CSV, Excel, etc.).
3. Browse your computer to select the file.
4. The data will appear in the RStudio Viewer. Type the dataset name in the console to display it.
Reading JSON Files in R

To work with JSON files, install the rjson package. This package allows you to:
- Load JSON files.
- Convert JSON data into data frames for analysis.
Install the Package:
```
install.packages("rjson")
```
Example JSON File (saved as example.json):
```
{
  "ID": ["101", "102", "103"],
  "Name": ["Alice", "Bob", "Charlie"],
  "Salary": ["5000", "6000", "5500"],
  "Department": ["IT", "HR", "Finance"]
}
```
Code to Read JSON:
```
# Load the rjson library
library(rjson)

# Provide the path to the JSON file
result <- fromJSON(file = "C:\\example.json")

# Print the result
print(result)
```
Output:
```
$ID
[1] "101" "102" "103"

$Name
[1] "Alice"   "Bob"     "Charlie"

$Salary
[1] "5000"  "6000"  "5500"

$Department
[1] "IT"      "HR"      "Finance"
```
Converting JSON to a Data Frame:
```
# Convert JSON to a data frame
data <- as.data.frame(result)
print(data)
```
Output:
```
ID    Name Salary Department
1    101   Alice   5000         IT
2    102     Bob   6000         HR
3    103 Charlie   5500    Finance
```
Data Handling in R Programming
Data Handling in detail

The R programming language is extensively used for statistical analysis and data visualization. Handling data involves importing and exporting files, and R simplifies this process by supporting various file types such as CSV, text files, Excel spreadsheets, SPSS, SAS, and more.

R provides several predefined functions to navigate and interact with system directories. These functions allow users to either retrieve the current directory path or change it as needed.

Directory Functions in R
- getwd(): Retrieves the current working directory.
- setwd(): Changes the working directory. The directory path is passed as an argument to this function.
Example:
```
# Change working directory
setwd("D:/RProjects/")

# Alternative way using double backslashes
setwd("D:\\RProjects\\")
```
- list.files(): Displays all files and folders in the current working directory.
```
list.files()
```
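The three directory functions are often used together. As a minimal sketch, assuming a scratch folder under tempdir() in place of a real project directory (the folder and file names are hypothetical):
```r
# Illustrative: change directory, list its contents, then switch back.
old_dir <- getwd()                      # remember where we started

# Assumption: a scratch folder stands in for a project directory
scratch <- file.path(tempdir(), "demo_project")
dir.create(scratch, showWarnings = FALSE)

setwd(scratch)                          # make it the working directory
file.create("notes.txt")                # create an empty file to list
files_found <- list.files()
print(files_found)

setwd(old_dir)                          # restore the original directory
```
Saving the old path before calling setwd() makes it easy to return to the starting directory once the work is done.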
Importing Files in R

Importing Text Files: Text files can be read into R using the read.table() function.

Syntax:
```
read.table(filename, header = FALSE, sep = "")
```
Parameters:
- header: Indicates whether the file contains a header row.
- sep: Specifies the delimiter used in the file.
For more details, use the command:
```
help("read.table")
```
Example:
Suppose the file “SampleText.txt” in the current working directory contains the following data:
```
101 X p
202 Y q
303 Z r
404 W s
505 V t
606 U u
```
Code:
```
# Get the current working directory
getwd()

# Read the text file into a data frame
data <- read.table("SampleText.txt", header = FALSE, sep = " ")

# Print the data frame
print(data)

# Print the class of the object
print(class(data))
```
Output:
```
[1] "D:/RProjects"
   V1 V2 V3
1 101  X  p
2 202  Y  q
3 303  Z  r
4 404  W  s
5 505  V  t
6 606  U  u
[1] "data.frame"
```
Importing CSV Files: CSV files can be imported using the read.csv() function.

Syntax:
```
read.csv(filename, header = TRUE, sep = ",")
```
Parameters:
- header: Specifies if the file contains a header row (default: TRUE).
- sep: Indicates the delimiter used (default: a comma).
For details, run:
```
help("read.csv")
```
Example:
Assume the file “SampleCSV.csv” contains the following data:
```
101,XA,pa
202,YB,qb
303,ZC,rc
404,WD,sd
505,VE,te
```
Code:
```
# Read the CSV file
data <- read.csv("SampleCSV.csv", header = FALSE)

# Print the data frame
print(data)

# Print the class of the object
print(class(data))
```
Output:
```
V1  V2  V3
1 101  XA  pa
2 202  YB  qb
3 303  ZC  rc
4 404  WD  sd
5 505  VE  te
[1] "data.frame"
```
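When a file has no header row, the default names V1, V2, ... can be replaced at import time with the col.names argument. The sketch below is illustrative; the temporary file and the column names id, code, and tag are hypothetical.
```r
# Illustrative: name the columns of a header-less CSV while reading it.
in_file <- tempfile(fileext = ".csv")   # assumption: stand-in for "SampleCSV.csv"
writeLines(c("101,XA,pa",
             "202,YB,qb",
             "303,ZC,rc"),
           in_file)

# header = FALSE keeps the first line as data; col.names supplies the names
data <- read.csv(in_file, header = FALSE,
                 col.names = c("id", "code", "tag"))

print(data)
```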
Importing Excel Files: To read Excel files, install the openxlsx package and use the read.xlsx() function.

Syntax:
```
read.xlsx(filename, sheet = 1)
```
Parameters:
- sheet: Specifies the sheet name or index.
For help:
```
help("read.xlsx")
```
Example:
Suppose the Excel file “SampleExcel.xlsx” contains the following data:

A B C
1001 XYA xyz
2002 YZB yqw
3003 ZWC wuv

Code:
```
# Install and load the openxlsx package
install.packages("openxlsx")
library(openxlsx)

# Read the Excel file
data <- read.xlsx("SampleExcel.xlsx", sheet = 1)

# Print the data frame
print(data)

# Print the class of the object
print(class(data))
```
Output:
```
A    B   C
1 1001  XYA xyz
2 2002  YZB yqw
3 3003  ZWC wuv
[1] "data.frame"
```
Exporting Files in R

Redirecting Output with cat(): The cat() function outputs objects to the console or redirects them to a file.

Syntax:
```
cat(..., file)
```
Example:
```
# Redirect output to a file
cat("Greetings from R!", file = "OutputText.txt")
```
Output:
```
Greetings from R!
```
Redirecting Output with sink(): The sink() function captures output and redirects it to a file.

Syntax:
```
sink(filename)
...
sink()
```
Example:
```
# Redirect output to a file
sink("OutputSink.txt")

x <- c(2, 4, 6, 8, 12)
print(mean(x))
print(class(x))
print(max(x))

# End redirection
sink()
```
Output (file content):
```
[1] 6.4
[1] "numeric"
[1] 12
```
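To confirm what sink() captured, the file can be read back once the redirection has ended. A minimal sketch, assuming a temporary file in place of "OutputSink.txt":
```r
# Illustrative: redirect printed output to a file, then read it back.
out_file <- tempfile(fileext = ".txt")  # assumption: stand-in for "OutputSink.txt"

sink(out_file)            # start redirecting console output
x <- c(2, 4, 6, 8, 12)
print(mean(x))
print(max(x))
sink()                    # stop redirecting; output goes to the console again

# Each printed line is now a line in the file
captured <- readLines(out_file)
print(captured)
```
Every sink(filename) call should be matched by a sink() call; otherwise later output keeps going to the file instead of the console.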
Writing CSV Files: The write.csv() function writes data to a CSV file.

Syntax:
```
write.csv(x, file)
```
Example:
```
# Create a data frame
df <- data.frame(A = c(11, 22, 33), B = c("X", "Y", "Z"), C = c(TRUE, FALSE, TRUE))

# Write the data frame to a CSV file
write.csv(df, file = "OutputCSV.csv", row.names = FALSE)
```
Output (file content; character values are quoted by default):
```
"A","B","C"
11,"X",TRUE
22,"Y",FALSE
33,"Z",TRUE
```
Data Munging in R Programming
Data Munging in detail

Data munging refers to the process of transforming raw or erroneous data into a clean and usable format. Without it, whether done manually by a user or through an automated system, the data is often unsuitable for downstream analysis or consumption. In short, data munging means cleansing and reformatting data.

In R Programming, the following methods are commonly used for the data munging process:
- apply() Family
- aggregate()
- dplyr package
- plyr package
Using the apply() Family for Data Munging

The apply() function is one of the foundational functions in R for performing operations on matrices or arrays. Other functions in the same family include lapply(), sapply(), and tapply(). These functions often serve as an alternative to loops, providing a cleaner and more efficient approach to repetitive tasks.

The apply() function is particularly suited for operations on matrices or arrays with homogeneous elements. When applied to other data structures, such as data frames, the function first converts them into a matrix before processing.

Syntax:
```
apply(X, MARGIN, FUN)
```
Parameters:
- X: An array or matrix.
- MARGIN: 1 to apply the function over rows, 2 to apply it over columns.
- FUN: The function to apply.
Example:
```
# Example of apply()
matrix_data <- matrix(1:12,
                      nrow = 3,
                      ncol = 4)
matrix_data

result <- apply(matrix_data, 2, sum)
result
```
Output:
```
[,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

[1]  6 15 24 33
```
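As a further illustration (not from the original text), the sketch below applies a function over rows instead of columns and passes an anonymous function; the names row_totals and col_ranges are chosen here for the example.
```r
# Illustrative: apply() over rows, and with an anonymous function over columns.
matrix_data <- matrix(1:12, nrow = 3, ncol = 4)

# Margin 1: one result per row
row_totals <- apply(matrix_data, 1, sum)
print(row_totals)

# Margin 2 with an anonymous function: range (max - min) of each column
col_ranges <- apply(matrix_data, 2, function(col) max(col) - min(col))
print(col_ranges)
```
For plain sums and means, rowSums(), colSums(), rowMeans(), and colMeans() give the same results and are usually faster.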
The lapply() Function: The lapply() function operates on lists and returns a list of the same length. Unlike apply(), it does not require a margin parameter. The “l” in lapply() signifies that the output is always a list.

Syntax:
```
lapply(X, FUN)
```
Parameters:
- X: A list or vector (other objects are coerced to a list).
- FUN: The function to apply to each element.
Example:
```
# Example of lapply()
fruits <- c("APPLE", "BANANA", "CHERRY", "MANGO")
fruits

lowercase_fruits <- lapply(fruits, tolower)
lowercase_fruits
```
Output:
```
[1] "APPLE"   "BANANA"  "CHERRY"  "MANGO"

[[1]]
[1] "apple"

[[2]]
[1] "banana"

[[3]]
[1] "cherry"

[[4]]
[1] "mango"
```
The sapply() Function: The sapply() function works similarly to lapply(). However, it tries to simplify the output into a vector or matrix if possible.

Example:
```
# Example of sapply()
fruits <- c("APPLE", "BANANA", "CHERRY", "MANGO")

lowercase_fruits <- sapply(fruits, tolower)
lowercase_fruits
```
Output (a character vector named after the input values):
```
   APPLE   BANANA   CHERRY    MANGO
 "apple" "banana" "cherry"  "mango"
```
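Because sapply() decides the shape of its result from the data, vapply() is often preferred when the result type should be fixed in advance. The following sketch is an illustration added here, not part of the original example:
```r
# Illustrative: sapply() versus the stricter vapply().
fruits <- c("APPLE", "BANANA", "CHERRY", "MANGO")

# sapply() simplifies the result to a named integer vector
name_lengths <- sapply(fruits, nchar)

# vapply() does the same but fails if any result is not a single integer
name_lengths_strict <- vapply(fruits, nchar, FUN.VALUE = integer(1))

print(name_lengths)
print(identical(name_lengths, name_lengths_strict))
```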
The tapply() Function: The tapply() function is used to perform an operation on subsets of data grouped by a factor. It is particularly useful for aggregating data.

Syntax:
```
tapply(X, INDEX, FUN = NULL)
```
Parameters:
- X: A vector of values to summarize.
- INDEX: A factor or list of factors that defines the groups.
- FUN: The function to apply to each group.
Example:
```
# Example of tapply()
data(iris)

species_median <- tapply(iris$Sepal.Length,
                         iris$Species,
                         median)
species_median
```
Output:
```
setosa versicolor  virginica
5.0        5.9        6.5
```
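The same pattern works on any vector paired with a grouping factor. A minimal sketch with made-up data (the scores and team values are hypothetical):
```r
# Illustrative: group means with tapply() on a small hand-made dataset.
scores <- c(80, 90, 70, 60, 90, 60)
team <- factor(c("red", "blue", "red", "blue", "red", "blue"))

# One mean per level of the grouping factor
team_means <- tapply(scores, team, mean)
print(team_means)
```
The result has one element per factor level, named after that level, so individual groups can be picked out with team_means[["red"]].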
Using aggregate() in R

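The aggregate() function summarizes data by grouping variables and applying a function (e.g., sum, mean) to each group.

Syntax:
```
aggregate(formula, data, FUN)
aggregate(x, by, FUN)
```
Parameters:
- formula: Specifies the variables to summarize and the grouping variables, for example y ~ group.
- data: The dataset for aggregation.
- x, by: Alternative to the formula form: the columns to summarize and a list of grouping columns.
- FUN: The operation to perform on the grouped data.
Example (uses the x/by form and assumes an existing data frame named assets):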
```
exposures <- aggregate(
  x = assets[c("counterparty.a", "counterparty.b", "counterparty.c")],
  by = assets[c("asset.class", "rating")],
  FUN = function(market.values) { sum(pmax(market.values, 0)) }
)
```
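The call above refers to an assets data frame that the text does not define. The sketch below supplies a small hypothetical one (all values are invented for illustration) so the same call can be run end to end.
```r
# Illustrative: a made-up `assets` data frame for the aggregate() call.
assets <- data.frame(
  asset.class    = c("equities", "equities", "bonds", "bonds"),
  rating         = c("AAA", "AAA", "AA", "AA"),
  counterparty.a = c(100, -20, 50, 10),
  counterparty.b = c(-5, 30, 0, 40),
  counterparty.c = c(10, 10, -10, 20)
)

# Sum only the positive market values within each asset.class/rating group
exposures <- aggregate(
  x = assets[c("counterparty.a", "counterparty.b", "counterparty.c")],
  by = assets[c("asset.class", "rating")],
  FUN = function(market.values) sum(pmax(market.values, 0))
)

print(exposures)
```
Here pmax(market.values, 0) replaces negative values with zero before summing, so each cell reports positive exposure only.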
Using the plyr Package

The plyr package is a versatile tool for splitting data, applying functions to the pieces, and combining the results.

Key Functions:
- ddply(): Operates on data frames.
- llply(): Operates on lists.
Advantages:
- Simplifies operations with consistent syntax.
- Offers parallel computation and progress bars.
Example with ddply() (assumes a data frame dfx with columns group, sex, and age):
```
library(plyr)
ddply(dfx, .(group, sex), summarize,
      mean = round(mean(age), 2),
      sd = round(sd(age), 2))
```
Using the dplyr Package

The dplyr package provides a consistent grammar for data manipulation with verbs like arrange, filter, mutate, select, and summarize. The examples below use the starwars dataset that ships with dplyr.

Advantages:
- Fast and efficient backend.
- Easy-to-read pipe (%>%) syntax.
Examples:
- Arrange rows:
```
starwars %>% arrange(desc(mass))
```
- Filter rows:
```
starwars %>% filter(species == "Droid")
```
- Mutate new variables:
```
starwars %>% mutate(bmi = mass / ((height / 100) ^ 2)) %>%
            select(name:mass, bmi)
```
- Summarize grouped data:
```
starwars %>% group_by(species) %>%
            summarize(n = n(), avg_mass = mean(mass, na.rm = TRUE)) %>%
            filter(n > 1)
```
Example:
```
library(dplyr)

# Group by gender, summarise, and filter
starwars %>%
  group_by(gender) %>%
  summarise(
    n = n(),
    avg_height = mean(height, na.rm = TRUE)
  ) %>%
  filter(n > 3)
```
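Output:

Assuming the starwars dataset is unmodified (the values below are as originally reported; recent dplyr releases label this column masculine/feminine, so group names and counts may differ):

gender n avg_height
male 60 178.41
female 16 165.56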
Tidyverse Packages
Tidyverse Packages in detail

When working with Data Science in R, the Tidyverse packages are your ultimate toolkit! These packages were designed specifically for Data Science and share a unified design philosophy.

The Tidyverse packages cover the entire data science workflow, from data import and tidying to transformation and visualization. For example, readr is used for data importing, tibble and tidyr for tidying, dplyr and stringr for transformation, and ggplot2 for visualization.

What Are the Tidyverse Packages in R?

Core Tidyverse Packages

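There are eight core Tidyverse packages: ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, and forcats (tidyverse 2.0.0 and later also attach lubridate). Install the collection once with install.packages("tidyverse"); the core packages are then loaded together in each session with:
```
library(tidyverse)
```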
Specialized Packages

In addition to the core packages, the Tidyverse also includes specialized packages like DBI for databases, httr for web APIs, and rvest for web scraping. These need to be loaded individually.

Now, let’s explore the core Tidyverse packages and their uses.

Data Visualization and Exploration

1. ggplot2: ggplot2 is a powerful data visualization library based on the “Grammar of Graphics.” It allows you to create visualizations like bar charts, scatter plots, and histograms using a high-level API. Once you define the mapping of variables to aesthetics, ggplot2 takes care of the rest.

To install ggplot2:
```
install.packages("ggplot2")
```
Or use the development version:
```
devtools::install_github("tidyverse/ggplot2")
```
Example:
```
# Load the library
library(ggplot2)

# Create a dataframe with categories and values
data <- data.frame(
  Category = c('X', 'Y', 'Z', 'W'),
  Value = c(10, 20, 15, 25)
)

# Create a bar plot
ggplot(data, aes(x = Category, y = Value, fill = Category)) +
  geom_bar(stat = "identity")
```
Output: A bar plot with default colors for the bars based on categories.

Data Wrangling and Transformation

1. dplyr: dplyr is a widely-used library for data manipulation. Its key functions, often used with group_by(), include:
- mutate(): Adds new variables.
- select(): Selects specific columns.
- filter(): Filters rows based on conditions.
- summarise(): Aggregates data.
- arrange(): Sorts rows.
To install dplyr:
```
install.packages("dplyr")
```
Or use the development version:
```
devtools::install_github("tidyverse/dplyr")
```
Example: Filtering Rows
```
library(dplyr)

# Using the built-in mtcars dataset
mtcars %>% filter(cyl == 6)
```
Output: Displays rows of the mtcars dataset where the number of cylinders is 6.

2. tidyr: tidyr helps tidy your data, ensuring each variable has its own column and each observation its own row.

Key functions include:
- Pivoting: Reshaping data between wide and long formats.
- Nesting: Grouping data into nested structures.
- Splitting/Combining: Working with character columns.
To install tidyr:
```
install.packages("tidyr")
```
Or use the development version:
```
devtools::install_github("tidyverse/tidyr")
```
Example: Reshaping Data with pivot_longer()
```
library(tidyr)

# Create a data frame
data <- data.frame(
  ID = 1:5,
  Score1 = c(80, 90, 85, 88, 92),
  Score2 = c(75, 85, 82, 89, 95)
)

# Convert wide format to long format
long_data <- data %>%
  pivot_longer(cols = starts_with("Score"),
               names_to = "Score_Type",
               values_to = "Value")

print(long_data)
```
Output (pivot_longer() returns a tibble; remaining rows omitted):
```
# A tibble: 10 × 3
      ID Score_Type Value
   <int> <chr>      <dbl>
 1     1 Score1        80
 2     1 Score2        75
 3     2 Score1        90
 4     2 Score2        85
...
```
3. stringr: stringr simplifies string manipulation in R, offering consistent naming conventions. Functions include:
- str_detect(): Detect patterns.
- str_extract(): Extract patterns.
- str_replace(): Replace patterns.
- str_length(): Compute string length.
To install stringr:
```
install.packages("stringr")
```
Example: Calculating String Length
```
library(stringr)

# Calculate string length (stored as n_chars to avoid masking base::length)
n_chars <- str_length("Tidyverse")
print(n_chars)
```
Output:
```
[1] 9
```
4. forcats: The forcats package addresses common challenges of working with categorical variables, known in R as factors. A factor takes its values from a fixed, predefined set of levels. forcats helps with tasks such as reordering levels and changing the order in which values appear.

Some key functions in forcats include:
- fct_relevel(): Reorders factor levels manually.
- fct_reorder(): Reorders a factor based on another variable.
- fct_infreq(): Reorders a factor by frequency of values.
To install forcats, the recommended approach is to install the tidyverse package:
```
install.packages("tidyverse")
```
Alternatively, you can install forcats directly:
```
install.packages("forcats")
```
To install the development version from GitHub, use:
```
devtools::install_github("tidyverse/forcats")
```
Example:
```
library(forcats)
library(dplyr)
library(ggplot2)

# Example data: species counts
print(head(starwars %>%
             filter(!is.na(species)) %>%
             count(species, sort = TRUE)))
```
Output:
```
# A tibble: 6 × 2
  species      n
  <chr>    <int>
1 Human       35
2 Droid        6
3 Gungan       3
4 Kaminoan     2
5 Mirialan     2
6 Twi'lek      2
```
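The example above prepares species counts but does not call a forcats function itself. As a minimal sketch, assuming forcats is installed, the following reorders the levels of a small made-up factor with fct_infreq() and fct_relevel(); the species vector is hypothetical.
```r
# Illustrative: reordering factor levels with forcats.
library(forcats)

# A made-up factor; levels are alphabetical by default
species <- factor(c("Human", "Droid", "Human", "Gungan", "Human", "Droid"))
print(levels(species))

# Order levels by how often each value occurs (most frequent first)
by_freq <- fct_infreq(species)
print(levels(by_freq))

# Move one level to the front by hand
gungan_first <- fct_relevel(species, "Gungan")
print(levels(gungan_first))
```
Level order matters mainly for plots and model output, where categories appear in the order of the factor levels.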
Data Import and Management in Tidyverse in R

1. readr: The readr package offers an efficient way to import rectangular data such as .csv, .tsv, and other delimited files. It parses each column and guesses an appropriate data type, which makes data import easier and faster.

Common functions include:
- read_csv(): Reads comma-separated files.
- read_tsv(): Reads tab-separated files.
- read_table(): Reads tabular data.
- read_fwf(): Reads fixed-width files.
- read_delim(): Reads delimited files.
- read_log(): Reads log files.
To install readr, use:
```
install.packages("tidyverse")  # Recommended
install.packages("readr")      # Alternatively
```
For the development version:
```
devtools::install_github("tidyverse/readr")
```
Example:
```
library(readr)

# Read a tab-separated file
data <- read_tsv("sample_data.txt", col_names = FALSE)
print(data)
```
Output:
```
# A tibble: 1 × 1
  X1
  <chr>
1 A platform for data enthusiasts.
```
2. Tibble: A tibble is an enhanced version of a data frame in R. Unlike traditional data frames, tibbles do not modify variable names or types and provide better error handling. This makes the code cleaner and more robust. Tibbles are especially useful for large datasets with complex objects.

Key functions:
- tibble(): Creates a tibble from column vectors.
- tribble(): Creates a tibble row by row.
To install tibble:
```
install.packages("tidyverse")  # Recommended
install.packages("tibble")     # Alternatively
```
Development version:
```
devtools::install_github("tidyverse/tibble")
```
Example:
```
library(tibble)

# Create a tibble
data <- tibble(a = 1:3, b = letters[1:3], c = Sys.Date() - 1:3)
print(data)
```
Output:
```
# A tibble: 3 × 3
      a b     c
  <int> <chr> <date>
1     1 a     2025-01-22
2     2 b     2025-01-21
3     3 c     2025-01-20
```
Functional Programming in Tidyverse in R

Purrr: The purrr package provides tools for functional programming in R, particularly with functions and vectors. It simplifies complex operations by replacing repetitive for loops with clean, readable, and type-stable code.

One of its most popular functions is map(), which applies a function to each element of a list or vector.

To install purrr:
```
install.packages("tidyverse")  # Recommended
install.packages("purrr")      # Alternatively
```
Development version:
```
devtools::install_github("tidyverse/purrr")
```
Example:
```
library(purrr)

# Example: Model fitting and extracting R-squared
mtcars %>%
  split(.$cyl) %>%
  map(~ lm(mpg ~ wt, data = .)) %>%
  map(summary) %>%
  map_dbl("r.squared")
```
Output:
```
4         6         8
0.5086326 0.4645102 0.4229655
```

Shiny Package in R Programming

Shiny Package in detail

Packages in the R programming language are a collection of R functions, compiled code, and sample data. They are stored under a directory called “library” in the R environment. By default, R installs a set of packages during installation. One of the most important packages in R is the Shiny package, which makes it easy to build interactive web applications directly from R.

Installing the Shiny Package in R

To use a package in R, it must be installed first. This can be done using the install.packages("packagename") command. To install the Shiny package, use the following command:

install.packages("shiny")

To install the latest development builds directly from GitHub, use this:

if (!require("remotes"))
  install.packages("remotes")
remotes::install_github("rstudio/shiny")

Important Functions in the Shiny Package

1. fluidPage(): The fluidPage() function creates a page with a fluid layout. A fluid layout consists of rows that include columns. Rows ensure their elements appear on the same line, while columns define the horizontal space within a 12-unit-wide grid. Fluid pages scale their components dynamically to fit the available browser width.

Syntax:

fluidPage(..., title = NULL, theme = NULL)

Parameter	Description
`...`	Elements to include within the page.
`title`	The browser window title.
`theme`	An alternative Bootstrap stylesheet.

Example:

# Import shiny package
library(shiny)

# Define a page with a fluid layout
ui <- fluidPage(
  h1("Interactive App with Shiny"),
  p(style = "font-family:Arial", "This is a simple Shiny app")
)

server <- function(input, output) {}

shinyApp(ui = ui, server = server)

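Output: A web page that displays the heading "Interactive App with Shiny" followed by the paragraph "This is a simple Shiny app" set in Arial.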

2. shinyApp(): The shinyApp() function creates a Shiny app object by combining UI and server components. The related shinyAppDir() and shinyAppFile() functions create the app object from the path of a directory or file containing a Shiny app.

Syntax:

shinyApp(ui, server, onStart = NULL, options = list(), uiPattern = "/", enableBookmarking = NULL)
shinyAppDir(appDir, options = list())
shinyAppFile(appFile, options = list())

Parameter	Description
`ui`	The UI definition of the app.
`server`	Server logic containing `input`, `output`, `session`.
`onStart`	Function to call before the app runs.
`options`	Options passed to `runApp`.
`uiPattern`	Regular expression to match GET requests.
`enableBookmarking`	Can be “url”, “server”, or “disable”. Default is NULL.

Example:

# Import shiny package
library(shiny)

# Define fluid page layout
ui <- fluidPage(
  sliderInput(
    inputId = "num",
    label = "Choose a number",
    value = 10,
    min = 1,
    max = 1000
  ),
  plotOutput("hist")
)

server <- function(input, output) {
  output$hist <- renderPlot({
    hist(rnorm(input$num))
  })
}

shinyApp(ui = ui, server = server)

Output:

3. reactive(): The reactive() function creates a reactive expression, which updates whenever its dependencies change.

Syntax:

reactive(x, env = parent.frame(), quoted = FALSE, label = NULL)

Parameter	Description
`x`	An expression.
`env`	Parent environment for the expression.
`quoted`	Whether the expression is quoted (default: FALSE).
`label`	A label for the reactive expression.
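
The update-on-change behaviour can be observed outside a running app. The following is a small sketch, assuming a reactiveVal() named count stands in for a user input; the names count, doubled, and evaluations are made up for the illustration. It uses isolate() to read the reactive expression from a plain R script.

```r
library(shiny)

# A reactive value that stands in for a user input (hypothetical name)
count <- reactiveVal(2)

# Counter used only to show how often the expression body actually runs
evaluations <- 0

# Reactive expression that depends on count()
doubled <- reactive({
  evaluations <<- evaluations + 1
  count() * 2
})

# isolate() lets us read reactives outside a running app
first_read  <- isolate(doubled())
second_read <- isolate(doubled())  # cached value is reused here

# Changing the dependency invalidates the cached value
count(5)
third_read <- isolate(doubled())

print(c(first_read, second_read, third_read))
print(evaluations)
```

The second read reuses the cached result, so the body of the expression should run only when count has changed since the last read. This is why a reactive expression is a good place for work shared by several outputs.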

Example:

# Import shiny package
library(shiny)

# Define fluid page layout
ui <- fluidPage(
  numericInput("num", "Enter a number", value = 10),
  plotOutput("hist"),
  verbatimTextOutput("stats")
)

server <- function(input, output) {
  data <- reactive({
    rnorm(input$num)
  })

  output$hist <- renderPlot({
    hist(data())
  })

  output$stats <- renderPrint({
    summary(data())
  })
}

shinyApp(ui = ui, server = server)

Output:

4. observeEvent(): The observeEvent() function responds to event-like reactive inputs and triggers specific code on the server side.

Syntax:

observeEvent(eventExpr, handlerExpr,
event.env = parent.frame(), event.quoted = FALSE,
handler.env = parent.frame(), handler.quoted = FALSE,
label = NULL, suspended = FALSE, priority = 0,
domain = getDefaultReactiveDomain(), autoDestroy = TRUE,
ignoreNULL = TRUE, ignoreInit = FALSE, once = FALSE)

Parameter	Description
`eventExpr`	Reactive expression triggering the event.
`handlerExpr`	Code to execute when `eventExpr` is invalidated.
`ignoreNULL`	Ignore the action when input is NULL (default: TRUE).
`once`	Whether the event is triggered only once.

Example:

# Import shiny package
library(shiny)

# Define fluid page layout
ui <- fluidPage(
  numericInput("num", "Enter a number", value = 10),
  actionButton("calculate", "Show Data"),
  tableOutput("table")
)

server <- function(input, output) {
  observeEvent(input$calculate, {
    num <- as.numeric(input$num)

    if (is.na(num)) {
      cat("Invalid numeric value entered.\n")
      return(NULL)
    }
    cat("Displaying data for", num, "rows.\n")
  })

  df <- eventReactive(input$calculate, {
    num <- as.numeric(input$num)

    if (is.na(num)) {
      return(NULL)
    }

    head(mtcars, num)
  })

  output$table <- renderTable({
    df()
  })
}

shinyApp(ui = ui, server = server)

Output:

5. eventReactive() in Shiny: eventReactive() creates a reactive expression that is recomputed only when a specific event occurs. It listens to “event-like” reactive inputs, values, or expressions.

Syntax

eventReactive(eventExpr,
              valueExpr,
              event.env = parent.frame(),
              event.quoted = FALSE,
              value.env = parent.frame(),
              value.quoted = FALSE,
              label = NULL,
              domain = getDefaultReactiveDomain(),
              ignoreNULL = TRUE,
              ignoreInit = FALSE)

Parameters

Parameter	Description
`eventExpr`	The expression representing the event, which can be a simple or complex reactive expression.
`valueExpr`	Produces the return value of `eventReactive`. Executes within an `isolate()` scope.
`event.env`	Parent environment for `eventExpr`. Default is the calling environment.
`event.quoted`	Indicates if `eventExpr` is quoted. Default is `FALSE`.
`value.env`	Parent environment for `valueExpr`. Default is the calling environment.
`value.quoted`	Indicates if `valueExpr` is quoted. Default is `FALSE`.
`ignoreNULL`	If `TRUE` (the default), the value is not calculated when the event expression is `NULL`.
`ignoreInit`	If `TRUE`, ignores the handler expression when first initialized. Default is `FALSE`.
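
The difference between the event expression and the isolated value expression can be checked without opening a browser. The sketch below assumes Shiny 1.5.0 or later, which provides testServer(); the input IDs num and update and the name sample_data are chosen for the illustration only.

```r
library(shiny)

server <- function(input, output, session) {
  # Recomputed only when input$update changes; input$num is read in isolation
  sample_data <- eventReactive(input$update, {
    rnorm(input$num)
  })
}

testServer(server, {
  # Simulate choosing 5 and clicking the button once
  session$setInputs(num = 5, update = 1)
  first <- sample_data()
  stopifnot(length(first) == 5)

  # Changing num alone does not trigger a recalculation
  session$setInputs(num = 8)
  stopifnot(identical(sample_data(), first))

  # A second click picks up the new value of num
  session$setInputs(update = 2)
  stopifnot(length(sample_data()) == 8)
})
```

Changing num without a click leaves the cached value untouched, because valueExpr runs inside an isolate() scope and only eventExpr is tracked as a dependency.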

Example: Using eventReactive

library(shiny)

ui <- fluidPage(
  sliderInput(inputId = "num",
              label = "Choose a number",
              value = 25, min = 1, max = 100),
  actionButton(inputId = "update",
               label = "Update"),
  plotOutput("histogram")
)

server <- function(input, output) {
  data <- eventReactive(input$update, {
    rnorm(input$num)
  })

  output$histogram <- renderPlot({
    hist(data())
  })
}

shinyApp(ui = ui, server = server)

Output:

6. actionButton() in Shiny: actionButton() creates a button that triggers an action when clicked.

Syntax

actionButton(inputId, label, icon = NULL, width = NULL, ...)

Parameters

Parameter	Description
`inputId`	ID for accessing the button value.
`label`	Text displayed on the button.
`icon`	Icon to display with the button (optional).
`width`	Width of the button (e.g., ‘200px’, ‘100%’).
`...`	Additional attributes for the button.

Example: Using actionButton

library(shiny)

ui <- fluidPage(
  sliderInput("obs", "Number of Observations", min = 1, max = 1000, value = 500),
  actionButton("goButton", "Generate Plot"),
  plotOutput("plot")
)

server <- function(input, output) {
  output$plot <- renderPlot({
    input$goButton
    isolate({
      dist <- rnorm(input$obs)
      hist(dist)
    })
  })
}

shinyApp(ui, server)

Output:

7. checkboxGroupInput() in Shiny: checkboxGroupInput() creates a group of checkboxes for selecting multiple options.

Syntax

checkboxGroupInput(inputId, label, choices = NULL, selected = NULL, inline = FALSE, width = NULL, choiceNames = NULL, choiceValues = NULL)

Parameters

Parameter	Description
`inputId`	ID for accessing the selected checkbox values.
`label`	Label displayed above the checkboxes.
`choices`	List of values for the checkboxes. If named, the name is displayed instead of the value.
`selected`	Initial selected value(s).
`inline`	If `TRUE`, renders the checkboxes horizontally.
`width`	Width of the input element.
`choiceNames`	Names displayed for the choices.
`choiceValues`	Values corresponding to the choices.
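
As an alternative to choiceNames and choiceValues, a named vector can be passed to choices. The sketch below is illustrative only: the input ID fruit, the output ID picked, and the fruit names are assumptions. The names are shown to the user, while the values are what the server receives.

```r
library(shiny)

# Named vector: names are displayed, values are returned in input$fruit
fruit_input <- checkboxGroupInput(
  inputId  = "fruit",
  label    = "Select fruit:",
  choices  = c("Apple" = "apple", "Banana" = "banana", "Cherry" = "cherry"),
  selected = "apple",   # pre-select one option
  inline   = TRUE       # lay the checkboxes out horizontally
)

ui <- fluidPage(
  fruit_input,
  textOutput("picked")
)

server <- function(input, output) {
  output$picked <- renderText({
    # input$fruit is a character vector of the selected values
    paste("Selected values:", paste(input$fruit, collapse = ", "))
  })
}

# Launch the app only in an interactive session
if (interactive()) shinyApp(ui = ui, server = server)
```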

Example: Using checkboxGroupInput

library(shiny)

ui <- fluidPage(
  checkboxGroupInput("choices", "Select Options:",
                     choiceNames = list("Apple", "Banana", "Cherry", "Date"),
                     choiceValues = list("apple", "banana", "cherry", "date")),
  textOutput("selection")
)

server <- function(input, output) {
  output$selection <- renderText({
    paste("You selected:", paste(input$choices, collapse = ", "))
  })
}

shinyApp(ui = ui, server = server)

8. textInput(): This function creates a text input box for users to enter text.

Syntax:

textInput(inputId, label, value = "", width = NULL, placeholder = NULL)

Parameters:

Parameter	Description
`inputId`	The ID of the input element, used to retrieve the value in the server function.
`label`	The text label displayed for the input box.
`value`	The initial value of the input box (optional).
`width`	Specifies the width of the input box (e.g., ‘300px’, ‘50%’).
`placeholder`	Provides a hint about the expected input in the box.
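
The value and placeholder parameters behave differently: value is real input that the server receives, while placeholder is only a hint shown while the box is empty. A minimal sketch of the distinction, using the hypothetical IDs userText and displayText:

```r
library(shiny)

# Empty initial value, with a hint shown until the user types something
text_box <- textInput(
  inputId     = "userText",
  label       = "Enter text here:",
  value       = "",
  placeholder = "Type something"
)

ui <- fluidPage(
  text_box,
  verbatimTextOutput("displayText")
)

server <- function(input, output) {
  output$displayText <- renderText({
    # The placeholder is never sent to the server, so the value may be empty
    if (nzchar(input$userText)) input$userText else "(nothing entered yet)"
  })
}

# Launch the app only in an interactive session
if (interactive()) shinyApp(ui = ui, server = server)
```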

Example: Simple Text Input and Display

# Load Shiny library
library(shiny)

# UI layout
ui <- fluidPage(
  textInput("userText", "Enter text here:", "Type something"),
  verbatimTextOutput("displayText")
)

# Server logic
server <- function(input, output) {
  output$displayText <- renderText({ input$userText })
}

# Create Shiny app
shinyApp(ui = ui, server = server)

Output:
A text input box appears, where users can type text. The entered text is displayed below the input box.

9. textOutput():
This function creates an output text element to display reactive text in your Shiny app.

Syntax:

textOutput(outputId, container = if (inline) span else div, inline = FALSE)

Parameters:

Parameter	Description
`outputId`	The ID used to access the output text in the server.
`container`	A function (e.g., `div`, `span`) that wraps the output HTML element.
`inline`	Boolean value indicating if the output should be displayed inline or block.

Example: Welcome Message

# Load Shiny library
library(shiny)

# UI layout
ui <- fluidPage(
  textInput("userName", "Enter your name:"),
  textOutput("welcomeText")
)

# Server logic
server <- function(input, output, session) {
  output$welcomeText <- renderText({
    # paste0() avoids the stray space that paste() would insert before "!"
    paste0("Hello, ", input$userName, "! Welcome to the Shiny app.")
  })
}

# Create Shiny app
shinyApp(ui = ui, server = server)

10. wellPanel():
This function creates a bordered box with a gray background to highlight specific elements in your app.

Syntax:

wellPanel(...)

Parameters:

Parameter	Description
`...`	UI elements to be placed inside the panel.

Example: Histogram Inside a Panel

# Load Shiny library
library(shiny)

# UI layout
ui <- fluidPage(
  sliderInput("numValues", "Choose a number:", min = 10, max = 100, value = 50),
  wellPanel(
    plotOutput("histPlot")
  )
)

# Server logic
server <- function(input, output) {
  output$histPlot <- renderPlot({
    hist(rnorm(input$numValues), col = "lightblue", main = "Sample Histogram")
  })
}

# Create Shiny app
shinyApp(ui = ui, server = server)

Output:
A histogram is displayed inside a gray-bordered well panel. The number of data points is controlled by a slider.

Enhanced Example: Interactive Scatter Plot

# Load Shiny library
library(shiny)

# UI layout
ui <- fluidPage(
  titlePanel("Interactive Scatter Plot"),
  sidebarLayout(
    sidebarPanel(
      numericInput("numPoints", "Number of Points:", value = 50, min = 10, max = 100),
      br(),
      actionButton("updateBtn", "Generate Plot")
    ),
    mainPanel(
      plotOutput("scatterPlot", height = "400px")
    )
  )
)

# Server logic
server <- function(input, output) {
  # Regenerate the data only when the button is clicked
  scatterData <- eventReactive(input$updateBtn, {
    data.frame(
      x = rnorm(input$numPoints),
      y = rnorm(input$numPoints)
    )
  })

  # Render the scatter plot from the most recently generated data
  output$scatterPlot <- renderPlot({
    plot_data <- scatterData()
    plot(
      plot_data$x, plot_data$y,
      main = "Scatter Plot",
      xlab = "X-axis", ylab = "Y-axis",
      col = "blue", pch = 19, xlim = c(-3, 3), ylim = c(-3, 3)
    )
  })
}

# Create Shiny app
shinyApp(ui = ui, server = server)

Output:
An interactive scatter plot is displayed. Users can control the number of points with a numeric input and update the plot using a button.

December 13, 2025

Grid and Lattice Packages in R Programming
Grid and Lattice Packages in detail

Every programming language offers packages to implement various functions. In R programming, packages bundle related functions to streamline development. To utilize these functions, installing and loading the respective packages is necessary. The CRAN repository hosts over 10,000 R packages. Notable packages like Grid and Lattice in R are used to implement graphical functions to create visual outputs such as rectangles, circles, histograms, bar plots, etc.

Grid Package in R

The Grid package, previously part of the CRAN repository, is now included as a base package in R. It serves as the foundation for advanced graphical functions in other packages like lattice and ggplot2. Moreover, it can modify lattice-generated outputs. Being a base package, it doesn’t require separate installation as it comes pre-installed with R.

To load the Grid package, run library(grid). Alternatively, the following command opens a package chooser in the console; select “grid” when prompted:
```
local({pkg <- select.list(sort(.packages(all.available = TRUE)), graphics = TRUE)
if(nchar(pkg)) library(pkg, character.only = TRUE)})
```
The Grid package provides several functions to create graphical objects, also known as “grobs.” Some of the functions include:
- circleGrob
- linesGrob
- polygonGrob
- rasterGrob
- rectGrob
- segmentsGrob
- legendGrob
- xaxisGrob
- yaxisGrob
To see the complete list of functions in the Grid package, use the following command:
```
library(help = "grid")
```
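
Grobs are ordinary R objects, so they can be combined and edited before anything is drawn. The sketch below is an illustration under assumed names (badge, red_badge) and an assumed output file in tempdir(); it groups two grobs into a gTree, edits a copy, and draws both in side-by-side viewports.

```r
library(grid)

# Two simple grobs: a dashed blue circle and a text label
circle <- circleGrob(x = 0.5, y = 0.5, r = 0.3,
                     gp = gpar(col = "blue", lty = 2), name = "circle")
label  <- textGrob("grid", x = 0.5, y = 0.5, name = "label")

# Group them into a single composite grob
badge <- gTree(children = gList(circle, label), name = "badge")

# Edit the child named "circle" in a copy; the original is left unchanged
red_badge <- editGrob(badge, gPath = "circle", gp = gpar(col = "red"))

# Draw both versions next to each other in a PDF file
pdf(file.path(tempdir(), "grid_badges.pdf"))
grid.newpage()
pushViewport(viewport(x = 0.25, width = 0.5))   # left half of the page
grid.draw(badge)
popViewport()
pushViewport(viewport(x = 0.75, width = 0.5))   # right half of the page
grid.draw(red_badge)
popViewport()
invisible(dev.off())
```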
Example: Using the Grid Package

The following example demonstrates how to create and save graphical objects using the Grid package:
```
library(grid)

# Save output as a PNG file
png(file = "grid_example.png")

# Create a circular grob
circle <- circleGrob(name = "circle", x = 0.4, y = 0.4, r = 0.3,
                     gp = gpar(col = "blue", lty = 2))

# Draw the circle grob
grid.draw(circle)

# Create a rectangular grob
rectangle <- rectGrob(name = "rectangle", x = 0.6, y = 0.6,
                      width = 0.4, height = 0.3,
                      gp = gpar(fill = "lightgreen", col = "darkgreen"))

# Draw the rectangle grob
grid.draw(rectangle)

# Save the file
dev.off()
```
Output

Lattice Package in R

The Lattice package builds upon the Grid package to create Trellis graphics. These graphics are particularly useful for visualizing relationships between multiple variables under different conditions.

Installing the Lattice Package

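The Lattice package ships with standard R distributions as a recommended package, so it is usually already available. If it is missing, it can be installed using the following command: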
```
install.packages("lattice")
```
Lattice provides various graph types, including:
- barchart
- contourplot
- densityplot
- histogram
The general syntax for using these graphs is:
```
graph_type(formula, data)
```
- graph_type: Specifies the type of graph to generate.
- formula: Defines the variables or conditional relationships.
- data: The data frame containing the variables named in the formula.
To view all functions in the Lattice package, use:
```
library(help = "lattice")
```
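
The conditional part of a formula is written after a vertical bar. As a short sketch using the built-in mtcars dataset (the choice of variables and titles is an assumption for illustration), the first call draws one density panel per cylinder count and the second draws one scatter panel per gear count:

```r
library(lattice)

# One-sided formula: distribution of mpg, one panel per value of cyl
conditioned_density <- densityplot(~ mpg | factor(cyl), data = mtcars,
                                   layout = c(3, 1),
                                   main = "MPG by Number of Cylinders",
                                   xlab = "Miles per Gallon")

# Two-sided formula: y ~ x, one panel per value of gear
scatter_by_gear <- xyplot(mpg ~ wt | factor(gear), data = mtcars,
                          main = "MPG vs Weight by Number of Gears",
                          xlab = "Weight", ylab = "Miles per Gallon")

# Lattice functions return objects; print() draws them
print(conditioned_density)
print(scatter_by_gear)
```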
Example 1: Density Plot
```
library(lattice)

# Use the built-in mtcars dataset

# Save output as a PNG file
png(file = "density_plot_example.png")

# Create a density plot for the variable 'mpg'
# (print() ensures the plot is drawn when the script is run via source())
print(densityplot(~mpg, data = mtcars,
                  main = "Density Plot of MPG",
                  xlab = "Miles per Gallon"))

# Save the file
dev.off()
```
Output:

Example 2: Histogram
```
library(lattice)

# Use the built-in ToothGrowth dataset

# Save output as a PNG file
png(file = "histogram_example.png")

# Create a histogram for the variable 'len'
# (print() ensures the plot is drawn when the script is run via source())
print(histogram(~len, data = ToothGrowth,
                main = "Histogram of Length",
                xlab = "Length"))

# Save the file
dev.off()
```
Output:

Both the Grid and Lattice packages offer powerful tools for graphical representations in R, making it easier to visualize and analyze data effectively.
December 13, 2025

Data visualization with R and ggplot2

Data visualization with ggplot2 in detail

ggplot2 is a free, open-source, and user-friendly visualization package for the R programming language, built on the Grammar of Graphics. Created by Hadley Wickham, it is one of the most powerful tools for data visualization.

Key Layers of ggplot2

The ggplot2 package operates on several layers, which include:

Data: The dataset used for visualization.
Aesthetics: Mapping data attributes to visual properties such as x-axis, y-axis, color, fill, size, labels, alpha, shape, line width, and line type.
Geometric Objects: How data is represented visually, such as points, lines, histograms, bars, or boxplots.
Facets: Splitting data into subsets displayed in separate panels using rows or columns.
Statistics: Applying transformations like binning, smoothing, or descriptive summaries.
Coordinates: Mapping data points to specific spaces (e.g., Cartesian, fixed, polar) and adjusting limits.
Themes: Customizing non-data elements like font size, background, and color.
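
The layers listed above can be combined in a single plot. The following sketch uses the built-in mtcars dataset; the particular variables, axis limits, and titles are illustrative choices rather than anything prescribed by ggplot2.

```r
library(ggplot2)

layered_plot <- ggplot(data = mtcars, aes(x = wt, y = mpg)) +  # data + aesthetics
  geom_point(aes(colour = factor(cyl)), size = 2) +             # geometric objects
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) +     # statistics
  facet_grid(. ~ am) +                                          # facets
  coord_cartesian(ylim = c(10, 35)) +                           # coordinates
  theme_minimal() +                                             # theme
  labs(title = "Weight vs MPG by Transmission",
       x = "Weight", y = "Miles per Gallon", colour = "Cylinders")

# Display the plot
layered_plot
```

Each `+` adds one layer, so a plot can be built up step by step and any layer can be left out.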

Dataset Used: `mtcars`

The mtcars dataset contains fuel consumption and 10 other automobile design and performance attributes for 32 cars. It comes pre-installed with the R environment.

Viewing the First Few Records

# Print the first 6 records of the dataset
head(mtcars)

Output:

mpg	cyl	disp	hp	drat	wt	qsec	vs	am	gear	carb
Mazda RX4	21.0	6	160	110	3.90	2.620	16.46	0	1	4	4
Mazda RX4 Wag	21.0	6	160	110	3.90	2.875	17.02	0	1	4	4
Datsun 710	22.8	4	108	93	3.85	2.320	18.61	1	1	4	1
Hornet 4 Drive	21.4	6	258	110	3.08	3.215	19.44	1	0	3	1
Hornet Sportabout	18.7	8	360	175	3.15	3.440	17.02	0	0	3	2
Valiant	18.1	6	225	105	2.76	3.460	20.22	1	0	3	1

Summary Statistics of mtcars

# summary() is part of base R, so no extra package is needed
summary(mtcars)

Output:

Variable	Min	1st Quartile	Median	Mean	3rd Quartile	Max
mpg	10.4	15.43	19.20	20.09	22.80	33.90
cyl	4.0	4.0	6.0	6.19	8.0	8.0
disp	71.1	120.8	196.3	230.7	326.0	472.0
hp	52.0	96.5	123.0	146.7	180.0	335.0
drat	2.76	3.08	3.70	3.60	3.92	4.93
wt	1.51	2.58	3.32	3.22	3.61	5.42
qsec	14.5	16.89	17.71	17.85	18.90	22.90
vs	0.0	0.0	0.0	0.44	1.0	1.0
am	0.0	0.0	0.0	0.41	1.0	1.0
gear	3.0	3.0	4.0	3.69	4.0	5.0
carb	1.0	2.0	2.0	2.81	4.0	8.0

Visualizing Data with ggplot2

Data Layer: The data layer specifies the dataset to visualize.

# Load ggplot2 and define the data layer
library(ggplot2)

ggplot(data = mtcars) +
  labs(title = "Visualization of MTCars Data")

Output:

Aesthetic Layer: Mapping data to visual attributes such as axes, color, or shape.

# Add aesthetics
ggplot(data = mtcars, aes(x = hp, y = mpg, col = disp)) +
  labs(title = "Horsepower vs Miles per Gallon")

Output:

Geometric Layer: Adding geometric shapes to display the data.

# Plot data using points
plot1 <- ggplot(data = mtcars, aes(x = hp, y = mpg, col = disp)) +
  geom_point() +
  labs(title = "Horsepower vs Miles per Gallon", x = "Horsepower", y = "Miles per Gallon")

# Display the plot
plot1

Output:

Faceting: Create separate plots for subsets of data.

# Facet by transmission type (am: 0 = automatic, 1 = manual)
facet_plot <- ggplot(data = mtcars, aes(x = hp, y = mpg, shape = factor(cyl))) +
  geom_point() +
  facet_grid(. ~ am)

# Display the plot
facet_plot

Output:

Statistics Layer: The statistics layer in ggplot2 allows you to transform your data by applying methods like binning, smoothing, or descriptive statistics.

# Scatter plot with a regression line
ggplot(data = mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  stat_smooth(method = lm, col = "blue") +
  labs(title = "Relationship Between Horsepower and Miles per Gallon")

Output:

Coordinates Layer: In this layer, data coordinates are mapped to the plot’s visual space. Adjustments to axes, zooming, and proportional scaling of the plot can also be made here.

# Scatter plot with controlled axis limits
ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  stat_smooth(method = lm, col = "green") +
  scale_y_continuous("Miles per Gallon", limits = c(5, 35), expand = c(0, 0)) +
  scale_x_continuous("Weight", limits = c(1, 6), expand = c(0, 0)) +
  coord_equal() +
  labs(title = "Effect of Weight on Fuel Efficiency")

Output:

Using coord_cartesian() to Zoom In

# Zoom into specific x-axis and y-axis ranges
ggplot(data = mtcars, aes(x = wt, y = hp, col = as.factor(am))) +
  geom_point() +
  geom_smooth() +
  coord_cartesian(xlim = c(3, 5), ylim = c(100, 300)) +
  labs(title = "Zoomed View: Horsepower vs Weight",
       x = "Weight",
       y = "Horsepower",
       color = "Transmission")

Output:

Theme Layer: The theme layer in ggplot2 allows fine control over display elements like background color, font size, and overall styling.

Example 1: Customizing the Background with element_rect()

ggplot(data = mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  facet_grid(. ~ cyl) +
  theme(plot.background = element_rect(fill = "lightgray", colour = "black")) +
  labs(title = "Background Customization: Horsepower vs MPG")

Output:

Example 2: Using theme_gray()

ggplot(data = mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  facet_grid(am ~ cyl) +
  theme_gray() +
  labs(title = "Default Theme: Horsepower and MPG Facets")

Output:

Contour Plot for the mtcars Dataset: Create a density contour plot to visualize the relationship between two continuous variables.

# 2D density contour plot
ggplot(mtcars, aes(x = wt, y = mpg)) +
  # after_stat() replaces the deprecated ..level.. notation (ggplot2 3.3.0 and later)
  stat_density_2d(aes(fill = after_stat(level)), geom = "polygon", color = "black") +
  scale_fill_viridis_c() +
  labs(title = "2D Density Contour: Weight vs MPG",
       x = "Weight",
       y = "Miles per Gallon",
       fill = "Density Levels") +
  theme_minimal()

Output:

Creating a Panel of Plots: Create multiple plots and arrange them in a grid for side-by-side visualization.

library(gridExtra)

# Histograms for selected variables
hist_plot_mpg <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2, fill = "steelblue", color = "black") +
  labs(title = "Miles per Gallon Distribution", x = "MPG", y = "Frequency")

hist_plot_disp <- ggplot(mtcars, aes(x = disp)) +
  geom_histogram(binwidth = 50, fill = "darkred", color = "black") +
  labs(title = "Displacement Distribution", x = "Displacement", y = "Frequency")

hist_plot_hp <- ggplot(mtcars, aes(x = hp)) +
  geom_histogram(binwidth = 20, fill = "forestgreen", color = "black") +
  labs(title = "Horsepower Distribution", x = "Horsepower", y = "Frequency")

hist_plot_drat <- ggplot(mtcars, aes(x = drat)) +
  geom_histogram(binwidth = 0.5, fill = "orange", color = "black") +
  labs(title = "Drat Distribution", x = "Drat", y = "Frequency")

# Arrange plots in a 2x2 grid
grid.arrange(hist_plot_mpg, hist_plot_disp, hist_plot_hp, hist_plot_drat, ncol = 2)

Output:

Saving and Extracting Plots

To save plots as image files or reuse them later:

# Create a plot (named hp_mpg_plot to avoid confusion with base R's plot() function)
hp_mpg_plot <- ggplot(data = mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  labs(title = "Horsepower vs MPG")

# Save the plot as PNG
ggsave("horsepower_vs_mpg.png", hp_mpg_plot)

# Save the plot as PDF
ggsave("horsepower_vs_mpg.pdf", hp_mpg_plot)

# Keep a reference to the plot object for reuse, then display it
extracted_plot <- hp_mpg_plot
extracted_plot

Output:

December 13, 2025

dplyr Package in R Programming

dplyr Package in detail

The dplyr package in the R programming language is a powerful tool for data manipulation. It provides a streamlined set of functions (or verbs) to handle common data manipulation tasks efficiently and intuitively.

Key Benefits of dplyr

Simplifies data manipulation by offering a set of well-defined functions.
Speeds up development by enabling concise and readable code.
Reduces computational time through optimized backends for data operations.

Data Frames and Tibbles

Data Frames: Data frames in R are structured tables where each column holds data of a specific type, such as names, ages, or scores. You can create a data frame using the following code:

# Create a data frame
students <- data.frame(
  Name = c("Amit", "Priya", "Rohan"),
  Age = c(20, 21, 19),
  Score = c(88, 92, 85)
)
print(students)

Output:

   Name Age Score
1  Amit  20    88
2 Priya  21    92
3 Rohan  19    85

Tibbles: Tibbles, introduced by the tibble package, are a modern version of data frames with enhanced features. You can create a tibble as follows:

# Load tibble library
library(tibble)

# Create a tibble
students_tibble <- tibble(
  Name = c("Amit", "Priya", "Rohan"),
  Age = c(20, 21, 19),
  Score = c(88, 92, 85)
)
print(students_tibble)
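
One of the practical differences is how subsetting behaves. The sketch below reuses the student data from above; the variable names name_from_df and name_from_tbl are made up for the comparison. It shows that selecting a single column with `[` simplifies a data frame to a vector but keeps a tibble as a tibble.

```r
library(tibble)

students <- data.frame(
  Name = c("Amit", "Priya", "Rohan"),
  Age = c(20, 21, 19),
  Score = c(88, 92, 85)
)

# Convert the existing data frame to a tibble
students_tibble <- as_tibble(students)

# Single-column subsetting: a data frame drops to a plain vector ...
name_from_df <- students[, "Name"]
print(class(name_from_df))

# ... while a tibble stays a one-column tibble
name_from_tbl <- students_tibble[, "Name"]
print(class(name_from_tbl))

# Use [[ ]] to extract a column from a tibble as a vector
print(students_tibble[["Name"]])
```

Because a tibble never changes type when subset with `[`, code that works on one column tends to keep working when it is later given several.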

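Pipes (%>%): The pipe operator (%>%), which dplyr re-exports from the magrittr package, passes the result of one operation as the first argument of the next. This allows multiple operations to be chained together for improved code readability.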

# Load dplyr library
library(dplyr)

# Use pipes to filter, select, group, and summarize data
result <- mtcars %>%
  filter(mpg > 25) %>%       # Filter rows where mpg is greater than 25
  select(mpg, cyl, hp) %>%   # Select specific columns
  group_by(cyl) %>%          # Group data by the 'cyl' variable
  summarise(mean_hp = mean(hp))  # Calculate mean horsepower for each group

print(result)

Output:

# A tibble: 1 × 2
    cyl mean_hp
  <dbl>   <dbl>
1     4    75.5
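
To make explicit what the pipe does, the sketch below writes the same pipeline twice: once with %>% and once as nested function calls. The names piped and nested are chosen for the comparison only.

```r
library(dplyr)

# Piped form: reads top to bottom in the order the steps are applied
piped <- mtcars %>%
  filter(mpg > 25) %>%
  select(mpg, cyl, hp) %>%
  group_by(cyl) %>%
  summarise(mean_hp = mean(hp))

# Nested form: the same steps, read from the innermost call outwards
nested <- summarise(
  group_by(
    select(
      filter(mtcars, mpg > 25),
      mpg, cyl, hp
    ),
    cyl
  ),
  mean_hp = mean(hp)
)

# Both forms should produce the same result
print(all.equal(piped, nested))
```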

Verb Functions in `dplyr`

1. filter(): Use filter() to select rows based on conditions.

# Create a data frame
data <- data.frame(
  Name = c("Anita", "Rahul", "Sanjay", "Meera"),
  Age = c(28, 25, 30, 24),
  Height = c(5.4, NA, 5.9, NA)
)

# Filter rows with missing Height values
rows_with_na <- data %>% filter(is.na(Height))
print(rows_with_na)

# Filter rows without missing Height values
rows_without_na <- data %>% filter(!is.na(Height))
print(rows_without_na)

Output:

Rows with missing Height:
    Name Age Height
1  Rahul  25     NA
2  Meera  24     NA

Rows without missing Height:
    Name Age Height
1  Anita  28    5.4
2 Sanjay  30    5.9

2. arrange(): Use arrange() to reorder rows based on column values.

# Arrange data by Age in ascending order
sorted_data <- data %>% arrange(Age)
print(sorted_data)

Output:

    Name Age Height
1  Meera  24     NA
2  Rahul  25     NA
3  Anita  28    5.4
4 Sanjay  30    5.9

3. select() and rename(): Use select() to choose columns and rename() to rename them.

# Select specific columns
selected_columns <- data %>% select(Name, Age)
print(selected_columns)

# Rename columns
renamed_data <- data %>% rename(FullName = Name, Years = Age)
print(renamed_data)

Output:

Selected Columns:
    Name Age
1  Anita  28
2  Rahul  25
3 Sanjay  30
4  Meera  24

Renamed Columns:
  FullName Years Height
1    Anita    28    5.4
2    Rahul    25     NA
3   Sanjay    30    5.9
4    Meera    24     NA

4. mutate() and transmute(): Use mutate() to add new columns while retaining existing ones. Use transmute() to create new columns and drop others.

# Add a new column (mutate)
mutated_data <- data %>% mutate(BMI = round((Height * 10) / Age, 2))
print(mutated_data)

# Add a new column and drop others (transmute)
transmuted_data <- data %>% transmute(BMI = round((Height * 10) / Age, 2))
print(transmuted_data)

Output:

Mutated Data:
    Name Age Height   BMI
1  Anita  28    5.4  1.93
2  Rahul  25     NA    NA
3 Sanjay  30    5.9  1.97
4  Meera  24     NA    NA

Transmuted Data:
   BMI
1 1.93
2   NA
3 1.97
4   NA

5. summarise(): Use summarise() to condense multiple values into a single summary.

# Calculate the average age
average_age <- data %>% summarise(AverageAge = mean(Age))
print(average_age)

Output:

AverageAge
1       26.75
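
summarise() is most often combined with group_by() to produce one summary row per group. The sketch below recreates the same four-row data frame so that it runs on its own; the grouping column HeightMissing and the summary column names are assumptions made for illustration.

```r
library(dplyr)

data <- data.frame(
  Name = c("Anita", "Rahul", "Sanjay", "Meera"),
  Age = c(28, 25, 30, 24),
  Height = c(5.4, NA, 5.9, NA)
)

# One summary row for people with a recorded height, one for those without
age_by_height_status <- data %>%
  group_by(HeightMissing = is.na(Height)) %>%
  summarise(
    Count = n(),              # number of rows in each group
    AverageAge = mean(Age)    # mean age within each group
  )

print(age_by_height_status)
```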

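6. sample_n() and sample_frac(): Use these functions to take random samples of rows. In dplyr 1.0.0 and later they are superseded by slice_sample(), although they still work. Because the sampling is random, the rows returned vary between runs.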

# Take 2 random rows
random_rows <- data %>% sample_n(2)
print(random_rows)

# Take 50% of rows randomly
random_fraction <- data %>% sample_frac(0.5)
print(random_fraction)

Output:

Random Rows:
    Name Age Height
1 Sanjay  30    5.9
2  Meera  24     NA

Random Fraction:
    Name Age Height
1  Anita  28    5.4
2  Rahul  25     NA
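
For reproducible samples, a seed can be set before sampling. The sketch below assumes dplyr 1.0.0 or later and uses slice_sample(), with n for a fixed number of rows and prop for a fraction; the seed value 42 is arbitrary.

```r
library(dplyr)

data <- data.frame(
  Name = c("Anita", "Rahul", "Sanjay", "Meera"),
  Age = c(28, 25, 30, 24),
  Height = c(5.4, NA, 5.9, NA)
)

# Fix the random number generator so repeated runs give the same rows
set.seed(42)

# Take 2 random rows
random_rows <- data %>% slice_sample(n = 2)
print(random_rows)

# Take 50% of the rows randomly
random_half <- data %>% slice_sample(prop = 0.5)
print(random_half)
```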

December 13, 2025
