Data visualization with ggplot2 in detail
ggplot2 is a free, open-source visualization package for the R programming language. Created by Hadley Wickham, it implements the Grammar of Graphics, an approach that builds a plot by combining independent layers, and it is one of the most widely used tools for data visualization in R.
Key Layers of ggplot2
The ggplot2 package operates on several layers, which include:
- Data: The dataset used for visualization.
- Aesthetics: Mapping data attributes to visual properties such as x-axis, y-axis, color, fill, size, labels, alpha, shape, line width, and line type.
- Geometric Objects: How data is represented visually, such as points, lines, histograms, bars, or boxplots.
- Facets: Splitting data into subsets displayed in separate panels using rows or columns.
- Statistics: Applying transformations like binning, smoothing, or descriptive summaries.
- Coordinates: Mapping data points to specific spaces (e.g., Cartesian, fixed, polar) and adjusting limits.
- Themes: Customizing non-data elements like font size, background, and color.
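As a rough illustration of how these layers combine, the sketch below adds one component per layer to a single plot. It uses the built-in mtcars dataset introduced in the next section. The variable choices (wt, mpg, cyl) and the object name layered_plot are arbitrary picks for illustration.

```r
library(ggplot2)

layered_plot <- ggplot(data = mtcars,                        # data
                       mapping = aes(x = wt, y = mpg)) +     # aesthetics
  geom_point() +                                             # geometric object
  facet_wrap(~ cyl) +                                        # facets
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # statistics
  coord_cartesian(ylim = c(10, 35)) +                        # coordinates
  theme_minimal()                                            # theme

# Printing the object draws the plot
layered_plot
```

Each component is joined with +, so any layer can be removed or swapped without touching the others.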
Dataset Used: mtcars
The mtcars dataset contains fuel consumption and 10 other design and performance attributes for 32 cars. It ships with R in the built-in datasets package, so nothing needs to be installed or downloaded.
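The size and structure described above can be checked directly with base R functions. This is only a quick inspection sketch; no extra packages are assumed.

```r
# mtcars is available in every R session, so no import is needed
dim(mtcars)      # 32 rows (cars) and 11 columns (variables)
names(mtcars)    # column names
str(mtcars)      # type and first values of every column
```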
Viewing the First Few Records
# Print the first 6 records of the dataset
head(mtcars)
Output:
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
Summary Statistics of mtcars
# summary() is part of base R, so no extra package needs to be loaded
summary(mtcars)
Output:
| Variable | Min | 1st Quartile | Median | Mean | 3rd Quartile | Max |
| mpg | 10.4 | 15.43 | 19.20 | 20.09 | 22.80 | 33.90 |
| cyl | 4.0 | 4.0 | 6.0 | 6.19 | 8.0 | 8.0 |
| disp | 71.1 | 120.8 | 196.3 | 230.7 | 326.0 | 472.0 |
| hp | 52.0 | 96.5 | 123.0 | 146.7 | 180.0 | 335.0 |
| drat | 2.76 | 3.08 | 3.70 | 3.60 | 3.92 | 4.93 |
| wt | 1.51 | 2.58 | 3.32 | 3.22 | 3.61 | 5.42 |
| qsec | 14.5 | 16.89 | 17.71 | 17.85 | 18.90 | 22.90 |
| vs | 0.0 | 0.0 | 0.0 | 0.44 | 1.0 | 1.0 |
| am | 0.0 | 0.0 | 0.0 | 0.41 | 1.0 | 1.0 |
| gear | 3.0 | 3.0 | 4.0 | 3.69 | 4.0 | 5.0 |
| carb | 1.0 | 2.0 | 2.0 | 2.81 | 4.0 | 8.0 |
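summary() prints its result column by column. One possible way to get the one-row-per-variable layout shown above is to apply summary() to each column and transpose the result. The name summary_table is a hypothetical choice for this sketch.

```r
# Apply summary() to every column, then transpose so each variable is a row
summary_table <- round(t(sapply(mtcars, summary)), 2)

# Rows are variables; columns are Min., 1st Qu., Median, Mean, 3rd Qu., Max.
summary_table
```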
Visualizing Data with ggplot2
Data Layer: The data layer specifies the dataset to visualize.
# Load ggplot2 and define the data layer
library(ggplot2)
ggplot(data = mtcars) +
  labs(title = "Visualization of MTCars Data")

Aesthetic Layer: Mapping data to visual attributes such as axes, color, or shape.
# Add aesthetics
ggplot(data = mtcars, aes(x = hp, y = mpg, col = disp)) +
  labs(title = "Horsepower vs Miles per Gallon")
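A common point of confusion with aesthetics is the difference between mapping a visual property to a variable and setting it to a fixed value. The sketch below shows both; a point geom (covered in the next section) is added so there is something to draw. The names mapped_plot and fixed_plot are illustrative only.

```r
library(ggplot2)

# Mapped: colour varies with the data, so it goes inside aes() and gets a legend
mapped_plot <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point(aes(colour = factor(cyl)))

# Set: one fixed colour for every point, so it goes outside aes() with no legend
fixed_plot <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point(colour = "steelblue", size = 3)

mapped_plot
fixed_plot
```

Writing aes(colour = "steelblue") instead would treat the string as a one-level data value rather than as a colour name, which is rarely what is wanted.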

Geometric Layer: Adding geometric shapes to display the data.
# Plot data using points
plot1 <- ggplot(data = mtcars, aes(x = hp, y = mpg, col = disp)) +
  geom_point() +
  labs(title = "Horsepower vs Miles per Gallon", x = "Horsepower", y = "Miles per Gallon")
# Print the stored plot so that it is actually drawn
plot1
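Swapping the geom changes how the same data is represented. As one possible example, a boxplot summarises the distribution of mpg within each cylinder group. The choice of variables and the name box_plot are assumptions made for this sketch.

```r
library(ggplot2)

# factor(cyl) turns the numeric cylinder count into discrete groups
box_plot <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot() +
  labs(title = "Miles per Gallon by Cylinder Count",
       x = "Cylinders", y = "Miles per Gallon")

box_plot
```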

Faceting: Create separate plots for subsets of data.
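# Facet by transmission type (am: 0 = automatic, 1 = manual)
facet_plot <- ggplot(data = mtcars, aes(x = hp, y = mpg, shape = factor(cyl))) +
  geom_point() +
  facet_grid(. ~ am)
# Print the stored plot so that it is actually drawn
facet_plot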
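When faceting on a single variable, facet_wrap() is an alternative to facet_grid() that wraps the panels into rows and columns. The sketch below facets by gear, an arbitrary choice, and uses label_both so each panel strip shows the variable name along with its value.

```r
library(ggplot2)

wrap_plot <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  # One panel per value of gear, laid out in up to three columns
  facet_wrap(~ gear, ncol = 3, labeller = label_both)

wrap_plot
```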

Statistics Layer: The statistics layer in ggplot2 allows you to transform your data by applying methods like binning, smoothing, or descriptive statistics.
# Scatter plot with a regression line
ggplot(data = mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  stat_smooth(method = lm, col = "blue") +
  labs(title = "Relationship Between Horsepower and Miles per Gallon")
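The regression line drawn by stat_smooth() comes from an ordinary linear model, so the same fit can be inspected numerically with base R. This is a minimal sketch; the name fit is arbitrary.

```r
# Fit the same kind of linear model that stat_smooth(method = lm) draws
fit <- lm(mpg ~ hp, data = mtcars)

coef(fit)                 # intercept and slope of the fitted line
summary(fit)$r.squared    # share of the variance in mpg explained by hp
```

The slope for hp comes out negative, which matches the downward trend of the line in the plot.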

Coordinates Layer: In this layer, data coordinates are mapped to the plot’s visual space. Adjustments to axes, zooming, and proportional scaling of the plot can also be made here.
# Scatter plot with controlled axis limits
ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  stat_smooth(method = lm, col = "green") +
  scale_y_continuous("Miles per Gallon", limits = c(5, 35), expand = c(0, 0)) +
  scale_x_continuous("Weight", limits = c(1, 6), expand = c(0, 0)) +
  coord_equal() +
  labs(title = "Effect of Weight on Fuel Efficiency")

Using coord_cartesian() to Zoom In
# Zoom into specific x-axis and y-axis ranges
ggplot(data = mtcars, aes(x = wt, y = hp, col = as.factor(am))) +
  geom_point() +
  geom_smooth() +
  coord_cartesian(xlim = c(3, 5), ylim = c(100, 300)) +
  labs(title = "Zoomed View: Horsepower vs Weight",
       x = "Weight",
       y = "Horsepower",
       color = "Transmission")
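Setting limits on a scale and zooming with coord_cartesian() are not equivalent once a statistic is involved: scale limits drop out-of-range observations before the statistic is computed, while coord_cartesian() only crops the view. The sketch below compares the two by inspecting the computed smooth layer with layer_data(). The object names are hypothetical.

```r
library(ggplot2)

base_plot <- ggplot(mtcars, aes(x = wt, y = hp)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ x)

# Scale limits discard cars with wt outside 3 to 5 before the model is fitted
clipped_plot <- base_plot + scale_x_continuous(limits = c(3, 5))

# coord_cartesian() fits the model on all cars and only crops the view
zoomed_plot <- base_plot + coord_cartesian(xlim = c(3, 5))

# layer_data() returns the computed data for a layer (layer 2 is the smooth);
# the dropped rows trigger a warning, which is silenced here
clipped_fit <- suppressWarnings(layer_data(clipped_plot, 2))
zoomed_fit <- layer_data(zoomed_plot, 2)

range(clipped_fit$x)  # stays inside the scale limits
range(zoomed_fit$x)   # spans the full range of wt
```

Because the fitted lines are based on different subsets of cars, the two plots can show different trends over the same visible region.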

Theme Layer: The theme layer in ggplot2 allows fine control over display elements like background color, font size, and overall styling.
Example 1: Customizing the Background with element_rect()
ggplot(data = mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  facet_grid(. ~ cyl) +
  theme(plot.background = element_rect(fill = "lightgray", colour = "black")) +
  labs(title = "Background Customization: Horsepower vs MPG")

Example 2: Using theme_gray()
ggplot(data = mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  facet_grid(am ~ cyl) +
  theme_gray() +
  labs(title = "Default Theme: Horsepower and MPG Facets")
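Theme settings can also be stored in an object and reused across plots. The sketch below starts from a built-in theme and overrides a few elements. The name report_theme and the specific settings are hypothetical choices, not a recommended style.

```r
library(ggplot2)

# Start from a complete built-in theme, then override individual elements
report_theme <- theme_minimal(base_size = 12) +
  theme(plot.title = element_text(face = "bold"),
        panel.grid.minor = element_blank(),
        legend.position = "bottom")

# The stored theme is added to a plot like any other layer
themed_plot <- ggplot(mtcars, aes(x = hp, y = mpg, colour = factor(cyl))) +
  geom_point() +
  labs(title = "Horsepower vs MPG", colour = "Cylinders") +
  report_theme

themed_plot
```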

Contour Plot for the mtcars Dataset: Create a density contour plot to visualize the relationship between two continuous variables.
# 2D density contour plot
# after_stat(level) replaces the deprecated ..level.. notation
ggplot(mtcars, aes(x = wt, y = mpg)) +
  stat_density_2d(aes(fill = after_stat(level)), geom = "polygon", color = "black") +
  scale_fill_viridis_c() +
  labs(title = "2D Density Contour: Weight vs MPG",
       x = "Weight",
       y = "Miles per Gallon",
       fill = "Density Levels") +
  theme_minimal()

Creating a Panel of Plots: Create multiple plots and arrange them in a grid for side-by-side visualization.
library(gridExtra)
# Histograms for selected variables
hist_plot_mpg <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2, fill = "steelblue", color = "black") +
  labs(title = "Miles per Gallon Distribution", x = "MPG", y = "Frequency")
hist_plot_disp <- ggplot(mtcars, aes(x = disp)) +
  geom_histogram(binwidth = 50, fill = "darkred", color = "black") +
  labs(title = "Displacement Distribution", x = "Displacement", y = "Frequency")
hist_plot_hp <- ggplot(mtcars, aes(x = hp)) +
  geom_histogram(binwidth = 20, fill = "forestgreen", color = "black") +
  labs(title = "Horsepower Distribution", x = "Horsepower", y = "Frequency")
hist_plot_drat <- ggplot(mtcars, aes(x = drat)) +
  geom_histogram(binwidth = 0.5, fill = "orange", color = "black") +
  labs(title = "Drat Distribution", x = "Drat", y = "Frequency")
# Arrange plots in a 2x2 grid
grid.arrange(hist_plot_mpg, hist_plot_disp, hist_plot_hp, hist_plot_drat, ncol = 2)
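A similar panel can be produced with ggplot2 alone by reshaping the data to long form and faceting. The sketch below uses base R's stack() for the reshape, which yields the columns values and ind. The names long_data and facet_hist are hypothetical.

```r
library(ggplot2)

# stack() turns the selected columns into two columns: values and ind
long_data <- stack(mtcars[c("mpg", "disp", "hp", "drat")])

facet_hist <- ggplot(long_data, aes(x = values)) +
  geom_histogram(bins = 10, fill = "steelblue", colour = "black") +
  # scales = "free" gives every variable its own axis ranges
  facet_wrap(~ ind, ncol = 2, scales = "free") +
  labs(x = "Value", y = "Frequency")

facet_hist
```

The trade-off is that all panels share one bin count and one fill colour, whereas separate plots arranged in a grid allow a binwidth and colour per variable.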

Saving and Extracting Plots
To save plots as image files or reuse them later:
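# Create a plot
plot <- ggplot(data = mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  labs(title = "Horsepower vs MPG")
# Save the plot as PNG (the format is inferred from the file extension)
ggsave("horsepower_vs_mpg.png", plot = plot)
# Save the plot as PDF
ggsave("horsepower_vs_mpg.pdf", plot = plot)
# A ggplot is an ordinary R object, so it can be assigned to another name for reuse
extracted_plot <- plot
# Print the copy to draw it
extracted_plot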
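ggsave() also accepts the output size and resolution, which helps when an image must fit a fixed layout. The sketch below writes to a temporary file so it does not touch the working directory. The dimensions and the names scatter and out_file are arbitrary choices for illustration.

```r
library(ggplot2)

scatter <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point()

# tempfile() returns a path in the session's temporary directory
out_file <- tempfile(fileext = ".png")

# Width and height are given in inches; dpi sets the raster resolution
ggsave(out_file, plot = scatter, width = 6, height = 4, units = "in", dpi = 300)

file.exists(out_file)
```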
