Data visualization with R and ggplot2

Data visualization with ggplot2 in detail

ggplot2 is a free, open-source visualization package for the R programming language. Created by Hadley Wickham, it implements the Grammar of Graphics, an approach that builds a plot from independent layers, and it is one of the most widely used tools for data visualization in R.

Key Layers of ggplot2

The ggplot2 package operates on several layers, which include:

  1. Data: The dataset used for visualization.
  2. Aesthetics: Mapping data attributes to visual properties such as x-axis, y-axis, color, fill, size, labels, alpha, shape, line width, and line type.
  3. Geometric Objects: How data is represented visually, such as points, lines, histograms, bars, or boxplots.
  4. Facets: Splitting data into subsets displayed in separate panels using rows or columns.
  5. Statistics: Applying transformations like binning, smoothing, or descriptive summaries.
  6. Coordinates: Mapping data points to specific spaces (e.g., Cartesian, fixed, polar) and adjusting limits.
  7. Themes: Customizing non-data elements like font size, background, and color.
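
As a rough illustration of how the seven layers listed above fit together, the following sketch builds one plot with a line of code per layer. The object name layered_plot and the particular choices (faceting by cyl, a linear smoother, theme_minimal()) are assumptions made for this example only.

```r
library(ggplot2)

# layered_plot is a hypothetical name; each line maps to one layer from the list
layered_plot <- ggplot(data = mtcars,                        # 1. Data
                       aes(x = wt, y = mpg)) +               # 2. Aesthetics
  geom_point() +                                             # 3. Geometric objects
  facet_wrap(~ cyl) +                                        # 4. Facets
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # 5. Statistics
  coord_cartesian(ylim = c(10, 35)) +                        # 6. Coordinates
  theme_minimal()                                            # 7. Themes

layered_plot
```

Layers are joined with +, so any one of them can be swapped or removed without touching the others.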

Dataset Used: mtcars

The mtcars dataset contains fuel consumption (mpg) and 10 other automobile design and performance attributes for 32 cars, taken from the 1974 Motor Trend US magazine. It ships with base R in the datasets package, so nothing needs to be downloaded or installed.
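
Several mtcars columns, such as cyl and am, are stored as numbers even though they describe categories. A minimal sketch of checking the dimensions and converting those columns to factors might look like this; mtcars_f is a hypothetical name for the working copy, and the labels follow the coding documented in ?mtcars (0 = automatic, 1 = manual).

```r
# Confirm the dimensions: 32 rows (cars) and 11 columns (variables)
dim(mtcars)

# Work on a copy so the built-in dataset stays untouched
# (mtcars_f is a hypothetical name used only in this sketch)
mtcars_f <- mtcars
mtcars_f$cyl <- factor(mtcars_f$cyl)
mtcars_f$am  <- factor(mtcars_f$am, levels = c(0, 1),
                       labels = c("automatic", "manual"))

# Inspect the converted columns
str(mtcars_f[, c("cyl", "am")])
```

The examples in this article convert on the fly with factor() or as.factor() inside aes() instead, which works equally well for one-off plots.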

Viewing the First Few Records

# Print the first 6 records of the dataset
head(mtcars)

Output:

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Summary Statistics of mtcars

# summary() is part of base R, so no extra package is needed
summary(mtcars)

Output:

Variable  Min    1st Quartile  Median  Mean    3rd Quartile  Max
mpg       10.4   15.43         19.20   20.09   22.80         33.90
cyl       4.0    4.0           6.0     6.19    8.0           8.0
disp      71.1   120.8         196.3   230.7   326.0         472.0
hp        52.0   96.5          123.0   146.7   180.0         335.0
drat      2.76   3.08          3.70    3.60    3.92          4.93
wt        1.51   2.58          3.32    3.22    3.61          5.42
qsec      14.5   16.89         17.71   17.85   18.90         22.90
vs        0.0    0.0           0.0     0.44    1.0           1.0
am        0.0    0.0           0.0     0.41    1.0           1.0
gear      3.0    3.0           4.0     3.69    4.0           5.0
carb      1.0    2.0           2.0     2.81    4.0           8.0

Visualizing Data with ggplot2

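Data Layer: The data layer specifies the dataset to visualize. With only the data supplied, ggplot2 draws an empty panel under the title, because no aesthetics or geometric objects have been defined yet.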

# Load ggplot2 and define the data layer
library(ggplot2)

ggplot(data = mtcars) +
  labs(title = "Visualization of MTCars Data")

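Aesthetic Layer: Mapping data to visual attributes such as axes, color, or shape. At this stage the axes are drawn and scaled to hp and mpg, but no data appears until a geometric layer is added.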

# Add aesthetics
ggplot(data = mtcars, aes(x = hp, y = mpg, col = disp)) +
  labs(title = "Horsepower vs Miles per Gallon")

Geometric Layer: Adding geometric shapes to display the data.

# Plot data using points
plot1 <- ggplot(data = mtcars, aes(x = hp, y = mpg, col = disp)) +
  geom_point() +
  labs(title = "Horsepower vs Miles per Gallon", x = "Horsepower", y = "Miles per Gallon")

# Assigning a plot to a variable does not draw it; print it explicitly
plot1
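
Points are only one of the geometric objects mentioned earlier. As a small sketch of two others, the following draws a box plot and a bar chart from the same data; the object names box_plot and bar_plot are placeholders chosen for this example.

```r
library(ggplot2)

# Box plot: distribution of mpg within each cylinder count
box_plot <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot() +
  labs(x = "Cylinders", y = "Miles per Gallon")

# Bar chart: number of cars for each cylinder count
bar_plot <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar() +
  labs(x = "Cylinders", y = "Number of cars")

box_plot
bar_plot
```

Wrapping cyl in factor() tells ggplot2 to treat it as a category, so each cylinder count gets its own box or bar.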

Faceting: Create separate plots for subsets of data.

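# Facet by transmission type (am: 0 = automatic, 1 = manual)
# The column layout (. ~ am) is an assumed choice; am ~ . would stack the panels in rows
facet_plot <- ggplot(data = mtcars, aes(x = hp, y = mpg, shape = factor(cyl))) +
  geom_point() +
  facet_grid(. ~ am)

facet_plot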
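
When the panels come from a single variable, facet_wrap() is a common alternative to facet_grid(). The sketch below facets by gear, which is an arbitrary choice for illustration, and wrap_plot is a hypothetical name.

```r
library(ggplot2)

# facet_wrap() lays out one panel per value of a single variable
wrap_plot <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  facet_wrap(~ gear, ncol = 3)

wrap_plot
```

facet_grid() is usually the better fit when two variables define the rows and columns, while facet_wrap() handles a single variable with many levels more gracefully.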

Statistics Layer: The statistics layer in ggplot2 allows you to transform your data by applying methods like binning, smoothing, or descriptive statistics.

# Scatter plot with a regression line
ggplot(data = mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  stat_smooth(method = lm, col = "blue") +
  labs(title = "Relationship Between Horsepower and Miles per Gallon")
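
To see what the regression line represents, the same straight-line model can be fitted directly with lm() from base R. This is only a sketch for inspection; fit is a hypothetical name, and the plot itself does not need this step.

```r
# Fit the same kind of straight-line model that stat_smooth(method = lm) draws
fit <- lm(mpg ~ hp, data = mtcars)

# Intercept and slope of the fitted line
coef(fit)

# A negative slope means mpg tends to fall as horsepower rises
coef(fit)[["hp"]] < 0
```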

Coordinates Layer: In this layer, data coordinates are mapped to the plot’s visual space. Adjustments to axes, zooming, and proportional scaling of the plot can also be made here.

# Scatter plot with controlled axis limits
ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  stat_smooth(method = lm, col = "green") +
  scale_y_continuous("Miles per Gallon", limits = c(5, 35), expand = c(0, 0)) +
  scale_x_continuous("Weight", limits = c(1, 6), expand = c(0, 0)) +
  # coord_equal() fixes the aspect ratio so one unit on x is as long as one unit on y
  coord_equal() +
  labs(title = "Effect of Weight on Fuel Efficiency")

Using coord_cartesian() to Zoom In

# Zoom into specific x-axis and y-axis ranges
ggplot(data = mtcars, aes(x = wt, y = hp, col = as.factor(am))) +
  geom_point() +
  geom_smooth() +
  coord_cartesian(xlim = c(3, 5), ylim = c(100, 300)) +
  labs(title = "Zoomed View: Horsepower vs Weight",
       x = "Weight",
       y = "Horsepower",
       color = "Transmission")
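
Zooming with coord_cartesian() differs from setting limits on a scale: the former only changes the visible window, while the latter discards out-of-range data before statistics are computed. The following sketch contrasts the two; the names base_plot, zoomed, trimmed, and n_dropped are assumptions for this example.

```r
library(ggplot2)

base_plot <- ggplot(mtcars, aes(x = wt, y = hp)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Zoom only: the line is still fitted to all 32 cars
zoomed <- base_plot + coord_cartesian(xlim = c(3, 5))

# Scale limits: cars outside the range are dropped before the fit
# (ggplot2 warns about the removed rows when this plot is drawn)
trimmed <- base_plot + scale_x_continuous(limits = c(3, 5))

# Number of cars that the scale-limit version discards
n_dropped <- sum(mtcars$wt < 3 | mtcars$wt > 5)
n_dropped

zoomed
trimmed
```

Because the two versions fit the line to different sets of cars, the slopes in the two plots can differ even though the visible x-range is the same.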

Theme Layer: The theme layer in ggplot2 allows fine control over display elements like background color, font size, and overall styling.

Example 1: Customizing the Background with element_rect()

ggplot(data = mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  facet_grid(. ~ cyl) +
  theme(plot.background = element_rect(fill = "lightgray", colour = "black")) +
  labs(title = "Background Customization: Horsepower vs MPG")

Example 2: Using theme_gray()

ggplot(data = mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  facet_grid(am ~ cyl) +
  theme_gray() +
  labs(title = "Default Theme: Horsepower and MPG Facets")
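
A complete theme and individual theme() tweaks can also be combined. The sketch below, with the hypothetical name themed_plot, starts from theme_bw() and then adjusts the title and grid lines; the specific settings are illustrative choices only.

```r
library(ggplot2)

themed_plot <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  theme_bw(base_size = 14) +   # complete theme with a larger base font size
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5),  # bold, centered title
    panel.grid.minor = element_blank()                      # hide minor grid lines
  ) +
  labs(title = "Custom Theme: Horsepower vs MPG")

themed_plot
```

Order matters here: a complete theme such as theme_bw() replaces all earlier theme settings, so the theme() adjustments should come after it.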

Contour Plot for the mtcars Dataset: Create a density contour plot to visualize the relationship between two continuous variables.

# 2D density contour plot
ggplot(mtcars, aes(x = wt, y = mpg)) +
  # after_stat(level) replaces the deprecated ..level.. notation
  stat_density_2d(aes(fill = after_stat(level)), geom = "polygon", color = "black") +
  scale_fill_viridis_c() +
  labs(title = "2D Density Contour: Weight vs MPG",
       x = "Weight",
       y = "Miles per Gallon",
       fill = "Density Levels") +
  theme_minimal()
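
For comparison, the same density estimate can be drawn as unfilled contour lines over the raw points, which keeps the individual cars visible. This is a sketch under the assumption that an overlay is wanted; contour_lines is a hypothetical name.

```r
library(ggplot2)

# Contour lines drawn on top of the scatter plot
contour_lines <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(alpha = 0.5) +
  geom_density_2d(color = "steelblue") +
  labs(title = "Density Contours over Points: Weight vs MPG",
       x = "Weight",
       y = "Miles per Gallon") +
  theme_minimal()

contour_lines
```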

Creating a Panel of Plots: Create multiple plots and arrange them in a grid for side-by-side visualization.

library(gridExtra)

# Histograms for selected variables
hist_plot_mpg <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2, fill = "steelblue", color = "black") +
  labs(title = "Miles per Gallon Distribution", x = "MPG", y = "Frequency")

hist_plot_disp <- ggplot(mtcars, aes(x = disp)) +
  geom_histogram(binwidth = 50, fill = "darkred", color = "black") +
  labs(title = "Displacement Distribution", x = "Displacement", y = "Frequency")

hist_plot_hp <- ggplot(mtcars, aes(x = hp)) +
  geom_histogram(binwidth = 20, fill = "forestgreen", color = "black") +
  labs(title = "Horsepower Distribution", x = "Horsepower", y = "Frequency")

hist_plot_drat <- ggplot(mtcars, aes(x = drat)) +
  geom_histogram(binwidth = 0.5, fill = "orange", color = "black") +
  labs(title = "Drat Distribution", x = "Drat", y = "Frequency")

# Arrange plots in a 2x2 grid
grid.arrange(hist_plot_mpg, hist_plot_disp, hist_plot_hp, hist_plot_drat, ncol = 2)
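
The four histograms above differ only in the variable and styling, so the repetition can be reduced by building them in a loop. The following is a simplified sketch: it uses one bin count and one fill color for every variable instead of the per-variable settings above, and vars and hist_plots are hypothetical names.

```r
library(ggplot2)
library(gridExtra)

# Variables to plot (the same four as in the panel above)
vars <- c("mpg", "disp", "hp", "drat")

# Build one histogram per variable; .data[[v]] looks the column up by name
hist_plots <- lapply(vars, function(v) {
  ggplot(mtcars, aes(x = .data[[v]])) +
    geom_histogram(bins = 10, fill = "steelblue", color = "black") +
    labs(title = paste("Distribution of", v), x = v, y = "Frequency")
})

# grid.arrange() accepts a list of plots through its grobs argument
grid.arrange(grobs = hist_plots, ncol = 2)
```

Adding a fifth variable then only requires extending vars.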

Saving and Extracting Plots

To save plots as image files or reuse them later:

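# Create a plot and store it in a variable
# (named hp_mpg_plot to avoid masking the base R function plot())
hp_mpg_plot <- ggplot(data = mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  labs(title = "Horsepower vs MPG")

# Save the plot as PNG
ggsave("horsepower_vs_mpg.png", plot = hp_mpg_plot)

# Save the plot as PDF
ggsave("horsepower_vs_mpg.pdf", plot = hp_mpg_plot)

# Extract the plot for reuse and display it
extracted_plot <- hp_mpg_plot
extracted_plot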
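
ggsave() also accepts width, height, and units arguments, which is useful when a figure must fit a fixed page size. The sketch below writes to a temporary PDF so nothing is left behind in the working directory; size_demo and out_file are hypothetical names, and the 6 by 4 inch size is an arbitrary choice.

```r
library(ggplot2)

size_demo <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point()

# Temporary file path, used here only so the example cleans up after itself
out_file <- tempfile(fileext = ".pdf")

# Fix the output size explicitly instead of relying on the current device size
ggsave(out_file, plot = size_demo, width = 6, height = 4, units = "in")

# Check that the file was written
file.exists(out_file)
```

For raster formats such as PNG, the dpi argument of ggsave() additionally controls the resolution.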
