Boxplots in detail
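A boxplot (also called a box-and-whisker plot) displays the distribution of a variable through a five-number summary: the minimum, first quartile, median, third quartile, and maximum. In R, you can create boxplots using the boxplot() function. By default its whiskers extend only to the most extreme points within 1.5 times the interquartile range from the box, and anything beyond that is drawn as an individual outlier point.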
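To make the five-number summary concrete, the following sketch computes it for a small made-up vector (the values are illustrative only and do not come from the iris data) using base R's fivenum() and boxplot.stats(). Note that boxplot.stats() reports the whisker ends rather than the raw minimum and maximum, so the two results differ when outliers are present.

```r
# Hypothetical sample with one extreme value (not from the iris data)
x <- c(2, 4, 4, 5, 6, 7, 8, 9, 30)

# Tukey's five-number summary: min, lower hinge, median, upper hinge, max
five <- fivenum(x)
print(five)        # 2 4 6 8 30

# What boxplot() draws: whisker ends, hinges, and median
bs <- boxplot.stats(x)
print(bs$stats)    # 2 4 6 8 9
print(bs$out)      # 30 (plotted as an outlier point)
```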
Syntax:
boxplot(x, data, notch, varwidth, names, main)
- x: A vector or a formula.
- data: A data frame containing the variables.
- notch: Logical value indicating whether to display a notch (useful for comparing medians).
- varwidth: Logical value; if TRUE, the box width is proportional to the square root of the sample size.
- names: Group labels to be shown under each box.
- main: The main title of the chart.
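As a rough illustration of the varwidth and names arguments, the sketch below uses the built-in mtcars dataset. That choice is an assumption made here because its cylinder groups have unequal sizes, so the effect of varwidth is visible (iris has 50 rows per species). The group labels passed to names are made up for the example.

```r
# mtcars ships with R; its cyl groups have unequal sizes
res <- boxplot(mpg ~ cyl, data = mtcars,
               varwidth = TRUE,                       # wider box = larger group
               names = c("4 cyl", "6 cyl", "8 cyl"),  # illustrative labels
               main = "MPG by Cylinder Count")

# boxplot() invisibly returns the statistics it plotted
print(res$n)      # observations per group
print(res$names)  # labels shown under the boxes
```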
Creating an Example Dataset
For illustration, we’ll use the iris dataset. First, let’s inspect a few rows of the data focusing on Sepal.Length and Species:
# Extract the relevant columns from the iris dataset
input <- iris[, c("Sepal.Length", "Species")]
head(input)
Output:
  Sepal.Length Species
1          5.1  setosa
2          4.9  setosa
3          4.7  setosa
4          4.6  setosa
5          5.0  setosa
6          5.4  setosa
Basic Boxplot
Now, let’s create a simple boxplot to compare the sepal length across different species:
# Load the iris dataset
data(iris)
# Create the boxplot for Sepal Length grouped by Species
boxplot(Sepal.Length ~ Species, data = iris,
        main = "Sepal Length by Species",
        xlab = "Species",
        ylab = "Sepal Length")
Output:

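The same call can also return the numbers behind the picture. A minimal sketch, assuming you only want the statistics: passing plot = FALSE suppresses drawing, and the returned list holds the per-group summary.

```r
# Compute the boxplot statistics without drawing anything
bp <- boxplot(Sepal.Length ~ Species, data = iris, plot = FALSE)

# One column per species; rows are lower whisker, lower hinge,
# median, upper hinge, upper whisker
stats <- bp$stats
colnames(stats) <- bp$names
print(stats)

# Points beyond the whiskers, with the group each one belongs to
print(data.frame(value = bp$out, species = bp$names[bp$group]))
```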
Boxplot with Notch
Notches can be added to boxplots to provide a rough guide for comparing medians between groups: if the notches of two boxes do not overlap, their medians likely differ. Here’s how to create a notched boxplot with customized colors:
# Load the iris dataset
data(iris)
# Define custom colors for the boxes
custom_colors <- c("#FF6347", "#3CB371", "#4682B4")
# Create the notched boxplot with custom aesthetics
boxplot(Sepal.Length ~ Species, data = iris,
        main = "Sepal Length by Species",
        xlab = "Species",
        ylab = "Sepal Length",
        col = custom_colors, border = "black",
        notch = TRUE, notchwidth = 0.5,
        medcol = "white", whiskcol = "black",
        boxwex = 0.5, outpch = 19, outcol = "black")
# Add a legend; levels() returns the species in the order the boxes are drawn
legend("topright", legend = levels(iris$Species),
       fill = custom_colors, border = "black", title = "Species")
Output:

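The notch limits can also be read off numerically. In base R, each notch spans the median plus or minus 1.58 times the interquartile range divided by the square root of the group size, and boxplot() returns these limits in the conf component. The sketch below, with illustrative variable names, checks whether neighbouring notches overlap. Non-overlapping notches are commonly read as evidence that the medians differ, though this is only a rough guide and not a formal test.

```r
bp <- boxplot(Sepal.Length ~ Species, data = iris,
              notch = TRUE, plot = FALSE)

# Rows: lower and upper notch limit; columns: species
notches <- bp$conf
dimnames(notches) <- list(c("lower", "upper"), bp$names)
print(notches)

# Reproduce the limits by hand: median +/- 1.58 * IQR / sqrt(n)
iqr <- bp$stats[4, ] - bp$stats[2, ]
half_width <- 1.58 * iqr / sqrt(bp$n)
print(bp$stats[3, ] - half_width)   # matches the "lower" row

# TRUE where a notch lies entirely above the notch of the previous group
k <- ncol(notches)
separated <- notches["lower", -1] > notches["upper", -k]
print(separated)
```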
Multiple Boxplots
Let’s create multiple boxplots for different variables from the iris dataset. We will compare the distributions of Sepal.Length, Sepal.Width, Petal.Length, and Petal.Width across the species. The plotting area will be divided into multiple panels.
# Load the iris dataset
data(iris)
# List the variables for which we want to create boxplots
variables <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
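# Set up the plotting layout: one row, with one column per variable
# (par() returns the previous settings so they can be restored later)
op <- par(mfrow = c(1, length(variables)))
# Create boxplots for each variable grouped by Species
# Note: R may warn that some notches went outside the hinges (for example
# Petal.Width in setosa, where the median equals the lower quartile)
for (var in variables) {
  # get(var) looks up the column whose name is stored in var
  boxplot(get(var) ~ Species, data = iris,
          main = paste("Boxplot of", var),
          xlab = "Species",
          ylab = var,
          col = "lightblue", border = "black",
          notch = TRUE, notchwidth = 0.5,
          medcol = "white", whiskcol = "black",
          boxwex = 0.5, outpch = 19, outcol = "black")
}
# Restore the previous plotting layout
par(op)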
Output:

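When the goal is to compare the four measurements with one another rather than across species, a single call may be enough, because boxplot() also accepts a data frame and draws one box per column. A minimal sketch, reusing the column names from the example above; note that it pools all three species together, and the title and axis label are illustrative.

```r
# Columns to compare (the same four measurements as above)
variables <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# One box per column on a shared axis; all species pooled
res <- boxplot(iris[, variables],
               main = "Iris Measurements",
               ylab = "Centimeters",
               col = "lightblue")

print(res$names)  # column names used as box labels
```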