Category: R (programming language)

  • R – Waffle Chart

    Waffle Chart in detail

    A waffle chart provides a clear visual representation of how individual components contribute to a whole. It’s particularly useful for tracking progress toward a goal or for showing parts-to-whole relationships. Unlike pie charts, which encode values as angles that are hard to compare precisely, waffle charts use a grid of equal-sized squares in which every square stands for the same amount, so proportions can be read (and even counted) directly.

    Implementation in R

    We’ll use ggplot2 for its versatile and elegant plotting capabilities along with the waffle package, an extension that simplifies the creation of waffle charts.

    Installing Required Packages

    To install the necessary packages (in RStudio or any R console), run:

    install.packages("ggplot2")
    install.packages("waffle")

    Loading the Libraries

    Load the libraries with:

    library(ggplot2)
    library(waffle)

    Example: Company Expense Breakdown

    Suppose a company has a total expenditure of $100,000, divided into the following categories:

    • Salaries: $40,000
    • Marketing: $20,000
    • Operations: $15,000
    • Research & Development: $10,000
    • Miscellaneous: $15,000

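    We can represent this data as a named vector in R (the names become the legend labels):

    expenses <- c("Salaries" = 40000, "Marketing" = 20000, "Operations" = 15000,
                  "Research & Development" = 10000, "Miscellaneous" = 15000)

    Since we want each square to represent $1,000, dividing each value by 1,000 gives exactly 100 squares (40 + 20 + 15 + 10 + 15).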
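
    To see how the values map onto the grid, the following base R sketch (no extra packages) converts the expense vector into square counts and lays the squares out in 5 rows. The names squares, cells, and grid are illustrative, and the waffle package may fill its grid in a different order; the sketch only shows the arithmetic behind the chart.

    ```r
    # Expense data in dollars (named vector, same values as above)
    expenses <- c("Salaries" = 40000, "Marketing" = 20000, "Operations" = 15000,
                  "Research & Development" = 10000, "Miscellaneous" = 15000)

    # Each square represents $1,000
    squares <- expenses / 1000
    stopifnot(sum(squares) == 100)   # the grid holds exactly 100 squares

    # One category label per square, in the order of the vector
    cells <- rep(names(squares), times = squares)

    # Lay the 100 labels out in 5 rows, filled column by column (20 columns)
    grid <- matrix(cells, nrow = 5)
    print(dim(grid))
    print(table(grid))
    ```

    Each category then occupies as many cells as its value divided by 1,000, for example 40 cells for Salaries.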

    Plotting the Waffle Chart

    Use the following code to create the waffle chart:

    waffle(expenses/1000, rows = 5, size = 0.6,
           colors = c("#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd"),
           title = "Company Expense Breakdown",
           xlab = "1 square = $1,000")

  • R – Pareto Chart

    Pareto Chart in detail

    The Pareto chart combines a bar chart and a line chart: the left vertical axis shows the frequency of occurrences for different categories (sorted in descending order), and the right vertical axis displays the cumulative percentage. This visualization follows the Pareto principle, which states that roughly 80% of effects come from 20% of the causes.

    Syntax (pareto.chart() is provided by the qcc package):

    pareto.chart(x,
                 ylab = "Frequency",
                 ylab2 = "Cumulative Percentage",
                 xlab,
                 cumperc = seq(0, 100, by = 25),
                 ylim,
                 main,
                 col = heat.colors(length(x)))

    Parameters:

    • x: A vector of values. The names attached to x are used for labeling the bars.
    • ylab: A string specifying the label for the primary y-axis (left side).
    • ylab2: A string specifying the label for the secondary y-axis (right side) showing the cumulative percentage.
    • xlab: A string specifying the label for the x-axis.
    • cumperc: A vector of percentage values to be used as tick marks for the secondary y-axis.
    • ylim: A numeric vector specifying the limits for the primary y-axis.
    • main: A string specifying the main title for the plot.
    • col: A value for the color, a vector of colors, or a palette for the bars.

    Steps for Plotting a Pareto Chart in R

    1. Create a vector that contains the frequency counts of different categories.
    2. Assign names to the vector elements to label each category.
    3. Plot the vector using the pareto.chart() function.
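
    The quantities behind these steps can be reproduced with base R alone. The sketch below is an independent illustration, not qcc's own code: it sorts the customer-issue counts from the example that follows in descending order, computes the cumulative percentage that pareto.chart() draws as a line, and draws a rough Pareto chart with barplot(), lines(), and a right-hand axis(). The names sorted_issues, cum_pct, and total are illustrative.

    ```r
    # Frequency counts for customer issues (same values as the qcc example below)
    issues <- c("Late Delivery" = 35, "Damaged Item" = 850, "Incorrect Order" = 15,
                "Poor Packaging" = 50, "Missing Item" = 20, "Wrong Product" = 120,
                "Customer Service" = 40, "Billing Error" = 10, "Return Issues" = 55,
                "Other" = 500)

    # Bars of a Pareto chart are ordered from most to least frequent
    sorted_issues <- sort(issues, decreasing = TRUE)

    # Cumulative percentage, shown against the right-hand axis
    cum_pct <- 100 * cumsum(sorted_issues) / sum(sorted_issues)
    print(round(cum_pct, 1))

    # Rough base R drawing, with extra margin room for long labels and a second axis
    total <- sum(sorted_issues)
    old_par <- par(mar = c(9, 4, 4, 4))
    mids <- barplot(sorted_issues, ylim = c(0, total), las = 2,
                    col = heat.colors(length(sorted_issues)),
                    ylab = "Frequency", main = "Customer Complaints (base R sketch)")

    # Cumulative counts use the left-axis scale, so the line ends at the top
    lines(as.vector(mids), cumsum(sorted_issues), type = "b", pch = 19)

    # Right-hand axis: percentage ticks placed at the matching frequency heights
    pct_ticks <- seq(0, 100, by = 25)
    axis(4, at = pct_ticks / 100 * total, labels = paste0(pct_ticks, "%"))
    par(old_par)
    ```

    With this data, the two largest categories (Damaged Item and Other) already account for about 79.6% of all complaints, the kind of concentration the Pareto principle describes.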

    Example 1: Customer Complaints

    # Install and load the qcc package
    install.packages("qcc")
    library(qcc)
    
    # Frequency counts for various customer issues
    issues <- c(35, 850, 15, 50, 20, 120, 40, 10, 55, 500)
    
    # Labels for the issues
    names(issues) <- c("Late Delivery", "Damaged Item", "Incorrect Order",
                       "Poor Packaging", "Missing Item", "Wrong Product",
                       "Customer Service", "Billing Error", "Return Issues", "Other")
    
    # Generate the Pareto chart
    pareto.chart(issues,
                 xlab = "Issue Categories",          # Label for x-axis
                 ylab = "Frequency",                 # Label for left y-axis
                 col = heat.colors(length(issues)),  # Colors for the bars
                 cumperc = seq(0, 100, by = 20),     # Tick marks for the cumulative percentage
                 ylab2 = "Cumulative Percentage",    # Label for right y-axis
                 main = "Customer Complaints")       # Chart title

    Example 2: Product Defects

    # Frequency counts for product defects
    defects <- c(6000, 3500, 4800, 2500, 900)
    
    # Labels for the product categories
    names(defects) <- c("Type X", "Type Y", "Type Z", "Type W", "Type V")
    
    # Generate the Pareto chart
    pareto.chart(defects,
                 xlab = "Product Categories",         # Label for x-axis
                 ylab = "Frequency",                  # Label for left y-axis
                 col = heat.colors(length(defects)),  # Colors for the bars
                 cumperc = seq(0, 100, by = 10),      # Tick marks for the cumulative percentage
                 ylab2 = "Cumulative Percentage",     # Label for right y-axis
                 main = "Product Defects")            # Chart title

  • Create a Heatmap in R Programming – heatmap() Function

    heatmap() Function in detail
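
    A minimal sketch of the heatmap() function from R's built-in stats package, assuming the built-in mtcars dataset as input. heatmap() expects a numeric matrix; by default it reorders rows and columns by hierarchical clustering and draws the dendrograms in the margins, and scale = "column" standardizes each variable so that columns with large values (such as disp and hp) do not dominate the color scale. The hcl.colors() palette is an illustrative choice (it needs R 3.6.0 or later).

    ```r
    # heatmap() needs a numeric matrix, so convert the data frame first
    car_matrix <- as.matrix(mtcars)

    # Draw the heatmap: rows (cars) and columns (variables) are clustered and
    # reordered, and each column is standardized before being mapped to colors
    h <- heatmap(car_matrix,
                 scale = "column",
                 col = hcl.colors(50, "YlOrRd"),
                 main = "mtcars (column-scaled)")

    # heatmap() invisibly returns the row and column order it used
    print(rownames(car_matrix)[h$rowInd])
    print(colnames(car_matrix)[h$colInd])
    ```

    Passing scale = "none" instead colors the raw values, in which case the columns with the largest values dominate the plot.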

  • Stratified Boxplot in R Programming

    Stratified Boxplot in detail

    A boxplot is a graphical summary that represents groups of numerical data using their quartiles. Since boxplots are non-parametric, they display the variation in samples from a statistical population without assuming any specific underlying distribution. The spacing between the parts of the box (the quartiles, the median, and the whiskers) indicates the degree of dispersion and skewness in the data, and points beyond the whiskers are drawn individually as outliers. Boxplots can be oriented vertically or horizontally, and they get their name from the rectangular “box” in the center.

    Stratified boxplots are used to examine the relationship between a categorical variable and a numeric variable, or to compare multiple groups defined by an additional categorical variable. They are especially useful for comparing the distributions of a numeric variable across different categories.
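
    To see the numbers behind a stratified boxplot, boxplot() can be called with plot = FALSE, which returns the computed statistics instead of drawing them. This sketch uses the built-in iris data; each column of the returned stats matrix is one group, and the row labels added here (box_stats is an illustrative name) describe the five values drawn for that group.

    ```r
    data(iris)

    # Compute, but do not draw, the boxplot statistics for each species
    b <- boxplot(Petal.Length ~ Species, data = iris, plot = FALSE)

    # Rows: lower whisker, lower hinge (Q1), median, upper hinge (Q3), upper whisker
    box_stats <- b$stats
    dimnames(box_stats) <- list(c("lower whisker", "Q1", "median", "Q3", "upper whisker"),
                                b$names)
    print(box_stats)

    # Number of observations per group and any values flagged as outliers
    print(b$n)
    print(b$out)
    ```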

    Implementation in R

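    Stratified boxplots in R can be created using the boxplot() function from R's built-in graphics package. Here is the syntax of its formula method, followed by key parameters; those not shown in the signature (such as range, notch, and col) are passed through ... to the default method:

    boxplot(formula, data = NULL, ..., subset, na.action = NULL,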
            xlab = mklab(y_var = horizontal),
            ylab = mklab(y_var = !horizontal),
            add = FALSE, ann = !add, horizontal = FALSE, drop = FALSE,
            sep = ".", lex.order = FALSE)

    Key Parameters

    • formula: A formula describing the relationship between the numeric and categorical variables.
    • data: A data frame or list containing the variables specified in the formula.
    • subset: An optional vector specifying a subset of observations.
    • na.action: A function indicating what should be done with missing values.
    • xlab, ylab: Labels for the x- and y-axes; can be suppressed with ann = FALSE.
    • add: Logical flag to add the boxplot to an existing plot.
    • horizontal: Logical flag; if TRUE, boxplots are drawn horizontally.
    • range: Determines how far the whiskers extend from the box.
    • width: A vector specifying the relative widths of the boxes.
    • varwidth: If TRUE, the widths of the boxes are proportional to the square roots of the number of observations in each group.
    • notch: If TRUE, a notch is drawn in each side of the boxes.
    • outline: If FALSE, outliers are not plotted.
    • names: Group labels displayed under each boxplot.
    • border: Colors for the outlines of the boxplots.
    • col: Colors for the bodies of the boxplots.
    • log: A character string indicating whether the x or y (or both) should be on a log scale.
    • pars: A list of additional graphical parameters.
    • at: Numeric vector specifying the locations for drawing the boxplots when adding to an existing plot.

    Example

    In this example, we will use the iris dataset to create a stratified boxplot that compares the petal lengths across the three iris species.

    # Load the iris dataset
    data(iris)
    
    # Create a stratified boxplot of Petal.Length by Species
    boxplot(Petal.Length ~ Species, data = iris,
            main = "Boxplot of Petal Length by Iris Species",
            xlab = "Iris Species",
            ylab = "Petal Length (cm)",
            col = c("lightblue", "lightgreen", "lightpink"),
            border = "darkblue")

    Example: Lung Capacity by Gender and Age Group

    The next example compares lung capacity by gender using the LungCapData dataset, assumed to be a CSV file (LungCapData.csv) in the working directory with LungCap, Age, and Gender columns. In this example, we:

    1. Load the dataset from the CSV file.
    2. Categorize ages into three groups.
    3. Create three boxplots:
      • Boxplot 1: Compares lung capacity between males and females.
      • Boxplot 2: Compares lung capacity between males and females for subjects aged 20 and above.
      • Boxplot 3: Displays stratified boxplots of lung capacity by gender within the defined age groups.

    # Load the dataset (assumes LungCapData.csv is in the working directory)
    LungCapData <- read.csv("LungCapData.csv", header = TRUE)
    
    # Categorise Age into groups: [0, 15), [15, 20) and 20 and over
    # (right = FALSE makes each interval include its lower bound, matching the labels)
    LungCapData$AgeGroup <- cut(LungCapData$Age,
                                breaks = c(0, 15, 20, Inf),
                                right = FALSE,
                                labels = c("Under 15", "15-19", "20 and over"))
    
    # BoxPlot 1: Lung Capacity by Gender
    boxplot(LungCap ~ Gender, data = LungCapData,
            ylab = "Lung Capacity",
            main = "Lung Capacity: Males vs Females",
            col = c("skyblue", "salmon"),
            las = 1)
    
    # BoxPlot 2: Lung Capacity by Gender for subjects aged 20 and above
    boxplot(LungCap ~ Gender, data = LungCapData,
            subset = Age >= 20,
            ylab = "Lung Capacity",
            main = "Lung Capacity (Age >= 20): Males vs Females",
            col = c("skyblue", "salmon"),
            las = 1)
    
    # BoxPlot 3: Stratified Lung Capacity by Gender across Age Groups
    boxplot(LungCap ~ Gender * AgeGroup, data = LungCapData,
            ylab = "Lung Capacity",
            xlab = "Gender and Age Group",
            main = "Stratified Lung Capacity by Gender and Age Groups",
            col = c("skyblue", "salmon"),
            las = 2)

  • Boxplots in R Language

    Boxplots in detail

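    A box graph (or boxplot) is used to display the distribution of data based on a five-number summary: the minimum, first quartile, median, third quartile, and maximum. In R, you can create boxplots using the boxplot() function, whose whiskers by default extend only to the most extreme values within 1.5 times the interquartile range of the box; values beyond that are plotted individually as outliers.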
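
    The difference between the plain five-number summary and what boxplot() draws can be checked with fivenum() and boxplot.stats(). The vector x below is made up for illustration and contains one value far from the rest.

    ```r
    # Illustrative data: the values 1 to 9 plus one extreme value
    x <- c(1:9, 50)

    # Tukey's five-number summary: minimum, lower hinge, median, upper hinge, maximum
    print(fivenum(x))        # 1, 3, 5.5, 8, 50

    # Statistics used by boxplot(): 50 lies more than 1.5 * IQR above the upper
    # hinge, so the upper whisker stops at 9 and 50 is reported as an outlier
    s <- boxplot.stats(x)
    print(s$stats)           # 1, 3, 5.5, 8, 9
    print(s$out)             # 50
    ```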

    Syntax:

    boxplot(x, data, notch, varwidth, names, main)

    • x: A vector or a formula.
    • data: A data frame containing the variables.
    • notch: Logical value indicating whether to display a notch (useful for comparing medians).
    • varwidth: Logical value; if TRUE, the box width is proportional to the square root of the sample size.
    • names: Group labels to be shown under each box.
    • main: The main title of the chart.

    Creating a Dataset Example

    For illustration, we’ll use the iris dataset. First, let’s inspect a few rows of the data focusing on Sepal.Length and Species:

    # Extract the relevant columns from the iris dataset
    input <- iris[, c("Sepal.Length", "Species")]
    head(input)

    Output:

      Sepal.Length Species
    1          5.1  setosa
    2          4.9  setosa
    3          4.7  setosa
    4          4.6  setosa
    5          5.0  setosa
    6          5.4  setosa

    Basic Boxplot

    Now, let’s create a simple boxplot to compare the sepal length across different species:

    # Load the iris dataset
    data(iris)
    
    # Create the boxplot for Sepal Length grouped by Species
    boxplot(Sepal.Length ~ Species, data = iris,
            main = "Sepal Length by Species",
            xlab = "Species",
            ylab = "Sepal Length")

    Boxplot with Notch

    Notches can be added to boxplots as a rough guide for comparing medians between groups: if the notches of two boxes do not overlap, this is strong evidence that their medians differ. Here’s how to create a notched boxplot with customized colors:

    # Load the iris dataset
    data(iris)
    
    # Define custom colors for the boxes
    custom_colors <- c("#FF6347", "#3CB371", "#4682B4")
    
    # Create the notched boxplot with custom aesthetics
    # (notch.frac sets the fraction of the box width used by the notches)
    boxplot(Sepal.Length ~ Species, data = iris,
            main = "Sepal Length by Species",
            xlab = "Species",
            ylab = "Sepal Length",
            col = custom_colors, border = "black",
            notch = TRUE, notch.frac = 0.5,
            medcol = "white", whiskcol = "black",
            boxwex = 0.5, outpch = 19, outcol = "black")
    
    # Add a legend to the plot (one entry per species level)
    legend("topright", legend = levels(iris$Species),
           fill = custom_colors, border = "black", title = "Species")

    Multiple Boxplots

    Let’s create multiple boxplots for different variables from the iris dataset. We will compare the distributions of Sepal.Length, Sepal.Width, Petal.Length, and Petal.Width across the species, dividing the plotting area into one panel per variable.

    # Load the iris dataset
    data(iris)
    
    # List the variables for which we want to create boxplots
    variables <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
    
    # Split the plotting area into one row with one panel per variable,
    # saving the old settings so they can be restored afterwards
    old_par <- par(mfrow = c(1, length(variables)))
    
    # Create boxplots for each variable grouped by Species
    for (var in variables) {
      # Build a formula such as Sepal.Length ~ Species from the variable name
      boxplot(reformulate("Species", response = var), data = iris,
              main = paste("Boxplot of", var),
              xlab = "Species",
              ylab = var,
              col = "lightblue", border = "black",
              notch = TRUE, notch.frac = 0.5,
              medcol = "white", whiskcol = "black",
              boxwex = 0.5, outpch = 19, outcol = "black")
    }
    
    # Restore the previous plotting layout
    par(old_par)

  • Create Dot Charts in R Programming – dotchart() Function

    dotchart() Function in detail

    The dotchart() function in R is used to create a Cleveland dot plot, where individual data points are represented as dots. This type of plot is particularly useful for comparing a set of numerical values along an axis, with optional grouping and labeling for enhanced clarity.

    Syntax

    dotchart(x, labels = NULL, groups = NULL, gcolor = par("fg"), color = par("fg"), ...)

    • x: A numeric vector or matrix.
    • labels: A vector of labels for each point.
    • groups: An optional factor indicating how the elements of x are grouped.
    • gcolor: Color used for the group labels and group values.
    • color: Color(s) used for the individual points and their labels.
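
    Dot charts are usually easier to read when the values are sorted, so that the dots form a steady pattern instead of a zigzag. A minimal sketch reusing the exam scores from Example 1: order() sorts the scores (and their labels) in ascending order, and because dotchart() draws the first value at the bottom, the highest score ends up at the top. The index vector o is an illustrative name.

    ```r
    # Exam scores and student names (same data as Example 1 below)
    scores <- c(85, 90, 78, 92, 88, 75, 95)
    student_names <- c("Alice", "Bob", "Charlie", "David", "Eva", "Frank", "Grace")

    # Indices that sort the scores from lowest to highest
    o <- order(scores)

    # Plot the sorted values; the labels are reordered with the same indices
    dotchart(scores[o], labels = student_names[o],
             xlab = "Exam Score", main = "Exam Scores (sorted)")
    ```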

    Example 1: Dot Chart of a Single Numeric Vector

    This example demonstrates how to create a dot chart for a numeric vector representing exam scores, with each score labeled by a student’s name.

    # Example 1: Dot Chart for Exam Scores
    # Create a numeric vector for exam scores
    scores <- c(85, 90, 78, 92, 88, 75, 95)
    
    # Create labels for each student
    student_names <- c("Alice", "Bob", "Charlie", "David", "Eva", "Frank", "Grace")
    
    # Generate the dot chart
    dotchart(scores, labels = student_names,
             cex = 0.9, xlab = "Exam Score",
             main = "Dot Chart of Exam Scores")

    Example 2: Dot Chart with Grouping

    In this example, a numeric vector representing monthly sales is grouped into two categories (e.g., “First Half” and “Second Half” of the year). Different colors are used to distinguish the groups.

    # Example 2: Dot Chart Grouped by Period
    # Create a numeric vector for monthly sales figures
    sales <- c(120, 150, 180, 130, 160, 170, 140, 190, 200, 155)
    
    # Create labels for each month
    months <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct")
    
    # Define a grouping variable: "First Half" for the first 5 months and "Second Half" for the rest
    group <- factor(ifelse(1:10 <= 5, "First Half", "Second Half"))
    
    # Define colors for the two groups: purple and teal
    # ("teal" is not one of R's named colors, so it is given as a hex code)
    group_colors <- c("purple", "#008080")
    
    # Generate the dot chart with grouping
    dotchart(sales, labels = months, groups = group,
             gcolor = group_colors,
             color = group_colors[as.numeric(group)],
             cex = 0.9, pch = 21, xlab = "Monthly Sales",
             main = "Monthly Sales Dot Chart by Group")

    Example 3: Dot Chart with Custom Labels and Titles

    This example shows how to create a dot chart for a series of measurements with custom point labels and axis titles.

    # Example 3: Dot Chart for Measurements
    # Create a numeric vector of measurement values
    measurements <- c(5.2, 7.8, 6.1, 8.5, 7.0, 6.8, 9.2)
    
    # Create custom labels for each measurement
    labels <- c("P1", "P2", "P3", "P4", "P5", "P6", "P7")
    
    # Generate the dot chart
    dotchart(measurements, labels = labels,
             main = "Dot Chart of Measurements",
             xlab = "Measurement Value", ylab = "Points",
             cex = 0.9)

  • Scatter plots in R Language

    Scatter plots in detail

    A scatter plot displays individual data points as dots, with the value of one variable on the horizontal (X) axis and the value of another on the vertical (Y) axis. The pattern formed by the points can reveal whether, and how strongly, the two variables are correlated.
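
    The strength of a linear relationship seen in a scatter plot can be summarized with the correlation coefficient. A minimal sketch using cor() on the two iris columns plotted in the examples below:

    ```r
    data(iris)

    # Pearson correlation between sepal length and petal length
    r <- cor(iris$Sepal.Length, iris$Petal.Length)
    print(round(r, 3))   # about 0.872: a strong positive linear relationship
    ```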

    R – Scatter Plots

    We can create a scatter plot in R Programming Language using the plot() function.

    Syntax:

    plot(x, y, main, xlab, ylab, xlim, ylim, axes)

    Parameters:

    • x: This parameter sets the horizontal coordinates.
    • y: This parameter sets the vertical coordinates.
    • xlab: This parameter is the label for the horizontal axis.
    • ylab: This parameter is the label for the vertical axis.
    • main: This parameter is the title of the chart.
    • xlim: The limits of the x-axis, given as c(min, max).
    • ylim: The limits of the y-axis, given as c(min, max).
    • axes: This parameter indicates whether both axes should be drawn on the plot.

    Simple Scatterplot Chart

    To create a Scatterplot Chart:

    • We use the dataset iris.
    • Use the columns Sepal.Length and Petal.Length in iris.

    Example:

    # Get the input values.
    input <- iris[, c('Sepal.Length', 'Petal.Length')]
    
    # Print the first few rows
    print(head(input))

    Output:

      Sepal.Length Petal.Length
    1          5.1          1.4
    2          4.9          1.4
    3          4.7          1.3
    4          4.6          1.5
    5          5.0          1.4
    6          5.4          1.7

    Creating a Scatterplot Graph

    To create an R Scatterplot graph:

    • We use the plot() function to generate the scatterplot.
    • The xlab parameter describes the X-axis and ylab describes the Y-axis.

    Example:

    # Get the input values.
    input <- iris[, c('Sepal.Length', 'Petal.Length')]
    
    # Plot the chart for Sepal.Length and Petal.Length.
    plot(x = input$Sepal.Length, y = input$Petal.Length,
        xlab = "Sepal Length",
        ylab = "Petal Length",
        xlim = c(4, 8),
        ylim = c(1, 7),
        main = "Sepal Length vs Petal Length"
    )

    Scatterplot Matrices

    When we have more than two variables and want to see how each one relates to every other, we use a scatterplot matrix.

    The pairs() function is used to create matrices of scatterplots.

    Syntax:

    pairs(formula, data)

    Parameters:

    • formula: A one-sided formula such as ~ a + b + c naming the variables to plot against each other.
    • data: This parameter represents the dataset from which the variables will be taken.
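
    A scatterplot matrix pairs naturally with the matching correlation matrix, which puts a number on the linear relationship shown in each panel. This sketch passes the four numeric iris columns to pairs() as a data frame (an alternative to the formula interface), colors the points by species using the factor's integer codes, and then prints cor() for the same columns. The names num_vars and cm are illustrative.

    ```r
    data(iris)

    # The four numeric columns of iris
    num_vars <- iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")]

    # Scatterplot matrix from a data frame; colors 1, 2, 3 correspond to the species levels
    pairs(num_vars, col = as.integer(iris$Species), pch = 19,
          main = "Scatterplot Matrix by Species")

    # Correlation matrix for the same variables
    cm <- cor(num_vars)
    print(round(cm, 2))
    ```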

    Example:

    # Load the built-in iris dataset
    data(iris)
    
    # Create the scatterplot matrix
    pairs(~Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
          data = iris,
          main = "Scatterplot Matrix")

    Scatterplot with Fitted Values

    Creating a Scatterplot in R

    To create a scatterplot with fitted values, we use the ggplot2 package, which provides the ggplot() and geom_point() functions for visualization.

    In this example, we use the mtcars dataset and plot the relationship between the logarithms of mpg (miles per gallon) and drat (rear axle ratio), with points colored by the number of forward gears. The stat_smooth() function adds a fitted linear regression line.
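
    stat_smooth(method = "lm") fits an ordinary least-squares line of y on x. As a base R point of reference (a sketch, not what ggplot2 runs internally), the same regression of log(drat) on log(mpg) can be fitted with lm() and drawn with abline():

    ```r
    data(mtcars)

    # Least-squares fit of log(drat) on log(mpg), matching the axes of the ggplot example
    fit <- lm(log(drat) ~ log(mpg), data = mtcars)
    print(coef(fit))

    # Base R scatterplot with points colored by gear count, plus the fitted line
    plot(log(mtcars$mpg), log(mtcars$drat),
         col = as.integer(factor(mtcars$gear)), pch = 19,
         xlab = "log(mpg)", ylab = "log(drat)")
    abline(fit, col = "#C42126", lwd = 2)
    ```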

    Example:

    # Loading ggplot2 package
    library(ggplot2)
    
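    # Creating scatterplot with fitted values
    # (linewidth replaces the deprecated size argument for lines in ggplot2 >= 3.4.0)
    ggplot(mtcars, aes(x = log(mpg), y = log(drat))) +
      geom_point(aes(color = factor(gear))) +
      stat_smooth(method = "lm", formula = y ~ x, colour = "#C42126", se = FALSE, linewidth = 1)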

    Adding Titles Dynamically

    To enhance the visualization, we add a title, subtitle, and caption using the labs() function.

    Example:

    # Loading ggplot2 package
    library(ggplot2)
    
    # Creating scatterplot with fitted values (linewidth requires ggplot2 >= 3.4.0)
    graph <- ggplot(mtcars, aes(x = log(mpg), y = log(drat))) +
      geom_point(aes(color = factor(gear))) +
      stat_smooth(method = "lm", formula = y ~ x, colour = "#C42126", se = FALSE, linewidth = 1)
    
    # Adding title, subtitle, and caption
    graph + labs(
            title = "Relationship between Mileage and Drat",
            subtitle = "Data categorized by gear count",
            caption = "Computed using mtcars dataset"
    )

    3D Scatterplots

    For 3D scatterplots, we use the plotly package, which enables interactive visualizations.

    Example:

    # Loading required library
    library(plotly)
    
    # Creating an interactive 3D scatterplot
    # (no attach() is needed: the ~ formulas refer to columns of data = mtcars)
    plot_ly(data = mtcars, x = ~mpg, y = ~hp, z = ~cyl, color = ~gear,
            type = "scatter3d", mode = "markers")

    Scatter Plot

    A scatter plot visually represents the relationship between two numerical variables. The x-axis represents one data vector, while the y-axis represents another.

    Syntax:

    plot(x, y, type, xlab, ylab, main)

    Parameters:

    • x: Data vector for the x-axis
    • y: Data vector for the y-axis
    • type: Type of plot (e.g., "l" for lines, "p" for points, "s" for steps)
    • xlab: Label for the x-axis
    • ylab: Label for the y-axis
    • main: Title of the graph

    For the full list of arguments, run help("plot") in the R console.

    Example:

    # Creating a dataset
    data_set <- data.frame(Height = c(150, 160, 165, 170, 175, 180, 185, 190),
                           Weight = c(50, 55, 60, 65, 70, 75, 80, 85))
    
    # Open a PNG graphics device; the plot is written to this file
    png(filename = "scatterplot_output.png")
    
    # Creating the scatter plot
    plot(x = data_set$Height, y = data_set$Weight,
         xlab = "Height (cm)", ylab = "Weight (kg)",
         main = "Height vs. Weight", col = "red", pch = 19)
    
    # Close the device to save the file
    dev.off()

  • R – Pie Charts

    R – Pie Charts in detail

    A pie chart is a circular statistical graphic divided into slices to illustrate numerical proportions. The arc length, and therefore the area, of each sector (or slice) is proportional to the quantity it represents. It is also known as a circle graph, where a circular chart is cut into segments to describe relative frequencies or magnitudes.

    The R programming language provides the pie() function to create pie charts. It takes positive numbers as a vector input.

    Syntax:

    pie(x, labels, radius, main, col, clockwise)

    Parameters:

    • x: A vector containing numeric values used in the pie chart.
    • labels: Descriptions for the slices in the pie chart.
    • radius: Defines the radius of the circle (value between -1 and +1).
    • main: Title of the pie chart.
    • clockwise: Logical value indicating whether slices are drawn clockwise or counterclockwise.
    • col: Specifies colors for the pie slices.
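
    Pie slices are easier to read when their labels carry the percentages. A minimal sketch that builds such labels from the values and passes them through the labels parameter; the data matches the RColorBrewer examples below, and pct and slice_labels are illustrative names.

    ```r
    # Sales figures and city names
    sales <- c(40, 60, 30, 50)
    cities <- c("New York", "Los Angeles", "Chicago", "Houston")

    # Share of the total for each slice, rounded to whole percent
    pct <- round(100 * sales / sum(sales))

    # Labels such as "New York (22%)"
    slice_labels <- paste0(cities, " (", pct, "%)")

    # Draw the pie chart, starting at 12 o'clock and going clockwise
    pie(sales, labels = slice_labels, clockwise = TRUE,
        main = "Sales by City")
    ```

    Rounding each share separately can make the percentages add up to 99 or 101 for other data; here they sum to 100.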

    Creating a Simple Pie Chart

    By using the above parameters, we can create a basic pie chart with labels.

    Example:

    # Create data for the graph
    values <- c(30, 50, 40, 60)
    labels <- c("Apple", "Banana", "Grapes", "Mango")
    
    # Plot the chart
    pie(values, labels)

    Pie Chart with Title and Colors

    We can enhance the pie chart by adding a title and colors using the col parameter.

    Example:

    # Create data for the graph
    values <- c(25, 45, 35, 55)
    labels <- c("New York", "London", "Tokyo", "Sydney")
    
    # Plot the chart with title and rainbow color palette
    pie(values, labels, main = "City Pie Chart",
        col = rainbow(length(values)))

    Pie Chart with Color Palettes

    Using the RColorBrewer package to add colors to a pie chart.

    # Load necessary library
    library(RColorBrewer)
    
    # Create data for the graph
    sales <- c(40, 60, 30, 50)
    cities <- c("New York", "Los Angeles", "Chicago", "Houston")
    
    # Assign colors using brewer.pal
    colors <- brewer.pal(length(sales), "Set2")
    
    # Plot the pie chart
    pie(sales, labels = cities, col = colors)

    Modify Border Line Type

    Using the lty argument to change the border style.

    # Load necessary library
    library(RColorBrewer)
    
    # Create data for the graph
    sales <- c(40, 60, 30, 50)
    cities <- c("New York", "Los Angeles", "Chicago", "Houston")
    
    # Assign colors using brewer.pal
    colors <- brewer.pal(length(sales), "Set2")
    
    # Plot the pie chart with modified border type
    pie(sales, labels = cities, col = colors, lty = 2)

    Add Shading Lines

    Using the density and angle arguments to add shading.

    # Load necessary library
    library(RColorBrewer)
    
    # Create data for the graph
    sales <- c(40, 60, 30, 50)
    cities <- c("New York", "Los Angeles", "Chicago", "Houston")
    
    # Assign colors using brewer.pal
    colors <- brewer.pal(length(sales), "Set2")
    
    # Plot the pie chart with shading lines
    pie(sales, labels = cities, col = colors, density = 50, angle = 45)

    3D Pie Chart

    Using the plotrix package to create a 3D pie chart.

    # Load necessary library
    library(plotrix)
    
    # Create data for the graph
    sales <- c(40, 60, 30, 50)
    cities <- c("New York", "Los Angeles", "Chicago", "Houston")
    
    # Calculate percentages and format them as labels such as "22.2%"
    sales_percent <- round(100 * sales / sum(sales), 1)
    
    # Plot the 3D pie chart
    pie3D(sales, labels = paste0(sales_percent, "%"),
          main = "Sales Distribution", col = rainbow(length(sales)))
    
    # Add a legend
    legend("topright", cities, cex = 0.5, fill = rainbow(length(sales)))

  • Histograms in R language

    Histograms in detail

    A histogram is a graphical representation of statistical data that groups data points into specified ranges (bins). Each rectangular bar represents one bin: with equal-width bins, its height is proportional to the number of values in that range, while with unequal widths R plots densities so that bar areas are proportional to the counts. Unlike bar graphs, histograms have no gaps between bars because the bins are contiguous.

    Creating Histograms in R

    Histograms in R can be created using the hist() function.

    Syntax:

    hist(v, main, xlab, ylab, xlim, ylim, breaks, col, border)

    Parameters:

    • v: Numeric values used to create the histogram.
    • main: Title of the chart.
    • col: Color of the bars.
    • xlab: Label for the horizontal axis.
    • ylab: Label for the vertical axis.
    • border: Color of the bar borders.
    • xlim: Range of values on the x-axis.
    • ylim: Range of values on the y-axis.
    • breaks: Controls the bins. A single number is only a suggestion for the number of bins (R chooses "pretty" cut points), while a vector gives the exact bin edges.
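    The difference between the two forms of breaks can be checked without drawing anything by passing plot = FALSE, which makes hist() return the computed bins. A minimal sketch using the data from Example 1 below; h_suggested and h_explicit are illustrative names.

    ```r
    values <- c(10, 25, 15, 8, 20, 18, 30, 12, 22, 28, 35)
    
    # A single number: only a suggestion, R picks "pretty" cut points
    h_suggested <- hist(values, breaks = 5, plot = FALSE)
    
    # A vector: the exact bin edges to use
    h_explicit <- hist(values, breaks = c(0, 10, 20, 30, 40), plot = FALSE)
    
    print(h_suggested$breaks)
    print(h_explicit$breaks)
    
    # Counts for [0,10], (10,20], (20,30], (30,40]: 2 4 4 1
    print(h_explicit$counts)
    ```

    By default bins are closed on the right, and the first bin also includes its lower edge.
    
    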

    Example 1: Creating a Simple Histogram

    # Creating data for the graph
    values <- c(10, 25, 15, 8, 20, 18, 30, 12, 22, 28, 35)
    
    # Creating the histogram
    hist(values, xlab = "Value",
         col = "blue", border = "black")

    Output:

    Example 2: Setting X and Y Ranges

    # Creating data for the graph
    values <- c(10, 25, 15, 8, 20, 18, 30, 12, 22, 28, 35)
    
    # Creating the histogram
    hist(values, xlab = "Value", col = "blue",
         border = "black", xlim = c(0, 40),
         ylim = c(0, 5), breaks = 5)

    Output:

    Example 3: Adding Labels Using text()

    # Creating data for the graph
    values <- c(10, 25, 15, 8, 20, 18, 30, 12, 22, 28, 35, 110, 50, 80, 95)
    
    # Creating the histogram
    hist_data <- hist(values, xlab = "Weight", ylab = "Frequency",
                      col = "purple", border = "black",
                      breaks = 5)
    
    # Adding labels
    text(hist_data$mids, hist_data$counts, labels = hist_data$counts,
         adj = c(0.5, -0.5))

    Output:
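    As an alternative to text(), the histogram's own plot method can draw the counts with labels = TRUE. The sketch below computes the bins first (plot = FALSE) so it can add headroom with ylim; otherwise the label above the tallest bar may be clipped. The extra 1 unit of headroom is an arbitrary choice.

    ```r
    values <- c(10, 25, 15, 8, 20, 18, 30, 12, 22, 28, 35, 110, 50, 80, 95)
    
    # Compute the bins without drawing
    hist_data <- hist(values, breaks = 5, plot = FALSE)
    
    # Draw with built-in count labels and room above the tallest bar
    plot(hist_data, col = "purple", border = "black",
         labels = TRUE, ylim = c(0, max(hist_data$counts) + 1),
         xlab = "Weight", ylab = "Frequency", main = "Histogram of values")
    ```
    
    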

    Example 4: Histogram with Non-Uniform Width

    # Creating data for the graph
    values <- c(10, 25, 15, 8, 20, 18, 30, 12, 22, 28, 35, 110, 50, 80, 95)
    
    # Creating the histogram
    # With unequal bin widths, hist() plots density, not counts
    hist(values, xlab = "Weight", ylab = "Density",
         xlim = c(5, 140),
         col = "purple", border = "black",
         breaks = c(5, 55, 60, 70, 75, 80, 100, 140))

    Output:
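    With unequal bins, hist() switches to a density scale so that bar area, not height, reflects how many values fall in each bin. The sketch below recomputes the density by hand as count / (total observations x bin width) and checks that the bar areas add up to 1; manual_density and total_area are illustrative names.

    ```r
    values <- c(10, 25, 15, 8, 20, 18, 30, 12, 22, 28, 35, 110, 50, 80, 95)
    bin_edges <- c(5, 55, 60, 70, 75, 80, 100, 140)
    
    h <- hist(values, breaks = bin_edges, plot = FALSE)
    
    # Density of each bin: count / (number of observations * bin width)
    bin_widths <- diff(h$breaks)
    manual_density <- h$counts / (sum(h$counts) * bin_widths)
    
    print(cbind(width = bin_widths, count = h$counts, density = h$density))
    
    # Bar areas (density * width) sum to 1
    total_area <- sum(h$density * bin_widths)
    ```
    
    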

  • Addition of Lines to a Plot in R Programming – lines() Function

    lines() Function in detail

    The lines() function in R is used to add lines of different types, colors, and widths to an existing plot.

    Syntax:

    lines(x, y, col, lwd, lty)

    Parameters:

    • x, y: Vectors of coordinates
    • col: Color of the line
    • lwd: Width of the line
    • lty: Type of line
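    To see how lty and lwd look in practice, the sketch below draws one horizontal line per lty code on an empty plot. plot(NULL, ...) is used only to set up the axes; line_codes is an illustrative name.

    ```r
    # Empty plotting region with room for six horizontal lines
    plot(NULL, xlim = c(0, 10), ylim = c(0.5, 6.5),
         xlab = "", ylab = "lty code", main = "Line types drawn with lines()")
    
    line_codes <- 1:6
    for (lt in line_codes) {
      # Horizontal line at height lt, drawn with line type lt
      lines(c(0, 10), c(lt, lt), lty = lt, lwd = 2, col = "black")
    }
    ```
    
    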

    Adding Lines to a Plot using lines() Function

    Example 1: Adding a Line to a Scatter Plot

    This example demonstrates how to create a scatter plot and add a line to it.

    # Creating coordinate vectors
    x <- c(2.1, 4.2, 1.5, -2.8, 6.3,
           3.1, 4.0, 2.8, 2.6, 2.2, 2.0, 2.8)
    y <- c(3.2, 6.5, 2.8, -2.5, 10.5, 4.8,
           5.9, 5.1, 3.9, 3.2, 3.4, 4.8)
    
    # Plotting the scatter plot
    plot(x, y, cex = 1, pch = 3, xlab = "X-axis",
         ylab = "Y-axis", col = "black")
    
    # Creating another set of coordinates for the line
    x2 <- c(3.5, 1.0, -1.8, 0.2)
    y2 <- c(4.0, 5.2, 3.0, 3.5)
    
    # Adding a red line to the plot
    lines(x2, y2, col = "red", lwd = 2, lty = 1)

    Output:
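    A common use of lines() on a scatter plot is to overlay a fitted trend. This sketch assumes a straight-line least-squares fit via lm() is appropriate for the data; it evaluates the fit on an evenly spaced grid with predict() and passes the result to lines(). x_grid and y_grid are illustrative names.

    ```r
    x <- c(2.1, 4.2, 1.5, -2.8, 6.3, 3.1, 4.0, 2.8, 2.6, 2.2, 2.0, 2.8)
    y <- c(3.2, 6.5, 2.8, -2.5, 10.5, 4.8, 5.9, 5.1, 3.9, 3.2, 3.4, 4.8)
    
    plot(x, y, pch = 3, xlab = "X-axis", ylab = "Y-axis")
    
    # Least-squares fit of y = a + b * x
    fit <- lm(y ~ x)
    
    # Evaluate the fitted line over the observed x range
    x_grid <- seq(min(x), max(x), length.out = 100)
    y_grid <- predict(fit, newdata = data.frame(x = x_grid))
    
    lines(x_grid, y_grid, col = "red", lwd = 2)
    ```

    For a straight line, abline(fit) draws the same fit across the whole plot; the lines() approach also works for curved predictions.
    
    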

    Example 2: Connecting Points with lines()

    This example plots a scatter plot and then connects the points using lines(). Note that lines() joins the points in the order in which they appear in x and y, not from left to right.

    # Creating coordinate vectors
    x <- c(2.1, 4.2, 1.5, -2.8, 6.3, 3.1,
           4.0, 2.8, 2.6, 2.2, 2.0, 2.8)
    y <- c(3.2, 6.5, 2.8, -2.5, 10.5, 4.8,
           5.9, 5.1, 3.9, 3.2, 3.4, 4.8)
    
    # Plotting the scatter plot
    plot(x, y, cex = 1, pch = 3, xlab = "X-axis",
         ylab = "Y-axis", col = "black")
    
    # Connecting points with a red line
    lines(x, y, col = "red")

    Output:
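    Because the points above are not sorted by x, the red line doubles back on itself. A minimal fix, sketched below, is to reorder both vectors by x with order() before calling lines(); ord is an illustrative name.

    ```r
    x <- c(2.1, 4.2, 1.5, -2.8, 6.3, 3.1, 4.0, 2.8, 2.6, 2.2, 2.0, 2.8)
    y <- c(3.2, 6.5, 2.8, -2.5, 10.5, 4.8, 5.9, 5.1, 3.9, 3.2, 3.4, 4.8)
    
    plot(x, y, pch = 3, xlab = "X-axis", ylab = "Y-axis")
    
    # Indices that sort the points from left to right
    ord <- order(x)
    
    # Apply the same ordering to x and y so each pair stays together
    lines(x[ord], y[ord], col = "red")
    ```
    
    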

    Example 3: Adding Reference Lines with abline() and lines()

    # Create sample data
    x <- seq(-5, 5, length.out = 10)
    y <- x^3
    
    # Create a plot of the data
    plot(x, y, main = "Adding Lines to a Plot", col = "blue")
    
    # Add a vertical line at x = 0
    abline(v = 0, col = "green", lwd = 2)
    
    # Add a horizontal line at y = 0
    abline(h = 0, col = "purple", lwd = 2)
    
    # Add a diagonal line with slope -2 and intercept 3
    abline(a = 3, b = -2, col = "orange", lty = 2, lwd = 2)
    
    # Add a custom line using lines() function
    x2 <- seq(-5, 5, length.out = 10)
    y2 <- -x2^2 + 4
    lines(x2, y2, col = "red", lty = 2, lwd = 2)

    Output:
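    lines() only draws straight segments between the points it is given, so the 10-point parabola above looks angular. The sketch below compares it with a 200-point version; the grid size of 200 is an arbitrary choice, and x_fine is an illustrative name.

    ```r
    x <- seq(-5, 5, length.out = 10)
    y <- x^3
    plot(x, y, main = "Coarse vs. fine lines()", col = "blue")
    
    # 10 points: visible corners between segments
    lines(x, -x^2 + 4, col = "red", lty = 2, lwd = 2)
    
    # 200 points: the same parabola looks smooth
    x_fine <- seq(-5, 5, length.out = 200)
    lines(x_fine, -x_fine^2 + 4, col = "darkgreen", lwd = 2)
    ```

    Base R's curve() with add = TRUE is another way to overlay a smooth function on an existing plot.
    
    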