Author: Pooja Kotwani

  • outer() Function in R

    outer() Function in detail

    A flexible tool for working with matrices and vectors in R is the outer() function. It applies a function to every possible combination of elements from two input vectors or arrays and returns the results as a new matrix or array.

    Syntax:

    outer(X, Y, FUN = "*")

    Parameters:

    • X, Y: Vectors or arrays
    • FUN: Function to use on the outer products, default value is multiplication (*)

    The outer() function returns an array whose dimensions are c(dim(X), dim(Y)); for two plain vectors this is a matrix with length(X) rows and length(Y) columns. Each element of the result is FUN applied to one pair of elements, one taken from X and one from Y.

    Examples

    Example 1: Outer Product of Two Vectors

    # Initializing two arrays of elements
    x <- c(2, 4, 6, 8, 10)
    y <- c(3, 6, 9)
    
    # Multiplying elements of x with elements of y
    outer(x, y)

    Output:

         [,1] [,2] [,3]
    [1,]    6   12   18
    [2,]   12   24   36
    [3,]   18   36   54
    [4,]   24   48   72
    [5,]   30   60   90

    Example 2: Outer Function for a Vector and a Single Value

    # Initializing a vector and a single value
    a <- 1:7
    b <- 5
    
    # Adding elements of a with b
    outer(a, b, "+")

    Output:

         [,1]
    [1,]    6
    [2,]    7
    [3,]    8
    [4,]    9
    [5,]   10
    [6,]   11
    [7,]   12
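
    FUN does not have to be a built-in operator. As a rough sketch, the block below passes user-defined functions to outer(). The helper names hypotenuse and larger_of are hypothetical, chosen only for this illustration. Because outer() calls FUN once on two expanded vectors rather than once per pair, FUN needs to be vectorized; a function that only handles single values can be wrapped in Vectorize().

    ```r
    # Hypothetical vectorized helper: works element-wise on whole vectors
    hypotenuse <- function(a, b) sqrt(a^2 + b^2)

    legs_a <- c(3, 5, 8)
    legs_b <- c(4, 12, 15)
    hyp <- outer(legs_a, legs_b, FUN = hypotenuse)
    print(hyp)

    # Hypothetical scalar-only helper: if() accepts a single condition,
    # so the function is wrapped in Vectorize() before it goes to outer()
    larger_of <- function(a, b) if (a > b) a else b
    larger <- outer(1:3, 1:3, FUN = Vectorize(larger_of))
    print(larger)
    ```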
    Types of outer() Functions

    Since the outer() function is general, you can define custom functions and use them with outer(). Below are some commonly used types:

    1. Arithmetic Functions

    The most common use of outer() is for performing arithmetic operations such as addition, subtraction, multiplication, and division on two vectors. The operators +, -, *, /, %%, and %/% can be applied.

    Example:

    x <- 2:4
    y <- 5:7
    outer(x, y, FUN = "-")

    Output:

         [,1] [,2] [,3]
    [1,]   -3   -4   -5
    [2,]   -2   -3   -4
    [3,]   -1   -2   -3

    2. Statistical Functions

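    outer() is not limited to vectors; it also accepts matrices, which helps when building tables of pairwise values for statistical work. The example below computes the outer product of two matrices. Note that this is not matrix multiplication: every element of A is multiplied by every element of B, so a 2 x 3 matrix and a 2 x 2 matrix produce a four-dimensional array with dimensions 2 x 3 x 2 x 2.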

    Example:

    # Creating two matrices
    A <- matrix(2:7, nrow = 2, ncol = 3)
    B <- matrix(1:4, nrow = 2, ncol = 2)
    
    # Outer product of the two matrices (every element of A times every element of B)
    outer(A, B, "*")

    Output:

    , , 1, 1
    
         [,1] [,2] [,3]
    [1,]    2    4    6
    [2,]    3    5    7
    
    , , 2, 1
    
         [,1] [,2] [,3]
    [1,]    4    8   12
    [2,]    6   10   14
    
    , , 1, 2
    
         [,1] [,2] [,3]
    [1,]    6   12   18
    [2,]    9   15   21
    
    , , 2, 2
    
         [,1] [,2] [,3]
    [1,]    8   16   24
    [2,]   12   20   28
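
    For comparison, here is a small sketch that sets the outer product next to true matrix multiplication with %*%, reusing A and B from the example above. The names outer_dims and mat_prod are illustrative only.

    ```r
    A <- matrix(2:7, nrow = 2, ncol = 3)
    B <- matrix(1:4, nrow = 2, ncol = 2)

    # outer() pairs every element of A with every element of B
    outer_dims <- dim(outer(A, B, "*"))
    print(outer_dims)   # 2 3 2 2

    # Matrix multiplication needs conformable shapes: (2 x 2) %*% (2 x 3)
    mat_prod <- B %*% A
    print(mat_prod)
    ```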
  • Convert values of an Object to Logical Vector in R Programming – as.logical() Function

    as.logical() Function in detail

    The as.logical() function in R is used to convert an object to a logical vector.

    Syntax:

    as.logical(x)

    Parameters:

    • x: Numeric or character object.

    Example 1: Basic Example of as.logical() Function in R

    # R Program to convert an object to a logical vector
    
    # Creating a vector
    x <- c(0, 1, 2, -3, 4, NA)
    
    # Calling as.logical() function
    print(as.logical(1))
    print(as.logical("FALSE"))
    print(as.logical(0))
    print(as.logical(x))

    Output:

    [1] TRUE
    [1] FALSE
    [1] FALSE
    [1] FALSE  TRUE  TRUE  TRUE  TRUE    NA

    Example 2: Converting Matrices with as.logical() Function in R

    # R Program to convert matrices to logical vectors
    
    # Creating matrices
    matrix1 <- matrix(c(0, 1, 3, 4), 2, 2)
    matrix2 <- matrix(c(0, 0, 1, -1), 2, 2)
    
    # Calling as.logical() function
    print(as.logical(matrix1))
    print(as.logical(matrix2))

    Output:

    [1] FALSE  TRUE  TRUE  TRUE
    [1] FALSE FALSE  TRUE  TRUE
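
    Character input deserves a closer look. As a sketch of how as.logical() treats strings: only a fixed set of spellings such as "TRUE", "true", "T", "FALSE", "false", and "F" is recognized, and any other string becomes NA, including "yes" and "1". The vector name answers is hypothetical.

    ```r
    # Hypothetical character input mixing recognized and unrecognized spellings
    answers <- c("TRUE", "true", "T", "FALSE", "F", "yes", "1")

    converted <- as.logical(answers)
    print(converted)

    # The number 1 converts to TRUE, but the string "1" does not
    print(as.logical(1))
    print(as.logical("1"))
    ```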
  • Sorting of Arrays in R Programming

    Sorting of Arrays in detail

    A vector is a one-dimensional array whose only dimension is its length. A vector can be created with the c() function by passing the values as arguments. Sorting can be done in either ascending or descending order. Before sorting, a few factors should be considered:

    • Sorting order – Ascending or Descending.
    • Sorting based on multiple criteria – If sorting involves multiple columns, specify the order.
    • Handling missing and duplicate values – Decide whether to remove or replace them, considering the impact on the data.
    Method 1: sort() function

    The sort() function in R is used to sort a vector. By default, it sorts in increasing order. To sort in descending order, set the decreasing parameter to TRUE.

    Syntax:

    sort(vector_name, decreasing = TRUE)

    Parameters:

    • vector_name: The vector to be sorted.
    • decreasing: A logical value; set it to TRUE to sort in descending order (the default is FALSE).

    Example 1: Sorting in Ascending Order

    # Create a vector
    numbers <- c(45, 12, 78, 23, 56, 89, 34)
    
    # Sort in ascending order
    sort(numbers)

    Output:

    [1] 12 23 34 45 56 78 89

    Example 2: Sorting in Descending Order

    # Sort in descending order
    sort(numbers, decreasing = TRUE)

    Output:

    [1] 89 78 56 45 34 23 12
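
    The earlier point about missing and duplicate values can be sketched with sort() itself. By default sort() drops NA values; na.last = TRUE keeps them at the end, and unique() removes duplicates before sorting. The vector scores is made up for this illustration.

    ```r
    # Hypothetical data with missing and duplicate values
    scores <- c(45, NA, 12, 45, 78, NA, 12)

    sorted_default <- sort(scores)                  # NA values are dropped
    sorted_na_last <- sort(scores, na.last = TRUE)  # NA values kept at the end
    sorted_unique  <- sort(unique(scores))          # duplicates removed first

    print(sorted_default)
    print(sorted_na_last)
    print(sorted_unique)
    ```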
    Method 2: order() function

    The order() function returns the positions that would put its argument in sorted order, which makes it the usual way to sort a data frame by a column. To sort a numeric column in descending order, put a negative sign in front of it. Sorting can also use multiple criteria: if two values in the first column are the same, a secondary column breaks the tie (e.g., sorting names alphabetically when ages are the same).

    Example: Sorting a Data Frame by Age

    # Define a data frame
    students <- data.frame("Age" = c(20, 18, 22, 25, 19),
                           "Name" = c("Aria", "Leo", "Sophia", "Daniel", "Mia"))
    
    # Sort the data frame based on the Age column
    sorted_students <- students[order(students$Age), ]
    
    # Print the sorted data frame
    print(sorted_students)

    Output:

      Age   Name
    2  18    Leo
    5  19    Mia
    1  20   Aria
    3  22 Sophia
    4  25 Daniel

    Example 1: Sorting a Vector in Decreasing Order

    # Define vector
    numbers <- c(35, 10, 50, 25, 5, 40)
    
    # Sort in decreasing order and return indices
    order(-numbers)

    Output:

    [1] 3 6 1 4 2 5

    Example 2: Sorting a Data Frame by Multiple Columns

    # Define dataframe
    students <- data.frame("Age" = c(14, 18, 14, 21, 18, 14),
                           "Name" = c("Liam", "Emma", "Noah",
                                      "Olivia", "Ava", "Sophia"))
    
    # Sort the dataframe first by Age, then by Name
    sorted_students <- students[order(students$Age, students$Name), ]
    
    # Print sorted dataframe
    print(sorted_students)

    Output:

      Age   Name
    1  14   Liam
    3  14   Noah
    6  14 Sophia
    5  18    Ava
    2  18   Emma
    4  21 Olivia
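
    Sort directions can also be mixed. The sketch below, which reuses the students data from Example 2, sorts by Age in descending order and breaks ties by Name in ascending order. The minus sign only works for numeric columns; for a character column, one option is the decreasing argument of order(). The result names by_age_desc and name_desc are illustrative.

    ```r
    students <- data.frame(Age = c(14, 18, 14, 21, 18, 14),
                           Name = c("Liam", "Emma", "Noah",
                                    "Olivia", "Ava", "Sophia"))

    # Age descending (negated), Name ascending as the tie-breaker
    by_age_desc <- students[order(-students$Age, students$Name), ]
    print(by_age_desc)

    # Character column in descending order
    name_desc <- students[order(students$Name, decreasing = TRUE), ]
    print(name_desc)
    ```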

    Method 3: Sorting an Array Using a Loop

    # Create linear array
    arr <- c(8, 3, 7, 2, 6, 5, 4, 1)
    
    # Repeat until the array is sorted
    repeat
    {
        swapped <- FALSE
    
        # Iterate through the array
        for (i in 2:length(arr))
        {
            newArr <- arr
            if (arr[i - 1] > arr[i])
            {
                newArr[i - 1] <- arr[i]
                newArr[i] <- arr[i - 1]
                arr <- newArr
                swapped <- TRUE
            }
        }
    
        if (!swapped) {break}
    }
    
    # Print sorted array
    print(arr)

    Output:

    [1] 1 2 3 4 5 6 7 8

    Method 4: Using dplyr Package for Sorting

    # Install and load dplyr package
    install.packages("dplyr")
    library(dplyr)
    
    # Create dataframe
    employees <- data.frame("Age" = c(30, 45, 28, 35, 40),
                            "Name" = c("David", "Alice", "Ethan",
                                       "Olivia", "Sophia"))
    
    # Sort the dataframe by Age using arrange()
    sorted_employees <- arrange(employees, Age)
    
    # Print sorted dataframe
    print(sorted_employees)

    Output:

      Age   Name
    1  28  Ethan
    2  30  David
    3  35 Olivia
    4  40 Sophia
    5  45  Alice

  • Array Operations in R Programming

    Array Operations in detail

    Arrays are R data objects that can store data in more than two dimensions; they are homogeneous, n-dimensional data structures. For example, an array of dimensions (2, 3, 3) consists of three rectangular matrices, each with two rows and three columns.

    To create an array in R, use the function array(). The arguments to this function include a set of elements in vectors and a vector containing the dimensions of the array.

    Syntax:

    Array_NAME <- array(data, dim = c(row_Size, column_Size, matrices), dimnames = dimnames)

    where:

    • data – An input vector given to the array.
    • row_Size – Number of rows in each matrix.
    • column_Size – Number of columns in each matrix.
    • matrices – Number of matrices (the size of the third dimension).
    • dimnames – An optional list of names for the rows, columns, and matrices, replacing the default index labels.

    Example:

    # Create the vectors with different lengths
    vector1 <- c(5, 7, 9)
    vector2 <- c(12, 14, 16, 18, 20, 22)
    
    # Creating an array using these vectors
    result <- array(c(vector1, vector2), dim = c(3, 3, 2))
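    print(result)

    Output:

    , , 1
    
         [,1] [,2] [,3]
    [1,]    5   12   18
    [2,]    7   14   20
    [3,]    9   16   22
    
    , , 2
    
         [,1] [,2] [,3]
    [1,]    5   12   18
    [2,]    7   14   20
    [3,]    9   16   22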
    Naming Columns and Rows

    We can assign names to the rows and columns using dimnames.

    Example:

    # Creating Vectors
    vector1 <- c(2, 4, 6)
    vector2 <- c(8, 10, 12, 14, 16, 18)
    
    # Assigning Names to rows and columns
    column.names <- c("A", "B", "C")
    row.names <- c("X", "Y", "Z")
    matrix.names <- c("Table1", "Table2")
    
    # Creating an array with named dimensions
    result <- array(c(vector1, vector2), dim = c(3, 3, 2),
                    dimnames = list(row.names, column.names, matrix.names))
    print(result)

    Output:

    , , Table1
    
      A  B  C
    X 2  8 14
    Y 4 10 16
    Z 6 12 18
    
    , , Table2
    
      A  B  C
    X 2  8 14
    Y 4 10 16
    Z 6 12 18
    Manipulating Array Elements

    An array consists of multiple dimensions, and operations can be performed by accessing elements.

    Example:

    # Creating vectors
    vector1 <- c(3, 6, 9)
    vector2 <- c(2, 4, 6, 8, 10, 12)
    array1 <- array(c(vector1, vector2), dim = c(3, 3, 2))
    
    # Creating another array
    vector3 <- c(1, 3, 5)
    vector4 <- c(7, 9, 11, 13, 15, 17)
    array2 <- array(c(vector3, vector4), dim = c(3, 3, 2))
    
    # Extracting matrices and adding them
    matrix1 <- array1[,,2]
    matrix2 <- array2[,,2]
    result <- matrix1 + matrix2
    print(result)

    Output:

         [,1] [,2] [,3]
    [1,]    4    9   21
    [2,]    9   13   25
    [3,]   14   17   29
    Accessing Array Elements in R

    Using index positions in a matrix, any element can be accessed easily. Additionally, elements in an array can be modified using index positions.

    Syntax:

    Array_Name[row_position, Column_Position, Matrix_Level]

    Example:

    # Creating Vectors
    vector1 <- c(5, 8, 2)
    vector2 <- c(14, 7, 9, 6, 11, 3)
    
    # Defining names
    column.names <- c("Col1", "Col2", "Col3")
    row.names <- c("Row1", "Row2", "Row3")
    matrix.names <- c("Matrix1", "Matrix2")
    
    # Creating an array
    result <- array(c(vector1, vector2), dim = c(3, 3, 2),
                    dimnames = list(row.names, column.names, matrix.names))
    
    print(result)
    
    # Print second row of first matrix
    print(result[2,,1])

    Output:

    , , Matrix1
    
         Col1 Col2 Col3
    Row1    5   14    6
    Row2    8    7   11
    Row3    2    9    3
    
    , , Matrix2
    
         Col1 Col2 Col3
    Row1    5   14    6
    Row2    8    7   11
    Row3    2    9    3
    
    Col1 Col2 Col3
       8    7   11
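
    The text above also mentions modifying elements through index positions. A minimal sketch of that, rebuilding the same named array and assigning by position, by name, and to a whole column:

    ```r
    result <- array(c(5, 8, 2, 14, 7, 9, 6, 11, 3), dim = c(3, 3, 2),
                    dimnames = list(c("Row1", "Row2", "Row3"),
                                    c("Col1", "Col2", "Col3"),
                                    c("Matrix1", "Matrix2")))

    # Modify a single element by position: row 2, column 3, matrix 1
    result[2, 3, 1] <- 100

    # Modify a single element by name
    result["Row1", "Col1", "Matrix2"] <- 0

    # Replace an entire column of the second matrix
    result[, "Col2", "Matrix2"] <- c(1, 2, 3)

    print(result)
    ```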
    Performing Calculations Across Array Elements

    The apply() function is used for performing calculations on array elements.

    Syntax:

    apply(x, margin, fun)

    where:

    • x – an array.
    • margin – the dimension(s) to keep: 1 for rows, 2 for columns, 3 for matrices, or a vector such as c(1, 2).
    • fun – function to be applied to the elements of the array.

    Example:

    # Creating vectors
    vector1 <- c(4, 3, 7)
    vector2 <- c(1, 5, 6, 9, 2, 8)
    
    # Creating an array
    new_array <- array(c(vector1, vector2), dim = c(3, 3, 2))
    
    print(new_array)
    
    # Calculate sum of columns in matrices
    result <- apply(new_array, c(2), sum)
    
    print(result)

    Output:

    , , 1
    
         [,1] [,2] [,3]
    [1,]    4    1    9
    [2,]    3    5    2
    [3,]    7    6    8
    
    , , 2
    
         [,1] [,2] [,3]
    [1,]    4    1    9
    [2,]    3    5    2
    [3,]    7    6    8
    
    [1] 28 24 38
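
    The margin argument accepts other dimensions as well. As a sketch using the same array, the block below sums over rows (margin 1), over each matrix (margin 3), and over each row-column cell across matrices (margin c(1, 2)). The result names are illustrative.

    ```r
    new_array <- array(c(4, 3, 7, 1, 5, 6, 9, 2, 8), dim = c(3, 3, 2))

    row_sums    <- apply(new_array, 1, sum)        # one total per row
    matrix_sums <- apply(new_array, 3, sum)        # one total per matrix
    cell_sums   <- apply(new_array, c(1, 2), sum)  # 3 x 3 matrix of per-cell totals

    print(row_sums)      # 28 20 42
    print(matrix_sums)   # 45 45
    print(cell_sums)
    ```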

  • Multidimensional Array in R

    Multidimensional Array in detail

    Arrays in R are data objects that store data in more than two dimensions. For example, if we create an array of dimensions (2, 3, 4), it forms 4 rectangular matrices, each containing 2 rows and 3 columns. These types of arrays are called Multidimensional Arrays.

    Creating a Multidimensional Array

    An array is created using the array() function. It takes vectors as input and uses the values in the dim parameter to define the number of dimensions.

    Syntax:

    MArray = array(c(vec1, vec2), dim)

    Example:

    # Create two vectors
    vector1 <- c(2, 4, 6)
    vector2 <- c(8, 10, 12, 14, 16, 18)
    
    # Create an array from these vectors
    result <- array(c(vector1, vector2), dim = c(3, 3, 2))
    
    # Print the array
    print(result)

    Output:

    , , 1
    
         [,1] [,2] [,3]
    [1,]    2    8   14
    [2,]    4   10   16
    [3,]    6   12   18
    
    , , 2
    
         [,1] [,2] [,3]
    [1,]    2    8   14
    [2,]    4   10   16
    [3,]    6   12   18
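
    Both matrices in the output are identical because the two vectors together supply only 9 values, while a 3 x 3 x 2 array needs 18; array() recycles the data to fill the remaining cells. A short sketch that checks this (the names values and full are illustrative):

    ```r
    vector1 <- c(2, 4, 6)
    vector2 <- c(8, 10, 12, 14, 16, 18)
    values <- c(vector1, vector2)

    result <- array(values, dim = c(3, 3, 2))
    print(length(values))    # 9 values supplied
    print(length(result))    # 18 cells in the array

    # The second matrix repeats the first because the data were recycled
    print(identical(result[, , 1], result[, , 2]))

    # Supplying all 18 values gives two different matrices
    full <- array(1:18, dim = c(3, 3, 2))
    print(full[, , 2])
    ```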
    Naming Columns and Rows

    We can assign names to rows, columns, and matrices in the array using the dimnames parameter.

    Example:

    # Create two vectors
    vector1 <- c(2, 4, 6)
    vector2 <- c(8, 10, 12, 14, 16, 18)
    
    # Define names for rows, columns, and matrices
    column.names <- c("Col_A", "Col_B", "Col_C")
    row.names <- c("Row_1", "Row_2", "Row_3")
    matrix.names <- c("Matrix_A", "Matrix_B")
    
    # Create an array with names
    result <- array(c(vector1, vector2), dim = c(3, 3, 2),
                    dimnames = list(row.names, column.names, matrix.names))
    
    # Print the array
    print(result)

    Output:

    , , Matrix_A
    
          Col_A Col_B Col_C
    Row_1     2     8    14
    Row_2     4    10    16
    Row_3     6    12    18
    
    , , Matrix_B
    
          Col_A Col_B Col_C
    Row_1     2     8    14
    Row_2     4    10    16
    Row_3     6    12    18
    Manipulating Array Elements

    Since arrays consist of matrices in multiple dimensions, operations can be performed by accessing individual matrix elements.

    Example:

    # Create first array
    vector1 <- c(2, 4, 6)
    vector2 <- c(8, 10, 12, 14, 16, 18)
    array1 <- array(c(vector1, vector2), dim = c(3, 3, 2))
    
    # Create second array
    vector3 <- c(1, 3, 5)
    vector4 <- c(7, 9, 11, 13, 15, 17)
    array2 <- array(c(vector3, vector4), dim = c(3, 3, 2))
    
    # Extract second matrices from both arrays
    matrix1 <- array1[,,2]
    matrix2 <- array2[,,2]
    
    # Add the matrices
    result <- matrix1 + matrix2
    print(result)

    Output:

         [,1] [,2] [,3]
    [1,]    3   15   27
    [2,]    7   19   31
    [3,]   11   23   35
  • Introduction to Arrays

    R – Array

    Arrays are fundamental data storage structures defined with a specific number of dimensions. They are used to allocate space in contiguous memory locations.

    In R Programming, one-dimensional arrays are called vectors, where their single dimension is their length. Two-dimensional arrays are referred to as matrices, which consist of a defined number of rows and columns. Arrays in R hold elements of the same data type. Vectors serve as inputs to create arrays, specifying their dimensions.

    Creating an Array

    In R, arrays can be created using the array() function. The function takes a vector of elements and a vector of dimensions as inputs to create the desired array.

    Syntax:

    array(data, dim = c(nrow, ncol, nmat), dimnames = names)

    Components:

    • nrow: Number of rows.
    • ncol: Number of columns.
    • nmat: Number of matrices with dimensions nrow * ncol.
    • dimnames: Defaults to NULL. Alternatively, a list can be provided containing names for each component of the array dimensions.
    Uni-Dimensional Array

    A vector, a one-dimensional array, has its length as its dimension. It can be created using the c() function.

    Example:

    vec <- c(10, 20, 30, 40, 50)
    print(vec)
    
    # Displaying the length of the vector
    cat("Length of the vector: ", length(vec))

    Output:

    [1] 10 20 30 40 50
    Length of the vector:  5
    Multi-Dimensional Array

    A multi-dimensional array extends the matrix (a two-dimensional array of rows and columns holding a single data type) with one or more additional dimensions. Such arrays are created using the array() function.

    Example:

    # Create a 2 x 3 x 2 array with values from 15 to 26
    mat <- array(15:26, dim = c(2, 3, 2))
    print(mat)

    Output:

    , , 1
         [,1] [,2] [,3]
    [1,]   15   17   19
    [2,]   16   18   20
    
    , , 2
         [,1] [,2] [,3]
    [1,]   21   23   25
    [2,]   22   24   26
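
    The relationship between vectors, matrices, and arrays can be checked with dim(), which returns NULL for a plain vector and one length per dimension otherwise. A brief sketch (the variable names are illustrative):

    ```r
    vec   <- c(10, 20, 30)
    mat2d <- array(1:6, dim = c(2, 3))
    arr3d <- array(15:26, dim = c(2, 3, 2))

    print(dim(vec))           # NULL: a plain vector carries no dim attribute
    print(is.matrix(mat2d))   # TRUE: a two-dimensional array is a matrix
    print(dim(arr3d))         # 2 3 2
    print(length(arr3d))      # 12 elements in total
    ```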
    Naming Array Dimensions

    You can assign names to rows, columns, and matrices using vectors for better readability.

    Example:

    rows <- c("Row1", "Row2")
    columns <- c("Col1", "Col2", "Col3")
    matrices <- c("Matrix1", "Matrix2")
    
    named_array <- array(1:12, dim = c(2, 3, 2),
                         dimnames = list(rows, columns, matrices))
    print(named_array)

    Output:

    , , Matrix1
         Col1 Col2 Col3
    Row1    1    3    5
    Row2    2    4    6
    
    , , Matrix2
         Col1 Col2 Col3
    Row1    7    9   11
    Row2    8   10   12
    Accessing Arrays

    You can access elements of arrays using indices for each dimension. Names or positions can be used.

    Example:

    vec <- c(5, 10, 15, 20, 25)
    cat("Vector:", vec, "\n")
    cat("Second element:", vec[2], "\n")

    Output:

    Vector: 5 10 15 20 25
    Second element: 10
    Accessing Matrices in an Array

    Example:

    rows <- c("A", "B")
    columns <- c("X", "Y", "Z")
    matrices <- c("M1", "M2")
    
    multi_array <- array(1:12, dim = c(2, 3, 2),
                         dimnames = list(rows, columns, matrices))
    
    # Accessing first matrix
    print("Matrix M1")
    print(multi_array[, , "M1"])
    
    # Accessing second matrix by index
    print("Matrix 2")
    print(multi_array[, , 2])

    Output:

    [1] "Matrix M1"
      X Y Z
    A 1 3 5
    B 2 4 6
    [1] "Matrix 2"
      X  Y  Z
    A 7  9 11
    B 8 10 12
    Accessing Specific Rows and Columns

    Example:

    print("First row of Matrix 1")
    print(multi_array[1, , "M1"])
    
    print("Second column of Matrix 2")
    print(multi_array[, 2, 2])

    Output:

    [1] "First row of Matrix 1"
    X Y Z
    1 3 5
    [1] "Second column of Matrix 2"
     A  B
     9 10
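
    Single elements can be picked out the same way, by name or by position. The sketch below also shows drop = FALSE, which keeps the result as an array instead of simplifying it to a vector. The variable names other than multi_array are illustrative.

    ```r
    multi_array <- array(1:12, dim = c(2, 3, 2),
                         dimnames = list(c("A", "B"), c("X", "Y", "Z"),
                                         c("M1", "M2")))

    # One element by name, and one by position
    single_value <- multi_array["A", "Y", "M2"]
    by_position  <- multi_array[2, 3, 1]
    print(single_value)   # 9
    print(by_position)    # 6

    # By default a single row is simplified to a named vector
    row_dropped <- multi_array[1, , "M1"]
    # drop = FALSE keeps all three dimensions
    row_kept <- multi_array[1, , "M1", drop = FALSE]
    print(dim(row_kept))  # 1 3 1
    ```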
    Modifying Arrays

    Adding Elements to Arrays: New elements can be added at specific positions or appended to the array.

    Example:

    vec <- c(1, 2, 3, 4)
    
    # Adding an element using c()
    vec <- c(vec, 5)
    print("After appending an element:")
    print(vec)
    
    # Using append() to add after the 2nd element
    vec <- append(vec, 10, after = 2)
    print("After using append:")
    print(vec)

    Output:

    [1] "After appending an element:"
    [1] 1 2 3 4 5
    [1] "After using append:"
    [1]  1  2 10  3  4  5

    Removing Elements

    Elements can be removed using logical conditions or indices.

    Example:

    vec <- c(1, 2, 3, 4, 5, 6)
    vec <- vec[vec != 4]  # Removing element with value 4
    print(vec)

    Output:

    [1] 1 2 3 5 6
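
    Removal by index uses negative positions. A small sketch follows, together with one pitfall worth knowing: combining a minus sign with which() returns an empty vector when nothing matches, so a logical condition is the safer choice there. The result names are illustrative.

    ```r
    vec <- c(1, 2, 3, 4, 5, 6)

    # Remove the element at position 2
    without_second <- vec[-2]

    # Remove the first and last elements
    without_ends <- vec[-c(1, length(vec))]

    # Pitfall: no element is greater than 10, so which() returns integer(0)
    # and the negative index selects nothing at all
    none_match <- vec[-which(vec > 10)]

    # The logical form keeps every element when nothing matches
    safe <- vec[!(vec > 10)]

    print(without_second)
    print(without_ends)
    print(length(none_match))   # 0
    print(safe)
    ```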
  • Principal Component Analysis with R Programming

    Principal Component Analysis in detail

    Principal Component Analysis (PCA) is a dimensionality-reduction technique that re-expresses the attributes of a dataset as a set of uncorrelated variables called principal components. Principal components are linear combinations of the original predictors, obtained through an orthogonal transformation. PCA is widely used in Exploratory Data Analysis (EDA) as it helps in visualizing the variations present in high-dimensional data.

    Understanding PCA

    The first principal component captures the maximum variance in the dataset and determines the direction of the highest variability. The second principal component captures the largest share of the remaining variance while being uncorrelated with the first component (PC1). Each succeeding component follows the same pattern: it captures as much of the remaining variance as possible while staying uncorrelated with all previous components.
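
    These properties can be checked numerically. The following is a minimal sketch, assuming R's built-in iris data and the base prcomp() function with standardized inputs; the variable names are illustrative. It confirms that the component variances decrease, that the component scores are uncorrelated, and that the variances match the eigenvalues of the correlation matrix.

    ```r
    # PCA on the four numeric iris columns, centered and scaled
    pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

    # Variance captured by each component, in decreasing order
    variances <- pca$sdev^2
    print(variances)

    # Correlations between component scores: off-diagonal entries are ~0
    score_cor <- cor(pca$x)
    print(round(score_cor, 10))

    # With scaled inputs, the variances equal the eigenvalues
    # of the correlation matrix of the original columns
    eigenvalues <- eigen(cor(iris[, 1:4]))$values
    print(all.equal(variances, eigenvalues))
    ```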

    Dataset

    We will use the iris dataset, which is built into R. It contains measurements of sepal length, sepal width, petal length, and petal width for three different species of flowers.

    Installing Required Packages

    install.packages("dplyr")

    Loading the Package and Dataset

    library(dplyr)
    data(iris)
    str(iris)

    Output:

    'data.frame': 150 obs. of  5 variables:
     $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 ...
     $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 ...
     $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 ...
     $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 ...
     $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 ...
    Principal Component Analysis with R language using dataset

    We perform Principal Component Analysis (PCA) on the iris dataset, using its four numeric measurement columns across 150 observations; the non-numeric Species column is dropped first.

    # Load dataset
    data(iris)
    
    # Remove non-numeric column
    iris_numeric <- iris[, -5]
    
    # Apply PCA using prcomp function
    my_pca <- prcomp(iris_numeric, scale. = TRUE, center = TRUE, retx = TRUE)
    
    # View summary
    summary(my_pca)
    
    # View principal component loadings
    my_pca$rotation
    
    # View transformed principal components
    dim(my_pca$x)
    my_pca$x
    
    # Plot the resultant principal components
    biplot(my_pca, main = "Biplot", scale = 0)
    
    # Compute variance and proportion of variance explained
    my_pca.var <- my_pca$sdev^2
    propve <- my_pca.var / sum(my_pca.var)
    
    # Scree plot
    plot(propve, xlab = "Principal Component",
         ylab = "Proportion of Variance Explained",
         ylim = c(0, 1), type = "b", main = "Scree Plot")
    
    # Cumulative variance plot
    plot(cumsum(propve),
         xlab = "Principal Component",
         ylab = "Cumulative Proportion of Variance Explained",
         ylim = c(0, 1), type = "b")
    
    # Find the number of components covering at least 90% variance
    which(cumsum(propve) >= 0.9)[1]
    
    # Prepare data for Decision Tree
    train.data <- data.frame(Sepal.Length = iris$Sepal.Length, my_pca$x[, 1:4])
    
    # Install and load decision tree packages
    install.packages("rpart")
    install.packages("rpart.plot")
    library(rpart)
    library(rpart.plot)
    
    # Build Decision Tree model
    rpart.model <- rpart(Sepal.Length ~ ., data = train.data, method = "anova")
    
    # Plot the Decision Tree
    rpart.plot(rpart.model)

    Output:

    The script draws three plots in addition to the biplot:

    • Variance explained for each principal component (scree plot)
    • Cumulative proportion of variance
    • Decision tree model

  • Describe Parts of a Chart in Graphical Form in R Programming – legend() Function

    legend() Function in detail

    The legend() function in R is used to add legends to an existing plot. A legend is an area within the graph plot that describes the elements of the plot. The legend helps visualize statistical data effectively.

    Syntax:

    legend(x, y, legend, fill, col, bg, lty, cex, title, text.font)

    Parameters:

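    • x and y: Coordinates used to position the legend; x can instead be a keyword such as "topright" or "bottom".
    • legend: Text for the legend.
    • fill: Colors used for filling the boxes in the legend.
    • col: Colors of lines.
    • bg: Background color for the legend box.
    • lty: Line types shown next to the legend entries.
    • cex: Character expansion factor that scales the legend text.
    • title: Optional title for the legend.
    • text.font: Integer specifying the font style of the legend (optional).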

    Returns:

    A legend added to the plot.

    Example 1: Basic Usage of legend()

    # Generate some data
    x <- 1:10
    y1 <- x * x
    y2 <- 2 * y1
    
    # Create a plot with two lines
    plot(x, y1, type = "b", pch = 19, col = "blue", xlab = "X", ylab = "Y")
    lines(x, y2, pch = 22, col = "red", type = "b", lty = 6)
    
    # Add a basic legend
    legend("topright", legend = c("Line A", "Line B"), col = c("blue", "red"), lty = c(1, 6), pch = c(19, 22))

    Example 2: Adding Title, Font, and Background Color to Legend

    makePlot <- function(){
      x <- 1:10
      y1 <- x * x
      y2 <- 2 * y1
      plot(x, y1, type = "b", pch = 19, col = "blue", xlab = "X", ylab = "Y")
      lines(x, y2, pch = 22, col = "red", type = "b", lty = 6)
    }
    makePlot()
    
    # Add a legend with customization
    legend(1, 95, legend = c("Curve A", "Curve B"), col = c("blue", "red"), lty = c(1, 6), pch = c(19, 22), cex = 0.9,
           title = "Graph Types", text.font = 6, bg = "lightgray")

    Example 3: Modifying Legend Box Border

    makePlot <- function(){
      x <- 1:10
      y1 <- x * x
      y2 <- 2 * y1
      plot(x, y1, type = "b", pch = 22, col = "blue", xlab = "X", ylab = "Y")
      lines(x, y2, pch = 18, col = "red", type = "b", lty = 4)
    }
    
    # Change the border of the legend
    makePlot()
    legend(1, 100, legend = c("Curve A", "Curve B"), col = c("blue", "red"), lty = c(1, 4), pch = c(22, 18), cex = 0.8,
           box.lty = 4, box.lwd = 2, box.col = "blue")

    Example 4: Removing the Legend Border

    makePlot <- function(){
      x <- 1:10
      y1 <- x * x
      y2 <- 2 * y1
      plot(x, y1, type = "b", pch = 22, col = "blue", xlab = "X", ylab = "Y")
      lines(x, y2, pch = 18, col = "red", type = "b", lty = 4)
    }
    
    # Remove legend border using box.lty = 0
    makePlot()
    legend(2, 100, legend = c("Curve A", "Curve B"), col = c("blue", "red"), lty = c(1, 4), pch = c(22, 18), cex = 0.8, box.lty = 0)

    Example 5: Creating a Horizontal Legend with Different Symbols

    makePlot()
    
    # Add a horizontal legend with different symbols
    legend("bottom", legend = c("Curve A", "Curve B"), col = c("blue", "red"), lty = c(1, 4), pch = c(22, 18), horiz = TRUE)
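
    Two further options are worth a brief sketch: inset nudges a keyword-positioned legend away from the plot border, and bty = "n" is another way to suppress the legend box. legend() also invisibly returns the coordinates of the box and text it drew, which the name legend_info (illustrative) captures here. The ylim value is an assumption made so that both curves fit in the plot.

    ```r
    x <- 1:10
    plot(x, x^2, type = "b", pch = 19, col = "blue",
         xlab = "X", ylab = "Y", ylim = c(0, 200))
    lines(x, 2 * x^2, type = "b", pch = 22, col = "red", lty = 2)

    # Keyword position, small inset from the border, and no box
    legend_info <- legend("topleft", inset = 0.02,
                          legend = c("Curve A", "Curve B"),
                          col = c("blue", "red"),
                          lty = c(1, 2), pch = c(19, 22), bty = "n")

    # Components returned by legend(): the box ("rect") and the labels ("text")
    print(names(legend_info))
    ```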

  • Draw a Quantile-Quantile Plot in R Programming

    Draw a Quantile-Quantile Plot in detail

    A Quantile-Quantile (Q-Q) plot is a graphical tool used to compare two probability distributions by plotting their quantiles against each other. Often, it is employed to compare the distribution of observed data with a theoretical distribution (for example, the normal distribution).
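
    To make "plotting quantiles against each other" concrete, here is a minimal sketch that builds the coordinates of a normal Q-Q plot by hand and compares them with what base R's qqnorm() computes. The seed, sample size, and variable names are assumptions made for this illustration.

    ```r
    set.seed(42)                      # assumed seed, for reproducibility only
    sample_data <- rnorm(50)
    n <- length(sample_data)

    # Theoretical quantiles: normal quantiles at evenly spread probabilities
    theoretical <- qnorm(ppoints(n))

    # Sample quantiles: the observed values in sorted order
    observed <- sort(sample_data)

    # qqnorm() computes the same pairs (plot.it = FALSE suppresses drawing)
    qq <- qqnorm(sample_data, plot.it = FALSE)
    print(all.equal(sort(qq$x), theoretical))
    print(all.equal(sort(qq$y), observed))

    # Plotting the hand-built pairs gives the Q-Q plot
    plot(theoretical, observed,
         xlab = "Theoretical Quantiles", ylab = "Sample Quantiles")
    ```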

    When to Use Q-Q Plots in R

    Q-Q plots are useful in statistical analysis to:

    • Assess Normality: Check if a dataset follows a normal distribution, which is a common assumption in many tests.
    • Detect Skewness or Kurtosis: Identify whether data have heavy tails or skewed shapes.
    • Compare Distributions: Evaluate if two datasets originate from the same distribution.
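
    The last use case, comparing two datasets, can be sketched with base R's qqplot(), which pairs the sorted values of two samples. The simulated groups, the seed, and the variable names are assumptions for illustration; points falling along the line y = x would suggest similar distributions, while a consistent offset suggests a shift.

    ```r
    set.seed(1)                                 # assumed seed
    group_a <- rnorm(100, mean = 0, sd = 1)     # hypothetical sample A
    group_b <- rnorm(100, mean = 0.5, sd = 1)   # hypothetical sample B, shifted

    # Compute the quantile pairs without drawing
    qq_pairs <- qqplot(group_a, group_b, plot.it = FALSE)

    # Draw the two-sample Q-Q plot with the equality line
    plot(qq_pairs$x, qq_pairs$y,
         xlab = "Quantiles of group A", ylab = "Quantiles of group B",
         main = "Two-Sample Q-Q Plot")
    abline(0, 1, col = "blue")
    ```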
    Setting Up R for Q-Q Plotting

    Before creating Q-Q plots, ensure that the necessary packages are installed and loaded. While base R provides built-in Q-Q plotting functions, the ggplot2 package allows for enhanced customization.

    # Install and load ggplot2 if needed
    install.packages("ggplot2")
    library(ggplot2)
    1. Creating a Basic Q-Q Plot Using Base R

    The qqnorm() function in base R is a straightforward method to produce a Q-Q plot against the normal distribution.

    Example

    # Generate sample data from a normal distribution (150 values)
    normal_data <- rnorm(150, mean = 0, sd = 1)
    
    # Create a Q-Q plot using base R
    qqnorm(normal_data, main = "Normal Q-Q Plot (Base R)")
    qqline(normal_data, col = "blue")  # Adds a reference line

    2. Creating a Basic Q-Q Plot Using ggplot2

    For a more refined and customizable plot, ggplot2 offers an excellent alternative.

    Example

    # Create a data frame containing the sample data
    df <- data.frame(value = normal_data)
    
    # Generate a Q-Q plot using ggplot2
    ggplot(df, aes(sample = value)) +
      stat_qq(color = "darkgreen") +
      stat_qq_line(color = "blue") +
      theme_classic() +
      ggtitle("Normal Q-Q Plot using ggplot2")

    3. Q-Q Plots for Other Distributions

    A. Exponential Distribution

    To compare your data with an exponential distribution, you can generate the theoretical quantiles using qexp() and then create a Q-Q plot with qqplot().

    Example

    # Generate sample data from an exponential distribution (120 values, rate = 1)
    exp_data <- rexp(120, rate = 1)
    
    # Create a Q-Q plot comparing the exponential sample to theoretical quantiles
    qqplot(qexp(ppoints(120), rate = 1), exp_data,
           main = "Exponential Q-Q Plot",
           xlab = "Theoretical Quantiles",
           ylab = "Sample Quantiles")
    abline(0, 1, col = "blue")  # Reference line for comparison

    B. t-Distribution

    Similarly, to check if data follow a t-distribution, use the qt() function for generating theoretical quantiles.

    Example

    # Generate sample data from a t-distribution (150 values, df = 7)
    t_data <- rt(150, df = 7)
    
    # Create a Q-Q plot comparing the t-distributed sample to theoretical quantiles
    qqplot(qt(ppoints(150), df = 7), t_data,
           main = "t-Distribution Q-Q Plot",
           xlab = "Theoretical Quantiles",
           ylab = "Sample Quantiles")
    abline(0, 1, col = "red")  # Adds the equality line

  • R – Waffle Chart

    Waffle Chart in detail

    A waffle chart provides a clear visual representation of how individual components contribute to a whole. It’s particularly useful for tracking progress toward a goal or for showing parts-to-whole relationships. Unlike pie charts, waffle charts use a grid of equal-sized squares to represent data without distorting the proportions.

    Implementation in R

    We’ll use ggplot2 for its versatile and elegant plotting capabilities along with the waffle package, an extension that simplifies the creation of waffle charts.

    Installing Required Packages

    To install the necessary packages in RStudio, run:

    install.packages("ggplot2")
    install.packages("waffle")

    Loading the Libraries

    Load the libraries with:

    library(ggplot2)
    library(waffle)

    Example: Company Expense Breakdown

    Suppose a company has a total expenditure of $100,000, divided into the following categories:

    • Salaries: $40,000
    • Marketing: $20,000
    • Operations: $15,000
    • Research & Development: $10,000
    • Miscellaneous: $15,000

    We can represent this data as a vector in R:
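
    A minimal sketch of that vector, assuming the name expenses (which the plotting call further down refers to) and using the category names as element names so that they can serve as legend labels:

    ```r
    # Named numeric vector: one entry per expense category, in dollars
    expenses <- c(Salaries = 40000,
                  Marketing = 20000,
                  Operations = 15000,
                  `Research & Development` = 10000,
                  Miscellaneous = 15000)

    print(expenses)
    print(sum(expenses))   # total expenditure: 1e+05, i.e. $100,000
    ```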

    Since we want each square to represent $1,000, dividing each value by 1,000 will give us exactly 100 squares (40 + 20 + 15 + 10 + 15).

    Plotting the Waffle Chart

    Use the following code to create the waffle chart:

    waffle(expenses/1000, rows = 5, size = 0.6,
           colors = c("#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd"),
           title = "Company Expense Breakdown",
           xlab = "1 square = $1,000")