  • R – Pie Charts

    R – Pie Charts in detail

    A pie chart is a circular statistical graphic divided into slices to illustrate numerical proportions. Each sector (or slice) represents the relative size of one category in the data. It is also known as a circle graph, because a circle is cut into segments that describe relative frequencies or magnitudes.

    The R programming language provides the pie() function to create pie charts. It takes a vector of non-negative numbers as input.

    Syntax:

    pie(x, labels, radius, main, col, clockwise)

    Parameters:

    • x: A vector containing numeric values used in the pie chart.
    • labels: Descriptions for the slices in the pie chart.
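    • radius: Defines the radius of the circle. The pie is drawn centered in a square box whose sides range from -1 to +1, so a smaller radius leaves more room for long labels.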
    • main: Title of the pie chart.
    • clockwise: Logical value indicating whether slices are drawn clockwise or counterclockwise.
    • col: Specifies colors for the pie slices.

    Creating a Simple Pie Chart

    By using the above parameters, we can create a basic pie chart with labels.

    Example:

    # Create data for the graph
    values <- c(30, 50, 40, 60)
    labels <- c("Apple", "Banana", "Grapes", "Mango")
    
    # Plot the chart
    pie(values, labels)
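
    Slice labels often read better with percentages attached. The following is a minimal sketch rather than part of the original example: it reuses the fruit data above, and the names fruit, pct, and slice_labels are illustrative choices.

```r
# Same data as the simple example above
values <- c(30, 50, 40, 60)
fruit  <- c("Apple", "Banana", "Grapes", "Mango")

# Share of each slice as a percentage, rounded to one decimal place
pct <- round(100 * values / sum(values), 1)

# Combine the category name and its share into one label per slice
slice_labels <- paste0(fruit, " (", pct, "%)")

# Draw the slices clockwise and slightly larger than the default radius
pie(values, labels = slice_labels, clockwise = TRUE, radius = 0.9,
    main = "Fruit Share")
```

    Computing the labels before calling pie() keeps the plotting call short and makes the percentages easy to check on their own.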

    Pie Chart with Title and Colors

    We can enhance the pie chart by adding a title and colors using the col parameter.

    Example:

    # Create data for the graph
    values <- c(25, 45, 35, 55)
    labels <- c("New York", "London", "Tokyo", "Sydney")
    
    # Plot the chart with title and rainbow color palette
    pie(values, labels, main = "City Pie Chart",
        col = rainbow(length(values)))

    Pie Chart with Color Palettes

    The RColorBrewer package provides ready-made color palettes that can be passed to the col parameter.

    # Load necessary library
    library(RColorBrewer)
    
    # Create data for the graph
    sales <- c(40, 60, 30, 50)
    cities <- c("New York", "Los Angeles", "Chicago", "Houston")
    
    # Assign colors using brewer.pal
    colors <- brewer.pal(length(sales), "Set2")
    
    # Plot the pie chart
    pie(sales, labels = cities, col = colors)

    Modify Border Line Type

    The lty argument changes the line type of the slice borders (for example, lty = 2 draws dashed borders).

    # Load necessary library
    library(RColorBrewer)
    
    # Create data for the graph
    sales <- c(40, 60, 30, 50)
    cities <- c("New York", "Los Angeles", "Chicago", "Houston")
    
    # Assign colors using brewer.pal
    colors <- brewer.pal(length(sales), "Set2")
    
    # Plot the pie chart with modified border type
    pie(sales, labels = cities, col = colors, lty = 2)

    Add Shading Lines

    The density and angle arguments fill the slices with shading lines: density sets the number of lines per inch and angle sets their slope in degrees.

    # Load necessary library
    library(RColorBrewer)
    
    # Create data for the graph
    sales <- c(40, 60, 30, 50)
    cities <- c("New York", "Los Angeles", "Chicago", "Houston")
    
    # Assign colors using brewer.pal
    colors <- brewer.pal(length(sales), "Set2")
    
    # Plot the pie chart with shading lines
    pie(sales, labels = cities, col = colors, density = 50, angle = 45)

    3D Pie Chart

    The pie3D() function from the plotrix package draws a 3D pie chart.

    # Load necessary library
    library(plotrix)
    
    # Create data for the graph
    sales <- c(40, 60, 30, 50)
    cities <- c("New York", "Los Angeles", "Chicago", "Houston")
    
    # Calculate percentages
    sales_percent <- round(100 * sales / sum(sales), 1)
    
    # Plot the 3D pie chart
    pie3D(sales, labels = sales_percent,
          main = "Sales Distribution", col = rainbow(length(sales)))
    
    # Add a legend
    legend("topright", cities, cex = 0.5, fill = rainbow(length(sales)))

  • Histograms in R language

    Histograms in detail

    A histogram is a graphical representation of statistical data that groups data points into specified ranges. The rectangular bars in a histogram represent frequencies, with their heights proportional to the frequency of values in each range. Unlike bar graphs, histograms do not have gaps between bars.

    Creating Histograms in R

    Histograms in R can be created using the hist() function.

    Syntax:

    hist(v, main, xlab, xlim, ylim, breaks, col, border)

    Parameters:

    • v: Numeric values used to create the histogram.
    • main: Title of the chart.
    • col: Color of the bars.
    • xlab: Label for the horizontal axis.
    • border: Color of the bar borders.
    • xlim: Range of values on the x-axis.
    • ylim: Range of values on the y-axis.
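    • breaks: Controls the bins. A single number suggests how many bins to use, while a numeric vector gives the exact break points between bins.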
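
    To see what breaks does, the sketch below computes the bins without drawing anything by passing plot = FALSE. The bin edges and the names h_suggested and h_exact are chosen only for illustration.

```r
values <- c(10, 25, 15, 8, 20, 18, 30, 12, 22, 28, 35)

# A single number is only a suggestion; R picks "pretty" break points near it
h_suggested <- hist(values, breaks = 5, plot = FALSE)

# A numeric vector gives the exact bin edges: 5, 15, 25, 35
h_exact <- hist(values, breaks = seq(5, 35, by = 10), plot = FALSE)

h_exact$breaks   # bin edges
h_exact$counts   # number of values that fall in each bin
h_exact$mids     # bin midpoints, handy for placing labels with text()
```

    The object returned by hist() is the same one that Example 3 below stores in hist_data to position the count labels.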

    Example 1: Creating a Simple Histogram

    # Creating data for the graph
    values <- c(10, 25, 15, 8, 20, 18, 30, 12, 22, 28, 35)
    
    # Creating the histogram
    hist(values, xlab = "Frequency of Items",
         col = "blue", border = "black")

    Example 2: Setting X and Y Ranges

    # Creating data for the graph
    values <- c(10, 25, 15, 8, 20, 18, 30, 12, 22, 28, 35)
    
    # Creating the histogram
    hist(values, xlab = "Frequency of Items", col = "blue",
        border = "black", xlim = c(0, 40),
        ylim = c(0, 5), breaks = 5)

    Example 3: Adding Labels Using text()

    # Creating data for the graph
    values <- c(10, 25, 15, 8, 20, 18, 30, 12, 22, 28, 35, 110, 50, 80, 95)
    
    # Creating the histogram
    hist_data <- hist(values, xlab = "Weight", ylab = "Frequency",
                      col = "purple", border = "black",
                      breaks = 5)
    
    # Adding labels
    text(hist_data$mids, hist_data$counts, labels = hist_data$counts,
         adj = c(0.5, -0.5))

    Example 4: Histogram with Non-Uniform Width

    # Creating data for the graph
    values <- c(10, 25, 15, 8, 20, 18, 30, 12, 22, 28, 35, 110, 50, 80, 95)
    
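    # Creating the histogram
    # With unequal bin widths, hist() plots densities instead of counts,
    # so the y-axis is labeled "Density"
    hist(values, xlab = "Weight", ylab = "Density",
         xlim = c(10, 120),
         col = "purple", border = "black",
         breaks = c(5, 55, 60, 70, 75, 80, 100, 140))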

  • Addition of Lines to a Plot in R Programming – lines() Function

    lines() Function in detail

    The lines() function in R is used to add lines of different types, colors, and widths to an existing plot.

    Syntax:

    lines(x, y, col, lwd, lty)

    Parameters:

    • x, y: Vectors of coordinates
    • col: Color of the line
    • lwd: Width of the line
    • lty: Type of line

    Adding Lines to a Plot using lines() Function

    Example 1: Adding a Line to a Scatter Plot

    This example demonstrates how to create a scatter plot and add a line to it.

    # Creating coordinate vectors
    x <- c(2.1, 4.2, 1.5, -2.8, 6.3,
           3.1, 4.0, 2.8, 2.6, 2.2, 2.0, 2.8)
    y <- c(3.2, 6.5, 2.8, -2.5, 10.5, 4.8,
           5.9, 5.1, 3.9, 3.2, 3.4, 4.8)
    
    # Plotting the scatter plot
    plot(x, y, cex = 1, pch = 3, xlab = "X-axis",
         ylab = "Y-axis", col = "black")
    
    # Creating another set of coordinates for the line
    x2 <- c(3.5, 1.0, -1.8, 0.2)
    y2 <- c(4.0, 5.2, 3.0, 3.5)
    
    # Adding a red line to the plot
    lines(x2, y2, col = "red", lwd = 2, lty = 1)

    Example 2: Connecting Points with lines()

    This example shows how to plot a scatter plot and connect the points using lines().

    # Creating coordinate vectors
    x <- c(2.1, 4.2, 1.5, -2.8, 6.3, 3.1,
           4.0, 2.8, 2.6, 2.2, 2.0, 2.8)
    y <- c(3.2, 6.5, 2.8, -2.5, 10.5, 4.8,
           5.9, 5.1, 3.9, 3.2, 3.4, 4.8)
    
    # Plotting the scatter plot
    plot(x, y, cex = 1, pch = 3, xlab = "X-axis",
         ylab = "Y-axis", col = "black")
    
    # Connecting points with a red line
    lines(x, y, col = "red")
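
    Because lines() joins the points in the order they appear in the vectors, unsorted x values produce a zigzag. If a left-to-right line is wanted, one option is to sort the points first. This is a minimal sketch using the same data; the name ord is illustrative.

```r
x <- c(2.1, 4.2, 1.5, -2.8, 6.3, 3.1,
       4.0, 2.8, 2.6, 2.2, 2.0, 2.8)
y <- c(3.2, 6.5, 2.8, -2.5, 10.5, 4.8,
       5.9, 5.1, 3.9, 3.2, 3.4, 4.8)

# order() returns the positions that sort x in increasing order
ord <- order(x)

plot(x, y, pch = 3, xlab = "X-axis", ylab = "Y-axis")

# Reorder both vectors with the same index so each (x, y) pair stays together
lines(x[ord], y[ord], col = "red")
```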

    Example 3: Combining abline() and lines() on One Plot

    # Create sample data
    x <- seq(-5, 5, length.out = 10)
    y <- x^3
    
    # Create a plot of the data
    plot(x, y, main = "Adding Lines to a Plot", col = "blue")
    
    # Add a vertical line at x = 0
    abline(v = 0, col = "green", lwd = 2)
    
    # Add a horizontal line at y = 0
    abline(h = 0, col = "purple", lwd = 2)
    
    # Add a diagonal line with slope -2 and intercept 3
    abline(a = 3, b = -2, col = "orange", lty = 2, lwd = 2)
    
    # Add a custom line using lines() function
    x2 <- seq(-5, 5, length.out = 10)
    y2 <- -x2^2 + 4
    lines(x2, y2, col = "red", lty = 2, lwd = 2)

  • Adding Straight Lines to a Plot in R Programming – abline() Function

    abline() Function in detail

    The abline() function in R is used to add one or more straight lines to a graph. It can be used to add vertical, horizontal, or regression lines to a plot.

    Syntax:

    abline(a=NULL, b=NULL, h=NULL, v=NULL, ...)

    Parameters:

    • a, b: Specifies the intercept and the slope of the line.
    • h: Specifies y-value(s) for horizontal line(s).
    • v: Specifies x-value(s) for vertical line(s).

    Returns:

    A straight line in the plot.
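
    Since h and v accept vectors, a single call can draw several reference lines, and a and b can be taken from a fitted model. The sketch below uses the built-in pressure dataset; the grid positions and the names fit and cf are illustrative.

```r
plot(pressure)

# Several horizontal and vertical reference lines in one call
abline(h = c(200, 400, 600), v = c(100, 200, 300),
       col = "gray", lty = 3)

# Intercept (a) and slope (b) taken from a fitted linear model
fit <- lm(pressure ~ temperature, data = pressure)
cf  <- coef(fit)
abline(a = cf[["(Intercept)"]], b = cf[["temperature"]],
       col = "darkgreen", lwd = 2)
```

    Passing the model object directly, as in abline(fit), draws the same regression line with less typing.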

    Example 1: Adding a Vertical Line to the Plot

    # Create scatter plot
    plot(pressure)
    
    # Add vertical line at x = 200
    abline(v = 200, col = "blue")

    Example 2: Adding a Horizontal Line to the Plot

    # Create scatter plot
    plot(pressure)
    
    # Add horizontal line at y = 300
    abline(h = 300, col = "red")

    Example 3: Adding a Regression Line

    par(mgp = c(2, 1, 0), mar = c(3, 3, 1, 1))
    
    # Fit regression line
    reg <- lm(pressure ~ temperature, data = pressure)
    coeff <- coefficients(reg)
    
    # Equation of the line
    eq <- paste0("y = ", round(coeff[1], 1), " + ", round(coeff[2], 1), "*x")
    
    # Plot
    plot(pressure, main = eq)
    abline(reg, col = "darkgreen")

  • R – Line Graphs

    R – Line Graphs in detail

    A line graph is a chart that displays information as a series of data points connected by line segments. It is most often used to show how a value changes over time. A line graph is created by plotting each point at its X and Y coordinates and joining the points in order from first to last, so the line rises and falls with the variable being measured.

    Creating Line Graphs in R

    The plot() function in R is used to create line graphs.

    Syntax:

    plot(v, type, col, xlab, ylab, main)

    Note: A bar plot, by contrast, represents the values in a data vector as the heights of bars and is drawn with barplot() rather than plot(). The data vector is mapped to the y-axis, and categories can be labeled on the x-axis. A bar chart can also resemble a histogram when it is built from the output of table() instead of a raw data vector.

    Parameters:

    • v: A numeric vector representing the data points.
    • type: Specifies the type of graph:
      • "p" : Draws only points.
      • "l" : Draws only lines.
      • "o" : Draws both points and lines.
    • xlab: Label for the X-axis.
    • ylab: Label for the Y-axis.
    • main: Title of the chart.
    • col: Specifies colors for the points and lines.
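
    The three type values listed above can be compared side by side. This is a small sketch with made-up sales figures; type_notes, tp, and old_par are illustrative names.

```r
sales <- c(10, 15, 22, 18, 30)
type_notes <- c(p = "points only", l = "lines only", o = "points and lines")

old_par <- par(mfrow = c(1, 3))   # three panels in one row
for (tp in names(type_notes)) {
  plot(sales, type = tp, xlab = "Month", ylab = "Sales",
       main = paste0("type = \"", tp, "\""), sub = type_notes[[tp]])
}
par(old_par)                      # restore the previous layout
```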

    Example 1: Creating a Simple Line Graph

    This example creates a simple line graph using the type = "o" parameter to show both points and lines.

    Code:

    # Create the data for the chart.
    sales <- c(10, 15, 22, 18, 30)
    
    # Plot the line graph.
    plot(sales, type = "o")

    Example 2: Adding Title, Color, and Labels in a Line Graph

    To enhance readability, we can add a title, axis labels, and color to the graph.

    Code:

    # Create the data for the chart.
    sales <- c(10, 15, 22, 18, 30)
    
    # Plot the line graph with title and labels.
    plot(sales, type = "o", col = "blue",
        xlab = "Month", ylab = "Sales (in units)",
        main = "Monthly Sales Chart")

    Example 3: Plotting Multiple Lines in a Line Graph

    To compare multiple datasets, we can plot multiple lines on the same graph using the lines() function.

    Code:

    # Create the data for the chart (two illustrative series).
    sales_a <- c(10, 15, 22, 18, 30)
    sales_b <- c(14, 12, 25, 24, 28)
    
    # Output to be saved as PNG file
    png(file = "linegraph.png")
    
    # Plot the first series; ylim leaves room for both series
    plot(sales_a, type = "o", col = "blue", ylim = c(0, 35),
         xlab = "Month", ylab = "Sales (in units)",
         main = "Monthly Sales Comparison")
    
    # Add the second series to the same graph
    lines(sales_b, type = "o", col = "red")
    
    # Saving the file
    dev.off()

  • Data visualization with R and ggplot2

    Data visualization with ggplot2 in detail

    ggplot2 is a free, open-source visualization package for the R programming language that implements the Grammar of Graphics. Created by Hadley Wickham, it is one of the most widely used tools for data visualization in R.

    Key Layers of ggplot2

    The ggplot2 package operates on several layers, which include:

    1. Data: The dataset used for visualization.
    2. Aesthetics: Mapping data attributes to visual properties such as x-axis, y-axis, color, fill, size, labels, alpha, shape, line width, and line type.
    3. Geometric Objects: How data is represented visually, such as points, lines, histograms, bars, or boxplots.
    4. Facets: Splitting data into subsets displayed in separate panels using rows or columns.
    5. Statistics: Applying transformations like binning, smoothing, or descriptive summaries.
    6. Coordinates: Mapping data points to specific spaces (e.g., Cartesian, fixed, polar) and adjusting limits.
    7. Themes: Customizing non-data elements like font size, background, and color.

    Dataset Used: mtcars

    The mtcars dataset contains fuel consumption and 10 other automobile design and performance attributes for 32 cars. It comes pre-installed with the R environment.
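
    As a rough sketch of how the seven layers listed above fit together, the following plot uses each of them once on mtcars. The particular variables, colors, and the name layered_plot are illustrative choices, and the ggplot2 package is assumed to be installed.

```r
library(ggplot2)

layered_plot <- ggplot(data = mtcars,                                # 1. data
                       aes(x = wt, y = mpg, colour = factor(cyl))) + # 2. aesthetics
  geom_point(size = 2) +                                             # 3. geometric object
  facet_wrap(~ am, labeller = label_both) +                          # 4. facets
  stat_smooth(aes(group = 1), method = "lm", formula = y ~ x,
              se = FALSE, colour = "black") +                        # 5. statistics
  coord_cartesian(ylim = c(5, 35)) +                                 # 6. coordinates
  theme_minimal() +                                                  # 7. theme
  labs(title = "All seven layers in one plot",
       x = "Weight", y = "Miles per Gallon", colour = "Cylinders")

# Printing the object draws the plot
layered_plot
```

    Each layer is added with +, so any one of them can be removed or swapped without touching the rest.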

    Viewing the First Few Records

    # Print the first 6 records of the dataset
    head(mtcars)

    Output:

                       mpg cyl disp  hp drat    wt  qsec vs am gear carb
    Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
    Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
    Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
    Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
    Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
    Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

    Summary Statistics of mtcars

    # summary() is part of base R, so no extra package needs to be loaded
    
    # Summary of the dataset
    summary(mtcars)

    Output:

    Variable    Min   1st Quartile   Median     Mean   3rd Quartile      Max
    mpg        10.4          15.43    19.20    20.09          22.80    33.90
    cyl         4.0            4.0      6.0     6.19            8.0      8.0
    disp       71.1          120.8    196.3    230.7          326.0    472.0
    hp         52.0           96.5    123.0    146.7          180.0    335.0
    drat       2.76           3.08     3.70     3.60           3.92     4.93
    wt         1.51           2.58     3.32     3.22           3.61     5.42
    qsec       14.5          16.89    17.71    17.85          18.90    22.90
    vs          0.0            0.0      0.0     0.44            1.0      1.0
    am          0.0            0.0      0.0     0.41            1.0      1.0
    gear        3.0            3.0      4.0     3.69            4.0      5.0
    carb        1.0            2.0      2.0     2.81            4.0      8.0

    Visualizing Data with ggplot2

    Data Layer: The data layer specifies the dataset to visualize.

    # Load ggplot2 and define the data layer
    library(ggplot2)
    
    ggplot(data = mtcars) +
      labs(title = "Visualization of MTCars Data")

    Aesthetic Layer: Mapping data to visual attributes such as axes, color, or shape.

    # Add aesthetics
    ggplot(data = mtcars, aes(x = hp, y = mpg, col = disp)) +
      labs(title = "Horsepower vs Miles per Gallon")

    Geometric Layer: Adding geometric shapes to display the data.

    # Plot data using points
    plot1 <- ggplot(data = mtcars, aes(x = hp, y = mpg, col = disp)) +
      geom_point() +
      labs(title = "Horsepower vs Miles per Gallon", x = "Horsepower", y = "Miles per Gallon")
    
    # Print the stored plot so that it is displayed
    plot1

    Faceting: Create separate plots for subsets of data.

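    # Facet by transmission type: one row of panels per value of am
    facet_plot <- ggplot(data = mtcars, aes(x = hp, y = mpg, shape = factor(cyl))) +
      geom_point() +
      facet_grid(am ~ .)
    
    # Print the stored plot so that it is displayed
    facet_plot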

    Statistics Layer: The statistics layer in ggplot2 allows you to transform your data by applying methods like binning, smoothing, or descriptive statistics.

    # Scatter plot with a regression line
    ggplot(data = mtcars, aes(x = hp, y = mpg)) +
      geom_point() +
      stat_smooth(method = lm, col = "blue") +
      labs(title = "Relationship Between Horsepower and Miles per Gallon")

    Coordinates Layer: In this layer, data coordinates are mapped to the plot’s visual space. Adjustments to axes, zooming, and proportional scaling of the plot can also be made here.

    # Scatter plot with controlled axis limits
    ggplot(data = mtcars, aes(x = wt, y = mpg)) +
      geom_point() +
      stat_smooth(method = lm, col = "green") +
      scale_y_continuous("Miles per Gallon", limits = c(5, 35), expand = c(0, 0)) +
      scale_x_continuous("Weight", limits = c(1, 6), expand = c(0, 0)) +
      coord_equal() +
      labs(title = "Effect of Weight on Fuel Efficiency")

    Using coord_cartesian() to Zoom In

    # Zoom into specific x-axis and y-axis ranges
    ggplot(data = mtcars, aes(x = wt, y = hp, col = as.factor(am))) +
      geom_point() +
      geom_smooth() +
      coord_cartesian(xlim = c(3, 5), ylim = c(100, 300)) +
      labs(title = "Zoomed View: Horsepower vs Weight",
           x = "Weight",
           y = "Horsepower",
           color = "Transmission")

    Theme Layer: The theme layer in ggplot2 allows fine control over display elements like background color, font size, and overall styling.

    Example 1: Customizing the Background with element_rect()

    ggplot(data = mtcars, aes(x = hp, y = mpg)) +
      geom_point() +
      facet_grid(. ~ cyl) +
      theme(plot.background = element_rect(fill = "lightgray", colour = "black")) +
      labs(title = "Background Customization: Horsepower vs MPG")

    Example 2: Using theme_gray()

    ggplot(data = mtcars, aes(x = hp, y = mpg)) +
      geom_point() +
      facet_grid(am ~ cyl) +
      theme_gray() +
      labs(title = "Default Theme: Horsepower and MPG Facets")

    Contour Plot for the mtcars Dataset: Create a density contour plot to visualize the relationship between two continuous variables.

    # 2D density contour plot
    ggplot(mtcars, aes(x = wt, y = mpg)) +
      stat_density_2d(aes(fill = after_stat(level)), geom = "polygon", color = "black") +
      scale_fill_viridis_c() +
      labs(title = "2D Density Contour: Weight vs MPG",
           x = "Weight",
           y = "Miles per Gallon",
           fill = "Density Levels") +
      theme_minimal()

    Creating a Panel of Plots: Create multiple plots and arrange them in a grid for side-by-side visualization.

    library(gridExtra)
    
    # Histograms for selected variables
    hist_plot_mpg <- ggplot(mtcars, aes(x = mpg)) +
      geom_histogram(binwidth = 2, fill = "steelblue", color = "black") +
      labs(title = "Miles per Gallon Distribution", x = "MPG", y = "Frequency")
    
    hist_plot_disp <- ggplot(mtcars, aes(x = disp)) +
      geom_histogram(binwidth = 50, fill = "darkred", color = "black") +
      labs(title = "Displacement Distribution", x = "Displacement", y = "Frequency")
    
    hist_plot_hp <- ggplot(mtcars, aes(x = hp)) +
      geom_histogram(binwidth = 20, fill = "forestgreen", color = "black") +
      labs(title = "Horsepower Distribution", x = "Horsepower", y = "Frequency")
    
    hist_plot_drat <- ggplot(mtcars, aes(x = drat)) +
      geom_histogram(binwidth = 0.5, fill = "orange", color = "black") +
      labs(title = "Drat Distribution", x = "Drat", y = "Frequency")
    
    # Arrange plots in a 2x2 grid
    grid.arrange(hist_plot_mpg, hist_plot_disp, hist_plot_hp, hist_plot_drat, ncol = 2)

    Saving and Extracting Plots

    To save plots as image files or reuse them later:

    # Create a plot (named hp_mpg_plot to avoid masking the base plot() function)
    hp_mpg_plot <- ggplot(data = mtcars, aes(x = hp, y = mpg)) +
      geom_point() +
      labs(title = "Horsepower vs MPG")
    
    # Save the plot as PNG
    ggsave("horsepower_vs_mpg.png", hp_mpg_plot)
    
    # Save the plot as PDF
    ggsave("horsepower_vs_mpg.pdf", hp_mpg_plot)
    
    # The plot object can be stored under another name and printed again later
    extracted_plot <- hp_mpg_plot
    extracted_plot

  • Data Visualization in R Programming

    Introduction to Data Visualization

    Data Visualization is the process of converting raw data into visual representations such as graphs, charts, and plots so that information can be understood quickly and clearly. Humans understand visuals far more efficiently than tables of numbers, which makes visualization a critical step in data analysis.

    In R, data visualization is one of the strongest features because R was originally designed for statistical computing and graphics. Visualization is used not only to present final results, but also to explore data and identify trends, patterns, anomalies, and relationships before applying models.

    Why Data Visualization is Important

    • Simplifies complex datasets
    • Reveals hidden patterns and trends
    • Helps detect outliers and errors
    • Improves communication of results
    • Supports decision-making

    Graph Plotting in R

    What is Graph Plotting?

    Graph plotting refers to creating visual representations of data values using graphical elements such as points, lines, bars, or shapes. In R, graph plotting is mainly done using:

    • Base R graphics
    • Advanced systems like ggplot2, lattice

    Base R graphics are foundational and widely used for learning concepts.


    Generic Plotting System in R

    R uses a generic plotting system, where the same function behaves differently based on the data type.

    The most important generic function is:

    plot()
    

    The plot() function automatically determines:

    • Type of plot
    • Axis scaling
    • Labels (if available)

    This behavior is called method dispatch.
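
    A short sketch of method dispatch in action: the same plot() call draws a different kind of chart depending on the class of its argument. The objects v, f, and d are illustrative.

```r
v <- c(5, 10, 15, 20)                        # numeric vector
f <- factor(c("low", "high", "low", "mid"))  # factor
d <- density(mtcars$mpg)                     # object of class "density"

# The class of each object decides which plot method is used
class(v)
class(f)
class(d)

plot(v)   # numeric: points plotted against their index
plot(f)   # factor: bar chart of level counts
plot(d)   # density: curve of the estimated density
plot(mtcars[, c("mpg", "wt", "hp")])  # data frame: scatter plot matrix
```

    Running methods("plot") lists the plot methods available in the current session.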


    Using the plot() Function

    Basic Syntax

    plot(x, y)
    

    Example

    x <- c(1, 2, 3, 4, 5)
    y <- c(2, 4, 6, 8, 10)
    
    plot(x, y)
    

    This produces a scatter plot, showing the relationship between x and y.


    Types of Plots Using plot()

    Scatter Plot

    Used to analyze relationships between two numerical variables.

    plot(x, y, type = "p")
    

    Line Plot

    Used to show trends over time or ordered data.

    plot(x, y, type = "l")
    

    Combined Points and Lines

    plot(x, y, type = "b")
    

    Vertical Line Plot

    plot(x, y, type = "h")
    

    Graphical Models in R

    Introduction to Graphical Models

    Graphical models in R are visual representations of statistical data and relationships. They are used to:

    • Understand data distribution
    • Visualize correlations
    • Validate statistical assumptions
    • Analyze model performance

    Graphical models include:

    • Scatter plots
    • Histograms
    • Boxplots
    • Regression plots
    • Residual plots

    Example: Visualizing a Relationship

    plot(mtcars$wt, mtcars$mpg)
    

    This graph shows how mileage (mpg) changes with car weight (wt), a common starting point for statistical analysis.


    Charts and Graphs in R

    Common Chart Types

    Chart Type      Purpose
    Line graph      Trends over time
    Bar chart       Category comparison
    Histogram       Distribution
    Scatter plot    Relationship
    Boxplot         Spread and outliers

    Choosing the correct chart is crucial to avoid misleading interpretation.


    Adding Titles to a Graph

    Main Title

    The main title describes what the graph represents.

    plot(x, y, main = "Relationship Between X and Y")
    

    Axis Labels

    Axis labels explain what each axis represents.

    plot(x, y,
         main = "Sales Growth",
         xlab = "Months",
         ylab = "Revenue")
    

    Clear labels are essential for readability.


    Adding Colors to Charts

    Importance of Colors

    Colors:

    • Improve readability
    • Highlight differences
    • Separate categories
    • Make graphs visually appealing

    Using col Argument

    plot(x, y, col = "blue")
    

    Using Multiple Colors

    plot(x, y, col = c("red", "green", "blue", "orange", "black"))
    

    Each point gets a different color.


    Color in Bar Charts

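    # scores is a named numeric vector of marks per subject
    scores <- c(Math = 80, Science = 90, English = 75)
    barplot(scores, col = "skyblue")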
    

    Adding Text to Plots

    Using text()

    Used to label data points.

    plot(x, y)
    text(x, y, labels = y, pos = 3)
    
    • pos controls label position
    • Helps annotate important values

    Using mtext()

    Adds text in margins.

    mtext("Data Source: Survey", side = 1, line = 3)
    

    Adding Axis to a Plot

    Default Axes

    R automatically generates axes based on data range.


    Custom Axes

    Disable default axes:

    plot(x, y, xaxt = "n", yaxt = "n")
    

    Add custom axes:

    axis(1, at = 1:5)
    axis(2, at = seq(0, 10, 2))
    box()
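
    Custom axes are most useful when the tick labels should differ from the raw values. The sketch below, with illustrative data, replaces the numbers 1 to 5 on the x-axis with month abbreviations from the built-in month.abb constant.

```r
x <- 1:5
y <- c(2, 4, 6, 8, 10)

# Draw the plot without the default axes
plot(x, y, xaxt = "n", yaxt = "n", xlab = "Month", ylab = "Revenue")

# x-axis: ticks at 1..5, labeled with month abbreviations
axis(1, at = x, labels = month.abb[x])

# y-axis: ticks every 2 units, labels drawn horizontally (las = 1)
axis(2, at = seq(0, 10, by = 2), las = 1)

box()
```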
    

    Custom axes provide better control.


    Axis Limits

    Set axis limits manually:

    plot(x, y, xlim = c(0, 6), ylim = c(0, 12))
    

    Graphics Palette in R

    What is a Graphics Palette?

    A graphics palette defines the set of colors used when multiple colors are needed automatically.


    View Current Palette

    palette()
    

    Set a Custom Palette

    palette(c("red", "blue", "green", "orange"))
    

    Reset:

    palette("default")
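
    Once a palette is set, integer values passed to col are looked up in it. A minimal sketch, with illustrative names old_palette and custom:

```r
# Setting a palette returns the previous one, which makes it easy to restore
old_palette <- palette(c("red", "blue", "green", "orange"))
custom <- palette()

# col = 1:4 picks the first to fourth palette colors
plot(1:4, rep(1, 4), col = 1:4, pch = 19, cex = 3,
     xlab = "Palette index", ylab = "", yaxt = "n")

# Put the earlier palette back
palette(old_palette)
```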
    

    Plotting Data Using Generic Plots

    Plotting a Single Vector

    v <- c(5, 10, 15, 20)
    plot(v)
    

    R plots index vs value.


    Plotting Two Vectors

    plot(x, y)
    

    Plotting Data Frames

    plot(mtcars)
    

    This creates multiple pairwise plots.


    Bar Charts in R

    Introduction to Bar Charts

    A bar chart displays data using rectangular bars. The length of each bar represents the value of a category.

    Bar charts are ideal for:

    • Comparing categories
    • Displaying frequency counts
    • Showing grouped data
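
    For frequency counts, table() can tally a variable before it is passed to barplot(). The sketch below counts cars per cylinder number in the built-in mtcars dataset; cyl_counts and bp are illustrative names.

```r
# Count how many cars have 4, 6, and 8 cylinders
cyl_counts <- table(mtcars$cyl)

# barplot() returns the x positions of the bar centers
bp <- barplot(cyl_counts,
              main = "Cars by Number of Cylinders",
              xlab = "Cylinders", ylab = "Count",
              col = "lightblue",
              ylim = c(0, max(cyl_counts) + 2))  # headroom for the labels

# Write each count just above its bar
text(bp, as.vector(cyl_counts), labels = as.vector(cyl_counts), pos = 3)
```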

    Creating a Simple Bar Chart

    scores <- c(80, 90, 75)
    names(scores) <- c("Math", "Science", "English")
    
    barplot(scores)
    

    Adding Titles and Labels

    barplot(scores,
            main = "Student Performance",
            xlab = "Subjects",
            ylab = "Marks",
            col = "lightblue")
    

    Horizontal Bar Chart

    barplot(scores, horiz = TRUE)
    

    Grouped Bar Chart

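    # Row names feed the legend and column names label the groups
    # (the names here are illustrative)
    data <- matrix(c(80, 85, 90, 88), nrow = 2,
                   dimnames = list(c("Test 1", "Test 2"), c("Math", "Science")))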
    
    barplot(data,
            beside = TRUE,
            col = c("red", "blue"),
            legend.text = TRUE)
    

    Stacked Bar Chart

    barplot(data,
            col = c("orange", "green"),
            legend.text = TRUE)
    

    Adding Values on Bars

    bp <- barplot(scores)
    text(bp, scores, labels = scores, pos = 3)
    

    Common Mistakes in Visualization

    • Missing titles or labels
    • Overuse of colors
    • Incorrect chart type
    • Misleading scales
    • Overcrowded graphs

    Summary

    Data visualization in R is a powerful tool for exploring and communicating data. Base R graphics provide flexible and customizable plotting options. Understanding titles, colors, axes, text annotations, palettes, and bar charts ensures clear, accurate, and effective visual communication.

  • Manipulate R Data Frames Using SQL

    R Data Frames Using SQL in detail

    The sqldf package in R enables seamless manipulation of data frames using SQL commands. It provides an efficient way to work with structured data and can be used to interact with a limited range of databases. Instead of using table names as in traditional SQL, sqldf allows you to specify data frame names, making it easy to execute queries within R.

    Key Operations of sqldf

    When executing an SQL statement on a data frame using sqldf, the following steps occur:

    • A temporary database is created with an appropriate schema.
    • The data frames are automatically loaded into this database.
    • The SQL query is executed.
    • The resulting output is returned as a new data frame in R.
    • The temporary database is automatically deleted after execution.

    This approach optimizes calculations and improves efficiency by leveraging SQL operations.
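
    The steps above can be seen end to end without any CSV files. This is a self-contained sketch that assumes the sqldf package is already installed (installation is shown just below); the data frames acc_demo and routes_demo and their values are made up for illustration.

```r
library(sqldf)

# Two small in-memory data frames standing in for the CSV files
acc_demo <- data.frame(
  Year        = c(2000, 2001, 2000, 2001),
  Highway     = c("Highway-101", "Highway-101", "Highway-405", "Highway-405"),
  Crash_Count = c(30, 35, 50, 55)
)
routes_demo <- data.frame(
  Highway = c("Highway-101", "Highway-405"),
  Region  = c("North Zone", "South Zone")
)

# Data frame names are used where SQL expects table names
result <- sqldf("SELECT a.Highway, r.Region, SUM(a.Crash_Count) AS Total_Crashes
                 FROM acc_demo AS a
                 JOIN routes_demo AS r ON a.Highway = r.Highway
                 GROUP BY a.Highway, r.Region
                 ORDER BY a.Highway")

# The result comes back as an ordinary data frame
print(result)
```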

    install.packages("sqldf")
    library(sqldf)

    Loading Sample Data

    For demonstration, we use two CSV files:

    • accidents.csv: Contains Year, Highway, Crash_Count, and Traffic.
    • routes.csv: Contains Highway, Region, and Distance.

    Set the working directory and load the data:

    setwd("C:/Users/User/Documents/R")
    accidents <- read.csv("accidents.csv")
    routes <- read.csv("routes.csv")
    
    head(accidents)
    tail(accidents)
    print(routes)

    Sample Output:

    accidents.csv Data:

      Year     Highway Crash_Count Traffic
    1 2000 Highway-101          30   50000
    2 2001 Highway-101          35   52000
    3 2002 Highway-101          40   54000

    routes.csv Data:

          Highway     Region Distance
    1 Highway-101 North Zone      200
    2 Highway-405 South Zone      150

    SQL Operations with sqldf

    1. Performing a Left Join

    library(tcltk)
    join_query <- "SELECT accidents.*, routes.Region, routes.Distance
                  FROM accidents
                  LEFT JOIN routes ON accidents.Highway = routes.Highway"
    
    accidents_routes <- sqldf(join_query, stringsAsFactors = FALSE)
    head(accidents_routes)
    tail(accidents_routes)

    Sample Output:

      Year     Highway Crash_Count Traffic     Region Distance
    1 2000 Highway-101          30   50000 North Zone      200
    2 2001 Highway-101          35   52000 North Zone      200
    3 2002 Highway-101          40   54000 North Zone      200

    2. Performing an Inner Join

    inner_query <- "SELECT accidents.*, routes.Region, routes.Distance
                    FROM accidents
                    INNER JOIN routes ON accidents.Highway = routes.Highway"
    
    accidents_routes_inner <- sqldf(inner_query, stringsAsFactors = FALSE)
    head(accidents_routes_inner)
    tail(accidents_routes_inner)

    Sample Output:

      Year     Highway Crash_Count Traffic     Region Distance
    1 2000 Highway-101          30   50000 North Zone      200
    2 2001 Highway-101          35   52000 North Zone      200

    3. Using merge() for Joining Data Frames

    The merge() function in R allows for various types of joins, including full outer joins and right joins.

    accidents_merge_routes <- merge(accidents, routes, by = "Highway", all.x = TRUE)
    head(accidents_merge_routes)
    tail(accidents_merge_routes)

    Sample Output:

          Highway Year Crash_Count Traffic     Region Distance
    1 Highway-101 2000          30   50000 North Zone      200
    2 Highway-101 2001          35   52000 North Zone      200
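
    The join type in merge() is chosen with the all, all.x, and all.y arguments. The sketch below uses two tiny made-up data frames (acc_small and routes_small, with hypothetical highways) so that the row counts of each join type are easy to follow.

```r
# Highway-880 appears only in acc_small, Highway-5 only in routes_small
acc_small <- data.frame(
  Highway     = c("Highway-101", "Highway-405", "Highway-880"),
  Crash_Count = c(30, 50, 20)
)
routes_small <- data.frame(
  Highway = c("Highway-101", "Highway-405", "Highway-5"),
  Region  = c("North Zone", "South Zone", "Central Zone")
)

inner_join <- merge(acc_small, routes_small, by = "Highway")                # matches only
left_join  <- merge(acc_small, routes_small, by = "Highway", all.x = TRUE)  # keep all of acc_small
right_join <- merge(acc_small, routes_small, by = "Highway", all.y = TRUE)  # keep all of routes_small
full_join  <- merge(acc_small, routes_small, by = "Highway", all = TRUE)    # keep everything

# Number of rows produced by each join type
c(inner = nrow(inner_join), left = nrow(left_join),
  right = nrow(right_join), full = nrow(full_join))
```

    Rows without a match on the other side are kept with NA in the missing columns.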

    4. Filtering Data Using WHERE Clause

    filter_query <- "SELECT * FROM accidents
                    WHERE Highway = 'Highway-405'"
    
    filtered_data <- sqldf(filter_query, stringsAsFactors = FALSE)
    head(filtered_data)

    Sample Output:

      Year     Highway Crash_Count Traffic
    1 2000 Highway-405          50   60000
    2 2001 Highway-405          55   62000

    5. Using Aggregate Functions

    The GROUP BY clause helps perform aggregate calculations.

    aggregate_query <- "SELECT Highway, AVG(Crash_Count) AS Avg_Crashes
                        FROM accidents
                        GROUP BY Highway"
    
    sqldf(aggregate_query)

    Sample Output:

          Highway Avg_Crashes
    1 Highway-101        35.5
    2 Highway-405        52.5

    6. Using plyr for Advanced Aggregation

    For more advanced calculations, the plyr package is useful.

    library(plyr)
    ddply(accidents_merge_routes, .(Highway), function(X) {
      data.frame(
        Avg_Crashes = mean(X$Crash_Count),
        Q1_Crashes = quantile(X$Crash_Count, 0.25),
        Q3_Crashes = quantile(X$Crash_Count, 0.75),
        Median_Crashes = median(X$Crash_Count)
      )
    })

    Output:

          Highway Avg_Crashes Q1_Crashes Q3_Crashes Median_Crashes
    1 Highway-101        35.5       32.5       38.5           35.0
    2 Highway-405        52.5       50.5       54.5           52.5

  • Database Connectivity with R Programming

    Database Connectivity in detail

    A database is a structured collection of organized data that allows easy access, storage, and management. It is handled through a Database Management System (DBMS), which is specialized software for managing databases efficiently. A database contains related, structured data that can be stored and retrieved when needed.

    A database primarily supports data storage, retrieval, and manipulation through various sublanguages:

    1. Data Definition Language (DDL)
    2. Data Query Language (DQL)
    3. Data Manipulation Language (DML)
    4. Data Control Language (DCL)
    5. Transaction Control Language (TCL)
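
    As a rough illustration, each sublanguage corresponds to a familiar family of SQL statements. The statements below are generic examples held in an R character vector. The students table and the 'analyst' user are hypothetical, and nothing is sent to a database.

    ```r
    # One representative statement per SQL sublanguage (illustrative only)
    sql_examples <- c(
      DDL = "CREATE TABLE students (id INT, name VARCHAR(20))",
      DQL = "SELECT id, name FROM students",
      DML = "INSERT INTO students VALUES (201, 'Rahul')",
      DCL = "GRANT SELECT ON students TO 'analyst'@'localhost'",
      TCL = "COMMIT"
    )

    # Print each sublanguage next to its example statement
    for (kind in names(sql_examples)) {
      cat(sprintf("%-4s %s\n", kind, sql_examples[[kind]]))
    }
    ```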

    Step 1: Install MySQL

    To begin, download and install MySQL from its official website.

    Once installed, create a new database in MySQL using the following command:

    CREATE DATABASE studentDB;

    Step 2: Install RStudio

    To write and execute R scripts, install RStudio from its official website.

    Step 3: Install MySQL Library in R

    In RStudio, install the MySQL package with the command:

    install.packages("RMySQL")

    Now, execute the following R script to connect MySQL with R:

    # Load the RMySQL library (this also attaches DBI)
    library(RMySQL)

    # Establish a connection to the MySQL database
    mysql_connection <- dbConnect(MySQL(),
                                  user = 'root',
                                  password = 'root',
                                  dbname = 'studentDB',
                                  host = 'localhost')

    # List available tables in the database
    dbListTables(mysql_connection)

    # Create a table; dbExecute() runs the statement and frees its result
    dbExecute(mysql_connection, "CREATE TABLE students (id INT, name VARCHAR(20));")

    # Insert records into the table
    dbExecute(mysql_connection, "INSERT INTO students VALUES (201, 'Rahul');")
    dbExecute(mysql_connection, "INSERT INTO students VALUES (202, 'Neha');")
    dbExecute(mysql_connection, "INSERT INTO students VALUES (203, 'Ankit');")

    # Retrieve records from the table
    query_result <- dbSendQuery(mysql_connection, "SELECT * FROM students")

    # Store the result in an R data frame, then release the result set
    data_frame <- dbFetch(query_result)
    dbClearResult(query_result)

    # Display the data frame
    print(data_frame)

    # Close the connection when finished
    dbDisconnect(mysql_connection)

    Output:

    id   name
    1 201  Rahul
    2 202  Neha
    3 203  Ankit
  • Working with Databases in R Programming

    Working with Databases in detail

    In R, working with datasets is a crucial aspect of statistical analysis and visualization. Instead of manually creating datasets in the console each time, we can retrieve structured and normalized data directly from relational databases such as MySQL, Oracle, and SQL Server. This integration allows for seamless data manipulation and visualization within R.

    This guide focuses on MySQL connectivity in R, covering database connection, table creation, deletion, data insertion, updating, and querying.

    RMySQL Package

    R provides the RMySQL package to facilitate communication between R and MySQL databases. This package needs to be installed and loaded before connecting to MySQL.

    Installation

    install.packages("RMySQL")

    Establishing Connection to MySQL

    To connect to MySQL, the dbConnect() function is used, which requires a database driver along with authentication credentials such as username, password, database name, and host details.

    Syntax:

    dbConnect(drv, user, password, dbname, host)

    Parameters

    • drv – Specifies the database driver
    • user – MySQL username
    • password – Corresponding password
    • dbname – Name of the database
    • host – Server hosting the database

    Example: Connecting to MySQL Database

    # Load necessary library
    library("RMySQL")
    
    # Establish connection
    conn <- dbConnect(MySQL(), user = 'admin', password = 'mypassword',
                      dbname = 'SampleDB', host = 'localhost')
    
    # Display available tables
    dbListTables(conn)

    Output:

    Loading required package: DBI
    [1] "employees"
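
    Hard-coding credentials in a script is convenient for a tutorial but easy to leak. A minimal sketch of one alternative is to read them from environment variables. The helper name connect_mysql and the variable names MYSQL_USER, MYSQL_PASSWORD, MYSQL_DBNAME, and MYSQL_HOST are hypothetical choices for this example, not part of RMySQL.

    ```r
    # Hypothetical helper: build a connection from environment variables.
    # Assumes the RMySQL package is installed and the variables are set.
    connect_mysql <- function() {
      DBI::dbConnect(
        RMySQL::MySQL(),
        user     = Sys.getenv("MYSQL_USER"),
        password = Sys.getenv("MYSQL_PASSWORD"),
        dbname   = Sys.getenv("MYSQL_DBNAME"),
        host     = Sys.getenv("MYSQL_HOST", unset = "localhost")
      )
    }

    # Usage (requires a running MySQL server):
    # conn <- connect_mysql()
    # DBI::dbListTables(conn)
    # DBI::dbDisconnect(conn)   # release the connection when finished
    ```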

    Creating a Table in MySQL Using R

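    A table can be created in MySQL from R using the dbWriteTable() function, which writes a data frame to the database. If the table already exists, the call fails unless overwrite = TRUE is passed, in which case the existing table is replaced.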

    Syntax

    dbWriteTable(conn, name, value)

    Parameters

    • conn – Connection object
    • name – Name of the MySQL table
    • value – Dataframe to be converted into a MySQL table

    Example: Creating a Table

    # Establish connection
    conn <- dbConnect(MySQL(), user = 'admin', password = 'mypassword',
                      dbname = 'SampleDB', host = 'localhost')
    
    # Create new table with selected data
    dbWriteTable(conn, "iris_table", iris[1:10, ], overwrite = TRUE)

    Output:

    [1] TRUE
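
    dbWriteTable() also accepts append = TRUE to add rows to an existing table, and row.names = FALSE to avoid writing the row names as an extra column. The sketch below wraps both cases. write_or_append is a hypothetical helper name, and conn is assumed to be an open RMySQL connection.

    ```r
    # Hypothetical helper: create the table on first use, append afterwards.
    write_or_append <- function(conn, name, value) {
      if (DBI::dbExistsTable(conn, name)) {
        # Table already exists: add the new rows to it
        DBI::dbWriteTable(conn, name, value, append = TRUE, row.names = FALSE)
      } else {
        # Table does not exist yet: create it from the data frame
        DBI::dbWriteTable(conn, name, value, row.names = FALSE)
      }
    }

    # Usage (requires an open connection):
    # write_or_append(conn, "iris_table", iris[1:10, ])   # creates the table
    # write_or_append(conn, "iris_table", iris[11:20, ])  # appends ten more rows
    ```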

    Deleting a Table in MySQL Using R

    A table is deleted by sending a DROP TABLE statement. More generally, the dbSendQuery() function executes SQL statements in MySQL directly from R.

    Syntax:

    dbSendQuery(conn, statement)

    Parameters

    • conn – Connection object
    • statement – SQL command to be executed

    Example: Dropping a Table

    # Establish connection
    conn <- dbConnect(MySQL(), user = 'admin', password = 'mypassword',
                      dbname = 'SampleDB', host = 'localhost')
    
    # Drop existing table
    dbSendQuery(conn, 'DROP TABLE iris_table')

    Output:

    <MySQLResult:9845732, 3, 4>

    Inserting Data into MySQL Table Using R

    Data can be inserted into a MySQL table from R using SQL INSERT INTO queries.

    Example: Inserting Data

    # Establish connection
    conn <- dbConnect(MySQL(), user = 'admin', password = 'mypassword',
                      dbname = 'SampleDB', host = 'localhost')
    
    # Insert new record into employees table
    dbSendQuery(conn, "INSERT INTO employees(id, name) VALUES (1, 'John Doe')")

    Output:

    <MySQLResult:9845732, 3, 5>
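
    When the inserted values come from variables, pasting them into the SQL string invites quoting bugs and SQL injection. One option is DBI's sqlInterpolate(), which escapes each value before substituting it. In this sketch DBI::ANSI() stands in for a live connection so the statement can be built without a server, and the id and name values are hypothetical. With a real connection, conn would be passed in place of ANSI().

    ```r
    library(DBI)

    # Build an INSERT statement with safely quoted values.
    # ANSI() is a stand-in for an open connection (assumption for this sketch).
    insert_sql <- sqlInterpolate(
      ANSI(),
      "INSERT INTO employees(id, name) VALUES (?id, ?name)",
      id = 4,
      name = "Pat O'Neil"
    )

    # Inspect the generated statement; the apostrophe in the name is escaped
    print(insert_sql)

    # With a live connection, build the statement from conn and run it:
    # dbExecute(conn, sqlInterpolate(conn,
    #   "INSERT INTO employees(id, name) VALUES (?id, ?name)",
    #   id = 4, name = "Pat O'Neil"))
    ```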

    Updating Data in a MySQL Table Using R

    An existing record in the table can be modified using the UPDATE query.

    Example: Updating a Table

    # Establish connection
    conn <- dbConnect(MySQL(), user = 'admin', password = 'mypassword',
                      dbname = 'SampleDB', host = 'localhost')
    
    # Update a record in employees table
    dbSendQuery(conn, "UPDATE employees SET name = 'Jane Doe' WHERE id = 1")

    Output:

    <MySQLResult:-1, 3, 6>

    Retrieving Data from MySQL Using R

    To fetch data from MySQL, the dbSendQuery() function is used to send a SQL SELECT statement. The retrieved data can be stored in a dataframe using the fetch() function.

    Example:

    # Establish connection
    conn <- dbConnect(MySQL(), user = 'admin', password = 'mypassword',
                      dbname = 'SampleDB', host = 'localhost')
    
    # Fetch records from employees table
    res <- dbSendQuery(conn, "SELECT * FROM employees")
    
    # Retrieve first 3 rows as dataframe
    df <- fetch(res, n = 3)
    print(df)

    # Release the result set so the connection can run further queries
    invisible(dbClearResult(res))

    Output:

    id      name
    1  1  John Doe
    2  2  Alice Ray
    3  3  Mark Smith
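
    Because fetching with n = 3 leaves any remaining rows pending, larger results are often read in chunks until the result set is exhausted. The following is a sketch using the DBI generics dbFetch(), dbHasCompleted(), and dbClearResult(). fetch_in_chunks and its chunk_size argument are hypothetical names, and conn is assumed to be an open connection.

    ```r
    # Hypothetical helper: read a query result chunk by chunk.
    fetch_in_chunks <- function(conn, sql, chunk_size = 1000) {
      res <- DBI::dbSendQuery(conn, sql)
      # Always release the result set, even if an error occurs
      on.exit(DBI::dbClearResult(res))

      chunks <- list()
      while (!DBI::dbHasCompleted(res)) {
        # Fetch at most chunk_size rows per iteration
        chunks[[length(chunks) + 1]] <- DBI::dbFetch(res, n = chunk_size)
      }

      # Combine the chunks into one data frame
      do.call(rbind, chunks)
    }

    # Usage (requires an open connection):
    # employees <- fetch_in_chunks(conn, "SELECT * FROM employees", chunk_size = 3)
    ```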