
R – Pie Charts

R – Pie Charts in detail

A pie chart is a circular statistical graphic divided into slices to illustrate numerical proportions. The angle of each sector (or slice) is proportional to the value it represents, so the slices show the relative sizes of the data. It is also known as a circle graph.

The R programming language provides the pie() function to create pie charts. It takes a vector of non-negative numbers as input.

Syntax:

pie(x, labels, radius, main, col, clockwise)

Parameters:

x: A vector containing numeric values used in the pie chart.
labels: Descriptions for the slices in the pie chart.
radius: Defines the radius of the circle (value between -1 and +1).
main: Title of the pie chart.
clockwise: Logical value indicating whether slices are drawn clockwise or counterclockwise.
col: Specifies colors for the pie slices.
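
To illustrate how these parameters combine, here is a minimal sketch that labels each slice with its share of the total. The data and the names `fruits`, `pct`, and `slice_labels` are assumptions made up for this illustration.

```r
# Hypothetical data, used only for illustration
values <- c(30, 50, 40, 60)
fruits <- c("Apple", "Banana", "Grapes", "Mango")

# Share of each slice as a percentage of the total
pct <- round(100 * values / sum(values), 1)

# Build labels such as "Apple (16.7%)"
slice_labels <- paste0(fruits, " (", pct, "%)")

# Draw the slices clockwise, starting at 12 o'clock
pie(values, labels = slice_labels, clockwise = TRUE,
    radius = 0.8, main = "Fruit Share",
    col = rainbow(length(values)))
```

Computing the percentages before plotting keeps the labels consistent with the slice sizes, since pie() itself only draws the proportions and does not print them.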

Creating a Simple Pie Chart

By using the above parameters, we can create a basic pie chart with labels.

Example:

# Create data for the graph
values <- c(30, 50, 40, 60)
labels <- c("Apple", "Banana", "Grapes", "Mango")

# Plot the chart
pie(values, labels)

Output:

Pie Chart with Title and Colors

We can enhance the pie chart by adding a title and colors using the col parameter.

Example:

# Create data for the graph
values <- c(25, 45, 35, 55)
labels <- c("New York", "London", "Tokyo", "Sydney")

# Plot the chart with title and rainbow color palette
pie(values, labels, main = "City Pie Chart",
    col = rainbow(length(values)))

Output:

Pie Chart with Color Palettes

Using the RColorBrewer package to add colors to a pie chart.

# Load necessary library
library(RColorBrewer)

# Create data for the graph
sales <- c(40, 60, 30, 50)
cities <- c("New York", "Los Angeles", "Chicago", "Houston")

# Assign colors using brewer.pal
colors <- brewer.pal(length(sales), "Set2")

# Plot the pie chart
pie(sales, labels = cities, col = colors)

Output:

Modify Border Line Type

Using the lty argument to change the border style.

# Load necessary library
library(RColorBrewer)

# Create data for the graph
sales <- c(40, 60, 30, 50)
cities <- c("New York", "Los Angeles", "Chicago", "Houston")

# Assign colors using brewer.pal
colors <- brewer.pal(length(sales), "Set2")

# Plot the pie chart with modified border type
pie(sales, labels = cities, col = colors, lty = 2)

Output:

Add Shading Lines

Using the density and angle arguments to add shading.

# Load necessary library
library(RColorBrewer)

# Create data for the graph
sales <- c(40, 60, 30, 50)
cities <- c("New York", "Los Angeles", "Chicago", "Houston")

# Assign colors using brewer.pal
colors <- brewer.pal(length(sales), "Set2")

# Plot the pie chart with shading lines
pie(sales, labels = cities, col = colors, density = 50, angle = 45)

Output:

3D Pie Chart

Using the plotrix package to create a 3D pie chart.

# Load necessary library
library(plotrix)

# Create data for the graph
sales <- c(40, 60, 30, 50)
cities <- c("New York", "Los Angeles", "Chicago", "Houston")

# Calculate percentages
sales_percent <- round(100 * sales / sum(sales), 1)

# Plot the 3D pie chart
pie3D(sales, labels = sales_percent,
      main = "Sales Distribution", col = rainbow(length(sales)))

# Add a legend
legend("topright", cities, cex = 0.5, fill = rainbow(length(sales)))

Output:

December 13, 2025

Histograms in R language

Histograms in detail

A histogram is a graphical representation of statistical data that groups data points into specified ranges. The rectangular bars in a histogram represent frequencies, with their heights proportional to the frequency of values in each range. Unlike bar graphs, histograms do not have gaps between bars.

Creating Histograms in R

Histograms in R can be created using the hist() function.

Syntax:

hist(v, main, xlab, xlim, ylim, breaks, col, border)

Parameters:

v: Numeric values used to create the histogram.
main: Title of the chart.
col: Color of the bars.
xlab: Label for the horizontal axis.
border: Color of the bar borders.
xlim: Range of values on the x-axis.
ylim: Range of values on the y-axis.
breaks: Controls the bins. A single number is a suggested count of bins; a vector gives the exact break points between bins.
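
The two common forms of the breaks argument behave differently, which the following sketch tries to show. It uses plot = FALSE so that hist() returns the computed bins without drawing anything; the names `h_suggested` and `h_exact` are chosen only for this illustration.

```r
values <- c(10, 25, 15, 8, 20, 18, 30, 12, 22, 28, 35)

# A single number is only a suggestion; R picks "pretty" break points
h_suggested <- hist(values, breaks = 5, plot = FALSE)

# A vector gives the exact bin boundaries
h_exact <- hist(values, breaks = seq(0, 40, by = 10), plot = FALSE)

# Inspect the boundaries and the number of values in each bin
h_suggested$breaks
h_exact$breaks
h_exact$counts
```

Inspecting `$breaks` and `$counts` this way is a quick check that the bins are what you expect before styling the plot.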

Example 1: Creating a Simple Histogram

# Creating data for the graph
values <- c(10, 25, 15, 8, 20, 18, 30, 12, 22, 28, 35)

# Creating the histogram
hist(values, xlab = "Frequency of Items",
     col = "blue", border = "black")

Output:

Example 2: Setting X and Y Ranges

# Creating data for the graph
values <- c(10, 25, 15, 8, 20, 18, 30, 12, 22, 28, 35)

# Creating the histogram
hist(values, xlab = "Frequency of Items", col = "blue",
    border = "black", xlim = c(0, 40),
    ylim = c(0, 5), breaks = 5)

Output:

Example 3: Adding Labels Using text()

# Creating data for the graph
values <- c(10, 25, 15, 8, 20, 18, 30, 12, 22, 28, 35, 110, 50, 80, 95)

# Creating the histogram
hist_data <- hist(values, xlab = "Weight", ylab = "Frequency",
                  col = "purple", border = "black",
                  breaks = 5)

# Adding labels
text(hist_data$mids, hist_data$counts, labels = hist_data$counts,
     adj = c(0.5, -0.5))

Output:

Example 4: Histogram with Non-Uniform Width

# Creating data for the graph
values <- c(10, 25, 15, 8, 20, 18, 30, 12, 22, 28, 35, 110, 50, 80, 95)

# Creating the histogram
# With unequal bin widths, hist() plots densities rather than counts
hist(values, xlab = "Weight", ylab = "Density",
     xlim = c(10, 120),
     col = "purple", border = "black",
     breaks = c(5, 55, 60, 70, 75, 80, 100, 140))

Output:
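
When the breaks are not equally spaced, hist() plots densities by default, so that the area of each bar (not its height) reflects the share of values in the bin. The sketch below checks this on the same data; `h` and `widths` are names chosen for the illustration.

```r
values <- c(10, 25, 15, 8, 20, 18, 30, 12, 22, 28, 35, 110, 50, 80, 95)

# Compute the histogram without drawing it
h <- hist(values, breaks = c(5, 55, 60, 70, 75, 80, 100, 140),
          plot = FALSE)

# Bin widths differ, so hist() reports the bins as not equidistant
widths <- diff(h$breaks)
h$equidist

# Bar areas (density * width) add up to 1 across all bins
total_area <- sum(h$density * widths)
total_area
```

To plot raw counts with unequal bins anyway, pass freq = TRUE, but note that R warns in that case because the bar areas then misrepresent the data.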

December 13, 2025

Addition of Lines to a Plot in R Programming – lines() Function

lines() Function in detail

The lines() function in R is used to add lines of different types, colors, and widths to an existing plot.

Syntax:

lines(x, y, col, lwd, lty)

Parameters:

x, y: Vectors of coordinates
col: Color of the line
lwd: Width of the line
lty: Type of line

Adding Lines to a Plot using lines() Function

Example 1: Adding a Line to a Scatter Plot

This example demonstrates how to create a scatter plot and add a line to it.

# Creating coordinate vectors
x <- c(2.1, 4.2, 1.5, -2.8, 6.3,
       3.1, 4.0, 2.8, 2.6, 2.2, 2.0, 2.8)
y <- c(3.2, 6.5, 2.8, -2.5, 10.5, 4.8,
       5.9, 5.1, 3.9, 3.2, 3.4, 4.8)

# Plotting the scatter plot
plot(x, y, cex = 1, pch = 3, xlab = "X-axis",
     ylab = "Y-axis", col = "black")

# Creating another set of coordinates for the line
x2 <- c(3.5, 1.0, -1.8, 0.2)
y2 <- c(4.0, 5.2, 3.0, 3.5)

# Adding a red line to the plot
lines(x2, y2, col = "red", lwd = 2, lty = 1)

Output:

Example 2: Connecting Points with lines()

This example shows how to plot a scatter plot and connect the points using lines().

# Creating coordinate vectors
x <- c(2.1, 4.2, 1.5, -2.8, 6.3, 3.1,
       4.0, 2.8, 2.6, 2.2, 2.0, 2.8)
y <- c(3.2, 6.5, 2.8, -2.5, 10.5, 4.8,
       5.9, 5.1, 3.9, 3.2, 3.4, 4.8)

# Plotting the scatter plot
plot(x, y, cex = 1, pch = 3, xlab = "X-axis",
     ylab = "Y-axis", col = "black")

# Connecting points with a red line
lines(x, y, col = "red")

Output:
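
Because lines() joins the points in the order they appear in the vectors, the line in Example 2 zig-zags across the plot. If a left-to-right line is wanted, one option is to sort both vectors by x first. This is a sketch of that idea; `ord` is a name introduced here.

```r
x <- c(2.1, 4.2, 1.5, -2.8, 6.3, 3.1,
       4.0, 2.8, 2.6, 2.2, 2.0, 2.8)
y <- c(3.2, 6.5, 2.8, -2.5, 10.5, 4.8,
       5.9, 5.1, 3.9, 3.2, 3.4, 4.8)

# Positions that put x in increasing order
ord <- order(x)

plot(x, y, pch = 3, xlab = "X-axis", ylab = "Y-axis")

# Reorder x and y together so each point keeps its own y value
lines(x[ord], y[ord], col = "red")
```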

Example 3: Combining abline() and lines()

# Create sample data
x <- seq(-5, 5, length.out = 10)
y <- x^3

# Create a plot of the data
plot(x, y, main = "Adding Lines to a Plot", col = "blue")

# Add a vertical line at x = 0
abline(v = 0, col = "green", lwd = 2)

# Add a horizontal line at y = 0
abline(h = 0, col = "purple", lwd = 2)

# Add a diagonal line with slope -2 and intercept 3
abline(a = 3, b = -2, col = "orange", lty = 2, lwd = 2)

# Add a custom line using lines() function
x2 <- seq(-5, 5, length.out = 10)
y2 <- -x2^2 + 4
lines(x2, y2, col = "red", lty = 2, lwd = 2)

Output:

December 13, 2025

Adding Straight Lines to a Plot in R Programming – abline() Function
abline() Function in detail

The abline() function in R is used to add one or more straight lines to a graph. It can be used to add vertical, horizontal, or regression lines to a plot.

Syntax:
```
abline(a=NULL, b=NULL, h=NULL, v=NULL, ...)
```
Parameters:
- a, b: Specifies the intercept and the slope of the line.
- h: Specifies y-value(s) for horizontal line(s).
- v: Specifies x-value(s) for vertical line(s).
Returns:

A straight line in the plot.
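
As a quick sketch of how the arguments fit together, the code below draws a fitted line through a and b and then several reference lines in a single call each, since h and v accept vectors. It uses the built-in pressure dataset; the grid positions are arbitrary values picked for the illustration.

```r
# Fit a simple linear model on the built-in pressure dataset
reg <- lm(pressure ~ temperature, data = pressure)
coeff <- coef(reg)

plot(pressure)

# a = intercept, b = slope: the same line that abline(reg) would draw
abline(a = coeff[["(Intercept)"]], b = coeff[["temperature"]],
       col = "darkgreen")

# h and v accept vectors, so one call can draw several reference lines
abline(h = seq(0, 800, by = 200), lty = 3, col = "gray50")
abline(v = c(100, 200, 300), lty = 3, col = "gray50")
```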

Example 1: Adding a Vertical Line to the Plot
```
# Create scatter plot
plot(pressure)

# Add vertical line at x = 200
abline(v = 200, col = "blue")
```
Output:

Example 2: Adding a Horizontal Line to the Plot
```
# Create scatter plot
plot(pressure)

# Add horizontal line at y = 300
abline(h = 300, col = "red")
```
Output:

Example 3: Adding a Regression Line
```
par(mgp = c(2, 1, 0), mar = c(3, 3, 1, 1))

# Fit regression line
reg <- lm(pressure ~ temperature, data = pressure)
coeff = coefficients(reg)

# Equation of the line
eq = paste0("y = ", round(coeff[1], 1), " + ", round(coeff[2], 1), "*x")

# Plot
plot(pressure, main = eq)
abline(reg, col = "darkgreen")
```
Output:
December 13, 2025
R – Line Graphs
R – Line Graphs in detail

A line graph displays information as a series of data points connected by straight line segments. It is most often used to show how a value changes over time. The points are plotted at their X and Y coordinates and joined in order from first to last, so the line rises and falls with the underlying variable.

Creating Line Graphs in R

The plot() function in R is used to create line graphs.

Syntax:
```
plot(v, type, col, xlab, ylab, main)
```
Parameters:
- v: A numeric vector representing the data points.
- type: Specifies the type of graph:
  - "p" : Draws only points.
  - "l" : Draws only lines.
  - "o" : Draws both points and lines.
- xlab: Label for the X-axis.
- ylab: Label for the Y-axis.
- main: Title of the chart.
- col: Specifies colors for the points and lines.
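
To compare the three type values side by side, a small sketch like the following can help. The `types` vector and its descriptions are made up for this illustration, and par(mfrow = ...) is used only to place the three plots in one row.

```r
sales <- c(10, 15, 22, 18, 30)

# Map each type code to a short description used as the panel title
types <- c(p = "points only", l = "lines only", o = "points and lines")

# Draw three panels in one row, then restore the previous layout
old_par <- par(mfrow = c(1, 3))
for (tp in names(types)) {
  plot(sales, type = tp, main = types[[tp]],
       xlab = "Month", ylab = "Sales")
}
par(old_par)
```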
Example 1: Creating a Simple Line Graph

This example creates a simple line graph using the type = "o" parameter to show both points and lines.

Code:
```
# Create the data for the chart.
sales <- c(10, 15, 22, 18, 30)

# Plot the line graph.
plot(sales, type = "o")
```
Output:

Example 2: Adding Title, Color, and Labels in a Line Graph

To enhance readability, we can add a title, axis labels, and color to the graph.

Code:
```
# Create the data for the chart.
sales <- c(10, 15, 22, 18, 30)

# Plot the line graph with title and labels.
plot(sales, type = "o", col = "blue",
    xlab = "Month", ylab = "Sales (in units)",
    main = "Monthly Sales Chart")
```
Output:

Example 3: Multiple Lines in a Line Graph

To compare multiple datasets, we can plot multiple lines on the same graph using the lines() function.

Code:
```
# Create the data for the chart (illustrative values).
sales_a <- c(10, 15, 22, 18, 30)
sales_b <- c(12, 9, 17, 25, 21)

# Plot the first series; ylim is set so both series fit.
plot(sales_a, type = "o", col = "blue",
    xlab = "Month", ylab = "Sales (in units)",
    main = "Monthly Sales Chart",
    ylim = range(sales_a, sales_b))

# Add the second series to the same graph.
lines(sales_b, type = "o", col = "red")
```
Output:
December 13, 2025

Data visualization with R and ggplot2

Data visualization with ggplot2 in detail

ggplot2 is a free, open-source visualization package for the R programming language, built on the ideas of the Grammar of Graphics. Created by Hadley Wickham, it is one of the most widely used tools for data visualization in R.

Key Layers of ggplot2

The ggplot2 package operates on several layers, which include:

Data: The dataset used for visualization.
Aesthetics: Mapping data attributes to visual properties such as x-axis, y-axis, color, fill, size, labels, alpha, shape, line width, and line type.
Geometric Objects: How data is represented visually, such as points, lines, histograms, bars, or boxplots.
Facets: Splitting data into subsets displayed in separate panels using rows or columns.
Statistics: Applying transformations like binning, smoothing, or descriptive summaries.
Coordinates: Mapping data points to specific spaces (e.g., Cartesian, fixed, polar) and adjusting limits.
Themes: Customizing non-data elements like font size, background, and color.

Dataset Used: `mtcars`

The mtcars dataset contains fuel consumption and 10 other automobile design and performance attributes for 32 cars. It comes pre-installed with the R environment.

Viewing the First Few Records

# Print the first 6 records of the dataset
head(mtcars)

Output:

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Summary Statistics of mtcars

# Summary of the dataset (summary() is part of base R, no package needed)
summary(mtcars)

Output:

Variable	Min	1st Quartile	Median	Mean	3rd Quartile	Max
mpg	10.4	15.43	19.20	20.09	22.80	33.90
cyl	4.0	4.0	6.0	6.19	8.0	8.0
disp	71.1	120.8	196.3	230.7	326.0	472.0
hp	52.0	96.5	123.0	146.7	180.0	335.0
drat	2.76	3.08	3.70	3.60	3.92	4.93
wt	1.51	2.58	3.32	3.22	3.61	5.42
qsec	14.5	16.89	17.71	17.85	18.90	22.90
vs	0.0	0.0	0.0	0.44	1.0	1.0
am	0.0	0.0	0.0	0.41	1.0	1.0
gear	3.0	3.0	4.0	3.69	4.0	5.0
carb	1.0	2.0	2.0	2.81	4.0	8.0

Visualizing Data with ggplot2

Data Layer: The data layer specifies the dataset to visualize.

# Load ggplot2 and define the data layer
library(ggplot2)

ggplot(data = mtcars) +
  labs(title = "Visualization of MTCars Data")

Output:

Aesthetic Layer: Mapping data to visual attributes such as axes, color, or shape.

# Add aesthetics
ggplot(data = mtcars, aes(x = hp, y = mpg, col = disp)) +
  labs(title = "Horsepower vs Miles per Gallon")

Output:

Geometric Layer: Adding geometric shapes to display the data.

# Plot data using points
plot1 <- ggplot(data = mtcars, aes(x = hp, y = mpg, col = disp)) +
  geom_point() +
  labs(title = "Horsepower vs Miles per Gallon", x = "Horsepower", y = "Miles per Gallon")

Output:

Faceting: Create separate plots for subsets of data.

# Facet by transmission type (am: 0 = automatic, 1 = manual)
facet_plot <- ggplot(data = mtcars, aes(x = hp, y = mpg, shape = factor(cyl))) +
  geom_point() +
  facet_grid(. ~ am)

# Display the faceted plot
facet_plot

Output:

Statistics Layer: The statistics layer in ggplot2 allows you to transform your data by applying methods like binning, smoothing, or descriptive statistics.

# Scatter plot with a regression line
ggplot(data = mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  stat_smooth(method = lm, col = "blue") +
  labs(title = "Relationship Between Horsepower and Miles per Gallon")

Output:

Coordinates Layer: In this layer, data coordinates are mapped to the plot’s visual space. Adjustments to axes, zooming, and proportional scaling of the plot can also be made here.

# Scatter plot with controlled axis limits
ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  stat_smooth(method = lm, col = "green") +
  scale_y_continuous("Miles per Gallon", limits = c(5, 35), expand = c(0, 0)) +
  scale_x_continuous("Weight", limits = c(1, 6), expand = c(0, 0)) +
  coord_equal() +
  labs(title = "Effect of Weight on Fuel Efficiency")

Output:

Using coord_cartesian() to Zoom In

# Zoom into specific x-axis and y-axis ranges
ggplot(data = mtcars, aes(x = wt, y = hp, col = as.factor(am))) +
  geom_point() +
  geom_smooth() +
  coord_cartesian(xlim = c(3, 5), ylim = c(100, 300)) +
  labs(title = "Zoomed View: Horsepower vs Weight",
       x = "Weight",
       y = "Horsepower",
       color = "Transmission")

Output:
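
The reason to zoom with coord_cartesian() rather than with scale limits is that the two treat out-of-range data differently. The sketch below builds both variants on the same base plot; the object names and the weight range of 3 to 5 are assumptions made for the illustration.

```r
library(ggplot2)

base_plot <- ggplot(mtcars, aes(x = wt, y = hp)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Zoom only: every car still contributes to the fitted line
zoomed <- base_plot + coord_cartesian(xlim = c(3, 5))

# Scale limits: cars outside 3-5 are dropped (with a warning) before fitting
clipped <- base_plot + scale_x_continuous(limits = c(3, 5))

# Number of cars whose weight falls inside the 3-5 range
cars_in_range <- sum(mtcars$wt >= 3 & mtcars$wt <= 5)
cars_in_range
```

Printing `zoomed` and `clipped` in turn should show the same window but different regression lines, because the second plot fits the line only to the cars that remain within the limits.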

Theme Layer: The theme layer in ggplot2 allows fine control over display elements like background color, font size, and overall styling.

Example 1: Customizing the Background with element_rect()

ggplot(data = mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  facet_grid(. ~ cyl) +
  theme(plot.background = element_rect(fill = "lightgray", colour = "black")) +
  labs(title = "Background Customization: Horsepower vs MPG")

Output:

Example 2: Using theme_gray()

ggplot(data = mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  facet_grid(am ~ cyl) +
  theme_gray() +
  labs(title = "Default Theme: Horsepower and MPG Facets")

Output:

Contour Plot for the mtcars Dataset: Create a density contour plot to visualize the relationship between two continuous variables.

# 2D density contour plot
ggplot(mtcars, aes(x = wt, y = mpg)) +
  # after_stat(level) replaces the deprecated ..level.. notation
  stat_density_2d(aes(fill = after_stat(level)), geom = "polygon", color = "black") +
  scale_fill_viridis_c() +
  labs(title = "2D Density Contour: Weight vs MPG",
       x = "Weight",
       y = "Miles per Gallon",
       fill = "Density Levels") +
  theme_minimal()

Output:

Creating a Panel of Plots: Create multiple plots and arrange them in a grid for side-by-side visualization.

library(gridExtra)

# Histograms for selected variables
hist_plot_mpg <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2, fill = "steelblue", color = "black") +
  labs(title = "Miles per Gallon Distribution", x = "MPG", y = "Frequency")

hist_plot_disp <- ggplot(mtcars, aes(x = disp)) +
  geom_histogram(binwidth = 50, fill = "darkred", color = "black") +
  labs(title = "Displacement Distribution", x = "Displacement", y = "Frequency")

hist_plot_hp <- ggplot(mtcars, aes(x = hp)) +
  geom_histogram(binwidth = 20, fill = "forestgreen", color = "black") +
  labs(title = "Horsepower Distribution", x = "Horsepower", y = "Frequency")

hist_plot_drat <- ggplot(mtcars, aes(x = drat)) +
  geom_histogram(binwidth = 0.5, fill = "orange", color = "black") +
  labs(title = "Drat Distribution", x = "Drat", y = "Frequency")

# Arrange plots in a 2x2 grid
grid.arrange(hist_plot_mpg, hist_plot_disp, hist_plot_hp, hist_plot_drat, ncol = 2)

Output:

Saving and Extracting Plots

To save plots as image files or reuse them later:

# Create a plot
plot <- ggplot(data = mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  labs(title = "Horsepower vs MPG")

# Save the plot as PNG
ggsave("horsepower_vs_mpg.png", plot)

# Save the plot as PDF
ggsave("horsepower_vs_mpg.pdf", plot)

# Extract the plot for reuse
extracted_plot <- plot
plot

Output:

December 13, 2025

Data Visualization in R Programming
Introduction to Data Visualization

Data Visualization is the process of converting raw data into visual representations such as graphs, charts, and plots so that information can be understood quickly and clearly. Humans understand visuals far more efficiently than tables of numbers, which makes visualization a critical step in data analysis.

In R, data visualization is one of the strongest features because R was originally designed for statistical analysis and graphical modeling. Visualization is not only used to present final results, but also to explore data, identify trends, patterns, anomalies, and relationships before applying models.

Why Data Visualization is Important
- Simplifies complex datasets
- Reveals hidden patterns and trends
- Helps detect outliers and errors
- Improves communication of results
- Supports decision-making
Graph Plotting in R

What is Graph Plotting?

Graph plotting refers to creating visual representations of data values using graphical elements such as points, lines, bars, or shapes. In R, graph plotting is mainly done using:
- Base R graphics
- Advanced systems like ggplot2, lattice
Base R graphics are foundational and widely used for learning concepts.

Generic Plotting System in R

R uses a generic plotting system, where the same function behaves differently based on the data type.

The most important generic function is:
```
plot()
```
The plot() function automatically determines:
- Type of plot
- Axis scaling
- Labels (if available)
This behavior is called method dispatch.
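
A small sketch can make the dispatch visible: the same plot() call produces a different kind of graph depending on the class of its argument. The sample vectors below are made up for the illustration, and getS3method() is used only to confirm that class-specific methods exist.

```r
x_num <- c(5, 10, 15, 20)
x_fac <- factor(c("low", "high", "low", "mid", "low"))

# Numeric vector: scatter plot of index vs value
plot(x_num)

# Factor: plot() dispatches to plot.factor and draws a bar chart of counts
plot(x_fac)

# Data frame: plot() dispatches to plot.data.frame and draws pairwise plots
plot(mtcars[, c("mpg", "wt", "hp")])

# Check that class-specific plot methods are registered
found <- sapply(c("factor", "data.frame"), function(cl) {
  !is.null(getS3method("plot", cl, optional = TRUE))
})
found
```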

Using the plot() Function

Basic Syntax
```
plot(x, y)
```
Example
```
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)

plot(x, y)
```
This produces a scatter plot, showing the relationship between x and y.

Types of Plots Using plot()

Scatter Plot

Used to analyze relationships between two numerical variables.
```
plot(x, y, type = "p")
```
Line Plot

Used to show trends over time or ordered data.
```
plot(x, y, type = "l")
```
Combined Points and Lines
```
plot(x, y, type = "b")
```
Vertical Line Plot
```
plot(x, y, type = "h")
```
Graphical Models in R

Introduction to Graphical Models

Graphical models in R are visual representations of statistical data and relationships. They are used to:
- Understand data distribution
- Visualize correlations
- Validate statistical assumptions
- Analyze model performance
Graphical models include:
- Scatter plots
- Histograms
- Boxplots
- Regression plots
- Residual plots
Example: Visualizing a Relationship
```
plot(mtcars$wt, mtcars$mpg)
```
This scatter plot shows how mileage (mpg) varies with car weight (wt), a common first step in statistical analysis.

Charts and Graphs in R

Common Chart Types

| Chart Type | Purpose |
|---|---|
| Line graph | Trends over time |
| Bar chart | Category comparison |
| Histogram | Distribution |
| Scatter plot | Relationship |
| Boxplot | Spread and outliers |

Choosing the correct chart is crucial to avoid misleading interpretation.

Adding Titles to a Graph

Main Title

The main title describes what the graph represents.
```
plot(x, y, main = "Relationship Between X and Y")
```
Axis Labels

Axis labels explain what each axis represents.
```
plot(x, y,
     main = "Sales Growth",
     xlab = "Months",
     ylab = "Revenue")
```
Clear labels are essential for readability.

Adding Colors to Charts

Importance of Colors

Colors:
- Improve readability
- Highlight differences
- Separate categories
- Make graphs visually appealing
Using col Argument
```
plot(x, y, col = "blue")
```
Using Multiple Colors
```
plot(x, y, col = c("red", "green", "blue", "orange", "black"))
```
Each point gets a different color.

Color in Bar Charts
```
scores <- c(Math = 80, Science = 90, English = 75)
barplot(scores, col = "skyblue")
```
Adding Text to Plots

Using text()

Used to label data points.
```
plot(x, y)
text(x, y, labels = y, pos = 3)
```
- pos controls label position
- Helps annotate important values
Using mtext()

Adds text in margins.
```
mtext("Data Source: Survey", side = 1, line = 3)
```
Adding Axis to a Plot

Default Axes

R automatically generates axes based on data range.

Custom Axes

Disable default axes:
```
plot(x, y, xaxt = "n", yaxt = "n")
```
Add custom axes:
```
axis(1, at = 1:5)
axis(2, at = seq(0, 10, 2))
box()
```
Custom axes provide better control.
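
For instance, custom axes make it possible to replace numeric tick marks with text labels. The following is a minimal sketch, assuming the x values stand for the first five months of the year; `month_labels` is a name introduced here, and month.abb is a constant built into R.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)

# Month abbreviations for positions 1 to 5
month_labels <- month.abb[x]

# Suppress only the default x-axis, then draw it with text labels
plot(x, y, type = "b", xaxt = "n",
     xlab = "Month", ylab = "Revenue")
axis(1, at = x, labels = month_labels)
```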

Axis Limits

Set axis limits manually:
```
plot(x, y, xlim = c(0, 6), ylim = c(0, 12))
```
Graphics Palette in R

What is a Graphics Palette?

A graphics palette defines the set of colors used when multiple colors are needed automatically.

View Current Palette
```
palette()
```
Set a Custom Palette
```
palette(c("red", "blue", "green", "orange"))
```
Reset:
```
palette("default")
```
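The palette matters whenever colors are given as numbers: a numeric col value is looked up by position in the current palette. Here is a short sketch of that lookup, with made-up group numbers; `old_palette` stores the previous palette so that it can be restored afterwards.

```r
# Set a custom palette and keep the previous one for later
old_palette <- palette(c("red", "blue", "green", "orange"))

# Hypothetical group membership for five points
groups <- c(1, 2, 3, 4, 1)

# col = groups picks palette color 1, 2, 3, 4, 1 for the five points
plot(1:5, c(2, 4, 6, 8, 10), col = groups, pch = 19)

# Look at the active palette, then restore the previous one
current <- palette()
palette(old_palette)
```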
Plotting Data Using Generic Plots

Plotting a Single Vector
```
v <- c(5, 10, 15, 20)
plot(v)
```
R plots index vs value.

Plotting Two Vectors
```
plot(x, y)
```
Plotting Data Frames
```
plot(mtcars)
```
This creates multiple pairwise plots.

Bar Charts in R

Introduction to Bar Charts

A bar chart displays data using rectangular bars. The length of each bar represents the value of a category.

Bar charts are ideal for:
- Comparing categories
- Displaying frequency counts
- Showing grouped data
Creating a Simple Bar Chart
```
scores <- c(80, 90, 75)
names(scores) <- c("Math", "Science", "English")

barplot(scores)
```
Adding Titles and Labels
```
barplot(scores,
        main = "Student Performance",
        xlab = "Subjects",
        ylab = "Marks",
        col = "lightblue")
```
Horizontal Bar Chart
```
barplot(scores, horiz = TRUE)
```
Grouped Bar Chart
```
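# Rows are groups and columns are categories (names are illustrative);
# the row names are what legend.text = TRUE shows in the legend
data <- matrix(c(80, 85, 90, 88), nrow = 2,
               dimnames = list(c("Term 1", "Term 2"), c("Math", "Science")))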

barplot(data,
        beside = TRUE,
        col = c("red", "blue"),
        legend.text = TRUE)
```
Stacked Bar Chart
```
barplot(data,
        col = c("orange", "green"),
        legend.text = TRUE)
```
Adding Values on Bars
```
bp <- barplot(scores)
text(bp, scores, labels = scores, pos = 3)
```
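Bar charts are also suited to frequency counts, as mentioned in the list of uses above. One way to get the counts is table(), which tallies how often each value occurs. The survey answers below are invented for the illustration.

```r
# Hypothetical survey answers
responses <- c("yes", "no", "yes", "maybe", "yes", "no")

# Count how often each answer occurs
counts <- table(responses)

# One bar per distinct answer, with height equal to its count
barplot(counts, main = "Survey Responses",
        xlab = "Answer", ylab = "Count", col = "lightblue")
```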
Common Mistakes in Visualization
- Missing titles or labels
- Overuse of colors
- Incorrect chart type
- Misleading scales
- Overcrowded graphs
Summary

Data visualization in R is a powerful tool for exploring and communicating data. Base R graphics provide flexible and customizable plotting options. Understanding titles, colors, axes, text annotations, palettes, and bar charts ensures clear, accurate, and effective visual communication.
December 13, 2025

Manipulate R Data Frames Using SQL

R Data Frames Using SQL in detail

The sqldf package in R enables seamless manipulation of data frames using SQL commands. It provides an efficient way to work with structured data and can be used to interact with a limited range of databases. Instead of using table names as in traditional SQL, sqldf allows you to specify data frame names, making it easy to execute queries within R.

Key Operations of `sqldf`

When executing an SQL statement on a data frame using sqldf, the following steps occur:

A temporary database is created with an appropriate schema.
The data frames are automatically loaded into this database.
The SQL query is executed.
The resulting output is returned as a new data frame in R.
The temporary database is automatically deleted after execution.

This approach optimizes calculations and improves efficiency by leveraging SQL operations.

install.packages("sqldf")
library(sqldf)

Loading Sample Data

For demonstration, we use two CSV files:

accidents.csv: Contains Year, Highway, Crash_Count, and Traffic.
routes.csv: Contains Highway, Region, and Distance.

Set the working directory and load the data:

setwd("C:/Users/User/Documents/R")
accidents <- read.csv("accidents.csv")
routes <- read.csv("routes.csv")

head(accidents)
tail(accidents)
print(routes)

Sample Output:

accidents.csv Data:

Year      Highway   Crash_Count Traffic
1 2000 Highway-101        30     50000
2 2001 Highway-101        35     52000
3 2002 Highway-101        40     54000

routes.csv Data:

Highway      Region    Distance
1 Highway-101  North Zone      200
2 Highway-405  South Zone      150

SQL Operations with `sqldf`

1. Performing a Left Join

library(tcltk)
join_query <- "SELECT accidents.*, routes.Region, routes.Distance
              FROM accidents
              LEFT JOIN routes ON accidents.Highway = routes.Highway"

accidents_routes <- sqldf(join_query, stringsAsFactors = FALSE)
head(accidents_routes)
tail(accidents_routes)

Sample Output:

Year     Highway   Crash_Count Traffic    Region    Distance
1 2000 Highway-101        30     50000 North Zone       200
2 2001 Highway-101        35     52000 North Zone       200
3 2002 Highway-101        40     54000 North Zone       200

2. Performing an Inner Join

inner_query <- "SELECT accidents.*, routes.Region, routes.Distance
                FROM accidents
                INNER JOIN routes ON accidents.Highway = routes.Highway"

accidents_routes_inner <- sqldf(inner_query, stringsAsFactors = FALSE)
head(accidents_routes_inner)
tail(accidents_routes_inner)

Sample Output:

Year     Highway   Crash_Count Traffic    Region    Distance
1 2000 Highway-101        30     50000 North Zone       200
2 2001 Highway-101        35     52000 North Zone       200

3. Using merge() for Joining Data Frames

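The merge() function in base R supports the same join types through its all, all.x, and all.y arguments: all.x = TRUE gives a left join (used below), all.y = TRUE gives a right join, and all = TRUE gives a full outer join. With none of them set, merge() performs an inner join.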

accidents_merge_routes <- merge(accidents, routes, by = "Highway", all.x = TRUE)
head(accidents_merge_routes)
tail(accidents_merge_routes)

Sample Output:

Highway Year Crash_Count Traffic    Region    Distance
1 Highway-101 2000        30     50000 North Zone       200
2 Highway-101 2001        35     52000 North Zone       200
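
To see how the join types differ, the sketch below runs merge() four ways on two tiny data frames. The data is invented for the illustration and is not the content of accidents.csv or routes.csv: one highway appears only in the accidents data and one only in the routes data, so each join keeps a different set of rows.

```r
# Hypothetical data: Highway-999 has no route, Highway-405 has no accidents
accidents <- data.frame(
  Year = c(2000, 2001, 2000),
  Highway = c("Highway-101", "Highway-101", "Highway-999"),
  Crash_Count = c(30, 35, 12)
)
routes <- data.frame(
  Highway = c("Highway-101", "Highway-405"),
  Region = c("North Zone", "South Zone")
)

inner_join <- merge(accidents, routes, by = "Highway")
left_join  <- merge(accidents, routes, by = "Highway", all.x = TRUE)
right_join <- merge(accidents, routes, by = "Highway", all.y = TRUE)
full_join  <- merge(accidents, routes, by = "Highway", all = TRUE)

# Row counts per join type
sapply(list(inner = inner_join, left = left_join,
            right = right_join, full = full_join), nrow)
```

Rows without a match on the other side are kept with NA in the missing columns, which is why the left, right, and full joins return more rows than the inner join here.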

4. Filtering Data Using WHERE Clause

filter_query <- "SELECT * FROM accidents
                WHERE Highway = 'Highway-405'"

filtered_data <- sqldf(filter_query, stringsAsFactors = FALSE)
head(filtered_data)

Sample Output:

Year      Highway  Crash_Count Traffic
1 2000 Highway-405         50    60000
2 2001 Highway-405         55    62000

5. Using Aggregate Functions

The GROUP BY clause helps perform aggregate calculations.

aggregate_query <- "SELECT Highway, AVG(Crash_Count) AS Avg_Crashes
                    FROM accidents
                    GROUP BY Highway"

sqldf(aggregate_query)

Sample Output:

Highway    Avg_Crashes
1 Highway-101        35.5
2 Highway-405        52.5
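
For comparison, the same kind of grouped average can be computed in base R with aggregate(). This is a sketch on invented numbers, not the contents of accidents.csv, so the results differ from the sample output above.

```r
# Hypothetical crash counts for two highways
accidents <- data.frame(
  Highway = c("Highway-101", "Highway-101", "Highway-405", "Highway-405"),
  Crash_Count = c(30, 40, 50, 55)
)

# Mean of Crash_Count within each Highway, like AVG(...) GROUP BY Highway
avg <- aggregate(Crash_Count ~ Highway, data = accidents, FUN = mean)

# Rename the result column to match the SQL alias
names(avg)[2] <- "Avg_Crashes"
avg
```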

6. Using plyr for Advanced Aggregation

For more advanced calculations, the plyr package is useful.

library(plyr)
ddply(accidents_merge_routes, .(Highway), function(X) {
  data.frame(
    Avg_Crashes = mean(X$Crash_Count),
    Q1_Crashes = quantile(X$Crash_Count, 0.25),
    Q3_Crashes = quantile(X$Crash_Count, 0.75),
    Median_Crashes = median(X$Crash_Count)
  )
})

Output:

Highway  Avg_Crashes  Q1_Crashes  Q3_Crashes  Median_Crashes
1 Highway-101      35.5       32.5       38.5           35
2 Highway-405      52.5       50.5       54.5           52.5

December 13, 2025

Database Connectivity with R Programming

Database Connectivity with R in detail

A database is a structured collection of organized data that allows easy access, storage, and management. It can be handled using a Database Management System (DBMS), which is specialized software for managing databases efficiently. A database contains related and structured data that can be stored and retrieved when needed.

A database primarily supports data storage, retrieval, and manipulation through various sublanguages:

Data Definition Language (DDL)
Data Query Language (DQL)
Data Manipulation Language (DML)
Data Control Language (DCL)
Transaction Control Language (TCL)

Step 1: Install MySQL

To begin, download and install MySQL from its official website.

Once installed, create a new database in MySQL using the following command:

CREATE DATABASE studentDB;

Step 2: Install RStudio

To write and execute R scripts, install RStudio from its official website.

Step 3: Install MySQL Library in R

In RStudio, install the MySQL package with the command:

install.packages("RMySQL")

Now, execute the following R script to connect MySQL with R:

# Load the RMySQL library
library(RMySQL)

# Establish a connection to MySQL database
mysql_connection = dbConnect(MySQL(),
                             user = 'root',
                             password = 'root',
                             dbname = 'studentDB',
                             host = 'localhost')

# List available tables in the database
dbListTables(mysql_connection)

# Creating a table in MySQL database
dbSendQuery(mysql_connection, "CREATE TABLE students (id INT, name VARCHAR(20));")

# Inserting records into the table
dbSendQuery(mysql_connection, "INSERT INTO students VALUES (201, 'Rahul');")
dbSendQuery(mysql_connection, "INSERT INTO students VALUES (202, 'Neha');")
dbSendQuery(mysql_connection, "INSERT INTO students VALUES (203, 'Ankit');")

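# Retrieving records from the table
query_result = dbSendQuery(mysql_connection, "SELECT * FROM students")

# Storing result in an R data frame
data_frame = fetch(query_result)

# Displaying the data frame
print(data_frame)

# Releasing the result set and closing the connection
dbClearResult(query_result)
dbDisconnect(mysql_connection)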

Output:

id   name
1 201  Rahul
2 202  Neha
3 203  Ankit

December 13, 2025

Working with Databases in R Programming
Working with Databases in detail

In R, working with datasets is a crucial aspect of statistical analysis and visualization. Instead of manually creating datasets in the console each time, we can retrieve structured and normalized data directly from relational databases such as MySQL, Oracle, and SQL Server. This integration allows for seamless data manipulation and visualization within R.

This guide focuses on MySQL connectivity in R, covering database connection, table creation, deletion, data insertion, updating, and querying.

RMySQL Package

The RMySQL package, available from CRAN, handles communication between R and MySQL databases. It must be installed and loaded before connecting to MySQL.

Installation
```
install.packages("RMySQL")
```
Establishing Connection to MySQL

To connect to MySQL, the dbConnect() function is used, which requires a database driver along with authentication credentials such as username, password, database name, and host details.

Syntax:
```
dbConnect(drv, user, password, dbname, host)
```
Parameters
- drv – Specifies the database driver
- user – MySQL username
- password – Corresponding password
- dbname – Name of the database
- host – Server hosting the database
Example: Connecting to MySQL Database
```
# Load necessary library
library("RMySQL")

# Establish connection
conn <- dbConnect(MySQL(), user = 'admin', password = 'mypassword',
                  dbname = 'SampleDB', host = 'localhost')

# Display available tables
dbListTables(conn)
```
Output:
```
Loading required package: DBI
[1] "employees"
```
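The example above hardcodes the username and password in the script. As a minimal sketch of one alternative, the credentials can be read from environment variables instead. The variable names MYSQL_USER, MYSQL_PASSWORD, MYSQL_DB, and MYSQL_HOST and the helper connect_sample_db() are hypothetical and used only for illustration.

```r
library(DBI)

# Hypothetical helper: builds a connection from environment variables
# (MYSQL_USER, MYSQL_PASSWORD, MYSQL_DB, MYSQL_HOST are assumed names).
connect_sample_db <- function() {
  dbConnect(
    RMySQL::MySQL(),
    user     = Sys.getenv("MYSQL_USER"),
    password = Sys.getenv("MYSQL_PASSWORD"),
    dbname   = Sys.getenv("MYSQL_DB", unset = "SampleDB"),
    host     = Sys.getenv("MYSQL_HOST", unset = "localhost")
  )
}

# Usage (requires a running MySQL server):
# conn <- connect_sample_db()
# dbListTables(conn)
# dbDisconnect(conn)  # release the connection when finished
```

Calling dbDisconnect() once the work is done releases the connection on the server side.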
Creating a Table in MySQL Using R

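A table can be created in MySQL from R using the dbWriteTable() function, which writes a data frame to the database. By default the call fails if the table already exists; pass overwrite = TRUE to replace the table or append = TRUE to add rows to it.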

Syntax
```
dbWriteTable(conn, name, value)
```
Parameters
- conn – Connection object
- name – Name of the MySQL table
- value – Dataframe to be converted into a MySQL table
Example: Creating a Table
```
# Establish connection
conn <- dbConnect(MySQL(), user = 'admin', password = 'mypassword',
                  dbname = 'SampleDB', host = 'localhost')

# Create new table with selected data
dbWriteTable(conn, "iris_table", iris[1:10, ], overwrite = TRUE)
```
Output:
```
[1] TRUE
```
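The choice between replacing a table and adding rows to it can be made explicit. The following is a rough sketch, assuming an open connection conn; the helper name write_table_safely() and its replace argument are hypothetical.

```r
library(DBI)

# Hypothetical helper: write a data frame, stating explicitly what
# happens when the target table already exists.
write_table_safely <- function(conn, name, value, replace = FALSE) {
  if (dbExistsTable(conn, name) && !replace) {
    # Table is present and must be kept: add the new rows to it
    dbWriteTable(conn, name, value, append = TRUE)
  } else {
    # Table is missing, or replacement was requested: (re)create it
    dbWriteTable(conn, name, value, overwrite = TRUE)
  }
}

# Usage (requires a running MySQL server):
# write_table_safely(conn, "iris_table", iris[1:10, ])
# write_table_safely(conn, "iris_table", iris[11:20, ])            # appends
# write_table_safely(conn, "iris_table", iris[1:5, ], replace = TRUE)
```

Appending only works when the data frame's columns match those of the existing table.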
Deleting a Table in MySQL Using R

A table is deleted by sending a DROP TABLE statement to the server. The dbSendQuery() function executes SQL statements such as this one directly in MySQL from R.

Syntax:
```
dbSendQuery(conn, statement)
```
Parameters
- conn – Connection object
- statement – SQL command to be executed
Example: Dropping a Table
```
# Establish connection
conn <- dbConnect(MySQL(), user = 'admin', password = 'mypassword',
                  dbname = 'SampleDB', host = 'localhost')

# Drop existing table
dbSendQuery(conn, 'DROP TABLE iris_table')
```
Output:
```
<MySQLResult:9845732, 3, 4>
```
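Dropping a table that does not exist raises an error. One way to guard against that, sketched here under the assumption of an open connection conn, is to check with dbExistsTable() first and use DBI's dbRemoveTable(). The helper name drop_table_if_exists() is hypothetical.

```r
library(DBI)

# Hypothetical helper: drop a table only when it is present.
# Returns TRUE if a table was dropped, FALSE otherwise.
drop_table_if_exists <- function(conn, name) {
  if (dbExistsTable(conn, name)) {
    dbRemoveTable(conn, name)
    TRUE
  } else {
    FALSE
  }
}

# Usage (requires a running MySQL server):
# drop_table_if_exists(conn, "iris_table")
```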
Inserting Data into MySQL Table Using R

Data can be inserted into a MySQL table from R using SQL INSERT INTO queries.

Example: Inserting Data
```
# Establish connection
conn <- dbConnect(MySQL(), user = 'admin', password = 'mypassword',
                  dbname = 'SampleDB', host = 'localhost')

# Insert new record into employees table
dbSendQuery(conn, "INSERT INTO employees(id, name) VALUES (1, 'John Doe')")
```
Output:
```
<MySQLResult:9845732, 3, 5>
```
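Pasting values into an SQL string by hand breaks as soon as a value contains a quote character. As an illustrative sketch, DBI's sqlInterpolate() fills placeholders and quotes the values. ANSI() is used here as a stand-in connection so the snippet runs without a server, and the inserted values are made up.

```r
library(DBI)

# ANSI() is a dummy connection that needs no server. With a live
# connection, pass conn instead so MySQL's own quoting rules apply.
sql <- sqlInterpolate(
  ANSI(),
  "INSERT INTO employees(id, name) VALUES (?id, ?name)",
  id = 2L,
  name = "Conan O'Brien"   # the apostrophe is escaped automatically
)
print(sql)

# With a live connection, the statement could then be run with:
# dbExecute(conn, sql)
```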
Updating Data in a MySQL Table Using R

An existing record in the table can be modified using the UPDATE query.

Example: Updating a Table
```
# Establish connection
conn <- dbConnect(MySQL(), user = 'admin', password = 'mypassword',
                  dbname = 'SampleDB', host = 'localhost')

# Update a record in employees table
dbSendQuery(conn, "UPDATE employees SET name = 'Jane Doe' WHERE id = 1")
```
Output:
```
<MySQLResult:-1, 3, 6>
```
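The result object printed above does not say whether any row was actually modified. A possible sketch, assuming an open connection conn, uses dbExecute(), which returns the number of affected rows. The helper name rename_employee() is hypothetical.

```r
library(DBI)

# Hypothetical helper: update one employee's name and report how many
# rows the statement changed.
rename_employee <- function(conn, id, new_name) {
  sql <- sqlInterpolate(
    conn,
    "UPDATE employees SET name = ?name WHERE id = ?id",
    name = new_name,
    id = id
  )
  n <- dbExecute(conn, sql)  # number of affected rows
  if (n == 0) {
    warning("No row was changed for id ", id)
  }
  n
}

# Usage (requires a running MySQL server):
# rename_employee(conn, 1L, "Jane Doe")
```

A result of 0 can mean that no row has the given id, or that the row already held the new value.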
Retrieving Data from MySQL Using R

To fetch data from MySQL, the dbSendQuery() function is used to send a SQL SELECT statement. The retrieved data can be stored in a dataframe using the fetch() function.

Example:
```
# Establish connection
conn <- dbConnect(MySQL(), user = 'admin', password = 'mypassword',
                  dbname = 'SampleDB', host = 'localhost')

# Fetch records from employees table
res <- dbSendQuery(conn, "SELECT * FROM employees")

# Retrieve first 3 rows as dataframe
df <- fetch(res, n = 3)
print(df)
```
Output:
```
  id       name
1  1   John Doe
2  2  Alice Ray
3  3 Mark Smith
```
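The n argument also makes it possible to read a large result in pieces. The sketch below assumes an open connection conn; the helper name fetch_in_chunks() and the default chunk size are hypothetical. It fetches until dbHasCompleted() reports that no rows remain, then frees the result set.

```r
library(DBI)

# Hypothetical helper: read a query result chunk by chunk and combine
# the pieces into one data frame.
fetch_in_chunks <- function(conn, sql, chunk_size = 1000) {
  res <- dbSendQuery(conn, sql)
  # Always free the result set, even if fetching fails
  on.exit(dbClearResult(res), add = TRUE)

  chunks <- list()
  while (!dbHasCompleted(res)) {
    chunks[[length(chunks) + 1]] <- dbFetch(res, n = chunk_size)
  }
  do.call(rbind, chunks)
}

# Usage (requires a running MySQL server):
# df <- fetch_in_chunks(conn, "SELECT * FROM employees", chunk_size = 3)
# print(df)
```

For small tables, dbGetQuery(conn, sql) does the send, fetch, and clear steps in a single call.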
December 13, 2025
