Reading Tabular Data from files in R Programming
Reading Tabular Data in detail

In data analysis, it is often necessary to read and process data stored outside the R environment. Importing data into R is a crucial step in such cases. R supports multiple file formats, including CSV, JSON, Excel, Text, and XML. Most data is available in tabular format, and R provides functions to read this structured data into a data frame. Data frames are widely used in R because they facilitate data extraction from rows and columns, making statistical computations easier than with other data structures.

Common Functions for Importing Data into R

The most frequently used functions for reading tabular data into R are:
- read.table()
- read.csv()
- fromJSON()
- read.xlsx()
Reading Data from a Text File

The read.table() function is used to read tabular data from a text file.

Parameters:
- file: Specifies the file name.
- header: A logical flag indicating if the first line contains column names.
- nrows: Specifies the number of rows to read.
- skip: Skips a specified number of lines from the beginning.
- colClasses: A character vector indicating the class of each column.
- sep: A string that defines column separators (e.g., commas, spaces, tabs).
For small or moderately sized datasets, read.table() can be called with only the file name. R then works out the number of rows and columns and the class of each column on its own, and it skips lines that begin with # (comments). Supplying arguments such as colClasses and nrows spares R that detection work, which makes reading faster and less memory-hungry for large datasets.
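
To illustrate these arguments, here is a minimal sketch that writes a small sample file to a temporary location and reads it back with explicit colClasses and nrows. The file contents and the names tmp and df are assumptions made for the example.

```r
# Write a small whitespace-separated file to a temporary path (hypothetical data)
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "# a comment line that read.table() skips",
  "Name Age Salary",
  "John 28 50000",
  "Emma 25 60000",
  "Alex 30 70000"
), tmp)

# Declare the column classes and the row count up front
# so that R does not have to infer them from the data
df <- read.table(tmp,
                 header = TRUE,
                 colClasses = c("character", "integer", "numeric"),
                 nrows = 3)

str(df)
```

On a file this small the gain is negligible; the same call pattern pays off when the file has millions of rows.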

Example:

Assume a text file data.txt in the current directory contains the following data:
```
Name Age Salary
John  28  50000
Emma  25  60000
Alex  30  70000
```
Reading the file in R:
```
read.table("data.txt", header=TRUE)
```
Output:
```
Name Age Salary
1  John  28 50000
2  Emma  25 60000
3  Alex  30 70000
```
Reading Data from a CSV File

The read.csv() function is used for reading CSV files, which are commonly generated by spreadsheet applications like Microsoft Excel. It is similar to read.table() but uses a comma as the default separator and assumes header=TRUE by default.

Example:

Assume a CSV file data.csv contains the following:
```
Name,Age,Salary
John,28,50000
Emma,25,60000
Alex,30,70000
```
Reading the file in R:
```
read.csv("data.csv")
```
Output:
```
Name Age Salary
1  John  28 50000
2  Emma  25 60000
3  Alex  30 70000
```
Memory Considerations

For large files, it is essential to estimate the memory required before loading data. The approximate memory needed for a dataset with 2,000,000 rows and 200 numeric columns can be calculated as:
```
2000000 x 200 x 8 bytes = 3.2 GB
```
Since R requires additional memory for processing, at least twice this amount (6.4 GB) should be available.
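
The same estimate can be reproduced in R. The sketch below repeats the arithmetic and then checks the 8-bytes-per-value assumption on a much smaller matrix; the variable names are chosen for the example only.

```r
# Estimate the memory needed for a purely numeric table (8 bytes per double)
n_rows <- 2e6
n_cols <- 200
bytes_needed <- n_rows * n_cols * 8
gb_needed <- bytes_needed / 1e9

cat("Estimated size:", gb_needed, "GB\n")
cat("Suggested free memory:", 2 * gb_needed, "GB\n")

# Sanity check on a smaller numeric matrix: object.size() reports the
# payload (rows x cols x 8 bytes) plus a small fixed header
small <- matrix(0, nrow = 1000, ncol = n_cols)
print(object.size(small), units = "MB")
```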

Reading Data from a JSON File

The fromJSON() function from the rjson package is used to import JSON data into R.

Installation:
```
install.packages("rjson")
```
Example:

Assume a JSON file data.json contains:
```
{
  "Name": ["John", "Emma", "Alex"],
  "Age": [28, 25, 30],
  "Salary": [50000, 60000, 70000]
}
```
Reading the JSON file in R:
```
library(rjson)
data <- fromJSON(file="data.json")
as.data.frame(data)
```
Reading Excel Sheets

The read.xlsx() function is used to import Excel worksheets into R. It requires the xlsx package.

Installation:
```
install.packages("xlsx")
```
Example:

Assume an Excel file data.xlsx with the following content:

Name Age Salary
John 28 50000
Emma 25 60000
Alex 30 70000

Reading the first sheet:
```
library("xlsx")
read.xlsx("data.xlsx", 1)
```
Output:
```
Name Age Salary
1  John  28 50000
2  Emma  25 60000
3  Alex  30 70000
```
For large datasets (over 100,000 cells), read.xlsx2() is preferred as it works faster by using the readColumns() function optimized for tabular data.

By using these functions, data can be efficiently imported into R for further processing and analysis.
December 13, 2025


Working with JSON Files in R Programming

Working with JSON Files in detail

JSON (JavaScript Object Notation) is a widely used data format that stores information in a structured and readable manner, using text-based key-value pairs. Just like other files, JSON files can be both read and written in R. To work with JSON files in R, we need to install and use the rjson package.

Common JSON Operations in R

Using the rjson package, we can perform various tasks, including:

Installing and loading the rjson package
Creating a JSON file
Reading data from a JSON file
Writing data into a JSON file
Converting JSON data into a dataframe
Extracting data from URLs

Installing and Loading the `rjson` Package

To use JSON functionality in R, install the rjson package using the command below:

install.packages("rjson")

Once installed, load the package into the R environment using:

library("rjson")

Creating a JSON File

To create a JSON file, follow these steps:

Open a text editor (such as Notepad) and enter data in the JSON format.
Save the file with a .json extension (e.g., sample.json).

Example JSON Data:

{
   "EmployeeID":["101","102","103","104","105"],
   "Name":["Amit","Rohit","Sneha","Priya","Karan"],
   "Salary":["55000","63000","72000","80000","59000"],
   "JoiningDate":["2015-03-25","2018-07-10","2020-01-15","2017-09-12","2019-05-30"],
   "Department":["IT","HR","Finance","Operations","Marketing"]
}

Reading a JSON File in R

The fromJSON() function helps read and parse JSON data from a file. The extracted data is stored as a list by default.

Example Code:

# Load required package
library("rjson")

# Read the JSON file from a specified location
data <- fromJSON(file = "D:\\sample.json")

# Print the data
print(data)

Output:

$EmployeeID
[1] "101" "102" "103" "104" "105"

$Name
[1] "Amit"  "Rohit" "Sneha" "Priya" "Karan"

$Salary
[1] "55000" "63000" "72000" "80000" "59000"

$JoiningDate
[1] "2015-03-25" "2018-07-10" "2020-01-15" "2017-09-12" "2019-05-30"

$Department
[1] "IT"         "HR"         "Finance"    "Operations" "Marketing"

Writing Data to a JSON File in R

To write data into a JSON file, we first convert data into a JSON object using the toJSON() function and then use the write() function to store it in a file.

Example Code:

# Load the required package
library("rjson")

# Creating a list with sample data
data_list <- list(
  Fruits = c("Apple", "Banana", "Mango"),
  Category = c("Fruit", "Fruit", "Fruit")
)

# Convert list to JSON format
json_output <- toJSON(data_list)

# Write JSON data to a file
write(json_output, "output.json")

# Read and print the created JSON file
result <- fromJSON(file = "output.json")
print(result)

Output:

$Fruits
[1] "Apple"  "Banana" "Mango"

$Category
[1] "Fruit"  "Fruit"  "Fruit"

Converting JSON Data into a Dataframe

In R, JSON data can be transformed into a dataframe using as.data.frame(), allowing easy manipulation and analysis.

Example Code:

# Load required package
library("rjson")

# Read JSON file
data <- fromJSON(file = "D:\\sample.json")

# Convert JSON data to a dataframe
json_df <- as.data.frame(data)

# Print the dataframe
print(json_df)

Output:

  EmployeeID  Name Salary JoiningDate Department
1        101  Amit  55000  2015-03-25         IT
2        102 Rohit  63000  2018-07-10         HR
3        103 Sneha  72000  2020-01-15    Finance
4        104 Priya  80000  2017-09-12 Operations
5        105 Karan  59000  2019-05-30  Marketing
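
Because every value in sample.json is quoted, all columns of the resulting data frame are character vectors. The sketch below shows one way to convert the columns needed for calculations. To keep it runnable without the file, the list is built inline to mimic the shape of the fromJSON() result; that inline list is an assumption of the example.

```r
# A list shaped like the fromJSON() result above (built inline for the sketch)
data <- list(
  EmployeeID = c("101", "102", "103"),
  Name = c("Amit", "Rohit", "Sneha"),
  Salary = c("55000", "63000", "72000"),
  JoiningDate = c("2015-03-25", "2018-07-10", "2020-01-15")
)

json_df <- as.data.frame(data, stringsAsFactors = FALSE)

# Quoted JSON values arrive as character, so convert the columns
# that will be used in numeric or date calculations
json_df$Salary <- as.numeric(json_df$Salary)
json_df$JoiningDate <- as.Date(json_df$JoiningDate)

str(json_df)
print(mean(json_df$Salary))
```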

Working with JSON Data from a URL

JSON data can be extracted from online sources using either the jsonlite or RJSONIO package.

Example Code:

# Load the required package
library(RJSONIO)

# Fetch JSON data from a URL
data_url <- fromJSON("https://api.publicapis.org/entries")

# Extract specific fields
API_Names <- sapply(data_url$entries, function(x) x$API)

# Display first few API names
head(API_Names)

Output:

[1] "AdoptAPet" "Axolotl" "Cat Facts" "Dog CEO" "Fun Translations"

December 13, 2025

Working with Excel Files in R Programming

Working with Excel Files in detail

Excel workbooks are stored with the .xls or .xlsx extension, and Excel can also open and save plain-text .csv (comma-separated values) files. To work with an Excel file in R, first import it into an R session, whether in RStudio or any other R-compatible Integrated Development Environment (IDE).

Reading Excel Files in R

Before reading Excel files, the readxl package must be installed and loaded. Below is an example demonstrating how to do so.

Example Excel Files:

data1.xlsx:

ID    Name    Age
1     Alex    25
2     Bob     30
3     Cathy   22

data2.xlsx:

ID    City       Country
1     New York   USA
2     London     UK
3     Sydney     Australia

Reading Files from the Working Directory

# Installing the required package
install.packages("readxl")

# Loading the package
library(readxl)

# Importing Excel files
data1 <- read_excel("data1.xlsx")
data2 <- read_excel("data2.xlsx")

# Printing the data
head(data1)
head(data2)

Output:

data1:

ID   Name    Age
1  1   Alex    25
2  2   Bob     30
3  3   Cathy   22

data2:

ID    City      Country
1  1    New York USA
2  2    London   UK
3  3    Sydney   Australia

Deleting Content from Files

Columns can be removed by position with negative indexing (the - sign). Below, the second column of data1 (Name) and the third column of data2 (Country) are dropped.

# Deleting columns
data1 <- data1[-2]
data2 <- data2[-3]

# Printing updated data
head(data1)
head(data2)

Output:

data1:

ID   Age
1  1   25
2  2   30
3  3   22

data2:

ID    City
1  1    New York
2  2    London
3  3    Sydney
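
Positional indexing breaks silently if the column order changes, so dropping a column by name is often safer. The sketch below is one way to do that in base R; the data frame is rebuilt inline so the example runs without the Excel file, and data1_trimmed is a name chosen for illustration.

```r
# Sample data mirroring data1.xlsx (built inline so the sketch runs on its own)
data1 <- data.frame(
  ID = 1:3,
  Name = c("Alex", "Bob", "Cathy"),
  Age = c(25, 30, 22)
)

# Keep every column except the one called "Name"
data1_trimmed <- data1[, setdiff(names(data1), "Name"), drop = FALSE]

print(names(data1_trimmed))
print(data1_trimmed)
```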

Writing Data to New Excel Files

After making modifications, the datasets can be saved into new Excel files using the writexl package.

# Installing the package
install.packages("writexl")

# Loading the package
library(writexl)

# Writing modified data to new Excel files
write_xlsx(data1, "Updated_data1.xlsx")
write_xlsx(data2, "Updated_data2.xlsx")

These files will be saved in the current working directory. The final datasets include all modifications and can be used for further analysis.

December 13, 2025

Working with XML Files in R Programming

Working with XML Files in detail

XML, short for Extensible Markup Language, is composed of markup tags where each tag represents specific data within an XML file. To manipulate XML files in R, we need to use the XML package, which must be installed explicitly using the following command:

install.packages("XML")

Creating an XML File

An XML file is structured using hierarchical tags that contain information about data. It must be saved with a .xml extension.

Consider the following XML file named students.xml:

<STUDENTS>
  <STUDENT>
      <ID>101</ID>
      <NAME>Rahul</NAME>
      <SCORE>750</SCORE>
      <DEPARTMENT>Science</DEPARTMENT>
  </STUDENT>
  <STUDENT>
      <ID>102</ID>
      <NAME>Sneha</NAME>
      <SCORE>540</SCORE>
      <DEPARTMENT>Arts</DEPARTMENT>
  </STUDENT>
  <STUDENT>
      <ID>103</ID>
      <NAME>Amit</NAME>
      <SCORE>680</SCORE>
      <DEPARTMENT>Commerce</DEPARTMENT>
  </STUDENT>
  <STUDENT>
      <ID>104</ID>
      <NAME>Priya</NAME>
      <SCORE>720</SCORE>
      <DEPARTMENT>Science</DEPARTMENT>
  </STUDENT>
  <STUDENT>
      <ID>105</ID>
      <NAME>Varun</NAME>
      <SCORE>590</SCORE>
      <DEPARTMENT>Science</DEPARTMENT>
  </STUDENT>
</STUDENTS>

Reading an XML File in R

After installing the required package, we can read and parse an XML file using the xmlParse() function. This function takes the filename as an argument and returns a parsed XML document object that the other functions in the package can navigate.

# Load necessary libraries
library("XML")
library("methods")

# Parse the XML file
student_data <- xmlParse(file = "students.xml")

print(student_data)

Output:

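<?xml version="1.0"?>
<STUDENTS>
  <STUDENT>
      <ID>101</ID>
      <NAME>Rahul</NAME>
      <SCORE>750</SCORE>
      <DEPARTMENT>Science</DEPARTMENT>
  </STUDENT>
  <STUDENT>
      <ID>102</ID>
      <NAME>Sneha</NAME>
      <SCORE>540</SCORE>
      <DEPARTMENT>Arts</DEPARTMENT>
  </STUDENT>
  <STUDENT>
      <ID>103</ID>
      <NAME>Amit</NAME>
      <SCORE>680</SCORE>
      <DEPARTMENT>Commerce</DEPARTMENT>
  </STUDENT>
  <STUDENT>
      <ID>104</ID>
      <NAME>Priya</NAME>
      <SCORE>720</SCORE>
      <DEPARTMENT>Science</DEPARTMENT>
  </STUDENT>
  <STUDENT>
      <ID>105</ID>
      <NAME>Varun</NAME>
      <SCORE>590</SCORE>
      <DEPARTMENT>Science</DEPARTMENT>
  </STUDENT>
</STUDENTS>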

Extracting Information from an XML File

Using R, we can extract specific details from the XML structure, such as the number of nodes, specific elements, or attributes.

# Load required libraries
library("XML")
library("methods")

# Parse the XML file
parsed_data <- xmlParse(file = "students.xml")

# Extract the root node
root_node <- xmlRoot(parsed_data)

# Count the number of nodes
total_nodes <- xmlSize(root_node)

# Retrieve a specific record (2nd student)
second_student <- root_node[2]

# Extract a particular element (SCORE of the 4th student) and read its text
specific_score <- xmlValue(root_node[[4]][[3]])

cat('Total number of students:', total_nodes, '\n')
cat('Details of the 2nd student:\n')
print(second_student)

cat('Score of the 4th student:', specific_score, '\n')

Output:

Total number of students: 5
Details of the 2nd student:
$STUDENT
<STUDENT>
  <ID>102</ID>
  <NAME>Sneha</NAME>
  <SCORE>540</SCORE>
  <DEPARTMENT>Arts</DEPARTMENT>
</STUDENT>

Score of the 4th student: 720

Converting XML to a Data Frame

To improve readability and ease of analysis, XML data can be converted into a structured data frame using the xmlToDataFrame() function in R.

# Load required libraries
library("XML")
library("methods")

# Convert XML to a data frame
student_df <- xmlToDataFrame("students.xml")
print(student_df)

Output:

ID    NAME   SCORE   DEPARTMENT
1   101   Rahul   750   Science
2   102   Sneha   540   Arts
3   103   Amit    680   Commerce
4   104   Priya   720   Science
5   105   Varun   590   Science

December 13, 2025

Working with CSV files in R Programming

Working with CSV files in detail

In this article, we will explore how to handle CSV files in the R programming language.

Understanding CSV Files in R

CSV (Comma-Separated Values) files are plain text files where data is stored in tabular form, with values in each row separated by a delimiter such as a comma or tab. We will use a sample CSV file for demonstration purposes.

Managing the Working Directory in R

Before working with a CSV file, it is essential to check and set the working directory where the file is located.

# Display the current working directory
print(getwd())

# Change the working directory
setwd("C:/Users/DataScience/Documents")

# Confirm the new working directory
print(getwd())

Output:

[1] "C:/Users/DataScience/Documents"
[1] "C:/Users/DataScience/Documents"

Using the getwd() function, we can retrieve the current working directory, and with setwd(), we can modify it as needed.
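
A common pattern is to switch directories temporarily and then return to the starting point. The sketch below relies on the fact that setwd() returns the previous directory; it uses tempdir() as the target only so that it runs on any machine, and the names start_dir, old_dir, and csv_path are assumptions of the example.

```r
# Remember where the session started
start_dir <- getwd()

# setwd() invisibly returns the previous directory, which makes it easy to restore
old_dir <- setwd(tempdir())
print(getwd())

# Build a file path from its parts instead of pasting separators by hand
csv_path <- file.path(getwd(), "employees.csv")
print(csv_path)

# Return to the directory the session started in
setwd(old_dir)
```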

Sample CSV File for Input

id,name,department,salary,projects
1,Alex,IT,75000,4
2,Brian,HR,67000,3
3,Clara,Marketing,72000,5
4,Daniel,Sales,58000,2
5,Emma,Tech,65000,3
6,Frank,IT,70000,6
7,Grace,HR,69000,4

Save this file as employees.csv to use it in R.

Reading a CSV File in R

The read.csv() function allows us to read the contents of a CSV file into a data frame.

Example:

# Load the CSV file as a data frame
csv_data <- read.csv(file = 'C:\\Users\\DataScience\\Documents\\employees.csv')
print(csv_data)

# Display the number of columns
print(ncol(csv_data))

# Display the number of rows
print(nrow(csv_data))

Output:

id   name  department  salary  projects
1  1   Alex        IT   75000        4
2  2  Brian        HR   67000        3
3  3  Clara Marketing   72000        5
4  4 Daniel    Sales   58000        2
5  5   Emma      Tech   65000        3
6  6  Frank        IT   70000        6
7  7  Grace        HR   69000        4
[1] 5
[1] 7

The read.csv() function reads the file and stores it as a data frame in R. The ncol() and nrow() functions return the number of columns and rows, respectively.

Filtering Data from a CSV File

We can perform queries on the data using functions like subset() and logical conditions.

Finding the Minimum Value

# Find the minimum number of projects
min_projects <- min(csv_data$projects)
print(min_projects)

Output:

[1] 2

Filtering Employees with Salary Above 65000

# Select 'name' and 'salary' columns for employees with salary greater than 65000
result <- csv_data[csv_data$salary > 65000, c("name", "salary")]

# Display the filtered result
print(result)

Output:

name salary
1  Alex  75000
2 Brian  67000
3 Clara  72000
6 Frank  70000
7 Grace  69000

The subset of employees meeting the condition is stored as a new data frame.
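
The same kind of query can be written with subset(), which some readers find easier to follow than bracket indexing. The sketch below rebuilds the sample data inline so it runs without employees.csv; the names high_paid and it_high_paid are chosen for illustration.

```r
# Sample data matching employees.csv (built inline so the sketch runs on its own)
csv_data <- data.frame(
  id = 1:7,
  name = c("Alex", "Brian", "Clara", "Daniel", "Emma", "Frank", "Grace"),
  department = c("IT", "HR", "Marketing", "Sales", "Tech", "IT", "HR"),
  salary = c(75000, 67000, 72000, 58000, 65000, 70000, 69000),
  projects = c(4, 3, 5, 2, 3, 6, 4)
)

# subset() takes the row condition and, optionally, the columns to keep
high_paid <- subset(csv_data, salary > 65000, select = c(name, salary))
print(high_paid)

# Conditions can be combined with & (and) or | (or)
it_high_paid <- subset(csv_data, salary > 65000 & department == "IT")
print(it_high_paid$name)
```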

Writing Data to a CSV File

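R exports data frames to CSV files with write.csv(). The example below first computes the average salary for each department and then writes that summary to a file. The output file name avg_salary.csv is only an illustration.

# Calculate the average salary for each department
avg_salary <- tapply(csv_data$salary, csv_data$department, mean)

# Display the results
print(avg_salary)

# Turn the summary into a data frame and export it
avg_df <- data.frame(department = names(avg_salary),
                     avg_salary = as.vector(avg_salary))
write.csv(avg_df, file = "avg_salary.csv", row.names = FALSE)

Output:

       HR        IT Marketing     Sales      Tech
    68000     72500     72000     58000     65000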

December 13, 2025

Exporting Data from scripts in R Programming

Exporting Data in detail

When a program terminates, all data held in the program is lost. To ensure data persistence, we store the fetched information in files. This enables transferring data across systems and prevents re-entering large datasets. Files can be stored in formats such as .txt, .csv, or even in online/cloud storage. R provides straightforward methods to export data to these file types.

Exporting Data to a Text File

Text files are a common format for data storage. R provides methods like write.table() to export data frames or matrices to text files.

write.table(): The write.table() function writes a data frame or matrix to a text file.

Syntax:

write.table(x, file, append = FALSE, sep = " ", dec = ".", row.names = TRUE, col.names = TRUE)

Parameters:

x: Data frame or matrix to be written.
file: File name as a string.
sep: Field separator (e.g., \t for tab-separated values).
dec: Decimal separator (default is .).
row.names: Logical or character vector for row names.
col.names: Logical or character vector for column names.
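
As a quick way to see how these parameters shape the file, the following sketch writes a data frame to a temporary file without row names or quotes, prints the raw lines, and reads the file back. The temporary path and the names out_file and round_trip are assumptions of the example.

```r
employee_data <- data.frame(
  Employee = c("John", "Emma", "Liam"),
  Department = c("HR", "IT", "Finance"),
  Age = c(29, 34, 41)
)

out_file <- tempfile(fileext = ".txt")

# Omit row names and quotes to get a plain tab-separated file
write.table(employee_data, file = out_file, sep = "\t",
            row.names = FALSE, quote = FALSE)

# Inspect the raw lines, then read the file back into a data frame
print(readLines(out_file))
round_trip <- read.table(out_file, header = TRUE, sep = "\t")
print(round_trip)
```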

Example:

# Creating a data frame
employee_data <- data.frame(
  "Employee" = c("John", "Emma", "Liam"),
  "Department" = c("HR", "IT", "Finance"),
  "Age" = c(29, 34, 41)
)

# Exporting the data frame to a text file
write.table(employee_data,
            file = "employee_data.txt",
            sep = "\t",
            row.names = TRUE,
            col.names = NA)

Output:

""    "Employee"    "Department"    "Age"
"1"    "John"        "HR"             29
"2"    "Emma"        "IT"             34
"3"    "Liam"        "Finance"        41


write_tsv(): The write_tsv() method from the readr package exports tab-separated values.

Syntax:

write_tsv(x, file)

Here x is the data frame to write and file is the output path.

Example:

# Importing the readr package
library(readr)

# Creating a data frame
student_data <- data.frame(
  "Name" = c("Alice", "Bob", "Charlie"),
  "Grade" = c("A", "B", "A+"),
  "Age" = c(20, 22, 21)
)

# Exporting the data frame using write_tsv()
write_tsv(student_data, file = "student_data.txt")

Output:

Name    Grade    Age
Alice   A        20
Bob     B        22
Charlie A+       21

Exporting Data to a CSV File

CSV files are widely used for storing tabular data. R provides multiple methods for exporting data to .csv files.

write.table(): The write.table() function can also export data to CSV files by specifying sep = ",".

Example:

# Creating a data frame
product_data <- data.frame(
  "Product" = c("Laptop", "Phone", "Tablet"),
  "Price" = c(1000, 500, 300),
  "Stock" = c(50, 200, 150)
)

# Exporting the data frame to a CSV file
write.table(product_data,
            file = "product_data.csv",
            sep = ",",
            row.names = FALSE)

Output:

Product,Price,Stock
Laptop,1000,50
Phone,500,200
Tablet,300,150

write.csv()

The write.csv() function simplifies exporting data to CSV files, using a comma as the default separator.

Example:

# Creating a data frame
city_data <- data.frame(
  "City" = c("New York", "Los Angeles", "Chicago"),
  "Population" = c(8419600, 3980400, 2716000),
  "Area" = c(468.9, 503, 227.3)
)

# Exporting the data frame to a CSV file
write.csv(city_data, file = "city_data.csv")

Output:

"","City","Population","Area"
"1","New York",8419600,468.9
"2","Los Angeles",3980400,503
"3","Chicago",2716000,227.3

write.csv2(): The write.csv2() function is similar to write.csv() but uses a semicolon (;) as the separator and a comma for the decimal point.
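
Files written with these conventions are read back with the matching read.csv2() function. The sketch below shows a round trip through a temporary file; the path and the names out_file and restored are assumptions of the example.

```r
sales_data <- data.frame(
  Month = c("January", "February", "March"),
  Sales = c(15000.50, 17000.75, 16000.30)
)

out_file <- tempfile(fileext = ".csv")
write.csv2(sales_data, file = out_file, row.names = FALSE)

# The raw file uses ; between fields and , inside numbers
print(readLines(out_file))

# read.csv2() applies the same conventions, so Sales comes back as numeric
restored <- read.csv2(out_file)
str(restored)
```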

Example:

# Creating a data frame
sales_data <- data.frame(
  "Month" = c("January", "February", "March"),
  "Sales" = c(15000.50, 17000.75, 16000.30)
)

# Exporting the data frame to a CSV file
write.csv2(sales_data, file = "sales_data.csv")

Output:

"";"Month";"Sales"
"1";"January";15000,5
"2";"February";17000,75
"3";"March";16000,3

write_csv(): The write_csv() method from the readr package exports data to CSV files.

Example:

# Importing the readr package
library(readr)

# Creating a data frame
book_data <- data.frame(
  "Title" = c("R for Data Science", "Python Crash Course", "The Art of R Programming"),
  "Author" = c("Hadley Wickham", "Eric Matthes", "Norman Matloff"),
  "Price" = c(35.99, 29.99, 45.00)
)

# Exporting the data frame using write_csv()
write_csv(book_data, file = "book_data.csv")

Output:

Title,Author,Price
R for Data Science,Hadley Wickham,35.99
Python Crash Course,Eric Matthes,29.99
The Art of R Programming,Norman Matloff,45

December 13, 2025

How To Import Data from a File in R Programming

Import Data from a File in detail

Data is a collection of facts and can exist in multiple formats. To analyze data using the R programming language, it first needs to be imported. R allows importing data from various file types such as text files, CSV, and other delimiter-separated files. Once imported, users can manipulate, analyze, and generate reports from the data.

Importing Data from Files into R

This guide demonstrates how to import different file formats into R programming.

Importing CSV Files

Method 1: Using read.csv()

The read.csv() function is a straightforward method for importing CSV files.

read.csv(file_path, header = TRUE, sep = ",")

Arguments:

file_path: The file’s location.
header: TRUE (default) to indicate column headings.
sep: The separator for values in each row (default is a comma ,).

Example:

# Specify file path
file_path <- "data.csv"

# Read the CSV file
content <- read.csv(file_path)

# Print file contents
print(content)

Output:

ID Name   Role Age
1  1  Alex  Dev  30
2  2  Sam   QA   25
3  3  Emma  HR   28

Method 2: Using read.table()

Another way to import CSV files is by using read.table().

# Import CSV using read.table()
data <- read.table("C:/data/records.csv", header = TRUE, sep = ",")

# Print file contents
print(data)

Output:

Col1 Col2 Col3
1  101  A1   B1
2  202  A2   B2
3  303  A3   B3

Importing Data from a Text File

read.table() can also be used for importing text files.

Syntax:

read.table("file.txt", header = TRUE/FALSE)

Example:

# Read text file
data <- read.table("C:/data/records.txt", header = FALSE)

# Print file contents
print(data)

Output:

V1  V2  V3
1 200  A1  B1
2 300  A2  B2
3 400  A3  B3

Importing Data from a Delimited File

The read.delim() function imports delimited text files. Its default separator is a tab, and the sep argument selects a different symbol such as |, $, or a comma.

Syntax:

read.delim("file.txt", sep="|", header=TRUE)

Example:

# Read a delimited file
data <- read.delim("C:/data/info.txt", sep="|", header=TRUE)

# Print file contents
print(data)

Output:

   ID Name Salary
1 101 John   1500
2 102 Lily   2000
3 103  Raj   2500
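
To try this without an existing file, the following sketch writes a small pipe-delimited file to a temporary location and reads it with read.delim(). The file contents and the name pipe_file are assumptions of the example.

```r
# Create a small pipe-delimited file (hypothetical data)
pipe_file <- tempfile(fileext = ".txt")
writeLines(c(
  "ID|Name|Salary",
  "101|John|1500",
  "102|Lily|2000",
  "103|Raj|2500"
), pipe_file)

# sep = "|" overrides the default tab separator
data <- read.delim(pipe_file, sep = "|", header = TRUE)

# The result is a data frame, one column per field
print(class(data))
print(data)
```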

Importing XML Files

To import XML files, use the XML package.

XML File Sample:

<RECORDS>
  <EMPLOYEE>
    <ID>1</ID>
    <NAME>Adam</NAME>
    <SALARY>5000</SALARY>
  </EMPLOYEE>
  <EMPLOYEE>
    <ID>2</ID>
    <NAME>Sophia</NAME>
    <SALARY>6000</SALARY>
  </EMPLOYEE>
</RECORDS>

Example:

# Load XML package
library("XML")

# Parse XML file
data <- xmlParse(file = "C:/data/employees.xml")

# Print parsed data
print(data)

Output:

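<?xml version="1.0"?>
<RECORDS>
  <EMPLOYEE>
    <ID>1</ID>
    <NAME>Adam</NAME>
    <SALARY>5000</SALARY>
  </EMPLOYEE>
  <EMPLOYEE>
    <ID>2</ID>
    <NAME>Sophia</NAME>
    <SALARY>6000</SALARY>
  </EMPLOYEE>
</RECORDS>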

Importing SPSS Files

SPSS .sav files can be imported using the haven package.

Syntax:

read_sav("file.sav")

Example:

# Load haven package
library("haven")

# Read SPSS file
data <- read_sav("C:/data/survey.sav")

# Print data
print(data)

Output:

ID   Age  Response  Score
1  1   23   Agree     4.5
2  2   30   Neutral   3.0
3  3   27   Disagree  2.5

December 13, 2025

Importing Data in R Script
Data Handling in detail

R offers several functions to import data from various file formats into your working environment. This guide demonstrates how to import data into R using different file formats.

Importing Data in R

To illustrate, we will use a sample dataset in two formats: .csv and .txt. Let’s dive into the methods for importing data.

Reading a CSV (Comma-Separated Values) File

Method 1: Using read.csv()

The read.csv() function is a simple way to import CSV files. The example below uses the following arguments:
- file: The path to the CSV file. Passing file.choose() opens a dialog box to select the file interactively.
- header: Indicates if the first row contains column names. Use TRUE if it does or FALSE otherwise.
Example:
```
# Import and store the dataset in data1
data1 <- read.csv(file.choose(), header = TRUE)

# Display the data
print(data1)
```
Output:
```
Name    Age Department
1 John    25   IT
2 Alice   30   HR
3 Robert  28   Finance
```
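
Because file.choose() needs someone at the keyboard, scripts that also run unattended usually fall back to a fixed path. The following is a minimal sketch of that idea; the fallback file is created on the fly so the example is self-contained, and the names fallback and csv_path are hypothetical.

```r
# Hypothetical fallback file, created here so the sketch runs on its own
fallback <- tempfile(fileext = ".csv")
writeLines(c("Name,Age,Department", "John,25,IT", "Alice,30,HR"), fallback)

# Open the file dialog only when someone is at the console
csv_path <- if (interactive()) file.choose() else fallback

data1 <- read.csv(csv_path, header = TRUE)
print(data1)
```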
Method 2: Using read.table()

The read.table() function requires you to specify the delimiter using the sep parameter. For CSV files, use sep=",".

Example:
```
# Import and store the dataset in data2
data2 <- read.table(file.choose(), header = TRUE, sep = ",")

# Display the data
print(data2)
```
Output:
```
Name    Age Department
1 John    25   IT
2 Alice   30   HR
3 Robert  28   Finance
```
Reading a Tab-Delimited (.txt) File

Method 1: Using read.delim()

This function is designed for tab-delimited files, so a tab is its default separator. The example below uses these arguments:
- file: The path to the file. Passing file.choose() opens a file selection dialog.
- header: Indicates whether the first row contains column names.
Example:
```
# Import and store the dataset in data3
data3 <- read.delim(file.choose(), header = TRUE)

# Display the data
print(data3)
```
Output:
```
Product Price Quantity
1  Apples  100       50
2 Bananas   50      120
3 Oranges   75       80
```
Method 2: Using read.table()

For tab-delimited files, use sep="\t" to specify the delimiter.

Example:
```
# Import and store the dataset in data4
data4 <- read.table(file.choose(), header = TRUE, sep = "\t")

# Display the data
print(data4)
```
Output:
```
Product Price Quantity
1  Apples  100       50
2 Bananas   50      120
3 Oranges   75       80
```
Using RStudio to Import Data

You can also import data interactively using RStudio. Follow these steps:
1. In the Environment tab, click Import Dataset.
2. Choose the file format (CSV, Excel, etc.).
3. Browse your computer to select the file.
4. The data will appear in the RStudio Viewer. Type the dataset name in the console to display it.
Reading JSON Files in R

To work with JSON files, install the rjson package. This package allows you to:
- Load JSON files.
- Convert JSON data into data frames for analysis.
Install the Package:
```
install.packages("rjson")
```
Example JSON File (saved as example.json):
```
{
  "ID": ["101", "102", "103"],
  "Name": ["Alice", "Bob", "Charlie"],
  "Salary": ["5000", "6000", "5500"],
  "Department": ["IT", "HR", "Finance"]
}
```
Code to Read JSON:
```
# Load the rjson library
library(rjson)

# Provide the path to the JSON file
result <- fromJSON(file = "C:\\example.json")

# Print the result
print(result)
```
Output:
```
$ID
[1] "101" "102" "103"

$Name
[1] "Alice"   "Bob"     "Charlie"

$Salary
[1] "5000"  "6000"  "5500"

$Department
[1] "IT"      "HR"      "Finance"
```
Converting JSON to a Data Frame:
```
# Convert JSON to a data frame
data <- as.data.frame(result)
print(data)
```
Output:
```
ID    Name Salary Department
1    101   Alice   5000         IT
2    102     Bob   6000         HR
3    103 Charlie   5500    Finance
```
December 13, 2025
Data Handling in R Programming
Data Handling in detail

The R programming language is extensively used for statistical analysis and data visualization. Handling data involves importing and exporting files, and R simplifies this process by supporting various file types such as CSV, text files, Excel spreadsheets, SPSS, SAS, and more.

R provides several predefined functions to navigate and interact with system directories. These functions allow users to either retrieve the current directory path or change it as needed.

Directory Functions in R
- getwd(): Retrieves the current working directory.
- setwd(): Changes the working directory. The directory path is passed as an argument to this function.
Example:
```
# Change working directory
setwd("D:/RProjects/")

# Alternative way using double backslashes
setwd("D:\\RProjects\\")
```
- list.files(): Displays all files and folders in the current working directory.
```
list.files()
```
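
list.files() also accepts a directory and a pattern, which helps when only certain file types are of interest. The sketch below creates a few empty files in a temporary folder to demonstrate; the folder and file names are assumptions of the example.

```r
# Set up a scratch folder with a few empty files (hypothetical names)
demo_dir <- file.path(tempdir(), "list_files_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("sales.csv", "notes.txt", "stock.csv")))

# Everything in the folder
all_files <- list.files(demo_dir)
print(all_files)

# Only the CSV files; pattern is a regular expression
csv_files <- list.files(demo_dir, pattern = "\\.csv$")
print(csv_files)

# full.names = TRUE returns paths that can be passed straight to read.csv()
csv_paths <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
print(csv_paths)
```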
Importing Files in R

Importing Text Files: Text files can be read into R using the read.table() function.

Syntax:
```
read.table(filename, header = FALSE, sep = "")
```
Parameters:
- header: Indicates whether the file contains a header row.
- sep: Specifies the delimiter used in the file.
For more details, use the command:
```
help("read.table")
```
Example:
Suppose the file “SampleText.txt” in the current working directory contains the following data:
```
101 X p
202 Y q
303 Z r
404 W s
505 V t
606 U u
```
Code:
```
# Get the current working directory
getwd()

# Read the text file into a data frame
data <- read.table("SampleText.txt", header = FALSE, sep = " ")

# Print the data frame
print(data)

# Print the class of the object
print(class(data))
```
Output:
```
[1] "D:/RProjects"
   V1 V2 V3
1 101  X  p
2 202  Y  q
3 303  Z  r
4 404  W  s
5 505  V  t
6 606  U  u
[1] "data.frame"
```
Importing CSV Files: CSV files can be imported using the read.csv() function.

Syntax:
```
read.csv(filename, header = TRUE, sep = ",")
```
Parameters:
- header: Specifies if the file contains a header row. Defaults to TRUE, so pass header = FALSE for files without one.
- sep: Indicates the delimiter used. Defaults to a comma.
For details, run:
```
help("read.csv")
```
Example:
Assume the file “SampleCSV.csv” contains the following data:
```
101,XA,pa
202,YB,qb
303,ZC,rc
404,WD,sd
505,VE,te
```
Code:
```
# Read the CSV file
data <- read.csv("SampleCSV.csv", header = FALSE)

# Print the data frame
print(data)

# Print the class of the object
print(class(data))
```
Output:
```
   V1 V2 V3
1 101 XA pa
2 202 YB qb
3 303 ZC rc
4 404 WD sd
5 505 VE te
[1] "data.frame"
```
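The header argument matters for files like this one that have no header row. The following sketch, which writes its own temporary file, compares the default behaviour with header = FALSE; the column names id, code, and tag passed through col.names are hypothetical.

```r
# Create a headerless CSV file in a temporary location
csv_path <- tempfile(fileext = ".csv")
writeLines(c("101,XA,pa", "202,YB,qb", "303,ZC,rc"), csv_path)

# With the default header = TRUE, the first record is consumed as column names
with_default <- read.csv(csv_path)
print(nrow(with_default))

# header = FALSE keeps every record; col.names supplies readable names
records <- read.csv(csv_path, header = FALSE,
                    col.names = c("id", "code", "tag"))
print(records)
```

Comparing nrow() of the two results shows that the default call treats one data record as the header.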
Importing Excel Files: To read Excel files, install the openxlsx package and use the read.xlsx() function.

Syntax:
```
read.xlsx(filename, sheet = 1)
```
Parameters:
- sheet: Specifies the sheet name or index.
For help:
```
help("read.xlsx")
```
Example:
Suppose the Excel file “SampleExcel.xlsx” contains the following data:

A B C
1001 XYA xyz
2002 YZB yqw
3003 ZWC wuv

Code:
```
# Install and load the openxlsx package
install.packages("openxlsx")
library(openxlsx)

# Read the Excel file
data <- read.xlsx("SampleExcel.xlsx", sheet = 1)

# Print the data frame
print(data)

# Print the class of the object
print(class(data))
```
Output:
```
     A   B   C
1 1001 XYA xyz
2 2002 YZB yqw
3 3003 ZWC wuv
[1] "data.frame"
```
Exporting Files in R

Redirecting Output with cat(): The cat() function outputs objects to the console or redirects them to a file.

Syntax:
```
cat(..., file)
```
Example:
```
# Redirect output to a file
cat("Greetings from R!", file = "OutputText.txt")
```
Output (file content):
```
Greetings from R!
```
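cat() overwrites the target file unless told otherwise, and it does not add a newline on its own. The following sketch, using a temporary file, shows the append and sep arguments and reads the result back with readLines().

```r
out_path <- tempfile(fileext = ".txt")

# The first call creates (or overwrites) the file; "\n" ends the line explicitly
cat("Greetings from R!\n", file = out_path)

# append = TRUE adds to the file; sep sets the text placed between the items
cat("x", "y", "z", sep = ", ", file = out_path, append = TRUE)
cat("\n", file = out_path, append = TRUE)

# Read the file back to check what was written
written <- readLines(out_path)
print(written)
```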
Redirecting Output with sink(): The sink() function captures output and redirects it to a file.

Syntax:
```
sink(filename)
...
sink()
```
Example:
```
# Redirect output to a file
sink("OutputSink.txt")

x <- c(2, 4, 6, 8, 12)
print(mean(x))
print(class(x))
print(max(x))

# End redirection
sink()
```
Output (file content):
```
[1] 6.4
[1] "numeric"
[1] 12
```
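If an error occurs between sink(filename) and the closing sink(), output keeps going to the file. One way to guard against that is to wrap the redirection in a function and register the closing call with on.exit(). This is a sketch only; write_report is a hypothetical helper name, not a built-in.

```r
# Hypothetical helper: run a few print() calls with output sent to a file
write_report <- function(path) {
  sink(path)
  on.exit(sink())   # redirection is undone even if an error occurs below

  x <- c(2, 4, 6, 8, 12)
  print(mean(x))
  print(range(x))
}

report_path <- tempfile(fileext = ".txt")
depth_before <- sink.number()   # number of active redirections beforehand
write_report(report_path)

# Console output is back to normal here; inspect what the file received
report_lines <- readLines(report_path)
print(report_lines)
```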
Writing CSV Files: The write.csv() function writes data to a CSV file.

Syntax:
```
write.csv(x, file)
```
Example:
```
# Create a data frame
df <- data.frame(A = c(11, 22, 33), B = c("X", "Y", "Z"), C = c(TRUE, FALSE, TRUE))

# Write the data frame to a CSV file
write.csv(df, file = "OutputCSV.csv", row.names = FALSE)
```
Output (file content; write.csv() quotes column names and character values by default):
```
"A","B","C"
11,"X",TRUE
22,"Y",FALSE
33,"Z",TRUE
```
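A quick way to check an export is to read the file back. The sketch below writes the same data frame to a temporary file, looks at the raw text, and re-imports it with read.csv(). By default write.csv() wraps column names and character values in double quotes, and read.csv() strips them again on import.

```r
df <- data.frame(A = c(11, 22, 33),
                 B = c("X", "Y", "Z"),
                 C = c(TRUE, FALSE, TRUE))

csv_out <- tempfile(fileext = ".csv")
write.csv(df, file = csv_out, row.names = FALSE)

# Raw text of the file, including the quotes added by write.csv()
raw_lines <- readLines(csv_out)
print(raw_lines)

# Reading the file back restores a data frame with the same values
df_back <- read.csv(csv_out)
print(df_back)
```

Passing quote = FALSE to write.csv() produces unquoted output, at the cost of ambiguity if a value itself contains a comma.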
December 13, 2025
Data Munging in R Programming
Data Munging in detail

Data munging refers to the process of transforming raw or erroneous data into a clean and usable format. Without this cleansing and reformatting step, whether done manually by a user or through an automated system, the data is often unsuitable for downstream analysis or consumption.

In R Programming, the following methods are commonly used for the data munging process:
- apply() Family
- aggregate()
- dplyr package
- plyr package
Using the apply() Family for Data Munging

The apply() function is one of the foundational functions in R for performing operations on matrices or arrays. Other functions in the same family include lapply(), sapply(), and tapply(). These functions often serve as an alternative to explicit loops, giving a cleaner and more concise way to express repetitive tasks.

The apply() function is particularly suited for operations on matrices or arrays with homogeneous elements. When applied to other data structures, such as data frames, the function first converts them into a matrix before processing.

Syntax:
```
apply(X, MARGIN, FUN)
```
Parameters:
- X: An array or matrix.
- MARGIN: A value (1 for rows, 2 for columns) indicating where to apply the function.
- FUN: The operation or function to perform.
Example:
```
# Example of apply()
matrix_data <- matrix(1:12,
                      nrow = 3,
                      ncol = 4)
matrix_data

result <- apply(matrix_data, 2, sum)
result
```
Output:
```
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

[1]  6 15 24 33
```
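The same matrix can be processed by row instead, and the function does not have to be a built-in. A brief sketch follows; the anonymous function computing a per-column range is just an illustration.

```r
matrix_data <- matrix(1:12, nrow = 3, ncol = 4)

# Margin 1 applies the function to each row
row_totals <- apply(matrix_data, 1, sum)
print(row_totals)

# An anonymous function can be used in place of a named one;
# here it computes max minus min for each column (margin 2)
col_ranges <- apply(matrix_data, 2, function(col) max(col) - min(col))
print(col_ranges)
```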
The lapply() Function: The lapply() function operates on lists and returns a list of the same length. Unlike apply(), it does not require a margin parameter. The “l” in lapply() signifies that the output is always a list.

Syntax:
```
lapply(X, FUN)
```
Parameters:
- X: A list or vector (other objects are coerced to a list).
- FUN: The function to apply to each element.
Example:
```
# Example of lapply()
fruits <- c("APPLE", "BANANA", "CHERRY", "MANGO")
fruits

lowercase_fruits <- lapply(fruits, tolower)
lowercase_fruits
```
Output:
```
[1] "APPLE"   "BANANA"  "CHERRY"  "MANGO"

[[1]]
[1] "apple"

[[2]]
[1] "banana"

[[3]]
[1] "cherry"

[[4]]
[1] "mango"
```
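lapply() is most useful when the input is a genuine list whose elements differ in length. A small sketch with made-up scores follows; note that the names of the input list carry over to the result.

```r
# A list whose elements have different lengths (values are made up)
scores <- list(math = c(80, 90, 100), art = c(70, 75), music = c(60))

# Apply mean() to every element; the result is a list with the same names
means <- lapply(scores, mean)
print(means)
```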
The sapply() Function: The sapply() function works similarly to lapply(). However, it tries to simplify the output into a vector or matrix if possible.

Example:
```
# Example of sapply()
fruits <- c("APPLE", "BANANA", "CHERRY", "MANGO")

lowercase_fruits <- sapply(fruits, tolower)
lowercase_fruits
```
Output:
```
   APPLE   BANANA   CHERRY    MANGO
 "apple" "banana" "cherry"  "mango"
```
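Two details of the simplification are worth seeing in code. When the input is a character vector, sapply() uses the input strings as names for the result unless USE.NAMES = FALSE is passed. When every call returns a vector of the same length greater than one, the result becomes a matrix. A minimal sketch:

```r
fruits <- c("APPLE", "BANANA", "CHERRY", "MANGO")

# One value per element: the result is a named integer vector
name_lengths <- sapply(fruits, nchar)
print(name_lengths)

# USE.NAMES = FALSE drops the names
unnamed <- sapply(fruits, nchar, USE.NAMES = FALSE)
print(unnamed)

# Two values per element: the result is a matrix with one column per element
stats <- sapply(list(a = 1:5, b = 6:10), range)
print(stats)
```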
The tapply() Function: The tapply() function is used to perform an operation on subsets of data grouped by a factor. It is particularly useful for aggregating data.

Syntax:
```
tapply(X, INDEX, FUN = NULL)
```
Parameters:
- X: A vector, typically numeric, whose values are split into groups.
- INDEX: A factor or list of factors for grouping, each the same length as X.
- FUN: The function to apply to each group.
Example:
```
# Example of tapply()
data(iris)

species_median <- tapply(iris$Sepal.Length,
                         iris$Species,
                         median)
species_median
```
Output:
```
    setosa versicolor  virginica
       5.0        5.9        6.5
```
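The grouping is easier to follow on a tiny data set where the sums can be checked by hand. The data frame below is made up for illustration.

```r
# Hypothetical sales records: two regions appear twice, one appears once
sales <- data.frame(region = c("north", "south", "north", "south", "east"),
                    amount = c(100, 200, 150, 50, 300))

# Sum the amounts within each region; groups are ordered alphabetically
totals <- tapply(sales$amount, sales$region, sum)
print(totals)
```

The result is a named array, so a single group can be looked up with totals[["north"]].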
Using aggregate() in R

The aggregate() function summarizes data by splitting it into groups and applying a function (e.g., sum, mean) to each group.

Syntax:
```
aggregate(formula, data, FUN)
```
Parameters:
- formula: A formula such as y ~ g, with the variable to summarize on the left and the grouping variables on the right.
- data: The data frame to aggregate.
- FUN: The function to apply to each group.
Example:
```
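# assets is assumed to be an existing data frame with the columns used below.
# This call uses the x/by interface rather than a formula:
# x holds the columns to summarize, by holds the grouping columns.
exposures <- aggregate(
  x = assets[c("counterparty.a", "counterparty.b", "counterparty.c")],
  by = assets[c("asset.class", "rating")],
  # Sum only the positive market values in each group
  FUN = function(market.values) { sum(pmax(market.values, 0)) }
)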
```
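The formula interface can be tried on a small, self-contained data set. The sketch below uses a made-up sales data frame; the left side of the formula names the column to summarize and the right side names the grouping columns.

```r
# Hypothetical sales records
sales <- data.frame(region  = c("north", "south", "north", "south", "east"),
                    product = c("a", "a", "b", "a", "b"),
                    amount  = c(100, 200, 150, 50, 300))

# Total amount per region
by_region <- aggregate(amount ~ region, data = sales, FUN = sum)
print(by_region)

# Total amount per combination of region and product
by_both <- aggregate(amount ~ region + product, data = sales, FUN = sum)
print(by_both)
```

Unlike tapply(), aggregate() returns a data frame, which is convenient when the summary feeds into further data frame operations.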
Using the plyr Package

The plyr package is a versatile tool for splitting data, applying functions to the pieces, and combining the results.

Key Functions:
- ddply(): Operates on data frames.
- llply(): Operates on lists.
Advantages:
- Simplifies operations with consistent syntax.
- Offers parallel computation and progress bars.
Example with ddply():
```
library(plyr)
# dfx is assumed to be an existing data frame with columns group, sex, and age
ddply(dfx, .(group, sex), summarize,
      mean = round(mean(age), 2),
      sd = round(sd(age), 2))
```
Using the dplyr Package

The dplyr package provides a consistent grammar for data manipulation, with verbs such as arrange, filter, mutate, select, and summarize.

Advantages:
- Fast and efficient backend.
- Easy-to-read pipe (%>%) syntax.
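Examples (these assume dplyr has been loaded with library(dplyr); starwars is a dataset shipped with the package):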
- Arrange rows:
```
starwars %>% arrange(desc(mass))
```
- Filter rows:
```
starwars %>% filter(species == "Droid")
```
- Mutate new variables:
```
starwars %>% mutate(bmi = mass / ((height / 100) ^ 2)) %>%
            select(name:mass, bmi)
```
- Summarize grouped data:
```
starwars %>% group_by(species) %>%
            summarize(n = n(), avg_mass = mean(mass, na.rm = TRUE)) %>%
            filter(n > 1)
```
Example:
```
library(dplyr)

# Group by gender, summarise, and filter
starwars %>%
  group_by(gender) %>%
  summarise(
    n = n(),
    avg_height = mean(height, na.rm = TRUE)
  ) %>%
  filter(n > 3)
```
Output:

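The exact rows depend on the version of the starwars dataset shipped with dplyr, so the figures below are illustrative only. They assume an older release in which gender is coded as male/female; since dplyr 1.0.0 the column uses masculine/feminine instead, and the labels and counts may differ.

gender  n   avg_height
male    60  178.41
female  16  165.56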
December 13, 2025
