Reading Tabular Data from files in R Programming

Reading Tabular Data in detail

In data analysis, it is often necessary to read and process data stored outside the R environment. Importing data into R is a crucial step in such cases. R supports multiple file formats, including CSV, JSON, Excel, Text, and XML. Most data is available in tabular format, and R provides functions to read this structured data into a data frame. Data frames are widely used in R because they facilitate data extraction from rows and columns, making statistical computations easier than with other data structures.

Common Functions for Importing Data into R

The most frequently used functions for reading tabular data into R are:

read.table()
read.csv()
fromJSON() (from the rjson package)
read.xlsx() (from the xlsx package)

Reading Data from a Text File

The read.table() function is used to read tabular data from a text file.

Parameters:

file: The name (or path) of the file to read.
header: A logical flag indicating whether the first line contains column names.
nrows: The maximum number of rows to read.
skip: The number of lines to skip at the beginning of the file before reading starts.
colClasses: A character vector giving the class of each column.
sep: The character that separates columns (e.g., a comma or a tab). By default, any whitespace acts as a separator.

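For small or moderately sized datasets, read.table() can usually be called with only the file name. R then determines the number of rows and columns and the class of each column on its own, and it skips lines that start with # (comments). Supplying arguments such as colClasses and nrows makes reading faster and more memory-efficient, especially for large datasets, because R no longer has to infer this information.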
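The effect of these arguments can be sketched as follows. This is a minimal illustration, not a benchmark: the temporary file, its contents, and the variable names are assumptions made up for the example. It shows two common approaches, stating the column classes explicitly and inferring them from a few sample rows.

```r
# Create a small sample file so the sketch is self-contained.
# The path and contents are hypothetical stand-ins for a real dataset.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "# sample data (this comment line is ignored by read.table)",
  "Name Age Salary",
  "John 28 50000",
  "Emma 25 60000",
  "Alex 30 70000"
), path)

# Option 1: state the column classes and the row count explicitly.
people <- read.table(
  path,
  header = TRUE,
  colClasses = c("character", "integer", "numeric"),
  nrows = 3
)

# Option 2: infer the classes from a small sample, then reuse them
# when reading the full file.
sample_rows <- read.table(path, header = TRUE, nrows = 2)
classes <- sapply(sample_rows, class)
people_inferred <- read.table(path, header = TRUE, colClasses = classes)

str(people)
```

On a file this small the arguments make no practical difference; the pattern is mainly useful when the file has many rows and class detection becomes costly.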

Example:

Assume a text file data.txt in the current directory contains the following data:

Name Age Salary
John  28  50000
Emma  25  60000
Alex  30  70000

Reading the file in R:

read.table("data.txt", header=TRUE)

Output:

  Name Age Salary
1 John  28  50000
2 Emma  25  60000
3 Alex  30  70000

Reading Data from a CSV File

The read.csv() function is used for reading CSV files, which are commonly generated by spreadsheet applications like Microsoft Excel. It is similar to read.table() but uses a comma as the default separator and assumes header=TRUE by default.

Example:

Assume a CSV file data.csv contains the following:

Name,Age,Salary
John,28,50000
Emma,25,60000
Alex,30,70000

Reading the file in R:

read.csv("data.csv")

Output:

  Name Age Salary
1 John  28  50000
2 Emma  25  60000
3 Alex  30  70000
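The relationship between read.csv() and read.table() can be checked with a short sketch. The temporary file and variable names below are assumptions for illustration. Note that read.csv() also changes a few other defaults (for example fill = TRUE and comment.char = ""), which do not matter for a simple file like this one.

```r
# Write the sample CSV to a temporary file so the sketch is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Name,Age,Salary",
  "John,28,50000",
  "Emma,25,60000",
  "Alex,30,70000"
), csv_path)

# read.csv() with its defaults ...
via_csv <- read.csv(csv_path)

# ... versus read.table() with the separator and header set by hand.
via_table <- read.table(csv_path, header = TRUE, sep = ",")

# Compare the two data frames.
print(all.equal(via_csv, via_table))
```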

Memory Considerations

For large files, it is essential to estimate the memory required before loading data. Each numeric value occupies 8 bytes, so the approximate memory needed for a dataset with 2,000,000 rows and 200 numeric columns can be calculated as:

2,000,000 x 200 x 8 bytes = 3,200,000,000 bytes = 3.2 GB

Since R requires additional memory for processing, a common rule of thumb is to have at least twice this amount (6.4 GB) available.
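The same estimate can be written as a few lines of R. This is only a sketch of the arithmetic; the variable names are made up for the example, and the final check uses a much smaller vector to confirm the 8-bytes-per-value assumption.

```r
# Dimensions of the hypothetical dataset from the text.
n_rows <- 2000000
n_cols <- 200
bytes_per_numeric <- 8   # a double-precision value takes 8 bytes

estimated_bytes <- n_rows * n_cols * bytes_per_numeric
estimated_gb <- estimated_bytes / 1e9      # decimal gigabytes
recommended_gb <- 2 * estimated_gb         # rule of thumb: twice the raw size

cat("Raw size:   ", estimated_gb, "GB\n")
cat("Recommended:", recommended_gb, "GB\n")

# Sanity check on a small scale: one million doubles take about 8 MB
# (plus a small fixed overhead for the vector itself).
small_bytes <- as.numeric(object.size(numeric(1e6)))
cat("1e6 doubles:", small_bytes, "bytes\n")
```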

Reading Data from a JSON File

The fromJSON() function from the rjson package is used to import JSON data into R.

Installation:

install.packages("rjson")

Example:

Assume a JSON file data.json contains:

{
  "Name": ["John", "Emma", "Alex"],
  "Age": [28, 25, 30],
  "Salary": [50000, 60000, 70000]
}

Reading the JSON file in R:

library(rjson)
data <- fromJSON(file="data.json")
as.data.frame(data)
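To see what happens at each step, the following sketch writes the sample JSON to a temporary file and reads it back. The file path and variable names are assumptions for illustration, and the reading step only runs if the rjson package is installed. fromJSON() returns a named list of vectors, which is why as.data.frame() is needed to obtain a data frame.

```r
# Write the sample JSON to a temporary file so the sketch is self-contained.
json_text <- '{
  "Name": ["John", "Emma", "Alex"],
  "Age": [28, 25, 30],
  "Salary": [50000, 60000, 70000]
}'
json_path <- tempfile(fileext = ".json")
writeLines(json_text, json_path)

# Only attempt the read if rjson is available.
if (requireNamespace("rjson", quietly = TRUE)) {
  parsed <- rjson::fromJSON(file = json_path)  # a named list, one element per key
  people_df <- as.data.frame(parsed)           # equal-length vectors become columns
  str(people_df)
}
```

This conversion works here because every key holds an array of the same length; JSON with nested or uneven structure would need reshaping first.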

Reading Excel Sheets

The read.xlsx() function is used to import Excel worksheets into R. It requires the xlsx package, which in turn depends on a working Java installation (through the rJava package).

Installation:

install.packages("xlsx")

Example:

Assume an Excel file data.xlsx with the following content:

Name	Age	Salary
John	28	50000
Emma	25	60000
Alex	30	70000

Reading the first sheet:

library("xlsx")
read.xlsx("data.xlsx", 1)

Output:

  Name Age Salary
1 John  28  50000
2 Emma  25  60000
3 Alex  30  70000

For large datasets (over 100,000 cells), read.xlsx2() is preferred as it works faster by using the readColumns() function optimized for tabular data.

By using these functions, data can be efficiently imported into R for further processing and analysis.
