Working with CSV files in detail
In this article, we will explore how to handle CSV files in the R programming language.
Understanding CSV Files in R
CSV (Comma-Separated Values) files are plain text files where data is stored in tabular form, with values in each row separated by a delimiter such as a comma or tab. We will use a sample CSV file for demonstration purposes.
Managing the Working Directory in R
Before working with a CSV file, it is essential to check and set the working directory where the file is located.
# Display the current working directory
print(getwd())
# Change the working directory
setwd("C:/Users/DataScience/Documents")
# Confirm the new working directory
print(getwd())
Output:
[1] "C:/Users/DataScience/Documents"
[1] "C:/Users/DataScience/Documents"
Using the getwd() function, we can retrieve the current working directory, and with setwd(), we can modify it as needed.
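Because setwd() stops with an error when the target directory does not exist, it can help to check the directory first. The following is a minimal sketch under stated assumptions: target_dir is a hypothetical variable, and tempdir() stands in for a real project folder so that the code runs on any machine.

```r
# Hypothetical target directory; tempdir() is used so the sketch runs anywhere
target_dir <- tempdir()

switched <- FALSE
if (dir.exists(target_dir)) {
  # setwd() invisibly returns the previous working directory
  old_dir <- setwd(target_dir)
  switched <- TRUE
  print(getwd())
  # Restore the original working directory when done
  setwd(old_dir)
} else {
  message("Directory not found: ", target_dir)
}
```

Keeping the value returned by setwd() makes it easy to switch back afterwards, so the rest of the session is not affected.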
Sample CSV File for Input
id,name,department,salary,projects
1,Alex,IT,75000,4
2,Brian,HR,67000,3
3,Clara,Marketing,72000,5
4,Daniel,Sales,58000,2
5,Emma,Tech,65000,3
6,Frank,IT,70000,6
7,Grace,HR,69000,4
Save this file as employees.csv to use it in R.
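If you prefer to create the file from R instead of a text editor, one possible approach is writeLines(). This is only a sketch: csv_lines and csv_path are names introduced here for illustration, the file is written to a temporary directory rather than the working directory, and the values are written without spaces after the commas so that text fields do not pick up leading blanks.

```r
# Contents of the sample file, one element per line
csv_lines <- c(
  "id,name,department,salary,projects",
  "1,Alex,IT,75000,4",
  "2,Brian,HR,67000,3",
  "3,Clara,Marketing,72000,5",
  "4,Daniel,Sales,58000,2",
  "5,Emma,Tech,65000,3",
  "6,Frank,IT,70000,6",
  "7,Grace,HR,69000,4"
)

# Hypothetical location: a temporary directory instead of the working directory
csv_path <- file.path(tempdir(), "employees.csv")

# Write the lines to disk and confirm that the file exists
writeLines(csv_lines, csv_path)
print(file.exists(csv_path))
```

To place the file in the working directory instead, the path could simply be "employees.csv".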
Reading a CSV File in R
The read.csv() function allows us to read the contents of a CSV file into a data frame.
Example:
# Load the CSV file as a data frame
csv_data <- read.csv(file = 'C:\\Users\\DataScience\\Documents\\employees.csv')
print(csv_data)
# Display the number of columns
print(ncol(csv_data))
# Display the number of rows
print(nrow(csv_data))
Output:
  id   name department salary projects
1  1   Alex         IT  75000        4
2  2  Brian         HR  67000        3
3  3  Clara  Marketing  72000        5
4  4 Daniel      Sales  58000        2
5  5   Emma       Tech  65000        3
6  6  Frank         IT  70000        6
7  7  Grace         HR  69000        4
[1] 5
[1] 7
The read.csv() function reads the file and stores it as a data frame in R. The ncol() and nrow() functions return the number of columns and rows, respectively.
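CSV files often contain a blank after each comma. By default read.csv() keeps that blank in text fields, while the strip.white argument trims it. The sketch below illustrates the difference on two inlined rows; the text argument is used only to keep the example self-contained, and the variable names raw, default_read and trimmed_read are assumptions made for this illustration.

```r
# Two rows of the sample data with a blank after each comma
raw <- "id, name, department, salary, projects
1, Alex, IT, 75000, 4
2, Brian, HR, 67000, 3"

# Default: leading blanks are kept in unquoted text fields
default_read <- read.csv(text = raw)

# strip.white = TRUE trims leading and trailing blanks in unquoted fields
trimmed_read <- read.csv(text = raw, strip.white = TRUE)

print(as.character(default_read$name))
print(as.character(trimmed_read$name))

# str() summarises the column types of the resulting data frame
str(trimmed_read)
```

Numeric columns such as salary are parsed as numbers in both cases; the option matters mainly for text columns that are later compared or grouped.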
Filtering Data from a CSV File
We can query the data using logical conditions inside square brackets, or with helper functions such as min() and subset().
Finding the Minimum Value
# Find the minimum number of projects
min_projects <- min(csv_data$projects)
print(min_projects)
Output:
[1] 2
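The minimum on its own does not say which employee it belongs to. As a small sketch, which.min() returns the position of the first minimum, which can then be used to pull out the matching row. The data is inlined here with only the columns needed, and fewest is a name chosen for this illustration.

```r
# Inline copy of the relevant columns so the sketch is self-contained
csv_data <- read.csv(text = "name,projects
Alex,4
Brian,3
Clara,5
Daniel,2
Emma,3
Frank,6
Grace,4")

# which.min() gives the row index of the first smallest value
fewest <- csv_data[which.min(csv_data$projects), ]
print(fewest)
```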
Filtering Employees with Salary Above 65000
# Select 'name' and 'salary' columns for employees with salary greater than 65000
result <- csv_data[csv_data$salary > 65000, c("name", "salary")]
# Display the filtered result
print(result)
Output:
   name salary
1  Alex  75000
2 Brian  67000
3 Clara  72000
6 Frank  70000
7 Grace  69000
The subset of employees meeting the condition is stored as a new data frame.
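The same query can also be written with subset(), which lets the condition and the column list refer to columns by name. This is a sketch of an equivalent form rather than a replacement for the bracket syntax; the data is inlined so the block runs on its own, and high_paid is a name introduced here.

```r
# Inline copy of the sample data so the sketch is self-contained
csv_data <- read.csv(text = "id,name,department,salary,projects
1,Alex,IT,75000,4
2,Brian,HR,67000,3
3,Clara,Marketing,72000,5
4,Daniel,Sales,58000,2
5,Emma,Tech,65000,3
6,Frank,IT,70000,6
7,Grace,HR,69000,4")

# Keep rows with salary above 65000 and only the name and salary columns
high_paid <- subset(csv_data, salary > 65000, select = c(name, salary))
print(high_paid)
```

subset() is convenient for interactive work; the bracket form is generally the safer choice inside functions, because subset() evaluates column names in a non-standard way.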
Writing Data to a CSV File
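R can export data frames to CSV files with write.csv(). A typical use is saving a computed summary. The example below computes the average salary for each department with tapply(), which returns a named vector that can be converted to a data frame and passed to write.csv().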
# Calculate the average salary for each department
avg_salary <- tapply(csv_data$salary, csv_data$department, mean)
# Display the results
print(avg_salary)
Output:
       HR        IT Marketing     Sales      Tech 
    68000     72500     72000     58000     65000 
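To actually save such a summary, the named result of tapply() can be turned into a data frame and handed to write.csv(). The following is a minimal sketch under stated assumptions: the output file name avg_salary.csv, the temporary directory, and the names avg_df, out_path and check are all hypothetical choices made for this example.

```r
# Inline copy of the relevant columns so the sketch is self-contained
csv_data <- read.csv(text = "department,salary
IT,75000
HR,67000
Marketing,72000
Sales,58000
Tech,65000
IT,70000
HR,69000")

# Average salary per department, as a named result
avg_salary <- tapply(csv_data$salary, csv_data$department, mean)

# Convert the named result into a two-column data frame
avg_df <- data.frame(
  department = names(avg_salary),
  avg_salary = as.numeric(avg_salary)
)

# Hypothetical output location in a temporary directory
out_path <- file.path(tempdir(), "avg_salary.csv")

# row.names = FALSE avoids writing an extra column of row numbers
write.csv(avg_df, file = out_path, row.names = FALSE)

# Read the file back to confirm what was written
check <- read.csv(out_path)
print(check)
```

Without row.names = FALSE, write.csv() adds the row names as an unnamed first column, which then shows up as an extra column when the file is read back.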