Level Ordering of Factors in R
Factors are data objects that categorize data and store it as levels. The values can come from character or integer vectors; internally, each value is stored as an integer code that refers to one of the levels. Factors are well suited to columns with a limited number of unique values. In R, a factor is created with the factor() function, which takes a vector as input. The c() function creates that vector from explicitly provided values.
Example:
items <- c("Apple", "Banana", "Grapes", "Apple", "Grapes", "Grapes", "Banana", "Banana")
print(items)
print(is.factor(items))
# Convert to factor
type_items <- factor(items)
print(levels(type_items))
Output:
[1] "Apple" "Banana" "Grapes" "Apple" "Grapes" "Grapes" "Banana" "Banana"
[1] FALSE
[1] "Apple" "Banana" "Grapes"
Here, items is a character vector with 8 elements, so is.factor(items) returns FALSE. The factor() function converts it to a factor. The unique values in the data become the levels, sorted alphabetically by default, and they can be retrieved using the levels() function.
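To make the relationship between values and levels concrete, the following minimal sketch reuses the items vector from the example above and inspects the integer codes behind the factor. It uses only base R; the helper variables codes and counts are introduced here for illustration.

```r
# Same data as in the example above
items <- c("Apple", "Banana", "Grapes", "Apple", "Grapes", "Grapes", "Banana", "Banana")
type_items <- factor(items)

# A factor is stored as integer codes that index into levels(type_items)
codes <- as.integer(type_items)
print(codes)

# Number of distinct levels
print(nlevels(type_items))

# Number of observations per level
counts <- table(type_items)
print(counts)

# The original labels can be recovered from the codes and the levels
print(levels(type_items)[codes])
```

Because the codes index into the levels vector, changing the order of the levels changes the codes but not the labels that the factor displays.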
Ordering Factor Levels
Ordered factors extend factors by giving the levels a defined rank, from lowest to highest. They are created by calling the factor() function with the levels listed in increasing order and the ordered argument set to TRUE.
Syntax:
factor(data, levels = c("level1", "level2", ...), ordered = TRUE)
Parameters:
- data: The input vector to convert.
- levels: A vector listing the levels in the desired order, from lowest to highest, typically built with the c() function.
- ordered: Set to TRUE to create an ordered factor.
Example:
# Creating size vector
sizes <- c("small", "large", "large", "small", "medium", "large", "medium", "medium")
# Converting to factor
size_factor <- factor(sizes)
print(size_factor)
# Ordering the levels
ordered_size <- factor(sizes, levels = c("small", "medium", "large"), ordered = TRUE)
print(ordered_size)
Output:
[1] small large large small medium large medium medium
Levels: large medium small
[1] small large large small medium large medium medium
Levels: small < medium < large
In this example, the sizes vector is created using the c() function. Converting it with factor() alone gives unordered levels sorted alphabetically (large, medium, small). Passing the levels argument together with ordered = TRUE gives the intended order, small < medium < large.
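One practical consequence of an ordered factor is that comparisons and sorting follow the declared level order. The sketch below illustrates this with the same sizes data, using base R only; the variables below_large and sorted_sizes are names chosen here for illustration.

```r
sizes <- c("small", "large", "large", "small", "medium", "large", "medium", "medium")
ordered_size <- factor(sizes, levels = c("small", "medium", "large"), ordered = TRUE)

# Comparison operators follow the declared level order, not alphabetical order
below_large <- ordered_size < "large"
print(below_large)

# min() and max() are defined for ordered factors
print(max(ordered_size))

# sort() arranges the values from the lowest level to the highest
sorted_sizes <- sort(ordered_size)
print(as.character(sorted_sizes))
```

On an unordered factor the same comparison would not be meaningful: R returns NA with a warning, which is one reason to declare the order explicitly when the categories have a natural rank.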
Alternative Method Using ordered():
# Creating vector sizes
sizes <- c("small", "large", "large", "small", "medium")
size_ordered <- ordered(sizes, levels = c("small", "medium", "large"))
print(size_ordered)
Output:
[1] small large large small medium
Levels: small < medium < large
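As a quick check, the sketch below compares the two approaches and shows what happens to a value that is missing from levels. It uses base R only; size_levels, via_ordered, via_factor, and with_unknown are illustrative names, and "huge" is a made-up value that is not part of the original data.

```r
sizes <- c("small", "large", "large", "small", "medium")
size_levels <- c("small", "medium", "large")

via_ordered <- ordered(sizes, levels = size_levels)
via_factor <- factor(sizes, levels = size_levels, ordered = TRUE)

# Both calls produce the same ordered factor
print(identical(via_ordered, via_factor))
print(is.ordered(via_ordered))

# A value that is not listed in levels becomes NA without a warning
with_unknown <- ordered(c("small", "huge"), levels = size_levels)
print(is.na(with_unknown))
```

Since unlisted values turn into NA silently, it is worth checking for unexpected missing values after setting levels by hand.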
Level Ordering Visualization in R
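This example creates a dataset of student scores categorized by class level (freshman, sophomore, junior, and senior). It sets the level order explicitly and then generates a boxplot of the score distribution for each class level using base R's boxplot() function. Without the explicit order, the levels would be sorted alphabetically and the boxes would appear as freshman, junior, senior, sophomore.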
# Create a sample dataset of student grades
grade_data <- data.frame(
  score = c(70, 85, 60, 95, 88, 76, 82, 91, 69, 79, 92, 84, 77, 83, 90),
  class_level = factor(c(rep("freshman", 5), rep("sophomore", 4), rep("junior", 3), rep("senior", 3)))
)
# Specify level ordering for the "class_level" factor
grade_data$class_level <- factor(grade_data$class_level, levels = c("freshman", "sophomore", "junior", "senior"))
# Create a boxplot of grades by class level
boxplot(score ~ class_level, data = grade_data, main = "Student Grades by Class Level")
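The effect of the level order can also be checked without drawing the plot. The following sketch rebuilds the same data and summarizes it with tapply(), which groups results in level order, the same order boxplot() uses for the x-axis. Setting levels directly in the factor() call and the variable name medians are choices made here for illustration.

```r
grade_data <- data.frame(
  score = c(70, 85, 60, 95, 88, 76, 82, 91, 69, 79, 92, 84, 77, 83, 90),
  class_level = factor(
    c(rep("freshman", 5), rep("sophomore", 4), rep("junior", 3), rep("senior", 3)),
    levels = c("freshman", "sophomore", "junior", "senior")
  )
)

# Levels appear in the order given, not alphabetically
print(levels(grade_data$class_level))

# Median score per class level, reported in level order
medians <- tapply(grade_data$score, grade_data$class_level, median)
print(medians)
```

The medians correspond to the center lines of the boxes in the plot, so this gives a plot-free way to confirm that the groups are in the intended sequence.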
