Working with XML Files in Detail
XML, short for Extensible Markup Language, stores data as nested elements, each delimited by an opening and a closing tag. To work with XML files in R, we use the XML package, which is not part of base R and must be installed explicitly using the following command:
install.packages("XML")
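If the script may run on machines where the package is already present, the installation can be made conditional. A minimal sketch, using base R's requireNamespace() to check before installing:

```r
# Install the XML package only when it is not already available
if (!requireNamespace("XML", quietly = TRUE)) {
  install.packages("XML")
}

# Attach the package for the current session
library("XML")
```

This avoids downloading the package again on every run.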
Creating an XML File
An XML file is structured using hierarchical tags that describe the data they enclose, with a single root element wrapping everything else. By convention, it is saved with a .xml extension.
Consider the following XML file named students.xml:
<STUDENTS>
<STUDENT>
<ID>101</ID>
<NAME>Rahul</NAME>
<SCORE>750</SCORE>
<DEPARTMENT>Science</DEPARTMENT>
</STUDENT>
<STUDENT>
<ID>102</ID>
<NAME>Sneha</NAME>
<SCORE>540</SCORE>
<DEPARTMENT>Arts</DEPARTMENT>
</STUDENT>
<STUDENT>
<ID>103</ID>
<NAME>Amit</NAME>
<SCORE>680</SCORE>
<DEPARTMENT>Commerce</DEPARTMENT>
</STUDENT>
<STUDENT>
<ID>104</ID>
<NAME>Priya</NAME>
<SCORE>720</SCORE>
<DEPARTMENT>Science</DEPARTMENT>
</STUDENT>
<STUDENT>
<ID>105</ID>
<NAME>Varun</NAME>
<SCORE>590</SCORE>
<DEPARTMENT>Science</DEPARTMENT>
</STUDENT>
</STUDENTS>
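The file can be typed by hand in any text editor, but it can also be generated from R. The following is a sketch, not the only way to do it: it builds the same five records with newXMLNode() and writes them out with saveXML(), both from the XML package. The data frame name students and the loop variables are illustrative choices.

```r
library("XML")

# The same five records as in the listing above
students <- data.frame(
  ID = c(101, 102, 103, 104, 105),
  NAME = c("Rahul", "Sneha", "Amit", "Priya", "Varun"),
  SCORE = c(750, 540, 680, 720, 590),
  DEPARTMENT = c("Science", "Arts", "Commerce", "Science", "Science"),
  stringsAsFactors = FALSE
)

# Build the tree: one <STUDENT> element per row, one child element per column
root <- newXMLNode("STUDENTS")
for (i in seq_len(nrow(students))) {
  record <- newXMLNode("STUDENT", parent = root)
  for (field in names(students)) {
    newXMLNode(field, as.character(students[[field]][i]), parent = record)
  }
}

# Write the tree to students.xml in the working directory
saveXML(root, file = "students.xml")
```

Because the element names are taken from the column names, adding a column to the data frame adds the corresponding tag to every record.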
Reading an XML File in R
After installing the required package, we can read and parse an XML file using the xmlParse() function. This function takes the filename as an argument and returns the parsed document as an XMLInternalDocument object, which is a reference to the parsed tree rather than a plain R list.
# Load necessary libraries
library("XML")
library("methods")
# Parse the XML file
student_data <- xmlParse(file = "students.xml")
print(student_data)
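Output (the parsed document is printed with its tags; exact indentation may differ):
<?xml version="1.0"?>
<STUDENTS>
  <STUDENT>
    <ID>101</ID>
    <NAME>Rahul</NAME>
    <SCORE>750</SCORE>
    <DEPARTMENT>Science</DEPARTMENT>
  </STUDENT>
  <STUDENT>
    <ID>102</ID>
    <NAME>Sneha</NAME>
    <SCORE>540</SCORE>
    <DEPARTMENT>Arts</DEPARTMENT>
  </STUDENT>
  <STUDENT>
    <ID>103</ID>
    <NAME>Amit</NAME>
    <SCORE>680</SCORE>
    <DEPARTMENT>Commerce</DEPARTMENT>
  </STUDENT>
  <STUDENT>
    <ID>104</ID>
    <NAME>Priya</NAME>
    <SCORE>720</SCORE>
    <DEPARTMENT>Science</DEPARTMENT>
  </STUDENT>
  <STUDENT>
    <ID>105</ID>
    <NAME>Varun</NAME>
    <SCORE>590</SCORE>
    <DEPARTMENT>Science</DEPARTMENT>
  </STUDENT>
</STUDENTS>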
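To see what kind of object xmlParse() returns, and to obtain an ordinary R list when one is needed, something like the following can be used. This is a minimal sketch: the XML is a two-record excerpt of students.xml passed as a string (asText = TRUE) so that the example runs on its own, and the variable names are illustrative.

```r
library("XML")

# A two-record excerpt of students.xml, inlined so the sketch is self-contained
xml_text <- "<STUDENTS>
<STUDENT><ID>101</ID><NAME>Rahul</NAME><SCORE>750</SCORE><DEPARTMENT>Science</DEPARTMENT></STUDENT>
<STUDENT><ID>102</ID><NAME>Sneha</NAME><SCORE>540</SCORE><DEPARTMENT>Arts</DEPARTMENT></STUDENT>
</STUDENTS>"

# Parse from a string instead of a file
doc <- xmlParse(xml_text, asText = TRUE)

# The parsed document is a reference to a tree, not an ordinary list
is_internal_doc <- inherits(doc, "XMLInternalDocument")

# Name of the root element
root_name <- xmlName(xmlRoot(doc))

# xmlToList() converts the tree into nested R lists
as_list <- xmlToList(doc)
second_name <- as_list[[2]]$NAME

print(is_internal_doc)
print(root_name)
print(second_name)
```

Working on the parsed tree is usually preferable for large files, while the list form can be convenient for small documents.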
Extracting Information from an XML File
Using R, we can extract specific details from the XML structure, such as the number of child nodes, a specific record, or the value of a particular element.
# Load required libraries
library("XML")
library("methods")
# Parse the XML file
parsed_data <- xmlParse(file = "students.xml")
# Extract the root node
root_node <- xmlRoot(parsed_data)
# Count the child nodes of the root (one per student)
total_nodes <- xmlSize(root_node)
# Retrieve a specific record (2nd student) as a single node
second_student <- root_node[[2]]
# Extract the value of a particular element (score of the 4th student)
specific_score <- xmlValue(root_node[[4]][["SCORE"]])
cat('Total number of students:', total_nodes, '\n')
cat('Details of the 2nd student:\n')
print(second_student)
cat('Score of the 4th student:', specific_score, '\n')
Output:
Total number of students: 5
Details of the 2nd student:
<STUDENT>
  <ID>102</ID>
  <NAME>Sneha</NAME>
  <SCORE>540</SCORE>
  <DEPARTMENT>Arts</DEPARTMENT>
</STUDENT>
Score of the 4th student: 720
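Positional indexing works when the position of a record is known in advance. When records have to be selected by their content, an XPath query is one alternative. The sketch below uses xpathSApply() from the XML package; the XPath expressions and variable names are illustrative, and the document is inlined as a string so the example is self-contained.

```r
library("XML")

# The five records of students.xml, inlined so the sketch runs on its own
xml_text <- "<STUDENTS>
<STUDENT><ID>101</ID><NAME>Rahul</NAME><SCORE>750</SCORE><DEPARTMENT>Science</DEPARTMENT></STUDENT>
<STUDENT><ID>102</ID><NAME>Sneha</NAME><SCORE>540</SCORE><DEPARTMENT>Arts</DEPARTMENT></STUDENT>
<STUDENT><ID>103</ID><NAME>Amit</NAME><SCORE>680</SCORE><DEPARTMENT>Commerce</DEPARTMENT></STUDENT>
<STUDENT><ID>104</ID><NAME>Priya</NAME><SCORE>720</SCORE><DEPARTMENT>Science</DEPARTMENT></STUDENT>
<STUDENT><ID>105</ID><NAME>Varun</NAME><SCORE>590</SCORE><DEPARTMENT>Science</DEPARTMENT></STUDENT>
</STUDENTS>"

doc <- xmlParse(xml_text, asText = TRUE)

# Names of all students whose DEPARTMENT element equals "Science"
science_names <- xpathSApply(doc, "//STUDENT[DEPARTMENT='Science']/NAME", xmlValue)

# All scores, converted from text to numbers
all_scores <- as.numeric(xpathSApply(doc, "//STUDENT/SCORE", xmlValue))

print(science_names)
print(all_scores)
```

xmlValue is applied to every matching node, so each query returns a plain vector rather than a list of nodes.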
Converting XML to a Data Frame
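To improve readability and ease of analysis, XML data can be converted into a data frame using the xmlToDataFrame() function in R. Each child of the root becomes a row and each of its child elements becomes a column. The values are read as text unless column classes are supplied, so numeric fields such as SCORE need converting before any arithmetic.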
# Load required libraries
library("XML")
library("methods")
# Convert XML to a data frame
student_df <- xmlToDataFrame("students.xml")
print(student_df)
Output:
   ID  NAME SCORE DEPARTMENT
1 101 Rahul   750    Science
2 102 Sneha   540       Arts
3 103  Amit   680   Commerce
4 104 Priya   720    Science
5 105 Varun   590    Science
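Once the data is in a data frame, the usual R tools apply. The following sketch converts the text columns to numbers and then filters and summarises the records. The threshold of 600 and the variable names are illustrative assumptions, and the XML is inlined as a string so the example runs on its own.

```r
library("XML")

# The five records of students.xml, inlined so the sketch runs on its own
xml_text <- "<STUDENTS>
<STUDENT><ID>101</ID><NAME>Rahul</NAME><SCORE>750</SCORE><DEPARTMENT>Science</DEPARTMENT></STUDENT>
<STUDENT><ID>102</ID><NAME>Sneha</NAME><SCORE>540</SCORE><DEPARTMENT>Arts</DEPARTMENT></STUDENT>
<STUDENT><ID>103</ID><NAME>Amit</NAME><SCORE>680</SCORE><DEPARTMENT>Commerce</DEPARTMENT></STUDENT>
<STUDENT><ID>104</ID><NAME>Priya</NAME><SCORE>720</SCORE><DEPARTMENT>Science</DEPARTMENT></STUDENT>
<STUDENT><ID>105</ID><NAME>Varun</NAME><SCORE>590</SCORE><DEPARTMENT>Science</DEPARTMENT></STUDENT>
</STUDENTS>"

doc <- xmlParse(xml_text, asText = TRUE)

# Convert the parsed document to a data frame, keeping text as character
student_df <- xmlToDataFrame(doc, stringsAsFactors = FALSE)

# Convert the numeric fields from text to numbers
student_df$ID <- as.integer(student_df$ID)
student_df$SCORE <- as.numeric(student_df$SCORE)

# Students scoring above an illustrative threshold of 600
high_scorers <- subset(student_df, SCORE > 600)

# Average score per department
mean_by_dept <- tapply(student_df$SCORE, student_df$DEPARTMENT, mean)

print(high_scorers)
print(mean_by_dept)
```

Without the conversion step, a comparison such as SCORE > 600 would be carried out on text rather than on numbers.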