Working with XML Files in R Programming


XML (Extensible Markup Language) stores data as nested elements delimited by markup tags, where each element holds one piece of data. To work with XML files in R, we can use the XML package. It is not part of base R, so it must be installed first with the following command:

install.packages("XML")

Creating an XML File

An XML file is structured as a hierarchy of tags that describe the data they enclose. By convention, it is saved with a .xml extension.

Consider the following XML file named students.xml:

<STUDENTS>
  <STUDENT>
      <ID>101</ID>
      <NAME>Rahul</NAME>
      <SCORE>750</SCORE>
      <DEPARTMENT>Science</DEPARTMENT>
  </STUDENT>
  <STUDENT>
      <ID>102</ID>
      <NAME>Sneha</NAME>
      <SCORE>540</SCORE>
      <DEPARTMENT>Arts</DEPARTMENT>
  </STUDENT>
  <STUDENT>
      <ID>103</ID>
      <NAME>Amit</NAME>
      <SCORE>680</SCORE>
      <DEPARTMENT>Commerce</DEPARTMENT>
  </STUDENT>
  <STUDENT>
      <ID>104</ID>
      <NAME>Priya</NAME>
      <SCORE>720</SCORE>
      <DEPARTMENT>Science</DEPARTMENT>
  </STUDENT>
  <STUDENT>
      <ID>105</ID>
      <NAME>Varun</NAME>
      <SCORE>590</SCORE>
      <DEPARTMENT>Science</DEPARTMENT>
  </STUDENT>
</STUDENTS>
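
The same structure can also be generated from R instead of being typed by hand. The following is a minimal sketch using newXMLNode() and saveXML() from the XML package. The helper add_student() and the temporary output path are illustrative assumptions, not part of the original example, and only two records are written to keep it short.

```r
library(XML)

# Hypothetical helper: appends one <STUDENT> record to `parent`
add_student <- function(parent, id, name, score, department) {
  student <- newXMLNode("STUDENT", parent = parent)
  newXMLNode("ID", as.character(id), parent = student)
  newXMLNode("NAME", name, parent = student)
  newXMLNode("SCORE", as.character(score), parent = student)
  newXMLNode("DEPARTMENT", department, parent = student)
  invisible(student)
}

# Build the root element and add two of the records shown above
root <- newXMLNode("STUDENTS")
add_student(root, 101, "Rahul", 750, "Science")
add_student(root, 102, "Sneha", 540, "Arts")

# Write to a temporary file so an existing students.xml is not overwritten
out_file <- tempfile(fileext = ".xml")
saveXML(root, file = out_file)
```

Replacing out_file with "students.xml" would write the file into the current working directory.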

Reading an XML File in R

After installing the required package, we can read and parse an XML file using the xmlParse() function. This function takes the filename as an argument and returns the parsed document as an XML document object (class XMLInternalDocument), not as a plain R list.

# Load necessary libraries
library("XML")
library("methods")

# Parse the XML file
student_data <- xmlParse(file = "students.xml")

print(student_data)

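Output (the parsed document is printed as XML markup):

<?xml version="1.0"?>
<STUDENTS>
  <STUDENT>
    <ID>101</ID>
    <NAME>Rahul</NAME>
    <SCORE>750</SCORE>
    <DEPARTMENT>Science</DEPARTMENT>
  </STUDENT>
  <STUDENT>
    <ID>102</ID>
    <NAME>Sneha</NAME>
    <SCORE>540</SCORE>
    <DEPARTMENT>Arts</DEPARTMENT>
  </STUDENT>
  <STUDENT>
    <ID>103</ID>
    <NAME>Amit</NAME>
    <SCORE>680</SCORE>
    <DEPARTMENT>Commerce</DEPARTMENT>
  </STUDENT>
  <STUDENT>
    <ID>104</ID>
    <NAME>Priya</NAME>
    <SCORE>720</SCORE>
    <DEPARTMENT>Science</DEPARTMENT>
  </STUDENT>
  <STUDENT>
    <ID>105</ID>
    <NAME>Varun</NAME>
    <SCORE>590</SCORE>
    <DEPARTMENT>Science</DEPARTMENT>
  </STUDENT>
</STUDENTS>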
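
To see what kind of object xmlParse() hands back, it can help to inspect it directly. The sketch below parses a small XML string instead of a file (via the asText = TRUE argument) so that it runs without students.xml on disk. The one-record sample string and the variable names are assumptions made for illustration.

```r
library(XML)

# A one-record sample in the same shape as students.xml (illustrative data)
xml_text <- "<STUDENTS>
  <STUDENT>
    <ID>101</ID>
    <NAME>Rahul</NAME>
    <SCORE>750</SCORE>
    <DEPARTMENT>Science</DEPARTMENT>
  </STUDENT>
</STUDENTS>"

# asText = TRUE tells xmlParse() to treat the input as XML content, not a path
doc <- xmlParse(xml_text, asText = TRUE)

# The result is a document object rather than a plain list
is_document <- inherits(doc, "XMLInternalDocument")

# The root element gives access to the rest of the tree
root <- xmlRoot(doc)
root_name <- xmlName(root)

print(is_document)
print(root_name)
```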

Extracting Information from an XML File

Using R, we can extract specific details from the XML structure, such as the number of records, an individual record, or the value of a particular element.

# Load required libraries
library("XML")
library("methods")

# Parse the XML file
parsed_data <- xmlParse(file = "students.xml")

# Extract the root node
root_node <- xmlRoot(parsed_data)

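# Count the child nodes of the root (one per student)
total_nodes <- xmlSize(root_node)

# Retrieve a specific record (2nd student).
# [[2]] returns the node itself; [2] would return a one-element list.
second_student <- root_node[[2]]

# Extract the value of a particular element (SCORE of the 4th student).
# xmlValue() returns the text inside the node as a character string.
specific_score <- xmlValue(root_node[[4]][[3]])

cat('Total number of students:', total_nodes, '\n')
cat('Details of the 2nd student:\n')
print(second_student)

cat('Score of the 4th student:', specific_score, '\n')

Output:

Total number of students: 5
Details of the 2nd student:
<STUDENT>
  <ID>102</ID>
  <NAME>Sneha</NAME>
  <SCORE>540</SCORE>
  <DEPARTMENT>Arts</DEPARTMENT>
</STUDENT>
Score of the 4th student: 720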
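
Positional indexing works when the record order is known. When it is not, XPath queries are a common alternative. The following is a minimal sketch using xpathSApply() and getNodeSet() from the XML package. It is not the approach used in the listing above, and the three-record sample string is an assumption so that the code runs on its own.

```r
library(XML)

# Three sample records in the same shape as students.xml (illustrative data)
xml_text <- "<STUDENTS>
  <STUDENT><ID>101</ID><NAME>Rahul</NAME><SCORE>750</SCORE><DEPARTMENT>Science</DEPARTMENT></STUDENT>
  <STUDENT><ID>102</ID><NAME>Sneha</NAME><SCORE>540</SCORE><DEPARTMENT>Arts</DEPARTMENT></STUDENT>
  <STUDENT><ID>104</ID><NAME>Priya</NAME><SCORE>720</SCORE><DEPARTMENT>Science</DEPARTMENT></STUDENT>
</STUDENTS>"

doc <- xmlParse(xml_text, asText = TRUE)

# Names of all students, as a character vector
all_names <- xpathSApply(doc, "//STUDENT/NAME", xmlValue)

# Names of students whose DEPARTMENT element equals "Science"
science_names <- xpathSApply(doc, "//STUDENT[DEPARTMENT='Science']/NAME", xmlValue)

# Full <STUDENT> nodes with a SCORE above 700
high_scorers <- getNodeSet(doc, "//STUDENT[SCORE > 700]")

print(all_names)
print(science_names)
print(length(high_scorers))
```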

Converting XML to a Data Frame

To improve readability and ease of analysis, XML data can be converted into a structured data frame using the xmlToDataFrame() function in R.

# Load required libraries
library("XML")
library("methods")

# Convert XML to a data frame
student_df <- xmlToDataFrame("students.xml")
print(student_df)

Output:

   ID  NAME SCORE DEPARTMENT
1 101 Rahul   750    Science
2 102 Sneha   540       Arts
3 103  Amit   680   Commerce
4 104 Priya   720    Science
5 105 Varun   590    Science
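
One thing to watch for: xmlToDataFrame() reads every column as text, so ID and SCORE need to be converted before any arithmetic. The sketch below shows one way to do that, assuming a three-record sample string in place of students.xml so that it runs on its own.

```r
library(XML)

# Three sample records in the same shape as students.xml (illustrative data)
xml_text <- "<STUDENTS>
  <STUDENT><ID>101</ID><NAME>Rahul</NAME><SCORE>750</SCORE><DEPARTMENT>Science</DEPARTMENT></STUDENT>
  <STUDENT><ID>102</ID><NAME>Sneha</NAME><SCORE>540</SCORE><DEPARTMENT>Arts</DEPARTMENT></STUDENT>
  <STUDENT><ID>104</ID><NAME>Priya</NAME><SCORE>720</SCORE><DEPARTMENT>Science</DEPARTMENT></STUDENT>
</STUDENTS>"

doc <- xmlParse(xml_text, asText = TRUE)

# Keep text columns as character vectors rather than factors
student_df <- xmlToDataFrame(doc, stringsAsFactors = FALSE)

# Convert the numeric-looking columns explicitly
student_df$ID <- as.integer(student_df$ID)
student_df$SCORE <- as.numeric(student_df$SCORE)

# Numeric operations now work as expected
average_score <- mean(student_df$SCORE)
score_by_department <- aggregate(SCORE ~ DEPARTMENT, data = student_df, FUN = mean)

str(student_df)
print(average_score)
print(score_by_department)
```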
