How to Parse XML using SAX/DOM?

Introduction

In the world of Java programming, working with structured data is crucial, especially when dealing with XML files. XML (Extensible Markup Language) is widely used in data storage, configuration files, and web services. Parsing XML efficiently is an essential skill for Java Full Stack Developers.

When parsing XML in Java, the two primary techniques are SAX (Simple API for XML) and DOM (Document Object Model). Each has its own use cases and benefits, so it is important to understand when and how to use them effectively.

This blog will guide you through XML parsing using SAX and DOM, providing in-depth explanations, real-world applications, and hands-on examples. Whether you are a beginner or an advanced Java developer, mastering XML parsing will enhance your Java Full Stack Developer skills.

Understanding XML Parsing in Java

XML parsing refers to the process of reading XML data and extracting meaningful information from it. In Java, we primarily use two approaches:

SAX (Simple API for XML) – An event-driven, sequential parsing method that is memory-efficient.
DOM (Document Object Model) – A tree-based parsing method that loads the entire XML structure into memory for easy manipulation.

Each approach has its advantages and limitations, which we will explore in detail.

DOM Parser:

DOM (Document Object Model) provides an interface to access and update the style, structure, and contents of an XML document. Use DOM when you need to know the structure of the XML document, move or modify parts of it, or use the information more than once. Parsing an XML document with a DOM parser gives you a tree-like structure that contains all the elements of the document.

DOM Interfaces:

Node: The base datatype of the DOM; every part of the document is a Node.
Element: Represents an element (tag) in the document.
Text: Represents the text content of an element.
Attr: Represents an attribute of an element.
Document: Represents the entire XML document.

DOM Methods:

Document.getDocumentElement() – It returns the root element of the document.
Node.getFirstChild() − It returns the first child of a given Node.
Node.getLastChild() − It returns the last child of a given Node.
Node.getNextSibling() − It returns the next sibling of a given Node.
Node.getPreviousSibling() − It returns the previous sibling of a given Node.
Node.getAttrib
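
As a minimal sketch of how the navigation methods fit together, the following walks the direct children of the root with getFirstChild() and getNextSibling(). The class name DomNavigationSketch and the sample XML string are assumptions made for illustration. Whitespace between tags shows up as Text nodes, so the loop keeps only element nodes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

class DomNavigationSketch {

    // Returns the names of the root's direct child elements, in document order.
    static List<String> childElementNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            List<String> names = new ArrayList<>();
            Node root = doc.getDocumentElement();
            // Step from the first child to each following sibling until none is left.
            for (Node child = root.getFirstChild(); child != null; child = child.getNextSibling()) {
                // Skip whitespace-only Text nodes and keep real elements.
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    names.add(child.getNodeName());
                }
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Illustrative document, not taken from a real file.
        String xml = "<stock>\n  <symbol>Citibank</symbol>\n  <price>100</price>\n</stock>";
        System.out.println(childElementNames(xml)); // prints [symbol, price]
    }
}
```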

When to Use DOM?

When working with smaller XML files.
When random access and data modification are required.
When the document structure needs to be retained in memory for multiple operations.
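
To illustrate the modification case, here is a small sketch that changes a value in the in-memory tree and writes the document back out as text. The class name DomModifySketch, the method updatePrice, and the sample XML are hypothetical; serialization uses the standard javax.xml.transform API, which is one common way to do it, not the only one.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomModifySketch {

    // Replaces the text of the first <price> element and returns the serialized document.
    static String updatePrice(String xml, String newPrice) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            // Random access: jump straight to the element of interest and edit it in place.
            Element price = (Element) doc.getElementsByTagName("price").item(0);
            price.setTextContent(newPrice);

            // Write the modified tree back out as a string.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not update XML", e);
        }
    }

    public static void main(String[] args) {
        // Illustrative input; assumes the document contains at least one <price> element.
        System.out.println(updatePrice("<stock><price>100</price></stock>", "105"));
    }
}
```

This kind of in-place edit is not available with SAX, which only reports events as it reads.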

Steps to use DOM:

Import the XML-related packages:
- import org.w3c.dom.*;
- import javax.xml.parsers.*;
- import java.io.*;
Create a document builder:
- DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
- DocumentBuilder builder = factory.newDocumentBuilder();
Create a document from a file or stream (here, an in-memory string):
- StringBuilder xmlStringBuilder = new StringBuilder();
- xmlStringBuilder.append("<?xml version=\"1.0\"?> <class> </class>");
- ByteArrayInputStream input = new ByteArrayInputStream(xmlStringBuilder.toString().getBytes("UTF-8"));
- Document doc = builder.parse(input);
Extract the root element:
- Element root = doc.getDocumentElement();
Examine the attributes:
- root.getAttribute("attributeName");
- root.getAttributes();
Examine the sub elements:
- root.getElementsByTagName("subelementName");
- root.getChildNodes();
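
The steps above can be put together in one short, runnable sketch. The class name DomStepsSketch, the attribute names, and the student entries are made up for illustration; only the DOM calls themselves come from the steps listed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.NodeList;

class DomStepsSketch {

    // Runs each step in order and returns a one-line summary of what was found.
    static String summarize(String xml) {
        try {
            // Create a document builder.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();

            // Create a document from an in-memory stream.
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            // Extract the root element.
            Element root = doc.getDocumentElement();

            // Examine the attributes.
            String name = root.getAttribute("name");
            NamedNodeMap attributes = root.getAttributes();

            // Examine the sub elements.
            NodeList students = root.getElementsByTagName("student");

            return root.getNodeName() + " name=" + name
                    + " attributes=" + attributes.getLength()
                    + " students=" + students.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample data.
        String xml = "<?xml version=\"1.0\"?>"
                + "<class name=\"Java\" room=\"12\">"
                + "<student>Asha</student><student>Ravi</student>"
                + "</class>";
        System.out.println(summarize(xml)); // prints class name=Java attributes=2 students=2
    }
}
```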

Example:

XML File

<?xml version="1.0" encoding="UTF-8"?>
<stocks>
       <stock>
              <symbol>Citibank</symbol>
              <price>100</price>
              <quantity>1000</quantity>
       </stock>
       <stock>
              <symbol>Axis bank</symbol>
              <price>90</price>
              <quantity>2000</quantity>
       </stock>
</stocks>

JAVA File:

import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DOMExampleJava {

    public static void main(String args[]) {
        try {
            // Parse Stocks.xml (expected in the working directory) into a DOM tree
            File stocks = new File("Stocks.xml");
            DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
            Document doc = dBuilder.parse(stocks);
            doc.getDocumentElement().normalize();

            System.out.println("root of xml file " + doc.getDocumentElement().getNodeName());
            NodeList nodes = doc.getElementsByTagName("stock");
            System.out.println("==========================");

            // Print the child values of every <stock> element
            for (int k = 0; k < nodes.getLength(); k++) {
                Node node = nodes.item(k);

                if (node.getNodeType() == Node.ELEMENT_NODE) {
                    Element element = (Element) node;
                    System.out.println("Stock Symbol: " + getValue("symbol", element));
                    System.out.println("Stock Price: " + getValue("price", element));
                    System.out.println("Stock Quantity: " + getValue("quantity", element));
                }
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }

    // Returns the text of the first child element with the given tag name
    private static String getValue(String tag, Element element) {
        NodeList nodes = element.getElementsByTagName(tag).item(0).getChildNodes();
        Node node = nodes.item(0);
        return node.getNodeValue();
    }
}

OUTPUT:

root of xml file stocks
==========================
Stock Symbol: Citibank
Stock Price: 100
Stock Quantity: 1000
Stock Symbol: Axis bank
Stock Price: 90
Stock Quantity: 2000

Key Takeaways from DOM Parsing

It loads the entire XML document into memory.
It allows for easy manipulation and traversal of data.
It consumes more memory compared to SAX, making it inefficient for large files.

SAX Parser:

Understanding SAX Parser

SAX is an event-driven parser, which means it does not load the entire XML document into memory. Instead, it reads the document sequentially and triggers callbacks such as startElement(), characters(), and endElement() as it encounters each part of the XML. This makes it a memory-efficient solution for large XML files.

Advantages of SAX Parser

Efficient Memory Usage: Suitable for parsing large XML files.
Faster Execution: Reads XML sequentially in a single pass, which is typically faster than building a DOM tree.
Event-Driven Model: Processes XML elements as they appear.

SAX (Simple API for XML) is based on event-based triggers. Unlike DOM, it does not build a tree, so it generally uses less memory and is often faster. Use a SAX parser when you need to process the document linearly, from top to bottom.
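
A minimal sketch of the event model: the handler below records each callback in the order the parser fires it. The class name SaxEventsSketch and the sample XML are assumptions for illustration. The parser is obtained through SAXParserFactory from the standard javax.xml.parsers package.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventsSketch {

    // Parses the XML and returns a log of the SAX events in the order they occurred.
    static List<String> events(String xml) {
        List<String> log = new ArrayList<>();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                log.add("start:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Ignore whitespace-only text between tags.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    log.add("text:" + text);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("end:" + qName);
            }
        };

        try {
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return log;
    }

    public static void main(String[] args) {
        // Illustrative input.
        for (String event : events("<user><firstname>Tom</firstname></user>")) {
            System.out.println(event);
        }
    }
}
```

Nothing is kept in memory beyond what the handler chooses to store, which is where the memory savings come from.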

Example:

1. XML File:

<users>
    <user id="100">
        <firstname>Tom</firstname>
        <lastname>Hanks</lastname>
    </user>
    <user id="101">
        <firstname>Lokesh</firstname>
        <lastname>Gupta</lastname>
    </user>
    <user id="102">
        <firstname>HowToDo</firstname>
        <lastname>InJava</lastname>
    </user>
</users>

2. Model Class

public class User
{
    private int id;
    private String firstName;
    private String lastName;
 
    public int getId() {
        return id;
    }
    public void setId(int id) {
        this.id = id;
    }
    public String getFirstName() {
        return firstName;
    }
    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }
    public String getLastName() {
        return lastName;
    }
    public void setLastName(String lastName) {
        this.lastName = lastName;
    }
 
    @Override
    public String toString() {
        return this.id + ":" + this.firstName +  ":" +this.lastName ;
    }
}

3. SAX handler code in Java:

import java.util.ArrayList;
import java.util.Stack;
 
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
 
public class UserParserHandler extends DefaultHandler
{
    private ArrayList<User> userList = new ArrayList<>();
 
    private Stack<String> elementStack = new Stack<>();
 
    private Stack<User> objectStack = new Stack<>();
 
    public void startDocument() throws SAXException
    {
    }
 
    public void endDocument() throws SAXException
    {
     }
 
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException
    {
        this.elementStack.push(qName);
 
   
        if ("user".equals(qName))
        {
            User user = new User();
 
            if (attributes != null && attributes.getLength() == 1)
            {
                user.setId(Integer.parseInt(attributes.getValue(0)));
            }
            this.objectStack.push(user);
        }
    }
 
    public void endElement(String uri, String localName, String qName) throws SAXException
    {
        this.elementStack.pop();
 
        if ("user".equals(qName))
        {
            User object = this.objectStack.pop();
            this.userList.add(object);
        }
    }
 
    /**
     * Called whenever the parser encounters character data.
     * The text of one element may arrive in more than one call.
     */
    public void characters(char[] ch, int start, int length) throws SAXException
    {
        String value = new String(ch, start, length).trim();
 
        if (value.length() == 0)
        {
            return;
        }
 
        if ("firstname".equals(currentElement()))
        {
            User user = (User) this.objectStack.peek();
            user.setFirstName(value);
        }
        else if ("lastname".equals(currentElement()))
        {
            User user = (User) this.objectStack.peek();
            user.setLastName(value);
        }
    }
 
    
    private String currentElement()
    {
        return this.elementStack.peek();
    }
 
    public ArrayList<User> getUsers()
    {
        return userList;
    }
}
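
Because a SAX parser is allowed to deliver the text of one element across several characters() calls, a more defensive variant buffers the text and reads it in endElement(). The following is a sketch of that pattern only, not a replacement for the handler above; the class name BufferedTextHandler and the helper firstNames are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class BufferedTextHandler extends DefaultHandler {

    private final StringBuilder text = new StringBuilder();
    private final List<String> firstNames = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // A new element starts: discard any text collected so far.
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Append every chunk; the parser may split one text node across calls.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // The element is complete, so the buffer now holds its full text.
        if ("firstname".equals(qName)) {
            firstNames.add(text.toString().trim());
        }
    }

    // Convenience helper (hypothetical): parses the XML and returns all <firstname> values.
    static List<String> firstNames(String xml) {
        BufferedTextHandler handler = new BufferedTextHandler();
        try {
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.firstNames;
    }

    public static void main(String[] args) {
        String xml = "<users><user id=\"100\"><firstname>Tom</firstname>"
                + "<lastname>Hanks</lastname></user></users>";
        System.out.println(firstNames(xml)); // prints [Tom]
    }
}
```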

4. SAX Parser to read XML file:

import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
 
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;
 
public class UsersXmlParser
{
    public ArrayList<User> parseXml(InputStream in)
    {
        ArrayList<User> users = new ArrayList<>();
        try
        {
            UserParserHandler handler = new UserParserHandler();
 
            XMLReader parser = XMLReaderFactory.createXMLReader();
 
            parser.setContentHandler(handler);
 
            InputSource source = new InputSource(in);
 
            parser.parse(source);
 
            users = handler.getUsers();
 
        } catch (SAXException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
 
        }
        return users;
    }
}
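
Note that XMLReaderFactory has been deprecated since Java 9 in favor of SAXParserFactory. As a hedged sketch of the alternative, an XMLReader can be obtained from a SAXParser and used the same way. The class name XmlReaderSketch and the counting logic are illustrative assumptions, not part of the example above.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class XmlReaderSketch {

    // Counts how many elements with the given name appear in the XML.
    static int countElements(String xml, String name) {
        int[] count = {0};
        try {
            // Obtain the XMLReader through SAXParserFactory instead of XMLReaderFactory.
            SAXParserFactory factory = SAXParserFactory.newInstance();
            XMLReader reader = factory.newSAXParser().getXMLReader();

            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes attributes) {
                    if (name.equals(qName)) {
                        count[0]++;
                    }
                }
            });

            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        // Illustrative input.
        String xml = "<users><user id=\"100\"/><user id=\"101\"/><user id=\"102\"/></users>";
        System.out.println(countElements(xml, "user")); // prints 3
    }
}
```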

5. Test SAX Parser:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.util.ArrayList;
 
public class TestSaxParser
{
    public static void main(String arg[]) throws FileNotFoundException
    {
        File xmlFile = new File("D:/temp/sample.xml");
 
        UsersXmlParser parser = new UsersXmlParser();
 
        ArrayList<User> users = parser.parseXml(new FileInputStream(xmlFile));
 
        System.out.println(users);
    }
}

Key Takeaways from SAX Parsing

It processes XML data sequentially.
It is event-driven and memory-efficient.
It does not allow backward traversal or modification of data.

SAX vs. DOM: Which One Should You Choose?

Feature	SAX	DOM
Memory Usage	Low	High
Speed	Fast	Slower
Backward Traversal	Not Allowed	Allowed
Modification of Data	Not Possible	Possible
Best Use Case	Large XML files, Streaming Data	Small XML files, Data Manipulation

Choosing between SAX and DOM depends on your application requirements. If you are working with large XML files and need fast processing, SAX is the preferred choice. However, if you need to modify or access data randomly, DOM is a better option.

Conclusion

XML parsing is an essential skill for any Java Full Stack Developer. Understanding SAX and DOM helps in making informed decisions when working with XML data in Java applications. Whether you’re building a web service, processing configuration files, or handling structured data, mastering XML parsing will enhance your Java programming skills.

Ready to elevate your Java expertise? Enroll in H2K Infosys’ Java course today and gain hands-on experience in Java programming, Java Full Stack Development, and much more!