In this post, let’s see how you can parse XML data using PHP.

On the web, XML is commonly used for two purposes:

  • to publish blog post feeds (RSS or Atom)
  • to build website sitemaps

To show how it works, let me take a blog post feed as the example. Here is the RSS feed page of one of my blogs, which is built with WordPress.

It lists the posts recently published on this blog. I will take this document as the input, parse it using PHP, and display the posts on a separate web page. That is essentially how a feed reader application like Feedly works.

Using cURL to fetch a remote XML feed

We will use cURL to fetch the feed from the remote URL:

$url = "https://www.coralnodes.com/feed/";
$handle = curl_init();

We need to set a couple of options. The first is CURLOPT_URL, which points the request at the URL defined above. The second is CURLOPT_RETURNTRANSFER; setting it to true makes curl_exec() return the response as a string instead of printing it directly.

curl_setopt($handle, CURLOPT_URL, $url);
curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);

Also set the CURLOPT_FOLLOWLOCATION option to true so that the request follows any redirects to the final destination, for instance an HTTP to HTTPS redirect or a non-www to www redirect.

curl_setopt($handle, CURLOPT_FOLLOWLOCATION, true);

In our case the URL is already the final one, so omitting this option shouldn't cause any problem. Next, execute the request by passing the handle to the curl_exec() function, which returns the response body, or false if the request fails. Finally, close the connection by calling the curl_close() function.

$res = curl_exec($handle);
curl_close($handle);
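
Since curl_exec() returns false when the request fails, it can help to wrap the calls above in a small function that reports the failure. The following is a minimal sketch, not part of the original script: the function name fetch_feed(), the timeout value, and the status check are assumptions made for illustration.

```php
<?php
// Hypothetical helper: wraps the cURL calls shown above and turns
// failures into exceptions instead of silently returning false.
function fetch_feed(string $url, int $timeoutSeconds = 10): string
{
    $handle = curl_init();
    curl_setopt($handle, CURLOPT_URL, $url);
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($handle, CURLOPT_FOLLOWLOCATION, true);
    // Assumed value: give up after this many seconds.
    curl_setopt($handle, CURLOPT_TIMEOUT, $timeoutSeconds);

    $body = curl_exec($handle);

    if ($body === false) {
        // curl_error() describes what went wrong with the last request.
        throw new RuntimeException("cURL request failed: " . curl_error($handle));
    }

    // Treat HTTP error statuses (4xx and 5xx) as failures too.
    $status = curl_getinfo($handle, CURLINFO_HTTP_CODE);
    if ($status >= 400) {
        throw new RuntimeException("Feed request returned HTTP status " . $status);
    }

    // The handle is released when $handle goes out of scope.
    return $body;
}

// Usage:
// try {
//     $res = fetch_feed("https://www.coralnodes.com/feed/");
// } catch (RuntimeException $e) {
//     exit($e->getMessage());
// }
```

Throwing an exception keeps the failure close to its cause, so the parsing step never receives false in place of an XML string.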

Parsing using SimpleXMLElement

Next we want to parse this response. Fortunately, PHP provides a built-in class called SimpleXMLElement for that.

$feed = new SimpleXMLElement($res);

The constructor accepts the XML string as its first parameter.

The class converts the XML document into a tree of PHP objects: child elements become properties, and the class's methods let us read, search, and modify them.
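
To make the mapping concrete, here is a small sketch that parses an inline XML string, so it runs without a network request. The sample document and the variable names are made up for illustration.

```php
<?php
// Made-up sample feed, used instead of a remote request.
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>First post</title>
      <link>https://example.com/first-post/</link>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/second-post/</link>
    </item>
  </channel>
</rss>
XML;

$feed = new SimpleXMLElement($xml);

// Child elements are read as properties; cast to string to get the text.
$blogTitle = (string) $feed->channel->title;

// Attributes are read with array syntax.
$version = (string) $feed['version'];

// Repeated elements can be counted and accessed by index.
$itemCount  = count($feed->channel->item);
$secondLink = (string) $feed->channel->item[1]->link;

echo "$blogTitle (RSS $version) has $itemCount items\n";
echo "Second link: $secondLink\n";

// The constructor throws an Exception when the string is not valid XML.
libxml_use_internal_errors(true); // keep libxml warnings out of the output
try {
    new SimpleXMLElement('<rss><channel>');
    $parsedBroken = true;
} catch (Exception $e) {
    $parsedBroken = false;
}
libxml_clear_errors();
```

Casting to string matters whenever the value is compared or passed to another function, because a property read returns another SimpleXMLElement object rather than plain text.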

Let's see how we can display the post title and description using these objects. Close the PHP tag and then start the HTML document.

$feed = new SimpleXMLElement($res);

?><!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>XML Parsing</title>
</head>
<body>
</body>
</html>

To get the posts, we have to iterate over the item elements inside channel. So let's open a foreach loop inside PHP tags:

<body>
    <?php foreach($feed->channel->item as $item) : ?>
        <article>
            <h2><?= $item->title ?></h2>
            <p><?= $item->description ?></p>
        </article> 
    <?php endforeach; ?>
</body>
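
Because the feed comes from a remote server, it is worth escaping plain-text fields such as the title before printing them into HTML. The sketch below uses a made-up inline feed; descriptions are left out because they often contain HTML markup on purpose and would need a sanitizer rather than plain escaping.

```php
<?php
// Made-up sample feed whose title contains characters that are special in HTML.
$xml = <<<XML
<rss version="2.0">
  <channel>
    <item>
      <title>Tips &amp; tricks for &lt;pre&gt; tags</title>
    </item>
  </channel>
</rss>
XML;

$feed = new SimpleXMLElement($xml);

$html = '';
foreach ($feed->channel->item as $item) {
    // The parser decodes entities, so the title is now the raw text
    // "Tips & tricks for <pre> tags" and has to be escaped again for HTML.
    $safeTitle = htmlspecialchars((string) $item->title, ENT_QUOTES, 'UTF-8');
    $html .= "<article><h2>{$safeTitle}</h2></article>\n";
}

echo $html;
```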

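Reload the page and we can see a list of blog posts with their titles and descriptions. Suppose I also want to show the published date and the author's name below each post. In a WordPress feed the author is stored in a dc:creator element, which belongs to the dc namespace, so it has to be read through children('dc', true) rather than as a plain property. Add this inside the article element, after the description: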

<div>
    <?php
    // Cast to string: DateTime expects a string, not a SimpleXMLElement
    $dt = new DateTime((string) $item->pubDate);
    $pub_date = $dt->format('l, F d Y');
    ?>
    written by <?= $item->children('dc', true)->creator ?> on <?= $pub_date ?>
</div>
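
The following sketch shows the namespace lookup in isolation, using a made-up inline feed. It reads the author by prefix and by namespace URI; the URI form keeps working even if a feed declares the same namespace under a different prefix.

```php
<?php
// Made-up sample item with a namespaced <dc:creator> element.
$xml = <<<XML
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <item>
      <title>First post</title>
      <pubDate>Mon, 06 Sep 2021 10:30:00 +0000</pubDate>
      <dc:creator>Jane Doe</dc:creator>
    </item>
  </channel>
</rss>
XML;

$feed = new SimpleXMLElement($xml);
$item = $feed->channel->item[0];

// Plain property access only sees elements without a namespace prefix,
// so this yields an empty string.
$plain = (string) $item->creator;

// Select children by prefix (second argument true) ...
$byPrefix = (string) $item->children('dc', true)->creator;

// ... or by namespace URI (second argument omitted).
$byUri = (string) $item->children('http://purl.org/dc/elements/1.1/')->creator;

// Format the publication date the same way as in the template above.
$date      = new DateTime((string) $item->pubDate);
$formatted = $date->format('l, F d Y');

echo "written by $byUri on $formatted\n";
```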

Creating XML

That’s how you can parse an XML document. Next let’s see how you can create an XML document using PHP.

For instance, to create an XML sitemap from the above feed data, you can do it like this:

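<?php

$url = "https://www.coralnodes.com/feed/";

// Fetch the feed as a string, following redirects
$handle = curl_init();
curl_setopt($handle, CURLOPT_URL, $url);
curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
curl_setopt($handle, CURLOPT_FOLLOWLOCATION, true);

$response = curl_exec($handle);

curl_close($handle);

// curl_exec() returns false when the request fails
if ($response === false) {
    exit("Could not fetch the feed");
}

$feed = new SimpleXMLElement($response);

// Declare the sitemap namespace on the root element
$sitemap = new SimpleXMLElement('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"></urlset>');

foreach ($feed->channel->item as $item) {
    // $entry, not $url, so the feed URL above is not overwritten
    $entry = $sitemap->addChild("url");
    // addChild() does not escape "&", so escape the link first
    $entry->addChild("loc", htmlspecialchars((string) $item->link, ENT_XML1));
    $entry->addChild("changefreq", "monthly");
}

$saved_sitemap = $sitemap->asXML();
echo $saved_sitemap;
file_put_contents("sitemap.xml", $saved_sitemap);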
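
One detail worth checking when building XML this way is how ampersands in links are handled, since addChild() does not escape them. The sketch below builds a sitemap from a made-up inline feed and parses the result again as a sanity check. The sample links and variable names are illustrative assumptions.

```php
<?php
// Made-up sample feed; the second link contains an ampersand.
$xml = <<<XML
<rss version="2.0">
  <channel>
    <item><link>https://example.com/first-post/</link></item>
    <item><link>https://example.com/?p=42&amp;lang=en</link></item>
  </channel>
</rss>
XML;

$feed = new SimpleXMLElement($xml);

// Root element with the sitemap namespace declared up front.
$sitemap = new SimpleXMLElement(
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"/>'
);

foreach ($feed->channel->item as $item) {
    $entry = $sitemap->addChild('url');
    // Escape "&", "<" and ">" before handing the value to addChild().
    $entry->addChild('loc', htmlspecialchars((string) $item->link, ENT_XML1, 'UTF-8'));
    $entry->addChild('changefreq', 'monthly');
}

$output = $sitemap->asXML();
echo $output;

// Sanity check: the generated document should parse and round-trip the link.
$check     = new SimpleXMLElement($output);
$urlCount  = count($check->url);
$secondLoc = (string) $check->url[1]->loc;
```

Parsing the output again is a cheap way to catch malformed XML before the file is written to disk.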

Searching XML using XPath

Suppose I want to find and display all the post titles in the above feed data. The xpath() method takes an XPath expression and returns an array of the matching elements:

<?php

$url = "https://www.coralnodes.com/feed/";

$handle = curl_init();
curl_setopt($handle, CURLOPT_URL, $url);
curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
curl_setopt($handle, CURLOPT_FOLLOWLOCATION, true);

$response = curl_exec($handle);

curl_close($handle);

$feed = new SimpleXMLElement($response);

$titles = $feed->xpath('/rss/channel/item/title');

foreach($titles as $title) {
    echo $title . "<br>";
}
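
XPath expressions can also filter the results. As a rough sketch with a made-up inline feed, the following compares the absolute path used above with a predicate on the category element, a positional predicate, and the // shorthand that matches at any depth.

```php
<?php
// Made-up sample feed with categories, so the queries have something to filter.
$xml = <<<XML
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item><title>Parsing XML</title><category>PHP</category></item>
    <item><title>Flexbox basics</title><category>CSS</category></item>
    <item><title>Using cURL</title><category>PHP</category></item>
  </channel>
</rss>
XML;

$feed = new SimpleXMLElement($xml);

// Absolute path: only <title> elements directly inside <item>,
// so the channel's own title is not included.
$postTitles = $feed->xpath('/rss/channel/item/title');

// Predicate: titles of items whose <category> is "PHP".
$phpTitles = $feed->xpath('/rss/channel/item[category="PHP"]/title');

// Positional predicate: XPath positions start at 1, not 0.
$firstTitle = $feed->xpath('/rss/channel/item[1]/title');

// "//" matches at any depth, so this also picks up the channel title.
$everyTitle = $feed->xpath('//title');

// Convert the matched elements to plain strings.
$phpTitleStrings = array_map('strval', $phpTitles);

echo implode(", ", $phpTitleStrings) . "\n";
```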