The following short PHP script extracts the URLs from a standard WordPress XML sitemap, or from any other sitemap that adheres to the sitemap schema.
<?php
# extract-urls.php
#
# Extract only the URLs from an XML sitemap.
# Sitemap schema: https://www.sitemaps.org/protocol.html
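# Require the path to the sitemap file as the first argument.
if (count($argv) < 2) {
    exit("Error: Invalid number of arguments. Specify an input XML file." . PHP_EOL);
}

$xml_filename = $argv[1];

if (file_exists($xml_filename))
{
    $xml = simplexml_load_file($xml_filename);

    # simplexml_load_file() returns false when the file cannot be parsed as XML.
    if ($xml === false)
    {
        exit('Failed to parse XML file.' . PHP_EOL);
    }

    # Only process documents whose root element is <urlset>.
    if ($xml->getName() == 'urlset')
    {
        $children = $xml->children();
        foreach ($children as $child)
        {
            # Each <url> entry holds the page address in its <loc> element.
            if ($child->getName() == 'url')
            {
                echo $child->loc . PHP_EOL;
            }
        }
    }
} else {
    exit('Failed to open XML file.' . PHP_EOL);
}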
You can then run it from the command line. The following command extracts the URLs from the example ‘XML-Sitemap.xml’ file and redirects the output to a text file.
c:\tools>php extract-urls.php XML-Sitemap.xml > urls.txt
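If you need the URLs inside another PHP script rather than in a text file, the same logic can be wrapped in a function that returns an array. The following is only a minimal sketch: the function name extract_sitemap_urls and the sample sitemap are made up for illustration and are not part of the script above. It assumes the sitemap is already available as a string and, like the script, that the root element is urlset.

```php
<?php
# Hypothetical helper (not part of extract-urls.php): returns the <loc>
# values of a sitemap that is passed in as an XML string.
function extract_sitemap_urls(string $xml_string): array
{
    # Collect parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($xml_string);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    # Malformed XML, or a root element other than <urlset>: nothing to extract.
    if ($xml === false || $xml->getName() !== 'urlset') {
        return [];
    }

    $urls = [];
    foreach ($xml->children() as $child) {
        if ($child->getName() === 'url') {
            # Cast the SimpleXMLElement to a plain string and trim whitespace.
            $urls[] = trim((string) $child->loc);
        }
    }
    return $urls;
}

# Made-up sample data, for illustration only.
$sample = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about/</loc></url>
</urlset>
XML;

foreach (extract_sitemap_urls($sample) as $url) {
    echo $url . PHP_EOL;
}
```

Returning an empty array for unparseable input or an unexpected root element keeps the calling code simple, at the cost of not distinguishing "no URLs" from "not a sitemap"; throw an exception instead if that distinction matters to you.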