Image Sitemap Generation with PHP

It's natural to want to squeeze every ounce of SEO juice out of our websites, and Image Sitemaps are yet another channel for feeding our content to search engines. You may have avoided them in the past, labeling them as too difficult to maintain, but a few simple tricks can put your site's Image Sitemap on autopilot. It's even easier if a content management system powers your site, since the page and image information is already stored in a database.

For this article, we're using PHP to generate the image sitemap. If you're relatively new to web development, you may be wondering how a PHP script can serve a file that ends in .xml. Well, let's get started!

 

Maintaining the .xml file extension

So, how do we have PHP handle the request while still keeping that .xml file extension? If you're using Apache as your web server, the answer is surprisingly simple.

On every website I develop, I like to set aside a dedicated "/xml" directory for all XML files generated by PHP. Within that directory, add an .htaccess file containing just one line:

AddHandler application/x-httpd-php .xml

This will force Apache to send all .xml files within that directory over to PHP, as if they were .php files.

As long as you set the Content-type to application/xml (as seen in the first line of code below), web browsers, Google and anyone else will be none the wiser.

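Depending on your Apache configuration, you may need to adjust your httpd.conf so that .htaccess files are processed. Look for the AllowOverride directive: for AddHandler to work in an .htaccess file, AllowOverride must include FileInfo (or be set to All) for that directory. The handler name also depends on how PHP is installed; application/x-httpd-php is the name used with mod_php, so hosts that run PHP through FastCGI or PHP-FPM may need a different setting.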
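Before building the real sitemap, a throwaway test file is a quick way to confirm the handler is active. This is a minimal sketch, assuming a hypothetical file named handler-check.xml placed in the /xml directory: if PHP runs it, the response is a tiny XML document; if the raw PHP source comes back instead, the .htaccess line isn't being honored.

```php
<?php
// Hypothetical test file: /xml/handler-check.xml (the name is an assumption).
// If the AddHandler line is active, Apache passes this file to PHP and the
// browser receives the XML below; if not, the browser shows this PHP source.

function handlerCheckXml(): string
{
    // The XML declaration is echoed from PHP, just like in the sitemap script.
    return '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
        . '<handlerCheck>ok</handlerCheck>' . "\n";
}

// header() must run before any output; it has no effect from the command line.
if (PHP_SAPI !== 'cli') {
    header('Content-Type: application/xml; charset=UTF-8');
}

echo handlerCheckXml();
```

Once the check passes, delete the file so it doesn't linger on the live site.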

 

Typical Issues

  • If short_open_tag is enabled in php.ini, PHP interprets the <? in <?xml as the start of a PHP block. Echo the opening <?xml declaration with PHP to ensure that this script works across all server configurations.
  • Images may not be available for a given product or category. Add conditions to your SQL query to skip rows whose image URL is empty or NULL.
  • If you don't have a web page URL for a given image, don't include it in the sitemap. A standalone image that isn't referenced on any particular page won't be accepted by Google, because every <url> element requires a <loc> child element.
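These safeguards can also be enforced in PHP, which is handy when the SQL filtering isn't enough. A sketch under stated assumptions: the function names and the page_url / image_url keys are hypothetical, and in a real script the rows would come from your database query rather than an inline array.

```php
<?php
// Hypothetical helpers; map 'page_url' / 'image_url' to your own columns.

// XML declaration plus the opening <urlset> tag. The declaration lives inside a
// PHP string rather than in template text, so short_open_tag can't misread it.
function sitemapOpening(): string
{
    return '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
        . '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"' . "\n"
        . '        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">' . "\n";
}

// Keep only rows with both a page URL (required for <loc>) and an image URL.
function keepIndexableRows(array $rows): array
{
    $kept = array_filter($rows, function (array $row): bool {
        return trim((string) ($row['page_url'] ?? '')) !== ''
            && trim((string) ($row['image_url'] ?? '')) !== '';
    });
    return array_values($kept); // re-number the surviving rows from 0
}

$rows = [
    ['page_url' => 'http://www.mysite.com/widgets', 'image_url' => 'http://www.mysite.com/img/widgets.jpg'],
    ['page_url' => '', 'image_url' => 'http://www.mysite.com/img/orphan.jpg'], // no page URL: dropped
    ['page_url' => 'http://www.mysite.com/gadgets', 'image_url' => null],      // no image: dropped
];

echo sitemapOpening();
print_r(keepIndexableRows($rows));
```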

 

The PHP Code

<?php header('Content-Type: application/xml; charset=UTF-8'); echo '<?xml version="1.0" encoding="UTF-8"?>'; ?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">

<?php

// get database connection (replace the credentials with your own)
$conn = new mysqli("localhost", "myUser", "myPassword", "myDatabaseName");
$conn->set_charset("utf8mb4"); // match the UTF-8 encoding declared above

// escape &, <, >, " and ' so database values can't break the XML
function xml_escape($value) {
    return htmlspecialchars((string) $value, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

// category pages
$query = "SELECT category_url, category_image_url, category_name
          FROM categories
          WHERE category_image_url IS NOT NULL AND category_image_url != ''";

$result = $conn->query($query);

while ($row = $result->fetch_assoc()) {
    echo sprintf("
    <url>
        <loc>%s</loc>
        <image:image>
            <image:loc>%s</image:loc>
            <image:title>%s</image:title>
        </image:image>
    </url>",
        xml_escape($row['category_url']),
        xml_escape($row['category_image_url']),
        xml_escape($row['category_name']));
}

?>
</urlset>
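Any & or < in a URL or title has to be escaped before it lands in the XML. Pulling the <url> formatting into a function makes that easy to test without a database. A minimal sketch, assuming htmlspecialchars() with the ENT_XML1 flag does the escaping; the function names and sample values are hypothetical.

```php
<?php
// Escape the five XML special characters: & < > " '
function xmlEscape(string $value): string
{
    return htmlspecialchars($value, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

// Build one <url> entry with a single image; every value is escaped.
function imageUrlEntry(string $pageUrl, string $imageUrl, string $title): string
{
    return sprintf(
        "  <url>\n    <loc>%s</loc>\n    <image:image>\n      <image:loc>%s</image:loc>\n      <image:title>%s</image:title>\n    </image:image>\n  </url>\n",
        xmlEscape($pageUrl),
        xmlEscape($imageUrl),
        xmlEscape($title)
    );
}

echo imageUrlEntry(
    'http://www.mysite.com/category.php?id=3&sort=name',
    'http://www.mysite.com/img/tools.jpg',
    'Nuts & Bolts'
);
```

Both the & in the query string and the one in the title come out as &amp;, which keeps the sitemap well-formed.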

 

Dealing with "unclean" data

Depending on your site administrator's technical knowledge, the data being plugged into the Image Sitemap may not be valid XML; a stray ampersand or an HTML entity is enough to break the file. We can use a combination of SQL and PHP functions to "scrub" the data a bit.

Whereas the example above outputs category images, this one focuses on individual product images:

$query = "SELECT item_url, image_url,
            Replace(Replace(Replace(title, '&bull;', ''), '&', '&amp;'), '\"', '&quot;') as title,
            Replace(Replace(Replace(description, '&bull;', ''), '&', '&amp;'), '\"', '&quot;') as description
          FROM warehouse_item
          WHERE image_url != '' AND item_url != ''
            AND active = 1";

$result = $conn->query($query);

// base URL for relative image paths (replace with your own domain)
$baseUrl = 'http://www.mysite.com';

while ($row = $result->fetch_assoc()) {
    // prefix relative paths; leave absolute http:// and https:// URLs alone
    $imageUrl = preg_match('#^https?://#i', $row['image_url'])
        ? $row['image_url']
        : $baseUrl . $row['image_url'];

    echo sprintf("
    <url>
        <loc>%s</loc>
        <image:image>
            <image:loc>%s</image:loc>
            <image:title>%s</image:title>
            %s
        </image:image>
    </url>",
      htmlspecialchars($row['item_url'], ENT_XML1 | ENT_QUOTES, 'UTF-8'), // escape & in query strings
      htmlspecialchars($imageUrl, ENT_XML1 | ENT_QUOTES, 'UTF-8'),
      $row['title'],
      ($row['description'] != '' ? '<image:caption>' . strip_tags($row['description']) . '</image:caption>' : '')
    );
}

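Not only does the above code clean up certain entities, removing &bull; and escaping ampersands and double quotes, but it also strips HTML tags from image captions. The &bull; has to go because XML predefines only five named entities (&amp;, &lt;, &gt;, &quot; and &apos;), so an HTML entity like &bull; is undefined and makes the sitemap invalid.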
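If you'd rather keep the query simple, the same scrubbing can be done in PHP. This is a swapped-in alternative to the SQL REPLACE() calls, not a drop-in copy of them: a sketch, assuming a hypothetical scrubForXml() helper that strips tags, decodes HTML entities into real UTF-8 characters, and then re-escapes the result for XML. Unlike the query, it keeps &bull; as an actual bullet character instead of deleting it.

```php
<?php
// Scrub free-form text (titles, captions) for use inside an XML element.
function scrubForXml(?string $text): string
{
    if ($text === null) {
        return '';
    }
    // 1. drop HTML markup such as <p> or <b>
    $plain = strip_tags($text);
    // 2. turn HTML entities (&bull;, &amp;, &copy;, ...) into real characters
    $plain = html_entity_decode($plain, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // 3. re-escape &, <, >, " and ' so the result is valid XML text
    return htmlspecialchars(trim($plain), ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

echo scrubForXml('<p>Red &amp; blue &bull; <b>"Deluxe"</b> set</p>'), "\n";
```

Decoding before re-escaping avoids double-escaping values that already contain &amp;.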

 

What not to do

You'll notice that I didn't provide any code that works directly with images on the file system. Such code might loop over every file in an image directory on your site and output the appropriate XML for each image.

While possible, that method doesn't let us associate a page with an image. The Image Sitemap specification requires the URL of the page the image is displayed on, and I think you'd be asking for trouble by simply using your site's home page as the URL for every image.

So, my recommendation is that if you don't have a way to associate the images with a URL, skip those images for now.
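If you do have some way to tie image files to pages, the "skip what you can't associate" rule is easy to apply in code. A sketch under loud assumptions: $imagePages is a hypothetical lookup (image file name to the URL of the page that displays it) that you would have to build yourself, and the function name is illustrative only.

```php
<?php
// Group image files by the page that displays them; skip unmapped images.
function associateImages(array $imageFiles, array $imagePages, string $imageBaseUrl): array
{
    $byPage = [];
    foreach ($imageFiles as $file) {
        if (!isset($imagePages[$file])) {
            continue; // no known page for this image, so leave it out for now
        }
        // One <url> entry may contain several <image:image> elements, so group by page.
        $byPage[$imagePages[$file]][] = rtrim($imageBaseUrl, '/') . '/' . $file;
    }
    return $byPage;
}

// In practice $files might come from scandir() or glob() on your image directory.
$files = ['widget.jpg', 'gadget.jpg', 'banner-old.jpg'];
$imagePages = [
    'widget.jpg' => 'http://www.mysite.com/products/widget',
    'gadget.jpg' => 'http://www.mysite.com/products/gadget',
];

print_r(associateImages($files, $imagePages, 'http://www.mysite.com/img/'));
```

Here banner-old.jpg has no page, so it simply doesn't appear in the output.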

 

Submit It!

The final step is to submit your Image Sitemap to Google Webmaster Tools (since renamed Google Search Console). Once you do, you'll notice that a new count appears specifically for images. Be sure to choose the "Test Sitemap" option first; there's no sense in submitting a sitemap that doesn't validate.
 

Check for errors, revise your code, repeat until Google is happy!
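Between rounds, a local well-formedness check can catch the most common breakage before Google sees it. This is a sketch, assuming PHP's DOM extension is available; it only detects XML syntax errors (a bare & or an undefined entity such as &bull;) and does not validate against the sitemap schema. The function name is hypothetical.

```php
<?php
// Return libxml's error messages for $xml; an empty array means well-formed.
function sitemapXmlErrors(string $xml): array
{
    $previous = libxml_use_internal_errors(true); // collect errors instead of printing warnings
    libxml_clear_errors();

    $doc = new DOMDocument();
    $doc->loadXML($xml);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting
    return $messages;
}

// The unescaped & in the query string makes this document malformed.
$bad = '<?xml version="1.0" encoding="UTF-8"?>'
    . '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    . '<url><loc>http://www.mysite.com/?a=1&b=2</loc></url></urlset>';

print_r(sitemapXmlErrors($bad));
```

To check the live file instead, you could pass in file_get_contents() of your sitemap's URL.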

Updated: 2013-05-07

Phil LaNasa