Generating dynamic sitemaps with CakePHP 1.2

2008 July 20
by Eddie

Sitemaps, although not critical, have been accepted as a standard way to let search engines and users find the content on your site.

You can generate those sitemaps on the fly in Cake, serving XML to search engines and formatted HTML to users.

Sitemaps that are generated dynamically are always up to date, which helps search engines pick up new content quickly.

How, you may ask? Read on and I shall tell you.

You need to decide what content goes in the sitemap. Most would agree that pages and posts are good choices. Others may want to add user profiles or records from other models.

In this example I care about two models: Info, which holds my static pages, and Post, which holds user posts.

Create the controller (/app/controllers/sitemaps_controller.php)

<?php
class SitemapsController extends AppController{
 
	var $name = 'Sitemaps';
	var $uses = array('Post', 'Info');
	var $helpers = array('Time');
	var $components = array('RequestHandler');
 
	function index() {
		// Turn off debug output so the SQL log does not break XML validation
		Configure::write('debug', 0);
		// Skip associated models; only these tables' own fields are needed
		$this->Post->recursive = -1;
		$this->Info->recursive = -1;
		$this->set('posts', $this->Post->find('all', array(
			'conditions' => array('is_published' => 1, 'is_public' => 1),
			// 'title' is only needed by the HTML view further down
			'fields' => array('date_modified', 'id', 'title')
		)));
		$this->set('pages', $this->Info->find('all', array(
			'conditions' => array('ispublished' => 1),
			'fields' => array('date_modified', 'id', 'url')
		)));
	}
}
?>

Now, rather than rendering our XML inside the standard HTML layout, we'll need a nice clean XML layout instead.

Create the xml layout (/app/views/layouts/xml/default.ctp)

<?php header('Content-type: text/xml'); ?>
<?php echo $content_for_layout; ?>

Now that we have a nice clean xml layout, we can populate it using a cool sitemap view.

Create the sitemap view (/app/views/sitemaps/xml/index.ctp)

<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
	<url>
		<loc><?php echo Router::url('/',true); ?></loc>
		<changefreq>daily</changefreq>
		<priority>1.0</priority>
	</url>
	<!-- static pages -->	
	<?php foreach ($pages as $page):?>
	<url>
		<loc><?php echo Router::url('/info/'.$page['Info']['url'],true); ?></loc>
		<lastmod><?php echo $time->toAtom($page['Info']['date_modified']); ?></lastmod>
		<priority>0.8</priority>
	</url>
	<?php endforeach; ?>
	<!-- posts-->	
	<?php foreach ($posts as $post):?>
	<url>
		<loc><?php echo Router::url(array('controller'=>'posts','action'=>'view','id'=>$post['Post']['id']),true); ?></loc>
		<lastmod><?php echo $time->toAtom($post['Post']['date_modified']); ?></lastmod>
		<priority>0.8</priority>
	</url>
	<?php endforeach; ?>
</urlset>

You’ll notice the use of the Router class to give us the proper, fully qualified URL including the domain. You can also see my two model names, ‘Info’ and ‘Post’, which were set in the controller.
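
One detail the view above leaves out is escaping: the sitemap protocol expects URLs to be entity-escaped, so an & in a query string has to become &amp;. Below is a minimal plain-PHP sketch of a helper that builds one url entry with escaping and a W3C-style date. The function name sitemap_url_entry and its parameters are made up for illustration and are not part of CakePHP, and the UTC timezone is an assumption.

```php
<?php
// Hypothetical helper, not part of CakePHP: builds one <url> entry as a string.
function sitemap_url_entry($loc, $lastmod = null, $priority = null) {
    // Escape &, <, > and quotes so the URL is valid inside XML.
    $xml = "\t<url>\n\t\t<loc>" . htmlspecialchars($loc, ENT_QUOTES, 'UTF-8') . "</loc>\n";
    if ($lastmod !== null) {
        // DATE_ATOM is the W3C datetime format that sitemaps expect.
        $xml .= "\t\t<lastmod>" . date(DATE_ATOM, strtotime($lastmod)) . "</lastmod>\n";
    }
    if ($priority !== null) {
        $xml .= "\t\t<priority>" . number_format($priority, 1) . "</priority>\n";
    }
    return $xml . "\t</url>\n";
}

// Assumption: timestamps in the database are stored in UTC.
date_default_timezone_set('UTC');
echo sitemap_url_entry('http://example.org/posts/view/5?a=1&b=2', '2008-07-20 12:30:00', 0.8);
```

In a Cake view the same effect could be had by wrapping the Router::url() call in htmlspecialchars() before echoing it.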

Almost done!

We need to let Cake parse extensions like xml and use them as part of the view directory structure (which is why both views above live in xml folders). This maps URLs like /sitemaps/index.xml to /app/views/sitemaps/xml/index.ctp and picks the matching layout based on the extension as well. Pretty cool, huh?
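
To make that convention concrete, here is a toy plain-PHP sketch of the mapping. It only mimics the naming rule described above and is not how Cake implements it internally; the function name resolve_view_paths is invented for illustration.

```php
<?php
// Illustration only: mimics the naming convention, not Cake's real code.
function resolve_view_paths($controller, $action, $ext = null) {
    // With a parsed extension, an extra sub-folder named after it is used.
    $sub = ($ext !== null) ? $ext . '/' : '';
    return array(
        'view'   => "/app/views/$controller/{$sub}$action.ctp",
        'layout' => "/app/views/layouts/{$sub}default.ctp",
    );
}

// /sitemaps/index.xml
print_r(resolve_view_paths('sitemaps', 'index', 'xml'));
// /sitemaps/index (no extension)
print_r(resolve_view_paths('sitemaps', 'index'));
```

The first call points at the xml sub-folders for both view and layout; the second falls back to the regular ones.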

(You’ll notice I also parse the rss extension for my news feed, but that’s another post.)

In /app/config/routes.php add:

Router::parseExtensions('rss','xml');

You’re done. Now, if you want to class it up, add a nicer route than /sitemaps/index.xml.

Again, in /app/config/routes.php add:

Router::connect('/sitemap', array('controller' => 'sitemaps', 'action' => 'index'));


Now http://example.org/sitemap.xml will dynamically generate an up-to-date sitemap on every request.
Go ahead and submit it to Google.

All done, enjoy.

Wait, didn’t he promise a sitemap users could see too?

You’re right, I did.

Create a view at /app/views/sitemaps/index.ctp (notice there is no xml folder here)

<h3>Site Pages</h3>
 
	<?php
		$i = 0;
		foreach ($posts as $post):
			$class = null;
			if ($i++ % 2 == 0) {
				$class = ' class="altrow"';
			}
	?>
			<div<?php echo $class;?>>
				<h4>
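					<?php 
						// find('all') nests each row under its model name; 'title' must be among the fields fetched by the controller
						echo $html->link($post['Post']['title'], '/posts/view/'.$post['Post']['id'], array('title' => 'Read more about '.$post['Post']['title']));
					?>
				</h4>
				<?php echo $time->nice($post['Post']['date_modified']);?>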
				<hr width="40%"/>
			</div>
		<?php endforeach; ?>

Note: the regular (HTML) index can be helpful for debugging as well.

Now hitting the URL http://example.org/sitemap (no .xml) will load whatever user-friendly markup you put into the file above. I only included the posts to demonstrate the idea. The actual layout is up to you :)

Summary

My goal was to provide an example that takes advantage of Cake’s Router class and eliminates the need to hard-code any URLs.
Perks:

  • Works for sites served on multiple domains. For example, if your site is hosted on both example.com and example.org, each sitemap will have the proper URLs even though both run the same code.
  • Can be reused across applications.
  • If you serve multiple applications, the code can be part of the core shared by all those apps.
  • Never needs to be updated!
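
The multi-domain perk comes from building absolute URLs from the host of the current request rather than from a hard-coded domain. As a rough plain-PHP illustration of that idea (this is not Cake's actual Router code, and the function name absolute_url is invented):

```php
<?php
// Illustration only: prefix a site-relative path with the requesting host.
// $server stands in for PHP's $_SERVER array.
function absolute_url($path, $server) {
    $https  = !empty($server['HTTPS']) && $server['HTTPS'] !== 'off';
    $scheme = $https ? 'https' : 'http';
    return $scheme . '://' . $server['HTTP_HOST'] . '/' . ltrim($path, '/');
}

// The same code yields different URLs depending on which domain was requested.
echo absolute_url('/posts/view/5', array('HTTP_HOST' => 'example.com')), "\n";
echo absolute_url('/posts/view/5', array('HTTP_HOST' => 'example.org')), "\n";
```

In a real request you would pass $_SERVER itself, which is roughly the information Router::url() draws on when its second argument is true.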

10 Responses
  1. 2008 July 20

    Does CakePHP provide a layer for database caching? I would think that having a dynamic sitemap for a site with thousands of articles could be rather inefficient without some caching mechanism or running this every so often via a cronjob, etc. Could open up a site to be brought down rather quickly with bots intentionally hitting it over and over.

  2. 2008 July 20
    admin

    Great question, Paris. Yes, there is a wonderful cache layer, but still, I thought thousands might be a chore.

    I dumped 1,638 articles into my local development machine, and the same number onto my live servers. Each article has a 6 KB blob of HTML, text, etc. Even though we don’t care about the body, I thought it best to be thorough.

    After the initial results I trimmed the queries a bit (and updated the post). Here are the final results.

    Locally;
    2 queries took 13 ms; Page 2.1 s

    Live;

    2 queries took 45 ms; Page 1.5 s

    For me, loading a static XML file containing the same number of URLs takes roughly the same time (1.3 seconds).

  3. 2008 August 4

    You could also have a robots.txt file to let the search engine spiders know about your sitemap automatically:

    robots.txt:

    User-agent: *
    Sitemap: http://www.yoursite.com/sitemap.xml

    That can be generated through CakePHP and routing too.

  4. 2008 August 4
    admin

    @Matti
    Thanks, that’s a great piece of advice for our readers.

  5. 2009 March 14

    What if one wants to generate XML in the form Google requires for video sitemaps? I tried it by adding a video action to the sitemaps_controller, but I am unable to produce the exact XML format.

    But I need to make video sitemaps, and the format is different from the one in the link you sent. The XML is in the format I sent you.
    i am unable to generate the type of

    and so on…
    like..

    2005-06-18

    Google Local and Google Maps
    How to use Google Local and Google Maps to
    find local information.

    Google.com
    Joe Smith
    Ads & Promotional
    News

    Can u help me out of this..
    how to make that kind of format..

  6. 2009 March 15
    Eddie

    @Aman:

    Just update the controller action to pull back the fields you need (director, genre, etc.) and then update index.ctp to use the format specified by Google. This is a pretty straightforward task, I feel.

  7. 2010 July 21

    I need to make a dynamic site map for a client who has a bunch of products and adds pages daily to his site. I followed the instructions above to a T but nothing happened when I go to domain.com/sitemap or domain.com/sitemap.xml. There are no files called that so I am a bit confused as to how to get this to work.

    Do I need to turn something on/off on the server itself? Does this require a certain mimetype?

    Eric

  8. 2010 July 21

    On a side note, can you post a zip of the entire thing? This will help me deduce where the problem is.

    eric
