Generating dynamic sitemaps with CakePHP 1.2
Sitemaps, although not critical, have been accepted as a standard way to let search engines and users find the content on your site.
You can generate those sitemaps on the fly in Cake, serving XML to search engines and formatted text to users.
Sitemaps that are generated dynamically are always up to date, which helps search engines pick up new content quickly.
How, you may ask? Read on and I shall tell you.
You need to decide what content goes in the sitemap. Most would agree that things like pages and posts are good choices. Others may want to add user profiles or various other model records.
In this example I care about two models: Info, which holds my static pages, and Post, which holds user posts.
Create the controller (/app/controllers/sitemaps_controller.php)
<?php
class SitemapsController extends AppController {
    var $name = 'Sitemaps';
    var $uses = array('Post', 'Info');
    var $helpers = array('Time');
    var $components = array('RequestHandler');

    function index() {
        // Prevent XML validation errors caused by the SQL log
        Configure::write('debug', 0);
        $this->Post->recursive = -1;
        $this->Info->recursive = -1;
        $this->set('posts', $this->Post->find('all', array(
            'conditions' => array('is_published' => 1, 'is_public' => '1'),
            'fields' => array('date_modified', 'id')
        )));
        $this->set('pages', $this->Info->find('all', array(
            'conditions' => array('ispublished' => 1),
            'fields' => array('date_modified', 'id', 'url')
        )));
    }
}
?>
Now, rather than building our XML in the standard layout, we’ll need a nice clean XML layout instead.
Create the xml layout (/app/views/layouts/xml/default.ctp)
<?php header('Content-type: text/xml'); ?>
<?php echo $content_for_layout; ?>
Now that we have a nice clean xml layout, we can populate it using a cool sitemap view.
Create the sitemap view (/app/views/sitemaps/xml/index.ctp)
<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"
        xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
        <loc><?php echo Router::url('/', true); ?></loc>
        <changefreq>daily</changefreq>
        <priority>1.0</priority>
    </url>
    <!-- static pages -->
    <?php foreach ($pages as $post): ?>
    <url>
        <loc><?php echo Router::url('/info/' . $post['Info']['url'], true); ?></loc>
        <lastmod><?php echo $time->toAtom($post['Info']['date_modified']); ?></lastmod>
        <priority>0.8</priority>
    </url>
    <?php endforeach; ?>
    <!-- posts -->
    <?php foreach ($posts as $post): ?>
    <url>
        <loc><?php echo Router::url(array('controller' => 'posts', 'action' => 'view', 'id' => $post['Post']['id']), true); ?></loc>
        <lastmod><?php echo $time->toAtom($post['Post']['date_modified']); ?></lastmod>
        <priority>0.8</priority>
    </url>
    <?php endforeach; ?>
</urlset>
You’ll notice the use of the Router class to give us the proper, fully qualified URL, domain included. You can also see my two model names, ‘Info’ and ‘Post’, which were set in the controller.
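To see why building URLs from the request matters, here is a minimal plain-PHP sketch of the idea. It is not Cake’s Router code; absoluteUrl() and its $server argument are names I made up for illustration. The only point is that the host comes from the current request rather than from a hard-coded string.

```php
<?php
// Hypothetical helper, NOT part of CakePHP: builds an absolute URL from the
// host of the current request, so the same code works on any domain.
function absoluteUrl($path, $server)
{
    // Treat the request as HTTPS only when the server says so explicitly.
    $https = !empty($server['HTTPS']) && $server['HTTPS'] !== 'off';
    $scheme = $https ? 'https' : 'http';
    return $scheme . '://' . $server['HTTP_HOST'] . '/' . ltrim($path, '/');
}

// The same call yields a different domain depending on the request.
echo absoluteUrl('/info/about', array('HTTP_HOST' => 'example.com')), "\n";
echo absoluteUrl('/info/about', array('HTTP_HOST' => 'example.org', 'HTTPS' => 'on')), "\n";
```

In a real request you would pass $_SERVER as the second argument. Keep in mind that the Host header is supplied by the client, so a production site may want to check it against a list of known domains.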
Almost done!
We need to let Cake parse extensions like xml and use them as part of our directory structure (hence both views above live in xml folders). This maps URLs like /sitemaps/index.xml to /views/sitemaps/xml/index.ctp and picks the appropriate layout based on the extension as well. Pretty cool, huh?
(You’ll notice I also parse the rss extension for my news feed, but that’s another post.)
In /app/config/routes.php add:
Router::parseExtensions('rss','xml');
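If the mapping feels like magic, the convention can be sketched in a few lines of plain PHP. This is only my illustration of the naming rule described above, not how Cake implements it internally; viewPathFor() is a hypothetical function.

```php
<?php
// Hypothetical helper, NOT part of CakePHP: maps a request path to the view
// file that would be rendered under the extension convention described above.
function viewPathFor($requestPath, $parsedExtensions)
{
    $ext = pathinfo($requestPath, PATHINFO_EXTENSION);
    $withoutExt = $requestPath;
    if ($ext !== '' && in_array($ext, $parsedExtensions, true)) {
        // Strip ".xml" (or ".rss") so only controller/action remain.
        $withoutExt = substr($requestPath, 0, -(strlen($ext) + 1));
    } else {
        // Extensions that are not parsed get no special view folder.
        $ext = '';
    }
    $parts = explode('/', trim($withoutExt, '/'));
    $controller = $parts[0];
    $action = isset($parts[1]) ? $parts[1] : 'index';
    $folder = ($ext === '') ? $controller : $controller . '/' . $ext;
    return '/app/views/' . $folder . '/' . $action . '.ctp';
}

// Prints /app/views/sitemaps/xml/index.ctp
echo viewPathFor('/sitemaps/index.xml', array('rss', 'xml')), "\n";
// Prints /app/views/sitemaps/index.ctp
echo viewPathFor('/sitemaps/index', array('rss', 'xml')), "\n";
```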
You’re done. Now, if you want to class it up, add a better route than /sitemaps/index.xml.
Again, in /app/config/routes.php add:
Router::connect('/sitemap', array('controller' => 'sitemaps', 'action' => 'index'));
Now http://example.org/sitemap.xml will dynamically create the most up-to-date sitemap possible! Go ahead and submit it to Google.
All done, enjoy.
Wait, didn’t he promise a sitemap users could see too?
You’re right, I did.
Create a view /app/views/sitemaps/index.ctp (notice no xml folder here)
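<h3>Site Pages</h3>
<?php
// Note: 'title' must be added to the 'fields' list of the Post query in the
// controller, which only fetches 'date_modified' and 'id' as written.
// The controller loads only the Time helper, so it formats the date here.
$i = 0;
foreach ($posts as $post):
    $class = null;
    if ($i++ % 2 == 0) {
        $class = ' class="altrow"';
    }
?>
<div<?php echo $class; ?>>
    <h4>
        <?php echo $html->link($post['Post']['title'], '/posts/view/' . $post['Post']['id'], array('title' => 'Read more about ' . $post['Post']['title'])); ?>
    </h4>
    <?php echo $time->nice($post['Post']['date_modified']); ?>
    <hr width="40%"/>
</div>
<?php endforeach; ?>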
Note: the regular index can be helpful for debugging as well.
Now hitting the URL http://example.org/sitemap (no .xml) will load whatever user-friendly code you put into the file above. I only included the posts to demonstrate the idea; the actual layout is up to you.
Summary
My goal was to provide an example that takes advantage of Cake’s Router class and eliminates the need to statically code any URLs.
Perks:
- Works for sites served from multiple domains. For example, if your site is hosted on example.com and example.org, both sitemaps will have the proper URLs even though they are physically the same code.
- Can be reused across applications
- Never needs to be updated!
If you serve multiple applications, the code can be used as part of the core shared by all those apps.
Does CakePHP provide a layer for database caching? I would think that having a dynamic sitemap for a site with thousands of articles could be rather inefficient without some caching mechanism or running this every so often via a cronjob, etc. Could open up a site to be brought down rather quickly with bots intentionally hitting it over and over.
Great question, Paris. Yes, there is a wonderful cache layer, but still I thought thousands might be a chore.
I dumped 1,638 articles into my local development machine, and the same number on my live servers. Each article has a 6 KB blob of HTML, text, etc. Even though we don’t care about the body, I thought it best to be thorough.
After the initial results I trimmed up the queries a bit (and updated the post). Here are the final results.
Locally:
2 queries took 13 ms; Page 2.1 s
Live:
2 queries took 45 ms; Page 1.5 s
For me to load an XML file containing the same number of URLs takes roughly the same time (1.3 seconds).
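If the numbers ever do become a problem, the general idea of caching the query results looks something like the sketch below. It deliberately uses plain PHP and a temp file rather than Cake’s own cache layer, so cachedResult() and fetchSitemapRows() are hypothetical names and the one-hour lifetime is just an assumption.

```php
<?php
// Hypothetical plain-PHP helper (NOT CakePHP's cache layer): returns the
// cached value for $key if it is younger than $ttl seconds; otherwise calls
// $producer, stores its result on disk, and returns it.
function cachedResult($key, $ttl, $producer, $dir = null)
{
    $dir = ($dir === null) ? sys_get_temp_dir() : $dir;
    $file = $dir . '/sitemap_cache_' . md5($key);
    clearstatcache(); // make sure filemtime() is not served from PHP's stat cache
    if (is_file($file) && (time() - filemtime($file)) < $ttl) {
        return unserialize(file_get_contents($file));
    }
    $value = call_user_func($producer);
    file_put_contents($file, serialize($value), LOCK_EX);
    return $value;
}

// Stand-in for the real database query; the row shape is an assumption.
function fetchSitemapRows()
{
    return array(array('id' => 1, 'date_modified' => '2009-01-01 00:00:00'));
}

// The query runs at most once an hour, no matter how often bots ask.
$rows = cachedResult('sitemap_rows', 3600, 'fetchSitemapRows');
```

The trade-off is that a new post can take up to the cache lifetime to show up in the sitemap, so pick a lifetime that matches how often your content changes.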
You could also have a robots.txt file to let the search engine spiders know about your sitemap automatically:
robots.txt:
User-agent: *
Sitemap: http://www.yoursite.com/sitemap.xml
That can be generated through CakePHP and routing too.
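As a rough sketch of what generating it could look like, here is a plain-PHP function that builds the same two lines for whichever host served the request. robotsTxt() is a hypothetical name, and wiring it into a Cake controller and route is left out.

```php
<?php
// Hypothetical helper: builds the robots.txt body shown above for the given
// host, so each domain advertises its own sitemap URL.
function robotsTxt($host, $sitemapPath = '/sitemap.xml')
{
    $lines = array(
        'User-agent: *',
        'Sitemap: http://' . $host . $sitemapPath,
    );
    return implode("\n", $lines) . "\n";
}

// In a real request the host would come from $_SERVER['HTTP_HOST'].
echo robotsTxt('www.yoursite.com');
```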
@Matti
Thanks, that’s a great piece of advice for our readers.
What if one wants to make an XML file in the form Google requires for video sitemaps? I tried it by making a video action in the sitemaps_controller, but I am unable to get the exact form of the XML because of its format.
But I need to make video sitemaps, and the format is different from the one at the link you sent. The XML is in the format I sent you.
i am unable to generate the type of
and so on…
like..
2005-06-18
Google Local and Google Maps
How to use Google Local and Google Maps to
find local information.
Google.com
Joe Smith
Ads & Promotional
News
Can you help me out with this?
How do I make that kind of format?
@Aman:
Just update the controller action to pull back the fields you need (director, genre, etc.) and then update index.ctp to use the format specified by Google. This is a pretty straightforward task, I feel.
I need to make a dynamic site map for a client who has a bunch of products and adds pages daily to his site. I followed the instructions above to a T but nothing happened when I go to domain.com/sitemap or domain.com/sitemap.xml. There are no files called that so I am a bit confused as to how to get this to work.
Do I need to turn something on/off on the server itself? Does this require a certain mimetype?
Eric
On a side note, can you post a zip of the entire thing? This will help me deduce where the problem is.
eric
Great thanks, Eddie!
@Eric
Sorry for the delay. This may be my worst lapse ever.
The code is invoked due to the CakePHP routes in config that associate xml extensions and the specific path to the right layouts, views and controller.
/app/views/layouts/xml/default.ctp for instance is what sets the header and calls the content. Cake will invoke that template based on the controller.
At the time of this writing, the two lines in /app/config/routes.php were all that was needed to make that magic happen. But I know Cake underwent a pretty large refactoring since then that may have changed the way extensions and routes are handled.
The CakePHP Manual would be the best place to turn.
As for zipping the entire project: I built this while employed by a company, which is therefore the legal owner of the application and its pieces. What I provide here is only the critical ideas needed to reconstruct a solution specific to your site.
hi,
Can anybody tell me about the sitemap database table structure and fields?
@karthick
There is no specific table for the sitemap. Instead, you use the existing tables for your pages, posts, or whatever else you want included. Start with your content; add the sitemap after.
Hey! I’m doing what you said here and having problems in this line:
$this->set(‘trains’, $this->AboutTrain->find(‘all’, ‘fields’ => array(‘date_modified’,'id’)));
(I have a list of trains, in a table called about_trains.)
It keeps telling me: Parse error in that line. What am I missing?
Thanks!
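The parse error does come from that line: the second argument to find() must be an array, so the 'fields' key has to be wrapped in array(), as in find('all', array('fields' => array('date_modified', 'id'))). If the error persists, check the line before it for a missing semicolon, comment out this line to confirm it is actually the culprit, and then try breaking it into two statements (query the database in one, then set trains in the second).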