Log Recycler Script
So when I wrote the article that introduced a script to generate mysql backup files for multiple databases I mentioned the trouble that will occur if you don’t get a handle on some means to retire old files.
This applies to log files, mysql backups, or just about any other type of file that is created on a recurring basis. You don’t need a error log from 134 days ago, but error logs for the past week could be very useful. So what do you do? Why recycle of course.
This article shares a simple shell script to purge any files older than X days, where X is of course a number allowing for flexibility. It is very simple to use a shell script to delete log files, or in this example sql backups.
The problem
Your server is being overrun with numerous files that just hang around long after they have served there useful life. These files may be small or large, but something about leaving unused files hanging around doesn’t feel right.
After just a few days of mysql backups I end up with a directory structure like this;
sql_dumps/
|-- edwardawebb.com
| |-- edwardawebb_wordpress_01-13-2009.sql.gz
| |-- edwardawebb_wordpress_01-14-2009.sql.gz
| |-- edwardawebb_wordpress_01-15-2009.sql.gz
| |-- edwardawebb_wordpress_01-16-2009.sql.gz
| |-- edwardawebb_wordpress_01-17-2009.sql.gz
| |-- edwardawebb_wordpress_01-18-2009.sql.gz
| `-- edwardawebb_wordpress_01-19-2009.sql.gz
|-- mantis.mainsite.org
| |-- mainsite_mantis_01-13-2009.sql.gz
| |-- mainsite_mantis_01-14-2009.sql.gz
| |-- mainsite_mantis_01-15-2009.sql.gz
| |-- mainsite_mantis_01-16-2009.sql.gz
| |-- mainsite_mantis_01-17-2009.sql.gz
| |-- mainsite_mantis_01-18-2009.sql.gz
| `-- mainsite_mantis_01-19-2009.sql.gz
`-- taskfreak.mainsite.org
|-- mainsite_taskfreak_01-11-2009
|-- mainsite_taskfreak_01-11-2009.sql.gz
|-- mainsite_taskfreak_01-12-2009.sql.gz
|-- mainsite_taskfreak_01-13-2009.sql.gz
|-- mainsite_taskfreak_01-14-2009.sql.gz
|-- mainsite_taskfreak_01-15-2009.sql.gz
|-- mainsite_taskfreak_01-16-2009.sql.gz
|-- mainsite_taskfreak_01-17-2009.sql.gz
|-- mainsite_taskfreak_01-18-2009.sql.gz
`-- mainsite_taskfreak_01-19-2009.sql.gz
Although 24 files may seem manageable, those who deal with log files and multiple sites know that this can quickly get out of hand.
The solution
We lazily create a shell script to run at weekly intervals to purge all those older files and send them off to the bit bucket.
Only files older than X days should be deleted, we’ll leave all the fresh and potentially needed logs/backups in place
This example assumes mysql logs with the .sql or .sql.gz extensions.
shell script to purge outdated files
#!/bin/bash #if you use this script you must attribute to me Eddie - Edwardawebb.com 1/14/09 #this script will run through all nested directories of a parent just killing off all matching files. ###### ### Set these values ###### ## default days to retain (override with .RETAIN_RULE in specific directory DEFRETAIN=60 #want to append the activity to a log? good idea, add its location here LOGFILE=`pwd`/Recycler.log # enter the distinguishing extension, or portion of the filename here (eg. log, txt, etc.) EXTENSION=sql #the absolute path of folder to begin purging #this is the top most file to begin the attack, all sub directories contain lowercase letters and periods are game. SQLDIR=$HOME/sql_dumps ##### ## End user configuartion ##### #this note will remind you that you have a log in case your getting emails form a cron job or something echo see $LOGFILE for details #jump to working directory cd $SQLDIR #if your sub-dirs have some crazy characters you may adjust this regex DIRS=`ls | grep ^[a-z.]*$` TODAY=`date` printf "\n\n********************************************\n\tSQL Recycler Log for:\n\t" | tee -a $LOGFILE echo $TODAY | tee -a $LOGFILE printf "********************************************\n" $TODAY | tee -a $LOGFILE for DIR in $DIRS do pushd $DIR >/dev/null HERE=`pwd` printf "\n\n%s\n" $HERE | tee -a $LOGFILE if [ -f .RETAIN_RULE ] then printf "\tdefault Retain period being overridden\n" | tee -a $LOGFILE read RETAIN < .RETAIN_RULE else RETAIN=$DEFRETAIN fi printf "\tpurging files older than %s days\n" ${RETAIN} | tee -a $LOGFILE OLDFILES=`find -mtime +${RETAIN} -regex .*${EXTENSION}.*` set -- $OLDFILES if [ -z $1 ] then printf "\tNo files matching purge criteria\n" | tee -a $LOGFILE else printf "\tSQL Files being Delete from $HERE\n" | tee -a $LOGFILE printf "\t\t%s\n" $OLDFILES | tee -a $LOGFILE fi rm -f $OLDFILES if [ $? -ne 0 ] then echo "Error while deleting last set" | tee -a $LOGFILE exit 2 else printf "\tSuccess\n" | tee -a $LOGFILE fi popd >/dev/null done
did you notice the bit about .RETAIN_RULE? good!
I added this after I realized that I don’t treat all my sites equally. For this very blog which is backed up daily I only need 3-4 days back max. But for other sites that I back up monthly I need to keep the default 60 days or 1-2 files.
So I set the default in the script to 60. But I allow it to be overwritten by adding a simple text file to any directory. If a file .RETAIN_RULE is present it will read the first line (and first line only!) for a new value, example;
$HOME/sql_dumps/dailysite.com/.RETAIN_RULE
5 #only keep files in this single directory around for 5 days
notice i comment after the actual data!
This means my actual directory structure including retain rules looks more like;
#tree -a sql_dumps
sql_dumps/
|-- edwardawebb.com
| |-- .RETAIN_RULE
| |-- edwardawebb_wordpress_01-13-2009.sql.gz
| |-- edwardawebb_wordpress_01-14-2009.sql.gz
| |-- edwardawebb_wordpress_01-15-2009.sql.gz
| |-- edwardawebb_wordpress_01-16-2009.sql.gz
| |-- edwardawebb_wordpress_01-17-2009.sql.gz
| |-- edwardawebb_wordpress_01-18-2009.sql.gz
| `-- edwardawebb_wordpress_01-19-2009.sql.gz
|-- mantis.mainsite.org
| |-- .RETAIN_RULE
| |-- mainsite_mantis_01-13-2009.sql.gz
| |-- mainsite_mantis_01-14-2009.sql.gz
| |-- mainsite_mantis_01-15-2009.sql.gz
| |-- mainsite_mantis_01-16-2009.sql.gz
| |-- mainsite_mantis_01-17-2009.sql.gz
| |-- mainsite_mantis_01-18-2009.sql.gz
| `-- mainsite_mantis_01-19-2009.sql.gz
`-- taskfreak.mainsite.org
|-- mainsite_taskfreak_01-11-2009
|-- mainsite_taskfreak_01-11-2009.sql.gz
|-- mainsite_taskfreak_01-12-2009.sql.gz
|-- mainsite_taskfreak_01-13-2009.sql.gz
|-- mainsite_taskfreak_01-14-2009.sql.gz
|-- mainsite_taskfreak_01-15-2009.sql.gz
|-- mainsite_taskfreak_01-16-2009.sql.gz
|-- mainsite_taskfreak_01-17-2009.sql.gz
|-- mainsite_taskfreak_01-18-2009.sql.gz
`-- mainsite_taskfreak_01-19-2009.sql.gz
The Result
So as the script walks through the structure above it prints a log to the effect of;
see /home/<USERNAME>/sql_dumps/Recycler.log for details ******************************************** SQL Recycler Log for: Sun Feb 8 00:00:07 PST 2009 ******************************************** /home/MYUSERNAME/sql_dumps/edwardawebb.com default Retain period being overridden purging files older than 4 days SQL Files being Delete from /home/masterkeedu/sql_dumps/edwardawebb.com ./edwardawebb_wordpress_01-28-2009.sql.gz ./edwardawebb_wordpress_02-03-2009.sql.gz ./edwardawebb_wordpress_01-29-2009.sql.gz ./edwardawebb_wordpress_02-02-2009.sql.gz ./edwardawebb_wordpress_01-31-2009.sql.gz ./edwardawebb_wordpress_01-30-2009.sql.gz ./edwardawebb_wordpress_02-01-2009.sql.gz Success /home/MYUSERNAME/sql_dumps/mantis.mainsite.org default Retain period being overridden purging files older than 4 days SQL Files being Delete from /home/masterkeedu/sql_dumps/mantis.mainsite.org ./webbmaster_mantis_01-30-2009.sql.gz ./webbmaster_mantis_01-31-2009.sql.gz ./webbmaster_mantis_02-01-2009.sql.gz ./webbmaster_mantis_01-27-2009.sql.gz ./webbmaster_mantis_01-29-2009.sql.gz ./webbmaster_mantis_02-02-2009.sql.gz ./webbmaster_mantis_01-28-2009.sql.gz Success /home/MYUSERNAME/sql_dumps/taskfreak.mainsite.org purging files older than 60 days No files matching purge criteria Success
As with any article I welcome feedback or questions!

Yes, hmm I concur. Very well said.
@CP
Hey friend, I’m sure you got great use from this post…can’t wait til CP.com is back up and running.
I have halted all work on CP.com to focus on my newly acquired company/website. Maybe you have heard of it; Google. Its a nice little start-up.
Wow! Your backup script has worked perfectly, so much so I had to come back to see if there was any suggestions for purging the backlog I don’t need, and here it is, also working perfectly with a minimum of config
Thanks!
@Tom
Well I wouldn’t want to leave you hanging with a bunch of useless back logs. Glad they both worked so well for you. Thanks for the feedback!
I’m back! Just to say $HOME/purge.sh fails in cron, fixed with fully qualified path, 30 GB lighter now
@Tom
The user variable can be tricky, and will not work if your using a web host or admin user to call scripts of another user.
But for crontabs edited within a user, the variable should work.
But resorting to fully qualified paths will *always* work.
Thanks for sharing your feedback!