Log Recycler Script

2009 January 17
by Eddie

So when I wrote the article that introduced a script to generate MySQL backup files for multiple databases, I mentioned the trouble you will run into if you don’t find some way to retire old files.

This applies to log files, MySQL backups, or just about any other type of file that is created on a recurring basis. You don’t need an error log from 134 days ago, but error logs for the past week could be very useful. So what do you do? Why, recycle of course.

This article shares a simple shell script to purge any files older than X days, where X is a number you choose. The example here deletes SQL backups, but the same script works just as well for log files.

The problem

Your server is being overrun with files that hang around long after they have served their useful life. These files may be small or large, but something about leaving unused files hanging around doesn’t feel right.

After just a few days of MySQL backups I end up with a directory structure like this:

sql_dumps/
|-- edwardawebb.com
|   |-- edwardawebb_wordpress_01-13-2009.sql.gz
|   |-- edwardawebb_wordpress_01-14-2009.sql.gz
|   |-- edwardawebb_wordpress_01-15-2009.sql.gz
|   |-- edwardawebb_wordpress_01-16-2009.sql.gz
|   |-- edwardawebb_wordpress_01-17-2009.sql.gz
|   |-- edwardawebb_wordpress_01-18-2009.sql.gz
|   `-- edwardawebb_wordpress_01-19-2009.sql.gz
|-- mantis.mainsite.org
|   |-- mainsite_mantis_01-13-2009.sql.gz
|   |-- mainsite_mantis_01-14-2009.sql.gz
|   |-- mainsite_mantis_01-15-2009.sql.gz
|   |-- mainsite_mantis_01-16-2009.sql.gz
|   |-- mainsite_mantis_01-17-2009.sql.gz
|   |-- mainsite_mantis_01-18-2009.sql.gz
|   `-- mainsite_mantis_01-19-2009.sql.gz
`-- taskfreak.mainsite.org
    |-- mainsite_taskfreak_01-11-2009
    |-- mainsite_taskfreak_01-11-2009.sql.gz
    |-- mainsite_taskfreak_01-12-2009.sql.gz
    |-- mainsite_taskfreak_01-13-2009.sql.gz
    |-- mainsite_taskfreak_01-14-2009.sql.gz
    |-- mainsite_taskfreak_01-15-2009.sql.gz
    |-- mainsite_taskfreak_01-16-2009.sql.gz
    |-- mainsite_taskfreak_01-17-2009.sql.gz
    |-- mainsite_taskfreak_01-18-2009.sql.gz
    `-- mainsite_taskfreak_01-19-2009.sql.gz

Although 24 files may seem manageable, those who deal with log files and multiple sites know that this can quickly get out of hand.

The solution

We lazily create a shell script to run at weekly intervals to purge all those older files and send them off to the bit bucket.
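For the weekly schedule, here is a rough sketch of how a cron entry could be put together. The script location `$HOME/bin/recycler.sh` is purely my assumption for illustration, so swap in wherever you actually saved it. Because the shell expands `$HOME` while building the line, the crontab ends up holding a fully qualified path.

```shell
#!/bin/bash
# Hypothetical install location for the script; adjust to wherever you saved it.
SCRIPT="$HOME/bin/recycler.sh"

# crontab fields: minute hour day-of-month month day-of-week command
# "0 0 * * 0" means 00:00 every Sunday.
ENTRY="0 0 * * 0 $SCRIPT"

# Print the line so you can review it before installing anything.
echo "$ENTRY"

# To install, append it to your existing crontab (left commented out on purpose):
# ( crontab -l 2>/dev/null; echo "$ENTRY" ) | crontab -
```

Printing the entry first keeps this harmless to run; nothing is scheduled until you uncomment the last line.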

Only files older than X days should be deleted; we’ll leave all the fresh and potentially needed logs/backups in place.

This example assumes MySQL backups with the .sql or .sql.gz extensions.
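Before deleting anything for real, it can help to see which files an age test would pick. The sketch below is only an illustration in a throwaway directory: the file names and the 7 day retention value are made up, and it uses `find -name` with a glob rather than the `-regex` form the full script uses.

```shell
#!/bin/bash
# Work in a throwaway directory so nothing real is touched.
WORKDIR=$(mktemp -d)
cd "$WORKDIR" || exit 1

# One backup stamped 13 Jan 2009 (long past any retention period) and one created now.
touch -t 200901130000 old_backup.sql.gz
touch new_backup.sql.gz

RETAIN=7   # hypothetical retention period in days

# Dry run: list, but do not delete, files last modified more than RETAIN days ago.
CANDIDATES=$(find . -type f -mtime +"$RETAIN" -name '*.sql*')
echo "$CANDIDATES"

# Once the list looks right, the same expression can drive the deletion:
# find . -type f -mtime +"$RETAIN" -name '*.sql*' -exec rm -f {} +

# Clean up the throwaway directory.
cd / && rm -rf "$WORKDIR"
```

Only the file stamped in 2009 is listed; the fresh one is left alone.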

Shell script to purge outdated files

#!/bin/bash
 
#if you use this script you must attribute to me Eddie - Edwardawebb.com 1/14/09
 
#this script will run through all nested directories of a parent just killing off all matching files.
 
######
### Set these values
######
 
## default days to retain (override with .RETAIN_RULE in a specific directory)
DEFRETAIN=60
 
#want to append the activity to a log? good idea, add its location here
LOGFILE=`pwd`/Recycler.log
 
# enter the distinguishing extension, or portion of the filename here (eg. log, txt, etc.)
EXTENSION=sql
 
 
#the absolute path of the folder to begin purging
#this is the topmost directory to begin the attack; all subdirectories whose names contain only lowercase letters and periods are fair game.
SQLDIR=$HOME/sql_dumps
 
#####
##   End user configuration
#####
 
 
#this note will remind you that you have a log in case you're getting emails from a cron job or something
echo "see $LOGFILE for details"
 
#jump to working directory; bail out if it is missing so we never purge the wrong place
cd "$SQLDIR" || exit 1
 
#if your sub-dirs have some crazy characters you may adjust this regex
#(quoted so the shell cannot expand it as a filename glob before grep sees it)
DIRS=`ls | grep '^[a-z.]*$'`
 
 
TODAY=`date`
 
printf "\n\n********************************************\n\tSQL Recycler Log for:\n\t" | tee -a $LOGFILE
echo $TODAY | tee -a $LOGFILE
printf "********************************************\n" | tee -a $LOGFILE
 
for DIR in $DIRS 
do
	pushd $DIR >/dev/null
	HERE=`pwd`
	printf "\n\n%s\n" $HERE | tee -a $LOGFILE
	if [ -f .RETAIN_RULE ]
	then
		printf "\tdefault Retain period being overridden\n" | tee -a $LOGFILE
		read RETAIN < .RETAIN_RULE
	else
		RETAIN=$DEFRETAIN
	fi
 
	printf "\tpurging files older than %s days\n" ${RETAIN} | tee -a $LOGFILE
 
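	#quote the regex so the shell passes it to find untouched; only regular files are candidates
	OLDFILES=`find . -type f -mtime +${RETAIN} -regex ".*${EXTENSION}.*"`
 
	#load the matches into $1, $2, ... (file names containing whitespace are not supported)
	set -- $OLDFILES
 
	if [ -z "$1" ]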
	then
		printf "\tNo files matching purge criteria\n" | tee -a $LOGFILE
	else
		printf "\tSQL Files being Delete from $HERE\n" | tee -a $LOGFILE
		printf "\t\t%s\n" $OLDFILES  | tee -a $LOGFILE
	fi
 
 	rm -f $OLDFILES
	if [ $? -ne 0 ]
	then	
		echo "Error while deleting last set" | tee -a $LOGFILE
		exit 2
	else
		printf "\tSuccess\n" | tee -a $LOGFILE
	fi
	popd >/dev/null
done
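If you are wondering which sub-directories the `DIRS` pattern will pick up, here is a small sketch you can play with. The names are hypothetical apart from the first, which comes from the listing earlier in this article.

```shell
#!/bin/bash
# Hypothetical directory names; only the first comes from the listing in this article.
NAMES="edwardawebb.com
Mantis_Site
my-site.org"

# Quoting the pattern stops the shell from expanding it as a filename glob
# before grep ever sees it.
MATCHED=$(printf '%s\n' "$NAMES" | grep '^[a-z.]*$')
echo "$MATCHED"
```

Only `edwardawebb.com` gets through. Names with underscores, hyphens, or digits are skipped, so widen the bracket expression (for example `[a-z0-9._-]`) if your directories use them.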

Did you notice the bit about .RETAIN_RULE? Good!
I added this after I realized that I don’t treat all my sites equally. For this very blog, which is backed up daily, I only need 3-4 days back at most. But for other sites that I back up monthly I need to keep the default 60 days, or 1-2 files.

So I set the default in the script to 60, but I allow it to be overridden by adding a simple text file to any directory. If a file named .RETAIN_RULE is present, the script reads its first line (and first line only!) for a new value. For example:

$HOME/sql_dumps/dailysite.com/.RETAIN_RULE

5
#only keep files in this single directory around for 5 days

Notice that I put the comment after the actual data! Since only the first line is read, the number has to come first.
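To see the override in isolation, here is a minimal sketch that writes a rule file into a temporary stand-in directory and reads it back the same way the script does. The numeric check at the end is an extra guard I am assuming for illustration; the script above does not include it.

```shell
#!/bin/bash
DEFRETAIN=60
SITEDIR=$(mktemp -d)   # stand-in for a site directory such as sql_dumps/dailysite.com

# First line is the value; anything after it is ignored by the script.
printf '5\n#only keep files in this single directory around for 5 days\n' > "$SITEDIR/.RETAIN_RULE"

if [ -f "$SITEDIR/.RETAIN_RULE" ]
then
	# read consumes exactly one line, so RETAIN becomes "5"
	read -r RETAIN < "$SITEDIR/.RETAIN_RULE"
else
	RETAIN=$DEFRETAIN
fi

# Extra guard that the original script does not have: fall back to the default
# unless the value is a plain whole number.
case "$RETAIN" in
	''|*[!0-9]*) RETAIN=$DEFRETAIN ;;
esac

echo "retaining $RETAIN days"
rm -rf "$SITEDIR"
```

If the comment were placed on the same line as the number, `read` would pull the whole line into `RETAIN`, which is why the value sits alone on line one.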

This means my actual directory structure, including retain rules, looks more like this:

#tree -a sql_dumps
sql_dumps/
|-- edwardawebb.com
|   |-- .RETAIN_RULE
|   |-- edwardawebb_wordpress_01-13-2009.sql.gz
|   |-- edwardawebb_wordpress_01-14-2009.sql.gz
|   |-- edwardawebb_wordpress_01-15-2009.sql.gz
|   |-- edwardawebb_wordpress_01-16-2009.sql.gz
|   |-- edwardawebb_wordpress_01-17-2009.sql.gz
|   |-- edwardawebb_wordpress_01-18-2009.sql.gz
|   `-- edwardawebb_wordpress_01-19-2009.sql.gz
|-- mantis.mainsite.org
|   |-- .RETAIN_RULE
|   |-- mainsite_mantis_01-13-2009.sql.gz
|   |-- mainsite_mantis_01-14-2009.sql.gz
|   |-- mainsite_mantis_01-15-2009.sql.gz
|   |-- mainsite_mantis_01-16-2009.sql.gz
|   |-- mainsite_mantis_01-17-2009.sql.gz
|   |-- mainsite_mantis_01-18-2009.sql.gz
|   `-- mainsite_mantis_01-19-2009.sql.gz
`-- taskfreak.mainsite.org
    |-- mainsite_taskfreak_01-11-2009
    |-- mainsite_taskfreak_01-11-2009.sql.gz
    |-- mainsite_taskfreak_01-12-2009.sql.gz
    |-- mainsite_taskfreak_01-13-2009.sql.gz
    |-- mainsite_taskfreak_01-14-2009.sql.gz
    |-- mainsite_taskfreak_01-15-2009.sql.gz
    |-- mainsite_taskfreak_01-16-2009.sql.gz
    |-- mainsite_taskfreak_01-17-2009.sql.gz
    |-- mainsite_taskfreak_01-18-2009.sql.gz
    `-- mainsite_taskfreak_01-19-2009.sql.gz

The Result

So as the script walks through the structure above, it prints a log along these lines:

see /home/<USERNAME>/sql_dumps/Recycler.log for details
 
 
********************************************
	SQL Recycler Log for:
	Sun Feb 8 00:00:07 PST 2009
********************************************
 
 
/home/MYUSERNAME/sql_dumps/edwardawebb.com
	default Retain period being overridden
	purging files older than 4 days
	SQL Files being Delete from /home/masterkeedu/sql_dumps/edwardawebb.com
		./edwardawebb_wordpress_01-28-2009.sql.gz
		./edwardawebb_wordpress_02-03-2009.sql.gz
		./edwardawebb_wordpress_01-29-2009.sql.gz
		./edwardawebb_wordpress_02-02-2009.sql.gz
		./edwardawebb_wordpress_01-31-2009.sql.gz
		./edwardawebb_wordpress_01-30-2009.sql.gz
		./edwardawebb_wordpress_02-01-2009.sql.gz
	Success
 
 
/home/MYUSERNAME/sql_dumps/mantis.mainsite.org
	default Retain period being overridden
	purging files older than 4 days
	SQL Files being Delete from /home/masterkeedu/sql_dumps/mantis.mainsite.org
		./webbmaster_mantis_01-30-2009.sql.gz
		./webbmaster_mantis_01-31-2009.sql.gz
		./webbmaster_mantis_02-01-2009.sql.gz
		./webbmaster_mantis_01-27-2009.sql.gz
		./webbmaster_mantis_01-29-2009.sql.gz
		./webbmaster_mantis_02-02-2009.sql.gz
		./webbmaster_mantis_01-28-2009.sql.gz
	Success
 
 
/home/MYUSERNAME/sql_dumps/taskfreak.mainsite.org
        purging files older than 60 days
        No files matching purge criteria
        Success

As with any article I welcome feedback or questions!

9 Responses
  1. 2009 January 18

    Yes, hmm I concur. Very well said.

  2. 2009 January 18
    Eddie

    @CP
    Hey friend, I’m sure you got great use from this post…can’t wait til CP.com is back up and running.

  3. 2009 January 19

    I have halted all work on CP.com to focus on my newly acquired company/website. Maybe you have heard of it: Google. It’s a nice little start-up.

  4. 2009 May 18

    Wow! Your backup script has worked perfectly, so much so I had to come back to see if there were any suggestions for purging the backlog I don’t need, and here it is, also working perfectly with a minimum of config :) Thanks!

  5. 2009 May 18
    Eddie

    @Tom
    Well I wouldn’t want to leave you hanging with a bunch of useless back logs. Glad they both worked so well for you. Thanks for the feedback!

  6. 2010 July 7

    I’m back! Just to say $HOME/purge.sh fails in cron, fixed with fully qualified path, 30 GB lighter now ;)

  7. 2010 July 7
    Eddie

    @Tom

    The user variable can be tricky, and will not work if you’re using a web host or admin user to call the scripts of another user.

    But for crontabs edited within a user, the variable should work.

    But resorting to fully qualified paths will *always* work.

    Thanks for sharing your feedback!

