Delete old files in a directory with Python

Loading

After running Python scripts on a routine basis, log files can become unmanageable after some time. This is the situation that I ran into many incidences when I had server scripts routinely initiated using Cron tasks. Most of the scripts would dump their statuses into log files and if any errors are handled, the log file would be emailed to some unexpected administrator. After some time, the depositary directories would be filled with files.

Why so many files one would ask? Well, as a normal routine, the log files would be closed after so many hours and another would be created. This would be generally done on a daily basis. The reasoning is to keep the size of the file down to a manageable size.

With all of the files that need to be cleaned out, a script called sweep.py was created that would clean out any old items from a list of designated directories.

Here is the breakdown of the script and the full code.

import os
import time

Time to get the current datetime.

now = time.time()

Usually these variables would be in the shared configuration file that this script would have access.

number_of_days_to_keep = 7

Note: There are 86,400 seconds in a day. So multiply the amount of days for the sweep with the seconds in a day.

file_cut_off = now - (number_of_days_to_keep * 86400)

Here is where the path of the directory is inserted in the directory list. The script will loop through the values in the list, so separate them with commas. Note: don’t put the trailing forward slash in the path names. I worked with different variations that I would use the forward-slash but it always caused an error. I finally left out the slash and just inserted the slash via a chr function later in the directory routine.

directory_list = [
    "/home/mylogs",
    "/home/yourlogs"
]

for my_directories in directory_list:
    try:

Now is time to get all of the files within the targeted directory in the list.

        files = os.listdir(my_directories)
        
        for myFiles in files:

The character 92 is a forward slash. Since I don’t add the forward slash in the directory list, it is time to add it here.

            if os.path.isfile(my_directories + chr(92) + myFiles):

What is the creation DateTime of the file? We will use the stat function to get value. # See the documentation for the stat function here: stat

t = os.stat.ST_MTIME (my_directories + chr(92) + myFiles)

Now for the test to see if the file is older than the DateTime that we set as a cutoff. If so, we will invoke the remove function to delete the file.

if t.st_ctime < file_cut_off:
                    print("Deleting => " + my_directories + chr(92) + myFiles)
                    os.remove(my_directories + chr(92) + myFiles)

If the directory is not accessible, such as permissions or it’s an actual file, we will fault out with the exception. In the actual production code, I fault out here and write to a log file that would be sent to the administrators.   

except:
        print("Cannot access : " + my_directories)

The Code

# *************************************************************************
#
# Copyright (C) 2021 by Keith Foster
#
# Permission is hereby granted, free of charge, to any person obtaining
# a copy of this software and associated documentation files
# (the "Software"), to deal in the Software without restriction,
# including without l> imitation the rights to use, copy, modify,
# merge, publish, distribute, sublicense, and/or sell copies of the
# Software, and to permit persons to whom the Software is furnished
# to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included
# in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
# OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFINGEMENT.
# IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
# OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR
# THE USE OR OTHER DEALINGS IN THE SOFTWARE.
#
# *************************************************************************
# sweep.py
#
# LAST MODIFICATION:       2021-1-7
# ORIGINATION DATE:        n/a
#
# *************************************************************************
#
import os
import time


now = time.time()

# Usually these variables would be in the shared configuration file that this
# script would have access.
number_of_days_to_keep = 7


file_cut_off = now - (number_of_days_to_keep * 86400)
directory_list = [
    "/home/mylogs",
    "/home/yourlogs"
]

for my_directories in directory_list:
    try:
        files = os.listdir(my_directories)
        for myFiles in files:
            if os.path.isfile(my_directories + chr(92) + myFiles):
                t = os.stat(my_directories + chr(92) + myFiles)
                if t.st_ctime < file_cut_off:
                    print("Deleting => " + my_directories + chr(92) + myFiles)
                    os.remove(my_directories + chr(92) + myFiles)
    except:
        print("Cannot access : " + my_directories)

Add Comment

Your email address will not be published. Required fields are marked *