Using SA-Learn to improve spam filtering in cPanel

More information is available at the following link:
https://forums.cpanel.net/resources/how-to-train-spamassassin-with-sa-learn.623/

Training on Spam

Train on spam from the Junk directory of an email account. Replace USER with the cPanel username that owns the domain, DOMAIN.TLD with the domain name, and ACCOUNT with the mailbox name (the part of the email address before the @).

/usr/local/cpanel/3rdparty/bin/sa-learn -p /home/USER/.spamassassin/user_prefs --spam /home/USER/mail/DOMAIN.TLD/.ACCOUNT@DOMAIN.TLD/.Junk/{cur,new}
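To check the placeholder substitution before running anything, a small dry-run helper can print the finished command. This is only a sketch: spam_train_cmd is a hypothetical name, and the values incredigeek, incredigeek.com, and bob are sample inputs, not real accounts.

```shell
# Hypothetical helper: prints the sa-learn spam-training command for one mailbox.
# It only echoes the command (a dry run); nothing is learned.
# Arguments: $1 = USER, $2 = DOMAIN.TLD, $3 = ACCOUNT
spam_train_cmd() {
    echo "/usr/local/cpanel/3rdparty/bin/sa-learn -p /home/$1/.spamassassin/user_prefs --spam /home/$1/mail/$2/.$3@$2/.Junk/{cur,new}"
}

# Sample values, for illustration only
spam_train_cmd incredigeek incredigeek.com bob
```

Review the printed command, then paste it into a bash shell to run the actual training.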

Use read emails in inbox as Ham

You can use the following script to feed ham to sa-learn. It looks at all the read messages for the current year in the default inbox and feeds them individually to sa-learn. Replace USER with the cPanel username.

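# Set the cPanel username and the mailbox to read ham from
cpanelUser=USER
mailbox=/home/${cpanelUser}/mail
cd "${mailbox}"
# List messages dated this year whose filename carries the seen flag (2,S)
for emailHam in $(ls -lt --time-style=long-iso {cur/,new/} | grep "$(date +%Y)-" | grep "2,S" | awk '{print $8}')
do
    /usr/local/cpanel/3rdparty/bin/sa-learn -p /home/${cpanelUser}/.spamassassin/user_prefs --ham ${mailbox}/{cur,new}/${emailHam}
done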
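The grep "2,S" filter relies on the Maildir filename convention: flags follow ":2," and are stored in ASCII order, so a message that was both replied to and read ends in "2,RS" and is missed by that pattern. A minimal sketch of a broader check is shown here. The function name is_seen and the sample filenames are made up for illustration.

```shell
# Hypothetical helper: succeeds when a Maildir filename carries the S (seen) flag.
# Flags follow ":2," and are stored in ASCII order, so S may be preceded by
# other flags such as R (replied).
is_seen() {
    case "$1" in
        *:2,*S*) return 0 ;;
        *)       return 1 ;;
    esac
}

# Sample filenames, for illustration only
for name in "1714567890.M1P2.host,S=1234:2,S" \
            "1714567890.M1P2.host,S=1234:2,RS" \
            "1714567890.M1P2.host,S=1234:2,"
do
    if is_seen "${name}"; then
        echo "seen:   ${name}"
    else
        echo "unseen: ${name}"
    fi
done
```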

Script to automate the process

You can use the following script to train sa-learn automatically. Create the script and then use cron to launch it.

Script

Create a file named sa-learn.sh, add the following contents to it, and make it executable (chmod +x sa-learn.sh) so that cron can launch it.

#!/bin/bash

# Notes on cPanel mail
# - /home/cpanel_user/mail <- Default mail directory. All the email accounts are located in the domain.com directory, although there are hidden entries in here that point to them.
# - The default catch-all is in ..../mail

dateYear=$(date +%Y)

echo "Starting Training"
for mailbox in $(cat mailboxes.txt); do
        # Skip this entry if the mailbox directory does not exist
        cd "${mailbox}" || continue
        echo "Training on ${mailbox}"
        # The cPanel username is the third /-separated field of the path
        cpanelUser=$(echo "${mailbox}" | cut -d/ -f3)
        # Check Spam
        echo "Training on Spam, SPAM, spam, junk, Junk Email, and Junk folders"
        /usr/local/cpanel/3rdparty/bin/sa-learn -p /home/${cpanelUser}/.spamassassin/user_prefs --spam ${mailbox}/{".Junk Email"/{new/,cur/},.Junk/{new/,cur/},.junk/{new/,cur/},.spam/{new/,cur/},.Spam/{new/,cur/},.SPAM/{new/,cur/}}
        # Gets a list of seen messages for the current year to use as Ham.
        # We are still inside ${mailbox} here, so cur/ and new/ resolve correctly.
        for emailHam in $(ls -lt --time-style=long-iso {cur/,new/} | grep "${dateYear}-" | grep "2,S" | awk '{print $8}')
        do
            /usr/local/cpanel/3rdparty/bin/sa-learn -p /home/${cpanelUser}/.spamassassin/user_prefs --ham ${mailbox}/{cur,new}/${emailHam}
        done
done
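The script derives the cPanel username from each mailbox path with cut, taking the third field when the path is split on "/". A short illustration, using one of the example paths from mailboxes.txt:

```shell
# The path splits on "/" into: "" (before the leading slash), home, incredigeek, mail, ...
mailbox=/home/incredigeek/mail/.bob@incredigeek_com/
cpanelUser=$(echo "${mailbox}" | cut -d/ -f3)
echo "${cpanelUser}"   # prints: incredigeek
```

This only works when every path in mailboxes.txt starts with /home/USER/, so paths on a differently laid out server would need a different field number.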

Create text file to hold mailbox paths

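You’ll need to create a file called mailboxes.txt containing the mailbox paths of the email accounts you want to run sa-learn against, one path per line. The script reads the file from its current working directory, so keep it in the directory the script is launched from (for a job in root's crontab, that is normally /root). The following is an example of what the file should look like.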

/home/incredigeek/mail/.bob@incredigeek_com/
/home/incredigeek/mail/.larry@incredigeek_com/
/home/incredigeek/mail/.steve@incredigeek_com/
/home/incredigeek/mail/.admin@incredigeek_com/
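If there are many accounts, the list can be generated rather than typed. The sketch below assumes the layout shown in the example, where each account has a hidden .name@domain_tld entry directly under /home/USER/mail/. The function name list_mailboxes is hypothetical, and the -mindepth and -maxdepth options assume GNU or BSD find. Review the output before using it.

```shell
# Hypothetical helper: lists hidden per-account entries (.name@domain_tld)
# found at <base>/USER/mail/, one per line, with a trailing slash.
# Argument: $1 = base directory that holds the cPanel home directories
list_mailboxes() {
    find "$1" -mindepth 3 -maxdepth 3 -path '*/mail/.*@*' | sort | sed 's|$|/|'
}

# Usage on the server (check the result by hand afterwards):
# list_mailboxes /home > mailboxes.txt
```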

Create Crontab

Add the script to cron by running

crontab -e

and paste in the following to launch the script every day at 1 AM. The log line is written only when the script exits successfully.

0 1 * * * /root/sa-learn.sh train && echo "training run at $(date)" >> /root/email_report.log

Save and you should be ready to go.
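To confirm that training is accumulating, sa-learn can report how many messages it has learned with --dump magic. By default SpamAssassin does not apply Bayes scores until it has learned at least 200 spam and 200 ham messages. The sketch below pulls the two counts out of that report. The function name bayes_counts is hypothetical, and it assumes the usual layout in which the count is the third column of the nspam and nham lines.

```shell
# Hypothetical helper: reads `sa-learn --dump magic` output on stdin and
# prints the learned spam and ham message counts.
# Assumes the count is the third whitespace-separated column.
bayes_counts() {
    awk '$NF == "nspam" {spam = $3}
         $NF == "nham"  {ham = $3}
         END {print "nspam=" spam " nham=" ham}'
}

# Usage on the server (USER is a placeholder for the cPanel username):
# /usr/local/cpanel/3rdparty/bin/sa-learn -p /home/USER/.spamassassin/user_prefs --dump magic | bayes_counts
```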