{"id":3539,"date":"2020-09-23T15:27:51","date_gmt":"2020-09-23T20:27:51","guid":{"rendered":"http:\/\/www.incredigeek.com\/home\/?p=3539"},"modified":"2020-09-24T16:18:11","modified_gmt":"2020-09-24T21:18:11","slug":"using-sa-learn-to-improve-spam-filtering-in-cpanel","status":"publish","type":"post","link":"https:\/\/www.incredigeek.com\/home\/using-sa-learn-to-improve-spam-filtering-in-cpanel\/","title":{"rendered":"Using SA-Learn to improve spam filtering in cPanel"},"content":{"rendered":"\n<p>More information is available at the following link:<br><a href=\"https:\/\/forums.cpanel.net\/resources\/how-to-train-spamassassin-with-sa-learn.623\/\">https:\/\/forums.cpanel.net\/resources\/how-to-train-spamassassin-with-sa-learn.623\/<\/a><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Training on Spam<\/h2>\n\n\n\n<p>Train sa-learn on spam from the Junk directories of an email account.  Replace USER with the domain admin username, DOMAIN.TLD with the domain name, and ACCOUNT with the mailbox name (the part of the email address before the @).<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">\/usr\/local\/cpanel\/3rdparty\/bin\/sa-learn -p \/home\/USER\/.spamassassin\/user_prefs --spam \/home\/USER\/mail\/DOMAIN.TLD\/.ACCOUNT@DOMAIN.TLD\/.Junk\/{cur,new}<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Use read emails in inbox as Ham<\/h2>\n\n\n\n<p>You can use the following script to feed ham to sa-learn.  The script looks at all the read messages for the current year in the default inbox and then feeds them individually to sa-learn.  Replace USER with the domain admin username.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">cd \/home\/USER\/mail\/\nfor emailHam in `ls -lt --time-style=long-iso {cur\/,new\/} | grep $(date +%Y) | grep \"2,S\" | awk '{print $8}'`\ndo\n\/usr\/local\/cpanel\/3rdparty\/bin\/sa-learn -p \/home\/USER\/.spamassassin\/user_prefs --ham \/home\/USER\/mail\/{cur,new}\/${emailHam}\ndone<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Script to automate the process<\/h2>\n\n\n\n<p>You can use the following script to automatically train sa-learn.  
Create the script, then use cron to launch it.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Script<\/h3>\n\n\n\n<p>Create a file named sa-learn.sh and add the following contents to it.<\/p>\n\n\n\n<div style=\"height: 250px; position:relative; margin-bottom: 50px;\" class=\"wp-block-simple-code-block-ace\"><pre class=\"wp-block-simple-code-block-ace\" style=\"position:absolute;top:0;right:0;bottom:0;left:0\" data-mode=\"php\" data-theme=\"monokai\" data-fontsize=\"14\" data-lines=\"Infinity\" data-showlines=\"true\" data-copy=\"false\">#!\/bin\/bash\n\n# Notes on cpanel mail\n# - \/home\/cpanel_user\/mail &lt;- Default mail directory, all the email accounts are located in the domain.com directory, although there are hidden files in here that point to that.\n# - the default catch all is in ....\/mail\n\ndateYear=`date +%Y`\n\necho \"Starting Training\"\nfor mailbox in `cat mailboxes.txt`; do\n        # Skip paths that do not exist so the ls below never runs in the wrong directory\n        cd ${mailbox} || continue\n        cpanelUser=`echo ${mailbox} | cut -d\\\/ -f3`\n        # Check Spam\n        echo \"Training on Spam, SPAM, spam, junk, Junk Email, and Junk folders for ${mailbox}\"\n        \/usr\/local\/cpanel\/3rdparty\/bin\/sa-learn -p \/home\/${cpanelUser}\/.spamassassin\/user_prefs --spam ${mailbox}\/{\".Junk Email\"\/{new\/,cur\/},.Junk\/{new\/,cur\/},.junk\/{new\/,cur\/},.spam\/{new\/,cur\/},.Spam\/{new\/,cur\/},.SPAM\/{new\/,cur\/}}\n        # Stay in the mailbox directory: the ls below uses the relative paths cur\/ and new\/\n        echo \"Training on Ham for ${mailbox}\"\n        # Gets a list of seen messages for the current year to use as Ham\n        for emailHam in `ls -lt --time-style=long-iso {cur\/,new\/} |  grep ${dateYear} | grep \"2,S\" | awk '{print $8}'`\n        do\n            \/usr\/local\/cpanel\/3rdparty\/bin\/sa-learn -p \/home\/${cpanelUser}\/.spamassassin\/user_prefs --ham ${mailbox}\/{cur,new}\/${emailHam}\n        done\ndone\n\n<\/pre><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Create text file to hold mailbox paths<\/h3>\n\n\n\n<p>You&#8217;ll need to create a file called mailboxes.txt and put the email 
paths for the email accounts you want to run sa-learn against, one path per line.  The script reads the file from its working directory, so keep it in \/root when the script is launched from root&#8217;s crontab.  The following is an example of what the file should look like.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">\/home\/incredigeek\/mail\/.bob@incredigeek_com\/\n\/home\/incredigeek\/mail\/.larry@incredigeek_com\/\n\/home\/incredigeek\/mail\/.steve@incredigeek_com\/\n\/home\/incredigeek\/mail\/.admin@incredigeek_com\/<\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Create Crontab<\/h3>\n\n\n\n<p>Add the script to cron by running<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">crontab -e<\/pre>\n\n\n\n<p>and paste in the following to launch the script every day at 1 AM.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">0 1 * * * \/root\/sa-learn.sh train &amp;&amp; echo \"training run at $(date)\" >> \/root\/email_report.log<\/pre>\n\n\n\n<p>Save, make sure the script is executable (chmod +x \/root\/sa-learn.sh), and you should be ready to go.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>More information available at the following link: https:\/\/forums.cpanel.net\/resources\/how-to-train-spamassassin-with-sa-learn.623\/ Training on Spam Train Spam from Junk email directories for email account. Replace USER with the domain admin username, DOMAIN.TLD with domain name, and ACCOUNT with the email address. 
\/usr\/local\/cpanel\/3rdparty\/bin\/sa-learn -p \/home\/USER\/.spamassassin\/user_prefs &#8211;spam &hellip; <a href=\"https:\/\/www.incredigeek.com\/home\/using-sa-learn-to-improve-spam-filtering-in-cpanel\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[195],"tags":[196,1075,1076,446,382],"class_list":["post-3539","post","type-post","status-publish","format-standard","hentry","category-cpanel","tag-cpanel-2","tag-ham","tag-sa-learn","tag-spam","tag-whm"],"_links":{"self":[{"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/posts\/3539","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/comments?post=3539"}],"version-history":[{"count":4,"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/posts\/3539\/revisions"}],"predecessor-version":[{"id":3554,"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/posts\/3539\/revisions\/3554"}],"wp:attachment":[{"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/media?parent=3539"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/categories?post=3539"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.incredigeek.com\/home\/wp-json\/wp\/v2\/tags?post=3539"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}