Penn Computing

Spam Filtering Service

Unsolicited e-mail that pitches products, goods, services, etc., can be considered spam. These messages can be quite annoying, and many users would prefer not to see them in their Inbox. Users with accounts on the ISC Networking and Telecommunications e-mail servers can opt to have all incoming messages filtered for spam. The spam filtering service uses SpamAssassin to filter incoming messages. Messages that are marked as spam are stored in a separate mail folder, and a nightly process automatically deletes them after 28 days (or after the expiration period that you specify). You should check the spam folder regularly, because real mail may be incorrectly identified as spam. Once a message is deleted from the spam folder, it cannot be recovered.

Opting In

If you wish to filter your mail for spam, you must opt in to the service; spam filtering is not turned on automatically. You can opt in from the Account Services menu after authenticating with your PennKey and password.

From the Spam Filtering menu, choose Opt-in/Opt-out and you will be presented with the following form option:

Choose your level of spam filtering:

To opt in to the spam filtering service, you must pick the level of filtering that will be applied to all incoming mail. Four levels of filtering are available: low, medium, high, and very high. The level determines how strongly a message must match the spam rules before it is treated as spam. With low filtering, a message must match more of the spam rules, so less spam will be caught, but less real mail will be incorrectly identified as spam. With very high filtering, a message needs to match fewer of the spam rules, so more messages will be identified as spam, but your chances of real mail being identified as spam also increase. We suggest that you start with low filtering. If you find that low filtering does not catch the kind of spam messages that you receive, switch to medium filtering. You can change your filtering level at any time.

You can also choose a "Tag only" level. With "Tag only" selected, all of your incoming messages will be evaluated and scored but will remain in your Inbox so that your email client's spam filtering mechanisms can further manage your messages locally.

See below for more information on spam filtering levels or consult with your local support provider for help in determining which filter level is best for you and the type of mail that you receive.

Opting out

If you have opted in to spam filtering but would rather not have your mail filtered, choose "No filtering" from the Opt-in/Opt-out form described in the Opting In section above. This will opt you out of the service.

Once you have opted out of the service, mail that has already been identified as spam is no longer maintained automatically. You must manually delete any unwanted messages from the spam filter folder, because these messages still count toward your disk quota.

Maintenance of the spam filter folder

After opting into the service, messages that have been identified as spam will be stored in a separate folder in your Mail directory.

Example:
/home/j/o/joeuser/Mail/caught-spam
This is a mail folder just like any other mail folder in your home directory, and it counts toward your disk quota. If your mail client can access mail folders other than the main Inbox (POP clients, for example, can only access the main Inbox), you can open this folder with the mail client and maintain it like any other mail folder: deleting, forwarding, or moving messages. The one difference is that a nightly process automatically deletes messages in this folder that are older than the specified expiration period. The default is 28 days, and you can change the expiration period.

It is important that you monitor this folder, since real mail may be identified as spam if it has characteristics that match the spam filtering rules. Once the nightly process deletes a message that is older than the expiration period (28 days by default), that message cannot be recovered.

Since the spam filter folder counts toward your disk quota, you may want to delete messages sooner than 28 days. You can delete messages from the spam filter folder manually with your mail client, or you can use the Maintain spam filter folder option on the Spam Filtering menu.

To delete messages using your mail client, you'll need to change to the "caught-spam" mail folder first. If you are using Elm, please see full instructions for changing folders. If you are using Webmail, you are automatically subscribed to the "caught-spam" folder when you opt in and should be able to change to the "caught-spam" folder. Other IMAP clients will handle this "caught-spam" folder just like any other mail folder outside of the main inbox. How you access the "caught-spam" folder will depend on your IMAP mail client. POP mail clients cannot change mail folders. If you are using a POP mail client, you may want to consider using Webmail to view your "caught-spam" folder. A quick guide to managing your caught-spam folder through Webmail is available.

To delete messages using the Spam Filtering menu, choose Maintain spam filter folder. If you have a caught-spam folder, you will be presented with the following form option:

Delete messages older than   day(s)

If you wish to delete all messages that are older than 21 days, for example, choose "21" from the pull-down menu. Once the form is submitted, all messages older than 21 days will be deleted from the spam filter folder. Deleting messages from this form does not change the nightly process, which will continue to delete only messages that are older than your expiration period (28 days by default).

Choosing "0" from the pull-down menu will delete all messages from the spam filter folder. Again, once messages are deleted from the spam filter folder, they cannot be recovered, so we strongly recommend that you scan the messages in the spam filter folder before deleting them.
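The Maintain form's rule (delete anything older than N days; "0" deletes everything) can be sketched in Python for readers who manage the folder themselves. This is a hypothetical illustration, not the ISC nightly process: it assumes caught-spam is a single mbox-format file and judges a message's age by its Date: header, which the sender controls. A message whose Date: header is missing or unparseable is kept rather than guessed at.

```python
import mailbox
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def is_expired(date_header: Optional[str], days: int, now: datetime) -> bool:
    """True if a message with this Date: header is older than `days` days."""
    if days == 0:
        return True  # "0" on the form means delete everything
    try:
        sent = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        return False  # missing or unparseable Date: keep the message
    if sent.tzinfo is None:
        sent = sent.replace(tzinfo=timezone.utc)  # treat "-0000" dates as UTC
    return sent < now - timedelta(days=days)


def purge_older_than(folder_path: str, days: int) -> int:
    """Delete expired messages from an mbox folder; return how many were removed."""
    now = datetime.now(timezone.utc)
    box = mailbox.mbox(folder_path)
    box.lock()
    try:
        doomed = [key for key, msg in box.iteritems()
                  if is_expired(msg["Date"], days, now)]
        for key in doomed:
            box.remove(key)
        box.flush()
        return len(doomed)
    finally:
        box.close()  # close() also releases the lock
```

For example, purge_older_than("/home/j/o/joeuser/Mail/caught-spam", 21) would mirror choosing "21" on the form. As with the form, anything it deletes cannot be recovered.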

Adjusting the expiration period

A nightly job examines messages in your spam filter folder and deletes messages that are older than the specified expiration period. The default is 28 days, which usually gives you enough time to examine the spam filter folder and make sure that no real messages were accidentally caught by the spam filter rules. If you would rather have messages in your spam filter folder deleted sooner, you can change the expiration period.

Choose the Spam Filtering menu option Adjust expiration period. You will be presented with the following form option:

Set automatic expiration period to:
Select the expiration period that matches your needs. From then on, the nightly process will delete messages in your spam filter folder once they are older than the number of days that you have chosen.

Forwarding mail

If you have set up a .forward file in your home directory, or have opted to forward your e-mail through Account Services, and you are forwarding all of your mail to some other account, you will not be able to use spam filtering.

If you are forwarding a copy of your mail to another e-mail address and still want to keep a copy of your e-mail on the local server, you will be able to use spam filtering but you must set up your .forward with the proper syntax.

Example:
Correct - joeuser, another-user@sas.upenn.edu

Incorrect - joeuser@pobox.upenn.edu, another-user@sas.upenn.edu

Your .forward file should include just your username without any hostname, then the list of address(es) to which you wish to send a copy of all of your e-mail. All addresses should be separated with commas. In the above "Correct" example, a copy of the mail will be delivered locally to joeuser's mailbox where it will then be filtered for spam and a copy of the mail will be forwarded unfiltered to another-user@sas.upenn.edu.
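The syntax rule above can be checked mechanically. The helper below is hypothetical and assumes a one-line .forward whose entries are separated by commas, exactly as in the examples; it does not understand any other .forward features.

```python
from typing import List


def check_forward(line: str, username: str) -> List[str]:
    """Return problems found in a one-line .forward; an empty list means it looks OK."""
    entries = [entry.strip() for entry in line.split(",")]
    problems = []
    if "" in entries:
        problems.append("empty entry: look for a doubled or trailing comma")
    if entries[0] != username:
        # The local copy that gets spam filtered comes from the bare username.
        problems.append(f"first entry should be the bare username '{username}'")
    for entry in entries:
        if entry.lower().startswith(username.lower() + "@"):
            problems.append(f"'{entry}' adds a hostname; write '{username}' alone")
    return problems
```

Run against the "Correct" example, check_forward("joeuser, another-user@sas.upenn.edu", "joeuser") returns an empty list; the "Incorrect" example is flagged both for the missing bare username and for the hostname on joeuser@pobox.upenn.edu.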

SpamAssassin mail headers

If you have chosen to filter your mail and you look at the full mail headers of your filtered messages, you will notice some additional mail headers will have been added to your messages.

Example of a message that is not considered spam:
X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on
        porpoise.net.isc.upenn.edu
X-Spam-Status: No, hits=-2.7 required=5.0 
	tests=BAYES_00, MIME_HEADER_CTYPE_ONLY autolearn=no version=2.63 
X-Spam-Level:

Example of a message that is considered spam:

X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on
        porpoise.net.isc.upenn.edu
X-Spam-Status: Yes, hits=6.3 required=5.0
	tests=BAYES_01,DATE_IN_PAST_12_24, DRASTIC_REDUCED,INVALID_MSGID,
	MIME_HEADER_CTYPE_ONLY,NO_REAL_NAME,
	REMOVE_SUBJ autolearn=no version=2.63
X-Spam-Level: ******
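If you chose "Tag only" and want your own tools to act on these headers, the X-Spam-Status value can be unfolded and split into its parts. This is a sketch that assumes the SpamAssassin 2.63 layout shown in the examples above (Yes/No, hits=, required=, tests=, then other name=value fields); other SpamAssassin versions may format this header differently. The function names are hypothetical.

```python
import re
from email import message_from_string
from typing import Optional


def parse_spam_status(value: str) -> dict:
    """Split an X-Spam-Status value (SpamAssassin 2.63 layout) into fields."""
    status = " ".join(value.split())  # unfold continuation lines, collapse whitespace
    head = re.match(
        r"(Yes|No),\s*hits=(-?\d+(?:\.\d+)?)\s+required=(-?\d+(?:\.\d+)?)", status)
    if head is None:
        raise ValueError(f"unrecognised X-Spam-Status: {status!r}")
    # tests= runs until the next "name=" field (autolearn=, version=) or the end.
    tests = re.search(r"tests=(.*?)(?=\s+\w+=|$)", status)
    names = [t.strip() for t in tests.group(1).split(",") if t.strip()] if tests else []
    return {
        "is_spam": head.group(1) == "Yes",
        "hits": float(head.group(2)),
        "required": float(head.group(3)),
        "tests": names,
    }


def spam_status_of(raw_message: str) -> Optional[dict]:
    """Parse the X-Spam-Status header of a raw message, or return None if absent."""
    value = message_from_string(raw_message).get("X-Spam-Status")
    return parse_spam_status(value) if value is not None else None
```

Applied to the spam example, this yields is_spam True, hits 6.3, required 5.0, and the seven test names from BAYES_01 through REMOVE_SUBJ.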

Multi-part messages may also include a report from SpamAssassin that lists in detail the spam tests that the message matched:

   Spam detection software, has identified this incoming email as possible
   spam.  The original message has been attached to this so you can view
   it (if it isn't spam) or block similar future email.  If you have any
   questions, see server-admin@isc.upenn.edu for details.

   Content preview:  THIS ENTERPRISE IS AWESOMELY FEATURED IN SEPTEMBER
   2000 MILLIONAIRE, AUGUST 2000 TYCOONS AND AUGUST 2000 ENTREPRENEUR
   Magazine. ====> Do you have a burning desire to change the quality of
   your existing life? [...]

   pts rule name              description
  ---- ---------------------- --------------------------------------------
   0.2 NO_REAL_NAME           From: does not include a real name
   1.8 DRASTIC_REDUCED        BODY: Drastically Reduced
   0.4 REMOVE_SUBJ            BODY: List removal information
  -1.5 BAYES_01               BODY: Bayesian spam probability is 1 to 10%
                              [score: 0.0562]
   0.7 DATE_IN_PAST_12_24     Date: is 12 to 24 hours before Received: date
   2.2 MIME_HEADER_CTYPE_ONLY 'Content-Type' found without required MIME headers
   2.5 INVALID_MSGID          Message-Id is not valid, according to RFC 2822
These extra mail headers may help to explain why a message was or was not tagged as spam and may help you to decide what level of spam filtering you may need.
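The hits= value in the spam example is the sum of the pts column in this report: 0.2 + 1.8 + 0.4 - 1.5 + 0.7 + 2.2 + 2.5 = 6.3. The short Python sketch below reproduces that total from the report text. It assumes the table layout shown above: a dashed separator line, then one rule per row, with any wrapped description on a line of its own.

```python
REPORT = """\
   pts rule name              description
  ---- ---------------------- --------------------------------------------
   0.2 NO_REAL_NAME           From: does not include a real name
   1.8 DRASTIC_REDUCED        BODY: Drastically Reduced
   0.4 REMOVE_SUBJ            BODY: List removal information
  -1.5 BAYES_01               BODY: Bayesian spam probability is 1 to 10%
                              [score: 0.0562]
   0.7 DATE_IN_PAST_12_24     Date: is 12 to 24 hours before Received: date
   2.2 MIME_HEADER_CTYPE_ONLY 'Content-Type' found without required MIME headers
   2.5 INVALID_MSGID          Message-Id is not valid, according to RFC 2822
"""


def total_points(report: str) -> float:
    """Add up the pts column of a SpamAssassin report table."""
    in_table = False
    total = 0.0
    for line in report.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("----"):
            in_table = True  # rule rows start after the dashed separator
            continue
        if not in_table:
            continue
        try:
            total += float(fields[0])
        except ValueError:
            pass  # wrapped description line such as "[score: 0.0562]"
    return round(total, 1)  # round away floating-point noise


print(total_points(REPORT))  # 6.3
```

Starting only after the separator matters: the content preview above the table contains lines such as "2000 MILLIONAIRE", which would otherwise be read as points.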

Following is an explanation of some of the tags in the spam mail headers.

hits= The spam ranking for the message. When a message is filtered, it is checked against a set of spam tests; each test it matches adds points to (or, for tests such as BAYES_01, subtracts points from) its numerical spam ranking.
required= Messages must have a ranking of at least 5.0 to be considered spam. This is a system-wide configuration. A message that meets this minimum will not necessarily be stored in your spam filter folder: whether it is stored in your Inbox or your spam folder depends on the filtering level that you have chosen and on the X-Spam-Level of the message.
tests= The list of spam tests that the message matched. A full list of tests performed by SpamAssassin is available.
X-Spam-Level: The overall spam ranking of the message as a whole number, displayed as a series of asterisks (one per full point, none for a ranking below 1). A message with hits=7.5, for example, will have X-Spam-Level: *******. It is this value that determines whether a message will be stored in your spam folder. For example, if you have chosen low filtering, a message must have at least 7 asterisks to be stored in your spam filter folder. Even though a message with a 5.0 spam ranking is considered spam, under low filtering it will only be stored in your spam filter folder if it has 7 asterisks or more.
  • low filtering - 7 asterisks or more
  • medium filtering - 6 asterisks or more
  • high filtering - 5 asterisks or more
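The thresholds above can be expressed as a small Python sketch. The level names and function names are hypothetical; the asterisk counts are the ones listed above, and "very high" is left out because its threshold is not given here. The asterisk count follows the X-Spam-Level description: the whole-number part of hits=, with none for a negative ranking, as in the not-spam example.

```python
from typing import Optional

# Minimum X-Spam-Level (number of asterisks) per filtering level, from the list above.
# "very high" is omitted because its threshold is not stated in this document.
MIN_STARS = {"low": 7, "medium": 6, "high": 5}


def spam_level(hits: float) -> str:
    """X-Spam-Level string: one asterisk per whole point of hits, none if negative."""
    return "*" * max(0, int(hits))


def destination(hits: float, level: Optional[str]) -> str:
    """Folder a message lands in; level None stands for "No filtering"."""
    if level is None or level == "tag only":
        return "Inbox"  # tag-only messages are scored but stay in the Inbox
    return "caught-spam" if len(spam_level(hits)) >= MIN_STARS[level] else "Inbox"
```

With hits=6.3 from the spam example, destination(6.3, "low") returns "Inbox" while destination(6.3, "medium") returns "caught-spam": the message counts as spam under the 5.0 system-wide rule either way, but only medium or stricter filtering moves it.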

Interaction with existing .procmailrc file

To implement spam filtering with SpamAssassin, the following statement is written to the .procmailrc file in your home directory:

INCLUDERC=/usr/local/etc/filtering/spamfilter.level
where level is the filtering level that you have chosen. This INCLUDERC= statement implicitly calls SpamAssassin to filter your mail.

If you already have a .procmailrc file with existing Procmail recipes, the system writes the above statement at the top of the file so that spam filtering is applied to all mail before any other filtering rule. If you edit your .procmailrc file to change this behavior, please be aware that any time you use the Spam Filtering menu, the system will put this INCLUDERC= statement back at the top of your .procmailrc file, again making spam filtering the first Procmail action.
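What the system does to your .procmailrc can be pictured as a simple text transformation: drop any existing spam-filter INCLUDERC= line and put a fresh one first. This is an illustration only, not the ISC script; the function name is hypothetical, and it assumes the level suffix is a plain word such as "low", which this page does not confirm.

```python
INCLUDE_PREFIX = "INCLUDERC=/usr/local/etc/filtering/spamfilter."


def place_spam_include(procmailrc: str, level: str) -> str:
    """Return .procmailrc text with the spam-filter INCLUDERC= line moved to the top."""
    kept = [line for line in procmailrc.splitlines()
            if not line.strip().startswith(INCLUDE_PREFIX)]
    return "\n".join([INCLUDE_PREFIX + level] + kept) + "\n"
```

Running the transformation twice gives the same result as running it once, which matches the behavior described above: however the file is edited, the next visit to the Spam Filtering menu leaves exactly one INCLUDERC= line, at the top.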

Allow/Disallow lists

You can also filter your messages by using Allow/Disallow lists. These lists contain patterns that may appear in your messages, and the system looks for these patterns before applying any other spam filtering. If a message matches any of your Allow/Disallow lists, an additional header is added to the incoming message. As with spam filtering, you can opt to move messages that match these patterns into your spam filter folder, or you can deliver them directly to your Inbox so that your client's spam filtering mechanisms can further manage them. If an incoming message doesn't match any of your Allow/Disallow lists, it is passed to SpamAssassin, where it is evaluated and scored. What happens to the message at that point depends on your opt-in selection. Please see our diagram on the interaction between Allow/Disallow lists and spam filtering.

Allow/Disallow lists will signal the system to look for these patterns in areas that you specify:

  • Sender domain - the sending host, e.g., upenn.edu
  • Sender - the email address of the sender, e.g., joeuser@pobox.upenn.edu
  • Subject - the subject of the message, e.g., Pharm
You can opt to "Allow" or "Disallow" messages that match these patterns.

If you add "aol.com" to your "Disallow these entire sender domains" list, all mail from any user at "aol.com" will be automatically categorized as spam. If you "disallow" the sender joeuser@pobox.upenn.edu, mail from this user will be automatically categorized as spam. If you add the term "Pharm" to your "Disallow these subjects" list, all mail that has a subject that contains "Pharm" will be categorized as spam. A message with the subject "Pharmaceuticals" or "Pharmacy" or "New PHARMACY" will be marked as spam.

Conversely, if you add a pattern to your "Allow" lists, messages that match it override all spam filtering and are accepted for delivery.

Allow patterns are exceptions to all spam filtering and even override any disallow patterns. If you have chosen to disallow all mail sent from "aol.com" but do want to receive mail sent from "someuser@aol.com", you can add "someuser@aol.com" to your "Allow these senders" list.
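The order of evaluation described above (allow patterns first, then disallow patterns, then SpamAssassin) can be sketched in Python. PatternList and pre_filter are hypothetical names, and the matching details are assumptions: subjects match as case-insensitive substrings, as the "Pharm" example implies; senders must match exactly, ignoring case; and a domain entry is assumed to cover its subdomains as well.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PatternList:
    """One Allow or Disallow list (hypothetical structure for illustration)."""
    domains: List[str] = field(default_factory=list)
    senders: List[str] = field(default_factory=list)
    subjects: List[str] = field(default_factory=list)

    def matches(self, sender: str, subject: str) -> bool:
        sender = sender.lower()
        subject = subject.lower()
        domain = sender.rsplit("@", 1)[-1]
        # Assumption: a domain entry also covers its subdomains.
        if any(domain == d.lower() or domain.endswith("." + d.lower())
               for d in self.domains):
            return True
        if any(sender == s.lower() for s in self.senders):
            return True
        # Subjects match as case-insensitive substrings ("Pharm" matches "New PHARMACY").
        return any(p.lower() in subject for p in self.subjects)


def pre_filter(sender: str, subject: str,
               allow: PatternList, disallow: PatternList) -> str:
    """Return "allow", "disallow", or "spamassassin" when no list matches."""
    if allow.matches(sender, subject):
        return "allow"  # allow patterns win, even over disallow patterns
    if disallow.matches(sender, subject):
        return "disallow"
    return "spamassassin"
```

With aol.com and "Pharm" disallowed and someuser@aol.com allowed, mail from someuser@aol.com is allowed, mail from any other aol.com sender is disallowed, and a message from elsewhere with the subject "Lunch" falls through to SpamAssassin.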


Information Systems and Computing
University of Pennsylvania