Creating a New Spam/Junk Filter

The architecture of Movable Type’s spam detection and filtering framework was inspired by SpamAssassin, the now ubiquitous spam filtering system used for email. It works by chaining multiple filters together and then aggregating the scores returned by each filter. The aggregate score is then used to determine whether a comment is spam or ham. “Ham,” of course, refers to any comment that is not spam. Aren’t geeks funny?

Each filter can influence the aggregate score in either a negative (spammy) or positive (hammy) way by returning a number between -10 and 10. In other words, some plugins may focus on identifying ham rather than trying to detect spam.

Registering a spam or junk filter is done through the Movable Type registry like so:

name: Example Plugin for Movable Type
id: Example
description: This plugin is an example plugin for Movable Type.
version: 1.0
junk_filters:
    my_antispam:
        label: 'My AntiSpam'
        code: $Example::Example::Plugin::spam_score

Your junk filter will need to point to a handler through which each comment received will be processed. Now let’s define the handler itself.

As we all know, the “e” character is eeevil. So here is a plugin that detects any E’s in incoming feedback and gives a strongly spammy (negative) junk score to items that contain a lot of those monsters.

package Example::Plugin;
use strict;

sub spam_score {
    my ($obj) = @_;
    # count the number of E's in the comment
    my @es = $obj->all_text =~ m/(e)/gi;
    my $count = scalar @es;
    # generate your score
    my $score = (2 ** $count - 1);
    return MT::JunkFilter::ABSTAIN() if ($score <= 0);

    return (-$score, "Contained $count 'e' characters");
}
1;

In the above example there are two possible return values. The first is MT::JunkFilter::ABSTAIN(). Returning ABSTAIN causes this filter to be skipped and excluded from contributing to the overall score for the associated comment or TrackBack. The second is used when the handler elects to report a score. In that case the handler returns a list of two values: the score, and a text message that is recorded in the log to explain why the comment or TrackBack was scored the way it was. This message is visible to administrators within the Movable Type application, so that they can better understand why a comment was or was not flagged as spam and adjust their filters accordingly.

Computing the Aggregate Junk Score

Movable Type calculates the aggregate score by taking a simple average of the scores contributed by all the filters associated with a TrackBack or comment.

It is important to remember that returning a score of zero is not the same as abstaining from voting in the first place. To illustrate, consider the effect a zero has on an average. If a comment receives two scores, say 0 and 10, the resulting score will be 5. Had the first filter abstained instead, only the 10 would be averaged and the resulting score would be 10.
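
The difference between a zero and an abstention can be sketched in a few lines of Perl. This is an illustration of the averaging described above, not Movable Type's own code: the function name aggregate_score is hypothetical, undef stands in for a filter that abstained, and returning nothing when every filter abstains is an assumption of the sketch.

```perl
use strict;
use warnings;

# Illustration only: undef stands in for a filter that abstained.
sub aggregate_score {
    my @scores = grep { defined } @_;    # abstaining filters are dropped
    return unless @scores;               # every filter abstained: nothing to average
    my $total = 0;
    $total += $_ for @scores;
    return $total / @scores;             # simple average of the remaining scores
}

print aggregate_score(0, 10), "\n";        # a zero counts as a vote: prints 5
print aggregate_score(undef, 10), "\n";    # an abstention does not: prints 10
```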

Junk Handler Return Values

MT::JunkFilter::ABSTAIN() - returning this value will result in the current filter being skipped and excluded completely from contributing to the overall aggregate score of the associated comment or TrackBack.
MT::JunkFilter::HAM()
MT::JunkFilter::SPAM()
MT::JunkFilter::APPROVE()
MT::JunkFilter::JUNK()

Note: the maximum and minimum values for any junk score are 10 and -10 respectively. If your handler returns a score outside these limits, Movable Type will replace it with the nearest limit (10 or -10).
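
As a concrete case, the example handler above returns -15 for a comment containing four E's (2 ** 4 - 1 = 15), which falls outside the allowed range. The following sketch shows the limiting behaviour described in the note. The function name clamp_score is hypothetical and is not part of Movable Type's API.

```perl
use strict;
use warnings;

# Illustration only: limit a score to the documented range of -10 to 10.
sub clamp_score {
    my ($score) = @_;
    return  10 if $score >  10;
    return -10 if $score < -10;
    return $score;
}

print clamp_score(-15), "\n";    # below the minimum: prints -10
print clamp_score(3.5), "\n";    # already in range: prints 3.5
```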

Junk Thresholds

Within Movable Type users can adjust their junk “threshold.” One can think of a threshold in terms of “how spammy must a comment be before I force it into the junk folder?” By default the threshold is zero.

 Spam                                            Ham
 |----|----|----|----|----|----|----|----|----|----|
-10  -8   -6   -4   -2    0    2    4    6    8   10
                          ^
                          +-- threshold

By adjusting the threshold you can fine-tune and calibrate Movable Type to filter comments according to your needs. In the example above, any comment with a score to the left of the threshold will be regarded as spam, and anything to the right will be regarded as ham. Moving the threshold to the left means more comments will be rated as ham.
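
The effect of moving the threshold can be sketched as a simple comparison. The function name is_junk is hypothetical, and how Movable Type treats a score exactly equal to the threshold is not stated here, so the strict comparison below is an assumption.

```perl
use strict;
use warnings;

# Illustration only: scores to the left of (below) the threshold are junk.
# Treating a score equal to the threshold as ham is an assumption.
sub is_junk {
    my ($score, $threshold) = @_;
    return $score < $threshold ? 1 : 0;
}

# The same three scores judged against the default threshold (0)
# and against a threshold moved to the left (-4).
for my $threshold (0, -4) {
    my @junk = grep { is_junk($_, $threshold) } (-8, -2, 3);
    print "threshold $threshold: junk scores are @junk\n";
}
```

With the threshold at 0, both -8 and -2 are junked. With the threshold at -4, only -8 is.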
