Creating a New Spam/Junk Filter
The architecture of Movable Type’s spam detection and filtering framework was inspired by SpamAssassin, the now ubiquitous spam filtering system used for email. It works by chaining multiple filters together and then aggregating the scores that the individual filters return. The aggregate score is then used to determine whether a comment is spam or ham. “Ham,” of course, refers to any comment that is not spam (aren’t geeks funny?).
Each filter can influence the aggregate score in either a negative (spammy) or positive (hammy) way, by returning a number between -10 and 10. In other words, some plugins may focus on identifying ham, rather than trying to detect spam.
Registering a spam or junk filter is done through the Movable Type registry like so:
    name: Example Plugin for Movable Type
    id: Example
    description: This plugin is an example plugin for Movable Type.
    version: 1.0
    junk_filters:
        my_antispam:
            label: 'My AntiSpam'
            code: $Example::Example::Plugin::spam_score
Your junk filter will need to point to a handler through which each comment received will be processed. Now let’s define the handler itself.
As we all know, the “e” character is eeevil. So here is a plugin that counts the E’s in incoming feedback and places a high junk score on items that contain a lot of those monsters.
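    package Example::Plugin;
    use strict;
    use warnings;
    use MT::JunkFilter;

    sub spam_score {
        my ($obj) = @_;

        # Count the number of E's (either case) in the feedback text
        my @es    = $obj->all_text =~ m/(e)/gi;
        my $count = scalar @es;

        # Generate the score; it grows exponentially with the number of E's
        my $score = 2**$count - 1;

        # No E's at all: stay out of the vote entirely
        return MT::JunkFilter::ABSTAIN() if $score <= 0;

        # Negative scores are spammy, so negate before reporting
        return ( -$score, "Contained $count 'e' characters" );
    }

    1;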
In the above example two possible values are returned. The first is MT::JunkFilter::ABSTAIN(). Returning ABSTAIN results in this filter being skipped and excluded from contributing to the overall score for the associated comment or TrackBack. The second is returned when the handler elects to report a score for the comment or TrackBack. In that case, the handler returns a list containing two values: the score, and a text message that will be recorded in the log stating the reason the comment was scored the way it was. This message is made visible to administrators within the Movable Type application so that they can better understand why a comment was or was not flagged as spam, and adjust their filters accordingly.
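Because a filter may vouch for ham instead of hunting for spam, a handler can just as well report a positive score. The following is a minimal sketch, not part of Movable Type: the `ham_score` handler, its "thanks" rule, and the `Mock::Feedback` class are hypothetical, and `ABSTAIN` is defined locally as a stand-in for MT::JunkFilter::ABSTAIN() so that the sketch runs outside the application. Only the `all_text` method and the two-value return come from the example above.

```perl
use strict;
use warnings;

# Local stand-in for MT::JunkFilter::ABSTAIN() (assumption, so this runs without MT)
use constant ABSTAIN => 'ABSTAIN';

# Hypothetical ham-leaning handler: vouch for feedback that says "thanks"
sub ham_score {
    my ($obj) = @_;

    # Nothing to vouch for: stay out of the vote
    return ABSTAIN unless $obj->all_text =~ m/\bthanks?\b/i;

    # Positive score leans toward ham; the message explains why
    return ( 2, "Contained a polite 'thanks'" );
}

# Minimal mock exposing the all_text method the handler relies on
package Mock::Feedback;
sub new      { my ( $class, $text ) = @_; return bless { text => $text }, $class }
sub all_text { return $_[0]{text} }

package main;

my ( $score, $reason ) = ham_score( Mock::Feedback->new('Thanks for the post!') );
print "$score: $reason\n";
```

Inside a real plugin the handler would live in the plugin's package and be registered under junk_filters in the same way as spam_score.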
Computing the Aggregate Junk Score
Movable Type calculates the aggregate score by taking a simple average of all of the scores contributed by the filters associated with a TrackBack or comment.
It is important to remember that returning a score of zero is not the same as abstaining from voting in the first place. To help illustrate, consider the effect a zero can have on an average. If a comment has two scores to average, say 0 and 10, then the resulting score will be 5. Had the first filter abstained instead, only the 10 would count and the resulting score would be 10.
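The difference between a zero and an abstention can be sketched in a few lines. This is an illustration of the averaging described here, not Movable Type's actual code: `aggregate_score` is a hypothetical helper, `ABSTAIN` is a local stand-in for MT::JunkFilter::ABSTAIN(), and treating "no votes at all" as a neutral 0 is an assumption.

```perl
use strict;
use warnings;

# Local stand-in for MT::JunkFilter::ABSTAIN() (assumption, so this runs without MT)
use constant ABSTAIN => 'ABSTAIN';

# Average every score that was actually reported, skipping abstentions
sub aggregate_score {
    my @results = @_;
    my @scores  = grep { $_ ne ABSTAIN } @results;

    # Assumption: if every filter abstained, treat the result as neutral
    return 0 unless @scores;

    my $total = 0;
    $total += $_ for @scores;
    return $total / @scores;
}

print aggregate_score( 0, 10 ), "\n";          # the zero vote pulls the average to 5
print aggregate_score( ABSTAIN, 10 ), "\n";    # the abstention leaves it at 10
```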
Junk Handler Return Values
MT::JunkFilter::ABSTAIN()
- returning this value will result in the current filter being skipped and excluded completely from contributing to the overall aggregate score of the associated comment or TrackBack.
MT::JunkFilter::HAM()
MT::JunkFilter::SPAM()
MT::JunkFilter::APPROVE()
MT::JunkFilter::JUNK()
Note: the maximum and minimum values for any junk score are 10 and -10 respectively. If a handler returns a score outside these limits, Movable Type will clamp it to the nearest limit.
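A filter whose raw score can grow without bound, such as the exponential E counter earlier, may prefer to stay inside the range on its own. The sketch below assumes nothing about Movable Type's internals; `clamp_score` is a hypothetical helper a plugin author could add.

```perl
use strict;
use warnings;

# Hypothetical helper: keep a raw score inside the documented -10..10 range
sub clamp_score {
    my ($score) = @_;
    return 10  if $score > 10;
    return -10 if $score < -10;
    return $score;
}

# Five E's in the earlier example give 2**5 - 1 = 31, negated to -31
my $count = 5;
my $raw   = -( 2**$count - 1 );

print clamp_score($raw), "\n";    # -10
```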
Junk Thresholds
Within Movable Type users can adjust their junk “threshold.” One can think of a threshold in terms of “how spammy must a comment be before I force it into the junk folder?” By default the threshold is zero.
Spam                                             Ham
 |----|----|----|----|----|----|----|----|----|----|
-10  -8   -6   -4   -2    0    2    4    6    8   10
                          ^
                          +-- threshold
By adjusting the threshold you can fine-tune and calibrate Movable Type to filter comments according to your needs. In the example above, any comment with a score to the left of the threshold will be regarded as spam, and anything to the right will be ham. Moving the threshold to the left means that more comments will be rated as ham.
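The effect of moving the threshold can be sketched as a simple comparison. This is an illustration only: `classify` is a hypothetical helper, and how Movable Type treats a score that lands exactly on the threshold is not stated here, so counting it as ham is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: rate a score against a junk threshold (default 0)
# Assumption: a score exactly on the threshold counts as ham
sub classify {
    my ( $score, $threshold ) = @_;
    $threshold = 0 unless defined $threshold;
    return $score < $threshold ? 'spam' : 'ham';
}

print classify(-3), "\n";          # spam with the default threshold of 0
print classify( -3, -5 ), "\n";    # ham once the threshold moves left to -5
```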