July 2006 Update to “Which Wiki”

About this Document

Since I wrote the previous article, I have learned a lot more about wikis. Furthermore, wiki spam has become much more of a problem, and I’d like to detail some ways of fighting it. Without further ado, here it is.

Document Information

Written By:
Shlomi Fish
Finish Date:
29-July-2006
Last Updated:
29-July-2006

Licence

Creative Commons License

This document is Copyright by Shlomi Fish, 2006, and is available under the terms of the Creative Commons Attribution License (CC-by) 3.0 Unported (or at your option any later version of that licence).

For securing additional rights, please contact Shlomi Fish and see the explicit requirements that are spelt out for abiding by that licence.

The Article Itself

Updates for the Implementations

UseModWiki

One should note that UseModWiki has been unmaintained for a long time. However, it has been forked into Oddmuse, which was then heavily extended and improved. I have not tried it yet, but according to all reports, it should be better than UseModWiki. One should also note that the MediaWiki syntax is backwards compatible with the UseModWiki one.

PmWiki

As I discovered after the Perl-Begin PmWiki was spammed, even the navigation and other parts of its layout can be overridden by the user using wiki syntax. While this is a nice feature, if spammed, it may make the wiki temporarily unusable. Furthermore, I was able to write a Perl script to revert the old format of PmWiki to a previous state, in case of excessive spamming.

MediaWiki

I was told that in order to facilitate maintaining several instances of MediaWiki on the same host (under different URLs or domains), one can re-use the same LocalSettings.php file, only with a dispatch of some sort on the host and URL. I was able to achieve it pretty easily on the iglu.org.il host, and upgrading MediaWiki is thus much easier.
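The dispatch idea can be illustrated with a minimal sketch. In a real setup this logic would be written in PHP inside LocalSettings.php; the Python below only shows the shape of the lookup. The host names, path prefixes, setting keys and the `settings_for` helper are all hypothetical and do not correspond to actual MediaWiki configuration variables.

```python
# Settings shared by every wiki on the host (hypothetical keys).
SHARED_SETTINGS = {"language": "en", "extensions": ["SpamBlacklist"]}

# One row per wiki: (host, URL path prefix, per-wiki overrides).
# All names here are made up for illustration.
WIKIS = [
    ("wiki.example.org", "/perl/", {"sitename": "Perl Wiki", "db_name": "perlwiki"}),
    ("wiki.example.org", "/linux/", {"sitename": "Linux Wiki", "db_name": "linuxwiki"}),
    ("docs.example.net", "/", {"sitename": "Docs Wiki", "db_name": "docswiki"}),
]


def settings_for(host, path):
    """Return the merged settings for the wiki serving this host and path."""
    # Normalise the host: lower-case it and drop an optional ":port" suffix.
    host = host.lower().split(":", 1)[0]
    matches = [
        (prefix, overrides)
        for wiki_host, prefix, overrides in WIKIS
        if wiki_host == host and path.startswith(prefix)
    ]
    if not matches:
        raise LookupError("no wiki configured for %s%s" % (host, path))
    # Prefer the longest matching prefix so "/perl/" wins over "/".
    _, overrides = max(matches, key=lambda match: len(match[0]))
    settings = dict(SHARED_SETTINGS)
    settings.update(overrides)
    return settings
```

With a table like this, adding another wiki means adding one row, and the shared part of the configuration is upgraded in a single place.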

TWiki

TWiki had a 4.0.0 release (breaking the annoying flow of date-based releases) and is now at version 4.0.4. I don’t know whether its installation is still as lengthy as it was. The TWiki code quality is reportedly very bad, but that can be improved with a concentrated refactoring effort.

MoinMoin

As some people pointed out to me, it is possible that I misrepresented MoinMoin in the previous article and that it does have some advantages over other wikis. One advantage I noted was the ability to have a few versions of the same page in different human languages (as opposed to MediaWiki, which generally assumes one will install a different MediaWiki instance for each language).

One problem I see with MoinMoin is that it looks cheesy (at least to me). I believe the MoinMoin hackers should work on a better look for it.

Fighting Wiki Spam

An article on Newsforge.com covered ways to deal with wiki spam back in June 2005. While the article was pretty good, it had a few omissions. One of them was that a wiki administrator had better watch the wiki’s RSS or Atom feeds for any activity, so that if the wiki is spammed, it is detected immediately.
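Watching a feed can be partly automated. What follows is a minimal sketch, assuming the wiki publishes its recent changes as an Atom feed. It lists the entries that have not been seen before, so a cron job could mail them to the administrator. The function name `unseen_entries` and the sample feed are made up for illustration, and fetching the feed over HTTP is left out.

```python
import xml.etree.ElementTree as ET

# The standard Atom namespace, needed to look up elements by name.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# A made-up feed standing in for a wiki's "recent changes" feed.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Recent changes</title>
  <entry>
    <id>https://wiki.example.org/diff/101</id>
    <title>Main Page</title>
    <author><name>Alice</name></author>
  </entry>
  <entry>
    <id>https://wiki.example.org/diff/102</id>
    <title>Sandbox</title>
    <author><name>203.0.113.7</name></author>
  </entry>
</feed>"""


def unseen_entries(feed_xml, seen_ids):
    """Return (id, title, author) for Atom entries whose id is not in seen_ids."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for entry in root.findall("atom:entry", ATOM_NS):
        entry_id = entry.findtext("atom:id", default="", namespaces=ATOM_NS)
        if entry_id in seen_ids:
            continue
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        author = entry.findtext("atom:author/atom:name", default="", namespaces=ATOM_NS)
        fresh.append((entry_id, title, author))
    return fresh
```

A caller would keep the set of seen ids between runs (in a file, for example) and alert on whatever `unseen_entries` returns.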

From my experience with protecting MediaWiki installations from spam, the following techniques are effective:

  1. Monitoring the RSS/Atom Feeds.

  2. Installing and enabling the MediaWiki SpamBlacklist extension, and making sure the master list is kept up to date (using a small cron job). Afterwards, I found it necessary to also maintain my own private blacklist, with URLs that I’ve been spammed with (to prevent them from re-appearing). That’s because the Wikimedia blacklist has some latency.

  3. MediaWiki ships with a spam cleanup script that cleans all spam from a certain (exact) hostname. It is useful for reverting such changes.

  4. I noticed that requiring users to log in helps reduce the spam a lot. Some spammers don’t bother to log in, and we also had the case of a broken spamming script that kept spamming us with strings of digits instead of URLs.

  5. Captchas (= garbled images with text) are useful for reducing the amount of spam considerably but pose a large accessibility and usability problem, as sight-impaired people cannot see them, and other people end up finding them very annoying.

    There are also ways to overcome Captchas: either by employing programs, written by white-hat researchers, that can read and analyse them, or by spammers creating sites that present these Captchas to their human visitors upon entrance. So far I have heard of spammers doing this only to register email accounts on free email providers en masse, and not to spam wikis, but who knows.

  6. Another thing that can reduce spam a lot is requiring an email confirmation after the creation of the wiki account, and before one can edit pages. MediaWiki does not support it out of the box, but I was told it is easy to do, as it already has the ability to do such email confirmation.

    An email handshake is not an accessibility problem, but it may still be annoying.
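The private blacklist mentioned in item 2 can be sketched as follows. This is not the code of the SpamBlacklist extension; it is a minimal illustration that assumes the blacklist is a list of regular-expression fragments, one per line, with `#` starting a comment. The names `build_blacklist` and `blocked_urls` and the sample domains are hypothetical.

```python
import re

# A deliberately simple URL matcher: stops at whitespace and wiki link brackets.
URL_RE = re.compile(r'https?://[^\s<>"\[\]|]+', re.IGNORECASE)


def build_blacklist(lines):
    """Compile one combined regex from blacklist lines, or None if empty."""
    fragments = []
    for line in lines:
        # Everything after '#' is treated as a comment.
        fragment = line.split("#", 1)[0].strip()
        if fragment:
            fragments.append(fragment)
    if not fragments:
        return None
    combined = "|".join("(?:%s)" % fragment for fragment in fragments)
    return re.compile(combined, re.IGNORECASE)


def blocked_urls(text, blacklist):
    """Return the URLs in text that match the blacklist, in order of appearance."""
    if blacklist is None:
        return []
    return [url for url in URL_RE.findall(text) if blacklist.search(url)]
```

An edit would be rejected when `blocked_urls` returns a non-empty list for the submitted text. After each new spam run, the offending host is appended to the private list, which covers the gap until the shared list catches up.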

DocBook and HTML to Wiki Format and Back Again

The CPAN module HTML-WikiConverter allows one to convert HTML to the formats of most popular wikis. DocBook/XML and POD can generate HTML, which means they can be fed to it in turn.
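To give a feel for what such a conversion involves, here is a tiny sketch in Python. It is not how HTML-WikiConverter works internally and is not a substitute for it. It handles only headings, bold, italics, links and paragraphs, and emits MediaWiki-style markup. The class and function names are made up for illustration.

```python
from html.parser import HTMLParser


class TinyWikiConverter(HTMLParser):
    """Convert a very small subset of HTML to MediaWiki-style markup."""

    # Tags that map to the same marker on both sides of the text.
    WRAPPERS = {"b": "'''", "strong": "'''", "i": "''", "em": "''"}

    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in self.WRAPPERS:
            self.out.append(self.WRAPPERS[tag])
        elif tag == "h2":
            self.out.append("== ")
        elif tag == "a":
            href = dict(attrs).get("href") or ""
            self.out.append("[" + href + " ")

    def handle_endtag(self, tag):
        if tag in self.WRAPPERS:
            self.out.append(self.WRAPPERS[tag])
        elif tag == "h2":
            self.out.append(" ==\n")
        elif tag == "a":
            self.out.append("]")
        elif tag == "p":
            # A blank line separates paragraphs in wiki markup.
            self.out.append("\n\n")

    def handle_data(self, data):
        self.out.append(data)


def html_to_wiki(html):
    """Return wiki markup for the given HTML fragment."""
    converter = TinyWikiConverter()
    converter.feed(html)
    converter.close()
    return "".join(converter.out).strip()
```

Unknown tags are simply dropped while their text is kept, which is a simplification a real converter would not make for tables, lists or nested markup.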

MediaWiki has a set of scripts named wiki2xml, which convert the MediaWiki format to XML and from that to DocBook and many other formats. Finally, there is a Google Summer of Code project for working on a MoinMoin to-and-from DocBook/XML conversion.