
Comment Spam & Literature

I finally got fed up with manually monitoring the spam messages, but I didn't want to turn off anonymous comments. I figured captcha was the answer, but what was the easiest way?

photo.net, the awesome photography site built and run by Philip Greenspun, is built on a similar platform to this site, and I knew it had captcha. Philip is true to his word on the importance of open source software, and the API and code of his site are publicly available. I discovered that photo.net uses a free service called reCAPTCHA - and it's brilliant! It gives you two words to type in: one that it knows the answer to (to check you're a human), and another, a curly word that the OCR system at Carnegie Mellon University couldn't figure out while scanning books for the public good. I assume they run the unknown word through a number of captcha queries and pick the most popular interpretation.

Now that's really crowd-sourcing!
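
My guess at that voting step could be sketched in Perl like this. It is only an illustration of "pick the most popular interpretation": the function name most_popular_reading and the sample answers are made up, and I don't know how reCAPTCHA actually resolves its unknown words.

```perl
use strict;
use warnings;

# Hypothetical helper: given the answers typed by many users for one
# unknown word, return the answer that was typed most often.
sub most_popular_reading {
    my @answers = @_;
    my %count;
    $count{ lc $_ }++ for @answers;    # tally case-insensitively

    # Sort by descending count, then alphabetically so ties are stable.
    my ($winner) = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
    return $winner;
}

# Made-up sample data, not real reCAPTCHA output.
print most_popular_reading(qw(curly curly cur1y Curly early)), "\n";
```

A real service would presumably also want a minimum number of agreeing answers before trusting the result, but that is an assumption on my part.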

07:52 AM, 26 Jul 2008 by Mark Aufflick

TWiki on FastCGI

I couldn't find much discussion of TWiki on FCGI. There is a beta project to make a standalone TWiki daemon which can also be run under FCGI, but I had already installed TWiki 4.2.0, so I was reluctant to reinstall from a different branch.

I only really needed to speed up the view cgi, so it shouldn't be too hard surely? I already had mod_fcgid installed on my Apache2 server.

It turned out to be very easy, and it seems to be working fine so far. Here is my diff for twiki/bin/view:

--- view~       2008-01-22 14:18:52.000000000 +1100
+++ view        2008-07-20 18:40:33.000000000 +1000
@@ -27,6 +27,14 @@
     require 'setlib.cfg';
 }

+use FCGI;
 use TWiki::UI;
 use TWiki::UI::View;
-TWiki::UI::run( \&TWiki::UI::View::view, view => 1 );
+
+my $request = FCGI::Request;
+while ($request->Accept >= 0) {
+  eval {TWiki::UI::run( \&TWiki::UI::View::view, view => 1 );};
+  warn $@ if $@;
+  $request->Flush;
+  $request->Finish;
+}

And I added the following to the bin <Directory> section in my apache config:

<FilesMatch "^view$">
    SetHandler fcgid-script
</FilesMatch>

This seemed to work fine, but searches failed to spawn grep correctly. I think diffs would have also failed to spawn rcs. So I switched over to the pure perl versions by making the following settings in LocalSite.cfg:

$TWiki::cfg{StoreImpl} = 'RcsLite';
$TWiki::cfg{RCS}{SearchAlgorithm} = 'TWiki::Store::SearchAlgorithms::PurePerl';

Working a treat so far - memory use seems OK. It rose from about 14000k to about 15200k, and then hovered around that level indefinitely. I'll let you know if I see any memory leaks or weird issues.

Update: I guess the pure perl search implementation isn't well used. It threw up a taint error when I tried to use it. No matter, the fix was as simple as replacing a horrible piece of string eval:

--- lib/TWiki/Store/SearchAlgorithms/PurePerl.pm~       2008-01-22 14:18:55.000000000 +1100
+++ lib/TWiki/Store/SearchAlgorithms/PurePerl.pm        2008-07-20 19:32:58.000000000 +1000
@@ -46,9 +46,14 @@
     # Convert GNU grep \< \> syntax to \b
     $searchString =~ s/(?<!\\)\\[<>]/\\b/g;
     $searchString =~ s/^(.*)$/\\b$1\\b/go if $options->{'wordboundaries'};
-    my $match_code = "return \$_[0] =~ m/$searchString/o";
-    $match_code .= 'i' unless ($options->{casesensitive});
-    my $doMatch = eval "sub { $match_code }";
+
+    my $doMatch;
+    if ($options->{casesensitive}) {
+      $doMatch = sub { $_[0] =~ m/$searchString/o };
+    } else {
+      $doMatch = sub { $_[0] =~ m/$searchString/oi };
+    }
+
   FILE:
     foreach my $file ( @$topics ) {
         next unless open(FILE, "<$sDir/$file.txt");

I'll have to track down how to submit TWiki bugs...
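
For what it's worth, the same idea can also be written with a precompiled qr// pattern. This is a minimal sketch, assuming a standalone helper (make_matcher is a hypothetical name, not part of TWiki) that takes the search string and the casesensitive option. It also drops the /o flag, which tells Perl to compile the interpolated pattern only once; in a long-lived FastCGI process that is worth being wary of, since a pattern pinned on the first request could plausibly be reused for later ones.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of TWiki): build a matcher closure from a
# search string without string eval and without the /o flag.
sub make_matcher {
    my ($searchString, $options) = @_;

    # qr// compiles the pattern on every call to make_matcher, so each
    # closure carries its own compiled regex.
    my $re = $options->{casesensitive}
        ? qr/$searchString/
        : qr/$searchString/i;

    return sub { $_[0] =~ $re };
}

# Example usage with a made-up search string.
my $doMatch = make_matcher('twiki\b', { casesensitive => 0 });
print $doMatch->('Running TWiki on FastCGI') ? "match\n" : "no match\n";
```

I haven't tried this variant inside TWiki itself, so treat it as a starting point rather than a tested patch.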

Update 2: Another search issue - in the persisted view, the search page never gets re-rendered (i.e. after making one successful search, all future searches appear to have identical results).

I didn't have time to find out whether that was a problem in the view code or the pure Perl search, but it was easy enough to make sure a new cgi was spawned per search request by adding the following at the end of the VirtualHost:

<LocationMatch "WebSearch">
    SetHandler cgi-script
</LocationMatch>

Conveniently the FCGI script works fine as a regular one-shot cgi, and since LocationMatch is processed after FilesMatch by Apache, this overrides the fcgid handler setting.

04:51 AM, 20 Jul 2008 by Mark Aufflick
