Writing a perl read-eval-print loop (REPL) - part 1
=== update, April 25
For the impatient, part 2 and part 3 are already out and I'm aiming to publish a part per week until I run out of ideas and change to a different topic.
=== end update, original post follows
So, I need to sort out my personal e-mail - I've left it alone for a few weeks and it's accumulated >10k messages. Forwarding it to gmail sort of works, but I'm a die-hard mutt user. I also prefer doing mail access over IMAP, so things like procmail aren't spectacularly useful. Plus I find procmail even uglier than the worst perl I've ever seen, so ... no.
Which means I need to script mail classification and filtering over IMAP. Which means I need an easy way to experiment with the various CPAN IMAP modules without repeatedly fetching the header list. Which means a repl - read-eval-print loop, basically an interactive shell for $language_of_choice, ideally, so I can prat around interactively and make my mistakes in an environment where it's not going to screw me entirely when I get it wrong.
Now, ok, there are two already on CPAN. Great. Except Shell::Perl uses package variables to persist data between lines (think namespaced globals) and App::REPL is somewhat baroque and really, really wants to use bright colours everywhere. And I'm an old fvwm2-loving curmudgeon who really hates colourisation. Plus I really want to make something that's nice and easy to extend, which means I want to use the meta-object-orientation goodness of Moose and the runtime plugin facilities of the MooseX::Object::Pluggable role. Soo, sod it. I'll write a new one, and explain what I'm doing and why as I go along as an attempt to justify the level to which this is now yak shaving.
Name first. Easy. Devel::REPL, because (a) it isn't taken and (b) a REPL to me is very very much a development tool, so it's a reasonably sane namespace. The actual script is going to be called re.pl, mostly because I can and because it amuses me. Last tools to mention before I start - Term::ReadLine, which will do the heavy lifting of handling readline capabilities, and namespace::clean which will let me clear out any helper functions I import from my classes so I can inherit methods of the same name without breaking anything.
First, set up the dist directory and open the module file -
cain$ mkdir Devel-REPL
cain$ cd Devel-REPL
cain$ mkdir -p lib/Devel
cain$ vi lib/Devel/REPL.pm
Declare the package (class) name and load the tools I need -
package Devel::REPL;
use Term::ReadLine;
use Moose;
use namespace::clean;
Note that I don't need to explicitly ask for 'strict' and 'warnings' as is normal at the top of a perl file - Moose does this automatically. namespace::clean comes last because it examines the package's namespace at the point it's use'd to figure out what to clean out afterwards - so far, just the stuff that came from Moose but there could easily be more later. I don't need to declare a base class because I get the standard Moose::Object but I do need to load the Pluggable role to get the load_plugin goodness -
with 'MooseX::Object::Pluggable';
Now, according to the Term::ReadLine synopsis, which handily is an extremely primitive REPL in and of itself, I'm going to need at least a term object, a prompt string and an output filehandle, so let's declare those as attributes -
has 'term' => (
  is => 'rw', required => 1,
  default => sub { Term::ReadLine->new('Perl REPL') }
);

has 'prompt' => (
  is => 'rw', required => 1,
  default => sub { '$ ' }
);

has 'out_fh' => (
  is => 'rw', required => 1, lazy => 1,
  default => sub { shift->term->OUT || \*STDOUT; }
);
The 'rw' means I'll get a combined getter/setter accessor for each of these, required means the constructor insists on a value for each (which the defaults always supply, so I can't accidentally end up with an object that's missing one), and I've made the 'out_fh' attribute lazy so that it can rely on being defaulted -after- the object's constructed so the call to 'term' will work. I could have set restrictions on what types the values provided to these attributes are by providing the 'isa' option to the has calls but I can't see any advantage to it right now and I might want to pass something odd in for interesting purposes later.
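For the curious, here's a minimal sketch of what the 'isa' option and a lazy default look like in practice. It assumes Moose is installed from CPAN; the class name My::Sketch and the 'banner' attribute are made up purely for illustration and aren't part of Devel::REPL.

```perl
package My::Sketch;    # hypothetical class, not part of Devel::REPL
use Moose;

# 'isa' restricts what may be stored; Str is one of Moose's built-in types
has 'prompt' => (
  is => 'rw', isa => 'Str', required => 1,
  default => sub { '$ ' }
);

# lazy: the default sub only runs the first time banner is read,
# so it can safely look at other attributes on the finished object
has 'banner' => (
  is => 'rw', lazy => 1,
  default => sub { 'prompt is ' . shift->prompt }
);

package main;

my $obj = My::Sketch->new;
$obj->prompt('re.pl> ');     # changed before banner is first read
my $banner = $obj->banner;   # the lazy default runs here
print "$banner\n";

# an array reference is not a Str, so the constructor throws
my $ok = eval { My::Sketch->new(prompt => [1, 2]); 1 };
print $ok ? "accepted\n" : "rejected by the Str constraint\n";
```

The banner picks up the changed prompt precisely because nothing computed it until it was asked for, which is the same reason out_fh can lean on term.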
Next step is to create an initial runloop that calls on read, execute and print steps (why execute and not eval? we'll get to that in part 2 :) -
sub run {
  my ($self) = @_;
  while ($self->run_once) {
    # keep looping
  }
}

sub run_once {
  my ($self) = @_;
  my $line = $self->read;
  return unless defined($line); # undefined value == EOF
  my @ret = $self->execute($line);
  $self->print(@ret);
  return 1;
}
Separating out run and run_once may seem largely pointless at this stage, but later on we may want to hook some sort of action to happen before every step - say incrementing a counter in the prompt (or something more interesting once I think of it :). I imagine a bunch of you are probably muttering 'yagni yagni yagni' under your breath, so in turn I'd like those of you who -are- to imagine me sticking my tongue out at you. Ok, we done now? Good.
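As a rough sketch of the sort of hook I mean, here's Moose's 'before' method modifier wrapped round a cut-down run_once. It assumes Moose is installed; My::Loop, its canned 'lines' queue and the 'steps' counter are hypothetical stand-ins so the example can run without a terminal.

```perl
package My::Loop;    # hypothetical stand-in for the REPL class
use Moose;

has 'lines' => (is => 'ro', default => sub { [] });  # canned input, no terminal
has 'steps' => (is => 'rw', default => 0);

sub run {
  my ($self) = @_;
  while ($self->run_once) {
    # keep looping
  }
}

sub run_once {
  my ($self) = @_;
  my $line = shift @{ $self->lines };
  return unless defined($line); # queue empty == EOF
  return 1;
}

# the hook: called before every run_once, including the last one that hits EOF
before 'run_once' => sub {
  my ($self) = @_;
  $self->steps($self->steps + 1);
};

package main;

my $loop = My::Loop->new(lines => ['2 + 4', '(1 .. 3)']);
$loop->run;
print "run_once was called ", $loop->steps, " times\n";
```

Two lines of input mean three calls in total, since the loop only finds out it's finished by calling run_once once more. If everything lived inside run there'd be nothing per-step to hang the modifier on.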
run_once itself does, pretty much literally, read then execute then print. The only wrinkle is the return if $line is undefined; traditionally perl filehandles of any sort return the special value undef to indicate EOF, since '' or '0' both evaluate to false but are perfectly valid lines to read (even if they make no sense to the app reading them), and Term::ReadLine behaves just the same. Then at the end if we got that far, we return 1 to indicate success to run() so execution continues.
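To see why the check has to be defined() rather than plain truth, here's a small core-perl sketch. The make_reader helper is hypothetical; it just hands back chomped lines from a string and undef at EOF, which is the same contract described above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a sub that yields one chomped line per call
# and undef once the input runs out.
sub make_reader {
  my ($text) = @_;
  open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
  return sub {
    my $line = <$fh>;
    return undef unless defined($line);
    chomp $line;
    return $line;
  };
}

my $input = "first\n\n0\nlast\n";   # contains an empty line and a '0' line

# testing for truth: gives up at the empty line
my $next = make_reader($input);
my $by_truth = 0;
while (my $line = $next->()) { $by_truth++ }

# testing for definedness: only stops at real EOF
$next = make_reader($input);
my $by_defined = 0;
while (defined(my $line = $next->())) { $by_defined++ }

print "truth test saw $by_truth line(s), defined test saw $by_defined\n";
```

The truth test bails out after the first line because the empty string is false, whereas the defined test sees all four.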
Of course, we still haven't actually defined the read, execute and print steps, so let's do that now.
sub read {
  my ($self) = @_;
  return $self->term->readline($self->prompt);
}
Simple enough; term and prompt are both stock accessors so calling them with no arguments returns the value - to set we'd call $self->term($new_term) or similar. Moose will happily let you create separate get_term and set_term methods via the 'reader' and 'writer' options to has, but it's not usual and it's more typing so I'm not going to.
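If you do prefer the split style, a sketch of the 'reader' and 'writer' options might look like this. It assumes Moose is installed; My::Term::Holder is a made-up class and the strings stand in for real terminal objects.

```perl
package My::Term::Holder;    # hypothetical class for illustration
use Moose;

# no 'is' here, so Moose only generates the two named methods
has 'term' => (
  reader => 'get_term',
  writer => 'set_term',
  default => sub { 'first terminal' },
);

package main;

my $holder = My::Term::Holder->new;
my $before = $holder->get_term;
$holder->set_term('second terminal');
my $after = $holder->get_term;
print "$before -> $after\n";

# the combined accessor was never created
my $has_plain = $holder->can('term') ? 1 : 0;
print "plain term accessor exists: $has_plain\n";
```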
sub execute {
  my ($self, $to_exec) = @_;
  my @ret = eval $to_exec;
  @ret = ("ERROR: $@") if $@;
  return @ret;
}
eval is used here in string mode to compile+execute at the same time - this currently means that all code is executing in the Devel::REPL namespace, which we don't really want but it'll do for a start. The return is made in list context in case the code's returning multiple values - it's unlikely to do any harm and having to put [] round code returning more than one thing would be -annoying-. A quick check of $@ afterwards for compile or execution errors and we're good to go.
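Here's the same logic as a standalone core-perl sketch, so the list-context and error-handling behaviour can be poked at without the rest of the class. The name execute_sketch is hypothetical; the body is the execute method above minus $self.

```perl
use strict;
use warnings;

# Same shape as the execute method, written as a plain function
sub execute_sketch {
  my ($to_exec) = @_;
  my @ret = eval $to_exec;       # string eval: compile and run in one go
  @ret = ("ERROR: $@") if $@;    # $@ is set on compile or runtime failure
  return @ret;
}

my @one    = execute_sketch('2 + 4');
my @list   = execute_sketch('(1 .. 3)');
my @died   = execute_sketch('die "boom\n"');
my @broken = execute_sketch('2 +');          # will not even compile

print "one:    @one\n";
print "list:   @list\n";
print "died:   @died";
print "broken: @broken";
```

The range comes back as three separate values rather than just the last one, which is the whole point of calling eval in list context, and both the runtime die and the syntax error end up as a single "ERROR: ..." string.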
sub print {
  my ($self, @ret) = @_;
  my $fh = $self->out_fh;
  print $fh "@ret";
}

1;
And now we can grab the appropriate filehandle (which will call the lazy default => sub the first time we ask for it) and print the output. Yay. The 1; at the end of the file indicates to perl that the .pm loaded ok. So, check syntax -
cain$ perl -c lib/Devel/REPL.pm
lib/Devel/REPL.pm syntax OK
and try running the code (-Ilib tells perl to search 'lib' in the local dir, the -M loads the module and -e provides the code to execute since we don't have a script yet) -
cain$ perl -Ilib -MDevel::REPL -e 'Devel::REPL->new->run;'
$ 2 + 4
6
$ (1 .. 3)
1 2 3
$
cain$
And it lives, it evaluates, and Ctrl-D sends EOF and brings me back to the shell prompt. Lovely. So, a quick svk add + commit later, the first working code is in the repository.
Next time round I'll sort out history and add the first plugin - one to provide a persistent lexical environment so we can carry variables between lines without polluting the Devel::REPL namespace or giving up the joys of compile-time typo checking from 'use strict'. I'll see you there.
=== update, April 25
History handling turned out to be more interesting than I first expected so it and the plugin approach got part 2 all to themselves. Lexical environment handling is now covered by part 3.
=== end update
Comments
(2) the latest Moose lets you close classes and generate inlined constructors which speed things up substantially. Maybe I'll cover "how to optimise a REPL for no good reason other than 'because I can'" for part 4 :)
Devel::REPL could have been implemented with Shell::Base like so:
package Devel::REPL;
use strict;
use base qw(Shell::Base);

sub do_default {
  my ($self, @cmd) = @_;
  chomp(my $output = eval "@cmd");
  return $output;
}

1;
And then use it like:
perl -MShell::Base=shell -e shell
And that will put you into a REPL.
Plus once I realised I was going to have to write code at all I decided I wanted to write something that had a full meta-model, which went Moosen all the way down. When I have a need for a command shell for stuff (which is probably going to happen reasonably soon, actually) I'll have another look over Shell::Base then.
Shell::Perl is meant to be a simple interactive way to run Perl commands. If you enter commands like:
pirl> $a = 3;
3
pirl> $b = $a + 1;
4
pirl> $c = $a + $b;
7
it is supposed to do the same as a simple script like
package Shell::Perl::sandbox;
no strict qw(vars subs);
$a = 3;
$b = $a + 1;
$c = $a + $b;
Undeclared variables end up like package variables in Shell::Perl::sandbox which is supposed to hold all these unqualified variables in the evaluated expressions. Plain simple - as (to me) the REPL should be too clever to disturb or annoy the user. (The idea to persist the environment be it lexical or a package is damn good and untouched by Shell::Perl by now.)
For the record, there is currently a branch of Shell::Perl using Shell::Base to simplify the programming.
http://iperl.googlecode.com/svn/branches/shell-base/
It has not been promoted to trunk yet because of Shell::Base issues running on Windows, Cygwin, etc.
But even my simple scripts are 100% strict and warnings clean except where -I- explicitly turn them off, and I want any code I type at the REPL to behave the exact same way - I want strict's protection against typo-ing variable names when I'm experimenting, there are generally enough variables already :)
So we're in violent agreement on "the REPL should be too clever to disturb or annoy the user", we just have different definitions of 'disturb' and 'annoy' - which is probably going to be the case to some extent between any two developers, and a large part of why I'm aiming to keep Devel::REPL itself as simple as possible and add functionality via plugins.
Since the environment of the Devel::REPL package itself has strict and warnings on that's the default for any eval that takes place and hence what happens under 'as simple as possible', but a wrapper round execute that threw 'no strict qw(vars subs);\n' onto the front of each line would be trivial so I don't see that as a disadvantage even for those who prefer to use package vars rather than lexicals.
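A rough core-perl sketch of that wrapper idea, with plain functions standing in for the execute method. The names execute_strict and execute_relaxed are hypothetical, and this isn't how any released plugin does it.

```perl
use strict;
use warnings;

# Stand-in for execute: the eval inherits 'use strict' from this file
sub execute_strict {
  my ($to_exec) = @_;
  my @ret = eval $to_exec;
  @ret = ("ERROR: $@") if $@;
  return @ret;
}

# The wrapper: prepend the pragma so each line runs without strict vars/subs
sub execute_relaxed {
  my ($to_exec) = @_;
  return execute_strict("no strict qw(vars subs);\n" . $to_exec);
}

my $line = '$total = 3; $total + 1';   # $total is never declared

my @strict  = execute_strict($line);
my @relaxed = execute_relaxed($line);

print "strict:  @strict";
print "relaxed: @relaxed\n";
```

The strict version refuses to compile the undeclared variable, while the relaxed one quietly creates a package variable and returns 4, so both camps get what they want from the same underlying method.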
thanks