Sunday, March 04, 2012

NSIS: Windows installers through an EDSL

Summary: I've just released the NSIS library on Hackage. It's useful for building Windows installers, but is also a call-by-name embedded two-stage programming language.

I've just released a new library on Hackage, named NSIS. It's a library for building Windows installers (see an example at the top of the documentation), based on the NSIS (Nullsoft Scriptable Install System). The basic idea of NSIS is that you write a program which defines your installer - it can search for files, create registry keys, create groups of files etc. But, at its heart, it is a full programming language, merely optimised to the job of writing installers. The original NSIS system has an excellent backend, producing small but featureful installers which run well on a variety of Windows platforms. Unfortunately, the front-end programming language is probably best described as assembly code with gentle influences from scripting languages. My NSIS library defines an embedded Haskell language, of the style promoted by Lennart Augustsson, which can produce scripts to be fed into the original NSIS compiler.

Why might you want to use the original NSIS backend?

I've tried several installer generators over the past few years, and even written my own from scratch. Of all these, I prefer NSIS mainly for two reasons:


  • It's a full programming language. The installer is capable of expressing any program, which means that as you need to do custom things in the installer, you can. For example, you can prohibit installing into the Windows directory. Equally, you can calculate the 1000th prime number. You are never artificially limited by the installer.

  • The installers are slick and robust. As the NSIS documentation says, "Being a user's first experience with your product, a stable and reliable installer is an important component of successful software". NSIS consistently delivers a smooth end-user experience, provided you select the MUI2 (Modern User Interface 2) settings.



Why might you want to use my frontend?

There are many advantages to my frontend. If your script is simple it is likely to look relatively similar in either system. If your script is complex it could end up 100 lines shorter and far clearer in my system. However, there are three primary advantages.

MUI2 by default

If you are writing a simple NSIS installer, the biggest difference is that my system will use the MUI2 settings by default. As NSIS has evolved there are now three separate ways to define your user interface, using the Classic builtin commands, using MUI1 or using MUI2. Of these, MUI2 is by far the nicest and should always be used, but the Classic interface is built in by default, and MUI2 is defined on top using macros. To specify that you want to insert a particular page into the installer you can write either of:


page Directory # classic NSIS
!insertmacro MUI_PAGE_DIRECTORY # MUI2


However, with my front end, you simply write:


page Directory


Flow control

While NSIS installer scripts are very powerful, they aren't easy for script authors. Taking the example from earlier, let's warn before installing into the Windows directory or the System directory:


StrCmp $WINDIR $INSTDIR bad 0
StrCmp $SYSDIR $INSTDIR bad 0
Goto skip
bad:
MessageBox MBOK|MB_ICON_EXCLAMATION "Warning: installing into the Windows directory"
skip:


Shockingly, in 1995 someone wrote a scripting language with only goto, and hasn't added any flow control since then. In my frontend, you can write:


iff_ ("$INSTDIR" %== "$WINDIR" %|| "$INSTDIR" %== "$SYSDIR") $
alert "Warning: installing into the Windows directory"


All control flow can be accomplished with structured programming.

Variables

The original NSIS system has global mutable variables, 16 register variables and a stack - it directly mirrors assembly programming. In my system, variables are properly scoped and named. The difference for complicated functions is significant. As an example, let us define substring replacement in the original NSIS script:


Push "Hello World"
Push "Hello"
Push "Goodbye"
Call StrRep
Pop $R0
MessageBox MB_OK $R0

; Taken from http://nsis.sourceforge.net/Replace_Sub_String_(macro)
Function StrRep
;Written by dirtydingus 2003-02-20 04:30:09
Exch $R4 ; $R4 = Replacement String
Exch
Exch $R3 ; $R3 = String to replace (needle)
Exch 2
Exch $R1 ; $R1 = String to do replacement in (haystack)
Push $R2 ; Replaced haystack
Push $R5 ; Len (needle)
Push $R6 ; len (haystack)
Push $R7 ; Scratch reg
StrCpy $R2 ""
StrLen $R5 $R3
StrLen $R6 $R1
loop:
StrCpy $R7 $R1 $R5
StrCmp $R7 $R3 found
StrCpy $R7 $R1 1 ; - optimization can be removed if U know len needle=1
StrCpy $R2 "$R2$R7"
StrCpy $R1 $R1 $R6 1
StrCmp $R1 "" done loop
found:
StrCpy $R2 "$R2$R4"
StrCpy $R1 $R1 $R6 $R5
StrCmp $R1 "" done loop
done:
StrCpy $R3 $R2
Pop $R7
Pop $R6
Pop $R5
Pop $R2
Pop $R1
Pop $R4
Exch $R3
FunctionEnd


Alternatively, with my frontend, you can write:


alert $ strReplace "Hello" "Goodbye" "Hello World"

strReplace :: Exp String -> Exp String -> Exp String -> Exp String
strReplace from to str = do
from <- constant_ from
to <- constant_ to
rest <- mutable_ str
res <- mutable_ ""
while (rest %/= "") $ do
iff (from `strIsPrefixOf` rest)
(do
res @= res & to
rest @= strDrop (strLength from) rest)
(do
res @= res & strTake 1 rest
rest @= strDrop 1 rest)
res


The difference is immense - the first is hard to follow without plenty of paper. The second is a fairly obvious imperative algorithm for string replacement (and is included in the package for reuse). Note that even though we're using a functional host language, our embedded language is very much imperative.

What technologies went into the frontend?

I really enjoyed writing this installer frontend. I got to dust off many techniques from compiler design that I haven't used for a while. Below is a flavour of some of the techniques I used:


  • Phantom types - all expressions have a phantom type, indicating what type the expression returns. In truth, all NSIS types are strings, but the phantom types let us make conversions explicit and catch user errors.

  • Call-by-name - the programming language I've implemented is call-by-name, which seems to be a particularly convenient choice for embedded languages.

  • Optimisation - since the target is a programming language, I wrote eight optimisation passes, which I iterate over. The result is that the string replacement function above becomes 21 lines of compiled code, while the original hand-coded version is 32 lines. There's no real need for the optimisation pass (the code is rarely a bottleneck or a significant size cost), but writing optimisers is fun.

  • Generics - the front end and optimisation passes are heavily based on generic programming, in particular using Uniplate.

  • Two-stage programming - the first program runs and generates a second program that is then rerun. It is possible to do computation at either level, and most errors are only caught at one or other of those levels, not both.

  • Goto programming - while I provide concepts such as iff and while, the target language is exclusively goto based, which I translate down to.

  • Overloaded literals - I use the overloaded strings and numbers extensively. Haskell doesn't permit overloaded booleans, but if it did I'd use those too.



If any of these techniques are particularly interesting, please comment below, and I'll expand on that area in a future post.

2 comments:

Stephen Tetley said...

Hi Neil - that's cool. I 'd certainly be interested in how you deal with control flow, as I'm working on an EDSL with a similarly limited target language.

Neil Mitchell said...

Stephen: I'll see if I get time to post something - its quite an interesting topic.