Neil Mitchell's Blog (Haskell etc): January 2010

Saturday, January 23, 2010

Optimising HLint

HLint is a tool for suggesting improvements to Haskell code. Recently I've put some effort in to optimisation and HLint is now over 20 times faster. The standard benchmark (running HLint over the source code of HLint) has gone from 30 seconds to just over 1 second. This blog post is the story of that optimisation, the dead ends I encountered, and the steps I took. I've deliberately included reasonable chunks of code in this post, so interested readers can see the whole story - less technical readers should feel free to skip them. The results of the optimisation are all available on Hackage, as new versions of hlint, uniplate and derive.

Before I start, I'd like to share my five guiding principles of optimisation:

Tests - make sure you have tests, so you don't break anything while optimising.

Profile - if you don't profile first, you are almost certainly optimising the wrong thing.

Necessity - only start the optimisation process if something is running too slowly.

Benchmark - use a sensible and representative test case to benchmark, to make sure you optimise the right thing.

Optimising - to make a function faster, either call it less, or write it better.

Below are the steps in the optimisation, along with their speed improvement.

Special support for Rational in Uniplate.Data, 30s to 10s

HLint uses Uniplate extensively. HLint works over large abstract syntax trees, from the library haskell-src-exts (HSE), so a generics library is essential. There are two main variants of Uniplate - Data builds on the Scrap Your Boilerplate (SYB) instances, and Direct requires special instances. HLint uses the Data variant, as it requires no instances to be written.

One of the advantages of Uniplate is that it generally outperforms most generics libraries. In particular, the variant written on top of Data instances is often many times faster than using SYB directly. The reason for outperforming SYB is documented in my PhD thesis. The essential idea is that Uniplate builds a "hit table", a mapping noting which types can be contained within which other types - e.g. that there is potentially an Int inside Either String [Int], but there isn't an Int inside String. By consulting this mapping while traversing a value, Uniplate is able to skip large portions, leading to an efficiency improvement.

When computing the hit table it is necessary for Uniplate to create dummy values of each type, which it then traverses using SYB functions. To create dummy values Uniplate uses undefined, unfortunately given the definition data Foo = Foo !Int the value Foo undefined will be forced due to the strictness annotation, and the code will raise an error - as described in bug 243. Uniplate 1.2 had a special case for Rational, which is the only type with strict components contained within HSE. Uniplate 1.3 fixed this problem more generally by catching the exception and turning off the hit table facility on such types. Unfortunately this caused Uniplate 1.3 to turn off the hit table for HSE, causing HLint to run three times slower.

The fix was simple, and pragmatic, but not general. In Uniplate 1.4 I reinstated the special case for Rational, so now HLint makes use of the hit table, and goes three times faster. A more general solution would be to manufacture dummy values for certain types (it's usually an Int or Integer that is a strict component), or to create concrete dummy values using SYB. It's interesting to observe that if HLint used SYB as it's generics library, it would not be using the hit table trick, and would run three times slower.

Use Uniplate.Direct, 10s to 5s, reverted

Uniplate also provides a Direct version, which performs better, but requires instances to be written. In order to further improve the speed of HLint I decided to try the Direct version. The biggest hurdle to using the Direct version is that many instances need to be generated, in the case of HLint it required 677. The first step was to modify the Derive tool to generate these instances (which Derive 2.1 now does), and to write a small script to decide which instances were necessary. With these instances in place, the time dropped to 5 seconds.

Unfortunately, the downside was that compilation time skyrocketed, and the instances are very specific to a particular version of HSE. While these problems are not insurmountable, I did not consider the benefit to be worthwhile, so reverted the changes. It's worth pointing out that most users of Uniplate won't require so many instances to use the Direct versions, and that a program can be switched between Direct and Data versions without any code changes (just a simple import). I also considered the possibility of discovering which Uniplate instances dominated and using the Direct method only for those - but I did not investigate further.

Add and optimise eqExpShell, 10s to 8s

The next step was to profile. I compiled and ran HLint with the following options:


ghc --make -O1 -prof -auto-all -o hlint
hlint src +RTS -p

Looking at the profiling output, I saw that the function unify took up over 60% of the execution time:


unify :: NameMatch -> Exp -> Exp -> Maybe [(String,Exp)]
unify nm (Do xs) (Do ys) | length xs == length ys = concatZipWithM (unifyStmt nm) xs ys
unify nm (Lambda _ xs x) (Lambda _ ys y) | length xs == length ys = liftM2 (++) (unify nm x y) (concatZipWithM unifyPat xs ys)
unify nm x y | isParen x || isParen y = unify nm (fromParen x) (fromParen y)
unify nm (Var (fromNamed -> v)) y | isUnifyVar v = Just [(v,y)]
unify nm (Var x) (Var y) | nm x y = Just []
unify nm x y | ((==) `on` descend (const $ toNamed "_")) x y = concatZipWithM (unify nm) (children x) (children y)
unify nm _ _ = Nothing

The function unify is an essential part of the rule matching in HLint, and attempts to compare a rule to a target expression, and if successful returns the correct substitution. Each rule is compared to every expression in all files, which means that unify is called millions of times in a normal HLint run. Looking closely, my first suspicion was the second line from the bottom in the guard - the call to descend and (==). This line compares the outer shell of two expressions, ignoring any inner expressions. It first uses the Uniplate descend function to insert a dummy value as each subexpression, then compares for equality. To test my hypothesis that this method was indeed the culprit I extracted it to a separate function, and modified unify to call it:


eqExpShell :: Exp_ -> Exp_ -> Bool
eqExpShell = (==) `on` descend (const $ toNamed "_")

I reran the profiling, and now all the time was being spent in eqExpShell. My first thought was to expand out the function to not use Uniplate. I quickly rejected that idea - there are expressions within statements and patterns, and to follow all the intricacies of HSE would be fragile and verbose.

The first optimisation I tried was to replace toNamed "_", the dummy expression, with something simpler. The toNamed call expands to many constructors, so instead I used Do an [] (an is a dummy source location), which is the simplest expression HSE provides. This change had a noticeable speed improvement.


eqExpShell :: Exp_ -> Exp_ -> Bool
eqExpShell = (==) `on` descend (const $ Do an [])

My second thought was to add a quick test, so that if the outer constructors were not equal then the expensive test was not tried. Determining the outer constructor of a value can be done by calling show then only looking at the first word (assuming a sensible Show instance, which HSE has).


eqExpShell :: Exp_ -> Exp_ -> Bool
eqExpShell x y =
    ((==) `on` constr) x y &&
    ((==) `on` descend (const $ Do an [])) x y
    where constr = takeWhile (not . isSpace) . show

This change had a bigger speed improvement. I found that of the 1.5 million times eqExpShell was called, the quick equality test rejected 1 million cases.

My next thought was to try replacing constr with the SYB function toConstr. There was no noticeable performance impact, but the code is neater, and doesn't rely on the Show instance, so I stuck with it. After all these changes HLint was 2 seconds faster, but eqExpShell was still the biggest culprit on the profiling report.

Write eqExpShell entirely in SYB, 8s to 8s, reverted

My next thought was to rewrite eqExpShell entirely in SYB functions, not using the Eq instance at all. The advantages of this approach would be that I can simply disregard all subexpressions, I only walk the expression once, and I can skip source position annotations entirely. Starting from the geq function in SYB, I came up with:


data Box = forall a . Data a => Box a

eqExpShellSYB :: Exp_ -> Exp_ -> Bool
eqExpShellSYB = f
    where
        f :: (Data a, Data b) => a -> b -> Bool
        f x y = toConstr x == toConstr y &&
                and (zipWith g (gmapQ Box x) (gmapQ Box y))

        g (Box x) (Box y) = tx == typeAnn || tx == typeExp || f x y
            where tx = typeOf x

typeAnn = typeOf an
typeExp = typeOf ex

Unfortunately, this code takes exactly the same time as the previous version, despite being significantly more complex. My guess is that toConstr is not as fast as the Eq instance, and that this additional overhead negates all the other savings. I decided to revert back to the simpler version.

Call eqExpShell less, 8s to 4s

Having failed to optimise eqExpShell further, I then thought about how to call it less. I added a trace and found that of the 1.5 million calls, in 1.3 million times at least one of the constructors was an App. Application is very common in Haskell programs, so this is not particularly surprising. By looking back at the code for unify I found several other constructors were already handled, so I added a special case for App.


unify nm (App _ x1 x2) (App _ y1 y2) = liftM2 (++) (unify nm x1 y1) (unify nm x2 y2)

It is easy to show that if an App (or any specially handled constructor) is passed as either argument to eqExpShell then the result will be False, as if both shells had been equal a previous case would have matched. Taking advantage of this observation, I rewrote the line with the generic match as:


unify nm x y | isOther x && isOther y && eqExpShell x y = concatZipWithM (unify nm) (children x) (children y)
    where 
        isOther Do{} = False
        isOther Lambda{} = False
        isOther Var{} = False
        isOther App{} = False
        isOther _ = True

With this change the eqExpShell function was called substantially less, it disappeared from the profile, and the speed improved to 4 seconds.

Fix Uniplate bug, 4s to 1.3s

The next step was to rerun the profiling. However, the results were very confusing - almost 70% of the execution time was recorded to three CAF's, while I could see no obvious culprit. I reran the profiler with the -caf-all flag to more precisely report the location of CAF's, and was again confused - the optimiser had rearranged the functions to make -caf-all useless. I then reran the profiler with optimisation turned off using -O0 and looked again. This time the profiling clearly showed Uniplate being the source of the CAF's. The hit table that Uniplate creates is stored inside a CAF, so was an obvious candidate.

Turning back to Uniplate, I attempted to reproduce the bug outside HLint. I enhanced the benchmarking suite by adding a method to find all the String's inside very small HSE values. The standard Uniplate benchmarks are for testing the performance of running code, and I had neglected to check the creation of the hit table, assuming it to be negligible.


testN "Module" $ Module ssi Nothing [] [] []
testN "Decl" $ FunBind ssi []
testN "Exp" $ Do ssi []
testN "Pat" $ PWildCard ssi

testN :: Biplate a String => String -> a -> IO ()
testN msg x = do
    t <- timer $ evaluate $ length (universeBi x :: [String])
    putStrLn $ "HSE for " ++ msg ++ " takes " ++ dp2 t ++ "s"

My initial worry was that at 2 decimal places I was likely to see 0.00 for all values. However, that turned out not to be a problem! What I saw was:


HSE for Module takes 0.54
HSE for Decl takes 0.86
HSE for Exp takes 2.54
HSE for Pat takes 0.32

These results surprised me. In particular, the hit table from Exp to String is a subset of the Module one, so should always be faster. The computation of the hit table is reasonably complex, and I was unable to isolate the problem. The tricky part of the hit table is that it is necessary to take the fixed point of the transitive closure of reachability - I had tried to keep track of recursive types and reach a fixed point with the minimal number of recomputations. I clearly failed, and probably had an exponential aspect to the algorithm that under certain circumstances caused ridiculously bad behaviour.

Rather than try to diagnose the bug, I decided instead to rethink the approach, and simplify the design. In particular, the fixed point of the transitive closure is now written as:


fixEq trans (populate box)

Where populate finds the immediate children of a constructor, trans takes the transitive closure based on the current state, and fixEq takes the fixed point. Using this new simpler design I was also able to compute which types contained with types recursively and cache it, meaning that now computing the hit table for Module not only computes the hit table for Exp, but does so in a way that means the result can be reused when asking about Exp. After rewriting the code I reran the benchmark:


HSE for Module takes 0.03
HSE for Decl takes 0.00
HSE for Exp takes 0.00
HSE for Pat takes 0.00

I had hoped that the rewrite would fix the performance problems, and it did. I have not diagnosed the original performance bug, but at 0.03 seconds I was satisfied. I have now released Uniplate 1.5 with the revised hit table code. With this change, the time for HLint drops to 1.3 seconds and all the CAF's went away from the profile.

Sharing computation, 1.3s to 1.2s

At this point, I was happy to finish, but decided to profile just one last time. The top function in the list was matchIdea:


matchIdea :: NameMatch -> Setting -> Exp_ -> Maybe Exp_
matchIdea nm MatchExp{lhs=lhs,rhs=rhs,side=side} x = do
    u <- unify nm (fmap (const an) lhs) (fmap (const an) x)
    u <- check u
    guard $ checkSide side u
    let rhs2 = subst u rhs
    guard $ checkDot lhs rhs2
    return $ unqualify nm $ dotContract $ performEval rhs2

This function first strips both the rule (lhs) and the expression (x) of their source position information to ensure equality works correctly. However, both the rules and expressions are reused multiple times, so I moved the fmap calls backwards so each rule/expression is only ever stripped of source position information once. With this change the runtime was reduced to 1.2 seconds.

Final Results

After all that effort, I reran the profile results and got:


parseString                    HSE.All               16.2   17.7
matchIdea                      Hint.Match            14.4    1.6
eqExpShellSlow                 HSE.Eq                 9.2   11.4
listDecl                       Hint.List              6.1    4.1
lambdaHint                     Hint.Lambda            5.1    5.6
bracketHint                    Hint.Bracket           4.1    6.4
hasS                           Hint.Extensions        4.1    5.0
monadHint                      Hint.Monad             4.1    2.9
~=                             HSE.Match              4.1    2.5
isParen                        HSE.Util               3.1    0.0

Now the largest contributor to the HLint runtime is the parsing of Haskell files. There are no obvious candidates for easy optimisations, and the code runs sufficiently fast for my purposes.

Conclusions

There is still some scope for optimisation of HLint, but I leave that for future work. One possible avenue for exploration would be turning on selected packages of hints, to see which one takes the most time - profiling on a different measure.

In optimising HLint I've found two issues in Uniplate, the first of which I was aware of, and the second of which came as a total surprise. These optimisations to Uniplate will benefit everyone who uses it. I have achieved the goal of optimising HLint, simply by following the profile reports, and as a result HLint is now substantially faster than I had ever expected.

Footnote: I didn't actually profile first, as I knew that a performance regression was caused by the upgrade to Uniplate 1.3, so knew where to start looking. Generally, I would start with a profile.

Thursday, January 14, 2010

Better .ghci files

A few days ago I posted about my plan to use .ghci files for all my projects. I am now doing so in at least five projects, and it's working great. There were two disadvantages: 1) every command had to be squeezed on to a single line; 2) some names were introduced into the global namespace. Thanks to a hint from doliorules, about :{ :} I can eliminate these disadvantages.

Let's take the previous example from HLint's .ghci file:


let cmdHpc _ = return $ unlines [":!ghc --make -isrc -i. src/Main.hs -w -fhpc -odir .hpc -hidir .hpc -threaded -o .hpc/hlint-test",":!del hlint-test.tix",":!.hpc\\hlint-test --help",":!.hpc\\hlint-test --test",":!.hpc\\hlint-test src --report=.hpc\\hlint-test-report.html +RTS -N3",":!.hpc\\hlint-test data --report=.hpc\\hlint-test-report.html +RTS -N3",":!hpc.exe markup hlint-test.tix --destdir=.hpc",":!hpc.exe report hlint-test.tix",":!del hlint-test.tix",":!start .hpc\\hpc_index_fun.html"]
:def hpc cmdHpc

It work's, but it's ugly. However, it can be rewritten as:


:{
:def hpc const $ return $ unlines
    [":!ghc --make -isrc -i. src/Main.hs -w -fhpc -odir .hpc -hidir .hpc -threaded -o .hpc/hlint-test"
    ,":!del hlint-test.tix"
    ,":!.hpc\\hlint-test --help"
    ,":!.hpc\\hlint-test --test"
    ,":!.hpc\\hlint-test src --report=.hpc\\hlint-test-report.html +RTS -N3"
    ,":!.hpc\\hlint-test data --report=.hpc\\hlint-test-report.html +RTS -N3"
    ,":!hpc.exe markup hlint-test.tix --destdir=.hpc"
    ,":!hpc.exe report hlint-test.tix"
    ,":!del hlint-test.tix"
    ,":!start .hpc\\hpc_index_fun.html"]
:}

The :{ :} notation allows multi-line input in GHCi. GHCi also allows full expressions after a :def. Combined, we now have a much more readable .ghci file.

Saturday, January 09, 2010

Using .ghci files to run projects

I develop a reasonable number of different Haskell projects, and tend to switch between them regularly. I often come back to a project after a number of months and can't remember the basics - how to load it up, how to test it. Until yesterday, my technique was to add a ghci.bat file to load the project, and invoke the tests with either :main test or :main --test. For some projects I also had commands to run hpc, or perform profiling. Using .ghci files, I can do much better.

All my projects are now gaining .ghci files in the root directory. For example, the CmdArgs project has:


:set -w -fwarn-unused-binds -fwarn-unused-imports
:load Main

let cmdTest _ = return ":main test"
:def test cmdTest

The first line sets some additional warnings. I usually develop my projects in GHCi with lots of useful warnings turned on. I could include these warnings in the .cabal file, but I'd rather have them when developing, and not display them when other people are installing.

The second line loads the Main.hs file, which for CmdArgs is the where I've put the main tests.

The last two lines define a command :test which when invoked just runs :main test, which is how I run the test for CmdArgs.

To load GHCi with this configuration file I simply change to the directory, and type ghci. It automatically loads the right files, and provides a :test command to run the test suite.

I've also converted the HLint project to use a .ghci file. This time the .ghci file is slightly different, but the way I load/test my project is identical:


:set -fno-warn-overlapping-patterns -w -fwarn-unused-binds -fwarn-unused-imports
:set -isrc;.
:load Main

let cmdTest _ = return ":main --test"
:def test cmdTest

let cmdHpc _ = return $ unlines [":!ghc --make -isrc -i. src/Main.hs -w -fhpc -odir .hpc -hidir .hpc -threaded -o .hpc/hlint-test",":!del hlint-test.tix",":!.hpc\\hlint-test --help",":!.hpc\\hlint-test --test",":!.hpc\\hlint-test src --report=.hpc\\hlint-test-report.html +RTS -N3",":!.hpc\\hlint-test data --report=.hpc\\hlint-test-report.html +RTS -N3",":!hpc.exe markup hlint-test.tix --destdir=.hpc",":!hpc.exe report hlint-test.tix",":!del hlint-test.tix",":!start .hpc\\hpc_index_fun.html"]
:def hpc cmdHpc

The first section turns on/off the appropriate warnings, then sets the include path and loads the main module.

The second section defines a command named :test, which runs the tests.

The final section defines a command named :hpc which runs hpc and pops up a web browser with the result. Unfortunately GHC requires definitions entered in a .ghci file to be on one line, so the formatting isn't ideal, but it's just a list of commands to run at the shell.

Using a .ghci file for all my projects has a number of advantages:

I have a consistent interface for all my projects.

Typing :def at the GHCi prompt says which definitions are in scope, and thus which commands exist for this project.

I've eliminated the Windows specific .bat files.

The .ghci file mechanism is quite powerful - I've yet to explore it fully, but could imagine much more complex commands.

Update: For a slight improvement on this technique see this post.

Thanks to Gwern Branwen for submitting a .ghci file for running HLint, and starting my investigation of .ghci files.

Sunday, January 03, 2010

Explaining Haskell IO without Monads

This tutorial explains how to perform IO in Haskell, without attempting to give any understanding of monads. We start with the simplest example of IO, then build up to more complex examples. You can either read the tutorial to the end, or stop at the end of any section - each additional section will let you tackle new problems. We assume basic familiarity with Haskell, such as the material covered in chapters 1 to 6 of Programming in Haskell by Graham Hutton.

IO Functions

In this tutorial I use four standard IO functions:

readFile :: FilePath -> IO String -- read in a file

writeFile :: FilePath -> String -> IO () -- write out a file

getArgs :: IO [String] -- get the command line arguments, from the module System.Environment

putStrLn :: String -> IO () -- write out a string, followed by a new line, to the console

Simple IO

The simplest useful form of IO is to read a file, do something, then write out a file.


main :: IO ()
main = do
   src <- readFile "file.in"
   writeFile "file.out" (operate src)

operate :: String -> String
operate = ... -- your code here

This program gets the contents of file.in, runs the operate function on it, then writes the result to file.out. The main function contains all the IO operations, while operate is entirely pure. When writing operate you do not need to understand any details of IO. This pattern of IO was sufficient for my first two years of programming Haskell.

Action List

If the pattern described in Simple IO is insufficient, the next step is a list of actions. A main function can be written as:


main :: IO ()
main = do
    x1 <- expr1
    x2 <- expr2
    ...
    xN <- exprN
    return ()

The main function starts with do, then has a sequence of xI <- exprI statements, and ends with return (). Each statement has a pattern on the left of the arrow (often just a variable), and an expression on the right. If the expression is not of type IO, then you must write xI <- return (exprI). The return function takes a value, and wraps it in the IO type.

As a simple example we can write a program that gets the command line arguments, reads the file given by the first argument, operates on it, then writes out to the file given by the second argument:


main :: IO ()
main = do
    [arg1,arg2] <- getArgs
    src <- readFile arg1
    res <- return (operate src)
    _ <- writeFile arg2 res
    return ()

As before, operate is a pure function. The first line after the do uses a pattern match to extract the command line arguments. The second line reads the file specified by the first argument. The third line uses return to wrap a pure value. The fourth line provides no useful result, so we ignore it by writing _ <-.

Simplifying IO

The action list pattern is very rigid, and people usually simplify the code using the following three rules:

_ <- x can be rewritten as x.

If the penultimate line doesn't have a binding arrow (<-) and is of type IO (), then the return () can be removed.

x <- return y can be rewritten as let x = y (provided you don't reuse variable names).

With these rules we can rewrite our example as:


main :: IO ()
main = do
    [arg1,arg2] <- getArgs
    src <- readFile arg1
    let res = operate src
    writeFile arg2 res

Nested IO

So far only the main function has been of type IO, but we can create other IO functions, to wrap up common patterns. For example, we can write a utility function to print nice looking titles:


title :: String -> IO ()
title str = do
    putStrLn str
    putStrLn (replicate (length str) '-')
    putStrLn ""

We can use this title function multiple times within main:


main :: IO ()
main = do
    title "Hello"
    title "Goodbye"

Returning IO Values

The functions we've written so far have all been of type IO (), which lets us perform IO actions, but not give back interesting results. To give back the value x, we write return x as the final line of the do block. Unlike the imperative language return statement, this return must be on the final line.


readArgs :: IO (String,String)
readArgs = do
    xs <- getArgs
    let x1 = if length xs > 0 then xs !! 0 else "file.in"
    let x2 = if length xs > 1 then xs !! 1 else "file.out"
    return (x1,x2)

This function returns the first two command line arguments, or supplies default values if fewer arguments are given. We can now use this in the main program from before:


main :: IO ()
main = do
    (arg1,arg2) <- readArgs
    src <- readFile arg1
    let res = operate src
    writeFile arg2 res

Now, if less than two arguments are given, the program will use default file names instead of crashing.

Optional IO

So far we've only seen a static list of IO statements, executed in order. Using if, we can choose what IO to perform. For example, if the user enters no arguments we can tell them:


main :: IO ()
main = do
    xs <- getArgs
    if null xs then do
        putStrLn "You entered no arguments"
     else do
        putStrLn ("You entered " ++ show xs)

For optional IO you make the final statement of the do block an if, then under each branch continue the do. The only subtle point is that the else must be indented by one more space than the if. This caveat is widely considered to be a bug in the definition of Haskell, but for the moment, the extra space before the else is required.

Break Time

If you've gone from understanding no IO to this point in the tutorial, I suggest you take a break (a cookie is recommended). The IO presented above is all that imperative languages provide, and is a useful starting point. Just as functional programming provides much more powerful ways of working with functions by treating them as values, it also allows IO to be treated as values, which we explore in the rest of the tutorial.

Working with IO Values

The next stage is to work with IO as values. Until now, all IO statements have been executed immediately, but we can also create variables of type IO. Using our title function from above we can write:


main :: IO ()
main = do
    let x = title "Welcome"
    x
    x
    x

Instead of running the IO with x <-, we have placed the IO value in the variable x, without running it. The type of x is IO (), so we can now write x on a line to execute the action. By writing the x three times we perform the action three times.

Passing IO Arguments

We can also pass IO values as arguments to functions. In the previous example we ran the IO action three times, but how would we run it fifty times? We can write a function that takes an IO action, and a number, and runs the action that number of times:


replicateM_ :: Int -> IO () -> IO ()
replicateM_ n act = do
    if n == 0 then do
        return ()
     else do
        act
        replicateM_ (n-1) act

This definition makes use of optional IO to decide when to stop, and recursion to continue performing the IO. We can now rewrite the previous example as:


main :: IO ()
main = do
    let x = title "Welcome"
    replicateM_ 3 x

In an imperative language the replicateM_ function is built in as a for statement, but the flexibility of Haskell allows us to define new control flow statements - a very powerful feature. The replicateM_ function defined in Control.Monad is like ours, but more general, and can be used instead.

IO in Structures

We've seen IO values being passed as arguments, so it's natural that we can also put IO in structures such as lists and tuples. The function sequence_ takes a list of IO actions, and executes each action in turn:


sequence_ :: [IO ()] -> IO ()
sequence_ xs = do
    if null xs then do
        return ()
     else do
        head xs
        sequence_ (tail xs)

If there are no elements in the list then sequence_ stops, with return (). If there are elements in the list then sequence_ gets the first action (with head xs) and executes it, then calls sequence_ on the remaining actions. As before, sequence_ is available in Control.Monad, but in a more general form. It is now simple to rewrite replicateM_ in terms of sequence_:


replicateM_ :: Int -> IO () -> IO ()
replicateM_ n act = sequence_ (replicate n act)

Pattern Matching

A much more natural definition of sequence_, rather than using null/head/tail, is to make use of Haskell's pattern matching. If there is exactly one statement in a do block, you can remove the do. Rewriting sequence_ we can eliminate the do after the equals sign, and the do after the then keyword.


sequence_ :: [IO ()] -> IO ()
sequence_ xs =
    if null xs then
        return ()
     else do
        head xs
        sequence_ (tail xs)

Now we can replace the if with pattern matching, without needing to consider the IO:


sequence_ :: [IO ()] -> IO ()
sequence_ [] = return ()
sequence_ (x:xs) = do
    x
    sequence_ xs

Final Example

As a final example, imagine we wish to perform some operation on every file given at the command line. Using what we have already learnt, we can write:


main :: IO ()
main = do
    xs <- getArgs
    sequence_ (map operateFile xs)

operateFile :: FilePath -> IO ()
operateFile x = do
    src <- readFile x
    writeFile (x ++ ".out") (operate src)

operate :: String -> String
operate = ...

IO Design

A Haskell program usually consists of an outer IO shell calling pure functions. In the previous example main and operateFile are part of the IO shell, while operate and everything it uses are pure. As a general design principle, keep the IO layer small. The IO layer should concisely perform the necessary IO, then delegate to the pure part. Use of explicit IO in Haskell is necessary, but should be kept to a minimum - pure Haskell is where the beauty lies.

Where to go now

You should now be equipped to do all the IO you need. To become more proficient I recommend any of the following:

Write lots of Haskell code.

Read chapters 8 and 9 of Programming in Haskell by Graham Hutton. You should expect to spend about 6 hours thinking and contemplating on sections 8.1 to 8.4 (I recommend going to a hospital A&E department with a minor injury).

Read Monads as Containers, an excellent introduction to monads.

Look at the documentation on the monad laws, and find where I've used them in this tutorial.

Read through all the functions in Control.Monad, try to define them, and then use them when writing programs.

Implement and use a state monad.

Neil Mitchell's Blog (Haskell etc)