The bug is that in the Hoogle web interface, user-supplied data may end up being shown to the user without escaping. For example, searching for the number 1 results in an error message, which says "Parse Error: Unexpected character '1'". Unfortunately, in Hoogle 3, if that search string had been "1<u>2</u>", then the result would have been "Parse Error: Unexpected character '12'", with the number 2 underlined. If you try this same example in Hoogle 3.1, you do not get any formatting, and instead see the tags exactly as entered.
The bug could be provoked in several places:
- The error message, on a parse error.
- The input box, after a search had been entered.
- As the string listed as what the user searched for.
To perturb the input box would require entering a quote character ("), and to perturb the other instances would require an opening angle bracket (<).
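As a rough illustration of what escaping these characters involves, here is a minimal sketch. It is not Hoogle's actual code, and the name `escapeHtml` is made up for this example. It replaces the quote and angle bracket characters, plus the ampersand (which has to be escaped too, otherwise the entities themselves become ambiguous).

```haskell
-- Hypothetical helper, not taken from the Hoogle source.
-- Replaces the characters that are special in HTML text and in
-- double-quoted attribute values with the matching entities.
escapeHtml :: String -> String
escapeHtml = concatMap escapeChar
  where
    escapeChar '<' = "&lt;"
    escapeChar '>' = "&gt;"
    escapeChar '&' = "&amp;"
    escapeChar '"' = "&quot;"
    escapeChar c   = [c]
```

With this in place, `escapeHtml "1<u>2</u>"` evaluates to `"1&lt;u&gt;2&lt;/u&gt;"`, which a browser shows as the literal tags rather than as underlining.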
I am fairly sure the severity of this bug is "incredibly low". As a result of entering a malicious query, the attacker could cause the page displayed to contain whatever was desired. However, the Hoogle online interface has no privileges beyond that of a normal web page, and so can't actually do anything evil. The bug does not permit any supplied code to be executed on the server.
There is only one malicious use I could think of: browser redirects. Sometimes evil companies will send out spam mail, with links such as "click here to go to example.com and order Viagra". One anti-spam measure is to reject all emails linking to a particular domain name. By crafting a URL, it was possible for a link to Hoogle to redirect to another domain, making it appear that the initial link was to a trusted website. The spam recipient still ends up at the spammer's page, but the link may get past their spam filters.
Checking the server logs for Hoogle shows that no one ever actually exploited the flaw to perform a redirect, or even to insert a <script> tag - the first step to any such exploit.
I had to make two fixes to the code. I use Haskell Source Extensions to generate most of the HTML shown in Hoogle. As part of that, I have a ToXML class that automatically converts values to an XML representation, which is then rendered. The ToXML instance for String did not escape special HTML characters; now it does. I had written the ToXML instances myself, instead of relying on those supplied in the associated HSP, and that is how I introduced the bug.
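To make the shape of that fix concrete, here is a small self-contained sketch. Everything in it is an assumption for illustration: the `XML` type, the `tag` helper and `parseError` are invented here, and this is neither the HSP API nor the Hoogle source. The point is only that once the String instance escapes, every caller of `toXML` is protected at once.

```haskell
{-# LANGUAGE FlexibleInstances #-}

-- Hypothetical stand-in for rendered markup; not the HSP type.
newtype XML = XML String deriving (Eq, Show)

class ToXML a where
  toXML :: a -> XML

-- The fixed behaviour: strings are escaped before they become markup.
-- The buggy version would have been simply `toXML = XML`.
instance ToXML String where
  toXML = XML . concatMap escapeChar
    where
      escapeChar '<' = "&lt;"
      escapeChar '>' = "&gt;"
      escapeChar '&' = "&amp;"
      escapeChar '"' = "&quot;"
      escapeChar c   = [c]

-- Hypothetical helper: wrap already-converted children in an element.
tag :: String -> [XML] -> XML
tag name kids =
  XML ("<" ++ name ++ ">" ++ concat [s | XML s <- kids] ++ "</" ++ name ++ ">")

-- Hypothetical caller: it never mentions escaping, yet it is safe.
parseError :: String -> XML
parseError bad =
  tag "b" [toXML "Parse Error: Unexpected character '", toXML bad, toXML "'"]
```

Here `parseError "1<u>2"` gives `XML "<b>Parse Error: Unexpected character '1&lt;u&gt;2'</b>"`.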
The only other code that generates HTML uses a formatted string type, which can represent hyperlinks and various formatting, and can be rendered as either console escape characters or as HTML. Since this part of the code was written before moving to Haskell Source Extensions, it generates raw strings. This generating code was also patched to escape certain characters.
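A formatted string type of this kind might look something like the sketch below. The constructor names and both renderers are guesses made for illustration, not the real Hoogle definitions. The console renderer emits the text untouched (plus ANSI codes for bold), while the HTML renderer has to escape every piece of text it emits.

```haskell
-- Hypothetical formatted-string type; the real one in Hoogle may differ.
data Fragment
  = Plain String        -- ordinary text
  | Bold String         -- emphasised text
  | Link String String  -- target URL, then the visible label

-- Shared escaping for text and double-quoted attribute values.
escape :: String -> String
escape = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc '"' = "&quot;"
    esc c   = [c]

-- HTML rendering: every user-visible string goes through `escape`.
renderHtml :: [Fragment] -> String
renderHtml = concatMap go
  where
    go (Plain s)    = escape s
    go (Bold s)     = "<b>" ++ escape s ++ "</b>"
    go (Link url s) = "<a href=\"" ++ escape url ++ "\">" ++ escape s ++ "</a>"

-- Console rendering: no HTML escaping, ANSI codes for bold,
-- and links shown as their label only.
renderConsole :: [Fragment] -> String
renderConsole = concatMap go
  where
    go (Plain s)  = s
    go (Bold s)   = "\ESC[1m" ++ s ++ "\ESC[0m"
    go (Link _ s) = s
```

Keeping the escaping inside `renderHtml` means code that builds `Fragment` values never needs to know which output format will be used.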
As a result of using libraries and abstractions, it wasn't necessary to fix each of the security flaws one by one; it was enough to fix the interface to the library. Having done so, I have much more confidence that all the security flaws have been tackled once and for all, and that they will not recur.
Is Haskell Insecure?
Enhanced security is one of the many advantages that Haskell offers. It is not possible to overrun a buffer and conduct stack smashing attacks on a Haskell program. Passing query strings will not overwrite global variables, and a missed escape cannot cause user code to be executed on the server. However, when Haskell code generates HTML, it is not immune from code injection attacks on the client side.
In the beginning Hoogle did not use any HTML generation libraries. As I have slowly moved towards Haskell Source Extensions, I have benefited from better guarantees about well-formed HTML. By creating appropriate abstractions, and dealing with concerns like escaping at the right level, and enforcing these decisions with appropriate types, the number of places to introduce a security bug is lowered. Hopefully Hoogle will not fall victim to such a security problem in future.
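One way to enforce such a decision with types is sketched below, assuming a made-up `Html` type rather than anything from Hoogle or HSP. If the `Html` constructor is kept private to its module, the only way to turn a `String` into `Html` is through `text`, which escapes, so forgetting to escape becomes a type error instead of a security bug.

```haskell
-- Hypothetical abstract type for markup that is known to be safe.
-- In a real module the constructor would not be exported.
newtype Html = Html String deriving (Eq, Show)

instance Semigroup Html where
  Html a <> Html b = Html (a ++ b)

instance Monoid Html where
  mempty = Html ""

-- The only route from untrusted text to Html: escape on the way in.
text :: String -> Html
text = Html . concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc '"' = "&quot;"
    esc c   = [c]

-- Wrap safe markup in an element (the tag name is trusted, not user data).
element :: String -> Html -> Html
element name (Html body) =
  Html ("<" ++ name ++ ">" ++ body ++ "</" ++ name ++ ">")

-- Extract the final page text once everything has been assembled.
renderPage :: Html -> String
renderPage (Html s) = s

-- Hypothetical use: echoing the search string back to the user.
searchedFor :: String -> Html
searchedFor query =
  element "p" (text "You searched for: " <> element "i" (text query))
```

For example, `renderPage (searchedFor "<script>")` yields `"<p>You searched for: <i>&lt;script&gt;</i></p>"`, and an expression such as `element "i" query` with a raw `String` simply does not type check.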