One night, while drifting off to sleep (or failing to), I solved a conundrum that has puzzled me since 1987.
Before Haskell there was Orwell. In Orwell equations were checked to ensure order is unimportant (similar to Agda today). When an equation was to match only if no previous equation applied, it was to be preceded by ELSE
. Thus, equality on lists would be defined as follows:
(==) :: Eq a => [a] -> [a] -> Bool
[] == [] = True
(x:xs) == (y:ys) = x == y && xs == ys
ELSE
_ = _ = False
We pondered whether to include this restriction in Haskell. Further, we wondered whether Haskell should insist that order is unimportant in a sequence of conditionals, unless ELSE
was included. Thus, equality on an abstract type Shape would be defined as follows:
(==) :: Shape -> Shape -> Bool
x == y | circle x && circle y = radius x == radius y
| square x && square y = side x == side y
ELSE
| otherwise = False
In Orwell and early Haskell, guards were written at the end of an equation and preceded by the keyword if
or the end of an equation could be labelled *otherwise
. (Miranda was similar, but lacked the keywords.) Here I use the guard notation, introduced later by Paul Hudak, where otherwise
is a variable bound to True
.
Sometime two equations or two guards not separated by ELSE
might both be satisfied. In that case, we thought the semantics should ensure that both corresponding right-hand sides returned the same value, indicating an error otherwise. Thus, the following:
plus :: Thing -> Thing -> Thing
plus x y | zero x = y
| zero y = x
ELSE
| otherwise = ...
would be equivalent to:
plus :: Thing -> Thing -> Thing
plus x y | zero x && zero y && x == y = x
| zero x && zero y && x /= y = error "undefined"
| zero x && not (zero y) = y
| not (zero x) && zero y = x
| not (zero x) && not (zero y) = ...
Here the code checks that if x
and y
are both zero then they are the same. (I will consider a refinement to the check for sameness later.) Of course, the compiler would issue code that performs the tests zero x
, zero y
, and x == y
at most once.
We didn’t pursue this design in Haskell for two reasons. First, because we thought it might be too unfamiliar. Second, because the ELSE
on a line by itself was syntactically awkward. It would be especially annoying if one ever wanted the usual cascading behaviour:
f :: Thing -> Thing
f x | p x = ...
ELSE
| q x = ...
ELSE
| r x = ...
Here each guard is tested in turn, and we take the first that succeeds.
Today, the first problem is perhaps no longer quite so strong an issue. Many applications using Haskell would welcome the extra assurance from flagging any cases where order of the equations is significant. But the syntactic awkwardness of ELSE
remains considerable. It was syntax about which I had an insight while tossing in bed.
Above otherwise
is a variable bound to True
in the standard prelude. But say we were to treat otherwise
as a keyword, and to give it the meaning that the equation applies only if no previous equation applies, and to allow it to optionally be followed by a further guard. Then our first example becomes:
(==) :: Eq a => [a] -> [a] -> Bool
[] == [] = True
(x:xs) == (y:ys) = x == y && xs == ys
_ = _ | otherwise = False
And our second example becomes:
(==) :: Shape -> Shape -> Bool
x == y | circle x && circle y = radius x == radius y
| square x && square y = side x == side
| otherwise = False
And our third example becomes:
plus :: Thing -> Thing -> Thing
plus x y | zero x = y
| zero y = x
| otherwise = ...
If one doesn’t want to invoke the equality test in the case that both zero x
and zero y
hold then one would instead write:
plus :: Thing -> Thing -> Thing
plus x y | zero x = y
| otherwise zero y = x
| otherwise = ...
Similarly, the cascading example becomes:
f :: Thing -> Thing
f x | p x = ...
| otherwise q x = ...
| otherwise r x = ...
That’s it! The syntactic awkwardness is greatly reduced.
The proposed notation depends upon Paul’s clever insight to move the guard from the end of the equation to the middle, so evaluation works strictly left to right. But we’ve had guards in that position for quite a while now. Goodness knows why none of us hit upon this proposal thirty-odd years ago.
Of course, the change is not backward compatible. Changes to guards could be made backward compatible (with added ugliness) by using a different symbol than ‘|’ to mark guards with the new semantics. But now the old definition of (==)
should not be accepted without an otherwise
, and I cannot think of how to introduce that new semantics with a backward compatible syntax.
The solution, as with so much of Haskell nowadays, is to activate the new semantics with a pragma. Manual porting of legacy code would not be hard in most cases, and it would also be easy to write a tool that adds otherwise
whenever the equations are not easily shown to be independent of order.
John Hughes suggests a further refinement to the above. Using equality to check that the value of two equations is the same may not be appropriate if the values are computed lazily. Instead, he suggests that the plus
example should translates as follows:
plus :: Thing -> Thing -> Thing
plus x y | zero x && zero y = x `meet` y
| zero x && not (zero y) = y
| not (zero x) && zero y = x
| not (zero x) && not (zero y) = ...
Here we presume a type class
class Meet a where
meet : a -> a -> a
which confirms that the two arguments are the same and returns a value that is the same as both the arguments. For strict data types, two arguments are the same if they are equal.
instance Meet Integer where
x `meet` y | x == y = x
| otherwise = error "undefined"
For lazy data types, we check that they are the meet lazily.
instance Meet a => Meet [a] where
[] `meet` [] = []
(x:xs) `meet` (y:ys) = (x `meet` y) :: (xs `meet` ys)
meet _ _ | otherwise = error "undefined"
If the compiler could not verify that equations are disjoint, it would require that their right-hand sides have a type belonging to the class Meet
.
In most cases, one would hope the compiler could verify that equations are disjoint, and hence would not have to resort to meet
or additional checks. One might wish to allow a pragma to declare disjointness, permitting the compiler to assume, for instance, that x < y
and x >= y
are disjoint. An SMT solver could do much of the work of checking for disjointness.
In general, equations not separated with otherwise
would be checked to ensure they are disjoint or all give equivalent results. For example,
g :: Thing -> Thing
g x | p x = a x
| q x = b x
| otherwise r x = c x
| s x = d x
| otherwise t x = e x
would be equivalent to
g :: Thing -> Thing
g x | p x && q x = a x `meet` b x
| p x && not (q x) = a x
| q x && not (p x) = b x
| otherwise r x && s x = c x `meet` d x
| r x && not (s x) = c x
| s x && not (r x) = d x
| otherwise t x = e x
On the other hand, if we declared that p x
and q x
are disjoint, and the same for s x
and r x
, then the first code would instead compile to something equivalent to Haskell’s current behaviour,
g :: Thing -> Thing
g x | p x = a x
| otherwise q x = b x
| otherwise r x = c x
| otherwise s x = d x
| otherwise t x = e x
One drawback of this proposal is that the source code doesn’t directly indicate when extra tests and the use of meet
are required. An IDE might provide feedback to make explicit which tests are performed, or one might also add pragmas or additional syntax to reflect that information in the source.
I hope some reader might be keen to take this forward. What do you think?
No comments:
Post a Comment