Why GADTs matter for performance

If we care about performance, both solutions are a bit lacking. In the first case, we must allocate an option in both functions. In the second case, we must setup an exception handler. We might decide that both are not good enough and write out both functions by hand. Of course, this is a toy example, so it doesn't really matter if we repeat some code here, but imagine if this is a bit more complicated, e.g. finding and removing a node from a balanced binary tree.

Here is how we can use GADTs to solve this problem.

module M3 : S = struct
  type 'a t = 'a list

  type (_,_) r = 
  | Exn : ('a, 'a) r 
  | Opt : ('a, 'a option) r 

  let found : type a s . a -> (a,s) r -> s = fun a r ->
    match r with
    | Exn -> a
    | Opt -> Some a

  let not_found : type a s . (a,s) r -> s = 
    function
    | Exn -> raise Not_found
    | Opt -> None

  let rec find_gen t f r =
    match t with
    | [] -> not_found r
    | hd::tl ->
      if f hd then found hd r else find_gen tl f r

  let find     t f = find_gen t f Opt
  let find_exn t f = find_gen t f Exn
end

We have some extra verbosity here, namely functions "found" and "not_found" which have to be type annotated because type checker otherwise won't be able to infer the types. I could have inlined them inside "find_gen" but in a larger module it makes sense to make them separate since they can then be reused.

"find_gen" is almost the same as out original optimal functions. There is no unnecessary allocation (Exn and Opt constructors have no arguments), nor setting up of exception handlers. There is an extra match against the "r" argument, but this is just one machine instruction so it is basically negligeble.

/r/ocaml Thread Parent Link - blogs.janestreet.com