How much data is copied when updating a field deep within a data structure?

Pattern matching doesn't necessarily force the field as a whole, though depending on the pattern, it might force deeper parts of the fields data structure. Pattern matching forces as little/much as it needs to force in order to check the pattern actually matches.

If the pattern matches the data constructor, though, Haskell has to force that data constructor - even if there's only one data constructor for the type. The reason is that undefined (a nonterminating computation) exists in all types, and for some type with data constructor DC, the value undefined is different to the value DC undefined (forcing the data constructor itself may not terminate, and that's different from when forcing the field within the data constructor doesn't terminate).

The library definition of undefined is actually undefined = undefined so that mechanically resolving it is an infinitely self-referencing loop. No matter how deeply you force, you never find a data constructor. It's a library function because, so long as you never force it, it's still usable (and occasionally useful) in a lazy language.

Anyway, needing to force the data constructor for a type that only has one is a bit weird, but it's necessary to keep formal semantics people happy, even though Fast and loose reasoning is morally correct WRT ignoring undefined for e.g. typeclass laws.

There are provisos WRT newtype, strict components and unboxed components - as I only discovered today, strict components aren't necessarily unboxed.

Just because you force a component, doesn't mean you need to copy it. For immutable data, everything referencing that data should benefit from the fact it has now been computed. This kind of thing is covered in analysis of time and space complexity for pure functional algorithms - see mainly Okasaki - Purely Functional Data Structures (there's a book, but also the thesis (PDF). When a structure is forced purely to read or match it, no copying is needed - everything referencing the same structure benefits.

This implies there are some in-place updates involved in forcing things, but that's hidden inside the lazy-evaluation thunks - something for the compiler to worry about, though it does suggest for costs-of-concurrency purposes that pure, immutable data structures aren't quite so immutable as some advertising claims suggest - in principle, every thunk mutates (at most) once, from an unforced state to a forced state.

/r/haskell Thread Parent