At Mark's suggestion, I decided to revisit the idempotence optimization in OCaml to see how it fares. As you can see, it fares very well:

[Figure: benchmarks for OCaml, showing that it already performs the optimization]

The speedup of the unoptimized version relative to the hand-optimized one can plausibly be explained by the hand-optimized version having an extra conditional branch. Conclusion: GHC sucks. We all kind of knew this anyway. It might be worth submitting a patch to GHC, though, to clean this up. This sort of optimization is very easy to implement, and has big performance implications.

In the functional world, an update in place is modelled semantically by recreating the data structure with something changed. This is a very common idiom, and the copying it implies is something that should probably be optimized away.
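To make the idiom concrete, here is a minimal OCaml sketch. It is not the benchmark code from this post: the `point` type and the functions `set_x` and `set_x_shared` are hypothetical names made up for illustration. The first function rebuilds the record on every update; the second is one guess at what a hand-written version of the optimization might look like, returning the original record when the update would change nothing, at the cost of one extra conditional branch.

```ocaml
(* Hypothetical record type, for illustration only. *)
type point = { x : int; y : int }

(* Functional "update": build a new record with one field changed. *)
let set_x p v = { p with x = v }

(* Hand-written variant (an assumption, not the benchmarked code):
   reuse the original record when the field already holds [v].
   The test costs one extra conditional branch per call. *)
let set_x_shared p v = if p.x = v then p else { p with x = v }

let () =
  let p = { x = 1; y = 2 } in
  (* A no-op update through the shared variant returns the very same record. *)
  assert (set_x_shared p 1 == p);
  (* Both variants agree on the resulting value. *)
  assert (set_x p 5 = set_x_shared p 5);
  print_endline "ok"
```

Note the two kinds of equality: `=` compares structure, while `==` checks whether both sides are the same value in memory, which is what tells us that no copy was made.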