Purely Functional Performance

We'll have to agree to disagree then. Functional programming help, but it doesn't help as much as FP adherents think. Meanwhile imperative languages have been making strides in parallel programming as well, driven by the same multicore environment that gets FP fans excited.

I've been writing parallel code since the 80's (starting on the Sequent Balance machine, later on the Silicon Graphics Origin and Thinking Machines Inc CM-1, then multiprocessor PC's), and it's gotten much much easier to write performant parallel code over the years. It's still not trivial, but it's definitely gotten easier to write manually parallelized code. Partly because in a lot of situations, it turns out that there's only one or two loop that really needs to be parallelized to get the benefit. If machines with 1000+ cores were common, then manual parallelization might not be practical. But they aren't. The Connection Machines back in the 80's had 65,536 CPU's, the bigger Azul machines today have 800 or so, and those are the environments that need parallelizing compilers. Hardly the stuff that will keep most non-academics up at night.

Meanwhile automatic parallelization still has a very hard time achieving its goals. Simply detecting parallelizable sections isn't sufficient, most of the opportunities you find aren't going to give you speedups due to the runtime overhead, and even so many of those are already being exploited by the superscalar CPU's. Even the Intel Itanium, which was designed to take advantage of smaller chunks of parallelism, failed because the compilers just couldn't make it happen. The problems at the smaller granularity aren't necessarily solvable by functional languages, not even pure nonstrict ones -- Haskell programs have a very high mutation rate, comparable to imperative languages. It's just hidden from the developer. But a parallelizing compiler has to take those mutations into account. So your hypothetical "sufficiently smart" parallelizing compiler is stuck trying to find long-running chunks of code separated from each other that it can schedule in parallel. This is a very hard problem. "Long running" is a very difficult standard for a compiler to detect (it's NP-Hard in the general case). Or you can just let the user give you hints, which is nearly as easy for the user and actually works today.

/r/programming Thread Parent Link - awelonblue.wordpress.com