ELI5: How does software "understand" coding languages?

Okay, welcome to the exciting world of COMPILERS! In short, compilers are magic.

How do they work:

Step 0: Pre-processing

If a line starts with a hash (pound) symbol, it gets handled by the pre-processor before the compiler proper ever sees your code. #include <stdio.h> tells it to go find stdio.h and paste its contents into your file, so they get compiled as part of the program.

You can do a lot of wizardry with the pre-processor in C. Yes, wizardry is the right term.
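
As a rough sketch of that wizardry (the macro and function names here are made up for illustration, not taken from any real header), the pre-processor also does plain text substitution and can keep or drop code depending on whether a name is defined:

```c
#include <stdio.h>   /* pasted in as text before compilation proper begins */

/* Object-like macro: every GREETING below is replaced by the string literal. */
#define GREETING "Hello World!\n"

/* Function-like macro: SQUARE(x) is rewritten as text, so the parentheses matter. */
#define SQUARE(x) ((x) * (x))

/* Conditional compilation: only one of these two definitions survives. */
#ifdef VERBOSE
#define LOG(msg) printf("log: %s\n", (msg))
#else
#define LOG(msg) ((void)0)
#endif

/* After pre-processing, the return line reads: return ((n + 1) * (n + 1)); */
int square_of_next(int n)
{
    LOG("squaring");
    return SQUARE(n + 1);
}

/* After pre-processing, the return line reads: return "Hello World!\n"; */
const char *greeting(void)
{
    return GREETING;
}
```

With gcc or clang, passing -E stops after this step and prints the resulting text, which is a nice way to check what a macro actually turned into.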

Step 1: Parsing

This converts your program into what's called an Abstract Syntax Tree. This is basically a tree of what depends on what, or in simple terms:

  "Hello World!\n"
  printf( )

For your program to work properly, "Hello World!\n" needs to exist someplace before printf can run. And you expect that printf will need to be handed that string.
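
Here is a minimal sketch of how such a tree could be stored and walked. The struct layout and the names node, hello_program, and visit_order are assumptions for illustration; real compilers use much richer node types.

```c
#include <stddef.h>

enum node_kind { NODE_STRING, NODE_CALL };

struct node {
    enum node_kind kind;
    const char *text;        /* literal contents, or the function's name */
    const struct node *arg;  /* the single argument of a call, else NULL */
};

/* The tree for: printf("Hello World!\n") */
static const struct node hello = { NODE_STRING, "Hello World!\\n", NULL };
static const struct node call  = { NODE_CALL,   "printf",          &hello };

/* Returns the root of the example tree (the call node). */
const struct node *hello_program(void)
{
    return &call;
}

/*
 * Post-order walk: record a node's argument before the node itself,
 * so the output lists things in the order they must exist.
 * Returns the new number of entries stored in out.
 */
int visit_order(const struct node *n, const char **out, int count)
{
    if (n == NULL)
        return count;
    count = visit_order(n->arg, out, count);
    out[count] = n->text;
    return count + 1;
}
```

Walking the example tree with an empty two-slot array fills it with the string first and printf second, which is exactly the "string must exist before printf runs" ordering.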

Step 2: What does that even mean?!

Now the compiler will attempt to figure out what you are saying. What is printf? Where did you find printf? Is printf real? What about main? return? #include? stdio.h?

Okay. Some of these are keywords, which means they have a special function: you can't use them as names in your program, because the programming language itself uses them. For example, return tells the compiler that this function is finished here. If you try to say something like

 int return;

You'll get an error.

So eventually the compiler figures out that printf was declared in stdio.h.
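
A toy version of that "is this name real?" check might look like the sketch below. The tables are tiny and hand-written, and the names is_keyword, known_symbols, and find_declaration are made up for illustration; a real compiler builds its symbol table from the code it has actually read.

```c
#include <stddef.h>
#include <string.h>

/* A few of C's reserved words; the real list is longer. */
static const char *const keywords[] = {
    "int", "return", "if", "else", "while", "for"
};

/* Returns 1 if name is one of the reserved words above, else 0. */
int is_keyword(const char *name)
{
    size_t count = sizeof keywords / sizeof keywords[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(name, keywords[i]) == 0)
            return 1;
    }
    return 0;
}

/* A pretend symbol table: which header declared which name. */
struct symbol {
    const char *name;
    const char *header;
};

static const struct symbol known_symbols[] = {
    { "printf", "stdio.h" },
    { "puts",   "stdio.h" },
    { "strlen", "string.h" },
};

/* Returns the header that declared name, or NULL if nobody did. */
const char *find_declaration(const char *name)
{
    size_t count = sizeof known_symbols / sizeof known_symbols[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(name, known_symbols[i].name) == 0)
            return known_symbols[i].header;
    }
    return NULL;
}
```

In this sketch, a name that is a keyword can't be used for a variable, and a name that comes back NULL from find_declaration is the kind of thing that earns you an "undeclared" complaint.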

Step 3: IR Means Intermediate Representation

Now your compiler translates your program to IR, or Intermediate Representation. (Compiler people aren't creative with names.) IR is a middle ground between "humans can read this" and "computers can read this". The goal of IR is to make it easy to search for patterns and swap in different patterns that are faster but do the same thing.

This stage is all about optimizations.
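
To give a feel for "same result, faster pattern", here are two rewrites written out by hand in C. This is only a sketch: a real optimizer works on the IR rather than on your source, whether it applies these exact rewrites depends on the compiler and its settings, and the function names are made up for illustration.

```c
/* What you wrote: multiply by 8. */
unsigned times_eight(unsigned x)
{
    return x * 8u;
}

/* What an optimizer might turn it into: shift left by 3 bits.
 * For unsigned values this always gives the same answer. */
unsigned times_eight_shifted(unsigned x)
{
    return x << 3;
}

/* What you wrote: arithmetic on constants. */
long seconds_per_day(void)
{
    return 60L * 60L * 24L;
}

/* Constant folding: the multiplication is done once, at compile time. */
long seconds_per_day_folded(void)
{
    return 86400L;
}
```

The point is that each pair behaves identically from the outside, so the compiler is free to pick whichever form is cheaper for the machine.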

Step 4: Translation

Now your program is finally translated into machine code. This involves translating the IR to assembly, turning that into binary blobs (object files), and linking several of those blobs together into one big blob.
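
One small way to see the linking half of this step is the sketch below (say_hello is a made-up name). The file declares puts by hand instead of including stdio.h, so the compiler only knows what puts looks like, not where it lives. The blob for this file ends up with a "puts goes here" hole, and the linker fills it in from the C library's blob.

```c
/* No #include on purpose: we just promise the compiler that a function
 * with this shape exists somewhere. This matches the standard prototype. */
int puts(const char *s);

/* Compiles fine on its own; the actual puts code is only found at link time. */
int say_hello(void)
{
    return puts("Hello World!");
}
```

If the promise were wrong (say, a misspelled name), compiling this file would still succeed and the error would come from the linker as an undefined reference.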

Step 5: Packaging

Now your program is wrapped into a proper binary (an executable). This basically puts a chunk of binary data on the front of your program that tells your OS how to load and execute it.
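
On Linux that wrapper is usually the ELF format, whose first four bytes are always 0x7F followed by the letters E, L, F. Other systems use other formats (PE on Windows, Mach-O on macOS). A minimal sketch of how a program could recognize that front matter, with looks_like_elf being a name invented for this example:

```c
#include <stddef.h>
#include <string.h>

/* The four "magic" bytes at the very start of every ELF file. */
static const unsigned char elf_magic[4] = { 0x7F, 'E', 'L', 'F' };

/* Returns 1 if the buffer starts with the ELF magic bytes, else 0. */
int looks_like_elf(const unsigned char *bytes, size_t length)
{
    if (length < sizeof elf_magic)
        return 0;
    return memcmp(bytes, elf_magic, sizeof elf_magic) == 0;
}
```

The OS does a check along these lines before it trusts the rest of the header, which then says things like where the code starts and how much memory to set up.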

And that's it.

/r/explainlikeimfive Thread