Audio Signal Flow

Wall of text incoming.

When the signal hits the ADC it passes through a DC blocker and an antialiasing filter before hitting the actual converter. Old school converters used comparators, which you can think of as staged clippers at 2^N signal levels, where N is the bit depth of the signal. These days they use what's called delta-sigma modulation; the wiki article has a good overview of how it works.
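To make the delta-sigma idea concrete, here's a minimal sketch of a first-order modulator (the oversampling ratio and ±1 feedback DAC are assumptions for illustration, not any particular chip). The integrator accumulates the difference between the input and the fed-back output bit, and the density of 1s in the output tracks the input level:

```c
#include <stdio.h>
#include <math.h>

/* Sketch: first-order delta-sigma modulator.
 * The integrator accumulates (input - feedback); a 1-bit quantizer
 * decides the output pulse; the pulse density follows the input. */
int main(void) {
    const double pi = 3.14159265358979323846;
    double integrator = 0.0;
    double feedback = 0.0;
    const int oversample = 64;            /* assumed oversampling ratio */
    const int n = 64 * oversample;

    for (int i = 0; i < n; ++i) {
        /* slow test tone, well below the modulator clock rate */
        double x = 0.5 * sin(2.0 * pi * i / (double)n);

        integrator += x - feedback;             /* delta, then sigma   */
        int bit = integrator >= 0.0 ? 1 : 0;    /* 1-bit quantizer     */
        feedback = bit ? 1.0 : -1.0;            /* DAC inside the loop */

        putchar(bit ? '1' : '0');
    }
    putchar('\n');
    return 0;
}
```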

This spits out a stream of pulses where the pulse density corresponds to the signal level. You will have one ∆Σ modulator for each channel, usually synced to the same clock running at Fs × M, where M is the number of channels. The pulses are interleaved so that every Mth slot belongs to the same channel, which lets you transmit all the channels along a pair of wires (one for data, one for clock).
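For illustration, here's what de-interleaving that kind of stream looks like once the samples land in memory; the function name and buffer layout are assumptions, not any particular driver's API:

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch: de-interleave an M-channel stream.
 * Sample k of the interleaved buffer belongs to channel k % M, so
 * channel c's samples sit at indices c, c + M, c + 2M, ... */
static void deinterleave(const int32_t *interleaved,
                         int32_t *const *channels,   /* channels[c][frame] */
                         size_t num_frames,
                         size_t num_channels)
{
    for (size_t f = 0; f < num_frames; ++f)
        for (size_t c = 0; c < num_channels; ++c)
            channels[c][f] = interleaved[f * num_channels + c];
}
```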

The converter usually exchanges data with the processor over DMA, where it is buffered in memory shared between the converter and the processor. Once a buffer is filled, an interrupt is triggered to alert the OS to do something with the audio, and the OS hands the converter another buffer in exchange for the one it just filled (normally ping-ponging between two). This is an interrupt-driven architecture, and it's common on most low-latency and embedded systems.
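A rough sketch of that ping-pong (double-buffered) DMA scheme; dma_set_target() and wake_audio_thread() are hypothetical stand-ins for whatever the real platform's HAL provides:

```c
#include <stdint.h>
#include <stddef.h>

#define FRAMES_PER_BUFFER 256

/* Sketch: double-buffered (ping-pong) DMA.  The converter fills one
 * buffer over DMA while the CPU works on the other; the "buffer full"
 * interrupt swaps their roles. */
static int32_t buf_a[FRAMES_PER_BUFFER];
static int32_t buf_b[FRAMES_PER_BUFFER];

static int32_t *dma_buf = buf_a;   /* buffer the converter is filling */
static int32_t *cpu_buf = buf_b;   /* buffer the processor may touch  */

extern void dma_set_target(int32_t *buf, size_t frames);       /* assumed HAL call */
extern void wake_audio_thread(int32_t *filled, size_t frames); /* assumed HAL call */

void dma_complete_isr(void)
{
    /* Swap roles: the buffer that just filled goes to the CPU,
     * the one the CPU just finished goes back to the converter. */
    int32_t *tmp = dma_buf;
    dma_buf = cpu_buf;
    cpu_buf = tmp;

    dma_set_target(dma_buf, FRAMES_PER_BUFFER);
    wake_audio_thread(cpu_buf, FRAMES_PER_BUFFER);
}
```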

A different approach is to use a polling architecture, where the driver continuously polls the device for the next available buffer. This jibes better with the highly asynchronous architecture of modern OSes.
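A sketch of the polling idea, with made-up device functions; the point is just that a driver thread keeps asking the device for the next buffer instead of waiting on an interrupt:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Sketch: polling driver loop.  device_buffer_ready(),
 * device_acquire_buffer(), device_release_buffer() and process_audio()
 * are stand-ins, not a real API. */
extern bool device_buffer_ready(void);
extern int32_t *device_acquire_buffer(size_t *frames_out);
extern void device_release_buffer(int32_t *buf);
extern void process_audio(int32_t *buf, size_t frames);

void polling_loop(volatile bool *running)
{
    while (*running) {
        if (!device_buffer_ready())
            continue;                   /* or sleep/yield between polls */

        size_t frames = 0;
        int32_t *buf = device_acquire_buffer(&frames);
        process_audio(buf, frames);
        device_release_buffer(buf);
    }
}
```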

Now this all goes on in the background, from boot to shutdown. When a program wants to send audio data, it alerts the OS through the drivers that it wants to open a stream. The OS will schedule a callback on a high-priority thread (although with some drivers it's the programmer's responsibility to handle that part), and it will copy the buffered audio from physical memory into the program's virtual address space. The callback itself is supplied by the programmer. Once the callback returns, the output is copied back to physical memory and exchanged with the device over DMA.
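Roughly what that programmer-supplied callback looks like; the signature below is invented for illustration, since CoreAudio, WASAPI, ALSA, PortAudio and friends all differ in the details:

```c
#include <stddef.h>
#include <string.h>

/* Sketch: the host (OS/driver layer) hands your callback an input and an
 * output buffer in your own address space; you fill the output before
 * returning.  Signature and return convention are assumptions. */
typedef int (*audio_callback_t)(const float *input,
                                float *output,
                                size_t frames,
                                size_t channels,
                                void *user_data);

/* Example callback: a straight pass-through from input to output. */
static int passthrough(const float *input, float *output,
                       size_t frames, size_t channels, void *user_data)
{
    (void)user_data;
    memcpy(output, input, frames * channels * sizeof(float));
    return 0;   /* 0 = keep streaming, nonzero = stop (assumed convention) */
}
```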

There's a little more going on under the hood, and it has to do with two problems. The first is that most people have multiple devices, as in separate input and output devices. Sometimes it is the programmer's responsibility to synchronize multiple callbacks when streaming from input to output; other times the drivers handle this for you. The second problem is that multiple programs might want to stream to the same device, which means the OS has to handle priority and mixing of the callback buffers after their callbacks run and before the memory is shipped to the device.
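The mixing step for the second problem might look something like this; a simplified sketch, leaving out the format conversion, resampling, and clipping a real OS mixer also does:

```c
#include <stddef.h>
#include <string.h>

/* Sketch: after every program's callback has run, sum each stream's
 * output buffer into one device buffer, with per-stream gain. */
static void mix_streams(float *device_buf,
                        const float *const *stream_bufs,
                        const float *gains,
                        size_t num_streams,
                        size_t samples)   /* frames * channels */
{
    memset(device_buf, 0, samples * sizeof(float));
    for (size_t s = 0; s < num_streams; ++s)
        for (size_t i = 0; i < samples; ++i)
            device_buf[i] += gains[s] * stream_bufs[s][i];
}
```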

The second problem can be avoided using exclusive modes, where a program requests sole access to a device. This minimizes latency, but only one stream can be active at a time.

There are additional problems. When I say that memory is copied from the program's address space to physical memory, what actually happens is a "context" switch, where the OS moves data between user space and kernel space; this is expensive and takes a good bit of time to complete.

Another issue is that you have a finite amount of time for the callback to run. If you take too long, the device needs the buffer before you've filled it and you get audible artifacts. For this reason, you can't use blocking operations like file IO or allocation in the callback without introducing a nonzero probability of artifacts. However, the OS has no way to treat your callback differently from any other function, so the onus is on you to make sure you don't do it.
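The usual way around this is to do nothing blocking in the callback and hand data to a non-realtime thread through a lock-free queue. A minimal sketch of a single-producer/single-consumer ring buffer, assuming one callback thread writing and one worker thread reading, with a power-of-two capacity:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Sketch: lock-free SPSC ring buffer.  The callback (producer) only
 * writes and never waits; a worker thread (consumer) does the file IO
 * or allocation on its own time. */
#define RING_CAPACITY 4096   /* must be a power of two */

typedef struct {
    float buf[RING_CAPACITY];
    _Atomic size_t head;   /* written by producer only */
    _Atomic size_t tail;   /* written by consumer only */
} spsc_ring;

/* Called from the audio callback: never blocks, just drops on overflow. */
static bool ring_push(spsc_ring *r, float x)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_CAPACITY)
        return false;                        /* full: drop, don't wait */
    r->buf[head & (RING_CAPACITY - 1)] = x;
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Called from a non-realtime worker thread. */
static bool ring_pop(spsc_ring *r, float *out)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail == head)
        return false;                        /* empty */
    *out = r->buf[tail & (RING_CAPACITY - 1)];
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```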

/r/DSP Thread