The good news is that digital audio signal processing is much easier than you think. The bad news is…well, there is no bad news. It’s really easy. The nuts-and-bolts implementation of a digital audio recording is an array representing the amplitude of the sound being sampled. And “amplitude” just means, roughly, “loudness”: the level of the signal at a given instant. So an analog-to-digital converter basically checks the signal level thousands of times per second (44,100 times per second for CD audio) and saves the results as an array. This scheme is known as pulse-code modulation (PCM); you could loosely think of it as a digital cousin of old-timey AM (i.e. Amplitude Modulation) radio, since in both cases the amplitude carries the information. The data type of the array depends on the bit depth of the recording. So, for instance, a 16-bit recording would use an array of short int (i.e. two bytes per sample). That’s it. Simple, right? To understand why that is a sufficient format for representing audio, check out this link.
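
To make the "array of amplitudes" idea concrete, here is a minimal sketch in Python that fills a 16-bit array with a sine tone. It stands in for what a converter would capture from a real signal. The helper name `sample_tone`, the 44,100 Hz rate, and the half-scale peak are illustrative assumptions, not part of any particular audio API.

```python
import math
from array import array

SAMPLE_RATE = 44100  # samples per second (assumed CD-quality rate)

def sample_tone(freq_hz, duration_s, peak=0.5):
    """Hypothetical helper: return 16-bit samples of a sine tone.

    `peak` is the fraction of full scale (32767) the tone reaches.
    """
    n = round(SAMPLE_RATE * duration_s)   # number of samples to take
    scale = int(peak * 32767)             # largest amplitude value we will store
    return array(
        "h",  # "h" = signed short, i.e. two bytes per sample
        (round(scale * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
         for i in range(n)),
    )

tone = sample_tone(440.0, 0.01)   # 10 ms of a 440 Hz tone
print(len(tone), tone.itemsize)   # number of samples, bytes per sample
```

Each element of `tone` is one amplitude reading, and neighbouring elements are 1/44,100 of a second apart.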

Note: High fidelity systems will often save the amplitude data in floating point format for greater precision. That is beyond the scope of a primer.


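So let’s take the example of an audio mixing board. To mix two or more 16-bit audio tracks, you add together the values at the same position in each sample array. Simple addition. More audio tracks just means more arrays. The one catch is that a sum can fall outside the 16-bit range (-32,768 to 32,767), so a real mixer has to scale the tracks down or clamp the result to avoid wrap-around distortion.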
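
As a rough sketch of that idea in Python, the function below adds any number of tracks element by element. The function name `mix` is made up for illustration. Clamping the sum to the 16-bit range is an added assumption, one simple way to keep the result storable; scaling the tracks down before adding is another.

```python
from array import array

INT16_MIN, INT16_MAX = -32768, 32767

def mix(*tracks):
    """Hypothetical mixer: sum 16-bit tracks sample by sample.

    Tracks are assumed to share one sample rate. Shorter tracks are
    treated as silent once they run out. Sums are clamped to 16 bits.
    """
    length = max(len(t) for t in tracks)
    out = array("h", [0] * length)
    for i in range(length):
        total = sum(t[i] for t in tracks if i < len(t))   # simple addition
        out[i] = max(INT16_MIN, min(INT16_MAX, total))    # clamp (assumption)
    return out

vocals = array("h", [1000, -2000, 30000])
guitar = array("h", [500, 500, 10000])
print(mix(vocals, guitar).tolist())   # [1500, -1500, 32767]
```

The last element shows the clamp at work: 30000 + 10000 does not fit in 16 bits, so it is pinned at 32767.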


So with that in mind, let’s take the example of a delay effect and see how it would be implemented. Delay is a common studio effect where a sound is essentially repeated within a few milliseconds of the original signal. If you keep the delay short enough, say 15 ms, the ear cannot separate the two sounds and hears them as one. This creates a “thickening” of the signal and can help boost it in the mix. Knowing how mixing works, you can see this is just a case of copying the original signal and “mixing” it with itself after a delay. How to create the actual delay? Well, since each element in the sample array represents a fixed length of time, you can shift the copied signal over by the number of array elements corresponding to the time delay and then start adding the amplitude values together. So, for instance, if you wanted to delay a 44.1 kHz signal by 15 ms, you’d shift the copy by 661 elements (44,100 × 0.015 = 661.5, rounded down) and add the resulting array back in. In addition to boosting a track in the mix, delay is used as an audio effect in its own right. For example, if you set the delay to something like 250 ms, then you get the “slapback” echo effect made popular in the 1950s.
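
Here is a small Python sketch of the shift-and-add approach. The names `delay_samples` and `add_delay` and the `gain` parameter are illustrative assumptions. The clamp to 16 bits is also an addition, there only so the result still fits in the array.

```python
from array import array

def delay_samples(delay_ms, sample_rate=44100):
    """Convert a delay in milliseconds to a whole number of array elements."""
    return int(sample_rate * delay_ms / 1000)   # truncates, so 661.5 -> 661

def add_delay(signal, delay_ms, sample_rate=44100, gain=1.0):
    """Hypothetical delay effect: mix a signal with a shifted copy of itself."""
    shift = delay_samples(delay_ms, sample_rate)
    mixed = [0] * (len(signal) + shift)          # extra room for the echo's tail
    for i, s in enumerate(signal):
        mixed[i] += s                            # the original signal
        mixed[i + shift] += round(gain * s)      # the delayed copy
    # Clamp to the 16-bit range (assumption) before storing as short ints.
    return array("h", (max(-32768, min(32767, v)) for v in mixed))

print(delay_samples(15))    # 661
print(delay_samples(250))   # 11025

# A single "click" at a toy 1000 Hz sample rate, delayed by 2 ms (2 elements).
click = array("h", [1000, 0, 0, 0])
print(add_delay(click, 2, sample_rate=1000).tolist())   # [1000, 0, 1000, 0, 0, 0]
```

Setting `gain` below 1.0 makes the repeat quieter than the original, which is usually what you want for an audible echo.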


Now let’s look at reverb. Just as delay was based on mixing, reverb is based on delay. The reverb effect tries to simulate the sound of audio bouncing off nearby walls in a room. It makes the audio sound like it is happening in a physical space and adds significantly to the realism. This can be simulated by creating multiple delays with increasing delay lengths and decreasing volumes. But to create a realistic effect, the delays need to be arranged in a particular pattern. Specifically, they come in the form of a “comb filter”, a method of repeating a signal that causes constructive and destructive interference when it is added back to itself. The classic comb-filter-based algorithm for natural-sounding reverb is Schroeder’s algorithm, which combines several comb filters with all-pass filters.
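
To give a feel for the building block, here is a sketch of a single feedback comb filter in Python. It is not a full Schroeder reverb: that design runs several combs with different delay lengths and adds all-pass filters, all of which this sketch leaves out. The function name, the float samples, and the `feedback` value are assumptions chosen for illustration.

```python
def comb_filter(signal, delay, feedback=0.5):
    """Feedback comb filter: y[n] = x[n] + feedback * y[n - delay].

    `signal` is a list of float samples, `delay` is in array elements.
    Each repeat is fed back in, so echoes arrive at delay, 2*delay,
    3*delay, ... with the volume shrinking by `feedback` each time.
    """
    out = [0.0] * len(signal)
    for n, x in enumerate(signal):
        echo = feedback * out[n - delay] if n >= delay else 0.0
        out[n] = x + echo
    return out

# Feed in a single click and watch the decaying train of echoes.
impulse = [1.0] + [0.0] * 9
print(comb_filter(impulse, delay=3, feedback=0.5))
# [1.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.25, 0.0, 0.0, 0.125]
```

One click in produces repeats that get later and quieter, which is the "increasing delay lengths and decreasing volumes" pattern described above. A `feedback` value of 1.0 or more would make the echoes grow instead of die away, so it should stay below 1.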


Flange is also based on delay. But instead of using a fixed pattern of delays like Schroeder’s algorithm, the flange algorithm mixes the signal with a copy whose delay time varies according to a function, usually a sine wave. The sound effect you get from flange is something like a “whoosh” on top of the original signal. That whooshing sound is actually the variation in the interference patterns being created by the varying delay.
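
A bare-bones flanger might look something like the Python sketch below. Everything about it is an assumption for illustration: the function name, the float samples, the 5 ms maximum delay, the 0.5 Hz sweep rate, and the `depth` mix level. It also rounds the delay to a whole number of elements, where a more careful implementation would interpolate between samples.

```python
import math

def flange(signal, sample_rate=44100, max_delay_ms=5.0, rate_hz=0.5, depth=0.7):
    """Hypothetical flanger: mix a signal with a copy whose delay sweeps.

    The delay follows a sine wave between 0 and `max_delay_ms`,
    completing `rate_hz` sweeps per second.
    """
    max_shift = sample_rate * max_delay_ms / 1000   # longest delay, in elements
    out = []
    for n, x in enumerate(signal):
        # Sine wave rescaled from -1..1 to 0..1, so the delay is never negative.
        lfo = 0.5 * (1 + math.sin(2 * math.pi * rate_hz * n / sample_rate))
        shift = round(lfo * max_shift)              # current delay, in elements
        delayed = signal[n - shift] if n >= shift else 0.0
        out.append(x + depth * delayed)             # original plus moving copy
    return out

# One second of a 220 Hz tone as float samples, then flanged.
tone = [math.sin(2 * math.pi * 220 * n / 44100) for n in range(44100)]
flanged = flange(tone)
print(len(flanged))   # 44100
```

Because the delay keeps moving, the frequencies that cancel and reinforce keep moving too, and that sweep is what you hear as the whoosh.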