What Is Sound?

Sound is a form of energy that travels through a medium, like air, water, or solid materials, in the form of vibrations. When an object vibrates, it causes the surrounding particles to move, creating waves of pressure that our ears detect and our brains interpret as sound.

These waves are called sound waves, and they typically travel as longitudinal waves, in which particles move back and forth in the direction the wave travels. The key elements of sound include frequency (how high or low a sound is), amplitude (how loud it is), and timbre (its unique tone or quality).
Sound on the Physical Level
Sound is a vibration transmitted through a medium (air, water, or solids) as alternating regions of high and low pressure. Imagine a Slinky being compressed and released: the coils push and pull on each other, creating a chain of compression and rarefaction that travels through the medium. These longitudinal waves move at a speed set by the medium (roughly 343 m/s in air at 20 °C). In water or metal the particles are more tightly coupled, so sound travels faster. Unlike light, sound cannot travel through a vacuum: it needs material particles to carry the vibration, which is why there is no sound in the emptiness of outer space.
As sound radiates from a point source (like a speaker or drum), its energy spreads spherically. This spreading obeys the inverse-square law: doubling the distance from the source drops the sound intensity by about 6 dB (one quarter of the power). In practical terms, a sound gets quieter the farther you are from it, because the same energy is spread over an ever larger wavefront area.
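To make the arithmetic concrete, here is a minimal Python sketch of the inverse-square relationship (the function name is mine, chosen for illustration):

```python
import math

def level_change_db(distance_ratio: float) -> float:
    """Level change in dB for a point source when the listening distance
    is multiplied by distance_ratio (inverse-square law)."""
    # Intensity falls with the square of distance, so in decibels:
    # 10 * log10((d1/d2)**2) = -20 * log10(d2/d1).
    return -20.0 * math.log10(distance_ratio)

print(level_change_db(2.0))      # doubling the distance: about -6.02 dB
print(level_change_db(10.0))     # ten times the distance: -20 dB
```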
How Our Ears Hear Sound
The human ear converts sound pressure waves into nerve impulses in three stages. First, the outer ear (pinna and ear canal) funnels incoming air pressure waves to the eardrum (tympanic membrane), causing it to vibrate. These vibrations pass through the middle ear, where three tiny bones (the ossicles: malleus, incus, and stapes) amplify the motion. The stapes pushes on the fluid-filled cochlea in the inner ear, creating a traveling fluid wave along the basilar membrane.
Sensory hair cells on the basilar membrane ride this wave: as the fluid pushes the membrane up and down, the hair cells bend. Bending opens ion channels at the hair-cell tips, generating tiny electrical signals. These electrical impulses are carried by the auditory nerve to the brain, which interprets them as sound. In sum, the outer ear collects sound, the middle-ear bones amplify it, and the inner ear's hair cells transduce vibrations into neural signals.
How Sound Is Captured
A microphone is essentially a transducer that converts sound pressure waves into an analog electrical signal. Inside a mic, a thin diaphragm vibrates in response to air pressure changes. In a dynamic microphone, this diaphragm carries a small coil of wire that moves in a magnetic field, inducing a voltage (electromagnetic induction). In a condenser microphone, the diaphragm itself forms one plate of a capacitor: as it moves, the capacitance (and thus the voltage) changes. (Ribbon and piezo mics are other types, but dynamic and condenser are the most common for recording.) The microphone's tiny electrical signal usually goes through a preamplifier for level boosting.
Once a sound is converted to an analog electrical waveform, it can be recorded or processed. In modern digital recording, this analog signal is fed into an Analog-to-Digital Converter (ADC). The ADC samples the continuous waveform at high rates (e.g. 44.1 kHz) and quantizes its amplitude into binary numbers (bit depth). The result is a digital audio stream that represents the original sound. In short, a mic + preamp yields an analog voltage; an ADC then digitizes this waveform into digital audio.
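The sampling-and-quantization step can be sketched in a few lines of Python. This is only a conceptual illustration (real converters also apply anti-alias filtering and dithering), and the constants are arbitrary example choices:

```python
import numpy as np

SAMPLE_RATE = 44_100            # samples per second (the CD-quality rate)
BIT_DEPTH = 16                  # bits per sample
DURATION = 0.01                 # seconds of audio to "record"

# Stand-in for the analog waveform: a 1 kHz sine evaluated at the sample instants.
t = np.arange(int(SAMPLE_RATE * DURATION)) / SAMPLE_RATE
analog = 0.8 * np.sin(2 * np.pi * 1000 * t)          # voltage-like values in [-1, 1]

# Quantize: map [-1, 1] onto the signed integer codes a 16-bit ADC can output.
max_code = 2 ** (BIT_DEPTH - 1) - 1                  # 32767 for 16-bit audio
digital = np.round(analog * max_code).astype(np.int16)

print(digital[:8])               # the first few PCM sample values
```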
How Speakers Reproduce Sound
A loudspeaker works as the mirror image of a microphone. It takes an electrical audio signal (from an amp) and converts it back into moving air. The heart of a speaker is the driver: a voice coil (a coil of wire) attached to a flexible cone or diaphragm. The coil sits in the gap of a strong permanent magnet. When the audio current flows through the coil, it creates a changing electromagnetic field. This field alternately attracts and repels the coil against the magnet. As a result, the coil (and the cone attached to it) moves back and forth rapidly.
The moving cone pushes and pulls on the air, creating pressure waves that match the electrical waveform. When the cone moves out (forward), it compresses the air in front of it; when it moves in (back), it rarefies it. These compressed and rarefied regions propagate as sound waves. The speaker's suspension (surround and spider) keeps the cone centered and lets it move freely, but the essential parts are the coil, cone, and magnet. Good speakers are designed to reproduce the voltage waveform as accurate mechanical motion of the diaphragm, thus reconstructing the original sound pressures.
Waveform Fundamentals
A sound waveform has several key properties:
Frequency (Pitch) - The frequency is the number of cycles (oscillations) per second, measured in hertz (Hz). In sound, frequency determines pitch: higher frequency means higher pitch and vice versa. For example, 440 Hz is the A above middle C on a piano; 880 Hz (double) is one octave higher. The range of human hearing is roughly 20 Hz to 20 kHz. Lower-frequency sounds have longer wavelengths; higher-frequency sounds have shorter wavelengths. (This inverse relation follows from λ·f = speed of sound.) In practice, the frequency content of a sound is often visualized with a spectrogram or shown in a DAW's spectrum analyzer.
Amplitude (Loudness) - The amplitude is the height of the waveform, i.e. how far the pressure swings above and below the resting level. Greater amplitude means larger pressure variations and thus a louder sound. We often measure amplitude in decibels (dB) relative to a reference. Doubling the amplitude of a wave quadruples its power (about +6 dB). Amplitude is sometimes described as peak level (crest value) or RMS (an "average" that correlates better with perceived loudness). For example, in a DAW the vertical axis of a waveform display shows amplitude: higher peaks mean a louder sound.
Wavelength - The wavelength is the physical distance a wave travels during one cycle. It equals the speed of sound divided by frequency (λ = v/f). Low-frequency sound waves have long wavelengths (meters long), while high-frequency waves are short (centimeters). In a waveform diagram, you can mark wavelength as the distance between two consecutive peaks (or troughs). For instance, at 343 m/s a 343 Hz tone has a 1-meter wavelength, while a 10 kHz tone has a wavelength of about 3.4 cm. In a room, long-wavelength bass notes can bend around obstacles, whereas short-wavelength treble is more directional.
Phase - Phase describes a wave's position in its cycle at a given time. One full cycle is 360°. If two identical tones start at the same time, they are "in phase" (0° difference). If one starts half a cycle later (180° difference), one peak lines up with the other's trough. Phase itself does not change a tone's pitch or loudness, but it affects how multiple waves interact. (See the next section on cancellation.) In technical terms, phase is the shift in time of the waveform relative to a reference. For example, a sound recorded by two microphones placed at different distances will arrive at each mic at slightly different phase (time) offsets. (All four of these properties are illustrated numerically in the short code sketch below.)
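As a concrete illustration of frequency, amplitude (peak and RMS in dB), wavelength, and phase, here is a small Python sketch using NumPy; the 440 Hz tone, 0.5 amplitude, and 90° phase offset are arbitrary example values:

```python
import numpy as np

SPEED_OF_SOUND = 343.0          # m/s in air at 20 degrees C (figure from the text)
FREQ = 440.0                    # Hz, the A above middle C
SAMPLE_RATE = 48_000

t = np.arange(SAMPLE_RATE) / SAMPLE_RATE            # one second of time
phase_offset = np.pi / 2                            # 90 degrees, expressed in radians
wave = 0.5 * np.sin(2 * np.pi * FREQ * t + phase_offset)

wavelength = SPEED_OF_SOUND / FREQ                  # lambda = v / f
peak = np.max(np.abs(wave))                         # peak amplitude
rms = np.sqrt(np.mean(wave ** 2))                   # RMS amplitude
print(f"wavelength: {wavelength:.2f} m")            # about 0.78 m for 440 Hz
print(f"peak: {20 * np.log10(peak):.1f} dB re full scale")   # about -6.0 dB
print(f"RMS:  {20 * np.log10(rms):.1f} dB re full scale")    # about -9.0 dB
```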
What Is Phase Cancellation?
When two sound waves meet, they superpose (add together) point by point. This can lead to interference. In constructive interference, two in-phase waves reinforce each other: their pressures add, yielding a larger resulting wave. In destructive interference, an out-of-phase pair cancels: a crest from one wave meets a trough from the other, so the pressures subtract (and can cancel to zero). In practice, sound sources or reflections combining out of phase can cause phase cancellation, making parts of the sound quieter or changing its timbre.
A useful metaphor is overlapping ripples in a pond: if two sets of ripples crest together, the ripple height doubles (constructive); if a crest meets a depression, they flatten out (destructive). Audio engineers encounter phase cancellation when combining microphones or speaker outputs. For example, two microphones on the same instrument will pick up the sound with slight delays; if those delays correspond to opposite phases at some frequencies, the waves can partially cancel, causing comb filtering or hollow sounds. Musicians and engineers must watch for this effect.
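A short Python sketch shows the effect. Summing a sine with a copy delayed by 1 ms (an arbitrary example delay) cancels at 500 Hz, where the delay is half a cycle, and reinforces at 1 kHz, where it is a full cycle:

```python
import numpy as np

SAMPLE_RATE = 48_000
DELAY_SECONDS = 0.001                       # 1 ms between the two "microphones"
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE

def summed_peak(freq_hz: float) -> float:
    """Peak level when a sine and a copy delayed by 1 ms are mixed together."""
    direct = np.sin(2 * np.pi * freq_hz * t)
    delayed = np.sin(2 * np.pi * freq_hz * (t - DELAY_SECONDS))
    return float(np.max(np.abs(direct + delayed)))

# At 500 Hz the 1 ms delay is half a cycle (180 deg): near-total cancellation.
# At 1000 Hz it is a full cycle (360 deg): constructive, roughly double level.
print(round(summed_peak(500), 3))    # ~0.0
print(round(summed_peak(1000), 3))   # ~2.0
```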
Polarity vs. Phase – What’s the Difference?
Polarity refers to the positive/negative orientation of an electrical signal waveform, essentially a mirror flip of the waveform's sign. If you invert polarity (often via a "phase-invert" button on a mixer), every positive voltage becomes negative and vice versa: the waveform is flipped upside-down but not shifted in time. Two identical waveforms, one of positive polarity and one of negative polarity, will cancel if summed. For example, if you tap a speaker cone and measure the voltage, an inward push should give a "+" voltage; if the wiring is reversed, it instead produces a "–" voltage, indicating inverted polarity.
Phase, in contrast, implies a shift in time (degrees of the waveform cycle) between signals. A 180° phase shift of a waveform means it reaches its peak one half-cycle later than an unshifted copy. In some cases a 180° phase shift looks like a polarity inversion (for a pure sine wave, they are equivalent), but phase in general can be any shift from 0° to 360°, i.e. any fraction of a cycle. For example, if you delay one microphone's signal by 2 ms relative to another, their waveforms are phase-shifted: peaks and troughs no longer align (this is more than a simple polarity flip).
In summary, flipping polarity is an all-or-nothing sign inversion of the waveform, whereas shifting phase is moving the waveform forward or back in time relative to another. In practice, most mixer "phase" switches actually invert polarity. True phase adjustment requires variable time delay or all-pass filters. Understanding this distinction helps in troubleshooting recording issues (for instance, two mics wired with opposite polarity will cancel in mono).
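The distinction can be demonstrated in a few lines of Python. For a single sine, a polarity flip and a 180° shift are identical; for a signal containing two frequencies, a fixed delay is not a polarity flip, because the same delay corresponds to a different phase angle at each frequency (the signal values here are arbitrary examples):

```python
import numpy as np

SAMPLE_RATE = 48_000
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE

# Pure 100 Hz sine: a polarity flip and a 180-degree phase shift are identical.
tone = np.sin(2 * np.pi * 100 * t)
flipped = -tone                                      # polarity inversion
shifted = np.sin(2 * np.pi * 100 * t - np.pi)        # half-cycle (180 deg) shift
print(np.allclose(flipped, shifted))                 # True

# Two-frequency signal (100 Hz + 200 Hz): a 5 ms delay is half a cycle at
# 100 Hz but a full cycle at 200 Hz, so the delayed copy is NOT a polarity flip.
complex_sig = np.sin(2 * np.pi * 100 * t) + 0.5 * np.sin(2 * np.pi * 200 * t)
delayed = np.roll(complex_sig, SAMPLE_RATE // 200)   # shift by 5 ms (240 samples)
print(np.allclose(-complex_sig, delayed))            # False
```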
For Advanced Readers
At a deeper level, sound waves and their interactions are described mathematically by sinusoids and Fourier analysis. Any complex waveform can be represented as a sum of sinusoids (sine and cosine waves) of various frequencies, amplitudes, and phases. This decomposition is the basis of frequency-domain analysis. For example, a musical chord can be thought of as a combination of fundamental tones and overtones; a Fourier transform breaks the time-domain waveform into its frequency components. Conversely, oscillating sinusoidal components can be added to synthesize arbitrary sounds.
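As an illustration of building a complex waveform from sinusoids, the classic example is summing the odd harmonics of a fundamental (each at amplitude 1/n) to approximate a square wave. A minimal Python sketch, with an arbitrary 220 Hz fundamental:

```python
import numpy as np

SAMPLE_RATE = 48_000
FUNDAMENTAL = 220.0                         # Hz (arbitrary example pitch)
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE    # one second of time

# Fourier series of a square wave: odd harmonics 1, 3, 5, ... at amplitude 1/n.
square_approx = np.zeros_like(t)
for n in range(1, 20, 2):
    square_approx += np.sin(2 * np.pi * FUNDAMENTAL * n * t) / n
square_approx *= 4 / np.pi                  # scale so the plateaus sit near +/-1

# With only ten harmonics the result already looks square-ish; adding more
# sinusoids sharpens the approximation further.
print(square_approx[:5])
```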
Audio signals are often analyzed in the time domain (waveform amplitude vs. time) or the frequency domain (amplitude/phase vs. frequency). Digital Audio Workstations (DAWs) typically display the time-domain waveform on the track and may offer spectrum meters or plugins to view frequency content. The velocity (speed) of sound ties these domains together: λ·f = v (wave speed), so a waveform's period in time corresponds to a spatial wavelength. In practice, engineers think in both domains: e.g. an EQ curve (frequency domain) vs. an audio waveform (time domain).
Phase relationships become subtler in this view. A waveform's phase at each frequency can affect how its components interact when combined with other signals. For instance, shifting the phase of one harmonic relative to another can change the resulting wave's shape. In mixing, aligning tracks in time (phase-coherent) or using delay and crossfades can avoid unwanted comb filtering. Tools like correlation meters measure how in-phase two stereo channels are (1.0 = fully in phase, –1.0 = exactly inverted). In acoustics, measuring the impulse response (the speaker or room's time-domain reaction to a brief click) and converting it via a Fourier transform yields the frequency response.
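The correlation figure described above can be computed directly. Here is a minimal Python sketch of a normalized correlation over a whole signal (real meters work on short sliding windows; this simplification and the function name are my own):

```python
import numpy as np

SAMPLE_RATE = 48_000
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
left = np.sin(2 * np.pi * 440 * t)

def correlation(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized correlation of two channels: +1 in phase, -1 inverted."""
    return float(np.sum(a * b) / np.sqrt(np.sum(a ** 2) * np.sum(b ** 2)))

print(round(correlation(left, left), 3))             # 1.0: identical channels
print(round(correlation(left, -left), 3))            # -1.0: polarity inverted
quarter_cycle = SAMPLE_RATE // 440 // 4              # roughly a 90-degree shift
print(round(correlation(left, np.roll(left, quarter_cycle)), 2))   # close to 0
```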
Very Technical Concepts
At the most technical level, signal-processing concepts like Fourier transforms, impulse responses, and transfer functions come into play. The Fourier transform is a mathematical tool that converts a time-domain waveform into a frequency-domain representation. In essence, it answers: how much of each frequency is present in this signal? For example, taking the Fourier transform of a recorded musical note will reveal peaks at the note's fundamental frequency and its harmonics, corresponding to the musical pitch spectrum.
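Here is a small Python sketch of that analysis using NumPy's FFT. The "note" is synthetic (a 220 Hz fundamental plus two weaker harmonics, chosen for illustration), and the three largest spectral peaks land exactly on those frequencies:

```python
import numpy as np

SAMPLE_RATE = 48_000
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE    # one second of signal

# A synthetic "note": a 220 Hz fundamental plus two weaker harmonics.
note = (1.0 * np.sin(2 * np.pi * 220 * t)
        + 0.5 * np.sin(2 * np.pi * 440 * t)
        + 0.25 * np.sin(2 * np.pi * 660 * t))

spectrum = np.abs(np.fft.rfft(note))        # magnitude spectrum, 1 Hz per bin here

# The three largest bins sit at the fundamental and its harmonics; with 1 Hz
# bin spacing the bin index equals the frequency in Hz.
loudest_bins = np.argsort(spectrum)[-3:]
print(np.sort(loudest_bins))                # [220 440 660]
```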
The impulse response (IR) of an audio system is its output when excited by a very short, idealized "impulse" (a delta function). For a linear, time-invariant (LTI) system, the impulse response fully characterizes the system. In practice, an impulse response can be measured (e.g. by playing a starter pistol or sine sweep through speakers and recording the result) to capture how a speaker or room responds over time. Convolution of an input signal with the IR simulates how that signal would be colored by the system (a time-domain approach). Equivalently, the transfer function (the Laplace or Fourier transform of the IR) describes the system's frequency response. In audio, engineers often view speaker or microphone behavior as a transfer function: a frequency-dependent gain and phase shift (for example, a speaker cone may attenuate low bass or have resonant peaks).
Time-domain vs. frequency-domain: these are two lenses on the same data. The frequency response (magnitude vs. frequency) shows how loud each sine-tone input would sound at the output. By contrast, the impulse response (amplitude vs. time) shows how the system reacts over time to a quick transient. Both are equivalent descriptions: one emphasizes when the output occurs, the other which frequencies are emphasized. A flat frequency response means the system reproduces all frequencies equally (ideal for high-fidelity audio).
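The equivalence can be sketched with a toy impulse response: a direct path plus a single echo 5 ms later (an artificial example). Convolving a dry signal with the IR applies it in the time domain; taking the FFT of the IR gives the frequency response, which shows the comb-filter boosts and dips the echo creates:

```python
import numpy as np

SAMPLE_RATE = 48_000

# Toy impulse response: the direct sound plus one quieter echo 5 ms later.
ir = np.zeros(SAMPLE_RATE // 100)                   # 10 ms of impulse response
ir[0] = 1.0                                         # direct path
ir[SAMPLE_RATE // 200] = 0.5                        # echo at 5 ms, at half level

# Time-domain view: convolving a dry signal with the IR applies the "room" to it.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
dry = np.sin(2 * np.pi * 440 * t)
wet = np.convolve(dry, ir)                          # the dry tone plus its echo

# Frequency-domain view: the FFT of the IR is the system's frequency response.
response = np.abs(np.fft.rfft(ir, n=SAMPLE_RATE))   # 1 Hz per bin at this length
print(response[200])    # 200 Hz: echo lands in phase, boost (about 1.5)
print(response[100])    # 100 Hz: echo lands out of phase, dip (about 0.5)
```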
Finally, audio engineers often test polarity and phase in the studio. A common test uses a polarity-check device (e.g. the "Cricket" by Galaxy Audio) or even a sharp hand clap. By sending a known positive-pressure pulse into a microphone and observing the recorded polarity, one can verify wiring and preamp polarity. When mixing multiple mics, engineers check correlation meters or simply flip a channel's polarity while listening: if inverting one mic makes the combined sound null (e.g. the bass disappears), the mic was wired out of polarity relative to the others. In digital systems, the same principle applies – there are "phase invert" switches to flip signal polarity. These technical checks ensure that when signals are combined, or when a signal is sent to multiple speakers, the waveforms add constructively instead of canceling due to polarity/phase mismatches.
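The null test itself is easy to simulate. In this Python sketch (signal values are arbitrary), a track summed with a polarity-flipped copy of itself cancels exactly, while a copy that merely arrives 0.5 ms later leaves a clear residue:

```python
import numpy as np

SAMPLE_RATE = 48_000
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
mic_a = np.sin(2 * np.pi * 110 * t) + 0.3 * np.sin(2 * np.pi * 220 * t)

# Null test: a signal plus its polarity-inverted copy cancels completely.
null = mic_a + (-mic_a)
print(np.max(np.abs(null)))                    # 0.0, a perfect null

# A second mic slightly farther away is delayed, not inverted: flipping its
# polarity and summing does NOT null, so some signal remains.
mic_b = np.roll(mic_a, 24)                     # arrives 0.5 ms later
print(np.max(np.abs(mic_a - mic_b)) > 0.1)     # True: clearly audible residue
```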