Before diving into the intricacies of how to determine pitch with an algorithm, it is important to understand what pitch is and what it is not. Frequency and pitch are often used interchangeably, but there are some subtle differences between the two.
Frequency is an unambiguous measurement that can be read directly from a source. Frequency isn’t subjective: it is the number of oscillations per second of a sound source, measured in hertz (Hz). Frequency doesn’t change with amplitude (when the volume is turned up on a speaker, the frequency isn’t shifted), nor with a listener’s position in a room. Frequency is purely analytical and, on its own, says little about what a listener actually hears.
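To make the amplitude point concrete, here is a minimal sketch that estimates the frequency of a synthetic tone at two different volumes. It assumes a pure sine wave and uses a simple zero-crossing count, chosen only for illustration; the helper names `sine_wave` and `estimate_frequency`, the 440 Hz tone, and the 44,100 Hz sample rate are all hypothetical choices, not something taken from a particular library or from the algorithm discussed later.

```python
import math

def sine_wave(freq_hz, amplitude, sample_rate=44100, duration_s=1.0):
    """Return the samples of a pure sine tone (hypothetical helper)."""
    n = int(sample_rate * duration_s)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]

def estimate_frequency(samples, sample_rate=44100):
    """Estimate frequency in Hz from the spacing of upward zero crossings.

    Only suitable for clean, single-tone signals; real recordings
    usually need something more robust.
    """
    # Indices where the signal goes from negative to non-negative.
    crossings = [i for i in range(1, len(samples))
                 if samples[i - 1] < 0 <= samples[i]]
    if len(crossings) < 2:
        raise ValueError("need at least two upward zero crossings")
    cycles = len(crossings) - 1                       # full cycles between first and last crossing
    elapsed_s = (crossings[-1] - crossings[0]) / sample_rate
    return cycles / elapsed_s

# Same tone, two very different amplitudes ("volume").
quiet = estimate_frequency(sine_wave(440.0, amplitude=0.1))
loud = estimate_frequency(sine_wave(440.0, amplitude=1.0))
print(f"quiet: {quiet:.2f} Hz, loud: {loud:.2f} Hz")
```

Both estimates should land very close to 440 Hz and match each other, because scaling a signal changes its height but not where it crosses zero.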
Pitch, on the other hand, is the human perception of frequency. Pitch is subjective, and can change from person to person as well as from day to day for the same person. In other words, the actual frequency of a source might not change, but one person may hear the sound as an entirely different pitch than another person does. Pitch can change with amplitude (when the volume is turned up, listeners will often hear the pitch differently than they do at a more normal level). Pitch can also change with a listener’s position relative to the source: a listener standing far away from a speaker may hear the pitch differently than if they were standing right next to it. As such, pitch is a much less precise, much more intuitive descriptor than frequency.
The image above shows the relationship between perceived pitch and measured frequency over the average human hearing range (from 20 Hz to 20,000 Hz). It shows that the relationship between pitch and frequency is not linear. At low frequencies humans are typically better at estimating pitch, while at high frequencies their estimate of pitch strays much further from the measured frequency.
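One way to get a feel for this non-linearity is the mel scale, a widely used approximation of perceived pitch. The sketch below assumes the common formula mel = 2595 · log10(1 + f / 700); that choice is an assumption for illustration, and the image may well be based on a different scale. The function name `hz_to_mel` is hypothetical.

```python
import math

def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels using one common approximation."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

# Compare the perceived size of the same 100 Hz step at low and high frequencies.
low_step = hz_to_mel(200.0) - hz_to_mel(100.0)
high_step = hz_to_mel(10100.0) - hz_to_mel(10000.0)

print(f"100 Hz -> 200 Hz:       {low_step:.1f} mels")
print(f"10,000 Hz -> 10,100 Hz: {high_step:.1f} mels")
```

Under this approximation the 100 Hz step near the bottom of the range covers far more mels than the same step near 10,000 Hz, which is the kind of curvature the plot describes.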
–Scott Newton: March 26, 2017