JackJohnston wrote on Tue, 12 October 2004 03:24 |
I was originally confused by Nyquist theory because it's natural to assume that if the frequency of a sample can be represented, meaning that the actual frequency of the sound is maintained, then the sampled sound must sound the same as the original. Which, of course, is not necessarily the case.
My questions would be:
How many samples are used to represent a 20 kHz sound at a 44.1 kHz sample rate?
Does that number of samples accurately describe all of the characteristics of that 20 kHz sound in its original continuous form?
Does it seem likely that in a good listening environment, most people could hear the difference?
Thanks,
Jack Johnston jackjohnston99@hotmail.com
|
Hello JJ
You said:
I was originally confused by Nyquist theory because it's natural to assume that if the frequency of a sample can be represented, meaning that the actual frequency of the sound is maintained, then the sampled sound must sound the same as the original. Which, of course, is not necessarily the case.
Well, it is not the same, but it is halfway there. The difference between the sampled (digitized) waveform and the original (the analog) is very well defined, and it has one nice characteristic: the difference is made entirely of high-frequency energy. In other words, the part of the energy that makes up the signal is under Nyquist (0-22.05 kHz for 44.1 kHz sampling), and the error signal (the difference between the analog waveform and its sampled representation) resides above Nyquist (22.05 kHz to 44.1 kHz).
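To see that split concretely, here is a minimal numpy sketch, assuming ideal impulse-train sampling (a real DA outputs a staircase, which adds a small in-band droop on top of this picture):

```python
import numpy as np

fs = 44100                # audio sample rate
f0 = 20000                # 20 kHz test tone
L = 8                     # oversampling factor for the "analog" simulation
sim_fs = L * fs

t = np.arange(sim_fs) / sim_fs            # 1 second at the simulation rate
analog = np.sin(2 * np.pi * f0 * t)       # stand-in for the analog waveform

# ideal impulse-train sampling: keep every L-th point, zero the rest
# (scaled by L so the baseband copy keeps the original level)
sampled = np.zeros_like(analog)
sampled[::L] = L * analog[::L]

# spectrum of the error signal (sampled minus analog)
err_spec = np.abs(np.fft.rfft(sampled - analog))
freqs = np.fft.rfftfreq(len(analog), 1 / sim_fs)

below = err_spec[freqs < fs / 2].sum()    # error energy under Nyquist (22.05 kHz)
above = err_spec[freqs >= fs / 2].sum()   # error energy above Nyquist
print(below / above)                      # ~0: the error is all high-frequency
```

The images land at multiples of 44.1 kHz plus or minus 20 kHz (24.1 kHz, 64.1 kHz, and so on), which is exactly the energy the analog filter is there to remove.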
So that sampled wave (just binary numbers) is converted to "voltage steps" at, say, 44.1 kHz. We are now in the hardware world. We run the voltage steps through an analog filter that removes the high frequencies, and those high frequencies are exactly the difference between the original analog wave and the sampled one. Did we say "remove the difference"? I guess we did. So with no difference, we end up with the original.
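Here is a digital stand-in for that analog anti-imaging filter, continuing the same setup (scipy's firwin/filtfilt are just a sketch of what the analog hardware does):

```python
import numpy as np
from scipy.signal import firwin, filtfilt

fs, f0, L = 44100, 20000, 8
sim_fs = L * fs
t = np.arange(sim_fs) / sim_fs
analog = np.sin(2 * np.pi * f0 * t)       # the "analog" wave, as above
sampled = np.zeros_like(analog)
sampled[::L] = L * analog[::L]            # ideally sampled version

# low-pass at Nyquist (22.05 kHz): the anti-imaging filter
taps = firwin(1001, fs / 2, fs=sim_fs)
reconstructed = filtfilt(taps, [1.0], sampled)

# compare away from the edges, where the filter lacks signal history
err = np.max(np.abs(reconstructed[2000:-2000] - analog[2000:-2000]))
print(err)   # small: after filtering, the sampled and "analog" waves agree closely
```

A longer (sharper) filter pushes the error down further; in the limit of an ideal brick-wall filter the reconstruction is exact.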
The confusion is often due to the difference between two statements:
1. “The sampled wave contains all the information”
2. “The sampled wave is all the information”.
1. Is the correct one.
2. Is missing something (a filter)
How many samples are used to represent a 20 kHz sound at a 44.1 kHz sample rate?
Infinity. I see what you are getting at. Well, if we agree on some performance goal, we can figure it out. I have not analyzed it to the point of being able to answer you with numbers (so-and-so many samples for such-and-such maximum deviation). It is a day's work, and I am busy (maybe after the AES).
But I was wondering myself about the "startup" of a DA. Well, if you have an anti-imaging filter (an analog low-pass filter after the DA, passing audio and blocking energy above Nyquist), and you apply 1 second of digital black, you will be doing fine. But that takes 44,100 samples. If you are after, say, 0.001% maximum deviation, I bet it takes fewer samples…
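One way to put a rough number on that with the digital stand-in from above: a FIR filter's output is exactly zero once its whole impulse response has seen digital black, so settling is bounded by the filter length. A sketch, assuming a hypothetical 1001-tap anti-imaging filter at 8x oversampling:

```python
import numpy as np
from scipy.signal import firwin, lfilter

sim_fs = 8 * 44100
taps = firwin(1001, 22050, fs=sim_fs)    # hypothetical anti-imaging low-pass

rng = np.random.default_rng(0)
x = np.concatenate([rng.standard_normal(2000), np.zeros(6000)])  # signal, then digital black
y = lfilter(taps, [1.0], x)

# last output sample still above 0.001% of peak, counted from the start of the black
above = np.nonzero(np.abs(y[2000:]) > 1e-5 * np.abs(y).max())[0]
print(above[-1] + 1 if above.size else 0)  # at most 1001 samples, i.e. a few milliseconds
```

So for that particular made-up filter, 0.001% settling takes on the order of a thousand samples at the oversampled rate, far less than a full second of digital black.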
Your question might interest a lot of people, from data compression to ear research, wavelets and more. It is not an issue as far as making converters goes. From a practical standpoint, for converters, the data is an ongoing steady flow: "it just goes forever". Technically speaking, 1 second is a long time; it is pretty close to "forever".
Does that number of samples accurately describe all of the characteristics of that 20 kHz sound in its original continuous form?
Well, the number of samples is half the story. You also need the value of each sample (think of it as an XY plot). With that information one can plot a "staircase"-like waveform, which is one filter away from being a duplicate of the original.
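To make those XY pairs concrete: at 44.1 kHz, a 20 kHz tone gets only 44100/20000 ≈ 2.2 samples per cycle, so the plotted points look nothing like a sine, yet by the argument above they are one filter away from it. A tiny sketch:

```python
import numpy as np

fs, f0 = 44100, 20000
n = np.arange(8)                           # first eight sample instants
for ti, vi in zip(n / fs, np.sin(2 * np.pi * f0 * n / fs)):
    print(f"{ti * 1e6:7.2f} us  {vi:+.3f}")   # ~2.2 samples per cycle of the tone
```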
Does it seem likely that in a good listening environment, most people could hear the difference?
I do not know all the answers. I know very little about ear-brain behavior. But once I get some answers from you, we can proceed. For example:
120 dB dynamic range is very good for the ear-brain combination
0-20 kHz audio frequency range (or 0-30 kHz, or up to 40 kHz…)
With such information, my job is well defined. I can now concentrate on making "waveform in" the same as "waveform out". Personally, my job is mostly about "electrical wave in" and "electrical wave out". Microphone people are about "air motion in" and "electrical wave out". Speaker people do the opposite. Are there any obstacles between picking up the musical air vibrations at the performance space and regenerating the same into your ears? Of course. There is a whole list (starting with room acoustics…)
If I can restrict the answer to your question to the span from "vibration of a mic membrane" to "vibration of a speaker cone", I would say that the idea is to make the difference very small. We hardware people make the difference small until the ear research people tell us it is small enough.
In theory, if we respect the Nyquist rule and go for enough bits, there is no difference within the specified boundaries of the agreed bandwidth and dynamic range.
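On "enough bits": using the standard ideal-quantization figure of roughly 6.02 x N + 1.76 dB of dynamic range for N bits (a textbook formula, not something from this thread), the 120 dB example above works out to about 20 bits:

```python
# ideal N-bit quantization: dynamic range ~ 6.02*N + 1.76 dB
# (full-scale sine versus quantization noise)
N = (120 - 1.76) / 6.02
print(N)   # ~19.6, so about 20 bits covers a 120 dB ear/brain target
```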
BR
Dan Lavry