Let me try to clarify a few things for you...
You can't have more 'bits per decibel'. Put simplistically: 1 bit = 6dB, and you can't change that. Each extra bit of resolution doubles the peak signal relative to the noise floor, which is 6dB.
Also, I think you're getting slightly confused about the specifics of 'dB'. Remember, dB is a measure of one signal compared to another, not an absolute measure in itself.
Remember, hearing/sound levels are in dB relative to the threshold of hearing, i.e. 0dB is where you start hearing things, so 0dB in this case is the LOWEST signal level. 120dB is pretty much the limit before you blow out your ears. Dynamic range is the difference between the lowest and highest signal levels, so in this case it is 120dB (largest signal referenced to the smallest - remember, dB always has a reference).
For digital signal levels, the usual reference is the full-scale signal, so this is 0dB, more fully written as 0dBFS (indicating that it is relative to full scale). Also, for pure digital signals the smallest signal is basically the same as the noise floor, and is roughly a single bit in size. So, for 16-bit audio, the lowest signal level is about -96dBFS, giving a dynamic range of 96dB. For 24-bit audio the smallest signal is about -144dBFS, giving a dynamic range of 144dB.
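As a quick sanity check on those numbers (a sketch of my own - the exact figure is about 6.02dB per bit, which is why everyone rounds to 6):

```python
import math

def dynamic_range_db(bits: int) -> float:
    """Dynamic range of an n-bit quantiser: 20*log10(2^n), about 6.02 dB per bit."""
    return 20 * math.log10(2 ** bits)

print(round(dynamic_range_db(16), 1))  # 96.3
print(round(dynamic_range_db(24), 1))  # 144.5
```

So 96dB and 144dB are just the rounded versions of these.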
Between 16- and 24-bit, dynamic range is not really simply 'extended up' or 'extended down'. For real signals that all depends on what digital level, in dBFS, you equate with normal signal level (in dBu, sound dB, or whatever). The main point is that 16-bit does not cover the dynamic range of the ear, but 24-bit does. Which is why in storage, media, converters, etc. you should never need more than 24-bits (internal digital processing is a separate issue).
With 16 bits you need to somehow fit the full ear range into just 96dB. This can be done by limiting the maximum signal level to the equivalent of 96dB (sound) and keeping the noise level the same, or by accepting a noise level of 24dB (sound, i.e. 120dB - 96dB), or (more usually) a compromise between the two.
With the 24-bit format, you don't need to make those compromises when relating SPL to dBFS.
However, in all this remember that real analog electronics and converters do not themselves have a full 144dB dynamic range. You'll be paying a lot of money for an A/D with 120dB dynamic range. I don't know the specifics of microphones, but I understand they're nowhere near that level.
In your last post you ask whether 16-bit and 24-bit capture the 'same range'. I think you're a bit confused about the notion of the range. If you're referring to the ear's dynamic range (120dB-ish), then 16-bit doesn't capture this range, only 96dB of it - so you lose the highest-level signals (they either get compressed, or clipped before the A/D), and/or the lowest-level signals (they get lost in the noise floor of the A/D). With 24-bit you capture it all. This includes capturing the noise of your input devices, as they could be noisier than the noise the A/D introduces - all of which are noisier than the inherent noise floor of 24-bit.
Real A/Ds are at best equivalent to about 20bits, and real audio signals a lot less than this.
So say you have a signal in which the noise level is -86dBFS (or the equivalent in analog levels), and your A/D is accurate down to -110dBFS; then the signal noise is 24dB above the A/D noise. You basically have about 24dB (4 bits) more real 'accuracy' in the A/D than you need. These 4 bits are the sampled signal noise. Then there are still 5-6 bits of A/D-generated noise (the remaining range from -110dBFS down to -144dBFS).
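The same arithmetic as a small sketch (the dBFS figures are just the example numbers above, not measurements; 1 bit ≈ 6.02dB):

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB per bit

def db_to_bits(db: float) -> float:
    """Convert a level difference in dB to an equivalent number of bits."""
    return db / DB_PER_BIT

# Signal noise floor at -86dBFS, A/D accurate down to -110dBFS:
print(round(db_to_bits(-86 - -110), 1))   # 4.0 -> ~4 bits of spare accuracy
# A/D noise fills the rest of the 24-bit range, -110dBFS down to -144dBFS:
print(round(db_to_bits(-110 - -144), 1))  # 5.6 -> the remaining 5-6 bits
```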
Also, comparing sampling in digital audio to digitised pictures and monitor screens is not a good way to look at it. They are really too dissimilar. Let me just demonstrate with your first post... On a screen, if you go to a larger monitor and increase the number of pixels, you're keeping the dpi the same. BUT, the way you actually look at a computer screen, what matters for 'smoothing' (so you can't see the pixels) is the angle subtended at your eye going from one pixel to the next. In other words, what are the 'dots per degree' as you look at the screen? You can get more dots per degree by moving away from the screen. But then the image gets smaller, so you make the screen bigger but keep the dpi the same. However, this has nothing to do with audio, and has no analogies with it.
Also, comparing image pixel smoothing with audio smoothing and detail is way off base. If you think of pixels as spatial 'samples' of the 'ideal' text font you're trying to read, then the sample 'rate' is far below the 'Nyquist' rate (spacing of samples in relation to the character feature detail size/scale). Fortunately, computer screens don't actually sample an ideal font; they generate a font by colouring pixels so it kind of looks OK. To see what text would really look like if you actually sampled a real font at that rate, think of scanning regular text at 100dpi, or printing at 100dpi. Yuk! The thing with audio is that it IS sampled at or above the Nyquist rate for the signals it is sampling (normally at least - otherwise you get that horrible metallic sound of aliasing), so it is already perfectly smoothed. And this has NOTHING to do with bit depth. Bit depth affects the noise level (at the risk of making an image comparison: how much 'fuzz' or 'snow' there is in the picture - more comparable to colour/grey-level depth - this equates image spatial dimensions to time in audio, and colour/grey level to voltage - but even this is not a direct comparison).
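To put a number on the aliasing point (my own sketch, nothing from your posts): sample a tone above the Nyquist rate and the samples come out identical to those of a lower-frequency tone - the converter literally cannot tell them apart.

```python
import math

fs = 8000  # sample rate in Hz; Nyquist limit is fs / 2 = 4000 Hz

# A 7000 Hz tone is above Nyquist; it aliases down to 8000 - 7000 = 1000 Hz.
tone_7k = [math.cos(2 * math.pi * 7000 * n / fs) for n in range(16)]
tone_1k = [math.cos(2 * math.pi * 1000 * n / fs) for n in range(16)]

# Sample for sample the two are indistinguishable - the 7 kHz content is gone:
print(all(abs(a - b) < 1e-9 for a, b in zip(tone_7k, tone_1k)))  # True
```

That false 1 kHz tone is the 'horrible metallic sound' - which is why converters filter out everything above Nyquist before sampling.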
BTW: One of the best explainers on dither uses image colour/grey-scale dithering (it's conceptually the same) to show how it works and how it relates to bit depth. Note that they used the highest resolution possible so that the pixel aspect is removed as far as possible (as it is in audio).
Here's a link to Wikipedia with something like what I saw:
http://en.wikipedia.org/wiki/Dithering
Look at the "Digital photography and image processing" section, NOT the audio section. The problem with the audio diagrams is that they can't really get across the notion of what you actually hear. But the images can.
Figure 1 = original (high bit depth, 'c.f 24-bit')
Figures 2 and 5 = truncated bit depth (c.f. 16-bit truncated) (note the lack of detail in areas where the image/signal level is close to the change produced by a single bit).
Figures 3 and 6 = dithered reduced bit depth (c.f. 16-bit from 24-bit with dither first). Note that there are no more 'sample levels' than in Figure 2, but as you go across areas where the image is changing slowly, over a small area of pixels the ratio of the numbers of pixels at the two sample levels changes. As your eye averages together dots that are very close together, you see much more overall detail than in Figure 2 - not as much as in Figure 1, but that's because the resolution/noise floor of our eyes is better than the level of dither.
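Here's a toy numeric version of what those figures show (my own sketch, assuming a one-step quantiser and TPDF dither - not the exact method used for the Wikipedia images):

```python
import random

random.seed(0)
STEP = 1.0  # quantiser step size, i.e. one 'bit' of the reduced depth

def truncate(x):
    """Quantise with no dither."""
    return STEP * round(x / STEP)

def dithered(x):
    """Add triangular (TPDF) dither of +/- one step, then quantise."""
    d = (random.random() - random.random()) * STEP
    return STEP * round((x + d) / STEP)

# A slow ramp whose total change is less than one quantiser step:
ramp = [0.4 * i / 1000 for i in range(1000)]

# Undithered, every sample lands on the same level - the ramp vanishes:
print(sum(truncate(x) for x in ramp) / 1000)  # 0.0
# Dithered, averaging nearby samples recovers the true mean (about 0.2):
print(abs(sum(dithered(x) for x in ramp) / 1000 - 0.2) < 0.1)  # True
```

The averaging your eye does across nearby dots is what the sum does here - the detail survives in the *ratio* of levels, just as in Figures 3 and 6.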
If you screw up your eyes (effectively raising the noise floor of your vision), you get to a point where you cannot distinguish between Figures 1 and 3/6, but can still see the errors in Figures 2/5.
Hope this helps.
Graham