A 48 kHz, 128x oversampling converter actually performs the A/D conversion at 6.144 MHz (48 kHz × 128). This lets the first analogue filter sit far above the audible band and be gentle enough that it introduces effectively no ripple or phase shift within the audible band.
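Here is a quick check of the arithmetic as a Python sketch. It assumes 20 kHz as the top of the audible band, which the post does not state. It also considers only the lowest input frequency that would fold back into 0-20 kHz and treats everything else as the job of the later digital filters. The helper names are hypothetical.

```python
import math

AUDIBLE_TOP_HZ = 20_000  # assumed upper edge of the band we want to protect


def first_alias_hz(sample_rate_hz: float, band_top_hz: float = AUDIBLE_TOP_HZ) -> float:
    """Lowest input frequency that folds back into 0..band_top_hz at this sample rate."""
    return sample_rate_hz - band_top_hz


def transition_octaves(sample_rate_hz: float, band_top_hz: float = AUDIBLE_TOP_HZ) -> float:
    """Octaves the analogue filter has to roll off between band_top_hz and the first alias."""
    return math.log2(first_alias_hz(sample_rate_hz, band_top_hz) / band_top_hz)


output_rate = 48_000
ratio = 128
modulator_rate = output_rate * ratio  # rate at which the A/D conversion itself runs

for name, fs in [("direct 48 kHz", output_rate), ("128x oversampled", modulator_rate)]:
    print(f"{name}: fs = {fs} Hz, first alias at {first_alias_hz(fs):.0f} Hz, "
          f"{transition_octaves(fs):.2f} octaves of transition band")
```

Under these assumptions, sampling directly at 48 kHz leaves the analogue filter less than half an octave (20 kHz to 28 kHz) to roll off. At 6.144 MHz it has more than eight octaves, so a gentle, low-order filter is enough.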
The subsequent downsampling (which reduces the sample rate but increases the word length) includes additional filters, but because these are digital they can be made linear phase. There is a cost/performance trade-off here: better digital filters need more taps, and therefore more silicon area and more cost. That is why some manufacturers of high-end ADC boxes use a converter chip that lets the sample stream be read while it is still, say, 16x oversampled, and then perform the final decimation stages in their own DSP code.
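The following minimal Python sketch of a single divide-by-2 decimation stage illustrates the linear-phase and taps-versus-cost points. It assumes a Hamming-windowed-sinc FIR design. That is one common textbook choice and not necessarily what any particular converter uses. All function names and the 0.22 cutoff are hypothetical. Symmetric taps give linear phase, and each kept output sample costs one multiply-accumulate per tap.

```python
import math


def windowed_sinc_lowpass(num_taps: int, cutoff: float) -> list[float]:
    """Linear-phase FIR low-pass: Hamming-windowed sinc with symmetric taps.

    cutoff is a fraction of the *input* sample rate (0 < cutoff < 0.5).
    """
    if num_taps < 3 or num_taps % 2 == 0:
        raise ValueError("use an odd tap count of at least 3 so there is a centre tap")
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid  # distance from the centre tap, symmetric around 0
        if x == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        # Hamming window written around the centre, so taps[n] == taps[N-1-n]
        window = 0.54 + 0.46 * math.cos(2 * math.pi * x / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalise for unity gain at DC


def gain_db(taps: list[float], freq: float) -> float:
    """Magnitude response in dB at freq (fraction of the input sample rate)."""
    re = sum(t * math.cos(2 * math.pi * freq * n) for n, t in enumerate(taps))
    im = -sum(t * math.sin(2 * math.pi * freq * n) for n, t in enumerate(taps))
    return 20 * math.log10(max(math.hypot(re, im), 1e-300))


def fir_decimate(signal: list[float], taps: list[float], factor: int) -> list[float]:
    """Filter and keep every factor-th sample, computing only the outputs that are kept."""
    out = []
    for i in range(0, len(signal), factor):
        acc = 0.0
        for k, t in enumerate(taps):  # one multiply-accumulate per tap
            j = i - k
            if 0 <= j < len(signal):
                acc += t * signal[j]
        out.append(acc)
    return out


cutoff = 0.22  # just below the new Nyquist (0.25 of the input rate) for a divide-by-2 stage
for n_taps in (15, 63):
    taps = windowed_sinc_lowpass(n_taps, cutoff)
    print(f"{n_taps} taps: {n_taps} MACs per output, "
          f"gain at 0.30 of input rate = {gain_db(taps, 0.30):.1f} dB")

decimated = fir_decimate([1.0] * 256, windowed_sinc_lowpass(63, cutoff), 2)
print(len(decimated), round(decimated[-1], 6))
```

After the divide-by-2, anything at 0.30 of the input rate aliases to 0.20 of the input rate, which falls inside the retained passband. The stopband gain there is what matters. The 63-tap filter rejects it far more strongly than the 15-tap one, at roughly four times the arithmetic per output sample. A real 128x chain would cascade several such stages.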