I can see how you've come to this conclusion, but once again, you're back to the problem of real-world usage. Just because you've run every track through a given piece of gear doesn't mean you'll have any understanding of how that piece of gear will react under different circumstances.
Perhaps the only fair way to do comparative samples would be to take a piece of gear, have several engineers (maybe ten or more) use it throughout a production (perhaps a mix of the same song), and have them take careful notes on its usage.
Then have ten other engineers do a mix of the same song with a different piece of gear, and take extensive notes.
After that, swap the gear between the two groups, and have everyone do a second mix using the other piece.
Once all that is complete, publish the notes along with full-resolution audio files.
So I guess it could be done. It's a logistical nightmare, but it could be done.
Frankly, I'd rather make gear decisions based upon my own experience, and the experiences of those I trust. And if a manufacturer or dealer really wants my business, they'll let me test-drive new or unproven gear.