To reply to a lot of thoughts here:
The room is the most important factor, or at least should be the first component addressed. The perfect instrument or speaker in a lousy room will sound lousy.
A perfect speaker would play the complete dynamic and frequency palette of all known sounds, with absolute phase delivery. Since this is unlikely to be built, we have to, IMHO, start with achievable minimums on a reasonable production system: frequency response from 20 Hz to 20 kHz, headroom of 20 dB above our target average, reasonably flat frequency and phase response, and reasonably flat off-axis response. Amps, crossovers, drivers, cabinets, etc. are simply parts of the monitor system, all critical.
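To put the 20 dB headroom figure in perspective, here is a rough back-of-the-envelope sketch of what it asks of the amp and drivers. The 85 dB SPL target average is purely my assumption for illustration; plug in whatever level you actually mix at. The function names are made up for this example.

```python
# Rough sketch: what 20 dB of headroom means in power and amplitude terms.
# TARGET_AVG_SPL is an assumed value for illustration, not a standard
# taken from this thread.

def db_to_power_ratio(db):
    """Convert a level difference in dB to a power ratio."""
    return 10 ** (db / 10)

def db_to_amplitude_ratio(db):
    """Convert a level difference in dB to a voltage/pressure ratio."""
    return 10 ** (db / 20)

HEADROOM_DB = 20.0      # headroom above the target average
TARGET_AVG_SPL = 85.0   # ASSUMPTION: average listening level in dB SPL

peak_spl = TARGET_AVG_SPL + HEADROOM_DB
power_ratio = db_to_power_ratio(HEADROOM_DB)
amplitude_ratio = db_to_amplitude_ratio(HEADROOM_DB)

print(f"Peak level the system must reproduce cleanly: {peak_spl:.0f} dB SPL")
print(f"Amplifier power needed relative to average: {power_ratio:.0f}x")
print(f"Voltage / sound pressure relative to average: {amplitude_ratio:.0f}x")
```

In other words, whatever power the system uses at the average level, it needs about a hundred times that available for peaks, which is why small monitors run out of steam long before they run out of frequency response.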
A proper monitor system is not a matter of taste; treating it as one is what gets us in trouble and produces an inconsistent product. Even though we can train our ear/brain mechanism to think through monitor system deficiencies, wouldn't our jobs be easier and our product more consistent if monitor systems met reasonable standards and were consistent from room to room?
Yes, to the OP, current monitoring practice is limiting the quality of our music delivery systems. A pair of 1031s (as good as they are), to choose a new whipping boy, does not reflect original acoustic events, mixed multi-mono sounds, or typical home speakers well.
As is proper, most mastering engineers address this reasonably for their facilities. The subject is vital for the survival of professional recording studios: each has to provide a proper reference monitor system for tracking and mixing so the studio can be an alternative to the garage/living room/office. Only with this component addressed reasonably can the very concept of the professional recording studio survive.