Here's another example. Suppose someone shot the following targets during load development (3 shots at each charge level, systematically increasing the charge by 0.2 gr each time). Which should they pick as the 'best' load?

I expect most people would pick the third target in the middle row (load 6).
However, I generated these by first creating a pool of 1000 simulated shots at each charge level. I assigned a mean group size to each charge level, as follows:
2 MOA, 1.5 MOA, 1 MOA, 1.5 MOA, 0.75 MOA, 1 MOA, 0.5 MOA, 1.75 MOA, 2 MOA (reading left to right, row by row). I then randomly chose 3 shots from each pool.
This simulates trying to find real underlying differences in accuracy using 3-shot groups. Most people would choose load 6 (a 1 MOA load), when in fact the most accurate load is load 7 (the 0.5 MOA load).
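For anyone who wants to try this themselves, here is a rough Python sketch of that kind of simulation. It is not the code I used for the targets above, and it rests on several assumptions of its own: shots scatter as a circular normal distribution, group size is measured as extreme spread, and `SIGMA_PER_MOA` is an arbitrary scale factor chosen purely for illustration. The function names are made up for this example.

```python
import itertools
import math
import random

# Nominal sizes (MOA) from the post, reading left to right, row by row.
NOMINAL_MOA = [2.0, 1.5, 1.0, 1.5, 0.75, 1.0, 0.5, 1.75, 2.0]

# Assumption: per-axis standard deviation is a fixed fraction of the
# nominal size. The value is arbitrary and only sets the overall scale.
SIGMA_PER_MOA = 1.0 / 3.0
POOL_SIZE = 1000


def make_pools(rng):
    """Build one pool of simulated (x, y) impacts per charge level."""
    pools = []
    for nominal in NOMINAL_MOA:
        sigma = nominal * SIGMA_PER_MOA
        pool = [(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
                for _ in range(POOL_SIZE)]
        pools.append(pool)
    return pools


def extreme_spread(shots):
    """Largest centre-to-centre distance between any two shots."""
    return max(math.dist(a, b) for a, b in itertools.combinations(shots, 2))


def shoot_ladder(pools, shots_per_group, rng):
    """Draw one group per charge level and return the nine group sizes."""
    return [extreme_spread(rng.sample(pool, shots_per_group))
            for pool in pools]


def pick_rate(shots_per_group, trials=2000, seed=1):
    """Fraction of ladders where the truly best load shows the smallest group."""
    rng = random.Random(seed)
    pools = make_pools(rng)
    best_true = NOMINAL_MOA.index(min(NOMINAL_MOA))  # load 7 (index 6)
    hits = 0
    for _ in range(trials):
        sizes = shoot_ladder(pools, shots_per_group, rng)
        if sizes.index(min(sizes)) == best_true:
            hits += 1
    return hits / trials
```

Calling `pick_rate(3)` estimates how often picking the smallest 3-shot group would land on the 0.5 MOA load. Passing a different `shots_per_group` lets you see how much that changes with more shots per group, at least under these assumptions.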
So we do it again, but this time shoot 5 at each charge level:

We still struggle to tell the difference between the 1 MOA and sub-1 MOA loads (here, you might choose the centre of the middle row, which is actually the 0.75 MOA load)...