It’s not hard to design a test that has a high likelihood of producing just about any outcome desired. Let’s not forget that a test cannot possibly measure all that is taught. Therefore, designers rely on sampling what they believe are the most important concepts in any given subject. But suppose the mission is to sort out students (e.g., the SAT). If the test were heavily loaded up with items assessing only the most important material that was well taught, scores would likely be clumped together, making comparisons unsatisfactory. In that case, the testing company would not remain in business very long because it had not delivered on its promise.
To avoid that possibility, designers deliberately include some items that have a low probability of being taught. This is not at all fair, but it is extremely useful. As a result, the public will draw invalid inferences about the effectiveness of instruction in the schools they support with their taxes. The same invalid inferences can arise from the arbitrary way that cut scores are determined. The public demands to know how many students are “proficient” in any given subject. By manipulating the scale, designers can help mold judgments about how many students fall into that category.
Walt Gardner’s Reality Check, Education Week.