SAT
This past weekend, a pioneering group of testers sat for the inaugural digital SAT. The reports are in, and they’re nearly universal: The March SAT was a shitshow.
Literally every student we’ve heard from (not to mention the hundreds who posted on Reddit) agrees that the second modules were significantly more challenging than the examples provided in the Bluebook app. We haven’t heard from one person who thought the test was easy.
But there’s an important thing to keep in mind as we wait for scores: harder tests come with more forgiving scales, which usually means better scores.
To make tests standardized, all test forms are equated to one another. That means the objective difficulty of the test, not the cohort of students sitting for it, determines the scaling. Accordingly, “easier”-feeling tests will likely have more punishing scales than “difficult”-feeling tests do. Historically, when students all leave a test discussing how “easy” it was, the scores do not reflect that feeling (see, e.g., the October 2023 PSAT).
Let’s take a look at an example from the previous version of the SAT. Below are snippets of the scoring sheets from two SAT practice tests, both of which were administered as official SATs before becoming practice tests. The “raw score” in the left column is the number of questions a student answered correctly. The “math section score” in the right column is the scaled score, which is your typical SAT score.
[Scoring sheet excerpts: Practice Test #7 and Practice Test #8]
To make this a bit easier to digest, here’s a table showing the number of questions wrong on each test and the corresponding score:

[Table: questions wrong vs. math section score for Practice Test #7 and Practice Test #8]

At every number of questions wrong, students score higher (considerably so) on Practice Test #8.
What do these scales tell us about the tests? They tell us that Practice Test #8 was objectively more difficult. A student taking that test would likely have a much worse time because the content is a lot harder. However, when students receive scores, they don’t care about the number of questions they missed. Only the score. Would you rather have 4 questions wrong and a 740 or 6 questions wrong and a 770?
The second reason student diagnosis isn’t the most accurate: we’re pretty bad at self-diagnosing performance on standardized tests. For the most part, students remember the hardest questions (the ones they spend the most time working on) and can identify whether they got those questions right or wrong. But they fail to notice the questions that were hard because the wording was unexpected or the question stem asked for information that required an extra step or two. For example, consider a “difficult” ACT math question that asks students to convert an area in square feet to square yards.
For context, the ACT math section generally gets more difficult over the course of the test. Question 60 is the final question in the section, so students should be expecting a challenge here. Even the best math students struggle to answer this question correctly because they don’t take the time to process what makes it difficult. They see the conversion from feet to yards, divide by 3, and boom, done. Unfortunately, they needed to convert square feet to square yards, so they should have divided by 9. Most students will never think of this question again because it felt so easy. This question won’t even factor into the equation for how difficult the test felt, but it certainly will factor into their score.
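To see why dividing by 3 goes wrong, here’s the unit arithmetic. (The 540-square-foot figure is our own illustration, not the number from the actual ACT question.)

$$1\ \text{yd} = 3\ \text{ft} \;\Longrightarrow\; 1\ \text{yd}^2 = (3\ \text{ft})^2 = 9\ \text{ft}^2$$

$$540\ \text{ft}^2 = \frac{540}{9}\ \text{yd}^2 = 60\ \text{yd}^2, \quad \text{not} \quad \frac{540}{3}\ \text{yd}^2 = 180\ \text{yd}^2$$

Lengths scale by 3; areas scale by 3² = 9. Miss that, and you feel great about a question you just got wrong.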
Bottom line: Let’s save the freaking out until after scores come back. We’re still holding out hope that the scores will be better than the experience.
Bottomer line: The SAT has always been a challenging test. There’s been much written about the dumbing down of the SAT because the passages are shorter, but if there’s one thing the College Board is good at, it’s pissing off large groups of people. If there are two things CB is good at, it’s that AND making a challenging test that measures a student’s college readiness in math, reading comprehension, and grammar. Let’s expect the test to be challenging and prepare accordingly.