Are We Measuring What We Think We're Measuring?

Remarks given at the 2020 Student Learning Outcomes Symposium, Exploring the Intersections of Assessment and Equity

Hi everyone, and welcome to day three of the student learning outcomes symposium. Like Kara [Moloney] just said, I'm Cynthia Carter Ching. I'm the Interim Vice Provost and Dean of Undergraduate Education here at UC Davis. I'm really happy to be with you all this morning, and I'm so glad we have such a robust community of educators who are taking the time these three days to focus on student learning outcomes. So I say this as a colleague, as an education scholar, and as the person on campus charged with overseeing our university's WSCUC certification, where student learning outcomes are actually a major focus.

Kara asked me to share some of my own thoughts about the relationship between student learning outcomes assessment and equity today. So I'm going to do that, drawing on both my position as VPDUE and the kinds of conversations around these issues we are having at the university right now and on my background as an education scholar and someone who actually used to have a job in assessment development at the K-12 level.

One way of thinking about the relationship between student learning outcomes and equity is in terms of the big outcomes. The ones that transform students’ lives, their prospects, their opportunities, their ability to take their rightful places as equal participants in our society and our economy. And no matter which social mobility metric you look at, we seem to do a pretty good job at that. But they have to get here first. And once they're here, then they have to graduate in order for that to happen. So then, we talk about all the various student learning outcomes along the way. What are our learning goals for our students? How are we articulating those goals? And most importantly for questions like, “Are they going to graduate or not?”, how are we assessing those goals?

When I think about the relationship between learning goals, outcomes assessment, and equity, the first thing that comes to my mind is my favorite concept from grad school classes on assessment: construct validity. Construct validity, for those who aren't familiar, is an assessment term that just basically means, are we measuring what we think we're measuring? And then, perhaps more problematically, what if we're actually measuring something else?

SAT & Chegg: Construct Validity Problems Impacting UC Davis

There are two of-the-moment examples that we're talking about right now at UC Davis that I think illustrate this question really well. The first one is the SAT. As you may know, there is a Regents-level recommendation--and an accompanying pending lawsuit--about whether or not we can continue to use the SAT in freshman admissions, scholarship decisions, etc. And that's because, while the SAT is pretty good at predicting first year GPA among students (although it doesn't predict actual graduation rates or overall academic success), what it actually tracks the best with is parental education and family income. So there are lots of experts who argue that it isn't actually some objective measure of academic potential, but rather just another way of measuring social inequality instead. It's a good question. And right now the literal jury is still out on that one.

The second example is Chegg, which came under fire last quarter because, although it is ostensibly a homework help service, it was discovered that some students in some remote courses in spring 2020 were using it to procure answers for remote final exams. Now this is a significant problem, and not just because we find cheating behavior pretty problematic on a moral or social-contract level, but because it throws into question the validity of those exam results.

In this situation, we can no longer be confident that a student's exam score is at least an approximate measure of their learning in the course rather than a measure of something else: their knowledge of the service, the number of devices and screens they can have open and accessing the internet at any given time, etc.

In addition to these two things just sort of being thorny problems for the institution, these are both construct validity problems because they demonstrate that you can have an assessment that purports to be measuring one thing, but then ultimately, by design or circumstance, ends up measuring something else.

Infamous History: When IQ Tests Measure Hunger and Poverty

So those are really current problems, but this is actually a really old idea. When I teach my educational psychology class to undergrads, we talk about the infamous history of IQ testing, during the advent of compulsory schooling in the early 20th century, and how Stanford-Binet IQ tests were often given to children who didn't speak English, who were living in abject poverty, etc. And then those test results were used to warehouse these children in "dullard" classes -- that's literally the term they used. So, this quote is from a very controversial address by New York City Mayor LaGuardia to a conference of special educators in 1929. He says,

“There is nothing more repulsive to me, and nothing more unwarranted, than to single out little tots under 12, put them in a separate room, and label them. Before you give the child the Binet test, be sure to give him first the Borden and Sheffield test. And for the benefit of out-of-towners, by the Borden and Sheffield test I mean: find out if they get enough milk. Just cow's milk.”

Whether you like LaGuardia or not (or maybe you just really really dislike the airport named after him), you have to admit that this is a powerhouse statement. He is essentially talking about equity and construct validity here. IQ tests purport to measure raw intellectual ability. And here LaGuardia is positing that maybe, in this particular application, they just measure hunger and poverty instead.

Let's go back to the university classroom. We use a variety of assessments to measure student learning outcomes, some formative, some summative; some assignments, some tests. If we are genuinely concerned about construct validity and equity, here are some questions we might ask ourselves:

If our learning goal is understanding, what if we are actually just measuring memorization? Are we measuring students’ mastery of writing conventions in written essays, or mastery of the content that they're actually writing about? Sometimes it's important to do both, but we need to be clear about which one and why. And then finally, are we measuring students’ new learning or prior preparation?

There are some really interesting data on whether or not students’ grades in the very first quarter of an introductory science sequence are more representative of what they actually learned in the course, or whether or not they took AP Bio, AP Chemistry, AP Physics, etc. in high school. So these are all questions we should be asking. 

Multiple Measures: An Arboreal Analogy

[Image: various ways to measure big trees - height, girth, volume, canopy]

I'd like to close with a non-education controversy that I think has a lot of relevance for thinking about our own assessments, our own student learning outcomes, and their relationship to equity. For decades, experts have been arguing about what makes a tree really, really big. And this is important because various natural reserves, national parks, etc. want to be able to say that they have the "biggest" trees (tourist dollars, you know). But some kinds of trees grow wide, and some kinds of trees grow tall, and some have this massive spread. And interestingly, there isn't one kind of tree that does all these things, universally, the best. So Calaveras Big Trees and Humboldt Redwoods and Yosemite Grove all have to be content with having the tallest, or the widest, or the most spread trees, but not all of those things at once.

And maybe that's okay. I'd like to propose that students, like trees, have multiple ways to grow, and can succeed at multiple outcomes. Maybe not all of them -- and that might be okay too. Equity doesn't mean everything is equal. It means that everything is acknowledged, and everything is adjusted to the actual context, not some ostensibly blind standard that we measure, and then it turns out we might actually be measuring something else. 


Cynthia Carter Ching is interim Vice Provost and Dean of Undergraduate Education at UC Davis and an education scholar with a history in assessment development at the K-12 level. She delivered these remarks at the 2020 Student Learning Outcomes Symposium, "The Intersections of Assessment and Equity."