Winter 2002
Feature
ONLY A TEST
Our resident psychometricians refine the science of mental measurement
by Marietta Pritchard '73G
"YOU CAN'T BEAT A GOOD TEST" - Ronald Hambleton, right, with colleagues at the Center for Educational Assessment. (Photo by Ben Barnhart.)
THIS IS A TEST. THIS IS only a test. A monstrous creature sits waiting for you at the center of an intricate maze. Even if you manage to slay him, you may never find your way back out. This is only a test, but remember to bring your ball of string. And, oh yes, a couple of number two pencils.
In the stories we tell ourselves – our histories and myths – ordinary people, demigods, and saints are tested on matters of faith, courage, guilt or innocence through ordeals or trials. These are sometimes paths of glory, sometimes rites of community membership, sometimes doors to martyrdom – trials by fire, trials by water. Think of other tests: Hercules cleaning the Augean Stables, Abraham preparing to sacrifice Isaac, Theseus conquering the Minotaur.
High school students taking their SAT exams in a large gymnasium with a crowd of others wielding number two pencils may not have as much to lose or gain as these saintly or heroic figures, but the challenge – and the agony – is no less real. Years later, the memory of such tests can make palms sweat, nightmare visions rise up. Confronting the Minotaur might almost seem preferable.
Ron Hambleton has no such nightmares. Tests and assessment practices are his bread and butter, his meat and potatoes, his caviar and champagne. He loves his profession, researching, analyzing, assessing and teaching about tests, and finds little time for anything else. “I can’t believe the university pays me to do all these interesting things,” he says cheerfully.
IT IS EARLY MORNING ON a national holiday and he is already hard at work in his office in Hills South. Hambleton chairs the Research and Evaluation Methods Program in the School of Education’s educational policy, research, and administration department. With his colleague Hariharan Swaminathan, he also co-directs the Center for Educational Assessment, which focuses on assessment theory and practice in the state and nation. One of the world’s leading psychometricians – experts in mental measurement – Hambleton was named a Distinguished University Professor by the UMass trustees in 1998. At the top of a long list of other honors is his profession’s highest, the 1993 Career Achievement Award of the National Council on Measurement in Education.
The Canadian-born Hambleton comes to his discipline by way of a talent for numbers. He earned his undergraduate degree at the University of Waterloo, Ontario, in math, with a minor in psychology. “I was pretty good at math,” says Hambleton, “but everybody around me was a genius.” Since “a lot of psychologists are not so strong mathematically,” he decided to keep a foot in each field. His timing was lucky, he says, since “the mathematical modeling of psychological phenomena” was just beginning to be important when he started graduate school in the ’60s.
Hambleton completed his doctorate in psychometric methods and statistics at the University of Toronto in 1969, got his first job at UMass that year, and has been here ever since. But he seems to have been everywhere else too. His doorstop-thick curriculum vitae appears to include service on every council, committee, commission, or board related to testing on the globe. Among them are the Arab Council for Child Development, the Israeli National Institute for Tests and Evaluation, and the National Examination Center in Indonesia. Domestic references include the National Board of Medical Examiners, the National Association of Securities Dealers, and Bill Gates’s Microsoft.
Like many Canadians, Hambleton grew up skating. When his sons, now grown, were in school here in Amherst, he coached in the Amherst Hockey Association for 12 years. He still keeps a pair of skates in the back of his car, just in case he has time to skate at noon in the Mullins Center. But he almost never has the time, he says.
PSYCHOMETRICIANS CONSTRUCT "testing instruments." They speak of “building” tests. The metaphor seems apt, since they want their creations, like any construction projects, to withstand both internal and external challenges. Though there are many uses for tests – prediction, diagnosis, classification, research, assessment, and evaluation – Hambleton specializes in achievement and credentialing tests. He is especially interested in assessments aimed at determining what examinees know and can do. He and other UMass scholars have been involved in the creation of a wide variety of such tests, from medical board exams to certification for computer technicians.
Hambleton is part of a troika with two tenured colleagues, Stephen Sireci and Hariharan Swaminathan. Swaminathan, known to all as Swami, is more likely to stay close to home while Hambleton travels the globe. He describes himself as working more at the math/statistics end of the field, with Hambleton more at the applied end. “But for that matter,” says Swaminathan, “we’re almost interchangeable.” The two have known each other as long as they’ve known their wives – since their graduate school days in Toronto – and have collaborated on several books that are recognized as classics in the field of psychometrics. In 1997 they jointly won the School of Education’s Outstanding Teacher Award. “All of our students want to be good teachers,” says Hambleton, “and we try to model good teaching in our classes. It’s what sets our graduate training program ahead of many others in the country.”
Hambleton and his colleagues are interested in problems of standard-setting, reliability, and validity as they relate to different types of tests. Their primary interest is in “criterion-referenced” testing, where specific standards are established and test-takers measured against them. You want to be sure your doctor is familiar with the basics of anatomy, for instance, and that your accountant has mastered bookkeeping. The so-called accountability movement in education has increased the emphasis on criterion-referenced testing in the nation’s public schools. In the other major category of educational assessment, called “norm-referenced testing,” exam-takers are compared with each other. As Swaminathan puts it, “You are judged by the company you keep.” This type of test might be used to compare the relative performance of students around the country or in predicting college or job success.
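For readers who like to see the distinction spelled out, here is a minimal sketch in Python of the two kinds of scoring. The function names, the cut score of 70, and the sample scores are invented for illustration; they do not come from any test the Center has worked on.

```python
def criterion_referenced(score, cut_score):
    """Pass/fail against a fixed standard, regardless of how others did."""
    return score >= cut_score


def norm_referenced(score, group_scores):
    """Percentile rank: percentage of the comparison group scoring strictly lower."""
    below = sum(1 for s in group_scores if s < score)
    return 100.0 * below / len(group_scores)


# Hypothetical data: one examinee scoring 72, a cut score of 70,
# and a small comparison group.
group = [60, 65, 72, 80, 90]
passed = criterion_referenced(72, cut_score=70)
percentile = norm_referenced(72, group)
```

The same raw score clears the standard in the first case and lands in the lower half of the group in the second, which is the difference Swaminathan's remark points to.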
THE PROGRAM DRAWS STUDENTS AND visiting faculty from around the world, and the three researchers share a commitment to nurturing their 18 full-time graduate students and to finding them good projects as well as the money to support their studies. Although dean of education Bailey Jackson has been especially generous this year with a grant of $40,000, says Hambleton, university funding is never sufficient to support all of their students. As a result, a substantial amount of time is spent beating the bushes for grants and internships – some $250,000 this year alone. “You make commitments to graduate students,” says Hambleton, “and boy, you’d better keep them.”
This critical mass of students, Hambleton believes, provides “more energy and variety for advanced courses. The students feel good.” And despite the faculty’s high level of productivity and outside commitments, students find them “amazingly available,” says third-year graduate student Michael Jodoin. Students work closely with faculty on their research, and often produce multiple journal publications and conference presentations before graduation. They are urged to make their knowledge accessible to all kinds of audiences and encouraged to use up-to-date media tools. “We do a lot of stuff with numbers,” says Hambleton, “but you’ve also got to be able to talk.”
Regular seminars bring in executives from test publishing firms, researchers and others to discuss issues of concern to them. Students take on internships, work on research projects, and eventually hope to find jobs in this multi-billion-dollar industry as well as in academe. The public is familiar with large test-makers like ETS (Educational Testing Service), publisher of the SATs, but few outside the field know that Microsoft Corporation is a huge test-maker, administering some three million exams for those who want to be certified as Microsoft technicians. There are over 1,000 credentialing agencies in this country alone, says Hambleton, with hardly a profession or occupation that doesn’t have one.
STEPHEN SIRECI IS THE YOUNGEST faculty member in the program, now in his seventh year. Trained as a clinical psychologist, he found he had no patience with counseling and was drawn instead to psychometric research. A specialist in test development and evaluation, he is troubled by the gap between good tests and how those tests are interpreted.
“We can design a test to see if students have mastered the material,” says Sireci. “But then sometimes the results can be used inappropriately – to affect teachers’ pay, for example.”
Sireci’s concern goes to the heart of what many see as the problem with standardized testing. The trouble, say critics, is not so much the tests themselves, though some find fault here too, as the uses they’re put to. At this point, like scientists splitting the atom, psychometricians might prefer to retreat into “pure science” mode, arguing that they can’t be made to answer for the dropping of a bomb.
Ron Hambleton argues strongly for the proper use of tests and doesn’t duck the social consequences of testing. But he also insists that it is not his job to set policy. This is the role of administrators and politicians.
Still, he hasn’t kept entirely out of the fray. In a recent article in the School of Education newsletter headlined “Politicians Fail, Not the Teachers,” Hambleton wrote about the botched administration of the Massachusetts teacher certification exams in 1998, when what was supposed to be a “practice” test was suddenly announced to be the real thing.
Originally, wrote Hambleton, this first administration of the teacher certification test was aimed at helping the test-maker evaluate the instrument and test-takers prepare for the “real” exam. Instead, two weeks before the test was given, the rules were changed and prospective teachers told that the scores would be used to reflect their competency. Hambleton skewers the test’s administrators, citing violations of professional standards of assessment practices. And he methodically lists the test’s other flaws: no external review, problems with the establishment of passing scores, and biased media representation of candidates’ writing skills.
STEPPING BACK FROM THAT PARTICULAR fiasco, Hambleton commonsensically suggests that there’s no reason why Massachusetts couldn’t test its potential teachers before they enter teacher-training programs – as Connecticut has done with good results for several years.
Swaminathan and Hambleton are experts in the relatively new field of “item response theory,” which adds a dimension of flexibility to the construction and interpretation of tests. This theory, on which the pair have written the definitive text, allows test-makers to link observable performance and the less observable abilities measured by the tests themselves. Benefits of a move toward “computerized adaptive testing” include shorter test times, with students encountering less frustration because test items will be matched to their abilities.
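One way to picture the idea is the two-parameter logistic model, a standard textbook form of item response theory. The article does not say which models Swaminathan and Hambleton favor, so treat the choice as an illustrative assumption. In the sketch below, `theta` stands for the examinee's unobserved ability, `a` for how sharply an item discriminates, and `b` for its difficulty. The item bank and its values are made up, and picking the most informative item is only one common selection rule for adaptive tests.

```python
import math


def p_correct(theta, a, b):
    """Two-parameter logistic model: chance of a correct answer."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def pick_next_item(theta, items):
    """Return the name of the item that is most informative at theta."""
    return max(items, key=lambda name: item_information(theta, *items[name]))


# Hypothetical item bank: name -> (discrimination a, difficulty b).
ITEMS = {"easy": (1.0, -1.5), "medium": (1.0, 0.0), "hard": (1.0, 1.5)}

next_for_strong_examinee = pick_next_item(1.4, ITEMS)
next_for_weak_examinee = pick_next_item(-2.0, ITEMS)
```

An examinee whose estimated ability is high is handed the hard item and one whose estimate is low gets the easy one, which is the sense in which adaptive test items are matched to ability.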
Another problem engaging the group is the difficult one of “cross-lingual” examination, testing the same material – medical knowledge, junior high school science – among people who speak different languages.
“There is no perfect solution here,” says Sireci. “You start with a good standard translation, then do a ‘back translation’ to the original language. You make sure that it’s looked at by multiple eyes, and you try to evaluate it for differences among different language groups.”
“You need good bilingual people,” says Swaminathan, who reads Latin, Greek and Sanskrit in his spare time. “For instance, you need to know that Latin-root words are going to be easier for Spanish-speaking people to understand.”
Hambleton's expertise in creating tests that are fair and reliable has brought him praise even from those who oppose standardized testing as an educational tool. Michael Greenebaum ’72G, a visiting lecturer at the School of Education and retired principal of Mark’s Meadow, its former lab school, has written frequently in opposition to standardized tests. Usually, he says, such tests are wielded for comparison and social control, and at their worst can be punitive and humiliating. “Testing is too often a mechanism by which we sort, rather than educate, people,” says Greenebaum. Still, he is unqualified in his praise for Ron Hambleton, describing him as an “admirable and serious” scientist.
“Hambleton knows what standards are,” says Greenebaum. “He’s a rigorous humanist.”
ASKED IF HE HIMSELF IS “good at taking tests,” Ron Hambleton bristles slightly. He doesn’t like that way of looking at it. The test-coaching industry has promoted the idea that you can “beat” a test, he says. “That’s a myth. You can’t beat a good test like MCAS or SATs. The way to score high is to come prepared with knowledge and skills.”
Hambleton acknowledges, though, that there are people who are “just plain high-anxious.” These people may benefit from doing practice tests, which they could just as easily do without recourse to the test-preparation industry – if they had the discipline.
What’s amazing, says Hambleton, is that schools don’t teach good test-taking skills. Like any other problem-solving situation, test-taking involves a little knowledge and a lot of common sense. You need to read the directions. You need to consider how the test is scored so you’ll know how to manage your time. (Do you have to answer all questions, for instance? Are there penalties for wrong answers?) You need to know how to mark the answer sheets. You need to know how to start on a problem and what to do if you get stuck. Most important of all – and this seems almost too obvious to mention, says Hambleton – you need a good breakfast and a good night’s sleep.
One reason schools don’t teach these skills, he adds, is that many teachers don’t know them themselves. And one of the reasons for that is that many schools of education, including his own, don’t require prospective teachers to take a basic course in classroom testing and assessment. “No wonder teachers are so opposed to testing practices – they have little idea about tests and how to use them properly.”
Like all his UMass colleagues, Hambleton insists that tests are just one assessment tool – just a snapshot. Not all the things we value can be tested with paper and pencil. Often an interview, a portfolio or a peer review will yield equally valuable information. This principle is applied in selecting students for UMass, says David Hautanen ’87, ’90G, director of operations for undergraduate admissions. The SAT scores are used, but only as one part of the admissions criteria. The primary indicator, says Hautanen, is grade point average, with a strong emphasis on a student’s demonstrated leadership, honors and awards.
AS FOR "MCAS" – THE Massachusetts Comprehensive Assessment System tests, which have aroused so much concern, anguish, and opposition in the commonwealth, and which now determine whether students will graduate from public high school – Hambleton says frankly that he doesn’t know “if MCAS is the best educational system for Massachusetts – who does? – but it’s the one we have in place now, and I wish educators and the public would get behind it and see if we can make it work. All of this bickering is counter-productive, and it’s not as if the critics have better answers for educational reform than the Department of Education.”
Hambleton has worked for four years on the MCAS advisory board and has coordinated research for the test. “We’ve tried to make it as good as it can be,” he says. The crucial next step is “to use the results to improve teaching, to help students.” When tests show black students doing less well than white students, for instance, why not take that as a cue to try to improve education in predominantly black schools?
It’s one thing to have a test that “puts teeth into the high school diploma,” says Hambleton, but the unfortunate fact is that Massachusetts has classroom teachers who are not certified, teachers who aren’t willing to teach the state’s curricula, and schools using 30-year-old books.
Don’t shoot the messenger, he says. With the right methodology, it’s really not so difficult to create good tests. The hard part is building an educational system that serves all its students equally.