Adaptive Testing: Multiple Choices, What’s The Right Answer?
Adaptive testing is an assessment paradigm that leverages the power of the computer to make a test much more than just a paper test delivered on a monitor. Just as adaptive learning is changing the way people acquire knowledge and skills by increasing efficiency and effectiveness, adaptive testing seeks to change the way we measure people by making tests more efficient and effective.
Historically referred to as computerized adaptive testing (CAT), the field is expanding, and names such as intelligent testing or algorithmic assessment might now be a better fit. The word “computerized” is no longer necessary: most tests are now delivered on computers, and simply using a computer is no longer cutting-edge as it was in the 1970s. In addition, algorithms that do not fit the traditional definition of adaptive are being explored, including multistage testing (MST), linear-on-the-fly testing (LOFT), shadow tests, and diagnostic measurement models (DMMs).
So what is adaptive testing? It is the use of algorithms to adapt the test given to each individual, using their past responses to items to design the rest of the test to be as optimal as possible. The test can adapt in terms of both which items are delivered and how many items are delivered. Typically, item choice is driven by difficulty: if you answer an item of medium difficulty correctly, your next item is harder, and if you answer incorrectly, it is easier. The test will often end based on psychometric criteria such as precision (reliability) rather than after a fixed number of items. A fixed-form test usually measures the top and bottom examinees much less accurately than the middle examinees, while a well-built adaptive test will measure everyone with equivalent precision. Which case is more fair?
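The adaptive loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production algorithm: it assumes a Rasch (one-parameter) IRT model, maximum-information item selection, an expected a posteriori (EAP) ability estimate with a standard normal prior, and a stopping rule based on the standard error. The function names, the `se_target` of 0.4, and the `max_items` cap are all hypothetical choices made for the example.

```python
import math
import random

def p_correct(theta, b):
    """Rasch probability that an examinee of ability theta answers item b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta, b):
    """Fisher information of a Rasch item with difficulty b at ability theta."""
    p = p_correct(theta, b)
    return p * (1.0 - p)

# Ability grid from -4 to 4 used for numerical integration.
GRID = [g / 10.0 for g in range(-40, 41)]

def eap_estimate(responses):
    """Return (posterior mean, posterior SD) of ability given (difficulty, correct) pairs."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)  # standard normal prior (unnormalized)
        for b, correct in responses:
            p = p_correct(t, b)
            w *= p if correct else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)

def run_cat(true_theta, bank, se_target=0.4, max_items=30, rng=None):
    """Simulate one adaptive test; bank is a list of item difficulties."""
    rng = rng or random.Random(0)
    remaining = list(bank)
    responses = []
    theta, se = 0.0, 1.0  # start at the prior mean and SD
    while remaining and len(responses) < max_items and se > se_target:
        # Select the unused item that is most informative at the current estimate.
        b = max(remaining, key=lambda d: item_information(theta, d))
        remaining.remove(b)
        # Simulate the examinee's response from the assumed model.
        correct = rng.random() < p_correct(true_theta, b)
        responses.append((b, correct))
        theta, se = eap_estimate(responses)
    return theta, se, len(responses)
```

A call such as `run_cat(1.0, [-3 + 0.05 * i for i in range(121)])` returns the final ability estimate, its standard error, and the number of items used. Note that the test length is an output of the algorithm rather than an input, which is the sense in which the test adapts in "how many" as well as "which".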
Resources on adaptive testing are readily available, including the classic book by Wainer (Computerized Adaptive Testing: A Primer) and free resources at http://www.iacat.org/. A more relevant question here is probably whether adaptive testing would be beneficial for your organization. Let’s start by reviewing some of the benefits and drawbacks of adaptive testing.
The most important benefit of adaptive testing is increased psychometric efficiency. Items that are too easy are a waste of time for top examinees: they add little psychometric information and do little to enhance measurement precision. The same holds for very difficult items and lower-ability examinees. By not administering such items, we can greatly reduce the number of items needed per examinee. Past research has shown that this reduction in the number of test items required for optimal measurement precision is typically around 50 percent but can be as much as 90 percent. In the case of credentialing, this directly translates into savings in seat time costs.
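To make the efficiency argument concrete, the following sketch computes how much statistical information an item contributes for a high-ability examinee. It assumes a Rasch model, under which item information is p(1 - p); the ability value of 2.0 and the three item difficulties are illustrative numbers, not values from any real exam.

```python
import math

def rasch_information(theta, b):
    """Fisher information of a Rasch item with difficulty b at ability theta."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)

high_ability = 2.0  # hypothetical top examinee, on the usual logit scale
for b in (-2.0, 0.0, 2.0):
    info = rasch_information(high_ability, b)
    print(f"difficulty {b:+.1f}: information {info:.3f}")
```

Under these assumptions, the well-matched item (difficulty 2.0) yields information of 0.250, while the very easy item (difficulty -2.0) yields about 0.018, roughly one fourteenth as much. A fixed form spends many items in that low-information region for examinees at the extremes.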
As previously mentioned, adaptive testing can produce tests that are fairer from a psychometric perspective because they are equiprecise. The test designer has much stronger control over score precision for each examinee. In addition, adaptive tests typically provide much greater security because every examinee receives a unique set of items. They are also known for increasing examinee engagement; a lower-ability examinee will feel much more engaged if not presented with a long run of items that are difficult to the point of being daunting. Finally, the unique creation of the test also facilitates re-testing, as an examinee will receive a different set of items if they come back next month, even if the item bank in the algorithm has not changed.
So what are the drawbacks? The biggest one is cost. Adaptive testing requires a psychometrician with extensive expertise in item response theory (IRT) and adaptive algorithms; this is a focused subset of psychometricians, who are already rare enough. The item bank needs of an adaptive test will be greater than those of a small set of fixed forms (e.g., two per year with high overlap), though likely on par with larger sets (e.g.,per year with low overlap). A minimum rule of thumb is that the item bank should be three times the intended length of the CAT; this is easy to explore with the simulation studies discussed below. Items must be calibrated with IRT and must be scored by a computer in real time, so item types such as clinical examinations or essays are not suitable.
You’ll also need a delivery platform capable of adaptive testing, and most are not, though the major vendors in the credentialing world do support high-quality adaptive tests. When evaluating platforms, also consider republishing costs; if you want to rotate, say, 50 items in the pool every year, does the vendor allow that to be done easily or is it expensive?
How can you weigh these benefits and drawbacks in a way that is meaningful for your organization? The best place to start is Monte Carlo simulations. This type of study will simulate what adaptive testing would look like under a range of conditions that you provide (examinee volumes, item bank sizes, security algorithms, termination criteria) and provide a relatively accurate picture of how your exam would perform under those conditions. The most important result is the reduction in test length you can expect, which you can then translate to seat time costs. For example, if you pay $25/hour for seat time, test 10,000 examinees per year, and save 1 hour each, you are saving $250,000 in seat costs. Is that worth the increased cost in psychometrics, item development, and test delivery? It is essential to note that this is an economy of scale; it would likely take about as much investment to develop an adaptive test for only 1,000 examinees per year, and your savings then are only $25,000!
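The seat-time arithmetic above is simple enough to capture in a small helper, which makes it easy to rerun with your own numbers. The break-even function is an added illustration: `added_annual_cost` is a hypothetical figure standing in for whatever extra psychometric, item development, and delivery costs your organization estimates.

```python
def annual_seat_savings(rate_per_hour, examinees_per_year, hours_saved_each):
    """Seat-time savings per year from a shorter test."""
    return rate_per_hour * examinees_per_year * hours_saved_each

def break_even_examinees(added_annual_cost, rate_per_hour, hours_saved_each):
    """Examinee volume at which seat-time savings equal the added cost (hypothetical input)."""
    return added_annual_cost / (rate_per_hour * hours_saved_each)

print(annual_seat_savings(25, 10_000, 1))  # 250000, the article's first scenario
print(annual_seat_savings(25, 1_000, 1))   # 25000, the smaller program
```

For instance, if the added cost were assumed to be $100,000 per year, `break_even_examinees(100_000, 25, 1)` gives 4,000 examinees per year as the volume at which the investment pays for itself on seat time alone.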
If you do decide that adaptive testing is right for your organization, how do you move forward in a defensible manner? Thompson and Weiss (2011) outline a five-step framework from a project management perspective that is designed to guide organizations down that path. Again, the core underlying methodology is simulation studies, which require advanced knowledge of adaptive testing algorithms, but software to perform them (both commercial and free) is readily available. The five steps are:
- Monte Carlo simulations and business case evaluation
- Item bank development
- Pilot testing, item review, and IRT calibration
- Real data simulations and validity studies
- Publishing and maintenance
Finally, let us return to the more general case of algorithmic assessment. Approaches like multistage testing or LOFT might make more sense for your organization, and can be compared alongside item-level adaptive testing. Multistage testing is similar to adaptive testing but adapts in blocks of items (e.g., 5, 10, 20) rather than after each item; it trades some psychometric efficiency for more control of content and the capability to use testlets (reading passages, scenarios, vignettes). LOFT is an approach that creates a unique test form for each examinee, using algorithms to ensure that each form is psychometrically equivalent; it provides little to no gain in psychometric efficiency because forms are linear, but it can massively increase the security of your exam.
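As a rough illustration of the LOFT idea, the sketch below assembles a unique fixed-length form for each examinee by drawing the same number of items from each difficulty stratum of the bank. This is a deliberately simplified stand-in: operational LOFT algorithms typically balance IRT-based statistics and content constraints as well, and the stratum labels, item IDs, and bank sizes here are all hypothetical.

```python
import random

def build_loft_form(bank, items_per_stratum, rng):
    """Draw items_per_stratum items at random from each stratum of the bank.

    bank maps a stratum label to a list of item IDs.
    """
    form = []
    for stratum, items in bank.items():
        form.extend(rng.sample(items, items_per_stratum))
    return form

# Hypothetical bank: 20 items in each of three difficulty strata.
bank = {
    "easy": [f"E{i:02d}" for i in range(20)],
    "medium": [f"M{i:02d}" for i in range(20)],
    "hard": [f"H{i:02d}" for i in range(20)],
}

rng = random.Random(2024)
form_a = build_loft_form(bank, 5, rng)
form_b = build_loft_form(bank, 5, rng)
```

Each form has the same length and the same difficulty mix, but two examinees are unlikely to see the same set of items, which is where the security benefit comes from. Because every form is still linear and fixed-length, there is no reduction in test length.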
In conclusion, a move to an algorithmic approach is an extensive investment with a lengthy timeline, but its benefits can be quite real in the end, which is why many organizations use such an approach. This is true not only in the credentialing world, but also in other areas of assessment such as K-12 education, university admissions, pre-employment testing, and patient-reported outcomes in healthcare. In addition, the recent ubiquity of computing offered by the cloud and the proliferation of inexpensive mobile devices are changing the perspective of assessment in some fields. It would be foolhardy not to leverage the advantages that technology can provide to make your organization’s exams more accurate, secure, engaging, fair, and smart.
Introduction to Computerized Adaptive Testing
Presented by Nathan Thompson, PhD, Assessment Systems Corporation
This on-demand webinar provides a focused introduction to computerized adaptive testing (CAT), allowing credentialing professionals to better understand the benefits of the approach and evaluate its applicability to their organization. Begin with a brief introduction to the algorithms of CAT, such as item selection and termination criterion, and how they are supported by item response theory. Next, discuss the benefits of CAT, from both a psychometric and operational perspective. Finally, learn a 5-step model for developing a valid, defensible CAT, which begins with feasibility studies that would be extremely useful for any organization not yet leveraging the benefits of CAT.