By Drew Gitomer, Emily Hodge, and Rachel Garver


When endorsing, selecting, or designing assessments to evaluate teaching quality, it is of utmost importance that the assessments are valid for their intended purposes.  In the context of teacher education, assessments have been used to demonstrate that those aspiring to be teachers are able to engage in key aspects of teaching, including designing instruction, carrying out instruction, and assessing the students they teach.

But what does it mean for an assessment to be valid for a particular purpose?  The concept of validity is multi-faceted and has been the subject of much theoretical and empirical exploration.  Validity theorists often think in terms of two large categories of validity questions.  One has to do with score meaning: what aspects of an individual is the assessment measuring, and how well does it measure them?  The second involves the consequences of using the assessment: what are the consequences of using the assessment from individual, institutional, and societal perspectives?  Validity is not a simple, objective determination.  A range of questions and types of evidence can be brought to bear in considering the validity of any assessment, and evaluating that validity very much reflects the values and priorities of test developers, policymakers, and users.  One truth about assessment is that no single assessment can satisfy all desired purposes.  Thus, policymakers and users must make choices about which assessments to use and how to use them, because there are inevitable tradeoffs and tensions depending on which purposes are most salient.

Changes in assessment policy for teacher education provide an opportunity to explore the changing nature of questions and concerns about the validity of performance-based assessments for teacher candidates in New Jersey.  As we have shared in prior posts, from 2016 to 2022, teacher candidates in New Jersey were required to take a standardized assessment, the edTPA, for which they submitted a standardized portfolio of their practice, including classroom video and extensive written commentary, that was then scored by a commercial test publisher.  Candidates needed to attain a passing score in order to receive a license to teach in New Jersey.

In 2022, that policy was changed in fundamental ways, as discussed here.  No specific assessment is mandated by the state, and control of assessment practices has been wholly transformed as described here.  The responsibility for developing assessments and making decisions about teacher candidates is now the province of the educator preparation programs (EPPs).  This transformation coincided with a shift in validity considerations.  Based on our interviews with New Jersey EPPs, we see the shift from the edTPA to local assessments as a shift from validity in score meaning (How accurately does the instrument produce a score that measures an individual’s teaching skills?) to programs placing greater emphasis on the consequential validity of performance-based assessments (What will happen to an individual, institution, or to society from using this assessment?).

The adoption of an assessment like the edTPA illustrates the contrast: it placed a heavy focus on score meaning.  The edTPA was a performance-based assessment that focused on important aspects of teaching and was administered in a standardized form so that scores across individuals would be comparable.  Policymakers wanted an evaluation tool that they believed provided a high-quality, psychometrically defensible assessment.

The most important consequential consideration in using the edTPA was that it would result in entering-teacher cohorts that demonstrated sufficient skill to be successful in the classroom.  The assessment was also intended, as noted in its name, to be educative and to provide guidance to candidates and teacher education programs about high-quality teaching practice.

Evidence about edTPA’s validity has been mixed and contested with respect to both score meaning and the assessment consequences.  In addition to concerns about the psychometric quality of the edTPA (Gitomer et al., 2021), there have been substantial concerns that the assessment skews teacher preparation so that significant attention is given to mastering the details of the assessment requirements rather than focusing on the teaching that it is intended to measure (e.g., Greenblatt & O’Hara, 2015).

Our research has shown that, under the new set of policies in New Jersey, the goals and methods of teacher assessment have profoundly shifted with increased EPP control.  In terms of score meaning, programs have almost uniformly reduced writing demands, believing that the focus of any assessment needs to be on demonstrations of teaching rather than what they believed was an overemphasis on the ability to write about teaching.  The teacher educators with whom we spoke believe that the edTPA’s emphasis on writing skewed the measurement of the central construct of interest—good teaching.

Nevertheless, the most substantial shift we have seen is that considerations of validity are now overwhelmingly concerned with the consequences of assessment practice.  Across institutions, almost all assessments are used formatively to improve practice.  There are very few formal metrics or psychometric procedures in place to ensure score quality.  Instead, there is far greater emphasis on a candidate's progress and the provision of feedback throughout the assessment process.  EPPs feel increasingly empowered by the policy change to work with students during the assessment process, providing feedback so that candidates can adjust their practice.  Doing this challenges traditional approaches to assessment in which the student is expected to complete an assessment on their own.

In many programs, the assessment results are evaluated by multiple individuals, including teacher education faculty and professionals who personally support teachers as they proceed through their clinical work.  In general, EPPs are using assessment in a way that has led to a greater focus on teaching rather than test preparation.  While consequences (i.e., consequential validity) were an important consideration under prior policy as well, the most salient consequences and how they are considered have shifted.  With a standardized, externally managed assessment, the most important consequence to be evaluated was whether teachers with sufficient skills could be differentiated from those who were not sufficiently skilled to support K-12 students.  With a more flexible, institutionally managed assessment system, the most important consequence is whether the assessment protocols contribute to the development of teacher candidates and instill a practice of continual professional improvement.  Careful long-term study will be required to determine whether and how programs are accomplishing this goal.


Drew Gitomer, Ph.D. is a professor at the Rutgers Graduate School of Education, and Emily Hodge, Ph.D. and Rachel Garver, Ph.D. are associate professors at Montclair State University.


References:

Gitomer, D. H., Martínez, J. F., Battey, D., & Hyland. N. E. (2021). Assessing the assessment: Evidence of reliability and validity in the edTPA. American Educational Research Journal, 58(1), 3–31. https://doi.org/10.3102/0002831219890608

Greenblatt, D., & O’Hara, K. E. (2015). Buyer beware: Lessons learned from edTPA implementation in New York State. Teacher Education Quarterly, 42(2), 57–67. https://eric.ed.gov/?id=EJ1072124