This article investigates the reliability of a rubric-based writing assessment system, the National Writing Project's (NWP) Analytic Writing Continuum (AWC), which applies both holistic and analytic scoring. Data were drawn from double-scored student writing samples collected across several national scoring events (2005 to 2011). The analysis first examined the extent to which scorers trained to apply the AWC agreed with one another on the quality of various attributes of student writing (inter-rater agreement rates). It then examined how consistently groups of scorers applied the AWC standards across multiple scoring events (cross-time reliability) and how consistently the attributes of the AWC collectively represented the construct of writing (internal consistency reliability). Finally, generalizability analyses were conducted to determine the degree to which observed score variance was attributable to two sources of measurement error: scorers and scoring environment (grade group). Reliability examined from consensus, consistency, and measurement perspectives indicates that the AWC assessment system generates highly reliable scoring of both the holistic and analytic components of writing. The AWC assessment system includes expert scorers, training procedures, and materials as essential components and serves purposes beyond the assessment of writing: it provides a common framework for structuring professional development and coordinating research and evaluation programs, encouraging the growth of professional learning communities and an improved understanding of the links between professional development, classroom practice, and student writing performance.
Keywords: writing assessment, scoring rubric, reliability, teacher professional development
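As a minimal illustration of the consensus (inter-rater agreement) approach named in the abstract, and not a representation of the authors' actual analysis, the sketch below shows how exact and adjacent agreement rates might be computed for double-scored samples on a 1-6 scale; the sample data, attribute name, and function names are hypothetical.

```python
# Illustrative sketch only: hypothetical double-scored ratings on a 1-6 scale.
# Exact agreement: both scorers assign the same score.
# Adjacent agreement: the two scores differ by at most one point.

def agreement_rates(pairs):
    """Return (exact, adjacent) agreement proportions for (score1, score2) pairs."""
    n = len(pairs)
    exact = sum(1 for a, b in pairs if a == b) / n
    adjacent = sum(1 for a, b in pairs if abs(a - b) <= 1) / n
    return exact, adjacent

# Hypothetical double scores for one analytic attribute (e.g., Organization).
double_scores = [(4, 4), (5, 4), (3, 3), (6, 5), (2, 4), (4, 5), (3, 3), (5, 5)]

exact, adjacent = agreement_rates(double_scores)
print(f"Exact agreement:    {exact:.2%}")
print(f"Adjacent agreement: {adjacent:.2%}")
```

In practice, such rates would be computed per attribute and per scoring event, alongside consistency indices and generalizability coefficients, to build the fuller reliability picture the article reports.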