Testing the Reliability of two Rubrics Used in Official English Certificates for the Assessment of Writing

Lucía Fraga Viñas

doi:10.14198/raei.2022.36.05

Authors

Lucía Fraga Viñas University of A Coruña, Spain https://orcid.org/0000-0003-4602-3593

Keywords:

rubrics, Official English Certificates, assessment, reliability

Abstract

The learning of English as a Foreign Language (EFL) is a primary concern worldwide. This has spurred a proliferation of related studies and the emergence of new methodologies and instruments of assessment. Along with these, new qualifications devoted to the certification of language competence have been created, triggered in no small part by the fact that demonstrating one’s level of proficiency has become almost imperative when applying for a job or a grant, or when seeking to study in a foreign country. It is therefore essential to test the reliability of the instruments used for the assessment of competences. To this end, over a four-week period, four evaluators assessed the written essays of students on a C1-level course using the writing rubrics for Cambridge Assessment English’s Certificate in Advanced English (CAE) and Trinity College’s Integrated Skills in English III exam (ISE-III). The aim was to examine the reliability of the CAE and ISE-III rubrics by calculating, for each, Cronbach’s alpha, the corrected item-total correlation, the intraclass correlation coefficient and the standard error of measurement. Afterwards, the marks given to each essay under the two rubrics were compared in order to ascertain whether the rubrics’ language is clear and which criteria tended to obtain higher and lower marks on average. Examiners were also surveyed at the end of the assessment process to gather their opinions on the clarity of the two rubrics. The results show that, although both rubrics obtained good reliability coefficients, the variance in scores is greater with the ISE-III rubric, and that examiners tend to be tougher when assessing the learner’s language resource than any other criterion. According to the survey, examiners generally perceived some descriptors in both rubrics as confusing or vague, which suggests that both rubrics should be revised.
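For readers unfamiliar with the reliability statistics named in the abstract, the following is a minimal sketch of how three of them are conventionally computed. It is not the study's own procedure or data: the score matrix is hypothetical, the function names are illustrative, and treating rubric criteria as the "items" is an assumption. The intraclass correlation coefficient is omitted because its value depends on the model and form chosen.

```python
from math import sqrt
from statistics import mean, stdev, variance


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def cronbach_alpha(scores):
    """scores: one row per essay, one column per item (assumed: rubric criteria)."""
    k = len(scores[0])
    items = list(zip(*scores))               # transpose: one tuple per item
    totals = [sum(row) for row in scores]    # total mark per essay
    item_variance_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def corrected_item_total(scores):
    """Correlation of each item with the sum of the remaining items."""
    items = list(zip(*scores))
    totals = [sum(row) for row in scores]
    return [pearson(col, [t - x for t, x in zip(totals, col)]) for col in items]


def standard_error_of_measurement(scores, reliability):
    """SEM = SD of total scores * sqrt(1 - reliability)."""
    totals = [sum(row) for row in scores]
    # Clamp at zero so rounding error in a reliability of ~1.0 cannot break sqrt.
    return stdev(totals) * sqrt(max(0.0, 1 - reliability))


if __name__ == "__main__":
    # Hypothetical marks: 5 essays scored on 4 criteria (not data from the study).
    marks = [
        [3, 4, 3, 2],
        [4, 4, 5, 3],
        [2, 3, 2, 2],
        [5, 5, 4, 4],
        [3, 3, 3, 2],
    ]
    alpha = cronbach_alpha(marks)
    print("Cronbach's alpha:", round(alpha, 3))
    print("Corrected item-total:", [round(r, 3) for r in corrected_item_total(marks)])
    print("SEM:", round(standard_error_of_measurement(marks, alpha), 3))
```

A low corrected item-total correlation for one criterion would suggest that it behaves differently from the rest of the rubric, while a larger SEM indicates a wider band of uncertainty around each essay's total mark.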
