HC Editorial Staff
05/05/2024
With the rapid integration of artificial intelligence (AI) into language education, a key question has emerged: can AI write as well as, or better than, human students in English as a foreign language (EFL) courses? Equally pressing is the question of how consistently teachers evaluate such writing. A recent study from the University of Costa Rica examines both questions, offering timely insights for educators and policymakers across Latin America and beyond.
The research, titled "Assessing Artificial Intelligence and Professors' Calibration in English as a Foreign Language Writing Courses at a Costa Rican Public University" and published in Actualidades Investigativas en Educación (2024), reveals that AI-generated texts are not only technically proficient but sometimes outperform student writing—at least in grammar and mechanics. But perhaps more striking is the wide variation in grading among professors, raising concerns about fairness and consistency in educational assessment.
The study, conducted by William Charpentier‑Jiménez, employed a quasi-experimental quantitative design. Eight university students enrolled in a TESOL course wrote a paragraph each. Using the same prompts, the researcher generated eight additional paragraphs using AI tools.
Ten EFL professors, each with over a decade of teaching experience, evaluated all 16 paragraphs using a standardized rubric covering five criteria: content, organization, grammar, mechanics, and vocabulary.
The assessments were analyzed in Excel, calculating average scores, standard deviations, and ranges by paragraph, criterion, and evaluator. Although the sample was relatively small, the results were revealing.
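The descriptive statistics described above can be sketched in a few lines of Python. This is only an illustration, not the study's workflow (the article says only that Excel was used). The scores are invented placeholder values, the names `scores` and `summarize` are hypothetical, and the use of the sample standard deviation is an assumption, since the article does not say which variant was applied.

```python
from statistics import mean, stdev

# Invented placeholder scores (0-10 scale), NOT data from the study.
# Each key is a paragraph ID; each list holds one score per evaluator.
scores = {
    "P1": [7.0, 8.0, 6.5, 7.5],
    "P2": [9.0, 7.0, 8.0, 8.0],
}

def summarize(values):
    """Return the mean, sample standard deviation, and range of a list of scores."""
    return {
        "mean": mean(values),
        "stdev": stdev(values),  # sample SD (n - 1 denominator); an assumption
        "range": max(values) - min(values),
    }

# One summary per paragraph; the same function could be applied
# to scores grouped by criterion or by evaluator instead.
summary = {pid: summarize(vals) for pid, vals in scores.items()}

for pid, stats in summary.items():
    print(pid, {name: round(value, 2) for name, value in stats.items()})
```

Grouping the same scores by criterion or by evaluator, rather than by paragraph, would give the other two views the article mentions.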
Overall, the AI-generated paragraphs received a slightly higher average score (7.61) than those written by students (7.56) on a 0–10 scale, a difference of 0.05 points. Broken down by criterion, the AI texts held their advantage in grammar and mechanics.
This suggests that AI excels in technical precision—likely due to its training on grammatically correct language patterns—but falls short in creativity and narrative cohesion, which remain human strengths.
"AI is a flawless spellchecker," the author suggests, "but students are still the ones who tell meaningful stories."
Perhaps the most critical finding was the high variability in professors' evaluations. For the same paragraph, some professors' scores were as much as 2 points higher or lower than their colleagues'. This inconsistency indicates a lack of calibration, which can undermine the fairness and reliability of assessments.
The issue of calibration—the alignment of grading standards among instructors—is rarely addressed in Latin American educational contexts. However, it is crucial for ensuring equity, especially as AI becomes more integrated into teaching and assessment processes.
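One simple way to look for calibration problems of this kind is to compare each evaluator's scores with the group average for the same paragraph. The sketch below is a hypothetical illustration, not the method used in the study. The ratings are invented, and the names `ratings`, `spread`, and `bias` are assumptions introduced here.

```python
from statistics import mean

# Invented placeholder ratings, NOT data from the study.
# ratings[evaluator][paragraph] = score on a 0-10 scale.
ratings = {
    "Rater A": {"P1": 8.0, "P2": 9.0},
    "Rater B": {"P1": 6.0, "P2": 7.0},
    "Rater C": {"P1": 7.0, "P2": 8.0},
}

# Paragraph IDs, taken from the first rater (all raters score the same set).
paragraphs = sorted(next(iter(ratings.values())))

# Mean score each paragraph received across all raters.
paragraph_mean = {
    p: mean(r[p] for r in ratings.values()) for p in paragraphs
}

# Spread per paragraph: gap between the highest and lowest score it received.
spread = {
    p: max(r[p] for r in ratings.values()) - min(r[p] for r in ratings.values())
    for p in paragraphs
}

# Average signed deviation per rater from the paragraph means:
# positive suggests leniency, negative suggests severity.
bias = {
    name: mean(r[p] - paragraph_mean[p] for p in paragraphs)
    for name, r in ratings.items()
}

print("Spread by paragraph:", spread)
print("Average deviation by rater:", bias)
```

A large spread on a paragraph, or a rater whose average deviation is consistently far from zero, would be a natural starting point for a calibration session.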
"It doesn't matter how well a student writes," Charpentier-Jiménez warns, "if the grade depends more on who's reading than what's written."
The study offers several actionable takeaways for improving EFL writing assessment.
In contexts like Costa Rica—where English proficiency is increasingly important for global engagement—blending AI tools with teacher expertise and calibration protocols could revolutionize how writing is taught and assessed.
This research aligns with broader trends across Latin America, where universities and ministries of education are exploring hybrid approaches to teaching and assessment. However, the study also sounds a cautionary note: without teacher training in calibration and a nuanced understanding of AI's limitations, these tools could deepen inequalities rather than close gaps.
The study reaffirms a growing consensus: AI is here to stay in education, especially in language learning. But it cannot—and should not—replace human judgment.
Ensuring fair, accurate, and effective writing assessment requires calibrated educators, thoughtful integration of AI tools, and a commitment to continuous professional development. As Charpentier-Jiménez's research shows, when paired correctly, AI and teachers can complement one another—each enhancing the other's strengths.
The future of language education may be artificial—but it must also be intelligent.
Reference: Charpentier-Jiménez W. Assessing artificial intelligence and professors' calibration in English as a foreign language writing courses at a Costa Rican public university. Actual Investig Educ [Internet]. 2024;24(1):1–25. Available from: https://doi.org/10.15517/aie.v24i1.55612.