Automatic Coding of Classroom Observations

Teachers are the Learners: Providing Automated Feedback on Classroom Interpersonal Dynamics

(NSF Cyberlearning #1822768)


Jacob Whitehill (PI): Assistant professor of Computer Science at WPI; member of WPI's Learning Sciences & Technologies (LST) program.

Erin Ottmar (co-PI): Assistant professor of Social Science & Policy Studies at WPI; member of WPI's Learning Sciences & Technologies (LST) program.

Lane Harrison (co-PI): Assistant professor of Computer Science at WPI; member of WPI's Data Science (DS) program.

Jennifer LoCasale-Crouch (co-PI): Associate research professor of Education at UVA's Curry School of Education and the Center for the Advanced Study of Teaching and Learning (CASTL).

This NSF-funded project (Cyberlearning #1822768, PI: Jacob Whitehill) investigates how machine learning and computer vision can be harnessed to automatically characterize the interpersonal dynamics between students and teachers from videos of school classrooms, and how machine perception can deliver new training experiences for both pre-service and in-service teachers. The project is an interdisciplinary collaboration that spans multi-modal machine learning, data visualization, classroom observation, and teacher training.


Worcester Polytechnic Institute (WPI)

University of Virginia (UVA)


The quality of teacher-student interactions in school classrooms both predicts and impacts students’ learning outcomes. Training teachers to perceive subtle interactions and interpersonal classroom dynamics more accurately can help them to implement more effective interactions in their own classrooms. Contemporary methods of training teachers to understand classroom interactions are based mostly on watching classroom observation videos of other teachers, which have been annotated for different dimensions (“positive climate”, “teacher sensitivity”, etc.) of an observation protocol such as the widely used Classroom Assessment Scoring System (CLASS; Pianta et al. 2008). Only rarely do teachers receive personalized feedback on their own classroom interactions, and when they do, the feedback is sparse (typically coded just once for every 15-minute video segment) and does not explain in detail how the segment was scored. In order to provide more temporally specific, more densely annotated, and more efficient feedback on teachers’ own classroom observation sessions, we propose to develop an Automatic Classroom Observation Recognition neural Network (ACORN) that extracts and integrates multimodal features of facial expression, eye gaze, auditory emotion, speech, and language in order to assess classroom dynamics automatically. ACORN will be trained on two CLASS-coded classroom observation datasets, collected by the University of Virginia (UVA), of hundreds of pre-school and elementary school teachers across the USA. Moreover, based on the ACORN prototype, we will develop a Classroom Observation Interactive Learning System (COILS) that trains teachers to perceive classroom dynamics more precisely. COILS will be evaluated in a study of 50 pre-service teachers at UVA’s Curry School of Education.
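To make the idea of denser, multimodal feedback concrete, the following is a minimal sketch of how per-modality features might be aligned into short time windows and concatenated before scoring. It is an illustration only, not the ACORN architecture: the 30-second window length, the feature dimensions, the zero-valued placeholder features, and the names `WindowFeatures` and `fuse_windows` are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Dict, List

SEGMENT_SECONDS = 15 * 60  # one human-coded segment, as described above
WINDOW_SECONDS = 30        # hypothetical finer granularity for feedback


@dataclass
class WindowFeatures:
    """Fused features for one short time window (hypothetical container)."""
    start_s: int
    end_s: int
    vector: List[float]


def fuse_windows(modalities: Dict[str, List[List[float]]]) -> List[WindowFeatures]:
    """Concatenate the per-window feature vectors of every modality.

    This is simple early fusion: for each window, the vectors from all
    modalities are joined end to end, in alphabetical order of modality name.
    """
    names = sorted(modalities)
    n_windows = len(modalities[names[0]])
    if any(len(modalities[name]) != n_windows for name in names):
        raise ValueError("all modalities must cover the same number of windows")

    fused = []
    for i in range(n_windows):
        vector: List[float] = []
        for name in names:
            vector.extend(modalities[name][i])
        fused.append(
            WindowFeatures(i * WINDOW_SECONDS, (i + 1) * WINDOW_SECONDS, vector)
        )
    return fused


# Placeholder features (all zeros) for one 15-minute segment.
n = SEGMENT_SECONDS // WINDOW_SECONDS
placeholder = {
    "facial_expression": [[0.0, 0.0] for _ in range(n)],
    "eye_gaze": [[0.0] for _ in range(n)],
    "auditory_emotion": [[0.0, 0.0, 0.0] for _ in range(n)],
}
windows = fuse_windows(placeholder)
print(len(windows), "windows per segment, each with", len(windows[0].vector), "features")
```

Under these assumptions, a model that scores each fused window could give feedback at many points within a segment rather than one score for the whole 15 minutes.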

This project builds on a Spencer Foundation grant, Towards Computer-Assisted Coding of Classroom Observations: A Computer Vision Approach to Measuring Positive Climate, that explored how machine learning and computer vision algorithms can be used to automatically characterize teacher-student and student-student interactions from classroom observation videos. Partially automating the process of classroom observation could facilitate finer-grained and more efficient feedback on teachers' classroom interactions, which could provide teachers with improved professional development feedback and researchers with a more powerful lens through which to measure the impact of educational interventions. As first steps, the researchers are investigating how automatically extractable features, such as the facial expression of each participant (teacher, student) and auditory features of speech, can be analyzed by recurrent neural networks to predict the positive climate and negative climate dimensions of the Classroom Assessment Scoring System (CLASS). Later work will consider how to identify which participants are interacting with one another at each moment in time based on their audiovisual profiles, as well as how to combine machine-coded with human-coded CLASS scores to improve accuracy. This study harnesses a dataset of hundreds of CLASS-coded videos of toddler classrooms collected and annotated by the Center for the Advanced Study of Teaching and Learning at the University of Virginia.
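As a rough illustration of how a recurrent neural network can turn a sequence of per-frame features into a single climate score, here is a minimal forward pass of a plain (Elman-style) recurrent network in pure Python. It is a sketch under stated assumptions, not the researchers' model: the weights are random and untrained, the feature and hidden sizes are arbitrary, the function name `rnn_score` is hypothetical, and the output is squashed into a 1 to 7 range on the assumption that CLASS dimensions are scored on that scale.

```python
import math
import random
from typing import List


def rnn_score(
    frames: List[List[float]],
    w_in: List[List[float]],
    w_rec: List[List[float]],
    w_out: List[float],
    low: float = 1.0,
    high: float = 7.0,
) -> float:
    """Run a plain recurrent network over the frames and return one score.

    frames: one feature vector per time step (e.g. facial expression and
            audio features; the content is an assumption for illustration).
    w_in:   hidden x input weights, w_rec: hidden x hidden weights,
    w_out:  hidden-to-output weights.
    """
    hidden = [0.0] * len(w_rec)
    for x in frames:
        # Each new hidden state depends on the current frame and the previous state.
        hidden = [
            math.tanh(
                sum(w_in[j][k] * x[k] for k in range(len(x)))
                + sum(w_rec[j][k] * hidden[k] for k in range(len(hidden)))
            )
            for j in range(len(hidden))
        ]
    logit = sum(w * h for w, h in zip(w_out, hidden))
    # A sigmoid rescaled to [low, high] keeps the prediction in the score range.
    return low + (high - low) / (1.0 + math.exp(-logit))


# Demo with random, untrained weights and synthetic frames.
rng = random.Random(0)
n_features, n_hidden, n_frames = 4, 3, 20
w_in = [[rng.uniform(-0.5, 0.5) for _ in range(n_features)] for _ in range(n_hidden)]
w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_hidden)]
w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
frames = [[rng.random() for _ in range(n_features)] for _ in range(n_frames)]

score = rnn_score(frames, w_in, w_rec, w_out)
print("predicted score (untrained, illustrative only):", round(score, 2))
```

In practice the weights would be learned from CLASS-coded videos, and a gated architecture such as an LSTM or GRU might be preferred for long sequences; the sketch only shows how information from earlier frames is carried forward into the final prediction.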