Diane Litman
University of Pittsburgh
Department of Computer Science
Learning Research and Development Ctr.
Pittsburgh, PA 15260, USA

Kate Forbes
University of Pittsburgh
Learning Research and Development Ctr.
Pittsburgh, PA 15260, USA
ABSTRACT

We investigate the automatic classification of student emotional states in a corpus of human-human spoken tutoring dialogues. We first annotated student turns in this corpus for negative, neutral, and positive emotions. We then automatically extracted acoustic and prosodic features from the student speech, and compared the results of a variety of machine learning algorithms that use 8 different feature sets to predict the annotated emotions. Our best results have an accuracy of 80.53% and show 26.28% relative improvement over a [...]
As shown, we compare feature sets containing only "raw" acoustic and prosodic features with feature sets containing the "n1" and "n2" normalized versions of those features, and we also compare feature sets containing all of the acoustic and prosodic features (raw and normalized). Note further that we compare "...speech" and "...subj" feature sets; these contrast how well our emotion data can be learned from acoustic, prosodic, and temporal features alone (raw, normalized, or both) versus with our individualized identifier features added in.

rawspeech: 11 raw f0, RMS, and temporal features
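To make the raw-versus-normalized distinction concrete, the following is a minimal sketch of how such turn-level features might be computed. The exact 11-feature inventory and the paper's "n1"/"n2" normalization schemes are not spelled out in this excerpt, so the feature subset (max/min/mean/std of f0 and RMS, plus duration) and the divide-by-reference-turn normalization below are illustrative assumptions, not the authors' actual definitions.

```python
import statistics

def raw_features(f0, rms, duration):
    """Summary statistics over one student turn.

    f0: voiced pitch values (Hz); rms: per-frame energy; duration: seconds.
    An illustrative subset of acoustic-prosodic features, NOT the paper's
    exact 11-feature inventory.
    """
    feats = {}
    for name, x in (("f0", list(f0)), ("rms", list(rms))):
        feats[f"{name}_max"] = max(x)
        feats[f"{name}_min"] = min(x)
        feats[f"{name}_mean"] = statistics.fmean(x)
        feats[f"{name}_std"] = statistics.pstdev(x)
    feats["duration"] = duration
    return feats

def normalize(turn_feats, ref_feats):
    """One plausible normalization scheme (assumed, not confirmed by the
    excerpt): divide each raw feature by the corresponding value from a
    reference turn, yielding speaker-relative features."""
    return {k: (v / ref_feats[k] if ref_feats[k] else 0.0)
            for k, v in turn_feats.items()}
```

A feature set like "rawspeech" would then correspond to the raw dictionaries, while a normalized set would apply `normalize` with, e.g., the speaker's first turn as the reference.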
3.2. Extracting Features from the Speech Signal
Fig. 3. 8 Feature Sets for Machine Learning Experiments

3.3. Using Machine Learning to Predict Emotions

We next performed machine learning experiments with our feature sets and our emotion-annotated data, using the Weka machine learning software [14], as in [6]. This software allows us to
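The experimental design here is per-feature-set cross-validated accuracy, compared against a baseline. The paper runs this in Weka (Java); as a self-contained illustration of the same design, not the authors' actual setup, here is a stdlib-only Python sketch using leave-one-out cross-validation with a 1-nearest-neighbor classifier and a majority-class baseline. The classifier choice and the toy feature vectors are assumptions for illustration only.

```python
import math
from collections import Counter

def one_nn_predict(train_X, train_y, x):
    # Predict the label of the nearest training vector (Euclidean distance).
    best = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return train_y[best]

def loo_accuracy(X, y):
    """Leave-one-out cross-validated accuracy for the 1-NN classifier."""
    correct = 0
    for i in range(len(X)):
        held_out_X = X[:i] + X[i + 1:]
        held_out_y = y[:i] + y[i + 1:]
        correct += one_nn_predict(held_out_X, held_out_y, X[i]) == y[i]
    return correct / len(X)

def majority_baseline(y):
    """Accuracy of always predicting the most frequent emotion class."""
    return Counter(y).most_common(1)[0][1] / len(y)

def compare_feature_sets(feature_sets, y):
    """feature_sets: dict mapping a feature-set name (e.g. 'rawspeech')
    to one feature vector per student turn. Returns accuracy per set."""
    return {name: loo_accuracy(X, y) for name, X in feature_sets.items()}
```

In this framing, each of the 8 feature sets in Fig. 3 would be one entry in `feature_sets`, and the reported relative improvement is the gain of the best accuracy over the baseline.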