Online Multimodal Laughter Detection System
Posted on Tue 16 Sep 2014
Right on time, and to coincide with the end of the project, ILHAIRE's
online multimodal laughter detection system has been finalised. In Year
3, the framework was further developed and brought into its final form.
User interaction is captured in real time using a headset delivering
high-quality audio at 48 kHz; a Microsoft Kinect providing RGB video and
a depth image at 25 Hz, as well as tracking of facial points and action
units; and a respiration belt recording exhalation at 125 Hz. Sensor
streams are collected and synchronised through the Social Signal
Interpretation (SSI) framework. If desired, the raw signals are stored
for later analysis. When the interaction of several users is captured,
SSI uses a global synchronisation signal to keep the machines involved
in sync.
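To make the synchronisation step concrete, here is a minimal Python sketch of aligning multi-rate sensor streams on a shared clock. This is not the SSI API (SSI is a C++ framework driven by XML pipelines); the sample rates match the sensors above, but the alignment strategy and every name in the snippet are illustrative assumptions.

```python
# Minimal sketch: align multi-rate sensor streams on a common time base.
# NOT the SSI API -- purely an illustration of timestamp-based alignment.
import numpy as np

def align_stream(timestamps, values, common_clock):
    """Resample a (timestamp, value) stream onto a shared clock
    via linear interpolation."""
    return np.interp(common_clock, timestamps, values)

# Hypothetical capture buffers (timestamps in seconds, fake samples).
resp_t = np.arange(0, 2.0, 1 / 125.0)        # respiration belt, 125 Hz
resp_v = np.sin(2 * np.pi * 0.3 * resp_t)    # fake exhalation curve
kinect_t = np.arange(0, 2.0, 1 / 25.0)       # Kinect frames, 25 Hz
kinect_v = np.random.rand(len(kinect_t))     # fake action-unit trace

# A shared 25 Hz clock (the slowest stream) acts as the global
# synchronisation signal all streams are resampled against.
clock = np.arange(0, 2.0, 1 / 25.0)
resp_aligned = align_stream(resp_t, resp_v, clock)
kinect_aligned = align_stream(kinect_t, kinect_v, clock)
assert resp_aligned.shape == kinect_aligned.shape
```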
Activity recognisers (VAD and FAD modules, for voice and face activity)
detect actions in voice and face; these are further analysed by
pre-trained models that convert the input signals into probabilities
for Smile (from action units), as well as for Laughter and Speech (from
voice). In addition, voiced parts of the audio signal are analysed for
laughter intensity using a pre-trained Weka model.
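To illustrate the detect-then-classify flow described above (and only that; neither the project's actual VAD nor its pre-trained models are reproduced here), the sketch below gates audio frames with a simple energy threshold and feeds per-frame features to a stand-in classifier. The real intensity model is a pre-trained Weka model, which is Java-based and not shown.

```python
# Illustrative detect-then-classify flow: an energy-based VAD selects
# active audio frames, which a stand-in classifier maps to probabilities.
import numpy as np
from sklearn.linear_model import LogisticRegression

SR = 48000        # headset audio sample rate (48 kHz)
FRAME = 1024      # analysis frame length in samples

def frame_energy(frame):
    return float(np.mean(frame ** 2))

def simple_vad(audio, threshold=1e-4):
    """Yield (offset, frame) for frames whose energy exceeds the threshold."""
    for i in range(0, len(audio) - FRAME, FRAME):
        frame = audio[i:i + FRAME]
        if frame_energy(frame) > threshold:
            yield i, frame

# Stand-in "pre-trained" model fitted on toy data; in the real system
# this would be a model trained on labelled laughter/speech recordings.
rng = np.random.default_rng(0)
model = LogisticRegression().fit(rng.normal(size=(100, 2)),
                                 rng.integers(0, 2, 100))

audio = rng.normal(scale=0.05, size=SR)   # one second of fake audio
for offset, frame in simple_vad(audio):
    # Two toy features: frame energy and a zero-crossing-rate proxy.
    feats = [[frame_energy(frame), np.abs(np.diff(np.sign(frame))).mean()]]
    p_laugh, p_speech = model.predict_proba(feats)[0]
```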
Raw probabilities, as well as decisions combined through vector fusion,
are then provided to the Dialog Manager (DM) through ActiveMQ.
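As a rough sketch of this step, the snippet below combines per-modality probability vectors with a weighted average (one loose reading of "vector fusion") and publishes the result to ActiveMQ over STOMP using the stomp.py library. The broker address, credentials, topic name, weights, and message schema are all assumptions, not the project's actual configuration.

```python
# Sketch: fuse per-modality probabilities and publish them to ActiveMQ
# over STOMP (via stomp.py). Topic and schema are hypothetical.
import json
import numpy as np
import stomp

def vector_fusion(prob_vectors, weights):
    """Weighted average of modality probability vectors, renormalised."""
    fused = np.average(prob_vectors, axis=0, weights=weights)
    return fused / fused.sum()

# Per-modality probabilities over (laughter, speech, other).
p_audio = np.array([0.7, 0.2, 0.1])
p_face = np.array([0.5, 0.1, 0.4])
fused = vector_fusion([p_audio, p_face], weights=[0.6, 0.4])

conn = stomp.Connection([("localhost", 61613)])    # assumed broker address
conn.connect("admin", "admin", wait=True)          # assumed credentials
conn.send(destination="/topic/ilhaire.laughter",   # hypothetical topic
          body=json.dumps({"laughter": float(fused[0]),
                           "speech": float(fused[1])}))
conn.disconnect()
```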
RGB and depth streams are forwarded to EyesWeb, where silhouette and
shoulder features are extracted. Likewise, the raw respiration signal
is sent to EyesWeb for further processing. The RGB video is also
published as a UDP stream on the network using FFmpeg.
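Publishing the video over UDP with FFmpeg can look roughly like the following, here launched from Python. The input source and the multicast address are placeholders; on the capture machine the input would be the live RGB feed rather than a file.

```python
# Sketch: publish a video source as an MPEG-TS stream over UDP multicast
# using FFmpeg. Input path and address are placeholders.
import subprocess

cmd = [
    "ffmpeg",
    "-re",                   # read the input at its native frame rate
    "-i", "rgb_feed.avi",    # placeholder: the live RGB source in practice
    "-f", "mpegts",          # MPEG transport stream container
    "udp://239.0.0.1:1234",  # multicast address other machines can receive
]
subprocess.run(cmd, check=True)
# Any participant's machine can then watch the stream, e.g. with:
#   ffplay udp://239.0.0.1:1234
```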
In this way, multi-user sessions can be recorded in separate rooms,
allowing each participant to watch the video streams of the other
users. The figure below shows the overall final architecture of the
ILHAIRE laughter analysis and fusion framework.

----- References:
J. Urbain, R. Niewiadomski, J. Hofmann, E. Bantegnie, T. Baur, N. Berthouze, H. Cakmak, R. T. Cruz, S. Dupont, M. Geist, H. Griffin, F. Lingenfelser, M. Mancini, M. Miranda, G. McKeown, S. Pammi, O. Pietquin, B. Piot, T. Platt, W. Ruch, A. Sharma, G. Volpe, and J. Wagner. 2012. Laugh Machine. In Proceedings of the 8th International Summer Workshop on Multimodal Interfaces (eNTERFACE'12), pp. 13-34, July 2012, Metz, France.
Johannes Wagner, Florian Lingenfelser, Tobias Baur, Ionut Damian, Felix Kistler, and Elisabeth André. 2013. The social signal interpretation (SSI) framework: multimodal signal processing and recognition in real-time. In Proceedings of the 21st ACM International Conference on Multimedia (MM '13). ACM, New York, NY, USA, 831-834. DOI=10.1145/2502081.2502223 http://doi.acm.org/10.1145/2502081.2502223
Antonio Camurri, Shuji Hashimoto, Matteo Ricchetti, Andrea Ricci, Kenji Suzuki, Riccardo Trocca, and Gualtiero Volpe. 2000. EyesWeb: Toward Gesture and Affect Recognition in Interactive Dance and Music Systems. Comput. Music J. 24, 1 (April 2000), 57-69. DOI=10.1162/014892600559182 http://dx.doi.org/10.1162/014892600559182
