2. Affective Computing (1995):
the study and development of systems and devices that can
recognize, interpret, process, and simulate human affects
Professor Rosalind Picard
MIT Media Lab
International Conference on Affective Computing and Intelligent Interaction (ACII)
ACII 2017 @ San Antonio
7. Philosophy
Emotion discussed through philosophy
Turn to Practical
Combining the physical and the emotional, and beginning to apply these systems to humans
Cognitive Process
Cognitive Theory
Mind-Body Dualism
Combining the physical world with emotion
Modern Theory
10. Mind-Body Dualism
In the 17th century, René Descartes viewed the body's emotional apparatus as largely
hydraulic. He believed that when a person felt angry or sad it was because certain internal
valves opened and released such fluids as bile and phlegm.
Mind-Body Dualism
Combining the physical world with emotion
11. Charles Darwin believed that emotions were beneficial for evolution because emotions
improved chances of survival. For example, the brain uses emotion to keep us away from a
dangerous animal (fear), away from rotting food and fecal matter (disgust), in control of
our resources (anger), and in pursuit of a good meal or a good mate (pleasure and lust).
Damasio, Antonio R. Looking for Spinoza: Joy, Sorrow, and the Feeling Brain. New York NY: Harcourt, Inc., 2003.
Turn to Practical
Discussing the combination of the physical and the emotional, and beginning to apply these systems to humans
12. The father of American psychology
James, William. 1884. "What Is an Emotion?" Mind 9, no. 34: 188-205.
When the body undergoes (physiological) changes and we feel those changes, that is the emotion.
"Our feeling of the same changes as they occur is the emotion."
Modern Theory
27. Charles Darwin believed that emotions were beneficial for evolution because emotions
improved chances of survival. For example, the brain uses emotion to keep us away from a
dangerous animal (fear), away from rotting food and fecal matter (disgust), in control of
our resources (anger), and in pursuit of a good meal or a good mate (pleasure and lust).
Damasio, Antonio R. Looking for Spinoza: Joy, Sorrow, and the Feeling Brain. New York NY: Harcourt, Inc., 2003.
Darwin
38. Dimensional models of emotion
Define emotions according to one or more dimensions
• Wilhelm Max Wundt (1897)
  • three dimensions: "pleasurable versus unpleasurable", "arousing or subduing", and "strain or relaxation"
• Harold Schlosberg (1954)
  • three dimensions of emotion: "pleasantness–unpleasantness", "attention–rejection", and "level of activation"
• Prevalent models incorporate valence and arousal dimensions
39. Several well-known models
• Circumplex model
• Vector model
• Positive activation – negative activation (PANA) model
• Plutchik's model
• PAD emotional state model
• Lövheim cube of emotion
• Cowen & Keltner 2017
40. Circumplex model: Perceptual
• developed by James Russell (1980)
• a two-dimensional circular space containing arousal and valence dimensions
• arousal represents the vertical axis and valence represents the horizontal axis
• prevalently used as 'labels' (a toy placement sketch follows)
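A minimal illustrative sketch (not from the slides) of using the circumplex as labels: categorical emotions are placed at assumed valence-arousal coordinates, and any point in the space maps to its nearest label. The coordinates are illustrative assumptions, not values from Russell's model.

```python
import math

EMOTION_COORDS = {  # (valence, arousal), both in [-1, 1]; assumed positions
    "happy":   ( 0.8,  0.5),
    "excited": ( 0.6,  0.9),
    "calm":    ( 0.6, -0.6),
    "sad":     (-0.7, -0.4),
    "angry":   (-0.6,  0.8),
    "bored":   (-0.4, -0.7),
}

def nearest_label(valence: float, arousal: float) -> str:
    """Return the categorical label closest to a point in the circumplex."""
    return min(EMOTION_COORDS,
               key=lambda e: math.dist((valence, arousal), EMOTION_COORDS[e]))

print(nearest_label(0.7, 0.6))  # -> "happy"
```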
41. Positive activation – Negative activation (PANA): Self-Report
• created by Watson and Tellegen in 1985
• suggests that positive affect and negative affect are two separate systems (responsible for different functions)
• states of higher arousal tend to be defined by their valence
• states of lower arousal tend to be more neutral in terms of valence
• the vertical axis represents low to high positive affect
• the horizontal axis represents low to high negative affect
• the dimensions of valence and arousal lie at a 45-degree rotation over these axes (a conversion sketch follows)
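Since the slide states that valence and arousal sit at a 45-degree rotation of the PA/NA axes, the conversion can be written out directly; this is a minimal sketch under that stated assumption.

```python
import math

def pana_to_valence_arousal(pa: float, na: float) -> tuple[float, float]:
    """Rotate the PA/NA axes by 45 degrees to recover valence and arousal."""
    c = math.cos(math.pi / 4)  # cos(45°) = sin(45°) ≈ 0.707
    valence = c * (pa - na)    # high PA with low NA reads as pleasant
    arousal = c * (pa + na)    # both systems active reads as high arousal
    return valence, arousal

print(pana_to_valence_arousal(1.0, 0.0))  # ≈ (0.707, 0.707)
```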
51. Little Dragon
(Affectiva- Education)
“make learning more enjoyable and more effective, by
providing an educational tool that is both universal and
personalized”
reference:
https://www.affectiva.com/success-story/
https://www.youtube.com/watch?v=SmjAa8iMkjU
53. Nevermind
(Affectiva- Gaming)
bio-feedback horror game
“sense a player’s facial expressions for signs of
emotional distress, and adapt game play accordingly”
reference: https://www.affectiva.com/success-story/
https://www.youtube.com/watch?v=NGr0orAqRH4&t=497s
54. Brain Power
(Affectiva- Health Care)
The World’s First Augmented Reality Smart-Glass-System
to empower children and adults with autism to teach
themselves crucial social and cognitive skills.
reference: https://www.affectiva.com/success-story/
https://www.youtube.com/watch?v=qfoTprgWyns
56. MediaRebel
(Affectiva- Legal)
• Legal video deposition management platform MediaRebel uses Affectiva’s
Emotion SDK for facial expression analysis and emotion recognition.
• Intelligent analytical features include:
• Search transcript based upon witness emotions
• Instantly play back testimony based upon selected emotions
• Identify positive, negative & neutral witness behavior
reference:
https://www.affectiva.com/success-story/
https://www.mediarebel.com/
57. shelfPoint
(Affectiva- Retail)
• Cloverleaf is a retail technology company for the modern brick-and-mortar marketer and merchandiser
• shelfPoint solution: brands and retailers can now
capture customer engagement and sentiment data at
the moment of purchase decision — something
previously unavailable in physical retail stores.
reference: https://www.affectiva.com/success-story/
https://www.youtube.com/watch?v=S9gDqpF6kLs
https://www.youtube.com/watch?v=W6UnahO_zXs
62. Public databases
Year Database Language Setting Protocol Elicitation
1997 DES Dan. Single Scr. Induced
2000 GEMEP Fre. Single Scr. & Spo. Acted
2005 eNTERFACE' 05 Eng. Single Scr. Induced
2007 HUMAINE Eng. TV Talk Scr. & Spo. Mix.
2008 VAM Ger. TV Talk Spo. Acted
2008 IEMOCAP Eng. Dyadic Scr. & Spo. Acted
2009 SAVEE Eng. Single Spo. Acted
2010 CIT Eng. Dyadic Scr. & Spo. Acted
2010 SEMAINE Eng. Dyadic Scr. Mix.
2013 RECOLA Fre. Dyadic Spo. Acted
2016 CHEAVD Chi. TV talk Spo. Posed
2017 NNIME Chi. Dyadic Spo. Acted
Another key question: how are these annotated and rated?
63. Language: Danish
Participants: 4 (Male: 2; Female: 2)
Recordings:
• Audio
Total: 0.5 hours
Sentences: 5200 utterances
Labels:
• Perspectives: Naïve-Observer
• Rater: 20
• Discrete session-level annotation
• Categorical (5)
DES:
DESIGN, RECORDING AND VERIFICATION OF A DANISH
EMOTIONAL SPEECH DATABASE
Engberg, Inger S., et al. "Design, recording and verification of a Danish emotional speech database." Fifth European Conference on Speech Communication and Technology. 1997.
Available: Tom Brøndsted (tom@brondsted.dk)
64. DES
• Loss-Scaled Large-Margin
Gaussian Mixture Models for
Speech Emotion Classification1
(Cat.:0.676)
• Automatic emotional speech
classification2
(Cat.:0.516)
1Yun, Sungrack, and Chang D. Yoo. "Loss-scaled large-margin Gaussian mixture models for speech emotion classification." IEEE Transactions on Audio, Speech, and Language Processing 20.2 (2012): 585-598.
2Ververidis, Dimitrios, Constantine Kotropoulos, and Ioannis Pitas. "Automatic emotional speech classification." Acoustics, Speech, and Signal Processing, 2004. Proceedings (ICASSP '04). IEEE International Conference on. Vol. 1. IEEE, 2004.
65. Language: French
Participants: 10 (Male: 5; Female: 5)
Recordings:
• Dual-channel Audio
• HD Video
• Manual Transcript
• Face & Head
• Body Posture & Gestures
Sentences: 7300 sequences
Labels:
• Perspectives: Naïve-Observer
• Discrete session-level annotation
• Categorical (18)
GEMEP:
Geneva Multimodal Emotion Portrayals corpus
Bänziger, Tanja, Hannes Pirker, and K. Scherer. "GEMEP - GEneva Multimodal Emotion Portrayals: A corpus for the study of multimodal emotional expressions." Proceedings of LREC. Vol. 6. 2006.
Bänziger, Tanja, and Klaus R. Scherer. "Using actor portrayals to systematically study multimodal emotion expression: The GEMEP corpus." International conference on affective computing and intelligent interaction. Springer, Berlin, Heidelberg, 2007.
Available: Tanja Bänziger (Tanja.Banziger@pse.unige.ch)
66. GEMEP
• Multimodal emotion recognition from expressive
faces, body gestures and speech
(Cat.: 0.571)
Kessous, Loic, Ginevra Castellano, and George Caridakis. "Multimodal emotion recognition in speech-based interaction using facial expression, body gesture and acoustic analysis." Journal on Multimodal User Interfaces 3.1 (2010): 33-48.
67. Language: English
Participants: 42 (Male: 34; Female: 8)
(14 different nationalities)
Recordings:
• Dual-channel Audio
• HD Video
• Script
Total: 1166 video sequences
Emotion-related atmosphere:
• To express six emotions
eNTERFACE' 05:
The eNTERFACE’05 Audio-Visual Emotion Database
Martin, Olivier, et al. "The eNTERFACE'05 audio-visual emotion database." Data Engineering Workshops, 2006. Proceedings. 22nd International Conference on. IEEE, 2006.
Available: O. Martin (martin@tele.ucl.ac.be)
Available: O. Martin (martin@tele.ucl.ac.be)
68. eNTERFACE' 05
• Sparse autoencoder-
based feature transfer
learning for speech
emotion recognition1
(Cat.: 59.1)
• Unsupervised learning
in cross-corpus
acoustic emotion
recognition2
(Val./Act.:0.574/0.616)
1Deng, Jun, et al. "Sparse autoencoder-based feature transfer learning for speech emotion recognition." Affective Computing and Intelligent Interaction (ACII), 2013 Humaine Association Conference on. IEEE, 2013.
2Zhang, Zixing, et al. "Unsupervised learning in cross-corpus acoustic emotion recognition." Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on. IEEE, 2011.
69. Language: English
Participants: Many (includes 8 datasets)
Recordings:
(Naturalistic (TV shows, interviews)/Induced data)
• Audio
• Video
• Gesture
• Emotion words
Labels:
• Perspectives: Naïve-Observer
• Rater: 4
• Continuous-in-time annotation
• Dimensional (8) [Intensity, Activation, Valence, Power, Expect, Word]
• Discrete annotation (5)
• Emotion-related states
• Key Event
• Everyday Emotion words…
HUMAINE:
Addressing the Collection and Annotation of
Naturalistic and Induced Emotional Data
Douglas-Cowie, Ellen, et al. "The HUMAINE database: addressing the collection and annotation of naturalistic and induced emotional data." Affective computing and intelligent interaction (2007): 488-500.
Available: E.Douglas-Cowie@qub.ac.uk
70. HUMAINE
• A Multimodal Database
for Affect Recognition
and Implicit Tagging1
(Val./Act.:0.761/0.677)
• Abandoning Emotion
Classes - Towards
Continuous Emotion
Recognition with
Modelling of Long-
Range Dependencies2
(Val./Act.[MSE]:0.18/0.08)
1Soleymani, Mohammad, et al. "A multimodal database for affect recognition and implicit tagging." IEEE Transactions on Affective Computing 3.1 (2012): 42-55.
2Wöllmer, Martin, et al. "Abandoning emotion classes - towards continuous emotion recognition with modelling of long-range dependencies." Ninth Annual Conference of the International Speech Communication Association. 2008.
71. Language: German (TV shows)
Participants: 47
Recordings:
• Audio
• Video
• Face
• Manual Transcript
Total: 12 hours
Sentences: 946 utterances
Labels:
• Perspectives: Peer, Director, Self, Naïve-Observer
• Rater: 17
• Continuous-in-time annotation
• Dimensional (Valence-Activation-Dominance) for Audio
• Discrete session-level annotation
• Categorical (7) for Faces
VAM:
The Vera am Mittag German Audio-Visual
Spontaneous Speech Database
Grimm, Michael, Kristian Kroschel, and Shrikanth Narayanan. "The Vera am Mittag German audio-visual emotional speech database." Multimedia and Expo, 2008 IEEE International Conference on. IEEE, 2008.
Available: Michael.Grimm@ieee.org
Available: Michael.Grimm@ieee.org
72. VAM
• Towards robust spontaneous
speech recognition with
emotional speech adapted
acoustic models1
(Word ACC.: 42.75)
• Selecting training data for cross-corpus speech emotion recognition: Prototypicality vs. generalization2
(Val./Act.: 0.502/0.677)
1Vlasenko, Bogdan, Dmytro Prylipko, and Andreas Wendemuth. "Towards robust spontaneous speech recognition with emotional speech adapted acoustic models." Poster and Demo Track of the 35th German Conference on Artificial Intelligence, KI-2012, Saarbrücken, Germany. 2012.
2Schuller, Björn, et al. "Selecting training data for cross-corpus speech emotion recognition: Prototypicality vs. generalization." Proc. 2011 Afeka-AVIOS Speech Processing Conference, Tel Aviv, Israel. 2011.
74. IEMOCAP
• Tracking continuous
emotional trends of
participants during
affective dyadic
interactions using body
language and speech
information1
(Val./Act./Dom.:0.619/0.637
/0.62)
• Modeling mutual influence
of interlocutor emotion
states in dyadic spoken
interactions2
(Cat./Val./Act.:0.552/0.634/0
.650)
1Metallinou, Angeliki, Athanasios Katsamanis, and Shrikanth Narayanan. "Tracking continuous emotional trends of participants during affective dyadic interactions using body language and speech information." Image and Vision Computing 31.2 (2013): 137-152.
2Lee, Chi-Chun, et al. "Modeling mutual influence of interlocutor emotion states in dyadic spoken interactions." Tenth Annual Conference of the International Speech Communication Association. 2009.
75. Language: English
Participants: 4 (Male: 4)
Recordings:
• Dual-channel Audio
• Video
• Face markers
Sentences: 480 utterances
Labels:
• Perspectives: Naïve-Observer
• Discrete session-level annotation
• Categorical (6)
SAVEE:
Surrey Audio-Visual Expressed Emotion database
Jackson, P., and S. Haq. "Surrey Audio-Visual Expressed Emotion (SAVEE) Database." University of Surrey: Guildford, UK (2014).
Available: P Jackson (p.jackson@surrey.ac.uk)
76. SAVEE
• Speaker-Dependent Audio-Visual Emotion Recognition1
(Cat.: 97.5)
• Audio-Visual Feature Selection and Reduction for Emotion Classification2
(Cat.: 96.7)
1S. Haq and P.J.B. Jackson. "Speaker-Dependent Audio-Visual Emotion Recognition." In Proc. Int'l Conf. on Auditory-Visual Speech Processing, pages 53-58, 2009.
2S. Haq, P.J.B. Jackson, and J.D. Edge. "Audio-Visual Feature Selection and Reduction for Emotion Classification." In Proc. Int'l Conf. on Auditory-Visual Speech Processing, pages 185-190, 2008.
77. Language: English
Participants: 16 (Male: 7; Female: 9)
Recordings:
• Dual-channel Audio
• HD Video
• Transcript
• Body gesture
Total: 48 dyadic sessions
Sentences: 2162 sentences
Labels:
• Perspectives: Naïve-Observer
• Rater: 3
• Discrete session-level annotation
• Continuous-in-time annotation
• Dimensional (Valence-Activation-Dominance)
CIT:
The USC CreativeIT database of multimodal dyadic
interactions: from speech and full body motion capture
to continuous emotional annotations
Metallinou, Angeliki, et al. "The USC CreativeIT database: A multimodal database of theatrical improvisation." Multimodal Corpora: Advances in Capturing, Coding and Analyzing Multimodality (2010): 55.
Metallinou, Angeliki, et al. "The USC CreativeIT database of multimodal dyadic interactions: From speech and full body motion capture to continuous emotional annotations." Language Resources and Evaluation 50.3 (2016): 497-521.
Available: Manoj Kumar (prabakar@usc.edu)
78. CIT
• Analyzing Temporal Dynamics of Dyadic Synchrony in Affective Interactions2
1Yang, Zhaojun, and Shrikanth S. Narayanan. "Modeling dynamics of expressive body gestures in dyadic interactions." IEEE Transactions on Affective Computing 8.3 (2017): 369-381.
2Yang, Zhaojun, and Shrikanth S. Narayanan. "Analyzing Temporal Dynamics of Dyadic Synchrony in Affective Interactions." INTERSPEECH. 2016.
3Chang, Chun-Min, and Chi-Chun Lee. "Fusion of multiple emotion perspectives: Improving affect recognition through integrating cross-lingual emotion information." Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on. IEEE, 2017.
79. Language: English
Participants: 150
Recordings:
• Dual-channel Audio
• HD Video
• Manual Transcript
Multi-Interaction (like TV talk show):
• Human vs. Human
• Semi-human vs. Human
• Machine vs. Human
Total: 959 dyadic sessions (3 min/session)
Labels:
• Perspectives: Naïve-Observer
• Rater: 8
• Continuous-in-time annotation
• Dimensional (Valence-Activation)
• Discrete Categorical (27)
SEMAINE:
The SEMAINE Database: Annotated Multimodal Records of Emotionally
Colored Conversations between a Person and a Limited Agent
McKeown, Gary, et al. "The SEMAINE database: Annotated multimodal records of emotionally colored conversations between a person and a limited agent." IEEE Transactions on Affective Computing 3.1 (2012): 5-17.
Available: eula@semaine-db.eu
Available: eula@semaine-db.eu
80. SEMAINE
• Building autonomous
sensitive artificial
listeners1
• A Dynamic Appearance
Descriptor Approach to
Facial Actions Temporal
Modeling2
(0.701)
1Schroder, Marc, et al. "Building autonomous sensitive artificial listeners." IEEE Transactions on Affective Computing 3.2 (2012): 165-183.
2Jiang, Bihan, et al. "A dynamic appearance descriptor approach to facial actions temporal modeling." IEEE Transactions on Cybernetics 44.2 (2014): 161-174.
81. Language: French
Participants: 46 (Male: 19; Female: 27)
Recordings:
• Dual-channel Audio
• HD Video (15 facial action units)
• Electrocardiogram
• Electrodermal activity
Total: 11 hours, 102 dyadic sessions (3 min/session)
Sentences: 1306 sentences
Labels:
• Perspectives: Self, Naïve-Observer
• Rater: 6
• Continuous-in-time annotation
• Dimensional (Valence-Activation)
RECOLA:
Remote Collaborative and Affective Interactions
Ringeval, Fabien, et al. "Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions." Automatic Face and Gesture Recognition (FG), 2013 10th IEEE International Conference and Workshops on. IEEE, 2013.
Available: Fabien Ringeval (fabien.ringeval@image.fr)
82. RECOLA
• Prediction of asynchronous
dimensional emotion ratings from
audiovisual and physiological data1
(Val./Act.: 0.804/0.528 )
• End-to-end speech emotion
recognition using a deep
convolutional recurrent network2
(Val./Act.: 0.741/0.325 )
• Face Reading from Speech—
Predicting Facial Action Units from
Audio Cues3
(Predict Facial Action Units from
Audio Cues: 0.650 )
1Ringeval, Fabien, et al. "Prediction of asynchronous dimensional emotion ratings from audiovisual and physiological data." Pattern Recognition Letters 66 (2015): 22-30.
2Trigeorgis, George, et al. "Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network." Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on. IEEE, 2016.
3Ringeval, Fabien, et al. "Face Reading from Speech - Predicting Facial Action Units from Audio Cues." Sixteenth Annual Conference of the International Speech Communication Association. 2015.
83. Language: Chinese
Participants: 238
Recordings:
• Audio
• Video
(34 films, 2 TV series, 4 TV shows)
Total: 2.3 hours
Labels:
• Rater: 4
• Discrete session-level annotation
• Fake/suppressed emotions
• Multi-emotion annotation for some segments
• Categorical (26 non-prototypical)
2017 Multimodal Emotion Recognition Challenge
(MEC 2017: http://www.chineseldc.org/htdocsEn/emotion.html)
CHEAVD:
A Chinese natural emotional audio-visual database
Li, Ya, et al. "CHEAVD: a Chinese natural emotional audio-visual database." Journal of Ambient Intelligence and Humanized Computing 8.6 (2017): 913-924.
Available: Ya Li (yli@nlpr.ia.ac.cn)
84. CHEAVD
• MEC 2016: the multimodal emotion recognition
challenge of CCPR 20161 (Cat.: 37.03)
• Chinese Speech Emotion Recognition2 (Cat.: 47.33)
• Transfer Learning of Deep Neural Network for
Speech Emotion Recognition3 (Cat.: 50.01)
1Li, Ya, et al. "MEC 2016: the multimodal emotion recognition challenge of CCPR 2016." Chinese Conference on Pattern Recognition. Springer Singapore,
2016.
2Zhang, Shiqing, et al. "Feature Learning via Deep Belief Network for Chinese Speech Emotion Recognition." Chinese Conference on Pattern Recognition.
Springer Singapore, 2016.
3Huang, Ying, et al. "Transfer Learning of Deep Neural Network for Speech Emotion Recognition." Chinese Conference on Pattern Recognition. Springer
Singapore, 2016.
85. Language: Chinese
Participants: 44 (Male: 20; Female: 24)
Recordings:
• Dual-channel Audio
• HD Video
• Manual Transcript
• Electrocardiogram
Total: 11 hours, 102 dyadic sessions (3 min/session)
Sentences: 6029 utterances
Labels:
• Perspectives: Peer, Director, Self, Naïve-Observer
• Rater: 49
• Continuous-in-time annotation
• Discrete session-level annotation
• Dimensional (Valence-Activation)
• Categorical (6)
NNIME:
The NTHU-NTUA Chinese Interactive Multimodal
Emotion Corpus
Huang-Cheng Chou, Wei-Cheng Lin, Lien-Chiang Chang, Chyi-Chang Li, Hsi-Pin Ma, Chi-Chun Lee. "NNIME: The NTHU-NTUA Chinese Interactive Multimodal Emotion Corpus." In Proceedings of ACII 2017.
Available: Huang-Cheng Chou (hc.chou@gapp.nthu.edu.tw)
Chi-Chun Lee (cclee@ee.nthu.edu.tw)
86. NNIME
• Cross-Lingual Emotion Information1,3 (session-level)
(Val./Act.: 0.682/0.604)
• Dyad-Level Interaction2
(Cat.: 0.65)
1Chun-Min Chang, Bo-Hao Su, Shih-Chen Lin, Jeng-Lin Li, Chi-Chun Lee. "A Bootstrapped Multi-View Weighted Kernel Fusion Framework for Cross-Corpus Integration of Multimodal Emotion Recognition." In Proceedings of ACII 2017.
2Yun-Shao Lin, Chi-Chun Lee. "Deriving Dyad-Level Interaction Representation using Interlocutors' Structural and Expressive Multimodal Behavior Features." In Proceedings of INTERSPEECH, pp. 2366-2370, 2017.
3Chun-Min Chang, Chi-Chun Lee. "Fusion of Multiple Emotion Perspectives: Improving Affect Recognition Through Integrating Cross-Lingual Emotion Information." In Proceedings of ICASSP, pp. 5820-5824, 2017.
93. Positive emotions
• Achievement 88.4%
• Amusement 90.4%
• Contentment 52.4%
• Pleasure 61.6%
• Relief 83.9%
Emotion classification, using vocalization sets that readily elicit emotion:
• Samples screened from the previous experiment
• Results average around 73%-94%
• Amusement and disgust are easy to distinguish
• Pleasure and sadness are hard to tell apart
• Some non-verbal vocalizations are easily confused
Adding a "none of the above" option:
• Compared with forced-choice classification
• Sadness and relief rise by 24% and 17.5%
• Amusement drops by 12%
• On average, a 7.9% rise
Can humans hear emotion?
Non-verbal vocalizations: Achievement, Amusement, Anger, Contentment, Disgust, Pleasure, Relief, Sadness, Surprise
Average: 69.9%
Sauter, Disa. An investigation into vocal expressions of emotions: the roles of valence, culture, and acoustic factors. University of London, University College London (United Kingdom), 2007.
Encoding / Decoding
94. Laugh
Cry
Sigh
Whisper
Whine
Paralinguistic vocal emotion
Laukka, Petri, et al. "Cross-cultural decoding of positive and negative non-linguistic emotion vocalizations." Frontiers in Psychology 4 (2013).
Gupta, Rahul, et al. "Detecting paralinguistic events in audio stream using context in features and probabilistic decisions." Computer Speech & Language 36 (2016): 72-92.
Laughter & Fillers (2015): IS2013 sub-challenge; detection AUC: laughter 95.3%, fillers 90.4%
Cross-culture (2013): universal emotion in non-verbal signals; speakers from India, USA, Kenya, Singapore; listeners from Sweden
95. How do machines recognize emotion from speech?
Sahu, Saurabh & Gupta, Rahul & Sivaraman, Ganesh & AbdAlmageed, Wael & Espy-Wilson, Carol. (2017). Adversarial Auto-Encoders for Speech Based Emotion Recognition. 1243-1247.
10.21437/Interspeech.2017-1421.
Rao, K. Sreenivasa, Shashidhar G. Koolagudi, and Ramu Reddy Vempada. "Emotion recognition from speech using global and local prosodic features." International journal of
speech technology 16.2 (2013): 143-160.
Lalitha, S., et al. "Emotion detection using MFCC and Cepstrum features." Procedia Computer Science 70 (2015): 29-35.
Huang, Che-Wei, and Shrikanth Narayanan. "Characterizing Types of Convolution in Deep Convolutional Recurrent Neural Networks for Robust Speech Emotion Recognition." arXiv preprint
arXiv:1706.02901 (2017).
Lee, Jinkyu, and Ivan Tashev. "High-level feature representation using recurrent neural network for speech emotion recognition." INTERSPEECH. 2015.
Reported approaches on Emo-DB:
• Prosodic features + SVM: 62.43%
• MFCC + ANN: 85.7%
• Deep convolutional networks; high-level representations of time series (RNN)
96. Features in emotional speech
Dimosa, Kostis, Leopold Dickb, and Volker Dellwoc. "Perception of levels of emotion in speech prosody." The Scottish Consortium for ICPhS (2015).
Erickson, Donna. "Expressive speech: Production, perception and application to speech synthesis." Acoustical Science and Technology 26.4 (2005): 317-325.
Sauter, Disa. An investigation into vocal expressions of emotions: the roles of valence, culture, and acoustic factors. University of London, University College London (United Kingdom), 2007.
“emotional prosody does not function categorically, distinguishing
only different emotions, but also indicates different degrees of the
expressed emotion.”
Pitch and pitch variation are especially important for people to recognize emotion from non-verbal sounds
Voice quality: tension
Some experiments: changing the sound (removing pitch, noisy channel, ...)
Descriptive speech features (descriptors) are closely tied to emotion
A review: research findings of acoustic and perceptual studies
98. Feature extraction (Low-Level Descriptors)
Low Level Descriptors (10 – 15 ms)
Mel Frequency Cepstral Coefficients
Pitch
Signal Energy
Loudness
Voice Quality (Jitter, Shimmer)
Log Filterbank Energies
Linear Prediction Cepstral Coefficients
CHROMA and CENS Features (Music)
Computed from the raw signal, then summarized with statistical methods
Feature taxonomy (see the extraction sketch below):
• Continuous: pitch, energy, formants
• Qualitative: voice quality (harsh, tense, breathy)
• Spectral: LPC, MFCC, LFPC
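A minimal sketch of extracting a few of the LLDs above over short frames, assuming the librosa library and a hypothetical input file speech.wav; window sizes and the chosen statistics are illustrative, not prescribed by the slides.

```python
import librosa
import numpy as np

y, sr = librosa.load("speech.wav", sr=16000)  # hypothetical input file
hop = int(0.010 * sr)     # 10 ms hop, matching the 10-15 ms LLD frames
frame = int(0.025 * sr)   # 25 ms analysis window

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, n_fft=frame, hop_length=hop)
energy = librosa.feature.rms(y=y, frame_length=frame, hop_length=hop)
f0 = librosa.yin(y, fmin=80, fmax=400, sr=sr, frame_length=1024, hop_length=hop)

# Utterance-level statistics ("functionals") over the frame-level LLDs
features = np.concatenate([
    mfcc.mean(axis=1), mfcc.std(axis=1),
    [energy.mean(), energy.std(), np.nanmean(f0), np.nanstd(f0)],
])
print(features.shape)
```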
99. Example: using pitch to recognize emotion
Arias, Juan Pablo, Carlos Busso, and Nestor Becerra Yoma. "Shape-based modeling of the fundamental frequency contour for emotion detection in speech." Computer Speech & Language 28.1 (2014): 278-294.
Emotionally salient temporal segments
• 75.8% in binary emotion classification
• Figure legend: dotted/dashed lines show the subjective contour and its deviation; the solid line shows the objective contour
100. More feature computation: speech production
Source-filter model
Example: high arousal, physically
Vocal production system:
• Respiration
• Vocal fold vibration
• Articulation
High arousal increases tension in the laryngeal musculature and raises subglottal pressure, changing the production of sound at the glottis and hence vocal quality
Johnstone, Tom & Scherer, Klaus. (2000). Vocal communication of emotion. Handbook of Emotions.
101. More feature computation: speech perception
Mel-scale filter bank (a construction sketch follows)
The response of the basilar membrane as a function of frequency, measured at six different distances from the stapes
The psychoacoustical transfer function
Stern, Richard M., and Nelson Morgan. "Features based on auditory physiology and perception." Techniques for Noise Robustness in Automatic Speech Recognition (2012): 193-227.
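A minimal sketch, assuming librosa, of constructing the mel-scale filter bank that approximates this psychoacoustic transfer and applying it to a power spectrogram; all parameter values are illustrative.

```python
import librosa
import numpy as np

sr, n_fft, n_mels = 16000, 512, 26
mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)  # (26, 257)

y, _ = librosa.load("speech.wav", sr=sr)           # hypothetical input file
power = np.abs(librosa.stft(y, n_fft=n_fft)) ** 2  # linear-frequency power
log_mel = np.log(mel_fb @ power + 1e-10)           # log mel filterbank energies
print(log_mel.shape)  # (n_mels, frames)
```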
103. Deep learning for speech emotion
End to End: from LLDs to deep learning
Deep models capture more comprehensive speech feature information
Z. Aldeneh and E. M. Provost, "Using regional saliency for speech emotion recognition," 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, 2017, pp. 2741-2745. doi: 10.1109/ICASSP.2017.7952655
C. W. Huang and S. S. Narayanan, "Deep convolutional recurrent neural network with attention mechanism for robust speech emotion recognition," 2017 IEEE International Conference on Multimedia and Expo (ICME), Hong Kong, 2017, pp. 583-588. doi: 10.1109/ICME.2017.8019296
Pipeline: signal → neural network → emotion
CNN for time-series signals, with attention (a model sketch follows)
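A minimal PyTorch sketch of the convolutional-recurrent-with-attention idea behind the cited work; layer shapes and sizes here are illustrative assumptions, not the published architectures.

```python
import torch
import torch.nn as nn

class ConvRecurrentAttn(nn.Module):
    def __init__(self, n_mels=40, n_classes=4):
        super().__init__()
        self.conv = nn.Sequential(            # local patterns over time
            nn.Conv1d(n_mels, 64, kernel_size=8), nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.rnn = nn.GRU(64, 64, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(128, 1)         # frame-level attention scores
        self.out = nn.Linear(128, n_classes)

    def forward(self, x):                     # x: (batch, n_mels, frames)
        h = self.conv(x).transpose(1, 2)      # (batch, frames', 64)
        h, _ = self.rnn(h)                    # (batch, frames', 128)
        w = torch.softmax(self.attn(h), dim=1)
        pooled = (w * h).sum(dim=1)           # attention-weighted pooling
        return self.out(pooled)

logits = ConvRecurrentAttn()(torch.randn(2, 40, 300))
print(logits.shape)  # torch.Size([2, 4])
```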
104. Speech feature extraction tools
Cross-platform (a usage sketch follows):
YAAFE, an Easy to Use and Efficient Audio Feature Extraction Software. B. Mathieu, S. Essid, T. Fillon, J. Prado, G. Richard. Proceedings of the 11th ISMIR conference, Utrecht, Netherlands, 2010.
Florian Eyben, Felix Weninger, Florian Gross, Björn Schuller. "Recent Developments in openSMILE, the Munich Open-Source Multimedia Feature Extractor." In Proc. ACM Multimedia (MM), Barcelona, Spain, ACM, ISBN 978-1-4503-2404-5, pp. 835-838, October 2013. doi: 10.1145/2502081.2502224
Paul Boersma & David Weenink (2013): Praat: doing phonetics by computer [Computer program].
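A minimal usage sketch, assuming the opensmile Python wrapper (pip install opensmile) around the openSMILE toolkit listed above; the input file name is hypothetical.

```python
import opensmile

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,       # a standard set of functionals
    feature_level=opensmile.FeatureLevel.Functionals,
)
features = smile.process_file("speech.wav")  # hypothetical input file
print(features.shape)  # one row of utterance-level acoustic features
```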
106. Emotion, cognition, language
Reading the emotion within language?
Schwarz-Friesel, Monika. "Language and emotion. The Cognitive Linguistic Perspective." In: Ulrike Lüdtke (ed.), Emotion in Language. Theory - Research - Application. Amsterdam (2015): 157-173.
Transformation relations: affect, judgment
Lexicon, Grammar, Ideational Meaning
Concretization
Language is a channel that helps humans come to know emotion
Lindquist, Kristen A., Jennifer K. MacCormack, and Holly Shablack. "The role of language in emotion: predictions from psychological constructionism." Frontiers in Psychology 6 (2015).
107. Recognizing emotion in text
Human behavior evaluation
• Couples therapy
• Oral presentations
Reviews
• Hotels: HBRNN
• Amazon: cross-lingual
• Movie (93%), Book (92%), DVD (93%), ...: PNN + RBM
Tweets
• Positive & negative
• DCNN & LSTM
Massive data; scarce annotation; complex structure
Ain, Qurat Tul, et al. "Sentiment analysis using deep learning techniques: a review." Int J Adv Comput Sci Appl 8.6 (2017): 424.
108. Reviews, articles, social media, talk
Semantic analysis: what does a text carry?
What texts tell us (topics); emotional polarity ("It's terrible!" / "It's cool!")
Parts-of-speech (POS) tags, N-grams
Example Penn Treebank POS tags: VB, VBD, NN, NNS, JJ, JJR, JJS, IN, TO
https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
110. Building an emotion lexicon
Changqin Quan and Fuji Ren. 2009. Construction of a blog emotion corpus for Chinese emotional expression analysis. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (EMNLP '09), Vol. 3. Association for Computational Linguistics, Stroudsburg, PA, USA, 1446-1454.
Mohammad, Saif M., and Peter D. Turney. "Crowdsourcing a word-emotion association lexicon." Computational Intelligence 29.3 (2013): 436-465.
Pennebaker, James W., et al. The development and psychometric properties of LIWC2015. 2015.
Assumption: keywords carry inherent positive or negative polarity, and a sentence's polarity derives from the use of these keywords
LIWC (Linguistic Inquiry and Word Count)
• A categorized dictionary of semantic classes (not only emotion)
• 64 categories, about 4500 words
• Positive/negative emotion words: 406/499
Seed words as a gold standard for keywords:
1. Annotators pick out words that carry emotion
2. Text structure and word association are exploited semi-automatically to find further emotion keywords (adjectives; adverbs, nouns, verbs)
Crowdsourced annotation with multiple-choice questions from non-fixed annotators, validated with differently designed questions (e.g., which word is most like this word; is this word associated with happiness)
Wisdom of the crowd
112. Sentiment Analysis (Supervised)
Inter-annotator agreement = 0.76
Individual emotions fall roughly between 0.6 and 0.79
Keyword-based recognition = 0.66
Automatic emotion recognition = 0.73 (Naïve Bayes, SVM)
Aman, Saima, and Stan Szpakowicz. "Identifying expressions of emotion in text." Text, Speech and Dialogue. Springer Berlin/Heidelberg, 2007.
Pipeline: feature representation → classifier → emotion label (a sketch follows)
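A minimal scikit-learn sketch of this pipeline on toy data; the corpus, features, and classifier settings of the cited study are not reproduced here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["I love this movie", "This is terrible", "What a great day"]
labels = ["positive", "negative", "positive"]

# Feature representation (TF-IDF n-grams) -> classifier -> emotion label
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), MultinomialNB())
clf.fit(texts, labels)
print(clf.predict(["what a terrible movie"]))
```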
113. Recent deep learning approaches: Deep Models
Lopez, Marc Moreno, and Jugal Kalita. "Deep Learning applied to NLP." arXiv preprint arXiv:1703.03091 (2017).
Diagram: each token ("I", "love", "it") passes through an embedding layer into an LSTM, which outputs the sentiment label ("positive"); a model sketch follows
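A minimal PyTorch sketch of the embed-then-LSTM architecture in the diagram; vocabulary size, dimensions, and token ids are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LSTMSentiment(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=100, hidden=128, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, token_ids):           # (batch, seq_len)
        h = self.embed(token_ids)           # (batch, seq_len, emb_dim)
        _, (last_hidden, _) = self.lstm(h)  # final state summarizes the text
        return self.out(last_hidden[-1])    # (batch, n_classes)

tokens = torch.tensor([[11, 42, 7]])        # "I love it" as hypothetical ids
print(LSTMSentiment()(tokens).softmax(dim=-1))  # P(negative), P(positive)
```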
119. FACS
The tool for annotating facial expressions
"What the Face Reveals" is strong evidence for the fruitfulness of the systematic analysis of facial expression
Paul Ekman and Wallace V. Friesen, 1976
120. Action Units (AUs)
• AUs are considered to be the smallest visually discernible facial movements
• As AUs are independent of any interpretation, they can be used as the basis for recognition of basic emotions
• FACS is an explicit means of describing all possible movements of the face via 46 action units
121. Action Units (AUs)
• FACS is a tool for measuring facial expressions
• Each observable component of facial movement is called an AU
• All facial expressions can be broken down into their constituent AUs (a rule-based mapping sketch follows)
AU | Description        | AU | Description
1  | Inner Brow Raiser  | 12 | Lip Corner Puller
4  | Brow Lowerer       | 13 | Cheek Puffer
7  | Lid Tightener      | 20 | Lip Stretcher
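A minimal sketch of the idea that AUs can serve as the basis for recognizing basic emotions; the AU combinations below are commonly cited EMFACS-style approximations assumed for illustration, not values from these slides.

```python
# Rule-based mapping from a detected AU set to a basic emotion.
PROTOTYPES = {
    "happiness": {6, 12},        # cheek raiser + lip corner puller
    "sadness":   {1, 4, 15},
    "surprise":  {1, 2, 5, 26},
    "anger":     {4, 5, 7, 23},
    "disgust":   {9, 15},
}

def classify(detected_aus: set) -> str:
    """Pick the prototype whose AU set is best covered by the detections."""
    def coverage(emotion):
        proto = PROTOTYPES[emotion]
        return len(proto & detected_aus) / len(proto)
    best = max(PROTOTYPES, key=coverage)
    return best if coverage(best) > 0.5 else "neutral/unknown"

print(classify({6, 12}))     # -> "happiness"
print(classify({1, 4, 15}))  # -> "sadness"
```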
123. A simple recognition architecture (pipeline diagram):
Automatic face & facial feature detection → face alignment → multiple image windows at a variety of locations and scales → image filter (modify or enhance the image, e.g., Gabor filter coefficients) → feature extraction (facilitates subsequent learning and generalization, leading to better human interpretation) → facial AUs (e.g., AU1, AU7, AU6+AU15, etc.) → rule-based classifier → facial expressions of emotion (e.g., happy, fear, disgust, surprise, etc.)
124. The same recognition pipeline, as instantiated in:
Tian, Y-I., Takeo Kanade, and Jeffrey F. Cohn. "Recognizing action units for facial expression analysis."
125. Recognizing AUs for Facial Expression Analysis - Rule-based Classifier
Informed by FACS AUs, they group the facial features into upper and lower parts, because facial actions in the two halves are relatively independent for AU recognition [14]
P. Ekman and W.V. Friesen, The Facial Action Coding System: A Technique for the Measurement of Facial Movement
Single AU detection; combined AU detection
126. Recognizing AUs for Facial Expression Analysis - Results
AU detection on Ekman-Hager (recognition rate):
• Upper face: single AU detection 75%; combined AU detection 86.7%
• Lower face: single AU detection 95.8%; combined AU detection 90.7%
Cross-database AU detection (recognition rate):
• Upper face (trained on Ekman-Hager): 93.2% on Cohn-Kanade, 86.7% on Ekman-Hager
• Lower face (trained on Cohn-Kanade): 90.7% on Cohn-Kanade, 93.4% on Ekman-Hager
Predicting several AUs jointly is more accurate: AUs co-occur
AU accuracy is actually quite good, including across databases
127. The same recognition pipeline, now with an AAM face tracking model, as instantiated in:
Mustafa Sert, and Nukhet Aksoy. "Recognizing Facial Expressions of Emotion using Action Unit Specific Decision Thresholds."
128. Recognizing Facial Expressions of Emotion using Action Unit Specific Decision Thresholds
• Extract facial images via an Active Appearance Model (AAM) to form an appearance model
• Facial AU multi-class classification using AU-specific decision thresholds (ADT) for both AU detection and facial expression recognition
• ADT learns a separate decision threshold T_i for each AU category, and assigns instance x to category i if and only if
f_i(x) = w_i^T Φ(x) + b_i > T_i
where Φ is the mapping function taking the SVM input to a high-dimensional space (a thresholding sketch follows)
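A minimal sketch of per-AU decision thresholds applied on top of SVM decision values, mirroring the rule f_i(x) = w_i^T Φ(x) + b_i > T_i above; the data, thresholds, and sizes are toy assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))        # toy appearance features
Y = rng.random((200, 3)) > 0.5        # toy presence labels for 3 AUs

svms = [LinearSVC().fit(X, Y[:, i]) for i in range(Y.shape[1])]
thresholds = [0.1, -0.2, 0.0]         # one learned threshold T_i per AU

def detect_aus(x):
    """Return indices of AUs whose decision value exceeds their T_i."""
    return {i for i, (svm, t) in enumerate(zip(svms, thresholds))
            if svm.decision_function(x.reshape(1, -1))[0] > t}

print(detect_aus(X[0]))
```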
129. Recognizing Facial Expressions of Emotion using Action Unit Specific Decision Thresholds
Table (↑): Prototypic and major variants of AU combinations for the facial expression fear. '+' denotes logical AND; ',' indicates logical OR
Table (→): Facial expression recognition accuracy of the proposed scheme. Bold bracketed numbers indicate the best result; bold numbers denote the second best
130. Recognizing Facial Expressions of Emotion using Action Unit Specific Decision Thresholds
• The ADT-based AU detector along with the rule-based emotion classifier (B & D) outperforms the baseline methods (A & C)
• (↑) Among the proposed methods, D gives the best results in all facial emotion categories except surprise
• (→) The proposed ADT scheme outperforms the baseline method by an average F1-score of 6.383% over 17 AUs
• (→) It gives superior performance in terms of F1-score compared with the baseline method for all AUs except AU2
131. The same recognition pipeline, extended with observations under distinct compound emotions, as instantiated in:
Shichuan Du, and Aleix M. Martinez. "Compound facial expressions of emotion: from basic research to clinical applications."
133. Compound facial expressions of emotion
• AU intensity shown in a cumulative histogram for each AU and emotion
category
• The x-axis in these histograms specifies the intensity of activation
• The y-axis in these histograms defines the cumulative percentage of intensity
(scale 0 to 1)
• Numbers between zero and one specify the percentage of people using the
specified and smaller intensities.
Fig. AUs used to express a compound emotion are consistent
with the AUs used to express its component categories
136. Body language
• Non-verbal communication (body movement and facial expression) traces back to Darwin's "The Expression of the Emotions in Man and Animals"
• A movement analysis system is used to describe body movements
• Through movement analysis, the following basic movement features are described:
(1) Trunk movement (stretching, bowing)
(2) Arm movement (opening, closing)
(3) Vertical direction (upward, downward)
(4) Sagittal direction (forward, backward)
(5) Force (strong, light)
(6) Velocity (fast, slow)
(7) Directness (direct, indirect)
• Psychology experiments use these movement features to find and recognize their relationship with emotion
reference: de Meijer, M. The contribution of general features of body movement to the attribution of emotions. Journal of Nonverbal Behavior 13, 4 (1989), 247-268.
137. Research relating body movement and emotion
• Psychology
• Bull, P. E. Posture and gesture. Pergamon press, 1987.
• Pollick, F. E., Paterson, H. M., Bruderlin, A., and Sanford, A. J. Perceiving affect from arm
movement. Cognition 82, 2 (2001), B51–B61.
• Coulson, M. Attributing emotion to static body postures: Recognition accuracy, confusions,
and viewpoint dependence. Journal of nonverbal behavior 28, 2 (2004), 117–139.
• Boone, R. T., and Cunningham, J. G. Children’s decoding of emotion in expressive body
movement: The development of cue attunement. Developmental psychology 34 (1998),
1007–1016.
• de Meijer, M. The contribution of general features of body movement to the attribution of
emotions. Journal of Nonverbal Behavior 13, 4 (1989), 247–268.
• Engineering
• Balomenos, T., Raouzaiou, A., Ioannou, S., Drosopoulos, A., Karpouzis, K., and Kollias, S.
Emotion analysis in man-machine interaction systems. In Machine learning for multimodal
interaction. Springer, 2005, 318–328.
• Coulson, M. Attributing emotion to static body postures: Recognition accuracy, confusions,
and viewpoint dependence. Journal of nonverbal behavior 28, 2 (2004), 117–139.
reference: Stefano Piana, Alessandra Staglianò, Francesca Odone, Alessandro Verri, Antonio Camurri, "Real-time Automatic Emotion Recognition from Body Gestures" in ArXiv 2014
138. How was the data collected?
12 actors
four females and eight males
aged between 24 and 60
total of about 100 videos
separate clips of expressive gesture
reference:
1) Stefano Piana, Alessandra Staglianò, Francesca Odone, Alessandro Verri, Antonio Camurri, “Real-
time Automatic Emotion Recognition from Body Gestures” in ArXiv 2014
2) Amol S. Patwardhan and Gerald M. Knapp, “Augmenting Supervised Emotion Recognition with
Rule-Based Decision Model.” in ArXiv 2016
Qualisys, Kinect
139. Data Validation - Human annotation
Can humans rate it?
Showing only the 3D skeleton guarantees that the rater is not exploiting other information
It is not easy for humans to recognize emotion based only on gesture
reference: Stefano Piana, Alessandra Staglianò, Francesca Odone, Alessandro Verri, Antonio Camurri, "Real-time Automatic Emotion Recognition from Body Gestures" in ArXiv 2014
140. Skeleton-based features
anger, sadness, happiness, fear, surprise, disgust
reference:
1) Stefano Piana, Alessandra Staglianò, Francesca Odone, Alessandro Verri, Antonio Camurri, "Real-time Automatic Emotion Recognition from Body Gestures" in ArXiv 2014
2) Piana, S., Staglianò, A., Camurri, A., and Odone, F. A set of full-body movement features for emotion recognition to help children affected by autism spectrum condition. In IDGEI International Workshop (2013).
Histogram: movement energy at each frame (a feature sketch follows)
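A minimal sketch of one such skeleton feature, assuming joint positions arrive as a (frames, joints, 3) array: per-frame movement energy, summarized as a histogram.

```python
import numpy as np

def movement_energy(joints: np.ndarray, fps: float = 30.0) -> np.ndarray:
    """Kinetic-energy-like quantity per frame: sum of squared joint speeds."""
    velocity = np.diff(joints, axis=0) * fps   # (frames-1, joints, 3)
    return (velocity ** 2).sum(axis=(1, 2))    # one value per frame transition

joints = np.random.rand(120, 15, 3)            # toy 4-second clip, 15 joints
hist, _ = np.histogram(movement_energy(joints), bins=10)
print(hist)                                    # histogram feature vector
```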
141. Classification Results
Qualisys data with 310 gestures (clean dataset)
Kinect data with 579 gestures (noisy dataset)
Almost the same as human recognition ability
142. Skeleton capture systems
• Qualisys: expensive, sophisticated system with multiple high-speed cameras
• Kinect: cheap, easy-to-get RGB-D 3D camera device
• OpenPose: free, new software system based on CNNs
reference:
https://itp.nyu.edu/classes/dance-f16/kinect/
https://github.com/CMU-Perceptual-Computing-Lab/openpose
https://www.qualisys.com/
149. The social nervous system
• The evolution of the nervous system determines how humans express emotion, the quality of their communication, and their capacity to regulate behavior
• The inhibitory system maintains the metabolic balance needed for individual growth through regulatory mechanisms such as lowering heart rate, lowering blood pressure, and inhibiting cardiac sympathetic activity
• Heart rate can serve as a sensitive signal for reflecting and recognizing social activity
reference:
D. S. Quintana, A. J. Guastella, T. Outhred, I. B. Hickie, and A. H. Kemp. Heart rate variability is associated with emotion recognition: direct evidence for a relationship between the autonomic nervous system and social cognition. Int. J. of Psychophysiol, 86(2):168-172, 2012
http://blog.sina.com.cn/s/blog_753e49f90100pop2.html
http://www.xzbu.com/6/view-2908185.htm
150. Heart Rate Variability (HRV)
• Autonomic nervous system (ANS)
  • Comprises the sympathetic and parasympathetic nervous systems
  • Responds to all kinds of emotional stimuli
  • For example, when startled, a person's heart involuntarily races and the face pales; this is the sympathetic system at work
• Heart rate variability (HRV)
  • Reflects the balance between the sympathetic and parasympathetic systems
  • Many factors related to autonomic activity reduce HRV, e.g., blood pressure changes, respiration, physical or mental stress, hyperthyroidism, and medication
• HRV analysis
  • Has a complete, standardized evaluation methodology [2]
reference:
1) María Teresa Valderas, Juan Bolea, Pablo Laguna, Montserrat Vallverdú, Raquel Bailón, "Human Emotion Recognition Using Heart Rate Variability Analysis with Spectral Bands Based on Respiration" in Engineering in Medicine and Biology Society (EMBC), 2015 37th Annual International Conference of the IEEE.
2) Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology (1996). Heart rate variability: Standards of measurement, physiological interpretation, and clinical use. Eur Heart J 17(3):354-81.
3) D. S. Quintana, A. J. Guastella, T. Outhred, I. B. Hickie, and A. H. Kemp. Heart rate variability is associated with emotion recognition: direct evidence for a relationship between the autonomic nervous system and social cognition. Int. J. of Psychophysiol, 86(2):168-172, 2012.
152. Data collection methods
• Emotion elicitation (induced):
  • real experiences
  • film clips
  • problem solving
  • computer game interfaces
  • images
  • spoken words
  • music
• Movie clip method
  • verified by previous studies to be a more efficient emotion-inducing method than the others
  • 4 films (3-10 min each)
  • 4 emotions: angry, fear, sad, and happy
  • ECG data was recorded for 90 sec, starting 2 min before the end of each movie
reference:
1) Han Wen Guo, Yu Shun Huang, Jen Chien Chien, Jiann Shing Shieh, "Short-term Analysis of Heart Rate Variability for Emotion Recognition via a Wearable ECG Device" in Intelligent Informatics and Biomedical Sciences (ICIIBMS), 2015
2) Mimma Nardelli, Gaetano Valenza, Alberto Greco, Antonio Lanata, Enzo Pasquale Scilingo, "Recognizing Emotions Induced by Affective Sounds through Heart Rate Variability" in IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, VOL. 6, NO. 4, OCTOBER-DECEMBER 2015
153. ECG processing pipeline (figure)
reference: Abhishek Vaish and Pinki Kumari, "A Comparative Study on Machine Learning Algorithms in Emotion State Recognition Using ECG" in Proceedings of the Second International Conference on Soft Computing for Problem Solving (SocProS 2012), December 28-30, 2012
154. ECG Feature Extraction: HRV
Time-domain features:
1. MeanRRI: average of the resultant RR intervals
2. CVRR: ratio of the standard deviation to the mean of the RR intervals
3. SDRR: standard deviation of the RR intervals
4. SDSD: standard deviation of the successive differences of the RR intervals
Frequency-domain features:
1. LF (low frequency): standardized LF power (0.04-0.15 Hz)
2. HF (high frequency): standardized HF power (0.15-0.4 Hz)
3. LHratio: the ratio LF/HF
Statistical features:
Evaluate the distribution: the shapes of the probability distributions
reference: Han Wen Guo, Yu Shun Huang, Jen Chien Chien, Jiann Shing Shieh, "Short-term Analysis of Heart Rate Variability for Emotion Recognition via a Wearable ECG Device" in Intelligent Informatics and Biomedical Sciences (ICIIBMS), 2015
A computation sketch follows.
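A minimal sketch computing the HRV features listed above, assuming numpy/scipy and a toy RR-interval series in seconds; resampling the RR series to a uniform 4 Hz grid before the spectral estimate is a common convention, not something specified on the slide.

```python
import numpy as np
from scipy.signal import welch

def hrv_features(rr):
    feats = {"MeanRRI": rr.mean(), "SDRR": rr.std(),
             "CVRR": rr.std() / rr.mean(), "SDSD": np.diff(rr).std()}
    # Frequency domain: resample the irregular RR series to a uniform 4 Hz grid
    t = np.cumsum(rr)
    grid = np.arange(t[0], t[-1], 0.25)
    rr_u = np.interp(grid, t, rr)
    f, pxx = welch(rr_u - rr_u.mean(), fs=4.0)
    lf_band, hf_band = (f >= 0.04) & (f < 0.15), (f >= 0.15) & (f < 0.40)
    lf = np.trapz(pxx[lf_band], f[lf_band])
    hf = np.trapz(pxx[hf_band], f[hf_band])
    feats.update({"LF": lf, "HF": hf, "LHratio": lf / hf})
    return feats

rr = 0.8 + 0.05 * np.random.randn(300)  # toy series: 300 beats near 0.8 s
print(hrv_features(rr))
```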
155. Analysis of the features
Time-domain, frequency-domain, and statistical features
A balancing mechanism
reference: Han Wen Guo, Yu Shun Huang, Jen Chien Chien, Jiann Shing Shieh, "Short-term Analysis of Heart Rate Variability for Emotion Recognition via a Wearable ECG Device" in Intelligent Informatics and Biomedical Sciences (ICIIBMS), 2015
156. Classifier
reference: Han Wen Guo, Yu Shun Huang, Jen Chien Chien, Jiann Shing Shieh, "Short-term Analysis of Heart Rate Variability for Emotion Recognition via a Wearable ECG Device" in Intelligent Informatics and Biomedical Sciences (ICIIBMS), 2015
164. Generated Perspectives: Multi-view Kernel Fusion
1. Chun-Min Chang, Bo-Hao Su, Shih-Chen Lin, Jeng-Lin Li, Chi-Chun Lee, "A Bootstrapped Multi-View Weighted Kernel Fusion Framework for Cross-Corpus Integration of Multimodal Emotion Recognition" in Proceedings of ACII 2017
2. Chun-Min Chang, Chi-Chun Lee, "Fusion of Multiple Emotion Perspectives: Improving Affect Recognition Through Integrating Cross-Lingual Emotion Information" in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2017
Cross-lingual speech emotion recognition? Integrating language characteristics
171. The influence of (boredom) emotion on health behavior
Well-being
Boredom correlates highly with most addiction problems, yet the same emotion can also drive a person's productivity ("Boredom, it turns out, can be a dangerous and disruptive state of mind that damages your health", Mann)
Ref: http://alcoholrehab.com/drug-addiction/boredom-and-substance-abuse/
The reasons boredom leads to addiction mostly stem from limited life choices. For example, adolescents facing an unchanging routine of study seek stimulation; driven by curiosity, immature values, and peer pressure, health-damaging problems such as alcohol abuse, gambling, and binge eating emerge.
Most people without addiction problems report rarely feeling bored, mainly because their pursuit of accomplishment in life leaves no room for these vices; finding a purpose in life is therefore good medicine against addiction.
Ref: On the Function of Boredom (Shane W. Bench)
172. How emotion understanding helps in autism
Ref: The Facilitation of Social-Emotional Understanding and Social Interaction in High-Functioning Children with Autism
Social impairment in autism:
Social impairment is a hallmark of autism, in large part because affected individuals cannot read emotions. Research shows that teaching emotional knowledge early helps them integrate into society.
Ref: Social Skills Deficits in Children with Autism Spectrum Disorders: Evidence Based Interventions
• Initiating social contact: after emotion-focused therapy, autistic patients more actively express the desire to socialize
• Eye contact: reading others' emotions usually cannot be separated from reading their eyes; patients who received this education show improvement
• Sharing about oneself: after emotion-focused therapy, autistic patients learn to share about themselves with peers
• Emotional expression: the intervention of emotion education enables autistic patients to describe more complex emotions
173. Education, learning, and emotion
Motivation to learn is an element of learning behavior that cannot be ignored (Pintrich, 1991, p. 199)
Motivational, cognitive, developmental, and educational psychology are all devoted to understanding how human learning works. Emotion has become a fundamental element of learning, in both teaching and the learning process, so understanding how emotion operates is very important for educators.
(Excerpted from a special issue of the Educational Psychologist)
Planning, goals, interest, self, emotion, sense of achievement
Ref: The Importance of Students' Goals in Their Emotional Experience of Academic Failure: Investigating the Precursors and Consequences of Shame (Jeannine E. Turner)
174. Publication: "The Importance of Students' Goals in Their Emotional Experience of Academic Failure: Investigating the Precursors and Consequences of Shame"
Finding: giving students long-range plans helps increase their flexibility in learning.
This study examines cases of academic failure. Shame, failure, and dismay often lead to negative outcomes such as dropping out and low self-esteem, but if students are counseled toward short- and long-range learning goals, the added flexibility strengthens their self-regulation and improves their resilience to setbacks and stress.
The Importance of Students' Goals in Their Emotional Experience of Academic Failure: Investigating the Precursors and Consequences of Shame (Jeannine E. Turner)
176. Emotion can be evoked by music
Music and human emotion have always been inseparable, and psychologists are still seeking the direct link between them. Swathi Swaminathan proposed three hypotheses about emotion:
• Emotion colors how music is heard, and emotion can also be aroused by music (communication and perception of emotion in music)
• Music listening has emotional consequences (emotional consequences of music listening)
• Emotion can serve as a basis for a user's music preferences (predictors of music preferences)
Ref: Current Emotion Research in Music Psychology (Swathi Swaminathan)
181. Synopsis of fMRI
• Uses a standard MRI scanner
• Acquires a series of images (numbers)
• Measures changes in blood oxygenation
• Uses non-invasive, non-ionizing radiation
• Can be repeated many times; can be used for a wide range of subjects
• Combines good spatial and reasonable temporal resolution
185. Co-activation graph for each emotion category
A) Force-directed graphs for each emotion category, based on the Fruchterman-Reingold spring algorithm
B) The same connections in the anatomical space of the brain
Linked brain regions, not a single region
189. Our Research: Human-centered Behavioral Signal Processing (BSP)
Prof. Shrikanth Narayanan
Seeking a window into the human mind and traits...
...through an engineering approach
S. Narayanan and P. G. Georgiou, “Behavioral signal processing: Deriving human behavioral informatics from speech and language," Proceedings of the IEEE, vol. 101, no. 5, pp. 1203–1233, 2013.
Daniel Bone, Chi-Chun Lee, Theodora Chaspari, James Gibson, Shrikanth Narayanan, "Signal Processing and Machine Learning for Mental Health Research and Clinical Applications", in IEEE Signal Processing Magazine
Human Behavioral Signal and Interaction Computing Laboratory (EECS 713)
Behavioral Informatics and Interaction Computation Laboratory (BIIC)
190. Our Technology: Human-centric Decision Analytics Research & Development
Core technology:
• Signal Processing: spatial-temporal modeling, de-noising, feature extraction
• Machine Learning: supervised, unsupervised, and semi-supervised algorithms
• Decision Analytics: high-dimensional behavior spaces, non-linear predictive recognition, multimodal integration, expert decision mechanisms
Speech & Language: diarization, speaker ID, ASR, paralinguistic descriptors, emotion AI, sentiment, word-topic representation
Computer Vision: segmentation, tracking, image-video descriptors
Multimodal Fusion: joint speech-language-gesture modeling for multimodal prediction, multi-party interaction modeling
Representation Learning: behavior-embedded space learning, clinical health informatics data representation
Predictive Learning: deep-learning and machine-learning-based predictive modeling