2. Affective Computing (1995):
the study and development of systems and devices that can
recognize, interpret, process, and simulate human affects
Professor Rosalind Picard
MIT Media Lab
International Conference on Affective Computing and Intelligent Interaction (ACII)
ACII 2017 @ San Antonio
7. Philosophy
Emotion discussed through philosophy
Turn to Practical
Combining the physical and the emotional, and beginning to apply these systems to humans
Cognitive Process
Cognitive Theory
Mind-Body Dualism
Combining the physical world with emotion
Modern Theory
10. Mind-Body Dualism
In the 17th century, René Descartes viewed the body's emotional apparatus as largely
hydraulic. He believed that when a person felt angry or sad it was because certain internal
valves opened and released such fluids as bile and phlegm.
Mind-Body Dualism
Combining the physical world with emotion
11. Charles Darwin believed that emotions were beneficial for evolution because emotions
improved chances of survival. For example, the brain uses emotion to keep us away from a
dangerous animal (fear), away from rotting food and fecal matter (disgust), in control of
our resources (anger), and in pursuit of a good meal or a good mate (pleasure and lust).
Damasio, Antonio R. Looking for Spinoza: Joy, Sorrow, and the Feeling Brain. New York NY: Harcourt, Inc., 2003.
Turn to Practical
Discussing the combination of the physical and the emotional, and beginning to apply these systems to humans
12. The father of American psychology
James, William. 1884. "What Is an Emotion?" Mind 9, no. 34: 188-205.
When the body undergoes (physiological) changes and we feel those changes, that is the emotion.
"Our feeling of the same changes as they occur is the emotion."
Modern Theory
27. Charles Darwin believed that emotions were beneficial for evolution because emotions
improved chances of survival. For example, the brain uses emotion to keep us away from a
dangerous animal (fear), away from rotting food and fecal matter (disgust), in control of
our resources (anger), and in pursuit of a good meal or a good mate (pleasure and lust).
Damasio, Antonio R. Looking for Spinoza: Joy, Sorrow, and the Feeling Brain. New York NY: Harcourt, Inc., 2003.
Darwin
38. Dimensional models of emotion
Define emotions according to one or more dimensions
• Wilhelm Max Wundt (1897)
  • three dimensions: "pleasurable versus unpleasurable", "arousing or subduing", and "strain or relaxation"
• Harold Schlosberg (1954)
  • three dimensions of emotion: "pleasantness–unpleasantness", "attention–rejection", and "level of activation"
• Prevalent models incorporate valence and arousal dimensions
39. Several well-known models
• Circumplex model
• Vector model
• Positive activation – negative activation (PANA) model
• Plutchik's model
• PAD emotional state model
• Lövheim cube of emotion
• Cowen & Keltner 2017
40. Circumplex model: Perceptual
• developed by James Russell (1980)
• a two-dimensional circular space containing arousal and valence dimensions
• arousal represents the vertical axis and valence represents the horizontal axis
• prevalently used as 'labels' (a toy placement sketch follows)
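A minimal illustrative sketch (not from the slides) of using the circumplex as labels: categorical emotions are placed at assumed valence-arousal coordinates, and any point in the space maps to its nearest label. The coordinates are illustrative assumptions, not values from Russell's model.

```python
import math

EMOTION_COORDS = {  # (valence, arousal), both in [-1, 1]; assumed positions
    "happy":   ( 0.8,  0.5),
    "excited": ( 0.6,  0.9),
    "calm":    ( 0.6, -0.6),
    "sad":     (-0.7, -0.4),
    "angry":   (-0.6,  0.8),
    "bored":   (-0.4, -0.7),
}

def nearest_label(valence: float, arousal: float) -> str:
    """Return the categorical label closest to a point in the circumplex."""
    return min(EMOTION_COORDS,
               key=lambda e: math.dist((valence, arousal), EMOTION_COORDS[e]))

print(nearest_label(0.7, 0.6))  # -> "happy"
```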
41. Positive activation – Negative activation (PANA): Self-Report
• created by Watson and Tellegen in 1985
• suggests that positive affect and negative affect are two separate systems (responsible for different functions)
• states of higher arousal tend to be defined by their valence
• states of lower arousal tend to be more neutral in terms of valence
• the vertical axis represents low to high positive affect
• the horizontal axis represents low to high negative affect
• the dimensions of valence and arousal lie at a 45-degree rotation over these axes (a conversion sketch follows)
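Since the slide states that valence and arousal sit at a 45-degree rotation of the PA/NA axes, the conversion can be written out directly; this is a minimal sketch under that stated assumption.

```python
import math

def pana_to_valence_arousal(pa: float, na: float) -> tuple[float, float]:
    """Rotate the PA/NA axes by 45 degrees to recover valence and arousal."""
    c = math.cos(math.pi / 4)  # cos(45°) = sin(45°) ≈ 0.707
    valence = c * (pa - na)    # high PA with low NA reads as pleasant
    arousal = c * (pa + na)    # both systems active reads as high arousal
    return valence, arousal

print(pana_to_valence_arousal(1.0, 0.0))  # ≈ (0.707, 0.707)
```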
51. Little Dragon
(Affectiva- Education)
“make learning more enjoyable and more effective, by
providing an educational tool that is both universal and
personalized”
reference:
https://www.affectiva.com/success-story/
https://www.youtube.com/watch?v=SmjAa8iMkjU
53. Nevermind
(Affectiva- Gaming)
bio-feedback horror game
“sense a player’s facial expressions for signs of
emotional distress, and adapt game play accordingly”
reference: https://www.affectiva.com/success-story/
https://www.youtube.com/watch?v=NGr0orAqRH4&t=497s
54. Brain Power
(Affectiva- Health Care)
The World’s First Augmented Reality Smart-Glass-System
to empower children and adults with autism to teach
themselves crucial social and cognitive skills.
reference: https://www.affectiva.com/success-story/
https://www.youtube.com/watch?v=qfoTprgWyns
56. MediaRebel
(Affectiva- Legal)
• Legal video deposition management platform MediaRebel uses Affectiva’s
Emotion SDK for facial expression analysis and emotion recognition.
• Intelligent analytical features include:
• Search transcript based upon witness emotions
• Instantly play back testimony based upon selected emotions
• Identify positive, negative & neutral witness behavior
reference:
https://www.affectiva.com/success-story/
https://www.mediarebel.com/
57. shelfPoint
(Affectiva- Retail)
• Cloverleaf is a retail technology company for the modern brick-and-mortar marketer and merchandiser
• shelfPoint solution: brands and retailers can now
capture customer engagement and sentiment data at
the moment of purchase decision — something
previously unavailable in physical retail stores.
reference: https://www.affectiva.com/success-story/
https://www.youtube.com/watch?v=S9gDqpF6kLs
https://www.youtube.com/watch?v=W6UnahO_zXs
62. Public databases
Year Database Language Setting Protocol Elicitation
1997 DES Dan. Single Scr. Induced
2000 GEMEP Fre. Single Scr. & Spo. Acted
2005 eNTERFACE' 05 Eng. Single Scr. Induced
2007 HUMAINE Eng. TV Talk Scr. & Spo. Mix.
2008 VAM Ger. TV Talk Spo. Acted
2008 IEMOCAP Eng. Dyadic Scr. & Spo. Acted
2009 SAVEE Eng. Single Spo. Acted
2010 CIT Eng. Dyadic Scr. & Spo. Acted
2010 SEMAINE Eng. Dyadic Scr. Mix.
2013 RECOLA Fre. Dyadic Spo. Acted
2016 CHEAVD Chi. TV talk Spo. Posed
2017 NNIME Chi. Dyadic Spo. Acted
Another key question: how are these annotated and rated?
63. Language: Danish
Participants: 4 (Male: 2; Female: 2)
Recordings:
• Audio
Total: 0.5 hours
Sentences: 5200 utterances
Labels:
• Perspectives: Naïve-Observer
• Rater: 20
• Discrete session-level annotation
• Categorical (5)
DES:
DESIGN, RECORDING AND VERIFICATION OF A DANISH
EMOTIONAL SPEECH DATABASE
Engberg, Inger S., et al. "Design, recording and verification of a Danish emotional speech database." Fifth European Conference on Speech Communication and Technology. 1997.
Available: Tom Brøndsted (tom@brondsted.dk)
64. DES
• Loss-Scaled Large-Margin
Gaussian Mixture Models for
Speech Emotion Classification1
(Cat.:0.676)
• Automatic emotional speech
classification2
(Cat.:0.516)
1Yun, Sungrack, and Chang D. Yoo. "Loss-scaled large-margin Gaussian mixture models for speech emotion classification." IEEE Transactions on Audio, Speech, and Language Processing 20.2 (2012): 585-598.
2Ververidis, Dimitrios, Constantine Kotropoulos, and Ioannis Pitas. "Automatic emotional speech classification." Acoustics, Speech, and Signal Processing, 2004. Proceedings (ICASSP '04). IEEE International Conference on. Vol. 1. IEEE, 2004.
65. Language: French
Participants: 10 (Male: 5; Female: 5)
Recordings:
• Dual-channel Audio
• HD Video
• Manual Transcript
• Face & Head
• Body Posture & Gestures
Sentences: 7300 sequences
Labels:
• Perspectives: Naïve-Observer
• Discrete session-level annotation
• Categorical (18)
GEMEP:
Geneva Multimodal Emotion Portrayals corpus
Bänziger, Tanja, Hannes Pirker, and K. Scherer. "GEMEP - GEneva Multimodal Emotion Portrayals: A corpus for the study of multimodal emotional expressions." Proceedings of LREC. Vol. 6. 2006.
Bänziger, Tanja, and Klaus R. Scherer. "Using actor portrayals to systematically study multimodal emotion expression: The GEMEP corpus." International conference on affective computing and intelligent interaction. Springer, Berlin, Heidelberg, 2007.
Available: Tanja Bänziger (Tanja.Banziger@pse.unige.ch)
66. GEMEP
• Multimodal emotion recognition from expressive
faces, body gestures and speech
(Cat.: 0.571)
Kessous, Loic, Ginevra Castellano, and George Caridakis. "Multimodal emotion recognition in speech-based interaction using facial expression, body gesture and acoustic analysis." Journal on Multimodal User Interfaces 3.1 (2010): 33-48.
67. Language: English
Participants: 42 (Male: 34; Female: 8)
(14 different nationalities)
Recordings:
• Dual-channel Audio
• HD Video
• Script
Total: 1166 video sequences
Emotion-related atmosphere:
• To express six emotions
eNTERFACE' 05:
The eNTERFACE’05 Audio-Visual Emotion Database
Martin, Olivier, et al. "The eNTERFACE'05 audio-visual emotion database." Data Engineering Workshops, 2006. Proceedings. 22nd International Conference on. IEEE, 2006.
Available: O. Martin (martin@tele.ucl.ac.be)
Available: O. Martin (martin@tele.ucl.ac.be)
68. eNTERFACE' 05
• Sparse autoencoder-
based feature transfer
learning for speech
emotion recognition1
(Cat.: 59.1)
• Unsupervised learning
in cross-corpus
acoustic emotion
recognition2
(Val./Act.:0.574/0.616)
1Deng, Jun, et al. "Sparse autoencoder-based feature transfer learning for speech emotion recognition." Affective Computing and Intelligent Interaction (ACII), 2013 Humaine Association Conference on. IEEE, 2013.
2Zhang, Zixing, et al. "Unsupervised learning in cross-corpus acoustic emotion recognition." Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on. IEEE, 2011.
69. Language: English
Participants: Many (includes 8 datasets)
Recordings:
(Naturalistic (TV shows, interviews)/Induced data)
• Audio
• Video
• Gesture
• Emotion words
Labels:
• Perspectives: Naïve-Observer
• Rater: 4
• Continuous-in-time annotation
• Dimensional (8) [Intensity, Activation, Valence, Power, Expect, Word]
• Discrete annotation (5)
• Emotion-related states
• Key Event
• Everyday Emotion words…
HUMAINE:
Addressing the Collection and Annotation of
Naturalistic and Induced Emotional Data
Douglas-Cowie, Ellen, et al. "The HUMAINE database: addressing the collection and annotation of naturalistic and induced emotional data." Affective computing and intelligent interaction (2007): 488-500.
Available: E.Douglas-Cowie@qub.ac.uk
70. HUMAINE
• A Multimodal Database
for Affect Recognition
and Implicit Tagging1
(Val./Act.:0.761/0.677)
• Abandoning Emotion
Classes - Towards
Continuous Emotion
Recognition with
Modelling of Long-
Range Dependencies2
(Val./Act.[MSE]:0.18/0.08)
1Soleymani, Mohammad, et al. "A multimodal database for affect recognition and implicit tagging." IEEE Transactions on Affective Computing 3.1 (2012): 42-55.
2Wöllmer, Martin, et al. "Abandoning emotion classes - towards continuous emotion recognition with modelling of long-range dependencies." Ninth Annual Conference of the International Speech Communication Association. 2008.
71. Language: German (TV shows)
Participants: 47
Recordings:
• Audio
• Video
• Face
• Manual Transcript
Total: 12 hours
Sentences: 946 utterances
Labels:
• Perspectives: Peer, Director, Self, Naïve-Observer
• Rater: 17
• Continuous-in-time annotation
• Dimensional (Valence-Activation-Dominance) for Audio
• Discrete session-level annotation
• Categorical (7) for Faces
VAM:
The Vera am Mittag German Audio-Visual
Spontaneous Speech Database
Grimm, Michael, Kristian Kroschel, and Shrikanth Narayanan. "The Vera am Mittag German audio-visual emotional speech database." Multimedia and Expo, 2008 IEEE International Conference on. IEEE, 2008.
Available: Michael.Grimm@ieee.org
Available: Michael.Grimm@ieee.org
72. VAM
• Towards robust spontaneous
speech recognition with
emotional speech adapted
acoustic models1
(Word ACC.: 42.75)
• Selecting training data for cross-corpus speech emotion recognition: Prototypicality vs. generalization2
(Val./Act.: 0.502/0.677)
1Vlasenko, Bogdan, Dmytro Prylipko, and Andreas Wendemuth. "Towards robust spontaneous speech recognition with emotional speech adapted acoustic models." Poster and Demo Track of the 35th German Conference on Artificial Intelligence, KI-2012, Saarbrücken, Germany. 2012.
2Schuller, Björn, et al. "Selecting training data for cross-corpus speech emotion recognition: Prototypicality vs. generalization." Proc. 2011 Afeka-AVIOS Speech Processing Conference, Tel Aviv, Israel. 2011.
74. IEMOCAP
• Tracking continuous
emotional trends of
participants during
affective dyadic
interactions using body
language and speech
information1
(Val./Act./Dom.:0.619/0.637
/0.62)
• Modeling mutual influence
of interlocutor emotion
states in dyadic spoken
interactions2
(Cat./Val./Act.:0.552/0.634/0
.650)
1Metallinou, Angeliki, Athanasios Katsamanis, and Shrikanth Narayanan. "Tracking continuous emotional trends of participants during affective dyadic interactions using body language and speech information." Image and Vision Computing 31.2 (2013): 137-152.
2Lee, Chi-Chun, et al. "Modeling mutual influence of interlocutor emotion states in dyadic spoken interactions." Tenth Annual Conference of the International Speech Communication Association. 2009.
75. Language: English
Participants: 4 (Male: 4)
Recordings:
• Dual-channel Audio
• Video
• Face markers
Sentences: 480 utterances
Labels:
• Perspectives: Naïve-Observer
• Discrete session-level annotation
• Categorical (6)
SAVEE:
Surrey Audio-Visual Expressed Emotion database
Jackson, P., and S. Haq. "Surrey Audio-Visual Expressed Emotion (SAVEE) Database." University of Surrey: Guildford, UK (2014).
Available: P Jackson (p.jackson@surrey.ac.uk)
76. SAVEE
• Speaker-Dependent Audio-Visual Emotion Recognition1
(Cat.: 97.5)
• Audio-Visual Feature Selection and Reduction for Emotion Classification2
(Cat.: 96.7)
1S. Haq and P.J.B. Jackson. "Speaker-Dependent Audio-Visual Emotion Recognition." In Proc. Int'l Conf. on Auditory-Visual Speech Processing, pages 53-58, 2009.
2S. Haq, P.J.B. Jackson, and J.D. Edge. "Audio-Visual Feature Selection and Reduction for Emotion Classification." In Proc. Int'l Conf. on Auditory-Visual Speech Processing, pages 185-190, 2008.
77. Language: English
Participants: 16 (Male: 7; Female: 9)
Recordings:
• Dual-channel Audio
• HD Video
• Transcript
• Body gesture
Total: 48 dyadic sessions
Sentences: 2162 sentences
Labels:
• Perspectives: Naïve-Observer
• Rater: 3
• Discrete session-level annotation
• Continuous-in-time annotation
• Dimensional (Valence-Activation-Dominance)
CIT:
The USC CreativeIT database of multimodal dyadic
interactions: from speech and full body motion capture
to continuous emotional annotations
Metallinou, Angeliki, et al. "The USC CreativeIT database: A multimodal database of theatrical improvisation." Multimodal Corpora: Advances in Capturing, Coding and Analyzing Multimodality (2010): 55.
Metallinou, Angeliki, et al. "The USC CreativeIT database of multimodal dyadic interactions: From speech and full body motion capture to continuous emotional annotations." Language Resources and Evaluation 50.3 (2016): 497-521.
Available: Manoj Kumar (prabakar@usc.edu)
78. CIT
• Analyzing Temporal Dynamics of Dyadic Synchrony in Affective Interactions2
1Yang, Zhaojun, and Shrikanth S. Narayanan. "Modeling dynamics of expressive body gestures in dyadic interactions." IEEE Transactions on Affective Computing 8.3 (2017): 369-381.
2Yang, Zhaojun, and Shrikanth S. Narayanan. "Analyzing Temporal Dynamics of Dyadic Synchrony in Affective Interactions." INTERSPEECH. 2016.
3Chang, Chun-Min, and Chi-Chun Lee. "Fusion of multiple emotion perspectives: Improving affect recognition through integrating cross-lingual emotion information." Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on. IEEE, 2017.
79. Language: English
Participants: 150
Recordings:
• Dual-channel Audio
• HD Video
• Manual Transcript
Multi-Interaction (like TV talk show):
• Human vs. Human
• Semi-human vs. Human
• Machine vs. Human
Total: 959 dyadic sessions (3 min/session)
Labels:
• Perspectives: Naïve-Observer
• Rater: 8
• Continuous-in-time annotation
• Dimensional (Valence-Activation)
• Discrete Categorical (27)
SEMAINE:
The SEMAINE Database: Annotated Multimodal Records of Emotionally
Colored Conversations between a Person and a Limited Agent
McKeown, Gary, et al. "The SEMAINE database: Annotated multimodal records of emotionally colored conversations between a person and a limited agent." IEEE Transactions on Affective Computing 3.1 (2012): 5-17.
Available: eula@semaine-db.eu
Available: eula@semaine-db.eu
80. SEMAINE
• Building autonomous
sensitive artificial
listeners1
• A Dynamic Appearance
Descriptor Approach to
Facial Actions Temporal
Modeling2
(0.701)
1Schroder, Marc, et al. "Building autonomous sensitive artificial listeners." IEEE Transactions on Affective Computing 3.2 (2012): 165-183.
2Jiang, Bihan, et al. "A dynamic appearance descriptor approach to facial actions temporal modeling." IEEE Transactions on Cybernetics 44.2 (2014): 161-174.
81. Language: French
Participants: 46 (Male: 19; Female: 27)
Recordings:
• Dual-channel Audio
• HD Video (15 facial action units)
• Electrocardiogram
• Electrodermal activity
Total: 11 hours, 102 dyadic sessions (3 min/session)
Sentences: 1306 sentences
Labels:
• Perspectives: Self, Naïve-Observer
• Rater: 6
• Continuous-in-time annotation
• Dimensional (Valence-Activation)
RECOLA:
Remote Collaborative and Affective Interactions
Ringeval, Fabien, et al. "Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions." Automatic Face and Gesture Recognition (FG), 2013 10th IEEE International Conference and Workshops on. IEEE, 2013.
Available: Fabien Ringeval (fabien.ringeval@image.fr)
82. RECOLA
• Prediction of asynchronous
dimensional emotion ratings from
audiovisual and physiological data1
(Val./Act.: 0.804/0.528 )
• End-to-end speech emotion
recognition using a deep
convolutional recurrent network2
(Val./Act.: 0.741/0.325 )
• Face Reading from Speech—
Predicting Facial Action Units from
Audio Cues3
(Predict Facial Action Units from
Audio Cues: 0.650 )
1Ringeval, Fabien, et al. "Prediction of asynchronous dimensional emotion ratings from audiovisual and physiological data." Pattern Recognition Letters 66 (2015): 22-30.
2Trigeorgis, George, et al. "Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network." Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on. IEEE, 2016.
3Ringeval, Fabien, et al. "Face Reading from Speech - Predicting Facial Action Units from Audio Cues." Sixteenth Annual Conference of the International Speech Communication Association. 2015.
83. Language: Chinese
Participants: 238
Recordings:
• Audio
• Video
(34 films, 2 TV series, 4 TV shows)
Total: 2.3 hours
Labels:
• Rater: 4
• Discrete session-level annotation
• Fake/suppressed emotions
• Multi-emotion annotation for some segments
• Categorical (26 non-prototypical)
2017 Multimodal Emotion Recognition Challenge
(MEC 2017: http://www.chineseldc.org/htdocsEn/emotion.html)
CHEAVD:
A Chinese natural emotional audio-visual database
Li, Ya, et al. "CHEAVD: a Chinese natural emotional audio-visual database." Journal of Ambient Intelligence and Humanized Computing 8.6 (2017): 913-924.
Available: Ya Li (yli@nlpr.ia.ac.cn)
84. CHEAVD
• MEC 2016: the multimodal emotion recognition
challenge of CCPR 20161 (Cat.: 37.03)
• Chinese Speech Emotion Recognition2 (Cat.: 47.33)
• Transfer Learning of Deep Neural Network for
Speech Emotion Recognition3 (Cat.: 50.01)
1Li, Ya, et al. "MEC 2016: the multimodal emotion recognition challenge of CCPR 2016." Chinese Conference on Pattern Recognition. Springer Singapore,
2016.
2Zhang, Shiqing, et al. "Feature Learning via Deep Belief Network for Chinese Speech Emotion Recognition." Chinese Conference on Pattern Recognition.
Springer Singapore, 2016.
3Huang, Ying, et al. "Transfer Learning of Deep Neural Network for Speech Emotion Recognition." Chinese Conference on Pattern Recognition. Springer
Singapore, 2016.
85. Language: Chinese
Participants: 44 (Male: 20; Female: 24)
Recordings:
• Dual-channel Audio
• HD Video
• Manual Transcript
• Electrocardiogram
Total: 11 hours, 102 dyadic sessions (3 min/session)
Sentences: 6029 utterances
Labels:
• Perspectives: Peer, Director, Self, Naïve-Observer
• Rater: 49
• Continuous-in-time annotation
• Discrete session-level annotation
• Dimensional (Valence-Activation)
• Categorical (6)
NNIME:
The NTHU-NTUA Chinese Interactive Multimodal
Emotion Corpus
Huang-Cheng Chou, Wei-Cheng Lin, Lien-Chiang Chang, Chyi-Chang Li, Hsi-Pin Ma, Chi-Chun Lee. "NNIME: The NTHU-NTUA Chinese Interactive Multimodal Emotion Corpus." In Proceedings of ACII 2017.
Available: Huang-Cheng Chou (hc.chou@gapp.nthu.edu.tw)
Chi-Chun Lee (cclee@ee.nthu.edu.tw)
86. NNIME
• Cross-Lingual Emotion Information1,3 (session-level)
(Val./Act.: 0.682/0.604)
• Dyad-Level Interaction2
(Cat.: 0.65)
1Chun-Min Chang, Bo-Hao Su, Shih-Chen Lin, Jeng-Lin Li, Chi-Chun Lee. "A Bootstrapped Multi-View Weighted Kernel Fusion Framework for Cross-Corpus Integration of Multimodal Emotion Recognition." In Proceedings of ACII 2017.
2Yun-Shao Lin, Chi-Chun Lee. "Deriving Dyad-Level Interaction Representation using Interlocutors' Structural and Expressive Multimodal Behavior Features." In Proceedings of INTERSPEECH, pp. 2366-2370, 2017.
3Chun-Min Chang, Chi-Chun Lee. "Fusion of Multiple Emotion Perspectives: Improving Affect Recognition Through Integrating Cross-Lingual Emotion Information." In Proceedings of ICASSP, pp. 5820-5824, 2017.
93. Positive emotions
• Achievement 88.4%
• Amusement 90.4%
• Contentment 52.4%
• Pleasure 61.6%
• Relief 83.9%
Emotion classification, using vocalization sets that readily elicit emotion:
• Samples screened from the previous experiment
• Results average around 73%-94%
• Amusement and disgust are easy to distinguish
• Pleasure and sadness are hard to tell apart
• Some non-verbal vocalizations are easily confused
Adding a "none of the above" option:
• Compared with forced-choice classification
• Sadness and relief rise by 24% and 17.5%
• Amusement drops by 12%
• On average, a 7.9% rise
Can humans hear emotion?
Non-verbal vocalizations: Achievement, Amusement, Anger, Contentment, Disgust, Pleasure, Relief, Sadness, Surprise
Average: 69.9%
Sauter, Disa. An investigation into vocal expressions of emotions: the roles of valence, culture, and acoustic factors. University of London, University College London (United Kingdom), 2007.
Encoding / Decoding
94. Laugh
Cry
Sigh
Whisper
Whine
Paralinguistic vocal emotion
Laukka, Petri, et al. "Cross-cultural decoding of positive and negative non-linguistic emotion vocalizations." Frontiers in Psychology 4 (2013).
Gupta, Rahul, et al. "Detecting paralinguistic events in audio stream using context in features and probabilistic decisions." Computer Speech & Language 36 (2016): 72-92.
Laughter & Fillers (2015): IS2013 sub-challenge; detection AUC: laughter 95.3%, fillers 90.4%
Cross-culture (2013): universal emotion in non-verbal signals; speakers from India, USA, Kenya, Singapore; listeners from Sweden
95. How do machines recognize emotion from speech?
Sahu, Saurabh & Gupta, Rahul & Sivaraman, Ganesh & AbdAlmageed, Wael & Espy-Wilson, Carol. (2017). Adversarial Auto-Encoders for Speech Based Emotion Recognition. 1243-1247.
10.21437/Interspeech.2017-1421.
Rao, K. Sreenivasa, Shashidhar G. Koolagudi, and Ramu Reddy Vempada. "Emotion recognition from speech using global and local prosodic features." International journal of
speech technology 16.2 (2013): 143-160.
Lalitha, S., et al. "Emotion detection using MFCC and Cepstrum features." Procedia Computer Science 70 (2015): 29-35.
Huang, Che-Wei, and Shrikanth Narayanan. "Characterizing Types of Convolution in Deep Convolutional Recurrent Neural Networks for Robust Speech Emotion Recognition." arXiv preprint
arXiv:1706.02901 (2017).
Lee, Jinkyu, and Ivan Tashev. "High-level feature representation using recurrent neural network for speech emotion recognition." INTERSPEECH. 2015.
Reported approaches on Emo-DB:
• Prosodic features + SVM: 62.43%
• MFCC + ANN: 85.7%
• Deep convolutional networks; high-level representations of time series (RNN)
96. Features in emotional speech
Dimosa, Kostis, Leopold Dickb, and Volker Dellwoc. "Perception of levels of emotion in speech prosody." The Scottish Consortium for ICPhS (2015).
Erickson, Donna. "Expressive speech: Production, perception and application to speech synthesis." Acoustical Science and Technology 26.4 (2005): 317-325.
Sauter, Disa. An investigation into vocal expressions of emotions: the roles of valence, culture, and acoustic factors. University of London, University College London (United Kingdom), 2007.
“emotional prosody does not function categorically, distinguishing
only different emotions, but also indicates different degrees of the
expressed emotion.”
Pitch and pitch variation are especially important for people to recognize emotion from non-verbal sounds
Voice quality: tension
Some experiments: changing the sound (removing pitch, noisy channel, ...)
Descriptive speech features (descriptors) are closely tied to emotion
A review: research findings of acoustic and perceptual studies
98. Feature extraction (Low-Level Descriptors)
Low Level Descriptors (10 – 15 ms)
Mel Frequency Cepstral Coefficients
Pitch
Signal Energy
Loudness
Voice Quality (Jitter, Shimmer)
Log Filterbank Energies
Linear Prediction Cepstral Coefficients
CHROMA and CENS Features (Music)
Computed from the raw signal, then summarized with statistical methods
Feature taxonomy (see the extraction sketch below):
• Continuous: pitch, energy, formants
• Qualitative: voice quality (harsh, tense, breathy)
• Spectral: LPC, MFCC, LFPC
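A minimal sketch of extracting a few of the LLDs above over short frames, assuming the librosa library and a hypothetical input file speech.wav; window sizes and the chosen statistics are illustrative, not prescribed by the slides.

```python
import librosa
import numpy as np

y, sr = librosa.load("speech.wav", sr=16000)  # hypothetical input file
hop = int(0.010 * sr)     # 10 ms hop, matching the 10-15 ms LLD frames
frame = int(0.025 * sr)   # 25 ms analysis window

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, n_fft=frame, hop_length=hop)
energy = librosa.feature.rms(y=y, frame_length=frame, hop_length=hop)
f0 = librosa.yin(y, fmin=80, fmax=400, sr=sr, frame_length=1024, hop_length=hop)

# Utterance-level statistics ("functionals") over the frame-level LLDs
features = np.concatenate([
    mfcc.mean(axis=1), mfcc.std(axis=1),
    [energy.mean(), energy.std(), np.nanmean(f0), np.nanstd(f0)],
])
print(features.shape)
```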
99. Example: using pitch to recognize emotion
Arias, Juan Pablo, Carlos Busso, and Nestor Becerra Yoma. "Shape-based modeling of the fundamental frequency contour for emotion detection in speech." Computer Speech & Language 28.1 (2014): 278-294.
Emotionally salient temporal segments
• 75.8% in binary emotion classification
• Figure legend: dotted/dashed lines show the subjective contour and its deviation; the solid line shows the objective contour
100. More feature computation: speech production
Source-filter model
Example: high arousal, physically
Vocal production system:
• Respiration
• Vocal fold vibration
• Articulation
High arousal increases tension in the laryngeal musculature and raises subglottal pressure, changing the production of sound at the glottis and hence vocal quality
Johnstone, Tom & Scherer, Klaus. (2000). Vocal communication of emotion. Handbook of Emotions.
101. More feature computation: speech perception
Mel-scale filter bank (a construction sketch follows)
The response of the basilar membrane as a function of frequency, measured at six different distances from the stapes
The psychoacoustical transfer function
Stern, Richard M., and Nelson Morgan. "Features based on auditory physiology and perception." Techniques for Noise Robustness in Automatic Speech Recognition (2012): 193-227.
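A minimal sketch, assuming librosa, of constructing the mel-scale filter bank that approximates this psychoacoustic transfer and applying it to a power spectrogram; all parameter values are illustrative.

```python
import librosa
import numpy as np

sr, n_fft, n_mels = 16000, 512, 26
mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)  # (26, 257)

y, _ = librosa.load("speech.wav", sr=sr)           # hypothetical input file
power = np.abs(librosa.stft(y, n_fft=n_fft)) ** 2  # linear-frequency power
log_mel = np.log(mel_fb @ power + 1e-10)           # log mel filterbank energies
print(log_mel.shape)  # (n_mels, frames)
```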
103. Deep learning for speech emotion
End to End: from LLDs to deep learning
Deep models capture more comprehensive speech feature information
Z. Aldeneh and E. M. Provost, "Using regional saliency for speech emotion recognition," 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, 2017, pp. 2741-2745. doi: 10.1109/ICASSP.2017.7952655
C. W. Huang and S. S. Narayanan, "Deep convolutional recurrent neural network with attention mechanism for robust speech emotion recognition," 2017 IEEE International Conference on Multimedia and Expo (ICME), Hong Kong, 2017, pp. 583-588. doi: 10.1109/ICME.2017.8019296
Pipeline: signal → neural network → emotion
CNN for time-series signals, with attention (a model sketch follows)
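A minimal PyTorch sketch of the convolutional-recurrent-with-attention idea behind the cited work; layer shapes and sizes here are illustrative assumptions, not the published architectures.

```python
import torch
import torch.nn as nn

class ConvRecurrentAttn(nn.Module):
    def __init__(self, n_mels=40, n_classes=4):
        super().__init__()
        self.conv = nn.Sequential(            # local patterns over time
            nn.Conv1d(n_mels, 64, kernel_size=8), nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.rnn = nn.GRU(64, 64, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(128, 1)         # frame-level attention scores
        self.out = nn.Linear(128, n_classes)

    def forward(self, x):                     # x: (batch, n_mels, frames)
        h = self.conv(x).transpose(1, 2)      # (batch, frames', 64)
        h, _ = self.rnn(h)                    # (batch, frames', 128)
        w = torch.softmax(self.attn(h), dim=1)
        pooled = (w * h).sum(dim=1)           # attention-weighted pooling
        return self.out(pooled)

logits = ConvRecurrentAttn()(torch.randn(2, 40, 300))
print(logits.shape)  # torch.Size([2, 4])
```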
104. Speech feature extraction tools
Cross-platform (a usage sketch follows):
YAAFE, an Easy to Use and Efficient Audio Feature Extraction Software. B. Mathieu, S. Essid, T. Fillon, J. Prado, G. Richard. Proceedings of the 11th ISMIR conference, Utrecht, Netherlands, 2010.
Florian Eyben, Felix Weninger, Florian Gross, Björn Schuller. "Recent Developments in openSMILE, the Munich Open-Source Multimedia Feature Extractor." In Proc. ACM Multimedia (MM), Barcelona, Spain, ACM, ISBN 978-1-4503-2404-5, pp. 835-838, October 2013. doi: 10.1145/2502081.2502224
Paul Boersma & David Weenink (2013): Praat: doing phonetics by computer [Computer program].
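A minimal usage sketch, assuming the opensmile Python wrapper (pip install opensmile) around the openSMILE toolkit listed above; the input file name is hypothetical.

```python
import opensmile

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,       # a standard set of functionals
    feature_level=opensmile.FeatureLevel.Functionals,
)
features = smile.process_file("speech.wav")  # hypothetical input file
print(features.shape)  # one row of utterance-level acoustic features
```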
106. Emotion, cognition, language
Reading the emotion within language?
Schwarz-Friesel, Monika. "Language and emotion. The Cognitive Linguistic Perspective." In: Ulrike Lüdtke (ed.), Emotion in Language. Theory - Research - Application. Amsterdam (2015): 157-173.
Transformation relations: affect, judgment
Lexicon, Grammar, Ideational Meaning
Concretization
Language is a channel that helps humans come to know emotion
Lindquist, Kristen A., Jennifer K. MacCormack, and Holly Shablack. "The role of language in emotion: predictions from psychological constructionism." Frontiers in Psychology 6 (2015).
107. Recognizing emotion in text
Human behavior evaluation
• Couples therapy
• Oral presentations
Reviews
• Hotels: HBRNN
• Amazon: cross-lingual
• Movie (93%), Book (92%), DVD (93%), ...: PNN + RBM
Tweets
• Positive & negative
• DCNN & LSTM
Massive data; scarce annotation; complex structure
Ain, Qurat Tul, et al. "Sentiment analysis using deep learning techniques: a review." Int J Adv Comput Sci Appl 8.6 (2017): 424.
108. Reviews, articles, social media, talk
Semantic analysis: what does a text carry?
What texts tell us (topics); emotional polarity ("It's terrible!" / "It's cool!")
Parts-of-speech (POS) tags, N-grams
Example Penn Treebank POS tags: VB, VBD, NN, NNS, JJ, JJR, JJS, IN, TO
https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
110. Building an emotion lexicon
Changqin Quan and Fuji Ren. 2009. Construction of a blog emotion corpus for Chinese emotional expression analysis. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (EMNLP '09), Vol. 3. Association for Computational Linguistics, Stroudsburg, PA, USA, 1446-1454.
Mohammad, Saif M., and Peter D. Turney. "Crowdsourcing a word-emotion association lexicon." Computational Intelligence 29.3 (2013): 436-465.
Pennebaker, James W., et al. The development and psychometric properties of LIWC2015. 2015.
Assumption: keywords carry inherent positive or negative polarity, and a sentence's polarity derives from the use of these keywords
LIWC (Linguistic Inquiry and Word Count)
• A categorized dictionary of semantic classes (not only emotion)
• 64 categories, about 4500 words
• Positive/negative emotion words: 406/499
Seed words as a gold standard for keywords:
1. Annotators pick out words that carry emotion
2. Text structure and word association are exploited semi-automatically to find further emotion keywords (adjectives; adverbs, nouns, verbs)
Crowdsourced annotation with multiple-choice questions from non-fixed annotators, validated with differently designed questions (e.g., which word is most like this word; is this word associated with happiness)
Wisdom of the crowd
112. Sentiment Analysis (Supervised)
Inter-annotator agreement = 0.76
Individual emotions fall roughly between 0.6 and 0.79
Keyword-based recognition = 0.66
Automatic emotion recognition = 0.73 (Naïve Bayes, SVM)
Aman, Saima, and Stan Szpakowicz. "Identifying expressions of emotion in text." Text, Speech and Dialogue. Springer Berlin/Heidelberg, 2007.
Pipeline: feature representation → classifier → emotion label (a sketch follows)
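A minimal scikit-learn sketch of this pipeline on toy data; the corpus, features, and classifier settings of the cited study are not reproduced here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["I love this movie", "This is terrible", "What a great day"]
labels = ["positive", "negative", "positive"]

# Feature representation (TF-IDF n-grams) -> classifier -> emotion label
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), MultinomialNB())
clf.fit(texts, labels)
print(clf.predict(["what a terrible movie"]))
```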
113. Recent deep learning approaches: Deep Models
Lopez, Marc Moreno, and Jugal Kalita. "Deep Learning applied to NLP." arXiv preprint arXiv:1703.03091 (2017).
Diagram: each token ("I", "love", "it") passes through an embedding layer into an LSTM, which outputs the sentiment label ("positive"); a model sketch follows
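A minimal PyTorch sketch of the embed-then-LSTM architecture in the diagram; vocabulary size, dimensions, and token ids are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LSTMSentiment(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=100, hidden=128, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, token_ids):           # (batch, seq_len)
        h = self.embed(token_ids)           # (batch, seq_len, emb_dim)
        _, (last_hidden, _) = self.lstm(h)  # final state summarizes the text
        return self.out(last_hidden[-1])    # (batch, n_classes)

tokens = torch.tensor([[11, 42, 7]])        # "I love it" as hypothetical ids
print(LSTMSentiment()(tokens).softmax(dim=-1))  # P(negative), P(positive)
```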
119. FACS
The tool for annotating facial expressions
"What the Face Reveals" is strong evidence for the fruitfulness of the systematic analysis of facial expression
Paul Ekman and Wallace V. Friesen, 1976
120. Action Units (AUs)
• AUs are considered to be the smallest visually discernible facial movements
• As AUs are independent of any interpretation, they can be used as the basis for recognition of basic emotions
• FACS is an explicit means of describing all possible movements of the face via 46 action units
121. Action Units (AUs)
• FACS is a tool for measuring facial expressions
• Each observable component of facial movement is called an AU
• All facial expressions can be broken down into their constituent AUs (a rule-based mapping sketch follows)
AU | Description        | AU | Description
1  | Inner Brow Raiser  | 12 | Lip Corner Puller
4  | Brow Lowerer       | 13 | Cheek Puffer
7  | Lid Tightener      | 20 | Lip Stretcher
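A minimal sketch of the idea that AUs can serve as the basis for recognizing basic emotions; the AU combinations below are commonly cited EMFACS-style approximations assumed for illustration, not values from these slides.

```python
# Rule-based mapping from a detected AU set to a basic emotion.
PROTOTYPES = {
    "happiness": {6, 12},        # cheek raiser + lip corner puller
    "sadness":   {1, 4, 15},
    "surprise":  {1, 2, 5, 26},
    "anger":     {4, 5, 7, 23},
    "disgust":   {9, 15},
}

def classify(detected_aus: set) -> str:
    """Pick the prototype whose AU set is best covered by the detections."""
    def coverage(emotion):
        proto = PROTOTYPES[emotion]
        return len(proto & detected_aus) / len(proto)
    best = max(PROTOTYPES, key=coverage)
    return best if coverage(best) > 0.5 else "neutral/unknown"

print(classify({6, 12}))     # -> "happiness"
print(classify({1, 4, 15}))  # -> "sadness"
```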
123. A simple recognition architecture (pipeline diagram):
Automatic face & facial feature detection → face alignment → multiple image windows at a variety of locations and scales → image filter (modify or enhance the image, e.g., Gabor filter coefficients) → feature extraction (facilitates subsequent learning and generalization, leading to better human interpretation) → facial AUs (e.g., AU1, AU7, AU6+AU15, etc.) → rule-based classifier → facial expressions of emotion (e.g., happy, fear, disgust, surprise, etc.)
124. The same recognition pipeline, as instantiated in:
Tian, Y-I., Takeo Kanade, and Jeffrey F. Cohn. "Recognizing action units for facial expression analysis."
125. Recognizing AUs for Facial Expression Analysis - Rule-based Classifier
Informed by FACS AUs, they group the facial features into upper and lower parts, because facial actions in the two halves are relatively independent for AU recognition [14]
P. Ekman and W.V. Friesen, The Facial Action Coding System: A Technique for the Measurement of Facial Movement
Single AU detection; combined AU detection
126. Recognizing AUs for Facial Expression Analysis - Results
AU detection on Ekman-Hager (recognition rate):
• Upper face: single AU detection 75%; combined AU detection 86.7%
• Lower face: single AU detection 95.8%; combined AU detection 90.7%
Cross-database AU detection (recognition rate):
• Upper face (trained on Ekman-Hager): 93.2% on Cohn-Kanade, 86.7% on Ekman-Hager
• Lower face (trained on Cohn-Kanade): 90.7% on Cohn-Kanade, 93.4% on Ekman-Hager
Predicting several AUs jointly is more accurate: AUs co-occur
AU accuracy is actually quite good, including across databases
127. The same recognition pipeline, now with an AAM face tracking model, as instantiated in:
Mustafa Sert, and Nukhet Aksoy. "Recognizing Facial Expressions of Emotion using Action Unit Specific Decision Thresholds."
128. Recognizing Facial Expressions of Emotion using Action Unit Specific Decision Thresholds
• Extract facial images via an Active Appearance Model (AAM) to form an appearance model
• Facial AU multi-class classification using AU-specific decision thresholds (ADT) for both AU detection and facial expression recognition
• ADT learns a separate decision threshold T_i for each AU category, and assigns instance x to category i if and only if
f_i(x) = w_i^T Φ(x) + b_i > T_i
where Φ is the mapping function taking the SVM input to a high-dimensional space (a thresholding sketch follows)
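A minimal sketch of per-AU decision thresholds applied on top of SVM decision values, mirroring the rule f_i(x) = w_i^T Φ(x) + b_i > T_i above; the data, thresholds, and sizes are toy assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))        # toy appearance features
Y = rng.random((200, 3)) > 0.5        # toy presence labels for 3 AUs

svms = [LinearSVC().fit(X, Y[:, i]) for i in range(Y.shape[1])]
thresholds = [0.1, -0.2, 0.0]         # one learned threshold T_i per AU

def detect_aus(x):
    """Return indices of AUs whose decision value exceeds their T_i."""
    return {i for i, (svm, t) in enumerate(zip(svms, thresholds))
            if svm.decision_function(x.reshape(1, -1))[0] > t}

print(detect_aus(X[0]))
```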
129. Recognizing Facial Expressions of Emotion using Action Unit Specific Decision Thresholds
Table (↑): Prototypic and major variants of AU combinations for the facial expression fear. '+' denotes logical AND; ',' indicates logical OR
Table (→): Facial expression recognition accuracy of the proposed scheme. Bold bracketed numbers indicate the best result; bold numbers denote the second best
130. Recognizing Facial Expressions of Emotion using Action Unit Specific Decision Thresholds
• The ADT-based AU detector along with the rule-based emotion classifier (B & D) outperforms the baseline methods (A & C)
• (↑) Among the proposed methods, D gives the best results in all facial emotion categories except surprise
• (→) The proposed ADT scheme outperforms the baseline method by an average F1-score of 6.383% over 17 AUs
• (→) It gives superior performance in terms of F1-score compared with the baseline method for all AUs except AU2
131. The same recognition pipeline, extended with observations under distinct compound emotions, as instantiated in:
Shichuan Du, and Aleix M. Martinez. "Compound facial expressions of emotion: from basic research to clinical applications."
133. Compound facial expressions of emotion
• AU intensity shown in a cumulative histogram for each AU and emotion
category
• The x-axis in these histograms specifies the intensity of activation
• The y-axis in these histograms defines the cumulative percentage of intensity
(scale 0 to 1)
• Numbers between zero and one specify the percentage of people using the
specified and smaller intensities.
Fig. AUs used to express a compound emotion are consistent
with the AUs used to express its component categories
136. Body language
• Non-verbal communication (body movement and facial expression) traces back to Darwin's "The Expression of the Emotions in Man and Animals"
• A movement analysis system is used to describe body movements
• Through movement analysis, the following basic movement features are described:
(1) Trunk movement (stretching, bowing)
(2) Arm movement (opening, closing)
(3) Vertical direction (upward, downward)
(4) Sagittal direction (forward, backward)
(5) Force (strong, light)
(6) Velocity (fast, slow)
(7) Directness (direct, indirect)
• Psychology experiments use these movement features to find and recognize their relationship with emotion
reference: de Meijer, M. The contribution of general features of body movement to the attribution of emotions. Journal of Nonverbal Behavior 13, 4 (1989), 247-268.
137. Research relating body movement and emotion
• Psychology
• Bull, P. E. Posture and gesture. Pergamon press, 1987.
• Pollick, F. E., Paterson, H. M., Bruderlin, A., and Sanford, A. J. Perceiving affect from arm
movement. Cognition 82, 2 (2001), B51–B61.
• Coulson, M. Attributing emotion to static body postures: Recognition accuracy, confusions,
and viewpoint dependence. Journal of nonverbal behavior 28, 2 (2004), 117–139.
• Boone, R. T., and Cunningham, J. G. Children’s decoding of emotion in expressive body
movement: The development of cue attunement. Developmental psychology 34 (1998),
1007–1016.
• de Meijer, M. The contribution of general features of body movement to the attribution of
emotions. Journal of Nonverbal Behavior 13, 4 (1989), 247–268.
• Engineering
• Balomenos, T., Raouzaiou, A., Ioannou, S., Drosopoulos, A., Karpouzis, K., and Kollias, S.
Emotion analysis in man-machine interaction systems. In Machine learning for multimodal
interaction. Springer, 2005, 318–328.
• Coulson, M. Attributing emotion to static body postures: Recognition accuracy, confusions,
and viewpoint dependence. Journal of nonverbal behavior 28, 2 (2004), 117–139.
reference: Stefano Piana, Alessandra Staglianò, Francesca Odone, Alessandro Verri, Antonio Camurri, "Real-time Automatic Emotion Recognition from Body Gestures" in ArXiv 2014
138. How was the data collected?
12 actors
four females and eight males
aged between 24 and 60
total of about 100 videos
separate clips of expressive gesture
reference:
1) Stefano Piana, Alessandra Staglianò, Francesca Odone, Alessandro Verri, Antonio Camurri, “Real-
time Automatic Emotion Recognition from Body Gestures” in ArXiv 2014
2) Amol S. Patwardhan and Gerald M. Knapp, “Augmenting Supervised Emotion Recognition with
Rule-Based Decision Model.” in ArXiv 2016
Qualisys, Kinect
139. Data Validation - Human annotation
Can humans rate it?
Showing only the 3D skeleton guarantees that the rater is not exploiting other information
It is not easy for humans to recognize emotion based only on gesture
reference: Stefano Piana, Alessandra Staglianò, Francesca Odone, Alessandro Verri, Antonio Camurri, "Real-time Automatic Emotion Recognition from Body Gestures" in ArXiv 2014
140. Skeleton-based features
anger, sadness, happiness, fear, surprise, disgust
reference:
1) Stefano Piana, Alessandra Staglianò, Francesca Odone, Alessandro Verri, Antonio Camurri, "Real-time Automatic Emotion Recognition from Body Gestures" in ArXiv 2014
2) Piana, S., Staglianò, A., Camurri, A., and Odone, F. A set of full-body movement features for emotion recognition to help children affected by autism spectrum condition. In IDGEI International Workshop (2013).
Histogram: movement energy at each frame (a feature sketch follows)
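A minimal sketch of one such skeleton feature, assuming joint positions arrive as a (frames, joints, 3) array: per-frame movement energy, summarized as a histogram.

```python
import numpy as np

def movement_energy(joints: np.ndarray, fps: float = 30.0) -> np.ndarray:
    """Kinetic-energy-like quantity per frame: sum of squared joint speeds."""
    velocity = np.diff(joints, axis=0) * fps   # (frames-1, joints, 3)
    return (velocity ** 2).sum(axis=(1, 2))    # one value per frame transition

joints = np.random.rand(120, 15, 3)            # toy 4-second clip, 15 joints
hist, _ = np.histogram(movement_energy(joints), bins=10)
print(hist)                                    # histogram feature vector
```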
141. Classification Results
Qualisys data with 310 gestures (clean dataset)
Kinect data with 579 gestures (noisy dataset)
Almost the same as human recognition ability
142. Skeleton capture systems
• Qualisys: expensive, sophisticated system with multiple high-speed cameras
• Kinect: cheap, easy-to-get RGB-D 3D camera device
• OpenPose: free, new software system based on CNNs
reference:
https://itp.nyu.edu/classes/dance-f16/kinect/
https://github.com/CMU-Perceptual-Computing-Lab/openpose
https://www.qualisys.com/
149. The social nervous system
• The evolution of the nervous system determines how humans express emotion, the quality of their communication, and their capacity to regulate behavior
• The inhibitory system maintains the metabolic balance needed for individual growth through regulatory mechanisms such as lowering heart rate, lowering blood pressure, and inhibiting cardiac sympathetic activity
• Heart rate can serve as a sensitive signal for reflecting and recognizing social activity
reference:
D. S. Quintana, A. J. Guastella, T. Outhred, I. B. Hickie, and A. H. Kemp. Heart rate variability is associated with emotion recognition: direct evidence for a relationship between the autonomic nervous system and social cognition. Int. J. of Psychophysiol, 86(2):168-172, 2012
http://blog.sina.com.cn/s/blog_753e49f90100pop2.html
http://www.xzbu.com/6/view-2908185.htm
150. Heart Rate Variability (HRV)
• Autonomic nervous system (ANS)
  • Comprises the sympathetic and parasympathetic nervous systems
  • Responds to all kinds of emotional stimuli
  • For example, when startled, a person's heart involuntarily races and the face pales; this is the sympathetic system at work
• Heart rate variability (HRV)
  • Reflects the balance between the sympathetic and parasympathetic systems
  • Many factors related to autonomic activity reduce HRV, e.g., blood pressure changes, respiration, physical or mental stress, hyperthyroidism, and medication
• HRV analysis
  • Has a complete, standardized evaluation methodology [2]
reference:
1) María Teresa Valderas, Juan Bolea, Pablo Laguna, Montserrat Vallverdú, Raquel Bailón, "Human Emotion Recognition Using Heart Rate Variability Analysis with Spectral Bands Based on Respiration" in Engineering in Medicine and Biology Society (EMBC), 2015 37th Annual International Conference of the IEEE.
2) Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology (1996). Heart rate variability: Standards of measurement, physiological interpretation, and clinical use. Eur Heart J 17(3):354-81.
3) D. S. Quintana, A. J. Guastella, T. Outhred, I. B. Hickie, and A. H. Kemp. Heart rate variability is associated with emotion recognition: direct evidence for a relationship between the autonomic nervous system and social cognition. Int. J. of Psychophysiol, 86(2):168-172, 2012.
152. Data collection methods
• Emotion elicitation (induced):
  • real experiences
  • film clips
  • problem solving
  • computer game interfaces
  • images
  • spoken words
  • music
• Movie clip method
  • verified by previous studies to be a more efficient emotion-inducing method than the others
  • 4 films (3-10 min each)
  • 4 emotions: angry, fear, sad, and happy
  • ECG data was recorded for 90 sec, starting 2 min before the end of each movie
reference:
1) Han Wen Guo, Yu Shun Huang, Jen Chien Chien, Jiann Shing Shieh, "Short-term Analysis of Heart Rate Variability for Emotion Recognition via a Wearable ECG Device" in Intelligent Informatics and Biomedical Sciences (ICIIBMS), 2015
2) Mimma Nardelli, Gaetano Valenza, Alberto Greco, Antonio Lanata, Enzo Pasquale Scilingo, "Recognizing Emotions Induced by Affective Sounds through Heart Rate Variability" in IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, VOL. 6, NO. 4, OCTOBER-DECEMBER 2015
153. ECG processing pipeline (figure)
reference: Abhishek Vaish and Pinki Kumari, "A Comparative Study on Machine Learning Algorithms in Emotion State Recognition Using ECG" in Proceedings of the Second International Conference on Soft Computing for Problem Solving (SocProS 2012), December 28-30, 2012
154. ECG Feature Extraction: HRV
Time-domain features:
1. MeanRRI: average of the resultant RR intervals
2. CVRR: ratio of the standard deviation to the mean of the RR intervals
3. SDRR: standard deviation of the RR intervals
4. SDSD: standard deviation of the successive differences of the RR intervals
Frequency-domain features:
1. LF (low frequency): standardized LF power (0.04-0.15 Hz)
2. HF (high frequency): standardized HF power (0.15-0.4 Hz)
3. LHratio: the ratio LF/HF
Statistical features:
Evaluate the distribution: the shapes of the probability distributions
reference: Han Wen Guo, Yu Shun Huang, Jen Chien Chien, Jiann Shing Shieh, "Short-term Analysis of Heart Rate Variability for Emotion Recognition via a Wearable ECG Device" in Intelligent Informatics and Biomedical Sciences (ICIIBMS), 2015
A computation sketch follows.
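A minimal sketch computing the HRV features listed above, assuming numpy/scipy and a toy RR-interval series in seconds; resampling the RR series to a uniform 4 Hz grid before the spectral estimate is a common convention, not something specified on the slide.

```python
import numpy as np
from scipy.signal import welch

def hrv_features(rr):
    feats = {"MeanRRI": rr.mean(), "SDRR": rr.std(),
             "CVRR": rr.std() / rr.mean(), "SDSD": np.diff(rr).std()}
    # Frequency domain: resample the irregular RR series to a uniform 4 Hz grid
    t = np.cumsum(rr)
    grid = np.arange(t[0], t[-1], 0.25)
    rr_u = np.interp(grid, t, rr)
    f, pxx = welch(rr_u - rr_u.mean(), fs=4.0)
    lf_band, hf_band = (f >= 0.04) & (f < 0.15), (f >= 0.15) & (f < 0.40)
    lf = np.trapz(pxx[lf_band], f[lf_band])
    hf = np.trapz(pxx[hf_band], f[hf_band])
    feats.update({"LF": lf, "HF": hf, "LHratio": lf / hf})
    return feats

rr = 0.8 + 0.05 * np.random.randn(300)  # toy series: 300 beats near 0.8 s
print(hrv_features(rr))
```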
155. Analysis of the features
Time-domain, frequency-domain, and statistical features
A balancing mechanism
reference: Han Wen Guo, Yu Shun Huang, Jen Chien Chien, Jiann Shing Shieh, "Short-term Analysis of Heart Rate Variability for Emotion Recognition via a Wearable ECG Device" in Intelligent Informatics and Biomedical Sciences (ICIIBMS), 2015
156. Classifier
reference: Han Wen Guo, Yu Shun Huang, Jen Chien Chien, Jiann Shing Shieh, "Short-term Analysis of Heart Rate Variability for Emotion Recognition via a Wearable ECG Device" in Intelligent Informatics and Biomedical Sciences (ICIIBMS), 2015
164. Generated Perspectives: Multi-view Kernel Fusion
1. Chun-Min Chang, Bo-Hao Su, Shih-Chen Lin, Jeng-Lin Li, Chi-Chun Lee, "A Bootstrapped Multi-View Weighted Kernel Fusion Framework for Cross-Corpus Integration of Multimodal Emotion Recognition" in Proceedings of ACII 2017
2. Chun-Min Chang, Chi-Chun Lee, "Fusion of Multiple Emotion Perspectives: Improving Affect Recognition Through Integrating Cross-Lingual Emotion Information" in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2017
Cross-lingual speech emotion recognition? Integrating language characteristics
171. The influence of (boredom) emotion on health behavior
Well-being
Boredom correlates highly with most addiction problems, yet the same emotion can also drive a person's productivity ("Boredom, it turns out, can be a dangerous and disruptive state of mind that damages your health", Mann)
Ref: http://alcoholrehab.com/drug-addiction/boredom-and-substance-abuse/
The reasons boredom leads to addiction mostly stem from limited life choices. For example, adolescents facing an unchanging routine of study seek stimulation; driven by curiosity, immature values, and peer pressure, health-damaging problems such as alcohol abuse, gambling, and binge eating emerge.
Most people without addiction problems report rarely feeling bored, mainly because their pursuit of accomplishment in life leaves no room for these vices; finding a purpose in life is therefore good medicine against addiction.
Ref: On the Function of Boredom (Shane W. Bench)
172. How emotion understanding helps in autism
Ref: The Facilitation of Social-Emotional Understanding and Social Interaction in High-Functioning Children with Autism
Social impairment in autism:
Social impairment is a hallmark of autism, in large part because affected individuals cannot read emotions. Research shows that teaching emotional knowledge early helps them integrate into society.
Ref: Social Skills Deficits in Children with Autism Spectrum Disorders: Evidence Based Interventions
• Initiating social contact: after emotion-focused therapy, autistic patients more actively express the desire to socialize
• Eye contact: reading others' emotions usually cannot be separated from reading their eyes; patients who received this education show improvement
• Sharing about oneself: after emotion-focused therapy, autistic patients learn to share about themselves with peers
• Emotional expression: the intervention of emotion education enables autistic patients to describe more complex emotions
173. Education, learning, and emotion
Motivation to learn is an element of learning behavior that cannot be ignored (Pintrich, 1991, p. 199)
Motivational, cognitive, developmental, and educational psychology are all devoted to understanding how human learning works. Emotion has become a fundamental element of learning, in both teaching and the learning process, so understanding how emotion operates is very important for educators.
(Excerpted from a special issue of the Educational Psychologist)
Planning, goals, interest, self, emotion, sense of achievement
Ref: The Importance of Students' Goals in Their Emotional Experience of Academic Failure: Investigating the Precursors and Consequences of Shame (Jeannine E. Turner)
174. Publication: "The Importance of Students' Goals in Their Emotional Experience of Academic Failure: Investigating the Precursors and Consequences of Shame"
Finding: giving students long-range plans helps increase their flexibility in learning.
This study examines cases of academic failure. Shame, failure, and dismay often lead to negative outcomes such as dropping out and low self-esteem, but if students are counseled toward short- and long-range learning goals, the added flexibility strengthens their self-regulation and improves their resilience to setbacks and stress.
The Importance of Students' Goals in Their Emotional Experience of Academic Failure: Investigating the Precursors and Consequences of Shame (Jeannine E. Turner)
176. Emotion can be evoked by music
Music and human emotion have always been inseparable, and psychologists are still seeking the direct link between them. Swathi Swaminathan proposed three hypotheses about emotion:
• Emotion colors how music is heard, and emotion can also be aroused by music (communication and perception of emotion in music)
• Music listening has emotional consequences (emotional consequences of music listening)
• Emotion can serve as a basis for a user's music preferences (predictors of music preferences)
Ref: Current Emotion Research in Music Psychology (Swathi Swaminathan)
181. Synopsis of fMRI
• Uses a standard MRI scanner
• Acquires a series of images (numbers)
• Measures changes in blood oxygenation
• Uses non-invasive, non-ionizing radiation
• Can be repeated many times; can be used for a wide range of subjects
• Combines good spatial and reasonable temporal resolution
185. Co-activation graph for each emotion category
A) Force-directed graphs for each emotion category, based on the Fruchterman-Reingold spring algorithm
B) The same connections in the anatomical space of the brain
Linked brain regions, not a single region
189. Our Research: Human-centered Behavioral Signal Processing (BSP)
Prof. Shrikanth Narayanan
Seeking a window into the human mind and traits...
...through an engineering approach
S. Narayanan and P. G. Georgiou, “Behavioral signal processing: Deriving human behavioral informatics from speech and language," Proceedings of the IEEE, vol. 101, no. 5, pp. 1203–1233, 2013.
Daniel Bone, Chi-Chun Lee, Theodora Chaspari, James Gibson, Shrikanth Narayanan, "Signal Processing and Machine Learning for Mental Health Research and Clinical Applications", in IEEE Signal Processing Magazine
Human Behavioral Signal and Interaction Computing Laboratory (EECS 713)
Behavioral Informatics and Interaction Computation Laboratory (BIIC)
190. Our Technology: Human-centric Decision Analytics Research & Development
Core technology:
• Signal Processing: spatial-temporal modeling, de-noising, feature extraction
• Machine Learning: supervised, unsupervised, and semi-supervised algorithms
• Decision Analytics: high-dimensional behavior spaces, non-linear predictive recognition, multimodal integration, expert decision mechanisms
Speech & Language: diarization, speaker ID, ASR, paralinguistic descriptors, emotion AI, sentiment, word-topic representation
Computer Vision: segmentation, tracking, image-video descriptors
Multimodal Fusion: joint speech-language-gesture modeling for multimodal prediction, multi-party interaction modeling
Representation Learning: behavior-embedded space learning, clinical health informatics data representation
Predictive Learning: deep-learning and machine-learning-based predictive modeling