7. 7
Early 1990s
Early 2000s
2017
Multi-modal systems
e.g., Microsoft MiPad, Pocket PC
Keyword Spotting
(e.g., AT&T)
System: “Please say collect,
calling card, person, third
number, or operator”
TV Voice Search
e.g., Bing on Xbox
Intent Determination
(Nuance’s Emily™, AT&T HMIHY)
User: “Uh…we want to move…we
want to change our phone line
from this house to another house”
Task-specific argument extraction
(e.g., Nuance, SpeechWorks)
User: “I want to fly from Boston
to New York next week.”
Brief History of Dialogue Systems
Apple Siri
(2011)
Google Now (2012)
Facebook M & Bot
(2015)
Google Home
(2016)
Microsoft Cortana
(2014)
Amazon Alexa/Echo
(2014)
Google Assistant
(2016)
DARPA
CALO Project
Virtual Personal Assistants
8. 8
Language Empowering Intelligent Assistant
Apple Siri (2011) Google Now (2012)
Facebook M & Bot (2015) Google Home (2016)
Microsoft Cortana (2014)
Amazon Alexa/Echo (2014)
Google Assistant (2016)
9. 9
Why Do We Need It?
Daily Life Usage
Weather
Schedule
Transportation
Restaurant Seeking
10. 10
Why Do We Need It?
Get things done
E.g. set up alarm/reminder, take note
Easy access to structured data, services and apps
E.g. find docs/photos/restaurants
Assist your daily schedule and routine
E.g. commute alerts to/from work
Be more productive in managing your work and personal life
11. 11
Why Natural Language?
Global Digital Statistics (January 2015)
Global Population: 7.21B
Active Internet Users: 3.01B
Active Social Media Accounts: 2.08B
Active Unique Mobile Users: 3.65B
Device input is evolving toward speech as the more natural and convenient modality.
12. 12
Intelligent Assistant Architecture
Reactive Assistance: ASR, LU, Dialog, LG, TTS
Proactive Assistance: Inferences, User Modeling, Suggestions
Data: Back-end Data Bases, Services and Client Signals
Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)
User Experience
“restaurant suggestions” “call taxi”
13. 13
Spoken Dialogue System (SDS)
Spoken dialogue systems are intelligent agents that are able to help users
finish tasks more efficiently via spoken interactions.
Spoken dialogue systems are being incorporated into various devices
(smart phones, smart TVs, in-car navigation systems, etc.).
JARVIS – Iron Man’s Personal Assistant | Baymax – Personal Healthcare Companion
Good dialogue systems assist users to access information conveniently
and finish tasks efficiently.
14. 14
App Bot
A bot is responsible for a “single” domain, similar to an app
Seamless and automatic information transfer across domains
reduces duplicate information and interaction
Goal: Schedule lunch with Vivian (coordinating apps: Food, MAP, LINE, KKBOX)
15. 15
GUI vs. CUI (Conversational UI)
https://github.com/enginebai/Movie-lol-android
18. Two Branches of Bots
Personal assistant, helps
users achieve a certain task
Combination of rules and
statistical components
POMDP for spoken dialog systems
(Williams and Young, 2007)
End-to-end trainable task-oriented
dialogue system (Wen et al., 2016)
End-to-end reinforcement learning
dialogue system (Li et al., 2017; Zhao
and Eskenazi, 2016)
No specific goal, focus on natural
responses
Using variants of seq2seq model
A neural conversation model (Vinyals and
Le, 2015)
Reinforcement learning for dialogue
generation (Li et al., 2016)
Conversational contextual cues for
response ranking (Al-Rfou et al., 2016)
Task-Oriented Bot | Chit-Chat Bot
19. 19
Task-Oriented Dialogue System (Young, 2000)
Speech
Recognition
Language Understanding (LU)
• Domain Identification
• User Intent Detection
• Slot Filling
Dialogue Management (DM)
• Dialogue State Tracking (DST)
• Dialogue Policy
Natural Language
Generation (NLG)
Hypothesis
are there any action movies to
see this weekend
Semantic Frame
request_movie
genre=action, date=this weekend
System Action/Policy
request_location
Text response
Where are you located?
Text Input
Are there any action movies to see this weekend?
Speech Signal
Backend Action /
Knowledge Providers
http://rsta.royalsocietypublishing.org/content/358/1769/1389.short
20. 20
Interaction Example
User: find a good eating place for taiwanese food
Intelligent Agent: Good Taiwanese eating places include Din Tai Fung, Boiling Point, etc.
What do you want to choose? I can help you go there.
Q: How does a dialogue system process this request?
21. 21
Task-Oriented Dialogue System (Young, 2000)
Speech
Recognition
Language Understanding (LU)
• Domain Identification
• User Intent Detection
• Slot Filling
Dialogue Management (DM)
• Dialogue State Tracking (DST)
• Dialogue Policy
Natural Language
Generation (NLG)
Hypothesis
are there any action movies to
see this weekend
Semantic Frame
request_movie
genre=action, date=this weekend
System Action/Policy
request_location
Text response
Where are you located?
Text Input
Are there any action movies to see this weekend?
Speech Signal
Backend Action /
Knowledge Providers
22. 22
1. Domain Identification
Requires Predefined Domain Ontology
find a good eating place for taiwanese food
User
Organized Domain Knowledge (Database)
Intelligent Agent
Restaurant DB Taxi DB Movie DB
Classification!
23. 23
2. Intent Detection
Requires Predefined Schema
find a good eating place for taiwanese food
User
Intelligent
Agent
Restaurant DB
FIND_RESTAURANT
FIND_PRICE
FIND_TYPE
:
Classification!
24. 24
3. Slot Filling
Requires Predefined Schema
find a good eating place for taiwanese food
User
Intelligent
Agent
Restaurant DB
Restaurant Rating Type
Rest 1 good Taiwanese
Rest 2 bad Thai
: : :
FIND_RESTAURANT
rating=“good”
type=“taiwanese”
SELECT restaurant {
rest.rating=“good”
rest.type=“taiwanese”
}
Semantic Frame via Sequence Labeling:
O O B-rating O O O B-type O
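The mapping from BIO slot tags to a semantic frame can be sketched in a few lines (the helper function below is a hypothetical illustration, not from the slides):

```python
# Hypothetical sketch: collect BIO slot tags into a semantic frame.
# Tags follow the slide: O O B-rating O O O B-type O
def tags_to_frame(words, tags, intent):
    slots, current = {}, None
    for word, tag in zip(words, tags):
        if tag.startswith("B-"):
            current = tag[2:]
            slots[current] = word
        elif tag.startswith("I-") and current == tag[2:]:
            slots[current] += " " + word   # continue a multi-word value
        else:
            current = None
    return {"intent": intent, "slots": slots}

words = "find a good eating place for taiwanese food".split()
tags = ["O", "O", "B-rating", "O", "O", "O", "B-type", "O"]
print(tags_to_frame(words, tags, "FIND_RESTAURANT"))
# {'intent': 'FIND_RESTAURANT', 'slots': {'rating': 'good', 'type': 'taiwanese'}}
```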
25. 25
Task-Oriented Dialogue System (Young, 2000)
Speech
Recognition
Language Understanding (LU)
• Domain Identification
• User Intent Detection
• Slot Filling
Dialogue Management (DM)
• Dialogue State Tracking (DST)
• Dialogue Policy
Natural Language
Generation (NLG)
Hypothesis
are there any action movies to
see this weekend
Semantic Frame
request_movie
genre=action, date=this weekend
System Action/Policy
request_location
Text response
Where are you located?
Text Input
Are there any action movies to see this weekend?
Speech Signal
Backend Action /
Knowledge Providers
26. 26
State Tracking
Requires Hand-Crafted States
User
Intelligent
Agent
find a good eating place for taiwanese food
(State lattice over which slots are filled: NULL → {location}, {rating}, {type} → {loc, rating}, {rating, type}, {loc, type} → all)
i want it near to my office
27. 27
State Tracking
Requires Hand-Crafted States
User
Intelligent
Agent
find a good eating place for taiwanese food
(State lattice over which slots are filled: NULL → {location}, {rating}, {type} → {loc, rating}, {rating, type}, {loc, type} → all)
i want it near to my office
28. 28
State Tracking
Handling Errors and Confidence
User
Intelligent
Agent
find a good eating place for taixxxx food
FIND_RESTAURANT
rating=“good”
type=“taiwanese”
FIND_RESTAURANT
rating=“good”
type=“thai”
FIND_RESTAURANT
rating=“good”
(State lattice over which slots are filled: NULL → {location}, {rating}, {type} → {loc, rating}, {rating, type}, {loc, type} → all)
? (the correct hypothesis and state are uncertain)
29. 29
Policy for Agent Action & Generation
Inform
“The nearest one is at Taipei 101”
Request
“Where is your home?”
Confirm
“Did you want Taiwanese food?”
Database Search
Task Completion / Information Display
ticket booked, weather information
Din Tai Fung
:
:
33. 33
Program for Solving Tasks
Task: predicting positive or negative given a product review
“I love this product!” (+)  “It’s a little expensive.” (?)  “It claims too much.” (−)
“First wave of release in Taiwan!” (upvote)  “I’ll only consider it after my downstairs neighbor buys one.” (?)  “The specs are underwhelming…” (downvote)
program.py
Some tasks are complex, and we don’t know how to write a program to solve
them.
if input contains “love”, “like”, etc.
output = positive
if input contains “too much”, “bad”, etc.
output = negative
34. 34
Learning ≈ Looking for a Function
Task: predicting positive or negative given a product review
“I love this product!” (+)  “It’s a little expensive.” (?)  “It claims too much.” (−)
“First wave of release in Taiwan!” (upvote)  “I’ll only consider it after my downstairs neighbor buys one.” (?)  “The specs are underwhelming…” (downvote)
Given a large amount of data, the machine learns what the function f should be.
35. 35
Learning ≈ Looking for a Function
Speech Recognition
Image Recognition
Go Playing
Chat Bot
f
f
f
f
cat
“Hello”
5-5 (next move)
“Where is Academia Sinica?” → “The address is…
Do you need navigation?”
36. 36
Framework
Image Recognition: f(image) = “cat”
Model = a set of candidate functions {f1, f2, …}
f1(image) = “cat”, f1(image′) = “dog”
f2(image) = “monkey”, f2(image′) = “snake”
37. 37
Framework
Image Recognition: f(image) = “cat”
Model = a set of candidate functions {f1, f2, …}
Training Data: images labeled “monkey”, “cat”, “dog”, …
Goodness of function f → pick the “best” function f*
Using f*: f*(new image) = “cat”
Training | Testing
Training is to pick the best function given the observed data
Testing is to predict the label using the learned function
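The “pick the best function” view can be sketched directly; the candidate functions and training data below are toy illustrations only:

```python
# Sketch of "training is picking the best function": enumerate a small
# candidate set and keep the one with the lowest loss on training data.
def zero_one_loss(f, data):
    return sum(f(x) != y for x, y in data)

candidates = [
    lambda s: "+" if "love" in s else "-",   # toy sentiment rules
    lambda s: "+" if "bad" not in s else "-",
]
train = [("I love this product!", "+"), ("It claims too much.", "-")]

# Training: pick f* with the smallest training loss.
f_star = min(candidates, key=lambda f: zero_one_loss(f, train))
# Testing: apply the learned f* to unseen input.
print(f_star("I love it"))  # +
```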
39. 39
Deep Learning
DL is a general purpose framework for representation learning
Given an objective
Learn representation that is required to achieve objective
Directly from raw inputs
Use minimal domain knowledge
(Network diagram: input vector x = (x1, x2, …, xN) → hidden layers → output vector y = (y1, y2, …, yM))
42. 42
A Layer of Neurons
Handwriting digit classification
f : R^N → R^M
A layer of neurons can handle multiple possible outputs; the result depends on the max one.
(Diagram: inputs x1, x2, …, xN feed 10 neurons, one per digit class; outputs y1, y2, y3, … indicate “1” or not, “2” or not, “3” or not; the class with the max score wins.)
43. 43
Deep Neural Networks (DNN)
Fully connected feedforward network
(Diagram: input vector x = (x1, x2, …, xN) → Layer 1 → Layer 2 → … → Layer L → output vector y = (y1, …, yM))
Deep NN: multiple hidden layers
f : R^N → R^M
44. 44
Loss / Objective
(Diagram: input x = (x1, x2, …, x256) → network → softmax outputs y1, y2, …, y10; for target “1” the one-hot target is (1, 0, 0, …); the loss 𝑙 measures how close the output is to the target.)
Loss can be squared error or cross entropy between the network
output and the target
A good function should make the loss of all examples as small as possible, given a set of parameters.
45. 45
Recurrent Neural Network (RNN)
http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
Activation: tanh or ReLU (unrolled over time)
RNN can learn accumulated sequential information (time-series)
46. 46
Model Training
All model parameters can be
updated by SGD
http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
(predicted outputs yt−1, yt, yt+1 are compared against targets)
47. 47
BPTT
Forward pass: compute the hidden states s1, s2, s3, s4, … and the outputs y1, y2, y3, y4
Backward pass: propagate the gradients of the per-step costs 𝐶(1), 𝐶(2), 𝐶(3), 𝐶(4) back through time
The model is trained by comparing the correct
sequence tags and the predicted ones
49. 49
Reinforcement Learning
RL is a general purpose framework for decision making
RL is for an agent with the capacity to act
Each action influences the agent’s future state
Success is measured by a scalar reward signal
Goal: select actions to maximize future reward
Big three: action, state, reward
52. Scenario of Reinforcement Learning
Agent learns to take actions to maximize expected reward.
Environment
Observation ot Action at
Reward rt
If win, reward = 1
If loss, reward = -1
Otherwise, reward = 0
Next Move
53. 53
Supervised vs. Reinforcement
Supervised
Reinforcement
“Hello” → Say “Hi”
“Bye bye” → Say “Good bye”
Agent: “Hello” → ……. ……. …… → critic: Bad
Learning from teacher
Learning from critics
54. 54
Supervised vs. Reinforcement
Supervised Learning
Training based on
supervisor/label/annotation
Feedback is instantaneous
Time does not matter
Reinforcement Learning
Training only based on reward
signal
Feedback is delayed
Time matters
Agent actions affect
subsequent data
55. 55
Reward
Reinforcement learning is based on reward hypothesis
A reward rt is a scalar feedback signal
Indicates how well agent is doing at step t
Reward hypothesis: all agent goals can be described by the maximization
of expected cumulative reward
56. 56
Sequential Decision Making
Goal: select actions to maximize total future reward
Actions may have long-term consequences
Reward may be delayed
It may be better to sacrifice immediate reward to gain more
long-term reward
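A toy discounted-return computation illustrates the trade-off; the discount factor and reward values are illustrative:

```python
# Sketch: with a discount factor, a small immediate reward can be worth
# sacrificing for a larger delayed one (numbers are illustrative).
def discounted_return(rewards, gamma=0.9):
    return sum(r * gamma ** t for t, r in enumerate(rewards))

greedy = discounted_return([1, 0, 0])   # take the immediate reward
patient = discounted_return([0, 0, 5])  # wait for the delayed reward
print(patient > greedy)  # True
```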
60. 60
Semantic Frame Representation
Requires a domain ontology
Contains core content (intent, a set of slots with fillers)
find a cheap taiwanese restaurant in oakland
show me action movies directed by james cameron
find_restaurant (price=“cheap”,
type=“taiwanese”, location=“oakland”)
find_movie (genre=“action”,
director=“james cameron”)
Restaurant Domain: restaurant (price, type, location)
Movie Domain: movie (year, genre, director)
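Such a frame can be sketched as a small data structure; the helper below is hypothetical, with slot names taken from the restaurant example:

```python
# A minimal sketch of a semantic frame: an intent plus slot-value pairs.
# Slot names follow the restaurant example on this slide.
def make_frame(intent, **slots):
    """Bundle an intent with its slot fillers."""
    return {"intent": intent, "slots": dict(slots)}

frame = make_frame("find_restaurant",
                   price="cheap", type="taiwanese", location="oakland")
print(frame["intent"])          # find_restaurant
print(frame["slots"]["price"])  # cheap
```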
62. LU – Domain/Intent Classification
• Given a collection of utterances ui with labels ci, D=
{(u1,c1),…,(un,cn)} where ci ∊ C, train a model to estimate labels
for new utterances uk.
Mainly viewed as an utterance classification task
find me a cheap taiwanese restaurant in oakland
Movies
Restaurants
Sports
Weather
Music
…
Find_movie
Buy_tickets
Find_restaurant
Book_table
Find_lyrics
…
63. 63
Language Understanding - Slot Filling
Is there um a cheap place in the centre of town please?
O O O O B-price O O O B-area I-area I-area O
Figure credited by Milica Gašić
As a sequence tagging task
•CRF for tagging each utterance
As a classification task
•SVM for each slot value pair
65. 65
Dialogue State Tracking (DST)
Maintain a probabilistic distribution instead of a 1-best
prediction for better robustness
Slide credited by Sungjin Lee
Incorrect
for both!
66. 66
Dialogue State Tracking (DST)
Maintain a probabilistic distribution instead of a 1-best
prediction for better robustness to SLU errors or
ambiguous input
How can I help you?
Book a table at Sumiko for 5
How many people?
3
Slot Value
# people 5 (0.5)
time 5 (0.5)
Slot Value
# people 3 (0.8)
time 5 (0.8)
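Keeping a distribution over slot values can be sketched as follows; the blending rule below is illustrative, not the tracker used on the slide:

```python
# A minimal sketch of tracking a distribution over slot values instead of
# a single 1-best hypothesis (the update rule is illustrative).
def update_belief(belief, value, confidence):
    """Blend the previous distribution with a new SLU hypothesis."""
    new = {v: p * (1 - confidence) for v, p in belief.items()}
    new[value] = new.get(value, 0.0) + confidence
    return new

people = {"5": 0.5}                        # "Book a table ... for 5" (ambiguous)
people = update_belief(people, "3", 0.8)   # user answers "3"
best = max(people, key=people.get)
print(best, round(people[best], 2))  # 3 0.8
```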
68. 68
Dialogue Policy Optimization
Dialogue management in a RL framework
User
Reward R, Observation O, Action A
Environment
Agent
Natural Language Generation Language Understanding
Dialogue Manager
Slides credited by Pei-Hao Su
Optimized dialogue policy selects the best action that can maximize the future reward.
Correct rewards are a crucial factor in dialogue policy training
69. 69
Reward for RL ≅ Evaluation for SDS
Dialogue is a special RL task
A human is involved in the interaction and in rating (evaluating) the
dialogue
Fully human-in-the-loop framework
Rating: correctness, appropriateness, and adequacy
- Expert rating: high quality, high cost
- User rating: unreliable quality, medium cost
- Objective rating: checks desired aspects, low cost
70. 70
Dialogue Reinforcement Signal
Typical Reward Function
Per-turn penalty: −1
Large reward at completion if successful
Typically requires domain knowledge
✔ Simulated user
✔ Paid users (Amazon Mechanical Turk)
✖ Real users
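The typical reward function above can be sketched as follows; the completion bonus of 20 is an illustrative number, not from the slides:

```python
# Sketch of the typical reward function: a -1 penalty per turn, plus a
# large completion reward on success (the bonus value is illustrative).
def dialogue_reward(num_turns, success, completion_bonus=20):
    return -1 * num_turns + (completion_bonus if success else 0)

print(dialogue_reward(5, True))    # 15
print(dialogue_reward(5, False))   # -5
```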
71. 71
User Simulation
User Simulation
Goal: generate natural and reasonable conversations to enable
reinforcement learning for exploring the policy space
Approach
Rule-based crafted by experts (Li et al., 2016)
Learning-based (Schatzmann et al., 2006)
Dialogue
Corpus
Simulated User
Real User
Dialogue Management (DM)
• Dialogue State Tracking (DST)
• Dialogue Policy
Interaction
73. 73
Natural Language Generation (NLG)
Mapping semantic frame into natural language
inform(name=Seven_Days, foodtype=Chinese)
Seven Days is a nice Chinese restaurant
74. 74
Template-Based NLG
Define a set of rules to map frames to NL
Pros: simple, error-free, easy to control
Cons: time-consuming, poor scalability
Semantic Frame Natural Language
confirm() “Please tell me more about the product you are
looking for.”
confirm(area=$V) “Do you want somewhere in the $V?”
confirm(food=$V) “Do you want a $V restaurant?”
confirm(food=$V,area=$W) “Do you want a $V restaurant in the $W?”
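The table above maps directly to code; a minimal sketch using Python string templates (the template keys are illustrative):

```python
# Sketch of template-based NLG: each dialogue act maps to a template whose
# $-placeholders are filled from the frame (templates follow the table above).
from string import Template

templates = {
    "confirm(food,area)": Template("Do you want a $V restaurant in the $W?"),
    "confirm(food)": Template("Do you want a $V restaurant?"),
}

def generate(act, **slots):
    return templates[act].substitute(**slots)

print(generate("confirm(food,area)", V="taiwanese", W="centre"))
# Do you want a taiwanese restaurant in the centre?
```

Simple and error-free, as the slide notes, but every new act or slot combination needs a hand-written template.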
76. 76
Concluding Remarks
Modular dialogue system
Speech
Recognition
Language Understanding (LU)
• Domain Identification
• User Intent Detection
• Slot Filling
Dialogue Management (DM)
• Dialogue State Tracking (DST)
• Dialogue Policy
Natural Language
Generation (NLG)
Hypothesis
are there any action movies to
see this weekend
Semantic Frame
request_movie
genre=action, date=this weekend
System Action/Policy
request_location
Text response
Where are you located?
Text Input
Are there any action movies to see this weekend?
Speech Signal
79. 79
Task-Oriented Dialogue System (Young, 2000)
Speech
Recognition
Language Understanding (LU)
• Domain Identification
• User Intent Detection
• Slot Filling
Dialogue Management (DM)
• Dialogue State Tracking (DST)
• Dialogue Policy
Natural Language
Generation (NLG)
Hypothesis
are there any action movies to
see this weekend
Semantic Frame
request_movie
genre=action, date=this weekend
System Action/Policy
request_location
Text response
Where are you located?
Text Input
Are there any action movies to see this weekend?
Speech Signal
Backend Database/
Knowledge Providers
http://rsta.royalsocietypublishing.org/content/358/1769/1389.short
80. 80
Task-Oriented Dialogue System (Young, 2000)
Speech
Recognition
Language Understanding (LU)
• Domain Identification
• User Intent Detection
• Slot Filling
Dialogue Management (DM)
• Dialogue State Tracking (DST)
• Dialogue Policy
Natural Language
Generation (NLG)
Hypothesis
are there any action movies to
see this weekend
Semantic Frame
request_movie
genre=action, date=this weekend
System Action/Policy
request_location
Text response
Where are you located?
Text Input
Are there any action movies to see this weekend?
Speech Signal
Backend Action /
Knowledge Providers
http://rsta.royalsocietypublishing.org/content/358/1769/1389.short
81. 81
Semantic Frame Representation
Requires a domain ontology: early connection to backend
Contains core concept (intent, a set of slots with fillers)
find me a cheap taiwanese restaurant in oakland
show me action movies directed by james cameron
find_restaurant (price=“cheap”,
type=“taiwanese”, location=“oakland”)
find_movie (genre=“action”,
director=“james cameron”)
Restaurant Domain: restaurant (price, type, location)
Movie Domain: movie (year, genre, director)
82. 82
Backend Database / Ontology
Domain-specific table
Target and attributes
Movie Name | Theater | Rating | Date | Time
Beauty and the Beast | Vieshow Taipei Xinyi | 8.5 | 2017/03/21 | 09:00
Beauty and the Beast | Vieshow Taipei Xinyi | 8.5 | 2017/03/21 | 09:25
Beauty and the Beast | Vieshow Taipei Xinyi | 8.5 | 2017/03/21 | 10:15
Beauty and the Beast | Vieshow Taipei Xinyi | 8.5 | 2017/03/21 | 10:40
Beauty and the Beast | Vieshow Taipei Xinyi | 8.5 | 2017/03/21 | 11:05
Ontology: movie (name, theater, rating, date, time)
83. 83
Supported Functionality
Information access
Finding the specific entries from the table
E.g. available theater, movie rating, etc.
Task completion
Finding the row that satisfies the constraints
Movie Name | Theater | Rating | Date | Time
Beauty and the Beast | Vieshow Taipei Xinyi | 8.5 | 2017/03/21 | 09:00
Beauty and the Beast | Vieshow Taipei Xinyi | 8.5 | 2017/03/21 | 09:25
Beauty and the Beast | Vieshow Taipei Xinyi | 8.5 | 2017/03/21 | 10:15
Beauty and the Beast | Vieshow Taipei Xinyi | 8.5 | 2017/03/21 | 10:40
Beauty and the Beast | Vieshow Taipei Xinyi | 8.5 | 2017/03/21 | 11:05
84. 84
Dialogue Schema
Slot: domain-specific attributes
Columns from the table
e.g. theater, date
Movie Name | Theater | Rating | Date | Time
Beauty and the Beast | Vieshow Taipei Xinyi | 8.5 | 2017/03/21 | 09:00
Beauty and the Beast | Vieshow Taipei Xinyi | 8.5 | 2017/03/21 | 09:25
Beauty and the Beast | Vieshow Taipei Xinyi | 8.5 | 2017/03/21 | 10:15
Beauty and the Beast | Vieshow Taipei Xinyi | 8.5 | 2017/03/21 | 10:40
Beauty and the Beast | Vieshow Taipei Xinyi | 8.5 | 2017/03/21 | 11:05
85. 85
Dialogue Schema
Dialogue Act
inform, request, confirm (system only)
Task-specific action (e.g. book_ticket)
Others (e.g. thanks)
User: “Find me Bill Murray’s movie.” → request(moviename; actor=Bill Murray)
Bot: “When was it released?” → request(releaseyear)
User: “I think it came out in 1993.” → inform(releaseyear=1993)
User Intent
= Dialogue Act + Slot
System Action
= Dialogue Act + Slot
92. 92
Extension
Multiple tables as backend
Each table is responsible for a domain
Associate intent with corresponding table/domain
Ticket Table:
Movie Name | Theater | Date | Time
Beauty and the Beast | Vieshow Taipei Xinyi | 2017/03/21 | 09:00
Beauty and the Beast | Vieshow Taipei Xinyi | 2017/03/21 | 09:25
Beauty and the Beast | Vieshow Taipei Xinyi | 2017/03/21 | 10:15
Knowledge Table:
Movie Name | Cast | Release Year
Beauty and the Beast | Emma Watson | 2017
Iron Man | : | :
“Do you know what year Beauty and the Beast came out?”
request(kng.release_year; moviename=Beauty and the Beast)
93. 93
Concluding Remarks
Semantic schema highly corresponds to backend DB
Movie Name | Theater | Rating | Date | Time
Beauty and the Beast | Vieshow Taipei Xinyi | 8.5 | 2017/03/21 | 09:00
Beauty and the Beast | Vieshow Taipei Xinyi | 8.5 | 2017/03/21 | 09:25
Beauty and the Beast | Vieshow Taipei Xinyi | 8.5 | 2017/03/21 | 10:15
Beauty and the Beast | Vieshow Taipei Xinyi | 8.5 | 2017/03/21 | 10:40
Beauty and the Beast | Vieshow Taipei Xinyi | 8.5 | 2017/03/21 | 11:05
slot
show me the rating of beauty and the beast
find_rating (movie=“beauty and the beast”)
Ontology: movie (rating, date, time)
95. 95
Task-Oriented Dialogue System (Young, 2000)
95
Speech
Recognition
Language Understanding (LU)
• Domain Identification
• User Intent Detection
• Slot Filling
Dialogue Management (DM)
• Dialogue State Tracking (DST)
• Dialogue Policy
Natural Language
Generation (NLG)
Hypothesis
are there any action movies to
see this weekend
Semantic Frame
request_movie
genre=action, date=this weekend
System Action/Policy
request_location
Text response
Where are you located?
Text Input
Are there any action movies to see this weekend?
Speech Signal
Backend Database/
Knowledge Providers
http://rsta.royalsocietypublishing.org/content/358/1769/1389.short
96. 96
Task-Oriented Dialogue System (Young, 2000)
96
Speech
Recognition
Language Understanding (LU)
• Domain Identification
• User Intent Detection
• Slot Filling
Dialogue Management (DM)
• Dialogue State Tracking (DST)
• Dialogue Policy
Natural Language
Generation (NLG)
Hypothesis
are there any action movies to
see this weekend
Semantic Frame
request_movie
genre=action, date=this weekend
System Action/Policy
request_location
Text response
Where are you located?
Text Input
Are there any action movies to see this weekend?
Speech Signal
Backend Action /
Knowledge Providers
http://rsta.royalsocietypublishing.org/content/358/1769/1389.short
99. LU – Domain/Intent Classification
As an utterance
classification
task
• Given a collection of utterances ui with labels ci,
D= {(u1,c1),…,(un,cn)} where ci ∊ C, train a
model to estimate labels for new utterances uk.
find me a cheap taiwanese restaurant in oakland
Movies
Restaurants
Music
Sports
…
find_movie, buy_tickets
find_restaurant, find_price, book_table
find_lyrics, find_singer
…
Domain Intent
101. 101
Theory: Support Vector Machine
SVM is a maximum margin classifier
Input data points are mapped into a high dimensional
feature space where the data is linearly separable
Support vectors are input data points that lie on the
margin
http://www.csie.ntu.edu.tw/~htlin/mooc/
102. 102
Theory: Support Vector Machine
Multiclass SVM
Extended using one-versus-rest approach
Then transform into probability
http://www.csie.ntu.edu.tw/~htlin/mooc/
SVM1, SVM2, SVM3, …, SVMk → scores S1, S2, S3, …, Sk (a score for each class) → P1, P2, P3, …, Pk (a probability for each class)
Domain/intent can be decided based on the estimated scores
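The one-versus-rest scheme can be sketched as follows; the linear scorers below are toy stand-ins for trained SVMs, and the softmax turns the per-class scores into probabilities:

```python
import math

# Sketch of one-versus-rest classification: k per-class scorers produce
# scores S1..Sk, which a softmax converts into class probabilities.
# The hand-set weights are toy stand-ins for trained SVMs.
def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def score(weights, features):
    return sum(weights.get(f, 0.0) for f in features)

per_class_weights = {
    "Restaurants": {"restaurant": 2.0, "cheap": 0.5},
    "Movies": {"movie": 2.0, "action": 0.5},
}
utterance = "find me a cheap taiwanese restaurant in oakland".split()
classes = list(per_class_weights)
probs = softmax([score(per_class_weights[c], utterance) for c in classes])
print(classes[probs.index(max(probs))])  # Restaurants
```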
103. LU – Slot Filling
flights from Boston to New York today
Entity Tag: O O B-city O B-city I-city O
Slot Tag: O O B-dept O B-arrival I-arrival B-date
As a sequence tagging task
• Given a collection of tagged word sequences,
S={((w1,1,w1,2,…,w1,n1), (t1,1,t1,2,…,t1,n1)),
((w2,1,w2,2,…,w2,n2), (t2,1,t2,2,…,t2,n2)), …}
where ti ∊ M, the goal is to estimate tags for a new word sequence.
105. 105
Theory: Conditional Random Fields
CRF assumes that the label at time step t depends on
the label in the previous time step t-1
Maximize the log probability log p(y | x) with respect
to parameters λ
Slots can be tagged based on the y that maximizes p(y|x)
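Decoding the tag sequence y that maximizes the score is typically done with the Viterbi algorithm; below is a minimal sketch with toy emission/transition weights, not a trained CRF:

```python
# Minimal Viterbi sketch: find the tag sequence with the highest total
# score; the emission/transition weights are toy numbers, not a trained CRF.
def viterbi(words, tags, emit, trans):
    best = {t: emit.get((words[0], t), -1.0) for t in tags}
    path = {t: [t] for t in tags}
    for w in words[1:]:
        new_best, new_path = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: best[p] + trans.get((p, t), -1.0))
            new_best[t] = best[prev] + trans.get((prev, t), -1.0) + emit.get((w, t), -1.0)
            new_path[t] = path[prev] + [t]
        best, path = new_best, new_path
    return path[max(tags, key=best.get)]

tags = ["O", "B-price"]
emit = {("cheap", "B-price"): 2.0, ("a", "O"): 1.0, ("place", "O"): 1.0}
trans = {("O", "O"): 0.5, ("O", "B-price"): 0.5, ("B-price", "O"): 0.5}
print(viterbi(["a", "cheap", "place"], tags, emit, trans))
# ['O', 'B-price', 'O']
```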
107. 107
Recurrent Neural Network (RNN)
http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
Activation: tanh or ReLU (unrolled over time)
RNN can learn accumulated sequential information (time-series)
108. 108
Deep Learning Approach
Data: dialogue utterances annotated with semantic frames (user intents & slots)
Model: deep learning model (classification/tagging), e.g., recurrent neural networks (RNN)
Prediction: user intents, slots and their values
109. 109
Classification Model
Input: each utterance ui is represented as a feature vector fi
Output: a domain/intent label ci for each input utterance
As an utterance
classification
task
• Given a collection of utterances ui with labels ci,
D= {(u1,c1),…,(un,cn)} where ci ∊ C, train a
model to estimate labels for new utterances uk.
How to represent a sentence using a feature vector
110. Sequence Tagging Model
As a sequence
tagging task
• Given a collection of tagged word sequences,
S={((w1,1,w1,2,…, w1,n1), (t1,1,t1,2,…,t1,n1)),
((w2,1,w2,2,…,w2,n2), (t2,1,t2,2,…,t2,n2)) …}
where ti ∊ M, the goal is to estimate tags for a new word
sequence.
Input: each word wi,j is represented as a feature vector fi,j
Output: a slot label ti for each word in the utterance
How to represent a word using a feature vector
111. 111
Word Representation
Atomic symbols: one-hot representation
car = [0 0 0 0 0 0 1 0 0 … 0]
motorcycle = [0 0 1 0 0 0 0 0 0 … 0]
car AND motorcycle = 0
Issue: difficult to compute similarity (e.g., comparing “car” and “motorcycle”)
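The issue is easy to see in code; a toy four-word vocabulary suffices:

```python
# Sketch of one-hot word vectors: the dot product of any two distinct words
# is zero, so similarity between "car" and "motorcycle" cannot be measured.
vocab = ["a", "car", "motorcycle", "the"]

def one_hot(word):
    return [1 if w == word else 0 for w in vocab]

dot = sum(a * b for a, b in zip(one_hot("car"), one_hot("motorcycle")))
print(dot)  # 0
```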
113. 113
Chinese Input Unit of Representation
Character
Feed each char to each time step
Word
Word segmentation required
你知道美女與野獸電影的評價如何嗎? (“Do you know how the movie Beauty and the Beast is rated?”)
你/知道/美女與野獸/電影/的/評價/如何/嗎 (after word segmentation)
Can two types of information fuse together for better performance?
114. LU – Domain/Intent Classification
As an utterance
classification
task
• Given a collection of utterances ui with labels ci,
D= {(u1,c1),…,(un,cn)} where ci ∊ C, train a
model to estimate labels for new utterances uk.
find me a cheap taiwanese restaurant in oakland
Movies
Restaurants
Music
Sports
…
find_movie, buy_tickets
find_restaurant, find_price, book_table
find_lyrics, find_singer
…
Domain Intent
115. 115
Deep Neural Networks for Domain/Intent
Classification (Ravuri and Stolcke, 2015)
RNN and LSTMs for
utterance classification
Word hashing to deal with
large number of singletons
Kat: #Ka, Kat, at#
Each character n-gram is
associated with a bit in the
input encoding
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/RNNLM_addressee.pdf
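The word-hashing step can be sketched as follows, using the boundary marker # as on the slide:

```python
# Sketch of the word-hashing trick above: break a word into character
# trigrams with boundary markers, e.g. Kat -> #Ka, Kat, at#.
def char_trigrams(word):
    padded = "#" + word + "#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

print(char_trigrams("Kat"))  # ['#Ka', 'Kat', 'at#']
```

Each trigram then sets one bit in the input encoding, so rare words and singletons still share sub-word features with known words.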
116. LU – Slot Filling
flights from Boston to New York today
Entity Tag: O O B-city O B-city I-city O
Slot Tag: O O B-dept O B-arrival I-arrival B-date
As a sequence tagging task
• Given a collection of tagged word sequences,
S={((w1,1,w1,2,…,w1,n1), (t1,1,t1,2,…,t1,n1)),
((w2,1,w2,2,…,w2,n2), (t2,1,t2,2,…,t2,n2)), …}
where ti ∊ M, the goal is to estimate tags for a new word sequence.
118. 118
Recurrent Neural Nets for Slot Tagging – II
(Kurata et al., 2016; Simonnet et al., 2015)
Encoder-decoder networks
Leverages sentence level
information
Attention-based encoder-
decoder
Use of attention (as in MT)
in the encoder-decoder
network
Attention is estimated using
a feed-forward network with
input: ht and st at time t
(Figure: a plain encoder-decoder reads w0 … wn into hidden states h0 … hn and emits y0 … yn; the attention-based variant adds decoder states s0 … sn and a context vector ci computed over h0 … hn.)
http://www.aclweb.org/anthology/D16-1223
(Figure: a single RNN reads “taiwanese food please <EOS>”, emitting the slot tags B-type O O step by step and the intent FIND_REST at the final step; slot filling and intent prediction share one output sequence.)
Joint Semantic Frame Parsing
Sequence-
based
(Hakkani-Tur
et al., 2016)
• Slot filling and
intent prediction
in the same
output sequence
Parallel
(Liu and
Lane, 2016)
• Intent prediction
and slot filling
are performed
in two branches
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/06/IS16_MultiJoint.pdf; https://arxiv.org/abs/1609.01454
120. 120
LU Evaluation
Metrics
Sub-sentence-level: intent accuracy, slot F1
Sentence-level: whole frame accuracy
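Both metric levels can be sketched as follows; the gold/predicted frames are illustrative examples:

```python
# Sketch of the two metric levels above: slot F1 (sub-sentence level) and
# whole frame accuracy (sentence level); the frames are illustrative.
def slot_f1(gold, pred):
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)

gold = {("type", "taiwanese"), ("rating", "good")}
pred = {("type", "taiwanese")}
frame_ok = gold == pred              # whole frame must match exactly
print(round(slot_f1(gold, pred), 2), frame_ok)  # 0.67 False
```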
121. 121
Speech
Recognition
Language Understanding (LU)
• Domain Identification
• User Intent Detection
• Slot Filling
Dialogue Management (DM)
• Dialogue State Tracking (DST)
• Dialogue Policy
Natural Language
Generation (NLG)
Hypothesis
are there any action movies to
see this weekend
Semantic Frame
request_movie
genre=action, date=this weekend
System Action/Policy
request_location
Text response
Where are you located?
Text Input
Are there any action movies to see this weekend?
Speech Signal
Backend Database/
Knowledge Providers
Concluding Remarks
122. 122
Language Understanding Demo
Movie Bot
http://140.112.21.82:8000/
Example
晚上九點的當他們認真編織時 (“Close-Knit at 9 pm”)
我想要看晚上七點的電影 (“I want to see a movie at 7 pm”)
我想要看晚上七點在新竹大遠百威秀影城的電影 (“I want to see a 7 pm movie at Vieshow Cinemas Hsinchu Big City”)
我要看新竹大遠百威秀影城的刀劍神域劇場版序列爭戰 (“I want to see Sword Art Online the Movie: Ordinal Scale at Vieshow Cinemas Hsinchu Big City”)
我想要看晚上七點在新竹大遠百威秀影城的刀劍神域劇場版序列爭戰 (“I want to see the 7 pm showing of Sword Art Online the Movie: Ordinal Scale at Vieshow Cinemas Hsinchu Big City”)
125. 125
Task-Oriented Dialogue System (Young, 2000)
Speech
Recognition
Language Understanding (LU)
• Domain Identification
• User Intent Detection
• Slot Filling
Dialogue Management (DM)
• Dialogue State Tracking (DST)
• Dialogue Policy
Natural Language
Generation (NLG)
Hypothesis
are there any action movies to
see this weekend
Semantic Frame
request_movie
genre=action, date=this weekend
System Action/Policy
request_location
Text response
Where are you located?
Text Input
Are there any action movies to see this weekend?
Speech Signal
Backend Database/
Knowledge Providers
http://rsta.royalsocietypublishing.org/content/358/1769/1389.short
126. 126
Task-Oriented Dialogue System (Young, 2000)
Speech
Recognition
Language Understanding (LU)
• Domain Identification
• User Intent Detection
• Slot Filling
Natural Language
Generation (NLG)
Hypothesis
are there any action movies to
see this weekend
Semantic Frame
request_movie
genre=action, date=this weekend
System Action/Policy
request_location
Text response
Where are you located?
Text Input
Are there any action movies to see this weekend?
Speech Signal
Backend Action /
Knowledge Providers
http://rsta.royalsocietypublishing.org/content/358/1769/1389.short
Dialogue Management (DM)
• Dialogue State Tracking (DST)
• Dialogue Policy
130. 130
Dialogue State Tracking (DST)
Maintain a probabilistic distribution instead of a 1-best
prediction for better robustness to recognition errors
Incorrect
for both!
131. 131
Dialogue State Tracking (DST)
Maintain a probabilistic distribution instead of a 1-best
prediction for better robustness to SLU errors or
ambiguous input
How can I help you?
Book a table at Sumiko for 5
How many people?
3
Slot Value
# people 5 (0.5)
time 5 (0.5)
Slot Value
# people 3 (0.8)
time 5 (0.8)
135. 135
Dialogue State Tracking (DST)
Definition
Representation of the system's belief of the user's goal(s)
at any time during the dialogue
Challenge
How to define the state space?
How to tractably maintain the dialogue state?
Which actions to take for each state?
Define dialogue as a control problem where the behavior can be
automatically learned
136. 136
Reinforcement Learning
RL is a general purpose framework for decision making
RL is for an agent with the capacity to act
Each action influences the agent’s future state
Success is measured by a scalar reward signal
Goal: select actions to maximize future reward
Big three: action, state, reward
137. 137
Agent and Environment
At time step t
The agent
Executes action at
Receives observation ot
Receives scalar reward rt
The environment
Receives action at
Emits observation ot+1
Emits scalar reward rt+1
t increments at env. step
(Agent-environment loop: at each step the agent receives observation ot and reward rt and emits action at)
138. 138
State
Experience is the sequence of observations, actions, rewards
State is the information used to determine what happens next
what happens next depends on the history of experience
• The agent selects actions
• The environment selects observations/rewards
The state is a function of the history of experience
139. 139
Environment State
The environment state s_t^e is the environment’s private representation
whatever data the environment uses to pick the next observation/reward
may not be visible to the agent
may contain irrelevant information
140. 140
Agent State
The agent state s_t^a is the agent’s internal representation
whatever data the agent uses to pick the next action
information used by RL algorithms
can be any function of experience
141. 141
Reward
Reinforcement learning is based on reward hypothesis
A reward rt is a scalar feedback signal
Indicates how well agent is doing at step t
Reward hypothesis: all agent goals can be described by the maximization
of expected cumulative reward
143. 143
Dialogue State Tracking (DST)
Requirement
Dialogue history
Keep track of what happened so far in the dialogue
Normally done via Markov property
Task-oriented dialogue
Need to know what the user wants
Modeled via the user goal
Robustness to errors
Need to know what the user says
Modeled via the user action
143
144. 144
DST Problem Formulation
The DST dataset consists of
Goal: for each informable slot
e.g. price=cheap
Requested: slots by the user
e.g. moviename
Method: search method for entities
e.g. by constraints, by name
The dialogue state is
the distribution over possible slot-value pairs for goals
the distribution over possible requested slots
the distribution over possible methods
144
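A minimal sketch of such a state, with illustrative slot and method names (not from a real DSTC dataset):

```python
# A DSTC-style dialogue state: distributions over goals, requested slots,
# and search methods. All slot/value names here are illustrative.
dialogue_state = {
    "goal": {                       # per informable slot, a value distribution
        "price": {"cheap": 0.8, "moderate": 0.15, "expensive": 0.05},
    },
    "requested": {"moviename": 0.9, "rating": 0.1},   # slots the user asked for
    "method": {"by_constraints": 0.7, "by_name": 0.3},
}

def top_hypothesis(dist):
    """Return the most probable value in a distribution."""
    return max(dist, key=dist.get)

best_price = top_hypothesis(dialogue_state["goal"]["price"])
```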
149. 149
RNN DST
Idea: internal memory for
representing dialogue context
Input
most recent dialogue turn
last machine dialogue act
dialogue state
memory layer
Output
update its internal memory
distribution over slot values
149
150. 150
RNN-CNN DST
(Figure from Wen et al., 2016)
http://www.anthology.aclweb.org/W/W13/W13-4073.pdf; https://arxiv.org/abs/1506.07190
151. 151
Multichannel Tracker (Shi et al., 2016)
151
Training a multichannel CNN
for each slot
Chinese character CNN
Chinese word CNN
English word CNN
https://arxiv.org/abs/1701.06247
152. 152
DST Evaluation
Metric
Tracked state accuracy with respect to the user goal
L2-norm between the hypothesized distribution and the true label
Recall/Precision/F-measure of individual slots
152
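The two main metrics can be sketched as follows for a single slot; the distribution is illustrative:

```python
import math

def state_accuracy(hyp, true_label):
    """Accuracy: 1 if the top hypothesis equals the labelled value."""
    return 1.0 if max(hyp, key=hyp.get) == true_label else 0.0

def l2_error(hyp, true_label):
    """L2-norm between the hypothesized distribution and the one-hot label."""
    return math.sqrt(sum((p - (1.0 if v == true_label else 0.0)) ** 2
                         for v, p in hyp.items()))

hyp = {"cheap": 0.8, "expensive": 0.2}
acc = state_accuracy(hyp, "cheap")
err = l2_error(hyp, "cheap")
```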
153. 153
Dialog State Tracking Challenge (DSTC)
(Williams et al., 2013; Henderson et al., 2014a; 2014b; Kim et al., 2016a; 2016b)
Challenge | Type | Domain | Data Provider | Main Theme
DSTC1 | Human-Machine | Bus Route | CMU | Evaluation Metrics
DSTC2 | Human-Machine | Restaurant | U. Cambridge | User Goal Changes
DSTC3 | Human-Machine | Tourist Information | U. Cambridge | Domain Adaptation
DSTC4 | Human-Human | Tourist Information | I2R | Human Conversation
DSTC5 | Human-Human | Tourist Information | I2R | Language Adaptation
155. 155
DSTC4-5
Type: Human-Human
Domain: Tourist Information
155
Tourist: Can you give me some uh- tell me some cheap rate hotels,
because I'm planning just to leave my bags there and go
somewhere take some pictures.
Guide: Okay. I'm going to recommend firstly you want to have a
backpack type of hotel, right?
Tourist: Yes. I'm just gonna bring my backpack and my buddy with me. So
I'm kinda looking for a hotel that is not that expensive. Just
gonna leave our things there and, you know, stay out the whole
day.
Guide: Okay. Let me get you hm hm. So you don't mind if it's a bit uh
not so roomy like hotel because you just back to sleep.
Tourist: Yes. Yes. As we just gonna put our things there and then go out
to take some pictures.
Guide: Okay, um-
Tourist: Hm.
Guide: Let's try this one, okay?
Tourist: Okay.
Guide: It's InnCrowd Backpackers Hostel in Singapore. If you
take a dorm bed per person only twenty dollars. If you
take a room, it's two single beds at fifty nine dollars.
Tourist: Um. Wow, that's good.
Guide: Yah, the prices are based on per person per bed or
dorm. But this one is room. So it should be fifty nine for
the two room. So you're actually paying about ten
dollars more per person only.
Tourist: Oh okay. That's- the price is reasonable actually. It's
good.
{Topic: Accommodation; Type: Hostel; Pricerange:
Cheap; GuideAct: ACK; TouristAct: REQ}
{Topic: Accommodation; NAME: InnCrowd Backpackers
Hostel; GuideAct: REC; TouristAct: ACK}
156. 156
Concluding Remarks
Dialogue state tracking (DST) in the DM makes a Markov assumption to
model the user goal and to stay robust to errors
156
158. 158
Task-Oriented Dialogue System (Young, 2000)
158
Speech
Recognition
Language Understanding (LU)
• Domain Identification
• User Intent Detection
• Slot Filling
Dialogue Management (DM)
• Dialogue State Tracking (DST)
• Dialogue Policy
Natural Language
Generation (NLG)
Hypothesis
are there any action movies to
see this weekend
Semantic Frame
request_movie
genre=action, date=this weekend
System Action/Policy
request_location
Text response
Where are you located?
Text Input
Are there any action movies to see this weekend?
Speech Signal
Backend Database/
Knowledge Providers
http://rsta.royalsocietypublishing.org/content/358/1769/1389.short
159. 159
Task-Oriented Dialogue System (Young, 2000)
159
Speech
Recognition
Language Understanding (LU)
• Domain Identification
• User Intent Detection
• Slot Filling
Natural Language
Generation (NLG)
Hypothesis
are there any action movies to
see this weekend
Semantic Frame
request_movie
genre=action, date=this weekend
System Action/Policy
request_location
Text response
Where are you located?
Text Input
Are there any action movies to see this weekend?
Speech Signal
Backend Action /
Knowledge Providers
http://rsta.royalsocietypublishing.org/content/358/1769/1389.short
Dialogue Management (DM)
• Dialogue State Tracking (DST)
• Dialogue Policy
164. 164
Reinforcement Learning
RL is a general purpose framework for decision making
RL is for an agent with the capacity to act
Each action influences the agent’s future state
Success is measured by a scalar reward signal
Goal: select actions to maximize future reward
Big three: action, state, reward
165. 165
State
Experience is the sequence of observations, actions, rewards
State is the information used to determine what happens next
what happens depends on the history experience
• The agent selects actions
• The environment selects observations/rewards
The state is the function of the history experience
167. 167
Dialogue Policy Optimization
Dialogue management in a RL framework
167
(Figure: the dialogue manager is the agent; the user, reached through
language understanding and natural language generation, is the
environment; they exchange Action A, Observation O, and Reward R)
The optimized dialogue policy selects the best action that maximizes the future reward.
Correct rewards are a crucial factor in dialogue policy training
168. 168
Policy Optimization Issue
Optimization problem size
Belief dialogue state space is large and continuous
System action space is large
Knowledge environment (user)
Transition probability is unknown (user status)
How to get rewards
168
169. 169
Reinforcement Learning for Dialogue
Policy Optimization
169
(Figure: language understanding maps user input o to state 𝑠; the dialogue
policy selects 𝑎 = 𝜋(𝑠); language (response) generation produces the
response; collect rewards (𝑠, 𝑎, 𝑟, 𝑠’) and optimize 𝑄(𝑠, 𝑎))
Type of Bots | State | Action | Reward
Social ChatBots | Chat history | System Response | # of turns maximized; intrinsically motivated reward
InfoBots (interactive Q/A) | User current question + Context | Answers to current question | Relevance of answer; # of turns minimized
Task-Completion Bots | User current input + Context | System dialogue act w/ slot value (or API calls) | Task success rate; # of turns minimized
Goal: develop a generic deep RL algorithm to learn dialogue policy for all bot categories
170. 170
Large Belief Space and Action Space
Solution: perform optimization in a reduced
summary space built according to the heuristics
170
Belief Dialogue
State
System Action
Summary
Dialogue State
Summary Action
Summary Policy
Summary
Function
Master
Function
171. 171
Transition Probability and Rewards
Solution: learn from real users
Speech
Recognition
Language Understanding (LU)
• Domain Identification
• User Intent Detection
• Slot Filling
Dialogue Management (DM)
• Dialogue State Tracking (DST)
• Dialogue Policy
Natural Language
Generation (NLG)
Backend Database/
Knowledge Providers
172. 172
Transition Probability and Rewards
Solution: learn from a simulated user
172
Error Model
• Recognition error
• LU error
Dialogue State
Tracking (DST)
System dialogue acts
Reward
Backend Action /
Knowledge Providers
Dialogue Policy
Optimization
Dialogue Management (DM)
User Model
Reward Model
User Simulation Distribution over
user dialogue acts
(semantic frames)
173. 173
Reward for RL ≅ Evaluation for System
Dialogue is a special RL task
Human involves in interaction and rating (evaluation) of a
dialogue
Fully human-in-the-loop framework
Rating: correctness, appropriateness, and adequacy
- Expert rating: high quality, high cost
- User rating: unreliable quality, medium cost
- Objective rating: checks desired aspects, low cost
173
174. 174
Dialogue Reinforcement Learning Signal
Typical reward function
-1 per-turn penalty
Large reward at completion if successful
Typically requires domain knowledge
✔ Simulated user
✔ Paid users (Amazon Mechanical Turk)
✖ Real users
174
The user simulator is usually required for
dialogue system training before deployment
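A sketch of this typical reward function; the penalty and bonus magnitudes are illustrative assumptions:

```python
def turn_reward(done, success, turn_penalty=-1,
                success_reward=20, fail_reward=-10):
    """Typical shaped dialogue reward: small per-turn penalty,
    large terminal reward on completion (magnitudes are assumptions)."""
    if not done:
        return turn_penalty
    return success_reward if success else fail_reward

# A 5-turn dialogue that ends in success
episode = [turn_reward(False, None)] * 4 + [turn_reward(True, True)]
total = sum(episode)
```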
175. 175
RL-Based Dialogue Management (Li et al., 2017)
Deep RL for training dialogue policy
Input: current semantic frame observation, database
returned results
Output: system action
175
Semantic Frame
request_movie
genre=action, date=this weekend
System Action/Policy
request_location
Dialogue
Management (DM)
Simulated/paid/real
User
Backend DB
https://arxiv.org/abs/1703.01008
176. 176
Major Components in an RL Agent
An RL agent may include one or more of these
components
Policy: agent’s behavior function
Value function: how good is each state and/or action
Model: agent’s representation of the environment
176
177. 177
Policy
A policy is the agent’s behavior
A policy maps from state to action
Deterministic policy: 𝑎 = 𝜋(𝑠)
Stochastic policy: 𝜋(𝑎|𝑠) = 𝑃(𝑎|𝑠)
177
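A minimal sketch of both kinds of policy over a toy Q-table; epsilon-greedy is used as one common example of a stochastic policy, and the states/actions are illustrative:

```python
import random

def deterministic_policy(q, state):
    """a = pi(s): always pick the greedy action for this state."""
    return max(q[state], key=q[state].get)

def stochastic_policy(q, state, epsilon=0.1, rng=random.Random(0)):
    """pi(a|s): epsilon-greedy distribution over the actions."""
    if rng.random() < epsilon:
        return rng.choice(list(q[state]))   # explore
    return deterministic_policy(q, state)   # exploit

q = {"s0": {"request_location": 1.2, "inform_results": 0.3}}
a = deterministic_policy(q, "s0")
```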
178. 178
Value Function
A value function is a prediction of future reward (with action a in state s)
Q-value function gives the expected total reward
from state s and action a, under policy π, with discount factor γ:
Qπ(s, a) = E[r_{t+1} + γ r_{t+2} + γ² r_{t+3} + … | s, a]
Value functions decompose into a Bellman equation:
Qπ(s, a) = E_{s', a'}[r + γ Qπ(s', a') | s, a]
178
179. 179
Optimal Value Function
An optimal value function Q*(s, a) = max_π Qπ(s, a)
is the maximum achievable value
allows us to act optimally: π*(s) = argmax_a Q*(s, a)
informally maximizes over all decisions
decomposes into a Bellman equation:
Q*(s, a) = E_{s'}[r + γ max_{a'} Q*(s', a') | s, a]
179
180. 180
Reinforcement Learning Approach
Policy-based RL
Search directly for optimal policy
Value-based RL
Estimate the optimal value function
Model-based RL
Build a model of the environment
Plan (e.g. by lookahead) using model
180
π* is the policy achieving maximum future reward
Q* is the maximum value achievable under any policy
182. 182
Maze Example: Policy
Rewards: -1 per time-step
Actions: N, E, S, W
States: agent’s location
182
Arrows represent policy π(s) for each state s
183. 183
Maze Example: Value Function
Rewards: -1 per time-step
Actions: N, E, S, W
States: agent’s location
183
Numbers represent value vπ(s) of each state s
184. 184
Value-Based Deep RL
Estimate how good each state and/or action is
185. 185
Value Function Approximation
Representing value functions with a lookup table breaks down when there are
too many states and/or actions to store
and it is too slow to learn the value of each entry individually
Values can be estimated with function approximation
185
186. 186
Q-Networks
Q-networks represent value functions with weights w: Q(s, a, w) ≈ Qπ(s, a)
generalize from seen states to unseen states
update the parameter w for function approximation
186
187. 187
Q-Learning
Goal: estimate optimal Q-values
Optimal Q-values obey a Bellman equation:
Q*(s, a) = E_{s'}[r + γ max_{a'} Q*(s', a') | s, a]
Value iteration algorithms solve the Bellman equation by updating
Q(s, a) toward the learning target r + γ max_{a'} Q(s', a')
187
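A tabular sketch of this update on a toy two-state chain (the chain, its rewards, and the random-exploration behavior policy are illustrative, not from the slides):

```python
import random
from collections import defaultdict

def q_learning(episodes=200, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning on a 2-state chain: s0 -> s1 -> terminal (+1)."""
    rng = random.Random(seed)
    q = defaultdict(float)                     # Q[(state, action)]
    for _ in range(episodes):
        s = 0
        while s < 2:                           # state 2 is terminal
            a = rng.choice(["left", "right"])  # random exploration
            s2 = s + 1 if a == "right" else 0
            r = 1.0 if s2 == 2 else 0.0
            if s2 < 2:                         # Bellman learning target
                target = r + gamma * max(q[(s2, b)] for b in ("left", "right"))
            else:
                target = r
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = s2
    return q

q = q_learning()
```

After training, Q(1, right) approaches 1.0 and Q(0, right) approaches γ · 1.0 = 0.9, exceeding Q(0, left).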
188. 188
Deep Q-Networks (DQN)
Represent the value function by a deep Q-network with weights w
Objective: minimize the mean squared error (MSE) loss by SGD
L(w) = E[(r + γ max_{a'} Q(s', a', w) − Q(s, a, w))²]
188
Issue: naïve Q-learning oscillates or diverges using NN due to:
1) correlations between samples 2) non-stationary targets
learning target
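The two standard remedies for these issues, experience replay and a frozen target network, can be sketched without any actual neural network (the plain dictionary below stands in for network weights):

```python
import random
from collections import deque

class ReplayBuffer:
    """Trick 1: sample minibatches uniformly to break sample correlations."""
    def __init__(self, capacity=1000):
        self.buf = deque(maxlen=capacity)
    def push(self, transition):
        self.buf.append(transition)
    def sample(self, k, rng=random.Random(0)):
        return rng.sample(list(self.buf), k)

def make_target(q_params):
    """Trick 2: a frozen copy of the network gives a stationary target."""
    return dict(q_params)          # stand-in for copying network weights

buffer = ReplayBuffer()
for t in range(50):
    buffer.push((t, "act", -1.0, t + 1))   # (s, a, r, s') transitions
batch = buffer.sample(4)

q_params = {"w": 0.3}
target_params = make_target(q_params)
q_params["w"] = 0.9                # the online net keeps training...
# ...but the frozen target still sees the old weights
```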
189. 189
Deep RL in Atari Games
189
Different tricks are performed to enable RL training
190. 190
DQN in Atari
Goal: end-to-end learning of values Q(s, a) from pixels
Input: state is stack of raw pixels from last 4 frames
Output: Q(s, a) for all joystick/button positions a
Reward is the score change for that step
DQN Nature Paper [link] [code]
192. DQN in Movie Bot
Goal: interact with the user for movie ticket booking
192
User goal: Two tickets for “the witch” tomorrow 9:30 PM at regal meridian 16, Seattle.
RULE BASED SYSTEM
User: Can I get 2 tickets for the witch?
Agent: Which theater do you want?
User: Regal meridian 16
Agent: What time would you like to see it?
User: 9:30 pm
Agent: What date would you like to see it?
User: Tomorrow!
Agent: How many people are you?
User: Just two adults.
Agent: What is your location?
User: Seattle!
Agent: What kind of video format do you like?
User: Oh, any kind is okay.
Agent: Do you have any price constraint?
User: No, I don’t care.
Agent: Great - I was able to purchase 2 tickets for
you to see the witch tomorrow at regal meridian 16
theater in Seattle at 9:30 pm.
User: Thank you.
Agent: Thank you!
REINFORCEMENT LEARNING SYSTEM
User: Can I get 2 tickets for the witch?
Agent: What date would you like to see it?
User: Tomorrow
Agent: Which theater do you prefer?
User: Regal meridian 16
Agent: What is your location?
User: Seattle
Agent: What time would you like to see it?
User: 9:30 pm.
Agent: Great - I was able to purchase 2 tickets
for you to see the witch tomorrow at regal
meridian 16 theater in Seattle at 9:30 pm.
User: Thanks.
Agent: Thanks!
https://arxiv.org/abs/1703.01008
194. 194
Deep Policy Networks
Represent policy by deep network with weights
Objective is to maximize total discounted reward by SGD
194
π(a|s; w) is a stochastic policy; a = π(s; w) is a deterministic policy
195. 195
Policy Gradient
The gradient of a stochastic policy is given by
The gradient of a deterministic policy is given by
195
How to deal with continuous actions
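A minimal stochastic policy-gradient (REINFORCE-style) sketch on a two-armed bandit with a sigmoid policy; the bandit setup and step size are illustrative assumptions:

```python
import math, random

def reinforce_step(theta, rng, lr=0.1):
    """One REINFORCE update: theta += lr * grad log pi(a) * reward.
    Arm 1 pays +1, arm 0 pays 0 (illustrative bandit)."""
    p1 = 1.0 / (1.0 + math.exp(-theta))       # pi(a=1), a sigmoid policy
    a = 1 if rng.random() < p1 else 0
    r = 1.0 if a == 1 else 0.0
    grad_logp = (1 - p1) if a == 1 else -p1   # d/dtheta log pi(a)
    return theta + lr * grad_logp * r

theta, rng = 0.0, random.Random(0)
for _ in range(500):
    theta = reinforce_step(theta, rng)
p1 = 1.0 / (1.0 + math.exp(-theta))           # probability of the good arm
```

The policy shifts its probability mass toward the rewarded arm.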
196. 196
Actor-Critic (Value-Based + Policy-Based)
Estimate value function
Update policy parameters by SGD
Stochastic policy
Deterministic policy
196
Q-networks tell whether a policy is good or not
Policy networks optimize the policy accordingly
197. 197
Deterministic Deep Policy Gradient
Goal: end-to-end learning of control policy from pixels
Input: state is stack of raw pixels from last 4 frames
Output: two separate CNNs for Q and 𝜋
Lillicrap et al., “Continuous control with deep reinforcement learning,” arXiv, 2015.
198. 198
Deep RL AI Examples
Play games: Atari, poker, Go, …
Control physical systems: manipulate, …
Explore worlds: 3D worlds, …
Interact with users: recommend, optimize, personalize, …
198
199. 199
Concluding Remarks
Dialogue policy optimization can be viewed as an RL task
Transition probability and reward come from
Real user
Simulated user
201. 201
Task-Oriented Dialogue System (Young, 2000)
201
Speech
Recognition
Language Understanding (LU)
• Domain Identification
• User Intent Detection
• Slot Filling
Dialogue Management (DM)
• Dialogue State Tracking (DST)
• Dialogue Policy
Natural Language
Generation (NLG)
Hypothesis
are there any action movies to
see this weekend
Semantic Frame
request_movie
genre=action, date=this weekend
System Action/Policy
request_location
Text response
Where are you located?
Text Input
Are there any action movies to see this weekend?
Speech Signal
Backend Database/
Knowledge Providers
http://rsta.royalsocietypublishing.org/content/358/1769/1389.short
202. 202
Task-Oriented Dialogue System (Young, 2000)
202
Speech
Recognition
Language Understanding (LU)
• Domain Identification
• User Intent Detection
• Slot Filling
Natural Language
Generation (NLG)
Hypothesis
are there any action movies to
see this weekend
Semantic Frame
request_movie
genre=action, date=this weekend
System Action/Policy
request_location
Text response
Where are you located?
Text Input
Are there any action movies to see this weekend?
Speech Signal
Backend Action /
Knowledge Providers
http://rsta.royalsocietypublishing.org/content/358/1769/1389.short
Dialogue Management (DM)
• Dialogue State Tracking (DST)
• Dialogue Policy
203. 203
User Simulation
Goal: generate natural and reasonable conversations to
enable reinforcement learning for exploring the policy space
Approach
Rule-based crafted by experts (Li et al., 2016)
Learning-based (Schatzmann et al., 2006; El Asri et al., 2016)
Dialogue
Corpus
Simulated User
Real User
Dialogue Management (DM)
• Dialogue State Tracking (DST)
• Dialogue Policy
Interaction
keeps a list of its
goals and actions
randomly
generates an
agenda
updates its list of goals
and adds new ones
204. 204
Elements of User Simulation
Error Model
• Recognition error
• LU error
Dialogue State
Tracking (DST)
System dialogue acts
Backend Action /
Knowledge Providers
Dialogue Policy
Optimization
Dialogue Management (DM)
User Model
Reward Model
User Simulation Distribution over
user dialogue acts
(semantic frames)
Reward
205. 205
Elements of User Simulation
Error Model
• Recognition error
• LU error
Dialogue State
Tracking (DST)
System dialogue acts
Reward
Backend Action /
Knowledge Providers
Dialogue Policy
Optimization
Dialogue Management (DM)
User Model
Reward Model
User Simulation Distribution over
user dialogue acts
(semantic frames)
The user model defines the natural interactive
flow to convey the user’s goal
206. 206
User Goal
Sampled from a real corpus / hand-crafted list
Information access
Inflexible: the user has specific entries for search
Task completion
Inflexible: the user has specific rows for search
Flexible: some slot values are unknown or can be suggested
206
request_ticket, inform_slots: {moviename=“A”, …}
request_slot: rating, inform_slots: {moviename=“A”, …}
request_ticket, request_slots: {time, theater},
inform_slots: {moviename=“A”, …}
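A sketch of goal sampling from a hand-crafted list; the goal frames mirror the examples above but are otherwise illustrative:

```python
import random

# Hypothetical hand-crafted goal list for a movie-ticket domain.
GOAL_TEMPLATES = [
    {"intent": "request_ticket",
     "request_slots": ["time", "theater"],
     "inform_slots": {"moviename": "A", "date": "this weekend"}},
    {"intent": "request_slot",
     "request_slots": ["rating"],
     "inform_slots": {"moviename": "A"}},
]

def sample_user_goal(rng=random.Random(0)):
    """Sample one predefined user goal at episode start, as a simulator would."""
    return dict(rng.choice(GOAL_TEMPLATES))

goal = sample_user_goal()
```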
207. 207
Supported Functionality of Ontology
Information access
Finding the specific entries from the table
Task completion
Finding the row that satisfies the constraints
207
Movie Name Theater Rating Date Time
美女與野獸 台北信義威秀 8.5 2017/03/21 09:00
美女與野獸 台北信義威秀 8.5 2017/03/21 09:25
美女與野獸 台北信義威秀 8.5 2017/03/21 10:15
美女與野獸 台北信義威秀 8.5 2017/03/21 10:40
美女與野獸 台北信義威秀 8.5 2017/03/21 11:05
208. 208
User Goal – Information Access
Information access with inflexible constraints
Finding the specific entries from the table
Intent examples in the ticket DB: request_moviename,
request_theater, request_time, etc
Intent examples in the knowledge DB:
request_moviename, request_year
Specifying the constraints for the search target
The specified slots should align with the corresponding DB
208
Movie Name Theater Date Time
美女與野獸 台北信義威秀 2017/03/21 09:00
美女與野獸 台北信義威秀 2017/03/21 09:25
美女與野獸 台北信義威秀 2017/03/21 10:15
Movie Name Cast Year
美女與野獸 Emma Watson 2017
鋼鐵人 : :
209. 209
User Goal – Task-Completion
Task completion with inflexible constraints
Finding the specific rows from the table
Intent examples: request_ticket (ticket DB)
Specifying the slot constraints for the search target
Informable slot: theater, date, etc
Requestable slot: time, number_of_people
209
Movie Name Theater Date Time
美女與野獸 台北信義威秀 2017/03/21 09:00
美女與野獸 台北信義威秀 2017/03/21 09:25
美女與野獸 台北信義威秀 2017/03/21 10:15
210. 210
User Goal – Task-Completion
Task completion with flexible constraints
No priority: {台北信義威秀, 台北日新威秀}
Prioritized: {台北信義威秀>台北日新威秀}
210
Movie Name Theater Date Time
美女與野獸 台北信義威秀 2017/03/21 09:00
美女與野獸 台北信義威秀 2017/03/21 09:25
美女與野獸 台北信義威秀 2017/03/21 10:15
211. 211
Satisfiability and Uniqueness
Use goal characteristics
Satisfiability – the predefined goal can be satisfied by
the backend database
Uniqueness – all satisfied entries should be in the goal
211
Movie Name Actor Year
鋼鐵人 Robert John Downey 2008
福爾摩斯 Robert John Downey 2008
request_actor(moviename=鋼鐵人, year=2007)
request_moviename(actor=Robert John Downey, year=2008)
212. 212
More Extension
Slot value hierarchy
“morning” {9:15, 9:45, 10:15, 10:45, 11:15, 11:45}
Inexact match / partial match
“ten o’clock” {9:45, 10:15}
“emma” {Emma Watson, Emma Stone}
212
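Both extensions can be sketched directly; the value hierarchy and candidate lists are illustrative:

```python
def expand_value(value):
    """Slot value hierarchy: map a coarse value onto candidate DB values."""
    hierarchy = {
        "morning": ["9:15", "9:45", "10:15", "10:45", "11:15", "11:45"],
        "ten o'clock": ["9:45", "10:15"],
    }
    return hierarchy.get(value, [value])

def partial_match(query, candidates):
    """Inexact match: keep candidates containing the query as a substring."""
    q = query.lower()
    return [c for c in candidates if q in c.lower()]

times = expand_value("morning")
actors = partial_match("emma", ["Emma Watson", "Emma Stone", "Bill Murray"])
```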
213. 213
User Agenda Model
User agenda defines how a user interacts with the
system in the dialogue level
Random order: randomly sample a slot to inform or request
Fixed order: define an ordered list of slots to inform or request
213
214. 214
Elements of User Simulation
Error Model
• Recognition error
• LU error
Dialogue State
Tracking (DST)
System dialogue acts
Reward
Backend Action /
Knowledge Providers
Dialogue Policy
Optimization
Dialogue Management (DM)
User Model
Reward Model
User Simulation Distribution over
user dialogue acts
(semantic frames)
The error model enables the system to maintain
the robustness during training
215. 215
Frame-Level Interaction
DM receives frame-level information
No error model: perfect recognizer and LU
Error model: simulate the possible errors
215
Error Model
• Recognition error
• LU error
Dialogue State
Tracking (DST)
system dialogue acts
Dialogue Policy
Optimization
Dialogue Management (DM)
User Model
User Simulation
user dialogue acts
(semantic frames)
216. 216
Natural Language Level Interaction
User simulator sends natural language
No recognition error
Errors from NLG or LU
216
Natural Language
Generation (NLG)
Dialogue State
Tracking (DST)
system dialogue acts
Dialogue Policy
Optimization
Dialogue Management (DM)
User Model
User Simulation
NL
Language
Understanding
(LU)
217. 217
Error Model
Simple
Randomly generate errors by
Replacing with an incorrect intent
Deleting an informable slot
Replacing with an incorrect slot (keeping the original value)
Replacing with an incorrect slot value
Learning
Simulate the errors based on the ASR or LU confusion
217
e.g. request_moviename(actor=Robert Downey Jr) may be corrupted to:
request_moviename → request_year (incorrect intent)
actor → director (incorrect slot)
Robert Downey Jr → Robert Downey Sr (incorrect slot value)
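A sketch of the simple (random) error model for slot-level errors; intent replacement is omitted for brevity, and the corruption probability is an assumption:

```python
import random

def corrupt_frame(frame, slots, values, p=0.3, rng=random.Random(1)):
    """Simple error model: with probability p per slot, randomly delete it,
    swap its name for a wrong slot, or swap in a wrong value."""
    intent, slot_dict = frame
    slot_dict = dict(slot_dict)                 # don't mutate the clean frame
    for slot in list(slot_dict):
        r = rng.random()
        if r < p / 3:
            del slot_dict[slot]                              # slot deletion
        elif r < 2 * p / 3:
            slot_dict[rng.choice(slots)] = slot_dict.pop(slot)  # wrong slot
        elif r < p:
            slot_dict[slot] = rng.choice(values)             # wrong value
    return intent, slot_dict

clean = ("request_moviename", {"actor": "Robert Downey Jr"})
noisy = corrupt_frame(clean, slots=["director"], values=["Robert Downey Sr"])
```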
218. 218
Elements of User Simulation
Error Model
• Recognition error
• LU error
Dialogue State
Tracking (DST)
System dialogue acts
Reward
Backend Action /
Knowledge Providers
Dialogue Policy
Optimization
Dialogue Management (DM)
User Model
Reward Model
User Simulation Distribution over
user dialogue acts
(semantic frames)
The reward model checks whether the returned
results satisfy the goal in order to decide the reward
219. 219
Reward Model
Success measurement
Check whether the returned results satisfy the
predefined user goal
Success – large positive reward
Fail – large negative reward
Minimize the number of turns
Penalty for additional turns
219
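A sketch of the success check; the slot names and returned entry are illustrative:

```python
def dialogue_success(returned, goal_constraints, goal_requests):
    """Success iff the returned DB entry satisfies every goal constraint
    and supplies every requested slot."""
    return (all(returned.get(s) == v for s, v in goal_constraints.items())
            and all(s in returned for s in goal_requests))

entry = {"moviename": "A", "theater": "X", "time": "9:30 pm"}
ok = dialogue_success(entry, {"moviename": "A"}, ["time", "theater"])
reward = 20 if ok else -10          # large positive/negative terminal reward
```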
220. 220
Concluding Remarks
Error Model
• Recognition error
• LU error
Dialogue State
Tracking (DST)
System dialogue acts
Reward
Backend Action /
Knowledge Providers
Dialogue Policy
Optimization
Dialogue Management (DM)
User Model
Reward Model
User Simulation Distribution over
user dialogue acts
(semantic frames)
223. 223
Task-Oriented Dialogue System (Young, 2000)
223
Speech
Recognition
Language Understanding (LU)
• Domain Identification
• User Intent Detection
• Slot Filling
Dialogue Management (DM)
• Dialogue State Tracking (DST)
• Dialogue Policy
Natural Language
Generation (NLG)
Hypothesis
are there any action movies to
see this weekend
Semantic Frame
request_movie
genre=action, date=this weekend
System Action/Policy
request_location
Text response
Where are you located?
Text Input
Are there any action movies to see this weekend?
Speech Signal
Backend Database/
Knowledge Providers
http://rsta.royalsocietypublishing.org/content/358/1769/1389.short
224. 224
Task-Oriented Dialogue System (Young, 2000)
224
Speech
Recognition
Language Understanding (LU)
• Domain Identification
• User Intent Detection
• Slot Filling
Dialogue Management (DM)
• Dialogue State Tracking (DST)
• Dialogue Policy
Natural Language
Generation (NLG)
Hypothesis
are there any action movies to
see this weekend
Semantic Frame
request_movie
genre=action, date=this weekend
System Action/Policy
request_location
Text response
Where are you located?
Text Input
Are there any action movies to see this weekend?
Speech Signal
Backend Action /
Knowledge Providers
225. 225
Natural Language Generation (NLG)
Mapping dialogue acts into natural language
inform(name=Seven_Days, foodtype=Chinese)
Seven Days is a nice Chinese restaurant
225
226. 226
Template-Based NLG
Define a set of rules to map frames to NL
226
Pros: simple, error-free, easy to control
Cons: time-consuming, rigid, poor scalability
Semantic Frame Natural Language
confirm() “Please tell me more about the product you are
looking for.”
confirm(area=$V) “Do you want somewhere in the $V?”
confirm(food=$V) “Do you want a $V restaurant?”
confirm(food=$V,area=$W) “Do you want a $V restaurant in the $W?”
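A sketch of template-based realization using the confirm() rules above; the lookup scheme (placeholder letters V, W keyed by slot order) is an illustrative assumption:

```python
TEMPLATES = {
    # Rules mirroring the confirm() examples ($V, $W are placeholders)
    "confirm()": "Please tell me more about the product you are looking for.",
    "confirm(area=$V)": "Do you want somewhere in the $V?",
    "confirm(food=$V)": "Do you want a $V restaurant?",
    "confirm(food=$V,area=$W)": "Do you want a $V restaurant in the $W?",
}

def realize(act, **slots):
    """Look up the template matching the dialogue act, then fill the slots."""
    key = "{}({})".format(act, ",".join(
        f"{s}=${p}" for s, p in zip(slots, "VW")))
    text = TEMPLATES[key]
    for placeholder, value in zip("VW", slots.values()):
        text = text.replace(f"${placeholder}", value)
    return text

sentence = realize("confirm", food="Chinese", area="centre")
```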
227. 227
Class-Based LM NLG (Oh and Rudnicky, 2000)
Class-based language modeling
NLG by decoding
227
Pros: easy to implement/
understand, simple rules
Cons: computationally inefficient
Classes:
inform_area
inform_address
…
request_area
request_postcode
http://dl.acm.org/citation.cfm?id=1117568
228. 228
Phrase-Based NLG (Mairesse et al, 2010)
(Figure: a semantic DBN aligned with a phrase DBN)
Inform(name=Charlie Chan, food=Chinese, type=restaurant, near=Cineworld, area=centre)
→ “Charlie Chan is a Chinese restaurant near Cineworld in the centre”
228
Pros: efficient, good performance
Cons: require semantic alignments
http://dl.acm.org/citation.cfm?id=1858838
230. 230
Handling Semantic Repetition
Issue: semantic repetition
Din Tai Fung is a great Taiwanese restaurant that serves Taiwanese.
Din Tai Fung is a child friendly restaurant, and also allows kids.
Deficiency in either model or decoding (or both)
Mitigation
Post-processing rules (Oh & Rudnicky, 2000)
Gating mechanism (Wen et al., 2015)
Attention (Mei et al., 2016; Wen et al., 2015)
230
231. 231
Semantic Conditioned LSTM (Wen et al., 2015)
Original LSTM cell plus a dialogue act (DA) cell that modifies Ct
(Figure: LSTM cell with gates it, ft, ot over inputs xt, ht-1, and a
reading gate rt controlling the DA vector dt-1 → dt)
Inform(name=Seven_Days, food=Chinese)
→ dialogue act 1-hot representation d0 = 0, 0, 1, 0, 0, …, 1, 0, 0, …, 1, 0, 0, …
231
Idea: using gate mechanism to control the
generated semantics (dialogue act/slots)
http://www.aclweb.org/anthology/D/D15/D15-1199.pdf
232. 232
Structural NLG (Dušek and Jurčíček, 2016)
Goal: NLG based on the syntax tree
Encode trees as sequences
Seq2Seq model for generation
232
https://www.aclweb.org/anthology/P/P16/P16-2.pdf#page=79
233. 233
Contextual NLG (Dušek and Jurčíček, 2016)
Goal: adapting users’
way of speaking,
providing context-
aware responses
Context encoder
Seq2Seq model
233
https://www.aclweb.org/anthology/W/W16/W16-36.pdf#page=203
235. 235
Many-to-Many
Both input and output are sequences → sequence-to-sequence learning
E.g. Machine Translation (machine learning→機器學習)
235
(Figure: seq2seq encoder reads “machine”, “learning”; decoder outputs 機, 器, 學, 習)
Add an end symbol “===”
[Ilya Sutskever, NIPS’14][Dzmitry Bahdanau, arXiv’15]
242. 242
NLG Evaluation
Metrics
Subjective: human judgement (Stent et al., 2005)
Adequacy: correct meaning
Fluency: linguistic fluency
Readability: fluency in the dialogue context
Variation: multiple realizations for the same concept
Objective: automatic metrics
Word overlap: BLEU (Papineni et al, 2002), METEOR, ROUGE
Word embedding based: vector extrema, greedy matching,
embedding average
242
There is a gap between human perception and automatic metrics
243. 243
Dialogue Evaluation Measures
243
•Whether constrained values specified by users can be understood by
the system
•Agreement percentage of system/user understandings over the entire
dialog (averaging all turns)
Understanding Ability
•Number of dialogue turns for task completion
Efficiency
•An explicit confirmation for an uncertain user utterance is an
appropriate system action
•Providing information based on misunderstood user requirements
Action Appropriateness
244. 244
Outline
Dialogue System Evaluation
Recent Trends
End-to-End Learning for Dialogue System
Multimodality
Dialogue Breadth
Dialogue Depth
Challenges
244
245. 245
E2E Joint NLU and DM (Yang et al., 2017)
Idea: errors from DM can be propagated to NLU
for better robustness
DM
245
Model | DM | NLU
Baseline (CRF+SVMs) | 7.7 | 33.1
Pipeline-BLSTM | 12.0 | 36.4
JointModel | 22.8 | 37.4
Both DM and NLU performance is improved
https://arxiv.org/abs/1612.00913
246. 246
E2E Supervised Dialogue System (Wen et al., 2016)
(Figure: the Intent Network encodes “Can I have <v.food>”; the Belief
Tracker outputs a distribution over slot values, e.g. Korean 0.7,
British 0.2, French 0.1; the Database Operator issues the MySQL query
“Select * where food=Korean” and returns a DB pointer over entries such
as Sevendays, CurryPrince, Nirala, RoyalStandard, LittleSeuol; the
Policy Network combines the intermediate representations zt, pt, xt, qt;
the Generation Network realizes “<v.name> serves great <v.food>.”)
https://arxiv.org/abs/1604.04562
247. 247
E2E MemNN for Dialogues (Bordes et al., 2016)
Split dialogue system
actions into subtasks
API issuing
API updating
Option displaying
Information informing
247
https://arxiv.org/abs/1605.07683
248. 248
E2E RL-Based Info-Bot (Dhingra et al., 2016)
Movie=?; Actor=Bill Murray; Release Year=1993
Find me the Bill Murray’s movie.
I think it came out in 1993.
When was it released?
Groundhog Day is a Bill Murray
movie which came out in 1993.
KB-InfoBot
User
(Groundhog Day, actor, Bill Murray)
(Groundhog Day, release year, 1993)
(Australia, actor, Nicole Kidman)
(Mad Max: Fury Road, release year, 2015)
Knowledge Base (head, relation, tail)
Idea: differentiable database for propagating the gradients
https://arxiv.org/abs/1609.00777
249. 249
E2E RL-Based System (Zhao and Eskenazi, 2016)
249
Joint learning
NLU, DST, Dialogue Policy
Deep RL for training
Deep Q-network
Deep recurrent network
Baseline
RL
Hybrid-RL
http://www.aclweb.org/anthology/W/W16/W16-36.pdf#page=19
250. 250
E2E LSTM-Based Dialogue Control
(Williams and Zweig, 2016)
250
Idea: an LSTM maps from raw dialogue history directly to a
distribution over system actions
Developers can provide software including business rules &
programmatic APIs
LSTM can take actions in the real world on behalf of the user
The LSTM can be optimized using SL or RL
https://arxiv.org/abs/1606.01269
251. 251
E2E Task-Completion Bot (TC-Bot) (Li et al., 2017)
(Figure: end-to-end neural dialogue system. The Language Understanding
(LU) module tags each word wi with slots such as B-type and emits an
<intent> at EOS across time steps t-2, t-1, t; the resulting semantic
frame, e.g. request_movie with genre=action, date=this weekend, goes to
the Dialogue Management (DM), which outputs a system action/policy such
as request_location; the Natural Language Generation (NLG) realizes the
response; a user simulator with user goal and user agenda modeling
replies with user dialogue actions, e.g. inform(location=San Francisco))
Text Input: Are there any action movies to see this weekend?
Idea: supervised learning for each component and reinforcement
learning for end-to-end training the neural dialogue system
251
https://arxiv.org/abs/1703.01008
252. 252
E2E Task-Completion Bot (TC-Bot) (Li et al., 2017)
User goal: Two tickets for “the witch” tomorrow 9:30 PM at regal meridian 16, Seattle.
RULE BASED SYSTEM
User: Can I get 2 tickets for the witch?
Agent: Which theater do you want?
User: Regal meridian 16
Agent: What time would you like to see it?
User: 9:30 pm
Agent: What date would you like to see it?
User: Tomorrow!
Agent: How many people are you?
User: Just two adults.
Agent: What is your location?
User: Seattle!
Agent: What kind of video format do you like?
User: Oh, any kind is okay.
Agent: Do you have any price constraint?
User: No, I don’t care.
Agent: Great - I was able to purchase 2 tickets for
you to see the witch tomorrow at regal meridian 16
theater in Seattle at 9:30 pm.
User: Thank you.
Agent: Thank you!
The system can learn how to efficiently
interact with users for task completion
REINFORCEMENT LEARNING SYSTEM
User: Can I get 2 tickets for the witch?
Agent: What date would you like to see it?
User: Tomorrow
Agent: Which theater do you prefer?
User: Regal meridian 16
Agent: What is your location?
User: Seattle
Agent: What time would you like to see it?
User: 9:30 pm.
Agent: Great - I was able to purchase 2 tickets for
you to see the witch tomorrow at regal meridian
16 theater in Seattle at 9:30 pm.
User: Thanks.
Agent: Thanks!
https://arxiv.org/abs/1703.01008
253. 253
Outline
Dialogue System Evaluation
Recent Trends
End-to-End Learning for Dialogue System
Multimodality
Dialogue Breadth
Dialogue Depth
Challenges
253
254. 254
Brain Signal for Understanding
254
Misunderstanding detection by brain signal
Green: listen to the correct answer
Red: listen to the wrong answer
http://dl.acm.org/citation.cfm?id=2388695
Detecting misunderstanding via brain signal in order to correct the understanding results
255. 255
Video for Intent Understanding
255
Proactive (from camera)
I want to see a movie on TV!
Intent: turn_on_tv
Sir, may I turn on the TV for you?
Proactively understanding user intent to initiate the dialogues.
256. 256
App Behavior for Understanding
256
Task: user intent prediction
Challenge: language ambiguity
User preference
Some people prefer “Message” to “Email”
Some people prefer “Outlook” to “Gmail”
App-level contexts
“Message” is more likely to follow “Camera”
“Email” is more likely to follow “Excel”
“send to vivian” → Email? Message? (Communication apps)
Considering behavioral patterns in history to model understanding for intent prediction.
260. 260
Intent Expansion (Chen et al., 2016)
Transfer dialogue acts across domains
Dialogue acts are similar for multiple domains
Learning new intents by information from other domains
[Figure: CDSSM architecture. An utterance encoder and a dialogue-act encoder map the utterance U and acts A1…An into a shared 300-dimensional embedding space; cosine similarity between the embeddings yields P(A1 | U), …, P(An | U). New intents (K+1, K+2, …; e.g., <change_calendar>) are embedded in the same space as the K training intents without retraining. Training data examples: <change_note> "adjust my note", <change_setting> "volume turn down". Test utterance: "postpone my meeting to five pm".]
The dialogue act representations can be
automatically learned for other domains
http://ieeexplore.ieee.org/abstract/document/7472838/
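The scoring step can be sketched as cosine similarity between an utterance embedding and each act embedding, followed by a softmax. The toy 3-dimensional vectors below stand in for the 300-dimensional CDSSM outputs and are invented for illustration.

```python
import math

def cos_sim(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def act_distribution(u_vec, act_vecs, gamma=10.0):
    """Softmax over scaled cosine similarities to get P(A | U)."""
    sims = {a: gamma * cos_sim(u_vec, v) for a, v in act_vecs.items()}
    m = max(sims.values())  # subtract max for numerical stability
    exps = {a: math.exp(s - m) for a, s in sims.items()}
    z = sum(exps.values())
    return {a: e / z for a, e in exps.items()}

# Invented embeddings; <change_calendar> is an unseen act embedded the same way.
acts = {
    "<change_note>":     (0.9, 0.1, 0.0),
    "<change_setting>":  (0.0, 0.9, 0.1),
    "<change_calendar>": (0.1, 0.0, 0.9),
}
utterance = (0.2, 0.1, 0.8)  # e.g., "postpone my meeting to five pm"
probs = act_distribution(utterance, acts)
```

Because new acts only need an embedding, the classifier extends to unseen intents without retraining, which is the point of the intent-expansion approach.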
261. 261
Zero-Shot Learning (Dauphin et al., 2014)
Semantic utterance classification
Use query click logs to define a task that makes the
networks learn the meaning or intent behind the queries
The semantic features c can be learned
https://arxiv.org/abs/1401.0509
262. 262
Domain Adaptation for SLU (Kim et al., 2016)
Frustratingly easy domain adaptation
Novel neural approaches to domain adaptation
Improve slot tagging on several domains
http://www.aclweb.org/anthology/C/C16/C16-1038.pdf
263. 263
Policy for Domain Adaptation (Gašić et al., 2015)
Bayesian committee machine (BCM) enables the estimated
Q-functions to share knowledge across domains
[Figure: committee model combining per-domain Q-function estimates and dialogue data: (Q_R, D_R), (Q_H, D_H), (Q_L, D_L).]
The policy from a new domain can be boosted by the committee policy
http://ieeexplore.ieee.org/abstract/document/7404871/
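The BCM combination rule merges independent Gaussian estimates of Q(s, a) from per-domain policies by precision weighting. Below is a minimal sketch of that rule; the variances and means in the usage example are invented for illustration.

```python
def bcm_combine(means, variances, prior_var=1.0):
    """Combine M independent Gaussian estimates of Q(s, a) with the standard
    BCM equations: precisions add, minus (M - 1) copies of the prior
    precision that would otherwise be counted M times."""
    M = len(means)
    precision = sum(1.0 / v for v in variances) - (M - 1) / prior_var
    mean = sum(m / v for m, v in zip(means, variances)) / precision
    return mean, 1.0 / precision

# Invented example: a confident in-domain estimate (variance 0.1) combined
# with a weak estimate transferred from another domain (variance 1.0).
q_mean, q_var = bcm_combine(means=[2.0, 0.5], variances=[0.1, 1.0])
```

The combined estimate leans toward the low-variance committee member, which is how a policy in a new domain gets boosted by confident knowledge from related domains.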
264. 264
Outline
Dialogue System Evaluation
Recent Trends
End-to-End Learning for Dialogue System
Multimodality
Dialogue Breadth
Dialogue Depth
Challenges
265. 265
Evolution Roadmap
Knowledge-based systems
Common-sense systems
Empathetic systems
Dialogue breadth (coverage)
Dialogue depth (complexity)
What is influenza?
I’ve got a cold what do I do?
Tell me a joke.
I feel sad…
266. 266
High-Level Intention for Dialogue Planning
(Sun et al., 2016a; Sun et al., 2016b)
High-level intention may span several domains
Schedule a lunch with Vivian.
Tasks: find restaurant → check location → contact → play music
What kind of restaurants do you prefer?
The distance is …
Should I send the restaurant information to Vivian?
Users can interact via high-level descriptions and the
system learns how to plan the dialogues
http://dl.acm.org/citation.cfm?id=2856818; http://www.lrec-conf.org/proceedings/lrec2016/pdf/75_Paper.pdf
267. 267
Empathy in Dialogue System (Fung et al., 2016)
Embed an empathy module
Recognize emotion using multimodality
Generate emotion-aware responses
[Figure: the Emotion Recognizer takes vision, speech, and text inputs.]
https://arxiv.org/abs/1605.04072
268. 268
Outline
Dialogue System Evaluation
Recent Trends
End-to-End Learning for Dialogue System
Multimodality
Dialogue Breadth
Dialogue Depth
Challenges
269. 269
Challenges in Dialogue Modeling - I
Semantic schema induction (Chen et al., 2013; Athanasopoulou et al., 2014)
No predefined semantic schema
How to learn from data?
Tractability and dimensionality reduction methods
Learning with large state-action spaces
End-to-end learning methods
Learning when the user input is a complex NL utterance
Learning with humans or KBs?
Learning under domain shifts
270. 270
Challenges in Dialogue Modeling - II
Multiple-state hypotheses
Tracking a distribution over multiple dialogue states can improve dialogue
accuracy
How do current dialogue systems deal with this?
Proactive vs. reactive approaches to dialogue modeling
How to build DM models when the agent is proactive (i.e., does not
wait for the user but sends messages and drives the conversation)?
Localization, personalization, etc.
How to deal with issues pertaining to place, time, and personal
context? These are mostly handled on the speech side; what about the
DM side when learning the policy?
Do hierarchical RL approaches to policy learning actually work?
When are they useful?
Are they powerful for open-domain systems (e.g., chit-chat)?
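The multiple-state idea (tracking a distribution over dialogue states rather than committing to the 1-best SLU hypothesis) can be sketched as a discrete Bayesian belief update. The observation model and all probabilities below are invented for illustration.

```python
def update_belief(belief, slu_hyps, p_correct=0.7):
    """One turn of discrete belief tracking: re-weight each candidate state
    by how well it explains the (noisy) SLU hypothesis list, then normalize."""
    new = {}
    for state, p in belief.items():
        obs = slu_hyps.get(state, 0.0)
        # Likelihood: SLU mass when the state was observed, smoothing otherwise.
        like = p_correct * obs + (1 - p_correct) * (1 - obs)
        new[state] = p * like
    z = sum(new.values()) or 1.0
    return {s: v / z for s, v in new.items()}

belief = {"seattle": 1 / 3, "bellevue": 1 / 3, "tacoma": 1 / 3}
# Turn 1: ASR confuses "Seattle" with "Bellevue".
belief = update_belief(belief, {"seattle": 0.6, "bellevue": 0.4})
# Turn 2: a clearer repetition pushes mass onto the true value.
belief = update_belief(belief, {"seattle": 0.9})
```

Unlike a 1-best system, the tracker keeps the confusable hypothesis alive after turn 1 and recovers cleanly in turn 2, which is why distributional tracking improves accuracy.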
271. 271
Challenges in Dialogue Modeling - III
Chat-Bot challenges
Consistency: keep similar answers despite different wordings
Human: what is your job?
Machine: I am lawyer
Human: what do you do?
Machine: I am a doctor
Quick domain-dependent adaptation, especially from unstructured data (Yan et al., 2016)
Personalization: handling profiles, interaction levels, and keep
relevant context history (Li et al., 2016)
Long sentence generation: most generated sentences are short or common
phrases
272. 272
Concluding Remarks
Human-robot interfaces are a hot topic, but several components must be integrated!
Most state-of-the-art technologies are based on DNN
• Requires huge amounts of labeled data
• Several frameworks/models are available
Fast domain adaptation with scarce data + re-use of rules/knowledge
Handling reasoning
Data collection and analysis from un-structured data
Complex cascaded systems require high accuracy in each component to work well as a whole
273. THANKS FOR YOUR ATTENTION!
Q & A
We thank Tsung-Hsien Wen, Pei-Hao Su, Li Deng, Sungjin Lee,
Milica Gašić, and Lihong Li for sharing their slides