Introduction of "TrailBlazer" algorithm


Slides introducing the paper "Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning", presented at the NIPS2016 reading group @PFN (2017/1/19) https://connpass.com/event/47580/.


BLAZING THE TRAILS BEFORE
BEATING THE PATH:
SAMPLE-EFFICIENT MONTE-CARLO PLANNING
KATSUKI OHTO
@NIPS2016-YOMI (NIPS 2016 READING GROUP)
2017/1/19

INTRODUCED PAPER
• Blazing the trails before beating the path:
Sample-efficient Monte-Carlo planning
(J.-B. Grill, M. Valko and R. Munos)
• NIPS 2016 accepted paper (poster session)
• The abstract starts with “You are a robot…”
• http://papers.nips.cc/paper/6253-blazing-the-trails-before-beating-the-path-sample-efficient-monte-carlo-planning

TRAILBLAZER
• A nested Monte-Carlo planning algorithm
• Problem setting:
An MDP, represented as a tree of MAX nodes and AVG nodes
Number of actions per state: finite
Number of possible next states: finite or infinite
• Strong theoretical guarantees on sample complexity
(Figure: tree of MAX and AVG nodes)

AIM
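• Input: an MDP (Markov Decision Process)
(discount factor $\gamma$, maximum number of valid actions $K$),
$\varepsilon > 0$, $0 < \delta < 1$
• Output: an estimate $\mu_{\varepsilon,\delta}$ of the value of the current state $s_0$
• Aim: obtain a good estimate of the true value $\mathcal{V}[s_0]$ of the current state, such that
$$\mathbb{P}\left(\left|\mu_{\varepsilon,\delta} - \mathcal{V}[s_0]\right| > \varepsilon\right) \le \delta$$
(where $\mathbb{P}(\cdot)$ denotes the probability of $\cdot$),
using as few calls as possible to the generative model (the state transition function)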
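To make the (𝜀, 𝛿) criterion and the cost measure concrete, here is a minimal Python sketch of the simplest possible case: a single step whose reward is Bernoulli(p), so the true value is just p. It is not TrailBlazer; it only illustrates what "(𝜀, 𝛿)-correct with few generative-model calls" means. The names `CountingModel`, `hoeffding_sample_size` and `estimate_value` are hypothetical, and the sample size comes from Hoeffding's inequality (rewards assumed in [0, 1]), a standard bound swapped in here for illustration.

```python
import math
import random


class CountingModel:
    """Hypothetical one-step generative model: each call returns a reward
    drawn from Bernoulli(p) and increments a call counter."""

    def __init__(self, p: float, seed: int = 0) -> None:
        self.p = p
        self.calls = 0
        self.rng = random.Random(seed)

    def sample_reward(self) -> float:
        self.calls += 1
        return 1.0 if self.rng.random() < self.p else 0.0


def hoeffding_sample_size(eps: float, delta: float) -> int:
    """Samples so that P(|mean - p| > eps) <= delta for rewards in [0, 1]."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps * eps))


def estimate_value(model: CountingModel, eps: float, delta: float) -> float:
    # Plain Monte-Carlo average with the Hoeffding sample size.
    m = hoeffding_sample_size(eps, delta)
    return sum(model.sample_reward() for _ in range(m)) / m


if __name__ == "__main__":
    eps, delta, p = 0.05, 0.1, 0.3
    model = CountingModel(p, seed=42)
    trials = 200
    failures = sum(abs(estimate_value(model, eps, delta) - p) > eps for _ in range(trials))
    print("samples per estimate:", hoeffding_sample_size(eps, delta))
    print("empirical failure rate:", failures / trials, "target <=", delta)
    print("total generative-model calls:", model.calls)
```

The empirical failure rate should stay at or below 𝛿; TrailBlazer's problem is to achieve the same kind of guarantee for the value of a whole tree while keeping the call counter small.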

ONE-PLAYER TREE MODEL
IN A STOCHASTIC ENVIRONMENT
• Each MAX node represents an opportunity to choose an action
• Each AVG node represents a stochastic state transition
(Figure: MAX and AVG nodes)
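A minimal sketch of how this one-player tree could be represented, assuming integer states and a hypothetical generative model `model(state, action, rng) -> (next_state, reward)`. A MAX node holds one AVG child per action; an AVG node caches its sampled (reward, next MAX node) pairs, so asking again for fewer or the same number of samples costs no new calls. The caching mirrors the "fill 𝑚 transition samples" wording of the AVG-node slide, but the class and method names are assumptions made for this illustration.

```python
import random
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical generative model: (state, action, rng) -> (next_state, reward).
Model = Callable[[int, int, random.Random], tuple[int, float]]


@dataclass
class AvgNode:
    """AVG node: a (state, action) pair whose next state is drawn at random."""
    state: int
    action: int
    samples: list = field(default_factory=list)  # cached (reward, MaxNode) pairs

    def ensure_samples(self, k: int, model: Model, num_actions: int,
                       rng: random.Random) -> list:
        """Return k samples, calling the model only for those not yet cached."""
        while len(self.samples) < k:
            next_state, reward = model(self.state, self.action, rng)
            self.samples.append((reward, MaxNode(next_state, num_actions)))
        return self.samples[:k]


@dataclass
class MaxNode:
    """MAX node: a state in which the agent chooses one of K actions."""
    state: int
    num_actions: int
    children: list = field(init=False)

    def __post_init__(self) -> None:
        # One AVG child per valid action; deeper nodes are created lazily.
        self.children = [AvgNode(self.state, a) for a in range(self.num_actions)]


class CountingToyModel:
    """Hypothetical toy dynamics that also count generative-model calls."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, state: int, action: int, rng: random.Random) -> tuple[int, float]:
        self.calls += 1
        # Five possible next states; action 0 earns reward 1, others reward 0.
        return rng.randrange(5), float(action == 0)


if __name__ == "__main__":
    rng = random.Random(0)
    model = CountingToyModel()
    root = MaxNode(state=0, num_actions=2)
    root.children[1].ensure_samples(3, model, 2, rng)
    root.children[1].ensure_samples(2, model, 2, rng)  # served from the cache
    print("generative-model calls:", model.calls)
```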

ALGORITHM OVERVIEW
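• Global initialization:
set 𝜂 and 𝜆 as global values,
and set 𝑚 as the argument passed to the root node
(the constants involve terms such as log(𝜂/𝛾); see the paper for their exact definitions)
• Recursive algorithm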

ALGORITHM OVERVIEW 2
• In both MAX nodes and AVG nodes, the arguments are
𝑚 (desired branching factor) and 𝜀 (admissible estimation error)
• If 𝑚 is large, we can search many children, but it takes much time (a trade-off)
• If 𝜀 is small, we must search deeply, which also takes much time (a trade-off)
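A back-of-the-envelope calculation shows why both knobs cost time. Assuming rewards in [0, 1], rewards collected after depth H contribute at most γ^H / (1 − γ), so reaching admissible error 𝜀 by truncation alone needs H ≥ log(𝜀(1 − γ)) / log γ. A naive full tree that draws 𝑚 samples for each of 𝐾 actions at every level then costs Σ_{d=1..H} (𝐾𝑚)^d calls. This is the cost of naive uniform expansion (in the spirit of sparse sampling), not TrailBlazer's cost, and the function names are hypothetical.

```python
import math


def truncation_depth(eps: float, gamma: float) -> int:
    """Smallest H with gamma**H / (1 - gamma) <= eps (rewards assumed in [0, 1])."""
    return max(0, math.ceil(math.log(eps * (1.0 - gamma)) / math.log(gamma)))


def naive_tree_calls(num_actions: int, m: int, depth: int) -> int:
    """Calls of a full tree: every node at each level draws m samples per action."""
    branching = num_actions * m
    return sum(branching ** d for d in range(1, depth + 1))


if __name__ == "__main__":
    gamma, num_actions, m = 0.9, 2, 3
    for eps in (1.0, 0.1, 0.01):
        depth = truncation_depth(eps, gamma)
        calls = naive_tree_calls(num_actions, m, depth)
        print(f"eps={eps}: depth={depth}, naive calls={calls:.3e}")
```

For example, with γ = 0.9 and 𝜀 = 0.1 the truncation depth is already 44, so even 𝐾 = 2 and 𝑚 = 3 give an astronomically large naive tree. Choosing 𝑚 and 𝜀 adaptively at each node is how TrailBlazer aims to avoid paying this price.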

ALGORITHM
FOR AVG NODES
• Input: 𝑚 and 𝜀
• Output: estimated value
• If the admissible error 𝜀 is large, ignore the successive (future) reward
• Fill the sample list up to 𝑚 transition samples (storing each immediate reward)
• Search each of the 𝑚 sampled next states
• Return the averaged immediate reward + the estimated successive reward
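The AVG-node steps above can be sketched in Python as follows. This is a simplified illustration under assumptions not stated on the slide, not the paper's pseudocode: rewards lie in [0, 1], so every value lies in [0, 1/(1 − γ)] and "𝜀 is large" is read as 𝜀 ≥ 1/(1 − γ); children receive the looser tolerance 𝜀/𝜂 with γ < 𝜂 < 1; and a next state drawn k times is searched once with budget k. `AvgNodeSketch`, `sample_transition` and `evaluate_child` are hypothetical names, and the recursive MAX-node search is passed in as a callback so the block stands alone.

```python
import random
from collections import Counter
from typing import Callable

# Hypothetical callbacks (names and signatures are assumptions):
#   sample_transition() -> (next_state, reward)       one generative-model call
#   evaluate_child(next_state, budget, eps) -> float  recursive MAX-node search
SampleFn = Callable[[], tuple[int, float]]
ChildFn = Callable[[int, int, float], float]


class AvgNodeSketch:
    def __init__(self, sample_transition: SampleFn, gamma: float, eta: float) -> None:
        assert 0.0 < gamma < eta < 1.0  # assumed relation between the globals
        self.sample_transition = sample_transition
        self.gamma = gamma
        self.eta = eta
        self.samples: list[tuple[int, float]] = []  # cached (next_state, reward)

    def estimate(self, m: int, eps: float, evaluate_child: ChildFn) -> float:
        # Values lie in [0, 1 / (1 - gamma)]. If eps covers that whole range,
        # 0 is already an eps-accurate answer: stop without sampling.
        if eps >= 1.0 / (1.0 - self.gamma):
            return 0.0
        # Fill the cache up to m samples; earlier samples are reused.
        while len(self.samples) < m:
            self.samples.append(self.sample_transition())
        used = self.samples[:m]
        # Search each distinct sampled next state once, with budget = multiplicity.
        counts = Counter(state for state, _ in used)
        child_eps = eps / self.eta  # assumed: deeper nodes get a looser tolerance
        values = {s: evaluate_child(s, k, child_eps) for s, k in counts.items()}
        # Average of immediate reward + discounted estimated successive reward.
        return sum(r + self.gamma * values[s] for s, r in used) / m


if __name__ == "__main__":
    rng = random.Random(0)

    def toy_sampler() -> tuple[int, float]:
        # Hypothetical toy dynamics: 3 next states, Bernoulli(0.5) reward.
        return rng.randrange(3), float(rng.random() < 0.5)

    def stub_child(state: int, budget: int, eps: float) -> float:
        return 2.0  # placeholder for a recursive MAX-node search

    node = AvgNodeSketch(toy_sampler, gamma=0.9, eta=0.95)
    print(node.estimate(m=100, eps=0.1, evaluate_child=stub_child))
```

Grouping duplicate next states matters mainly when 𝑁 is finite: a state that is drawn often receives a larger budget instead of being searched repeatedly.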

ALGORITHM
FOR MAX NODES
• Input: 𝑚 and 𝜀
• Output: estimated value
• Fill the candidate action pool ℒ with all valid actions
• U is a value similar to the standard error of the estimates
• Search the candidate actions repeatedly, discarding clearly worse ones, until
“only 1 action is left” or “the error might be small”
• If “the error might be small”,
then return the estimated value of the best action;
else
search the single remaining action 1 more time, carefully (with the node's full budget 𝑚)
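A Python sketch of the MAX-node loop, under assumptions not taken from the slides: each action is evaluated through a hypothetical callback `evaluate_action(action, n, eps)`; U is a Hoeffding-style width value_range·sqrt(log(2/𝛿)/(2n)) after n rounds; "the error might be small" is read as U < 𝜀; and actions are discarded by standard successive elimination (drop an action once its upper bound falls below the best lower bound). The paper's exact constants and elimination rule may differ.

```python
import math
import random
from typing import Callable

# Hypothetical callback: estimate the value of `action` at this node using
# branching budget n and admissible error eps.
ActionFn = Callable[[int, int, float], float]


def confidence_width(n: int, delta: float, value_range: float) -> float:
    """Assumed Hoeffding-style width U after n rounds (like a standard error)."""
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def max_node_estimate(num_actions: int, m: int, eps: float,
                      evaluate_action: ActionFn,
                      delta: float = 0.1, value_range: float = 1.0) -> float:
    assert num_actions >= 1
    candidates = list(range(num_actions))  # pool L: all valid actions
    estimates: dict[int, float] = {}
    n, u = 0, math.inf
    # Search repeatedly until one action is left or the error might be small.
    while len(candidates) > 1 and u >= eps:
        n += 1
        u = confidence_width(n, delta, value_range)
        for a in candidates:
            estimates[a] = evaluate_action(a, n, u)
        # Successive elimination: keep actions whose upper bound reaches
        # the best lower bound (the empirical best is always kept).
        best_lower = max(estimates[a] - u for a in candidates)
        candidates = [a for a in candidates if estimates[a] + u >= best_lower]
    if u < eps:
        # "Error might be small": return the estimate of the best action.
        return max(estimates[a] for a in candidates)
    # Only one action left: search it once more, carefully, with budget m.
    (best,) = candidates
    return evaluate_action(best, m, eps)


if __name__ == "__main__":
    rng = random.Random(0)
    true_q = [0.2, 0.9, 0.5]

    def noisy_action_value(action: int, n: int, eps: float) -> float:
        # Hypothetical child search whose error stays within its tolerance.
        return true_q[action] + rng.uniform(-eps, eps)

    print(max_node_estimate(3, m=50, eps=0.05, evaluate_action=noisy_action_value))
```

If every estimate is within U of the true value, the truly best action can never be discarded, since its upper bound is at least every other action's lower bound; this is the property the elimination step relies on.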

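SAMPLE COMPLEXITY OF TRAILBLAZER
• Sample complexity (the number of calls to the generative model needed to meet the (𝜀, 𝛿) guarantee) measures the performance of the algorithm
• If $N$ (the number of possible next states) is finite, the sample complexity is of order
$$\left(\frac{1}{\varepsilon}\right)^{\max\left(2,\ \frac{\log(N\kappa)}{\log(1/\gamma)} + o(1)\right)}$$
where $\kappa \in [1, K]$ (defined in detail in the paper)
• Otherwise ($N$ infinite), it is of order
$$\left(\frac{1}{\varepsilon}\right)^{2+d}$$
where $d$ is a measure of the difficulty of identifying near-optimal nodes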
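To compare the two regimes numerically, the hypothetical helpers below compute only the exponents of 1/𝜀 from the formulas above, dropping the o(1) term and all constant factors, so the results are orders of magnitude rather than exact sample counts.

```python
import math


def finite_case_exponent(n_next: int, kappa: float, gamma: float) -> float:
    """Exponent of 1/eps when N is finite, with the o(1) term dropped."""
    assert n_next >= 1 and kappa >= 1.0 and 0.0 < gamma < 1.0
    return max(2.0, math.log(n_next * kappa) / math.log(1.0 / gamma))


def infinite_case_exponent(d: float) -> float:
    """Exponent of 1/eps when N is infinite."""
    return 2.0 + d


def order_of_samples(eps: float, exponent: float) -> float:
    """(1/eps)**exponent: an order of magnitude, not an exact call count."""
    return (1.0 / eps) ** exponent


if __name__ == "__main__":
    eps, gamma = 0.1, 0.9
    for n_next, kappa in [(1, 1.0), (2, 2.0), (10, 3.0)]:
        e = finite_case_exponent(n_next, kappa, gamma)
        print(f"N={n_next}, kappa={kappa}: exponent={e:.2f}, order={order_of_samples(eps, e):.3g}")
    for d in (0.0, 1.0):
        e = infinite_case_exponent(d)
        print(f"d={d}: exponent={e:.2f}, order={order_of_samples(eps, e):.3g}")
```

One consequence visible from the formula: in the finite case the exponent is exactly 2, the rate of plain Monte-Carlo averaging, whenever 𝑁𝜅 ≤ 1/γ².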
