Robot Learning
Reinforcement Learning Overview¶
Where RL sits within machine learning
graph TD;
A((Machine Learning));
B[predictive, supervised];
C[descriptive, unsupervised];
D[active, reinforcement learning];
A-->B;
A-->C;
A-->D;
Algorithms
graph TD;
A((RL algorithms));
B[Action Value Function];
C[Policy Gradient];
B1[<a href=https://en.wikipedia.org/wiki/Q-learning>Q-learning</a>];
B2[SARSA];
C1[Actor critic];
C11[A3C];
C12[ACKTR];
C2[TRPO];
C3[PPO];
A-->B;
A-->C;
B-->B1;
B-->B2;
C-->C1;
C-->C2;
C-->C3;
C1-->C11;
C1-->C12;
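As a concrete instance of the action-value branch, here is a minimal sketch of tabular Q-learning. The five-state chain environment, the reward of 1.0 at the right end, and the values of `ALPHA`, `GAMMA` and `EPSILON` are assumptions made up for illustration; they are not taken from these notes.

```python
import random

N_STATES = 5          # states 0..4, state 4 is terminal (hypothetical toy chain)
ACTIONS = (0, 1)      # 0 = left, 1 = right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2   # illustrative hyperparameters


def step(state, action):
    """Move along the chain; reward 1.0 only when the goal state is reached."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose_action(q, state, rng):
    """Epsilon-greedy action selection with random tie-breaking."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=500, max_steps=200, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = choose_action(q, state, rng)
            next_state, reward, done = step(state, action)
            # Q-learning target: bootstrap from the best next action
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
# Greedy action per non-terminal state (1 means "move right")
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a])
                 for s in range(N_STATES - 1)]
print(greedy_policy)
```

SARSA would differ only in the target line: it bootstraps from the action actually chosen in `next_state` instead of the maximum.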
How to handle your data?
Back-propagation
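A minimal sketch of back-propagation on a hypothetical two-weight network (`h = tanh(w1 * x)`, `y = w2 * h`, squared-error loss). The network shape and the numbers are assumptions chosen only to show the chain rule running backwards from the loss; the result is compared against a finite-difference estimate.

```python
import math


def loss_and_grads(w1, w2, x, target):
    # Forward pass
    h = math.tanh(w1 * x)          # hidden activation
    y = w2 * h                     # network output
    loss = 0.5 * (y - target) ** 2

    # Backward pass: apply the chain rule from the loss towards the input
    dloss_dy = y - target
    dloss_dw2 = dloss_dy * h
    dloss_dh = dloss_dy * w2
    dloss_dz1 = dloss_dh * (1.0 - h ** 2)   # derivative of tanh
    dloss_dw1 = dloss_dz1 * x
    return loss, dloss_dw1, dloss_dw2


w1, w2, x, target = 0.5, -0.3, 1.5, 1.0
loss, grad_w1, grad_w2 = loss_and_grads(w1, w2, x, target)

# Central finite differences as an independent check of the gradients
eps = 1e-6
num_grad_w1 = (loss_and_grads(w1 + eps, w2, x, target)[0]
               - loss_and_grads(w1 - eps, w2, x, target)[0]) / (2 * eps)
num_grad_w2 = (loss_and_grads(w1, w2 + eps, x, target)[0]
               - loss_and_grads(w1, w2 - eps, x, target)[0]) / (2 * eps)
print(grad_w1, num_grad_w1)
print(grad_w2, num_grad_w2)
```

Frameworks such as PyTorch automate exactly this backward pass for arbitrary computation graphs.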
Deep Learning¶
-> MLPs (multilayer perceptrons)
DNN¶
(Deep Neural Network)
CNN¶
RNN¶
Regression¶
Ordinary Least Squares¶
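A minimal sketch of ordinary least squares via the normal equations, w = (XᵀX)⁻¹ Xᵀy. The synthetic line `y = 2x + 1` with small noise and the use of NumPy are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0 + 0.01 * rng.standard_normal(50)   # synthetic data

# Design matrix with a bias column, so w = [intercept, slope]
X = np.column_stack([np.ones_like(x), x])

# Solve the normal equations (X^T X) w = X^T y
w = np.linalg.solve(X.T @ X, X.T @ y)
print(w)
```

For ill-conditioned problems `np.linalg.lstsq` is usually the more robust choice than forming XᵀX explicitly.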
Weighted Regression¶
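Reading "weighted regression" as weighted least squares (an assumption, since the notes do not define it), each sample gets a weight and the solution becomes w = (XᵀWX)⁻¹ XᵀWy with W a diagonal weight matrix. The sketch below uses made-up data with one corrupted sample that is given weight zero.

```python
import numpy as np


def weighted_least_squares(X, y, weights):
    """Solve (X^T W X) w = X^T W y for a diagonal W given as a vector."""
    Xw = X * weights[:, None]        # rows of X scaled by their weight (W X)
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)


x = np.linspace(0.0, 1.0, 11)
y = 2.0 * x + 1.0
y[5] = 10.0                          # one corrupted sample
X = np.column_stack([np.ones_like(x), x])

weights = np.ones_like(x)
weights[5] = 0.0                     # distrust the corrupted sample entirely

w_weighted = weighted_least_squares(X, y, weights)
w_uniform = weighted_least_squares(X, y, np.ones_like(x))
print(w_weighted, w_uniform)
```

With uniform weights the function reduces to ordinary least squares, so the outlier pulls the fit away from the true line.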
Ridge Regression¶
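A minimal sketch of ridge regression in closed form, w = (XᵀX + λI)⁻¹ Xᵀy. The random data, the helper name `ridge_fit` and the value λ = 10 are assumptions for illustration; no bias column is used, so every coefficient is penalised.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge solution: w = (X^T X + lam * I)^-1 X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.standard_normal(20)

w_unregularized = ridge_fit(X, y, lam=0.0)   # same as ordinary least squares
w_ridge = ridge_fit(X, y, lam=10.0)          # coefficients shrunk towards zero
print(np.linalg.norm(w_unregularized), np.linalg.norm(w_ridge))
```

Increasing `lam` trades a larger bias for lower variance; in practice it is picked by cross-validation.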
Local Ridge Regression¶
Material¶
PyTorch¶
Install it on Linux within an Anaconda environment:
conda install pytorch torchvision cudatoolkit=9.0 -c pytorch
Test if it worked:
python3
Python 3.6.3 |Anaconda, Inc.| (default, Oct 13 2017, 12:02:49)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.__version__)
1.0.1.post2
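The same check can be scripted. This is a small sketch, not part of the original notes: the helper name `torch_status` is made up, and the script reports a missing installation instead of raising.

```python
import importlib.util


def torch_status():
    """Return a one-line report about the local PyTorch installation."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch
    return f"torch {torch.__version__}, CUDA available: {torch.cuda.is_available()}"


print(torch_status())
```

`torch.cuda.is_available()` returning `False` usually means the installed `cudatoolkit` version does not match the GPU driver.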
See the videos by Sung Kim.
Abbreviations¶
Symbols¶
Symbol | Name | Description |
---|---|---|
-- | Policy | German: Handlung |
-- | Rollout | German: ? |
-- | Reward | German: ? |
-- | State | German: Zustand |
-- | Action | German: Aktion |
-- | Advantage Function | |
-- | Value Function | |
-- | Finite Horizon | |
-- | Q-Value | |
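-- | On policy | The agent learns about the policy it is currently following: the policy that picks the actions is the one being improved (e.g. SARSA) |
-- | Off policy | The agent learns about a target policy from actions chosen by a different behavior policy, such as an exploratory policy or an expert (e.g. Q-learning) |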
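The on-policy / off-policy distinction in the table shows up directly in the update target. A minimal sketch comparing the SARSA and Q-learning targets; the Q-values, reward and `GAMMA` are made-up numbers for illustration.

```python
GAMMA = 0.9  # discount factor (illustrative value)


def sarsa_target(q, reward, next_state, next_action):
    """On-policy: bootstrap from the action the agent actually takes next."""
    return reward + GAMMA * q[next_state][next_action]


def q_learning_target(q, reward, next_state):
    """Off-policy: bootstrap from the greedy action, whatever is taken next."""
    return reward + GAMMA * max(q[next_state])


q = {1: [0.2, 1.0]}   # hypothetical Q-values for state 1, actions 0 and 1

# Suppose the behavior policy explores and picks action 0 in state 1
on_policy = sarsa_target(q, reward=0.5, next_state=1, next_action=0)
off_policy = q_learning_target(q, reward=0.5, next_state=1)
print(on_policy, off_policy)
```

The two targets coincide whenever the next action happens to be the greedy one; they differ only on exploratory steps.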