Robot Learning

Reinforcement Learning Overview

Where RL sits within machine learning

graph TD;
    A((Machine Learning));
    B[predictive, supervised];
    C[descriptive, unsupervised];
    D[active, reinforcement learning];
    A-->B;
    A-->C;
    A-->D;

Algorithms

graph TD;
    A((RL algorithms));
    B[Action Value Function];
    C[Policy Gradient];
    B1[<a href=https://en.wikipedia.org/wiki/Q-learning>Q-learning</a>];
    B2[SARSA];
    C1[Actor critic];
    C11[A3C];
    C12[ACKTR];
    C2[TRPO];
    C3[PPO];
    A-->B;
    A-->C;
    B-->B1;
    B-->B2;
    C-->C1;
    C-->C2;
    C-->C3;
    C1-->C11;
    C1-->C12;

How to handle your data?

Back-propagation

Deep Learning

-> MLPs (multilayer perceptrons)
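A minimal sketch of back-propagation, assuming the smallest possible network: one scalar input, a single tanh hidden unit, and a linear output trained with squared error. The function names, the toy data, and the learning rate are illustrative assumptions, not taken from the course material. The gradients are written out by hand with the chain rule so each step is visible.

```python
import math

def forward(params, x):
    """Scalar network: x -> h = tanh(w1*x + b1) -> y = w2*h + b2."""
    w1, b1, w2, b2 = params
    h = math.tanh(w1 * x + b1)
    return h, w2 * h + b2

def loss_and_grads(params, x, target):
    """Squared-error loss and its gradient w.r.t. (w1, b1, w2, b2)."""
    w1, b1, w2, b2 = params
    h, y = forward(params, x)
    loss = 0.5 * (y - target) ** 2
    dy = y - target                    # dL/dy
    dw2, db2 = dy * h, dy              # output layer
    dz = dy * w2 * (1.0 - h * h)       # back through tanh: tanh'(z) = 1 - h^2
    dw1, db1 = dz * x, dz              # hidden layer
    return loss, (dw1, db1, dw2, db2)

def train(params, data, lr=0.1, epochs=200):
    """Plain stochastic gradient descent over (x, target) pairs."""
    params = list(params)
    for _ in range(epochs):
        for x, target in data:
            _, grads = loss_and_grads(params, x, target)
            params = [p - lr * g for p, g in zip(params, grads)]
    return params

def total_loss(params, data):
    return sum(loss_and_grads(params, x, t)[0] for x, t in data)

# Hypothetical toy data: three points on an odd function.
data = [(-1.0, -0.5), (0.0, 0.0), (1.0, 0.5)]
initial = [0.3, 0.0, 0.2, 0.0]
trained = train(initial, data)
print(total_loss(initial, data), total_loss(trained, data))
```

Deep learning frameworks automate exactly this backward pass; a finite-difference check is a cheap way to validate hand-written gradients.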

DNN

(Deep Neural Network)

CNN

RNN

Regression

Ordinary Least Squares

Weighted Regression

Ridge Regression

Local Ridge Regression
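The four regression variants listed above can be viewed as one weighted, regularized least-squares problem. The sketch below assumes a straight-line model y ≈ b + w·x, penalizes only the slope, and uses a Gaussian kernel for the local weights; these choices, along with the names `fit_line` and `predict_local`, are assumptions made for illustration.

```python
import math

def fit_line(xs, ys, weights=None, ridge=0.0):
    """Minimize sum_i k_i * (y_i - b - w*x_i)^2 + ridge * w^2.

    weights=None, ridge=0  -> ordinary least squares
    weights given          -> weighted regression
    ridge > 0              -> ridge regression (slope only is penalized)
    """
    if weights is None:
        weights = [1.0] * len(xs)
    s = sum(weights)
    sx = sum(k * x for k, x in zip(weights, xs))
    sxx = sum(k * x * x for k, x in zip(weights, xs))
    sy = sum(k * y for k, y in zip(weights, ys))
    sxy = sum(k * x * y for k, x, y in zip(weights, xs, ys))
    # Normal equations: [[s, sx], [sx, sxx + ridge]] @ [b, w] = [sy, sxy],
    # solved here with Cramer's rule.
    det = s * (sxx + ridge) - sx * sx
    b = (sy * (sxx + ridge) - sx * sxy) / det
    w = (s * sxy - sx * sy) / det
    return b, w

def predict_local(x_query, xs, ys, bandwidth=1.0, ridge=1e-3):
    """Local ridge regression: refit around each query point."""
    weights = [math.exp(-0.5 * ((x - x_query) / bandwidth) ** 2) for x in xs]
    b, w = fit_line(xs, ys, weights, ridge)
    return b + w * x_query

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]           # exactly y = 1 + 2x
print(fit_line(xs, ys))                   # ordinary least squares
print(fit_line(xs, ys, ridge=10.0))       # ridge shrinks the slope
print(predict_local(2.0, xs, ys))         # local ridge prediction at x = 2
```

On this toy data the ridge penalty pulls the slope from 2 toward 0 while the intercept grows to compensate, which is the shrinkage effect ridge regression is meant to produce.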

Material

PyTorch

Install it on Linux inside an Anaconda environment:

    conda install pytorch torchvision cudatoolkit=9.0 -c pytorch

Test if it worked:

    python3
    Python 3.6.3 |Anaconda, Inc.| (default, Oct 13 2017, 12:02:49) 
    [GCC 7.2.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import torch
    >>> print(torch.__version__)
    1.0.1.post2
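Since the install command also pulls in a CUDA toolkit, it can be useful to check whether PyTorch actually sees a GPU. The following is a small sketch; `describe_torch` is a hypothetical helper name, and the script only reports what it finds rather than assuming PyTorch is present.

```python
import importlib.util

def describe_torch():
    """Return a one-line summary of the local PyTorch installation."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"
    import torch
    return f"torch {torch.__version__}, CUDA available: {torch.cuda.is_available()}"

print(describe_torch())
```

If CUDA is reported as unavailable, the usual suspects are a missing GPU driver or a mismatch between the driver and the installed `cudatoolkit` version.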

See the videos by Sung Kim.

Abbreviations

Symbols

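| Symbol | Name | Description |
|---|---|---|
| -- | Policy | German: Strategie |
| -- | Rollout | German: ? |
| -- | Reward | German: Belohnung |
| -- | State | German: Zustand |
| -- | Action | German: Aktion |
| -- | Advantage Function | |
| -- | Value Function | |
| -- | Finite Horizon | |
| -- | Q-Value | |
| -- | On policy | The agent picks its own actions and learns from data generated by the policy it is currently following. |
| -- | Off policy | The agent learns from data generated by a different behavior policy, for example an exploratory policy or expert demonstrations. |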
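The on-policy / off-policy distinction shows up directly in the two action-value algorithms from the graph above. A minimal tabular sketch, assuming a Q-table stored as nested dictionaries and illustrative values for the learning rate `alpha` and discount `gamma` (all names here are assumptions): SARSA bootstraps from the action the agent actually takes next, while Q-learning bootstraps from the greedy action regardless of what the agent does.

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy: the target uses the action a_next chosen by the current policy."""
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy: the target uses the best action in s_next, whatever is taken."""
    target = r + gamma * max(Q[s_next].values())
    Q[s][a] += alpha * (target - Q[s][a])

def make_table():
    # Hypothetical two-state, two-action Q-table.
    return {
        "s0": {"left": 0.0, "right": 0.0},
        "s1": {"left": 1.0, "right": 2.0},
    }

Q_sarsa = make_table()
sarsa_update(Q_sarsa, "s0", "right", r=1.0, s_next="s1", a_next="left")

Q_qlearn = make_table()
q_learning_update(Q_qlearn, "s0", "right", r=1.0, s_next="s1")

print(Q_sarsa["s0"]["right"], Q_qlearn["s0"]["right"])
```

With the same transition, the two updates differ only because the exploratory next action ("left") is worse than the greedy one ("right"), so the SARSA estimate ends up lower.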