ML-Agents Tutorial 01

rrojin 2021. 2. 25. 12:36

www.youtube.com/watch?v=2Js4KiDwiyU

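There are quite a few ML-Agents tutorials on YouTube, but most were hard to follow because version mismatches caused frequent errors.

This one, written for ML-Agents v1.0, was the most stable of them. It is a simple game: the Player car earns points by jumping over passing Mover cars and loses points when it collides with one.

The GitHub repository contains both an unfinished and a finished version of the code. Filling in the unfinished version while following the video was a good way to understand how ML-Agents works.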

- Tutorial workflow

1) Clone Repository

2) Open Unity Project

3) Edit Jumper.cs: change MonoBehaviour to Agent

4) Add Behavior Parameters script

5) Set Action size to 2: the Player has two possible actions, jump and do nothing (e.g. Jump() is called when vectorAction[0] == 1)

6) Add RayPerceptionSensor 3D: needed for observations (distance to the Mover => add 'mover' to the detectable tags)

7) Adjust RayPerceptionSensor & set Observation Size = 0 (the rays take over that role)

8) Complete the Jumper.cs script (detailed action handling and rewards)

9) Edit trainer_config.yaml for training, then run the training

10) Duplicating the environment so that several agents train at the same time makes the results much better!

11) Build to create a standalone player

12) Generate and apply the model (.nn file)

- Finished script attached to the Player Agent

//Jumper.cs
using System;
using System.Collections;
using System.Collections.Generic;
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using UnityEngine;
 
public class Jumper : Agent
{
    [SerializeField] private float jumpForce;
    [SerializeField] private KeyCode jumpKey;
    
    private bool jumpIsReady = true;
    private Rigidbody rBody;
    private Vector3 startingPosition;
    private int score = 0;
 
    public event Action OnReset;
    
    public override void Initialize()
    {
        rBody = GetComponent<Rigidbody>();
        startingPosition = transform.position;
    }
 
    private void FixedUpdate() // Optimization: RequestDecision() is called only while the car is driving on the road.
    {
        if(jumpIsReady)
            RequestDecision();
    }
 
    public override void OnActionReceived(float[] vectorAction)
    {
        if (Mathf.FloorToInt(vectorAction[0])==1)
            Jump();
        
    }
 
    public override void OnEpisodeBegin()
    {
        Reset();
    }
 
 
    public override void Heuristic(float[] actionsOut) //Player Input
    {
        actionsOut[0] = 0;   
     
        if (Input.GetKey(jumpKey))
            actionsOut[0] = 1; 
    }
 
    private void Jump()
    {
        if (jumpIsReady)
        {
            rBody.AddForce(new Vector3(0, jumpForce, 0), ForceMode.VelocityChange);
            jumpIsReady = false;
        }
    }
 
    private void Reset()
    {
        score = 0;
        jumpIsReady = true;
        
        //Reset Movement and Position
        transform.position = startingPosition;
        rBody.velocity = Vector3.zero;
        
        OnReset?.Invoke();
    }
 
    //Calculation after collision -> Require Rigidbody at least one object & both Colliders
    private void OnCollisionEnter(Collision collidedObj)
    {
        if (collidedObj.gameObject.CompareTag("Street"))
            jumpIsReady = true;
        
        else if (collidedObj.gameObject.CompareTag("Mover") || collidedObj.gameObject.CompareTag("DoubleMover"))
        {
            AddReward(-1.0f);
            Debug.Log(GetCumulativeReward());
            EndEpisode();
        }
        
            
    }
 
    //NO calculation after collision -> Require both Colliders
    
    private void OnTriggerEnter(Collider collidedObj)
    {
        if (collidedObj.gameObject.CompareTag("score"))
        {
            AddReward(0.1f);
            Debug.Log(GetCumulativeReward());
            score++;
            ScoreCollector.Instance.AddScore(score);
        }
        
    }
    
}
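
To make the reward scheme in the script easier to see at a glance, here is a minimal Python sketch of how one episode's cumulative reward adds up. The two reward values are taken from Jumper.cs; the event names ("score", "collision") and the function itself are made up for illustration and are not part of ML-Agents.

```python
# Reward values copied from Jumper.cs; the event names are invented for this sketch.
REWARD_SCORE = 0.1        # OnTriggerEnter with the "score" tag
REWARD_COLLISION = -1.0   # OnCollisionEnter with "Mover" / "DoubleMover"


def episode_return(events):
    """Sum the rewards of one episode.

    A collision ends the episode (EndEpisode() in the script),
    so events listed after it are ignored.
    """
    total = 0.0
    for event in events:
        if event == "score":
            total += REWARD_SCORE
        elif event == "collision":
            total += REWARD_COLLISION
            break
    return total


if __name__ == "__main__":
    # Three Movers cleared, then a crash.
    print(round(episode_return(["score", "score", "score", "collision"]), 2))
```

Because a single crash costs as much as ten successful jumps earn, the agent is pushed mainly toward avoiding collisions.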
 
* Things to watch out for when following the tutorial *

1) Training

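Training is started with mlagents-learn trainer_config.yaml --run-id="JumperAI_1", but the format of trainer_config.yaml seems to have changed. The file no longer works as written in the tutorial and has to be rewritten as follows. Before the change, training failed with an error like "The option default was specified in your YAML file, but is invalid."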

# trainer_config.yaml
behaviors:
  Jumper:
    trainer_type: ppo
    hyperparameters:
      batch_size: 128
      buffer_size: 2048
      learning_rate: 3.0e-4
      beta: 5.0e-3
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: true
      hidden_units: 64
      num_layers: 2
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    keep_checkpoints: 5
    max_steps: 5.0e7
    time_horizon: 64
    summary_freq: 10000
    threaded: true
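
A quick way to tell which format a config file uses before starting a run is to look at its top-level keys. The sketch below does this with a plain text scan (no YAML parser). It assumes only what appears in this post: the working file starts with behaviors, and the error message complains about a top-level default option. The helper names and the OLD_STYLE sample are hypothetical.

```python
import re


def top_level_keys(yaml_text):
    """Return the unindented keys of a YAML document (simple text scan)."""
    keys = []
    for line in yaml_text.splitlines():
        # Indented lines and "#" comments do not match this pattern.
        match = re.match(r"^([A-Za-z_][\w-]*)\s*:", line)
        if match:
            keys.append(match.group(1))
    return keys


def looks_like_new_format(yaml_text):
    """True if the file is organised under 'behaviors' and has no top-level 'default'."""
    keys = top_level_keys(yaml_text)
    return "behaviors" in keys and "default" not in keys


# Shortened samples for illustration only.
NEW_STYLE = """# trainer_config.yaml
behaviors:
  Jumper:
    trainer_type: ppo
"""

OLD_STYLE = """default:
  batch_size: 128
"""

if __name__ == "__main__":
    print(looks_like_new_format(NEW_STYLE))
    print(looks_like_new_format(OLD_STYLE))
```

This only checks the outermost structure; it does not validate the individual hyperparameters.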
 
 

2) Checking the results with TensorBoard

Move to the TrainerConfig folder, run tensorboard --logdir "your results folder name (the default is results)", and

open http://localhost:6006/ to see the training results.

* The tutorial uses --logdir summaries, but the default folder name seems to have changed to results.

=> Training results as seen in TensorBoard: trained with 24 copies of the environment in total.

3) Applying the model

In the last step, when you want to apply the model and watch the Player play the game,

find a file ending in .onnx, such as Jumper-1127784.onnx, under TrainerConfig > results (the results folder) > Jumper,

and move it into the A.I.-Jumping-Cars-ML-Agents-Example > Assets folder. The model then shows up in the Assets folder of the Unity project and can be assigned to Model in the Player Agent's Behavior Parameters.
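
Picking the right file by hand gets tedious when a run leaves several checkpoints behind, so the step above can be scripted. This is a rough sketch, assuming the checkpoint names follow the Jumper-1127784.onnx pattern seen here (behavior name, dash, step count). The function names are my own, and the script copies the file rather than moving it so the results folder stays intact.

```python
import re
import shutil
from pathlib import Path


def latest_onnx(run_dir):
    """Return the .onnx file with the highest step count in its name, or None."""
    best, best_step = None, -1
    for path in Path(run_dir).glob("*.onnx"):
        match = re.search(r"-(\d+)\.onnx$", path.name)
        # Files without a step number are only used if nothing else exists.
        step = int(match.group(1)) if match else -1
        if best is None or step > best_step:
            best, best_step = path, step
    return best


def copy_model_to_assets(run_dir, assets_dir):
    """Copy the newest checkpoint from run_dir into the Unity Assets folder."""
    model = latest_onnx(run_dir)
    if model is None:
        raise FileNotFoundError(f"no .onnx file found in {run_dir}")
    target = Path(assets_dir) / model.name
    shutil.copy2(model, target)
    return target
```

It would be called with the two folders from the step above, for example copy_model_to_assets("results/Jumper", "A.I.-Jumping-Cars-ML-Agents-Example/Assets"); adjust both paths to your own layout.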

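* Additional implementation beyond the tutorial *

- Looking at the trained model, it sometimes jumps even when there is no Mover. I would like to improve the model by giving a reward of about -0.1 in those cases.

<Idea>

(1) Just as a transparent cube is placed behind each Mover, place a transparent cube in the air so that the Player hits it when it jumps.

At that moment, cast a ray downward; if there is no Mover below, it was a wasted(?) jump, so give a reward of -0.1 and continue training.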

(2)
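
Idea (1) boils down to one decision at the top of each jump, which can be sketched in Python before writing the Unity version. This is only an illustration of the rule, not tested in the game: the tag names come from Jumper.cs, the -0.1 value from the text above, and the function names and the "None means the ray hit nothing" convention are assumptions of mine.

```python
MOVER_TAGS = {"Mover", "DoubleMover"}   # tags checked in Jumper.cs
WASTED_JUMP_PENALTY = -0.1              # value proposed in the text


def jump_reward(tag_below):
    """Reward added when the Player touches the mid-air cube.

    tag_below is the tag hit by the downward ray, or None if it hit nothing.
    """
    if tag_below in MOVER_TAGS:
        return 0.0                      # the jump was needed, no penalty
    return WASTED_JUMP_PENALTY          # nothing to avoid: wasted jump


def total_jump_penalty(tags_below):
    """Accumulated penalty over all jumps of one episode."""
    return sum(jump_reward(tag) for tag in tags_below)


if __name__ == "__main__":
    # One useful jump followed by two wasted ones.
    print(round(total_jump_penalty(["Mover", None, "Street"]), 2))
```

Keeping the penalty small relative to the -1.0 collision reward should stop the agent from becoming too afraid to jump at all, though the right value would have to be found by training.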

Reference:

[1]: www.youtube.com/watch?v=2Js4KiDwiyU

[2]: github.com/Sebastian-Schuchmann/A.I.-Jumping-Cars-ML-Agents-Example
