Implement MuZero #136
Labels: enhancement (New feature or request)

Comments
leonard-q pushed a commit that referenced this issue on Mar 10, 2022
leonard-q pushed a commit that referenced this issue on Mar 10, 2022
ramanuzan added a commit that referenced this issue on Sep 14, 2022, with the following squashed commit message:
* [#136] muzero first commit
* add 1st version of muzero MCTS
* modify the process to get pi and action_idx
* add 1st version of muzero network
* implement muzero act and process
* add 1st version of muzero agent
* modify agent representation function input
* modify network structure
* muzero mcts minor fix
* update version 1.0.1 agent
* add test_muzero_agent
* add agent PERBuffer
* mcts + network done
* modify agent
* muzero first learn try
* muzero 1_cycle success
* update muzero config
* remove act pseudo mcts
* add 2nd version of muzero network
* modify network and agent
* restore replay buffer
* add gym trajectory
* gym learn cycle success
* add log_softmax rd
* update agent version 1.1
* update agent version 1.2
* modify s2v, net_activation, num_stack and add lr_decay
* learning check of mlp network
* resnet cycle check
* add num_rb to config
* roll back atari config
* update muzero mlp network to sampled muzero's network
* apply num_rb to mlp network, change config, delete a print that caused an error
* add set_distributed mcts_alpha
* save v in MCTS, removing the leaf-node v lookup during expansion
* append s to MCTS node info to avoid a network calculation during selection
* add num_eval_mcts
* add result metrics max_V, max_reward
* remove epsilon from config, change maxR, maxV to scalar
* Muzero/batch expansion (#158)
  * add pong mlagent for muzero
  * change batch calculation for expansion and apply it using a for loop
  * change repeat of hidden state for MCTS with image and vector
* add optim weight decay (L2 regularization of the loss)
* add next stacked state and fix trajectory append state
* rename trajectory_size to max_trajectory
* modify get_stacked_data method (#162)
* modify network to handle channel and plane calculation dynamically (#163)
* update config; make the activation functions of the representation and dynamics functions identical (#167)
* Muzero/unroll stacked observation (#169)
  * unroll stacked_observation
  * update unroll stacked_observation
  * update get_stacked_data
  * apply black
  * clean up variables
  * fix counting of stacked index
* add PER buffer
* update PER buffer
* implement self-supervised consistency loss
* Muzero/modify state out of range (#176)
  * add on/off setting for out-of-range states
  * fix bugs
  * modify channel and workers
  * modify enable result
  * add start prev state action option
  * add policy train delay
  * restore cartpole config
  * apply black
  * restore value exp
* apply annealing beta
* update config for PER
* update config with high-performance params
* Modify input of mlp's dynamics from hs to next_hs (#183)
  * Drop layers for next hs
* add PER init condition (#184)
* update config (pong_mlagent, atari)
* fix typo in buffer size of Atari
* modify normalization of atari
* first version of tictactoe
* tictactoe done
* apply black
* automate hidden_size and atari config
* remove win check when the agent takes an invalid action
* modify representation's action shape
* put trajectory class into muzero and small fixes
* modify max_func, apply batch norm and input norm to rd
* Muzero/resnet normalize (#189)
  * Normalize by sample in batch
* update config and modify variable names
* modify network representation input a
* reduce max_func
* remove duplicate code
* channel-wise batch normalization
* add sync variable (#156)
  * init actors' networks as the learner network
  * modify processing of rollout buffer stamp
  * delete unused variables
  * modify ppo mujoco config
  * cancel rainbow process stamp
  * restore agent.rainbow.py
* fix tmp_buffer size to self.n_step and unify config 1step (#166)
* test REINFORCE.md
* test with REINFORCE pdf
* update algorithm descriptions
* modify PER and Rainbow DQN pdf files
* modify TD3
* Update mujoco config as benchmark config (#170)
  * update mujoco config
  * update sac mujoco config
* add citation to README
* Update README.md
* Modify input size for RGB channel (#160) (#177)
* update muzero mcts (change dirichlet noise, max action for selection)
* separate rb (#197)
  * rb deepcopy
  * reset config
* breakout train successful version (#198)
* add noise to action to encourage exploration
* update mcts expansion: update expanded node info only for the selected one
* change 0 to None in MCTS expansion
* Muzero/modify end of trajectory (#202)
  * apply over trajectory size
  * apply prev obs and full bootstrap calc
  * update muzero config
  * rebase 136 branch
  * fix max_trajectory_size
  * restore config and remove unused code
  * restore network
  * Update muzero.py
* update mcts backup
* apply black
* optimize backup
* config set
* network black
* fix typo; MuZero -> Muzero
* fix deprecated head
* modify default network; resnet -> mlp
* fix typo
* Update unittest.yml
* self.memory.buffer_counter -> self.memory.size
* Update test_muzero_agent.py

Co-authored-by: root <[email protected]>, erinn-lee <[email protected]>, kan-s0 <[email protected]>, leonard-q <[email protected]>, ramanuzan <[email protected]>
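Many of the commits above iterate on MCTS selection, expansion, and backup. As a point of reference, here is a minimal sketch of the pUCT child-selection rule from the MuZero paper; the `Node`/child attribute names (`prior`, `visit_count`, `q_value`) are hypothetical, not this repository's actual MCTS classes:

```python
import math

# Exploration constants c1, c2 from the MuZero paper (Schrittwieser et al., 2019).
C1, C2 = 1.25, 19652

def select_child(node):
    """Return the (action, child) pair maximizing the pUCT score.

    Assumes `node` is already expanded and each child exposes `prior`,
    `visit_count`, and `q_value` (hypothetical attribute names).
    """
    total_visits = sum(c.visit_count for c in node.children.values())

    def puct(child):
        # The prior-weighted exploration term shrinks as the child is visited.
        prior_term = child.prior * math.sqrt(total_visits) / (1 + child.visit_count)
        prior_term *= C1 + math.log((total_visits + C2 + 1) / C2)
        return child.q_value + prior_term

    return max(node.children.items(), key=lambda kv: puct(kv[1]))
```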
Please describe the feature you want to add.
A clear and concise description of the feature. E.g., I'm going to implement ...
Implement MuZero algorithm
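For context, MuZero learns three functions: a representation h mapping observations to a hidden state, a dynamics g predicting the next hidden state and reward from a state-action pair, and a prediction f producing policy and value. A minimal PyTorch sketch for vector observations, where layer sizes and names are illustrative rather than the network this issue will ultimately ship:

```python
import torch
import torch.nn as nn

class MuZeroMLPSketch(nn.Module):
    """The three MuZero functions for vector observations; layer sizes
    and names are illustrative, not this repository's actual network."""

    def __init__(self, obs_dim, action_dim, hidden_dim=128):
        super().__init__()
        # h: (stacked) observation -> hidden state
        self.representation = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )
        # g: (hidden state, one-hot action) -> next hidden state, reward
        self.dynamics = nn.Sequential(
            nn.Linear(hidden_dim + action_dim, hidden_dim), nn.ReLU(),
        )
        self.reward_head = nn.Linear(hidden_dim, 1)
        # f: hidden state -> policy logits, value
        self.policy_head = nn.Linear(hidden_dim, action_dim)
        self.value_head = nn.Linear(hidden_dim, 1)

    def initial_inference(self, obs):
        hs = self.representation(obs)
        return hs, self.policy_head(hs), self.value_head(hs)

    def recurrent_inference(self, hs, action_onehot):
        next_hs = self.dynamics(torch.cat([hs, action_onehot], dim=-1))
        return (next_hs, self.reward_head(next_hs),
                self.policy_head(next_hs), self.value_head(next_hs))
```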
Additional requirement
A clear and concise description of any additional requirements for the new feature
None
Reference
Please append references for the feature
https://arxiv.org/pdf/1911.08265.pdf
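The paper linked above (Appendix F) squashes rewards and values with an invertible transform before projecting them onto a categorical support, which several commits on this branch touch (e.g. "restore value exp"). A minimal sketch of that transform and its closed-form inverse, assuming PyTorch; the function names are placeholders:

```python
import torch

EPS = 0.001  # epsilon used by the paper's transform

def scale_target(x: torch.Tensor) -> torch.Tensor:
    # h(x) = sign(x) * (sqrt(|x| + 1) - 1) + eps * x
    return torch.sign(x) * (torch.sqrt(torch.abs(x) + 1) - 1) + EPS * x

def unscale_target(h: torch.Tensor) -> torch.Tensor:
    # Closed-form inverse of h, also given in the paper's appendix.
    return torch.sign(h) * (
        ((torch.sqrt(1 + 4 * EPS * (torch.abs(h) + 1 + EPS)) - 1) / (2 * EPS)) ** 2 - 1
    )
```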