Implement MuZero #136
Labels: enhancement (New feature or request)

Comments
leonard-q pushed a commit that referenced this issue on Mar 10, 2022
leonard-q pushed a commit that referenced this issue on Mar 10, 2022
ramanuzan added a commit that referenced this issue on Sep 14, 2022, with the following squashed commit message:
* [#136] muzero first commit
* add 1st version of muzero MCTS
* modify the process to get pi and action_idx
* add 1st version of muzero network
* implement muzero act and process
* add 1st version of muzero agent
* modify agent representation function input
* modify network structure
* muzero mcts minor fix
* update version 1.0.1 agent
* add test_muzero_agent
* add agent PERBuffer
* mcts + network done
* modify agent
* muzero first learn try
* muzero 1_cycle success
* update muzero config
* remove act pseudo mcts
* add 2nd version of muzero network
* modify network and agent
* restore replay buffer
* add gym trajectory
* gym learn cycle success
* add log_softmax rd
* update agent version 1.1
* update agent version 1.2
* modify s2v, net_activation, num_stack and add lr_decay
* learning check of mlp network
* resnet cycle check
* add num_rb to config
* roll back atari config
* update muzero mlp network to sampled muzero's network
* apply num_rb to mlp network, change config, delete a print that caused an error
* add set_distributed mcts_alpha
* save v in MCTS, removing the leaf-node v lookup during expansion
* append s to MCTS node info to avoid a network calculation during selection
* add num_eval_mcts
* add result metrics max_V, max_reward
* remove epsilon from config, change maxR, maxV to scalar
* Muzero/batch expansion (#158)
  * add pong mlagent for muzero
  * change batch calculation for expansion and apply it using a for loop
  * change repeat of hidden state for MCTS with image and vector
* add optim weight decay (L2 regularization of the loss)
* add next stacked state and fix trajectory append state
* rename trajectory_size to max_trajectory
* modify get_stacked_data method (#162)
* modify network to handle channel and plane calculation dynamically (#163)
* update config; make the activation functions of the representation and dynamics functions identical (#167)
* Muzero/unroll stacked observation (#169)
  * unroll stacked_observation
  * update unroll stacked_observation
  * update get_stacked_data
  * apply black
  * clean up variables
  * fix counting of stacked index
* add PER buffer
* update PER buffer
* implement self-supervised consistency loss
* Muzero/modify state out of range (#176)
  * add on/off setting for out-of-range states
  * fix bugs
  * modify channel and workers
  * modify enable result
  * add start prev state action option
  * add policy train delay
  * restore cartpole config
  * apply black
  * restore value exp
* apply annealing beta
* update config for PER
* update config with high-performance params
* Modify input of mlp's dynamics from hs to next_hs (#183)
  * Drop layers for next hs
* add PER init condition (#184)
* update config (pong_mlagent, atari)
* fix typo in buffer size of Atari
* modify normalization of atari
* first version of tictactoe
* tictactoe done
* apply black
* automate hidden_size and atari config
* remove win check when the agent takes an invalid action
* modify representation's action shape
* put trajectory class into muzero and small fixes
* modify max_func, apply batch norm and input norm to rd
* Muzero/resnet normalize (#189)
  * Normalize by sample in batch
* update config and modify variable names
* modify network representation input a
* reduce max_func
* remove duplicate code
* channel-wise batch normalization
* add sync variable (#156)
  * init actors' networks as the learner network
  * modify processing of rollout buffer stamp
  * delete unused variables
  * modify ppo mujoco config
  * cancel rainbow process stamp
  * restore agent.rainbow.py
* fix tmp_buffer size to self.n_step and unify config 1step (#166)
* test REINFORCE.md
* test with REINFORCE pdf
* update algorithm descriptions
* modify PER and Rainbow DQN pdf files
* modify TD3
* Update mujoco config as benchmark config (#170)
  * update mujoco config
  * update sac mujoco config
* add citation to README
* Update README.md
* Modify input size for RGB channel (#160) (#177)
* update muzero mcts (change dirichlet noise, max action for selection)
* separate rb (#197)
  * rb deepcopy
  * reset config
* breakout train successful version (#198)
* add noise to action to encourage exploration
* update mcts expansion: update expanded node info only for the selected one
* change 0 to None in MCTS expansion
* Muzero/modify end of trajectory (#202)
  * apply over trajectory size
  * apply prev obs and full bootstrap calc
  * update muzero config
  * rebase 136 branch
  * fix max_trajectory_size
  * restore config and remove unused code
  * restore network
  * Update muzero.py
* update mcts backup
* apply black
* optimize backup
* config set
* network black
* fix typo; MuZero -> Muzero
* fix deprecated head
* modify default network; resnet -> mlp
* fix typo
* Update unittest.yml
* self.memory.buffer_counter -> self.memory.size
* Update test_muzero_agent.py

Co-authored-by: root <[email protected]>, erinn-lee <[email protected]>, kan-s0 <[email protected]>, leonard-q <[email protected]>, ramanuzan <[email protected]>
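Many of the commits above iterate on MCTS selection, expansion, and backup. As a point of reference, here is a minimal sketch of the pUCT child-selection rule from the MuZero paper; the `Node`/child attribute names (`prior`, `visit_count`, `q_value`) are hypothetical, not this repository's actual MCTS classes:

```python
import math

# Exploration constants c1, c2 from the MuZero paper (Schrittwieser et al., 2019).
C1, C2 = 1.25, 19652

def select_child(node):
    """Return the (action, child) pair maximizing the pUCT score.

    Assumes `node` is already expanded and each child exposes `prior`,
    `visit_count`, and `q_value` (hypothetical attribute names).
    """
    total_visits = sum(c.visit_count for c in node.children.values())

    def puct(child):
        # The prior-weighted exploration term shrinks as the child is visited.
        prior_term = child.prior * math.sqrt(total_visits) / (1 + child.visit_count)
        prior_term *= C1 + math.log((total_visits + C2 + 1) / C2)
        return child.q_value + prior_term

    return max(node.children.items(), key=lambda kv: puct(kv[1]))
```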
Please describe the feature you want to add.
A clear and concise description of the feature. E.g., I'm going to implement ...
Implement MuZero algorithm
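For context, MuZero learns three functions: a representation h mapping observations to a hidden state, a dynamics g predicting the next hidden state and reward from a state-action pair, and a prediction f producing policy and value. A minimal PyTorch sketch for vector observations, where layer sizes and names are illustrative rather than the network this issue will ultimately ship:

```python
import torch
import torch.nn as nn

class MuZeroMLPSketch(nn.Module):
    """The three MuZero functions for vector observations; layer sizes
    and names are illustrative, not this repository's actual network."""

    def __init__(self, obs_dim, action_dim, hidden_dim=128):
        super().__init__()
        # h: (stacked) observation -> hidden state
        self.representation = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )
        # g: (hidden state, one-hot action) -> next hidden state, reward
        self.dynamics = nn.Sequential(
            nn.Linear(hidden_dim + action_dim, hidden_dim), nn.ReLU(),
        )
        self.reward_head = nn.Linear(hidden_dim, 1)
        # f: hidden state -> policy logits, value
        self.policy_head = nn.Linear(hidden_dim, action_dim)
        self.value_head = nn.Linear(hidden_dim, 1)

    def initial_inference(self, obs):
        hs = self.representation(obs)
        return hs, self.policy_head(hs), self.value_head(hs)

    def recurrent_inference(self, hs, action_onehot):
        next_hs = self.dynamics(torch.cat([hs, action_onehot], dim=-1))
        return (next_hs, self.reward_head(next_hs),
                self.policy_head(next_hs), self.value_head(next_hs))
```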
Additional requirement
A clear and concise description of any additional requirements for the new feature
None
Reference
Please append references for the feature
https://arxiv.org/pdf/1911.08265.pdf
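The paper linked above (Appendix F) squashes rewards and values with an invertible transform before projecting them onto a categorical support, which several commits on this branch touch (e.g. "restore value exp"). A minimal sketch of that transform and its closed-form inverse, assuming PyTorch; the function names are placeholders:

```python
import torch

EPS = 0.001  # epsilon used by the paper's transform

def scale_target(x: torch.Tensor) -> torch.Tensor:
    # h(x) = sign(x) * (sqrt(|x| + 1) - 1) + eps * x
    return torch.sign(x) * (torch.sqrt(torch.abs(x) + 1) - 1) + EPS * x

def unscale_target(h: torch.Tensor) -> torch.Tensor:
    # Closed-form inverse of h, also given in the paper's appendix.
    return torch.sign(h) * (
        ((torch.sqrt(1 + 4 * EPS * (torch.abs(h) + 1 + EPS)) - 1) / (2 * EPS)) ** 2 - 1
    )
```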