Our goal is to develop a reinforcement learning agent that can play against a coin collector agent, learning from its own actions and rewards and improving its performance over time.
To achieve this goal, we have the following steps and tasks:
- Step 1: Set up a static baseline agent to verify that the environment works end to end. This is our priority 1 task; the baseline is a sanity check, not a competitive agent.
- Step 2: Implement the Q-learning algorithm to train our agent. This is the next step once we have a working baseline.
- Step 3: Apply reward shaping to improve the learning speed and quality of our agent, and use multiprocessing to parallelize training and speed up convergence.
- Step 4: Optimize our model until it beats the coin collector agent. We will compare the scores of the two agents and tune our model's hyperparameters to close the gap.
- Step 5: Add a rule-based agent as a second opponent. We will let our agent play against the rule-based agent to see how it adapts to different strategies.
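One way to realize Step 2 is plain tabular Q-learning. The sketch below is illustrative only: the toy chain environment, the epsilon-greedy policy, and all hyperparameters (`alpha`, `gamma`, `epsilon`) are our assumptions for demonstration, not part of the plan.

```python
import random
from collections import defaultdict

def q_update(q, state, action, reward, next_state, n_actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q[(next_state, a)] for a in range(n_actions))
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])

def step(state, action):
    """Toy chain environment (a stand-in for the real game): states 0..3,
    action 1 moves right, action 0 moves left; reaching state 3 gives reward 1."""
    next_state = min(3, state + 1) if action == 1 else max(0, state - 1)
    reward = 1.0 if next_state == 3 else 0.0
    return next_state, reward, next_state == 3

def train(episodes=300, epsilon=0.2, seed=0):
    """Epsilon-greedy Q-learning over the toy chain."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore
            else:
                action = max(range(2), key=lambda a: q[(state, a)])  # exploit
            next_state, reward, done = step(state, action)
            q_update(q, state, action, reward, next_state, n_actions=2)
            state = next_state
    return q

q = train()
```

After training, the greedy policy should prefer moving toward the goal in every non-terminal state; the same update rule carries over to the game once states and actions are encoded.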
The project deadline is 31 August 2023, by which we should have a workable Q-learning agent. For the first three weeks we will focus on the reward shaping function, a key component of the project. The function should give a high positive reward for successful task completion while balancing exploration and exploitation of the state-action space.
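A common starting point for the reward shaping function is potential-based shaping, which adds `gamma * phi(s') - phi(s)` to the environment reward; this form is known to preserve the optimal policy (Ng et al., 1999). The potential used below, negative Manhattan distance to the nearest coin, is purely a hypothetical choice for illustration.

```python
def shaped_reward(base_reward, potential_s, potential_next, gamma=0.99):
    """Potential-based shaping: r' = r + gamma * phi(s') - phi(s)."""
    return base_reward + gamma * potential_next - potential_s

def potential(agent_pos, coin_positions):
    """Hypothetical potential: negative Manhattan distance to the nearest coin,
    so steps that close the distance earn a small positive shaping bonus."""
    if not coin_positions:
        return 0.0
    return -min(abs(agent_pos[0] - cx) + abs(agent_pos[1] - cy)
                for cx, cy in coin_positions)
```

For example, moving from (0, 0) to (1, 0) toward a coin at (2, 0) yields a positive shaped reward, while moving away yields a negative one, which nudges the agent toward coins without changing which policy is optimal.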
Bomberman repo: HandyRL