This project applies reinforcement learning to the task of modularity maximization for community detection on graphs.
Each episode in the environment follows the following steps:
An adaptation of Deep Q-Learning is used to optimize the modularity-based reward signal. On synthetic datasets, the agent can almost perfectly recover the community structure used to generate the graphs. Below is the reward trace and the modularity trace over the course of training on a synthetic graph:
See the write-up for full technical details.


