The learning rates for the actor and critic are too small, so training takes a very long time to converge. Maybe change them to 0.01?
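As a rough illustration of why a small learning rate slows convergence, here is a minimal sketch that runs plain gradient descent on a toy quadratic loss. It is not the actor-critic training loop. The loss f(x) = x²/2 and the function `steps_to_converge` are hypothetical, chosen only because every update shrinks x by the factor (1 - lr), which makes the effect easy to see.

```python
def steps_to_converge(lr, x0=10.0, tol=1e-3, max_steps=1_000_000):
    """Count gradient-descent steps on the toy loss f(x) = x**2 / 2 until |x| < tol.

    The gradient of f is x, so each update is x <- x - lr * x = (1 - lr) * x.
    Hypothetical helper for illustration only; it is not part of the repo.
    """
    x = x0
    for step in range(max_steps):
        if abs(x) < tol:
            return step
        x -= lr * x  # one gradient step with step size lr
    return max_steps  # gave up without converging


for lr in (0.0001, 0.001, 0.01):
    print(f"lr={lr}: {steps_to_converge(lr)} steps")
```

On this toy loss, each tenfold increase in the learning rate cuts the step count by roughly a factor of ten. The same toy also shows the limit: for lr > 2 the factor |1 - lr| exceeds 1 and the iterate diverges, so a larger rate is only faster up to a point.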