The Scheduler uses a HashSet to remove duplicate URLs, which takes a lot of memory when the number of URLs is huge. Add a BloomFilter to reduce memory usage.
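
To illustrate the idea, here is a minimal sketch of a Bloom-filter-based duplicate check in Java. It is not the project's actual implementation: the class name `BloomFilterDuplicateRemover`, the method `isDuplicate`, the constructor parameters, and the seeded FNV-1a style hash are all assumptions made for this example. The filter is sized with the standard formulas, which work out to roughly 9.6 bits per URL at a 1% false-positive rate.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Hypothetical sketch of a Bloom-filter duplicate check; names are illustrative only. */
public class BloomFilterDuplicateRemover {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    public BloomFilterDuplicateRemover(int expectedUrls, double falsePositiveRate) {
        if (expectedUrls <= 0 || falsePositiveRate <= 0 || falsePositiveRate >= 1) {
            throw new IllegalArgumentException("expectedUrls must be > 0 and 0 < rate < 1");
        }
        double ln2 = Math.log(2);
        // Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hash functions.
        long m = (long) Math.ceil(-expectedUrls * Math.log(falsePositiveRate) / (ln2 * ln2));
        this.numBits = (int) Math.min(Integer.MAX_VALUE, Math.max(64, m));
        this.numHashes = Math.max(1, (int) Math.round((double) numBits / expectedUrls * ln2));
        this.bits = new BitSet(numBits);
    }

    /**
     * Records the URL and reports whether it was (probably) seen before.
     * A false result is always correct; a true result may be a false positive.
     */
    public synchronized boolean isDuplicate(String url) {
        byte[] data = url.getBytes(StandardCharsets.UTF_8);
        // Double hashing: derive k bit positions from two base hashes.
        int h1 = hash(data, 0x811C9DC5);
        int h2 = hash(data, 0x9E3779B9);
        boolean seen = true;
        for (int i = 0; i < numHashes; i++) {
            int index = Math.floorMod(h1 + i * h2, numBits);
            if (!bits.get(index)) {
                seen = false;
                bits.set(index);
            }
        }
        return seen;
    }

    // Seeded FNV-1a style hash over the URL bytes (an arbitrary choice for this sketch).
    private static int hash(byte[] data, int seed) {
        int h = seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }
}
```

Usage would look like `new BloomFilterDuplicateRemover(10_000_000, 0.01).isDuplicate(url)`. The trade-off compared with a HashSet is that a Bloom filter can report false positives, so a small fraction of new URLs may be skipped as duplicates, and entries cannot be removed once added.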