I tried to find information on which indexer they are using. Are they using their own?
Edit: the readme says this:
The commoncrawl organization for crawling the web and making the dataset readily available. Even though we have our own crawler now, commoncrawl has been a huge help in the early stages of development.