Posts tagged: databases

Looking forward: Crawler version 2.0 and 3.0

Posted September 17 by Dan Cryer

Alongside my work on our current crawler, Wade has been rewriting the system to make better use of what we’ve learned so far and to make a number of changes. The current codebase is a mess: many cron jobs running terribly written classes across two separate crawler systems. The new system is neatly organised with models [...]

PHP/MySQL Scaling Followup

Posted September 14 by Dan Cryer

I posted last week about our work on scaling our crawler application and its corresponding MySQL database, but left it with very little conclusion beyond the fact that Memcached is good. I wanted to follow up with some of the changes we made on Thursday and Friday. Having run the code with the changes we made [...]

When scaling for speed slows you down…

Posted September 4 by Dan Cryer

At work, for the past few days, I’ve been working on the scalability of one of our systems and, hopefully, we’re closing in on the finish line. For some context, the system I’m talking about is effectively a web crawler. It (all too slowly) works through a queue of URLs, downloading the pages and parsing them [...]
