PHP/MySQL Scaling Followup

Posted September 14 by Dan Cryer

I posted last week about our work on scaling our crawler application and its corresponding MySQL database, but left it without much of a conclusion, apart from the fact that Memcached is good. I wanted to follow up with some of the changes we made on Thursday and Friday.

We ran the code with our previous changes for a day or so to give it a fair trial, and it was obvious it wasn't going to work. We were still averaging 5,000 pages an hour, well below the original code and even further below the targets we'd set ourselves.

The first change we made was to move the queue table back from MEMORY to InnoDB. The MEMORY engine wasn't providing the benefits we'd hoped for, and it was using far more RAM than it was worth. It was also locking, a lot: MEMORY tables only support table-level locks, so every write to the queue blocks all other access to that table. I have to admit that it was probably unwise to expect memory tables to perform any better when working with over a gigabyte of data.

Next, we turned our attention to what had become an obvious bottleneck: statement-based binary logging. We'd made this choice early on, for reasons I'm not aware of. After much research and a little panic, we flipped the switch, turned our crawlers back on and saw an immediate and significant performance improvement.
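If you want to check where your own server stands on both of these, here's a minimal sketch. It's written in Python rather than PHP purely for illustration, and assumes a DB-API 2.0 MySQL driver that uses the `%s` parameter style (PyMySQL or mysqlclient, for example). The table name `queue` and both function names are hypothetical. Since I haven't said which format replaced statement-based logging, the sketch only reads the current `binlog_format` setting rather than changing it.

```python
def check_queue_setup(cursor, table="queue"):
    """Return (storage_engine, binlog_format) for `table` on the current server.

    `cursor` is any DB-API 2.0 cursor from a MySQL driver using the %s
    paramstyle. The default table name "queue" is a hypothetical placeholder.
    """
    # Which storage engine does the table use (e.g. MEMORY or InnoDB)?
    cursor.execute(
        "SELECT ENGINE FROM information_schema.TABLES "
        "WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = %s",
        (table,),
    )
    row = cursor.fetchone()
    engine = row[0] if row else None

    # Which binary log format is the server using (STATEMENT, ROW or MIXED)?
    cursor.execute("SHOW VARIABLES LIKE 'binlog_format'")
    row = cursor.fetchone()  # rows look like (Variable_name, Value)
    binlog_format = row[1] if row else None
    return engine, binlog_format


def convert_to_innodb(cursor, table="queue"):
    """Rebuild `table` using the InnoDB storage engine."""
    # Identifiers can't be bound as query parameters, so quote the name
    # with backticks (doubling any embedded backticks) instead.
    quoted = "`" + table.replace("`", "``") + "`"
    cursor.execute("ALTER TABLE " + quoted + " ENGINE=InnoDB")
```

Bear in mind that `ALTER TABLE ... ENGINE=InnoDB` copies the whole table, so on a gigabyte-sized queue it's best run while the crawlers are stopped.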

We'd finally done it: we were now averaging 25,000 pages an hour, peaking at 45,000. We're still working on it, of course, as we'd like to hit 100,000 pages an hour, but we're very happy to have finally gotten past the bottlenecks.
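To put those numbers in perspective, here's a quick sketch of the arithmetic (the helper name is just for illustration). Going from 5,000 to 25,000 pages an hour is a fivefold improvement, or roughly 6.9 pages a second, while the 100,000 target works out to about 27.8 pages a second, a further factor of four.

```python
SECONDS_PER_HOUR = 3600


def throughput_summary(before, after, target):
    """Compare crawl rates, all given in pages per hour."""
    return {
        "speedup": after / before,                      # improvement over the old rate
        "after_per_sec": after / SECONDS_PER_HOUR,      # current rate in pages/second
        "target_per_sec": target / SECONDS_PER_HOUR,    # target rate in pages/second
        "remaining_factor": target / after,             # how much faster we still need to go
    }


summary = throughput_summary(before=5_000, after=25_000, target=100_000)
print(
    f"{summary['speedup']:.1f}x faster, "
    f"{summary['after_per_sec']:.2f} pages/s now, "
    f"{summary['target_per_sec']:.2f} pages/s needed, "
    f"{summary['remaining_factor']:.1f}x to go"
)
```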

As an additional interesting tidbit, I re-ran the graphs for our Memcache utilisation, this time covering 12:00pm Thursday to 12:00pm Monday. As you can see, the ratio of hits to misses is starting to level out over time, averaging around 70% hits to 30% misses. Here are the charts:

Memcache usage in real numbers under load

Memcached usage under load
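If you'd like to work out the same ratio for your own cache, memcached's `stats` command reports cumulative `get_hits` and `get_misses` counters since the server started. So for a window like Thursday to Monday, you take the difference between two snapshots. Here's a minimal sketch in Python; the function names are hypothetical, and the snapshot figures in the usage lines are invented purely to illustrate, not our data.

```python
def parse_stats(raw):
    """Parse the reply to memcached's text-protocol `stats` command."""
    stats = {}
    for line in raw.splitlines():
        parts = line.split(" ", 2)  # each line is "STAT <name> <value>"
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats  # the trailing "END" line is skipped


def interval_hit_ratio(start, end):
    """Hit ratio between two `stats` snapshots (the counters are cumulative)."""
    hits = int(end["get_hits"]) - int(start["get_hits"])
    misses = int(end["get_misses"]) - int(start["get_misses"])
    lookups = hits + misses
    return hits / lookups if lookups else 0.0


# Invented snapshots, for illustration only.
before = parse_stats("STAT get_hits 1000\r\nSTAT get_misses 500\r\nEND\r\n")
after = parse_stats("STAT get_hits 8000\r\nSTAT get_misses 3500\r\nEND\r\n")
ratio = interval_hit_ratio(before, after)
```

With these made-up numbers, 7,000 of the 10,000 lookups in the window were hits, giving the same 70/30 split we're seeing.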

As usual, please let me know if you have any questions or comments; I (and, I'm sure, the rest of my team) would be happy to help.
