March 26, 2007

How I Integrated Dugg Mirror When I Got Dugg

My regular readers probably didn’t realize, but I was recently on the front page of Digg, Slashdot, and Reddit. Digg and Slashdot are notorious for killing servers that get linked in their stories. If you want to see the stats for the days in question, you should see my post about it. This post is a little geeky, so beware.

Here’s the quick and dirty list for keeping your server up during a Digg Crisis:

  1. Download and enable wp-cache.
  2. Use the .htaccess rule I explain below. It keeps the rest of your site operational and even lets people post comments directly from the mirror!
  3. Turn off miscellaneous plugins. The biggest suspect was my related posts plugin (see below).

CPU Load

When I got on the front page of Reddit, my traffic spiked immediately. My server load went up to about 2.00. A “2.00” is high, but it is manageable and won’t cause any real problems. On a single-CPU server, a “1.00” roughly equates to “100% of CPU used,” so you can guess what a “2.00” means. Realizing Digg was coming next, I knew I had to start making preparations, since this graph shows just how much bigger Digg is than Reddit.
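To make the load numbers concrete, here is a small Python sketch that divides a load average by the number of CPU cores. It is only an illustration: the function names are made up, and the "1.00 is about 100%" reading assumes a single-core machine and is approximate, since Linux also counts processes waiting on disk I/O.

```python
import os


def load_per_core(load_average, cores):
    """Return the load average divided by the number of CPU cores.

    Roughly 1.0 means the CPUs are fully busy; above 1.0, work is queuing.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load_average / cores


def current_load_per_core():
    """Read this machine's 1-minute load average (Unix only) per core."""
    one_minute, _five, _fifteen = os.getloadavg()
    return load_per_core(one_minute, os.cpu_count() or 1)


if __name__ == "__main__":
    # Assumption: a single-core server, so the load is not divided further.
    print(load_per_core(2.00, 1))    # 2.0
    print(load_per_core(500.00, 1))  # 500.0
    # The same 2.00 on a hypothetical four-core box would be comfortable.
    print(load_per_core(2.00, 4))    # 0.5
```

On a multi-core server the raw number is less alarming than it looks, which is why dividing by the core count is a handy first check.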

I knew my hits would increase 100x, so I had to shut down as much unnecessary stuff as possible. First, I wanted to turn off every plugin I had on WordPress and then activate wp-cache, which significantly reduces database overhead. Unfortunately, before I had finished configuring the cache settings, I hit the front page of Digg in record time. The server became unresponsive and I couldn’t even hit the “enable” button. My server load skyrocketed past 500.00. At 500, things stop working.

Integrating Dugg Mirror

Dugg Mirror is a service that creates copies of articles that hit the main page of Digg. Their goal is to serve as a backup in case the main source goes down (as it often does). As soon as my server was Dugg, my objective was to forward all traffic to the mirror.

When my server died, I was racing against time to redirect the traffic. Until I redirected the traffic, I couldn’t do anything to mitigate the problem (such as disabling plugins). It took me about five minutes (due to incredible lag) to connect to the server, go to the correct directory, and edit the .htaccess file to add these lines at the very top:

RewriteCond %{HTTP_REFERER} (digg\.com) [NC]
RewriteRule maybe-google-wanted-to-be-sued-youtube-and-plan-b http://duggmirror.com/tech_news/Why_Google_wanted_YouTube_to_be_Sued/ [R,L,NC]

The first line says, “If this visitor is from digg.com”. The second line says, “then redirect hits to my Google article to the Dugg Mirror.” This redirected everybody who came from Digg to read my article over to the Mirror. Because the rule matches only that article’s URL, the rest of my site kept working, and anybody trying to post a comment directly from the mirror was able to do so.

Upon applying this fix, my server load dropped to 4.00.

To make this work for your dugg articles, use the following:

RewriteCond %{HTTP_REFERER} (digg\.com) [NC]
RewriteRule [article URL without beginning/trailing slash or domain] [dugg mirror URL] [R,L,NC]
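If the two directives feel cryptic, here is a rough Python model of the decision they make. This is only a sketch of the logic, not how Apache implements mod_rewrite. The function name `mirror_redirect` and the example paths are hypothetical, and I escape the dot in `digg\.com` so it matches a literal dot.

```python
import re

# The slug and mirror URL from my own rule.
SLUG = "maybe-google-wanted-to-be-sued-youtube-and-plan-b"
MIRROR = "http://duggmirror.com/tech_news/Why_Google_wanted_YouTube_to_be_Sued/"


def mirror_redirect(referer, path, article_slug, mirror_url):
    """Return mirror_url if the request should be redirected, else None.

    referer:      value of the Referer header ("" if absent)
    path:         requested path, with or without a leading slash
    article_slug: the RewriteRule pattern (a regular expression)
    mirror_url:   the Dugg Mirror copy of the article
    """
    # RewriteCond with [NC]: case-insensitive match on the referer.
    from_digg = re.search(r"digg\.com", referer or "", re.IGNORECASE)
    # In .htaccess the pattern is tested against the path without its leading slash.
    hits_article = re.search(article_slug, path.lstrip("/"), re.IGNORECASE)
    if from_digg and hits_article:
        return mirror_url  # [R,L]: send a redirect and stop processing rules
    return None


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(mirror_redirect("http://digg.com/tech_news/x", "/" + SLUG + "/", SLUG, MIRROR))
    print(mirror_redirect("http://duggmirror.com/", "/wp-comments-post.php", SLUG, MIRROR))
```

The second call returns `None`: a comment posted from the mirror goes to WordPress's `wp-comments-post.php`, which does not match the article slug, so it is never redirected.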

The Finishing Touches

I made sure my most CPU/MySQL intensive plugins were off whenever a visitor arrived from Digg (since they were the ones causing problems). I put a snippet of code around my related posts plugin that looked like this:

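<?php // Run the expensive plugin only when the visitor did not arrive from Digg.
$referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
if (FALSE === stristr($referer, 'digg.com')) {
    /* Intensive plugin-code */
} ?>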

This snippet basically says, “If this visitor is from Digg, don’t do the burdensome plugins.” The thinking is that if a visitor is coming from Digg, I am likely Dugg, so if I wasn’t lucky enough to see it coming, at least I will mitigate some of the problems.

I also disabled my anti-comment-spam plugins and only kept Akismet running (since I hear TechCrunch uses it). I initially disabled Spam Karma 2 as well, but I eventually turned it back on (I prefer it over Akismet).

As a horror story of how a plugin can kill you, I once had the Bad Behavior plugin installed (late 2005). It completely locked up my database when several search bots hit the site because it attempted to log each and every thing the bots did. It took me days to figure out my blog was taking down my entire server because of this plugin! (There is a new version out now, but I am too scared to try it.)

Anyway, my point is for you to be careful with plugins that use the database and only use ones you absolutely need, especially when getting Dugg.

I then finished enabling my blog cache, and my server load fell to 1.00. My server was down for about 5 minutes total. Minutes later, I pointed the traffic back onto my site with WordPress caching enabled and the load sat around 4.00. I can’t know for sure, but I think my server would have survived had I enabled the cache plugin from the beginning (which I tried to do!! :( ).

The next day, I was on Slashdot and my server never went down. This is why I conclude that not enabling the cache plugin had more to do with going down than any other factor.

Non-WordPress Administrators

Note that the web server was fine. MySQL failed. MySQL is much less robust than the web server under this kind of load, and requires significantly more baby-sitting. This is especially true for applications that weren’t designed to scale, such as WordPress. This is why the cache plugin is so powerful: a cached page is served without running the database queries again.
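To show why caching takes so much pressure off the database, here is a minimal page-cache sketch in Python. It is not how wp-cache is actually implemented; `PageCache`, `render_from_database`, and the URL are hypothetical names, and the five-minute lifetime is an arbitrary assumption.

```python
import time


class PageCache:
    """Keep rendered pages in memory for a short time (ttl_seconds)."""

    def __init__(self, render, ttl_seconds=300, clock=time.time):
        self._render = render   # the expensive, database-backed function
        self._ttl = ttl_seconds
        self._clock = clock     # injectable so expiry can be tested
        self._pages = {}        # url -> (expires_at, html)

    def get(self, url):
        now = self._clock()
        cached = self._pages.get(url)
        if cached is not None and cached[0] > now:
            return cached[1]    # served without touching the database
        html = self._render(url)  # cache miss: do the real work once
        self._pages[url] = (now + self._ttl, html)
        return html


database_hits = []


def render_from_database(url):
    """Stand-in for building a page with real SQL queries."""
    database_hits.append(url)
    return "<html>" + url + "</html>"


cache = PageCache(render_from_database, ttl_seconds=300)
for _ in range(1000):
    cache.get("/my-dugg-article/")
print(len(database_hits))  # 1
```

A thousand visitors cost one trip to the database instead of a thousand, which is the whole trick.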

If you have a web application that is MySQL intensive and NOT WordPress, the steps you need to take to keep your site up are different:

  • Minimize the SQL running on the landing page that is getting Dugg. This may involve turning off things like session logging. For example, I turned off a user tracker that inserted a record into the database every time a visitor came to the site. Disabling this sped things up quite a bit.
  • Create as much static content as possible, at least for the first two hours. After the initial surge, traffic will drop to manageable levels (see hourly graphs near bottom). Your best bet is to “fake” part of your application with static content and a disclaimer to come back later.
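  • Increase the memory MySQL is allowed to use (for example, key_buffer_size for MyISAM tables or innodb_buffer_pool_size for InnoDB).
  • Increase the maximum allowed connections to MySQL (the max_connections setting).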
  • Make sure you are using indexes in your queries. The quick and very dirty way of explaining this is if you have a SQL statement “… WHERE blah = 'some value'”, make sure there is an index on the column blah if that table has more than 500 records and that column has many unique values (e.g., ignore columns like status, gender, or active/inactive). No, that’s not the ideal answer (this is why DBAs make the big bucks), but it’s the quick and dirty explanation of why many WordPress plugins tend to contribute to a server dying when getting Dugg. Perhaps I will cover this in more depth another day.
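The index advice in the last bullet is easy to try for yourself. The sketch below uses Python's built-in SQLite as a stand-in for MySQL (an assumption made only so the example runs anywhere; MySQL has its own EXPLAIN statement with different output). The table `visits` and the index name are hypothetical.

```python
import sqlite3


def query_plan(connection, sql, params=()):
    """Return SQLite's plan for a query as one string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the description


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE visits (id INTEGER PRIMARY KEY, blah TEXT)")
connection.executemany(
    "INSERT INTO visits (blah) VALUES (?)",
    [("value %d" % n,) for n in range(1000)],
)

QUERY = "SELECT * FROM visits WHERE blah = ?"

# Without an index, the database has to look at every row.
plan_without_index = query_plan(connection, QUERY, ("value 42",))

# With an index on the filtered column, it can jump straight to the match.
connection.execute("CREATE INDEX idx_visits_blah ON visits (blah)")
plan_with_index = query_plan(connection, QUERY, ("value 42",))

print(plan_without_index)  # a full scan of the table
print(plan_with_index)     # a search that uses idx_visits_blah
```

The exact wording of the plan varies between SQLite versions, but the first should report a scan and the second should name the index.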

I hope this helps!

Filed under: Geeky Stuff, Random Code — Michi @ 1:58 am

4 Comments »

  1. I found your site via Digg & glad I added your feed to the Google reader.
    This was quite insightful, cheers :)

    Comment by Nicholas Orr — March 26, 2007 @ 4:18 pm

  2. Thanks! It’s hard work being “insightful,” but I’ll do my best. ;)

    Comment by Michi — March 26, 2007 @ 9:32 pm

  3. Just passing by (saw the PB on SK2’s homepage)…

    As far as SQL performance hogs go, WP is one of the worst server app around. You seem to know enough about SQL schema to understand the need for indices on tables. Seems like whoever designs WP’s DB schema at the moment: doesn’t.

    Actually, the latest version of SK2 takes on itself to add some index to WP’s own tables (even though I was really loath to have a plugin affect the behaviour of the whole app, this was the only way I could get it to stop dying whenever too many bots would be attempting to post)… But I recommend you look into it yourself: the ‘post’ table could certainly use some more tidying up…

    Also, you may want to look into sk2_mod_security, as a way to reduce the load on SK2/WP2 by weeding out the nastiest IPs at the server level (I am not its main developer, but I will probably post about this SK2 module in the news feed).

    Comment by Dave — March 31, 2007 @ 2:09 am

  4. [...] to your article. DuggMirror makes a copy of all stories submitted to Digg. Michi Kono’s blog offers a guide to redirecting to DuggMirror. Another option is using Coral. Shoemoney offers code [...]

    Pingback by Hosting Lowdown » How to: Survive the Digg Effect (The Ultimate Guide) — May 12, 2007 @ 8:35 pm