Blog

Helping you keep your server online and your website fast.

Node.js and Drupal — Working Together - Presented at STL.JS May 2014 Meetup

Jeff Geerling presented a case study on Server Check.in at the May 15, 2014 STL.JS Meetup. The presentation details how Server Check.in was built and marketed and how it uses Node.js, Drupal, Ansible, and inexpensive hosting providers to allow rapid development and reliable deployments.

Click through the slideshow below, or view the slideshow using the links below.

Link for full slideshow:

Server Check.in Case Study - Presented at DrupalCamp STL.14

Jeff Geerling presented a case study on Server Check.in at DrupalCamp STL.14. The presentation details how Server Check.in was built and marketed and how it uses Drupal, Node.js, Ansible, and inexpensive hosting providers to allow rapid development and reliable deployments.

Click through the slideshow below, or view the slideshow or full presentation video using the links below.

Links for full slideshow/video:

1 minute checks for all accounts

After over a year's worth of testing and expansion, Server Check.in is able to handle 1-minute checks for everyone, regardless of the plan you're using.

Our check infrastructure has not been the bottleneck for some time, since we've expanded to 6 servers throughout the world. The limit was mostly the database server, which could not gracefully handle the load of (potentially) 10,000+ checks per minute. The database structure has been improved and frequent writes have been optimized, and we hope Server Check.in continues to be one of the most inexpensive, simple, and robust services for monitoring your servers and websites!

Server Check.in, one year later

It's been a year since my original Show HN post announcing Server Check.in, so I thought I'd post a reflection after my first year running the service.

Server Check.in is an inexpensive website uptime monitoring service that I started out of my own need—if I had no users, I'd still keep it going. Fortunately, there are enough paid customers to keep me interested, and I've learned a lot from their feedback.

I had a good initial batch of signups after the announcement post on HN, and some other posts on Reddit, Low End Talk, and other forums. These early customers gave me a lot of feedback, and some suggested fairly major revisions to the service. It took a lot of discipline early on to make sure I only worked on the most valuable features first.

I decided early on to focus on stability, scalability, and performance before working on some of the requested features. This has paid off many times, as I have not yet had to revisit Server Check.in's architecture and basic functionality since a couple months in. The distributed Node.js-based server checking is working very well, and the master app server running Drupal with PHP/MySQL (which runs this blog) still has plenty of room for growth.

Below are some of the major lessons I've learned in the past year; I hope you can learn something from my experience.

1. Your priorities are not your customers' priorities.

This is a continual struggle. I have some pretty neat features I'd like to work on to make the site look and function better, but most of the features my customers (both existing and potential) need are less visible or a little less fun to implement. I find that if I develop in a tick-tock cycle, where I build a feature I want and then one a customer wants, I keep the development interesting and keep my customers happy.

2. Sales is an uphill battle (for a developer).

I understand why salespeople seem to be pushy: it takes a lot to convince some people that a product would benefit them, even when they already know it. Especially with for-pay services, you need to make a hard sell to get most of your customers. (This assumes you have a good product in the first place, of course!)

I'm a developer. I like creating cool things, and I don't like 'wasting' my time selling these things. The reality is, though, that development is meaningless without sales, and time spent getting new customers is never wasted.

3. Contributing back to OSS improves you and your code.

Not only is contributing back (patches, documentation, testing, modules, etc.) a Good Thing™ in general, it also helps you build rock-solid components. Instead of spending an hour on a bit of code and coming up with an inflexible solution, you'll end up with a flexible, stable, and much more useful tool or solution if you decide you want to contribute it back to a community.

I don't do sloppy work in any of my projects, but I am especially thorough when I write code for an OSS community. Through my work on Server Check.in, I've been able to supply a Node.js module, some Drupal contrib module patches, and some blog posts on the process of using different OSS platforms together to build Server Check.in.

Finally, if I hadn't been following best practices for the different coding communities in which I participate, I would not have code as secure, test coverage as nearly complete, or infrastructure as highly organized.

4. Keeping a good work-life balance prevents burnout.

In the past, some of my side projects became burdens due to the time I devoted to them, and my excitement for them fizzled after a few months. Most weren't profitable anyways, but I learned this: if you spread yourself too thin, you will become a lot less effective at the things you do. You only have enough bandwidth for a certain number of things.

I develop Server Check.in and other projects in my spare time, and I keep my priorities straight: family first, full-time job second, side projects third. I still ensure my side projects are reliable (Server Check.in has had more than 99.9% uptime this year, as measured by Pingdom), but I split my time in a way that keeps me sane. This helps everyone I have an obligation to serve: wife, kids, co-workers, and customers.

5. Experimenting makes you a better developer.

Without Server Check.in's HTTP request scaling issues, I probably wouldn't have spent much time learning Node.js or learning how to code for asynchronous functionality. I went from being able to check a few hundred servers per minute to being able to check thousands (and more, by simply adding more Node.js check servers).

I also learned (and have adopted for other infrastructure needs) Ansible after realizing my hacked-together shell script deployment strategy wouldn't scale past a few servers. Now instead of taking an hour or two to get a new server spun up, I can have one in a couple minutes.

Finally, I've worked quite a bit on Stripe and Twilio integration, and now know parts of their APIs pretty thoroughly. This has already helped me in my current full-time job, and will continue to pay off—both literally and figuratively!

You aren't allowed to take risks or go off in crazy new directions in most jobs (especially in the 'enterprise' or corporate arena). Flex your development muscles and increase your enjoyment of developing with a side project or two. Who knows? Some of the things you learn might become extremely valuable for your next job—or help you in your current one.

6. It's harder than you think to build a simple tool like Server Check.in.

If you're building a one-off utility to check whether a server is up or down, for a few servers, and you're the only one that will view this utility, it may be simple. But add in historical tracking, latency monitoring, SMS and email notifications, thousands of concurrent requests, a user and billing system, and other bits and pieces, and the project is no longer as simple as it seems.

Many developers (myself included) think to themselves, "this sounds really simple; I'll just build my own". Almost always, it takes more time than you'd think to get everything running. What is your hourly rate? If you're working on a project for 12 hours, at $70/hour, was it really worth the $840 to have a product that isn't as good as one you could've paid for at a fraction of the cost?

Sometimes it's worthwhile because of point 5 above—but other times, what's the point? Don't waste time on side projects that won't hold your interest. Often paying for something or using an open source application that fits most of your requirements is better than spending a lot of time on a bespoke application.

Stay Hungry

These are just a few of the lessons I've learned. I could probably think of many more, but I need some material for next year's retrospective! I hope something you read here can help you when you're deciding what to do next, or how to improve your current projects. If you have any more questions, please let me know on Hacker News or in the comments below.

A little jitter can help (evening out distributed cron-based tasks)

Ever since Server Check.in started using multiple, geographically-distributed servers running a small Node.js app to perform server checks, we've been monitoring the number of checks per server on an hourly basis, and calculating the standard deviation based on the numbers for each server.
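
As a rough illustration of that kind of monitoring (not the actual Server Check.in code), the spread can be computed with a few lines of awk. The server names and hourly counts below are made up, and check_stats is a hypothetical helper name:

```bash
#!/bin/bash
# Sketch only: summarize how evenly checks are spread across servers.
# Input: one "server count" pair per line (hypothetical format).
check_stats() {
  awk '{ n++; sum += $2; sumsq += $2 * $2 }
       END {
         if (n == 0 || sum == 0) exit 1
         mean = sum / n
         var = sumsq / n - mean * mean      # population variance
         if (var < 0) var = 0               # guard against rounding error
         sd = sqrt(var)
         printf "mean=%.1f stddev=%.1f relative=%.1f%%\n", mean, sd, 100 * sd / mean
       }'
}

# Made-up hourly counts for four servers.
printf '%s\n' "atlanta 1400" "dallas 1200" "seattle 1000" "europe 800" | check_stats
# prints: mean=1100.0 stddev=223.6 relative=20.3%
```

Expressing the standard deviation relative to the mean makes the number comparable from hour to hour, even when the total number of checks changes.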

We noticed an alarming trend; some servers were checking more than 40% more servers than others! The main queue server that controls when servers get checked uses timestamps to control when the servers are checked, so we thought to introduce a little jitter by adding +/- 10 seconds to the timestamps every time they were checked. Unfortunately, this did nothing at all to spread the checks among the different servers.

Then we noticed something peculiar: the main queue server is in New York, and the server with the greatest consistent number of checks was the server geographically closest, in Atlanta. Then came the server in Dallas, then one in Seattle, etc. until we reached the stragglers, all located in Europe.

We finally realized the problem: all our servers are synchronized via NTP to within a few ms of each other, and our custom Node.js app queries the master queue server every minute, at the beginning of the minute, for a list of servers to be checked. Because some servers were geographically closer than others, their requests almost always arrived a few ms sooner than others. Because of this, the closer the server was to the master, the more likely it would get a full chunk of servers to check every minute.

Servers that were further away and had slower ping times (~70 ms vs. ~20 ms for the closest server) were more likely to be cleaning up the tail end of the list of servers to be checked in a given minute.
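
The arrival-order effect can be sketched with a toy model. This is an assumption-laden simplification, not the real queue logic: it assumes the master hands each requester a fixed-size chunk in the order requests arrive, and the queue length, chunk size, and the Dallas and Seattle latencies are invented for illustration.

```bash
#!/bin/bash
# Toy model: servers that all fire at the top of the minute reach the
# queue in order of network latency, and each takes up to one chunk.
# Usage: distribute_checks QUEUE_LENGTH CHUNK_SIZE name:latency_ms...
distribute_checks() {
  local remaining=$1 chunk=$2
  shift 2
  local name latency take
  # Sort by latency (second ':'-separated field) to get arrival order.
  printf '%s\n' "$@" | sort -t: -k2,2n | while IFS=: read -r name latency; do
    take=$(( remaining < chunk ? remaining : chunk ))
    remaining=$(( remaining - take ))
    echo "$name $take"
  done
}

# 250 queued checks, chunks of 100 (both hypothetical).
distribute_checks 250 100 seattle:60 atlanta:20 europe:70 dallas:40
# prints:
#   atlanta 100
#   dallas 100
#   seattle 50
#   europe 0
```

Because the latencies barely change from minute to minute, the same ordering repeats every run, which is why the imbalance was so consistent.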

Solution: Add jitter to cron jobs

The solution for us was to add jitter to the cron/periodic jobs on the distributed servers. Instead of all the servers running the same command at the exact same time (every minute), the servers run the job with a little jitter—variation in the precise time the job runs.

There are a few ways to accomplish jitter. The easiest is to run cron itself with the -j [0-60] option, which uses cron's built-in jitter. However, this (a) applies to all cron jobs, not just the one you want to have jitter, and (b) only works with vixie-cron and its derivatives (so, FreeBSD, CentOS 5.x, etc., but not most modern Linux distributions).

The solution we're using involves calling an intermediary shell script from cron (instead of the original command directly), which adds its own jitter. Here's an example of the script:

#!/bin/bash
#
# Run shell-script.sh after a few seconds of delay (jitter).
# @see http://stackoverflow.com/a/16873829/100134
#

# Add jitter by sleeping for a random amount of time (between 0-15 seconds).
# Use % rather than %= so that $RANDOM is only read, never reassigned.
WAIT=$(( RANDOM % 16 ))
sleep "$WAIT"

# Run the original command/script.
/bin/bash /path/to/shell-script.sh

And instead of invoking the original command/script from crontab (crontab -e), we call the shell script instead:

# Contents of crontab.
* * * * * /path/to/jitter-script.sh

We've been running with this setup for a few days now, and the standard deviation is down to within about 5%, which is fine by us. Our server check load is now spread among all our servers evenly, and our capacity and data reliability are improved as well.

It's enough to make us want to dance a little jitterbug :)

Besides server checking, there are many other situations where adding a little jitter can help: when sending backups or grabbing data to or from a particular server, jitter can save that server from getting slammed! Maybe it's time to introduce a little uncertainty into your periodic tasks.
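
For tasks like these, one possible generalization of the wrapper takes the maximum delay and the command as arguments, so a single script can serve several cron entries. This is only a sketch; run_with_jitter is a hypothetical name, not something Server Check.in ships.

```bash
#!/bin/bash
# Sketch: sleep for a random 0..MAX seconds, then run the given command.
# Usage: run_with_jitter MAX_SECONDS command [args...]
run_with_jitter() {
  local max=$1
  shift
  # RANDOM % (max + 1) yields an integer from 0 to max inclusive.
  sleep "$(( RANDOM % (max + 1) ))"
  "$@"
}

# Only act when called with a delay and a command, e.g. from crontab.
if [ "$#" -ge 2 ]; then
  run_with_jitter "$@"
fi
```

Saved under a path of your choosing and made executable, it could be called with a delay of 15 and the original script as arguments. Keeping the random sleep in a bash script, rather than inline in the crontab, avoids relying on the shell cron uses by default, which may not provide $RANDOM.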

New Server Check.in Features - October, 2013

Clock - Check Interval        Earth - Global Checks        Radar - Ping

We've been relatively quiet for the summer months, working hard to improve our infrastructure and many behind-the-scenes aspects of Server Check.in. But there are a few features we're very excited to announce today:

  • 1-minute check intervals for Premium plan, 5-minutes for Standard
    This is the most often requested feature we've had in the past year, and now that our architecture has been improved to the point where more frequent checks won't affect the quality of Server Check.in's core service—notifying you when your servers are up or down—we have added it! Stay tuned, though; there's more to come...
  • More check servers (U.S. and Europe) added
    As mentioned in our blog post about reducing technical debt, we are now checking your servers from four geographically-distributed servers, and will be adding servers as time and budget allow. Please see the Check Servers page for a listing of the servers, their IP addresses, and their geographical locations (login required).
  • Infrastructure improvements (better uptime, even fewer false positives)
    Through the summer, instead of focusing on features, we've been hard at work identifying false positives (there have been very few, but we strive for perfection!), finding bottlenecks, and improving the site's UI and response times.

Server Check.in has continued to improve in reliability, speed, and features since day one; we're proud to report over 99.9% uptime since launch (as reported by Pingdom)! Please continue to post comments here and contact us to let us know what we can do to make Server Check.in better for you, personally!

And, if you're reading this and aren't yet a customer, tell us what you'd like to see from Server Check.in to entice you to sign up!
