Blog

Helping you keep your server online and your website fast.

Four years running Server Check.in

It's been nearly four years since I launched Server Check.in. A lot has changed since then, but the core of the service is exactly the same: focus on simplicity, and send alerts for servers or websites via email and/or SMS.

I started this service as a ridiculously cheap alternative to much more expensive (and feature-heavy) services like Pingdom, Ruxit, NewRelic, AppNeta, etc. My idea starting Server Check.in was simply: I need a cheap way to monitor all my servers and get a reliable SMS notification every time one goes down, from the same phone number each time.

Anything else is icing for many of the servers and services I run, and I often supplement Server Check.in with open source monitoring tools like Munin and Elasticsearch/Logstash/Kibana if I need more in-depth monitoring.

Many people have asked a few other questions about the service, though, and I thought I'd answer them here:

How is business? How many customers do you have?

A lot of people have asked about this, and in the past I've been coy with hard data. But I figured it's been four years, the service is running strong, and I don't think hockey-stick growth is in the cards.

In the first year, I grew to about 70 subscribers with only the $15/year plan—around $1,000 in Annual Recurring Revenue (ARR). The subscriber base has grown to over 120 (with no marketing outside of word-of-mouth), and now that I've added a $48/year plan, ARR is about $2,100. This is nowhere near a life-sustaining amount of money, but my goals early on were to basically build an SMS-sending uptime monitoring service for dozens of my own servers—not to make any major profits.

The costs to run the service were (and still are) minimal (besides my time of course!):

  • DigitalOcean prod Drupal 2GB droplet: $20/month
  • DigitalOcean hot spare Drupal 512MB Droplet: $5/month
  • 8-10 globally-distributed Low End Box-style servers: ~$120/year
  • 'Close-to-unlimited' international Twilio SMS: ~$40/month

Annual recurring revenue: ~$2,100
Annual expenses: ~$900
Annual profit: ~$1,200
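For anyone who wants to check the math, here is a quick sketch of how those totals fall out of the line items above. The variable names are just for illustration, and all figures are the approximate ones quoted in this post.

```python
# Approximate line items copied from the list above (USD).
monthly_costs = {
    "DigitalOcean prod droplet (2GB)": 20,
    "DigitalOcean hot spare droplet (512MB)": 5,
    "Twilio SMS": 40,
}
yearly_costs = {
    "Globally-distributed check servers": 120,
}

annual_revenue = 2100  # approximate ARR quoted above

# Monthly items are billed 12 times a year; yearly items once.
annual_expenses = sum(monthly_costs.values()) * 12 + sum(yearly_costs.values())
annual_profit = annual_revenue - annual_expenses

print(f"Annual expenses: ~${annual_expenses:,}")  # Annual expenses: ~$900
print(f"Annual profit:   ~${annual_profit:,}")    # Annual profit:   ~$1,200
```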

It's not a huge deal, but it is enough to give me spare cash to do things like build Raspberry Pi Clusters for fun and education, buy little things like a nice new trackpad and keyboard every year, and pay for a bunch of other servers used for nonprofits or local user groups.

If I were in this for the profit, I would have to work a lot on marketing, increase the plan pricing, stop using 'real' SMS (which costs money) and fall back to the email-to-SMS gateways most of the free or cheaper services use, and push more people towards the higher-profit-margin plans.

As it is, for the past two years I've spent minimal time doing anything besides ongoing maintenance, so I'm probably going to keep things as-is at least until I work on a couple of long-term goals, like a Drupal 8 and API-driven replatform, and moving all the backend servers to a Go app instead of Node.js.

Technically, the service is performing over 100,000 individual checks per day, tracking an average of 150 outages per day, monitoring over 500 servers.

Do you run the service by yourself? Is it self-sustaining?

Yes, and yes—but there's no way I could quit my day job with Server Check.in alone. Throw in Hosted Apache Solr and writing projects like Ansible for DevOps, and it might be more plausible.

But to turn a SaaS product into something that sustains more than one person requires a higher profit margin and a lot more marketing.

What are some of the best decisions, in hindsight?

The original version of the site ran on PHP alone, with one server performing all checks and sending all notifications. Rebuilding the server checking functionality as a microservice that ran on Node.js allowed me to architect it better for scalability (see older post, Moving functionality to Node.js).

After that move, I also migrated all the shell-scripted server build process entirely to Ansible, so all servers are managed with an 'infrastructure as code' approach. I can now spin up a new check server, no matter what low-cost hosting provider I rent space from, in about 5 minutes. And I can also bring up the entire stack locally using Vagrant and Ansible for testing in a matter of 10 minutes or so (even allowing me to test situations like high network latency between servers, all on my local Mac).

Another decision I made at the outset was to use Stripe for payment processing instead of some of the other payment processors I had used for other services in the past, like PayPal or Authorize.net. Stripe's developer-centered focus attracted me, and the UX and low fees are bonuses. Stripe has been reliable, easier to deal with than any other payment processor I've used, and integrating automated tests with Stripe's built-in test environment is a breeze!

Finally, one decision which I've waffled on, but in the end is probably best, is that I used Drupal to build the site's API and front end. Drupal is my golden hammer, but that doesn't mean bending a system like Drupal to do something that might be better suited to a smaller framework or even a different language entirely is a bad thing. It works for me, it's been highly resilient (even running everything from the UI to the API backend), and it's aged well, since the front-end of the site is extremely focused. I just needed a system to allow user registration and access management, content management, and content display—Drupal handily checks off those boxes and is plenty fast.

Why haven't you offered a free tier?

This was a decision I've gone back and forth on dozens of times, but I always end up sticking with no-free-plan. There are dozens of server uptime monitoring systems that offer a free tier—usually with just email, checks only every 5 or 15 minutes, and if any SMS, only 'email-to-SMS' gateways instead of SMS from a real phone number (since the latter costs money).

I almost implemented the same thing for Server Check.in, but I realized that every minute I spent working on that plan, and spent helping customers who never planned on converting to the paid plan, was a minute taken away from helping the people who make the service possible—paid clients.

In addition, having a free plan early on would've likely killed the service on the scalability front, because it took a solid year or so to iron out all the wrinkles involved in building a resilient, distributed microservices-based architecture for the server checking backend.

What makes Server Check.in better/different than free uptime monitors?

From the beginning, there have been many people who ask the question "if there are already [X] number of [server uptime monitors], why build a new one?" (This question is asked of almost any new software product these days). There are some small differentiators, of course, like:

  • SMS messages sent via Twilio from a unique, consistent phone number (so you can set a ringtone for notifications, and receive SMS in any country)
  • 1 minute check frequency
  • Extremely low price ($15/year for 5 servers)

But the main reason was stated earlier: I wanted something like Pingdom et al., but with just one feature (tell me when my server's down) for all my servers, without having to pay a ton of money. So I built the service for myself, and would gladly pay myself $15/year for the service (though I've graduated my own plan to a 50-user plan for $48/year!).

How do you stay motivated to keep the service running after four years?

There are so many 'side-projects-as-services' I've seen that don't gain much traction and are then abandoned a year or two later. But because I use it myself, and because I've set it up to be very low-maintenance, I've been able to keep things running without a hitch for four years (and counting).

It's also continually fresh for me, as I can try out a new language or technique, or hone my skills on a particular feature. For example, in 2016, even though I've only worked on the site maybe 20 hours total, I've been able to reduce the mean authenticated page load time (as measured by GTMetrix and Pingdom) from 2 seconds to < 1 second.

I also use the service as a test bed for Ansible, and have used the experience to improve and expand my book, Ansible for DevOps.

Related

Here are a couple other posts from pivotal moments in Server Check.in's lifespan:

I'll likely do another retrospective in 2018 or so—see you then!

- Jeff Geerling

Responsive Emails, SVG on the site!

At long last, we had time to update some of the graphics and CSS driving the site and email design; for a very long time, emails would extend beyond the bounds of smaller device screens, making for a frustrating side-scrolling experience on mobile.

Additionally, way back when the original version of this site was designed, SVG files were not well supported in all browsers, so we used a 'poor man's retina compatibility mode' by uploading a few double-resolution PNG files for certain graphics... meaning things looked sharp on high-res displays, but the Server Check.in site would take a little longer to load—never a good thing for a service that holds itself to a standard of speed and simplicity!

So for emails, we futzed with the one-column layout to make them completely responsive, meaning you can read the email on the smallest mobile device or a giant 5K display, with no need for scrolling.

And for the site, we regenerated all the vector graphics as SVGs instead of double-resolution PNGs. The results of this change are pretty dramatic—average page load time went from 1.3s (for 156KB of assets) to 1.0s (for 89.4KB of assets).

That's a roughly 43% reduction in data, and a roughly 23% reduction in page load time!

If you have a site that still uses rasterized graphics where you could be using SVGs, consider taking the plunge and converting all the graphics. All modern browsers now support SVGs very well, and there's really no reason to use GIF, PNG, or JPEG if you don't have to!

Server Check.in Architecture and Dashboard Improvements - Summer 2014

Over the past few months, a few improvements have been made to Server pages, as well as some major underlying infrastructure changes which have helped to dramatically reduce false positives.

More information about server outages

For every server in your account, data about which server reported an outage, and the reason for the outage (a 'status code'), are shown on your Server's main dashboard page, in the 'Outages' section. This should help you determine if your site may be having trouble only in a particular geographical region, or if your site is having a particular issue (like DNS trouble, or the page is returning a specific non-200 HTTP code).

Hopefully this extra data will be of use! We're working on integrating the data into email communications, and possibly SMS messages as well, so be on the lookout for that—or not, if you have 100.0% uptime!

General architecture improvements

Some users were reporting periods of frequent false-positives, where a server would reportedly go down-and-up many times in a short period of time. This was understandably very annoying, and was also extremely hard to reproduce, since the problem had to do with network connection issues between check servers (which are spread around the world, and among different hosting providers and networks!).

We have modified Server Check.in's check architecture so that, when one check server reports an outage, redundant confirmation checks are now run from check servers in different areas than the server that originally reported it, instead of confirming the outage from only one or two servers (potentially in the same region).
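To make the idea concrete, here is a minimal sketch of region-aware confirmation in Python. It is not the actual Server Check.in code (the real check servers run on Node.js): the `CheckServer` type, the function names, the region labels, and the rule that every confirming server must agree are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class CheckServer:
    name: str    # hypothetical identifier
    region: str  # hypothetical region label


def pick_confirmers(reporter: CheckServer, servers: List[CheckServer],
                    count: int = 2) -> List[CheckServer]:
    """Pick up to `count` servers, each in a distinct region that is not the reporter's."""
    chosen: List[CheckServer] = []
    seen_regions = {reporter.region}
    for server in servers:
        if server.region not in seen_regions:
            chosen.append(server)
            seen_regions.add(server.region)
        if len(chosen) == count:
            break
    return chosen


def confirm_outage(reporter: CheckServer, servers: List[CheckServer],
                   is_down: Callable[[CheckServer], bool], count: int = 2) -> bool:
    """Treat an outage as confirmed only if every out-of-region confirmer also sees it."""
    confirmers = pick_confirmers(reporter, servers, count)
    if not confirmers:
        return False  # assumption: with no independent confirmer, do not alert
    return all(is_down(server) for server in confirmers)


servers = [
    CheckServer("nyc-1", "us-east"),
    CheckServer("nyc-2", "us-east"),
    CheckServer("ams-1", "eu-west"),
    CheckServer("sgp-1", "asia"),
]
reporter = servers[0]

# Simulate a regional network blip: only the us-east servers cannot reach the target.
unreachable_from = {"nyc-1", "nyc-2"}
confirmed = confirm_outage(reporter, servers, lambda s: s.name in unreachable_from)
print(confirmed)  # False
```

In this sketch, a second server in the reporter's own region is never asked to confirm, so a connectivity problem local to one region cannot confirm itself.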

So far, it's been over two weeks since we deployed this improvement, and reports of false positives have decreased to zero!

As part of this work, we also did extra load testing to ensure that Server Check.in will be able to continue to quickly scale as our user accounts grow—the original version of Server Check.in could've only supported a few hundred customers at 10 minute checks, but the current architecture can support at least a thousand customers with 1 minute checks!

Thanks for using Server Check.in, and please contact support if you have any issues or questions!

New Feature: View Invoices Online

Server Check.in aims to keep things simple... but sometimes things are a little too simple. We email a very barebones invoice to you when you sign up for the service, and when your account renews, but we've received many requests to allow viewing full invoices online as well, especially for those who need to submit invoices to business offices.

View invoice link

You can now view your invoices under the 'Billing information' section of your account (see illustration above). To view or print an invoice, log in, click on "Edit Account", click on "Billing Information", then click on the "view" link next to an invoice. We've even added a nice, big "Print" button on the invoice so you don't have to move your mouse up to the File menu :)

Node.js and Drupal — Working Together - Presented at STL.JS May 2014 Meetup

Jeff Geerling presented a case study on Server Check.in at the May 15, 2014 STL.JS Meetup. The presentation details how Server Check.in was built and marketed and how it uses Node.js, Drupal, Ansible, and inexpensive hosting providers to allow rapid development and reliable deployments.

Click through the slideshow below, or view the slideshow using the links below.

Link for full slideshow:

Server Check.in Case Study - Presented at DrupalCamp STL.14

Jeff Geerling presented a case study on Server Check.in at DrupalCamp STL.14. The presentation details how Server Check.in was built and marketed and how it uses Drupal, Node.js, Ansible, and inexpensive hosting providers to allow rapid development and reliable deployments.

Click through the slideshow below, or view the slideshow or full presentation video using the links below.

Links for full slideshow/video:
