Categories
monitoring optimization Performance python Technology

a scalable Analytics backend with Google BigQuery, AWS Lambda and Kinesis

In my previous post, I described the architecture of Gimel – an A/B testing backend using AWS Lambda and Redis HyperLogLog. One of the commenters suggested looking into Google BigQuery as a potential alternative backend.

It looked quite promising, with the potential of increasing result accuracy even further. HyperLogLog is pretty awesome, but trades accuracy for space. Google BigQuery offers very affordable analytics data storage with an SQL query interface.

There was one more thing I wanted to look into that could also improve the Redis backend – batching writes. The current Gimel architecture writes every event directly to Redis. Whilst Redis itself is fast and offers low latency, the AWS Lambda architecture means we might have lots of simultaneous active connections to Redis. As another commenter noted, this can become a bottleneck, particularly on lower-end Redis hosting plans. In addition, any other backend that does not offer low-latency writes could benefit from batching. Even before trying out BigQuery, I knew I’d be looking at much higher latency and needed to queue and batch writes.
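
A minimal sketch of the batching idea, not Gimel's actual code: events are buffered in memory and handed to the backend in one call once the batch is full. `EventBatcher`, `flush_fn` and `max_size` are hypothetical names used only for illustration.

```python
from typing import Callable, List


class EventBatcher:
    """Buffer events and hand them to `flush_fn` in batches (hypothetical helper)."""

    def __init__(self, flush_fn: Callable[[List[dict]], None], max_size: int = 100):
        self.flush_fn = flush_fn      # assumed to perform one batched write to the backend
        self.max_size = max_size      # flush once this many events are buffered
        self._buffer: List[dict] = []

    def add(self, event: dict) -> None:
        """Queue one event; flush automatically when the batch is full."""
        self._buffer.append(event)
        if len(self._buffer) >= self.max_size:
            self.flush()

    def flush(self) -> None:
        """Send everything buffered so far as a single batch."""
        if not self._buffer:
            return
        batch, self._buffer = self._buffer, []
        self.flush_fn(batch)
```

With this shape the backend sees one write per batch rather than one per event, which is the property that matters for a higher-latency store. In a real deployment the buffer would more likely be a queue or stream than a Python list.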

Categories
monitoring optimization Performance python Technology

a Scalable A/B testing backend in ~100 lines of code (and for free*)

(updated: 2016-05-07)

tip-toeing on the shoulders of giants

Before I dive into the reasons for writing Gimel in the first place, I’d like to cover what it’s based on. Clearly, 100 lines of code won’t get you that far on their own. There are two (or three) essential components this backend runs on, which make it scalable and also lightweight in terms of actual code:

  1. AWS Lambda (and Amazon API Gateway) – handle the requests to both store experiment data and to return the experiment results.
  2. Redis – using Sets and HyperLogLog data structures to store the experiment data. It provides an extremely efficient memory footprint and great performance.
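
As a rough illustration of how Sets and HyperLogLog could fit together here (a sketch, not Gimel's actual code): a Set remembers which variants exist, and one HyperLogLog per variant and event counts distinct users. The key names and function names are assumptions. `client` is any object exposing `sadd`, `pfadd` and `pfcount` the way a Redis client does, and `InMemoryClient` is a hypothetical exact-count stand-in for trying the sketch without a server.

```python
def record_event(client, experiment, variant, event, user_id):
    """Register the variant and count the user at most once for this event."""
    client.sadd(f"experiments:{experiment}:variants", variant)
    client.pfadd(f"{experiment}:{variant}:{event}", user_id)


def unique_count(client, experiment, variant, event):
    """Number of distinct users seen for this event (approximate on real Redis)."""
    return client.pfcount(f"{experiment}:{variant}:{event}")


class InMemoryClient:
    """Exact-count stand-in for a Redis client (hypothetical, for offline use)."""

    def __init__(self):
        self.data = {}

    def sadd(self, key, *values):
        self.data.setdefault(key, set()).update(values)

    def pfadd(self, key, *values):
        # A real HyperLogLog stores a small fixed-size sketch, not the members.
        self.data.setdefault(key, set()).update(values)

    def pfcount(self, key):
        return len(self.data.get(key, set()))
```

Recording the same user twice for the same event leaves the count unchanged, which is what makes the structure suitable for counting unique participants and conversions.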

For free?

Categories
monitoring Security Technology

Route53 healthcheck failover for SSL pages with nginx

UPDATE: AWS recently introduced SSL Health checks. So the method in this post should no longer be necessary.


Amazon Route53 offers a DNS healthcheck that allows you to fail over to another host / region if one IP is not responsive. This works great if you want to create a secondary site, or even a simple maintenance page to give your users a little more info than just an empty browser window.

There are some limitations to the healthchecks currently. Route53 allows you to choose between TCP and HTTP. However, there’s no HTTPS / SSL support for URLs.

So what can you do if your site is running only with SSL?

Categories
monitoring Security Technology

Getting a bit creepy

I spend a lot of time working with monitoring solutions, and like to measure and track things. The information we collect from our apps tells us a lot about what’s going on. Who’s using it. How frequently they access it. Where they are from. How much time they spend accessing the app etc. And then there’s a lot we can do as app owners with this data. We can measure it, trend it, slice and dice and produce nice reports. We can also act on this info. Offer people stuff based on their behaviour. Use those ‘lifecycle’ emails to improve conversion. Increase our sales. Bring people back to using our products.

I’m getting used to those supposedly-personal emails from Matt, the founder of Widgets inc. who’s “just checking if I need any help using the product”, or Stuart from Rackspace who has “only one question”. I know it’s automated, but it’s fine. As long as I can hit reply and actually reach a person, that’s ok with me. I pretend to not notice.

Recently, however, I feel that some of those emails are getting a little creepy. A couple of random examples:

Categories
graphite monitoring ruby Technology

Measure *everything*

Just a quick link to my recent talk at Ruby User Group Berlin

Slides are available on github

Categories
graphite monitoring Technology

Graphite Alerts with Monit

I love Graphite. It’s the most robust, flexible, kick-ass monitoring tool out there. But when I say monitoring, I’m actually not describing what graphite really does. In fact, it does almost anything but monitoring. It collects metrics via carbon, it stores them using whisper, and it provides a front-end (both API and web-based), via graphite-web. It does not however monitor anything, and certainly does not alert when certain things happen (or fail to happen).
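
To make the "collects metrics via carbon" part concrete, here is a small sketch of feeding one data point to carbon over its plaintext protocol, where each line is `path value timestamp`. The metric name, host and function names are illustrative assumptions. Port 2003 is carbon's usual plaintext port, but your setup may differ.

```python
import socket
import time


def carbon_line(path, value, timestamp=None):
    """Format one metric as a carbon plaintext line: path, value, unix timestamp."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n"


def send_metric(path, value, host="localhost", port=2003):
    """Send a single metric to a carbon listener (host and port are assumptions)."""
    with socket.create_connection((host, port), timeout=2) as sock:
        sock.sendall(carbon_line(path, value).encode("ascii"))
```

For example, `carbon_line("web.requests", 42, 1462579200)` produces the line `web.requests 42 1462579200` followed by a newline.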

So graphite is great for collecting, viewing and analyzing data, particularly with the multitude of dashboard front-ends, my favourite being giraffe ;-). But what can you do when you want to get an email or a text message when, say, carbon throws some errors, or your web server starts to bleed with 500’s like there’s no tomorrow? Even better – do you want to get an email when your signup conversion rate drops below a certain mark?

Monitoring graphite

So what can you use if you want to monitor stuff using graphite? And what kind of stuff can you monitor? I’ve come across a really great approach using nagios. In fact, I ‘borrowed’ the method the author was using for alerting on 500 errors for my own approach. So I wanted to do something very similar, but I really didn’t want nagios. It’s overkill for me, if all I want is to get an email (or run a script) when something goes wrong.
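
One way such a check could look, as a sketch rather than the method this post goes on to describe: graphite-web's render API can return a series as JSON (`format=json`), where each series carries a `datapoints` list of `[value, timestamp]` pairs. The function names, the metric name and the threshold below are assumptions for illustration.

```python
import json


def latest_value(render_json):
    """Return the most recent non-null value from a render API JSON response."""
    series = json.loads(render_json)
    if not series:
        return None
    # Walk backwards: the newest buckets are often still null.
    for value, _timestamp in reversed(series[0]["datapoints"]):
        if value is not None:
            return value
    return None


def should_alert(render_json, threshold):
    """True when the latest known value is above the threshold."""
    value = latest_value(render_json)
    return value is not None and value > threshold
```

A small wrapper script could fetch the JSON, call `should_alert`, and exit with a non-zero status when it returns true, so that whatever runs the script can send the email.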

Categories
monitoring python Security Technology

Statsd and Carbon security

I’ve written about installing and using Graphite, and it’s a really great tool for measuring all kinds of metrics. Most of the guides online don’t touch on the security aspects of this setup, and there was at least one thing that I thought was worth writing about.

How are we measuring

Metrics we gather from our applications have the following characteristics / requirements:

  • We want to gather lots of data over time.
  • Any single data-point isn’t significant on its own. Only in aggregate.
  • Measuring is important, but not if it slows down our application in any way.
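
These requirements are a large part of why StatsD metrics are typically sent as small UDP datagrams: the application fires the packet and moves on without waiting for a reply. A minimal sketch, assuming a StatsD daemon on its usual port 8125. The bucket names and function names are illustrative.

```python
import socket


def statsd_packet(bucket, value, metric_type="c"):
    """Format a StatsD datagram, e.g. b'signups:1|c' for a counter."""
    return f"{bucket}:{value}|{metric_type}".encode("ascii")


def send_stat(bucket, value, metric_type="c", host="127.0.0.1", port=8125):
    """Fire-and-forget over UDP: no connection setup and no reply to wait for."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(statsd_packet(bucket, value, metric_type), (host, port))
    finally:
        sock.close()
```

The flip side of fire-and-forget is that a lost packet simply means a lost data point, which is acceptable when only the aggregate matters.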

Categories
linux monitoring network Technology wordpress

Webfaction fail. over.

This post starts as a rant about webfaction, but somehow turns into a rave. I recently discovered (the hard way) that I can failover almost any site to a secondary host in a different data centre, all with a few scripts on a webfaction shared hosting account.

Categories
django monitoring python Technology

Fabric Installer for Graphite

fabric-graphite is a fabric script to install Graphite, Nginx, uwsgi and all dependencies on a debian-based host.

Why?

I was reading a few interesting posts about graphite. When I tried to install it however, I couldn’t find anything that really covered all the steps. Some covered it well for Apache, others covered Nginx, but had steps missing or assumed the reader knows about them etc.

I’m a big fan of fabric, and try to do all deployments and installations using it. This way I can re-run the process, and also better document what needs to be done. So instead of writing another guide, I created this fabric script.

Categories
django monitoring optimization python Technology

django memory leaks, part II

In my previous post I talked about django memory management, the little-known maxrequests parameter in particular, and how it can help ‘pop’ some balloons, i.e. kill and restart some django processes in order to release some memory. In this post I’m going to cover some of the things to do or avoid in order to keep memory usage low from within your code. In addition, I am going to show at least one method to monitor (and act automatically!) when memory usage shoots through the roof.