This is a continuation of [An SRE’s confession. P(1/2)](/confession). If you haven’t read that yet, you should start there.
For almost the entire time that the site was up in this form, it kept its “Just a hobby” status, and I treated it as such. In reality, though, that had subtly changed over time. For as long as I have been making the FunnyHacks show on YouTube, I have been linking to the site from the videos for more info. Other people were linking to it, and my CV was slowly accumulating links to it. Add to that the fact that I work as an SRE (Site Reliability Engineer). Keeping the company’s sites and services up, reliable, and secure is central to that role. So this site was kind of implicitly telling potential employers that I could do that.
At some point along the way, it had become important and visible enough that if it went down, it would be an embarrassment. Yet in my mind, it still had the “Just a hobby” status. That lasted until a payment to AWS failed and I replaced my payment info. Some weird funkiness on AWS’s side meant that the failed payment didn’t get automatically retried when I updated the payment details, yet the following month’s payment went through fine. I got notifications, but I saw the successful transaction in the bank, so I figured they must have been delayed notifications and that all was now well.
The account got locked, and access to the services was blocked. This happened while I was looking for work, and I knew that my CV was riddled with links to my website, which was now down. After chatting with support, it didn’t take long to fix the problem, although as far as I’m aware at the time of writing (a couple of years later), AWS has not yet fixed that bug. The lesson: when you update your payment details in AWS, manually check whether there are any failed payments, and manually retry them.
In any case, it was now apparent that FunnyHacks.com was no longer “Just a hobby”. This was playing on my mind, but it would have to wait. There was simply too much going on in our lives. And I don’t mean “we don’t have time, because we need to watch that episode of The X-Files”. I’m talking about a major problem or time sink in almost every aspect of our lives: a family death with a complicated and heartbreaking lead-up, a dodgy landlord, regular visits to the hospital (independent of the death and lead-up), our wedding venue going bust, paperwork for the wedding, looking for work, and much, much more. My website sounds important, but it was not important compared to these.
We’ve seen one failure mode above. I’d like to tell you a story from nearly 15 years earlier:
I had just arrived in England and some friends called me in a panic to tell me that their website had just been hacked. “How do you know?”… They sent me the URL. Their content had been replaced by a mascot of a hacking group, with some text letting them know that they had been hacked. “That would do it.” I got SSH access and took a look around. Their server had been used as a proxy for the internet’s worst content, for months. My guess is that the person(s) using the server this way figured they were going to get caught sooner or later, and it was time to ditch the server.
If this were to happen to me, it would be a really bad look. Security is central to my job, and while you have to accept that breaches will happen and do a good job of both preventing them and responding when they do, my site was seriously lacking on the prevention side. Add to that the damage that being associated with that type of content can do, and this was not a scenario I could afford to have happen.
Updates generally come in three forms:
^– Those are grossly over-simplified, but hopefully they give you a bit of a feel for it.
Attitudes towards updates tend to be pretty one-eyed. I’ll likely do a post about that at some point in the future. But for now let’s cover a couple of basics:
When a supplier becomes aware of a vulnerability in their product, they fix that vulnerability, and make that fix available in one of the three forms above.
When a given system involves something sensitive, standards tend to become applicable. Eg
There are many more standards that become applicable depending on what the business is trying to do. When they do, legal and contractual obligations come into play. And usually, for the purposes of this conversation, they focus on security updates (as opposed to feature updates). The reason is that new features tend to change things, which can break fundamental assumptions. To remove this as a reason for not applying security updates, security updates try not to break assumptions. They usually tweak the existing behavior a little so that the end result is close enough to the same, but the vulnerability they are trying to fix is no longer there. In practice this can be really hard, and sometimes the behavior of the code has to change. This is why things sometimes break when security updates are applied. But the number of times that things don’t break is a testament to how much effort developers put in to make sure it’s a smooth transition.
For a business, you can gain a reasonable degree of confidence that things are working as expected through thorough automated and manual testing. But no matter how much testing is done, the risk is still there. So the second part is watching it, monitoring it, and being ready to jump in when it goes horribly wrong.
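As a rough illustration of the “watching it” part, here is a minimal post-update smoke check in Python, using only the standard library. It’s a sketch under assumptions rather than a real monitoring setup: `check_site`, `is_healthy`, and the marker text are hypothetical names, and you’d want to run it from somewhere other than the server being checked. It deliberately checks for expected page content as well as the status code, because a defaced site like the one in the story above can still return a perfectly healthy-looking HTTP 200.

```python
import http.client
import urllib.request


def is_healthy(status: int, body: str, marker: str) -> bool:
    """Healthy only if the page returned 200 AND still contains text we
    expect to see (e.g. the site title). A defaced page usually won't."""
    return status == 200 and marker in body


def check_site(url: str, marker: str, timeout: float = 10.0) -> bool:
    """Fetch `url` and report whether it looks healthy. Any failure to get
    a proper response counts as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", errors="replace")
    except (OSError, http.client.HTTPException):
        # urlopen raises HTTPError (an OSError subclass) for 4xx/5xx
        # responses, and URLError/OSError for DNS failures, refused
        # connections and timeouts; HTTPException covers broken responses.
        return False
    return is_healthy(status, body, marker)
```

For example, `check_site("https://example.com/", "Example Domain")` would be expected to return `True` while that page is up and unchanged. Run from cron or a CI job straight after an update, a `False` is the cue to jump in.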
For an individual or a charity with more limited time or means, this is a much more daunting task. Days that simply aren’t spare can be lost to an update that should have taken less than 5 minutes. This is not to say that it shouldn’t be done. But if you think the answer is obvious, you don’t have the complete picture.
There’s a community side to this as well. A decade or two ago, computers running old versions of Windows were a big problem, silently perpetuating botnets. An unpatched server or computer running old software can be a liability to other servers and computers if appropriate precautions are not taken. There is a lot more to say on this, but it will have to wait for a follow-up post.
Ultimately, though, I remember a lecturer phrasing it very well, many years ago, when it came to backups: “If you would cry if you lost your data, it’s time to do a backup.” And I think that applies beautifully to updates: “If you would cry if you got hacked, it’s time to update.” For a business, the answer is usually “always”. For a site like FunnyHacks that holds no personal data whatsoever, it’s less clear-cut.
Fast forward to this year. We had gradually worked through our life challenges and eventually got things under control. There was still plenty to do, but it wasn’t as critically urgent as it had been. We were even ready to go on our honeymoon, so we booked it for August this year (2020). A few days later COVID-19 hit the news… We’ll get there eventually :D
In the meantime, I had a site running as a single instance in AWS on a nearly 8-year-old OS, with few updates. While I preached doing things right at work, I was doing almost everything wrong with my personal site. It was time to fix that.
In the 10+ years that I had been running the site in that form, both my needs and the computer industry had changed massively. When I set it up, ditching your physical servers and trusting an overseas company to provide virtual machines that would perform well was still a radical idea. I was also doing and planning a lot of dynamic content for my site. And it was a great way for me to experiment and learn about doing things in AWS independently of work.
Now, it’s fair to say that I have more experience with that ;-) and my portfolio of projects has changed enough that dynamic content isn’t that important to me right now. While a virtual machine made a lot of sense 10 years ago, a bucket serving a simple static website makes more sense for my needs now.
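For the curious, here is roughly what “a bucket running a simple static website” can look like, as a minimal sketch assuming AWS S3 and the `boto3` library; it is not a description of the new site’s actual setup. `website_config`, `public_read_policy`, and `enable_static_site` are hypothetical helper names, while `put_bucket_website` and `put_bucket_policy` are real S3 client calls. Note that a public-read policy is rejected while S3 Block Public Access is enabled for the bucket, and many setups put CloudFront in front rather than exposing the bucket directly.

```python
import json


def website_config(index: str = "index.html", error: str = "error.html") -> dict:
    """S3 static-website settings: the object served for directory
    requests, and the object served for errors such as 404."""
    return {
        "IndexDocument": {"Suffix": index},
        "ErrorDocument": {"Key": error},
    }


def public_read_policy(bucket: str) -> str:
    """Bucket policy (as a JSON string) that lets anyone read objects,
    without granting write or list access."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
        }],
    })


def enable_static_site(bucket: str) -> None:
    """Turn an existing bucket into a static website (needs AWS credentials)."""
    import boto3  # third-party dependency: pip install boto3

    s3 = boto3.client("s3")
    s3.put_bucket_website(Bucket=bucket, WebsiteConfiguration=website_config())
    s3.put_bucket_policy(Bucket=bucket, Policy=public_read_policy(bucket))
```

Calling `enable_static_site("my-example-bucket")` with suitable credentials would configure that (hypothetical) bucket; the site itself is then just files uploaded to it, with no OS underneath for you to patch.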
There’s much more to dive into on this topic. So rather than add to an already long article, that will be my next article about the website, which will come out in the next month or two.
So that’s it. That’s my confession. While I had locked down my site early on, I did very little in the way of security updates during its lifetime. At some point that became too much of a risk, and it was time to replace it, which has now been done. I’m aware that I’m lucky it didn’t get hacked. But I don’t feel too guilty about it, given what has happened in my personal life over the last few years; if that were to repeat, my priorities would be the same. I’m also glad that it wouldn’t be a relevant conundrum anymore, given the new design that I’ll go into in the next “/newSite” post.