Screenshot of my Mastodon instance

TL;DR: Follow the instructions at https://github.com/faevourite/mastodon-oracle-cloud-free-tier.

When you join Mastodon you place trust in whoever runs your server: trust that they moderate the content, keep the service well-maintained, and back up its data. This is easy if you know the people running it! If no one in your friends-and-family circle is running a server already, and you don’t mind taking on the administrative responsibility that managed services like Masto.host would otherwise handle for you, you can host your own without breaking the bank by leveraging Oracle Cloud’s generous free tier.

I did this recently to set up my personal instance! In addition to the reasons above, I wanted to own my data, have a @glyphy.com handle, and be able to make small tweaks to Mastodon itself (like adding custom server emoji). The last two reasons are admittedly all in service of my vanity. Mastodon can verify my identity via links in my profile, and I have writer’s block when it comes to the custom emoji (typist’s block?). But personalization is the soul of IndieWeb and I’m here for it.

While I could just create a free account on Oracle Cloud, spin up a 4-core 24GB ARM-based compute instance (free tier limit) using the console admin UI, and follow the official Mastodon installation instructions, I wanted something more maintainable and automated. If (when?) I mess up my instance beyond repair I’d like to be able to recover quickly.

Here’s what I ended up using to accomplish this:

  • Docker for running everything
    • Core components: Mastodon apps, Postgres DB, and Redis cache
    • Caddy, to serve everything over TLS, with a certificate provisioned using its Cloudflare integration
    • Backups via Kopia
    • Healthchecks.io and New Relic for monitoring (free tiers)
  • Ansible to install and configure all of the above
  • Terraform to provision the underlying cloud infrastructure
  • Cloudflare (free tier again) to manage DNS and provide some bot protection
  • Sendgrid (you guessed it, free tier) for Mastodon emails, such as password recovery
  • Pushover for cron job failure notifications
    • This is the only paid piece here, and it’s optional: a one-time $5 per device. It’s more than paid for itself over the years.

I put all the scripts and manifests together into this GitHub repository, along with instructions on how to get it all running.
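For orientation, here’s a minimal sketch of the Compose service layout. The image tags, service names, and everything omitted (volumes, env files, networks) are illustrative assumptions, not copied from the repo:

# Hypothetical docker-compose skeleton of the stack described above; not the repo's actual file.
services:
  web:         # Mastodon web app (Rails/Puma)
    image: ghcr.io/mastodon/mastodon:<version>
  streaming:   # Mastodon streaming API (Node.js)
    image: ghcr.io/mastodon/mastodon:<version>
  sidekiq:     # Mastodon background jobs
    image: ghcr.io/mastodon/mastodon:<version>
  db:
    image: postgres:15-alpine
  redis:
    image: redis:7-alpine
  caddy:       # TLS termination and reverse proxy (see the Caddy notes below)
    image: caddy:2
    ports:
      - "443:443"
  kopia:       # backups
    image: kopia/kopia:latest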


Below are just some notes about the different choices, mostly so I myself remember them when they invariably turn out to be wrong.

Docker

I don’t want to deal with random OS portability issues or package conflicts. Docker also has the nice side effect of forcing me to think about where everything is stored. An inventory mandate.

One thing I’m worried about is the difficulty of future updates that require manual steps in a specific order. Docker Compose’s “bring everything up together” behaviour spells danger here. I don’t mind a little downtime on this personal instance, but it could be a bigger deal otherwise.
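For those updates, my fallback is bringing services up one at a time. A hedged sketch, using the hypothetical service names from the Compose skeleton above and Mastodon’s standard migration command:

# Illustrative upgrade sequence, not a script from the repo:
docker compose pull                                          # fetch new images
docker compose up -d db redis                                # data stores first
docker compose run --rm web bundle exec rails db:migrate     # run Mastodon DB migrations
docker compose up -d web streaming sidekiq caddy             # then the apps and the proxy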

Caddy

Serves a few purposes:

  1. Multiplexes the “web” and “streaming” containers over the same domain
  2. Enables compression
  3. Sets aggressive caching headers
  4. With its Cloudflare DNS module it can provision a TLS cert, which means that Cloudflare->Mastodon traffic is also over HTTPS, and I can just block port 80 entirely
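
As a rough illustration, a Caddyfile covering those four points could look something like this. The upstream names, ports, and the streaming path are assumptions based on a typical Mastodon setup rather than my exact config, and it requires a Caddy build that includes the caddy-dns/cloudflare module:

mastodon.example.com {
    encode zstd gzip                                # (2) compression

    tls {
        dns cloudflare {env.CLOUDFLARE_API_TOKEN}   # (4) DNS-challenge cert via Cloudflare
    }

    header /assets/* Cache-Control "public, max-age=31536000, immutable"   # (3) aggressive caching

    @streaming path /api/v1/streaming/*             # (1) split streaming traffic from web
    reverse_proxy @streaming streaming:4000
    reverse_proxy web:3000
}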

Kopia

As far as backup software goes, it’s relatively young, but I like it. I recently switched to it for some personal backups. It’s like a fancier restic. I point it at Google Drive via rclone (which is baked into Kopia’s Docker image). If/when I run out of storage there, I may move to Backblaze B2.
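The moving parts are just a one-time repository creation and a recurring snapshot. The remote name and paths below are placeholders, assuming rclone already has a gdrive remote configured:

# One-time: create a Kopia repository on Google Drive through rclone (placeholder remote/path).
kopia repository create rclone --remote-path=gdrive:mastodon-backups
# Recurring (from cron): snapshot the directory holding the Postgres dumps and media.
kopia snapshot create /srv/mastodon/backups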

Healthchecks

This is a recent discovery. It’s like a dead man’s switch for cron jobs, and it has a generous free tier of its own. I have it integrated with Pushover and email, but it supports many other notification systems.
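The integration is just a ping to a per-check URL whenever a job finishes successfully; a typical crontab entry looks something like this (the schedule, script path, and UUID are placeholders):

# Ping Healthchecks.io only if the backup succeeded; a missing ping triggers the alerts.
0 3 * * * /srv/mastodon/backup.sh && curl -fsS -m 10 --retry 3 -o /dev/null https://hc-ping.com/<check-uuid>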

Ansible/Terraform

I first tried using Ansible to provision the infrastructure, but (slowly) realized why Terraform is preferred for this sort of work: it keeps track of state. With Ansible, I would have had to save the ID of every piece of infrastructure somewhere and read it back on each run to keep it from trying to re-create what already exists.
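For a sense of scale, the heart of the Terraform config is little more than a compute instance sized to the free-tier limits. This is a trimmed, hypothetical sketch (networking, image, and SSH key blocks are omitted); Terraform records the created OCIDs in its state file, so re-running it doesn’t duplicate anything:

# Hypothetical free-tier ARM instance; source_details and create_vnic_details omitted.
resource "oci_core_instance" "mastodon" {
  compartment_id      = var.compartment_ocid
  availability_domain = var.availability_domain
  shape               = "VM.Standard.A1.Flex"

  shape_config {
    ocpus         = 4
    memory_in_gbs = 24
  }
}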

Cloudflare

I turned on its proxying for my Mastodon domain. I was worried at first, since Cloudflare is known to interfere with automated, non-browser traffic, but so far I haven’t noticed any problems.

I don’t feel great about using a service that supports right-wing extremists, and I’d like to move to one that’s less harmful. For now, I just promise myself not to give them any of my money. I don’t mind turning off their bot protection, and Caddy can refresh the TLS cert automatically, so they’re really only managing DNS for me right now.

Sendgrid

Their customer service is atrocious and the product is stale, but I already had an account, so that’s what I went with. I turn off all email notifications from Mastodon itself except for password resets.
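For reference, wiring SendGrid into Mastodon is just a handful of SMTP variables in the environment file. The values below are placeholders (SendGrid’s SMTP username really is the literal string “apikey”):

# Hypothetical excerpt of .env.production for outgoing mail via SendGrid
SMTP_SERVER=smtp.sendgrid.net
SMTP_PORT=587
SMTP_LOGIN=apikey
SMTP_PASSWORD=<sendgrid-api-key>
SMTP_FROM_ADDRESS=notifications@example.com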


Overall, I’m very happy with how this all turned out. Maybe one day OCI will clamp down on its free tier or my account will get more popular (unlikely) and I’ll be forced to pay up for more infra, which is ok with me. I also like the idea of being the only one to blame when something goes wrong with my own instance.

I learned about difftastic today, which aims to show differences between files while being aware of the underlying programming language used in said files (if any).

Structured diff output with difftastic on a change in Mailhog

It’s basically magic when it works!

I generally like the built-in diff in the JetBrains suite of IDEs. The one I use these days is GoLand, but I believe they all support adding an external diff tool. Since difftastic is a console app, here’s what I had to do on my Mac:

  1. brew install difftastic # install the tool
  2. Install ttab using its brew instructions. This allows GoLand to launch a new tab in iTerm and run the difft command there. Otherwise, using the External Diff Tool in GoLand would appear to do absolutely nothing, since the tool’s output isn’t displayed natively.
  3. Configure the external diff tool using the instructions for GoLand.
    1. Program path: ttab
    2. Tool name: Difftastic (but can be anything you like)
    3. Argument pattern: -a iTerm2 difft %1 %2 %3
      1. The “-a iTerm2” is to ensure that iTerm is used instead of the default Terminal app.

Now you can click this little button in the standard GoLand diff view to open up the structural diff if needed:

Screenshot of GoLand's diff viewer with a highlight around the external diff tool button

Ideally the diff would be integrated into GoLand, but I don’t mind it being an extra click away, since difftastic doesn’t work reliably in many situations (particularly large additions or refactorings).

Prometheus has this line in its docs for recording rules:

Recording and alerting rules exist in a rule group. Rules within a group are run sequentially at a regular interval, with the same evaluation time.

Recording Rules

I read that a while ago, but at the time it wasn’t clear why it mattered; it seemed that groups were mostly there to give a collection of recording rules a name. It became clear recently, when I tried to set up a recording rule in one group that used a metric produced by a recording rule in another group.

The expression for the first recording rule was something like this:

(
  sum(rate(http:requests[5m]))
  -
  sum(rate(http:low_latency[5m]))
)
/
(
  sum(rate(http:requests[5m]))
)

The result:

Using a recording rule from another group

It’s showing a ratio of “slow” requests as a value from 0 to 1. Compare that graph to one that’s based on the raw metric, and not the pre-calculated one:

Using the raw metric

The expression is:

(
  sum(rate(http_requests_seconds_count{somelabel="filter"}[5m]))
  -
  sum(rate(http_requests_seconds_bucket{somelabel="filter", le="1"}[5m]))
)
/
(
  sum(rate(http_requests_seconds_count{somelabel="filter"}[5m]))
)

The metrics used here correspond to the pre-calculated ones above. That is, http:requests is http_requests_seconds_count{somelabel="filter"}, and http:low_latency is http_requests_seconds_bucket{somelabel="filter", le="1"}. The graphs are similar, but the one using raw metrics doesn’t have the strange sharp spikes and drops.

I’m not sure what’s going on here exactly, but based on the explanation from the docs, it’s probably a race between the evaluation of the two groups resulting in an inconsistent number of samples for http:requests and http:low_latency. Maybe one has one fewer sample than the other at the time the first group’s expression is evaluated, which I think could show up as spikes.

Whatever the cause, the solution is simple: if one recording rule uses metrics produced by another, make sure they’re in the same group.
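
Concretely, the fix looks something like the rules file below. The group name and the name of the ratio rule are made up, but the expressions mirror the ones above; within the group, Prometheus evaluates the rules sequentially at the same timestamp, so the ratio always sees consistent inputs:

groups:
  - name: http_slow_requests            # hypothetical group name
    rules:
      - record: http:requests
        expr: http_requests_seconds_count{somelabel="filter"}
      - record: http:low_latency
        expr: http_requests_seconds_bucket{somelabel="filter", le="1"}
      # Same group, so this runs after the two rules above with the same evaluation time.
      - record: http:slow_request_ratio # hypothetical rule name
        expr: |
          (sum(rate(http:requests[5m])) - sum(rate(http:low_latency[5m])))
          /
          sum(rate(http:requests[5m]))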

I was trying to add this site to the IndieWeb ring last night and found that it couldn’t validate the presence of the previous/next links, even though they were clearly in the footer of every page. I cleared the WordPress and Cloudflare caches without success.

Since the IndieWeb ring runs on Glitch, which is a large public service, I suspected that Cloudflare might be blocking its traffic. Sure enough, I couldn’t find any requests from Glitch in my HTTP access logs, and switching nameservers to my web host resulted in a successful check:

Indieweb ring's status checker log showing two failed checks and one successful one

This was happening even with the “Bot Fight Mode” option turned off, “Security Level” set to “Essentially Off”, and the “Browser Integrity Check” option disabled.

[side note] I love that my autogenerated site identifier for the IndieWeb ring is a person worried about taking pictures and writing. Accurate.

I got an M1 Mac at work recently and hit a strange issue trying to run the GoLand debugger:

Error running '<my test>':
Debugging programs compiled with go version go1.17.8 darwin/amd64 is not supported. Use go sdk for darwin/arm64.

Wait, amd64? I had installed the arm64 go package after the switch to M1, so why did it think I was running on amd64?

It turns out I had used Homebrew on my old Intel Mac to install and run a newer version of my shell (bash). Because amd64 Homebrew installs amd64 binaries, the shell itself was running under amd64, and any app launched from it saw the CPU architecture as amd64 too. Running arch in the shell confirmed my suspicion.

The solution was to switch to the Mac’s built-in bash shell, reinstall Homebrew, reinstall bash, and set iTerm to use /opt/homebrew/bin/bash instead of the default /bin/bash. I followed these instructions for switching to arm64 Homebrew and kept the old Intel Homebrew around aliased to oldbrew, as suggested, so I can still run amd64-only apps.
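
A quick way to check which architecture a shell (and anything launched from it) runs under, using the two Homebrew prefixes mentioned above; output is omitted here, but file reports what each binary was built for and arch reports what the current process is running as:

file /bin/bash               # Apple's built-in bash: a universal (x86_64 + arm64) binary
file /usr/local/bin/bash     # Intel-era Homebrew prefix: x86_64
file /opt/homebrew/bin/bash  # arm64 Homebrew prefix: arm64
arch                         # architecture the current shell is running under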

In re-reading The Wrong Abstraction recently, I’ve realized that while we often talk about code being an artifact of production, it often functions more as a decision log.

In her post Sandi writes about how the Sunk Cost Fallacy plays into the hesitation we as developers feel when encountering perplexing abstractions. Part of the recommendation is to consider that

“It may have been right to begin with, but that day has passed.”

I think this is often how we think about organizational decisions. We make new ones all the time, and they often alter or completely reverse those that came before, even when the people who made the initial choices are no longer around for consultation.

I suspect it’s easier for us to undo organizational decisions than code decisions because the former are made in a more visible and social environment. It’s easy to skip fixing the wrong abstraction when that choice may only be seen by one reviewer as part of an unrelated change. Perhaps this is where pair or mob programming can help. More eyes on the wrong abstraction at the right time could be all that’s needed to address it.

Thinking about code as a decision log could also help. Removing the abstraction is just another event. Events are bound to a specific moment in time, and maybe today is the right moment for an event that reverses some of those previous decisions.

Another goodie from the InfoQ Engineering Culture podcast: CA Agile Leaders on Using Data and Creating a Safe Environment to Drive Strategy. Some quick notes and insights:

  • Biweekly status reports don’t really mean anything. Progress is shown with real data: stories/tasks moving to Done (based on the agreed Definition of Done). This data is critical. Executives need access to it in order to make informed decisions. Q: How do you break down the work in advance effectively? If you’ve got a clear target, but no clear way to get to it, how do you convert that into stories?

The natural human reaction to try and do more when we have a history of not delivering, thus overloading the system even further

  • I didn’t realize how natural this reaction really was until I heard it. If your team’s batting .200, just increase the pitching frequency! Then you’ll definitely end up with more home runs, right? Not sure how far this analogy goes, but I’ll try: what you end up with is non-stop bunting, because there’s no time to wind up or recover.
  • What worked for CA was Management asking everyone in the org to limit their WIP. That’s true leadership courage right there.
  • Organizational temporal myopia: people’s inability to see themselves in the future, applied to organizations. Organizational focus is often on the long-term vision, so individuals end up inflating the amount of work that can get done to get to that vision sooner. I think the implication in the podcast is to focus on and commit to the short-term (sprint?), because you can imagine what the organization will be like a week from now versus a year from now.

Everybody gets mad if you’re not working on their stuff

  • Nothing gets done, but at least it looks like you’re working on everyone’s thing. Addressing this may be a matter of shining a light on what is vs. isn’t being accomplished. Circles back to safety. If you feel you can’t say that you’re gonna miss, it will look like everything’s going great with 67 projects in progress until the last second when all of them go red.

Not one person can save the ship from sinking, and not one person can bring the ship to shore.

  • There’s a culture of heroism, which works against this collective thinking. Leaders are incentivized to work against each other by optimizing their own portion at the cost of other parts of the org. There are individual performance reviews, too. Individuals get praised, which then pushes people to engineer situations in which they get to save the day. CA tries to show that collective work is more effective than individual work, and works with organizations to shift incentive structures away from individuals and toward teams.
  • Act “as-if”. Start showing and encouraging behaviours before they’re made “official”.
  • Withholding information is punishment; it’s actually painful, and people will do this to each other at work. A culture of transparency helps work against these behaviours.

Lately, I’ve been listening to podcasts on my commute and while at the gym. The InfoQ Engineering Culture podcast is probably my favourite one so far.

The last episode I heard was Diana Larsen on Organisation Design for Team Effectiveness and Having the Best Possible Work-Life. The show notes are fantastic, but I thought I’d jot down the points that particularly resonated with me.

Every team needs a purpose

The boundaries of the work are important. Not feature scope, but “what kind of work are we doing?” and “how is it unique?” The agile teams I’ve been on had a purpose, but the boundaries were only loosely defined. This resulted in unexpected work landing on our plate, because ours was the plate it seemed least out of place on, which was as awkward as it sounds. It dilutes the team’s focus and makes it possible for every member to push in a different direction.

A group of people need to build some mutual history to become a high performing team

Shared experiences bring people closer together. It makes a lot of sense, which is why it’s so surprising that teams are often expected to rush through the Forming, Storming, and Norming phases without being able to first, you know, ship something together.

This history can be built quickly when responding to a crisis

This makes me wonder if the most effective time to spin up a team is when there’s an urgent business problem requiring attention. Is there an equally-effective way to start a team with longer-term or less-visible deliverables?

Software is learning work – typing is not where the work gets done

Right, so why do I feel guilty about spending office time learning? There’s a constant sense of urgency in a business (especially a small one, I think) to ship, and pausing to learn seems to work against that goal. I don’t want to read every tech book relevant to my work before writing any code, but I do want to be able to spend time digesting information before rushing to apply it.