Nothing Easy is Ever Simple or: How I Learned to Stop Bungling and Love Backups

The events described below occurred during the break following the Christmas holiday. I’ve just been too busy/tired/sick/lazy to write it all down until now.

Inspired by @nothingfuture’s post:

I thought to myself, I have plenty of Raspberry Pis lying around, and this seems like a straightforward project…

But then I thought “But wait, I have a custom-built bespoke PC running pfSense that all my traffic’s running through anyway, surely that’s a better target for this project!”

So I fell down a different sort of hole looking for an alternative to a pi-hole.

And I found pfBlocker-NG.

And it installed without a hitch, was super easy to set up, and everything is awesome.

The package actually installed just fine. And then the murders, er, problems began:

  • Problem #1: After installation, the Web UI in pfSense stopped working.
    • It turns out that the version of pfBlocker-NG that I installed, which was the correct version for my version of pfSense, had a dependency on a newer version of PHP (ugh) than the version I had installed.
    • The package manager in pfSense helpfully installed the newer version of PHP. It also helpfully uninstalled the existing version of PHP which pretty much every other installed package was dependent on, and which the newer version was incompatible with.
    • This left me with an unusable web interface.
    • Luckily, I still had a working router, just no easy way to administer it.
    • I had made the change late at night, as I often do, so I went to bed with plans to tackle the problem the next day.
    • By the way, Netgate/pfSense acknowledged that this was a bug some time ago, but didn’t consider it worth fixing.
  • Problem #2: Upgrading pfSense left me with a broken router.
    • So I did some digging the next day, and it looks like the dependency issue would be resolved if I just upgraded to the latest version of pfSense.
    • I was running version 2.4.3-something and the new version was 2.4.4.
    • Since things had already gone poorly, I decided to make a backup of the configuration before proceeding.
      • I copied the config from its primary location to another location on the drive.
      • I considered copying it literally anywhere else, but didn’t want to be bothered with mapping a drive from bsd to windows or linux, or dealing with ssh file transfers. The fact that I’m even mentioning this is a pretty big clue that this came back to bite me in the end.
    • It’s a point release, so surely nothing could go wrong.
    • Something went wrong.
    • Apparently when I originally set up the PC that I installed pfSense on, there was an option in UEFI (née BIOS) whether to use a Legacy storage mode or a modern one. I honestly can’t remember what the setting is called, nor can I remember making the choice back then.
    • It turns out that this setting is vitally important, as in pfSense 2.4.4, they stopped supporting the legacy mode, and only support whatever the new mode is called.
      • When the upgrade was performed, it was unable to complete due to being unable to write to the drive.
      • Unfortunately, this left the drive in a completely unbootable state.
    • Meanwhile, my daughter has noticed that the Internet has stopped working.
  • Problem #3: No installation media for pfSense
    • So now I’m left with no other choice but to reinstall pfSense and see if I can recover anything off of it.
    • Without an internet connection.
    • Since my pfSense box is also my DHCP server, almost nothing has an IP address, so even local services are failing.
    • I have my phone, so I can at least do some research, and I have a Xubuntu installer, so I can boot from that and see if I can recover anything.
      • I can find an installer for pfSense, but downloading it via my mobile carrier (Google Fi) would cost literal dollars, and I’m far too cheap/stubborn to do that.
      • Mounting the drive from the Xubuntu live image reveals a 35MiB partition on the drive, which contains neither my primary nor backup copy of the configuration.
        • I think to myself, “Maybe BSD partitions are weird and Linux can’t just read them out of the box…”
    • I find evidence online that the pfSense installer has a slick recovery feature that has saved many a foolish admin.
    • I spend some time getting the Google Fiber box working enough so that I can download the installer, flash it to a USB drive, and boot.
  • Problem #4: Recovery doesn’t.
    • I spend multiple hours futilely attempting to recover something from the existing pfSense installation.
    • I’m never able to find anything on the drive other than that odd/useless 35MiB partition, even when using the pfSense recovery process.
    • As alluded to earlier, something went wrong during the upgrade process, and not only did it fail to write the update, it somehow deleted the existing installation entirely.
    • Out of options, I proceeded with a fresh install.
      • For context, most of this took place in my basement, which is unfinished and quite cold.
        • On the plus side, I did discover that the network port on my laptop, which had stopped working after upgrading to Windows 8, had started working again sometime after that, but I digress.
    • It’s late, and my daughter has gone to bed after spending a day without the Internet.
  • Problem #5: I’m really bad at configuring this from scratch.
    • After installation, which took almost no time at all, I now had to get the Internet back up and running.
    • I went through several iterations before being able to get the WAN port on my router properly configured.
      • I honestly can’t remember what finally worked, but I even went back to using a standalone POE injector instead of the multi-port one I had been using.
    • It was all trial and error.
    • After that, it was a fairly straightforward matter of getting basic services like DHCP back up and running, and then tweaking QoS to get Google Fiber running at full speed.

In the end, I lost about a day’s work out of it. I’m on 2.4.4 now, but I haven’t set up ad blocking yet, or the VPN, or static IPs, etc. I’m still a bit sore about it all. Other than the backup, I don’t really feel like I made the wrong decisions. They just ended up being the wrong decisions as a result of other circumstances that I didn’t predict. Which is why we make backups in the first place. On the upside, I’m now using the backup service built into pfSense that automatically uploads a copy of your config to the cloud every time you save, and gives you access to many previous snapshots that you can roll back to. I’m going to assume it is brand new in 2.4.4 and has never existed before, because it would have saved me so much time if it had existed, and I would have been a fool not to have taken advantage of it…

So, to recap:

  • Make backups.
    • It’s not a backup if it’s on the same machine. That’s just a copy.
  • Don’t trust version numbers.
  • PHP is the worst.
  • Open Source is only free if your time isn’t worth anything.
    • I kid, mostly. Using software written for underpowered hardware, probably by the lowest bidder, isn’t better; it’s just less time-consuming if you never want to do anything more advanced than open a port in your firewall because some game doesn’t work right.
  • Make backups. Seriously.
  • Don’t make big changes to important parts of your infrastructure when you’re on vacation, because your daughter is probably on vacation too.
    • I’m a bit fuzzy on the exact timeline, but there are many “Dad, when are you going to fix the Internet?” moments missing above.
  • Learn from my mistakes.
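
For anyone who wants the off-machine copy I skipped, here is a rough sketch of what "make backups" could look like in practice. It is not what pfSense's own backup service does; it's just a small Python script that copies a config file to a timestamped name, checks the copy against a SHA-256 hash, and keeps only the newest few snapshots. The function names, the `keep` parameter, and the `allow_same_device` flag are all made up for illustration, and comparing filesystem device IDs is only a crude stand-in for "is this actually a different machine?"

```python
import hashlib
import os
import shutil
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_config(src: Path, dest_dir: Path, keep: int = 10,
                  allow_same_device: bool = False) -> Path:
    """Copy src into dest_dir under a timestamped name, verify it, prune old copies.

    Hypothetical helper: the names and defaults here are illustrative only.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    dest_dir.mkdir(parents=True, exist_ok=True)

    # Crude proxy for "it's not a backup if it's on the same machine":
    # refuse when source and destination share a filesystem device ID.
    if not allow_same_device and os.stat(src).st_dev == os.stat(dest_dir).st_dev:
        raise RuntimeError("destination is on the same device as the source")

    # Fixed-width UTC timestamp, so sorting by name is sorting by age.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    target = dest_dir / f"{src.stem}-{stamp}{src.suffix}"
    shutil.copy2(src, target)

    # Verify the copy before trusting it.
    if sha256_of(src) != sha256_of(target):
        target.unlink()
        raise RuntimeError("checksum mismatch after copy")

    # Keep only the newest `keep` snapshots.
    snapshots = sorted(dest_dir.glob(f"{src.stem}-*{src.suffix}"))
    for old in snapshots[:-keep]:
        old.unlink()
    return target
```

Something like `backup_config(Path("/cf/conf/config.xml"), Path("/mnt/nas/pfsense-backups"))` would be the idea, assuming the config lives at `/cf/conf/config.xml` and that `/mnt/nas/pfsense-backups` (a placeholder path) is a mount backed by a different machine. The device check would have stopped me from doing exactly what I did in Problem #2.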

Wow, that looks like a lot of work! (Almost none of which I understood.) Glad you only lost one day’s work.

Note to self: make a backup today.


In general, I’m really happy with pfSense. I built it to capitalize on the speed at which a real computer can handle routing traffic compared to something embedded, but also because the answer to the question “Can I do this?” with pfSense is usually “Yes, and…” In this particular case, it failed spectacularly, but I could have mitigated the damage and been back up and running in a matter of minutes if I had just done a better job of preparing for a disaster.


I’ve looked at pfSense.

Then looked at the time I’d need to spend on it to justify it.

Still running a router with custom firmware and my PiHole.


The thing that finally convinced me to go the pfSense route was that my consumer router (TP-Link Archer C9) with custom firmware couldn’t run at full speed because the hardware accelerated NIC didn’t have an open-source driver, so the WAN port was limited to 100Mbps.

If anyone is interested in going down this particular rabbit hole, here are the resources that got me started:



The approach he takes is very hardcore, but the hardware I’m using is pretty similar. Given the rate at which I had been accumulating routers up to this point, I’ve probably saved money using this approach and a Ubiquiti AP.

It’s obviously not for everyone, but I’ve found it pretty low maintenance overall. As stated above, my problems were the result of a confluence of edge-cases and the lack of a proper disaster recovery plan.
