Fail-safe flexible remote firmware updates on almost any MCU

16 Apr 2018

In-field and remote firmware updates are useful on most consumer products, but many update schemes are poorly implemented. A lot of devices have a vulnerability window during an update in which a power failure bricks the device. Closing that window always requires some kind of bootloader that is never reprogrammed, and if that bootloader also implements the update delivery, the delivery methods are frozen for the lifetime of the device.

The scheme described here gives you an update system with arbitrary, and updatable, delivery methods, while preserving fail-safety in the case of a power failure. It’s presented specifically for the Kinetis KL27, but it’s easily adaptable to almost any device with per-page self-programming capabilities.

Memory map

We start by dividing the KL27’s 256 kB of NOR flash into individually reprogrammable sections at 1 kB page boundaries as follows:

┌───────────┐
│  0x00000  │ Bootloader and flash configuration bytes (BL)
│  .......  │ 2 kB
│  .......  │
│  0x007ff  │
├───────────┤
│  0x00800  │ Boot instruction block (BIB)
│  .......  │ 1 kB
│  0x00bff  │
├───────────┤
│  0x00c00  │ Application image (APP)
│  .......  │ 125 kB
│  .......  │
│  .......  │
│  .......  │
│  .......  │
│  0x1ffff  │
├───────────┤
│  0x20000  │ Installation image (INST)
│  .......  │ 128 kB
│  .......  │
│  .......  │
│  .......  │
│  .......  │
│  0x3ffff  │
└───────────┘
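The same map can be written down as C constants, with compile-time checks that each section starts on a page boundary and has the size shown. This is a sketch: the macro names are hypothetical, and in a real project the linker script would be the authoritative description of the layout.

```c
#include <assert.h>   /* static_assert (C11) */

/* KL27 flash layout from the memory map (1 kB erase pages). */
#define FLASH_PAGE_SIZE 0x400u
#define FLASH_SIZE      0x40000u                  /* 256 kB */

#define BL_START    0x00000u
#define BL_SIZE     0x00800u                      /* 2 kB */
#define BIB_START   (BL_START + BL_SIZE)          /* 0x00800 */
#define BIB_SIZE    0x00400u                      /* 1 kB */
#define APP_START   (BIB_START + BIB_SIZE)        /* 0x00c00 */
#define INST_START  0x20000u
#define APP_SIZE    (INST_START - APP_START)      /* 125 kB */
#define INST_SIZE   (FLASH_SIZE - INST_START)     /* 128 kB */

/* Every section must start on a page boundary so it can be erased alone. */
static_assert(BIB_START % FLASH_PAGE_SIZE == 0, "BIB not page aligned");
static_assert(APP_START % FLASH_PAGE_SIZE == 0, "APP not page aligned");
static_assert(INST_START % FLASH_PAGE_SIZE == 0, "INST not page aligned");
static_assert(APP_SIZE == 125u * 1024u, "APP should be 125 kB");
static_assert(INST_SIZE == 128u * 1024u, "INST should be 128 kB");
```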

The sections are:

Boot instruction block (BIB): If valid, this block contains a small piece of data specifying the entry point of an image to run instead of the application (in practice, the installer). The data is checksummed, so that a write interrupted by a power failure is unlikely to leave valid-looking data here.
Bootloader and flash configuration bytes (BL): This is the entry point on boot. It contains a very small non-updatable program which does one thing only: it reads the boot instruction block (BIB), checks whether it contains valid data, and if so, jumps to the address specified. If the BIB is not valid, it jumps to the application image (APP).
Application image (APP): This is the default image to which the bootloader will transfer control, and under normal circumstances, it’s what runs on the device.
Installation image (INST): If valid, this section contains a program that rewrites the application image (APP). It may do this by copying a bundled image, or it might load an image from an external peripheral such as an SD card.

On the KL27, the BL section must also contain the flash configuration bytes, as these can’t be safely reprogrammed in the field (a failure during reprogramming may lead to a non-bootable device). They live at 0x400 to 0x40f, in the second 1 kB page, which is why BL occupies two pages even though the bootloader program itself is extremely simple and should fit easily into a fraction of a page.

A possible layout for the BIB is as follows (32-bit words assumed):

Offset      Value
────────────────────────────────────────────────────────────────
0           Magic number: 0x12345678
4           Image entry point: typically 0x00020000 (start of INST)
8           Bitwise negation of entry point (0xfffdffff)
12          Image entry point, repeated

With this layout, the bootloader checks that the magic number is correct and that the last two words are, respectively, the bitwise negation and an exact copy of the entry point. An erased page (all 0xff) fails the magic check, and a write interrupted part-way through is unlikely to pass all three checks.
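The BIB check and the bootloader’s whole decision fit in a few lines of C. This is a sketch assuming the four-word layout above; `bib_t`, `bib_make`, `bib_is_valid` and `boot_select` are hypothetical names, and the actual transfer of control is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIB_MAGIC 0x12345678u
#define APP_START 0x00C00u   /* default image when the BIB is invalid */

/* Hypothetical in-flash layout of the boot instruction block. */
typedef struct {
    uint32_t magic;       /* offset 0:  BIB_MAGIC */
    uint32_t entry;       /* offset 4:  entry point, e.g. 0x00020000 */
    uint32_t entry_inv;   /* offset 8:  ~entry */
    uint32_t entry_copy;  /* offset 12: entry, repeated */
} bib_t;

static_assert(sizeof(bib_t) == 16, "BIB must be four 32-bit words");

/* Build the block the application programs into the BIB page. */
static bib_t bib_make(uint32_t entry) {
    bib_t b = { BIB_MAGIC, entry, ~entry, entry };
    return b;
}

/* Erased flash (all 0xff) fails the magic check; a half-written block
 * is unlikely to satisfy all three word checks. */
static bool bib_is_valid(const bib_t *b) {
    return b->magic == BIB_MAGIC &&
           b->entry_inv == ~b->entry &&
           b->entry_copy == b->entry;
}

/* The bootloader's entire decision: the BIB entry if valid, else APP. */
static uint32_t boot_select(const bib_t *b) {
    return bib_is_valid(b) ? b->entry : APP_START;
}
```

On the KL27’s Cortex-M0+ core, jumping to the selected image would typically mean pointing `SCB->VTOR` at it, loading the main stack pointer from its first word and branching to the reset handler held in its second word; that part is target-specific.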

Update procedure

Under our scheme, the following invariants hold regarding section validity:

BL is always valid.
If BIB is valid, then INST is valid.
If BIB is invalid, then APP is valid.

The last two conditions imply that at least one of APP and INST is always valid (and in fact both will be valid at some points during the upgrade procedure).
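Since there are only three validity flags to consider (BL being always valid), the claim that these invariants always leave a bootable image can be checked exhaustively. A minimal sketch; the function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* The two conditional invariants (BL is taken as always valid). */
static bool invariants_hold(bool bib, bool app, bool inst) {
    return (!bib || inst)    /* BIB valid   => INST valid */
        && ( bib || app);    /* BIB invalid => APP valid  */
}

/* The bootloader hands control to INST if the BIB is valid, else to APP. */
static bool boot_target_valid(bool bib, bool app, bool inst) {
    return bib ? inst : app;
}

/* Enumerate all 8 validity combinations and count states where the
 * invariants hold but no image is valid, or the bootloader would pick
 * an invalid one. */
static int counterexamples(void) {
    int bad = 0;
    for (int s = 0; s < 8; s++) {
        bool bib = s & 1, app = (s >> 1) & 1, inst = (s >> 2) & 1;
        if (invariants_hold(bib, app, inst) &&
            (!(app || inst) || !boot_target_valid(bib, app, inst)))
            bad++;
    }
    return bad;
}
```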

The application manages its own upgrades, using the bootloader only as a trampoline for safely transferring control to the installer. This means that the method of upgrade delivery can itself be updated just as easily as any other feature of the system.

A safe atomic upgrade works as follows:

1. An installation image is prepared in the form of a self-install program to run from the INST section of the device. This image is delivered to the device somehow (e.g. the application fetches it over the network). It need not be copied to the INST section immediately; it might be buffered on an external peripheral instead.
2. The application decides to initiate a self-update using the delivered image. It erases and reprograms the INST section with the stored image. The programmed image is checked against a supplied checksum, and if it doesn’t match, the upgrade is aborted.
3. The application erases the BIB page and programs it with a block specifying the entry point of the installer image.
4. The application triggers a CPU reset.
5. The bootloader starts, reads the valid BIB and transfers control to the installation section.
6. The installer erases the application section and then copies its payload into it.
7. The installer erases the BIB.
8. The installer triggers a CPU reset.
9. The bootloader starts, finds the BIB invalid and transfers control to the application section, which now contains the new image.
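The application and installer sides of this procedure can be sketched in C against a simulated flash array, so the sequence can be exercised on a host. Everything here is an assumption for illustration: `flash_erase_page`, `flash_program` and `system_reset` are hypothetical stand-ins for the real flash driver and reset (on a Cortex-M target the last would typically be CMSIS `NVIC_SystemReset()`), the supplied checksum is assumed to be a standard CRC-32, the installer’s copier stub is assumed to fill exactly the first page of INST, and the payload length is simply passed to the installer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE       0x400u
#define FLASH_SIZE 0x40000u
#define BIB_START  0x00800u
#define APP_START  0x00C00u
#define INST_START 0x20000u
#define APP_SIZE   (INST_START - APP_START)
#define INST_SIZE  (FLASH_SIZE - INST_START)
#define STUB_SIZE  PAGE          /* hypothetical: copier stub fills INST's first page */
#define BIB_MAGIC  0x12345678u

/* Host-side stand-ins for the flash driver and reset (all hypothetical). */
static uint8_t flash[FLASH_SIZE];
static int resets;
static void flash_erase_page(uint32_t addr) {
    memset(&flash[addr & ~(PAGE - 1u)], 0xFF, PAGE);
}
static void flash_program(uint32_t addr, const void *src, uint32_t len) {
    const uint8_t *s = src;
    for (uint32_t i = 0; i < len; i++)
        flash[addr + i] &= s[i];        /* NOR programming only clears bits */
}
static void system_reset(void) { resets++; }

/* Assumed checksum: standard reflected CRC-32 (polynomial 0xEDB88320). */
static uint32_t crc32_ieee(const uint8_t *p, size_t n) {
    uint32_t c = 0xFFFFFFFFu;
    while (n--) {
        c ^= *p++;
        for (int k = 0; k < 8; k++)
            c = (c >> 1) ^ (0xEDB88320u & (0u - (c & 1u)));
    }
    return ~c;
}

/* Application side, steps 2-4. Returns false (upgrade aborted) on mismatch. */
static bool app_start_update(const uint8_t *img, uint32_t len, uint32_t crc) {
    if (len > INST_SIZE) return false;
    for (uint32_t a = INST_START; a < INST_START + len; a += PAGE)
        flash_erase_page(a);
    flash_program(INST_START, img, len);
    if (crc32_ieee(&flash[INST_START], len) != crc) return false;  /* step 2 */
    const uint32_t bib[4] = { BIB_MAGIC, INST_START, ~INST_START, INST_START };
    flash_erase_page(BIB_START);
    flash_program(BIB_START, bib, sizeof bib);                      /* step 3 */
    system_reset();                                                 /* step 4 */
    return true;
}

/* Installer side, steps 6-8: copy the payload that follows the stub. */
static void installer_main(uint32_t payload_len) {
    assert(payload_len <= APP_SIZE);
    for (uint32_t a = APP_START; a < APP_START + APP_SIZE; a += PAGE)
        flash_erase_page(a);                                        /* step 6 */
    flash_program(APP_START, &flash[INST_START + STUB_SIZE], payload_len);
    flash_erase_page(BIB_START);                                    /* step 7 */
    system_reset();                                                 /* step 8 */
}
```

On real hardware the erase and program routines may need to run from RAM, since on many parts flash can’t be read while it is being programmed; the KL27 reference manual is the place to check.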

The invariants above hold at every point in this procedure. If the power fails before the new BIB has been completely written (step 3), the bootloader finds no valid BIB and the device falls back to the firmware that was running before the upgrade started. If it fails at any later point, the device ends up with the new firmware version: until the installer erases the BIB in step 7, every boot transfers control to the installer, which simply restarts the copy from the beginning.
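The rollback and roll-forward argument can also be checked mechanically with a small state model. This is an abstraction rather than target code, under stated assumptions: sections are tracked only as whole-section states, a BIB write or erase interrupted part-way is assumed to read back as invalid (which the redundant words are meant to make likely), and the reboot after the failure is assumed to complete. The sketch cuts power before or during each flash-writing step and reports which image ends up running.

```c
#include <assert.h>
#include <stdbool.h>

/* Whole-section contents in this abstract model. */
typedef enum { EMPTY, TORN, OLD, NEW } img_t;

typedef struct {
    bool  bib;   /* BIB passes its validity check */
    img_t app;
    img_t inst;
} device_state;

/* The flash-writing operations, in order. The resets (steps 4 and 8)
 * write nothing, so they are not modelled separately. */
enum { PROG_INST, WRITE_BIB, COPY_APP, ERASE_BIB, N_OPS };

/* Apply one operation; if `torn`, power fails part-way through it.
 * Assumption: a torn BIB write or erase reads back as invalid. */
static void apply(device_state *d, int op, bool torn) {
    switch (op) {
    case PROG_INST: d->inst = torn ? TORN : NEW;     break;  /* step 2 */
    case WRITE_BIB: d->bib  = !torn;                 break;  /* step 3 */
    case COPY_APP:  d->app  = torn ? TORN : d->inst; break;  /* step 6 */
    case ERASE_BIB: d->bib  = false;                 break;  /* step 7 */
    default:        assert(0 && "unknown operation");
    }
}

/* Power returns: the bootloader runs INST if the BIB is valid, and the
 * installer (assumed uninterrupted this time) finishes steps 6 and 7.
 * Returns the image the device ends up running. */
static img_t reboot(device_state *d) {
    if (d->bib) {
        assert(d->inst == NEW);   /* invariant: BIB valid => INST valid */
        apply(d, COPY_APP, false);
        apply(d, ERASE_BIB, false);
    }
    assert(d->app == OLD || d->app == NEW);   /* BIB invalid => APP valid */
    return d->app;
}

/* Start from the old firmware, complete `done` operations, optionally
 * tear the next one, then cut power and reboot. */
static img_t run_with_failure(int done, bool torn) {
    device_state d = { false, OLD, EMPTY };
    for (int op = 0; op < done; op++)
        apply(&d, op, false);
    if (torn && done < N_OPS)
        apply(&d, done, true);
    return reboot(&d);
}
```

Running `run_with_failure(done, torn)` for every `done` from 0 to `N_OPS`, with and without `torn`, yields `OLD` whenever the failure lands before the BIB write completes (`done` of 0 or 1) and `NEW` otherwise, matching the two cases above.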

Enhancements and adaptation to other CPUs

The general scheme described above doesn’t rely on any CPU feature other than the ability to self-program in page-sized chunks. Some CPUs present additional challenges or opportunities for improvement:

Some chips don’t support relocation of the interrupt vector table. On such a chip, your bootloader may need to contain stub handlers for each vector which transfer control to the corresponding handler in the application section. Avoiding the use of interrupts altogether in the installer should keep this simple, since the stubs then only ever need to forward to APP.
The AVR ATmega family doesn’t allow self-programming by code executing outside of a small preset bootloader region. One possible workaround is to have the bootloader contain routines for page erase and programming, and to expose them to the application and installer via a jump table placed at a known location.
If your chip contains NVRAM or EEPROM, it might be useful to store the BIB there rather than in flash, saving a whole page of flash.
The description above assumes that the installer contains just an application image prepended with a copier stub, which effectively requires that your device have enough storage for two side-by-side application images. You may be able to relax this requirement by compressing the payload and having the installer decompress it as it copies. If you’re able to achieve a good compression ratio, you can shrink the section reserved for the installer image accordingly.
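To illustrate decompress-while-copying, here is a sketch that uses a deliberately trivial run-length encoding of (count, value) byte pairs as a stand-in for a real compressor (a practical installer would more likely use an LZ-family format). It decodes straight into page-sized chunks, so the installer needs only one page of RAM. `program_page`, `rle_install` and the small `app` array are hypothetical host-side stand-ins for the real APP section and flash driver.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE 1024u

/* Hypothetical stand-in: on target this would erase and program one APP page. */
static uint8_t app[8 * PAGE];
static void program_page(uint32_t offset, const uint8_t *buf) {
    memcpy(&app[offset], buf, PAGE);
}

/* Decode a toy run-length stream of (count, value) byte pairs directly
 * into APP one page at a time. Returns the number of bytes written,
 * or -1 if the output would overflow APP. */
static long rle_install(const uint8_t *src, size_t src_len) {
    assert(src != NULL || src_len == 0);
    uint8_t page[PAGE];
    uint32_t fill = 0, written = 0;
    for (size_t i = 0; i + 1 < src_len; i += 2) {
        for (uint8_t n = src[i]; n > 0; n--) {
            if (written + fill >= sizeof app) return -1;
            page[fill++] = src[i + 1];
            if (fill == PAGE) {              /* page buffer full: flush it */
                program_page(written, page);
                written += PAGE;
                fill = 0;
            }
        }
    }
    if (fill > 0) {                          /* pad the last page like erased flash */
        memset(&page[fill], 0xFF, PAGE - fill);
        program_page(written, page);
    }
    return (long)(written + fill);
}
```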