Guaranteeing Reliable IoT Products on Unreliable Networks

Maintaining a reliable network connection to the internet is fundamental to IoT products. Unfortunately, network connections aren't reliable: they can fail in many ways. Products must be engineered to operate reliably on unreliable infrastructure. Many products use Wi-Fi to connect to the internet, and Moddable has worked with many clients to successfully deliver products that use Wi-Fi. Testing these products can be tedious: it means walking back and forth to the closet to power off the Wi-Fi router and verify that the product recovers gracefully. We've long thought there must be a better way. A recent client project gave us the chance to try.

The Challenge

Network software has many layers: the Wi-Fi packet driver, sockets, protocol handlers, libraries, and application code. When Wi-Fi drops, the failure must be handled and propagated at every one of these layers so the product can react and recover. There are dozens of places across the codebase where that could go wrong. This is made more challenging because error-handling behaviors are inconsistent across silicon platforms.

Everyone knows that error handling code is error-prone. The best way to find those mistakes is testing. But real-world testing of network failures is time-consuming and disruptive: you don't want to turn off the Wi-Fi connection your colleagues and your development computer are using.

The Hardware

The first part of our solution is to use a dedicated Wi-Fi access point for testing. We chose an inexpensive Wi-Fi travel router, small enough for any desk and conveniently powered by USB. A nice feature of this Wi-Fi router is that it allowed us to create a new access point while connecting to our existing Wi-Fi network. Testing against this router no longer disrupts the office or home Wi-Fi, and it avoids having to take a walk to power-cycle the Wi-Fi router for each test cycle.

The second part of our solution is to automate power-cycling of the Wi-Fi router. The microcontroller being tested, an ESP32, has digital outputs that can be used to control an electrical relay. We selected an inexpensive off-the-shelf relay that operates at the same 3.3 V as the ESP32's digital outputs.

As the Wi-Fi router is USB powered, we cut a USB cable to run the power lead through the relay.

Experience has shown us that it pays to make a fixture fairly robust. Strain relief on the connection cables saves developers time by preventing avoidable connection problems, and it makes the fixture strong enough to be moved about and reused. We used the laser cutter in our shop to make top and bottom plates that protect the components from shorts and damage. Here is the schematic:

This is the completed fixture. Notice that the relay is between the two ends of the USB cable and that three wires come out of the relay so it can be controlled by the microcontroller.

We created a simple class to control the Wi-Fi router using this relay. The class uses the Digital I/O class from ECMA-419 to drive the relay's control input, which switches power to the router.

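const Digital = device.io.Digital;

// Controls power to the Wi-Fi router through the relay on pin 13.
class WiFiPower extends Digital {
    #state;    // last value written: 0 = router powered, 1 = router off

    constructor() {
        super({
            pin: 13,
            mode: Digital.Output
        });
        this.on();    // start with the router powered
    }
    // Writing 1 cuts power to the router.
    off() {
        this.#state = 1;
        super.write(1);
    }
    // Writing 0 restores power to the router.
    on() {
        this.#state = 0;
        super.write(0);
    }
    isOn() {
        return 0 === this.#state;
    }
}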

The Tests

Next, we built a test application to stress the essential network protocol implementations of the Moddable SDK. Here's what it does:

  • Manages power to the Wi-Fi router
    • Turns off power to the Wi-Fi router at random intervals
    • Keeps power off for at least 500 ms to avoid stressing the relay
    • Keeps power on for at least 30 seconds to allow the Wi-Fi router enough time to connect on most (but not all) test runs
  • Maintains a durable Wi-Fi connection using the Wi-Fi Connection module which automatically reconnects after a drop
  • Downloads a 1 MB file using HTTPS every two minutes. The download takes long enough that it is sometimes interrupted by network disconnects.
  • Retrieves the current time using SNTP every 90 seconds
  • Maintains a durable MQTT connection using the MQTT Connection module which automatically reconnects after a drop
  • Performs DNS look-ups for the MQTT, HTTPS, and SNTP hosts. The results are cached by the host, triggering different code paths on successive runs.
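
The power-management rules in the list above can be sketched as a small scheduler. This is a minimal illustration, not the code of the actual test app: the 500 ms and 30 second minimums come from the list, while the maximum durations, the function names, and the injected `setTimer` and `random` parameters are assumptions made for the example. The `power` argument is any object with the `on()`, `off()`, and `isOn()` methods of the relay class shown earlier.

```javascript
// Minimum durations come from the article; the maximums are assumptions.
const MIN_OFF_MS = 500;
const MIN_ON_MS = 30000;
const MAX_OFF_MS = 10000;    // assumed upper bound on the power-off time
const MAX_ON_MS = 210000;    // assumed so a full cycle averages roughly two minutes

// Pick a whole number of milliseconds uniformly in [min, max].
// `random` is injectable so the schedule can be tested deterministically.
function randomDuration(min, max, random = Math.random) {
    return min + Math.floor(random() * (max - min + 1));
}

// Given the current power state, return the next state and how long to hold it.
function nextPhase(isOn, random = Math.random) {
    return isOn
        ? { on: false, holdMs: randomDuration(MIN_OFF_MS, MAX_OFF_MS, random) }
        : { on: true, holdMs: randomDuration(MIN_ON_MS, MAX_ON_MS, random) };
}

// Toggle the router forever. `setTimer` is hypothetical glue with the
// signature (callback, ms), for example setTimeout where it is available.
function startCycling(power, setTimer, random = Math.random) {
    const step = () => {
        const phase = nextPhase(power.isOn(), random);
        if (phase.on)
            power.on();
        else
            power.off();
        setTimer(step, phase.holdMs);
    };
    // Leave the router powered for a first interval before the first cut.
    setTimer(step, randomDuration(MIN_ON_MS, MAX_ON_MS, random));
}
```

Keeping the timing decisions in pure functions, separate from the relay and the timer, means the schedule itself can be checked on a development computer without any hardware attached.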

A log of the full test run up to the point of a failure is essential to diagnosing problems. We take several steps to get the most information possible in logs:

  • Continuously output information about test app activities
  • Assert in the test app to log situations that should never occur
  • Publish test progress over the MQTT connection so it may be monitored from anywhere in the world
  • Run an instrumented build so logs include system resource use, including active sockets, memory use, CPU load, and active timers
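
The first two logging steps can be illustrated with a small helper. This is a hedged sketch rather than the test app's own code: `createLogger`, its `sink` callback (which might wrap console output or an MQTT publish), and the line format are all assumptions made for the example.

```javascript
// Build a logger that prefixes every line with the time elapsed since it
// was created. `sink` is a hypothetical function that receives each line;
// `now` is injectable so the output can be tested deterministically.
function createLogger(sink, now = Date.now) {
    const start = now();

    const log = (tag, message) => {
        const elapsed = ((now() - start) / 1000).toFixed(1);
        sink(`[${elapsed}s] ${tag}: ${message}`);
    };

    // Record a situation that should never occur without stopping the run,
    // so the log keeps accumulating context after the first anomaly.
    const assert = (condition, message) => {
        if (!condition)
            log("ASSERT", message);
        return Boolean(condition);
    };

    return { log, assert };
}
```

In this sketch a failed assertion is logged but does not throw. That is one possible design choice: it lets a long-running test continue so later log entries can show whether the anomaly was isolated.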

The Results

We ran the test app continuously at two separate locations for nearly three weeks. At the start, the tests would quickly fail, often after just 30 minutes. We identified and fixed several bugs. All of the fixes are included in the September 1, 2022 release of the Moddable SDK. With the fixes in place, both test sites ran for over a week without a reboot, crash, unhandled JavaScript exception, stalled socket, or memory leak.

The tests power-cycle the Wi-Fi router about every two minutes, so one week of operation is over 5000 power cycles, far more Wi-Fi interruptions than most IoT products encounter in a year.
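
The arithmetic behind that figure is straightforward. The short calculation below treats the two-minute cycle as exact, which is an approximation since the intervals are random.

```javascript
const MINUTES_PER_WEEK = 7 * 24 * 60;    // 10080 minutes
const MINUTES_PER_CYCLE = 2;             // approximate average from the text
const cyclesPerWeek = MINUTES_PER_WEEK / MINUTES_PER_CYCLE;    // 5040 power cycles
```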

We're very happy with these results. Still, we will continue to look for ways to expand our testing methods to increase their coverage.

Reliability is a Feature

When creating IoT products, "reliable operation" isn't usually at the top of the feature list. It should be: if the product is unreliable, all the powerful and cool features it has are irrelevant. It's easy to take a reliable network connection for granted, but engineering a product to maintain a reliable Wi-Fi connection in difficult real-world environments only happens with diligent implementation and thorough testing.

Moddable is committed to doing everything we can to ensure that IoT products built on the Moddable SDK operate reliably year after year, even in an unreliable world. We are sharing our test methods so that developers of IoT products can learn from our experience to better test their own work.