Faster Flash

This blog post is about performance improvements in the way the Moddable SDK reads SPI flash. That might not sound exciting, but the result is: our favorite rendering benchmark (balls) runs 50% faster. When you update to the latest Moddable SDK, you'll get those changes automatically. The details behind the improvements are an interesting window into the challenges of embedded development, and this post is about those details.

About SPI Flash

The development boards used most often with the Moddable SDK have a microcontroller and a flash memory chip that are connected together using a SPI bus. Those boards include NodeMCU ESP8266 and ESP32 boards, M5Stack products, and all Moddable development boards.

Flash memory is the only long-term storage on these boards. Because RAM is very limited on a microcontroller, a common technique for minimizing RAM use is to keep as much code and data as possible in flash and access it directly from there. Consequently, the time required to read data from flash memory impacts overall performance. Here are some of the things stored in flash memory when using the Moddable SDK:

  • All native code (except a few interrupt handlers), including the XS JavaScript engine, Poco rendering engine, and Piu user interface framework
  • All JavaScript byte code
  • All JavaScript objects and data generated during the preload phase
  • All graphics assets

The SPI bus connecting flash memory to the microcontroller can have one, two, or four data pins. As you might expect, more pins allow more data to be transferred in parallel, speeding data transfer. The frequency at which the SPI bus operates -- how many bits per second each data pin carries -- is configurable. Higher speeds transfer more data per second. However, if the SPI bus runs faster than the flash component supports, read errors result, leading to crashes. There's no one correct speed -- each flash chip is designed to run up to a certain speed. And, of course, having more SPI data pins and using chips that operate at higher speeds tends to cost more. Consequently, there are trade-offs to be made.
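As a rough model of that trade-off, peak read throughput is the number of data pins times the clock frequency, divided by eight bits per byte. The Python sketch below is illustrative only and not part of the Moddable SDK; the function name is hypothetical, and it ignores the command, address, and dummy cycles that precede every flash read, so real transfers are slower.

```python
def spi_peak_read_mb_per_s(data_pins: int, clock_mhz: float) -> float:
    """Theoretical peak SPI flash read rate in megabytes per second.

    Each data pin carries one bit per clock cycle, so the bus moves
    data_pins * clock_mhz megabits per second; dividing by 8 converts
    to megabytes. Command, address, and dummy cycles sent before each
    read are ignored, so real throughput is lower.
    """
    if data_pins not in (1, 2, 4):
        raise ValueError("SPI flash reads use 1, 2, or 4 data pins")
    return data_pins * clock_mhz / 8


if __name__ == "__main__":
    # Print the peak rate for each pin count at two common flash clocks.
    for pins in (1, 2, 4):
        for mhz in (40, 80):
            rate = spi_peak_read_mb_per_s(pins, mhz)
            print(f"{pins} data pin(s) at {mhz} MHz: {rate:.1f} MB/s")
```

Doubling either the pin count or the clock doubles the peak rate; doubling both quadruples it.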

Flash Configuration in Moddable SDK

Roll back the clock a few years to when I began working on getting XS to run on an ESP8266. It was an exercise I was interested in trying. I thought it might work for a few simple examples. I expected it to fail. In setting up the build, my focus was to make it build at all. I borrowed bits and pieces of knowledge from the many excellent ESP8266 resources around the web.

One step in the build is installing the firmware binary onto the ESP8266. When flashing the firmware, among the parameters are two that control SPI flash access -- the number of SPI data pins and the SPI bus speed. When I did that work, I didn't understand them. I copied the default values used in a working example project. That worked. I never looked back.

Roll the clock forward about a year and Mike Kellner and I brought up the Moddable SDK on the ESP32. The ESP IDF SDK is much more advanced than what I had when bringing up the ESP8266. It used the same default values to access SPI. We used those. That worked. We moved on to other challenges.

Roll forward to a few weeks ago. I was working on a new tool (details about that are a topic for a future blog post) that forced me to take a hard look at how SPI flash access is configured on the ESP8266 and ESP32. I knew in the back of my mind from reading data sheets that ESP8266 and ESP32 modules typically have what is called a "quad-SPI" interface, which means that they have four data pins on the SPI bus. The NodeMCU, M5Stack, and Moddable development boards all use modules that contain quad-SPI flash. I found that the binaries generated by the Moddable SDK for ESP8266 are configured to use only two data pins to access flash. A review of the makefile confirmed that. I changed dout to qio in the makefile, rebuilt, and it still worked. Just faster.
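The dout and qio values are SPI flash mode names used by esptool.py, Espressif's flashing tool, which also accepts qout and dio (this assumes the makefile passes the value through to esptool.py's --flash_mode option). The table below is a hypothetical summary of how the esptool documentation describes each mode; in all four, the read command itself is still sent on a single line.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlashMode:
    name: str           # value passed to esptool.py --flash_mode
    address_pins: int   # lines used to send the read address
    data_pins: int      # lines used to return the data


# The four SPI flash modes esptool.py accepts.
FLASH_MODES = {
    "qio": FlashMode("qio", address_pins=4, data_pins=4),    # quad I/O
    "qout": FlashMode("qout", address_pins=1, data_pins=4),  # quad output
    "dio": FlashMode("dio", address_pins=2, data_pins=2),    # dual I/O
    "dout": FlashMode("dout", address_pins=1, data_pins=2),  # dual output
}


def data_pin_gain(old: str, new: str) -> float:
    """Factor by which the data phase of a read speeds up at the same clock."""
    return FLASH_MODES[new].data_pins / FLASH_MODES[old].data_pins


if __name__ == "__main__":
    print(data_pin_gain("dout", "qio"))  # 2.0
```

Moving from dout to qio doubles the data lines and also widens the address phase, so short reads benefit a little more than the data-pin ratio alone suggests.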

While I was changing dout to qio, I noticed that the SPI bus frequency is also configured in the makefile. That was set to "40M" for 40 megahertz, the second fastest allowed value. As an experiment, I changed that to "80M", and it still worked. Even faster.

The change from two to four data pins doubles SPI flash read speed. The change from 40 MHz to 80 MHz also doubles SPI flash read speed. Combined, the two changes make flash reads four times faster.
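The arithmetic, counting only the data transfer and ignoring the per-read command and address overhead:

```latex
\begin{aligned}
\text{before (dout, 40 MHz)}:&\quad \frac{2 \times 40\,\text{Mbit/s}}{8\,\text{bit/byte}} = 10\,\text{MB/s} \\
\text{after (qio, 80 MHz)}:&\quad \frac{4 \times 80\,\text{Mbit/s}}{8\,\text{bit/byte}} = 40\,\text{MB/s} \\
\text{speedup}:&\quad \frac{40\,\text{MB/s}}{10\,\text{MB/s}} = \underbrace{2}_{\text{pins}} \times \underbrace{2}_{\text{clock}} = 4
\end{aligned}
```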

That was the ESP8266 build. The ESP32 build, it turns out, had the same problem. By making analogous changes to the sdkconfig.defaults file, the ESP32 also got a 4x improvement in flash read speeds.

Note that the manufacturers of flash memory chips helped to mask the problem. The flash components used support both two- and four-data-pin SPI interfaces. That's great -- the same component can be used in more circumstances. Here, though, that flexibility masked the sub-optimal configuration because dout didn't fail. If it had, it would have forced us to try other settings, leading to the use of qio from the start.

Impact on Performance

Making flash read access four times faster does not make everything run four times faster. There are other constraints on performance. For the balls application, CPU performance is one bottleneck, and the SPI bus used to deliver pixels to the display is another. The ESP8266 runs at 80 MHz (though it can be overclocked to 160 MHz) and the ESP32 runs at 240 MHz. The SPI bus connection to the display runs at 40 MHz; communication with the display becomes unreliable at 80 MHz.
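This is Amdahl's law at work: speeding up one part of a task helps only in proportion to the time spent in that part. A minimal sketch follows; the function name is illustrative, and the fractions in the usage loop are hypothetical, since this post does not report how much of each balls frame was spent waiting on flash.

```python
def overall_speedup(flash_fraction: float, flash_speedup: float) -> float:
    """Amdahl's law: overall speedup when only part of the work gets faster.

    flash_fraction is the share of frame time spent on flash reads before
    the change (0.0 to 1.0); flash_speedup is how much faster those reads
    become. The remaining time (CPU work, pushing pixels over the display's
    SPI bus) is assumed to be unchanged.
    """
    if not 0.0 <= flash_fraction <= 1.0:
        raise ValueError("flash_fraction must be between 0 and 1")
    return 1.0 / ((1.0 - flash_fraction) + flash_fraction / flash_speedup)


if __name__ == "__main__":
    # Hypothetical flash-bound fractions, not measurements of balls.
    for f in (0.25, 0.5, 0.75, 1.0):
        print(f"{f:.0%} flash-bound: {overall_speedup(f, 4.0):.2f}x faster overall")
```

For example, if half of each frame were flash-bound, a 4x faster flash would yield only a 1.6x overall speedup.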

Still, the performance increase is significant. Both the ESP8266 and ESP32 experience about a 50% increase in frame rate. Most impressively, the ESP8266 running with the optimal SPI flash configuration is delivering higher frame rates than the ESP32 with non-optimal SPI flash configuration.

Device    Before     After      Improvement
ESP8266   83 FPS     123 FPS    48%
ESP32     109 FPS    166 FPS    52%
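A quick check of the table's arithmetic, also converting each frame rate to the average time per frame. The names are illustrative; the frame rates are the measured values reported in the table.

```python
# Measured frame rates (before, after) in frames per second.
RESULTS = {
    "ESP8266": (83, 123),
    "ESP32": (109, 166),
}


def improvement_percent(before_fps: float, after_fps: float) -> float:
    """Percentage increase in frame rate."""
    return (after_fps / before_fps - 1.0) * 100.0


def frame_time_ms(fps: float) -> float:
    """Average time per frame, in milliseconds."""
    return 1000.0 / fps


if __name__ == "__main__":
    for device, (before, after) in RESULTS.items():
        print(f"{device}: {frame_time_ms(before):.1f} ms -> "
              f"{frame_time_ms(after):.1f} ms per frame, "
              f"+{improvement_percent(before, after):.0f}%")
```

At 166 FPS, the ESP32 spends about 6 ms per frame, down from about 9.2 ms.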

The video below shows balls on an ESP32 before and after. The after device is rendering at about 166 frames per second, which is faster than the display updates its screen (and faster than your computer updates its screen). Consequently, you cannot see each individual frame, though they are rendered by the microcontroller and transmitted to the display.

It is difficult to predict the impact of these configuration improvements on any given project. You'll need to measure that on your own projects to find out.

Changing the Default

The default SPI flash settings in the Moddable SDK for ESP8266 and ESP32 have been changed to the new, faster settings. These work for the majority of development boards in use, but there are exceptions. For example, the ESP8285 variant of the ESP8266 used in the Sonoff B1 Lightbulb has only two data pins for flash. To change the flash configuration for it, add the settings to your project manifest. Here are the settings for the ESP8285:

"platforms": {
    "esp/8285": {
        "build": {
            "FLASH_SIZE": "1M",
            "FLASH_LAYOUT": "eagle.flash.1m.ld",
            "FLASH_MODE": "dout",
            "FLASH_SPEED": "40"
        }
    }
}

If you need to change the SPI flash configuration on the ESP32, you modify the ESP IDF's sdkconfig.defaults settings rather than the Moddable SDK's project manifest. Typically you do that with the make menuconfig command; you can also edit sdkconfig.defaults directly. For example, these settings select dout mode at 40 MHz:

CONFIG_FLASHMODE_QIO=
CONFIG_FLASHMODE_QOUT=
CONFIG_FLASHMODE_DIO=
CONFIG_FLASHMODE_DOUT=y
CONFIG_ESPTOOLPY_FLASHMODE="dout"
CONFIG_ESPTOOLPY_FLASHFREQ_80M=
CONFIG_ESPTOOLPY_FLASHFREQ_40M=y
CONFIG_ESPTOOLPY_FLASHFREQ_26M=
CONFIG_ESPTOOLPY_FLASHFREQ_20M=
CONFIG_ESPTOOLPY_FLASHFREQ="40m"
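When a build behaves unexpectedly, it can help to confirm which flash settings are actually in effect. The helper below is a hypothetical sketch, not an ESP IDF or Moddable SDK tool; it reads KEY=VALUE lines like those above and reports the CONFIG_ESPTOOLPY_FLASHMODE and CONFIG_ESPTOOLPY_FLASHFREQ values.

```python
def parse_sdkconfig(text: str) -> dict:
    """Parse KEY=VALUE lines from an sdkconfig-style file.

    Blank lines and '#' comments are skipped, and surrounding double
    quotes are stripped from string values. An empty value (KEY=) is
    kept as an empty string.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip('"')
    return settings


def flash_summary(settings: dict) -> str:
    """Describe the flash mode and frequency selected for esptool."""
    mode = settings.get("CONFIG_ESPTOOLPY_FLASHMODE", "unknown")
    freq = settings.get("CONFIG_ESPTOOLPY_FLASHFREQ", "unknown")
    return f"mode={mode}, freq={freq}"


if __name__ == "__main__":
    example = """
CONFIG_FLASHMODE_DOUT=y
CONFIG_ESPTOOLPY_FLASHMODE="dout"
CONFIG_ESPTOOLPY_FLASHFREQ_40M=y
CONFIG_ESPTOOLPY_FLASHFREQ="40m"
"""
    print(flash_summary(parse_sdkconfig(example)))  # mode=dout, freq=40m
```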

Just Because It Works, Doesn't Mean It's Right

This performance improvement could have been available much sooner. The sub-optimal performance to date is a direct consequence of mistakes I made during early development of what became the Moddable SDK. Those mistakes happened because I was moving quickly to get the project to work, without taking the time to review every configuration option I used. That's not good. It is also far from uncommon. Embedded development puts the software developer much closer to the hardware than most developers coding for computers or mobile devices. Embedded software development truly benefits from both software and hardware expertise, expertise I didn't have when starting.

Practically speaking, it is all but impossible to understand all the configuration options available, especially when working with new components. The most realistic approach to getting started is to use the defaults, which tend to be chosen to work safely on the broadest range of configurations rather than to be fast. Still, my mistake here was assuming that since it worked, the defaults were correct, rather than going back to review them. And they were not entirely wrong -- the software works -- but the performance was less than it could be.

All that said, having your code run faster is always welcome. All of the projects you are building with the Moddable SDK just got faster, as did those of our customers. That gives us all the opportunity to squeeze even more out of these inexpensive microcontrollers. Enjoy.