Preface
- Author: Auralius Manurung (auralius.manurung@ieee.org)
- Repositories:
- Video demonstration:

Plan
We aim to implement a slightly modified LeNet-5 network that runs entirely on-device on an ESP32. To reflect a realistic inference scenario, user input will be captured via a touchscreen interface. This means the project involves not only the neural network itself, but also a preprocessing pipeline that converts raw touch input into a digitized 28×28 representation suitable for inference.
In short, the system consists of:
- on-device CNN inference
- touchscreen-based data acquisition
- preprocessing and normalization tailored to embedded constraints
In this documentation, we skip the preprocessing pipeline and focus on the neural network implementation using the NOODLE framework.
Implementation steps:
- Design the network and train it (e.g., in Google Colab).
- Export weights and biases.
- Decide where each parameter lives:
  - File storage: SD card (SPI/SDIO) or a flash filesystem (FFat/LittleFS)
  - In-memory storage: const arrays in flash, or SRAM/PSRAM
- Implement the network layer by layer using NOODLE.
Cheap Yellow Display (CYD)
The CYD is an ESP32-based development board with an integrated LCD display and touchscreen. The display measures 2.8 inches with a 320×240 pixel resolution. Due to its low cost and decent performance, the CYD has gained popularity in the ESP32 community, with several public repositories and example projects available, such as ESP32-Cheap-Yellow-Display.

The CYD uses SPI for both the display and the touch controller. These are typically wired with separate chip-select lines (and on some CYD variants, separate SPI buses), which helps when tracking continuous touch input for handwriting.
As an alternative, we can use displays whose TFT is driven over a parallel bus (or SPI) and whose resistive touch panel is read through analog inputs, such as the UNO shields supported by the MCUFRIEND_kbv library.
Original architecture (1998)
The following figure and table describe the original LeNet-5 architecture.

| Layer | Type | Kernel / Stride | Input shape | Output shape | Notes |
|---|---|---|---|---|---|
| C1 | Convolution | 6×5×5 | 1×32×32 | 6×28×28 | tanh |
| S2 | Subsampling | 2×2 / 2 | 6×28×28 | 6×14×14 | average pooling (+ learnable scale/bias in classic LeNet) |
| C3 | Convolution | 16×5×5 | 6×14×14 | 16×10×10 | tanh, sparse connection map |
| S4 | Subsampling | 2×2 / 2 | 16×10×10 | 16×5×5 | average pooling |
| C5 | Convolution | 120×5×5 | 16×5×5 | 120×1×1 | effectively fully connected |
| F6 | Fully connected | — | 120 | 84 | tanh |
| Output | Fully connected | — | 84 | 10 | Euclidean RBF |
Slightly modified architecture
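C5 applies 120 kernels of size 16×5×5 to a 16×5×5 input, so each of its 120 outputs is a weighted sum over all 400 input values. C5 is therefore equivalent to a dense layer mapping 400 → 120, and we implement it as flatten + FC. We also replace tanh with ReLU, connect C3 to all six input channels (no sparse connection map), and replace the Euclidean RBF output with softmax: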
| Layer | Type | Kernel / Stride | Input shape | Output shape | Notes |
|---|---|---|---|---|---|
| C1 | Convolution | 6×5×5 | 1×32×32 | 6×28×28 | ReLU; 28×28 input zero-padded to 32×32 (P = 2) |
| S2 | Subsampling | 2×2 / 2 | 6×28×28 | 6×14×14 | average pooling |
| C3 | Convolution | 16×5×5 | 6×14×14 | 16×10×10 | ReLU |
| S4 | Subsampling | 2×2 / 2 | 16×10×10 | 16×5×5 | average pooling |
| F5 | Flatten | — | 16×5×5 | 400 | 16×5×5 = 400 |
| F6 | FC | — | 400 | 120 | ReLU |
| F7 | FC | — | 120 | 84 | ReLU |
| Output | FC | — | 84 | 10 | Softmax |
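The equivalence between C5 and flatten + FC can be checked numerically. The C++ sketch below computes C5 in two ways: once as a valid 5×5 convolution and once as a dense layer applied to the flattened input. The memory layouts ([C][H][W] for the input, [OUT][C][H][W] for the kernels) are assumptions chosen for illustration. They are not NOODLE's or Keras' storage order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Shapes from the C5 row: 120 kernels, each spanning all 16 input channels
// over a 5x5 window, applied to a 16x5x5 input with no padding.
constexpr std::size_t C = 16, H = 5, W = 5, OUT = 120;
constexpr std::size_t N_IN = C * H * W;  // 400

// C5 as a "valid" convolution: the 5x5 kernel fits the 5x5 input exactly once,
// so each output channel o produces a single 1x1 value.
// Assumed layouts: input [C][H][W], kernels [OUT][C][H][W].
std::vector<float> c5_as_conv(const std::vector<float>& x,
                              const std::vector<float>& k,
                              const std::vector<float>& b) {
    std::vector<float> y(OUT);
    for (std::size_t o = 0; o < OUT; ++o) {
        float acc = b[o];
        for (std::size_t c = 0; c < C; ++c)
            for (std::size_t i = 0; i < H; ++i)
                for (std::size_t j = 0; j < W; ++j)
                    acc += k[((o * C + c) * H + i) * W + j] * x[(c * H + i) * W + j];
        y[o] = acc;
    }
    return y;
}

// The same computation as flatten + dense: x is treated as a 400-vector and
// each kernel as one row of a row-major [OUT][N_IN] weight matrix.
std::vector<float> c5_as_dense(const std::vector<float>& x,
                               const std::vector<float>& w,
                               const std::vector<float>& b) {
    std::vector<float> y(OUT);
    for (std::size_t o = 0; o < OUT; ++o) {
        float acc = b[o];
        for (std::size_t n = 0; n < N_IN; ++n)
            acc += w[o * N_IN + n] * x[n];
        y[o] = acc;
    }
    return y;
}
```

Each kernel, read in the same order as the flattened input, is exactly one row of the dense weight matrix. Both functions therefore perform the same multiply-adds in the same order and return identical results.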
Training (Keras / PyTorch)
NOODLE does not support training. We train the network on a computer, export the trained weights and biases, and deploy them to the ESP32. In this section, we use Google Colab (Keras and Python) for training.
Naming convention (.txt)
The provided Keras training program comes with an automatic exporter function that exports weights and biases using the following naming convention.
Convolution kernels (4D tensors)
Filename format
w<NN>.txt
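Where <NN> is a two-digit, zero-padded index identifying a weight tensor. The index counts only layers that have parameters: C1 is 01, C3 is 02, F6 is 03, and so on.
Example
w01.txt: convolution weights for C1, the first layer with parameters.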
File contents
Each file stores all convolution kernels for a single layer, serialized into a 1D sequence.
Dense weights (2D tensors)
Filename format
w<NN>.txt
Where <NN> is a two-digit, zero-padded index identifying a weight tensor.
Example
w03.txt
Dense weights are exported as a transposed, flattened array, one value per line.
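Here is a sketch of what "transposed" means. Keras stores a Dense kernel with shape (n_inputs, n_outputs), while FCNMem expects row-major weights [n_outputs, n_inputs]. The exporter itself is written in Python. The C++ function below is a hypothetical helper (not part of NOODLE) that only illustrates the index mapping, assuming row-major flattening on both sides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Keras Dense kernel, shape (n_inputs, n_outputs), row-major:
//   element (i, o) is at index i * n_outputs + o.
// Transposed, shape (n_outputs, n_inputs), row-major:
//   element (o, i) is at index o * n_inputs + i.
std::vector<float> transpose_dense_kernel(const std::vector<float>& keras_kernel,
                                          std::size_t n_inputs,
                                          std::size_t n_outputs) {
    std::vector<float> out(n_inputs * n_outputs);
    for (std::size_t i = 0; i < n_inputs; ++i)
        for (std::size_t o = 0; o < n_outputs; ++o)
            out[o * n_inputs + i] = keras_kernel[i * n_outputs + o];
    return out;
}
```

Under this assumption, w03.txt (F6, 400 → 120) holds 48,000 lines, and its first 400 values are the weights feeding output neuron 0.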
Bias vectors (1D tensors)
Filename format
b<NN>.txt
Where <NN> is a two-digit, zero-padded index identifying a bias vector.
Example
b02.txt
Each file contains flattened bias values, one value per line.
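It can be useful to check the exported files on the host before copying them to the ESP32. The sketch below uses a hypothetical helper, read_values, which is not part of NOODLE or the exporter. It parses the one-value-per-line format from any std::istream, skips blank lines, and rejects malformed lines.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read one float per line into `out`.
// Returns false if a non-blank line is not exactly one number.
bool read_values(std::istream& in, std::vector<float>& out) {
    out.clear();
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream ls(line);
        float v;
        if (!(ls >> v)) {
            // Tolerate empty or whitespace-only lines.
            if (line.find_first_not_of(" \t\r") == std::string::npos) continue;
            return false;
        }
        std::string rest;
        if (ls >> rest) return false;  // extra tokens after the number
        out.push_back(v);
    }
    return true;
}
```

For example, reading data/b01.txt through a std::ifstream should yield 6 values, one bias per C1 output channel.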
The exporter prints the paths of the files it generates. The listing below shows the output for LeNet-5. Besides the .txt files, the exporter also generates a header (.h) file for each tensor. These headers let us compile the same parameters into the firmware as const arrays stored in flash, named after the files (e.g., w03 and b03).
```
/content/drive/MyDrive/NOODLE/datasets/mnist/lenet-5/w01.txt
/content/drive/MyDrive/NOODLE/datasets/mnist/lenet-5/w01.h
/content/drive/MyDrive/NOODLE/datasets/mnist/lenet-5/b01.txt
/content/drive/MyDrive/NOODLE/datasets/mnist/lenet-5/b01.h
/content/drive/MyDrive/NOODLE/datasets/mnist/lenet-5/w02.txt
/content/drive/MyDrive/NOODLE/datasets/mnist/lenet-5/w02.h
/content/drive/MyDrive/NOODLE/datasets/mnist/lenet-5/b02.txt
/content/drive/MyDrive/NOODLE/datasets/mnist/lenet-5/b02.h
/content/drive/MyDrive/NOODLE/datasets/mnist/lenet-5/w03.txt
/content/drive/MyDrive/NOODLE/datasets/mnist/lenet-5/w03.h
/content/drive/MyDrive/NOODLE/datasets/mnist/lenet-5/b03.txt
/content/drive/MyDrive/NOODLE/datasets/mnist/lenet-5/b03.h
/content/drive/MyDrive/NOODLE/datasets/mnist/lenet-5/w04.txt
/content/drive/MyDrive/NOODLE/datasets/mnist/lenet-5/w04.h
/content/drive/MyDrive/NOODLE/datasets/mnist/lenet-5/b04.txt
/content/drive/MyDrive/NOODLE/datasets/mnist/lenet-5/b04.h
/content/drive/MyDrive/NOODLE/datasets/mnist/lenet-5/w05.txt
/content/drive/MyDrive/NOODLE/datasets/mnist/lenet-5/w05.h
/content/drive/MyDrive/NOODLE/datasets/mnist/lenet-5/b05.txt
/content/drive/MyDrive/NOODLE/datasets/mnist/lenet-5/b05.h
```
On the ESP32 side
For fully connected layers, NOODLE provides two parameter structures: FCNMem and FCNFile.
```cpp
/** FCN parameters (filenames; no tokenization). */
struct FCNFile {
    const char *weight_fn = nullptr;
    const char *bias_fn = nullptr;
    Activation act = ACT_RELU;
};

/** FCN parameters for in-memory weights/bias (row-major weights [n_outputs, n_inputs]). */
struct FCNMem {
    const float *weight = nullptr;
    const float *bias = nullptr;
    Activation act = ACT_RELU;
};
```
If weights/biases are in memory, we use FCNMem. If they are stored as files, we use FCNFile.
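Conceptually, a fully connected layer with FCNMem computes y = act(Wx + b), reading W row by row. The sketch below is a plain C++ reference under that assumption. It is not NOODLE's implementation, and it uses its own Act enum and a hypothetical function name (dense_forward) instead of NOODLE's types.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative activation choice (stands in for ACT_RELU / ACT_SOFTMAX).
enum class Act { ReLU, Softmax };

// y = act(W x + b), with W stored row-major as [n_outputs][n_inputs].
std::vector<float> dense_forward(const std::vector<float>& x,
                                 const float* weight, const float* bias,
                                 std::size_t n_outputs, Act act) {
    const std::size_t n_inputs = x.size();
    std::vector<float> y(n_outputs);
    for (std::size_t o = 0; o < n_outputs; ++o) {
        float acc = bias[o];
        const float* row = weight + o * n_inputs;  // row o of W
        for (std::size_t i = 0; i < n_inputs; ++i) acc += row[i] * x[i];
        y[o] = acc;
    }
    if (act == Act::ReLU) {
        for (float& v : y) v = std::max(v, 0.0f);
    } else {
        // Numerically stable softmax: subtract the maximum before exponentiating.
        const float m = *std::max_element(y.begin(), y.end());
        float sum = 0.0f;
        for (float& v : y) { v = std::exp(v - m); sum += v; }
        for (float& v : y) v /= sum;
    }
    return y;
}
```

Storing W as [n_outputs, n_inputs] means each output neuron reads one contiguous row. This is presumably why the exporter writes the transposed Keras kernel.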
Parameter placement
| Layer | Files | Location | In-memory (const) | File-based |
|---|---|---|---|---|
| C1 | w01.txt, b01.txt | FFat | | ✔ |
| C3 | w02.txt, b02.txt | FFat | | ✔ |
| F6 | w03.txt, b03.txt | flash (const) | ✔ | |
| F7 | w04.txt, b04.txt | flash (const) | ✔ | |
| Output | w05.txt, b05.txt | flash (const) | ✔ | |
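Each exported file should contain exactly as many lines as its tensor has elements. The sketch below tabulates those counts from the modified LeNet-5 shapes and computes the resulting storage size, assuming 4-byte float values.

```cpp
#include <cassert>
#include <cstddef>

// Element count of each exported tensor, from the modified LeNet-5 table.
struct Tensor { const char* file; std::size_t count; };

constexpr Tensor kParams[] = {
    {"w01.txt", 6 * 1 * 5 * 5},   // C1: 6 kernels, 1 input channel, 5x5
    {"b01.txt", 6},
    {"w02.txt", 16 * 6 * 5 * 5},  // C3: 16 kernels, 6 input channels, 5x5
    {"b02.txt", 16},
    {"w03.txt", 400 * 120},       // F6: 400 -> 120
    {"b03.txt", 120},
    {"w04.txt", 120 * 84},        // F7: 120 -> 84
    {"b04.txt", 84},
    {"w05.txt", 84 * 10},         // Output: 84 -> 10
    {"b05.txt", 10},
};

// Sum the counts of kParams[first] .. kParams[last - 1].
constexpr std::size_t count_params(std::size_t first, std::size_t last) {
    std::size_t total = 0;
    for (std::size_t i = first; i < last; ++i) total += kParams[i].count;
    return total;
}

constexpr std::size_t kConvParams = count_params(0, 4);   // C1 + C3
constexpr std::size_t kFcParams   = count_params(4, 10);  // F6 + F7 + Output
constexpr std::size_t kConvBytes  = kConvParams * sizeof(float);
constexpr std::size_t kFcBytes    = kFcParams * sizeof(float);
```

With these numbers, the fully connected layers hold 59,134 of the 61,706 parameters (236,536 bytes, about 231 KiB). C1 and C3 together hold only 2,572 (10,288 bytes, about 10 KiB).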
Layer-by-layer implementation
Here, the CNN parameters are read from files, while the FCN parameters are accessed as variables (const arrays).
```cpp
// C1: 5×5 convolution, parameters read from files (Conv)
Conv cnn1;
cnn1.K = 5;
cnn1.P = 2;  // same padding: a 28×28 input stays 28×28
cnn1.S = 1;  // stride 1
cnn1.weight_fn = "/w01.txt";
cnn1.bias_fn = "/b01.txt";

// C3: 5×5 convolution, parameters read from files (Conv)
Conv cnn2;
cnn2.K = 5;
cnn2.P = 0;  // valid padding: 14×14 becomes 10×10
cnn2.S = 1;  // stride 1
cnn2.weight_fn = "/w02.txt";
cnn2.bias_fn = "/b02.txt";

// S2 and S4: 2×2 pooling with stride 2, applied after each convolution
Pool pool;
pool.M = 2;
pool.T = 2;

// F6, F7, Output: in-memory parameters (FCNMem) from the generated headers
FCNMem fcn1;
fcn1.weight = w03;
fcn1.bias = b03;
fcn1.act = ACT_RELU;

FCNMem fcn2;
fcn2.weight = w04;
fcn2.bias = b04;
fcn2.act = ACT_RELU;

FCNMem fcn3;
fcn3.weight = w05;
fcn3.bias = b05;
fcn3.act = ACT_SOFTMAX;

uint16_t V;
V = noodle_conv_float(BUFFER1, 1, 6, BUFFER3, 28, cnn1, pool, nullptr);  // 6×14×14
V = noodle_conv_float(BUFFER3, 6, 16, BUFFER1, V, cnn2, pool, nullptr);  // 16×5×5
V = noodle_flat(BUFFER1, BUFFER3, V, 16);                                // 400 values
V = noodle_fcn(BUFFER3, V, 120, BUFFER1, fcn1, nullptr);                 // 120
V = noodle_fcn(BUFFER1, V, 84, BUFFER3, fcn2, nullptr);                  // 84
V = noodle_fcn(BUFFER3, V, 10, BUFFER1, fcn3, nullptr);                  // 10 class probabilities
```
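Each call returns a value V that is passed to the next call as the input width (or, after flattening, the input length). The sketch below traces V through the layers. It uses the standard floor-division output-size formulas for convolution and pooling. These formulas are our assumption, not something taken from NOODLE's source.

```cpp
#include <cassert>
#include <cstddef>

// Output width of a convolution with kernel K, zero padding P, stride S.
constexpr std::size_t conv_out(std::size_t w, std::size_t K, std::size_t P, std::size_t S) {
    return (w + 2 * P - K) / S + 1;
}

// Output width of pooling with an M x M window and stride T.
constexpr std::size_t pool_out(std::size_t w, std::size_t M, std::size_t T) {
    return (w - M) / T + 1;
}

constexpr std::size_t V0 = 28;                                     // input image width
constexpr std::size_t V1 = pool_out(conv_out(V0, 5, 2, 1), 2, 2);  // 28 -> 28 -> 14
constexpr std::size_t V2 = pool_out(conv_out(V1, 5, 0, 1), 2, 2);  // 14 -> 10 -> 5
constexpr std::size_t V_flat = V2 * V2 * 16;                       // 5*5*16 = 400
```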
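This implementation requires three buffers:
- BUFFER1: input buffer (holds the input image)
- BUFFER2: temporary/scratch buffer (set via noodle_setup_temp_buffers())
- BUFFER3: output buffer

BUFFER1 and BUFFER3 swap roles at every layer, so the final class probabilities end up in BUFFER1.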
For the modified LeNet-5 implementation:
- Input image: BUFFER1
- Conv + Pool #1: BUFFER1 → BUFFER3
- Conv + Pool #2: BUFFER3 → BUFFER1
- Flatten: BUFFER1 → BUFFER3
- Dense #1: BUFFER3 → BUFFER1
- Dense #2: BUFFER1 → BUFFER3
- Dense #3: BUFFER3 → BUFFER1
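Because BUFFER1 and BUFFER3 alternate, each must hold the largest tensor ever written to it. The sketch below derives those minimum sizes (in floats) from the schedule above and the layer shapes. It assumes that no call writes beyond its output tensor. BUFFER2 is not included, because its required size depends on how NOODLE uses the scratch space.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <initializer_list>

// Element count of each tensor in the ping-pong schedule.
constexpr std::size_t kInput = 1 * 28 * 28;  // input image   -> BUFFER1
constexpr std::size_t kPool1 = 6 * 14 * 14;  // conv+pool #1  -> BUFFER3
constexpr std::size_t kPool2 = 16 * 5 * 5;   // conv+pool #2  -> BUFFER1
constexpr std::size_t kFlat  = 400;          // flatten       -> BUFFER3
constexpr std::size_t kFc1   = 120;          // dense #1      -> BUFFER1
constexpr std::size_t kFc2   = 84;           // dense #2      -> BUFFER3
constexpr std::size_t kFc3   = 10;           // dense #3      -> BUFFER1

// Each buffer must fit the largest tensor written to it.
constexpr std::size_t kBuffer1Floats = std::max({kInput, kPool2, kFc1, kFc3});
constexpr std::size_t kBuffer3Floats = std::max({kPool1, kFlat, kFc2});
```

Under that assumption, this network needs at least 784 floats for BUFFER1 and 1,176 floats for BUFFER3.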
Visual Studio Code with PlatformIO
We use Visual Studio Code with PlatformIO instead of the Arduino IDE, mainly because it offers better support for building and uploading file-system images to ESP32 devices. All files intended for the ESP32 filesystem are placed in the project’s data/ directory.

The workflow is as follows:
- Build a file-system image from the contents of data/ (all files in the data directory are packed into one image file, fatfs.bin).
- Upload the image file to the file-system partition in flash.
- The application firmware is left unchanged by this upload.
Benchmarking
To automate benchmarking, we deploy the classification model to an ESP32 and use a Python-based test harness. The harness rotates each input image by a random angle within \(\pm20^{\circ}\) and then streams it to the ESP32 over serial for inference. These random rotations were not applied during training.
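The harness itself is written in Python. The C++ sketch below only illustrates the augmentation step. It rotates a 28×28 image about its center using inverse mapping, nearest-neighbour sampling, and zero fill for pixels that come from outside the image. Angles are drawn uniformly from [−20°, 20°]. The interpolation and fill choices are assumptions and may differ from the author's harness.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>
#include <vector>

constexpr int N = 28;  // MNIST-style 28x28 image, row-major

// Rotate by `deg` degrees about the image center. For each destination pixel,
// find its source pixel by rotating back (inverse mapping); pixels whose source
// lies outside the image are set to 0.
std::vector<std::uint8_t> rotate_nn(const std::vector<std::uint8_t>& src, double deg) {
    const double kPi = 3.14159265358979323846;
    const double th = deg * kPi / 180.0;
    const double c = std::cos(th), s = std::sin(th);
    const double ctr = (N - 1) / 2.0;  // 13.5
    std::vector<std::uint8_t> dst(N * N, 0);
    for (int y = 0; y < N; ++y) {
        for (int x = 0; x < N; ++x) {
            const double dx = x - ctr, dy = y - ctr;
            const long sx = std::lround(ctr + c * dx + s * dy);
            const long sy = std::lround(ctr - s * dx + c * dy);
            if (sx >= 0 && sx < N && sy >= 0 && sy < N)
                dst[y * N + x] = src[sy * N + sx];
        }
    }
    return dst;
}

// Draw a rotation angle uniformly from [-20, 20) degrees.
double random_angle(std::mt19937& rng) {
    std::uniform_real_distribution<double> dist(-20.0, 20.0);
    return dist(rng);
}
```

Typical use: `std::mt19937 rng(seed); auto rotated = rotate_nn(image, random_angle(rng));`, applied once per test image before sending it.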
CNN as variables -- FCN as variables
To keep the CNN parameters in memory (SRAM) rather than in files, we use the ConvMem structure.
```cpp
void predict() {
    // C1 and C3: convolution parameters as in-memory arrays (ConvMem)
    ConvMem cnn1;
    cnn1.K = 5;
    cnn1.P = 2;  // same padding
    cnn1.S = 1;  // stride 1
    cnn1.weight = w01;
    cnn1.bias = b01;

    ConvMem cnn2;
    cnn2.K = 5;
    cnn2.P = 0;  // valid padding
    cnn2.S = 1;  // stride 1
    cnn2.weight = w02;
    cnn2.bias = b02;

    // 2×2 pooling with stride 2
    Pool pool;
    pool.M = 2;
    pool.T = 2;

    FCNMem fcn_mem1;
    fcn_mem1.weight = w03;
    fcn_mem1.bias = b03;
    fcn_mem1.act = ACT_RELU;

    FCNMem fcn_mem2;
    fcn_mem2.weight = w04;
    fcn_mem2.bias = b04;
    fcn_mem2.act = ACT_RELU;

    FCNMem fcn_mem3;
    fcn_mem3.weight = w05;
    fcn_mem3.bias = b05;
    fcn_mem3.act = ACT_SOFTMAX;

    unsigned long st = micros();  // start time for benchmarking
    uint16_t V;
    V = noodle_conv_float(BUFFER1, 1, 6, BUFFER3, 28, cnn1, pool, nullptr);
    V = noodle_conv_float(BUFFER3, 6, 16, BUFFER1, V, cnn2, pool, nullptr);
    V = noodle_flat(BUFFER1, BUFFER3, V, 16);
    V = noodle_fcn(BUFFER3, V, 120, BUFFER1, fcn_mem1, nullptr);
    V = noodle_fcn(BUFFER1, V, 84, BUFFER3, fcn_mem2, nullptr);
    V = noodle_fcn(BUFFER3, V, 10, BUFFER1, fcn_mem3, nullptr);
    ⋮
    ⋮
}
```
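For reference, the convolution and pooling settings above (kernel size K, zero padding P, stride S, and a 2×2 pooling window with stride 2) correspond to the single-channel computation below. This is a plain C++ illustration, not NOODLE's implementation. It omits multiple channels and the activation, and it uses average pooling as listed in the architecture table.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Single-channel 2D cross-correlation (what CNN frameworks call "convolution")
// on a row-major W x W input with a K x K kernel, zero padding P, stride S.
std::vector<float> conv2d(const std::vector<float>& in, std::size_t W,
                          const std::vector<float>& k, std::size_t K,
                          std::size_t P, std::size_t S, float bias,
                          std::size_t& outW) {
    outW = (W + 2 * P - K) / S + 1;
    std::vector<float> out(outW * outW);
    for (std::size_t oy = 0; oy < outW; ++oy)
        for (std::size_t ox = 0; ox < outW; ++ox) {
            float acc = bias;
            for (std::size_t i = 0; i < K; ++i)
                for (std::size_t j = 0; j < K; ++j) {
                    // Coordinates in the unpadded input; padding cells contribute 0.
                    const long y = long(oy * S + i) - long(P);
                    const long x = long(ox * S + j) - long(P);
                    if (y >= 0 && y < long(W) && x >= 0 && x < long(W))
                        acc += k[i * K + j] * in[std::size_t(y) * W + std::size_t(x)];
                }
            out[oy * outW + ox] = acc;
        }
    return out;
}

// Average pooling with an M x M window and stride T.
std::vector<float> avg_pool(const std::vector<float>& in, std::size_t W,
                            std::size_t M, std::size_t T, std::size_t& outW) {
    outW = (W - M) / T + 1;
    std::vector<float> out(outW * outW);
    for (std::size_t oy = 0; oy < outW; ++oy)
        for (std::size_t ox = 0; ox < outW; ++ox) {
            float sum = 0.0f;
            for (std::size_t i = 0; i < M; ++i)
                for (std::size_t j = 0; j < M; ++j)
                    sum += in[(oy * T + i) * W + (ox * T + j)];
            out[oy * outW + ox] = sum / float(M * M);
        }
    return out;
}
```

With K = 5 and P = 2, a 28×28 input gives a 28×28 map ("same"). With P = 0, a 14×14 input gives a 10×10 map ("valid"). Each 2×2, stride-2 pooling step then halves the width.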

CNN as files -- FCN as variables
To read the CNN parameters from files, we use the Conv structure.
```cpp
void predict()
{
    // C1 and C3: convolution parameters read from files (Conv)
    Conv cnn1;
    cnn1.K = 5;
    cnn1.P = 2;  // same padding
    cnn1.S = 1;  // stride 1
    cnn1.weight_fn = "/w01.txt";
    cnn1.bias_fn = "/b01.txt";

    Conv cnn2;
    cnn2.K = 5;
    cnn2.P = 0;  // valid padding
    cnn2.S = 1;  // stride 1
    cnn2.weight_fn = "/w02.txt";
    cnn2.bias_fn = "/b02.txt";

    // 2×2 pooling with stride 2
    Pool pool;
    pool.M = 2;
    pool.T = 2;

    FCNMem fcn_mem1;
    fcn_mem1.weight = w03;
    fcn_mem1.bias = b03;
    fcn_mem1.act = ACT_RELU;

    FCNMem fcn_mem2;
    fcn_mem2.weight = w04;
    fcn_mem2.bias = b04;
    fcn_mem2.act = ACT_RELU;

    FCNMem fcn_mem3;
    fcn_mem3.weight = w05;
    fcn_mem3.bias = b05;
    fcn_mem3.act = ACT_SOFTMAX;

    unsigned long st = micros();  // start time for benchmarking
    uint16_t V;
    V = noodle_conv_float(BUFFER1, 1, 6, BUFFER3, 28, cnn1, pool, nullptr);
    V = noodle_conv_float(BUFFER3, 6, 16, BUFFER1, V, cnn2, pool, nullptr);
    V = noodle_flat(BUFFER1, BUFFER3, V, 16);
    V = noodle_fcn(BUFFER3, V, 120, BUFFER1, fcn_mem1, nullptr);
    V = noodle_fcn(BUFFER1, V, 84, BUFFER3, fcn_mem2, nullptr);
    V = noodle_fcn(BUFFER3, V, 10, BUFFER1, fcn_mem3, nullptr);
    ⋮
    ⋮
}
```