BloodMNIST with Mobile-LeNet

The Google Colab notebook can be downloaded from here.
More information about MedMNIST can be found here.
Machine learning model
We will use the same Mobile-LeNet model as before, but now with a 3-channel input.
| # | Layer (Type) | Output Shape | Params | Weight / Bias file | K (kernel) | S (stride) | Padding |
|---|---|---|---|---|---|---|---|
| 1 | stem_conv3x3 (Conv2D) | 28×28×8 | 224 | w01.txt, b01.txt | 3 | 1 | same (sym, +1 each side) |
| 2 | stem_bn (BatchNorm) | 28×28×8 | 32 | bn01.txt | — | — | — |
| 3 | stem_relu (ReLU) | 28×28×8 | 0 | — | — | — | — |
| 4 | B1_dw3x3_s1 (DepthwiseConv2D) | 28×28×8 | 72 | w02.txt | 3 | 1 | same (sym, +1 each side) |
| 5 | B1_dw_bn (BatchNorm) | 28×28×8 | 32 | bn02.txt | — | — | — |
| 6 | B1_dw_relu (ReLU) | 28×28×8 | 0 | — | — | — | — |
| 7 | B1_pw1x1 (Conv2D) | 28×28×8 | 64 | w03.txt | 1 | 1 | same (effectively 0) |
| 8 | B1_pw_bn (BatchNorm) | 28×28×8 | 32 | bn03.txt | — | — | — |
| 9 | B1_pw_relu (ReLU) | 28×28×8 | 0 | — | — | — | — |
| 10 | B2_dw3x3_s1 (DepthwiseConv2D) | 28×28×8 | 72 | w04.txt | 3 | 1 | same (sym, +1 each side) |
| 11 | B2_dw_bn (BatchNorm) | 28×28×8 | 32 | bn04.txt | — | — | — |
| 12 | B2_dw_relu (ReLU) | 28×28×8 | 0 | — | — | — | — |
| 13 | B2_pw1x1 (Conv2D) | 28×28×8 | 64 | w05.txt | 1 | 1 | same (effectively 0) |
| 14 | B2_pw_bn (BatchNorm) | 28×28×8 | 32 | bn05.txt | — | — | — |
| 15 | B2_pw_relu (ReLU) | 28×28×8 | 0 | — | — | — | — |
| 16 | B3_pad1 (ZeroPadding2D) | 30×30×8 | 0 | — | — | — | explicit: +1 each side |
| 17 | B3_dw3x3_s2 (DepthwiseConv2D) | 14×14×8 | 72 | w06.txt | 3 | 2 | valid (after pad1 → net “same-ish”, symmetric) |
| 18 | B3_dw_bn (BatchNorm) | 14×14×8 | 32 | bn06.txt | — | — | — |
| 19 | B3_dw_relu (ReLU) | 14×14×8 | 0 | — | — | — | — |
| 20 | B3_pw1x1 (Conv2D) | 14×14×16 | 128 | w07.txt | 1 | 1 | same (effectively 0) |
| 21 | B3_pw_bn (BatchNorm) | 14×14×16 | 64 | bn07.txt | — | — | — |
| 22 | B3_pw_relu (ReLU) | 14×14×16 | 0 | — | — | — | — |
| 23 | B4_dw3x3_s1 (DepthwiseConv2D) | 14×14×16 | 144 | w08.txt | 3 | 1 | same (sym, +1 each side) |
| 24 | B4_dw_bn (BatchNorm) | 14×14×16 | 64 | bn08.txt | — | — | — |
| 25 | B4_dw_relu (ReLU) | 14×14×16 | 0 | — | — | — | — |
| 26 | B4_pw1x1 (Conv2D) | 14×14×16 | 256 | w09.txt | 1 | 1 | same (effectively 0) |
| 27 | B4_pw_bn (BatchNorm) | 14×14×16 | 64 | bn09.txt | — | — | — |
| 28 | B4_pw_relu (ReLU) | 14×14×16 | 0 | — | — | — | — |
| 29 | B5_pad1 (ZeroPadding2D) | 16×16×16 | 0 | — | — | — | explicit: +1 each side |
| 30 | B5_dw3x3_s2 (DepthwiseConv2D) | 7×7×16 | 144 | w10.txt | 3 | 2 | valid (after pad1 → net “same-ish”, symmetric) |
| 31 | B5_dw_bn (BatchNorm) | 7×7×16 | 64 | bn10.txt | — | — | — |
| 32 | B5_dw_relu (ReLU) | 7×7×16 | 0 | — | — | — | — |
| 33 | B5_pw1x1 (Conv2D) | 7×7×24 | 384 | w11.txt | 1 | 1 | same (effectively 0) |
| 34 | B5_pw_bn (BatchNorm) | 7×7×24 | 96 | bn11.txt | — | — | — |
| 35 | B5_pw_relu (ReLU) | 7×7×24 | 0 | — | — | — | — |
| 36 | B6_dw3x3_s1 (DepthwiseConv2D) | 7×7×24 | 216 | w12.txt | 3 | 1 | same (sym, +1 each side) |
| 37 | B6_dw_bn (BatchNorm) | 7×7×24 | 96 | bn12.txt | — | — | — |
| 38 | B6_dw_relu (ReLU) | 7×7×24 | 0 | — | — | — | — |
| 39 | B6_pw1x1 (Conv2D) | 7×7×24 | 576 | w13.txt | 1 | 1 | same (effectively 0) |
| 40 | B6_pw_bn (BatchNorm) | 7×7×24 | 96 | bn13.txt | — | — | — |
| 41 | B6_pw_relu (ReLU) | 7×7×24 | 0 | — | — | — | — |
| 42 | GAP (GlobalAveragePooling2D) | 24 | 0 | — | — | — | — |
| 43 | OUT (Dense + softmax) | 8 | 200 | w14.txt, b14.txt | — | — | — |
Layer-by-layer implementation
The following lines of code show the layer-by-layer implementation on a standard ESP32. From the table above, we can conclude that the peak activation footprint is 28 × 28 × 8 floats, i.e. 28 × 28 × 8 × 4 = 25,088 bytes. Since we will use variable-to-variable operations (each layer reads from one buffer and writes into another), we will ping-pong between two buffers of that size, FEAT_A and FEAT_B.
```cpp
⋮
⋮
// ---- Input ----
// FEAT_A holds input image in CHW where C=3, W=28: [3][28][28]
// ---- No Pooling ----
Pool none{}; none.M = 1; none.T = 1;
// ---- Stem: Conv3x3 (3->8) + BN + ReLU ----
ConvMem stem{};
stem.K = 3; stem.P = 1; stem.S = 1;
stem.weight = w01;
stem.bias = b01;
stem.act = ACT_NONE;
// ---- Dense: 24 -> 8 ----
FCNMem head{};
head.weight = w14; // row-major [8,24]
head.bias = b14; // 8
head.act = ACT_NONE;
uint16_t W = noodle_conv_float(FEAT_A, 3, 8, FEAT_B, 28, stem, none, nullptr);
noodle_bn_relu(FEAT_B, 8, W, bn01, 1e-3f);
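// A note on noodle_bn_relu (an assumption about its internals, not confirmed
// by the source): it presumably applies inference-time BatchNorm fused with
// ReLU per channel,
//   y = max(0, gamma * (x - mean) / sqrt(var + eps) + beta),
// with gamma/beta/mean/var read from bn01 and eps = 1e-3f as passed above.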
// Ping-pong buffers
float *in = FEAT_B;
float *out = FEAT_A;
// ---- B1 (8->8, stride 1) ----
W = noodle_dw_pw_block(in, out, W, 8, 8, 1, w02, bn02, w03, bn03);
// ---- B2 (8->8, stride 1) ----
W = noodle_dw_pw_block(in, out, W, 8, 8, 1, w04, bn04, w05, bn05);
// ---- B3 (8->16, stride 2) ----
W = noodle_dw_pw_block(in, out, W, 8, 16, 2, w06, bn06, w07, bn07);
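// Shape check for this stride-2 block (a sketch of the arithmetic): the
// explicit +1 pad takes the 28x28 map to 30x30, and the 3x3 depthwise conv at
// stride 2 then yields (30 - 3) / 2 + 1 = 14, matching rows 16-17 of the
// table. The same arithmetic takes B5 from 14 via 16 down to 7.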
// ---- B4 (16->16, stride 1) ----
W = noodle_dw_pw_block(in, out, W, 16, 16, 1, w08, bn08, w09, bn09);
// ---- B5 (16->24, stride 2) ----
W = noodle_dw_pw_block(in, out, W, 16, 24, 2, w10, bn10, w11, bn11);
// ---- B6 (24->24, stride 1) ----
W = noodle_dw_pw_block(in, out, W, 24, 24, 1, w12, bn12, w13, bn13);
// ---- GAP: (W x W x 24) -> (24,) in-place ----
W = noodle_gap(in, 24, W);
// ---- Dense: (24,) -> (8,) ----
W = noodle_fcn(in, W, 8, out, head, nullptr);
// Softmax in-place on logits
noodle_soft_max(out, 8);
⋮
⋮
```