BloodMNIST with Mobile-LeNet

The Google Colab notebook can be downloaded from here.

More information about MedMNIST can be found here.

Machine learning model

We will use the same Mobile-LeNet model as before, but now with a 3-channel input.

| # | Layer (Type) | Output Shape | Params | Weight / Bias file | K | S | Padding |
|---|---|---|---|---|---|---|---|
| 1 | stem_conv3x3 (Conv2D) | 28×28×8 | 224 | w01.txt, b01.txt | 3 | 1 | same (sym, +1 each side) |
| 2 | stem_bn (BatchNorm) | 28×28×8 | 32 | bn01.txt | | | |
| 3 | stem_relu (ReLU) | 28×28×8 | 0 | | | | |
| 4 | B1_dw3x3_s1 (DepthwiseConv2D) | 28×28×8 | 72 | w02.txt | 3 | 1 | same (sym, +1 each side) |
| 5 | B1_dw_bn (BatchNorm) | 28×28×8 | 32 | bn02.txt | | | |
| 6 | B1_dw_relu (ReLU) | 28×28×8 | 0 | | | | |
| 7 | B1_pw1x1 (Conv2D) | 28×28×8 | 64 | w03.txt | 1 | 1 | same (effectively 0) |
| 8 | B1_pw_bn (BatchNorm) | 28×28×8 | 32 | bn03.txt | | | |
| 9 | B1_pw_relu (ReLU) | 28×28×8 | 0 | | | | |
| 10 | B2_dw3x3_s1 (DepthwiseConv2D) | 28×28×8 | 72 | w04.txt | 3 | 1 | same (sym, +1 each side) |
| 11 | B2_dw_bn (BatchNorm) | 28×28×8 | 32 | bn04.txt | | | |
| 12 | B2_dw_relu (ReLU) | 28×28×8 | 0 | | | | |
| 13 | B2_pw1x1 (Conv2D) | 28×28×8 | 64 | w05.txt | 1 | 1 | same (effectively 0) |
| 14 | B2_pw_bn (BatchNorm) | 28×28×8 | 32 | bn05.txt | | | |
| 15 | B2_pw_relu (ReLU) | 28×28×8 | 0 | | | | |
| 16 | B3_pad1 (ZeroPadding2D) | 30×30×8 | 0 | | | | explicit: +1 each side |
| 17 | B3_dw3x3_s2 (DepthwiseConv2D) | 14×14×8 | 72 | w06.txt | 3 | 2 | valid (after pad1 → net "same-ish", symmetric) |
| 18 | B3_dw_bn (BatchNorm) | 14×14×8 | 32 | bn06.txt | | | |
| 19 | B3_dw_relu (ReLU) | 14×14×8 | 0 | | | | |
| 20 | B3_pw1x1 (Conv2D) | 14×14×16 | 128 | w07.txt | 1 | 1 | same (effectively 0) |
| 21 | B3_pw_bn (BatchNorm) | 14×14×16 | 64 | bn07.txt | | | |
| 22 | B3_pw_relu (ReLU) | 14×14×16 | 0 | | | | |
| 23 | B4_dw3x3_s1 (DepthwiseConv2D) | 14×14×16 | 144 | w08.txt | 3 | 1 | same (sym, +1 each side) |
| 24 | B4_dw_bn (BatchNorm) | 14×14×16 | 64 | bn08.txt | | | |
| 25 | B4_dw_relu (ReLU) | 14×14×16 | 0 | | | | |
| 26 | B4_pw1x1 (Conv2D) | 14×14×16 | 256 | w09.txt | 1 | 1 | same (effectively 0) |
| 27 | B4_pw_bn (BatchNorm) | 14×14×16 | 64 | bn09.txt | | | |
| 28 | B4_pw_relu (ReLU) | 14×14×16 | 0 | | | | |
| 29 | B5_pad1 (ZeroPadding2D) | 16×16×16 | 0 | | | | explicit: +1 each side |
| 30 | B5_dw3x3_s2 (DepthwiseConv2D) | 7×7×16 | 144 | w10.txt | 3 | 2 | valid (after pad1 → net "same-ish", symmetric) |
| 31 | B5_dw_bn (BatchNorm) | 7×7×16 | 64 | bn10.txt | | | |
| 32 | B5_dw_relu (ReLU) | 7×7×16 | 0 | | | | |
| 33 | B5_pw1x1 (Conv2D) | 7×7×24 | 384 | w11.txt | 1 | 1 | same (effectively 0) |
| 34 | B5_pw_bn (BatchNorm) | 7×7×24 | 96 | bn11.txt | | | |
| 35 | B5_pw_relu (ReLU) | 7×7×24 | 0 | | | | |
| 36 | B6_dw3x3_s1 (DepthwiseConv2D) | 7×7×24 | 216 | w12.txt | 3 | 1 | same (sym, +1 each side) |
| 37 | B6_dw_bn (BatchNorm) | 7×7×24 | 96 | bn12.txt | | | |
| 38 | B6_dw_relu (ReLU) | 7×7×24 | 0 | | | | |
| 39 | B6_pw1x1 (Conv2D) | 7×7×24 | 576 | w13.txt | 1 | 1 | same (effectively 0) |
| 40 | B6_pw_bn (BatchNorm) | 7×7×24 | 96 | bn13.txt | | | |
| 41 | B6_pw_relu (ReLU) | 7×7×24 | 0 | | | | |
| 42 | GAP (GlobalAveragePooling2D) | 24 | 0 | | | | |
| 43 | OUT (Dense + softmax) | 8 | 200 | w14.txt, b14.txt | | | |

Layer-by-layer implementation

The following code shows the layer-by-layer implementation on a standard ESP32. From the table above, the largest feature map is 28×28×8, so the peak per-layer activation footprint is 28 × 28 × 8 × 4 = 25,088 bytes in float32. Since each layer reads from one buffer and writes to another, we ping-pong between two buffers of that size (FEAT_A and FEAT_B).

```cpp

⋮
⋮
// ---- Input ----
// FEAT_A holds input image in CHW where C=3, W=28: [3][28][28]

// ---- No Pooling ----
Pool none{}; none.M = 1; none.T = 1;

// ---- Stem: Conv3x3 (3->8) + BN + ReLU ----
ConvMem stem{};
stem.K = 3; stem.P = 1; stem.S = 1;
stem.weight = w01;
stem.bias   = b01;
stem.act    = ACT_NONE;

// ---- Dense: 24 -> 8 ----
FCNMem head{};
head.weight = w14;   // row-major [8,24]
head.bias   = b14;   // 8 biases
head.act    = ACT_NONE;

uint16_t W = noodle_conv_float(FEAT_A, 3, 8, FEAT_B, 28, stem, none, nullptr);
noodle_bn_relu(FEAT_B, 8, W, bn01, 1e-3f);

// Ping-pong buffers
float *in  = FEAT_B;
float *out = FEAT_A;

// ---- B1 (8->8, stride 1) ----
W = noodle_dw_pw_block(in, out, W, 8, 8, 1, w02, bn02, w03, bn03);
// ---- B2 (8->8, stride 1) ----
W = noodle_dw_pw_block(in, out, W, 8, 8, 1, w04, bn04, w05, bn05);
// ---- B3 (8->16, stride 2) ----
W = noodle_dw_pw_block(in, out, W, 8, 16, 2, w06, bn06, w07, bn07);
// ---- B4 (16->16, stride 1) ----
W = noodle_dw_pw_block(in, out, W, 16, 16, 1, w08, bn08, w09, bn09);
// ---- B5 (16->24, stride 2) ----
W = noodle_dw_pw_block(in, out, W, 16, 24, 2, w10, bn10, w11, bn11);
// ---- B6 (24->24, stride 1) ----
W = noodle_dw_pw_block(in, out, W, 24, 24, 1, w12, bn12, w13, bn13);
// ---- GAP: (W x W x 24) -> (24,) in-place ----
W = noodle_gap(in, 24, W);
// ---- Dense: (24,) -> (8,) ----
W = noodle_fcn(in, W, 8, out, head, nullptr);
// Softmax in-place on logits
noodle_soft_max(out, 8);
⋮
⋮
```