BloodMNIST with Mobile-LeNet

The Google Colab notebook can be downloaded from here.
More information about MedMNIST can be found here.
Machine learning model
We will use the same Mobile-LeNet model as before, but now with a 3-channel input.
| # | Layer (Type) | Output Shape | Params | Weight / Bias file | K (kernel) | S (stride) | Padding |
|---|---|---|---|---|---|---|---|
| 1 | stem_conv3x3 (Conv2D) | 28×28×8 | 224 | w01.txt, b01.txt | 3 | 1 | same (sym, +1 each side) |
| 2 | stem_bn (BatchNorm) | 28×28×8 | 32 | bn01.txt | — | — | — |
| 3 | stem_relu (ReLU) | 28×28×8 | 0 | — | — | — | — |
| 4 | B1_dw3x3_s1 (DepthwiseConv2D) | 28×28×8 | 72 | w02.txt | 3 | 1 | same (sym, +1 each side) |
| 5 | B1_dw_bn (BatchNorm) | 28×28×8 | 32 | bn02.txt | — | — | — |
| 6 | B1_dw_relu (ReLU) | 28×28×8 | 0 | — | — | — | — |
| 7 | B1_pw1x1 (Conv2D) | 28×28×8 | 64 | w03.txt | 1 | 1 | same (effectively 0) |
| 8 | B1_pw_bn (BatchNorm) | 28×28×8 | 32 | bn03.txt | — | — | — |
| 9 | B1_pw_relu (ReLU) | 28×28×8 | 0 | — | — | — | — |
| 10 | B2_dw3x3_s1 (DepthwiseConv2D) | 28×28×8 | 72 | w04.txt | 3 | 1 | same (sym, +1 each side) |
| 11 | B2_dw_bn (BatchNorm) | 28×28×8 | 32 | bn04.txt | — | — | — |
| 12 | B2_dw_relu (ReLU) | 28×28×8 | 0 | — | — | — | — |
| 13 | B2_pw1x1 (Conv2D) | 28×28×8 | 64 | w05.txt | 1 | 1 | same (effectively 0) |
| 14 | B2_pw_bn (BatchNorm) | 28×28×8 | 32 | bn05.txt | — | — | — |
| 15 | B2_pw_relu (ReLU) | 28×28×8 | 0 | — | — | — | — |
| 16 | B3_pad1 (ZeroPadding2D) | 30×30×8 | 0 | — | — | — | explicit: +1 each side |
| 17 | B3_dw3x3_s2 (DepthwiseConv2D) | 14×14×8 | 72 | w06.txt | 3 | 2 | valid (after pad1 → net “same-ish”, symmetric) |
| 18 | B3_dw_bn (BatchNorm) | 14×14×8 | 32 | bn06.txt | — | — | — |
| 19 | B3_dw_relu (ReLU) | 14×14×8 | 0 | — | — | — | — |
| 20 | B3_pw1x1 (Conv2D) | 14×14×16 | 128 | w07.txt | 1 | 1 | same (effectively 0) |
| 21 | B3_pw_bn (BatchNorm) | 14×14×16 | 64 | bn07.txt | — | — | — |
| 22 | B3_pw_relu (ReLU) | 14×14×16 | 0 | — | — | — | — |
| 23 | B4_dw3x3_s1 (DepthwiseConv2D) | 14×14×16 | 144 | w08.txt | 3 | 1 | same (sym, +1 each side) |
| 24 | B4_dw_bn (BatchNorm) | 14×14×16 | 64 | bn08.txt | — | — | — |
| 25 | B4_dw_relu (ReLU) | 14×14×16 | 0 | — | — | — | — |
| 26 | B4_pw1x1 (Conv2D) | 14×14×16 | 256 | w09.txt | 1 | 1 | same (effectively 0) |
| 27 | B4_pw_bn (BatchNorm) | 14×14×16 | 64 | bn09.txt | — | — | — |
| 28 | B4_pw_relu (ReLU) | 14×14×16 | 0 | — | — | — | — |
| 29 | B5_pad1 (ZeroPadding2D) | 16×16×16 | 0 | — | — | — | explicit: +1 each side |
| 30 | B5_dw3x3_s2 (DepthwiseConv2D) | 7×7×16 | 144 | w10.txt | 3 | 2 | valid (after pad1 → net “same-ish”, symmetric) |
| 31 | B5_dw_bn (BatchNorm) | 7×7×16 | 64 | bn10.txt | — | — | — |
| 32 | B5_dw_relu (ReLU) | 7×7×16 | 0 | — | — | — | — |
| 33 | B5_pw1x1 (Conv2D) | 7×7×24 | 384 | w11.txt | 1 | 1 | same (effectively 0) |
| 34 | B5_pw_bn (BatchNorm) | 7×7×24 | 96 | bn11.txt | — | — | — |
| 35 | B5_pw_relu (ReLU) | 7×7×24 | 0 | — | — | — | — |
| 36 | B6_dw3x3_s1 (DepthwiseConv2D) | 7×7×24 | 216 | w12.txt | 3 | 1 | same (sym, +1 each side) |
| 37 | B6_dw_bn (BatchNorm) | 7×7×24 | 96 | bn12.txt | — | — | — |
| 38 | B6_dw_relu (ReLU) | 7×7×24 | 0 | — | — | — | — |
| 39 | B6_pw1x1 (Conv2D) | 7×7×24 | 576 | w13.txt | 1 | 1 | same (effectively 0) |
| 40 | B6_pw_bn (BatchNorm) | 7×7×24 | 96 | bn13.txt | — | — | — |
| 41 | B6_pw_relu (ReLU) | 7×7×24 | 0 | — | — | — | — |
| 42 | GAP (GlobalAveragePooling2D) | 24 | 0 | — | — | — | — |
| 43 | OUT (Dense + softmax) | 8 | 200 | w14.txt, b14.txt | — | — | — |
Layer-by-layer implementation
The following lines of code show the layer-by-layer implementation on a standard ESP32. From the table above, we can conclude that the peak activation footprint is 28 × 28 × 8 floats, i.e. 28 × 28 × 8 × 4 = 25,088 bytes. Since we will use variable-to-variable operations (each layer reads from one buffer and writes into another), we will ping-pong between two buffers of that size, FEAT_A and FEAT_B.
```cpp
⋮
⋮
// ---- Input ----
// FEAT_A holds input image in CHW where C=3, W=28: [3][28][28]
// ---- No Pooling ----
Pool none{}; none.M = 1; none.T = 1;
// ---- Stem: Conv3x3 (3->8) + BN + ReLU ----
ConvMem stem{};
stem.K = 3; stem.P = 1; stem.S = 1;
stem.weight = w01;
stem.bias = b01;
stem.act = ACT_NONE;
// ---- Dense: 24 -> 8 ----
FCNMem head{};
head.weight = w14; // row-major [8,24]
head.bias = b14; // 8
head.act = ACT_NONE;
uint16_t W = noodle_conv_float(FEAT_A, 3, 8, FEAT_B, 28, stem, none, nullptr);
noodle_bn_relu(FEAT_B, 8, W, bn01, 1e-3f);
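// A note on noodle_bn_relu (an assumption about its internals, not confirmed
// by the source): it presumably applies inference-time BatchNorm fused with
// ReLU per channel,
//   y = max(0, gamma * (x - mean) / sqrt(var + eps) + beta),
// with gamma/beta/mean/var read from bn01 and eps = 1e-3f as passed above.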
// Ping-pong buffers
float *in = FEAT_B;
float *out = FEAT_A;
// ---- B1 (8->8, stride 1) ----
W = noodle_dw_pw_block(in, out, W, 8, 8, 1, w02, bn02, w03, bn03);
// ---- B2 (8->8, stride 1) ----
W = noodle_dw_pw_block(in, out, W, 8, 8, 1, w04, bn04, w05, bn05);
// ---- B3 (8->16, stride 2) ----
W = noodle_dw_pw_block(in, out, W, 8, 16, 2, w06, bn06, w07, bn07);
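// Shape check for this stride-2 block (a sketch of the arithmetic): the
// explicit +1 pad takes the 28x28 map to 30x30, and the 3x3 depthwise conv at
// stride 2 then yields (30 - 3) / 2 + 1 = 14, matching rows 16-17 of the
// table. The same arithmetic takes B5 from 14 via 16 down to 7.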
// ---- B4 (16->16, stride 1) ----
W = noodle_dw_pw_block(in, out, W, 16, 16, 1, w08, bn08, w09, bn09);
// ---- B5 (16->24, stride 2) ----
W = noodle_dw_pw_block(in, out, W, 16, 24, 2, w10, bn10, w11, bn11);
// ---- B6 (24->24, stride 1) ----
W = noodle_dw_pw_block(in, out, W, 24, 24, 1, w12, bn12, w13, bn13);
// ---- GAP: (W x W x 24) -> (24,) in-place ----
W = noodle_gap(in, 24, W);
// ---- Dense: (24,) -> (8,) ----
W = noodle_fcn(in, W, 8, out, head, nullptr);
// Softmax in-place on logits
noodle_soft_max(out, 8);
⋮
⋮
```