Part 20 of 58
The Summary
By Madhav Kaushish · Ages 12+
The single 3×3 filter detected blight boundaries. But Kvrothja's fields had more going on than boundaries.
Kvrothja: Your system detects the edge of a cluster well. But it misses isolated blighted plots — single sick plants with no cluster around them. And it does not detect the density of a cluster — a sparse cluster of three plots is different from a dense cluster of twelve.
Trviksha needed the network to detect multiple kinds of patterns, not just one.
Many Filters
She added more scanning teams. Instead of one 3×3 filter sliding across the field, she trained four filters simultaneously, each with its own nine weights:
- Filter A detected blight boundaries: the advancing edge of a cluster.
- Filter B detected isolated blight: single blighted plots surrounded by healthy ones.
- Filter C detected dense clusters: areas where many adjacent plots were all blighted.
- Filter D detected healthy corridors: strips of healthy plots between two blighted areas.
Each filter scanned the entire field independently, producing its own grid of outputs. Where Filter A saw a strong boundary, its output was high at that position. Where Filter B saw an isolated case, its output was high there instead. Each filter "saw" the field through its own lens.
Trviksha: Each filter produces a map. Not a map of the field itself, but a map of where that filter's pattern appears. Filter A's map shows where the boundaries are. Filter B's map shows where the isolated cases are. Four filters, four maps.
Blortz: Four views of the same field, each highlighting a different kind of pattern.
These output grids, one per filter, were feature maps. Each feature map was roughly the same size as the input field, but instead of showing raw blight status, it showed the strength of a particular pattern at each location. The feature maps were the network's internal representation of the field, translated from "what is here" into "what patterns are here."
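The idea of one feature map per filter can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the weights below are hand-picked and hypothetical (in the story the filters are trained), only two of the four filters are shown, and the names `apply_filter`, `field`, and `filters` are invented for this example.

```python
def apply_filter(field, weights):
    """Slide a 3x3 filter over the field (no padding) and return its feature map."""
    rows, cols = len(field), len(field[0])
    feature_map = []
    for r in range(rows - 2):
        row_out = []
        for c in range(cols - 2):
            total = 0.0
            for dr in range(3):
                for dc in range(3):
                    total += weights[dr][dc] * field[r + dr][c + dc]
            row_out.append(total)
        feature_map.append(row_out)
    return feature_map


# 20x20 field: 1 = blighted, 0 = healthy.
# One isolated sick plot at (5, 5) and one dense 4x4 cluster at rows/columns 12-15.
field = [[0] * 20 for _ in range(20)]
field[5][5] = 1
for r in range(12, 16):
    for c in range(12, 16):
        field[r][c] = 1

# Hypothetical, hand-picked weights (assumption: real filters would be learned).
filters = {
    # Rewards a blighted centre and penalises every blighted neighbour.
    "isolated": [[-1, -1, -1], [-1, 1, -1], [-1, -1, -1]],
    # Rewards every blighted plot inside the window.
    "dense": [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
}

# One feature map per filter, each 18x18.
feature_maps = {name: apply_filter(field, w) for name, w in filters.items()}
```

In this sketch the "isolated" map peaks at the window centred on the lone sick plot, while the "dense" map peaks inside the cluster. The same field produces two different maps because each filter responds to a different pattern.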
Too Much Detail
Four feature maps, each 18×18 (slightly smaller than the 20×20 input because the 3×3 window could not extend beyond the edges). That was four times 324 values — nearly thirteen hundred numbers per field. Kvrothja did not need this level of detail.
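The sizes quoted here follow from simple arithmetic, sketched below. The variable names are made up for this example; the numbers are the ones given in the story.

```python
field_size = 20    # the field is 20x20 plots
filter_size = 3    # each filter is a 3x3 window
num_filters = 4

# The window fits at 20 - 3 + 1 positions along each side.
map_size = field_size - filter_size + 1       # 18
values_per_map = map_size * map_size          # 324
total_values = num_filters * values_per_map   # 1296
```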
Kvrothja: I do not treat individual plots. I treat quadrants. My workers can apply soil treatment to a section of the field — say, a 5×5 area. I need to know which sections are at risk, not which individual plots.
Trviksha needed to compress the feature maps. The question was how to reduce the resolution without losing the important information.
Pooling
She divided each feature map into non-overlapping 3×3 blocks. For each block, she took a single summary value: the maximum. If any position within the block showed a strong pattern, the block's summary was high. If no position showed the pattern, the summary was low.
Trviksha: The question changes from "is there blight at this exact position?" to "is there blight somewhere in this block?" The answer is yes if any plot in the block is affected. Taking the maximum preserves the strongest signal while discarding the precise location within the block.
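Each 18×18 feature map, divided into 3×3 blocks, became a 6×6 summary. Four feature maps became four 6×6 summaries: 144 values instead of 1,296. That was a ninefold reduction in the number of values, but the critical patterns survived. Each summary cell drew on a 3×3 block of window positions, which together spanned a 5×5 patch of plots.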
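A minimal sketch of this max-pooling step, assuming the feature map is stored as a list of rows. The function name `max_pool` and the example values are hypothetical, chosen only to show one strong response surviving the compression.

```python
def max_pool(feature_map, block=3):
    """Summarise each non-overlapping block x block region by its maximum."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - block + 1, block):
        pooled_row = []
        for c in range(0, cols - block + 1, block):
            pooled_row.append(max(
                feature_map[r + dr][c + dc]
                for dr in range(block)
                for dc in range(block)
            ))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 18x18 feature map: all zeros except one strong response.
feature_map = [[0.0] * 18 for _ in range(18)]
feature_map[4][7] = 0.9  # lies in block row 1, block column 2

summary = max_pool(feature_map)  # 6x6 grid of block maxima
```

The strong response ends up in `summary[1][2]`. Its exact position inside the block is gone, but the fact that the pattern appeared somewhere in that block is kept.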

Kvrothja: Now your 6×6 summary roughly corresponds to my treatment sections. Each cell in the summary tells me whether that section of the field has a blight pattern worth treating.
Blortz: You threw away the fine detail and kept the coarse structure. Where in the block the pattern appeared — top-left or bottom-right — is lost. But whether the pattern appeared is preserved.
Trviksha: For Kvrothja's purposes, the coarse structure is enough. She cannot treat individual plots anyway. The summary matches the resolution of her intervention.
The Pipeline
The full system was now a pipeline:
- Raw field grid (20×20) enters.
- Four convolutional filters scan the grid, producing four feature maps (18×18 each).
- Pooling compresses each feature map to 6×6.
- The four 6×6 summaries (144 values total) feed into a standard hidden layer.
- The hidden layer produces a final classification: which sections of the field need treatment.
The convolutional filters detected patterns. The pooling compressed the patterns. The hidden layer combined them into a decision. Each stage had a clear role, and together they processed the spatial data with far fewer weights than a fully connected network would need for the same job.
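The stages of the pipeline can be strung together as a rough sketch. Everything specific here is an assumption made for illustration: the weights are random and untrained, the hidden layer has 16 units, and the output is taken to be one score per cell of a 6×6 grid of sections. The story does not specify any of these, so the scores are meaningless; the point is only how the shapes flow from stage to stage.

```python
import math
import random


def apply_filter(field, weights):
    """Convolve a square field with a 3x3 filter (no padding)."""
    size = len(field) - 2
    return [[sum(weights[dr][dc] * field[r + dr][c + dc]
                 for dr in range(3) for dc in range(3))
             for c in range(size)]
            for r in range(size)]


def max_pool(feature_map, block=3):
    """Keep the maximum of each non-overlapping block x block region."""
    size = len(feature_map) // block
    return [[max(feature_map[r * block + dr][c * block + dc]
                 for dr in range(block) for dc in range(block))
             for c in range(size)]
            for r in range(size)]


def dense_layer(inputs, weights, biases, activation):
    """Fully connected layer: weighted sum plus bias, then activation."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


rng = random.Random(0)


def random_matrix(rows, cols):
    """Untrained placeholder weights (assumption: real weights are learned)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


# Raw field grid (20x20): 1 = blighted, 0 = healthy.
field = [[rng.choice([0, 1]) for _ in range(20)] for _ in range(20)]

filters = [random_matrix(3, 3) for _ in range(4)]           # four 3x3 filters
hidden_w, hidden_b = random_matrix(16, 144), [0.0] * 16     # assumed 16 hidden units
out_w, out_b = random_matrix(36, 16), [0.0] * 36            # assumed 36 section scores

feature_maps = [apply_filter(field, f) for f in filters]    # four 18x18 maps
summaries = [max_pool(m) for m in feature_maps]             # four 6x6 summaries
flat = [v for s in summaries for row in s for v in row]     # 144 values
hidden = dense_layer(flat, hidden_w, hidden_b, relu)
scores = dense_layer(hidden, out_w, out_b, sigmoid)         # one score per section
```

With trained weights, a score near 1 would mean "treat this section" and a score near 0 would mean "leave it alone".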