The Summary

The single 3×3 filter detected blight boundaries. But Kvrothja's fields had more going on than boundaries.

Kvrothja: Your system detects the edge of a cluster well. But it misses isolated blighted plots — single sick plants with no cluster around them. And it does not detect the density of a cluster — a sparse cluster of three plots is different from a dense cluster of twelve.

Trviksha needed the network to detect multiple kinds of patterns, not just one.

Many Filters

She added more scanning teams. Instead of one 3×3 filter sliding across the field, she trained four filters simultaneously, each with its own nine weights:

Filter A: Detected blight boundaries (the advancing edge of a cluster).
Filter B: Detected isolated blight (single blighted plots surrounded by healthy ones).
Filter C: Detected dense clusters (areas where many adjacent plots were all blighted).
Filter D: Detected healthy corridors (strips of healthy plots between two blighted areas).

Each filter scanned the entire field independently, producing its own grid of outputs. Where Filter A saw a strong boundary, its output was high at that position. Where Filter B saw an isolated case, its output was high there instead. Each filter "saw" the field through its own lens.

Trviksha: Each filter produces a map. Not a map of the field itself, but a map of where that filter's pattern appears. Filter A's map shows where the boundaries are. Filter B's map shows where the isolated cases are. Four filters, four maps.

Blortz: Four views of the same field, each highlighting a different kind of pattern.

These output grids, one per filter, were feature maps. Each feature map was roughly the same size as the input field, but instead of showing raw blight status, it showed the strength of a particular pattern at each location. The feature maps were the network's internal representation of the field, translated from "what is here" into "what patterns are here."
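To make the idea of "one filter, one map" concrete, here is a minimal sketch in plain Python. It is an illustration, not Trviksha's actual system: the two sets of filter weights are hand-set and hypothetical (in the story the weights are learned), the field is a made-up example, and only two of the four filters are shown.

```python
def convolve_valid(grid, kernel):
    """Slide a square kernel over the grid with no padding.

    Each output cell is the weighted sum of the window whose top-left
    corner sits at that cell (the sliding-window sum used in CNNs).
    """
    k = len(kernel)
    rows = len(grid) - k + 1
    cols = len(grid[0]) - k + 1
    return [
        [
            sum(kernel[i][j] * grid[r + i][c + j]
                for i in range(k) for j in range(k))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Hypothetical hand-set weights, chosen only for illustration.
FILTERS = {
    # Responds strongly when the centre is blighted and its neighbours are not.
    "isolated": [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]],
    # Responds strongly when the whole window is blighted.
    "dense": [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
}

# A made-up 20x20 field: 1 = blighted plot, 0 = healthy plot.
field = [[0] * 20 for _ in range(20)]
field[4][4] = 1                      # one isolated blighted plot
for r in range(12, 15):
    for c in range(12, 15):
        field[r][c] = 1              # a dense 3x3 cluster

# One feature map per filter, each 18x18.
feature_maps = {name: convolve_valid(field, k) for name, k in FILTERS.items()}
```

In this sketch the "isolated" map peaks at the window centred on the lone plot, while the "dense" map peaks at the window that covers the cluster: the same field, seen through two different lenses.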

Too Much Detail

Each of the four feature maps was 18×18, slightly smaller than the 20×20 input because the 3×3 window could not extend beyond the edges. That came to four times 324 values, or 1,296 numbers per field. Kvrothja did not need this level of detail.
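The sizes quoted here follow from a simple rule: a window of width k sliding over n positions without padding fits in n - k + 1 places. A small sketch of that arithmetic (the function name is an illustrative choice, not something from the story):

```python
def valid_output_size(input_size, window_size):
    """Number of positions a window can occupy without leaving the grid."""
    return input_size - window_size + 1


side = valid_output_size(20, 3)   # positions along one edge of the field
per_map = side * side             # values in one feature map
total = 4 * per_map               # values across all four feature maps
```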

Kvrothja: I do not treat individual plots. I treat sections. My workers can apply soil treatment to a section of the field, say a 5×5 area. I need to know which sections are at risk, not which individual plots.

Trviksha needed to compress the feature maps. The question was how to reduce the resolution without losing the important information.

Pooling

She divided each feature map into non-overlapping 3×3 blocks. For each block, she took a single summary value: the maximum. If any position within the block showed a strong pattern, the block's summary was high. If no position showed the pattern, the summary was low.

Trviksha: The question changes from "is there blight at this exact position?" to "is there blight somewhere in this block?" The answer is yes if any plot in the block is affected. Taking the maximum preserves the strongest signal while discarding the precise location within the block.

Each 18×18 feature map, divided into 3×3 blocks, became a 6×6 summary. Four feature maps became four 6×6 summaries: 144 values instead of 1,296. That was a ninefold reduction in the number of values, but the critical patterns survived.
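The pooling step described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: the feature map values are invented, and the function assumes the map's sides divide evenly by the block size, as 18 does by 3.

```python
def max_pool(feature_map, block=3):
    """Replace each non-overlapping block x block region with its maximum."""
    rows = len(feature_map) // block
    cols = len(feature_map[0]) // block
    return [
        [
            max(feature_map[r * block + i][c * block + j]
                for i in range(block) for j in range(block))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# An invented 18x18 feature map with two responses and zeros elsewhere.
feature_map = [[0.0] * 18 for _ in range(18)]
feature_map[4][7] = 0.9    # falls in block row 1, block column 2
feature_map[16][1] = 0.4   # falls in block row 5, block column 0

summary = max_pool(feature_map)           # 6x6 grid
values_after = 4 * len(summary) * len(summary[0])   # four maps pooled
```

Each response survives in the summary cell for its block, while its exact position inside that block is no longer recorded.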

Illustration: A feature map (an 18×18 grid of varying shades) is divided into 3×3 blocks. Each block is replaced by a single value, the maximum within the block. The resulting 6×6 grid is much smaller but preserves the rough locations of the strongest patterns. A velociraptor picks the brightest pebble from each block.

Kvrothja: Now your 6×6 summary roughly corresponds to my treatment sections. Each cell in the summary tells me whether that section of the field has a blight pattern worth treating.

Blortz: You threw away the fine detail and kept the coarse structure. Where in the block the pattern appeared — top-left or bottom-right — is lost. But whether the pattern appeared is preserved.

Trviksha: For Kvrothja's purposes, the coarse structure is enough. She cannot treat individual plots anyway. The summary matches the resolution of her intervention.
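Blortz's point about lost position can be checked directly. The sketch below uses invented feature maps with a single response each. It also shows a limit worth keeping in mind: the summary is unchanged only while the response stays inside the same 3×3 block.

```python
def max_pool(feature_map, block=3):
    """Replace each non-overlapping block x block region with its maximum."""
    rows = len(feature_map) // block
    cols = len(feature_map[0]) // block
    return [
        [
            max(feature_map[r * block + i][c * block + j]
                for i in range(block) for j in range(block))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def single_response(row, col, size=18):
    """An otherwise empty feature map with one response at (row, col)."""
    grid = [[0.0] * size for _ in range(size)]
    grid[row][col] = 1.0
    return grid


# A one-step shift that stays inside block (1, 1): rows and columns 3 to 5.
same_block = max_pool(single_response(3, 3)) == max_pool(single_response(4, 4))

# A one-step shift that crosses from block (1, 1) into block (2, 2).
crossed_block = max_pool(single_response(5, 5)) == max_pool(single_response(6, 6))
```

Here `same_block` is true and `crossed_block` is false: pooling tolerates small shifts within a block, not every small shift.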

The Pipeline

The full system was now a pipeline:

Raw field grid (20×20) enters.
Four convolutional filters scan the grid, producing four feature maps (18×18 each).
Pooling compresses each feature map to 6×6.
The four 6×6 summaries (144 values total) feed into a standard hidden layer.
The hidden layer produces a final classification: which sections of the field need treatment.
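The first four stages above can be sketched end to end in plain Python. Everything specific here is an assumption made for illustration: the filter and hidden-layer weights are random rather than trained, the hidden layer has eight units with a ReLU activation, and the field is random. The final classification step is left out because the story does not say what form it takes.

```python
import random


def convolve_valid(grid, kernel):
    """Slide a square kernel over the grid with no padding."""
    k = len(kernel)
    rows, cols = len(grid) - k + 1, len(grid[0]) - k + 1
    return [
        [sum(kernel[i][j] * grid[r + i][c + j]
             for i in range(k) for j in range(k))
         for c in range(cols)]
        for r in range(rows)
    ]


def max_pool(feature_map, block=3):
    """Replace each non-overlapping block x block region with its maximum."""
    rows, cols = len(feature_map) // block, len(feature_map[0]) // block
    return [
        [max(feature_map[r * block + i][c * block + j]
             for i in range(block) for j in range(block))
         for c in range(cols)]
        for r in range(rows)
    ]


def dense_relu(inputs, weights, biases):
    """Fully connected layer: weighted sum plus bias, then ReLU."""
    return [
        max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Stage 1: a raw 20x20 field (random here; 1 = blighted, 0 = healthy).
field = [[rng.choice([0, 1]) for _ in range(20)] for _ in range(20)]

# Stage 2: four 3x3 filters (untrained random weights) give four 18x18 maps.
kernels = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
           for _ in range(4)]
feature_maps = [convolve_valid(field, k) for k in kernels]

# Stage 3: pooling compresses each map to 6x6.
summaries = [max_pool(m) for m in feature_maps]

# Stage 4: flatten the four summaries into 144 values for the hidden layer.
flat = [v for s in summaries for row in s for v in row]

HIDDEN_UNITS = 8  # assumed size; the story does not give one
weights = [[rng.uniform(-0.1, 0.1) for _ in flat] for _ in range(HIDDEN_UNITS)]
biases = [0.0] * HIDDEN_UNITS
hidden = dense_relu(flat, weights, biases)
```

The point of the sketch is the shapes: 400 raw values become 1,296 feature values, then 144 pooled values, then a handful of hidden activations.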

The convolutional filters detected patterns. The pooling compressed the patterns. The hidden layer combined them into a decision. Each stage had a clear role, and together they processed the spatial data with far fewer weights than a fully connected network would need for the same field.

Trviksha has built two key components of a convolutional neural network. Feature maps are the output of each convolutional filter: they show where in the input each pattern appears, rather than what the raw input looks like. Multiple filters produce multiple feature maps, each highlighting a different kind of pattern. Pooling compresses these maps by replacing a small region with a single summary value (typically the maximum or the average), reducing the spatial resolution while preserving the important signals. This serves two purposes: it makes the computation cheaper (fewer values to process in later layers), and it provides a degree of positional tolerance. If a pattern shifts by a position or two but stays within the same pooling block, the pooled summary does not change. Max pooling answers the question "is this pattern anywhere in this region?" rather than "is this pattern at this exact position?" Many convolutional networks for image recognition (like those that identify objects in photographs) follow this same basic pattern: convolve, pool, convolve, pool, then classify. Can you think of a situation where a rough summary of a region is more useful than a precise description of each point within it?