Classical computer architectures are typically built around a few computational cores that collaborate and communicate through a larger, slower system memory. In this work, we introduce a configurable architecture, a checkerboard grid of processing cells with distributed cores and memories, designed to maximize the benefits of parallelization. We model the checkerboard architecture and a classical architecture at a high level and compare their behavior on a moderately parallelized JPEG encoder benchmark. Both models are simulated on a loosely-timed SystemC TLM-2.0 test platform that accounts for the timing of processor cores, memories, memory controllers, and transactions. Our experimental results show 66% faster execution and greater memory bandwidth headroom for the checkerboard architecture compared to the classical architecture.