The serialization of memory accesses is a major limiting factor in high performance SIMD computers. For these machines, the data templates that are accessed by a program can be perceived. by the compiler, and therefore, the design of conflict-free storage schemes may dramatically improve performance.
The problem of finding storage schemes, with minimum hardware requirements, for accessing a set of arbitrary templates is proved to be NP-complete. To design cost-effective storage schemes, we introduce two parameters: the number of 1's in the storage matrix (affecting hardware complexity) and the access frequency of each template. Heuristics are proposed to find storage schemes with minimum hardware (Perfect Schemes) but without enforcing a high degree of conflict reduction. Another heuristic is proposed to augment perfect storage schemes by using minimum additional hardware in order to reduce the degree of conflict (Semi-Perfect Schemes).
Experimental evaluation is carried out using a Monte Carlo simulation. Performance of the proposed heuristics is compared to solutions obtained using branch-and-bound search. Results show that perfect-schemes may deviate on the average by 20% from the optimum access time in the case of 10 arbitrary templates and 16 memories. However, semi-perfect schemes lead to dramatic reduction of the degree of conflict compared to perfect-schemes. The proposed heuristic storage outperforms row-major interleaving and row-column-diagonals storage. The time complexity of the proposed heuristics is O(p(t + n) + n^2t), where t, 2^P, and n, are the number of templates, the number of processors, and the number of distinct vectors of the template bases, respectively.