Chemistry Climate Model (CCM) codes are important tools \cite{csusft2cbooatpfccm} for understanding how to mitigate global warming \cite{ifcoccmaapoape,pasgw}. However, to produce meaningful results, CCM codes require performance at the PetaFlop (and soon ExaFlop) scale \cite{eesiteri}, and this performance has to be achieved within a reasonable power budget. It is therefore important to speed up, as much as possible, the execution of state-of-the-art CCM codes, even though they are already highly optimized.
Fast-J \cite{tfs} is a widely used CCM code for simulations at different scales, i.e., local, global and cosmic. At each scale, the simulation and its performance rely on one code, the Fast-J core.
In this thesis, we speed up the Fast-J core by selecting and implementing high-level compiler transformations that CPU compilers do not perform efficiently.
The optimizations achieve a speedup greater than 4.5 at every simulation scale. To quantify this improvement, we compare the execution times of the optimized code with those of the state-of-the-art, already highly optimized, multi-threaded CPU code.