Code summarization and generation are valuable tasks to master because of their wide range of applications, in code readability and code translation to name a few. This work extends previous research on PLBART, a sequence-to-sequence transformer model used for a variety of program and language understanding and generation (PLUG) tasks. The ultimate goal is to improve the performance of PLBART by modifying the noise function of its denoising autoencoder. The current noise function corrupts code tokens at random; we aim to improve performance by instead masking nodes of the corresponding Abstract Syntax Tree (AST). To integrate the AST structure into the self-attention mechanism, we adopt the dependency-guided self-attention mechanism explored in the NLP literature, in particular [ZKC21]. However, we cannot compute distances between all tokens that appear in the code directly from the AST, since not every token necessarily appears in the parse tree. We therefore investigate how to derive distances between tokens from the AST structure.
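To illustrate the distance question, the sketch below is a minimal example (not the thesis implementation): it parses a Python snippet with the standard `ast` module and measures the distance between two leaf nodes as the length of the tree path through their lowest common ancestor. The helper name `ast_token_distances` is hypothetical, and surface tokens such as punctuation never receive an AST node at all, which is precisely the gap a token-to-tree mapping has to close.

```python
# Minimal sketch: derive pairwise distances between leaf AST nodes,
# which stand in for code tokens, using Python's built-in ast module.
# Distance between two leaves = length of the tree path through their
# lowest common ancestor (LCA).
import ast
from itertools import combinations


def ast_token_distances(source: str):
    tree = ast.parse(source)

    # Record the root-to-node chain (ancestors plus the node itself) for every node.
    chains = {}

    def visit(node, chain):
        chains[node] = chain + [node]
        for child in ast.iter_child_nodes(node):
            visit(child, chain + [node])

    visit(tree, [])

    # Leaf AST nodes act as token anchors; surface tokens with no AST node
    # (e.g. parentheses, commas) are exactly the gap discussed above.
    leaves = [n for n in ast.walk(tree) if not list(ast.iter_child_nodes(n))]

    distances = {}
    for a, b in combinations(leaves, 2):
        chain_a, chain_b = chains[a], chains[b]
        # Depth of the LCA = length of the shared prefix of the two chains.
        lca_depth = 0
        for x, y in zip(chain_a, chain_b):
            if x is not y:
                break
            lca_depth += 1
        # Path length from a up to the LCA and back down to b.
        distances[(a, b)] = (len(chain_a) - lca_depth) + (len(chain_b) - lca_depth)
    return distances


if __name__ == "__main__":
    dists = ast_token_distances("x = foo(y + 1)")
    for (a, b), d in sorted(dists.items(), key=lambda kv: kv[1])[:5]:
        print(type(a).__name__, type(b).__name__, d)
```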