Emergent communication (EC) is the field that seeks to understand the mechanisms behind the emergence and evolution of natural language. In EC, the de facto standard has been to use sequential architectures, which do not explicitly incorporate the tree-structured hierarchy inherent in human language. This study uses a stack-based model called RL-SPINN, which learns tree structures through reinforcement learning without ground-truth parsing data and composes sentence representations according to these structures. We use this model as the basis for the understanding agents and investigate the extent to which the inductive bias of an architecture that explicitly uses tree structures affects the emergent language. The experimental results show that, in some settings, the emergent language generated with our model achieves higher communication accuracy than the languages generated with the baselines. To our knowledge, this work is the first in EC to focus on the tree-structured hierarchy of language, and it suggests new directions for future research in the field.
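
To make the idea of a stack-based model concrete, the following is a minimal sketch of how a sequence of shift and reduce actions turns a flat message into a binary tree. It is an illustration under stated assumptions, not the implementation used in this study: the names `SHIFT`, `REDUCE`, `valid_actions`, and `build_tree` are hypothetical, the action sequence is written by hand rather than sampled from a policy trained with reinforcement learning, and the composition step simply pairs two subtrees instead of applying a learned function to their vector representations.

```python
from typing import List, Tuple, Union

# A tree is either a single token or a pair of subtrees (illustrative type).
Tree = Union[str, Tuple["Tree", "Tree"]]

# Hypothetical action ids for this sketch.
SHIFT, REDUCE = 0, 1


def valid_actions(stack_size: int, buffer_size: int) -> List[int]:
    """Return the actions that are legal in the current parser state."""
    actions = []
    if buffer_size > 0:
        actions.append(SHIFT)   # a token is still waiting in the buffer
    if stack_size >= 2:
        actions.append(REDUCE)  # two items are available to merge
    return actions


def build_tree(tokens: List[str], actions: List[int]) -> Tree:
    """Apply shift/reduce actions to the tokens and return the resulting tree.

    Assumption: a learned model would replace the tuple pairing below with a
    composition function over vectors, and would choose the actions itself.
    """
    stack: List[Tree] = []
    buffer = list(tokens)
    for action in actions:
        if action not in valid_actions(len(stack), len(buffer)):
            raise ValueError(f"illegal action {action} in current state")
        if action == SHIFT:
            # Move the next token from the buffer onto the stack.
            stack.append(buffer.pop(0))
        else:
            # Merge the top two stack items into one subtree.
            right = stack.pop()
            left = stack.pop()
            stack.append((left, right))
    if buffer or len(stack) != 1:
        raise ValueError("action sequence does not yield a single tree")
    return stack[0]


if __name__ == "__main__":
    message = ["a", "b", "c"]
    # Left-branching structure: ((a b) c)
    print(build_tree(message, [SHIFT, SHIFT, REDUCE, SHIFT, REDUCE]))
    # Right-branching structure: (a (b c))
    print(build_tree(message, [SHIFT, SHIFT, SHIFT, REDUCE, REDUCE]))
```

The same three-symbol message yields different trees under different action sequences, which is the degree of freedom that a tree-learning agent can exploit and a purely sequential architecture cannot.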