Recognition of animacy is fundamental to human cognition, yet robots complicate this categorization because they are non-living objects with human-like traits. We examined the categorization of robot animacy using speech balloons from comics, which must either connect to an animate “stem” (a speaker) or coerce an inanimate object into being animate (e.g., a talking toaster). Participants rated the text-image congruity of silhouettes of humans, inanimate objects, and robots paired with descriptive words placed in either a speech balloon or a label box. Overall, human and object text-image pairs were rated as more congruent than those with robots. However, a positive correlation suggested that more human-looking robots paired with balloons were rated as more congruent than less human-looking ones, whereas no such graded congruency appeared with labels. These results suggest that speech balloons, unlike labels, select for an animate stem, and that intuitions about animacy in robots fall along a gradient depending on their human-like traits.