We used cross-modal generative AI models, which rely on the
Contrastive Language-Image Pretraining (CLIP) encoder, to
generate portraits of fictional characters based on their names.
We then studied to what extent image generation captures
names' gender and age connotations when information from
linguistic distribution is rich and informative (talking names,
e.g., Bolt), present but possibly uninformative (real names,
e.g., John), and absent (made-up names, e.g., Arobynn). Three
pre-trained Computer Vision classifiers for each attribute
exhibit reliable agreement in classifying the generated images,
even for made-up names. We further show a robust correlation
between the classifiers' confidence in detecting an attribute
and the ratings provided by participants in an online survey
about how suitable each name is for characters bearing a
certain attribute. These models and their learning strategies can
shed light on mechanisms that support human learning of
non-arbitrary form-meaning mappings.