Multiple Language Gender Identification for Blog Posts
Abstract
In data-driven gender identification, it has so far largely been assumed that the same types of (mostly content-oriented) features can be used to differentiate between male and female authors. In most cases, this distinction is made in a monolingual scenario. In this work, we discuss a set of features that distinguish between genders in six datasets of blog data in English, Spanish, French, German, Italian and Catalan, with accuracies ranging from 77% to 88%. Using a reduced set of language-independent structural features in a multilingual scenario, we first identify the gender of the author and then both the gender and the language, achieving accuracies higher than 74%.