Multiple Language Gender Identification for Blog Posts
Abstract
In data-driven gender identification, it has so far been largely assumed that the same types of (mostly content-oriented) features can be used to differentiate between male and female authors. In most cases, this distinction is made in a monolingual scenario. In this work, we discuss a set of features that distinguish between genders in six datasets of blog data in English, Spanish, French, German, Italian, and Catalan, with accuracies ranging from 77% to 88%. Using a reduced set of language-independent structural features in a multilingual scenario, we first identify the gender and then both the gender and the language of the author, achieving accuracies higher than 74%.