Title: | Estimate Gender from Names in Spanish and Portuguese |
---|---|
Description: | Estimate gender from names in Spanish and Portuguese. Works with vectors and data frames. Estimation works for both first names and full names. The package relies on a compilation of common names, each paired with its most frequent associated gender in both languages, which is used as a lookup table for gender inference. |
Authors: | Juan Pablo Marin Diaz [aut, cre] |
Maintainer: | Juan Pablo Marin Diaz <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.0 |
Built: | 2025-02-17 05:00:24 UTC |
Source: | https://github.com/datasketch/genero |
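The lookup-table approach described above can be illustrated with a minimal base-R sketch. This is not the package's implementation: the toy table `toy_names` and the helper `lookup_gender_sketch()` are hypothetical, and the sketch only looks up the first token of each name.

```r
# Hypothetical toy lookup table with the same two-column shape
# (name, gender) as the bundled datasets; not the real data.
toy_names <- data.frame(
  name   = c("juan", "pablo", "camila", "mariana"),
  gender = c("male", "male", "female", "female"),
  stringsAsFactors = FALSE
)

# Named character vector used as the lookup: lowercase name -> gender.
lookup <- setNames(toy_names$gender, toy_names$name)

# Hypothetical helper: takes the first token of each (possibly full)
# name, lowercases it and looks it up; unmatched names get `na`.
lookup_gender_sketch <- function(nms, lookup, na = NA) {
  first <- tolower(sub("\\s.*$", "", trimws(nms)))
  out <- unname(lookup[first])
  out[is.na(out)] <- na
  out
}

# Returns c("male", "female", NA)
lookup_gender_sketch(c("Juan Pablo Marin", "Camila Rojas", "Xyz"), lookup)
```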
Estimate gender from names
genero( nms, result_as = c(male = "male", female = "female"), lang = "es", col = NULL, na = NA, rev_weights = FALSE )
result_as |
A named vector with names c("male", "female") whose values override the labels returned for each gender. |
lang |
Use "es" for Spanish (default), "pt" for Portuguese. |
col |
When the input is a data frame, the name of the column containing the first or full names. |
na |
Value returned when no gender match is found. |
rev_weights |
Logical; whether weights should be reversed when input names follow the format "Last Name First Name". |
nms |
A vector or data frame with first or full names. |
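To show how `result_as`, `na` and `rev_weights` could interact for full names, here is a minimal base-R sketch. The weighting scheme is an assumption made for illustration (each matched token votes for its gender, earlier tokens weigh more, and reversing the weights favours later tokens); the package's actual weights are not documented here, and `guess_full_name()` and the toy `lookup` are hypothetical.

```r
# Hypothetical toy lookup: lowercase name -> gender.
lookup <- c(juan = "male", pablo = "male", camila = "female",
            maria = "female", jose = "male")

# Hypothetical weighting scheme (NOT the package's): each token found in
# the lookup votes for its gender, and earlier tokens weigh more.
# With rev_weights = TRUE the weights are reversed, so later tokens
# (the first names in "Last Name First Name") weigh more.
guess_full_name <- function(nm, lookup,
                            result_as = c(male = "male", female = "female"),
                            na = NA, rev_weights = FALSE) {
  tokens <- strsplit(tolower(trimws(nm)), "\\s+")[[1]]
  weights <- rev(seq_along(tokens))          # e.g. 3, 2, 1
  if (rev_weights) weights <- rev(weights)   # e.g. 1, 2, 3
  genders <- lookup[tokens]                  # NA for unknown tokens
  hit <- !is.na(genders)
  if (!any(hit)) return(na)                  # no match: fall back to `na`
  score <- tapply(weights[hit], genders[hit], sum)
  # which.max keeps the first maximum, so ties go to the
  # alphabetically first gender label.
  winner <- names(score)[which.max(score)]
  unname(result_as[winner])                  # relabel via result_as
}

guess_full_name("Maria Jose", lookup)
guess_full_name("Maria Jose", lookup, rev_weights = TRUE)
guess_full_name("Camila", lookup, result_as = c(male = "M", female = "F"))
guess_full_name("Xyz", lookup, na = "unknown")
```

With the toy lookup, "Maria Jose" returns "female" with the default weights and "male" with `rev_weights = TRUE`, so the weight order decides which token dominates when tokens disagree.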
A vector or data frame with the estimated gender for the input. When the input is a data frame, a column with the result is inserted next to the column holding the input names.
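Placing a result column immediately after the input column, as described for data frame input, can be sketched in base R as follows. The helper `insert_after_col()` and the default column name `gender_guess` are hypothetical; the name the package gives its result column is not stated here.

```r
# Hypothetical helper: insert `values` as a new column immediately
# after column `col` of data frame `df`.
insert_after_col <- function(df, col, values, new_name = "gender_guess") {
  pos <- match(col, names(df))
  if (is.na(pos)) stop("column not found: ", col)
  rest <- setdiff(seq_len(ncol(df)), seq_len(pos))  # columns after `col`
  out <- df[, seq_len(pos), drop = FALSE]            # columns up to `col`
  out[[new_name]] <- values
  if (length(rest) > 0) out <- cbind(out, df[, rest, drop = FALSE])
  out
}

df <- data.frame(id = 1:2, name = c("Juan Perez", "Camila Rojas"),
                 age = c(30, 25), stringsAsFactors = FALSE)
out <- insert_after_col(df, "name", c("male", "female"))
names(out)  # "id" "name" "gender_guess" "age"
```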
genero(c("Juan", "Pablo", "Camila", "Mariana"))
This dataset was collected and organized manually from multiple sources. It contains more than 9,810 names in Spanish with their associated gender, accounting for name variations.
names_gender_es
Data frame with two columns: name and gender.
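The package description says each name is paired with its most frequent associated gender. One hypothetical way to derive such a two-column `name`/`gender` table from raw observations is sketched below; the raw data and `most_frequent_gender()` are illustrative assumptions, not how the bundled datasets were actually built.

```r
# Hypothetical raw observations: one row per person record.
raw <- data.frame(
  name   = c("Andrea", "andrea", "Andrea", "Juan", "Juan"),
  gender = c("female", "male", "female", "male", "male"),
  stringsAsFactors = FALSE
)

# Collapse to one row per normalized name, keeping the gender seen most
# often; ties go to the alphabetically first gender (which.max keeps
# the first maximum).
most_frequent_gender <- function(raw) {
  nm <- tolower(trimws(raw$name))
  counts <- table(nm, raw$gender)   # rows: names, columns: genders
  data.frame(
    name   = rownames(counts),
    gender = colnames(counts)[apply(counts, 1, which.max)],
    stringsAsFactors = FALSE
  )
}

most_frequent_gender(raw)
```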
names_gender_es
This dataset is derived from https://brasil.io/dataset/genero-nomes/nomes. It contains more than 50,000 names in Portuguese with their associated gender.
names_gender_pt
Data frame with two columns: name and gender.
names_gender_pt
Which name column
which_name_column(colnames, colname_variations = NULL, show_guess = FALSE)
colnames |
A character vector of data frame column names. |
colname_variations |
A vector of custom column names to append to the built-in vector of common first-name column names. |
show_guess |
Logical; whether to show a message with the guessed column. |
A single column name: the one that matches a common first-name column name.
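A minimal base-R sketch of the kind of matching `which_name_column()` performs: compare column names case-insensitively against a list of common first-name headers, extended by `colname_variations`. The candidate list, the function name `guess_name_column()`, returning the first match in column order, and returning `NA` when nothing matches are all assumptions for illustration; the package's built-in list and tie-breaking are not shown in this manual.

```r
# Hypothetical re-implementation for illustration only.
guess_name_column <- function(colnames, colname_variations = NULL,
                              show_guess = FALSE) {
  # Hypothetical list of common first-name column headers.
  common <- c("name", "names", "first_name", "firstname",
              "nombre", "nombres", "nome", "primeiro_nome")
  candidates <- tolower(c(common, colname_variations))
  hits <- colnames[tolower(colnames) %in% candidates]
  if (length(hits) == 0) return(NA_character_)  # no candidate found
  guess <- hits[1]                              # first match in column order
  if (show_guess) message("Using column: ", guess)
  guess
}

guess_name_column(c("Name", "Age", "City"))                          # "Name"
guess_name_column(c("Persona", "Edad"), colname_variations = "persona")  # "Persona"
```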
which_name_column(c("Name", "Age", "City"))