Lately I’ve been working with several different gender-inference tools, tweaking them here and there to serve my purposes. Since I’m working with a historical dataset of about eight million records, spanning 1800 to today, one of the packages I’m using is the gender library for R by Lincoln Mullen, which uses historical US census and Social Security information to predict the gender of a given name. The historical angle matters because some names are unisex (or androgynous), and many change their gender tendency over time.
As a diversion, I thought I’d try to find the most-changey names in the datasets used by this particular tool. I’ve graphed a selection of these below. Note that the graphs concatenate the pre-1930 IPUMS data, drawn from Census samples, and the post-1930 SSA data, so treat the borderline (1920-1940) as somewhat fuzzy, and caveat emptor if you want to draw conclusions across it.
Each graph plots a +/-10 year average for each year from 1800 to 2000. That is, the values for 1950 represent all people born 1940-1960, and so on. To be selected, a name had to be at least 90% female at one point in time and at least 90% male at another.
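For anyone who wants to try something similar, here is a rough Python sketch of the windowing and the 90% selection rule. It is not the gender package's API and not the exact code behind the graphs: the function names, the `counts` layout (birth year mapped to a female count and a male count for one name), and the choice to pool raw counts across the window rather than average yearly percentages are all my assumptions for illustration.

```python
def windowed_female_share(counts, year, half_window=10):
    """Share of births recorded as female within +/- half_window years of `year`.

    `counts` is a hypothetical layout: {birth_year: (female_count, male_count)}
    for a single name. Returns None when the window holds no births at all.
    """
    female = male = 0
    for y in range(year - half_window, year + half_window + 1):
        f, m = counts.get(y, (0, 0))
        female += f
        male += m
    total = female + male
    return female / total if total else None


def is_gender_switcher(counts, years, threshold=0.9, half_window=10):
    """True if the name is >= threshold female in one window and >= threshold male in another."""
    shares = [windowed_female_share(counts, y, half_window) for y in years]
    shares = [s for s in shares if s is not None]
    ever_female = any(s >= threshold for s in shares)
    ever_male = any((1 - s) >= threshold for s in shares)
    return ever_female and ever_male


# Toy numbers, invented purely to exercise the functions.
toy_switcher = {1900: (1, 99), 1950: (50, 50), 1990: (99, 1)}
toy_stable = {1900: (0, 10), 1990: (1, 9)}

print(is_gender_switcher(toy_switcher, range(1800, 2001)))
print(is_gender_switcher(toy_stable, range(1800, 2001)))
```

Pooling counts means years with more births weigh more heavily within a window, which seems closest to "all people born 1940-1960"; averaging the yearly percentages instead would give sparse years the same weight as busy ones.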
A couple of trends to spot:
- Most shifty names move from male to female. I’ve included a couple that buck that trend, Auguste and Augustine, both of which have become more male.
- From WWII to the end of the Boomer generation (1960s), many previously male names became increasingly, and often exclusively, female. This group includes Stacey, Robin, Lynn, Leslie, Leigh, Lauren, and Hillary.
- A few names, like Shirley, Meredith, Fay, and Kim, become increasingly female starting in the 19th century.
- Lacey is maybe the changeyest name in this group, going back and forth from over 70% male to over 70% female at least four times throughout the period. Kerry is also up and down.
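The back-and-forth counting in that last point can be sketched in Python as well. This is only an illustration under my own assumptions: `count_swings` is a hypothetical helper, the input is a list of windowed female shares in chronological order, and the numbers below are invented, not Lacey's actual values.

```python
def count_swings(shares, threshold=0.7):
    """Count moves between 'over threshold female' and 'over threshold male'.

    Values in the middle band (neither side above the threshold) are skipped,
    so a name that hovers around 50% does not register as a swing.
    """
    state = None
    swings = 0
    for s in shares:
        if s is None:
            continue
        if s > threshold:
            new_state = "female"
        elif (1 - s) > threshold:
            new_state = "male"
        else:
            continue  # in the ambiguous middle band
        if state is not None and new_state != state:
            swings += 1
        state = new_state
    return swings


# Invented series of female shares, oldest first.
toy_shares = [0.2, 0.5, 0.8, 0.75, 0.4, 0.1, 0.9, 0.25]
print(count_swings(toy_shares))
```

Ignoring the middle band is a design choice: it only counts a change once the name has clearly landed on the other side, which matches the "over 70% male to over 70% female" wording.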