An exploration of baby name trends in the USA from 1880 to 2025
Baby name rankings are very popular on the internet and on social media. A typical ranking is the top baby names for each gender in the USA in 2025 (source):
| Ranking | Male | Female |
|---|---|---|
| 1 | Liam | Olivia |
| 2 | Noah | Charlotte |
| 3 | Oliver | Emma |
| 4 | Theodore | Amelia |
| 5 | Henry | Sophia |
| 6 | James | Mia |
| 7 | Elijah | Isabella |
| 8 | Mateo | Evelyn |
| 9 | William | Sofia |
| 10 | Lucas | Eliana |
What these rankings often don’t show are the numbers behind them: how many babies were given each name in 2025, and how does that compare to previous years?
Thankfully, the USA Social Security Administration (SSA) releases the full dataset of baby names going back to 1880. I took this data, summed the counts of the top-ranked names each year, and plotted them against each other. See the above graph.
From this graph we can see that the popular names are getting less popular, while the number of births is still in line with what it was 50 years ago. This is something that plain rankings obscure.
If “Liam” or “Olivia” had appeared in previous years with the same frequency as in 2025, they would have ranked noticeably lower than #1. For example, in 2000 “Liam” would have been ranked 11th and “Olivia” 12th. In 1950, “Liam” would have been ranked 19th and “Olivia” 27th. But names are more spread out these days, so these frequencies are enough to be ranked #1 in 2025.
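The comparison above can be sketched as a small function. This is only an illustration: it assumes "frequency" means a name's share of that year's births for one gender, the function name `hypothetical_rank` is made up for this example, and the numbers are toy values rather than SSA data.

```julia
# Dense rank a name would receive in another year if it kept a given share of births.
# `year_frequencies` holds that year's per-name shares for one gender (assumption).
function hypothetical_rank(target::Real, year_frequencies::AbstractVector{<:Real})
    # Every distinct share strictly above the target occupies one rank ahead of it.
    higher = unique(filter(f -> f > target, year_frequencies))
    return length(higher) + 1
end

toy_frequencies = [0.05, 0.04, 0.04, 0.02, 0.01]   # hypothetical shares
toy_rank = hypothetical_rank(0.03, toy_frequencies)
println(toy_rank)   # 3
```

A share equal to an existing one shares that name's rank, which matches the dense ranking used in the rest of the analysis.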
The other sections present other charts and insights derived from the data.
These are the top level findings:
The data is the “National data” released by the USA Social Security Administration (SSA) at SSA: Beyond the Top 1000 Names, released in 2026. It has the following limitations:
The dataset consists of 146 CSV files, one for each year from 1880 to 2025. Each CSV file has three columns: name, gender (M or F) and frequency. The rows are ordered by gender (F then M), then by descending frequency, and then alphabetically for tied frequencies.
I used a dense ranking. This means that identical counts are given the same rank and there are no gaps in the rankings. So the top 100 ranks could account for more than 100 names in a given year. Ties within the top 100 are rare, but they become more common as the counts get lower. At the lowest counts there can be up to 2,000 names sharing a rank.
No single year exceeded a dense ranking of 1000 per gender despite some years having more than 20,000 unique names for a single gender.
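To make the definition concrete, here is a minimal package-free sketch of dense ranking. The helper name `dense_rank_desc` and the counts are hypothetical; the snippets in this post use `denserank` instead.

```julia
# Dense ranking: equal counts share a rank, and the next distinct count
# gets the next integer, so no rank numbers are skipped.
function dense_rank_desc(counts::AbstractVector{<:Integer})
    distinct = sort(unique(counts); rev=true)               # distinct counts, largest first
    rank_of = Dict(c => r for (r, c) in enumerate(distinct))
    return [rank_of[c] for c in counts]
end

# Hypothetical counts, not taken from the SSA files.
toy_counts = [900, 750, 750, 400, 5, 5, 5]
toy_ranks = dense_rank_desc(toy_counts)
println(toy_ranks)   # [1, 2, 2, 3, 4, 4, 4]
```

Seven names collapse into four ranks here, which is the same effect that keeps the highest yearly rank far below the number of unique names.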
I used Julia to analyse the data. I will not present the full code here, but here are some snippets.
Loading a single file and transforming:

    using CSV, DataFrames
    using StatsBase # provides denserank

    # data_dir is the folder holding the yearly yobYYYY.txt files (defined elsewhere)
    filepath = joinpath(data_dir, "yob2025.txt")
    df = CSV.read(filepath, DataFrame; header=["Name", "Gender", "Count"])
    transform!(
        df,
        :Name => ByRow(x -> x[1]) => :FirstLetter,
        :Name => ByRow(length) => :Length,
    ) # name composition
    transform!(groupby(df, :Gender),
        :Count => cumsum => :CumulativeCount,
        :Count => (x -> x ./ sum(x)) => :Frequency,
        :Count => (x -> denserank(x, rev=true)) => :Rank,
    ) # gender ranks
    transform!(groupby(df, :Gender), :Frequency => cumsum => :CumulativeFrequency)

Filtering on gender:

    m_df = filter(:Gender => ==("M"), df);
    f_df = filter(:Gender => ==("F"), df);

Quantiles:

    idx = something(
        findfirst(m_df.CumulativeFrequency .>= 0.5),
        nrow(m_df)
    ) # rank of 50% quantile / median

Loading multiple files and joining into one large dataframe:

    using CSV, DataFrames
    using Parquet2

    filepath = joinpath(data_dir, "yob1880.txt")
    df = CSV.read(filepath, DataFrame; header=["Name", "Gender", "1880"])
    for year in 1881:2025
        print("$(year), ") # progress indicator
        filepath_next = joinpath(data_dir, "yob$year.txt")
        next_df = CSV.read(filepath_next, DataFrame; header=["Name", "Gender", "$year"])
        df = outerjoin(df, next_df, on=[:Name, :Gender])
    end
    year_matrix = df[:, string.(1880:2025)]
    df.Total = sum(eachcol(coalesce.(year_matrix, 0))) # total count over all years
    df.Count = sum(eachcol(.!ismissing.(year_matrix))) # number of years the name appears in
    size(df) # (117820, 150)
    Parquet2.writefile("names_ssa_1880-2025.parquet", df)

Transforming the joint dataframe:

    years = string.(1880:2025) # names of the yearly count columns
    transform!(groupby(df, :Gender),
        :Total => (x -> denserank(x, rev=true)) => :TotalRank,
    ) # gender total ranks
    transform!(groupby(df, :Gender),
        [y => (x -> denserank(x, rev=true)) => "Rank$y" for y in years]...
    ) # gender yearly ranks
    transform!(df,
        :Name => ByRow(length) => :Length,
        :Name => ByRow(x -> x[1]) => :FirstLetter,
    ); # name composition

Export data to JSON:

    using JSON

    out = Dict{String, Any}("years"=> 1880:2025)
    names_to_save = Dict("M"=> ["John"], "F" => ["Mary"])
    for gender in ["M", "F"]
        out[gender] = Dict{String, Any}()
        gender_df = gender == "M" ? m_df : f_df
        for name in names_to_save[gender]
            idx = findfirst(gender_df.Name .== name)
            out[gender][name] = Dict(
                "count" => Vector(gender_df[idx, years]),
            )
        end
    end
    JSON.json("output/names.json", out)

The “Top 1” dataset from the Top N graph can be decomposed into the top names each year. This produces the following graphs:
These graphs show how relatively “unpopular” the most popular names are now compared to the popular names of the 1900s.
The number of names that have reached the top spot is very small. There are only 19 in total: 8 boy names and 11 girl names. Mary alone was the #1 girls’ name for 76 years, more than half of the 146 years from 1880 to 2025.
Here is how these top yearly names are ranked across all 146 years:
| Rank | M | Total Rank | F | Total Rank |
|---|---|---|---|---|
| 1 | James | 1 | Mary | 1 |
| 2 | John | 2 | Jennifer | 4 |
| 3 | Robert | 3 | Linda | 5 |
| 4 | Michael | 4 | Jessica | 11 |
| 5 | David | 6 | Lisa | 16 |
| 6 | Jacob | 29 | Emily | 18 |
| 7 | Noah | 63 | Ashley | 20 |
| 8 | Liam | 97 | Emma | 28 |
| 9 | | | Olivia | 51 |
| 10 | | | Sophia | 82 |
| 11 | | | Isabella | 86 |
There are gaps here because many popular names have never been ranked #1. For example, “Elizabeth” is the overall #2 female name, but there was never a year it was ranked #1.
The number of unique names has grown from 2,000 names in 1880 to over 31,000 names in 2025. Every year has seen names added to and removed from the list, with up to 4,000 added and removed each year in the 2020s. Overall there are 117,820 unique names in the datasets. Of these, 31,227 (26.5%) are represented in 2025.
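The yearly turnover can be computed with plain set differences between consecutive years. The sketch below uses made-up name lists; with the real data, each set would presumably hold the names (per gender, if desired) present in one yearly file.

```julia
# Names present in two consecutive years (hypothetical lists, not SSA data).
names_prev = Set(["Mary", "Anna", "Emma", "Minnie"])
names_next = Set(["Mary", "Anna", "Emma", "Olivia", "Mia"])

added = setdiff(names_next, names_prev)     # in the later year only
removed = setdiff(names_prev, names_next)   # in the earlier year only

println(length(added), " added, ", length(removed), " removed")   # 2 added, 1 removed
```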
The above graph shows how skewed the dataset is: the 75% quantile line (the names that account for 75% of all births) hovers at around 4% of all names.
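One way to compute such a quantile line is to sort the counts, accumulate them, and find how many names are needed to reach the target share. The function name `names_needed` and the counts below are illustrative assumptions, not part of the original analysis.

```julia
# Number of names, taken from most to least popular, needed to cover
# at least `share` of all births.
function names_needed(counts::AbstractVector{<:Integer}, share::Real)
    sorted = sort(counts; rev=true)
    cumulative = cumsum(sorted) ./ sum(sorted)   # running share of births
    return something(findfirst(>=(share), cumulative), length(sorted))
end

toy_births = [500, 300, 100, 50, 20, 10, 10, 5, 3, 2]   # hypothetical counts
n = names_needed(toy_births, 0.75)
println(n, " of ", length(toy_births), " names")   # 2 of 10 names
```

Dividing the result by the number of unique names gives the percentage of names plotted on the quantile line.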
Girl names are consistently slightly more diverse than boy names. Over the whole period there were on average 40% more girl names than boy names each year. Over the last 5 years, there were 24% more girl names on average.
Some insight can be gained by looking at the ratio of the total count each year (the total number of births) to the count of unique names each year, keeping in mind the heavy skew of the data towards popular names. From this graph we can see that names were most concentrated in the 1950s, with about 470 births per name for boys and 300 births per name for girls. These ratios have since come down by almost 4×, and now sit at 120 for boys and 91 for girls. This implies a greater diversity in naming in recent years.
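The ratio described above is total births divided by the number of unique names. A toy illustration, with invented counts and a hypothetical helper `births_per_name`:

```julia
# Average number of births per distinct name for one year and gender.
births_per_name(counts::AbstractVector{<:Integer}) = sum(counts) / length(counts)

# Two hypothetical years with the same 1,000 births.
concentrated = [400, 300, 200, 100]                  # spread over 4 names
diverse = [250, 200, 150, 100, 100, 80, 60, 60]      # spread over 8 names

println(births_per_name(concentrated))   # 250.0
println(births_per_name(diverse))        # 125.0
```

With the same number of births, doubling the number of names halves the ratio, so a falling ratio points to more varied naming.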
We can also investigate the composition of the names in the dataset. Here I do so for the first letter and also for the name length.
Over the whole period, the most popular first letter for boy names was “A” (Anthony, Andrew, Alexander), followed by “J” (James, John, Joseph). For girl names it was also “A” (Anna, Ashley, Amanda), followed by “S” (Susan, Sarah, Sandra).
If we were to take a random person at any year in the period, for a man their name would most likely start with a “J” while for a woman it would most likely start with an “M” (Mary, Margaret, Michelle).
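The difference between the two statements is whether each name is counted once or weighted by the number of babies given that name. A minimal sketch with invented counts (the helper names `letter_tallies` and `top_letter` are hypothetical):

```julia
# Tally first letters two ways: once per distinct name, and weighted by births.
function letter_tallies(names::AbstractVector{<:AbstractString},
                        counts::AbstractVector{<:Integer})
    by_names = Dict{Char, Int}()
    by_births = Dict{Char, Int}()
    for (name, count) in zip(names, counts)
        letter = first(name)
        by_names[letter] = get(by_names, letter, 0) + 1
        by_births[letter] = get(by_births, letter, 0) + count
    end
    return by_names, by_births
end

# Key with the largest tally (findmax on a Dict returns (value, key)).
top_letter(tally) = findmax(tally)[2]

toy_names = ["Anna", "Alice", "Amy", "Mary", "Margaret"]
toy_name_counts = [10, 5, 5, 100, 50]   # hypothetical counts

by_names, by_births = letter_tallies(toy_names, toy_name_counts)
println(top_letter(by_names), " ", top_letter(by_births))   # A M
```

In this toy data “A” covers the most distinct names, while “M” covers the most people, mirroring the pattern described for girl names.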
The names vary in length from 2 letters (Al, Ty, Jo, Lu) to 15. (Many of the 15 letter names look like concatenations of shorter names and might be mistakes e.g. Muhammadibrahim, Christopherjohn, Mariadelosangel.) Most names are 5 to 8 letters long.
My name is one of the many rare names. It is a Hebrew name, spelt as ליאור and transliterated as “Lior” or “Leor”. It is gender neutral. There is also a female only version, ליאורה, which is transliterated as “Liora” or “Leora”.
The data shows that “Leora” has been used in the USA since at least 1880, but “Leor” was only first used in 1979. It is ever so slightly gaining in popularity, with 93 baby boys and 473 girls given a variation of the name in 2025. For girls, the “Liora” spelling recently overtook “Leora” in popularity.