Founding-lineage-diaspora · methodology
The estimator answers a single question: how many people alive today have at least one ancestor who was alive and resident in the territory of the Thirteen Colonies in 1776? "At least one" is a binary criterion. Post-1776 immigration does not dilute the ancestry; it only adds non-ancestor lineages.
1. Binary-ancestor identity
For a person p with up to 2^g ancestor slots
g generations back, under independent random mating with per-slot
colonial probability p̂:
P(≥1 colonial ancestor | p) = 1 − (1 − p̂)^(2^g)
At eight generations the slot count is up to 256, so even a small per-slot
probability gives a very high marginal probability of at least one match. Real
pedigrees collapse — cousin marriage and shared ancestors mean
2^g overestimates the count of distinct ancestors. The
engine exposes effective_slot_factor (default 0.5) as a knob for
this.
2. Race-specific reconciliation
The published anchor is national: ~60% of present-day Americans carry at least one colonial ancestor (Ancestry.com, 2010). Black Americans descended from people enslaved in the U.S. carry the trait at ~93% (Pew, ancestry literature). Hispanic, Asian, and "other" Americans are dominated by post-1900 arrivals and carry it at far lower rates. The reconciliation identity is:
0.60 = r_w·f_w + r_b·f_b + r_h·f_h + r_a·f_a + r_o·f_o
With current census shares (white-NH 58.3%, Black 12.3%, Hispanic 19.5%, Asian
6.4%, other 3.5%) and the priors above, the back-solved white-American rate
(f_w) lands near 0.80. The reconciler must be feasible —
if the implied f_w falls outside [0, 1] the priors are mutually
inconsistent and the page surfaces a warning.
3. Global propagation
For each receiving country and each decade-stamped emigration flow, the
propagator forward-projects descendants alive today using a per-generation
multiplier r over a 25-year generation. Each flow is scaled by the
decade's ancestry_share — the fraction of those emigrants who
themselves already carried at least one colonial ancestor when they left. That
share is high for the early 1800s (mostly old-stock) and drops through the
1850–1920 mass-immigration window.
4. Who actually lived in the territory in 1776
With Indigenous nations residing inside the colonies' claimed territory counted alongside the U.S. Census Bureau's enumerated colonial population, the territorial denominator is roughly 2.65 million:
- White European descent — ~2,000,000 (~75.5%)
- Black, enslaved, of African descent — ~450,000 (~17.0%)
- Black, free — ~50,000 (~1.9%)
- Indigenous, within colonial claim — ~150,000 (~5.7%, 90% range 100–200k)
- Asian and other — a few hundred individuals, statistical noise
"White European" was not a homogeneous category in 1776. The estimator subdivides it into ten ethnic-origin cohorts using the Purvis (1984) revision of the 1909 Census Bureau surname analysis, cross-checked against McDonald & McDonald (1980) and Gibson & Jung (Working Paper 56). Middle-of-band synthesis with unassignable surnames distributed pro rata:
- English — ~1,100,000 (~55% of white)
- Scots-Irish (Ulster Scots) — ~180,000 (~9%)
- German (Palatine and Swiss-German) — ~160,000 (~8%)
- Scottish — ~140,000 (~7%)
- Welsh — ~80,000 (~4%)
- Irish (non-Ulster, largely Catholic) — ~70,000 (~3.5%)
- Dutch — ~60,000 (~3%)
- French (Huguenot and other) — ~40,000 (~2%)
- Swedish and Finnish — ~20,000 (~1%)
- Sephardic Jewish — ~1,000 (~0.05%)
- Ashkenazi Jewish — ~1,000 (~0.05%)
- Other European — ~148,000 (~7.4%, the unassignable residue)
These bands are wide; treat the percentages as ±2-3 percentage points each. The point of the subdivision is to surface the heterogeneity, not to claim precision the underlying surname analyses don't support.
The Jewish split into Sephardic and Ashkenazi follows Eli Faber's A Time for Planting (1992): of the ~2,000 Jews in the 13 colonies in 1776, roughly half were Ashkenazi by birth (German-speaking origin, with the Sheftalls of Savannah, Haym Salomon, and Jonas Phillips as notable figures), and roughly half were Sephardic (Iberian descent via Amsterdam, London, and the Caribbean — Aaron Lopez of Newport, Francis Salvador of South Carolina). Sephardic liturgy and communal leadership remained dominant across all five colonial congregations even where the membership was already half Ashkenazi.
The estimator's colonial_1776.json exposes these as a
racial_distribution block; the
notable_residents_1776.json file lists named non-white-non-Black
individuals who were resident in the territory in 1776 — almost entirely
Indigenous leaders (Joseph and Molly Brant; Cornplanter; Red Jacket; Logan;
Daniel Nimham; Hendrick Aupaumut; Samson Occom; Joseph Johnson;
Attakullakulla; Old Tassel; Dragging Canoe; Nanyehi (Nancy Ward); Alexander
McGillivray; Hopothle Mico; Cunne Shote; Paul Cuffe; Catawba leaders), with
an honest note that no individually attested Asian residents of the 13
colonies in 1776 appear in surviving records.
5. By U.S. state & the hero choropleth
The hero map at the top of the estimator combines two choropleth layers on a black background:
- U.S. states — colored by quintile of
f_stateusing a cool ramp (dark navy → cyan). - World countries — colored by quintile of descendants per 1,000 residents using a warm ramp (dark brown → amber).
Country and state outlines come from Natural Earth 1:110m public-domain vector data (naturalearthdata.com), fetched at build time, slimmed (properties dropped, coordinates rounded to ~1 km), and shipped as a single ~220 KB GeoJSON bundle alongside the estimator dataset. Sliders on the page re-bin and re-paint both layers live.
5B. per-state methodology
Per-state ancestry rates apply the same race-share reconciler to each
state's ACS demography, with one state-level adjustment: a hand-curated
colonial_stock_factor that scales the national white-American
rate f_w up or down based on the state's settlement history.
Non-white per-state priors fall back to national defaults.
f_state = r_w · (f_w_national · stock_factor) + r_b · 0.93 + r_h · 0.05 + r_a · 0.02 + r_o · 0.30
The factor takes seven discrete values on a single tier ladder:
- 1.10 — Upland South Anglo-Celtic core: KY, TN, WV, AR, MS, AL, NC. American-ancestry response well above 15%, low post-1900 immigration, deep continuous old-stock fabric.
- 1.05 — Border / Old South + Lower Midwest old-stock heartland: VA, SC, GA, MO, OH, IN, PA, DE, ME, NH, VT.
- 1.00 — Mainstream: most states.
- 0.92 — Heavy 19th-century European immigration: NY, NJ, FL, TX. Substantial old-stock present, but the white-NH share itself carries lots of non-13-colony lineage.
- 0.85 — Recent (post-1965) immigration gateways: CA, WI, MN, ND, SD, NV, AZ, AK.
- 0.78 — Hispanic majority/near-majority with non-13-colony lineages: NM.
- 0.55 — Pacific / Native Hawaiian / heavy Asian fabric: HI.
These tiers are coarse on purpose. The methodology cannot resolve a state to within 2-3 percentage points without state-level ancestry surveys we don't have. Two limitations to call out:
- Hawaii's Native Hawaiian/Pacific Islander population sits in the "other" race bucket but has zero 13-colony ancestry. The national prior for "other" of 0.30 over-counts here. The 0.55 stock factor compensates indirectly but coarsely.
- New Mexico's Hispano and Pueblo populations have lineages older than the 13 colonies but unrelated to them. Same compensation pattern: an aggressive stock factor stands in for proper per-state non-white priors.
The state table on the estimator page sorts by ancestry rate, ancestor count, alphabet, or statehood era; sliders above re-run all 50 states + DC live.
6. Loyalist correction
Roughly 60–80,000 Loyalists left during and after the Revolution, primarily to
Canada (Maritimes and Upper Canada), the UK, and the Bahamas. Their ancestry
was 100% pre-1776 by definition. The engine treats them as a separate seeded
cohort with ancestry_share = 1.0 and propagates them forward
independently. The Bahamas line in the country table is dominated by this
cohort.
7. Uncertainty
A 400-sample Monte Carlo sweep over the three principal knobs produces the 90% band shown alongside the headline figure:
- Black-American enslaved-descent fraction (0.90–0.97)
- Per-decade ancestry-share decay scalar (0.85–1.10) and floor (0–0.20)
- Per-generation multiplier (1.45–1.75)
This is a knob sweep, not a calibrated posterior. No survey-grade dataset exists to calibrate against, so the band is honest about the assumption space rather than a Bayesian inference of the true value.
8. Data sources
- U.S. Census Bureau, Historical Statistics of the United States (Bicentennial Edition), decennial counts 1790–2020, ACS 2024 5-year estimates.
- American Battlefield Trust, "Population of the Colonies in 1776".
- Greene & Harrington, American Population Before the Federal Census of 1790.
- Brown, The King's Friends: The Composition and Motives of the American Loyalist Claimants (1965).
- Norton, The British-Americans: The Loyalist Exiles in England, 1774-1789 (1972).
- Saunders, The Loyalists in the Bahamas (1983).
- Klein, A Population History of the United States (Cambridge, 2012).
- Pew Research Center, surveys on African American ancestry and identity.
- Ancestry.com, 2010 estimate of share of Americans with at least one ancestor in the colonies in 1776.
- Migration Policy Institute country profiles, U.S. State Department consular reports.
9. Open questions
- Indigenous Americans alive in 1776. The default counts them toward the "ancestor in the colonies" criterion: they were resident in the territory. This raises descendant figures for Native and many mixed-ancestry Americans substantially.
- Territorial scope. The default is the 13 Colonies. The engine can be re-run with the broader "all land currently U.S." definition, which adds Indigenous populations from territories acquired later (Louisiana, Alaska, Hawai'i, the Southwest) and pre-1776 colonial populations from New France, New Spain, and Russian America.
- Pedigree collapse. The
effective_slot_factorhandles this crudely. A more careful treatment would model regional endogamy and the founder effect at the colony level.
10. Engine source
The engine lives under fld_engine/ in this repo. The frontend
consumes the precomputed bundle at
/projects/founding-lineage-diaspora/data.json and re-runs a port
of the lineage math in JavaScript so the sliders update live. Tests live under
fld_engine/tests/ and assert the reconciler identity, lineage
conservation, and the directional sensitivity bounds.