← back to estimator

Founding-lineage-diaspora · methodology

The estimator answers a single question: how many people alive today have at least one ancestor who was alive and resident in the territory of the Thirteen Colonies in 1776? "At least one" is a binary criterion. Post-1776 immigration does not dilute the ancestry; it only adds non-ancestor lineages.

1. Binary-ancestor identity

For a person p with up to 2^g ancestor slots g generations back, under independent random mating with per-slot colonial probability :

P(≥1 colonial ancestor | p) = 1 − (1 − p̂)^(2^g)

At eight generations the slot count is up to 256, so even a small per-slot probability gives a very high marginal probability of at least one match. Real pedigrees collapse — cousin marriage and shared ancestors mean 2^g overestimates the count of distinct ancestors. The engine exposes effective_slot_factor (default 0.5) as a knob for this.

2. Race-specific reconciliation

The published anchor is national: ~60% of present-day Americans carry at least one colonial ancestor (Ancestry.com, 2010). Black Americans descended from people enslaved in the U.S. carry the trait at ~93% (Pew, ancestry literature). Hispanic, Asian, and "other" Americans are dominated by post-1900 arrivals and carry it at far lower rates. The reconciliation identity is:

0.60 = r_w·f_w + r_b·f_b + r_h·f_h + r_a·f_a + r_o·f_o

With current census shares (white-NH 58.3%, Black 12.3%, Hispanic 19.5%, Asian 6.4%, other 3.5%) and the priors above, the back-solved white-American rate (f_w) lands near 0.80. The reconciler must be feasible — if the implied f_w falls outside [0, 1] the priors are mutually inconsistent and the page surfaces a warning.

3. Global propagation

For each receiving country and each decade-stamped emigration flow, the propagator forward-projects descendants alive today using a per-generation multiplier r over a 25-year generation. Each flow is scaled by the decade's ancestry_share — the fraction of those emigrants who themselves already carried at least one colonial ancestor when they left. That share is high for the early 1800s (mostly old-stock) and drops through the 1850–1920 mass-immigration window.

4. Who actually lived in the territory in 1776

With Indigenous nations residing inside the colonies' claimed territory counted alongside the U.S. Census Bureau's enumerated colonial population, the territorial denominator is roughly 2.65 million:

"White European" was not a homogeneous category in 1776. The estimator subdivides it into ten ethnic-origin cohorts using the Purvis (1984) revision of the 1909 Census Bureau surname analysis, cross-checked against McDonald & McDonald (1980) and Gibson & Jung (Working Paper 56). Middle-of-band synthesis with unassignable surnames distributed pro rata:

These bands are wide; treat the percentages as ±2-3 percentage points each. The point of the subdivision is to surface the heterogeneity, not to claim precision the underlying surname analyses don't support.

The Jewish split into Sephardic and Ashkenazi follows Eli Faber's A Time for Planting (1992): of the ~2,000 Jews in the 13 colonies in 1776, roughly half were Ashkenazi by birth (German-speaking origin, with the Sheftalls of Savannah, Haym Salomon, and Jonas Phillips as notable figures), and roughly half were Sephardic (Iberian descent via Amsterdam, London, and the Caribbean — Aaron Lopez of Newport, Francis Salvador of South Carolina). Sephardic liturgy and communal leadership remained dominant across all five colonial congregations even where the membership was already half Ashkenazi.

The estimator's colonial_1776.json exposes these as a racial_distribution block; the notable_residents_1776.json file lists named non-white-non-Black individuals who were resident in the territory in 1776 — almost entirely Indigenous leaders (Joseph and Molly Brant; Cornplanter; Red Jacket; Logan; Daniel Nimham; Hendrick Aupaumut; Samson Occom; Joseph Johnson; Attakullakulla; Old Tassel; Dragging Canoe; Nanyehi (Nancy Ward); Alexander McGillivray; Hopothle Mico; Cunne Shote; Paul Cuffe; Catawba leaders), with an honest note that no individually attested Asian residents of the 13 colonies in 1776 appear in surviving records.

5. By U.S. state & the hero choropleth

The hero map at the top of the estimator combines two choropleth layers on a black background:

Country and state outlines come from Natural Earth 1:110m public-domain vector data (naturalearthdata.com), fetched at build time, slimmed (properties dropped, coordinates rounded to ~1 km), and shipped as a single ~220 KB GeoJSON bundle alongside the estimator dataset. Sliders on the page re-bin and re-paint both layers live.

5B. per-state methodology

Per-state ancestry rates apply the same race-share reconciler to each state's ACS demography, with one state-level adjustment: a hand-curated colonial_stock_factor that scales the national white-American rate f_w up or down based on the state's settlement history. Non-white per-state priors fall back to national defaults.

f_state = r_w · (f_w_national · stock_factor) + r_b · 0.93 + r_h · 0.05 + r_a · 0.02 + r_o · 0.30

The factor takes seven discrete values on a single tier ladder:

These tiers are coarse on purpose. The methodology cannot resolve a state to within 2-3 percentage points without state-level ancestry surveys we don't have. Two limitations to call out:

The state table on the estimator page sorts by ancestry rate, ancestor count, alphabet, or statehood era; sliders above re-run all 50 states + DC live.

6. Loyalist correction

Roughly 60–80,000 Loyalists left during and after the Revolution, primarily to Canada (Maritimes and Upper Canada), the UK, and the Bahamas. Their ancestry was 100% pre-1776 by definition. The engine treats them as a separate seeded cohort with ancestry_share = 1.0 and propagates them forward independently. The Bahamas line in the country table is dominated by this cohort.

7. Uncertainty

A 400-sample Monte Carlo sweep over the three principal knobs produces the 90% band shown alongside the headline figure:

This is a knob sweep, not a calibrated posterior. No survey-grade dataset exists to calibrate against, so the band is honest about the assumption space rather than a Bayesian inference of the true value.

8. Data sources

9. Open questions

  1. Indigenous Americans alive in 1776. The default counts them toward the "ancestor in the colonies" criterion: they were resident in the territory. This raises descendant figures for Native and many mixed-ancestry Americans substantially.
  2. Territorial scope. The default is the 13 Colonies. The engine can be re-run with the broader "all land currently U.S." definition, which adds Indigenous populations from territories acquired later (Louisiana, Alaska, Hawai'i, the Southwest) and pre-1776 colonial populations from New France, New Spain, and Russian America.
  3. Pedigree collapse. The effective_slot_factor handles this crudely. A more careful treatment would model regional endogamy and the founder effect at the colony level.

10. Engine source

The engine lives under fld_engine/ in this repo. The frontend consumes the precomputed bundle at /projects/founding-lineage-diaspora/data.json and re-runs a port of the lineage math in JavaScript so the sliders update live. Tests live under fld_engine/tests/ and assert the reconciler identity, lineage conservation, and the directional sensitivity bounds.