Introduction
============

formative makes causal effect estimation more accessible. Causal
estimation is full of statistical jargon that hinders adoption;
formative's goal is to lower that barrier. Even an imperfect attempt at
causal estimation is better than simply comparing averages or
correlations.

Every analysis follows three steps.

**1. Encode your causal assumptions as a DAG**

Before any data is touched, you declare which variables are assumed to
cause which. This makes your identification assumptions explicit and
machine-readable. formative uses the DAG to determine which variables to
control for (or how to use an instrument), and to check whether the data
can support identification given those assumptions. Remember, the DAG is
not a data model but a statement of your assumptions about the
data-generating process. A partial DAG is better than no DAG.

.. code-block:: python

    from formative.causal import DAG

    dag = DAG()
    dag.assume("proximity").causes("education")
    dag.assume("ability").causes("education", "income")
    dag.assume("education").causes("income")

**2. Choose an estimator**

Pass the DAG to an estimator along with your treatment and outcome. The
estimator reads the DAG to determine which variables to control for (or
how to use an instrument), then fits the model. If the data cannot
support identification given the DAG, an error is raised before
estimation runs.

Choosing the right estimator for your causal question is crucial; not
every method is applicable to every problem. See
https://getformative.dev for an online wizard that helps you choose.

.. code-block:: python

    from formative.causal import IV2SLS

    result = IV2SLS(
        dag,
        treatment="education",
        outcome="income",
        instrument="proximity",
    ).fit(df)

After you have obtained the result object, you can print a summary of
the estimate and its assumptions. Each assumption is marked as testable
or untestable depending on whether formative can check it.
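With a single instrument and no controls, two-stage least squares collapses to the Wald ratio cov(instrument, outcome) / cov(instrument, treatment). The pure-Python sketch below simulates data consistent with the DAG from step 1 (the coefficients, noise scales, and sample size are assumptions made up for this illustration, not formative defaults) to show where the gap between the unadjusted and IV estimates comes from:

```python
# Sketch: why the IV estimate differs from the unadjusted estimate.
# All coefficients and noise scales are illustrative assumptions.
import random

random.seed(0)
n = 100_000
TRUE_EFFECT = 2.0  # assumed effect of education on income

proximity, education, income = [], [], []
for _ in range(n):
    z = random.gauss(0, 1)                        # instrument
    u = random.gauss(0, 1)                        # unobserved: ability
    x = 0.5 * z + u + random.gauss(0, 1)          # education
    y = TRUE_EFFECT * x + u + random.gauss(0, 1)  # income
    proximity.append(z)
    education.append(x)
    income.append(y)

def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

# Naive slope of income on education: absorbs the ability confounding.
naive = cov(education, income) / cov(education, education)
# Wald/IV ratio: ability does not move proximity, so the bias drops out.
iv = cov(proximity, income) / cov(proximity, education)
print(f"naive={naive:.2f}  iv={iv:.2f}")
```

The naive slope is pulled away from the true effect because ``ability`` drives both ``education`` and ``income``; the Wald ratio is immune because ``proximity`` is assumed independent of ``ability``. formative's summary reports this gap as the confounding bias.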
Most assumptions in causal inference are untestable by nature: you must
argue for them based on domain knowledge and theory.

.. code-block:: python

    print(result.summary())

.. code-block:: text

    IV (2SLS) Causal Effect: education → income
    Instrument: proximity
    ──────────────────────────────────────────────────
    IV estimate          : 1.9643
    Unadjusted estimate  : 2.2598 (no controls)
    Confounding bias     : +0.2955
    Std. error           : 0.0377
    95% CI               : [1.8905, 2.0381]
    p-value              : 0.0000

    Assumptions
    ┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄
    [ testable   ] Relevance: the instrument strongly affects treatment
    [ untestable ] Exclusion restriction: instrument only affects outcome through treatment
    [ untestable ] Independence: instrument is uncorrelated with unobserved confounders
    [ untestable ] Monotonicity: instrument affects treatment in the same direction for everyone
    [ untestable ] Stable Unit Treatment Value Assumption (SUTVA)

``executive_summary()`` gives a plain-English version of the result,
useful for communicating with non-technical audiences.

**3. Refute**

Once you have a result, run statistical checks that probe whether its
assumptions hold in the data. Each check returns a clear pass or fail,
but no set of tests can guarantee validity. Use the checks as
diagnostics, not proof: causal inference is a judgment call, not a
mathematical certainty.

.. code-block:: python

    report = result.refute(df)
    print(report.summary())

.. code-block:: text

    IV Refutation Report: education → income
    Instrument: proximity
    ──────────────────────────────────────────────────
    [PASS] First-stage F-statistic: F = 911.22 (threshold: F ≥ 10)
    [PASS] Random common cause: estimate shifted by 0.0001 (≤ 1 SE = 0.0377)

    All checks passed.
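The first-stage F-statistic in the report can be reproduced by hand: regress the treatment on the instrument and square the slope's t-statistic (with a single instrument, F = t²). Below is a pure-Python sketch on simulated data — the data and coefficients are assumptions made up for illustration, not formative's internals:

```python
# Sketch: first-stage F-statistic for one instrument, by hand.
# Simulated data; the 0.5 first-stage slope is an illustrative assumption.
import random

random.seed(1)
n = 5_000
z = [random.gauss(0, 1) for _ in range(n)]       # instrument
x = [0.5 * zi + random.gauss(0, 1) for zi in z]  # treatment

mz, mx = sum(z) / n, sum(x) / n
szz = sum((zi - mz) ** 2 for zi in z)
szx = sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x))

beta = szx / szz                  # first-stage slope (OLS)
alpha = mx - beta * mz
rss = sum((xi - alpha - beta * zi) ** 2 for zi, xi in zip(z, x))
se2 = rss / (n - 2)               # residual variance
f_stat = beta ** 2 * szz / se2    # F = t^2 for a single regressor

print(f"F = {f_stat:.1f}, passes F >= 10: {f_stat >= 10}")
```

An F below the conventional threshold of 10 signals a weak instrument: the first stage is too noisy for the resulting IV estimate to be reliable.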