Methodology

From billed dollars to a defensible bonus check.

The full pipeline, in seven sections. No hand-waving — every constant on this page traces back to a public source or a unit test.

§ 1 The problem

In healthcare revenue cycle management, three independent mechanisms separate billed charges from realized cash: contractual fee-schedule adjustments (Medicare PFS, Medicaid programs, commercial contracts), initial denials, and adjudication lag. Each varies by payer; together they make a single “collection rate” misleading.

For a Harris County primary-care clinic, three numbers structure the gap:

Texas Medicaid pays primary care at 0.52× Medicare (Skopec, Pugazhendhi, Zuckerman, Health Affairs 2024). That alone makes the realization rate of a Medicaid-heavy clinic materially below a commercial-heavy one of equivalent volume.
Initial denial rates run 8–17% by payer (Premier 2024 hospital survey; KFF 2024). About half are reversed on appeal, leaving terminal denial at 4–8%.
Adjudication lag spans 14 to 90+ days. Medicare's 14-day clean-claim payment floor sets a lower bound, and Texas TDI's 30/45-day prompt-pay deadlines (electronic/paper) bound clean-claim timing for state-regulated plans; the right tail is the operationally costly part.

Sizing a bonus pool against billed revenue is, mechanically, a short-term loan from the clinic to its physicians — payable at a haircut and a delay neither party has priced.

Cost Predictor's product question is sharply scoped: of one dollar billed today against the clinic's actual payer mix, how many cents will arrive — and when? The answer is a quantile forecast, not a point estimate. That distinction is load-bearing.

§ 2 Architecture

The pipeline is sequenced by data dependency, not by code modules. Phase 1 produces a daily aggregate-billed series and a per-claim outstanding ledger; Phase 3 forecasts the billed series; Phase 4 convolves through Phase 5's lag curves to produce realized; the bonus-pool calculator operates on the resulting quantiles.

Fig 1. Phase 2 (denial classifier) is documented in the system design but not yet implemented; until it is, downstream phases account for denials only through the per-payer denial Bernoulli inside the convolver.
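The data-dependency ordering can be made explicit with a topological sort. A minimal sketch using the standard-library `graphlib` (Python 3.9+); the node names are hypothetical labels for the phases above, and two edges are assumptions read from the text rather than taken from the codebase: Phase 5 consuming Phase 1's claim history, and Phase 4 consuming Phase 1's outstanding ledger for the carryforward.

```python
from graphlib import TopologicalSorter

# Hypothetical phase names; each maps to the set of phases whose outputs it consumes.
# Phase 2 (denial classifier) is omitted because it is not implemented yet.
PIPELINE = {
    "phase1_ingest": set(),
    "phase3_forecast_billed": {"phase1_ingest"},      # forecasts the daily billed series
    "phase5_lag_curves": {"phase1_ingest"},           # assumption: fit from claim/remit history
    "phase4_convolve": {"phase3_forecast_billed",
                        "phase5_lag_curves",
                        "phase1_ingest"},             # assumption: ledger feeds the carryforward
    "bonus_pool": {"phase4_convolve"},                # operates on realized quantiles
}

def run_order(graph):
    """Return one execution order that respects every data dependency."""
    return list(TopologicalSorter(graph).static_order())

if __name__ == "__main__":
    print(run_order(PIPELINE))
```

Sequencing this way means a phase runs as soon as its inputs exist, independent of which module happens to define it.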

Inputs

In production the ingestion layer reads 837 claim and 835 remittance EDI files (or flattened EHR/PM CSV exports keyed on Date of Service) and produces a daily aggregate. The evaluation harness substitutes Synthea v4.0.0 with a Texas-Medicaid MCO encoding plus a post-generation adjudication injector that attaches per-payer denial / lag / paid-claim realization draws.
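A minimal sketch of the two Phase 1 outputs (daily billed aggregate keyed on Date of Service, per-claim outstanding ledger), assuming a hypothetical flattened `Claim` record; the field names are illustrative, not the ingestion layer's schema. The `as_of` argument also shows the idea behind a leakage mask: a remit dated after the origin is hidden, so the claim still counts as outstanding.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Claim:
    # Hypothetical flattened claim record (e.g. joined from 837 + 835 data).
    claim_id: str
    payer: str
    date_of_service: date
    billed: float
    remit_date: Optional[date] = None   # first remit date; None if no remit yet
    paid: float = 0.0

def daily_billed(claims, as_of):
    """Billed dollars per date of service, for services on or before as_of."""
    totals = defaultdict(float)
    for c in claims:
        if c.date_of_service <= as_of:
            totals[c.date_of_service] += c.billed
    return dict(sorted(totals.items()))

def outstanding_ledger(claims, as_of):
    """Claims serviced by as_of whose remit is not visible as of as_of.
    Remits dated after as_of are masked, so those claims remain outstanding."""
    return [
        c for c in claims
        if c.date_of_service <= as_of
        and (c.remit_date is None or c.remit_date > as_of)
    ]
```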

State of Phase 2

The denial classifier (XGBoost / LightGBM, scored before forecasting) is documented in §2.1 of the system design but not yet built. Today the per-payer Bernoulli denial draw inside the Monte-Carlo convolver stands in for the classifier's expected output — a defensible simplification while terminal-denial empirical rates are bracketed at 4–8% and dwarfed by lag and realization-rate uncertainty in the pooled forecast variance.

§ 3 Approach A vs Approach B

Two forecasting routes were evaluated head-to-head on the Synthea evaluation set. Both use Google Research's TimesFM 2.5 (200M parameters, native quantile head) as the time-series engine.

Approach A. Univariate forecast of realized revenue directly. The simpler model: one TimesFM fit, one quantile output, done.
Approach B. Decompose. Forecast billed revenue with TimesFM, then convolve through per-payer lag curves and a paid-claim realization factor inside a Monte-Carlo simulator to produce realized.

On pooled origin evaluation, A passes the product threshold (row breach_rate_p10 = 0.0998). Broken out per payer, which is the view clinic admins actually need, A breaches the row-level threshold on two of five payers. B passes every payer, and its window-level breach is zero once forecasts and actuals are summed over each origin's bonus window.

Per-payer row breach rate at p10, Synthea, no calibration
Payer	A · TimesFM	B · TimesFM-decomposed
Commercial	0.1004 ⚠	0.0802
Medicaid MCO	0.1093 ⚠	0.0000
Medicare Advantage	0.0585	0.0000
Medicare FFS	0.0835	0.0179
Self-pay	0.0962	0.0000
Pooled rows	0.0896	0.0196
Pooled windows	0.0000	0.0000
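Why pooled and per-payer numbers can disagree: pooling averages breach indicators across payers, so one payer's excess can hide behind another's slack. A minimal sketch, assuming evaluation rows of (payer, actual, p10); the row layout, function names, and toy values are hypothetical.

```python
from collections import defaultdict

def breach_rate(actuals, p10s):
    """Fraction of rows where the realized actual fell below the p10 forecast."""
    return sum(1 for a, q in zip(actuals, p10s) if a < q) / len(actuals)

def breach_by_payer(rows):
    """rows: iterable of (payer, actual, p10). Returns (per-payer rates, pooled rate)."""
    grouped = defaultdict(lambda: ([], []))
    all_a, all_q = [], []
    for payer, actual, p10 in rows:
        grouped[payer][0].append(actual)
        grouped[payer][1].append(p10)
        all_a.append(actual)
        all_q.append(p10)
    per_payer = {p: breach_rate(a, q) for p, (a, q) in grouped.items()}
    return per_payer, breach_rate(all_a, all_q)

# Toy data: payer X breaches 2 of 10 rows, payer Y none.
rows = [("X", 5.0, 10.0)] * 2 + [("X", 15.0, 10.0)] * 8 + [("Y", 15.0, 10.0)] * 10
print(breach_by_payer(rows))
```

Here payer X sits at 0.20 and Y at 0.00, yet the pooled rate is exactly 0.10 and passes the threshold.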

B is structurally legible — it tells admins which payer is driving over-allocation risk — and it inherits the right substitution path: when measured 835 ERA data lands, the YAML priors are replaced in place with measured per-payer lag and realization-rate distributions; nothing else changes.

Verdict: ship Approach B uncalibrated, with --by-payer as the recommended runtime mode. Calibration wrappers (split-conformal, isotonic-PIT) stay in the codebase for ARIMA/Prophet but neither A nor B benefits.

§ 4 The lag curve

For each payer, the lag distribution is the PDF of days from claim submission to first remit, conditioned on eventual payment. Two empirical realities shape the choice of estimator:

Right-censoring. At any origin t, claims billed near t have not been paid yet — they are not denied, they are not slow, they are simply unobserved. Treating them as missing biases the fit toward fast claims.
Heavy right tail. The 90th-percentile lag is where bonus-pool breach risk concentrates. A model that fits the body well but not the tail will look good on MAPE and fail on breach_rate_p10.
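The censoring bias is easy to reproduce. A minimal simulation, assuming claims billed uniformly over the 120 days before an origin and LogNormal lags with hypothetical parameters (median about 30 days); nothing here is fitted to real data. Keeping only claims already remitted by the origin discards long lags disproportionately.

```python
import random
import statistics

def naive_vs_true_median(n=20_000, mu=3.4, sigma=0.6, window=120, seed=7):
    """Return (median of lags visible at the origin, median of all lags).
    Claims still unpaid at the origin are right-censored; dropping them keeps
    only lags short enough to have landed, so the naive median is biased low."""
    rng = random.Random(seed)
    true_lags, observed = [], []
    for _ in range(n):
        age = rng.uniform(0, window)          # days between billing and the origin
        lag = rng.lognormvariate(mu, sigma)   # days from submission to first remit
        true_lags.append(lag)
        if lag <= age:                        # remit already visible at the origin
            observed.append(lag)
    return statistics.median(observed), statistics.median(true_lags)

print(naive_vs_true_median())
```

With these settings the naive median lands noticeably below the true one; the size of the gap depends on the window and the lag spread.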

Estimator

Hybrid: Kaplan–Meier non-parametric body up to the 0.8 KM-quantile, stitched onto a LogNormal tail fit by censored MLE. KM handles right-censoring through the body without assuming a distributional family; the LogNormal extrapolates past the largest observed lag, which is exactly where breach risk lives.
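A minimal sketch of the hybrid estimator in plain Python. `kaplan_meier` is the standard product-limit estimator; how the tail is joined to the body is an assumption here (the LogNormal conditioned on exceeding the KM 0.8-quantile), and the function names are hypothetical rather than taken from the lag-curve loader.

```python
import math
from statistics import NormalDist

def kaplan_meier(durations, paid):
    """Product-limit survival curve for right-censored lags.
    durations: days observed per claim; paid: True if remitted at that day,
    False if still outstanding (censored) at that day.
    Returns (event_times, survival just after each event time)."""
    data = sorted(zip(durations, paid))
    at_risk, s, i = len(data), 1.0, 0
    times, surv = [], []
    while i < len(data):
        t, events, censored = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:   # group ties on the same day
            if data[i][1]:
                events += 1
            else:
                censored += 1
            i += 1
        if events:
            s *= 1.0 - events / at_risk            # claims censored at t still count as at risk
            times.append(t)
            surv.append(s)
        at_risk -= events + censored
    return times, surv

def km_quantile(q, times, surv):
    """Smallest event time with F(t) = 1 - S(t) >= q, or None if KM never reaches q."""
    for t, s in zip(times, surv):
        if 1.0 - s >= q - 1e-12:                   # tolerance for float round-off
            return t
    return None

def stitched_quantile(u, times, surv, mu, sigma, split=0.8):
    """Quantile u of the hybrid curve: KM up to `split`, LogNormal tail above it.
    Assumption: the tail is LogNormal(mu, sigma) conditioned on exceeding the KM
    split point and rescaled to carry the remaining 1 - split probability."""
    if u <= split:
        return km_quantile(u, times, surv)
    c = km_quantile(split, times, surv)
    if c is None:
        raise ValueError("KM curve does not reach the split quantile")
    std = NormalDist()
    f_c = std.cdf((math.log(c) - mu) / sigma)      # LogNormal CDF at the stitch point
    v = f_c + (u - split) / (1.0 - split) * (1.0 - f_c)
    return math.exp(mu + sigma * std.inv_cdf(min(v, 1.0 - 1e-12)))
```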

L(μ, σ) = Σ_uncensored log f(x_i; μ, σ)
       + Σ_censored   log S(T_j; μ, σ)

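f is the LogNormal PDF and S = 1 - F its survival function; x_i are the observed lags of paid claims and T_j the elapsed days of claims still outstanding at the origin. The censored log-likelihood reduces to the ordinary LogNormal log-likelihood when no observations are censored.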
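The likelihood above translates directly into code. A dependency-free sketch, assuming lags in days with x_i the observed lags of paid claims and T_j the current ages of claims still open at the origin; the function names are hypothetical, and the grid search is only a stand-in for a real optimizer such as `scipy.optimize.minimize`.

```python
import math

def lognormal_logpdf(x, mu, sigma):
    """log f(x; mu, sigma) for the LogNormal density."""
    z = (math.log(x) - mu) / sigma
    return -math.log(x * sigma * math.sqrt(2 * math.pi)) - 0.5 * z * z

def lognormal_logsf(t, mu, sigma):
    """log S(t) = log(1 - F(t)); erfc avoids the cancellation in 1 - F far into the tail."""
    z = (math.log(t) - mu) / sigma
    return math.log(0.5 * math.erfc(z / math.sqrt(2)))

def censored_loglik(mu, sigma, paid_lags, open_ages):
    """Paid claims contribute the density at their lag x_i; claims still open at the
    origin contribute the probability S(T_j) of lasting longer than their age T_j."""
    return (sum(lognormal_logpdf(x, mu, sigma) for x in paid_lags)
            + sum(lognormal_logsf(t, mu, sigma) for t in open_ages))

def fit_grid(paid_lags, open_ages, mus, sigmas):
    """Crude grid-search MLE; a stand-in for a real optimizer."""
    return max(((m, s) for m in mus for s in sigmas),
               key=lambda p: censored_loglik(p[0], p[1], paid_lags, open_ages))
```

Adding two claims still open at 60 days to four paid lags of 10–40 days moves the fitted μ upward, which is exactly the correction that dropping them would miss.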

Empirical-Bayes shrinkage

For payers with few events, the MLE overfits the small sample. The fit is shrunk toward the adjudication_params.yaml prior with weight w = n_p / (n_p + 50) on the MLE, so w = 0.5 at 50 events and 0.75 at 150: the MLE dominates above ~150 events. Below 30 events (where w would be under 0.375) the MLE is skipped and the pure prior is used. The shrinkage target is the cited per-payer YAML, which is itself sourced from Premier 2024, KFF 2024, and TDI prompt-pay statutes.
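A minimal sketch of the blend, assuming the shrinkage is applied linearly to the LogNormal parameters (μ, σ). The text does not say which scale is shrunk, so treat that choice and the function name as hypothetical; k = 50 and the 30-event cutoff are the values stated above.

```python
def shrink(mle, prior, n_events, k=50, min_events=30):
    """Empirical-Bayes-style blend of per-payer LogNormal parameters.
    mle, prior: (mu, sigma) tuples. Assumption: parameters are blended linearly;
    the production code may shrink on a different scale."""
    if n_events < min_events:
        return prior                          # too few events: use the YAML prior as-is
    w = n_events / (n_events + k)             # weight on the payer's own MLE
    return tuple(w * m + (1 - w) * p for m, p in zip(mle, prior))

# Hypothetical parameters: payer MLE (3.0, 0.5), YAML prior (3.5, 0.7).
for n in (10, 50, 150, 1000):
    print(n, shrink((3.0, 0.5), (3.5, 0.7), n))
```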

§ 5 The convolution

The composer (models/timesfm_b.py) takes a per-period quantile billing forecast and produces a per-period quantile realized forecast in three steps:

Sample billing draws. For each forecast period k, sample 5,000 billing values via inverse-CDF interpolation on the TimesFM quantile array, with log-linear extrapolation outside [q_min, q_max].
Split by payer mix. Each draw is split across the six payer categories using the configured mix (HRSA UDS Texas Table 4 by default).
Lag-shift each payer's dollars. Sample a lag from that payer's curve for every dollar; accumulate the dollar into the period it lands in. Apply the deterministic realization factor (1 - denial_p) × E[paid-claim realization_p].
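The three steps can be sketched in plain Python. This is not models/timesfm_b.py: the function names are hypothetical, one lag is drawn per (draw, period, payer) instead of per dollar, cash landing past the horizon is simply dropped, and the log-linear tail extension is one reading of the text. A production version would presumably vectorize; the loops here are for readability.

```python
import bisect
import math
import random

def sample_billing(levels, values, rng):
    """One inverse-CDF draw from a quantile forecast.
    levels: ascending quantile levels in (0, 1); values: matching positive dollar amounts.
    Linear interpolation inside [q_min, q_max]; outside it, log(value) is extended
    linearly from the nearest edge segment (assumed reading of "log-linear")."""
    u = rng.random()
    last = len(levels) - 1
    if levels[0] <= u <= levels[-1]:
        i = min(bisect.bisect_right(levels, u) - 1, last - 1)
        t = (u - levels[i]) / (levels[i + 1] - levels[i])
        return values[i] + t * (values[i + 1] - values[i])
    i = 0 if u < levels[0] else last - 1
    lo, hi = math.log(values[i]), math.log(values[i + 1])
    slope = (hi - lo) / (levels[i + 1] - levels[i])
    edge_level, edge_log = (levels[0], lo) if i == 0 else (levels[-1], hi)
    return math.exp(edge_log + slope * (u - edge_level))

def compose(billing_q, levels, mix, lag_samplers, factor, n_draws=5000, seed=0):
    """Monte-Carlo realized-cash forecast.
    billing_q[k]: billed-dollar quantile values for period k (at `levels`).
    mix: payer -> share of billed dollars (shares sum to 1).
    lag_samplers: payer -> callable(rng) returning a lag in whole periods.
    factor: payer -> (1 - denial_p) * E[paid-claim realization_p].
    Returns per-period realized quantiles at the same levels."""
    rng = random.Random(seed)
    horizon = len(billing_q)
    draws = [[0.0] * horizon for _ in range(n_draws)]
    for d in range(n_draws):
        for k in range(horizon):
            billed = sample_billing(levels, billing_q[k], rng)
            for payer, share in mix.items():
                # Simplification: one lag per (draw, period, payer), not per dollar.
                land = k + lag_samplers[payer](rng)
                if land < horizon:              # cash landing past the horizon is dropped
                    draws[d][land] += billed * share * factor[payer]
    out = []
    for k in range(horizon):
        col = sorted(row[k] for row in draws)
        out.append([col[min(int(q * n_draws), n_draws - 1)] for q in levels])
    return out
```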

Carryforward. Pre-origin outstanding dollars (claims already billed but not yet paid) are sampled separately, with the lag conditioned on lag > T_k via rejection sampling, so the pre-origin tail lands in the correct forecast periods. This is load-bearing for early-horizon accuracy: without it, the first 14 days of any forecast systematically under-forecast realized cash.
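A minimal sketch of the carryforward, assuming each outstanding claim is summarized as (payer, dollars, age at origin), reading T_k as that age, and taking period 0 to be the first day after the origin. The function name and the skip-on-failure rule are hypothetical. Rejection sampling keeps the payer's own lag curve and discards draws that would already have been paid.

```python
import random

def carryforward(open_claims, lag_samplers, factor, horizon, rng, max_tries=10_000):
    """Land pre-origin outstanding dollars into forecast periods.
    open_claims: list of (payer, dollars, age) with age = days since billing at the origin.
    A total lag is drawn from the payer's curve conditioned on lag > age; the claim
    lands lag - age days after the origin (index 0 = first day after the origin)."""
    landed = [0.0] * horizon
    for payer, dollars, age in open_claims:
        for _ in range(max_tries):
            lag = lag_samplers[payer](rng)
            if lag > age:                 # accept only lags consistent with "not yet paid"
                break
        else:
            continue                      # curve has (almost) no mass past this age: skip
        offset = lag - age                # >= 1 for integer lags
        if offset <= horizon:
            landed[offset - 1] += dollars * factor[payer]
    return landed
```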

Why deterministic realization & denial

The Beta realization variance and the binary denial draw add little tail width relative to the lag and billing-quantile draws once aggregated to a daily or weekly period. Using E[paid-claim realization] and E[deny] = denial_p in the per-dollar realization factor keeps the convolver fast (~5,000 trials × 6 payers × 30-day horizon in <100 ms) without measurable loss of fidelity on the Synthea evaluation set. Restoring the stochastic draws is one of the levers to pull if real ERA data shows the forecast tails are under-dispersed.
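A quick way to see why the binary denial draw matters little once aggregated: with n claims a day, its relative spread shrinks roughly as 1/√n. A minimal sketch with hypothetical inputs (400 equal-size claims a day, denial_p = 0.06, realization held at 0.92), isolating the denial term only; it does not model the lag or billing-quantile draws it is being compared against.

```python
import random
import statistics

def daily_realized(n_claims, claim_dollars, denial_p, realization, rng=None):
    """Realized dollars for one day of equal-size claims.
    rng=None: deterministic per-dollar factor (1 - denial_p) * realization.
    rng given: each claim is denied independently with probability denial_p."""
    if rng is None:
        return n_claims * claim_dollars * (1 - denial_p) * realization
    paid = sum(1 for _ in range(n_claims) if rng.random() >= denial_p)
    return paid * claim_dollars * realization

# Hypothetical inputs: 400 claims/day at $150, denial_p = 0.06, realization fixed at 0.92.
rng = random.Random(3)
det = daily_realized(400, 150.0, 0.06, 0.92)
sims = [daily_realized(400, 150.0, 0.06, 0.92, rng) for _ in range(2000)]
mean = statistics.fmean(sims)
rel_spread = statistics.stdev(sims) / mean   # relative spread added by the Bernoulli draw
print(f"deterministic={det:.0f}  simulated mean={mean:.0f}  relative spread={rel_spread:.3f}")
```

The simulated mean matches the deterministic factor, and the relative spread sits near the binomial value √(p(1 − p)/n)/(1 − p) ≈ 0.013 for these inputs.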

§ 6 Calibration & evaluation

Every model is evaluated identically: rolling origin, with an as_of leakage mask that hides any remit date later than the origin from the training history. Two quantitative metrics, one product metric:

Pinball loss at q=0.10 — sharpness-aware proper score at the bonus-relevant quantile. Lower is sharper at equal coverage.
Coverage at q=0.90 — fraction of held-out actuals below the p90 forecast. Diagnostic, not load-bearing.
breach_rate_p10 — the product threshold: the empirical fraction of held-out actuals below the p10 forecast. Target ≤ 0.10. Above 0.10 means the clinic over-allocates bonus pool more often than the model claims; well below 0.10 means the clinic is sandbagging bonuses (leaving money on the table for staff).
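The three metrics are short enough to state in code. A minimal sketch over plain lists of held-out actuals and forecast quantiles; the function names are hypothetical, not the harness's API.

```python
def pinball_loss(actuals, preds, q):
    """Mean pinball (quantile) loss at level q; lower is better."""
    total = 0.0
    for y, f in zip(actuals, preds):
        diff = y - f
        total += q * diff if diff >= 0 else (q - 1) * diff   # under- vs over-prediction
    return total / len(actuals)

def coverage(actuals, preds):
    """Fraction of held-out actuals below the forecast (used with the p90)."""
    return sum(y < f for y, f in zip(actuals, preds)) / len(actuals)

def breach_rate(actuals, p10s):
    """Fraction of held-out actuals below the p10 forecast; product target <= 0.10."""
    return sum(y < f for y, f in zip(actuals, p10s)) / len(actuals)

actuals = [100.0, 90.0, 80.0, 120.0]
p10 = [85.0] * 4
p90 = [110.0] * 4
print(pinball_loss(actuals, p10, 0.10), coverage(actuals, p90), breach_rate(actuals, p10))
```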

PIT histograms

PIT histograms built from three forecast quantiles (p10, p50, p90) have only four bins (below-p10, p10–p50, p50–p90, above-p90), so they collapse to four atoms and a smoothness test is the wrong diagnostic at this granularity. The legible test is the share in the lower edge bin, the fraction of held-out actuals that fall below the p10, which is exactly breach_rate_p10. Approach B's per-payer evaluation puts 0.0196 of rows in the below-p10 bin and 0.0000 at the summed-window bonus-pool level: conservative but operationally safe.
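Binning held-out actuals against the three quantiles gives the four-atom histogram directly. A minimal sketch, assuming one p10/p50/p90 triple per row; the function name is hypothetical. A calibrated forecast would put about 10%, 40%, 40% and 10% of rows in the four bins, so the first entry is the number that matters here.

```python
def pit_bins(actuals, p10s, p50s, p90s):
    """Four-bin PIT histogram from a p10/p50/p90 forecast.
    Expected shares under calibration: 0.10, 0.40, 0.40, 0.10.
    The first bin equals breach_rate_p10."""
    counts = [0, 0, 0, 0]
    for y, lo, mid, hi in zip(actuals, p10s, p50s, p90s):
        if y < lo:
            counts[0] += 1
        elif y < mid:
            counts[1] += 1
        elif y < hi:
            counts[2] += 1
        else:
            counts[3] += 1
    n = len(actuals)
    return [c / n for c in counts]

print(pit_bins([5.0, 15.0, 25.0, 35.0], [10.0] * 4, [20.0] * 4, [30.0] * 4))
```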

Calibration wrappers

Two post-hoc calibration wrappers sit in the evaluation harness: split-conformal (residual-based) and isotonic-PIT (monotone re-mapping of nominal to empirical CDF). Both remain useful for baseline diagnostics, but neither is the production mode. On the regenerated dense-grid matrix, split-conformal pushes TimesFM-B above the row-level threshold (0.0944 → 0.1060) while window breach remains 0.0; production therefore stays uncalibrated.
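For reference, one common form of split-conformal calibration for a lower quantile is the one-sided conformalized-quantile-regression correction below. Whether the harness uses exactly this score and rank rule is not stated here, so treat it, and the function name, as an assumption.

```python
import math

def conformal_p10_shift(cal_actuals, cal_p10s, alpha=0.10):
    """One-sided split-conformal correction for a lower quantile.
    Scores s_i = p10_i - y_i (positive when the actual breached the p10).
    Returns the amount to subtract from future p10 forecasts so that, under
    exchangeability, P(y >= adjusted p10) >= 1 - alpha."""
    scores = sorted(f - y for y, f in zip(cal_actuals, cal_p10s))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))   # finite-sample corrected rank
    if rank > n:
        return math.inf                        # too few calibration rows for this alpha
    return scores[rank - 1]
```

A negative shift raises the p10, so on a forecast that is already conservative the wrapper trades slack for breach rate, which is the direction of the row-level change reported above.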

§ 7 What would tighten the model

The current accuracy claim is deliberately bounded: the forecast is directionally useful for cash-flow risk management against the configured payer mix. Three upgrades, in priority order:

Partner-clinic 835 ERA feed under BAA. Replaces the post-generation adjudication injector with measured per-payer lag and realization-rate distributions. Single highest-leverage upgrade. Swaps the YAML priors out as the EB shrinkage target in favor of empirical curves. Also unlocks multi-remit support (initial pay → recoupment → secondary), currently a structural gap documented in the lag-curve loader's TODO.
MGMA DataDive Texas/Gulf-Coast cut. Calibrates the non-FQHC primary-care payer mix, which HRSA UDS doesn't cover. Useful when the deployment context shifts from FQHC to independent practice.
Athenahealth Network Insights or equivalent. Measured per-payer adjudication-delay percentiles to replace the statutory-floor seeding of the YAML priors.

One known gap is deliberate: Synthea's eligibility-driven payer mix differs structurally from HRSA UDS Texas FQHC Table 4 (Synthea skews toward Medicare and commercial; FQHCs skew toward Medicaid and uninsured). Closing it requires demographic re-weighting, which is deferred. Per-payer behavioral fidelity (denial, lag, realization), which drives the p10/p90 width the bonus-pool decision depends on, is intact.