Methodology
From billed dollars to a defensible bonus check.
The full pipeline, in seven sections. No hand-waving — every constant on this page traces back to a public source or a unit test.
§ 1 The problem
Healthcare revenue cycle management splits billed charges from realized cash by three independent mechanisms: contractual fee-schedule adjustments (Medicare PFS, Medicaid programs, commercial contracts), initial denials, and adjudication lag. Each varies by payer; together they make a single “collection rate” misleading.
For a Harris County primary-care clinic, three numbers structure the gap:
- Texas Medicaid pays primary care at 0.52× Medicare (Skopec, Pugazhendhi, Zuckerman, Health Affairs 2024). That alone makes the realization rate of a Medicaid-heavy clinic materially below a commercial-heavy one of equivalent volume.
- Initial denial rates run 8–17% by payer (Premier 2024 hospital survey; KFF 2024). About half are reversed on appeal — terminal denial 4–8%.
- Adjudication lag spans 14 to 90+ days. Medicare's clean-claim 14-day floor and Texas TDI's 30/45-day prompt-pay rules set lower bounds; the right tail is the operationally costly part.
Sizing a bonus pool against billed revenue is, mechanically, a short-term loan from the clinic to its physicians — payable at a haircut and a delay neither party has priced.
Cost Predictor's product question is sharply scoped: of one dollar billed today against the clinic's actual payer mix, how many cents will arrive — and when? The answer is a quantile forecast, not a point estimate. That distinction is load-bearing.
§ 2 Architecture
The pipeline is sequenced by data dependency, not by code modules. Phase 1 produces a daily aggregate-billed series and a per-claim outstanding ledger; Phase 3 forecasts the billed series; Phase 4 convolves through Phase 5's lag curves to produce realized; the bonus-pool calculator operates on the resulting quantiles.
Inputs
In production the ingestion layer reads 837 claim and
835 remittance EDI files (or flattened EHR/PM CSV exports
keyed on Date of Service) and produces a daily aggregate. The
evaluation harness substitutes Synthea v4.0.0 with a Texas-Medicaid
MCO encoding plus a post-generation adjudication injector that
attaches per-payer denial / lag / paid-claim realization draws.
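As a rough illustration of the two Phase 1 outputs, the sketch below builds a daily billed series and an outstanding ledger from flattened claim rows. The field names (`claim_id`, `payer`, `service_date`, `billed`, `remit_date`) and the function `build_phase1` are hypothetical, not the production schema, and the date of service stands in for the billing date. The `as_of` argument hides any remit dated after the forecast origin.

```python
from collections import defaultdict
from datetime import date


def build_phase1(claims, as_of):
    """Return (daily billed totals, outstanding ledger) as visible on `as_of`.

    Hypothetical schema: each claim is a dict with claim_id, payer,
    service_date, billed, and remit_date (None while unpaid).
    """
    daily_billed = defaultdict(float)
    outstanding = []
    for claim in claims:
        if claim["service_date"] > as_of:
            continue  # not yet billed at the origin
        daily_billed[claim["service_date"]] += claim["billed"]
        remit = claim["remit_date"]
        # A remit dated after the origin is not yet observable, so the
        # claim still counts as outstanding.
        if remit is None or remit > as_of:
            outstanding.append({
                "claim_id": claim["claim_id"],
                "payer": claim["payer"],
                "billed": claim["billed"],
                "age_days": (as_of - claim["service_date"]).days,
            })
    return dict(sorted(daily_billed.items())), outstanding


# Illustrative rows only; amounts and dates are made up.
claims = [
    {"claim_id": "c1", "payer": "Medicaid MCO", "service_date": date(2024, 3, 1),
     "billed": 120.0, "remit_date": date(2024, 3, 20)},
    {"claim_id": "c2", "payer": "Commercial", "service_date": date(2024, 3, 1),
     "billed": 200.0, "remit_date": date(2024, 4, 15)},
    {"claim_id": "c3", "payer": "Medicare FFS", "service_date": date(2024, 3, 10),
     "billed": 90.0, "remit_date": None},
]
daily, ledger = build_phase1(claims, as_of=date(2024, 3, 31))
```

Claim `c2` is paid in April, but at a 31 March origin it sits in the ledger as outstanding, which is the behavior a leakage-safe evaluation needs.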
State of Phase 2
The denial classifier (XGBoost / LightGBM, scored before forecasting) is documented in §2.1 of the system design but not yet built. Today the per-payer Bernoulli denial draw inside the Monte-Carlo convolver stands in for the classifier's expected output — a defensible simplification while terminal-denial empirical rates are bracketed at 4–8% and dwarfed by lag and realization-rate uncertainty in the pooled forecast variance.
§ 3 Approach A vs Approach B
Two forecasting routes were evaluated head-to-head on the Synthea evaluation set. Both use Google Research's TimesFM 2.5 (200M parameters, native quantile head) as the time-series engine.
- Approach A. Univariate forecast of realized revenue directly. The simpler model: one TimesFM fit, one quantile output, done.
- Approach B. Decompose. Forecast billed revenue with TimesFM, then convolve through per-payer lag curves and a paid-claim realization factor inside a Monte-Carlo simulator to produce realized.
On pooled-origin evaluation, A passes the product threshold
(row breach_rate_p10 = 0.0998). When evaluation is
broken out per payer, the mode clinic admins actually need,
A breaches the row-level threshold on two of five payers
(Commercial and Medicaid MCO). B passes every payer, and its breach
rate is zero at the window level, where rows are summed over
origin/horizon bonus windows. The table reports breach_rate_p10;
⚠ marks a value above the 0.10 threshold.
| Payer | A · TimesFM | B · TimesFM-decomposed |
|---|---|---|
| Commercial | 0.1004 ⚠ | 0.0802 |
| Medicaid MCO | 0.1093 ⚠ | 0.0000 |
| Medicare Advantage | 0.0585 | 0.0000 |
| Medicare FFS | 0.0835 | 0.0179 |
| Self-pay | 0.0962 | 0.0000 |
| Pooled rows | 0.0896 | 0.0196 |
| Pooled windows | 0.0000 | 0.0000 |
B is structurally legible — it tells admins which payer is driving over-allocation risk — and it inherits the right substitution path: when measured 835 ERA data lands, the YAML priors are replaced in place with measured per-payer lag and realization-rate distributions; nothing else changes.
Verdict: ship Approach B uncalibrated, with
--by-payer as the recommended runtime mode. Calibration
wrappers (split-conformal, isotonic-PIT) stay in the codebase for
ARIMA/Prophet but neither A nor B benefits.
§ 4 The lag curve
For each payer, the lag distribution is the PDF of days from claim submission to first remit, conditioned on eventual payment. Two empirical realities shape the choice of estimator:
- Right-censoring. At any origin t, claims billed near t have not been paid yet: they are not denied, they are not slow, they are simply unobserved. Treating them as missing biases the fit toward fast claims.
- Heavy right tail. The 90th-percentile lag is where bonus-pool breach risk concentrates. A model that fits the body well but not the tail will look good on MAPE and fail on breach_rate_p10.
Estimator
Hybrid: Kaplan–Meier non-parametric body up to the 0.8 KM-quantile, stitched onto a LogNormal tail fit by censored MLE. KM is unbiased through the body without assuming a distributional family; the LogNormal extrapolates past the largest observed lag, which is exactly where breach risk lives.
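A minimal sketch of the Kaplan–Meier body, assuming each claim contributes a duration in days and a flag saying whether the remit was observed (paid) or the claim is still open (censored). The names `kaplan_meier` and `km_quantile` are illustrative, and the stitch onto the LogNormal tail is not shown.

```python
def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each distinct time with at least one observed remit."""
    pairs = sorted(zip(durations, observed))
    n_at_risk = len(pairs)
    surv = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        events = 0
        removed = 0
        # Group all claims sharing this duration; censored claims at t
        # still count as at risk at t.
        while i < len(pairs) and pairs[i][0] == t:
            events += 1 if pairs[i][1] else 0
            removed += 1
            i += 1
        if events:
            surv *= 1.0 - events / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


def km_quantile(curve, q):
    """Smallest event time whose KM CDF, 1 - S(t), reaches q (None if never)."""
    for t, s in curve:
        if 1.0 - s >= q:
            return t
    return None


# Illustrative lags in days; False marks a claim still unpaid at the origin.
durations = [14, 21, 21, 30, 45, 60]
observed = [True, True, False, True, False, True]
curve = kaplan_meier(durations, observed)
stitch_point = km_quantile(curve, 0.8)  # where a parametric tail would take over
```

Censored claims never trigger a drop in the curve, but they stay in the at-risk denominator until their age, which is what keeps the body from being biased toward fast payers.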
L(μ, σ) = Σ_uncensored log f(x_i; μ, σ)
+ Σ_censored log S(T_j; μ, σ)
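Here x_i are the observed lags of paid claims and T_j are the
elapsed ages of claims still unpaid at the origin. f is the
LogNormal PDF; S = 1 - F is the survival function. The
censored-MLE log-likelihood reduces to an ordinary LogNormal MLE
when no observations are censored.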
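The log-likelihood above can be written in a few lines of standard-library Python. This is a sketch for checking the formula, not the production fitter: the function name and sample lags are made up, and no optimizer is included.

```python
import math


def _norm_logpdf(z):
    """Log density of the standard normal."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def _norm_sf(z):
    """Survival function of the standard normal via erfc."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def censored_lognormal_loglik(mu, sigma, paid_lags, censored_ages):
    """Sum of log f over paid claims plus log S over still-open claims."""
    ll = 0.0
    for x in paid_lags:
        z = (math.log(x) - mu) / sigma
        # LogNormal density: f(x) = phi(z) / (x * sigma)
        ll += _norm_logpdf(z) - math.log(x) - math.log(sigma)
    for t in censored_ages:
        z = (math.log(t) - mu) / sigma
        ll += math.log(_norm_sf(z))
    return ll


# Illustrative data: lags of paid claims, ages of open claims (days).
paid_lags = [14, 18, 21, 25, 30, 45]
censored_ages = [40, 60]

# Closed-form LogNormal MLE, valid only when nothing is censored.
logs = [math.log(x) for x in paid_lags]
mu_hat = sum(logs) / len(logs)
sigma_hat = math.sqrt(sum((v - mu_hat) ** 2 for v in logs) / len(logs))

ll_uncensored = censored_lognormal_loglik(mu_hat, sigma_hat, paid_lags, [])
ll_censored = censored_lognormal_loglik(mu_hat, sigma_hat, paid_lags, censored_ages)
ll_censored_shifted = censored_lognormal_loglik(
    mu_hat + 0.01, sigma_hat, paid_lags, censored_ages)
```

With no censored claims the closed-form estimate maximizes the function. Once the open claims are included, nudging μ upward improves the likelihood, which is the correction that ignoring censored claims would miss.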
Empirical-Bayes shrinkage
For payers with few events, the MLE overfits the small sample.
The fit is shrunk toward the adjudication_params.yaml
prior with weight w = n_p / (n_p + 50) on the MLE, so the MLE
dominates above ~150 events (w ≥ 0.75). The prior would dominate
below ~10 events (w ≤ 0.17), but for n_p < 30 the blend is
skipped and the pure prior is used. The shrinkage
target is the cited per-payer YAML, which is itself sourced
from Premier 2024, KFF 2024, and TDI prompt-pay statutes.
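A small sketch of the shrinkage rule. The weight formula and the n_p < 30 cutoff come from the text above; blending linearly on the LogNormal parameters (μ, σ) is an assumption made for illustration, as are the function name and the parameter values.

```python
def shrink_lag_params(mle, prior, n_events, k=50, min_events=30):
    """Blend MLE and prior parameters with weight w = n / (n + k) on the MLE.

    Below `min_events` the MLE is ignored and the prior is returned as-is.
    Linear blending of (mu, sigma) is an illustrative assumption.
    """
    if n_events < min_events:
        return dict(prior), 0.0
    w = n_events / (n_events + k)
    blended = {name: w * mle[name] + (1.0 - w) * prior[name] for name in prior}
    return blended, w


# Hypothetical LogNormal parameters for one payer.
mle = {"mu": 3.4, "sigma": 0.6}
prior = {"mu": 3.0, "sigma": 0.5}

blended, w = shrink_lag_params(mle, prior, n_events=150)
sparse, w_sparse = shrink_lag_params(mle, prior, n_events=20)
```

At 150 events the MLE carries three quarters of the weight; at 20 events the function returns the YAML prior untouched.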
§ 5 The convolution
The composer (models/timesfm_b.py) takes a per-period
quantile billing forecast and produces a per-period quantile
realized forecast in three steps:
- Sample billing draws. For each forecast period k, sample 5,000 billing values via inverse-CDF interpolation on the TimesFM quantile array, with log-linear extrapolation outside [qmin, qmax].
- Split by payer mix. Each draw is split across the six payer categories using the configured mix (HRSA UDS Texas Table 4 by default).
- Lag-shift each payer's dollars. Sample a lag from that payer's curve for every dollar; accumulate the dollar into the period it lands in. Apply the deterministic realization factor (1 - denial_p) × E[paid-claim realization_p].
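The three steps can be sketched as a small Monte-Carlo loop. This is not the code in models/timesfm_b.py: it uses two payers instead of six, clamps the tails instead of extrapolating log-linearly, draws one lag per payer per period rather than per dollar, and every number (mix, lag parameters, realization factors) is hypothetical.

```python
import random


def sample_from_quantiles(levels, values, rng):
    """Inverse-CDF draw by linear interpolation; tails are clamped (simplification)."""
    u = rng.random()
    if u <= levels[0]:
        return values[0]
    if u >= levels[-1]:
        return values[-1]
    for i in range(1, len(levels)):
        if u <= levels[i]:
            frac = (u - levels[i - 1]) / (levels[i] - levels[i - 1])
            return values[i - 1] + frac * (values[i] - values[i - 1])
    return values[-1]


def convolve(levels, billing_q, payer_mix, lag_samplers, factors, n_trials, seed=0):
    """Return realized[trial][period] after payer split, lag shift, and realization."""
    rng = random.Random(seed)
    horizon = len(billing_q)
    realized = [[0.0] * horizon for _ in range(n_trials)]
    for trial in realized:
        for k in range(horizon):
            billed = sample_from_quantiles(levels, billing_q[k], rng)
            for payer, share in payer_mix.items():
                landing = k + lag_samplers[payer](rng)
                if landing < horizon:  # dollars landing past the horizon are dropped
                    trial[landing] += billed * share * factors[payer]
    return realized


def period_quantile(realized, k, q):
    """Empirical quantile of period k across trials (floor-index rule)."""
    column = sorted(trial[k] for trial in realized)
    return column[int(q * (len(column) - 1))]


# Hypothetical inputs: a flat 30-day billing forecast and two payers.
levels = [0.1, 0.5, 0.9]
billing_q = [[900.0, 1000.0, 1100.0]] * 30
payer_mix = {"Medicaid MCO": 0.6, "Commercial": 0.4}
lag_samplers = {
    "Medicaid MCO": lambda rng: int(rng.lognormvariate(3.0, 0.4)),
    "Commercial": lambda rng: int(rng.lognormvariate(3.3, 0.5)),
}
factors = {"Medicaid MCO": 0.70, "Commercial": 0.85}

realized = convolve(levels, billing_q, payer_mix, lag_samplers, factors, n_trials=500)
p10_day29 = period_quantile(realized, 29, 0.10)
p90_day29 = period_quantile(realized, 29, 0.90)
```

The early periods of this toy run are nearly empty because nothing billed before the origin is included.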
Carryforward. Pre-origin outstanding dollars
(claims already billed but not yet paid) are sampled separately,
with the lag conditioned on lag > T_k via rejection
sampling, so the pre-origin tail lands in the correct forecast
periods. This is load-bearing for early-horizon accuracy: without
it, the first 14 days of any forecast systematically
under-forecast realized cash.
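A sketch of the conditional draw, assuming the conditioning threshold is the number of days the claim has already been outstanding at the origin (an interpretation of T_k that the text does not spell out). The function name and LogNormal parameters are hypothetical.

```python
import random


def sample_remaining_lag(mu, sigma, age_days, rng, max_tries=10_000):
    """Draw a LogNormal lag conditioned on lag > age_days by rejection.

    Returns the days still to wait after the origin (lag - age_days).
    """
    for _ in range(max_tries):
        lag = rng.lognormvariate(mu, sigma)
        if lag > age_days:
            return lag - age_days
    raise RuntimeError("acceptance rate too low; claim age is deep in the tail")


rng = random.Random(7)
# Hypothetical payer curve: median lag exp(3.2), about 24.5 days.
remaining = [sample_remaining_lag(3.2, 0.5, age_days=20, rng=rng) for _ in range(1000)]
mean_remaining = sum(remaining) / len(remaining)
```

Rejection is simple and exact, but its acceptance rate equals the survival probability at the claim's age, so very old claims need many tries. An inverse-CDF draw on the truncated distribution would be the usual alternative in that regime.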
Why deterministic realization & denial
The Beta realization variance and the binary denial draw contribute small
tail width relative to the lag and billing-quantile draws when
aggregated to a daily/weekly period. Using E[paid-claim realization] and
E[deny] = denial_p as the per-dollar realization factor
keeps the convolver fast (~5,000 trials × 6 payers × 30-day
horizon in <100 ms) without measurable loss of fidelity
on the Synthea evaluation set. If real ERA data shows tail
under-dispersion, restoring the stochastic realization and denial
draws is one of the levers available to widen the tails.
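The deterministic factor is simply the mean of the stochastic process it replaces. The sketch below compares the closed-form factor with a simulated average of a Bernoulli denial times a Beta realization draw; the denial probability and Beta parameters are hypothetical.

```python
import random


def realization_factor(denial_p, beta_a, beta_b):
    """(1 - denial_p) times the mean of Beta(a, b), which is a / (a + b)."""
    return (1.0 - denial_p) * beta_a / (beta_a + beta_b)


def simulated_mean(denial_p, beta_a, beta_b, n_draws, seed=0):
    """Average realized fraction per billed dollar under the stochastic draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        if rng.random() >= denial_p:  # claim survives terminal denial
            total += rng.betavariate(beta_a, beta_b)
    return total / n_draws


# Hypothetical payer: 6% terminal denial, Beta(8, 2) paid-claim realization.
factor = realization_factor(0.06, 8.0, 2.0)
mc_mean = simulated_mean(0.06, 8.0, 2.0, n_draws=100_000)
```

The two agree in expectation. What the deterministic shortcut drops is the per-claim dispersion around that mean, which the section argues is small once dollars are aggregated to a daily or weekly period.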
§ 6 Calibration & evaluation
Every model is evaluated identically: rolling origin, with an
as_of leakage mask that hides any remit date later
than the origin from the training history. Two quantitative
metrics, one product metric:
- Pinball loss at q=0.10 — sharpness-aware proper score at the bonus-relevant quantile. Lower is sharper at equal coverage.
- Coverage at q=0.90 — fraction of held-out actuals below the p90 forecast. Diagnostic, not load-bearing.
- breach_rate_p10 — the product threshold: the empirical fraction of held-out actuals below the p10 forecast. Target ≤ 0.10. Above 0.10 means the clinic over-allocates bonus pool more often than the model claims; well below 0.10 means the clinic is sandbagging bonuses (leaving money on the table for staff).
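The three metrics above reduce to a few lines each. This is a sketch with made-up actuals and flat forecasts; the function names are illustrative, not the evaluation harness's API.

```python
def pinball_loss(actuals, forecasts, q):
    """Mean pinball (quantile) loss of `forecasts` at quantile level q."""
    total = 0.0
    for y, f in zip(actuals, forecasts):
        diff = y - f
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return total / len(actuals)


def fraction_below(actuals, forecasts):
    """Share of actuals strictly below the matching forecast."""
    return sum(y < f for y, f in zip(actuals, forecasts)) / len(actuals)


# Illustrative held-out actuals with flat p10 and p90 forecasts.
actuals = [100, 80, 120, 95, 60, 110, 105, 90, 130, 85]
p10 = [70] * len(actuals)
p90 = [125] * len(actuals)

loss_p10 = pinball_loss(actuals, p10, q=0.10)
coverage_p90 = fraction_below(actuals, p90)
breach_rate_p10 = fraction_below(actuals, p10)
```

Coverage at q=0.90 and breach_rate_p10 are the same computation applied to different quantile forecasts. Only the second is compared against the 0.10 product threshold.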
PIT histograms
PIT histograms binned at the three forecast quantiles (p10, p50, p90)
collapse to four atoms (below-p10, p10–p50, p50–p90, above-p90),
so the smoothness test is the wrong diagnostic at this granularity.
The legible test is the proportion in the lower edge bin, that is,
the fraction of held-out actuals that fall below the p10 forecast.
That is exactly breach_rate_p10. Approach B's per-payer
evaluation shows 0.0196 in the
below-p10 row bin and 0.0000 at the
summed-window bonus-pool surface: conservative but operationally safe.
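A sketch of the four-bin PIT tally, assuming per-row p10/p50/p90 forecasts. The function name and data are illustrative. A perfectly calibrated forecast would put 10%, 40%, 40%, and 10% of actuals in the four bins.

```python
def pit_bins(actuals, p10, p50, p90):
    """Shares of actuals in [below-p10, p10-p50, p50-p90, above-p90]."""
    counts = [0, 0, 0, 0]
    for y, lo, mid, hi in zip(actuals, p10, p50, p90):
        if y < lo:
            counts[0] += 1
        elif y < mid:
            counts[1] += 1
        elif y < hi:
            counts[2] += 1
        else:
            counts[3] += 1
    n = len(actuals)
    return [c / n for c in counts]


NOMINAL = [0.10, 0.40, 0.40, 0.10]

# Illustrative held-out actuals with flat quantile forecasts.
actuals = [100, 80, 120, 95, 60, 110, 105, 90, 130, 85]
shares = pit_bins(actuals, [70] * 10, [100] * 10, [125] * 10)
edge_bin = shares[0]  # identical to breach_rate_p10
```

An edge-bin share far below 0.10, as in the Approach B figures above, indicates a p10 that sits lower than it strictly needs to.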
Calibration wrappers
Two post-hoc calibration wrappers sit in the evaluation harness:
split-conformal (residual-based) and
isotonic-PIT (monotone re-mapping of nominal
to empirical CDF). Both remain useful for baseline diagnostics,
but neither is the production mode. On the regenerated dense-grid
matrix, split-conformal pushes TimesFM-B above the row-level
threshold (0.0944 → 0.1060) while window breach remains
0.0; production therefore stays uncalibrated.
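For orientation, here is a sketch of one common residual-based split-conformal adjustment for a lower quantile. It is an assumed one-sided variant, not necessarily the wrapper in the harness, and the calibration data is made up. The score for each calibration row is forecast minus actual, and the p10 is shifted down by the finite-sample (1 − α) quantile of those scores.

```python
import math


def conformal_shift_lower(cal_actuals, cal_p10, alpha=0.10):
    """Shift to subtract from p10 so that roughly alpha of actuals fall below it.

    Scores are forecast - actual (positive on a breach). Uses the usual
    finite-sample rank ceil((n + 1) * (1 - alpha)).
    """
    scores = sorted(f - y for y, f in zip(cal_actuals, cal_p10))
    n = len(scores)
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        return math.inf  # too few calibration rows for this alpha
    return scores[rank - 1]


def breach_rate(actuals, forecasts):
    """Share of actuals strictly below the matching forecast."""
    return sum(y < f for y, f in zip(actuals, forecasts)) / len(actuals)


# Illustrative calibration split: a flat p10 of 100 that breaches too often.
cal_actuals = [90 + 3 * i for i in range(20)]
cal_p10 = [100] * 20

shift = conformal_shift_lower(cal_actuals, cal_p10)
adjusted_p10 = [f - shift for f in cal_p10]
before = breach_rate(cal_actuals, cal_p10)
after = breach_rate(cal_actuals, adjusted_p10)
```

In this toy case the raw forecast over-breaches, so the shift is positive and lowers the p10. When the raw forecast is already conservative the shift comes out negative and raises the p10, moving the breach rate up toward the nominal level.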
§ 7 What would tighten the model
The current accuracy claim is limited to being directionally useful for cash-flow risk management against the configured payer mix. Three upgrades, in priority order:
- Partner-clinic 835 ERA feed under BAA. Replaces the post-generation adjudication injector with measured per-payer lag and realization-rate distributions. Single highest-leverage upgrade. Drops the YAML priors as the EB shrinkage target and substitutes them with empirical curves. Also unlocks multi-remit support (initial pay → recoupment → secondary), currently a structural gap documented in the lag-curve loader's TODO.
- MGMA DataDive Texas/Gulf-Coast cut. Calibrates non-FQHC primary-care payer mix that HRSA UDS doesn't cover. Useful when the deployment context shifts from FQHC to independent practice.
- Athenahealth Network Insights or equivalent. Measured per-payer adjudication-delay percentiles to replace the statutory-floor seeding of the YAML priors.
One known gap is deliberate: Synthea's eligibility-driven payer mix differs structurally from HRSA UDS Texas FQHC Table 4 (Synthea skews toward Medicare and commercial; FQHCs skew toward Medicaid and uninsured). Closing it requires demographic re-weighting, which is deferred. Per-payer behavioral fidelity (denial / lag / realization), which drives the p10/p90 width that the bonus-pool decision depends on, is intact.