Installation

Five minutes to a passing test suite.

Cost Predictor is pure Python. Pick your OS below for the path of least resistance. The smoke run takes roughly seven seconds; the full Synthea generation path is opt-in.

§Prerequisites — common to all platforms

  • Python 3.10+ — the runner is tested on 3.14; any 3.10+ should work.
  • git — the TimesFM install pulls from a GitHub source URL.
  • ~5 GB free disk — venv + the TimesFM 200M-parameter checkpoint (~800 MB) + Synthea output if you generate it.
  • Optional: JDK 17+ if you intend to regenerate the Synthea population (Temurin 25 LTS is what's pinned upstream).
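
The prerequisites above can be checked before creating the venv. The following is a minimal standard-library sketch; `check_prerequisites` and its default thresholds are illustrative names chosen here, not part of Cost Predictor.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 10), min_free_gb=5.0, path="."):
    """Return a dict mapping each prerequisite to True/False (hypothetical helper)."""
    # Free space on the filesystem that holds `path`, in decimal gigabytes.
    free_gb = shutil.disk_usage(path).free / 1e9
    return {
        "python": sys.version_info[:2] >= min_python,
        "git": shutil.which("git") is not None,
        "disk": free_gb >= min_free_gb,
        # Optional, only for Synthea regeneration. This confirms a `java`
        # executable is on PATH; it does not verify the version is 17+.
        "java (optional)": shutil.which("java") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'ok     ' if ok else 'MISSING'}  {name}")
```

Run it from the directory where you plan to clone the repository so the disk check looks at the right filesystem.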

§macOS

1. Clone

git clone https://github.com/GrayBeamTechnology/cost-predictor.git
cd cost-predictor

2. Virtual environment

python3 -m venv venv
source venv/bin/activate
python -m pip install --upgrade pip

3. Install dependencies

Install the base requirements first, then TimesFM from source. On Apple Silicon (M-series), TimesFM's PyPI package is not compatible — pull from GitHub.

pip install -r requirements.txt
pip install "timesfm[torch] @ git+https://github.com/google-research/timesfm.git"

Why source? The PyPI timesfm package caps at Python <3.12 and ships a JAX/lingvo dependency tree that won't resolve on ARM macOS. The GitHub HEAD pulls timesfm 2.0+ with lazy JAX imports: only the covariate path needs JAX, and we don't use it.
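
The two constraints just described (the Python <3.12 cap and the ARM macOS resolution failure) can be expressed as a small check. This is a sketch only: `pypi_blockers` is a hypothetical helper, and it encodes nothing beyond those two stated constraints, so an empty result means "not ruled out by the text above", not "guaranteed to install".

```python
import platform
import sys


def pypi_blockers(version=None, system=None, machine=None):
    """List the stated reasons the PyPI timesfm package would not work here."""
    version = tuple(version or sys.version_info[:2])
    system = system or platform.system()
    machine = machine or platform.machine()

    blockers = []
    if version >= (3, 12):
        blockers.append("PyPI package caps at Python <3.12")
    # Apple Silicon reports system "Darwin" and machine "arm64".
    if system == "Darwin" and machine == "arm64":
        blockers.append("JAX/lingvo dependency tree does not resolve on ARM macOS")
    return blockers


if __name__ == "__main__":
    found = pypi_blockers()
    print("Install from GitHub source:" if found else "No stated blocker found.")
    for reason in found:
        print(f"  - {reason}")
```

The project tracks the GitHub source install regardless, so treat this as a diagnostic rather than a reason to switch to PyPI.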

4. Run the test suite

python -m pytest cost_predictor

Expect 158 tests passing in ~14 seconds.

5. Smoke-run the experiment runner

python -m cost_predictor.experiments.runner \
    --dataset fixture --model seasonal_naive --fast

Writes a run dir under runs/<UTC>_<gitsha7>/ with config.json, metrics.json, forecasts.parquet, and REPORT.md. The full Synthea generation path is documented at the bottom of this page; it requires the JDK toolchain.
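
To confirm the run directory contains the four files listed above, a short check like the following may help. It is a sketch under assumptions: `latest_run_dir` and `missing_artifacts` are hypothetical helpers, and "latest" is decided by modification time because the exact timestamp format in the directory name is not specified here.

```python
from pathlib import Path

# File names taken from the description of the run directory above.
EXPECTED = ("config.json", "metrics.json", "forecasts.parquet", "REPORT.md")


def latest_run_dir(runs_root="runs"):
    """Return the most recently modified subdirectory of runs_root, or None."""
    root = Path(runs_root)
    if not root.is_dir():
        return None
    dirs = [p for p in root.iterdir() if p.is_dir()]
    return max(dirs, key=lambda p: p.stat().st_mtime, default=None)


def missing_artifacts(run_dir):
    """Return the expected file names that are absent from run_dir."""
    return [name for name in EXPECTED if not (Path(run_dir) / name).is_file()]


if __name__ == "__main__":
    run = latest_run_dir()
    if run is None:
        print("No run directory found under runs/")
    else:
        gaps = missing_artifacts(run)
        print(f"{run}: {'complete' if not gaps else 'missing ' + ', '.join(gaps)}")
```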

6. Read the bonus-pool summary

python main.py --latest --reference-monthly 5000000

Loads the most recent run and prints the one-page sizing summary. Pass --bonus-rate 0.05 to override the default 5%, or --run-id <run> to load a specific run.

§Linux

1. Clone

git clone https://github.com/GrayBeamTechnology/cost-predictor.git
cd cost-predictor

2. Virtual environment

python3 -m venv venv
source venv/bin/activate
python -m pip install --upgrade pip

3. Install dependencies

pip install -r requirements.txt
pip install "timesfm[torch] @ git+https://github.com/google-research/timesfm.git"

On x86_64 Linux, the PyPI timesfm wheel works for Python 3.10/3.11. The GitHub source path is recommended for Python 3.12+ (which is most current distros), and is what the project tracks.

4. Run the test suite

python -m pytest cost_predictor

5. Smoke-run the experiment runner

python -m cost_predictor.experiments.runner \
    --dataset fixture --model seasonal_naive --fast

6. Read the bonus-pool summary

python main.py --latest --reference-monthly 5000000

§Windows

Recommended: WSL2. Native Python on Windows works for the runner and tests, but the heavy ML dependencies (TimesFM, torch) install with materially less friction inside WSL2, and the rest of the project assumes a POSIX-ish shell. If a corporate constraint rules out WSL2, the native instructions below are tested and work; expect a slightly bumpier ride on the TimesFM line.

1. Clone (PowerShell)

git clone https://github.com/GrayBeamTechnology/cost-predictor.git
cd cost-predictor

2. Virtual environment

python -m venv venv
venv\Scripts\activate
python -m pip install --upgrade pip

3. Install dependencies

pip install -r requirements.txt
pip install "timesfm[torch] @ git+https://github.com/google-research/timesfm.git"

If pip install fails on the TimesFM line, you have most likely hit the same JAX dependency knot that Apple Silicon users encounter. The easiest fix is to do this part inside WSL2.

4. Run the test suite

python -m pytest cost_predictor

5. Smoke-run the experiment runner

python -m cost_predictor.experiments.runner `
    --dataset fixture --model seasonal_naive --fast

6. Read the bonus-pool summary

python main.py --latest --reference-monthly 5000000

WSL2 fallback

If you've installed WSL2 with an Ubuntu distribution, follow the Linux instructions above; the steps are identical from there.

§Optional · Synthea generation

The smoke run uses a small synthetic fixture. To run against the full Synthea population the project was calibrated on, you will need a JDK and a Synthea checkout. This is not required for the test suite or the bonus-pool CLI.

1. Install a JDK (17+)

On macOS we use mise; on Linux any system OpenJDK works. Pin Temurin 25 LTS to match upstream:

# macOS / Linux with mise
mise use --global java@temurin-25
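
To confirm the JDK on your PATH meets the 17+ requirement, the version banner from `java -version` can be parsed. This is a sketch; `parse_java_major` and `installed_java_major` are hypothetical helpers, and the parser assumes the usual `version "..."` banner that OpenJDK builds print.

```python
import re
import subprocess


def parse_java_major(version_output):
    """Extract the major version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Legacy scheme: Java 8 reports itself as "1.8.0_...".
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major():
    """Run `java -version` and return its major version, or None if unavailable."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return None
    # The banner is normally written to stderr; check both streams to be safe.
    return parse_java_major(result.stderr + result.stdout)


if __name__ == "__main__":
    major = installed_java_major()
    if major is None:
        print("No usable java found on PATH")
    else:
        print(f"Java {major}: {'ok' if major >= 17 else 'too old, need 17+'}")
```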

2. Clone Synthea

cd ~
git clone https://github.com/synthetichealth/synthea.git
cd synthea

Apply the project's Texas Medicaid MCO authoring patches to payers/insurance_companies.csv and insurance_plans.csv — see the project's docs/synthea-mco-encoding.md for the rows to add (IDs 110000–150000 and Plan IDs 110001–150001).

3. Generate

./run_synthea -p 1000 -s 42 \
    --exporter.csv.export true \
    --generate.only_alive_patients true \
    "Texas" "Houston"

Outputs CSVs under output/csv/ — copy these to cost-predictor/data/synthea/v1/csv/ (gitignored) and the runner will pick them up via --dataset synthea.
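
The copy step can be scripted so the destination directory is created if it does not exist. This is a sketch: `copy_synthea_csvs` is a hypothetical helper, and the two root paths are whatever locations you cloned into; only the `output/csv` and `data/synthea/v1/csv` suffixes come from the text above.

```python
import shutil
from pathlib import Path


def copy_synthea_csvs(synthea_root, project_root):
    """Copy Synthea's CSV export into the project's data directory.

    Returns the sorted list of file names that were copied.
    """
    src = Path(synthea_root) / "output" / "csv"
    dst = Path(project_root) / "data" / "synthea" / "v1" / "csv"
    if not src.is_dir():
        raise FileNotFoundError(f"No Synthea CSV output at {src}")
    dst.mkdir(parents=True, exist_ok=True)

    copied = []
    for csv_file in sorted(src.glob("*.csv")):
        # copy2 preserves timestamps along with the file contents.
        shutil.copy2(csv_file, dst / csv_file.name)
        copied.append(csv_file.name)
    return copied
```

For example, if both checkouts live in your home directory (an assumption), `copy_synthea_csvs(Path.home() / "synthea", Path.home() / "cost-predictor")` would perform the copy.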

4. Run against Synthea

python -m cost_predictor.experiments.runner \
    --dataset synthea --model timesfm_b --by-payer --fast

§Verifying the install

After the smoke run, two artifacts confirm the installation is wired correctly:

  1. runs/<ts>_<sha>/REPORT.md — a generated per-run report. Open it; the headline matrix should show n_eval_origins and a sub-0.15 breach_rate_p10 on the seasonal_naive fixture.
  2. python main.py --latest — prints the bonus-pool summary block. The dollar translation rescales to whatever you pass to --reference-monthly.

If the test suite passes but the runner errors out on timesfm, the TimesFM install most likely hit the JAX dependency tree. Re-run the GitHub source install and confirm it succeeded: pip show timesfm should report version 2.0.0 or later.
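
The same version check can be run from Python inside the venv. This is a sketch using the standard library's `importlib.metadata`; `meets_minimum` and `timesfm_status` are hypothetical helper names, and the 2.0 threshold is the one stated above.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def meets_minimum(version_string, minimum=(2, 0)):
    """True if the leading major.minor of version_string is >= minimum."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        return False
    return (int(match.group(1)), int(match.group(2))) >= minimum


def timesfm_status(package="timesfm"):
    """Describe the installed package version, or report that it is absent."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return "not installed"
    verdict = "ok" if meets_minimum(installed) else "older than 2.0"
    return f"{installed} ({verdict})"


if __name__ == "__main__":
    print(f"timesfm: {timesfm_status()}")
```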