1
Model
Same hybrid stack across every benchmark — no per-dataset retuning.
6
Public benchmarks
Severson primary + secondary, CALCE CS2, NASA PCoE, Oxford, BatteryLife Na-ion.
5
Research labs
MIT-Stanford-SLAC, University of Maryland, NASA Ames, University of Oxford, HKUST-GZ.
4
Battery chemistries
LFP, LCO, NMC, and Na-ion — four chemistries, same hybrid model.
Public benchmark results
Same hybrid stack across LFP, LCO, NMC, and Na-ion: the deployment story plant teams care about, proven on the field's gold-standard public datasets. Cycle-life prediction error is reported as median absolute percent error, with training and test cells as documented in each paper. No per-dataset retuning between runs.
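As a minimal sketch of the metric (not Lychee's evaluation code), the per-cell absolute percent error and its median can be computed as below. The function name and the four cycle-life values are illustrative assumptions, not benchmark data.

```python
from statistics import mean, median

def absolute_percent_errors(true_lives, predicted_lives):
    """Per-cell absolute percent error of predicted cycle life."""
    return [100.0 * abs(p - t) / t for t, p in zip(true_lives, predicted_lives)]

# Hypothetical cycle lives for four cells (illustrative numbers only).
true_lives = [1000, 800, 500, 1200]
predicted_lives = [950, 880, 505, 600]

apes = absolute_percent_errors(true_lives, predicted_lives)
median_ape = median(apes)  # barely moved by the one badly missed cell
mean_ape = mean(apes)      # pulled up by the one badly missed cell
print(f"median APE = {median_ape:.1f}%, mean APE = {mean_ape:.1f}%")
```

On these made-up numbers the median is 7.5% while the mean is 16.5%, which is why the cards report both where the two diverge.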
BatteryLife Na-ion
4.3%
median percent error
n=31 cells
Protocol
18650 format · 3 charge protocols · LOO-CV across cells.
First Na-ion public benchmark. Best single result across six benchmarks. Same hybrid stack, no per-dataset retuning.
Severson primary
8.9%
median percent error
10.2% mean · n=42 cells
Protocol
Pooled batches 1+2 — alternating-cell train/test split.
Outperforms the 9.1% published by Severson 2019 (Nature Energy) on the primary split.
CALCE CS2
5.0%
median percent error
5.4% mean · n=6 cells
Protocol
Leave-one-out cross-validation across cells.
Cross-chemistry generalization: the same hybrid model used on LFP carries over to LCO without retuning.
Oxford
7.8%
median percent error
n=8 cells
Protocol
Drive-cycle protocol · LOO-CV across cells.
NMC pouch chemistry — the cathode running in modern EVs. Same hybrid model, no retuning. Oxford-born, Oxford-validated.
NASA PCoE
8.7%
median percent error
13.1% mean · n=4 cells
Protocol
Leave-one-out cross-validation across cells.
Second LCO benchmark, second institution — same hybrid model, no retuning. Small dataset noted.
Severson secondary
11.9%
median percent error
12.2% mean · n=40 cells
Protocol
Held-out batch 3 — novel charge protocols the model never saw.
Harder generalization regime. Severson 2019 reaches 8.6% with the full feature set on this split — gap reported transparently.
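The two evaluation protocols named on the cards, leave-one-out cross-validation across cells and an alternating-cell train/test split, can be sketched as follows. This is an illustration under stated assumptions: the helper names are hypothetical, the "even positions train, odd positions test" convention is assumed, the mean-life predictor is a placeholder rather than the Lychee model, and the cycle lives are made up.

```python
from statistics import mean, median

def leave_one_out_folds(cell_ids):
    """Yield (train_ids, test_id) with each cell held out exactly once."""
    for i, test_id in enumerate(cell_ids):
        yield cell_ids[:i] + cell_ids[i + 1:], test_id

def alternating_split(cell_ids):
    """Alternating-cell split: even positions train, odd positions test (assumed)."""
    return cell_ids[0::2], cell_ids[1::2]

def loo_median_ape(cycle_lives, fit, predict):
    """Median absolute percent error over all leave-one-out folds."""
    errors = []
    for train_ids, test_id in leave_one_out_folds(list(cycle_lives)):
        model = fit({c: cycle_lives[c] for c in train_ids})
        predicted = predict(model, test_id)
        true = cycle_lives[test_id]
        errors.append(100.0 * abs(predicted - true) / true)
    return median(errors)

# Placeholder model: predict the mean cycle life of the training cells.
def fit_mean(train_lives):
    return mean(train_lives.values())

def predict_mean(model, cell_id):
    return model

# Hypothetical six-cell dataset (illustrative values, not CALCE data).
cycle_lives = {"c1": 600, "c2": 640, "c3": 580, "c4": 700, "c5": 620, "c6": 660}
folds = list(leave_one_out_folds(list(cycle_lives)))
score = loo_median_ape(cycle_lives, fit_mean, predict_mean)
train_ids, test_ids = alternating_split(list(cycle_lives))
```

The key property in both cases is that a test cell never contributes to the model that predicts it.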
Why generic AI doesn't work on batteries
Every plant team gets the same vendor question: "why not just use generic AI?" Here's the answer in median absolute percent error: Chronos zero-shot (AWS, a leading time-series foundation model) lands at 400–760% on the same data where Lychee runs at 5–12%. Same hybrid across every row, no per-dataset retuning.
4–12%
Lychee hybrid · median APE
Across six benchmarks. Single-digit on five of six; 11.9% on the Severson secondary split.
400–760%
Chronos zero-shot · MAPE
Saturates on three of the four datasets where Chronos was tested: the per-cycle capacity drop falls below its forecast precision.
30–150×
Tighter than Chronos
On three of four Li-ion datasets where Chronos was tested. Same hybrid across all six benchmarks.
| Dataset | Lychee hybrid | Physics-only baseline | Chronos zero-shot (AWS) |
|---|---|---|---|
| BatteryLife Na-ion (Na-ion · HKUST-GZ · n=31 · 3 protocols · LOO-CV) | 4.3% | not run | not tested† |
| Severson primary (LFP · MIT-Stanford-SLAC · n=42 · pooled b1+b2) | 8.9% (10.2% mean) | 24.7% | 730% (saturates) |
| Severson secondary (LFP · MIT-Stanford-SLAC · n=40 · novel protocols) | 11.9% (12.2% mean) | 42.3% | 409% (saturates) |
| CALCE CS2 (LCO · University of Maryland · n=6 · LOO-CV) | 5.0% (5.4% mean) | 129.7% | 763% (saturates) |
| Oxford (NMC · University of Oxford · n=8 · drive-cycle LOO-CV) | 7.8% | not run | not tested** |
| NASA PCoE (LCO · NASA Ames · n=4 · LOO-CV) | 8.7% (13.1% mean) | 10.6% | 723% mean · 6.5% median* |
Reference: Severson 2019 (Nature Energy) reports 9.1% on the primary split and 8.6% on the secondary split with the full feature set; Lychee outperforms on the primary (8.9% vs 9.1%). *NASA PCoE (n=4): Chronos's median (6.5%) lands close to Lychee's because of small-sample variance, while its mean explodes to 723%; Lychee's mean (13.1%) is roughly 55× tighter on the same data. **Oxford NMC (n=8, drive-cycle protocol): Lychee result reported here; physics-only baseline and Chronos zero-shot not run in this benchmark cycle. †BatteryLife Na-ion (n=31): physics-only baseline and Chronos zero-shot not run on this benchmark. Saturates = Chronos's outputs cluster around a flat prediction in the slow-decay regime, a known limitation of generic time-series foundation models on monotonic curves.
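The 30–150× figure can be checked directly from the three rows of the table above where Chronos saturates. The short script below is only that arithmetic; the variable names are illustrative.

```python
# Error values copied from the table above: (Lychee hybrid, Chronos zero-shot), in percent.
saturating = {
    "Severson primary": (8.9, 730.0),
    "Severson secondary": (11.9, 409.0),
    "CALCE CS2": (5.0, 763.0),
}

# Ratio of Chronos error to Lychee error on each dataset.
ratios = {name: chronos / lychee for name, (lychee, chronos) in saturating.items()}
for name, ratio in ratios.items():
    print(f"{name}: Chronos error is {ratio:.0f}x Lychee's")
```

The ratios come out at roughly 82×, 34×, and 153×, which rounds to the quoted 30–150× range.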
Lychee's edge
Every plant team gets pitched the same tools: Chronos, GPT-style forecasters, off-the-shelf time-series models. They saturate at 400–760% error on slow battery degradation because the per-cycle capacity drop falls below their forecast precision. Lychee uses the same time-series architecture class but grounds it in battery physics: single-digit median APE on five of six public benchmarks and 11.9% on the sixth (LFP, LCO, NMC, Na-ion across five labs), and 30–150× tighter than Chronos zero-shot (AWS) on three of the four datasets where Chronos was tested.
Lychee generalizes across chemistries, labs, and protocols without retuning — same model, every benchmark. The physics + ML hybrid built for the regime where neither approach alone is enough.
Real plant data is mostly cells that haven't failed yet. Lychee uses Weibull accelerated-failure-time regression with right-censoring, so every unfailed cell still trains the model. With 46% of training cells censored, the survival model holds 8.9% median APE while a point-estimate model degrades to 14.2%. Every flag ships with two confidence intervals, distribution-free conformal and Weibull-parametric, both validated at 90–95% empirical coverage on a 42-cell holdout against a 90% nominal target. A flag without a confidence interval is an opinion. Every Lychee prediction ships with both intervals.
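To illustrate how right-censoring lets unfailed cells contribute, here is a minimal sketch of a censored Weibull fit. It is a deliberate simplification, not Lychee's implementation: it has no covariates (an accelerated-failure-time model would make the scale depend on cell features), it uses a plain grid search over the shape parameter, and the cycle counts are hypothetical.

```python
import math

def weibull_loglik(times, failed, shape, scale):
    """Right-censored Weibull log-likelihood.

    A failed cell contributes its log density; a cell still running at time t
    contributes the log survival probability, -(t / scale) ** shape.
    """
    total = 0.0
    for t, has_failed in zip(times, failed):
        z = (t / scale) ** shape
        if has_failed:
            total += math.log(shape / scale) + (shape - 1) * math.log(t / scale) - z
        else:
            total -= z
    return total

def fit_weibull(times, failed):
    """Grid search over shape; for each shape the scale MLE is closed form."""
    n_failed = sum(failed)
    best = None
    for step in range(1, 201):
        shape = 0.05 * step  # candidate shapes 0.05 .. 10.0
        scale = (sum(t ** shape for t in times) / n_failed) ** (1.0 / shape)
        loglik = weibull_loglik(times, failed, shape, scale)
        if best is None or loglik > best[0]:
            best = (loglik, shape, scale)
    return best[1], best[2]

# Hypothetical data: four cells failed, three were still running at 800 cycles.
times = [480, 520, 610, 700, 800, 800, 800]
failed = [True, True, True, True, False, False, False]

shape_cens, scale_cens = fit_weibull(times, failed)
# Naive alternative: throw away the cells that have not failed yet.
shape_drop, scale_drop = fit_weibull(times[:4], failed[:4])
print(f"scale with censoring: {scale_cens:.0f}, dropping unfailed cells: {scale_drop:.0f}")
```

Dropping the unfailed cells biases the characteristic life downward, because the survivors are exactly the cells that lasted longest; the censored likelihood keeps that information.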
Cross-protocol calibration
We ran a cross-population study on the Na-ion benchmark to ask: does adding protocol metadata as a feature close the gap between coherent training data and real-plant data? The answer is partly. Protocol metadata cuts out-of-distribution prediction error nearly in half, but mixing populations still makes main-cohort error roughly 3× worse.
The right deployment architecture is per-protocol modeling with out-of-distribution detection, not protocol-as-a-feature. Real plant data spans many protocols simultaneously. The pilot calibrates exactly this, and a calibrated system flags unfamiliar cells for manual review instead of issuing confident wrong predictions.
This is the kind of finding the academic-benchmark literature doesn't report. We do — because pilots stand or fall on it.
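A per-protocol architecture with out-of-distribution detection might look roughly like the sketch below. Everything in it is an assumption made for illustration: the class and function names, the z-score threshold, the two features, and the placeholder mean-life predictor are hypothetical and do not describe Lychee's production system.

```python
from statistics import mean, pstdev

class ProtocolModel:
    """Per-protocol model with a simple z-score out-of-distribution check."""

    def __init__(self, train_features, train_lives, z_limit=3.0):
        # Per-feature mean and spread of this protocol's training population.
        columns = list(zip(*train_features))
        self.means = [mean(col) for col in columns]
        self.stds = [pstdev(col) or 1.0 for col in columns]
        self.z_limit = z_limit
        # Placeholder predictor: mean cycle life of the training cells.
        self.mean_life = mean(train_lives)

    def is_out_of_distribution(self, features):
        z_scores = [abs(x - m) / s for x, m, s in zip(features, self.means, self.stds)]
        return max(z_scores) > self.z_limit

def predict_or_flag(models, protocol, features):
    """Return ("predicted", life) or ("manual_review", None)."""
    model = models.get(protocol)
    if model is None or model.is_out_of_distribution(features):
        return ("manual_review", None)
    return ("predicted", model.mean_life)

# Hypothetical training population for one protocol (made-up feature values).
models = {
    "fast_charge": ProtocolModel(
        [[0.10, 30.0], [0.12, 31.0], [0.11, 29.0], [0.09, 30.5]],
        [600, 640, 620, 610],
    )
}

familiar = predict_or_flag(models, "fast_charge", [0.105, 30.2])
unfamiliar = predict_or_flag(models, "fast_charge", [0.30, 30.0])
unknown_protocol = predict_or_flag(models, "pulse", [0.10, 30.0])
```

The design choice is that an unfamiliar cell, or a protocol with no model at all, yields a review flag rather than a number.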
Reported transparently
Severson secondary split (held-out novel protocols) reaches 11.9% median in our run; Severson 2019 reports 8.6% on the same split with the full feature set. CALCE (n=6), NASA PCoE (n=4), and Oxford NMC (n=8) are small public datasets. BatteryLife Na-ion (n=31, HKUST-GZ): three charge protocols, LOO-CV; physics-only baseline and Chronos not run on this benchmark. Cycle-life prediction is the validation regime here; production defect detection is validated separately on customer pilots.