Field-leading results

Detect battery defects from the first 100 cycles — proven where generic AI flatlines.

What plant teams get from this validation: ranked risk scores on real cells weeks before downstream QC confirms defects, and generalization across all four tested chemistries without retraining. The same hybrid model runs across six public datasets, five labs, and four major chemistries (LFP, LCO, NMC, Na-ion), with single-digit median APE on five of the six benchmarks (11.9% on the held-out Severson secondary split) and 30–150× tighter error than Chronos zero-shot (AWS) on three of the four datasets where Chronos was tested.

Model

Same hybrid stack across every benchmark — no per-dataset retuning.

Public benchmarks

Severson primary + secondary, CALCE CS2, NASA PCoE, Oxford, BatteryLife Na-ion.

Research labs

MIT-Stanford-SLAC, University of Maryland, NASA Ames, University of Oxford, HKUST-GZ.

Battery chemistries

LFP, LCO, NMC, and Na-ion — four chemistries, same hybrid model.

Public benchmark results

One hybrid. Every chemistry. No retraining per line.

The same hybrid stack runs across LFP, LCO, NMC, and Na-ion: the deployment story plant teams care about, shown on the field's gold-standard public datasets. Cycle-life prediction error is reported as median absolute percent error (APE), with train and test cells split as documented in each paper. No per-dataset retuning between runs.
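As a minimal sketch of the metric named above, the snippet below computes per-cell absolute percent error and its median and mean. The cycle-life values are hypothetical and are not drawn from any of the benchmarks; the function name `absolute_percent_errors` is an illustrative choice, not part of any published tooling.

```python
from statistics import mean, median

def absolute_percent_errors(actual, predicted):
    """Per-cell absolute percent error, expressed in percent."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return [abs(p - a) * 100.0 / a for a, p in zip(actual, predicted)]

# Hypothetical cycle lives (cycles to end of life), for illustration only.
actual_cycles = [500, 800, 1000, 1200]
predicted_cycles = [450, 840, 1000, 1500]

errors = absolute_percent_errors(actual_cycles, predicted_cycles)
median_ape = median(errors)  # robust to a single badly predicted cell
mean_ape = mean(errors)      # pulled upward by the worst cell
```

The median summarizes the typical cell, while the mean is more sensitive to the worst prediction, which is why both appear in the results where available.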

BatteryLife Na-ion

Na-ion · HKUST-GZ

4.3%

median percent error

n=31 cells

Protocol

18650 format · 3 charge protocols · LOO-CV across cells.

First Na-ion public benchmark. Best single result across six benchmarks. Same hybrid stack, no per-dataset retuning.

Severson primary

LFP · MIT-Stanford-SLAC

8.9%

median percent error

10.2% mean · n=42 cells

Protocol

Pooled batches 1+2 — alternating-cell train/test split.

Outperforms the 9.1% published in Severson 2019 (Nature Energy) on the primary split.

CALCE CS2

LCO · University of Maryland

5.0%

median percent error

5.4% mean · n=6 cells

Protocol

Leave-one-out cross-validation across cells.

Cross-chemistry generalization: the same hybrid stack used on LFP carries over to LCO without retuning.

Oxford

NMC · University of Oxford

7.8%

median percent error

n=8 cells

Protocol

Drive-cycle protocol · LOO-CV across cells.

NMC pouch cells: the cathode chemistry running in modern EVs. Same hybrid model, no retuning. Oxford-born, Oxford-validated.

NASA PCoE

LCO · NASA Ames

8.7%

median percent error

13.1% mean · n=4 cells

Protocol

Leave-one-out cross-validation across cells.

Second LCO benchmark, second institution — same hybrid model, no retuning. Small dataset noted.

Severson secondary

LFP · MIT-Stanford-SLAC

11.9%

median percent error

12.2% mean · n=40 cells

Protocol

Held-out batch 3 — novel charge protocols the model never saw.

Harder generalization regime. Severson 2019 reaches 8.6% with the full feature set on this split — gap reported transparently.
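The two split styles named in the protocols above (leave-one-out across cells, and an alternating-cell train/test split) can be sketched as follows. This is an illustration under assumptions: the cell IDs are hypothetical, and which half of the alternating split goes to training is a choice made here, not something the benchmark descriptions specify.

```python
def leave_one_out_splits(cell_ids):
    """Yield (train_ids, held_out_id), holding each cell out exactly once."""
    cell_ids = list(cell_ids)
    for i, held_out in enumerate(cell_ids):
        yield cell_ids[:i] + cell_ids[i + 1:], held_out

def alternating_split(cell_ids):
    """Send every other cell to test; assumes even positions go to train."""
    cell_ids = list(cell_ids)
    return cell_ids[0::2], cell_ids[1::2]

cells = ["cell_a", "cell_b", "cell_c", "cell_d"]  # hypothetical cell IDs

loo = list(leave_one_out_splits(cells))
train_ids, test_ids = alternating_split(cells)
```

Splitting by cell rather than by cycle keeps every cycle of a held-out cell out of training, so the reported error reflects cells the model has not seen.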

Why generic AI doesn't work on batteries

Lychee runs at 5–12%. Chronos zero-shot saturates.

Every plant team gets the same vendor question: 'why not just use generic AI?' Here's the answer in median absolute percent error — Chronos zero-shot (AWS, the leading time-series foundation model) lands at 400–760% on the same data Lychee runs at 5–12%. Same hybrid across every row, no per-dataset retuning.

5–12%

Lychee hybrid · median APE

Across six benchmarks. Single-digit on five of six.

400–760%

Chronos zero-shot · MAPE

Saturates on three of the four datasets where Chronos was tested, because the per-cycle capacity drop falls below its forecast precision.

30–150×

Tighter than Chronos

On three of four Li-ion datasets where Chronos was tested. Same hybrid across all six benchmarks.
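The 30–150× range can be checked by dividing the Chronos figure by the Lychee median on each of the three datasets where Chronos saturates. The numbers below are copied from this page's results table; the dictionary layout is only an illustrative way to hold them.

```python
# Percent errors copied from the results table (Lychee median, Chronos as listed).
results = {
    "Severson primary": {"lychee": 8.9, "chronos": 730.0},
    "Severson secondary": {"lychee": 11.9, "chronos": 409.0},
    "CALCE CS2": {"lychee": 5.0, "chronos": 763.0},
}

# Ratio of Chronos error to Lychee error per dataset.
ratios = {name: r["chronos"] / r["lychee"] for name, r in results.items()}

tightest = min(ratios.values())  # smallest advantage across the three datasets
widest = max(ratios.values())    # largest advantage across the three datasets
```

The ratios come out at roughly 34×, 82×, and 153×, which the page rounds to 30–150×.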

Dataset	Lychee hybrid	Physics-only baseline	Chronos zero-shot (AWS)
BatteryLife Na-ion (Na-ion · HKUST-GZ · n=31 · 3 protocols · LOO-CV)	4.3%	not run	not tested†
Severson primary (LFP · MIT-Stanford-SLAC · n=42 · pooled b1+b2)	8.9% (10.2% mean)	24.7%	730% (saturates)
Severson secondary (LFP · MIT-Stanford-SLAC · n=40 · novel protocols)	11.9% (12.2% mean)	42.3%	409% (saturates)
CALCE CS2 (LCO · University of Maryland · n=6 · LOO-CV)	5.0% (5.4% mean)	129.7%	763% (saturates)
Oxford (NMC · University of Oxford · n=8 · drive-cycle LOO-CV)	7.8%	not run	not tested**
NASA PCoE (LCO · NASA Ames · n=4 · LOO-CV)	8.7% (13.1% mean)	10.6%	723% mean, 6.5% median*

Reference: Severson 2019 (Nature Energy) reports 9.1% on the primary split and 8.6% on the secondary split with the full feature set. Lychee outperforms on the primary split (8.9% vs 9.1%) and trails on the secondary split (11.9% vs 8.6%).

†BatteryLife Na-ion (n=31, three charge protocols): Lychee result reported here; physics-only baseline and Chronos zero-shot not run on this benchmark.

*NASA PCoE (n=4): Chronos's median (6.5%) lands close to Lychee's (8.7%) due to small-sample variance, while its mean explodes to 723%. Lychee's mean (13.1%) is roughly 55× tighter on the same data.

**Oxford NMC (n=8, drive-cycle protocol): Lychee result reported here; physics-only baseline and Chronos zero-shot not run in this benchmark cycle.

Saturates = Chronos's outputs cluster around a flat prediction in the slow-decay regime, a known limitation of generic time-series foundation models on monotonic curves.
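The NASA PCoE note turns on how differently the median and mean behave with only four cells. The sketch below uses hypothetical per-cell errors (not the NASA PCoE results) to show a single outlier dominating the mean while leaving the median almost untouched.

```python
from statistics import mean, median

# Hypothetical per-cell absolute percent errors for a four-cell benchmark.
# Illustrative values only; these are not the NASA PCoE numbers.
four_cell_errors = [5.0, 6.0, 7.0, 1000.0]

small_n_median = median(four_cell_errors)  # average of the two middle cells
small_n_mean = mean(four_cell_errors)      # dominated by the one outlier cell
```

With n=4, one badly missed cell is a quarter of the sample, so a small median on its own says little about worst-case behavior.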

Lychee's edge

Lychee's hybrid AI generalizes where generic foundation models flatline.

Every plant team gets pitched the same tools: Chronos, GPT-style forecasters, off-the-shelf time-series models. Where it saturates, Chronos zero-shot lands at 400–760% error on slow battery degradation because the per-cycle capacity drop falls below its forecast precision. Lychee uses the same time-series architecture class but grounds it in battery physics: 5–12% median APE across all six public benchmarks (LFP, LCO, NMC, Na-ion across five labs), single-digit on five of them, and 30–150× tighter than Chronos zero-shot (AWS) on the three datasets where Chronos saturates.

Lychee generalizes across chemistries, labs, and protocols without retuning — same model, every benchmark. The physics + ML hybrid built for the regime where neither approach alone is enough.

Real plant data is mostly cells that haven't failed yet. Lychee uses Weibull accelerated-failure-time regression with right-censoring, so every unfailed cell still trains the model. With 46% of training cells censored, the survival model holds 8.9% median APE while a point-estimate model degrades to 14.2%. Every flag ships with two confidence intervals, one distribution-free conformal and one Weibull-parametric, both validated at 90–95% empirical coverage on a 42-cell holdout against a 90% nominal target. A flag without a confidence interval is an opinion. Every Lychee prediction ships with both.
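To make the role of right-censoring concrete, here is a minimal sketch of the negative log-likelihood for a generic Weibull accelerated-failure-time model. It is a textbook formulation, not Lychee's implementation: the function name, the single feature, and all parameter values are assumptions chosen for illustration. A failed cell contributes its log density; a cell that is still running contributes only its log survival probability.

```python
import math

def weibull_aft_nll(times, failed, features, shape, intercept, coefs):
    """Negative log-likelihood of a Weibull AFT model with right-censoring.

    times    : cycles observed per cell (failure cycle, or last cycle seen)
    failed   : True if the cell reached end of life, False if right-censored
    features : per-cell feature vectors (hypothetical early-cycle features)
    """
    nll = 0.0
    for t, is_failed, x in zip(times, failed, features):
        # AFT link: log(scale) is linear in the features.
        log_scale = intercept + sum(b * xi for b, xi in zip(coefs, x))
        # Cumulative hazard (t / scale) ** shape, computed in log space.
        cum_hazard = math.exp(shape * (math.log(t) - log_scale))
        if is_failed:
            # Observed failure: log of the Weibull density.
            log_lik = (math.log(shape) - log_scale
                       + (shape - 1.0) * (math.log(t) - log_scale)
                       - cum_hazard)
        else:
            # Right-censored cell: log of the survival function.
            log_lik = -cum_hazard
        nll -= log_lik
    return nll

# Hypothetical data: two failed cells and one still running at 900 cycles.
times = [600.0, 1100.0, 900.0]
failed = [True, True, False]
features = [[0.0], [1.0], [0.5]]
nll = weibull_aft_nll(times, failed, features, shape=2.0,
                      intercept=math.log(700.0), coefs=[0.5])
```

In practice the shape, intercept, and coefficients would be fitted by minimizing this quantity with a numerical optimizer; the sketch only evaluates it at fixed values.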

Cross-protocol calibration

Public benchmarks underestimate real-plant prediction error.

We ran a cross-population study on the Na-ion benchmark to ask: does adding protocol metadata as a feature close the gap between coherent training data and real-plant data? The answer is: partly. Protocol metadata cuts out-of-distribution prediction error nearly in half, but mixing populations still makes main-cohort error about 3× worse.

The right deployment architecture is per-protocol modeling with out-of-distribution detection, not protocol-as-a-feature. Real plant data spans many protocols simultaneously. The pilot calibrates exactly this, and a calibrated system flags unfamiliar cells for manual review rather than emitting confidently wrong predictions.
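One simple way to picture per-protocol routing with out-of-distribution detection is sketched below. It is a deliberately small illustration, not the deployed method: the single scalar feature, the z-score test, the threshold of 3.0, and the names `fit_protocol_stats` and `route_cell` are all assumptions.

```python
from statistics import mean, stdev

def fit_protocol_stats(training_features):
    """Map each protocol to the (mean, std) of one scalar training feature."""
    return {p: (mean(v), stdev(v)) for p, v in training_features.items()}

def route_cell(protocol, feature_value, stats, z_threshold=3.0):
    """Return 'predict' for familiar cells, 'manual_review' otherwise."""
    if protocol not in stats:
        return "manual_review"  # protocol never seen in training
    mu, sigma = stats[protocol]
    if sigma == 0:
        return "manual_review"  # no spread to judge distance against
    z = abs(feature_value - mu) / sigma
    return "predict" if z <= z_threshold else "manual_review"

# Hypothetical training feature values for a single known protocol.
stats = fit_protocol_stats({"fast_charge": [1.0, 1.1, 0.9, 1.0, 1.05]})

familiar = route_cell("fast_charge", 1.02, stats)
outlier = route_cell("fast_charge", 2.0, stats)
unseen = route_cell("new_protocol", 1.0, stats)
```

The design point is that an unfamiliar cell gets an explicit review flag instead of a prediction from a model that was never calibrated for it.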

This is the kind of finding the academic-benchmark literature doesn't report. We do — because pilots stand or fall on it.

Reported transparently

Severson secondary split (held-out novel protocols) reaches 11.9% median in our run; published Severson 2019 reports 8.6% on the same split with the full feature set. CALCE (n=6), NASA PCoE (n=4), and Oxford NMC (n=8) are small public datasets — reported transparently. BatteryLife Na-ion (n=31, HKUST-GZ): three charge protocols, LOO-CV; physics-only baseline and Chronos not run on this benchmark. Cycle-life prediction is the validation regime here; production defect detection is validated separately on customer pilots.