Here’s the short answer: I’d use a smartwatch VO2 max score for trends, but I’d trust a lab test for the actual number. A lab cardiopulmonary exercise test (CPET) measures oxygen use directly and is usually accurate to within about 2% to 3%. Wearables estimate VO2 max from heart rate, pace, and profile data, and the gap can be large enough to matter.
If I boil the article down, it comes to two points:
VO2 max still matters because it points to aerobic fitness and can help track change with age. But the number only makes sense if I know how it was produced: direct measurement vs algorithm estimate.

| Method | How it gets the number | What I’d use it for | Main limit |
|---|---|---|---|
| Apple Watch | Heart rate + outdoor pace + profile data | General trend tracking | Often reads low |
| Garmin | Heart rate + pace/power + workout data | Training trends | Less steady at high fitness levels |
| Fitbit | Resting heart rate + profile data | Casual fitness awareness | Less sensitive to fitness change |
| Lab CPET | Direct gas analysis during exercise | Exact measurement, health and performance use | Takes time, cost, and max effort |
So if I want direction, a watch is fine. If I want measurement, I’d get the lab test.

Fitness trackers don't measure oxygen use the way a lab test does. Instead, they estimate VO2 max from signals like heart rate, pace, power, and your profile data. Apple, Garmin, and Fitbit each use a different mix, which is why the numbers can vary from one device to another.
Apple Watch labels this estimate Cardio Fitness. It uses optical heart rate, GPS pace, and personal details like age, sex, height, and weight. To get a reading, you usually need an outdoor walk, run, or hike that pushes your heart rate to about 30% above resting. Treadmill workouts usually don't count. Apple also averages results over 30 days, so if your fitness has changed lately, the number may lag behind.
This setup tends to work best during steady outdoor exercise, and the estimate can drift when conditions change. A 2025 validation study found that Apple Watch underestimated VO2 max by an average of 6.07 mL/kg/min, with a mean absolute percentage error of 13.31%.
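To make those two rules concrete, here's a minimal Python sketch of how a 30-day average of qualifying outdoor workouts behaves. Apple doesn't publish its algorithm, so everything here is an assumption: the `Workout` record, the per-workout `vo2_estimate` input, and the helpers `qualifies` and `displayed_score` are hypothetical, and the 30% threshold and 30-day window are taken straight from the description above.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Workout:
    day: date
    outdoor: bool        # treadmill and other indoor sessions are False
    avg_hr: float        # average heart rate for the session (bpm)
    vo2_estimate: float  # hypothetical per-workout estimate (mL/kg/min)


def qualifies(w: Workout, resting_hr: float, lift: float = 0.30) -> bool:
    """Assumed rule: outdoor session with HR at least ~30% above resting."""
    return w.outdoor and w.avg_hr >= resting_hr * (1 + lift)


def displayed_score(workouts: List[Workout], resting_hr: float, today: date,
                    window_days: int = 30) -> Optional[float]:
    """Average the qualifying estimates from the last `window_days` days."""
    start = today - timedelta(days=window_days)
    recent = [w.vo2_estimate for w in workouts
              if start < w.day <= today and qualifies(w, resting_hr)]
    if not recent:
        return None  # nothing qualified, so no reading is shown
    return sum(recent) / len(recent)
```

Because the displayed score is an average, one strong new workout only moves it part of the way, which is why the number can lag behind a real change in fitness.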
Garmin uses Firstbeat Analytics. The system looks at how heart rate lines up with running pace or cycling power, while also factoring in GPS speed, elevation, and ambient temperature. It usually needs at least 10 minutes of steady outdoor running above 70% of maximum heart rate before it can produce a reading.
That can work well, but there are limits. In highly trained athletes and at the far ends of the fitness range, mean absolute percentage error can climb to 9.4% to 10.4%. So even when the watch shows a precise-looking number, the estimate can still swing from day to day.
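Firstbeat's model is proprietary, so the sketch below swaps in a classic textbook technique to show the general idea of matching heart rate to pace: convert each steady running pace to an oxygen cost with the ACSM running equation (intended for running rather than walking speeds), fit a straight line of VO2 against heart rate, and extrapolate it to max heart rate. The function names are hypothetical, and the 10-minute, 70%-of-max-HR check simply mirrors the requirement described above.

```python
from typing import List, Tuple


def acsm_running_vo2(speed_m_per_min: float, grade: float = 0.0) -> float:
    """ACSM running equation, mL/kg/min (grade as a fraction, 0.05 = 5%)."""
    return 0.2 * speed_m_per_min + 0.9 * speed_m_per_min * grade + 3.5


def longest_run_above(hr_per_minute: List[float], hr_max: float,
                      fraction: float = 0.70) -> int:
    """Longest streak of consecutive minutes above fraction * HRmax."""
    best = current = 0
    for hr in hr_per_minute:
        current = current + 1 if hr > fraction * hr_max else 0
        best = max(best, current)
    return best


def extrapolate_vo2max(samples: List[Tuple[float, float]], hr_max: float) -> float:
    """Fit a least-squares line of VO2 against HR, then read it at HRmax.

    samples: (speed in m/min, steady heart rate in bpm) pairs.
    """
    xs = [hr for _, hr in samples]
    ys = [acsm_running_vo2(speed) for speed, _ in samples]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need segments at different heart rates")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y + slope * (hr_max - mean_x)
```

A session would only count if `longest_run_above(...)` returns at least 10. With steady segments at 160, 200, and 240 m/min and heart rates of 130, 150, and 170 bpm, the fitted line reaches 59.5 mL/kg/min at a max heart rate of 190, but 63.5 if max heart rate is set to 200. That sensitivity to the max-HR setting is one reason an out-of-date profile can skew the estimate.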

Fitbit reports VO2 max as Cardio Fitness Score. It leans mostly on resting heart rate, along with age, sex, weight, and GPS pace from outdoor runs. Since it doesn't need a set workout to make an estimate, it's easier for more people to use.
The tradeoff is simple: resting heart rate is a weaker signal than exercise data when you're trying to estimate VO2 max. That makes Fitbit's score less sensitive to actual fitness shifts.
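Fitbit doesn't publish its formula either. As a stand-in, here's a published resting-heart-rate method, the ratio estimate from Uth et al. (2004), VO2 max ≈ 15.3 × HRmax / HRrest, paired with the Tanaka age-predicted max heart rate (208 − 0.7 × age). It's a sketch of the general approach, not Fitbit's actual model, and the function names are hypothetical.

```python
def tanaka_hr_max(age_years: float) -> float:
    """Age-predicted max heart rate (Tanaka et al.): 208 - 0.7 * age."""
    return 208 - 0.7 * age_years


def uth_vo2max(hr_max: float, hr_rest: float) -> float:
    """Resting-HR ratio estimate (Uth et al.): 15.3 * HRmax / HRrest, mL/kg/min."""
    if hr_rest <= 0:
        raise ValueError("resting heart rate must be positive")
    return 15.3 * hr_max / hr_rest
```

For a 40-year-old with a resting heart rate of 60 bpm, this gives 180 / 60 × 15.3 ≈ 45.9 mL/kg/min. The only fitness input is resting heart rate, so training that improves aerobic capacity without lowering resting heart rate doesn't move the estimate at all, which is the sensitivity problem described above.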
One of the biggest trouble spots is noisy heart rate data. A loose watch, poor skin contact, or a sensor that struggles for any other reason can all move the estimate even when your fitness hasn't changed.
So if your wearable VO2 max jumps or drops once, don't panic. A one-off change often says more about device noise than your body. That's why these numbers are best used for trend tracking, not as a replacement for lab testing.
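One simple way to separate a one-off jump from a real change is to compare each new reading with the median of the few readings before it. This is a hypothetical sketch: the 5-reading window and the 3 mL/kg/min threshold are illustrative assumptions, not values any device uses.

```python
from statistics import median
from typing import List


def flag_one_off_changes(readings: List[float], window: int = 5,
                         threshold: float = 3.0) -> List[bool]:
    """Flag readings that sit far from the median of the preceding readings.

    window and threshold (mL/kg/min) are illustrative assumptions.
    """
    flags = []
    for i, value in enumerate(readings):
        history = readings[max(0, i - window):i]
        if len(history) < 3:
            flags.append(False)  # too little history to judge
            continue
        flags.append(abs(value - median(history)) > threshold)
    return flags
```

In the sequence 45.0, 45.5, 44.8, 45.2, 50.1, 45.1, only the 50.1 reading gets flagged, and the next reading drops straight back, which is what device noise usually looks like.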

When numbers from a watch or app start shaping your training or health choices, CPET is the baseline test.
A Cardiopulmonary Exercise Test (CPET) is the reference standard for VO2 max measurement because it measures gas exchange directly: a mask connected to a gas-analysis system tracks the oxygen you take up and the carbon dioxide you breathe out.
In plain English, a lab CPET measures VO2 max directly. It tracks how much oxygen your body uses and how much carbon dioxide you produce while you exercise. You begin at a comfortable pace on a treadmill or cycle ergometer, then the workload goes up every 1 to 3 minutes until you can't continue. A plateau in oxygen use helps confirm maximal effort.
The exercise part usually lasts 8 to 14 minutes, and a well-run test is accurate to within ±1–2 mL/kg/min.
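A plateau check can be sketched in a few lines. One commonly cited criterion treats VO2 as having plateaued when it rises by less than about 150 mL/min after a workload increase; labs differ on the exact cutoff and usually look at other signs too, such as RER and heart rate, so the threshold and the function names here are assumptions.

```python
from typing import List, Tuple


def shows_plateau(stages: List[Tuple[float, float]],
                  max_rise_ml_min: float = 150.0) -> bool:
    """True if VO2 barely rose over the last workload step.

    stages: (workload in watts, VO2 in mL/min) per stage, in test order.
    """
    if len(stages) < 2:
        return False
    (load_prev, vo2_prev), (load_last, vo2_last) = stages[-2], stages[-1]
    return load_last > load_prev and (vo2_last - vo2_prev) < max_rise_ml_min


def relative_vo2(vo2_ml_min: float, body_mass_kg: float) -> float:
    """Convert absolute VO2 (mL/min) to the usual mL/kg/min units."""
    return vo2_ml_min / body_mass_kg
```

For a 70 kg rider whose last two stages read 3,550 and 3,620 mL/min, the rise is only 70 mL/min, so the check passes, and the peak works out to about 51.7 mL/kg/min.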
That peak VO2 max number matters, of course. But the test does more than hand you one score. CPET also maps your ventilatory thresholds (VT1 and VT2), which are the points where your breathing changes and your body shifts between aerobic and anaerobic metabolism. For many people, those thresholds matter more for training because they set your effort zones.
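Here's a minimal sketch of how two thresholds turn into training zones, using a simple three-zone split (below VT1, between VT1 and VT2, above VT2). The three-zone convention, the heart-rate anchors, and the helper names are assumptions for illustration; a lab report may express thresholds as power or pace instead.

```python
from typing import Dict, List


def zone_for(hr: float, vt1_hr: float, vt2_hr: float) -> int:
    """Three-zone model: 1 below VT1, 2 from VT1 up to VT2, 3 at or above VT2."""
    if not vt1_hr < vt2_hr:
        raise ValueError("VT1 must sit below VT2")
    if hr < vt1_hr:
        return 1
    if hr < vt2_hr:
        return 2
    return 3


def time_in_zones(hr_per_minute: List[float], vt1_hr: float,
                  vt2_hr: float) -> Dict[int, int]:
    """Count minutes spent in each zone."""
    counts = {1: 0, 2: 0, 3: 0}
    for hr in hr_per_minute:
        counts[zone_for(hr, vt1_hr, vt2_hr)] += 1
    return counts
```

With VT1 at 140 bpm and VT2 at 165 bpm, a session logged at 130, 135, 150, and 170 bpm spends two minutes in zone 1 and one minute each in zones 2 and 3.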
There’s also a sport-specific piece here. Treadmill VO2 max results usually come in 5–10% higher than cycling results because running uses more muscle mass. So if you’re a cyclist, runner, or triathlete, the test setup should match how you train.
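The 5–10% rule of thumb is easy to turn into a quick sanity check when comparing results from different test modes. This sketch assumes the gap falls in exactly that band for everyone, which real individuals won't always match; the function names are hypothetical.

```python
from typing import Tuple


def expected_treadmill_range(cycling_vo2: float, low: float = 0.05,
                             high: float = 0.10) -> Tuple[float, float]:
    """Treadmill range implied by an assumed 5-10% treadmill advantage."""
    return cycling_vo2 * (1 + low), cycling_vo2 * (1 + high)


def gap_explained_by_mode(cycling_vo2: float, treadmill_vo2: float) -> bool:
    """True if the treadmill result sits inside the expected band."""
    lo, hi = expected_treadmill_range(cycling_vo2)
    return lo <= treadmill_vo2 <= hi
```

A cyclist with a 50 mL/kg/min bike test who later scores 54 on a treadmill is inside the expected 52.5 to 55 range, so that jump says more about the test mode than about fitness.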
It also captures the respiratory exchange ratio (RER) and substrate-oxidation data, which show when your body leans more on fat versus carbohydrate during exercise. Wearables can't provide that kind of detail. Benchmark Body Metrics can pair CPET with body-composition, resting-metabolic-rate, and blood testing to build a broader health picture. Still, the main reason CPET is used as the benchmark for comparison is that it measures gas exchange directly instead of estimating it.
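RER is simply VCO2 divided by VO2, and fat and carbohydrate use are commonly estimated from the same two numbers with Frayn's (1983) equations. The sketch below drops the protein term (which needs urinary nitrogen) and clamps negative values to zero; both are simplifying assumptions, the function name is hypothetical, and the estimates stop being meaningful at high intensities where buffering pushes RER toward or above 1.0.

```python
from typing import Dict


def substrate_use(vo2_l_min: float, vco2_l_min: float) -> Dict[str, float]:
    """RER plus Frayn-style fat and carbohydrate oxidation (g/min), protein ignored."""
    rer = vco2_l_min / vo2_l_min
    fat = 1.67 * vo2_l_min - 1.67 * vco2_l_min
    cho = 4.55 * vco2_l_min - 3.21 * vo2_l_min
    # Negative values appear outside the ~0.7-1.0 RER range; clamp to zero.
    return {"rer": rer, "fat_g_min": max(fat, 0.0), "cho_g_min": max(cho, 0.0)}
```

At a VO2 of 2.0 L/min and a VCO2 of 1.7 L/min, RER is 0.85, and the equations give roughly 0.50 g/min of fat and 1.3 g/min of carbohydrate.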
Smartwatch vs Lab VO2 Max: Accuracy Comparison Guide
The short answer is simple: wearables are good for trends, not exact numbers.
A watch can help you see whether your fitness seems to be moving up, down, or staying flat. But the error range is wide enough that you shouldn't treat the result like a clinical measurement.
Apple Watch Series 9 and Ultra 2 tend to come in low. On average, they underestimate VO2 max by 6.07 mL/kg/min, with a Mean Absolute Percentage Error (MAPE) of 13.31%.
Garmin devices usually do better in moderately trained users, with MAPE values around 2.8% to 4.1%. But that edge shrinks in highly trained athletes. When lab-tested VO2 max goes above 59.8 mL/kg/min, MAPE climbs to 9.4% to 10.4%, and underestimation can reach 6.3 mL/kg/min.
Fitbit-style resting-heart-rate estimates are less dependable. These methods show a pooled overestimation bias of 2.17 mL/kg/min, with much wider limits of agreement of -13.07 to 17.41 mL/kg/min.
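The three numbers used above (average bias, MAPE, and limits of agreement) are straightforward to compute from paired lab and watch readings. Here's a minimal sketch using the standard Bland-Altman convention of bias ± 1.96 standard deviations of the differences; the function name is hypothetical, and the test data are made up for illustration, not taken from any study.

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def agreement(lab: Sequence[float], watch: Sequence[float]) -> Dict[str, float]:
    """Bias, MAPE (%), and 95% limits of agreement for paired readings."""
    if len(lab) != len(watch) or len(lab) < 2:
        raise ValueError("need at least two paired readings")
    diffs = [w - l for l, w in zip(lab, watch)]  # positive = watch reads high
    bias = mean(diffs)
    sd = stdev(diffs)
    mape = 100 * mean(abs(d) / l for l, d in zip(lab, diffs))
    return {"bias": bias, "mape_pct": mape,
            "loa_low": bias - 1.96 * sd, "loa_high": bias + 1.96 * sd}
```

The same convention lets you sanity-check the resting-heart-rate figures: the midpoint of -13.07 and 17.41 is 2.17, matching the reported bias, and the half-width of 15.24 implies a standard deviation of differences of roughly 7.8 mL/kg/min, assuming the study used 1.96 SD limits.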
So the main issue isn't whether wearables can estimate VO2 max. They can. The issue is how far the estimate may drift from a lab result.
Table 1: VO2 max methods side by side
| Method | Data Source | Estimated or Measured | Typical Error Pattern | Best Use Case |
|---|---|---|---|---|
| Apple Watch (Series 9/Ultra 2) | Heart rate, GPS pace, age, sex | Algorithmic estimate | Underestimates by about 6.07 mL/kg/min on average; MAPE 13.31% | General health and trend tracking |
| Garmin (Forerunner/Fenix) | Heart rate, GPS speed/distance, age, sex | Algorithmic estimate | MAPE about 2.8%–4.1% in moderately trained users; higher error in highly trained athletes | Athletic training and performance trends |
| Fitbit | Resting heart rate, user profile | Algorithmic estimate | Resting-heart-rate methods overestimate by about 2.17 mL/kg/min on average (pooled); widest limits of agreement, -13.07 to 17.41 mL/kg/min | Casual health awareness |
| Benchmark Body Metrics CPET | Direct gas analysis (O₂/CO₂) | Directly measured | Lab reference method; technical error typically under 3% | Clinical diagnosis, elite performance, medical decisions |
That gap is what separates a number that's good enough for tracking from one you'd trust for a medical call or high-stakes performance testing.
Wearable VO2 max works best for trend tracking in healthy, recreationally active users. It should not be used on its own for medical decisions or elite performance calls.
These devices tend to work best when the estimate comes from exercise-based data, especially with a strong GPS signal and a steady heart-rate signal. If your number moves in a steady direction over several weeks or months, that pattern usually tells you more than a single reading ever could.
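Where things break down is when precision matters. A watch estimate is not enough for medical decisions, surgical risk assessment, performance diagnostics, or working up unexplained symptoms. In those cases, relying on wearable data alone is risky.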
The same caution applies to older adults and people with cardiovascular conditions. In those groups, the algorithms are less predictable.
The practical difference isn't just about accuracy. It's about what each method can actually do for you.
Table 2: Pros, cons, and best fit
| Feature | Wearables (Apple/Garmin/Fitbit) | Lab CPET at Benchmark Body Metrics |
|---|---|---|
| Accuracy | Moderate; error varies by device and fitness level | High; lab reference method, typically within about 2% to 3% technical error |
| Convenience | High; automatic, continuous, no appointment needed | Lower; requires scheduling and a supervised maximal-effort test |
| Depth of insight | VO2 max estimate only | VO2 max plus thresholds and substrate-use data |
| Best fit | Trend tracking for healthy, recreationally active adults | Clinical evaluation, surgical risk assessment, performance diagnostics, unexplained symptoms |
The better question isn't whether your watch nails the exact number. It's whether the trend tells you something useful.
Across Apple Watch, Garmin, and Fitbit, the basic idea is the same: these devices work best as trend tools first and measurement tools second. Small ups and downs often come from device noise, not a change in fitness. So focus on the pattern over several months, not the little swings from one week to the next.
Your setup matters too. If your profile is out of date, the estimate will drift with it, so keep your details current, especially your max heart rate.
Once your watch number is useful as a trend line, the next step is figuring out when exactness matters.
A wearable estimate isn't enough when the number needs to be precise. That's when a lab test makes sense: for setting exact training zones, working out why your progress has plateaued, or establishing a medical baseline. This matters even more for adults over 45 with cardiovascular risk factors.
The reason is the size of the gap. Laboratory testing is accurate to within about 1–2 mL/kg/min, while wearable estimates are often off by 5–8 mL/kg/min or more.
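A rough way to see why that gap matters is to put an error band around two readings and check whether they overlap. This is a crude illustration, not a formal statistical test; the ±2 and ±6 mL/kg/min bands are assumptions picked from the ranges above, and the helper names are hypothetical.

```python
from typing import Tuple


def band(reading: float, error: float) -> Tuple[float, float]:
    """Simple symmetric error band around a reading (mL/kg/min)."""
    return reading - error, reading + error


def distinguishable(before: float, after: float, error: float) -> bool:
    """Crude check: True only if the two error bands don't overlap."""
    lo_b, hi_b = band(before, error)
    lo_a, hi_a = band(after, error)
    return hi_b < lo_a or hi_a < lo_b
```

A real gain from 45 to 50 mL/kg/min shows up clearly with ±2 bands (47 vs 48), but with ±6 bands the two readings overlap (39 to 51 vs 44 to 56), so a watch alone couldn't confirm the change.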
If you're serious about using the data to guide training or health choices, periodic lab testing can help calibrate what your watch is telling you.
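One simple form of calibration is to record the gap between a lab result and a watch reading taken around the same time, then add that offset to later watch readings. It's a sketch under a strong assumption, that the watch's error stays constant, which the Garmin data above suggest may not hold as fitness climbs, so the offset should be refreshed at each lab test. The helper names are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def calibration_offset(pairs: Sequence[Tuple[float, float]]) -> float:
    """Mean of (lab - watch) across tests taken close together in time."""
    if not pairs:
        raise ValueError("need at least one lab/watch pair")
    return mean(lab - watch for lab, watch in pairs)


def calibrated(watch_reading: float, offset: float) -> float:
    """Shift a later watch reading by the assumed-constant offset."""
    return watch_reading + offset
```

Two lab tests that came in 6.0 and 5.0 mL/kg/min above the watch give an average offset of 5.5, so a later watch reading of 47.5 would be read as about 53.0.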
The choice comes down to purpose: tracking or measurement. Use wearables for direction. Use a lab test for precision.
Treat watch VO2 max as a long-term trend, not something to judge day by day. These estimates often rely on rolling averages, so small daily shifts can give the wrong impression.
A better way to use the data is to compare your trend over weeks or months. If your training changes or you want to check your progress, looking at it every 6 to 8 weeks makes more sense.
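Here's a minimal sketch of that kind of check: average the readings in the most recent 8-week block and compare them with the 8 weeks before. The block length, the `(date, value)` input format, and the function name are assumptions; switching to 6 weeks only means changing one argument.

```python
from datetime import date, timedelta
from statistics import mean
from typing import List, Optional, Tuple


def block_change(readings: List[Tuple[date, float]], today: date,
                 weeks: int = 8) -> Optional[float]:
    """Mean of the latest block minus the mean of the block before it."""
    span = timedelta(weeks=weeks)
    recent = [v for d, v in readings if today - span < d <= today]
    earlier = [v for d, v in readings if today - 2 * span < d <= today - span]
    if not recent or not earlier:
        return None  # not enough data in one of the blocks
    return mean(recent) - mean(earlier)
```

With readings of 44.0 and 44.5 in the earlier block and 46.0 and 46.5 in the recent one, the function reports a change of +2.0 mL/kg/min, which says more than any single day's reading.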
Treadmill workouts can affect your VO2 max estimate, but the result may be less accurate.
Many smartwatches rely on GPS data from outdoor runs, so treadmill sessions are often left out. And when a watch does count them, the estimate can get skewed. That's because treadmill runs miss outside factors like wind resistance and GPS-checked pace. On top of that, a set treadmill speed doesn't always line up with your heart rate the same way an outdoor run does.
A lab VO2 max test makes the most sense for athletes who want objective numbers to dial in training or track strict performance goals. It also matters in clinical settings, because consumer wearables aren't reliable enough for cardiovascular risk screening or medical diagnosis.
That said, it isn't a fit for everyone. The test puts a heavy load on the heart and lungs, and people over 45 or those with risk factors may need professional supervision. Results can also come in lower than they should if someone isn't trained enough to push to true exhaustion.