June 28, 2026

How do smartwatch and fitness tracker VO2 max values compare to a real test?

Here’s the short answer: I’d use a smartwatch VO2 max score for trends, but I’d trust a lab test for the actual number. A lab CPET measures oxygen use directly and is usually within about ±2% to 3%. Wearables estimate VO2 max from heart rate, pace, and profile data, and the gap can be large enough to matter.

If I boil the article down, these are the main points:

  • Apple Watch often reads low. One study found an average underestimation of 6.07 mL/kg/min and a mean absolute percentage error (MAPE) of 13.31%.
  • Garmin is often closer for moderately trained users, with error around 2.8% to 4.1%, but it can miss more in highly trained athletes.
  • Fitbit is easier to use, but its resting-heart-rate approach is less precise, so individual readings can land further from the lab value.
  • Lab CPET is the best choice when I need exact training zones, symptom checks, or a baseline for health decisions.
  • One device reading means less than the trend over weeks or months.

VO2 max still matters because it points to aerobic fitness and can help track change with age. But the number only makes sense if I know how it was produced: direct measurement vs algorithm estimate.

Quick Comparison

| Method | How it gets the number | What I’d use it for | Main limit |
|---|---|---|---|
| Apple Watch | Heart rate + outdoor pace + profile data | General trend tracking | Often reads low |
| Garmin | Heart rate + pace/power + workout data | Training trends | Less steady at high fitness levels |
| Fitbit | Resting heart rate + profile data | Casual fitness awareness | Less sensitive to fitness change |
| Lab CPET | Direct gas analysis during exercise | Exact measurement, health and performance use | Takes time, cost, and max effort |

So if I want direction, a watch is fine. If I want measurement, I’d get the lab test.

How Apple Watch, Garmin, and other fitness trackers estimate VO2 max

Fitness trackers don't measure oxygen use the way a lab test does. Instead, they estimate VO2 max from signals like heart rate, pace, power, and your profile data. Apple, Garmin, and Fitbit each use a different mix, which is why the numbers can vary from one device to another.

Apple Watch: heart rate and outdoor pace-based estimates

Apple Watch labels this estimate Cardio Fitness. It uses optical heart rate, GPS pace, and personal details like age, sex, height, and weight. To get a reading, you usually need an outdoor walk, run, or hike that pushes your heart rate to about 30% above resting. Treadmill workouts usually don't count. Apple also averages results over 30 days, so if your fitness has changed lately, the number may lag behind.

A 2025 validation study found that Apple Watch underestimated VO2 max by an average of 6.07 mL/kg/min, with a mean absolute percentage error of 13.31%. That setup tends to work best during steady outdoor exercise. Change the conditions, though, and the estimate can drift.
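To make the two error figures above concrete, here is a minimal sketch of how mean bias and MAPE are typically computed from paired watch and lab readings. The function names and the sample numbers are hypothetical and are not taken from the study; they only illustrate the arithmetic.

```python
def mean_bias(watch, lab):
    """Average signed error (watch minus lab), in mL/kg/min.

    A negative result means the watch reads low on average.
    """
    return sum(w - l for w, l in zip(watch, lab)) / len(lab)


def mape(watch, lab):
    """Mean absolute percentage error, relative to the lab value."""
    return 100 * sum(abs(w - l) / l for w, l in zip(watch, lab)) / len(lab)


# Hypothetical paired readings for illustration, not study data.
lab_values = [40.0, 50.0, 60.0]
watch_values = [36.0, 45.0, 51.0]

print(round(mean_bias(watch_values, lab_values), 2))
print(round(mape(watch_values, lab_values), 2))
```

Bias shows the direction of the error, while MAPE shows its typical size regardless of direction, which is why studies usually report both.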

Garmin: algorithm-based fitness scores from exercise data

Garmin uses Firstbeat Analytics. The system looks at how heart rate lines up with running pace or cycling power, while also factoring in GPS speed, elevation, and ambient temperature. It usually needs at least 10 minutes of steady outdoor running above 70% of maximum heart rate before it can produce a reading.

That can work well, but there are limits. In highly trained athletes and at the far ends of the fitness range, error rates can climb to 9.4% to 10.4%. So even when the watch seems precise, the estimate can still swing from day to day.
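The 10-minute, 70% requirement described above can be pictured as a simple check on a heart-rate recording. The sketch below is only an illustration of that rule of thumb; it is not Firstbeat's or Garmin's actual logic, and the function names, the one-sample-per-second default, and the "longest continuous streak" interpretation are all assumptions.

```python
def longest_streak_above(hr_samples, max_hr, fraction=0.70):
    """Longest run of consecutive samples above fraction * max_hr."""
    threshold = fraction * max_hr
    best = current = 0
    for hr in hr_samples:
        current = current + 1 if hr > threshold else 0
        best = max(best, current)
    return best


def may_qualify(hr_samples, max_hr, sample_seconds=1, min_minutes=10):
    """True if the recording holds one long-enough streak above the threshold.

    Assumes evenly spaced samples, sample_seconds apart (hypothetical format).
    """
    streak_seconds = longest_streak_above(hr_samples, max_hr) * sample_seconds
    return streak_seconds >= min_minutes * 60


# Hypothetical run: 5 easy minutes, then 10 minutes at 150 bpm (max HR 190).
recording = [120] * 300 + [150] * 600
print(may_qualify(recording, max_hr=190))
```

A run with a short dip below the threshold would reset the streak in this sketch, which mirrors why stop-and-go workouts often fail to produce a reading.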

Fitbit: resting heart rate and profile-based estimates

Fitbit reports VO2 max as Cardio Fitness Score. It leans mostly on resting heart rate, along with age, sex, weight, and GPS pace from outdoor runs. Since it doesn't need a set workout to make an estimate, it's easier for more people to use.

The tradeoff is simple: resting heart rate is a weaker signal than exercise data when you're trying to estimate VO2 max. That makes Fitbit's score less sensitive to actual fitness shifts.

What can throw off wearable estimates

One of the biggest trouble spots is noisy heart rate data. If the watch is loose, the skin contact is poor, or the sensor struggles for any reason, the estimate can move even when your fitness hasn't.

Common issues include:

  • Watch fit and sensor noise: A loose watch, tattoos, extreme heat or cold, and heavy movement can interfere with optical heart rate tracking.
  • Medications: Beta-blockers lower heart rate, which can make your cardiovascular system seem more efficient than it is.
  • Terrain and activity type: Hills, treadmill sessions, and stop-and-go workouts can break the pace-to-heart-rate pattern the device relies on.
  • Physiology and recovery: Arrhythmias, dehydration, fatigue, caffeine, alcohol, and illness can all shift heart rate for reasons that have nothing to do with fitness.

So if your wearable VO2 max jumps or drops once, don't panic. A one-off change often says more about device noise than your body. That's why these numbers are best used for trend tracking, not as a replacement for lab testing.

What a real VO2 max test measures at Benchmark Body Metrics

When numbers from a watch or app start shaping your training or health choices, CPET is the baseline test.

CPET: direct gas analysis during graded exercise

A Cardiopulmonary Exercise Test (CPET) is the reference standard for VO2 max measurement because it directly measures the volume of oxygen inhaled and carbon dioxide exhaled using a gas-analysis system and a mask.

In plain English, a lab CPET measures VO2 max directly. It tracks how much oxygen your body uses and how much carbon dioxide you produce while you exercise. You begin at a comfortable pace on a treadmill or cycle ergometer, then the workload goes up every 1 to 3 minutes until you can't continue. A plateau in oxygen use helps confirm maximal effort.

The exercise part usually lasts 8 to 14 minutes, and a well-run test is accurate to within ±1–2 mL/kg/min.

That peak VO2 max number matters, of course. But the test does more than hand you one score. CPET also maps your ventilatory thresholds (VT1 and VT2), which are the points where your breathing changes and your body shifts between aerobic and anaerobic metabolism. For many people, those thresholds matter more for training because they set your effort zones.

There’s also a sport-specific piece here. Treadmill VO2 max results usually come in 5–10% higher than cycling results because running uses more muscle mass. So if you’re a cyclist, runner, or triathlete, the test setup should match how you train.

What the test adds beyond VO2 max

CPET gives you more than a single fitness score. Benchmark Body Metrics can pair CPET with body-composition, resting-metabolic-rate, and blood testing to build a broader health picture.

It also captures respiratory exchange ratio (RER) and substrate-oxidation data, which show when your body leans more on fat versus carbohydrate during exercise. Wearables can't provide that kind of detail. And that’s the main reason CPET is used as the benchmark for comparison.

Wearables vs lab VO2 max: accuracy, limits, and best use cases

Smartwatch vs Lab VO2 Max: Accuracy Comparison Guide

How accurate are device estimates compared to lab results

The short answer is simple: wearables are good for trends, not exact numbers.

A watch can help you see whether your fitness seems to be moving up, down, or staying flat. But the error range is wide enough that you shouldn't treat the result like a clinical measurement.

Apple Watch Series 9 and Ultra 2 tend to come in low. On average, they underestimate VO2 max by 6.07 mL/kg/min, with a Mean Absolute Percentage Error (MAPE) of 13.31%.

Garmin devices usually do better in moderately trained users, with MAPE values around 2.8% to 4.1%. But that edge shrinks in highly trained athletes. When lab-tested VO2 max goes above 59.8 mL/kg/min, MAPE climbs to 9.4% to 10.4%, and underestimation can reach 6.3 mL/kg/min.

Fitbit-style resting-heart-rate estimates are less dependable. These methods show a pooled overestimation bias of 2.17 mL/kg/min, with much wider limits of agreement of -13.07 to 17.41 mL/kg/min.

So the main issue isn't whether wearables can estimate VO2 max. They can. The issue is how far the estimate may drift from a lab result.
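The limits of agreement quoted above for resting-heart-rate estimates can be unpacked with a little arithmetic. This sketch assumes the standard Bland-Altman form (limits = bias ± 1.96 × SD of the differences) and assumes the difference is defined as estimate minus lab; neither detail is stated in the figures themselves, and the helper name is hypothetical.

```python
bias = 2.17                    # pooled overestimation, mL/kg/min
lower, upper = -13.07, 17.41   # limits of agreement, mL/kg/min

# Under the assumed form, the limits are centered on the bias.
midpoint = (lower + upper) / 2

# Standard deviation of the differences implied by the width of the limits.
implied_sd = (upper - lower) / (2 * 1.96)


def plausible_lab_range(estimate):
    """Lab values consistent with an estimate, given the limits above.

    Assumes difference = estimate - lab, so lab = estimate - difference.
    """
    return estimate - upper, estimate - lower


print(round(midpoint, 2), round(implied_sd, 2))
print(plausible_lab_range(45.0))
```

Under these assumptions, a resting-heart-rate estimate of 45 mL/kg/min is compatible with a lab value anywhere from the high 20s to the high 50s, which is why a small average bias can still hide large individual misses.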

Table 1: VO2 max methods side by side

| Method | Data Source | Estimated or Measured | Typical Error Pattern | Best Use Case |
|---|---|---|---|---|
| Apple Watch (Series 9/Ultra 2) | Heart rate, GPS pace, age, sex | Algorithmic estimate | Underestimates by about 6.07 mL/kg/min on average; MAPE 13.31% | General health and trend tracking |
| Garmin (Forerunner/Fenix) | Heart rate, GPS speed/distance, age, sex | Algorithmic estimate | MAPE about 2.8%–4.1% in moderately trained users; higher error in highly trained athletes | Athletic training and performance trends |
| Fitbit | Resting heart rate, user profile | Algorithmic estimate | Resting-heart-rate-based estimate; widest error range | Casual health awareness |
| Benchmark Body Metrics CPET | Direct gas analysis (O₂/CO₂) | Directly measured | Lab reference method; technical error typically under 3% | Clinical diagnosis, elite performance, medical decisions |

That gap is what separates a number that's good enough for tracking from one you'd trust for a medical call or high-stakes performance testing.

When wearable VO2 max works best and when it falls short

Wearable VO2 max works best for trend tracking in healthy, recreationally active users. It should not be used on its own for medical decisions or elite performance calls.

These devices tend to work best when the estimate comes from exercise-based data, especially with a strong GPS signal and a steady heart-rate signal. If your number moves in a steady direction over several weeks or months, that pattern usually tells you more than a single reading ever could.

Where things break down is when precision matters. A watch estimate is not enough for:

  • medical decisions
  • unexplained shortness of breath
  • chest symptoms
  • surgical risk assessment
  • elite performance diagnostics

In those cases, relying on wearable data alone is risky.

The same caution applies to older adults and people with cardiovascular conditions. In those groups, the algorithms are less predictable.

Comparison tables: methods, pros and cons, and best fit by use case

The practical difference isn't just about accuracy. It's about what each method can actually do for you.

Table 2: Pros, cons, and best fit

| Feature | Wearables (Apple/Garmin/Fitbit) | Lab CPET at Benchmark Body Metrics |
|---|---|---|
| Accuracy | Moderate; error varies by device and fitness level | High - lab reference method, typically within about 2% to 3% technical error |
| Convenience | High - automatic, continuous, no appointment needed | Lower - requires scheduling and a supervised maximal effort test |
| Depth of insight | VO2 max estimate only | VO2 max plus thresholds and substrate-use data |
| Best fit | Trend tracking for healthy, recreationally active adults | Clinical evaluation, surgical risk assessment, performance diagnostics, unexplained symptoms |

How to use your VO2 max number and when to get a real test

The better question isn't whether your watch nails the exact number. It's whether the trend tells you something useful.

Across Apple Watch, Garmin, and Fitbit, the basic idea is the same: these devices work best as trend tools first and measurement tools second. Small ups and downs often come from device noise, not a change in fitness. So focus on the pattern over several months, not the little swings from one week to the next.

Your setup matters too. If your profile is out of date, your estimate will be out of date too. Keep it current, especially your max heart rate.
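One way to look at the pattern rather than the week-to-week swings is to fit a straight line through your readings and check its slope. The sketch below is a plain least-squares fit; the function name, the one-reading-per-week spacing, and the sample values are assumptions for illustration, not anything a watch app exposes.

```python
def weekly_slope(readings):
    """Least-squares slope of evenly spaced weekly readings.

    Returns the average change per week in mL/kg/min.
    Needs at least two readings.
    """
    n = len(readings)
    weeks = range(n)
    mean_x = sum(weeks) / n
    mean_y = sum(readings) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(weeks, readings))
    denominator = sum((x - mean_x) ** 2 for x in weeks)
    return numerator / denominator


# Hypothetical weekly watch readings: noisy, but drifting upward.
readings = [42.0, 43.0, 41.5, 43.5, 42.5, 44.0, 43.0, 44.5]
print(round(weekly_slope(readings), 2))
```

A slope that stays positive or negative across a couple of months says more than any single jump, since one noisy reading barely moves the fitted line.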

When a lab VO2 max test is worth the time and cost

Once your watch number is useful as a trend line, the next step is figuring out when exactness matters.

A wearable estimate isn't enough when the number needs to be precise. That's when a lab test makes sense: for exact training zones, figuring out a plateau, or setting a medical baseline. This matters even more for adults over 45 with cardiovascular risk factors.

Here's why that matters. Laboratory testing is accurate to within 1–2 mL/kg/min, while wearable estimates are often off by about 5–8+ mL/kg/min.

If you're serious about using the data to guide training or health choices, periodic lab testing can help calibrate what your watch is telling you.
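As a rough sketch of what calibrating against a lab result could look like, the example below stores the gap between a lab value and the watch reading from around the same time, then applies it to later readings. It assumes the gap stays constant, which is a simplification the article does not claim and which may not hold as fitness changes; the function names and numbers are hypothetical.

```python
def calibration_offset(lab_value, watch_value):
    """Lab minus watch, from readings taken around the same time (mL/kg/min)."""
    return lab_value - watch_value


def adjusted_estimate(watch_value, offset):
    """Shift a later watch reading by the stored offset."""
    return watch_value + offset


# Hypothetical numbers for illustration only.
offset = calibration_offset(lab_value=48.0, watch_value=43.0)
print(adjusted_estimate(44.5, offset))
```

Repeating the lab test periodically lets you refresh the offset instead of trusting an old one.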

Conclusion: the main takeaway for device users

The choice comes down to purpose: tracking or measurement. Use wearables for direction. Use a lab test for precision.

FAQs

How often should I compare my watch VO2 max trend?

Treat watch VO2 max as a long-term trend, not something to judge day by day. These estimates often rely on rolling averages, so small daily shifts can give the wrong impression.

A better way to use the data is to compare your trend over weeks or months. If your training changes or you want to check your progress, looking at it every 6 to 8 weeks makes more sense.

Can treadmill workouts affect my VO2 max estimate?

Yes, but it depends on the device. Some watches ignore treadmill workouts entirely, and when a watch does count them, the estimate may be less accurate.

Many smartwatches rely on GPS data from outdoor runs, so treadmill sessions are often left out. And when a watch does count them, the estimate can get skewed. That's because treadmill runs miss outside factors like wind resistance and GPS-checked pace. On top of that, a set treadmill speed doesn't always line up with your heart rate the same way an outdoor run does.

Who should get a lab VO2 max test?

A lab VO2 max test makes the most sense for athletes who want objective numbers to dial in training or track strict performance goals. It also matters in clinical settings, because consumer wearables aren't reliable enough for cardiovascular risk screening or medical diagnosis.

That said, it isn't a fit for everyone. The test puts a heavy load on the heart and lungs, and people over 45 or those with risk factors may need professional supervision. Results can also come in lower than they should if someone isn't trained enough to push to true exhaustion.
