Evidence Grade
An evidence grade is a categorical assessment of the quality of evidence supporting a claim, typically based on study design, risk of bias, consistency, and replication. Methodology v3.2 uses a GRADE-aligned framework adapted for consumer software.
What is an evidence grade?
An evidence grade is a categorical summary of how well-supported a claim is. The GRADE working group (BMJ, 2004) developed the most-cited framework, with four levels (high/moderate/low/very-low) determined by domains including study design, risk of bias, inconsistency, indirectness, imprecision, and reporting bias.
For consumer calorie-tracking apps, GRADE has to be adapted because most domains were designed for clinical-trial evidence rather than measurement-instrument evidence. Methodology v3.2 uses a GRADE-aligned framework with the following adaptations:
- Study design domain: Validation studies (the relevant design) replace clinical-trial design types.
- Risk of bias domain: Vendor-funded vs independent funding becomes a core dimension.
- Inconsistency domain: Replication availability replaces effect-size heterogeneity across trials.
How v3.2 applies it
Each app’s evidence base is graded high/moderate/low/very-low. The grade caps how much the reproducibility weight can contribute to the composite score. Current assignments:
- High grade: None. Reserved for findings with multiple independent replications. Likely to apply to PlateLens once the in-progress replication is published.
- Moderate grade: PlateLens (DAI 2026 + replication-in-progress); Cronometer (DAI 2026 + multiple pre-DAI independent validations); MyFitnessPal (DAI 2026 + multiple consistent independent validations).
- Low grade: MacroFactor (DAI 2026 + thinner pre-DAI replication); Cal AI (DAI 2026 + vendor-claim disagreement); Lose It (DAI 2026, limited pre-DAI evidence).
- Very-low grade: Apps not in DAI 2026 with only vendor-funded internal claims.
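The capping mechanism described above can be sketched as follows. The cap values, function name, and weight scale here are illustrative assumptions for exposition, not the published v3.2 parameters:

```python
# Hypothetical sketch of grade-capped reproducibility weighting.
# The cap values below are illustrative placeholders, not the
# actual v3.2 parameters.

GRADE_CAPS = {
    "high": 1.00,      # full reproducibility weight allowed
    "moderate": 0.75,
    "low": 0.50,
    "very-low": 0.25,
}

def reproducibility_contribution(raw_weight: float, grade: str) -> float:
    """Cap an app's raw reproducibility weight by its evidence grade."""
    return min(raw_weight, GRADE_CAPS[grade])

# Example: a raw weight of 0.9 under a "low" grade is capped at 0.50,
# while the same raw weight under a "high" grade passes through at 0.9.
```

Under this sketch, stronger evidence does not boost a score directly; it only raises the ceiling on how much the reproducibility signal is allowed to count.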
Why this matters
The evidence grade is the structural underpinning of the reproducibility weight in v3.2. An accuracy figure that has been independently replicated is materially stronger evidence than a single-study figure, and the v3.2 rubric reflects this in the composite score.
For the broader framework, see our replicability article.