Add readability check #2349

@kevinmessiaen

Description

Summary

Add a built-in check that computes a readability score (e.g., Flesch-Kincaid grade level) for the model output and validates it against a threshold.

Implementation Guide

Pattern to follow

Reference: libs/giskard-checks/src/giskard/checks/builtin/comparison.py (threshold-based)

Steps

  1. Add to libs/giskard-checks/src/giskard/checks/builtin/nlp_metrics.py
  2. Register with @Check.register("readability")
  3. Implement async run(self, trace: Trace) -> CheckResult
  4. Support:
    • key: JSONPathStr — JSONPath for output (default: trace.last.outputs)
    • metric: Literal["flesch_reading_ease", "flesch_kincaid_grade", "gunning_fog"] = "flesch_reading_ease"
    • min_score: float | None = None — minimum readability score
    • max_score: float | None = None — maximum readability score
  5. Use textstat library for computation
  6. Include readability score as a Metric
  7. Add textstat to the nlp optional dependency group
  8. Add tests in tests/builtin/test_nlp_metrics.py

Example usage

from giskard.checks import Readability, Scenario

# Ensure response is easy to read (Flesch Reading Ease > 60 = "standard")
scenario = (
    Scenario(name="readable_response")
    .interact(inputs="Explain quantum computing", outputs="Quantum computing uses...")
    .check(Readability(metric="flesch_reading_ease", min_score=60))
)

Acceptance Criteria

  • Computes Flesch Reading Ease, Flesch-Kincaid Grade, and Gunning Fog scores
  • Configurable metric selection
  • min_score / max_score thresholds are enforced
  • Score is included as a Metric
  • Clear error when textstat is not installed
  • Tests cover: simple text, complex text, each metric variant
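Step 7 above maps to an optional-dependency group in the package metadata. A hypothetical `pyproject.toml` fragment for `libs/giskard-checks` (the `nlp` group name comes from the issue; the version bound is an assumption):

```toml
# Hypothetical fragment of libs/giskard-checks/pyproject.toml
[project.optional-dependencies]
nlp = ["textstat>=0.7"]
```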
