Summary
Add a built-in check that computes a readability score (e.g., Flesch-Kincaid grade level) for the model output and validates it against a threshold.
Implementation Guide
Pattern to follow
Reference: libs/giskard-checks/src/giskard/checks/builtin/comparison.py (threshold-based)
Steps
- Add to libs/giskard-checks/src/giskard/checks/builtin/nlp_metrics.py
- Register with @Check.register("readability")
- Implement async run(self, trace: Trace) -> CheckResult
- Support:
  - key: JSONPathStr (JSONPath for the output; default: trace.last.outputs)
  - metric: Literal["flesch_reading_ease", "flesch_kincaid_grade", "gunning_fog"] = "flesch_reading_ease"
  - min_score: float | None = None (minimum readability score)
  - max_score: float | None = None (maximum readability score)
- Use the textstat library for computation
- Include the readability score as a Metric
- Add textstat to the nlp optional dependency group
- Add tests in tests/builtin/test_nlp_metrics.py
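The core of the steps above could look roughly like the sketch below. It is not the final check: it leaves out the Check base class, Trace, CheckResult, and Metric wiring, and the helper names compute_readability and within_bounds are hypothetical. It assumes that the three metric names map directly onto the textstat functions of the same name, and that both bounds are inclusive, which the issue does not specify.

```python
from typing import Optional

# Metric names from the proposal; each matches a textstat function name.
SUPPORTED_METRICS = ("flesch_reading_ease", "flesch_kincaid_grade", "gunning_fog")


def compute_readability(text: str, metric: str = "flesch_reading_ease") -> float:
    """Hypothetical helper: score `text` with the chosen textstat metric."""
    if metric not in SUPPORTED_METRICS:
        raise ValueError(f"Unsupported readability metric: {metric!r}")
    try:
        import textstat  # optional dependency, imported lazily
    except ImportError as exc:
        raise ImportError(
            "The readability check needs textstat; "
            "install the nlp optional dependency group."
        ) from exc
    return float(getattr(textstat, metric)(text))


def within_bounds(
    score: float,
    min_score: Optional[float] = None,
    max_score: Optional[float] = None,
) -> bool:
    """Hypothetical helper: True if score respects every bound that is set.

    Assumption: bounds are inclusive; a bound of None is not enforced.
    """
    if min_score is not None and score < min_score:
        return False
    if max_score is not None and score > max_score:
        return False
    return True
```

Validating the metric name before importing textstat means a misconfigured check fails with a clear ValueError even when the optional dependency is missing.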
Example usage
from giskard.checks import Readability, Scenario

# Ensure response is easy to read (Flesch Reading Ease > 60 = "standard")
scenario = (
    Scenario(name="readable_response")
    .interact(inputs="Explain quantum computing", outputs="Quantum computing uses...")
    .check(Readability(metric="flesch_reading_ease", min_score=60))
)
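The metrics run in opposite directions, which is why the check takes both min_score and max_score. The sketch below shows this with the standard published formulas for Flesch Reading Ease and Flesch-Kincaid grade level, applied to precomputed counts. The function names are hypothetical, and textstat's own results may differ slightly because it counts syllables and sentences with its own heuristics.

```python
def reading_ease_from_counts(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher means easier to read."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def kincaid_grade_from_counts(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid grade level: higher means harder to read."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Illustrative counts (made up for the example): 100 words, 5 sentences, 150 syllables.
ease = reading_ease_from_counts(100, 5, 150)
grade = kincaid_grade_from_counts(100, 5, 150)
print(round(ease, 1))   # 59.6
print(round(grade, 1))  # 9.9
```

With these counts the text falls just below a min_score of 60 on Flesch Reading Ease. A grade-level check would normally use max_score instead, for example max_score=10.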
Acceptance Criteria
textstat is not installed