
fix: add MarkInfo and ViewerPreferences to accessible PDF output #14999

Open
anuradha1304 wants to merge 1 commit into laurent22:dev from anuradha1304:fix/accessible-pdf-missing-tags

Contributor

@anuradha1304 anuradha1304 commented Apr 3, 2026

Fixes #14994

Problem

The "Create accessible document" context menu option generates PDFs without accessibility tags. pdf-lib does not set MarkInfo or ViewerPreferences in the PDF catalog by default, so screen readers cannot identify the output as a tagged document - despite the feature being specifically for accessibility.

The note export path already handles this correctly via Electron's generateTaggedPDF flag in InteropServiceHelper.ts. This fix brings the OCR-based accessible PDF path to the same standard.

Fix

Inject MarkInfo << /Marked true >> and ViewerPreferences into the PDF catalog using pdf-lib's low-level context API, immediately before saving the document.
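
A minimal sketch of what this injection could look like, assuming pdf-lib's public `catalog` and `context` properties on `PDFDocument`. The helper name `addTaggedPdfCatalogEntries` is hypothetical and not taken from the actual diff; the `DisplayDocTitle` key is the one named in the review walkthrough.

```typescript
import { PDFDocument, PDFName } from 'pdf-lib';

// Hypothetical helper, for illustration only (not the code in this PR).
// Adds the two catalog entries described above to an in-memory document.
// See ISO 32000-1, section 14.7 (Logical Structure) for the MarkInfo dictionary.
export const addTaggedPdfCatalogEntries = (pdfDoc: PDFDocument): void => {
	const { catalog, context } = pdfDoc;

	// MarkInfo << /Marked true >>
	// context.obj() converts the plain object into a PDFDict and `true` into a PDF boolean.
	catalog.set(PDFName.of('MarkInfo'), context.obj({ Marked: true }));

	// ViewerPreferences << /DisplayDocTitle true >>
	// Assumption: the document has no existing ViewerPreferences worth keeping,
	// because this call replaces the whole dictionary.
	catalog.set(PDFName.of('ViewerPreferences'), context.obj({ DisplayDocTitle: true }));
};
```

Calling the helper immediately before `pdfDoc.save()` keeps it independent of how the pages and the invisible text layer are built.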

Changes

File: packages/lib/services/ocr/utils/createAccessiblePdf.ts

Change: Added MarkInfo and ViewerPreferences to PDF catalog

Note

This adds the required catalog entries for tagged PDFs. A fully PDF/UA compliant document would additionally require a structure tree, which is beyond the scope of this fix and the current pdf-lib capabilities.

Test Plan

  1. Open a note with headings and lists in Joplin desktop
  2. Right-click → Export → "Create accessible PDF"
  3. Open the PDF in a hex editor or PDF inspector (e.g. pdfinfo, pdfid.py, or Adobe Acrobat's preflight)
  4. Verify MarkInfo dictionary is present in the PDF catalog with /Marked true
  5. Verify ViewerPreferences is present
  6. Before this fix: these entries were absent entirely
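
Steps 4 and 5 could also be checked programmatically rather than by eye. The sketch below assumes the exported file can be read into a `Uint8Array` and re-parsed with pdf-lib; `checkTaggedPdfCatalog` and `CatalogCheck` are hypothetical names used only for illustration.

```typescript
import { PDFBool, PDFDict, PDFDocument, PDFName } from 'pdf-lib';

// Result of inspecting the catalog of an exported PDF.
export interface CatalogCheck {
	marked: boolean;
	hasViewerPreferences: boolean;
}

// Hypothetical verification helper: re-parses the PDF and reads the catalog
// entries back, so it does not depend on how the bytes were compressed.
export const checkTaggedPdfCatalog = async (pdfBytes: Uint8Array): Promise<CatalogCheck> => {
	const pdfDoc = await PDFDocument.load(pdfBytes);
	const catalog = pdfDoc.catalog;

	// lookupMaybe() returns undefined when the key is absent.
	const markInfo = catalog.lookupMaybe(PDFName.of('MarkInfo'), PDFDict);
	const viewerPreferences = catalog.lookupMaybe(PDFName.of('ViewerPreferences'), PDFDict);

	return {
		// pdf-lib represents PDF booleans with the PDFBool.True / PDFBool.False singletons.
		marked: markInfo?.lookup(PDFName.of('Marked')) === PDFBool.True,
		hasViewerPreferences: viewerPreferences !== undefined,
	};
};
```

Re-parsing is used here instead of searching the raw bytes for `/MarkInfo` because pdf-lib may store the catalog inside a compressed object stream, where a plain text search would not find it.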

The createAccessiblePdf function was generating PDFs without
accessibility tags, despite the feature being marketed as creating
accessible documents. pdf-lib does not set MarkInfo or ViewerPreferences
by default, so screen readers could not identify the output as tagged.

Fix: inject MarkInfo << /Marked true >> and ViewerPreferences into the
PDF catalog after document creation, using pdf-lib's low-level context
API. This matches what Electron's generateTaggedPDF flag does for the
note export path.

Fixes laurent22#14994
@coderabbitai
Contributor

coderabbitai bot commented Apr 3, 2026

Walkthrough

The createAccessiblePdf function has been updated to enhance PDF accessibility by adding two metadata catalog entries: MarkInfo with Marked set to true, and ViewerPreferences with DisplayDocTitle set to true. These additions are performed after the document pages and invisible text layer have been generated.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| PDF Accessibility Metadata: packages/lib/services/ocr/utils/createAccessiblePdf.ts | Added imports for PDFBool and PDFName. Integrated MarkInfo and ViewerPreferences catalog entries to the PDF document to improve accessibility and viewer preferences handling. |

Suggested labels

bug, OCR, export


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Pr Description Must Follow Guidelines | ❌ Error | The PR description lacks a Test Plan or verification steps section required by custom check guidelines, despite adequately explaining the problem and solution. | Add a Test Plan section detailing testing approach, specific verification steps, and any tools used to validate that MarkInfo and ViewerPreferences were correctly injected into the PDF. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarises the main change: adding MarkInfo and ViewerPreferences to the PDF catalog for accessibility compliance. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Description check | ✅ Passed | The pull request description clearly relates to the changeset, explaining the problem (missing MarkInfo and ViewerPreferences), the solution, and the specific file modified. |

@coderabbitai coderabbitai bot added the bug, export, and OCR labels Apr 3, 2026
Contributor

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
packages/lib/services/ocr/utils/createAccessiblePdf.ts (1)

1-2: Consider consolidating imports from the same module.

Both lines import from pdf-lib. Merging them into a single import statement would be cleaner.

Proposed fix
-import { PDFDocument, PDFFont, PDFPage, rgb, StandardFonts } from 'pdf-lib';
-import { PDFBool, PDFName } from 'pdf-lib';
+import { PDFBool, PDFDocument, PDFFont, PDFName, PDFPage, rgb, StandardFonts } from 'pdf-lib';

📥 Commits

Reviewing files that changed from the base of the PR and between acd2ef4 and 8294b2a.

📒 Files selected for processing (1)
  • packages/lib/services/ocr/utils/createAccessiblePdf.ts

@personalizedrefrigerator
Collaborator

With this change, lists/headings/etc still don't seem to be recognized and tagged as such. What are the accessibility benefits of this change by itself?

Notes:

  • The text overlay seems to remain untagged.
  • Tags from the original PDF do not seem to be transferred.

(Thank you for the pull request!)

@anuradha1304
Contributor Author

You're right that without a structure tree the practical benefit is limited. The main value is that some assistive tools check for MarkInfo before attempting to parse anything - without it they skip the document entirely.
Happy to either keep this as a foundational step or close it in favour of a more complete fix, whatever you think is better.

@personalizedrefrigerator
Collaborator

> The main value is that some assistive tools check for MarkInfo before attempting to parse anything - without it they skip the document entirely.

Thanks for the clarification!

To help prevent future regressions, consider adding a code comment with an example of accessibility tools that will skip parsing the document. (A link to documentation for MarkInfo could also be helpful).

Development

Successfully merging this pull request may close these issues.

The option to generate accessible PDFs creates them without tags