[codex] fix large repository graph loading by JaysonAlbert · Pull Request #732 · abhigyanpatwari/GitNexus

JaysonAlbert · 2026-04-08T09:31:16Z

Summary

stream /api/graph responses as NDJSON instead of building one huge JSON string
parse streamed graph records incrementally in the web client
add a regression test for NDJSON graph parsing

Root Cause

Large indexed repositories can produce an /api/graph payload that is too large to serialize as a single JSON string in Node. That caused the backend to fail with Invalid string length before the browser could finish loading the repository.

Fix

add streamQuery() on the LadybugDB adapter so large result sets can be emitted row-by-row
add stream=true support on /api/graph and write node/relationship records as NDJSON
update fetchGraph() in gitnexus-web to consume NDJSON incrementally while preserving the old JSON fallback

Fixes #495.

Validation

cd gitnexus-web && npm test -- test/unit/server-connection.test.ts
cd gitnexus && npm test -- test/unit/cors.test.ts
cd gitnexus && npx tsc --noEmit
cd gitnexus-web && npx tsc -b --noEmit
manual validation against libra-server-release-1.63.x (338911 nodes / 1437718 relationships) through the streamed graph path

vercel · 2026-04-08T09:31:20Z

Someone is attempting to deploy a commit to the NexusCore Team on Vercel.

A member of the Team first needs to authorize it.

magyargergo · 2026-04-09T05:08:14Z

@claude is this ready for production? Is there any performance issues and gaps in the implementation? Did we implement what we promised in the summary? Also analyze the architectural fit.

github-actions · 2026-04-09T05:08:30Z

Claude finished @magyargergo's task in 2m 3s —— View job

PR Review: Large Graph Streaming

Tasks

Gather context and understand the request
Read changed files
Analyze production readiness
Analyze performance issues and gaps
Verify implementation completeness vs. summary promises
Analyze architectural fit

TL;DR

The fix correctly addresses the root cause (Invalid string length on large JSON serialization). The NDJSON streaming approach is architecturally sound and the summary promises are fully delivered. However, there are several production gaps that should be addressed before merging.

✅ Summary Promises — Delivered

Promise	Status
`streamQuery()` on LadybugDB adapter	✅ Delivered (`lbug-adapter.ts:640-666`)
`stream=true` support on `/api/graph` with NDJSON	✅ Delivered (`api.ts:506-514`)
`fetchGraph()` consumes NDJSON incrementally	✅ Delivered (`backend-client.ts:447-505`)
Fallback to old JSON path preserved	✅ Delivered (Content-Type check at line 415)
Regression test for NDJSON parsing	✅ Delivered (`server-connection.test.ts:44-83`)
Manual validation on large repo (338k nodes / 1.4M rels)	✅ Stated in PR body

🔴 Production Blockers

1. No client-disconnect handling — stream keeps running after browser closes

streamGraphNdjson (api.ts:187-214) runs inside withLbugDb with no mechanism to detect when the client disconnects. If the browser closes the tab mid-stream, the server will continue iterating all 1.4M rows, writing to a closed socket, holding the session lock the entire time.

// Missing in api.ts around line 508:
req.on('close', () => { /* need a way to abort streamGraphNdjson */ });

Fix this →

2. Session lock held for entire stream duration — blocks all other DB queries

withLbugDb holds the session lock while streamGraphNdjson streams all nodes and relationships. For a 1.4M relationship graph, this could block all other API requests (search, cypher queries, etc.) for potentially several minutes.

// api.ts:512 — this locks the session for the full stream duration
await withLbugDb(lbugPath, async () => streamGraphNdjson(res, includeContent));

The solution would be to either (a) release the lock after the query is prepared and use a cursor approach, or (b) accept the limitation but document it clearly.

3. No backpressure — res.write() called in a tight loop

streamGraphNdjson calls res.write() on every row without checking res.writable or awaiting drain events. For 338k nodes + 1.4M relationships, this floods Node's write buffer.

// api.ts:194 — no backpressure
res.write(JSON.stringify({ type: 'node', data: ... }) + '\n');

Fix this →

🟡 Significant Gaps

4. Silent error swallowing in node table queries

// api.ts:201-203
} catch {
  // ignore empty tables
}

This catches all errors, not just "table empty" errors. A real DB failure mid-stream would be silently discarded, leaving the client with a partial graph and no error signal. Should at minimum check if the error is "table not found" before swallowing it.
Fix this →

5. Client fallback JSON path is dead code

fetchGraph hardcodes stream=true in its params (line 407), so the Content-Type check at line 415 will always match application/x-ndjson. The old streaming-with-progress fallback (lines 419-444) is now unreachable code. This is harmless but misleading.

6. Content-Length is always null for NDJSON — progress % never works

// backend-client.ts:455-456
const contentLength = response.headers.get('Content-Length');
const total = contentLength ? parseInt(contentLength, 10) : null;

NDJSON responses use chunked transfer encoding — no Content-Length header is sent. So total is always null and progress callbacks can never report a percentage. The onProgress param is accepted but functionally incomplete for NDJSON.

7. streamQuery may not actually stream at the DB level

lbug-adapter.ts:648: conn.query(cypher) is called first, which may buffer all rows before returning. Whether hasNext()/getNext() truly streams row-by-row depends on the LadybugDB driver internals. If the driver materializes the full result set before the first hasNext() call, then memory isn't actually reduced — only the JSON serialization bottleneck is fixed.

🟢 Architectural Assessment

The NDJSON-over-HTTP approach is a good architectural fit:

It reuses the existing HTTP path without introducing WebSockets or SSE for the graph endpoint
The type: 'node' | 'relationship' | 'error' envelope is clean and extensible
The fallback to legacy JSON path (via Content-Type detection) is forward-compatible
streamQuery using hasNext()/getNext() is the right abstraction for the DB cursor pattern
The client correctly flushes the remaining buffer after stream end (decoder.decode() at line 501)
Error propagation mid-stream via { type: 'error' } records is well-designed

The main architectural concern is the session lock holding strategy: the single global conn means only one graph stream can run at a time, and it blocks all other queries. For a multi-tenant server this is a significant bottleneck.

🧪 Test Coverage Gaps

No test for lines split across chunk boundaries (the most common real-world streaming failure mode)
No test for { type: 'error' } records
No test verifying that stream=true is sent in the request URL
No test for the 60-second timeout behavior with a slow server

Verdict

Not production-ready as-is. The root-cause fix is correct and the approach is sound, but issues #1 (client disconnect) and #3 (backpressure) are genuine production risks for the large-repo use case this PR targets. Issue #2 (session lock) is an architectural concern that should be documented if not fixed. The rest are lower priority but worth tracking.

magyargergo · 2026-04-09T05:12:43Z

@JaysonAlbert could you please look at these findings?

JaysonAlbert · 2026-04-09T05:15:27Z

@JaysonAlbert could you please look at these findings?

I'll check it out

github-actions · 2026-04-09T05:17:52Z

CI Report

❌ Some checks failed

Pipeline Status

Stage	Status	Details
❌ Typecheck	`failure`	tsc --noEmit
✅ Tests	`success`	unit tests, 3 platforms
❌ E2E	`failure`	gitnexus-web changes only

Test Results

Tests	Passed	Failed	Skipped	Duration
5578	5486	0	92	194s

✅ All 5486 tests passed

92 test(s) skipped — expand for details

Swift MethodExtractor > isTypeDeclaration > recognizes class_declaration
Swift MethodExtractor > isTypeDeclaration > recognizes protocol_declaration
Swift MethodExtractor > isTypeDeclaration > rejects import_declaration
Swift MethodExtractor > visibility > extracts public method
Swift MethodExtractor > visibility > extracts private method
Swift MethodExtractor > visibility > defaults to internal when no modifier
Swift MethodExtractor > protocol methods > marks protocol method as abstract
Swift MethodExtractor > static and class methods > detects static func as isStatic
Swift MethodExtractor > static and class methods > detects class func as isStatic
Swift MethodExtractor > parameters > extracts parameters with types and default values
Swift MethodExtractor > return type > extracts return type from -> annotation
Swift MethodExtractor > annotations > extracts @objc attribute
Swift MethodExtractor > isFinal > detects final func
Swift MethodExtractor > isFinal > is false when not final
Swift MethodExtractor > isAsync > detects async func
Swift MethodExtractor > isOverride > detects override method
buildTypeEnv > known limitations (documented skip tests) > Ruby block parameter: users.each { |user| } — closure param inference, different feature
Swift constructor-inferred type resolution > detects User and Repo classes, both with save methods
Swift constructor-inferred type resolution > resolves user.save() to Models/User.swift via constructor-inferred type
Swift constructor-inferred type resolution > resolves repo.save() to Models/Repo.swift via constructor-inferred type
Swift constructor-inferred type resolution > emits exactly 2 save() CALLS edges (one per receiver type)
Swift self resolution > detects User and Repo classes, each with a save function
Swift self resolution > resolves self.save() inside User.process to User.save, not Repo.save
Swift parent resolution > detects BaseModel and User classes plus Serializable protocol
Swift parent resolution > emits EXTENDS edge: User → BaseModel
Swift parent resolution > emits IMPLEMENTS edge: User → Serializable (protocol conformance)
Swift cross-file User.init() inference > resolves user.save() via User.init(name:) inference
Swift cross-file User.init() inference > resolves user.greet() via User.init(name:) inference
Swift return type inference > detects User class and getUser function
Swift return type inference > detects save function on User (Swift class methods are Function nodes)
Swift return type inference > resolves user.save() to User#save via return type of getUser() -> User
Swift return-type inference via function return type > resolves user.save() to User#save via return type of getUser()
Swift return-type inference via function return type > user.save() does NOT resolve to Repo#save
Swift return-type inference via function return type > resolves repo.save() to Repo#save via return type of getRepo()
Swift implicit imports (cross-file visibility) > detects UserService class in Models.swift
Swift implicit imports (cross-file visibility) > resolves UserService() constructor call across files (no explicit import)
Swift implicit imports (cross-file visibility) > resolves service.fetchUser() member call across files
Swift implicit imports (cross-file visibility) > creates IMPORTS edges between files in the same module
Swift extension deduplication > detects Product class
Swift extension deduplication > resolves Product() constructor despite extension creating duplicate class node
Swift extension deduplication > resolves product.save() to Product.swift (primary definition)
Swift constructor call fallback (no new keyword) > resolves OCRService() as constructor call across files
Swift constructor call fallback (no new keyword) > resolves ocr.recognize() member call via constructor-inferred type
Swift export visibility (internal vs private) > resolves PublicService() constructor across files
Swift export visibility (internal vs private) > resolves internalHelper() across files (internal = module-scoped)
Swift if let / guard let binding resolution > detects User and Repo classes
Swift if let / guard let binding resolution > resolves user.save() inside if-let to User#save
Swift if let / guard let binding resolution > resolves repo.save() inside guard-let to Repo#save
Swift if let / guard let binding resolution > user.save() in if-let does NOT resolve to Repo#save
Swift await / try expression unwrapping > resolves user.save() via await fetchUser() return type
Swift await / try expression unwrapping > resolves repo.save() via try parseRepo() return type
Swift await / try expression unwrapping > detects fetchUser and parseRepo as functions
Swift for-in loop element type inference > detects User and Repo classes
Swift for-in loop element type inference > creates implicit import edges between files
Swift field-type resolution > detects classes and their properties
Swift field-type resolution > emits HAS_PROPERTY edges from class to field
Swift field-type resolution > resolves field-chain call user.address.save() → Address#save
Swift field-type resolution > emits ACCESSES edges for field reads in chains
Swift field-type resolution > populates field metadata (visibility, declaredType) on Property nodes
Swift call-result binding > resolves call-result-bound method call user.save() → User#save
Swift call-result binding > getUser() is present as a defined function
Swift call-result binding > emits processUser -> getUser CALLS edge for let-assigned free function call
Swift method enrichment > detects Animal protocol and Dog class
Swift method enrichment > emits IMPLEMENTS edge Dog -> Animal
Swift method enrichment > emits HAS_METHOD edges for Dog methods
Swift method enrichment > marks protocol Animal.speak as isAbstract
Swift method enrichment > marks Dog.speak as NOT isAbstract
Swift method enrichment > marks breathe as isFinal
Swift method enrichment > marks classify as isStatic
Swift method enrichment > captures @objc annotation on breathe
Swift method enrichment > populates parameterTypes for classify(_ name: String)
Swift method enrichment > records parameterCount for classify
Swift method enrichment > records returnType for speak
Swift method enrichment > resolves dog.speak() CALLS edge
Swift method enrichment > resolves Dog.classify("dog") CALLS edge
Swift abstract dispatch > detects Repository protocol and SqlRepository class
Swift abstract dispatch > emits IMPLEMENTS edge SqlRepository -> Repository
Swift abstract dispatch > emits HAS_METHOD edges for Repository.find and Repository.save
Swift abstract dispatch > emits HAS_METHOD edges for SqlRepository.find and SqlRepository.save
Swift abstract dispatch > marks base Repository.find as isAbstract
Swift abstract dispatch > marks base Repository.save as isAbstract
Swift abstract dispatch > marks concrete SqlRepository.find as NOT isAbstract
Swift abstract dispatch > resolves repo.find(id: 42) CALLS edge
Swift abstract dispatch > resolves repo.save(entity: user) CALLS edge
Swift abstract dispatch > populates parameterTypes for Repository.find
Swift abstract dispatch > populates parameterTypes for Repository.save
Swift abstract dispatch > records returnType for SqlRepository.find
Swift abstract dispatch > emits METHOD_IMPLEMENTS edges from SqlRepository methods → Repository protocol methods
Swift overloaded method disambiguation > detects 2 distinct find Method nodes on SqlRepository
Swift overloaded method disambiguation > emits METHOD_IMPLEMENTS edges for both find overloads
Swift overloaded method disambiguation > emits METHOD_IMPLEMENTS edge for save
Swift overloaded method disambiguation > emits exactly 3 METHOD_IMPLEMENTS edges total

Code Coverage

Tests

Metric	Coverage	Covered	Base	Delta	Status
Statements	71.7%	15150/21129	71.96%	📉 -0.3	🔴 ██████████████░░░░░░
Branches	60.86%	9895/16257	61.04%	📉 -0.2	🔴 ████████████░░░░░░░░
Functions	76.14%	1395/1832	76.52%	📉 -0.4	🔴 ███████████████░░░░░
Lines	73.87%	13723/18577	74.12%	📉 -0.2	🔴 ██████████████░░░░░░

_{📋 View full run · Generated by CI}

magyargergo · 2026-04-09T05:24:46Z

Thank you for your contribution too! This looks very promising!

magyargergo · 2026-04-09T05:25:13Z

We need to ensure green CI and increased test coverage.

fix(web): stream large graph responses

d288c39

JaysonAlbert marked this pull request as ready for review April 8, 2026 11:33

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[codex] fix large repository graph loading#732

[codex] fix large repository graph loading#732
JaysonAlbert wants to merge 1 commit intoabhigyanpatwari:mainfrom
JaysonAlbert:codex/fix-large-graph-streaming

JaysonAlbert commented Apr 8, 2026

Uh oh!

vercel bot commented Apr 8, 2026

Uh oh!

magyargergo commented Apr 9, 2026

Uh oh!

github-actions bot commented Apr 9, 2026 •

edited

Loading

Uh oh!

magyargergo commented Apr 9, 2026

Uh oh!

JaysonAlbert commented Apr 9, 2026

Uh oh!

github-actions bot commented Apr 9, 2026

Uh oh!

magyargergo commented Apr 9, 2026

Uh oh!

magyargergo commented Apr 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

JaysonAlbert commented Apr 8, 2026

Summary

Root Cause

Fix

Validation

Uh oh!

vercel bot commented Apr 8, 2026

Uh oh!

magyargergo commented Apr 9, 2026

Uh oh!

github-actions bot commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: Large Graph Streaming

TL;DR

✅ Summary Promises — Delivered

🔴 Production Blockers

🟡 Significant Gaps

🟢 Architectural Assessment

🧪 Test Coverage Gaps

Verdict

Uh oh!

magyargergo commented Apr 9, 2026

Uh oh!

JaysonAlbert commented Apr 9, 2026

Uh oh!

github-actions bot commented Apr 9, 2026

CI Report

Pipeline Status

Test Results

Code Coverage

Tests

Uh oh!

magyargergo commented Apr 9, 2026

Uh oh!

magyargergo commented Apr 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

github-actions bot commented Apr 9, 2026 •

edited

Loading