Version
No response
Platform
**Summary:** releaseWritingBuf() in lib/internal/streams/fast-utf8-stream.js incorrectly calculates string slice positions when fs.write returns a byte count that splits a multi-byte UTF-8 character, causing silent data corruption (lost characters, lone surrogates in output).
**Description:**
The `releaseWritingBuf()` function (line 896) converts the number of bytes written into a UTF-16 code-unit count using:
`n = Buffer.from(writingBuf).subarray(0, n).toString().length;`
When the first `n` bytes end partway through a multi-byte character, `.toString()` decodes the incomplete sequence as a single U+FFFD (replacement character). That replacement character counts as one code unit, as though the split character had been written completely, so `.slice(n)` cuts at the wrong position:
- 2- and 3-byte characters (e.g. Cyrillic, CJK), which occupy one code unit: the character is skipped entirely and silently dropped, even though only its lead byte(s) were written.
- 4-byte characters (emoji, supplementary CJK), which occupy two code units (a surrogate pair): the cut falls between the surrogates, leaving a lone low surrogate at the start of the remaining buffer. The next write does not complete the character, so the output is invalid UTF-8.
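The miscount can be observed in isolation by decoding the written byte prefix the same way the function does. In this minimal sketch (plain `Buffer` calls; `decodedPrefix` is a hypothetical helper name), the prefix ends in one U+FFFD in both cases, so its length counts the split character as if it had been fully written.

```javascript
// Decodes the first n bytes of str the same way releaseWritingBuf() does and
// returns the decoded prefix with its UTF-16 length.
// decodedPrefix is a hypothetical helper used only for illustration.
function decodedPrefix(str, n) {
  const prefix = Buffer.from(str, 'utf8').subarray(0, n).toString('utf8');
  return { prefix, length: prefix.length };
}

// "中" is 3 bytes (E4 B8 AD) but a single UTF-16 code unit at index 3.
// Byte 4 cuts it right after its lead byte.
const cjk = decodedPrefix('abc中def', 4);
// cjk.prefix === 'abc\uFFFD', so slicing at cjk.length (4) skips "中" entirely.

// "🌍" is 4 bytes but two UTF-16 code units (a surrogate pair at indices 5 and 6).
// Byte 7 cuts it after two of its four bytes.
const emoji = decodedPrefix('hello🌍world', 7);
// emoji.prefix === 'hello\uFFFD', so slicing at emoji.length (6) starts at the low surrogate.

console.log(cjk.length, 'abc中def'.slice(cjk.length)); // 4 def
console.log(emoji.length, 'hello🌍world'.charCodeAt(emoji.length).toString(16)); // 6 df0d
```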
The file was recently added (January 2026), ported from SonicBoom. It is used as the fast path for streaming UTF-8 output.
## Steps To Reproduce:
1. Save the following as `poc.js` and run it with `node poc.js`:

```js
// Reproduces the releaseWritingBuf() logic from
// lib/internal/streams/fast-utf8-stream.js (lines 896-906).
function releaseWritingBuf(writingBuf, len, n) {
  // n is the number of bytes that fs.write() reported as written.
  if (typeof writingBuf === 'string' && Buffer.byteLength(writingBuf) !== n) {
    // Converts bytes to UTF-16 code units by decoding the written prefix.
    // A prefix that ends mid-character decodes to a trailing U+FFFD.
    n = Buffer.from(writingBuf).subarray(0, n).toString().length;
  }
  len = Math.max(len - n, 0);
  writingBuf = writingBuf.slice(n);
  return { writingBuf, len };
}

// Case 1: 4-byte emoji split at byte 7, leaving a lone surrogate
const r1 = releaseWritingBuf("hello🌍world", 14, 7);
console.log("Case 1 - Emoji split:");
console.log(" Result:", JSON.stringify(r1.writingBuf));
console.log(" Expected:", JSON.stringify("🌍world"));
console.log(" First char code: 0x" + r1.writingBuf.charCodeAt(0).toString(16));
console.log(" Is lone surrogate:", r1.writingBuf.charCodeAt(0) >= 0xDC00 &&
  r1.writingBuf.charCodeAt(0) <= 0xDFFF);

// Case 2: 3-byte CJK character split at byte 4, character lost
const r2 = releaseWritingBuf("abc中def", 9, 4);
console.log("\nCase 2 - CJK split:");
console.log(" Result:", JSON.stringify(r2.writingBuf));
console.log(" Expected:", JSON.stringify("中def"));
console.log(" Character 中 lost:", !r2.writingBuf.includes("中"));
```
2. Output shows:

```text
Case 1 - Emoji split:
 Result: "\udf0dworld"    ← CORRUPTED (lone surrogate)
 Expected: "🌍world"
 First char code: 0xdf0d
 Is lone surrogate: true

Case 2 - CJK split:
 Result: "def"    ← CHARACTER LOST
 Expected: "中def"
 Character 中 lost: true
```
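The PoC stops at the pending buffer. The sketch below follows the data one step further by concatenating the bytes a sink would receive: the first `n` bytes from the partial write, then the UTF-8 encoding of the remainder. It assumes the remainder is re-encoded with `Buffer.from(str, 'utf8')`, which in current Node.js versions replaces a lone surrogate with U+FFFD. `simulateTwoWrites` and `isValidUtf8` are hypothetical helpers. It also shows that the "Expected" strings above mark the intended character boundary rather than a correct remainder: the lead byte(s) of the split character were already written, so resending the whole character would duplicate them.

```javascript
// Same logic as releaseWritingBuf() in the PoC (copied so this runs standalone).
function releaseWritingBuf(writingBuf, len, n) {
  if (typeof writingBuf === 'string' && Buffer.byteLength(writingBuf) !== n) {
    n = Buffer.from(writingBuf).subarray(0, n).toString().length;
  }
  len = Math.max(len - n, 0);
  writingBuf = writingBuf.slice(n);
  return { writingBuf, len };
}

// Hypothetical helper: valid UTF-8 survives a decode/encode round trip unchanged,
// while invalid sequences are replaced by U+FFFD and therefore change.
function isValidUtf8(buf) {
  return Buffer.from(buf.toString('utf8'), 'utf8').equals(buf);
}

// Hypothetical helper: bytes a sink would receive, i.e. the first n bytes of the
// original string (the partial write) followed by the encoded pending remainder.
function simulateTwoWrites(str, n) {
  const original = Buffer.from(str, 'utf8');
  const { writingBuf } = releaseWritingBuf(str, original.length, n);
  const written = Buffer.concat([original.subarray(0, n), Buffer.from(writingBuf, 'utf8')]);
  return { original, written };
}

for (const [str, n] of [['hello🌍world', 7], ['abc中def', 4]]) {
  const { original, written } = simulateTwoWrites(str, n);
  // Both cases report intact: false and valid UTF-8: false.
  console.log(str, 'n =', n, 'intact:', written.equals(original), 'valid UTF-8:', isValidUtf8(written));
}
```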
3. The vulnerable code is at:
https://github.com/nodejs/node/blob/main/lib/internal/streams/fast-utf8-stream.js#L896-L906
Partial `fs.write()` returns are possible when writing to pipes near capacity, under disk I/O pressure, or to Docker log pipes (the exact use case mentioned in the file's comments on lines 69–70).
Additional finding: line 240 contains a typo carried over from the SonicBoom port. It reads `this._asyncDrainScheduled` where it should read `this.#asyncDrainScheduled`. The other five references all use the private field correctly. As a result, the `newListener` handler is effectively dead code.
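The typo matters because in JavaScript a private field `#x` and an ordinary property `_x` are unrelated: assigning one never sets the other, so a check against the underscore name always reads `undefined`. A minimal illustration with a hypothetical class (not the actual `Utf8Stream` source):

```javascript
class Example {
  #asyncDrainScheduled = false;

  scheduleDrain() {
    // Sets the private field only.
    this.#asyncDrainScheduled = true;
  }

  get privateFlag() {
    return this.#asyncDrainScheduled;
  }

  get underscoreFlag() {
    // Mirrors the reported typo: a separate public property that is never assigned.
    return this._asyncDrainScheduled;
  }
}

const e = new Example();
e.scheduleDrain();
console.log(e.privateFlag, e.underscoreFlag); // true undefined
```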
## Impact:
Silent data corruption in output files. Applications using Utf8Stream for logging with international characters (CJK, emoji, Cyrillic) can produce corrupted output when partial writes occur. 2- and 3-byte characters (e.g. Cyrillic, CJK) are silently lost, with no error emitted. 4-byte characters (e.g. emoji) leave a lone surrogate in the pending buffer. In both cases the lead bytes that were already written form a truncated sequence, so the output is invalid UTF-8. This is especially relevant for the Docker container logging use case the file was designed for.
## Supporting Material/References:
- Vulnerable function: releaseWritingBuf() at https://github.com/nodejs/node/blob/main/lib/internal/streams/fast-utf8-stream.js#L896-L906
- Typo (secondary): line 240, _asyncDrainScheduled vs #asyncDrainScheduled
- File derived from SonicBoom (https://github.com/pinojs/sonic-boom); the original has a similar issue but uses the `_` prefix consistently
- The PoC script above is standalone and runs on any recent Node.js version
Subsystem
No response
What steps will reproduce the bug?
- Save the PoC script from the Steps To Reproduce section above as `poc.js` and run it with `node poc.js`.
- The script reproduces the exact logic of the `releaseWritingBuf()` function from `lib/internal/streams/fast-utf8-stream.js` (lines 896–906).
- It simulates a partial `fs.write()` whose byte count splits a multi-byte UTF-8 character.
- Observe the output:
  - When a 4-byte UTF-8 character (emoji) is split, a lone surrogate remains in the pending buffer.
  - When a 3-byte UTF-8 character (CJK) is split, the character is silently dropped.
- This demonstrates incorrect string slicing caused by converting byte counts to character counts via `.toString().length`.
How often does it reproduce? Is there a required condition?
It reproduces deterministically whenever fs.write() (or an equivalent
internal write) returns a byte count that splits a multi-byte UTF-8
character.
The issue is not timing-dependent or race-based. The required condition
is a partial write that ends in the middle of a UTF-8 sequence.
Partial writes are documented, expected behavior of `fs.write()`. They can occur when writing to pipes, sockets, or log streams under backpressure (e.g. near-capacity pipes, Docker container logs, or heavy I/O).
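Whether a given byte count lands inside a character can be decided from the byte at that offset: in UTF-8, continuation bytes have the bit pattern `10xxxxxx`. The sketch below, using hypothetical `splitsCharacter()` and `roundTrips()` helpers, tries every possible partial-write size for the two PoC strings and checks that corruption happens exactly at mid-character offsets, which is why the reproduction is deterministic rather than timing-dependent.

```javascript
// Same logic as releaseWritingBuf() in the PoC (copied so this runs standalone).
function releaseWritingBuf(writingBuf, len, n) {
  if (typeof writingBuf === 'string' && Buffer.byteLength(writingBuf) !== n) {
    n = Buffer.from(writingBuf).subarray(0, n).toString().length;
  }
  len = Math.max(len - n, 0);
  writingBuf = writingBuf.slice(n);
  return { writingBuf, len };
}

// Hypothetical helper: true if writing only the first n bytes of str would end
// in the middle of a multi-byte character (the next byte is a continuation byte).
function splitsCharacter(str, n) {
  const bytes = Buffer.from(str, 'utf8');
  return n > 0 && n < bytes.length && (bytes[n] & 0xc0) === 0x80;
}

// Hypothetical helper: true if the first n bytes plus the re-encoded remainder
// reproduce the original bytes exactly.
function roundTrips(str, n) {
  const bytes = Buffer.from(str, 'utf8');
  const { writingBuf } = releaseWritingBuf(str, bytes.length, n);
  return Buffer.concat([bytes.subarray(0, n), Buffer.from(writingBuf, 'utf8')]).equals(bytes);
}

for (const str of ['hello🌍world', 'abc中def']) {
  const corrupting = [];
  const midCharacter = [];
  for (let n = 1; n < Buffer.byteLength(str); n++) {
    if (!roundTrips(str, n)) corrupting.push(n);
    if (splitsCharacter(str, n)) midCharacter.push(n);
  }
  // Both lists are [6, 7, 8] for the emoji string and [4, 5] for the CJK string.
  console.log(str, 'corrupting:', corrupting, 'mid-character:', midCharacter);
}
```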
What is the expected behavior? Why is that the expected behavior?
The output must always preserve valid UTF-8 and must not silently
corrupt data.
When a partial write ends in the middle of a multi-byte UTF-8 character,
the remaining bytes for that character should be preserved and written
in a subsequent write, rather than being dropped or converted into
replacement characters.
This is the expected behavior because:
- `fs.write()` is documented to return partial byte counts.
- UTF-8 stream handling must be byte-safe across writes.
- Producing lone surrogates or dropping characters violates UTF-8 correctness and results in silent data corruption.
The current behavior breaks UTF-8 invariants and can corrupt log output in real-world streaming scenarios, such as container logging and pipe-based streams, which this module explicitly targets.
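One way to preserve the remaining bytes of a split character is to stop slicing the string by code units after a partial write and keep the unwritten tail as bytes instead. This is a hypothetical variant, not the fix adopted upstream. The sketch assumes the next write can accept a `Buffer` in `writingBuf` and that `len` is tracked in bytes; neither is confirmed against the actual `Utf8Stream` internals. `releaseWritingBufByteSafe` and `replay` are illustrative names.

```javascript
// Hypothetical byte-safe variant of releaseWritingBuf(). Assumptions: the next
// write accepts a Buffer, and len counts bytes rather than UTF-16 code units.
function releaseWritingBufByteSafe(writingBuf, len, n) {
  const bytes = typeof writingBuf === 'string' ? Buffer.from(writingBuf, 'utf8') : writingBuf;
  len = Math.max(len - n, 0);
  // subarray(n) keeps the trailing bytes of a split character, so they are
  // written next instead of being dropped or replaced.
  return { writingBuf: bytes.subarray(n), len };
}

// Replays a sequence of partial writes and collects the bytes the sink receives.
function replay(str, chunkSizes) {
  let state = { writingBuf: str, len: Buffer.byteLength(str) };
  const received = [];
  for (const size of chunkSizes) {
    const pending = typeof state.writingBuf === 'string'
      ? Buffer.from(state.writingBuf, 'utf8')
      : state.writingBuf;
    const n = Math.min(size, pending.length);
    received.push(pending.subarray(0, n));
    state = releaseWritingBufByteSafe(state.writingBuf, state.len, n);
  }
  return Buffer.concat(received);
}

console.log(replay('hello🌍world', [7, 100]).toString('utf8')); // hello🌍world
```

Converting to a `Buffer` costs an extra encode on each partial write; a real patch might convert only when `n` falls inside a character, and would also need to reconcile `len` with how `Utf8Stream` actually counts pending data.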
What do you see instead?
Instead of preserving valid UTF-8 output, the stream produces corrupted
results when a partial write splits a multi-byte character.
Specifically:
- For 3-byte UTF-8 characters (e.g. CJK), the character is silently
dropped from the output with no error.
- For 4-byte UTF-8 characters (e.g. emoji), the remaining buffer starts
with a lone UTF-16 surrogate, producing invalid UTF-8 on subsequent
writes.
No error or warning is emitted, resulting in silent data corruption in
the output stream.
Additional information
No response