Skip to content

Server: Performance: Store update changes per-user, rather than per-item#14712

Open
personalizedrefrigerator wants to merge 159 commits intolaurent22:devfrom
personalizedrefrigerator:pr/server/denormalize-changes
Open

Server: Performance: Store update changes per-user, rather than per-item#14712
personalizedrefrigerator wants to merge 159 commits intolaurent22:devfrom
personalizedrefrigerator:pr/server/denormalize-changes

Conversation

@personalizedrefrigerator
Copy link
Copy Markdown
Collaborator

@personalizedrefrigerator personalizedrefrigerator commented Mar 11, 2026

Summary

This pull request should:

  1. Simplify a complicated database query.
  2. Improve the performance of delta queries in Joplin Server.

May resolve #13655.
Follow-up to #13862.

IMPORTANT: This pull request makes Update changes per-user. As such, in a share with 100 participants, a change to a single item results in 100 entries written to the changes_2 table on the server. Previously, just one entry would be written. This pull request will make changes_2 grow faster than it did before.

Problems

  • The current delta query depends on data from two different tables.
  • The current delta query is complicated.
  • The changes table is capped at just over two billion entries in PostgreSQL.
    • The changes table uses a serial for its autoincrementing counter, which is four bytes and has a maximum of 2147483647.
    • This may or may not be an issue, depending on server traffic.

Solution

  • Create a changes_2 table for new changes, stop updating changes.
    • New changes will be written to changes_2. changes is an archive of old, pre-migration changes.
    • changes contains changes in the old format. changes_2 contains changes in the new format.
    • See server_delta_sync.md for further details.
  • Use a bigserial instead of a serial for the changes_2 counter.
  • Make changes per-user in changes_2, rather than per-item.
    • This allows simplifying the delta query.
    • See server_delta_sync.md for further details.

Notes

  • Migration must happen on startup: This change assumes that the migration is done on startup. changes_2's counter is set to start just after the last change in changes. If new changes are written to changes after the migration, there may be issues with querying changes.
  • Reverting the migration will be slow for many new changes: Reverting the migration has been observed to be very slow for a large number of local changes. The migrate down step involves copying and reformatting changes from changes_2 to the changes table. If changes_2 has a large number of entries, this migrate down step will be slow.
  • New changes have type Changes2, while old changes have type Change. The database type generator currently uses sqlts for generating type names, which doesn't depluralize changes_2.
    • If desired, generateTypes.ts can be updated to use a different name (e.g. Change2) for the new change type.
  • ChangeModel.old and ChangeModel.new contain logic for interacting with the old and new changes tables, respectively. ChangeModel is responsible for bridging between the two.
    • ChangeModel.new and ChangeModel.old were copied and modified from the original ChangeModel logic. At present, some logic (e.g. the compressOldChanges implementation) is roughly the same in both.
  • ChangeModel.new's recordChange avoids using BaseModel.save and instead applies all changes at once. This fixes a performance issue in shares with hundreds of participants.
    • chart: data points labelled unoptimized form a line with a steep slope. Data points labelled optimized form a line with a much gentler slope

      In the above chart, "unoptimized" (BaseItem.save-based change recording) has a much steeper slope than "optimized" (bulk db().insert-based change recording). For example, in a share with 300 participants, it takes about 10 seconds to creating an item, write 10 update changes, then deleting it. However, if changes are saved with a bulk call to db().insert, the same process takes about 300 ms.

    • Even with the optimization, writing a change now involves a call to allShareUserIds, which will impact write performance.

Comparison with #13862

#13862 This pull request
All update changes are denormalized New update changes are denormalized
Migrates all changes from changes to changes_2. This was a slow and very complicated migration. This pull request keeps existing changes in changes. The migration is much faster, since changes_2 can be initially almost empty.

Performance

Some performance testing has been done by:

  1. Generating a large number of changes (a combination of the sync fuzzer and populateDatabase debug calls).
  2. Running the benchmarkDeltaPerformance debug command.
  3. Copying the output .csv files to a safe place.
  4. Migrating the database down, removing the migration file, and checking out dev.
  5. Rebuilding and re-running benchmarkDeltaPerformance.
  6. Comparing the results from steps 2 and 5.

Graphing the results from step 2 and step 5 results in:
chart: new format is under 5ms, old format often above

This chart suggests that the new changes_2-based approach is much faster. However, the chart in the "old format" case is generated using data from the "migrate down" logic. As a result, there are the same number of "Update" changes with the old and new format data. Realistically, there would be fewer "Update" changes with the old format.

Update: For more up-to-date benchmarks, see #14712 (comment).

Testing

Fuzzing

This pull request has been tested with the sync fuzzer. To test the old/new changes table split, the server was stopped and the migration was run while running the sync fuzzer. This results in some changes in changes and some in changes_2 for user accounts under test by the fuzzer.

Steps:

  1. Checking out the dev branch.
  2. Starting the PostgreSQL dev container and a server instance.
    • The server instance was started with DB_CLIENT=pg yarn start --env dev from packages/server.
  3. Starting the sync fuzzer with the PostgreSQL backend.
    • Command: DB_CLIENT=pg yarn syncFuzzer start
    • The fuzzer auto-connects to the running server instance, but will print an "address already in use" warning.
  4. Letting the sync fuzzer run for 16 steps.
  5. While the fuzzer is still running:
    1. Switching to the pr/server/denormalize-changes branch.
    2. Recompiling the server.
  6. At the end of step 16, restarting the server while the fuzzer is still running.
    • At this point, there are just under 1300 changes in changes.
  7. After about 50 steps, there are still 1283 entries in the original changes table and 909 entries in changes_2.
  8. After about 135 steps, there are 4457 entries in changes_2 (and still 1283 entries in changes).

Note

Most testing was done before 6f7d608. However, it has been verified that the sync fuzzer runs successfully from 646f0fb for >= 90 steps, and again from 36cdad9 that the fuzzer runs successfully for >= 150 steps.

This commit modifies `populateDatabase.ts` to:
- Be runnable from the api/debug (dev-mode-only) endpoint.
- Create shares, in addition to users and items.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
packages/server/src/migrations/planned/regroup_changes.ts (1)

184-184: Prefer template literal for string construction.

Based on learnings: In all TypeScript files across the repository, prefer template literals for string construction instead of string concatenation with +.

♻️ Suggested refactor
-	logger.info('Migrating', await getTotal('changes') + await getTotal('changes_2'), 'changes... start', offset);
+	const totalChanges = await getTotal('changes') + await getTotal('changes_2');
+	logger.info(`Migrating ${totalChanges} changes... start ${offset}`);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/server/src/migrations/planned/regroup_changes.ts` at line 184,
Replace the string concatenation in the logger.info call with a template
literal: update the logging expression that currently uses
logger.info('Migrating', await getTotal('changes') + await
getTotal('changes_2'), 'changes... start', offset) to a single template literal
string that interpolates the totals from await getTotal('changes'), await
getTotal('changes_2') and the offset so the message is constructed with
backticks and ${...} placeholders while still calling logger.info with the final
interpolated string.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/server/src/migrations/planned/regroup_changes.ts`:
- Around line 190-200: The down migration's lookup of lastChange may be
undefined when changes_2 is empty, causing lastChange.counter to throw; modify
the down function (around the lastChange retrieval) to check for a missing
result (e.g., if (!lastChange) return or otherwise handle it) before using
lastChange.counter so the subsequent insert from db('changes_3') only runs when
a valid counter exists; update the logic in the down function referencing
lastChange, db('changes_2'), and db('changes_3') accordingly.

---

Nitpick comments:
In `@packages/server/src/migrations/planned/regroup_changes.ts`:
- Line 184: Replace the string concatenation in the logger.info call with a
template literal: update the logging expression that currently uses
logger.info('Migrating', await getTotal('changes') + await
getTotal('changes_2'), 'changes... start', offset) to a single template literal
string that interpolates the totals from await getTotal('changes'), await
getTotal('changes_2') and the offset so the message is constructed with
backticks and ${...} placeholders while still calling logger.info with the final
interpolated string.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 51d58e67-d1de-4fb3-b79c-8faa660f35b2

📥 Commits

Reviewing files that changed from the base of the PR and between 7c2b1ec and 0f5bc89.

📒 Files selected for processing (16)
  • packages/server/src/migrations/20260310123600_split_changes.ts
  • packages/server/src/migrations/planned/regroup_changes.ts
  • packages/server/src/models/ChangeModel/BaseChangeModel.ts
  • packages/server/src/models/ChangeModel/ChangeModel.new.ts
  • packages/server/src/models/ChangeModel/ChangeModel.old.ts
  • packages/server/src/models/ChangeModel/ChangeModel.test.ts
  • packages/server/src/models/ChangeModel/ChangeModel.ts
  • packages/server/src/models/ItemModel.ts
  • packages/server/src/models/ShareModel.ts
  • packages/server/src/models/UserItemModel.ts
  • packages/server/src/models/factory.ts
  • packages/server/src/routes/api/items.ts
  • packages/server/src/routes/api/shares.folder.test.ts
  • packages/server/src/services/database/types.ts
  • packages/server/src/tools/generateTypes.ts
  • packages/tools/cspell/dictionary4.txt
✅ Files skipped from review due to trivial changes (5)
  • packages/server/src/tools/generateTypes.ts
  • packages/tools/cspell/dictionary4.txt
  • packages/server/src/routes/api/shares.folder.test.ts
  • packages/server/src/models/UserItemModel.ts
  • packages/server/src/routes/api/items.ts
🚧 Files skipped from review as they are similar to previous changes (3)
  • packages/server/src/models/ItemModel.ts
  • packages/server/src/services/database/types.ts
  • packages/server/src/migrations/20260310123600_split_changes.ts

@coderabbitai coderabbitai bot added api Joplin API sync sync related issue and removed database Database issues server Issues related to Joplin Server performance Performance issues labels Mar 27, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
packages/server/src/migrations/planned/regroup_changes.ts (1)

216-227: ⚠️ Potential issue | 🟡 Minor

Handle empty changes_2 table in down migration.

If changes_2 is empty, lastCounter retrieval via getLastPreMigrationCounter will return undefined.counter, causing a runtime error. This scenario could occur if the migration runs before any new changes are recorded.

🐛 Proposed fix
 export const down = async (db: DbConnection) => {
-	const lastCounter = await getLastPreMigrationCounter(db);
+	const lastChange = await db('changes_2')
+		.select('counter')
+		.orderBy('counter', 'desc')
+		.first();

-	await db('changes_2').insert(
-		db('changes_3')
-			.select('*')
-			.where('counter', '>', lastCounter)
-			.orderBy('counter', 'asc'),
-	);
+	if (lastChange) {
+		await db('changes_2').insert(
+			db('changes_3')
+				.select('*')
+				.where('counter', '>', lastChange.counter)
+				.orderBy('counter', 'asc'),
+		);
+	} else {
+		// If changes_2 was empty, copy all changes_3 records
+		await db('changes_2').insert(
+			db('changes_3')
+				.select('*')
+				.orderBy('counter', 'asc'),
+		);
+	}

 	await db.schema.dropTable('changes_3');
 };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/server/src/migrations/planned/regroup_changes.ts` around lines 216 -
227, The down migration's use of getLastPreMigrationCounter() can return
undefined and cause a runtime error when accessing lastCounter; update the down
function to guard against an empty changes_2 by checking if lastCounter is
null/undefined before using it — if undefined, perform the insert from changes_3
without the .where('counter','>', lastCounter) filter (i.e., copy all rows),
otherwise use the existing where clause; reference the down function,
getLastPreMigrationCounter, and tables changes_2/changes_3 when making this
change.
🧹 Nitpick comments (1)
packages/server/src/migrations/planned/regroup_changes.ts (1)

200-200: Use template literal for string construction.

Per coding guidelines, prefer template literals over string concatenation.

🔧 Suggested fix
-	logger.info('Migrating', await getTotal('changes') + await getTotal('changes_2'), 'changes... start', offset);
+	const totalChanges = await getTotal('changes') + await getTotal('changes_2');
+	logger.info(`Migrating ${totalChanges} changes... start ${offset}`);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/server/src/migrations/planned/regroup_changes.ts` at line 200,
Replace the concatenated logger.info call with a single template literal string
so it uses template syntax instead of passing mixed args; locate the line that
calls logger.info in regroup_changes.ts (the call that references
getTotal('changes'), getTotal('changes_2') and offset) and construct one
template literal like `Migrating ${await getTotal('changes') + await
getTotal('changes_2')} changes... start ${offset}` before passing it to
logger.info.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@packages/server/src/migrations/planned/regroup_changes.ts`:
- Around line 216-227: The down migration's use of getLastPreMigrationCounter()
can return undefined and cause a runtime error when accessing lastCounter;
update the down function to guard against an empty changes_2 by checking if
lastCounter is null/undefined before using it — if undefined, perform the insert
from changes_3 without the .where('counter','>', lastCounter) filter (i.e., copy
all rows), otherwise use the existing where clause; reference the down function,
getLastPreMigrationCounter, and tables changes_2/changes_3 when making this
change.

---

Nitpick comments:
In `@packages/server/src/migrations/planned/regroup_changes.ts`:
- Line 200: Replace the concatenated logger.info call with a single template
literal string so it uses template syntax instead of passing mixed args; locate
the line that calls logger.info in regroup_changes.ts (the call that references
getTotal('changes'), getTotal('changes_2') and offset) and construct one
template literal like `Migrating ${await getTotal('changes') + await
getTotal('changes_2')} changes... start ${offset}` before passing it to
logger.info.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 24606cab-acb3-4618-8d14-90d5c674fec7

📥 Commits

Reviewing files that changed from the base of the PR and between 0f5bc89 and 4c99b6e.

📒 Files selected for processing (5)
  • packages/server/src/migrations/20260310123600_split_changes.ts
  • packages/server/src/migrations/planned/regroup_changes.test.ts
  • packages/server/src/migrations/planned/regroup_changes.ts
  • packages/server/src/models/ChangeModel/ChangeModel.old.ts
  • packages/server/src/models/ChangeModel/ChangeModel.test.ts

@coderabbitai coderabbitai bot added database Database issues server Issues related to Joplin Server performance Performance issues and removed api Joplin API sync sync related issue labels Mar 30, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/server/src/models/ChangeModel/ChangeModel.new.ts`:
- Around line 30-59: The method changesForUserQuery currently always declares a
return type of Promise<Changes2[]> but returns a count shape when doCountQuery
is true; update the function signature to use TypeScript overloads so one
overload returns Promise<{ total: number }[]> when doCountQuery is true and the
other returns Promise<Changes2[]> when doCountQuery is false, implement a single
implementation matching both overloads, and mirror these overloads in the
abstract declaration on BaseChangeModel so all subclasses (including
ChangeModel) have accurate type contracts; ensure call sites compile and that
the runtime still returns the count rows when doCountQuery is true and Changes2
rows otherwise.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5f6fe62d-28fc-44d1-9d08-695f43b06745

📥 Commits

Reviewing files that changed from the base of the PR and between 4c99b6e and 5927a5e.

📒 Files selected for processing (1)
  • packages/server/src/models/ChangeModel/ChangeModel.new.ts

@coderabbitai coderabbitai bot added sync sync related issue and removed database Database issues server Issues related to Joplin Server performance Performance issues labels Apr 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

sync sync related issue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Optimise changes table query speed

2 participants