Offline: Add batchId to batch metadata on resubmit #21767

markfields · 2024-07-03T03:59:14Z

Description

Batch ID is the key piece of information needed to detect/prevent duplication of changes due to a forked container.

We compute Batch ID as a tuple of original client ID and original Client Sequence number (of the batch start). We only stamp the batchId on batch metadata (of first message in the batch) on resubmit - if a batch is missing batchId we will be able to compute it from the client ID and Client Sequence Number.

The next PR will be more interesting - comparing batch IDs in PendingStateManager as incoming messages are processed. This just sets it up by adding batch ID to batch metadata on resubmit.

Breaking Changes

This PR breaks the format of the stashed pending local state. Back-compat support for reading old-format stashes is included. New format is backwards compatible (additive). This is slightly superfluous anyway since there is no production usage of the stashed schema.

Reviewer Guidance

For the end-to-end flow, see prototype PR #21266

Here are some tests I considered in my test plan but I am not planning to write at this point (i.e. scenarios without direct test coverage). Many of these will be covered in the next PR where we have end-to-end coverage of the forked container scenarios.

PendingStateManager
- Load from old stashed format. batchIdContext !== undefined
- Load twice from old stashed format. Batches are not correlated and duplication occurs
- Flushing two batches when disconnected should yield unique batch IDs
- Resubmitting twice preserves existing batchId
- Resubmitting after rehydrate preserves stashed batchIdContext or batchId
end-to-end
- batchId is present on sequenced messages being processed when expected

PendingStateManager is mostly done

tyler-cai-microsoft · 2024-07-09T18:07:58Z

packages/runtime/container-runtime/src/containerRuntime.ts

+		// Only include Batch ID if "Offline Load" feature is enabled
+		// It's only needed to identify batches across container forks arising from misuse of offline load.
+		const includeBatchId =
+			this.mc.config.getBoolean("Fluid.Container.enableOfflineLoad") ?? false;


It's interesting that we're reading this flag from two different places. I don't know if that's a good or bad thing. And it always makes me worry about loader-runtime back compat with older drivers and how they might behave.

Two different places in this PR? I thought it was just here.

This flag is read in multiple places and is the single gate for all Offline related scenarios. Good question about loader/runtime... Yeah... I see what you mean, the other hit is in Loader layer. I'll follow up, but leaving here for now since none of this has shipped yet.

tyler-cai-microsoft · 2024-07-09T18:10:20Z

packages/runtime/container-runtime/src/metadata.ts

 export function asBatchMetadata(metadata: unknown): IBatchMetadata | undefined {
-	return isBatchMetadata(metadata) ? metadata : undefined;
+	return metadata as IBatchMetadata | undefined;


For some reason, I favored the older solution as it seemed a little safer, but I don't know if this is important.

Yeah I had just added that type guard, but on second though didn't think the runtime checks were that interesting. The only risk is if two different parts of code used the same key batch or batchId and stored data with a different type and different semantics. That's not going to happen though... So let's not waste time with the runtime checks.

Since all IBatchMetadata props are optional, and this thing indicates it could be undefined, callers have to handle those cases.

Maybe it's better to return Partial<IBatchMetadata> | undefined to emphasize the uncertainty (basically just want callers to be able to get intellisense on valid property names + expected types)

msfluid-bot · 2024-07-09T18:31:23Z

⯅ @fluid-example/bundle-size-tests: +2.55 KB

Metric Name	Baseline Size	Compare Size	Size Diff
aqueduct.js	456.5 KB	457.11 KB	⯅ +625 Bytes
azureClient.js	554.23 KB	554.86 KB	⯅ +639 Bytes
connectionState.js	680 Bytes	680 Bytes	■ No change
containerRuntime.js	257.71 KB	258.3 KB	⯅ +606 Bytes
fluidFramework.js	388.04 KB	388.06 KB	⯅ +14 Bytes
loader.js	134.1 KB	134.11 KB	⯅ +14 Bytes
map.js	42.17 KB	42.17 KB	⯅ +7 Bytes
matrix.js	145.43 KB	145.44 KB	⯅ +7 Bytes
odspClient.js	522.23 KB	522.86 KB	⯅ +639 Bytes
odspDriver.js	97.17 KB	97.19 KB	⯅ +21 Bytes
odspPrefetchSnapshot.js	42.27 KB	42.29 KB	⯅ +14 Bytes
sharedString.js	162.51 KB	162.52 KB	⯅ +7 Bytes
sharedTree.js	378.51 KB	378.51 KB	⯅ +7 Bytes
Total Size	3.26 MB	3.26 MB	⯅ +2.55 KB

Baseline commit: 0450953

Generated by 🚫 dangerJS against 13d315d

packages/runtime/container-runtime/src/opLifecycle/batchManager.ts

From #21767

markfields added 3 commits July 3, 2024 00:17

Start adding batchId

90f1cc7

PendingStateManager is mostly done

Plumb batchId around if present

f50f63e

Only write batchId to op metadata if OfflineLoad is enabled

c406d9d

github-actions bot added area: runtime Runtime related issues base: main PRs targeted against main branch labels Jul 3, 2024

markfields added 5 commits July 3, 2024 08:00

Deal with most //* comments

000e8cb

Fix existing UTs

e38b0a3

Working on outbox UTs

4c5dcfa

Basic UT passes

143bd31

Move config check, and some cleanup

33adcb2

markfields force-pushed the o3/batch-id branch from 2c1acbb to 33adcb2 Compare July 8, 2024 17:03

markfields added 8 commits July 8, 2024 17:07

Revert unneeded changes

93c50fa

BatchManager tests

6b62404

Merge remote-tracking branch 'origin/main' into o3/batch-id

5ef9da6

Remove lint disable

db69243

Revert unneeded change

fa7c4da

Finish new outbox unit test

9fd0b04

Restore existing test

08f7ea8

Working through corner cases and tests

0727727

markfields marked this pull request as ready for review July 9, 2024 06:24

Fix assert

13d315d

markfields requested review from anthony-murphy, kian-thompson, dannimad, vladsud, a team, pragya91, agarwal-navin and jatgarg and removed request for a team July 9, 2024 17:48

markfields requested review from tyler-cai-microsoft and rajatch-ff July 9, 2024 17:48

tyler-cai-microsoft reviewed Jul 9, 2024

View reviewed changes

dannimad reviewed Jul 10, 2024

View reviewed changes

packages/runtime/container-runtime/src/opLifecycle/batchManager.ts Show resolved Hide resolved

dannimad approved these changes Jul 10, 2024

View reviewed changes

markfields merged commit adf81de into microsoft:main Jul 10, 2024
36 checks passed

markfields deleted the o3/batch-id branch July 10, 2024 23:45

This was referenced Jul 10, 2024

Offline: Compare incoming Batch ID with pending local Batch ID to detect container forking #21837

Draft

build fix (from stale CI) #21843

Merged

markfields added a commit that referenced this pull request Jul 11, 2024

build fix (from stale CI) (#21843)

65e6a0b

From #21767

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Offline: Add batchId to batch metadata on resubmit #21767

Offline: Add batchId to batch metadata on resubmit #21767

markfields commented Jul 3, 2024 •

edited

Loading

tyler-cai-microsoft Jul 9, 2024

markfields Jul 9, 2024

tyler-cai-microsoft Jul 9, 2024

markfields Jul 9, 2024

msfluid-bot commented Jul 9, 2024

Offline: Add batchId to batch metadata on resubmit #21767

Offline: Add batchId to batch metadata on resubmit #21767

Conversation

markfields commented Jul 3, 2024 • edited Loading

Description

Breaking Changes

Reviewer Guidance

tyler-cai-microsoft Jul 9, 2024

Choose a reason for hiding this comment

markfields Jul 9, 2024

Choose a reason for hiding this comment

tyler-cai-microsoft Jul 9, 2024

Choose a reason for hiding this comment

markfields Jul 9, 2024

Choose a reason for hiding this comment

msfluid-bot commented Jul 9, 2024

markfields commented Jul 3, 2024 •

edited

Loading