-
Notifications
You must be signed in to change notification settings - Fork 522
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SharedMatrix undefined cell values #18082
Comments
A node project to reproduce this issue can be found here https://github.com/SampoSyrjanen/shared-matrix-undefined-cell-test |
Hello! Thank you for the reproduction. I believe there are two issues going on here:
FYI mostly for awareness: I had to tweak the logic around |
@Abe27342, with turn-based flushing enabled, did you repro it using 2.0 bits or 1.0 bits? |
It repros on both, I ported the repro to our latest main branch and the behavior is basically the same AFAICT; maybe likelihood of occurrence is a bit different and I didn't notice but the same sort of behavior shows up. Agreed with trying to focus on 2.0. It's pretty clear with latest main bits and FlushMode switched to TurnBased that the remaining issues here are in the DDS realm. |
@DLehenbauer - any chance you can take a look? I have not look at details, but it feels like we should start with assumption that the bug is on SharedMatrix side. That said, it could be a bug in overall op processing pipeline (or even a service implementation). |
I'm already investigating the SharedMatrix side. But is it not true that even with SharedMatrix issues entirely fixed, if one has a client with while (true) {
matrix.insertRow(0, 1); // insert row at start
matrix.setCell(0, 1, "value"); // populate that row with a value
await sleep(1000);
} while another client observes, there's nothing stopping the observing client's op processing from yielding to application logic while in between a row insert and a cell set? This is the crux of my point that we'll need turn-based flushing for the repro provided here to work as expected (i.e. not observe 'partially applied states'). |
…#19211) ## Description Fixes a matrix resubmission bug, where zamboni might clean up permutation segments which in-flight ops might need to refer to. In general, using the minimum sequence number of the most recently processed message in isolation to determine the collab window for a DDS is not correct, since if that DDS has any in-flight ops and they get resubmitted, they will need to be rebased from their refSeq to the last processed deltaManager seq at the time of rebasing. This bug seems to generally apply to DDSes which clean up collab-window focused state. AB#6866 tracks fixing other DDSes / unifying the mechanism used here. While investigating this bug, I also found a bug related to the double-reconnect scenario after adding matrix fuzz tests. This bug was relatively easy to fix: the localOpMetadata used for a resubmitted op needs to update its refSeq to be the current seq rather than reuse the refSeq of the original submission time. This resolves SharedMatrix-side issues for #18082. AB#6356 --------- Co-authored-by: Abram Sanderson <absander@microsoft.com> Co-authored-by: Connor Skees <39542938+connorskees@users.noreply.github.com>
Hi, quick update here: #19211 fixes the SharedMatrix-side issue for this bug. This fixes bugs where row values may appear undefined indefinitely (the bug was an eventual consistency issue). It's still true that our azure APIs have FlushMode.Immediate set, and as long as that's true it will be the case that bits of your matrix may be temporarily undefined in the test case. With the example above which looks similar to the test repro: while (true) {
matrix.insertRow(0, 1); // insert row at start
matrix.setCell(0, 1, "value"); // populate that row with a value
await sleep(1000);
} a client may yield between the insertRow op and the setCell op, in which case that cell will be undefined until the setCell op is sequenced and processed. @andre4i could you follow-up here once we have more details on our plan for FlushMode and azure client? |
This is addressed by compatibility plans around the 2.0 release. After an application ecosystem saturates Fluid 2.0, they can pass a flag which will make the container use FlushMode.Immediate (among other improvements). See the changeset on #20997 for more details. |
SharedMatrix from "@fluidframework/matrix" sometimes has temporary undefined cell values. This happens in a highly parallel situation, when a client reads all values of a row of a shared matrix.
To reproduce this, you can create a bunch a clients in parallel processes, connect them to the same container, and start adding columns to a row from each client simultaneously. At the same time, clients can periodically read all column values of the row and check if they got an undefined value.
The expected behavior would be that when the values of the matrix row are read the values would never be undefined.
The text was updated successfully, but these errors were encountered: