I rewrote jepsen test to merge multiple micro ops into a single query (does UPSERT from SELECT with LEFT JOIN and GROUP BY) and started to get inconsistent read failures. It appears that multiple transactions claim the same list index and successfully overwrite it, even though only one transaction should have succeeded.
# append 630
{:index 300389, :time 240680658687, :type :ok, :process 1547, :f :txn,
:value [
[:append 192 630]
[:append 3 52]],
:modified-txn [
[:batch #{192 3} #jepsen.ydb.append.OperationBatch{:ops [[:append 192 630] [:append 3 52]], :reads [], :writes [[192 630] [3 52]], :readmap {}}]],
:debug-info [
[
{:tablet 72075186226226475, :gen 25, :read-version "v1710410837721/18446744073709551615", :op "read", :node 50003, :read-id 2, :snapshot-repeatable true, :lock-tx-id 844430958736899}
{:tablet 72075186226227048, :gen 2, :read-version "v1710410837721/18446744073709551615", :op "read", :node 50008, :read-id 1, :snapshot-repeatable true, :lock-tx-id 844430958736899}]
[
{:tablet 72075186226227048, :node 50008, :gen 2, :op "kqp-tx", :tx-id 281481004377038, :step 1710410838273, :version "v1710410838273/281481004377038"}
{:tablet 72075186226226475, :node 50003, :gen 25, :op "kqp-tx", :tx-id 281481004377038, :step 1710410838273, :version "v1710410838273/281481004377038"}]]}
# append 637
{:index 300437, :time 240689511468, :type :ok, :process 1159, :f :txn,
:value [
[:append 192 637]
[:append 196 45]
[:append 196 46]],
:modified-txn [
[:batch #{192 196} #jepsen.ydb.append.OperationBatch{:ops [[:append 192 637] [:append 196 45]], :reads [], :writes [[192 637] [196 45]], :readmap {}}]
[:commit nil [:append 196 46]]],
:debug-info [
[
{:tablet 72075186226227048, :gen 2, :read-version "v1710410838260/18446744073709551615", :op "read", :node 50008, :read-id 1, :snapshot-repeatable true, :lock-tx-id 562955991686003}
{:tablet 72075186226227048, :gen 2, :read-version "v1710410838260/18446744073709551615", :op "read", :node 50008, :read-id 1, :snapshot-repeatable true, :lock-tx-id 562955991686003}]
[
{:tablet 72075186226227048, :gen 2, :op "kqp-tx", :node 50008, :snapshot "v1710410838260/18446744073709551615", :tx-id 562955991686035, :snapshot-repeatable true, :lock-tx-id 562955991686035}
{:tablet 72075186226227048, :gen 2, :read-version "v1710410838260/18446744073709551615", :op "read", :node 50008, :read-id 0, :snapshot-repeatable true, :lock-tx-id 562955991686035}
{:tablet 72075186226227048, :node 50008, :gen 2, :op "kqp-tx", :tx-id 562955991686134, :version "v1710410838283/18446744073709551615"}]]}
I started seeing this after switching to complex query, so join and group by might be involved somehow.
I rewrote jepsen test to merge multiple micro ops into a single query (does UPSERT from SELECT with LEFT JOIN and GROUP BY) and started to get inconsistent read failures. It appears that multiple transactions claim the same list index and successfully overwrite it, even though only one transaction should have succeeded.
A good example of such failure:
History has this info on two appends:
What I could infer from this:
:append 192 630committed first atv1710410838273/281481004377038, it was a pretty straight forward batch, single query with a separate commit, which applied deferred effects directly at commit time:append 192 637was part of a batch query (without commit), it performed reads atv1710410838260/18446744073709551615with:lock-tx-id 562955991686003, deferred effects should have been buffered in memory[:append 196 46]with fused commit. Since it append reads from the table it needed to flush deferred effects, and you can see it happened with:tx-id 562955991686035and:lock-tx-id 562955991686035. Notice LockTxId is different (not 562955991686003) and matches TxId. This meansAcquireLocksTxIdwas 0, and this can only happen when transaction has absolutely no locks in the list.:read-idthat was not zero, apparently read was restarted for some reason, but current debug-info is only passed on success.I started seeing this after switching to complex query, so join and group by might be involved somehow.