Timestamps in column-level conflict resolution v5
Column-level conflict resolution depends on a timestamp column being included in the table.
Comparing column_modify_timestamp
and column_commit_timestamp
When you select one of the two column-level conflict detection methods, a column is added to the table that contains a mapping of modified columns and timestamps.
The column that stores timestamp mapping is managed automatically. Don't specify or override the value in your queries, as the results can be unpredictable. When possible, user attempts to override the value are ignored.
When enabling or disabling column timestamps on a table, the code uses DDL locking to ensure that there are no pending changes from before the switch. This approach ensures only conflicts with timestamps in both tuples or in neither of them are seen. Otherwise, the code might unexpectedly see timestamps in the local tuple and NULL in the remote one. It also ensures that the changes are resolved the same way (column-level or row-level) on all nodes.
column_modify_timestamp
When column_modify_timestamp
is selected as the conflict detection method, the timestamp assigned to the modified columns is the current timestamp, similar to the value you might get running select_clock_timestamp()
.
This approach is simple and, for many cases, it's correct, for example, when the conflicting rows modify non-overlapping subsets of columns. Its simplicity can, though, lead to unexpected effects.
For example, if an UPDATE
affects multiple rows, the clock continues ticking while the UPDATE
runs. So each row gets a slightly different timestamp, even if they're being modified concurrently by the one UPDATE
. This behavior, in turn, means that the effects of concurrent changes might get "mixed" in various ways, depending on how the changes performed on different nodes interleaves.
Another possible issue is clock skew. When the clocks on different nodes drift, the timestamps generated by those nodes also drift. This clock skew can induce unexpected behavior such as newer changes being discarded because the timestamps are apparently switched around. However, you can manage clock skew between nodes using the parameters bdr.maximum_clock_skew and bdr.maximum_clock_skew_action.
As the current timestamp is unrelated to the commit timestamp, using it to resolve conflicts means that the result isn't equivalent to the commit order, which means it probably can't be serialized.
When using current timestamps to order changes or commits, the conflicting changes might have exactly the same timestamp because two or more nodes happened to generate the same timestamp. This risk isn't unique to column-level conflict resolution, as it can happen even for regular row-level conflict resolution. The node id is used as the tiebreaker in this situation. The higher node id wins. This approach ensures that the same changes are applied on all nodes.
column_commit_timestamp
You can also use the actual commit timestamp specified with column_commit_timestamp
as the conflict detection method. This approach has the advantage of using the commit time, which is the same for all changes made in an UPDATE
.
Note
Statement transactions might be added in the future, which will address issues with mixing effects of concurrent statements or transactions. Still, neither of these options can ever produce results equivalent to commit order.
Inspecting column timestamps
The column storing timestamps for modified columns is maintained by triggers. Don't modify it directly. It can be useful to inspect the current timestamp's value, for example, while investigating how a conflict was resolved.
Note
The timestamp mapping is maintained by triggers, and the order in which triggers execute matters. If your custom triggers modify tuples and are executed after the pgl_clcd_ triggers, the modified columns aren't detected correctly. This can lead to incorrect conflict resolution. If you need to modify tuples in your triggers, make sure they're executed before the pgl_clcd_ triggers.
The following functions are useful for inspecting timestamps.
bdr.column_timestamps_to_text(bdr.column_timestamps)
This function returns a human-readable representation of the timestamp mapping and is used when casting the value to text:
bdr.column_timestamps_to_jsonb(bdr.column_timestamps)
This function turns a JSONB representation of the timestamps mapping and is used when casting the value to jsonb:
bdr.column_timestamps_resolve(bdr.column_timestamps, xid)
This function updates the mapping with the commit timestamp for the attributes modified by the most recent transaction if it already committed. This matters only when using the commit timestamp. For example, in this case, the last transaction updated the second attribute (with attnum = 2):