how to troubleshoot when things go wrong with Spock replication .

Article will provide some useful functions and queries that can be use for troubleshooting Spock replication issues

There are different ways to understand the cause when pgEdge Spock replication stops working and get stuck due to an issue. The replication can stop working due to different reasons some of them are very obvious like the pgEdge subscription not setup properly or not enabled. In order understand the root cause of the issue, we can rely on several ways provided by Spock that can help understand the issue.

Spock Functions

Spock provides various functions that can be used to check status of subscription or replication status of a table.

The following Spock function provides basic information about the subscription

spock.sub_show_status() - show replication status

It can be used to check if the subscription has been setup properly and is enabled. Here is the documentation link for this function. https://docs.pgedge.com/spock_ext/using_spock/sub_mgmt#spocksub_show_status

The following Spock function provides information about the replication status of a table.

spock.sub_show_table()

Please see this link for details https://docs.pgedge.com/spock_ext/using_spock/sub_mgmt#spocksub_show_table the function provide synchronisation status of a table.

Spock Tables

It is important to check whether the replication set has been configured properly and the desired table have been added to the replication set. Spock provides a list of tables that can be used to get important information about the replication cluster.

The documentation link provides the list of all the tables that are provided by Spock to get cluster information. It contains tables like Spock.node that list the pgEdge nodes in the cluster or Spock.tables that contains one row per table and the replication set it belongs to.

The following documentation link provides list of Spock tables :

https://docs.pgedge.com/spock_ext/spock_info

They are some additional Spock tables that provide replication statistics. they are currently not covered in documentation, will be added in due coruse.

select * from spock.channel_summary_stats ;
subid | sub_name | n_tup_ins | n_tup_upd | n_tup_del | n_conflict | n_dca
------------+-----------------+-----------+-----------+-----------+------------+-------
4075287733 | node1_sub_node2 | 261 | 0 | 0 | 0 | 0
0 | <output> | 947 | 0 | 0 | 0 | 0
(2 rows)

select * from spock.channel_table_stats ;
subid | relid | sub_name | table_name | n_tup_ins | n_tup_upd | n_tup_del | n_conflict | n_dca
------------+-------+-----------------+-------------+-----------+-----------+-----------+------------+-------
4075287733 | 16564 | node1_sub_node2 | spock.queue | 262 | 0 | 0 | 0 | 0
0 | 16564 | <output> | spock.queue | 955 | 0 | 0 | 0 | 0
(2 rows)

PostgreSQL logs

It is really important to save PostgreSQL database server output in a log file for diagnostic purposes. Log output tends to grow in volume, it is best to rotate the log files so that new log files are started and old ones removed after a reasonable period of time.

If the replication issue is identifiable using the above approaches, it is best to start scanning then PostgreSQL log file for potential issues. Sometime the replication can stop when trying to add duplicate rows in a table, this results in an error and causes replication to stop. We can look at the log file for errors causing the replication process to halt, fix the error and the restart the replication process.