Database schema assessment for pgEdge Platform compatibility

pgEdge Team performs a schema assessment of the Databases and schemas that would be part of Multi-master replication using pgEdge Platform.

Transitioning from a single Primary PostgreSQL server to a Multi-master pgEdge platform involves a significant shift in how data is managed and distributed across your database infrastructure. A crucial step in this transition is performing a comprehensive schema assessment. This assessment ensures that your database schema is compatible with and optimized for the distributed nature of a multi-master environment. Factors such as conflict resolution strategies, data type compatibility, indexing, and partitioning require careful consideration to leverage the full potential of a multi-master setup. 

Moreover, assessing your schema helps in identifying potential challenges and opportunities for optimization, ensuring a smooth and efficient migration to the pgEdge platform. This preparatory step is essential for organizations aiming to scale their database systems, improve availability, and enhance data resilience without compromising on consistency and performance.

pgEdge Team performs a schema assessment of the Databases and schemas that would be part of Multi-master replication using pgEdge Platform.

Prerequisite: We request the Customer Database team to send us the "schema-only" dump of the database in plain SQL format.

The main assessment areas we look at are:

  • a) pgEdge Platform Compatibility with current schema
  • b) Data Types and Sizes
  • c) Indexing
  • d) Database Extensions
  • e) Database Sequences

Following are the SQLs we use to perform the above steps:

Find Missing Indexes

CREATE OR REPLACE FUNCTION public.missing_fk_indexes () 
returns table (
  referencing_table regclass,
  fk_columns        varchar,
  table_size        varchar,
  fk_constraint     name,
  referenced_table  regclass
)
language sql as $$
  select
    -- referencing table having ta foreign key declaration
    tc.conrelid::regclass as referencing_table,
    
    -- ordered list of foreign key columns
    string_agg(ta.attname, ', ' order by tx.n) as fk_columns,
    
    -- referencing table size
    pg_catalog.pg_size_pretty (
      pg_catalog.pg_relation_size(tc.conrelid)
    ) as table_size,
    
    -- name of the foreign key constraint
    tc.conname as fk_constraint,
    
    -- name of the target or destination table
    tc.confrelid::regclass as referenced_table
    
  from pg_catalog.pg_constraint tc
  
  -- enumerated key column numbers per foreign key
  cross join lateral unnest(tc.conkey) with ordinality as tx(attnum, n)
  
  -- name for each key column
  join pg_catalog.pg_attribute ta on ta.attnum = tx.attnum and ta.attrelid = tc.conrelid
  
  where not exists (
    -- is there ta matching index for the constraint?
    select 1 from pg_catalog.pg_index i
    where 
      i.indrelid = tc.conrelid and 
      -- the first index columns must be the same as the key columns, but order doesn't matter
      (i.indkey::smallint[])[0:cardinality(tc.conkey)-1] @> tc.conkey) and 
      tc.contype = 'f'
    group by 
      tc.conrelid, 
      tc.conname, 
      tc.confrelid
    order by 
      pg_catalog.pg_relation_size(tc.conrelid) desc
$$;

select * from missing_fk_indexes();

Find data types used in the database

SELECT data_type, count(*) 
FROM information_schema.columns 
WHERE table_schema NOT IN ('information_schema','pg_catalog','pg_toast')
GROUP BY data_type ;

Find tables without PRIMARY KEY

select tab.table_schema,
       tab.table_name
from information_schema.tables tab
left join information_schema.table_constraints tco 
          on tab.table_schema = tco.table_schema
          and tab.table_name = tco.table_name 
          and tco.constraint_type = 'PRIMARY KEY'
where tab.table_type = 'BASE TABLE'
      and tab.table_schema not in ('pg_catalog', 'information_schema')
      and tco.constraint_name is null
order by table_schema,
         table_name;

Find SEQUENCES
SELECT t.oid::regclass AS table_name,
       a.attname AS column_name,
       s.relname AS sequence_name
FROM pg_class AS t
   JOIN pg_attribute AS a
      ON a.attrelid = t.oid
   JOIN pg_depend AS d
      ON d.refobjid = t.oid
         AND d.refobjsubid = a.attnum
   JOIN pg_class AS s
      ON s.oid = d.objid
WHERE d.classid = 'pg_catalog.pg_class'::regclass
  AND d.refclassid = 'pg_catalog.pg_class'::regclass
  AND d.deptype IN ('i', 'a')
  AND t.relkind IN ('r', 'P')
  AND s.relkind = 'S';

Database Sequences:
When the database has Sequences created, we recommend using snowflake sequences for the Multi-master replication. You can read more about the snowflake sequences in our Knowledge base article: https://knowledgebase.pgedge.com/getting-to-know-snowflake-extension