Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Downgrade may cause undefined behavior #7515

Open
Green-Chan opened this issue Feb 16, 2024 · 0 comments
Open

Downgrade may cause undefined behavior #7515

Green-Chan opened this issue Feb 16, 2024 · 0 comments

Comments

@Green-Chan
Copy link
Contributor

The problem comes from the fact that when a column is dropped, it's not actually removed from the database as if it never were there. Actually, it's marked dropped and renamed to "something that isn't likely to conflict". See RemoveAttributeById function in postgres.

See an example:

postgres=# CREATE TABLE t (a int);
CREATE TABLE
postgres=# ALTER TABLE t ADD COLUMN b int;
ALTER TABLE
postgres=# SELECT attname FROM pg_attribute WHERE attrelid = 't'::regclass AND attnum > 0;
 attname 
---------
 a
 b
(2 rows)

postgres=# ALTER TABLE t DROP COLUMN b;
ALTER TABLE
postgres=# SELECT attname FROM pg_attribute WHERE attrelid = 't'::regclass AND attnum > 0;
           attname            
------------------------------
 a
 ........pg.dropped.2........
(2 rows)

postgres=# ALTER TABLE t ADD COLUMN b int;
ALTER TABLE
postgres=# SELECT attname FROM pg_attribute WHERE attrelid = 't'::regclass AND attnum > 0;
           attname            
------------------------------
 a
 ........pg.dropped.2........
 b
(3 rows)

It seems that there is no official way to get rid of this column completely. See https://www.postgresql.org/message-id/flat/86FE93ED-B4E9-4907-8E5C-FB8A566115E7%40icloud.com

Deleted column is no longer seen when querying the table by usual SQL queries, but it's still present in pg_attribute and it should be handled right when dealing with tuples inside of the postgres.

I found six "DROP COLUMN" operations in downgrade scripts:

src/backend/columnar/sql/downgrades/columnar--10.2-1--10.1-1.sql:ALTER TABLE columnar.stripe DROP COLUMN first_row_number;
src/backend/distributed/sql/downgrades/citus--10.1-1--10.0-4.sql:ALTER TABLE pg_catalog.pg_dist_rebalance_strategy DROP COLUMN improvement_threshold;
src/backend/distributed/sql/downgrades/citus--11.3-1--11.2-2.sql:ALTER TABLE pg_catalog.pg_dist_background_task DROP COLUMN nodes_involved;
src/backend/distributed/sql/downgrades/citus--11.0-1--10.2-4.sql:ALTER TABLE pg_catalog.pg_dist_partition DROP COLUMN autoconverted;
src/backend/distributed/sql/downgrades/citus--11.0-1--10.2-4.sql:ALTER TABLE citus.pg_dist_object DROP COLUMN force_delegation;
src/backend/distributed/sql/downgrades/citus--12.2-1--12.1-1.sql:ALTER TABLE pg_catalog.pg_dist_transaction DROP COLUMN outer_xid;

Some of these table (maybe all of them) are treated as if they have fixed ('#define'd) number of columns and/or each column has a fixed place (index) in tuple. But that's not true if a column was dropped by downgrade script and then added back by upgrade script (see how in example above column 'b' became a third column).

Here is one of the examples of where undefined behavior is present. Add the following assert:

diff --git a/src/backend/distributed/metadata/metadata_cache.c b/src/backend/distributed/metadata/metadata_cache.c
index 402dedb8a..0315fd11c 100644
--- a/src/backend/distributed/metadata/metadata_cache.c
+++ b/src/backend/distributed/metadata/metadata_cache.c
@@ -1733,6 +1733,8 @@ BuildCitusTableCacheEntry(Oid relationId)
        bool isNullArray[Natts_pg_dist_partition];
 
        TupleDesc tupleDescriptor = RelationGetDescr(pgDistPartition);
+
+       Assert(tupleDescriptor->natts == Natts_pg_dist_partition);
        heap_deform_tuple(distPartitionTuple, tupleDescriptor, datumArray, isNullArray);
 
        CitusTableCacheEntry *cacheEntry =

And then run

SET citus.enable_version_checks TO 'false';
SET columnar.enable_version_checks TO 'false';

CREATE EXTENSION citus;
ALTER EXTENSION citus UPDATE TO '10.2-5';
ALTER EXTENSION citus UPDATE TO '12.1-1';

CREATE TABLE t (a int, b int);
SELECT create_distributed_table('t', 'a', 'hash');

You'll get an assertion failure, because tupleDescriptor->natts is one more than Natts_pg_dist_partition. If that is the case, there will be undefined behavior in heap_deform_tuple(), as it uses indexes in range [0; tupleDescriptor->natts) while valid indexes for datumArray[] and isNullArray[] are in range [0; Natts_pg_dist_partition).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant