Iceberg Tables

At this moment SnowDDL supports UNMANAGED Iceberg tables using external catalog only.

If you are looking for MANAGED Iceberg tables support, please leave a comment in this thread describing the use case: https://github.com/littleK0i/SnowDDL/discussions/81

How to create unmanaged Iceberg table with SnowDDL

Step 1: Create permission model which includes permissions for Iceberg tables

Example of custom permission model:

iceberg:
  inherit_from: default
  owner_create_grants:
    - ICEBERG_TABLE
  owner_future_grants:
    ICEBERG_TABLE: [OWNERSHIP]
  write_future_grants:
    ICEBERG_TABLE: [INSERT, UPDATE, DELETE, TRUNCATE]
  read_future_grants:
    ICEBERG_TABLE: [SELECT, REFERENCES]

Iceberg tables are not included in default permission model, since adding it to all schemas seems to introduce noticeable additional overhead during schema role creation.

Step 2: Create external volume

Similar to integration objects, EXTERNAL VOLUME should be created by ACCOUNTADMIN.

Documentation: https://docs.snowflake.com/en/sql-reference/sql/create-external-volume

Example:

SET STORAGE_BASE_URL = 's3://my-bucket/iceberg_glue/';
SET STORAGE_ROLE_ARN = 'arn:aws:iam::123:role/snowflake_role';
SET STORAGE_EXTERNAL_ID = '...';

CREATE OR REPLACE EXTERNAL VOLUME TEST_EXTERNAL_VOLUME_GLUE
STORAGE_LOCATIONS =
(
    (
        NAME = 'iceberg_glue'
        STORAGE_PROVIDER = 'S3'
        STORAGE_BASE_URL = $STORAGE_BASE_URL
        STORAGE_AWS_ROLE_ARN = $STORAGE_ROLE_ARN
        STORAGE_AWS_EXTERNAL_ID = $STORAGE_EXTERNAL_ID
    )
)
ALLOW_WRITES = FALSE;

Step 3: Create catalog

Similar to integration objects, CATALOG should be created by ACCOUNTADMIN.

Documentation: https://docs.snowflake.com/en/sql-reference/sql/create-catalog-integration

Example:

SET GLUE_CATALOG_NAMESPACE = 'iceberg_glue'
SET GLUE_ROLE_ARN = 'arn:aws:iam::123:role/snowflake_glue';
SET GLUE_CATALOG_ID = '123';
SET GLUE_REGION = 'us-east-1';

CREATE OR REPLACE CATALOG INTEGRATION TEST_CATALOG_GLUE
CATALOG_SOURCE = GLUE
CATALOG_NAMESPACE = $GLUE_CATALOG_NAMESPACE
TABLE_FORMAT = ICEBERG
GLUE_AWS_ROLE_ARN = $GLUE_ROLE_ARN
GLUE_CATALOG_ID = $GLUE_CATALOG_ID
GLUE_REGION = $GLUE_REGION
ENABLED = TRUE;

Step 4: Create schema with references to EXTERNAL VOLUME and CATALOG

Add parameters to config of schema which is supposed to contain Iceberg tables:

permission_model: iceberg
external_volume: text_external_volume_iceberg
catalog: test_catalog_glue

At this moment one schema may contain tables from one EXTERNAL VOLUME and one CATALOG only. If you have more external volumes and catalogs, please create more schemas.

Step 5: Create Iceberg tables

Create configs for individual Iceberg tables.

Example 1:

catalog_table_name: test_iceberg_table_1

Example 2:

metadata_file_path: test_iceberg_table_1/metadata/00001-cc112050-1448-4c2a-9e03-504e7f5fc62a.metadata.json
replace_invalid_characters: true

Rationale: Why is it not possible to specify EXTERNAL VOLUME and CATALOG individually for each table?

SnowDDL has built-in role hierarchy. It includes special "owner" roles which are created automatically for each schema. Objects in schema are supposed to be "owned" by schema owner role.

By specifying EXTERNAL VOLUME and CATALOG on schema level, we achieve two goals:

  1. Usage on EXTERNAL VOLUME and CATALOG are granted to schema owner role automatically.

  2. Each schema contains objects from one EXTERNAL VOLUME and one CATALOG, which helps to keep things clear and prevents mixture of various Iceberg table sub-types within one schema.

Last updated