Short hash explained

When SnowDDL compares desired config vs. actual metadata in Snowflake, it normally checks every property of every object.

But in some cases it is not practical due to:

lack of all object properties in output of SHOW and DESC commands;
a large number of properties and frequent changes in format of such properties;

SnowDDL implements a simplified method to compare objects of the following types:

EXTERNAL_FUNCTION
EXTERNAL_TABLE
FILE_FORMAT
FUNCTION
PIPE
PROCEDURE
STAGE
TASK

How does short hash work

When object is created for the first time, SnowDDL builds CREATE OR REPLACE <object> SQL and executes it.
Right after that SnowDDL calculates a short hash based on SQL text and stores it at the end of comment of created object. The original comment is preserved, but it is now a bit longer.
Later on, when existing object is "compared" by SnowDDL, it builds CREATE <object> SQL again, calculates a new short hash and tries to match it with existing short hash stored in object comment earlier. Unlike other properties, all SHOW commands always return comment.
If calculated short hash from SQL and existing short hash from comment are the same, SnowDDL assumes that object remains the same and returns NOCHANGE result.
If calculated short hash from SQL and existing short hash are different, SnowDDL assumes that at least one property was changed, and object is fully re-created.

How short hash is calculated

    import base64
    import hashlib
    
    def _short_hash(self):
        sha1_digest = sha1(str(self).encode('UTF-8')).digest()
        return f"#{urlsafe_b64encode(sha1_digest[:12])}"

Short hash is the first 12 bytes (out of 20) from SHA1, additionally encoded with url-safe base64 encode. It is a good balance between length and quality of hash.

The collisions are possible, but are extremely unlikely in real-world scenarios.

If Snowflake makes SHOW commands better in future, it will be possible to replace short hashes with full property checks, but it is not practical at this moment.

PreviousDependency management NextEnv Prefix explained

Last updated 3 years ago