IMGE Digital | Whitepaper | October 24, 2025

How to Use Snowflake's Snowpark to Load and Unzip Files

A guide to unzipping data in external stages using Snowpark's FileOperation methods

At IMGE, we work with many different marketing platforms and use Snowflake as our primary data warehouse. Several of the data providers and other companies we work with share data in our cloud buckets as zip archives as their standard practice. Because Snowflake does not support loading zipped directories, this data is tedious to load. Searching online did not reveal many solutions for unzipping data in external stages. The closest result was this guide in Snowflake's documentation about unzipping files, but it is only set up to work with internal stages and relies on imports in the function definition, which limits its usability for us.

For a while we bit the bullet and did what most people would probably do: set up a Lambda function in AWS to unzip files. But since we use Cloudflare's R2 for our primary cloud buckets, this meant needlessly copying files back and forth between R2 and S3 just to load them into Snowflake. What follows is not something novel that we discovered, just a feature of Snowpark that we think is under-discussed, both in the documentation and online, when it comes to handling non-tabular data in Snowflake.

Snowpark's FileOperation for Reading Files in External Stages

Eventually, we discovered that Snowpark has a simple way of loading any kind of file from an external stage. This meant that we could run Python's ZipFile module on the loaded file and unzip it directly from our R2 external stage. Snowpark's file operations provide several methods for just that.
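As one illustration of those methods, here is a minimal sketch of a stream-based variant that never touches local disk. It is not the approach used in the procedure built in the next section (which uses get and put with '/tmp'). It assumes `session` is a Snowpark Session and relies on FileOperation's get_stream and put_stream; the function name `unzip_via_streams` and its return value are hypothetical, and we have not verified this variant against every external stage type.

```python
import contextlib
import io
import zipfile


def unzip_via_streams(session, in_file: str, out_dir: str) -> list:
    """Unzip a staged archive in memory and upload each member under out_dir.

    `session` is assumed to be a snowflake.snowpark.Session. The function name
    and the returned list of stage paths are illustrative only.
    """
    # Buffer the whole archive in memory so zipfile can seek within it.
    with contextlib.closing(session.file.get_stream(in_file)) as stream:
        archive = io.BytesIO(stream.read())

    uploaded = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directory entries carry no data to upload
            target = f"{out_dir.rstrip('/')}/{info.filename}"
            # auto_compress=False asks Snowpark not to gzip the uploaded file.
            session.file.put_stream(
                io.BytesIO(zf.read(info)), target, auto_compress=False
            )
            uploaded.append(target)
    return uploaded
```

Because the whole archive is held in memory, this sketch only suits archives that fit comfortably within the procedure's memory limits.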

Creating the Unzipper Stored Procedure

By calling session.file.get in a Python stored procedure (SPROC), we can copy the file at zip_file_path from our external stage to a temporary directory ('/tmp'), making the zip file available to standard Python file operations.

session.file.get('@external_stage_name/zip_file_path', '/tmp')

Now that the zip file is accessible to our session, we can use the following code to unzip it.

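# in_file is the stage path of the zip file; out_dir is the target stage directory.
# Extract the downloaded archive into /tmp/res.
with zipfile.ZipFile(f"/tmp/{in_file.split('/')[-1]}", 'r') as f:
    f.extractall('/tmp/res')

# Walk the extracted tree and upload each file, keeping its subdirectory.
for path, _, files in os.walk('/tmp/res'):
    for extracted_f in files:
        session.file.put(f'{path}/{extracted_f}', f"{out_dir}/{path.split('/tmp/res')[-1]}")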

Using the ZipFile module, we extract the zip file into the directory /tmp/res in the session's local storage. We then use Python's os module to walk the extracted directory and upload each file back to our external stage, preserving the directory structure of the zip file. The session.file.put call is the FileOperation method that uploads each file to the new directory in our external stage, which ends up holding the fully unzipped contents. The full code for creating the unzipper stored procedure looks as follows:

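CREATE OR REPLACE PROCEDURE unzip(in_file string, out_dir string)
RETURNS string
LANGUAGE PYTHON
RUNTIME_VERSION = '3.10'
PACKAGES = ('snowflake-snowpark-python', 'zipfile-deflate64')
HANDLER = 'main'
EXECUTE AS CALLER
AS $$
import os
import zipfile_deflate64 as zipfile

def main(session, in_file, out_dir):
    # Download the zip file from the stage into the local /tmp directory.
    session.file.get(in_file, '/tmp')

    # Extract the archive into /tmp/res.
    with zipfile.ZipFile(f"/tmp/{in_file.split('/')[-1]}", 'r') as f:
        f.extractall('/tmp/res')

    # Walk the extracted tree and upload each file, keeping its subdirectory.
    uploaded = 0
    for path, _, files in os.walk('/tmp/res'):
        for extracted_f in files:
            session.file.put(f'{path}/{extracted_f}', f"{out_dir}/{path.split('/tmp/res')[-1]}")
            uploaded += 1

    # The procedure is declared RETURNS string, so report what was done.
    return f'Uploaded {uploaded} files to {out_dir}'
$$;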
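The extract-and-walk logic can also be exercised locally, without a Snowflake session, before deploying the procedure. The sketch below is an assumption-laden illustration rather than part of the procedure: `stage_destination` and `plan_uploads` are hypothetical helper names, and the helper builds the destination prefix with os.path.relpath instead of the string split used in the procedure, so it never produces a trailing or doubled slash.

```python
import os
import posixpath
import tempfile
import zipfile


def stage_destination(out_dir: str, extract_root: str, local_dir: str) -> str:
    """Map a local directory under extract_root to a stage prefix under out_dir."""
    rel = os.path.relpath(local_dir, extract_root)
    if rel == ".":
        # Files at the top level of the archive go straight into out_dir.
        return out_dir.rstrip("/")
    # Stage paths always use forward slashes, whatever the local OS uses.
    return posixpath.join(out_dir.rstrip("/"), *rel.split(os.sep))


def plan_uploads(zip_path: str, out_dir: str) -> list:
    """Extract zip_path to a temp dir; return sorted (file, stage prefix) pairs."""
    plan = []
    with tempfile.TemporaryDirectory() as root:
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(root)
        for path, _, files in os.walk(root):
            for name in files:
                rel_file = os.path.relpath(os.path.join(path, name), root)
                plan.append(
                    (rel_file.replace(os.sep, "/"),
                     stage_destination(out_dir, root, path))
                )
    return sorted(plan)
```

Each pair in the returned plan corresponds to one session.file.put call in the procedure: the local file to upload and the stage directory it would land in.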

The procedure can then be called with the full external stage path of the zip file to extract and the stage directory the extracted files should be written to, like so:

call unzip('@external_stage_name/zip_file_path.zip',
'@external_stage_name/out_dir_path')
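The same call can be made from Python client code through Snowpark's Session.call. This is a minimal sketch assuming an already-connected Snowpark `session` and that the procedure was created under the name unzip as above; the wrapper name `run_unzip` and its argument check are hypothetical additions for illustration.

```python
def run_unzip(session, zip_path: str, out_dir: str):
    """Call the unzip stored procedure via Snowpark's Session.call.

    `session` is assumed to be a connected snowflake.snowpark.Session.
    """
    # Both arguments are expected to be full stage paths such as '@stage/dir'.
    if not zip_path.startswith("@") or not out_dir.startswith("@"):
        raise ValueError("both arguments should be stage paths starting with '@'")
    # Session.call forwards the positional arguments to the procedure.
    return session.call("unzip", zip_path, out_dir)
```

For example, run_unzip(session, '@external_stage_name/zip_file_path.zip', '@external_stage_name/out_dir_path') mirrors the SQL call above.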

Other Potential Uses for Snowpark's FileOperation

Being able to read and write files, beyond just loading tabular data into tables, has unlocked several other uses for us. We can now store our ML models in cloud buckets instead of relying on internal stages to handle imports in Snowflake SPROCs. We have also found it useful for the geospatial datasets we sometimes work with, which require reading and writing shapefiles, WKT, and other files that gave us trouble when loading into Snowflake through internal stages with imports.
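As a rough sketch of the model-loading case, the helper below downloads a file from a stage with session.file.get and deserializes it. It assumes the model was saved with Python's pickle module, which may not match how any particular model is stored; the name `load_pickled_object` and the default '/tmp' directory are hypothetical choices for illustration.

```python
import os
import pickle
import posixpath


def load_pickled_object(session, stage_path: str, local_dir: str = "/tmp"):
    """Download a pickled object from a stage and deserialize it.

    Assumes `session` is a snowflake.snowpark.Session and that the staged
    file was written with pickle. Only unpickle files from sources you trust.
    """
    # Copy the staged file into the local directory.
    session.file.get(stage_path, local_dir)
    # The downloaded file keeps its original file name.
    local_path = os.path.join(local_dir, posixpath.basename(stage_path))
    with open(local_path, "rb") as fh:
        return pickle.load(fh)
```

Inside a SPROC handler this could be called as load_pickled_object(session, '@external_stage_name/models/model.pkl'), where the stage path is a made-up example.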

There are of course many other things file operations could open up beyond these, such as loading training data for vision models or any other data that does not fit the standard tabular formats that Snowflake's COPY statements support. We hope this guide has highlighted a part of the Snowpark toolkit that has been very useful to us here at IMGE.