In recent years, Data Warehousing has seen tremendous change with the widely adopted move toward cloud solutions. But even among the cloud solutions on offer today, innovation is driving new architectures with new, unique characteristics.
Snowflake has been particularly successful at drawing customers’ attention; part of its success can surely be attributed to its storage architecture. In the following, we shall have a closer look at Snowflake’s storage tier and lay out various ways in which it enables Snowflake to function in the way it does.
Snowflake is cloud-agnostic and runs on AWS, Azure and Google Cloud. For simplicity, we will only be referring to AWS Services, specifically S3, to illustrate the concepts; however, other cloud offerings work with Snowflake in very much the same way.
Snowflake stores data in new ways. As opposed to traditional solutions, it offers something that might be labeled a “shared-nothing architecture”. Usually, this architecture implies that all the data has to be replicated, incurring high cost in doing so. Snowflake’s approach is novel in the sense that while it does implement a shared-nothing architecture, the replication is transparent to the processing units.
Specifically, Snowflake uses AWS’s S3 storage service; processing is performed by so-called “Virtual Warehouses” (denoted by WH1 and WH2 in the illustration). Each warehouse uses a local cache on solid-state drives; these make requests to S3 unnecessary whenever the data has already been loaded before.
For every query to be executed, the warehouse checks whether the queried data is already present in its local cache; only missing data is requested from S3. Once data is received from S3, the local cache is populated and the result is sent back to the end user.
This is the first instance in which Snowflake differs from traditional solutions. By leveraging S3, Snowflake uses a high-performance, high-availability replicated data storage solution without introducing additional complexity into its architecture. Note that traditional solutions like Redshift or Spark require data shuffling among nodes to redistribute data to the nodes that need it. Because data location is part of other storage systems’ architecture, distribution of data among the nodes (in DWH parlance “partitioning”) has to be defined and maintained. This introduces both extra complexity and possible delays in query execution, since nodes are responsible for shuffling data and executing the query at the same time. This very step is not necessary for Snowflake, reducing complexity and making query execution times less susceptible to delays caused by data shuffling. In short, Snowflake combines the advantages of shared nothing – low maintenance and high performance – with those of shared disk, i.e. a single source of truth.
Traditional databases like MySQL or PostgreSQL use row-oriented data formats, i.e. columns belonging to the same row reside in spatially contiguous memory. This format is ideal for transactional databases, where entities are usually requested as a whole and aggregations on columns are rarely run. In particular, updating columns depending on other columns’ values becomes trivial with this approach, since data access is kept local in those cases.
Like many other modern analytical databases, Snowflake uses a column-based storage format. That is to say, blocks of contiguous memory contain values of the very same column. In this way, aggregations can be implemented very efficiently and algorithms like Map-Reduce can easily be used for parallelization.
Furthermore, because blocks only cover individual columns and hence data of the same distribution, more efficient compression is possible. While being an interesting implementation detail, this architecture is opaque to the end user. Note how this is one of the many aspects in which Snowflake excels at simplicity – for other warehousing solutions like Redshift, column compression optimization and data skew are serious concerns that would need to be addressed here.
Partitions are instrumental in understanding the impact of storage on performance in Snowflake. “Micropartitions” are the atomic blocks of data that Snowflake deals with. Uncompressed, they can span anything from 50 to 500 megabytes.
Each Micropartition carries a number of statistics on the column values it contains – among others, the minimum and maximum value. When searching partitions, Snowflake will skip partitions for which it can ensure that no relevant data is contained within. For instance, if a certain partition only contains data for transactions in March 2020, then this data is irrelevant for queries on April 2020. This process is called “pruning” and is implemented using the aforementioned metadata.
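As a sketch, consider a hypothetical transactions table with a transaction_date column; a range filter like the one below is exactly what pruning exploits:

```sql
-- Hypothetical table; Snowflake tracks MIN/MAX statistics per
-- Micropartition and per column.
SELECT SUM(amount)
FROM   transactions
WHERE  transaction_date >= '2020-04-01'
  AND  transaction_date <  '2020-05-01';

-- Micropartitions whose MAX(transaction_date) lies before 2020-04-01
-- (e.g. partitions holding only March 2020 data) are pruned and
-- never read from S3 or the local cache.
```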
Snowflake offers functionality to analyse the state of individual tables’ partitions by providing statistics on the degree of overlap between partitions and by exposing partition pruning statistics in the query profile.
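For example, the overlap statistics can be queried via the SYSTEM$CLUSTERING_INFORMATION function; the table and column names below are hypothetical:

```sql
-- Returns a JSON document with, among others, the average overlap
-- and average depth of the Micropartitions for the given column(s).
SELECT SYSTEM$CLUSTERING_INFORMATION('transactions', '(transaction_date)');
```

A low average depth indicates that partitions rarely overlap on that column, i.e. pruning will be effective for filters on it.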
By design, Snowflake is a database requiring little to no maintenance. What lifts the burden of manual optimization off users’ shoulders also deprives them of the ability to perform it. One of the very few optimization techniques available in Snowflake is the ordered insert of data. Inserting data ordered by the columns on which we filter enables Snowflake to prune partitions effectively, reduce the number of partitions scanned and thus reduce querying time. Due to immutability – a concept we are going to cover in a minute – partitions do not change once they have been created. In particular, it is not possible to change the order of data once it has been written to disk. Therefore, it is advisable to always insert data in a suitable order to enable Snowflake to perform this optimization efficiently.
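A minimal sketch of an ordered insert, again assuming hypothetical table names:

```sql
-- Loading data pre-sorted by the most common filter column keeps the
-- min/max ranges of the resulting Micropartitions narrow and largely
-- disjoint, which makes later pruning effective.
INSERT INTO transactions
SELECT *
FROM   staging_transactions
ORDER BY transaction_date;
```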
Micropartitions are immutable. In other words, partitions are never updated; rather, whenever an update to any row in a partition is to be applied, that partition is loaded, the data is updated and a new partition is written to disk. That implies that for any write access – regardless of how small and insignificant it is – an entire partition will be written to disk. In very much the same way, single INSERT statements will also trigger the creation of a Micropartition. Naturally, Snowflake tries to batch INSERTs up, but when issuing isolated INSERT statements, this may not always be possible. This yet again underlines the importance of batch processing for Snowflake – it is simply not built to deal with small amounts of data.
It is noteworthy that this design choice is critical to a number of features in Snowflake, first and foremost Time Travel. Making partitions immutable makes it easy to provide historical data on tables. It also means that data overhead can be considerable in Snowflake and must be evaluated upfront. Every TRUNCATE on a table will retire all Micropartitions currently belonging to that table. Those Micropartitions are garbage-collected on a regular basis, but will continue to incur cost until they exceed the Time Travel interval defined for that particular resource. Hence, usage of the TRANSIENT feature (which limits Time Travel and forgoes the Fail-safe period) should be considered for tables without a strong need to retain history.
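For tables where history is not needed, the retention window can be kept minimal; a sketch with hypothetical names:

```sql
-- Transient tables forgo Fail-safe, and Time Travel retention can be
-- reduced to zero, so retired Micropartitions stop incurring storage
-- cost as soon as they are garbage-collected.
CREATE TRANSIENT TABLE staging_transactions (
    transaction_date DATE,
    amount           NUMBER(12, 2)
)
DATA_RETENTION_TIME_IN_DAYS = 0;
```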
Object cloning is quite a novel feature in the realm of data warehousing and unique to Snowflake. Due to the aforementioned architecture of immutable Micropartitions, tables are essentially just a list of Micropartitions. This implies that when a table is copied, unless data is changed, it is sufficient to create a new table with the same list of underlying Micropartitions. In particular, it is not necessary to physically copy data unless updates to a Micropartition are to be applied. This approach is commonly referred to as “copy-on-write” and is often used in the context of lazy evaluation.
Due to this implementation, cloning of objects in Snowflake is very cost-effective; even databases with hundreds of objects and data can be cloned in a matter of minutes. The bottleneck here is rather the sheer number of objects than the actual amount of memory “copied” over.
With this feature at one’s disposal, approaches that were previously simply infeasible become possible. What previously could easily take a day now becomes a matter of minutes. For instance, it is now possible to quickly clone a production database for the purpose of system testing new features. Likewise, databases can be treated very much like branches in a versioning system, where all development environments derive – or clone – from a common master branch. All of these very interesting applications hinge on the immutability of Micropartitions.
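The branching workflow described above boils down to a few statements; database and table names here are hypothetical:

```sql
-- Zero-copy clone: the new database references the existing
-- Micropartitions; data is only physically copied once either side
-- modifies a partition (copy-on-write).
CREATE DATABASE dev_env CLONE production;

-- Individual tables can be cloned as well, optionally at a past
-- point in time via Time Travel:
CREATE TABLE transactions_backup CLONE transactions
  AT (OFFSET => -3600);  -- state as of one hour ago
```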
Cost of a Brave New World
While Snowflake relies on AWS S3 for data storage, it manages the storage service itself and bills customers differently from the official AWS pricing. Currently, for the Frankfurt region on AWS, a terabyte of data runs at $45 per month for on-demand storage and at about $25 per terabyte per month for capacity storage. The latter is Snowflake’s name for “reserved capacity”, i.e. pricing for long-term commitments. Thus, the actual S3 pricing is only passed on to customers if they choose the long-term commitment.
Note that it is in principle possible to keep data on one’s own S3 storage account and reference the data in Snowflake using the EXTERNAL TABLES feature. However, there are a number of downsides to this approach:
Firstly, data is then not managed in terms of Micropartitions, possibly incurring a high performance penalty. After all, the Snowflake query optimizer will not be able to apply any optimization based on Micropartition metadata.
Secondly, the data is read-only, so neither write nor update operations can be performed on those tables.
Thirdly, updates to external tables are passed to Snowflake via S3 event notifications. Using this architecture, it is quite hard to enforce strict database consistency (i.e. not just eventual consistency).
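A sketch of such a setup, with hypothetical stage, integration and table names:

```sql
-- The stage points at a customer-managed S3 bucket; credentials are
-- supplied via a storage integration configured separately.
CREATE STAGE my_ext_stage
  URL = 's3://my-bucket/transactions/'
  STORAGE_INTEGRATION = my_s3_integration;

-- External tables expose file contents via the VALUE variant column;
-- AUTO_REFRESH relies on the S3 event notifications mentioned above.
CREATE EXTERNAL TABLE ext_transactions (
    transaction_date DATE   AS (VALUE:transaction_date::DATE),
    amount           NUMBER AS (VALUE:amount::NUMBER)
)
LOCATION    = @my_ext_stage
FILE_FORMAT = (TYPE = PARQUET)
AUTO_REFRESH = TRUE;
```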
This article hopefully shed some light on the inner workings of Snowflake’s storage tier and how it helps Snowflake offer a well-performing warehousing solution. This article is part of a blog series on Snowflake covering a number of different aspects.
For more information on this topic, you can attend our regular online seminar for free or book a seminar on the topic with our experts.