"Data in the 21st century is like oil in the 18th century." Snowflake makes it possible to get the maximum out of this resource. Snowflake is Big Data without its complexity. A data warehouse migration aims to satisfy an organization's growing need for data, and Snowflake is a great solution to achieve precisely that thanks to its architecture, its features and its partners, independently of the size of the organization. Startups benefit from Snowflake's flexible scaling, which lets the platform grow along with them, as well as from its low-cost pay-as-you-go model. An enterprise may use Snowflake to improve its existing solutions, reducing costs and complexity by migrating all of its data to a single platform where it can be stored and analyzed without having to move it physically. This makes data-driven business decisions easier and more precise.

Snowflake is changing the game

For some years now, every company has had to decide whether to store its data on premise or in the cloud (or in a hybrid setup). Choosing the on-premise solution required a lot of money and time, since the company had to build and manage the necessary infrastructure without being able to make use of it in the beginning. Cloud data warehouses, on the other hand, were limited to the offerings of the big providers: Amazon (Redshift on AWS), Microsoft (Azure) and Google (BigQuery). Snowflake is now changing this game. As Software as a Service (SaaS), there is no need to spend money on infrastructure or on managing it. Snowflake is designed for the cloud: it is fully managed, and with its pay-as-you-go model you only pay for the resources you actually use. Furthermore, it runs on AWS, Azure or Google Cloud (the latter since mid-2020) and can make use of their services, such as Amazon S3 or Azure Blob Storage. The illustration below explains the cloud service models with a "Hamburger as a Service" analogy. (Image taken from D. Anoshin, D. Shirokov, D. Strok, "Jumpstart Snowflake")

Figure 1 Hamburger as a Service

Snowflake's advantage through architecture

The central promise of the cloud was to give companies the opportunity to pay only for the resources they actually use. Snowflake honours this intention, and a look at its architecture shows clearly how.

Figure 2 Snowflake Multi-Cluster Shared Data Architecture

Snowflake's unique Multi-Cluster Shared Data Architecture combines the shared-disk and the shared-nothing architecture, taking the best properties of both, and splits the system into three layers: the Cloud Services, the Compute resources and the Storage resources. The Cloud Services layer can be compared to a brain: it is a collection of services that coordinates the activities in the data warehouse, for instance optimizing queries, and it is responsible for security. The Compute resources, also known as Query Processing, which include the Virtual Warehouses, can be described as the muscle of the system. The size of a warehouse depends entirely on the needs of the organization, and separate warehouses allow different workloads to be isolated from each other. The third and last component is the Storage resources. They store any kind of data, replicated synchronously across multiple disk devices and three availability zones, which guarantees high availability. Furthermore, data is automatically split into micro-partitions, the physical data files that make up Snowflake's logical tables. Each micro-partition holds 50 to 500 MB of uncompressed data (roughly 16 MB compressed). These micro-partitions enable some of Snowflake's most compelling features, like Time Travel and zero-copy cloning, which will be covered later.
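As a sketch of how this separation of compute and shared storage plays out in practice, the SQL below creates two independent warehouses that operate on the same data; all object names (loading_wh, reporting_wh, sales.public.orders, @sales_stage) are illustrative:

```sql
-- Two independent warehouses, sized for their respective workloads.
CREATE WAREHOUSE loading_wh   WAREHOUSE_SIZE = 'MEDIUM';
CREATE WAREHOUSE reporting_wh WAREHOUSE_SIZE = 'SMALL';

-- The ETL session writes using its own compute ...
USE WAREHOUSE loading_wh;
COPY INTO sales.public.orders FROM @sales_stage;

-- ... while analysts read the very same table on separate compute,
-- without competing with the load job for resources.
USE WAREHOUSE reporting_wh;
SELECT COUNT(*) FROM sales.public.orders;
```

Because both warehouses see the same centrally stored micro-partitions, no data has to be copied between them.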

But why is this architecture so special?

As mentioned earlier, an on-premise solution requires substantial spending on infrastructure right at the beginning of your project, without any immediate benefits. Snowflake's architecture lets you scale storage and compute resources independently of each other, and gives you the opportunity to pay only for what you use at any given moment.

Storage can be bought as on-demand storage or as pre-purchased fixed capacity. On-demand storage is a little more expensive per terabyte but more flexible, and its cost corresponds to actual usage. Pre-purchased capacity has a lower price per terabyte, but if you miscalculate the storage you will need, it is not necessarily the more economical choice.
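To choose between the two options it helps to know your actual consumption. Snowflake exposes this in the shared SNOWFLAKE database; the following query is a sketch using the standard ACCOUNT_USAGE.STORAGE_USAGE view:

```sql
-- Daily account-level storage over the last 30 recorded days,
-- converted from bytes to terabytes. Covers table storage,
-- internal stages and Fail-safe.
SELECT usage_date,
       storage_bytes  / POWER(1024, 4) AS table_storage_tb,
       stage_bytes    / POWER(1024, 4) AS stage_tb,
       failsafe_bytes / POWER(1024, 4) AS failsafe_tb
FROM   snowflake.account_usage.storage_usage
ORDER BY usage_date DESC
LIMIT  30;
```

A few weeks of this data gives a realistic baseline before committing to a capacity contract.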

At Snowflake you pay for compute resources per second (with a 60-second minimum) whenever a warehouse is running, and warehouses can be provisioned for tasks of arbitrary load. Imagine a company that runs an expensive ETL process every night, while during the day there are only a few light ETL processes. In Snowflake you can use a warehouse that is small (and therefore inexpensive) during the day to handle the few processes needed during that time. At night you can increase the size of the warehouse (this is called scaling up, or vertical scaling) to improve performance. After the work is done, the warehouse automatically suspends and only resumes once there are more queries to handle. In the morning the warehouse can be scaled down again for the daily work. Since you only pay for a running warehouse, the auto-suspend behaviour minimizes your costs during the day, because the warehouse simply stops when there is no work to do.
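The day/night pattern described above boils down to a few statements; the warehouse name etl_wh is illustrative:

```sql
-- Small and cheap for the light daytime load.
CREATE WAREHOUSE etl_wh
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND   = 60      -- suspend after 60 s of inactivity: no cost while idle
  AUTO_RESUME    = TRUE;   -- wake up automatically when a query arrives

-- Before the nightly ETL run: scale up for performance ...
ALTER WAREHOUSE etl_wh SET WAREHOUSE_SIZE = 'XLARGE';

-- ... and back down again in the morning.
ALTER WAREHOUSE etl_wh SET WAREHOUSE_SIZE = 'XSMALL';
```

Resizing takes effect without moving any data, precisely because storage and compute are separated.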

Figure 3 Snowflake Warehouse Sizes
Figure 4 Traditional Data Architecture
Figure 5 Modern Data Architecture

Furthermore, Snowflake caches information at several levels. Query results are kept for 24 hours in the Result Cache, which lives in the Cloud Services layer, so the same query can be run over and over without spinning up a warehouse. The Metadata Cache, also in the services layer, enables Snowflake to answer general questions about the data without touching compute at all. These concepts reduce the accumulated costs, since repeated queries do not need a warehouse, and the results of long-running queries can be retrieved again without re-running them.
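A brief sketch of both caches in action; the table and column names are illustrative, and exactly which queries are served from metadata depends on the query shape:

```sql
-- Typically answered from micro-partition metadata, without a warehouse:
SELECT COUNT(*) FROM sales.public.orders;
SELECT MIN(order_date), MAX(order_date) FROM sales.public.orders;

-- A regular query needs a warehouse the first time ...
SELECT customer_id, SUM(amount)
FROM   sales.public.orders
GROUP  BY customer_id;

-- ... but re-running the identical statement within 24 hours is served
-- from the result cache, even while the warehouse is suspended.
```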

It’s not just the architecture

Besides Snowflake's unique architecture, there are great features like zero-copy cloning.

Zero-copy cloning lets you duplicate even large data sets in seconds, without using additional storage. Snowflake achieves this by taking a snapshot of the source the instant the clone is created. Since the cloned object is writable and independent of its source, you can manipulate its data without affecting the original. Only once the clone is changed does it start to consume storage, and only for the underlying data that actually differs. Cloning is therefore a great way to run tests without changing, or even accidentally deleting, the original data. But even deleting data is not a big deal in Snowflake, thanks to its Time Travel feature: it allows you to recreate the data the way it was at a specific point in time, or before a particular query ran. You can even combine Time Travel and cloning to recreate an older set of data and compare the present data to the old one.
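Both features come down to short SQL statements; the table names below are illustrative:

```sql
-- Zero-copy clone: instant, no extra storage until the clone diverges.
CREATE TABLE orders_dev CLONE orders;

-- Time Travel: query the table as it looked one hour ago ...
SELECT * FROM orders AT (OFFSET => -3600);

-- ... or materialize that historical state and compare it to the present:
CREATE TABLE orders_1h_ago CLONE orders AT (OFFSET => -3600);
SELECT * FROM orders
MINUS
SELECT * FROM orders_1h_ago;

-- Even a dropped table can be restored within the retention period:
UNDROP TABLE orders;
```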

Besides Snowflake's unique architecture and great features, there are more reasons to migrate to Snowflake. Snowflake has high security standards, for instance multi-factor authentication and encryption. It supports structured and semi-structured data like XML, JSON or Parquet, which can easily be transferred to your data warehouse and transformed with partner tools like Fivetran, Matillion or Informatica. With more expenditure you may implement your own Python solution for loading and transforming your data as well. Aside from these transformation and loading partners, you can use Tableau as a BI tool to visualize your data, or Spark for data science projects. There are even more options than the aforementioned partners, since tools like dbt can connect to Snowflake via community solutions.
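The semi-structured support mentioned above can be sketched as follows: raw JSON lands in a VARIANT column and is queried with path expressions, so no upfront schema is needed (table and field names are illustrative):

```sql
-- A VARIANT column accepts arbitrary JSON documents.
CREATE TABLE raw_events (payload VARIANT);

INSERT INTO raw_events
SELECT PARSE_JSON('{"user": {"name": "Ada"}, "action": "login"}');

-- Path syntax plus a cast pulls typed values out of the document.
SELECT payload:user.name::STRING AS user_name,
       payload:action::STRING    AS action
FROM   raw_events;
```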

All in all, Snowflake has solved many issues of other data warehouses. As Software as a Service, it eliminates the complexity of managing infrastructure and data. Through its nearly unlimited scalability it supports all kinds of data (fun fact aside: the biggest table in Snowflake has 64 trillion rows and belongs to Adobe), and with its elasticity, scaling both horizontally and vertically to improve performance and concurrency, it is able to handle any situation. Furthermore, Snowflake's Multi-Cluster Shared Data architecture makes it possible for everyone to have access to all the data.