Cloud Data Warehousing: Which Technology should you use?

The fourth industrial revolution – that is, the incorporation of smart technologies into industrial practices – has driven many companies to reinvent their IT solutions over the past decade. Data obtained from many different sources, such as mobile devices, Internet of Things (IoT) applications, smart sensors, or customer profiling solutions, requires flexible data warehouses to store, enrich, and expose it to end users. Furthermore, the amount of data grows rapidly. Traditional on-premises data warehouses would need to be extended regularly, resulting in considerable administrative overhead.

Cloud data warehouses remedy this shortcoming by their native flexibility. Further, the modern design of typical cloud data warehouses allows for flexible integration of data from a variety of sources – structured, semi-structured or unstructured – usually by leveraging other services of the respective cloud provider.

Several cloud data warehouse solutions have emerged in the past decade, each building upon a different strategy. Prominent players in the field are AWS Redshift, Azure Synapse Analytics, Google BigQuery, and Snowflake. But which one should you choose, given your use case and the IT ecosystem already in place in your company? To provide a basis for this decision, we will compare these four technologies at a high level, considering aspects such as scalability, maintainability, and pricing models. But before addressing these details, first some general remarks on the four cloud data warehouses:

AWS Redshift

Redshift was first released by AWS in 2012 and is based on PostgreSQL. It uses a shared-nothing architecture, meaning that data is distributed across multiple nodes within a cluster. Each node operates only on the data stored on its local disk, and the results are merged by a leader node. This data locality leads to high performance but requires some maintenance.
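The shared-nothing idea can be sketched in a few lines of Python. The hash function, node count, and distribution key below are purely illustrative (Redshift's actual row-distribution internals differ), but the principle is the same: each node scans only its local slice, and a leader merges the partial results.

```python
from collections import defaultdict
import hashlib

def node_for(key: str, num_nodes: int) -> int:
    """Hash a distribution key to a node index (illustrative only)."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_nodes

rows = [{"customer_id": f"c{i}", "amount": i * 10} for i in range(8)]

# Distribute rows across 4 nodes by hashing the distribution key.
slices = defaultdict(list)
for row in rows:
    slices[node_for(row["customer_id"], num_nodes=4)].append(row)

# Each node aggregates its local slice; the leader merges the partials.
partials = {node: sum(r["amount"] for r in local) for node, local in slices.items()}
total = sum(partials.values())
print(total)  # 280 — same result as a single-node scan
```

A skewed distribution key would leave some nodes with far more rows than others, which is exactly why data distribution matters for performance in this architecture.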

Azure Synapse Analytics

The first version of Azure's data warehouse solution was released in 2016; a second generation of Synapse Analytics was then made available in 2018. The data warehouse uses a multi-cluster shared-data approach, in which all nodes within a cluster share the same data pool. This reduces maintenance effort but introduces some overhead for network transfer.

Google BigQuery

BigQuery is a fully serverless data warehouse designed by Google and based on Google's Dremel technology. Transient compute resources operate on a shared pool of data only for as long as queries need to execute. This solution thus consistently follows the pay-as-you-go approach of serverless cloud offerings.


Snowflake

Snowflake, released in 2014, is a data warehouse that was designed from the start to run in the cloud. In contrast to the other three solutions, which each operate within their respective cloud platform, Snowflake can be hosted in AWS, Azure, or Google Cloud environments. It follows a hybrid shared-data / shared-nothing approach, in which serverless compute nodes load data from a shared data pool and operate on it locally.

Scalability

Being cloud solutions, the data warehouse technologies discussed here all embrace scalability: each solution can be scaled in and out flexibly. In BigQuery, resources are provisioned dynamically based on demand, without any need for reconfiguration. In Snowflake, the user chooses from t-shirt sizes (XS, S, M, L, XL, …); a change in warehouse size takes effect at the start of the next query. In addition, an autoscale feature can spawn additional compute clusters of the same size to handle high concurrency.
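A quick sketch of how Snowflake's t-shirt sizes translate into consumption: each size step doubles the credits consumed per hour (per Snowflake's published warehouse sizes; treat the rates as illustrative, since they may change), and the autoscale feature multiplies that by the number of same-size clusters running.

```python
# Credits per hour double with each t-shirt size step (illustrative rates).
SIZES = ["XS", "S", "M", "L", "XL"]
credits_per_hour = {size: 2 ** i for i, size in enumerate(SIZES)}  # XS=1 … XL=16

def estimated_credits(size: str, runtime_hours: float, clusters: int = 1) -> float:
    """Estimate credit consumption; `clusters` models the autoscale feature
    spawning additional same-size clusters under high concurrency."""
    return credits_per_hour[size] * runtime_hours * clusters

print(estimated_credits("M", runtime_hours=2.0))              # 8.0
print(estimated_credits("M", runtime_hours=2.0, clusters=3))  # 24.0
```

This doubling is why picking a size one step too large has a very tangible effect on the bill.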

In the cases of Synapse Analytics and Redshift, the user needs to increase (or decrease) the number of resources associated with a permanently running cluster. While the association or dissociation of nodes is managed by the cloud provider, the process usually takes a few minutes, and very short downtimes may occur. Additionally, the shared-nothing architecture of AWS Redshift requires compute and storage to always be scaled together. The other solutions allow these to be decoupled, with storage scaled automatically to the data volume used by the data warehouse. Redshift introduced the RA3 node type to offer this feature as well; however, this node type is probably oversized for most use cases.

Maintainability

Snowflake and BigQuery follow a serverless strategy, which means there is no architecture to configure or maintain. Apart from the choice of t-shirt size in Snowflake, both solutions can be used out of the box. Synapse Analytics and Redshift, on the other hand, require the configuration and management of cluster sizes. Furthermore, due to its shared-nothing architecture, Redshift's performance is sensitive to how data is distributed between the nodes, so the user is well advised to keep an eye on this distribution to maintain query performance.

Pricing Models

The pricing models are perhaps where the four solutions differ the most. First, let us consider compute resources. Google BigQuery strictly follows the pay-as-you-go model and charges by the data volume processed by the engine. Snowflake's pricing model is based on the runtime of the transient compute resources provisioned to run the queries. On top of that, cloud services costs, associated with background processes like automations or caches, are charged only to the extent that they exceed 10% of the daily compute charges.
Synapse Analytics and Redshift charge per cluster size and uptime. Synapse Analytics uses the concept of "Data Warehouse Units" to bundle CPU and RAM resources, while Redshift uses the more transparent but more coarse-grained concept of virtual machines.
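The practical difference between these compute pricing schemes is easiest to see with a back-of-the-envelope calculation. All rates below (dollars per TB scanned, per credit, per node-hour) are assumptions for illustration, not current list prices:

```python
# Back-of-the-envelope compute cost models for the three pricing schemes.
# All rates are illustrative assumptions, not actual list prices.

def bigquery_cost(tb_scanned: float, usd_per_tb: float = 5.0) -> float:
    # Charged per data volume processed, regardless of runtime.
    return tb_scanned * usd_per_tb

def snowflake_cost(credits_per_hour: float, runtime_hours: float,
                   usd_per_credit: float = 3.0) -> float:
    # Charged for the runtime of the provisioned virtual warehouse.
    return credits_per_hour * runtime_hours * usd_per_credit

def cluster_cost(usd_per_node_hour: float, nodes: int, uptime_hours: float) -> float:
    # Synapse/Redshift style: the cluster bills for its whole uptime,
    # whether or not queries are currently running.
    return usd_per_node_hour * nodes * uptime_hours

# A daily reporting job that scans 1 TB and runs for 2 hours:
print(bigquery_cost(1.0))          # 5.0  — only the scanned volume counts
print(snowflake_cost(4, 2))        # 24.0 — only the 2 h runtime counts
print(cluster_cost(0.25, 4, 24))   # 24.0 — billed for the full 24 h uptime
```

The third model only pays off when the cluster is kept busy for most of its uptime, which is exactly the trade-off the conclusion below turns on.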

In terms of storage costs, BigQuery, Synapse Analytics, Snowflake, and Redshift's RA3 nodes charge only for the data volume actually used by the customer. In BigQuery and Redshift RA3, the price follows that of the respective cloud object store (Google Cloud Storage and AWS S3) and is usually a little short of $25 per TB per month. Snowflake charges almost twice that price for on-demand storage; however, the price drops to that of the underlying cloud object storage if a long-term commitment is made. Synapse Analytics uses a shared-disk concept to build its data pool, which increases the price for storage dramatically (by a factor of about 5 compared to the other technologies). Traditional Redshift nodes follow a similar concept, using HDDs or SSDs, which are 2 to 4 times more expensive than the S3 storage solution.
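Putting the relative storage factors from the paragraph above into numbers (the $25/TB/month object-store baseline and the multipliers are the rough figures from the text, not exact current prices):

```python
# Rough monthly storage cost comparison, using the relative factors
# described in the text; all figures are approximations.
OBJECT_STORE_USD_PER_TB = 25.0

monthly_usd_per_tb = {
    "BigQuery / Redshift RA3":  OBJECT_STORE_USD_PER_TB,       # object-store price
    "Snowflake (on-demand)":    OBJECT_STORE_USD_PER_TB * 2,   # roughly twice
    "Snowflake (committed)":    OBJECT_STORE_USD_PER_TB,       # drops to baseline
    "Synapse Analytics":        OBJECT_STORE_USD_PER_TB * 5,   # factor of ~5
}

data_tb = 10
for service, per_tb in monthly_usd_per_tb.items():
    print(f"{service}: ${per_tb * data_tb:,.0f}/month for {data_tb} TB")
```

At 10 TB the spread is already between $250 and $1,250 per month, so the storage model deserves attention even though compute usually dominates the bill.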

Conclusion

In conclusion, the choice of a new cloud data warehouse should be guided by a few general design principles. Do you want a more traditional architecture with permanently running clusters because ETL and reporting queries will run constantly? Then AWS Redshift or Azure Synapse Analytics, with their constantly running but cheaper infrastructure, will be a good option. Do you use your data warehouse only a few hours a day? Then serverless data warehouses like BigQuery or Snowflake might be the better choice. Do you expect rapid data growth in your data warehouse? Then consider data warehouses that scale flexibly with your usage, like Synapse Analytics, Snowflake, or BigQuery.

If this high-level comparison has not yet fully laid the foundation for your decision, watch out for our in-depth comparison of Redshift and Snowflake coming soon, or take a look at our Cloud Data Warehouse Seminar.