Snowflake
The Ray connector for Snowflake makes it easy to copy data between Snowflake and a Ray cluster. All data exchanges run in parallel, taking full advantage of the distributed compute capabilities of Ray Datasets and the parallel data pipelining available in Snowflake. This lets your distributed Python and machine learning workloads integrate seamlessly with Snowflake data.
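The parallel exchange described above can be pictured as a fan-out. A query result is split into chunks, each chunk is fetched by a separate worker, and the rows are gathered back together. The following is a minimal, hypothetical sketch of that pattern using only the Python standard library. A thread pool stands in for Ray tasks, and `read_in_parallel`, `fetch`, and the in-memory chunks are invented for illustration. They are not part of the connector's API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def read_in_parallel(
    partitions: Sequence[str],
    fetch: Callable[[str], List[dict]],
    max_workers: int = 4,
) -> List[dict]:
    """Fetch every partition concurrently and concatenate rows in partition order.

    `partitions` stands in for the result chunks a warehouse query can be split into;
    `fetch` is a hypothetical function that downloads one chunk.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map runs fetch calls concurrently but yields results in input order
        chunks = pool.map(fetch, partitions)
        # Flatten the per-chunk row lists into one list of rows
        return [row for chunk in chunks for row in chunk]


# Hypothetical in-memory "warehouse" used only for this illustration
FAKE_CHUNKS = {
    "chunk-0": [{"id": 0}, {"id": 1}],
    "chunk-1": [{"id": 2}],
    "chunk-2": [{"id": 3}, {"id": 4}],
}

rows = read_in_parallel(list(FAKE_CHUNKS), FAKE_CHUNKS.__getitem__)
print(len(rows))  # 5
```

Because `ThreadPoolExecutor.map` returns results in input order, the gathered rows keep partition order even when chunks finish out of order. On a Ray cluster, remote tasks would take the place of threads. Adding workers lets more chunks be fetched at once, which is how throughput can grow with cluster size.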
Why use the connector?
Before the Ray DB Snowflake connector, using Snowflake data with Ray required exporting it to external object storage such as S3 and then reading it into Ray Datasets. This extra staging step adds processing stages that must be coordinated, raises governance and security concerns, and increases overall workload execution time.
Lightning-fast data exchange
Fast parallel data exchange at speeds of millions of rows per second
Data exchange speeds scale with the size of your Ray cluster and Snowflake warehouse
Simple integration with Ray AIR
Data is read into Ray Datasets, so it can take full advantage of Ray AI Runtime (AIR)
Data can be featurized and used for training, tuning, and batch inference with Ray AIR
Data governance and security
No intermediate storage of data, minimizing data governance issues
Data is encrypted in transit between clusters
Data can also be encrypted at rest within a cluster's temporary instance store
Ray clusters are ephemeral, and no data remains within clusters after workload completion
Connectors load connection properties from outside the code, so credentials are not embedded in scripts
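One common way to keep connection properties out of code is to read them from environment variables at run time. The following sketch assumes a hypothetical `SNOWFLAKE_<PROPERTY>` naming convention, and the connector's actual configuration mechanism may differ. `load_connection_properties` is an illustrative helper, not part of any library.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical convention: each property is set as SNOWFLAKE_<PROPERTY>=value
ENV_PREFIX = "SNOWFLAKE_"
# Properties this sketch treats as mandatory (an assumption for illustration)
REQUIRED = ("account", "user")


def load_connection_properties(
    env: Optional[Mapping[str, str]] = None, prefix: str = ENV_PREFIX
) -> Dict[str, str]:
    """Collect connection properties from environment variables instead of source code."""
    source = os.environ if env is None else env
    # Strip the prefix and lowercase the rest, e.g. SNOWFLAKE_WAREHOUSE -> "warehouse"
    props = {
        key[len(prefix):].lower(): value
        for key, value in source.items()
        if key.startswith(prefix) and len(key) > len(prefix)
    }
    missing = [name for name in REQUIRED if name not in props]
    if missing:
        raise ValueError(f"missing connection properties: {', '.join(missing)}")
    return props


# Pass an explicit mapping so the example does not depend on the real environment
props = load_connection_properties({
    "SNOWFLAKE_ACCOUNT": "my_account",
    "SNOWFLAKE_USER": "ray_user",
    "SNOWFLAKE_WAREHOUSE": "my_wh",
    "HOME": "/home/ray",  # ignored: no SNOWFLAKE_ prefix
})
print(sorted(props))  # ['account', 'user', 'warehouse']
```

The resulting keys (`account`, `user`, `warehouse`) match keyword arguments of the Snowflake Python connector. A dictionary like this could therefore be passed as `snowflake.connector.connect(**props)`, although that call is not exercised here.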