Ray Connector for Snowflake

Snowflake

The Ray connector for Snowflake lets you copy data between Snowflake and a Ray cluster. All data exchange runs in parallel, taking full advantage of the distributed compute capabilities of Ray datasets and the parallel data pipelining available from Snowflake. This allows your distributed Python and machine learning workloads to integrate seamlessly with Snowflake data.

Why use the connector?

Prior to the Ray DB Snowflake connector, using Snowflake data with Ray required exporting the data to external object storage such as S3 and then reading it into Ray datasets. This additional staging step requires more complex processing stages that need coordination, raises governance and security issues, and increases overall workload execution time.


Lightning-fast data exchange

  • Fast parallel data exchange at speeds of millions of rows per second

  • Data exchange speeds scale with the size of your Ray cluster and Snowflake warehouse


Simple and easy integration with Ray AIR

  • Data is read into Ray datasets, so it can use the full power of the Ray AI Runtime

  • Data can be featurized and then used for training, tuning, and batch inference with Ray AIR


Data governance and security

  • No intermediate storage of data, minimizing data governance issues

  • Data is encrypted in transit between clusters

  • Data can also be encrypted at rest within a cluster's temporary instance store

  • Ray clusters are ephemeral, and no data remains within clusters after workload completion

  • Connectors load connection properties from outside the code
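
As an illustration of keeping connection properties out of code, the following is a minimal sketch that reads them from environment variables. It does not use the connector's own API: the function name `load_connection_properties` and the `SNOWFLAKE_*` variable names are assumptions made for this example, and the connector may use a different mechanism and different property names.

```python
import os

# Hypothetical variable names chosen for this sketch; the connector's
# actual property names are not specified in this document.
PREFIX = "SNOWFLAKE_"
REQUIRED_KEYS = ("SNOWFLAKE_ACCOUNT", "SNOWFLAKE_USER", "SNOWFLAKE_PASSWORD")
OPTIONAL_KEYS = ("SNOWFLAKE_DATABASE", "SNOWFLAKE_SCHEMA", "SNOWFLAKE_WAREHOUSE")


def load_connection_properties(env=None):
    """Build a dict of connection properties from a mapping of variables.

    `env` defaults to the process environment, so no credentials need to
    appear in source code. Passing a plain dict makes the function easy to test.
    """
    env = os.environ if env is None else env

    # Fail early, naming every required property that is absent or empty.
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError("Missing connection properties: " + ", ".join(missing))

    props = {}
    for key in REQUIRED_KEYS + OPTIONAL_KEYS:
        value = env.get(key)
        if value:
            # SNOWFLAKE_ACCOUNT -> "account", SNOWFLAKE_USER -> "user", ...
            props[key[len(PREFIX):].lower()] = value
    return props
```

With a pattern like this, the same workload code can run against different accounts or warehouses by changing only the environment it is launched in.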