Benchmarks

The Ray Data Snowflake connector reads and writes in parallel, so large-scale data exchange between Snowflake and Ray scales with the size of the Ray cluster. The main constraint on speed is how quickly the Snowflake warehouse can prepare the data for Ray Data to read or write. The number of batches the warehouse returns for parallel processing also limits speed. Ray Data itself can scale out to as many parallel read and write tasks as the cluster provides. As the benchmarks below show, mean throughput for both reads and writes increases with the size of the data, though less than proportionally. Each tenfold increase in data raises mean throughput by roughly 1.9x to 6.7x, and the gain is smallest at the largest size.

Read benchmarks

| Rows | Time (s) | Data size (MB) | Mean throughput (MB/s) |
|---:|---:|---:|---:|
| 150,000 | 2.41 | 25.88 | 14.73 |
| 1,500,000 | 2.97 | 258.73 | 94.64 |
| 15,000,000 | 9.21 | 2587.27 | 555.13 |
| 150,000,000 | 28.29 | 25874.89 | 1380.59 |
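One way to check how read performance scales is to compare consecutive rows of the read table. The Python sketch below hard-codes the published figures and prints two growth factors for each tenfold step in data size: how much the elapsed time grew and how much the mean throughput grew. It only reprocesses the numbers in the table. It does not run Ray or connect to Snowflake, and the names `READ_RESULTS` and `step_ratios` are illustrative, not part of any Ray API.

```python
# Read benchmark figures copied from the table: (rows, time_s, size_mb, mean_mb_per_s).
# Illustrative only: this analyzes published numbers and does not call Ray or Snowflake.
READ_RESULTS = [
    (150_000, 2.41, 25.88, 14.73),
    (1_500_000, 2.97, 258.73, 94.64),
    (15_000_000, 9.21, 2587.27, 555.13),
    (150_000_000, 28.29, 25874.89, 1380.59),
]


def step_ratios(results):
    """Return (size_ratio, time_ratio, throughput_ratio) between consecutive rows."""
    ratios = []
    for (_, t0, b0, m0), (_, t1, b1, m1) in zip(results, results[1:]):
        ratios.append((b1 / b0, t1 / t0, m1 / m0))
    return ratios


for size_x, time_x, tput_x in step_ratios(READ_RESULTS):
    print(f"data x{size_x:.1f}: time x{time_x:.2f}, mean throughput x{tput_x:.2f}")
```

The data grows tenfold at each step while elapsed time grows far less. This is why mean throughput rises with size, and the smallest throughput gain comes at the largest step.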

Write benchmarks

| Rows | Time (s) | Data size (MB) | Mean throughput (MB/s) |
|---:|---:|---:|---:|
| 150,000 | 5.39 | 25.88 | 4.93 |
| 1,500,000 | 8.26 | 258.73 | 33.18 |
| 15,000,000 | 19.54 | 2587.27 | 195.64 |
| 150,000,000 | 120.61 | 25874.89 | 367.96 |
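Writes are slower than reads at every data size in these benchmarks. The Python sketch below hard-codes the mean throughput columns of the read and write tables and prints the read-to-write throughput ratio for each row count. As with any reprocessing of published results, it does not run Ray or Snowflake. The names `READ_MB_PER_S`, `WRITE_MB_PER_S`, and `read_write_ratio` are hypothetical helpers for illustration.

```python
# Mean throughput (MB/s) from the read and write benchmark tables, keyed by row count.
# Illustrative only: these are the published figures, not live measurements.
READ_MB_PER_S = {
    150_000: 14.73,
    1_500_000: 94.64,
    15_000_000: 555.13,
    150_000_000: 1380.59,
}
WRITE_MB_PER_S = {
    150_000: 4.93,
    1_500_000: 33.18,
    15_000_000: 195.64,
    150_000_000: 367.96,
}


def read_write_ratio(read, write):
    """Return {rows: read_throughput / write_throughput} for row counts present in both."""
    return {rows: read[rows] / write[rows] for rows in sorted(read.keys() & write.keys())}


for rows, ratio in read_write_ratio(READ_MB_PER_S, WRITE_MB_PER_S).items():
    print(f"{rows:>11,} rows: read throughput is {ratio:.1f}x write throughput")
```

From the tables, reads sustain roughly 2.8x to 3.8x the mean throughput of writes. The widest gap is at 150,000,000 rows, where write throughput grows the least.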