
spark-cassandra-connector read throughput unpredictable

0 votes
1 answer
286 views
A user reports that range-query throughput is far higher than expected after setting `spark.cassandra.input.readsPerSec` in the spark-cassandra-connector; the throttle appears to have little effect.

Job dependencies (the Java driver version is pinned to 4.13.0):

```xml
<dependency>
    <groupId>com.datastax.spark</groupId>
    <artifactId>spark-cassandra-connector_2.12</artifactId>
    <version>3.2.0</version>
    <exclusions>
        <exclusion>
            <groupId>com.datastax.oss</groupId>
            <artifactId>java-driver-core-shaded</artifactId>
        </exclusion>
    </exclusions>
</dependency>
...
<dependency>
    <groupId>com.datastax.oss</groupId>
    <artifactId>java-driver-core</artifactId>
    <version>4.13.0</version>
</dependency>
```

There are two steps in the job (both full table scans):

```java
Dataset<Row> dataset = sparkSession.sqlContext().read()
    .format("org.apache.spark.sql.cassandra")
    .option("table", "inbox_user_msg_dummy")
    .option("keyspace", "ssmp_inbox2")
    .load();
```

and:

```java
Dataset<Row> olderDataset = sparkSession.sql(
    "SELECT * FROM inbox_user_msg_dummy WHERE app_uuid = 'cb663e07-7bcc-4039-ae97-8fb8e8a9ff77' AND " +
    "create_hour = token(G9e7Y4Y, 2023-08-10T04:17:27.234Z, cb663e07-7bcc-4039-ae97-8fb8e8a9ff77) AND " +
    "token(user_id, create_hour, app_uuid) <= 9121832956220923771 LIMIT 10");
```

FWIW, the average partition size is 649 bytes and the max is 2.7 KB.
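For context, a minimal sketch of where the throttle setting would be applied. The app name, contact host, and the value `100` are illustrative assumptions, not taken from the question; only the property key `spark.cassandra.input.readsPerSec` comes from the post.

```java
import org.apache.spark.sql.SparkSession;

public class InboxScan {
    public static void main(String[] args) {
        // Hypothetical session setup; host, app name, and rate are assumptions.
        SparkSession sparkSession = SparkSession.builder()
            .appName("inbox-scan")
            .config("spark.cassandra.connection.host", "127.0.0.1") // assumed contact point
            // The read throttle the question is about; 100 is an example value.
            .config("spark.cassandra.input.readsPerSec", "100")
            .getOrCreate();
    }
}
```

Note that this is a per-task/per-core limit, so the observed cluster-wide read rate also depends on how many Spark cores are running scan tasks concurrently.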
Asked by Paul (416 rep)
Nov 7, 2023, 07:56 PM
Last activity: Nov 8, 2023, 02:07 PM