DSBulk count returns more rows than unloaded in CSV files


I'm running dsbulk unload on a single table that has a single-column primary key and no clustering key. At the end, the console shows something like this:

      total | failed |    rows/s |  p50ms |  p99ms | p999ms
174,971,236 |      0 | 1,946,689 | 148.95 | 285.21 | 400.56

But when I count the lines in all the CSV files, I get about 170M, roughly 5M fewer. What confuses me most is the difference between the CLI output and the number of lines in the CSV files. Also, dsbulk count reports the same result as the unload, about 175M.

My general questions: Why does this happen, and what is the explanation? What is the real size of my table? How can I debug this?

I have already run dsbulk count with --dsbulk.engine.maxConcurrentQueries 1 and with --datastax-java-driver.basic.request.consistency LOCAL_QUORUM, and the number is the same, about 175M.
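Before concluding that rows are missing from the unload, it may help to confirm the CSV-side number by parsing records rather than counting lines, so it can be compared directly with the 174,971,236 reported by dsbulk. A line count (for example from `wc -l`) is not the same as a record count: it includes one header line per file when headers are written, and a quoted value containing a newline spans several physical lines. The sketch below is a hypothetical helper, `count_csv`, that reports both numbers for a directory of unloaded files. It assumes the files match `*.csv`, are UTF-8 encoded, and use Python's default CSV quoting; if the unload used a non-default delimiter, quote, or escape character, the reader would need to be adjusted to match.

```python
import csv
import glob
import os
import sys


def count_csv(directory, pattern="*.csv", has_header=True):
    """Hypothetical helper: compare physical newline counts with parsed CSV
    records for every file in `directory` matching `pattern` (an assumption)."""
    files = sorted(glob.glob(os.path.join(directory, pattern)))
    physical_lines = 0
    records = 0
    for path in files:
        # Physical lines: count newline bytes, similar to what `wc -l` reports.
        with open(path, "rb") as fh:
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                physical_lines += chunk.count(b"\n")
        # Parsed records: a quoted field may contain embedded newlines,
        # so one record can span several physical lines.
        # Assumes Python's default CSV dialect matches the unload settings.
        with open(path, newline="", encoding="utf-8", errors="replace") as fh:
            n = sum(1 for row in csv.reader(fh) if row)  # skip blank lines
        if has_header and n > 0:
            n -= 1  # assumes exactly one header row per file
        records += n
    return {"files": len(files), "physical_lines": physical_lines, "records": records}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_csv.py /path/to/unload/dir
    print(count_csv(sys.argv[1]))
```

If `records` comes out close to 175M while the line count is about 170M, the discrepancy lies in how the lines were counted; if `records` is also about 170M, the files really do contain fewer rows than dsbulk reported.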

Asked by Viktor Tsymbaliuk (25 rep)

Jun 12, 2024, 11:58 AM
Last activity: Jul 29, 2025, 01:03 AM
