Database Administrators
Q&A for database professionals who wish to improve their database skills
Latest Questions
0 votes · 2 answers · 175 views
DSBulk count returns more rows than unloaded in CSV files
I'm running `dsbulk unload` on one table that has a single primary key field and no clustering key. At the end, the console shows something like this:
```
total | failed | rows/s | p50ms | p99ms | p999ms
174,971,236 | 0 | 1,946,689 | 148.95 | 285.21 | 400.56
```
but when I count the number of lines across all the CSV files I get ~170M, about 5M fewer. My main confusion is the difference between the CLI output and the line count in the CSV files.
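(One thing worth checking when comparing these two numbers: a plain `wc -l` counts physical lines, but a CSV row whose quoted field contains a newline spans several physical lines while still being a single record. A minimal sketch of a CSV-aware count, assuming the unloaded files match a hypothetical `out/*.csv` pattern and use standard CSV quoting:)

```python
import csv
import glob

def count_csv_records(pattern):
    """Count logical CSV records (not physical lines) across files.

    A record containing a quoted newline spans multiple physical
    lines, so this total can differ from what `wc -l` reports.
    """
    total = 0
    for path in glob.glob(pattern):
        # newline="" lets the csv module handle line endings itself
        with open(path, newline="") as f:
            total += sum(1 for _ in csv.reader(f))
    return total

# Hypothetical usage; compare against `cat out/*.csv | wc -l`:
# print(count_csv_records("out/*.csv"))
```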
I also ran `dsbulk count`, and it shows the same result as the unload: ~175M.
My general question is: why does this happen? What is the explanation? What is the real size of my table, and how can I debug this?
I already ran `dsbulk count` with `--dsbulk.engine.maxConcurrentQueries 1` and `--datastax-java-driver.basic.request.consistency LOCAL_QUORUM`, and the number is the same, ~175M.
Viktor Tsymbaliuk
(25 rep)
Jun 12, 2024, 11:58 AM
• Last activity: Jul 29, 2025, 01:03 AM