Which database to use when you have 6 billion rows and need to query them by a list of IDs?
-1 votes · 2 answers · 397 views
We are currently researching our case of storing road distances between cities. Right now we have 6 billion of those distances.

Our current structure in SQL Server uses a `float` column that encodes the relationship between two cities. For example, if we have a city with Id 1 and a city with Id 2 in the `Locations` table, the row holding the distance from 1 to 2 looks like `1.2, '1000 miles'`. That column is indexed.

So to get the distance from city 1000 to city 2535, we look up `1000.2535` in the `Distances` table.
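To make the keying scheme concrete, here is a minimal sketch in Python of how such a composite `float` key could be built and looked up (the helper name and the second distance value are illustrative, not part of our actual schema):

```python
# Sketch of the composite-key idea: the integer part is the "from" city Id
# and the fractional digits are the "to" city Id, e.g. (1000, 2535) -> 1000.2535.
def make_key(from_id: int, to_id: int) -> float:
    # Note: a "to" Id with trailing zeros (e.g. 20) would collide with a
    # shorter Id (e.g. 2) under this scheme, since 1.20 == 1.2 as a float.
    return float(f"{from_id}.{to_id}")

# Toy stand-in for the Distances table (second value is a placeholder).
distances = {
    make_key(1, 2): "1000 miles",
    make_key(1000, 2535): "example distance",
}

print(distances[make_key(1000, 2535)])
```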
Besides getting single distances, we need to select groups of 1000 distances from those 6 billion rows:
```sql
SELECT id, distance
FROM Distances
WHERE id IN (1000.2535, 1.2, ...)
```
Right now we've only tested SQL Server on a local machine. It gives us around 300 ms for such a query of 1000 rows, but only when we set a 50 ms timeout (which we need because of many parallel requests from multiple users). Without the 50 ms timeout, the latency just keeps growing: 300 ms for the first query, 500 ms for the second, 800 ms for the third, and so on.
And right now we are taking a look at Elasticsearch, specifically at `mget`.
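For reference, an Elasticsearch `_mget` lookup would look roughly like this, assuming each distance is indexed as a document whose `_id` is the composite key (the index name `distances` is an assumption):

```json
GET /distances/_mget
{
  "ids": ["1000.2535", "1.2"]
}
```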
So my questions are:
1. Which database would you recommend for such a use case?
2. Besides what we've already thought of, what else would you recommend, e.g. other ideas such as splitting the city IDs into two separate columns?
3. What would be the best ways to optimize such a database?
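As a concrete illustration of the two-column idea from question 2, here is a minimal sketch using SQLite (table and column names are assumptions, not our production schema): a composite primary key on `(from_id, to_id)` replaces the single float key, and a batch lookup matches on pairs instead of `IN` over floats.

```python
import sqlite3

# In-memory sketch of the "two integer columns" alternative: the composite
# primary key (from_id, to_id) replaces the single float key.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Distances (
        from_id  INTEGER NOT NULL,
        to_id    INTEGER NOT NULL,
        distance TEXT    NOT NULL,
        PRIMARY KEY (from_id, to_id)
    )
""")
conn.execute("INSERT INTO Distances VALUES (1, 2, '1000 miles')")

# Batch lookup of many (from_id, to_id) pairs in one parameterized query.
pairs = [(1, 2), (1000, 2535)]
placeholders = " OR ".join("(from_id = ? AND to_id = ?)" for _ in pairs)
params = [v for pair in pairs for v in pair]
rows = conn.execute(
    f"SELECT from_id, to_id, distance FROM Distances WHERE {placeholders}",
    params,
).fetchall()
print(rows)  # only the (1, 2) pair exists in this sketch
```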
Asked by Artem Biryukov
(3 rep)
Sep 24, 2020, 04:47 PM
Last activity: Sep 25, 2020, 08:59 AM