Which database to use when you have 6 billion rows and need to query them by a list of IDs?
-1 votes · 2 answers · 397 views
We are currently researching our case of storing road distances between cities. Right now we have 6 billion of those distances.

Our current structure in SQL Server uses a `float` column that encodes the relationship between two cities. For example, if we have a city with Id 1 and a city with Id 2 in the `Locations` table, the row holding the distance from 1 to 2 looks like `1.2, '1000 miles'`. That column is indexed.

So to get the distance from city 1000 to city 2535, we look up `1000.2535` in the `Distances` table.
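To make the keying scheme concrete, here is a minimal sketch in Python of how such a composite `float` key could be built and looked up (the helper name and the second distance value are illustrative, not part of our actual schema):

```python
# Sketch of the composite-key idea: the integer part is the "from" city Id
# and the fractional digits are the "to" city Id, e.g. (1000, 2535) -> 1000.2535.
def make_key(from_id: int, to_id: int) -> float:
    # Note: a "to" Id with trailing zeros (e.g. 20) would collide with a
    # shorter Id (e.g. 2) under this scheme, since 1.20 == 1.2 as a float.
    return float(f"{from_id}.{to_id}")

# Toy stand-in for the Distances table (second value is a placeholder).
distances = {
    make_key(1, 2): "1000 miles",
    make_key(1000, 2535): "example distance",
}

print(distances[make_key(1000, 2535)])
```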
Besides getting single distances, we need to select groups of 1000 distances from those 6 billion rows:
```sql
SELECT id, distance
FROM Distances
WHERE id IN (1000.2535, 1.2, ...)
```
Right now we've only tested SQL Server on a local machine. It gives us around 300 ms for such a query of 1000 rows, but only when we set a 50 ms timeout (which we need because of many parallel requests from multiple users). Without the 50 ms timeout, the latency just keeps growing: 300 ms for the first query, 500 ms for the second, 800 ms for the third, and so on.
And right now we are taking a look at Elasticsearch, specifically at `mget`.
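For reference, an Elasticsearch `_mget` lookup would look roughly like this, assuming each distance is indexed as a document whose `_id` is the composite key (the index name `distances` is an assumption):

```json
GET /distances/_mget
{
  "ids": ["1000.2535", "1.2"]
}
```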
So my questions are:
1. Which database would you recommend for such a use case?
2. Besides what we've already thought of, what else would you recommend, e.g. other ideas such as splitting the city IDs into two separate columns?
3. What would be the best ways to optimize such a database?
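As a concrete illustration of the two-column idea from question 2, here is a minimal sketch using SQLite (table and column names are assumptions, not our production schema): a composite primary key on `(from_id, to_id)` replaces the single float key, and a batch lookup matches on pairs instead of `IN` over floats.

```python
import sqlite3

# In-memory sketch of the "two integer columns" alternative: the composite
# primary key (from_id, to_id) replaces the single float key.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Distances (
        from_id  INTEGER NOT NULL,
        to_id    INTEGER NOT NULL,
        distance TEXT    NOT NULL,
        PRIMARY KEY (from_id, to_id)
    )
""")
conn.execute("INSERT INTO Distances VALUES (1, 2, '1000 miles')")

# Batch lookup of many (from_id, to_id) pairs in one parameterized query.
pairs = [(1, 2), (1000, 2535)]
placeholders = " OR ".join("(from_id = ? AND to_id = ?)" for _ in pairs)
params = [v for pair in pairs for v in pair]
rows = conn.execute(
    f"SELECT from_id, to_id, distance FROM Distances WHERE {placeholders}",
    params,
).fetchall()
print(rows)  # only the (1, 2) pair exists in this sketch
```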
Asked by Artem Biryukov
(3 rep)
Sep 24, 2020, 04:47 PM
Last activity: Sep 25, 2020, 08:59 AM