NoSQL DB candidate for the project (looking for experienced advice)
-1
votes
1
answer
66
views
## Intro ##
I am stuck with the limitations of the DBs I've tested* and need someone experienced to give me some advice.
## Project details ##
### Data sets ###
The project has 2 sets of data, the sets of
items
and their correlated prices
.
Items
are the list of all the supported items with their properties.
Prices
are the list of item
's price data for many countries where each country may have multiple stores (shops) that offer the item
. This one-many-many (item-country-store) relation makes it hard to flatten the document into a single set of data where each item
has prices
embedded as properties.
### Main goal ###
The user should be able to get a list of item
s with their lowest price offer after applying preferred filters and sort.
The user should be able to filter by item
properties (such as item.title LIKE 'The %'
, item.rating > 8
or other filters) AND/OR price
properties (such as price.value = 50
or other filters).
The user should be able to sort by either the selected item
property (sort by item.title ASC
or item.rating DESC
) or the price
property (sort by price.value ASC
, price.discount DESC
)
The response needs to be fast so the user can quickly paginate through the list. I suppose the DB caching, pre-sorted collections (such as ArangoDB views ), pre-aggregated (filtered) collections (such as MongoDB on-demand materialized views ) or some other solution is a must, but correct me if I am wrong. Relying on just indexing proved to be slow in the DBs I've tried since there are around 150 000 items
with the total records of prices
easily going over a million of records per country.
### Limitations ###
I cannot afford to use enterprise (proprietary) or cloud licenses so I would have to stick to open source (community) DB projects.
## Question ##
Based on your experience, which DB would you recommend for this use case and why?
---
*
I've tested MariaDB, Redis (with RedisJSON module) and ArangoDB.
### MariaDB ###
Since it's a relational DB, it's too slow to JOIN more than 10 different tables to get a single list. I use MariaDB to store the data, but I need another solution for the frontend serving.
**Edit**
I did not give details about my MariaDB implementation because I thought it was irrelevant since I asked for a NoSQL solution. Looks like this is the part that everyone wants to know about.
I said it was too slow because the items
and prices
data types in MariaDB are spread out to multiple tables.
In MariaDB, item
has 17 related tables with mostly one-to-many relations. Think of properties such as item.genre
, item.category
, item.rating
and others. Price
has up to 6 relations with mostly one-to-one relations.
When flattened to JSON documents, everything fits into two data sets, hence I said items
and prices
in NoSQL.
Because of the complexity of relations, just building one whole item
_should_ take longer in SQL (even with the simplest ID index relations) than fetching one document from NoSQL. This is especially because of a lot of one-to-many relations, which would take more than one query to build the arrays.
I am not an expert and I do not claim to be one so correct me if I am wrong.
### Redis ###
Redis with its RedisJSON module looked like a great candidate because it's an in-memory DB, but there were too many limitations to make it a viable option.
Because of the prices
complexity, I am unable to flatten the data into a single JSON document so I have multiple roundtrips to get the "JOINed" results. The complexity of multiple roundtrips escalated when I had to introduce filtering by both data sets—I had to filter prices
to see which items had an offer within the wished ranges, then filter items
to see which of those with an offer were within wished item
ranges, then, for example, sort by item.title
, and finally get the best corresponding price
details.
I lost performance since I had to use FT.AGGREGATE
with SORT
by the grouped property—this made Redis work on results without the use of index.
Final nail on the coffin was Redis being single-threaded. When I ran multiple queries at the same time, I saw queries go over 10 seconds and time out.
### ArangoDB ###
ArangoDB was a great candidate until I saw missing data in the results.
It has neat caching, inverted indexes with pre-sorted collections in view
s and even full-text search on the collections.
I was able to separate prices
into collections for each country and then create multiple view
s to include all possible sort options.
The pre-sorted view
on price.value
was huge because I was able to always get the lowest price offer just by doing a FIRST()
in the subquery (AQL syntax similar to LIMIT 1
in SQL).
The queries were very fast once the results got cached.
The main issue with ArangoDB I had was getting the null
s in the results array while I was doing UPDATE
s* on the collections.
Examples...
Query without update:
[{"id":"348","t":"Item 348"}, {"id":"112","t":"Item 112"}, {"id":"225","t":"Item 225"}]
Query when update starts:
[null,null,null]
Mid-update query:
[null,null,{"id":"225","t":"Item 225"}]
I thought it was an issue because I would first run a subquery to get the candidate document _key
s and then run the main query with those _key
s, and if there was an UPDATE
, the document might not exist during that short time span. But because an UPDATE
should not first DELETE
and then INSERT
the document (the way the REPLACE
usually works), I doubt this is the case. I couldn't get the answers from the support.
This issue made it unreliable because the user would get null
s as the results, which would also break my frontend app.
*
ArangoDB has the ImportDocuments
function to import multiple documents at once with an option to UPDATE
on duplicate.
---
### Note ###
I have also considered ElasticSearch, but I would not be able to afford it if the project grows and needs any kind of scaling.
Asked by Delicious Bacon
(99 rep)
Aug 26, 2022, 10:23 AM
Last activity: Aug 26, 2022, 02:19 PM
Last activity: Aug 26, 2022, 02:19 PM