Sample Header Ad - 728x90

NoSQL DB candidate for the project (looking for experienced advice)

-1 votes
1 answer
66 views
## Intro ## I am stuck with the limitations of the DBs I've tested* and need someone experienced to give me some advice. ## Project details ## ### Data sets ### The project has 2 sets of data, the sets of items and their correlated prices. Items are the list of all the supported items with their properties. Prices are the list of item's price data for many countries where each country may have multiple stores (shops) that offer the item. This one-many-many (item-country-store) relation makes it hard to flatten the document into a single set of data where each item has prices embedded as properties. ### Main goal ### The user should be able to get a list of items with their lowest price offer after applying preferred filters and sort. The user should be able to filter by item properties (such as item.title LIKE 'The %', item.rating > 8 or other filters) AND/OR price properties (such as price.value = 50 or other filters). The user should be able to sort by either the selected item property (sort by item.title ASC or item.rating DESC) or the price property (sort by price.value ASC, price.discount DESC) The response needs to be fast so the user can quickly paginate through the list. I suppose the DB caching, pre-sorted collections (such as ArangoDB views ), pre-aggregated (filtered) collections (such as MongoDB on-demand materialized views ) or some other solution is a must, but correct me if I am wrong. Relying on just indexing proved to be slow in the DBs I've tried since there are around 150 000 items with the total records of prices easily going over a million of records per country. ### Limitations ### I cannot afford to use enterprise (proprietary) or cloud licenses so I would have to stick to open source (community) DB projects. ## Question ## Based on your experience, which DB would you recommend for this use case and why? --- * I've tested MariaDB, Redis (with RedisJSON module) and ArangoDB. ### MariaDB ### Since it's a relational DB, it's too slow to JOIN more than 10 different tables to get a single list. I use MariaDB to store the data, but I need another solution for the frontend serving. **Edit** I did not give details about my MariaDB implementation because I thought it was irrelevant since I asked for a NoSQL solution. Looks like this is the part that everyone wants to know about. I said it was too slow because the items and prices data types in MariaDB are spread out to multiple tables. In MariaDB, item has 17 related tables with mostly one-to-many relations. Think of properties such as item.genre, item.category, item.rating and others. Price has up to 6 relations with mostly one-to-one relations. When flattened to JSON documents, everything fits into two data sets, hence I said items and prices in NoSQL. Because of the complexity of relations, just building one whole item _should_ take longer in SQL (even with the simplest ID index relations) than fetching one document from NoSQL. This is especially because of a lot of one-to-many relations, which would take more than one query to build the arrays. I am not an expert and I do not claim to be one so correct me if I am wrong. ### Redis ### Redis with its RedisJSON module looked like a great candidate because it's an in-memory DB, but there were too many limitations to make it a viable option. Because of the prices complexity, I am unable to flatten the data into a single JSON document so I have multiple roundtrips to get the "JOINed" results. The complexity of multiple roundtrips escalated when I had to introduce filtering by both data sets—I had to filter prices to see which items had an offer within the wished ranges, then filter items to see which of those with an offer were within wished item ranges, then, for example, sort by item.title, and finally get the best corresponding price details. I lost performance since I had to use FT.AGGREGATE with SORT by the grouped property—this made Redis work on results without the use of index. Final nail on the coffin was Redis being single-threaded. When I ran multiple queries at the same time, I saw queries go over 10 seconds and time out. ### ArangoDB ### ArangoDB was a great candidate until I saw missing data in the results. It has neat caching, inverted indexes with pre-sorted collections in views and even full-text search on the collections. I was able to separate prices into collections for each country and then create multiple views to include all possible sort options. The pre-sorted view on price.value was huge because I was able to always get the lowest price offer just by doing a FIRST() in the subquery (AQL syntax similar to LIMIT 1 in SQL). The queries were very fast once the results got cached. The main issue with ArangoDB I had was getting the nulls in the results array while I was doing UPDATEs* on the collections. Examples... Query without update:
[{"id":"348","t":"Item 348"}, {"id":"112","t":"Item 112"}, {"id":"225","t":"Item 225"}]
Query when update starts:
[null,null,null]
Mid-update query:
[null,null,{"id":"225","t":"Item 225"}]
I thought it was an issue because I would first run a subquery to get the candidate document _keys and then run the main query with those _keys, and if there was an UPDATE, the document might not exist during that short time span. But because an UPDATE should not first DELETE and then INSERT the document (the way the REPLACE usually works), I doubt this is the case. I couldn't get the answers from the support. This issue made it unreliable because the user would get nulls as the results, which would also break my frontend app. * ArangoDB has the ImportDocuments function to import multiple documents at once with an option to UPDATE on duplicate. --- ### Note ### I have also considered ElasticSearch, but I would not be able to afford it if the project grows and needs any kind of scaling.
Asked by Delicious Bacon (99 rep)
Aug 26, 2022, 10:23 AM
Last activity: Aug 26, 2022, 02:19 PM