What type of database should I use for this application?

0 votes
1 answer
75 views
I am currently designing a mobile application that sends push notifications to users subscribed to a set of topics. The list of possible topics changes infrequently (not more than once per year), while users are frequently added or removed, and their subscription lists change frequently. I have researched both relational databases and key/value databases, and am struggling to determine the best type of database for this problem.

I have several parameters/restrictions on the design of the database:

1. Given any `userID`, quickly retrieve the list of topics that the user is subscribed to
2. Given any `topicID`, quickly retrieve the list of all users subscribed to that topic
3. Allow persistence to disk as a single file to allow easy network transfer across machines; cross-platform compatibility is not required, as the file will only be accessed across homogeneous machines
4. Preference is given to proven and robust solutions; if using a prebuilt solution, it must be free or open source (working with a strict budget)
5. Can use as much disk space as needed, but must not require excessive amounts of CPU or RAM
6. Built-in replication capability and consistency assurance is preferred, but not required
7. macOS solutions are preferred, Linux-only solutions are permitted, Windows-only solutions are discouraged

My thinking is that since the association between users and topics runs in both directions, I need two key/value tables as in the following schema:
UserTable  = {userID : [topic]}
TopicTable = {topicID: [user]}
where `{:}` denotes a hash map (such as `std::map` in C++) and `[...]` denotes a dynamic-length array (such as `std::vector` in C++). Note that `user` is a pointer to a specific `(userID, topic)` key-value pair in `UserTable`, and `topic` is a pointer to a specific `(topicID, user)` key-value pair in `TopicTable`.

This is a data structure I was able to implement in C++ successfully, and it satisfies requirements (1), (2), (5), and (7). Unfortunately, there doesn't seem to be an industry-standard method of efficiently serializing pointers to disk, which seemingly makes (3) and (6) impossible. The cereal and Boost C++ libraries have methods for this purpose, but I don't know if they are designed for real-time performance, or if they could be considered to satisfy (4).

All of these issues make me think that perhaps I need to rethink my database schema, but then I'm back to having issues with (1) and (2). I have thought of several ways of doing so. One would be to store the data as JSON and serialize it directly to disk with a fast and compact binary representation such as BSON, but this would seem to involve an excessive number of I/O operations as the table grows larger, and reduce performance due to the increased number of cache misses and page loads required.

I could use a relational database such as Python's
`sqlite3`, which seems to meet all of the requirements, but my lack of background in databases makes me unable to determine how I would make this data fit the relational model. Would the primary key be `userID` or `topicID`? How could I ensure fast lookups by either key, when a user could potentially subscribe to 100 different topics, or a topic might have 100,000 users? Surely I wouldn't make one column for each of the ~10,000 possible topics, when it is rare for a user to subscribe to more than 10?

I've also considered Erlang's `mnesia`, as it is one of the few databases I have found besides Python's `sqlite3` that is tightly coupled with a general-purpose programming language, and it seems to support (6) nicely. Unfortunately, I haven't been able to find any clear information about the performance of `mnesia` or Erlang in general.

**Important Note** I understand many users may suggest I benchmark each of these options. Unfortunately, since I am new to database management, even developing a benchmark for each of these options will require a significant investment in learning the associated frameworks. This is also a startup company whose mobile application has not yet been released, so any benchmark is likely a poor representation of real-world traffic. As with any new app, user onboarding will most likely be a slow and gradual process, but I want to avoid what would amount to an unintentional DoS attack if the app goes "viral" and garners a large number of downloads quickly. This is why the numbers given above are loose estimates of a worst-case scenario (with the exception of possible topics, which is guaranteed to be ~10k for the foreseeable future), and why I am unable to perform adequate benchmarking: user activity remains a significant unknown variable that cannot be eliminated prior to launch.

**EDIT** @FrankHeikens mentioned in comments that my requirement (3) is highly restrictive. I originally sought a single-file model due to the ease of backing up or transferring such a database; however, if replication and backup capabilities are included in a solution, this requirement can be relaxed.
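
On the relational-model question above (which ID should be the primary key), one common layout is a junction table with one row per (user, topic) pair, a composite primary key over both IDs, and a second index for the reverse lookup. The following is a minimal sketch, assuming SQLite through Python's standard-library `sqlite3` module. The table, column, index, and function names are hypothetical and chosen only for illustration.

```python
import sqlite3

# ":memory:" keeps the sketch self-contained; passing a file path instead
# would store the whole database in a single file on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per subscription, no per-topic columns.
conn.executescript("""
    CREATE TABLE subscriptions (
        user_id  INTEGER NOT NULL,
        topic_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, topic_id)  -- serves lookups by user
    );
    -- Reverse index so lookups by topic do not scan the whole table.
    CREATE INDEX subscriptions_by_topic
        ON subscriptions (topic_id, user_id);
""")


def subscribe(user_id, topic_id):
    """Add a subscription; a duplicate pair is silently ignored."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO subscriptions (user_id, topic_id) VALUES (?, ?)",
            (user_id, topic_id),
        )


def unsubscribe(user_id, topic_id):
    """Remove a subscription if it exists."""
    with conn:
        conn.execute(
            "DELETE FROM subscriptions WHERE user_id = ? AND topic_id = ?",
            (user_id, topic_id),
        )


def topics_for_user(user_id):
    """Requirement (1): all topic IDs a user is subscribed to."""
    rows = conn.execute(
        "SELECT topic_id FROM subscriptions WHERE user_id = ? ORDER BY topic_id",
        (user_id,),
    )
    return [row[0] for row in rows]


def users_for_topic(topic_id):
    """Requirement (2): all user IDs subscribed to a topic."""
    rows = conn.execute(
        "SELECT user_id FROM subscriptions WHERE topic_id = ? ORDER BY user_id",
        (topic_id,),
    )
    return [row[0] for row in rows]
```

In this layout a user with 10 subscriptions costs 10 rows regardless of how many topics exist, and both lookups are index searches. Whether that holds up at 100,000 users per topic under real traffic is not something the sketch demonstrates and would still need measuring.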
Asked by Math Rules (178 rep)
Jul 21, 2025, 05:21 PM
Last activity: Jul 22, 2025, 06:25 AM