Database Administrators
Q&A for database professionals who wish to improve their database skills
Latest Questions
0
votes
1
answers
42
views
Should I create 2 different databases if they would keep 70% similar data?
I am writing a microservices based NodeJS-Cassandra application and I haVe few services they would need 70% similar data( like username, avatar, videos, etc.) and I am just wondering to know is it a good idea that I create 2 databases like: DB1: > A,B,C,D,E,F,G,H,I,J,K,L,M,N DB2: > A,B,C,D,E,F,G,H,I...
I am writing a microservices based NodeJS-Cassandra application and I haVe few services they would need 70% similar data( like username, avatar, videos, etc.) and I am just wondering to know is it a good idea that I create 2 databases like:
DB1:
> A,B,C,D,E,F,G,H,I,J,K,L,M,N
DB2:
> A,B,C,D,E,F,G,H,I,J,W,X,Y,Z
Or just create one database like:
> A,B,C,D,E,F,G,H,I,J,K,L,M,N,W,X,Y,Z
I know from microservices concept that we should give it's own database to each service, but as long as my data contains some massive data like video files, I don't know what should I do in this specific case?
I like to know if there is any rule of thumb says **"if your databases would have more than x-% of similar data, or the data you should replicate in both databases exceeds x-GB, it's better to keep them in 1 database"**?
best_of_man
(117 rep)
May 22, 2023, 05:34 PM
• Last activity: May 23, 2023, 12:04 AM
2
votes
2
answers
68
views
Complicated relationship Q
I'm trying to do a query to find products that *aren't* on the customer's authorized products list, and it's breaking my brain. Each customer has an authorized product group list, and within a product group each entry might consist of a Product ID, a Product Category ID, Vendor Number, Price Schedul...
I'm trying to do a query to find products that *aren't* on the customer's authorized products list, and it's breaking my brain.
Each customer has an authorized product group list, and within a product group each entry might consist of a Product ID, a Product Category ID, Vendor Number, Price Schedule ID, or Brand ID (all different columns on the Product record).
Many authorized product group lists have a mix of 2 or more kinds of entries, and often there's some overlap between the different columns used as selection criteria. Of course, there's a code on the product group list item that indicates which column the entry is based on, so it's easy enough to parse.
On the Product side, I'd be looking for products without an "Inactive" flag that aren't on the authorized list for each customer.
The tables look roughly like this:
**Customer**
CustomerID |AuthorizedListID
----------|----------------
1 |345
2 |345
3 |117
**Authorized List**
ListID |ListName
------|-----------------
117 |Frozen goods
345 |Pizza
**Authorized List Items**
ListKey |ItemRefNo |Type
-------|---------|----
117 |68 |BRND
117 |202 |ITEM
117 |1415 |ITEM
117 |3012 |ITEM
117 |61152 |CATG
117 |31667 |PRIC
345 |9615 |ITEM
345 |2100 |ITEM
345 |13229 |VNDR
345 |87211 |CATG
**Products**
ProductID|CategoryKey|PriceKey|VendorKey|BrandKey|Inactive
---------|-----------|--------|---------|--------|--------
202 |61152 |113 |45529 |16 |0
1415 |4990 |5601 |882 |41009 |0
2100 |61152 |37772 |9101 |20900 |0
3557 |87211 |4101 |17743 |23777 |0
4027 |5193 |644 |882 |68 |0
4028 |5193 |580 |882 |68 |0
9615 |87211 |1512 |17743 |8738 |0
13017 |87211 |4040 |13229 |23795 |0
42500 |87211 |20155 |17743 |4355 |0
53529 |1011 |31667 |13229 |602 |1
While I have a fair bit of SQL experience, this has a level of complexity that's beyond me at this point...so part of what's messing me up is that I don't even know how to succinctly and accurately describe what I'm trying to do so I can do a search for it. I don't even know the appropriate tags for this question. Also, I forgot to mention earlier: T-SQL for SQL Server 2016R2.
Expected results would be a set of records with CustomerID and ProductID columns where the product wasn't in the authorized list for the customer and Inactive = 0. Ideally, this would be a view.
I've gotten as far as this query:
SELECT C.*,
A.ItemRefNo,
A.Type,
CASE
WHEN CA.ProductId IS NOT NULL THEN CA.ProductId
WHEN B.ProductId IS NOT NULL THEN B.ProductId
WHEN V.ProductId IS NOT NULL THEN V.ProductId
WHEN P.ProductId IS NOT NULL THEN P.ProductId
WHEN A.Type = 'ITEM' THEN A.ItemRefNo
ELSE NULL
END
AS ProductId
FROM Customer C
INNER JOIN authorizedListItems A
ON C.AuthorizedListId = A.ListKey
LEFT JOIN products CA
ON A.ItemRefNo = CA.CategoryKey
AND A.Type = 'CATG'
LEFT JOIN products B
ON A.ItemRefNo = B.BrandKey
AND A.Type = 'BRND'
LEFT JOIN products V
ON A.ItemRefNo = V.VendorKey
AND A.Type = 'VNDR'
LEFT JOIN products P
ON A.ItemRefNo = P.PriceKey
AND A.Type = 'PRIC'
Query results:
CustomerId|AuthorizedListId|ItemRefNo|Type|ProductId
----------|----------------|---------|----|---------
3 |117 |68 |BRND|4027
3 |117 |68 |BRND|4028
3 |117 |202 |ITEM|202
3 |117 |1415 |ITEM|1415
3 |117 |3012 |ITEM|3012
3 |117 |61152 |CATG|202
3 |117 |61152 |CATG|2100
1 |345 |9615 |ITEM|9615
2 |345 |9615 |ITEM|9615
1 |345 |2100 |ITEM|2100
2 |345 |2100 |ITEM|2100
1 |345 |13229 |VNDR|13017
1 |345 |13229 |VNDR|53529
The result I'd like to see is:
CustomerId|ProductId
----------|---------
1 |202
1 |1415
1 |4027
1 |4028
2 |202
2 |1415
2 |4027
2 |4028
3 |3557
3 |9615
3 |13017
3 |42500
Let me say in advance, I know I wasn't filtering inactive items in the query and the results include superfluous columns.
asasaki
(63 rep)
May 9, 2023, 05:55 PM
• Last activity: May 11, 2023, 12:36 AM
-1
votes
2
answers
558
views
To inquire about the query complexity of the following query in SQL
I would like to inquire about the complexity of the following example query in SQL: ``` SELECT id from DB where Country = "India" AND Size = "Large" ``` That is, taking the AND of rows matching the filters for any two columns out of **m>>2** columns. I want to return the first column which contain t...
I would like to inquire about the complexity of the following example query in SQL:
SELECT id
from DB
where Country = "India"
AND Size = "Large"
That is, taking the AND of rows matching the filters for any two columns out of **m>>2** columns. I want to return the first column which contain the ids.
Can it be done in time sublinear in the number of rows?
Thank you.
Neel Karia
(1 rep)
Feb 14, 2022, 09:31 AM
• Last activity: Feb 15, 2022, 03:04 AM
1
votes
2
answers
382
views
Does the PostgreSQL array inTOAST storage have random access times?
This question is related to PostgreSQL TOAST storage and the GIS.SE question: [Should TOAST compression be disabled for PostGIS?][1] *Basically, I was wondering if there is any **guarantee** of constant time complexity (O(1)) for the random access of array elements?* That is, to get `arr[n]`, is the...
This question is related to PostgreSQL TOAST storage and the GIS.SE question: Should TOAST compression be disabled for PostGIS?
*Basically, I was wondering if there is any **guarantee** of constant time complexity (O(1)) for the random access of array elements?*
That is, to get
arr[n]
, is the worst case number of steps required a constant (i.e. O(1)
) or something else (O(log n)
etc.)?
I am asking because in certain data forms such as PostGIS lines or rasters, the main data is logically an array (of coordinates). It is known that access time for such data can be extremely long once the data amount exceed a certain limit (like 500 points). A possible reason is that data of such sizes are transferred to TOAST storage and potentially the data can be compressed (e.g. with the main
storage). It's not clear how PostgreSQL can predict the approximate location of an element and still offer random access time.
The access time for arrays in most programming languages is constant (O(1)
). And that is the point of using an array. Just out of curiosity:
*Do PostgreSQL arrays have constant access times?* (when and when not?)
tinlyx
(3820 rep)
Oct 12, 2021, 09:45 AM
• Last activity: Oct 13, 2021, 11:15 AM
Showing page 1 of 4 total questions