Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

0 votes

1 answers

42 views

Should I create 2 different databases if they would keep 70% similar data?

database-design cassandra distributed-databases complexity

I am writing a microservices based NodeJS-Cassandra application and I haVe few services they would need 70% similar data( like username, avatar, videos, etc.) and I am just wondering to know is it a good idea that I create 2 databases like: DB1: > A,B,C,D,E,F,G,H,I,J,K,L,M,N DB2: > A,B,C,D,E,F,G,H,I...

                                  I am writing a microservices based NodeJS-Cassandra application and I haVe few services they would need 70% similar data( like username, avatar, videos, etc.) and I am just wondering to know is it a good idea that I create 2 databases like:

DB1:

> A,B,C,D,E,F,G,H,I,J,K,L,M,N

DB2:

> A,B,C,D,E,F,G,H,I,J,W,X,Y,Z

Or just create one database like:

> A,B,C,D,E,F,G,H,I,J,K,L,M,N,W,X,Y,Z

I know from microservices concept that we should give it's own database to each service, but as long as my data contains some massive data like video files, I don't know what should I do in this specific case?

I like to know if there is any rule of thumb says **"if your databases would have more than x-% of similar data, or the data you should replicate in both databases exceeds x-GB, it's better to keep them in 1 database"**?

best_of_man (117 rep)

May 22, 2023, 05:34 PM • Last activity: May 23, 2023, 12:04 AM

2 votes

2 answers

68 views

Complicated relationship Q

sql-server sql-server-2016 query complexity

I'm trying to do a query to find products that *aren't* on the customer's authorized products list, and it's breaking my brain. Each customer has an authorized product group list, and within a product group each entry might consist of a Product ID, a Product Category ID, Vendor Number, Price Schedul...

                                  I'm trying to do a query to find products that *aren't* on the customer's authorized products list, and it's breaking my brain.

Each customer has an authorized product group list, and within a product group each entry might consist of a Product ID, a Product Category ID, Vendor Number, Price Schedule ID, or Brand ID (all different columns on the Product record).

Many authorized product group lists have a mix of 2 or more kinds of entries, and often there's some overlap between the different columns used as selection criteria. Of course, there's a code on the product group list item that indicates which column the entry is based on, so it's easy enough to parse.

On the Product side, I'd be looking for products without an "Inactive" flag that aren't on the authorized list for each customer.

The tables look roughly like this:

  **Customer**
    
  CustomerID    |AuthorizedListID
  ----------|----------------
  1           |345
  2           |345
  3           |117
    
  **Authorized List**
    
  ListID        |ListName
  ------|-----------------
  117           |Frozen goods
  345           |Pizza
    
  **Authorized List Items**
    
  ListKey   |ItemRefNo   |Type
  -------|---------|----
  117       |68          |BRND
  117       |202         |ITEM
  117       |1415        |ITEM
  117       |3012        |ITEM
  117       |61152       |CATG
  117       |31667       |PRIC
  345       |9615        |ITEM
  345       |2100        |ITEM
  345       |13229       |VNDR
  345       |87211       |CATG
   
  **Products**
    
  ProductID|CategoryKey|PriceKey|VendorKey|BrandKey|Inactive
  ---------|-----------|--------|---------|--------|--------
  202      |61152      |113     |45529    |16      |0
  1415     |4990       |5601    |882      |41009   |0
  2100     |61152      |37772   |9101     |20900   |0
  3557     |87211      |4101    |17743    |23777   |0
  4027     |5193       |644     |882      |68      |0
  4028     |5193       |580     |882      |68      |0
  9615     |87211      |1512    |17743    |8738    |0
  13017    |87211      |4040    |13229    |23795   |0
  42500    |87211      |20155   |17743    |4355    |0
  53529    |1011       |31667   |13229    |602     |1

While I have a fair bit of SQL experience, this has a level of complexity that's beyond me at this point...so part of what's messing me up is that I don't even know how to succinctly and accurately describe what I'm trying to do so I can do a search for it. I don't even know the appropriate tags for this question. Also, I forgot to mention earlier: T-SQL for SQL Server 2016R2.

Expected results would be a set of records with CustomerID and ProductID columns where the product wasn't in the authorized list for the customer and Inactive = 0. Ideally, this would be a view. 

I've gotten as far as this query:

    SELECT C.*, 
    A.ItemRefNo,
    A.Type,
    CASE 
    	WHEN CA.ProductId IS NOT NULL THEN CA.ProductId 
    	WHEN B.ProductId IS NOT NULL THEN B.ProductId
    	WHEN V.ProductId IS NOT NULL THEN V.ProductId 
    	WHEN P.ProductId IS NOT NULL THEN P.ProductId 
    	WHEN A.Type = 'ITEM' THEN A.ItemRefNo 
    	ELSE NULL 
    	END
    AS ProductId
    FROM Customer C
    INNER JOIN authorizedListItems A
    ON C.AuthorizedListId = A.ListKey 
    LEFT JOIN products CA
    	ON A.ItemRefNo = CA.CategoryKey 
    	AND A.Type = 'CATG'
    LEFT JOIN products B
    	ON A.ItemRefNo = B.BrandKey 
    	AND A.Type = 'BRND' 
    LEFT JOIN products V
    	ON A.ItemRefNo = V.VendorKey 
    	AND A.Type = 'VNDR' 
    LEFT JOIN products P
    	ON A.ItemRefNo = P.PriceKey 
    	AND A.Type = 'PRIC' 

Query results:

CustomerId|AuthorizedListId|ItemRefNo|Type|ProductId
----------|----------------|---------|----|---------
3         |117             |68       |BRND|4027
3         |117             |68       |BRND|4028
3         |117             |202      |ITEM|202
3         |117             |1415     |ITEM|1415
3         |117             |3012     |ITEM|3012
3         |117             |61152    |CATG|202
3         |117             |61152    |CATG|2100
1         |345             |9615     |ITEM|9615
2         |345             |9615     |ITEM|9615
1         |345             |2100     |ITEM|2100
2         |345             |2100     |ITEM|2100
1         |345             |13229    |VNDR|13017
1         |345             |13229    |VNDR|53529

The result I'd like to see is:

CustomerId|ProductId
----------|---------
1         |202
1         |1415
1         |4027
1         |4028
2         |202
2         |1415
2         |4027
2         |4028
3         |3557
3         |9615
3         |13017
3         |42500

Let me say in advance, I know I wasn't filtering inactive items in the query and the results include superfluous columns. 
                                

asasaki (63 rep)

May 9, 2023, 05:55 PM • Last activity: May 11, 2023, 12:36 AM

-1 votes

2 answers

558 views

To inquire about the query complexity of the following query in SQL

mysql query complexity

I would like to inquire about the complexity of the following example query in SQL: ``` SELECT id from DB where Country = "India" AND Size = "Large" ``` That is, taking the AND of rows matching the filters for any two columns out of **m>>2** columns. I want to return the first column which contain t...

I would like to inquire about the complexity of the following example query in SQL:

SELECT id
from DB
where Country = "India"
    AND Size = "Large"

That is, taking the AND of rows matching the filters for any two columns out of **m>>2** columns. I want to return the first column which contain the ids. Can it be done in time sublinear in the number of rows? Thank you.

Neel Karia (1 rep)

Feb 14, 2022, 09:31 AM • Last activity: Feb 15, 2022, 03:04 AM

1 votes

2 answers

382 views

Does the PostgreSQL array inTOAST storage have random access times?

postgresql array complexity

This question is related to PostgreSQL TOAST storage and the GIS.SE question: [Should TOAST compression be disabled for PostGIS?][1] *Basically, I was wondering if there is any **guarantee** of constant time complexity (O(1)) for the random access of array elements?* That is, to get `arr[n]`, is the...

                                  This question is related to PostgreSQL TOAST storage and the GIS.SE question: Should TOAST compression be disabled for PostGIS? 

*Basically, I was wondering if there is any **guarantee** of constant time complexity (O(1)) for the random access of array elements?*

That is, to get arr[n], is the worst case number of steps required a constant (i.e. O(1) ) or something else (O(log n) etc.)?

I am asking because in certain data forms such as PostGIS lines or rasters, the main data is logically an array (of coordinates). It is known that access time for such data can be extremely long once the data amount exceed a certain limit (like 500 points). A possible reason is that data of such sizes are transferred to TOAST storage and potentially the data can be compressed (e.g. with the main storage). It's not clear how PostgreSQL can predict the approximate location of an element and still offer random access time.

The access time for arrays in most programming languages is constant (O(1)). And that is the point of using an array. Just out of curiosity: 

*Do PostgreSQL arrays have constant access times?* (when and when not?)

tinlyx (3820 rep)

Oct 12, 2021, 09:45 AM • Last activity: Oct 13, 2021, 11:15 AM

Showing page 1 of 4 total questions