Database Administrators
Q&A for database professionals who wish to improve their database skills
Latest Questions
2
votes
1
answers
465
views
Cosmos DB ODBC driver installation problem
We are trying to install the Cosmos DB ODBC Driver following the instructions provided [in this Microsoft Azure resource](https://learn.microsoft.com/en-us/azure/cosmos-db/odbc-driver) and are running into the following error:
>The setup routines for Microsoft DocumentDB ODBC Driver ODBC driver could not be loaded due to a system error code 126: The specified module could not be found. (C:\Program Files\Microsoft DocumentDB ODBC Driver\lib\DocumentDBODBC_sb64.dll).
We've tried unsuccessfully on Windows Server 2008 R2 and Windows Server 2012 R2. We have it working on several Windows 10 desktops, just not on the server OSes, unfortunately. We tried several things: installing with "Run as administrator", installing from the command line, and installing with both the 32-bit and 64-bit drivers, then with only one or the other. We opened the location of the DLL to 'Everyone/Full control'. We checked that the Visual C++ redistributables are installed, although we're not sure which specific ones are needed.
Any advice would be greatly appreciated.
Ben Saunders
(21 rep)
Jan 11, 2018, 01:56 PM
• Last activity: Jul 26, 2025, 06:08 AM
0
votes
1
answers
212
views
How to connect when Azure Cosmos DB enforces TLS/SSL by default?
My case:
Client: Windows 10 x64. pgAdmin 4 version 8.4 (latest).
Server: Cloud Azure Cosmos DB for PostgreSQL burstable, version 16.
My connection string looks like this:

```
jdbc:postgresql://c-donhuvycluster.qs4vulsae3t7uz.postgres.cosmos.azure.com:5432/vydb?user=citus&password={your_password}&sslmode=require
```
> Encrypted connections
> This cluster enforces encrypted connections using Transport Layer Security (TLS). For information on TLS version and certificates, refer to connecting with TLS/SSL.
How do I connect when Azure Cosmos DB enforces TLS/SSL by default? Please guide me. I searched for a video tutorial on YouTube but did not find one.
I do not know how to handle `sslmode=require`.
My workaround: https://learn.microsoft.com/en-us/azure/cosmos-db/postgresql/howto-ssl-connection-security#applications-that-require-certificate-verification-for-tls-connectivity
But does that procedure work on Windows (I can run OpenSSL with Git Bash for Windows), and how do I declare the certificate in pgAdmin 4 v8.4?
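For illustration, here is a minimal connection sketch using psycopg2 (an assumption; any libpq-based client accepts the same parameters). The host, database, and user come from the JDBC string above; the password and the certificate path are placeholders, not real values:

```python
# Minimal sketch: connecting to Cosmos DB for PostgreSQL with TLS enforced.
# Assumes psycopg2 is installed (pip install psycopg2-binary); host/db/user are
# taken from the JDBC string above, password and certificate path are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="c-donhuvycluster.qs4vulsae3t7uz.postgres.cosmos.azure.com",
    port=5432,
    dbname="vydb",
    user="citus",
    password="{your_password}",
    sslmode="require",           # TLS without verifying the server certificate
    # sslmode="verify-full",     # stricter: also verify the server certificate
    # sslrootcert="combined-ca-certificates.pem",  # hypothetical path to the CA bundle
)
with conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone())
conn.close()
```

In pgAdmin, the same settings are exposed in the server connection dialog (the SSL mode and root-certificate fields, on the SSL or Parameters tab depending on the version); `require` encrypts without verifying the certificate, while `verify-full` also needs the CA bundle downloaded per the workaround link.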
David Lapetina
(219 rep)
Mar 11, 2024, 02:02 AM
• Last activity: Jun 16, 2025, 03:09 AM
0
votes
0
answers
39
views
Handling feed and feed subscriptions with document database
I have a data feed that contains channels that users can subscribe to. It's similar to Facebook or LinkedIn, where each channel is a topic and users can subscribe to as many channels as they like; when they do, content posted to that channel appears in the user's feed.
The best design I've come up with for handling this scenario is still the good old relational database, because with just three simple tables -- Channels, Content and Subscriptions -- and one join, it's possible to get a user's feed. The best part is that it seems pretty efficient, with no duplicate data whatsoever.
I've been thinking about implementing this using a document database but I have not been able to figure out an equally simple and efficient design. Even though "joins" can create performance issues in relational databases, in this scenario the single join really seems to help create a simple and efficient solution.
Has anyone implemented a feed with a document database backend? I'm especially thinking about what happens when the number of subscribers to a channel reaches the millions, say 2 million subscribers to a sports channel. In a document database scenario there always seems to be duplicated data, and when the numbers hit millions, a document database solution just doesn't seem appropriate.
I'd love to hear comments and suggestions on this.
P.S. I remember reading an article about Facebook using a relational database for their feed and it just seems to make sense. As I understand it, they addressed performance issues as the numbers grew by adding additional layers of memory based databases in front of the relational database. I'd love to hear any insights into how these hyperscale platforms handle things.
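For reference, a minimal sketch of the three-table design and the single join described above, using SQLite purely for illustration; the table and column names are assumptions:

```python
# Sketch of the relational feed design: Channels, Content, Subscriptions, one join.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Channels      (ChannelId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Content       (ContentId INTEGER PRIMARY KEY, ChannelId INTEGER, Body TEXT, PostedAt TEXT);
CREATE TABLE Subscriptions (UserId INTEGER, ChannelId INTEGER, PRIMARY KEY (UserId, ChannelId));
""")

# A user's feed: all content in channels the user subscribes to (the "one join"
# is between Content and Subscriptions).
feed = db.execute("""
SELECT c.ContentId, c.ChannelId, c.Body, c.PostedAt
FROM Content c
JOIN Subscriptions s ON s.ChannelId = c.ChannelId
WHERE s.UserId = ?
ORDER BY c.PostedAt DESC
LIMIT 50
""", (42,)).fetchall()
print(feed)
```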
Sam
(113 rep)
Mar 20, 2024, 05:35 AM
1
votes
0
answers
109
views
MongoDB: Queries running 3x slower on Azure Serverless Cosmos DB for MongoDB compared to MongoDB Atlas free tier server
I have a database in MongoDB Atlas using the free tier package.
I recently transferred my data to Azure Serverless Cosmos DB for MongoDB using mongoexport. After the data transfer, I've noticed that the query times have increased significantly. Specifically, queries are now taking 3 times longer in Azure Cosmos DB compared to their original performance in MongoDB Atlas.

**Details:**
- **Database size:** 623 MB
- **Types of queries:** get queries / facet aggregations
- **Indexing:** it seems the same as on the old server

**Questions:**
- Are there known performance differences between MongoDB Atlas and Azure Serverless Cosmos DB for MongoDB that could account for this increase in query time?
- Could the mongoexport process have impacted the data in a way that affects performance?
- What steps can I take to diagnose and improve the performance of my queries in Azure Cosmos DB?
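One low-effort diagnostic, sketched below, is to measure the request-unit (RU) charge of a slow query via the Cosmos-specific `getLastRequestStatistics` command; on Cosmos DB's API for MongoDB only the `_id` field is indexed by default on newer server versions, so a high RU charge usually points at missing indexes. The connection string, database, collection, and filter here are placeholders:

```python
# Rough diagnostic sketch for Cosmos DB's API for MongoDB: measure the RU charge
# of a query with getLastRequestStatistics (a Cosmos-specific command).
from pymongo import MongoClient

client = MongoClient("<your-cosmos-mongodb-connection-string>")
db = client["mydb"]

docs = list(db["mycollection"].find({"status": "active"}).limit(100))
# Reports the charge of the most recent request on this connection.
stats = db.command("getLastRequestStatistics")
print("RU charge:", stats.get("RequestCharge"), "docs returned:", len(docs))
```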
Veli Eroglu
(11 rep)
Dec 18, 2023, 10:29 AM
• Last activity: Dec 18, 2023, 10:34 AM
1
votes
0
answers
87
views
How to efficiently manage large amounts of data in web apps
I am developing an application that needs to store a large amount of product data and display it in a web app. The data is generated every day, and each product will have approximately 100 rows per day, resulting in a total of approximately 1 million rows for the entire application per day.
One item looks like the following:

```
{
    "ProductId": "P-1",
    "ReportDate": "2023-12-12",
    "Type": "T-1",
    "Usage": 100
}
```
Requirements:
- store data from now back 3 years
- performance for the last 3 months must be very fast
- user can choose any date range from now to 3 years back, *i.e. 2023-12-01 - 2023-12-12 or 2020-12-12 - 2023-12-12.*
The data is displayed in a table. The table contains a list of products with pagination, set to a default of 50 rows but changeable up to 100. Each product is expandable, revealing an inner table with that product's data.
**Now, the problem I am facing is** that one day has 100 rows per product. So, for one page with 50 items, I need to load 50 * 100 rows. However, the date range makes it more complicated, because a longer range means more data is loaded. Essentially, the data is not loaded directly into the web app; it is aggregated in memory, and the web app still displays only 100 rows per product. Since the date range depends entirely on the user, I don't have pre-aggregated data for date-range combinations.
The main query looks like the following:

```
SELECT ProductId, Type, SUM(Usage)
FROM c
WHERE c.ProductId = "P-1" AND c.ReportDate >= '2023-12-01' AND c.ReportDate <= '2023-12-12'
GROUP BY c.ProductId, c.Type
```
Now, I am exploring how to handle this. Where should I store it? Our application uses Azure SQL, and some data is stored in CosmosDB.
Initially, I considered storing the data in CosmosDB. However, the data has dependencies on Azure SQL, such as the product name. Since the name can change, and these changes must be reflected in the product's data, I don't think it's a good idea to store them together. Additionally, the data is filtered by permissions stored in Azure SQL. For these reasons, I am leaning towards using Azure SQL. The potential issue is the size of the data; the table will be approximately 750 GB for three years of data.
I am also considering storing the last 3 months of data in Azure SQL as 'hot data', and storing all remaining data, including those 3 months, in different storage, perhaps Cosmos DB, as 'cold data'.
Moreover, I am considering sharding, but I'm unsure if it's suitable when I need access to some common data.
Please, do you have any advice on how to approach this?
Thanks for any advice.
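For context, a sketch of how the main query above could be run against Azure SQL with parameters; pyodbc is an assumption, and `dbo.ProductUsage` plus the connection details are hypothetical placeholders:

```python
# Sketch only: the date-range aggregation against Azure SQL via pyodbc.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=tcp:<your-server>.database.windows.net,1433;"
    "Database=<your-db>;Uid=<user>;Pwd=<password>;Encrypt=yes;"
)
sql = """
SELECT ProductId, [Type], SUM([Usage]) AS TotalUsage
FROM dbo.ProductUsage            -- hypothetical table holding the daily rows
WHERE ProductId = ? AND ReportDate BETWEEN ? AND ?
GROUP BY ProductId, [Type]
"""
rows = conn.cursor().execute(sql, ("P-1", "2023-12-01", "2023-12-12")).fetchall()
for r in rows:
    print(r)
```

A common companion to the hot/cold idea is to partition the table by `ReportDate` and/or use a clustered columnstore index, so the frequent last-3-months queries touch only recent data while older data stays compressed; that is a general pattern, not a prescription for this specific workload.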
Flow
(11 rep)
Dec 12, 2023, 06:21 PM
0
votes
0
answers
194
views
Linked service: SQL Server-2019 to Cosmos DB
I have created a linked service for importing data from Azure Cosmos DB into on-prem MS SQL Server 2019. The service was created from SSMS and the test connection is successful, but the query fails with an error.
Query:

```
select * from openquery (
    linked_service_name, 'select * from cosmos_collection');
```
>Error: "Error occurred while submitting query text to OLE DB provider "SSISOLEDB" for linked server "linked_service_name"
**Linked service set-up:**
- Provider: SQL Server OLE DB
- Data Source: User DSN [created in ODBC Data Source Administrator]
- Provider String: secondary key from Cosmos DB
- Catalog: schema name of Cosmos DB
maverick_1
(1 rep)
Jun 21, 2023, 09:42 AM
• Last activity: Jun 22, 2023, 09:48 AM
1
votes
1
answers
401
views
Product Object for Ecommerce in NoSQL
I'm working on an ecommerce app that will be using a NoSQL document database. I want to have a general purpose product that can be used for different types of products, ranging from coffee to furniture to electronics.
I thought one way to achieve this is to have "attributes" and "values" so that depending on the product, there can be different ones and I can use them as "options" for the user to select on the front-end.
For example, in the case of a tshirt, attributes and their values could be:
```
{
    "productOptions": [
        { "colors": ["red", "white", "blue"] },
        { "sizes": ["small", "medium", "large", "extra large"] }
    ]
}
```
There are 4 issues that I'm trying to solve here:
1. It's fairly typical that each unique configuration will have a different SKU number
2. In some cases, it's also fairly common to have a different price point for a particular SKU. For example, the regular tshirt price could be $9.99 but the extra large could be $12.99.
3. Another possibility is that a particular configuration option may not be available. Using the `Product` object example above, red color may not be available in extra large size
4. Most ecommerce systems will also allow merchants to upload an Excel sheet for importing their data into the database, as nobody would like to create even 20 products with several options manually
This means each product option/configuration may have to be represented with its own SKU and price in the database, meaning individual entries. Something like:

```
{
    "id": 123,
    "sku": "987-rs",
    "name": "Quality Brand Tshirt",
    "description": "100% Cotton Small Red",
    "price": 9.99
},
{
    "id": 234,
    "sku": "987-rxl",
    "name": "Quality Brand Tshirt",
    "description": "100% Cotton XL Red",
    "price": 12.99
}
```
I also thought I could have both the attribute/value structure in the database as well as all available SKUs, i.e. both examples above used together. This would allow handling product options easily on the front-end while still allowing for individual SKUs and pricing.
Another challenge with general purpose product design is that in some cases, there's no actual product configuration until the user creates it. For example, coffee or pizza. It would make no sense to create every combination of toppings and assign SKUs ahead of time. This situation is not unique to food products. The same scenario actually applies to regular products like clothing if any customization is involved.
I'm sure many others have already tackled this scenario and wanted to see how they handled it.
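For illustration, a sketch of that combined idea: one product document carrying both the selectable options and a per-combination variant list (SKU, price), with unavailable combinations simply absent. All field names here are assumptions, not a fixed schema:

```python
# Illustrative sketch: product document with options plus per-combination variants.
product = {
    "id": "987",
    "name": "Quality Brand Tshirt",
    "productOptions": {
        "color": ["red", "white", "blue"],
        "size": ["small", "medium", "large", "extra large"],
    },
    "variants": [
        {"sku": "987-rs",  "options": {"color": "red", "size": "small"},       "price": 9.99},
        {"sku": "987-rxl", "options": {"color": "red", "size": "extra large"}, "price": 12.99},
        # combinations that are not sold are simply absent (issue 3)
    ],
}

def find_variant(product, selection):
    """Resolve a user's option selection to a concrete SKU/price, or None if unavailable."""
    for v in product["variants"]:
        if v["options"] == selection:
            return v
    return None

print(find_variant(product, {"color": "red", "size": "extra large"}))  # -> 987-rxl, 12.99
```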
Sam
(113 rep)
Jan 3, 2023, 09:24 PM
• Last activity: Jan 4, 2023, 02:02 PM
2
votes
1
answers
1215
views
How to store and query a tree structure in Cosmos DB?
I'm migrating a relational DB over to Cosmos DB and am unsure about a tree structure.
There are several thousand rows of data ("Offerings") where each offering can have child items arranged in a tree hierarchy.
In the source DB it's using the traditional `ParentID` tactic:

| ID | Content         | ParentID |
|----|-----------------|----------|
| 0  | "Root"          | -1       |
| 1  | "Child 1"       | 0        |
| 2  | "Child 2"       | 0        |
| 3  | "Child's child" | 2        |
In my actual DB, "Content" is not just one field but multiple columns, some of them being JSON blobs already.
When moving over to Cosmos DB, I see three options:
**Option 1: Should I store the actual hierarchy?**
```
{
    id : 0,
    content : "Root",
    children : [
        {
            id : 1,
            content : "Child 1",
            children : []
        },
        {
            id : 2,
            content : "Child 2",
            children : [
                {
                    id : 3,
                    content : "Child's child",
                    children : []
                }
            ]
        }
    ]
}
```
This represents the actual structure and allows easy traversal of the hierarchy when reading it. However: what if I need (for example) "all children aged 10"? Is there a way to query for the property `age`, no matter how deep down in the hierarchy it is, and have it return a list of child objects?
**Option 2: Should I store a list and an extra property representing the hierarchy?**
```
{
    id : 0,
    content : "Root",
    descendants : [
        {
            id : 1,
            content : "Child 1"
        },
        {
            id : 2,
            content : "Child 2"
        },
        {
            id : 3,
            content : "Child's child"
        }
    ],
    hierarchy : {
        id : 0,
        children : [
            {
                id : 1,
                children : []
            },
            {
                id : 2,
                children : [
                    {
                        id : 3,
                        children : []
                    }
                ]
            }
        ]
    }
}
```
This allows me to easily get all descendants without tree traversals. Is there a major pro/con for either one? Is either one bad design? Or does it entirely depend on what queries I mostly expect?
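One thing worth noting: the Cosmos DB SQL API has no recursive-descent operator, and `JOIN` only unnests arrays a fixed number of levels per join, so arbitrary-depth questions such as "all children aged 10" against Option 1 usually end up either flattened (Option 2's `descendants` list, which is directly queryable) or answered in application code. A client-side traversal sketch (the `age` property is hypothetical):

```python
# Client-side sketch for Option 1: after reading the document, recursively collect
# every descendant matching a predicate. This is application code, not a Cosmos query.
def find_descendants(node, predicate, matches=None):
    if matches is None:
        matches = []
    for child in node.get("children", []):
        if predicate(child):
            matches.append(child)
        find_descendants(child, predicate, matches)
    return matches

doc = {"id": 0, "content": "Root", "children": [
    {"id": 1, "content": "Child 1", "age": 10, "children": []},
    {"id": 2, "content": "Child 2", "children": [
        {"id": 3, "content": "Child's child", "age": 10, "children": []},
    ]},
]}
print(find_descendants(doc, lambda n: n.get("age") == 10))  # ids 1 and 3
```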
Krumelur
(121 rep)
Dec 6, 2021, 11:54 AM
• Last activity: Dec 6, 2022, 10:32 PM
2
votes
2
answers
3234
views
Choosing values from multiple columns based on latest value of one column
I have an Azure Cosmos DB that contains columns DateTime, userName, temperature, and bloodPressure, plus many other columns. There are several distinct userName values, and each userName has temperature and bloodPressure data collected at different DateTime values. The temperature and bloodPressure values are different at each of those times.
Here is an example of the data:

```
userName, DateTime, bloodPressure, temperature, heartBeat, hatSize
Curly, "2021-01-19 18:24:53", 121, 98, 60, 7.25
Larry, "2021-01-14 18:24:53", 125, 99, 80, 7.5
Mo, "2021-01-13 18:24:53", 123, 100, 70, 7.75
Curly, "2021-01-18 18:24:53", 120, 97, 50, 8
Larry, "2021-01-13 18:24:53", 119, 99, 75, 7.15
Larry, "2021-01-15 18:24:53", 115, 98, 85, 7.37
Mo, "2021-01-12 18:24:53", 110, 102, 100, 8.01
Mo, "2021-01-20 18:24:53", 130, 99, 110, 6.97
Larry, "2021-01-18 18:24:53", 127, 98, 72, 7.76
Curly, "2021-01-17 18:24:53", 126, 97, 82, 8.0
```
For each user, I would like to return the **DateTime of their latest measurement**, and their **temperature and bloodPressure** from their latest DateTime:
Larry, "2021-01-18 18:24:53", 127, 98
Mo, "2021-01-20 18:24:53", 130, 99
Curly, "2021-01-19 18:24:53", 121, 98
I tried this:

```
SELECT
    c.userName,
    MAX(c.DateTime),
    c.bloodPressure,
    c.temperature
FROM CheckupData c
GROUP BY
    c.userName
```
but got an error that GROUP BY should include the other outputs of SELECT. Since bloodPressure and temperature are always changing, it seems like I shouldn't use them in GROUP BY, because I only want the latest values. What am I doing wrong?
(If it matters, my real database is larger: 50,000 userNames measured at 50 or so DateTime values each, and 30 or so columns).
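A sketch, not a verified fix: the Cosmos DB SQL API won't return non-grouped properties next to an aggregate the way this query needs, so a common workaround is one ordered `TOP 1` query per user, which works here because the DateTime strings sort lexicographically in this format. Shown with the azure-cosmos Python SDK; the account URL/key and database name are placeholders:

```python
# Sketch: latest reading per user via one TOP 1 query per userName.
from azure.cosmos import CosmosClient

client = CosmosClient("<account-url>", credential="<account-key>")
container = client.get_database_client("<db>").get_container_client("CheckupData")

QUERY = """
SELECT TOP 1 c.userName, c.DateTime, c.bloodPressure, c.temperature
FROM c
WHERE c.userName = @user
ORDER BY c.DateTime DESC
"""

for user in ["Curly", "Larry", "Mo"]:
    for row in container.query_items(
        query=QUERY,
        parameters=[{"name": "@user", "value": user}],
        enable_cross_partition_query=True,
    ):
        print(row)
```

With 50,000 userNames this means many small queries, so a client-side reduction over a single scan (keeping the max DateTime per user in a dictionary) may be cheaper; which is better depends on how the container is partitioned.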
KAE
(143 rep)
Apr 8, 2021, 03:08 PM
• Last activity: Apr 9, 2021, 05:22 PM
3
votes
1
answers
2723
views
Getting Nested query result from Cosmos DB with property named "Value"
I have a Cosmos DB collection from which I am trying to query "Key"/"Value" pairs.
Working query:

```
SELECT ed
FROM c
JOIN ed IN c.ExtendedData
WHERE ed.Key = "MembershipTypeId"
```
And got this result:

```
[
    {
        "ed": {
            "Key": "MembershipTypeId",
            "Value": "224"
        }
    }
]
```
I need to filter by "Key" and "Value", but "Value" appears to be a reserved keyword
Not working query:
SELECT ed
From c
JOIN ed IN c.ExtendedData
Where ed.Value = "224"
Got HTTP 400 with message: Syntax error, incorrect syntax near 'Value'
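A commonly documented workaround for reserved words in the Cosmos DB SQL API is bracket notation for the property access, i.e. `ed["Value"]` instead of `ed.Value`. A sketch issuing that query through the Python SDK; the account details and container name are placeholders:

```python
# Sketch: sidestep the reserved word with bracket notation in the query text.
from azure.cosmos import CosmosClient

container = (
    CosmosClient("<account-url>", credential="<account-key>")
    .get_database_client("<db>")
    .get_container_client("<container>")
)

query = """
SELECT ed
FROM c
JOIN ed IN c.ExtendedData
WHERE ed.Key = "MembershipTypeId" AND ed["Value"] = "224"
"""
for item in container.query_items(query=query, enable_cross_partition_query=True):
    print(item)
```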
Kseniya Goloshchapova
(61 rep)
Oct 3, 2019, 10:58 AM
• Last activity: Jun 3, 2020, 08:03 PM
4
votes
0
answers
901
views
What is the right way to increment like/comment counters on CosmosDB?
I'm currently working on a new app which will use a NoSQL DB (particularly Cosmos DB) for data scalability. I need to implement a way to update counters every time a post receives a like or comment. As of now, I can think of two approaches to this problem:
- Write a JS stored procedure to check the document `ETAG` and increment the counters only if the `ETAG` is current. If not, run a loop and fetch the most recent version of the document before updating it. The document holding the counters, as well as the like and comment records, would belong to the same partition key.
- The counters should be eventually consistent. A `WebJob` or `Function` is triggered to update them after the like or comment record is written to the database.

In each scenario, Cosmos DB would run at the `Session` consistency level.
Assume that a document is so popular that it begins to receive thousands of likes immediately. How would the first approach scale in this case? Since it's running in a CAS loop, it is likely to time out given time constraints. Second, is `ETAG` comparison the right approach here? Does Cosmos DB write to only one replica, so that we can rely on `ETAG` changes to make sure we're updating the most recent version of a document?
As for the second approach, perhaps eventual consistency is fine, but I hate seeing mismatched information on websites (such as 3 comments shown on a post, but 5 total comments when you read them). If the lag were tiny, this could be tolerated, and I'm not sure how much of a lag a typical, say, social media app should expect.
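For reference, a sketch of the first approach's read-modify-replace loop with an ETag precondition, assuming a recent azure-cosmos Python SDK where `replace_item` accepts `etag`/`match_condition`; the account, container, ids, and field names are placeholders:

```python
# Sketch of the optimistic-concurrency (ETag/CAS) counter update.
from azure.core import MatchConditions
from azure.cosmos import CosmosClient, exceptions

container = (
    CosmosClient("<account-url>", credential="<account-key>")
    .get_database_client("<db>")
    .get_container_client("posts")
)

def increment_like_count(post_id, partition_key, max_retries=5):
    for _ in range(max_retries):
        doc = container.read_item(item=post_id, partition_key=partition_key)
        doc["likeCount"] = doc.get("likeCount", 0) + 1
        try:
            container.replace_item(
                item=post_id,
                body=doc,
                etag=doc["_etag"],
                match_condition=MatchConditions.IfNotModified,  # fail if someone else wrote first
            )
            return True
        except exceptions.CosmosHttpResponseError as e:
            if e.status_code != 412:   # 412 = precondition failed, i.e. stale ETag
                raise
    return False  # contention too high; the caller could fall back to a queued/async update
```

Under heavy contention this loop will burn retries quickly, which is the usual argument for the second (eventually consistent, queue-driven) approach for hot documents.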
user246392
(141 rep)
Nov 6, 2018, 02:00 PM
1
votes
0
answers
173
views
Designing a globally distributed database
We are trying to design a **globally distributed** Azure Cosmos DB and we’re not sure how to best do that:
There is data that should be replicated in every region to improve read performance. This data is updated/added to infrequently, and will be placed in a global database.
However there is data that is added frequently that we don’t want to replicate in all regions, because writing it everywhere would be expensive, and ~90% of the time it will be accessed from the same region. This data will be placed in local databases.
Our main questions relate to this separation of global and local data.
**1. How (best) do we reference local data from global data and route requests?**
For example, we have Users, and their user data will be stored in the global database. However, Users have Devices (immobile IoT devices, residing in a User’s home) that produce frequent Measurements (sensor readings), and these Devices and their Measurements would be stored in a local database in the region where the devices are located.
If a User requests information for one of their Devices, **how do we route the request?** How do we know “Oh, their Device is in Canada, go to that database and retrieve the info”? Is an appropriate solution to give the id of a Device a prefix denoting the region it is in? Then we could look in the User document (global) and see the list of Device Ids that belong to the User and know which database the devices are in? This doesn’t seem like an ideal solution.
**2. On a related note, if a country requires that each citizen’s data must be stored in a local database within that country, a) how do we route all requests to the local database? b) how does a customer from that country access their data if they are traveling outside that country?**
For example, it is possible that all our Germany customer data will have to be stored within the country. That means that German User documents will be stored in Germany, and not globally. If a German comes to Canada and therefore connects to our Canadian app service, how can we 'make it aware' that it should query the German database for the User document?
If you need any additional info or clarifications, please leave a comment!
Thanks in advance!
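To make the id-prefix idea from question 1 concrete, a tiny routing sketch; the endpoints and region prefixes are made-up placeholders:

```python
# Sketch: a region code embedded in a Device id selects the regional database endpoint.
REGIONAL_ENDPOINTS = {
    "ca": "https://devices-canada.documents.azure.com:443/",
    "de": "https://devices-germany.documents.azure.com:443/",
}

def endpoint_for_device(device_id: str) -> str:
    region = device_id.split("-", 1)[0]          # e.g. "ca-12345" -> "ca"
    return REGIONAL_ENDPOINTS[region]

print(endpoint_for_device("ca-12345"))
```

The same kind of lookup could serve question 2: a German user's documents live behind the "de" endpoint, and a request made from Canada still routes there based on the prefix (or on a per-user "home region" field kept in the global User document).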
Tur1ng
(119 rep)
Aug 14, 2018, 07:58 PM
1
votes
1
answers
4009
views
Parameter value in Data Factory Triggered pipeline
I have a pipeline configured in Azure Data Factory which creates a backup file (JSON) from a Cosmos DB dataset and saves it in blob storage. My problem comes when I want to schedule the copy task in a trigger: I see that I have to specify a value for windowStart (a parameter already defined to name the JSON file with the date of the execution), as shown below.
Destination path:
And as you can see, when I want to trigger it, it asks me to specify the value for windowStart.
I tried the functions below to specify that, with no success so far:

```
"@{pipeline().parameters.windowStart}"
"@{formatDateTime(pipeline().startTime,'o')"
"@{formatDateTime(utcnow(),'yyyy-MM-dd')}"
"@{formatDateTime(trigger().startTime,'yyyy-MM-dd')}"
```
And the error I always get is as shown below:
Does anyone have any idea or a way to fix this, or has anyone faced the same issue before?



user141153
Jul 12, 2018, 12:03 AM
• Last activity: Jul 12, 2018, 08:58 PM
0
votes
1
answers
803
views
Local Cosmos DB service unavailable
I've installed Azure Cosmos DB on my local machine, but it is not loading the explorer pages.
**Explicit error**:
> Failed to get authorization headers for offers: {"readyState":4,"responseText":"{\"code\":\"ServiceUnavailable\",\"message\":\"Service is currently unavailable.\\r\\nActivityId: d728958b-c0b4-4b32-a131-bba1dc7a8ba0, \\r\\nResponseTime: 2018-06-05T17:22:33.8740685Z, StoreReadResult:

user141153
Jun 5, 2018, 05:33 PM
• Last activity: Jun 12, 2018, 02:58 AM
1
votes
1
answers
5685
views
Structuring change history data in MongoDB / NoSQL
I have a Chart on which I need to track changes; the Chart has several properties. A Chart also has a long list of items.
So in C#, my Chart object looks like this:

```
public class Chart {
    public int Id { get; set; }
    public string Title { get; set; }
    ...
    public List Items { get; set; }
}
```
Or in JSON, my Chart looks like this (I wrote this JSON by hand, so it could be wrong):

```
{
    "Id": 1,
    "Title": "Test Chart",
    "Items": [
        {
            "Id": 222,
            "Title": "Test Item",
            "Price": 112.34
        }
    ]
}
```
So my question is as follows:
Should I log an entire chart with all the items as a single document, every time there is a change? (Although I read there may be a limit on document size, so this could be a problem.)
OR
Should I log Chart changes as one type of document and Chart Item changes as a separate type of document? This would allow me to store changes to chart items as a single "line item" document without having to save the entire Chart every time there is a change to a single chart item.
OR
Is there a better way?
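For illustration, a sketch of the second option using pymongo: change events stored as their own documents in one history collection, discriminated by a type field, so a single chart-item edit doesn't require re-saving the whole Chart. The collection and field names are assumptions:

```python
# Sketch: separate change-event documents for Chart changes and Chart Item changes.
from datetime import datetime, timezone
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["chartsdb"]

def log_chart_change(chart_id, changed_fields):
    db.chart_history.insert_one({
        "type": "chart",
        "chartId": chart_id,
        "changes": changed_fields,               # e.g. {"Title": "New title"}
        "at": datetime.now(timezone.utc),
    })

def log_item_change(chart_id, item_id, changed_fields):
    db.chart_history.insert_one({
        "type": "chartItem",
        "chartId": chart_id,
        "itemId": item_id,
        "changes": changed_fields,               # e.g. {"Price": 99.95}
        "at": datetime.now(timezone.utc),
    })

log_item_change(1, 222, {"Price": 99.95})
```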
Targaryen
(113 rep)
Mar 13, 2018, 06:29 PM
• Last activity: Apr 8, 2018, 11:22 AM
Showing page 1 of 15 total questions