
Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

0 votes
1 answer
18 views
How to count the number of campaigns per day based on the start and end dates of the campaigns
I need to count the number of campaigns per day based on the start and end dates of the campaigns. Columns: Campaign Name, Start Date, End Date. How do I write the SQL command in Databricks?
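One common approach (a sketch, not the only way): expand each campaign's start–end range into one row per day, then count rows per day. In Databricks SQL the expansion is typically done with `explode(sequence(Start_Date, End_Date))` followed by a `GROUP BY` on the generated day column; the same logic in plain Python, with made-up campaign data:

```python
from collections import Counter
from datetime import date, timedelta

# Expand each campaign's [start, end] range into one row per day, then count.
# (Databricks SQL equivalent: explode(sequence(start_date, end_date)).)
campaigns = [
    ("Spring Sale", date(2025, 1, 1), date(2025, 1, 3)),
    ("New Year",    date(2025, 1, 2), date(2025, 1, 4)),
]

per_day = Counter()
for name, start, end in campaigns:
    d = start
    while d <= end:              # inclusive of both start and end dates
        per_day[d] += 1
        d += timedelta(days=1)

print(sorted(per_day.items()))
```

With these sample rows, Jan 2 and Jan 3 each fall inside both campaigns, so they count 2, while Jan 1 and Jan 4 count 1.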
Level11Data (11 rep)
Jan 7, 2025, 07:20 PM • Last activity: Jan 7, 2025, 07:21 PM
0 votes
1 answer
69 views
Databricks SQL warehouse is failing to launch saying it "cannot fetch secrets", what is going on?
I have a Databricks SQL warehouse. When I try to start it, I get the following error:

> Clusters are failing to launch. Cluster launch will be retried.
>
> Details for the latest failure: Error: Cannot fetch secrets referred in the Spark configuration. Please check that the secrets exists and the cluster's owner has read permissions. Type: CLIENT_ERROR Code: INVALID_ARGUMENT

I am not sure what's wrong; can somebody explain?
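For context on what the error is checking: Spark configurations reference secrets with the `{{secrets/<scope>/<key>}}` syntax, and the warehouse refuses to start if a referenced scope or key doesn't exist or the owner lacks read permission on the scope. A small sketch of pulling those references out of a config for inspection (the config key and values here are made up):

```python
import re

# Spark configs reference Databricks secrets as {{secrets/<scope>/<key>}}.
# Example config entry (key and values are hypothetical):
spark_conf = {
    "spark.hadoop.fs.azure.account.key.mystore": "{{secrets/storage-scope/account-key}}",
}

secret_ref = re.compile(r"\{\{secrets/([^/]+)/([^}]+)\}\}")
references = [
    (scope, key)
    for value in spark_conf.values()
    for scope, key in secret_ref.findall(value)
]
# Each (scope, key) pair must exist, and the warehouse owner needs READ on the scope.
print(references)
```

Checking each extracted scope/key against the workspace's secret scopes (and their ACLs) is the usual way to track down which reference is broken.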
Kyle Hale (216 rep)
Jan 7, 2025, 06:42 PM
0 votes
0 answers
34 views
Connect to Create a New Unity Catalog using a onprem postgres database connect
1. I have Databricks on the Azure platform with admin access.
2. I have a serverless SQL warehouse where I have imported some CSV data into a catalog.
3. Now I need to access Postgres data on an on-prem Linux box.
4. I need to connect to this DB from Databricks' Add Connection to create a new catalog.
5. I would like to use Databricks Genie to access the tables added from the Postgres DB into the catalog.

How do I proceed now?
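Assuming the goal is Lakehouse Federation (querying the on-prem Postgres through a foreign catalog in Unity Catalog), the DDL has roughly the following shape; the host, secret scope, and all names below are placeholders, and the on-prem host must be network-reachable from Databricks (e.g. via a gateway or private connectivity):

```python
# Sketch of the Lakehouse Federation DDL, held as strings to run via spark.sql
# in a notebook. All names, the host, and the secret scope are placeholders.
create_connection = """
CREATE CONNECTION pg_onprem TYPE postgresql
OPTIONS (
  host 'onprem-host.example.com',
  port '5432',
  user secret('pg-scope', 'pg-user'),
  password secret('pg-scope', 'pg-password')
)
"""

# A foreign catalog then mirrors one Postgres database into Unity Catalog:
create_catalog = """
CREATE FOREIGN CATALOG pg_catalog
USING CONNECTION pg_onprem
OPTIONS (database 'mydb')
"""
# In a notebook: spark.sql(create_connection); spark.sql(create_catalog)
```

Once the foreign catalog exists, its tables appear under Unity Catalog like any other, which is the prerequisite for pointing Genie (or any SQL tooling) at them.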
malcolm richard (1 rep)
Dec 19, 2024, 11:15 AM • Last activity: Dec 19, 2024, 06:57 PM
1 vote
0 answers
210 views
Writing large dataset from spark dataframe
We have an Azure Databricks job that retrieves a large dataset with PySpark. The DataFrame has about 11 billion rows. We are currently writing this out to a PostgreSQL DB (also in Azure), using the JDBC connector to write rows out in batches to the existing table (batch size 10,000,000). The table has a handful of indexes on it, so inserts take a while. It takes dozens of hours to complete this operation (assuming it finishes successfully at all). I feel like it would make more sense to use COPY to load the data into the database, but I don't see any well-established patterns for doing that in Databricks. I don't have a ton of Spark or Databricks experience, so any tips are appreciated.
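One pattern (a sketch, not a tested pipeline): stage each partition's rows as CSV and stream them into Postgres with `COPY ... FROM STDIN`, e.g. via psycopg2's `copy_expert`, instead of batched JDBC inserts. The CSV-staging half in plain Python, with the database call left as a comment since no DB is available here; the table and column names are hypothetical:

```python
import csv
import io

# Stage rows as CSV in memory, then stream them into Postgres with COPY.
# Table and column names below are hypothetical.
rows = [(1, "alpha"), (2, "beta")]

buf = io.StringIO()
csv.writer(buf).writerows(rows)
buf.seek(0)

copy_sql = "COPY target_table (id, label) FROM STDIN WITH (FORMAT csv)"
# With psycopg2 (not run here):
#   with conn.cursor() as cur:
#       cur.copy_expert(copy_sql, buf)
#   conn.commit()
print(buf.getvalue())
```

At 11 billion rows, dropping the indexes before the bulk load and recreating them afterwards is likely to matter at least as much as the COPY-vs-INSERT choice, since every insert otherwise maintains each index incrementally.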
Kyle Chamberlin (13 rep)
Feb 16, 2024, 12:57 AM
1 vote
1 answer
433 views
DESCRIBE TABLE in databricks piped into dataframe
Does anyone know of a method to pipe the "DESCRIBE TABLE" output in Databricks into a dataframe (or another usable format which could be used for further analysis/computation)?
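No piping should be needed: `spark.sql(...)` already returns a DataFrame, including for DESCRIBE statements. A minimal sketch (the table name is assumed, and `spark` is the session a Databricks notebook provides):

```python
# In a Databricks notebook, spark.sql returns a DataFrame directly:
describe_sql = "DESCRIBE TABLE EXTENDED my_schema.my_table"
# df = spark.sql(describe_sql)    # DataFrame with col_name / data_type / comment
# df.filter("col_name = 'Location'").show()   # e.g. pull one property out
# pdf = df.toPandas()             # or convert for further analysis
print(describe_sql)
```

For just the column portion of that output, `spark.catalog.listColumns(...)` covers the same ground through the catalog API.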
Doc (121 rep)
Dec 7, 2021, 02:04 PM • Last activity: Feb 15, 2024, 03:05 AM
0 votes
1 answer
21 views
Next Business Date Column
I have a dataset that looks like this: [dataset sample image] Where `business_day` indicates whether the `transaction_created_date` is a business day or not. I'm trying to sum the `line_amount` so that values that occurred over a holiday or weekend get added to the next business day, to look something like this: [expected output image] Essentially, if I can capture the next business day where `business_day = 0`, then I can just do a sum over partition.
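The "next business day" can be derived from the flag alone by rolling each non-business date forward to the first following date flagged as a business day, then grouping. A sketch in plain Python using the column names from the question (the sample values are made up):

```python
from collections import defaultdict
from datetime import date, timedelta

# (transaction_created_date, business_day flag, line_amount); values made up.
rows = [
    (date(2024, 1, 5), 1, 100.0),   # Friday, business day
    (date(2024, 1, 6), 0, 20.0),    # Saturday
    (date(2024, 1, 7), 0, 30.0),    # Sunday
    (date(2024, 1, 8), 1, 50.0),    # Monday, business day
]
business_days = {d for d, flag, _ in rows if flag == 1}

def next_business_day(d):
    # Roll forward until we hit a date flagged as a business day.
    while d not in business_days:
        d += timedelta(days=1)
    return d

totals = defaultdict(float)
for d, flag, amount in rows:
    totals[d if flag == 1 else next_business_day(d)] += amount

print(sorted(totals.items()))
```

In SQL the same rollup is typically a `MIN(transaction_created_date)` over a window restricted to `business_day = 1` rows on or after each date, followed by a `SUM ... GROUP BY` on that derived column.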
Lena Zheng (3 rep)
Jan 10, 2024, 12:28 AM • Last activity: Jan 10, 2024, 08:40 AM
-1 votes
1 answer
50 views
How to create "On this day in history" query
I'm using Databricks and I have a table with a list of events from various years. I want to return the event most recent to today's date from each year. For example, today's date is 6th May and my table is thus:

|Year (int)|Date (date)|Event (str)|
|----------|-----------|-----------|
|2021|2021-08-04|Ate apple|
|2021|2021-04-16|Flew plane|
|2020|2020-10-11|Swam 100 miles|
|2020|2020-03-07|Did backflip|
|2020|2020-01-01|Tidied room|
|2019|2019-09-30|Found 10 pence|
|2018|2018-02-22|Lost 10 pence|

So I would want to return:

**On this day in history your most recent achievements were:**

|Year|Date|Event|
|----|----|-----|
|2021|2021-04-16|Flew plane|
|2020|2020-03-07|Did backflip|
|2018|2018-02-22|Lost 10 pence|

Is there a neat way of doing this? ...and by neat I mean without creating extra columns or tables, i.e. by comparing CURRENT_DATE to my Date field.
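One way, sketched in plain Python with the table from the question: keep, per year, the latest event whose (month, day) falls on or before today's (month, day) — a SQL version would compare `(MONTH(Date), DAY(Date))` against `CURRENT_DATE` the same way and take the max date per year:

```python
from datetime import date

# Events table from the question.
events = [
    (2021, date(2021, 8, 4),   "Ate apple"),
    (2021, date(2021, 4, 16),  "Flew plane"),
    (2020, date(2020, 10, 11), "Swam 100 miles"),
    (2020, date(2020, 3, 7),   "Did backflip"),
    (2020, date(2020, 1, 1),   "Tidied room"),
    (2019, date(2019, 9, 30),  "Found 10 pence"),
    (2018, date(2018, 2, 22),  "Lost 10 pence"),
]

today = (5, 6)  # 6th May, as in the example; normally derive from date.today()
latest = {}
for year, d, event in events:
    if (d.month, d.day) <= today:            # on or before "this day" in that year
        if year not in latest or d > latest[year][0]:
            latest[year] = (d, event)

for year in sorted(latest, reverse=True):
    print(year, *latest[year])
```

Note 2019 drops out, matching the expected output: its only event (30th September) falls after 6th May.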
ben_al (1 rep)
May 6, 2022, 12:01 PM • Last activity: May 6, 2022, 03:00 PM
1 vote
1 answer
2569 views
How to call python file in repo in databricks from data factory outside DBFS?
In Azure Databricks I have a repo cloned which contains Python files, not notebooks. In Azure Data Factory I want to configure a step to run a Databricks Python file. However, when I enter /Repos/..../myfile.py (which works for Databricks notebooks) it gives me the error "DBFS URI must starts with 'dbfs:'". How can I reference a Python file from a repo which is not in DBFS? NOTE: I see a duplicate question here, but the answer was just to wrap it in a Databricks notebook - an OK workaround, but when I do it I get "No module named 'my_python_file'": https://stackoverflow.com/questions/70096408/how-to-create-a-databricks-job-using-a-python-file-outside-of-dbfs
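On the "No module named" error from the notebook-wrapper workaround: when the .py file lives in a repo rather than DBFS, the repo directory usually has to be on `sys.path` before the import resolves. A sketch for the wrapper notebook (the repo path below is hypothetical):

```python
import sys

# Make modules in the cloned repo importable from the wrapper notebook.
repo_root = "/Workspace/Repos/me/my-repo"   # hypothetical path
if repo_root not in sys.path:
    sys.path.append(repo_root)
# import my_python_file   # should now resolve
```

The append is idempotent thanks to the membership check, so re-running the notebook cell doesn't pile up duplicate entries.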
Brendan Hill (301 rep)
Dec 1, 2021, 08:14 AM • Last activity: Jan 7, 2022, 07:51 AM
2 votes
0 answers
696 views
Troubleshoting slow running queries/jobs in Azure Databricks
I have an Azure Databricks workspace with a cluster configured to run the Standard 6.4 runtime (Apache Spark 2.4.5, Scala 2.11). The cluster uses a shared metastore (Azure MySQL). I'm trying to figure out a way to troubleshoot sporadically slow execution of jobs/queries - I have a test SELECT query which normally runs within 2-3 minutes, but a couple of times a day it takes 15 minutes. What would be the best way to troubleshoot this?
Mike (747 rep)
Aug 20, 2021, 01:54 PM • Last activity: Aug 23, 2021, 02:28 PM
Showing page 1 of 9 total questions