
Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

1 vote
1 answer
148 views
What does column family-oriented datastore mean in practice
I was reading this article about HBase. Within it, the storage format of HBase is described: *HBase is referred to as a column family-oriented data store. It's also row-oriented: each row is indexed by a key that you can use for lookup.* So what does column family-oriented mean? Is it the same as column-oriented? In that case, how can it be both column family-oriented and row-oriented? Row-oriented storage looks like this:
001:10,Smith,Joe,40000;
002:12,Jones,Mary,50000;
003:11,Johnson,Cathy,44000;
004:22,Jones,Bob,55000;
Column-oriented storage looks like this:
10:001,12:002,11:003,22:004;
Smith:001,Jones:002,Johnson:003,Jones:004;
Joe:001,Mary:002,Cathy:003,Bob:004;
40000:001,50000:002,44000:003,55000:002;
So how is data actually stored in HBase? Does anyone have an example?
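A hedged sketch of how those sample rows might land on disk in HBase, assuming a table with two column families ("personal" and "payroll" are invented names for this example): each family gets its own store files, and within a family every cell is written as a full key/value entry sorted by row key. The layout is therefore row-ordered inside each family, while the families themselves are physically separated.
StoreFile for column family "personal" (sorted by row key; each cell also carries a timestamp, omitted here):
001  personal:empid   10
001  personal:name    Smith,Joe
002  personal:empid   12
002  personal:name    Jones,Mary
StoreFile for column family "payroll":
001  payroll:salary   40000
002  payroll:salary   50000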
bsky (111 rep)
Dec 6, 2017, 11:29 PM • Last activity: Jul 18, 2025, 01:03 PM
5 votes
3 answers
187 views
Different Disk Drive Same Backend Storage
I have a database running on a VM which is getting hammered during a large load; in particular I can see WRITELOG waits happening. My initial thought is to split the files out onto their own drives, but the backend storage is the same as where the other DB files are sitting: basically a SAN presented as a Cluster Shared Volume to a whole host of virtual machines. Would there be a performance advantage in doing this? Some memory in the depths of my brain is telling me something about a higher number of IO streams potentially being better? **Update:** I have now separated out the files and correctly sized the transaction log. I have been collecting information from sys.dm_io_virtual_file_stats and can see that I now have extremely high read IO stalls but with a low latency of 13ms. I also collected some memory information: PLE was in the thousands on average, which I would expect on a 32GB system, apart from one 30-minute period where it drops right down to 30 before climbing sharply again; at that time lazy writes/sec also increases to 50 before falling back to 0. Could this period be the cause of the large number of read stalls I am seeing? I would have expected such high read stalls to come with high latency too.
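For anyone measuring the same thing, a minimal sketch of the per-file latency check that pairs with those stall counters (only standard DMV columns are used; the counters are cumulative since startup, so sample twice and diff for an interval):
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_latency_ms,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id = vfs.file_id;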
Tom (1569 rep)
Aug 25, 2015, 08:46 AM • Last activity: Jun 29, 2025, 12:02 AM
0 votes
1 answer
191 views
Is it possible to allocate data free to a certain table?
After deleting a large dataset (~50GB) and optimizing tables, some tables have data free (~50GB) while other tables have only a few KB of data free. As I understand it, data free is reused and keeps ibdata from growing. The situation is: my commonly used tables are growing really fast, but unfortunately the data free is allocated to tables that are not commonly used. The disk space of ibdata keeps growing and the data free is not being reused. The question is: can the data free be allocated to the commonly used tables, so the (~50GB) of space is reused, since storage is costly on cloud?
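One thing worth checking first (a hedged sketch; the schema name is a placeholder): where the free space actually sits. If innodb_file_per_table is ON, free space inside one table's .ibd file can only be reused by that same table, which would explain the behaviour described.
SELECT table_name,
       ROUND(data_length / 1048576) AS data_mb,
       ROUND(data_free   / 1048576) AS data_free_mb
FROM information_schema.tables
WHERE table_schema = 'mydb'   -- placeholder schema name
ORDER BY data_free DESC;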
Ben (143 rep)
Mar 15, 2018, 06:33 PM • Last activity: Jun 19, 2025, 01:04 PM
1 vote
1 answer
213 views
Shared log archive across multiple DB2 HADR instances
**Background** We are looking into building a multiple standby HADR cluster. The auxiliary standby is in a geographically separate location. From [Developer works](https://www.ibm.com/developerworks/community/blogs/DB2LUWAvailability/entry/log_archiving_in_an_hadr_environment?lang=en):
>In a multiple standby system, the archived log files can be scattered among all databases' (primary and standbys) archive devices. A shared archive is preferred because all files are stored in a single location.
and [Developer works](https://www.ibm.com/developerworks/community/wikis/home/wiki/DB2HADR/page/HADR%20config?lang=en&section=LOGARCHMETH1_and_LOGARCHMETH2___Log_Archive_Device):
>Shared archive for databases at the same site is recommended. For remote sites, you will need to make a decision based on the network speed, reliability, and ease of management.
**My Question** What method/solution can be used to create a shared archive? We are running on x86 SLES 11.4 with an HDS SAN. Primary and standby are in the same datacentre, but the auxiliary standby is 30km away. Would a simple NFS share work, with replication to the auxiliary site? Thank you
DB2_Philip (11 rep)
Feb 22, 2017, 12:16 PM • Last activity: Jun 18, 2025, 09:00 PM
2 votes
1 answer
449 views
SQLite Internals - Records
Hey, I'm trying to wrap my head around SQLite data storage, specifically how it stores records. I've found a book, [The Definitive Guide to SQLite](https://link.springer.com/book/10.1007/978-1-4302-3226-1), where the author explains the internal record format (Figure 9-5, page 351). Given table:
sqlite> SELECT * FROM episodes ORDER BY id LIMIT 1;
id   season  name
---  ------  --------------------
0    1       Good News Bad News
Its internal record format is:
| 04 | 01 | 01 | 49 |   | 00 | 01 | Good News Bad News |
> "The header is 4 bytes long. The header size reflects this and itself is encoded as a single byte. The first type, corresponding to the id field, is a 1-byte signed integer. The second type, corresponding to the season field, is as well. The name type entry is an odd number, meaning it is a text value. Its size is therefore given by (49-13)/2=18 bytes." Specifically I'm curious about TEXT attribute, in the example above we have a string of length 18 characters. And the rule for TEXT in SQLite is as follows:
Type Value     Meaning   Length of Data
----------     -------   --------------------
N>13 and odd   TEXT      (N-13)/2
What happens, though, when the string is longer? It'll exceed the range of that one byte.
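For context (from the SQLite file-format documentation): each serial type in the record header is stored as a varint of 1 to 9 bytes, not a fixed single byte, so a longer string simply gets a longer header entry. A worked example of the arithmetic, runnable in the sqlite3 shell:
-- Serial type for TEXT is N = 2*length + 13
SELECT 2 * 18  + 13 AS short_text;  -- 49: below 128, fits in a 1-byte varint
SELECT 2 * 200 + 13 AS long_text;   -- 413: stored as a 2-byte varint in the header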
Matis (121 rep)
Mar 5, 2020, 04:13 PM • Last activity: Jun 6, 2025, 05:08 PM
0 votes
2 answers
297 views
How to install a MongoDB database as a service in ESXi?
I want to know whether it is possible to install MongoDB directly on ESXi without a guest OS. Please tell me how to set up a data-storage-as-a-service environment inside ESXi.
Sayed Uz Zaman (101 rep)
Apr 26, 2017, 06:54 PM • Last activity: May 13, 2025, 01:09 AM
1 vote
2 answers
3050 views
Error 823 and no connection
One of our servers tells us it has Error 823 issues:
> TITLE: Connect to Server
> Cannot connect to ServerName.
> ADDITIONAL INFORMATION: Warning: Fatal error 823 occurred at Mar 1 2016 10:45AM. Note the error and time, and contact your system administrator. (Microsoft SQL Server, Error: 21)
The Windows log is flooded with the same message. No connection to the SQL Server is possible, so DBCC CHECKDB cannot be executed. I ran SQLIOSIM and it raised a couple of errors saying requests had been outstanding for more than n seconds on a specific drive. A similar issue happened last night on another system, and it looks like it was worked around by a failover. I don't know where to look next, since I can't even connect to the SQL Server.
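If any connection at all can be made (for example via the dedicated admin connection, admin:ServerName), a hedged first stop is the suspect-pages table, which the engine populates whenever an 823/824 error touches a page:
SELECT database_id, file_id, page_id, event_type, error_count, last_update_date
FROM msdb.dbo.suspect_pages;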
Magier (4827 rep)
Mar 1, 2016, 09:55 AM • Last activity: Mar 26, 2025, 08:57 PM
1 vote
1 answer
604 views
Which Oracle datafiles grow in size?
I have downloaded and started up Oracle's pre-built OTN Developer Day VM. The VM is running on an SSD. I want to move datafiles that can grow to a separate mount point (set up as a VMDK on an HDD; more details here). I am looking at moving the following to the HDD mount point: 1. USERS tablespace 2. Redo log files Should datafiles like temp or undo also be moved if I want to minimize the growth of the VM on the SSD?
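A sketch of a first check, using the standard dictionary views, to see which files are allowed to autoextend and how large they may grow (temp files are listed separately in dba_temp_files):
SELECT tablespace_name, file_name,
       ROUND(bytes / 1048576)    AS size_mb,
       autoextensible,
       ROUND(maxbytes / 1048576) AS max_mb
FROM dba_data_files
ORDER BY bytes DESC;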
unubar (131 rep)
May 1, 2019, 04:05 AM • Last activity: Feb 26, 2025, 02:01 PM
0 votes
2 answers
1053 views
How do you change the storage parameters of an already created table (SQL Developer)?
**Summarize the problem** I have been given 6 tables that have all been created already, and I was told to change the storage parameters of those tables. The issue I run into is that I can only change pctfree, pctused, initrans and maxtrans, and not the others (initial, next, pctincrease and maxextents). **Provide details and any research** I have done a lot of research, yet some approaches do not work with SQL Developer at all, while the others do work but, as stated, only for those 4 storage parameters. **When appropriate, describe what you've tried**
alter table CUSTOMERS pctused 20 pctfree 80;
This works perfectly for those two, but I am unable to add the others. From what I found online, this was in fact the only thing that worked for me. I appreciate all input!
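For what it's worth, next, pctincrease and maxextents are set through the STORAGE clause rather than as bare keywords, and initial cannot be changed for an existing segment at all; a hedged sketch (note that in a locally managed tablespace, the modern default, these values are accepted but largely ignored):
ALTER TABLE customers
  PCTFREE 80
  PCTUSED 20
  STORAGE (NEXT 512K PCTINCREASE 0 MAXEXTENTS UNLIMITED);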
XileVoid (107 rep)
Sep 2, 2020, 06:53 AM • Last activity: Feb 26, 2025, 06:23 AM
4 votes
1 answer
488 views
SQL Server: reduce unused space in a database of mostly heap tables
I'm working with a 2.3TB primary data file and currently have about 1TB of unused space. Recently, I performed row-level compression on the largest table in the database, which reduced the table's size from 0.9TB to 0.4TB. However, after this compression the size of the file grew, and while the space used by the table decreased, the overall unused space in the file did not shrink accordingly. My issue is how to reclaim this unused space, considering this is a reporting database with no modelling, meaning there are no primary keys or clustered indexes. My initial test was to run DBCC SHRINKFILE(<file_name>, TRUNCATEONLY), but no unallocated space was found at the end of the file. The next thing I was going to try is to go back to the massive table and create a clustered index, with the hope of moving the unallocated space from the compression operation to the end of the file, but there is no unallocated space on that table; in fact, there is only about 7GB of unallocated space across the tables. This is what I found after checking the space usage of the file:
File Size (GB): 2,287
Space Used (GB): 1,311
Unallocated Space (GB): 976
Here are the results from running exec sp_spaceused:
Database Size: 2,372,346.56 MB
Unallocated Space: 999,150.95 MB
Reserved: 1,375,788,088 KB
Data: 1,162,793,432 KB
Index Size: 204,355,584 KB
Unused: 8,639,072 KB
I am not a database administrator, so I have no idea what I could do. Does anyone have any suggestions?
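A hedged sketch of one route (table and file names are hypothetical): on SQL Server 2008 and later, ALTER TABLE ... REBUILD rewrites a heap into compact pages without requiring a clustered index, after which a stepped shrink can actually release the space:
ALTER TABLE dbo.BigHeapTable REBUILD WITH (DATA_COMPRESSION = ROW);
DBCC SHRINKFILE (N'MyDataFile', 1500000);  -- target size in MB; shrink in steps, not in one pass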
c_tames1998 (43 rep)
Jan 27, 2025, 10:11 AM • Last activity: Feb 1, 2025, 02:05 AM
1 vote
1 answer
2875 views
Azure Virtual Machine - local temp storage (D: drive) - how many IOPS can it handle?
Virtual machine size DS3 (under older generation sizes). The Azure Portal (when deploying a VM) shows that DS3 supports up to 16 data disks / up to 12800 IOPS - that's fine. But what I am interested in is its local/temp storage, the 28 GB D: drive. Documentation shows that this local/temp drive is SSD, but what I can't seem to find is information on how many IOPS this drive can handle. More specifically, if my TempDB requires up to 1900 IOPS, can the D: drive on a DS3 VM handle that requirement? From this source https://learn.microsoft.com/en-us/azure/virtual-machines/sizes-previous-gen I can guess that D: is 3200 IOPS, but I'm not sure I understand it correctly...
Aleksey Vitsko (6195 rep)
Nov 13, 2020, 01:36 PM • Last activity: Jan 21, 2025, 09:00 AM
0 votes
1 answer
194 views
Is there any practice to align file system block size with database block size?
This topic has already been discussed here: https://dba.stackexchange.com/questions/15510/understanding-block-sizes But I have a few more things to add for my use case. Generally, most database systems use a default block size of 8 KB, though some allow it to be modified. On the other hand, modern operating systems often use a 4 KB block size for file systems. This discrepancy can result in multiple physical I/O requests to fill a single database page. A smaller file system block size benefits random reads, such as index lookups, while larger block sizes are advantageous for sequential scans and heap fetches. Considering these points, I have a few questions: 1. Is there a common practice to align the database block size with the file system block size for OLTP? 2. In a clustered system (e.g., SQL Server Availability Groups or PostgreSQL streaming replication) with a primary and one or more secondaries, is it acceptable to have different file system block sizes, or is this something that should always be avoided? 3. For analytical databases or columnar tables, is it beneficial to use a larger block size?
goodfella (595 rep)
Dec 11, 2024, 06:30 AM • Last activity: Dec 11, 2024, 06:10 PM
7 votes
2 answers
35003 views
Choosing the right storage block size for sql server
There are many articles on what storage block size should be used for SQL Server, e.g. Disk Partition Alignment Best Practices for SQL Server. The right block size should improve the performance of a SQL Server database. I'm looking for recommendations and methods to identify which storage block size is appropriate for a database. Is there a guide on how to identify an appropriate block size?
r0tt (1078 rep)
Jun 28, 2016, 12:41 PM • Last activity: Oct 8, 2024, 08:19 PM
1 vote
1 answer
102 views
How to create the smallest compressed backup of SQL Server databases intended for deletion?
I need to minimize SQL Server storage in preparation to migrate to the cloud. There are about 10 databases that are probably 99% junk / slated for deletion. Maybe 100%. Reporting databases. What's the best way to do some kind of compressed "backup" just for 6 months? It's about 500 GB total... would the LDF and MDF files be enough? (That wouldn't save on storage, but I could put them somewhere cheap.) It's something where it's mostly useless, but occasionally some fringe case might require bringing a database back online. Not likely but possible. I'm thinking there's probably a sufficient backup option. I don't need point-in-time or anything fancy. Just "here are the 10 DBs at this date (say today) in case of emergency". These are largely self-contained reporting databases - not complicated applications. I understand corruption might be possible somehow, but I'm willing to risk it. This is an embarrassingly old version of SQL Server as well, 2014, if that is relevant.
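A compressed, copy-only full backup is probably the simplest fit and works on SQL Server 2014; a sketch with hypothetical names and paths (CHECKSUM guards against silently backing up corrupt pages, and the backup restores to 2014 or any newer version without needing the MDF/LDF files):
BACKUP DATABASE ReportDb01
TO DISK = N'E:\Archive\ReportDb01_final.bak'
WITH COMPRESSION, COPY_ONLY, CHECKSUM, STATS = 10;
-- Optionally confirm the backup file is readable:
RESTORE VERIFYONLY FROM DISK = N'E:\Archive\ReportDb01_final.bak';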
user45867 (1739 rep)
Oct 5, 2024, 09:55 PM • Last activity: Oct 7, 2024, 10:38 AM
3 votes
1 answer
227 views
Should tiny dimension tables be considered for row or page compression on servers with ample CPU room?
An [old Microsoft paper](https://learn.microsoft.com/en-us/previous-versions/sql/sql-server-2008/dd894051(v=sql.100)?redirectedfrom=MSDN) says to consider using `ROW` compression by default if you have lots of CPU room (emphasis mine). > If row compression results in space savings and the system can...
An [old Microsoft paper](https://learn.microsoft.com/en-us/previous-versions/sql/sql-server-2008/dd894051(v=sql.100)?redirectedfrom=MSDN) says to consider using ROW compression by default if you have lots of CPU room (emphasis mine). > If row compression results in space savings and the system can accommodate a 10 percent increase in CPU usage, **all data should be row-compressed** Today, we have even more CPU space than we did back when the paper was written. [Wise men](https://youtu.be/tHfeCstrDAw?t=514) have said that we should consider using PAGE compression everywhere unless we have a compelling reason not to. This is all good advice and I often see sp_estimate_data_compression_savings agree. However, what should be done with tiny tables that are frequently accessed? For example, I have some extremely small dimension tables. 100 rows at most and very few columns. Because they are so small, the space-saving benefit from any compression is minimal. What is considered the best practice for applying ROW or PAGE compression to such tiny tables on boxes that have a huge amount of free CPU room? For the purposes of this question, ignore columnstore. We are only talking about old-school rowstore indexes on disk-based tables.
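A sketch of the usual two steps on such a table (the table name is hypothetical): estimate first, then rebuild with the chosen setting; on a 100-row dimension the rebuild is instantaneous either way.
EXEC sp_estimate_data_compression_savings
     @schema_name      = N'dbo',
     @object_name      = N'DimStatus',   -- hypothetical tiny dimension table
     @index_id         = NULL,
     @partition_number = NULL,
     @data_compression = N'ROW';
ALTER TABLE dbo.DimStatus REBUILD WITH (DATA_COMPRESSION = ROW);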
J. Mini (1237 rep)
Sep 13, 2024, 04:23 PM • Last activity: Sep 15, 2024, 10:07 PM
1 vote
3 answers
1306 views
SQL Server Disk throughput slower than diskspd
This is a bit of a broad question, but I am trying to understand a storage performance behavior with two of our servers. I was following https://www.brentozar.com/archive/2015/09/getting-started-with-diskspd/ On one server I ran diskspd with the below parameters on the same disk as the DB:
diskspd.exe -b2M -d60 -o32 -h -L -t56 -W -w0 -c10G G:\MP13\Data\test.dat
and got around 1400MB/s. I was also able to get comparable throughput using a T-SQL query as below, calculating the throughput from the number of reads and the elapsed time. I got this from Glenn Berry's PluralSight course "Improving Storage Subsystem Performance":
set statistics io on
set statistics time on
checkpoint
dbcc dropcleanbuffers
select count(*) from table1 with(index(1))
On the other server, though, I can get the high throughput numbers from the diskspd tool, but using the SQL Server method I cannot: the SQL numbers are close to what I get if I run diskspd with a single thread, even though the plan is running in parallel. So I was wondering what things I can check to see why SQL Server's IO is slow, or why SQL Server is not able to push more IOs through.
DMDM (1750 rep)
Oct 26, 2017, 05:39 AM • Last activity: Jul 10, 2024, 01:22 PM
8 votes
3 answers
21761 views
PostgreSQL data files have size more than data itself
We have a system that does some data archiving to a PostgreSQL DB. We found out that the PC storage was full due to the DB archiving. The problem is that I checked the data files residing in /var/lib/pgsql/data/base/ and they were about 70 GB in total, while when I dumped all the databases using pg_dump the output files did not exceed 24 GB. Am I missing something here or misunderstanding something? Where is this large difference in size going?
Edit: I did pg_dump to contain schema and data with the option -c to allow drop and create.
Edit 2: I investigated the DB schema file and found that the table containing almost 23.9 GB of the 24 GB of data (about 332.4 million rows) has an index on it. There is another index on another table, but that table is empty.
Edit 3: The program stores the values of about 1500 variables periodically - each variable is recorded every 0.1 second to 1 minute or a bit more - so I think there is huge DB activity here.
Edit 4: I executed the second query here to find the size of each relation in the schema, and found the following: 28 GB for the main data table, and about 42 GB for just 3 indexes! 24, 9, 9. My purpose is to do a backup and restore frequently (every few months). Should I care about these DB indexes when doing backup and restore, or just focus on my data tables?
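For anyone comparing the same numbers, a minimal sketch using the standard catalog functions to split heap size from index size per table; a logical dump never contains index data (only the CREATE INDEX statements), which is usually most of the gap:
SELECT c.relname,
       pg_size_pretty(pg_table_size(c.oid))   AS table_size,
       pg_size_pretty(pg_indexes_size(c.oid)) AS index_size
FROM pg_class c
WHERE c.relkind = 'r'
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 10;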
3bdalla (229 rep)
Dec 14, 2015, 01:37 PM • Last activity: Jun 12, 2024, 11:04 AM
-1 votes
1 answer
852 views
How to reduce a big database data file in SQL Server without SHRINK?
I found a couple of questions about shrinking big data files, but I don't think any quite matches my situation. So here is my question: I have a big database with almost 8TB of data file and almost 160 GB of log file. The database is currently offline. A shrink, given the big size of the data file, will take a long time, so I'm not sure how to proceed in order to avoid any problem in the instance. One option I have in mind is to create a new data file, copy the tables to the new data file, and finally shrink the old data file. Best regards
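One hedged note on the shrink route: DBCC SHRINKFILE requires the database to be online, and stepping the target size down in chunks means each run can be interrupted without losing the progress already made (the file name and sizes below are hypothetical):
DBCC SHRINKFILE (N'BigDb_Data', 7500000);  -- targets are in MB; start near the current used size
DBCC SHRINKFILE (N'BigDb_Data', 7000000);  -- then keep stepping down in chunks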
SakZepelin (21 rep)
Sep 21, 2021, 02:28 PM • Last activity: May 16, 2024, 10:03 PM
0 votes
1 answer
74 views
Why does PostgreSQL not add padding for column alignment at the end of tuple?
In the example here: https://www.percona.com/blog/postgresql-column-alignment-and-padding-how-to-improve-performance-with-smarter-table-design/ (the article includes a figure showing the tuple layouts of tables t1 and t2), why doesn't PostgreSQL add 2 bytes of padding after the int2 in t2 to align the word correctly?
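A quick way to see the effect, with the caveat that the exact byte counts assume a typical 64-bit build: alignment padding is only inserted so that the *next* datum starts on its required boundary, so a trailing int2 gets no padding (beyond the whole-tuple MAXALIGN rounding).
SELECT pg_column_size(ROW(1::int2, 1::int4)) AS int2_first,  -- 32: 2 pad bytes before the int4
       pg_column_size(ROW(1::int4, 1::int2)) AS int4_first;  -- 30: no padding after the int2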
Amr Elmohamady (103 rep)
May 12, 2024, 10:19 PM • Last activity: May 13, 2024, 06:20 AM
6 votes
2 answers
3030 views
Disable TOAST compression for all columns
I am running PostgreSQL on a [compressed ZFS file system](https://bun.uptrace.dev/postgres/tuning-zfs-aws-ebs.html#disabling-toast-compression). One tip mentioned is to disable PostgreSQL's inline TOAST compression, because ZFS can compress data better. This can be done by setting column storage to EXTERNAL. I can do this column by column with:
ALTER TABLE my_table ALTER COLUMN my_column SET STORAGE EXTERNAL;
However, this might be a bit cumbersome, as every schema would need to be migrated by hand. Are there easy ways to:
- Set the default STORAGE to EXTERNAL instead of EXTENDED for all columns
- Disable TOAST compression some other way
I found the [default_toast_compression option](https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-DEFAULT-TOAST-COMPRESSION) but the documentation is unclear on whether I can disable compression with it.
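As far as I know there is no server-wide setting that makes EXTERNAL the default (default_toast_compression only picks the compression algorithm, pglz or lz4, and cannot turn compression off), but existing columns can be migrated in bulk. A hedged sketch that generates the ALTER statements from the catalog - the schema name is an assumption, and SET STORAGE only affects newly written rows:
SELECT format('ALTER TABLE %I.%I ALTER COLUMN %I SET STORAGE EXTERNAL;',
              n.nspname, c.relname, a.attname)
FROM pg_attribute a
JOIN pg_class c     ON c.oid = a.attrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public'          -- schema to migrate; adjust as needed
  AND c.relkind = 'r'               -- ordinary tables only
  AND a.attnum > 0
  AND NOT a.attisdropped
  AND a.attstorage IN ('m', 'x');   -- MAIN or EXTENDED, i.e. compressible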
Mikko Ohtamaa (331 rep)
Jul 31, 2022, 06:19 AM • Last activity: May 10, 2024, 05:07 AM