
Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

0 votes
1 answer
148 views
Db2 LUW tens of millions of rows
We have a table with tens of millions of time-series rows. We index by three foreign keys to relations; let's hypothetically say a medical observation row relates to doctor, acl, and person. The main thing is a 3-part key where acl is based on the user and the other two are based on the entity relationship. What's a good physical architecture in Db2 to make this run fast (sub-second)? It currently takes 10+ seconds to query.

* RAM usage is around 25-30%
* Read hits to the buffer pool are around -1300%
* Log hits to the buffer pool are 100% (are we doing too many transactions when we don't need to, i.e. should we change to read uncommitted?)

I wonder about having a separate table space for this with a separate buffer pool, for example. Also, would you consider partitioning?
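Both of those ideas map directly to DDL. A minimal, hedged sketch of a dedicated buffer pool, tablespace, and date-range partitioning in Db2 LUW; all names, sizes, and the date range are hypothetical:

-- Hypothetical pool and sizes; a dedicated pool keeps hot time-series
-- pages from being evicted by unrelated workloads.
CREATE BUFFERPOOL bp_obs IMMEDIATE SIZE 250000 PAGESIZE 16K;

CREATE TABLESPACE ts_obs
    PAGESIZE 16K
    MANAGED BY AUTOMATIC STORAGE
    BUFFERPOOL bp_obs;

-- Range partitioning on the observation timestamp, so time-bounded
-- queries touch only the relevant partitions.
CREATE TABLE observation (
    person_id BIGINT NOT NULL,
    doctor_id BIGINT NOT NULL,
    acl_id    BIGINT NOT NULL,
    obs_ts    TIMESTAMP NOT NULL,
    value     DOUBLE
) IN ts_obs
PARTITION BY RANGE (obs_ts)
(STARTING FROM ('2020-01-01') ENDING AT ('2020-12-31') EVERY 1 MONTH);

Whether this beats simply tuning the existing three-column index depends on whether the slow queries are time-bounded; multidimensional clustering (ORGANIZE BY DIMENSIONS) is another Db2-specific option worth benchmarking.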
JDPeckham (101 rep)
Mar 20, 2020, 01:18 PM • Last activity: Jul 16, 2025, 08:08 AM
-1 votes
1 answer
167 views
What data "MUST" be stored inside a "relational database"?
I am working on a mobile/web app like Instagram. Given that, I want to know for which parts of the app it's better to use relational databases and for which parts non-relational ones. I have chosen the MySQL and Cassandra databases, and this is my research result so far:

- Relational databases:
  - For services where we need as much consistency as possible, like payment, ordering, and authentication services.
- Non-relational databases:
  - For services where we need to summarize a lot of different data in a small table instead of a lot of columns with null values, like user services where we have a lot of personal settings.
  - For when we need horizontal scalability and want a more distributed system over different datacenters/clouds.
  - For faster read/write-heavy systems.

But I am still wondering:

1. Which database is more appropriate for uploading files (images, videos, documents)?
2. Which database is more suitable for posts/comments/likes, lists of friends, and other user-related things that are also related to other users? (***I mean a post or a comment is related to one user, but all the other users can also see it, and it may affect them.***)
best_of_man (117 rep)
Dec 25, 2022, 06:00 PM • Last activity: Jul 12, 2025, 11:05 AM
2 votes
1 answer
172 views
MSSQL database physical design: RAID or spreading filegroups across different physical drives?
I'm going to build a Microsoft SQL Server database which could end up being more than 40 TB in size. There are some quite large tables in the DB with about 10^10 to 10^11 records. Every table has a typical primary key index. Most of the tables also have composite non-clustered indexes (B-tree). In a few cases we have defined columnstore indexes. Which of the physical designs below will result in better performance¹ and DB management² over time?

**Scenario 1.** Create one (or more) filegroups per physical drive (we only have NVMe SSDs), and spread the heavily used tables and indexes across different drives.

**Scenario 2.** Bundle all drives using RAID 10 and create a single large volume. Data will be striped across all disks by the OS. In this scenario, the data file (.mdf), log file (.ldf), and tempdb are all stored on one large RAID 10 volume.

**Scenario 3.** Partition tables/indexes and store each partition on a different physical drive. I should mention that it would be close to impossible to distribute data uniformly across all partitions because of the type of data we are dealing with.

If you think none of the above are ideal solutions, I'd love to know what you would recommend. Many thanks.

¹ By performance I mean lower retrieval time and higher IOPS, for a given indexing architecture.
² By DB management I mean adding/removing/updating data while keeping up performance, redundancy, availability, etc.
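For Scenario 1, the placement is plain DDL; a hedged sketch with made-up database name, paths, and sizes:

-- Hypothetical names, paths, and sizes; one filegroup per NVMe drive.
ALTER DATABASE BigDb ADD FILEGROUP FG_Drive1;
ALTER DATABASE BigDb ADD FILEGROUP FG_Drive2;

ALTER DATABASE BigDb ADD FILE
    (NAME = 'BigDb_D1', FILENAME = 'E:\Data\BigDb_D1.ndf', SIZE = 500GB)
    TO FILEGROUP FG_Drive1;
ALTER DATABASE BigDb ADD FILE
    (NAME = 'BigDb_D2', FILENAME = 'F:\Data\BigDb_D2.ndf', SIZE = 500GB)
    TO FILEGROUP FG_Drive2;

-- A heavily used table can then be pinned to a specific drive.
CREATE TABLE dbo.BigFacts (
    Id BIGINT NOT NULL CONSTRAINT PK_BigFacts PRIMARY KEY
) ON FG_Drive1;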
Demo80 (21 rep)
Jan 19, 2019, 04:55 AM • Last activity: Jul 7, 2025, 02:03 AM
0 votes
1 answer
46 views
What happens if I insert a row in the middle of an sqlite table `WITHOUT ROWID`?
WITHOUT ROWID is an SQLite optimization. The docs state that it makes the table's primary key a clustered index. As far as I know, a clustered index means that the physical ordering of the rows follows the index, making linear scans over it effective. However, what will happen if I insert a row into the middle of such a table? Will the whole table file really need to be shifted after that?
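For what it's worth, a WITHOUT ROWID table is stored as a B-tree keyed on the primary key, so a "middle" insert modifies one leaf page (or splits it) rather than rewriting the file; a tiny sketch to experiment with:

-- An out-of-order insert touches one B-tree leaf page (possibly
-- splitting it); it does not shift the rest of the file.
CREATE TABLE kv (
    k TEXT PRIMARY KEY,
    v TEXT
) WITHOUT ROWID;

INSERT INTO kv VALUES ('a', '1'), ('z', '3');
INSERT INTO kv VALUES ('m', '2');  -- lands "between" the existing keys

SELECT k FROM kv ORDER BY k;       -- a, m, z: walked in key order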
peterh (2137 rep)
May 4, 2025, 08:14 AM • Last activity: May 4, 2025, 02:15 PM
-1 votes
2 answers
1381 views
Logical model in workbench
I have a physical data model of a database in MySQL Workbench. How can I create a logical model of this database (like in AllFusion Data Modeler (ERwin))? Sorry for my English!
Kirill (1 rep)
Dec 10, 2016, 08:09 PM • Last activity: May 9, 2024, 01:28 PM
1 vote
0 answers
38 views
Data Modeling Problem - Schema to store approval process
I need to design a schema to keep track of data changes to columns in a few tables (I do not need Change Tracking, Change Data Capture, or Temporal Tables). When data is created, modified, or deleted, and when it meets a certain condition, I will have to initiate a workflow to require approval of those changes from groups of users. Are there any known schema design patterns for this purpose?
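One known pattern is a pending-change table plus an approvals table keyed to it; a hedged sketch (all names are hypothetical, and the JSON payload shape is an assumption):

-- Hypothetical "pending change + approvals" pattern (T-SQL).
CREATE TABLE ChangeRequest (
    ChangeRequestId INT IDENTITY PRIMARY KEY,
    TableName   SYSNAME       NOT NULL,  -- which table the change targets
    RowKey      NVARCHAR(100) NOT NULL,  -- key of the affected row
    ChangeType  CHAR(1)       NOT NULL CHECK (ChangeType IN ('I','U','D')),
    Payload     NVARCHAR(MAX) NOT NULL,  -- proposed column values, e.g. JSON
    Status      VARCHAR(10)   NOT NULL DEFAULT 'PENDING',
    RequestedBy INT           NOT NULL,
    RequestedAt DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME()
);

CREATE TABLE ChangeApproval (
    ChangeRequestId INT NOT NULL REFERENCES ChangeRequest(ChangeRequestId),
    ApproverGroupId INT NOT NULL,
    Decision  VARCHAR(10) NULL,          -- NULL until the group responds
    DecidedAt DATETIME2   NULL,
    PRIMARY KEY (ChangeRequestId, ApproverGroupId)
);

The workflow flips Status once every required group has a Decision, and only then applies the payload to the real table.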
DBK (378 rep)
Oct 30, 2023, 07:33 PM
0 votes
1 answer
217 views
Design options: multi clusters, multi instances or multi availability groups
We have a Microsoft platform, and within it we have two sets of databases: one set for the platform services etc., and one set of a more custom nature, with some data that is loosely connected to other databases. We would like to keep these database sets separate; we would prefer that the platform set of databases gets more performance than the custom databases, but that is not the most important factor. High availability, stability, best practices, and little complexity/ease of use matter more. When we brainstorm this, we come up with three alternatives for the platform:

- Option A: 2 clusters, 2 availability groups. This is our current setup, obviously the best (?) but also the most expensive and most time-consuming to maintain.
- Option B: 1 big cluster, 1 default instance, 2 availability groups. Should be the easiest to maintain, but what options do we have to ensure our "preferred" AG (with the platform databases) has the highest availability?
- Option C: 1 big cluster, 2 instances, 2 availability groups. With this option we can customize all the hardware resources per instance as we want, but multi-instance, multi-AG also seems a bit complex to maintain.

We would like to hear from others if they have opinions on the options above. What would you choose if money were no object? Are any other options available?
regularroutine (1 rep)
Sep 23, 2022, 07:12 AM • Last activity: Sep 23, 2022, 07:55 PM
0 votes
1 answer
118 views
Primary File Organization in DBMS - Files of ordered records (sorted files)
"Fundamental of Database Systems", 3rd ed. by Elmasri and Navathe, page 136 says: "We can physically order the records of a file on disk based on the values of one of their fields [...]" How could this "physical ordering" possibly enforced, especially with regard to fragmentation? Most databases use...
"Fundamental of Database Systems", 3rd ed. by Elmasri and Navathe, page 136 says: "We can physically order the records of a file on disk based on the values of one of their fields [...]" How could this "physical ordering" possibly enforced, especially with regard to fragmentation? Most databases use system calls to store their data in the file system of the OS, only very few are capable of doing raw i/o.
keezar (33 rep)
Mar 3, 2022, 09:31 PM • Last activity: Mar 3, 2022, 10:28 PM
1 vote
3 answers
470 views
Clustered Primary Key that is never used vs. Non-Clustered Primary Key on Multiple Columns
I am working on a table design for Customer Totals and trying to make a decision about the primary key. I was going to go with a surrogate identity column with a clustered index, but this column would NEVER be used. The candidate primary key columns are CustomerNumber + AccountNumber, because these are the unique identifiers for each row, but these will NOT be sequentially inserted. Basically, on a daily basis a report will be run which will update each CustomerNumber + AccountNumber record with the most recent purchase total and total date. Does it make sense to remove CustomerTotalID completely and have CustomerNumber + AccountNumber be a PK with a NON-clustered index?

CREATE TABLE CustomerTotals
(
    CustomerTotalID INT IDENTITY(1,1),
    CustomerNumber  INT,
    AccountNumber   INT,
    PurchaseTotal   DECIMAL(10,2),
    TotalDate       DATE,
    CONSTRAINT [PK_CustomerTotals] PRIMARY KEY CLUSTERED (CustomerTotalID ASC)
)
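If the surrogate really is never used, the alternative worth benchmarking is promoting the natural key; a hedged sketch (a nonclustered PK leaves the table a heap, so a clustered version of the same key is also worth testing):

CREATE TABLE CustomerTotals
(
    CustomerNumber INT           NOT NULL,
    AccountNumber  INT           NOT NULL,
    PurchaseTotal  DECIMAL(10,2) NOT NULL,
    TotalDate      DATE          NOT NULL,
    CONSTRAINT PK_CustomerTotals
        PRIMARY KEY NONCLUSTERED (CustomerNumber, AccountNumber)
)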
user2966445 (141 rep)
Jul 26, 2018, 04:04 PM • Last activity: Feb 7, 2022, 10:15 AM
-3 votes
1 answer
88 views
Striping disks with MAXDOP 1
We're working with a vendor who is insisting we use MAXDOP = 1 (I know...). They are also emphasizing the importance of striping our data files across different disks. As I understand it, an I/O operation is always paired with a CPU cycle. So if we have tables that were spread over different disks, wouldn't MAXDOP = 1 make reads/writes occur sequentially rather than in parallel?
cgoll (3 rep)
Aug 13, 2021, 03:25 PM • Last activity: Aug 14, 2021, 12:14 PM
6 votes
3 answers
1218 views
Enforce uniqueness across a one-to-many-to-many bridged relationship when constraining to one-to-many
We have defined a series of _configurations_, where, driven by a RESTful API, end-users can build up new _revisions_. Some of the components of the configuration can have more than one value; a revision involves multiple tables with one-to-many relationships. Because the configuration is shipped off elsewhere, revisions are marked as _deployed_, and become immutable. Users have to create a new revision (which can be cloned from an existing one) if they want to make changes to a configuration.

_One_ *revision* per configuration can be marked as 'current'; this allows the users to switch between past revisions at will, or disable the configuration entirely by not picking any revision. The current revision is deployed; when marking a different revision as 'current' you replace the deployed config.

We already have everything in place to enforce immutability of deployed revisions; the deployed column is automatically transitioned to TRUE when you first use a revision as the current revision, and all further INSERT, UPDATE and DELETE operations concerning rows that match a deployed revision id in revision-related tables are blocked.

However, any value used for the name column in the *public name* table **must** be unique across all the 'current' revisions across all current configurations. I'm trying to figure out the best strategy to enforce this. If this were a plain one-to-many relationship from config to public names, it would be solved by a unique constraint on the name column. This is, instead, a one-to-many-to-many pattern with revision acting as the bridge table, and the current_revision_id "collapses" the one-to-many-to-many to a virtual one-to-many relationship from config to public names.

Here is a simplified set of tables that illustrates our situation:

-- Configurations
CREATE TABLE config (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    current_revision_id INT
);

-- Have multiple revisions
CREATE TABLE revision (
    id INT PRIMARY KEY,
    config_id INT NOT NULL REFERENCES config(id),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    description VARCHAR,
    foo INT NOT NULL,
    bar BOOLEAN NOT NULL,
    deployed BOOLEAN NOT NULL DEFAULT FALSE
);

-- A configuration has one _current_ revision
ALTER TABLE config
    ADD CONSTRAINT current_revision_id_fk
    FOREIGN KEY (current_revision_id)
    REFERENCES revision(id);

-- Revisions are automatically numbered in a view
CREATE VIEW numbered_revision AS (
    SELECT *,
        row_number() OVER (
            PARTITION BY config_id
            ORDER BY created_at, id
        ) AS number
    FROM revision
);

-- Configurations have multiple 'public names'
CREATE TABLE public_name (
    id INT PRIMARY KEY,
    revision_id INT NOT NULL REFERENCES revision(id),
    name VARCHAR(100),

    UNIQUE (revision_id, name)
);

The view only serves to provide revisions with gapless numbers per config (revisions are never deleted).

As an ERD diagram: [diagram of tables]

Some sample data to illustrate the setup:

INSERT INTO config (id, name) VALUES
    (17, 'config_foo'),
    (42, 'config_bar');

INSERT INTO revision (id, config_id, created_at, description, foo, bar) VALUES
    (11, 17, '2021-05-29 09:07:18', 'Foo configuration, first draft', 81, TRUE),
    (19, 17, '2021-05-29 10:42:17', 'Foo configuration, second draft', 73, TRUE),
    (23, 42, '2021-05-29 09:36:52', 'Bar configuration, first draft', 118, FALSE);

INSERT INTO public_name (id, revision_id, name) VALUES
    -- public names for foo configuration, first draft
    (83, 11, 'some.name'),
    (84, 11, 'other.name'),
    -- public names for foo configuration, second draft
    (85, 19, 'revised.name'),
    (86, 19, 'other.name'),
    (87, 19, 'third.name'),
    -- public names for bar configuration, first draft;
    -- some of the names here are the same used by foo configurations
    (88, 23, 'some.name'),
    (89, 23, 'unique.name'),
    (90, 23, 'other.name');

-- Foo configuration has a current, published revision:
UPDATE config SET current_revision_id = 19 WHERE id = 17;
UPDATE revision SET deployed = TRUE WHERE id IN (11, 19);

Here is a query showing the sample dataset:

SELECT
    c.name AS config,
    rev.number AS revision,
    rev.deployed,
    CASE WHEN c.current_revision_id = rev.id
        THEN 'ACTIVE'
        ELSE ''
    END AS status,
    string_agg(p.name, ', ' ORDER BY p.name) AS names
FROM config c
JOIN numbered_revision AS rev ON c.id = rev.config_id
JOIN public_name p ON p.revision_id = rev.id
GROUP BY c.id, rev.id, rev.number, rev.deployed
ORDER BY c.id, rev.number;

config     | revision | deployed | status | names
:--------- | -------: | :------- | :----- | :-----------------------------------
config_foo | 1        | t        |        | other.name, some.name
config_foo | 2        | t        | ACTIVE | **other.name**, revised.name, third.name
config_bar | 1        | f        |        | **other.name**, some.name, unique.name

*dbfiddle [here](https://dbfiddle.uk/?rdbms=postgres_13&fiddle=a4602704df384c1f1d5f6a170d3bd230)*

In the output table above, the second row represents a "current" revision, made public (*deployed*), and that row has been given exclusive access to the public names in the names column. The third row represents a configuration with a draft revision. Any attempt to set it as _current_ for config_bar should fail, because the name other.name is already in use by config_foo, revision 2. If, in the future, config_foo were to create a new revision that doesn't include other.name, only then could config_bar revision 1 be made current.

We do pre-validate this constraint; the API runs some checks and blocks marking a configuration as current when pre-conditions are not met. Names in the public_name table are also constrained to be unique per revision (UNIQUE (revision_id, name)). Neither of these prevents a race condition; they just reduce the rate at which race conditions happen.

I was hoping a CONSTRAINT TRIGGER on config, firing on UPDATEs of the current_revision_id column, would be sufficient to enforce this constraint:
CREATE OR REPLACE FUNCTION unique_current_names() RETURNS trigger
   LANGUAGE plpgsql AS
$$BEGIN
   IF EXISTS (
     SELECT 1
     FROM public_name p
     WHERE
       p.revision_id = NEW.current_revision_id
     AND p.name IN (
       SELECT pp.name 
       FROM config AS pc
       JOIN public_name pp ON pp.revision_id = pc.current_revision_id
        AND pc.id != OLD.id
     )
   ) THEN
      RAISE EXCEPTION 'Public name is already published';
   END IF;
 
   RETURN NEW;
END;$$;
 
DROP TRIGGER IF EXISTS unique_current_names_trig ON config;
CREATE CONSTRAINT TRIGGER unique_current_names_trig
   AFTER UPDATE OF current_revision_id ON config
   DEFERRABLE INITIALLY DEFERRED
   FOR EACH ROW EXECUTE PROCEDURE unique_current_names();
(Note that the relationship between config and public_name is, in the general case, a many-to-many connection, but for the more specific current_revision_id case it is a one-to-many connection, and you can use config.current_revision_id = public_name.revision_id to list the names directly.)

My concern is that, even though this trigger fires at the very end of a transaction, there is still the possibility of a race condition, wherein another connection also tries to make a revision current with conflicting public names. OTOH, because all updates and inserts are the results of RESTful API operations, there will never be a transaction that includes multiple operations (updates of public_name *and* setting current_revision_id). Is that enough to prevent race conditions here, or are there corner cases I missed?

Another option might be to copy the public names of the current revision into a separate "published names" table (with a trigger; delete all old names, insert all new names), with a UNIQUE constraint on the name column there. Would that work better than the constraint trigger?

Note that we can't use namespaces or other additions to the names (which are hostnames, on the public internet) to make them unique. The names must be unique entirely on their own, once deployed.

We are aware the design allows a configuration to reference a current revision_id that belongs to a different configuration. That's a possibility we explicitly guard against at the application level, but a trigger could also handle that.
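That "published names" alternative might look like this; a hedged, untested sketch, where the primary key on name turns the race into an ordinary unique-constraint violation resolved by the database:

-- Hypothetical sketch of the "copy names into a published table" option.
CREATE TABLE published_name (
    config_id INT NOT NULL REFERENCES config(id),
    name      VARCHAR(100) PRIMARY KEY
);

CREATE OR REPLACE FUNCTION sync_published_names() RETURNS trigger
   LANGUAGE plpgsql AS
$$BEGIN
   DELETE FROM published_name WHERE config_id = NEW.id;
   INSERT INTO published_name (config_id, name)
   SELECT NEW.id, p.name
   FROM public_name p
   WHERE p.revision_id = NEW.current_revision_id;  -- duplicates raise here
   RETURN NEW;
END;$$;

CREATE TRIGGER sync_published_names_trig
   AFTER UPDATE OF current_revision_id ON config
   FOR EACH ROW EXECUTE PROCEDURE sync_published_names();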
Martijn Pieters (175 rep)
May 31, 2021, 09:39 PM • Last activity: Jun 3, 2021, 05:50 AM
0 votes
1 answer
237 views
B2B application: users has its own users
The B2B scenario I am trying to model is this. The following roles exist in the system:

1. Super Admin
2. Super Admin Staff
3. Owner (client)
4. Admin (the owner, i.e. the client, creates admins to define sales counters)
5. Sales Counter

That means there are users who will create their own sub-users, e.g.:

**[FLAG-A]** The role "SUPER ADMIN" creates a role "OWNER", and the "OWNER" creates its own sub-user roles, i.e. "ADMIN" and "SALES COUNTER". Each role obviously has its own credentials to log in to the system.

I tried to design the database as follows: [ERD image]

In this diagram I am assuming that a main user creates a sub-user, which in my view is a many-to-many relationship, so I have to add a table between them, "USER_PERSON". The problem is how to record "WHO CREATED the USER", i.e. which user created which sub-user, as described in **[FLAG-A]**. Please check the diagram: [ERD image]
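One hedged way to record the creator is a self-referencing column on the user table; a sketch with hypothetical names that assumes an existing role table (it suffices if each user has exactly one creator, which would also make the bridge table unnecessary):

-- Each user records who created it; NULL for the bootstrap Super Admin.
CREATE TABLE app_user (
    user_id    INT PRIMARY KEY,
    role_id    INT NOT NULL REFERENCES role (role_id),
    login_name VARCHAR(100) NOT NULL UNIQUE,
    created_by INT NULL REFERENCES app_user (user_id)
);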
Murteza (5 rep)
Feb 23, 2021, 09:22 PM • Last activity: Feb 24, 2021, 09:23 AM
0 votes
3 answers
1218 views
char or smallint for a column with numeric codes?
I have columns in my application whose values always have 4 digits, e.g. '0100' or '1230'. One example is the Merchant Category Code. For the time being, I have set the type of these columns to char(4). I have many columns of this type, and one of them is also included in a nonclustered UNIQUE index. I would like to establish one approach for this type of column. Is it better to use char(4), or smallint, with padding in the application by adding ' ' or '0' to the beginning? The table in which I have these columns is partitioned and will contain about 300 million records. UPDATE will probably not be executed on columns of this type. They will also not appear in WHERE clauses (maybe sometimes in *ad hoc* reports). Which data type should I choose?

> It's like a zip code, it can be numeric but you must store leading zeros

Exactly. I am wondering which approach is better in terms of performance. I also have to take into consideration that if I store it as smallint, the application code has to parse it to a string and add leading zeros.
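If the smallint route is tested, the padding can also be done in the database rather than the application; a hedged sketch (assumes SQL Server, given the nonclustered-index and partitioning wording; names are hypothetical):

-- SMALLINT stores the code in 2 bytes versus 4 for char(4).
CREATE TABLE dbo.Txn (
    TxnId INT IDENTITY PRIMARY KEY,
    Mcc   SMALLINT NOT NULL CHECK (Mcc BETWEEN 0 AND 9999)
);

-- Zero-padding on the way out; RIGHT() is cheaper than FORMAT().
SELECT RIGHT('0000' + CAST(Mcc AS VARCHAR(4)), 4) AS MccText
FROM dbo.Txn;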
axdna (119 rep)
Oct 25, 2020, 04:53 PM • Last activity: Oct 31, 2020, 03:05 PM
5 votes
2 answers
18804 views
Separate stored procedures for inserts and updates?
I have a table in Microsoft SQL Server. Sometimes I need to update, and sometimes I need to insert. I could write 2 stored procedures: InsertNewPerson and UpdatePersonById. But I was thinking of writing 1 stored procedure instead (SetPerson), which would do both (if there is an ID, it is an update operation; else, an insert). Should I create one stored procedure (only one to maintain), or should I create two different stored procedures?
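A hedged sketch of the combined procedure (table and column names are hypothetical), using update-then-insert rather than MERGE:

CREATE PROCEDURE dbo.SetPerson
    @Id   INT = NULL,
    @Name NVARCHAR(100)
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @rows INT = 0;

    IF @Id IS NOT NULL
    BEGIN
        UPDATE dbo.Person SET Name = @Name WHERE Id = @Id;
        SET @rows = @@ROWCOUNT;  -- 0 if the ID did not exist
    END

    IF @rows = 0
        INSERT INTO dbo.Person (Name) VALUES (@Name);
END

Under concurrency this still needs guarding (e.g. an applock or retry logic), which is one argument for keeping the two operations as separate procedures.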
Royi Namir (1233 rep)
Dec 4, 2012, 08:03 AM • Last activity: May 15, 2020, 02:01 AM
14 votes
3 answers
46516 views
What is the fastest way to fetch the last row from a table?
I have a PostgreSQL table, Prices, with the columns:

- price (Decimal)
- product_id (Int)

There are also created_at and updated_at columns. Prices get updated regularly and I keep old prices in the table. For a given product, the last price in the table is the current price. What is the most efficient way to get the last price for a specific product:

- Index product_id and query for the last record
- Add a third column active (Boolean) to mark the latest price and create a composite index (product_id and active)
- Or something else?
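A hedged sketch of the first option, which in PostgreSQL typically resolves to a single index descent when the composite index matches the query:

-- Composite index so the latest row per product is found without a sort.
CREATE INDEX prices_product_created_idx
    ON prices (product_id, created_at DESC);

-- Latest price for one product (42 is a made-up id):
SELECT price
FROM   prices
WHERE  product_id = 42
ORDER  BY created_at DESC
LIMIT  1;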
Mike81 (315 rep)
Feb 16, 2014, 07:49 AM • Last activity: May 8, 2020, 02:58 AM
5 votes
1 answer
1181 views
PostgreSQL Index Only Scan can NOT return ctid
I have this table in PostgreSQL 11:

CREATE TABLE A (id bigint PRIMARY KEY, text text)

Now I want to fetch the ctid of rows meeting some condition like id = 123. However, even when using pg_hint_plan to hint PostgreSQL to use an *Index Only Scan*, it always gives me an *Index Scan*:
/*+ IndexOnlyScan(a) */ explain analyze select ctid from a where id = 823977776533426178;
                                                QUERY PLAN                                                
----------------------------------------------------------------------------------------------------------
 Index Scan using a_pkey on a  (cost=0.14..8.16 rows=1 width=6) (actual time=0.038..0.039 rows=1 loops=1)
   Index Cond: (id = '823977776533426178'::bigint)
 Planning Time: 0.122 ms
 Execution Time: 0.095 ms
(4 rows)
My understanding is that ctid is the physical row id (block_no: record_offset) of each row, and it must be included in any btree index, since the index needs this information to fetch the row from the heap file. Then why can't an index-only scan return the ctid directly? Is it just that PostgreSQL does not implement it that way?
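As a point of comparison, restricting the target list to the indexed column does get an index-only scan, while asking for ctid forces the heap visit again (a quick check; plan output abbreviated):

-- Selecting just the key column qualifies for an Index Only Scan:
EXPLAIN SELECT id FROM a WHERE id = 823977776533426178;
--  Index Only Scan using a_pkey on a ...

-- Adding ctid to the target list brings back the plain Index Scan:
EXPLAIN SELECT ctid, id FROM a WHERE id = 823977776533426178;
--  Index Scan using a_pkey on a ...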
Qiushi Bai (65 rep)
Apr 14, 2019, 05:32 AM • Last activity: May 2, 2020, 11:47 PM
7 votes
1 answer
7224 views
One big table vs few smaller
**Please note that the example below is just an example; my scenario is way more complex, and the way I'm trying to model it really makes sense.** Let's say I'm creating a table for audit events in one of my apps, so all of the "event_created", "user_created", etc. kind of thing. The table contains several columns, some of which are foreign keys to other tables. Over time, this single table can grow to several million records. From a performance perspective, is it faster and more performant to use a single table for all of them, or to use a separate table for each kind of event and operate on separate tables? Or does it not make much difference? It might sound silly to create a separate table for each kind of event, but you need to trust me that in my real-world scenario it really makes sense.
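If you do benchmark the single-table option, a middle ground is one table partitioned by a discriminator column, so each event kind still behaves like its own smaller table; a hedged, PostgreSQL-flavoured sketch (the DBMS and all names are assumptions):

-- One table for all audit events, list-partitioned by event type.
CREATE TABLE audit_event (
    id         BIGSERIAL,
    event_type TEXT        NOT NULL,
    user_id    BIGINT,
    payload    JSONB,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    PRIMARY KEY (event_type, id)   -- partition key must be in the PK
) PARTITION BY LIST (event_type);

CREATE TABLE audit_event_user_created
    PARTITION OF audit_event FOR VALUES IN ('user_created');
CREATE TABLE audit_event_event_created
    PARTITION OF audit_event FOR VALUES IN ('event_created');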
mbajur (193 rep)
Oct 27, 2018, 08:13 AM • Last activity: May 2, 2020, 09:38 PM
27 votes
1 answers
7846 views
Composite Primary Key in multi-tenant SQL Server database
I'm building a multi-tenant app (single database, single schema) using ASP Web API, Entity Framework, and SQL Server/Azure database. This app will be used by 1000-5000 customers. All the tables will have a TenantId (Guid / UNIQUEIDENTIFIER) field. Right now, I use a single-field primary key, which is Id (Guid). But by using just the Id field, I have to check whether the data supplied by the user is from/for the right tenant. For example, I have a SalesOrder table which has a CustomerId field. Every time users post/update a sales order, I have to check whether the CustomerId is from the same tenant. It gets worse because each tenant might have several outlets; then I have to check TenantId and OutletId. It's really a maintenance nightmare and bad for performance. I'm thinking of adding TenantId to the primary key along with Id, and possibly adding OutletId too. So the primary key in the SalesOrder table would be: Id, TenantId, and OutletId. What is the downside of this approach? Would performance suffer badly with a composite key? Does the composite key order matter? Are there better solutions for my problem?
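For what it's worth, the main payoff of the composite key is that composite foreign keys can then enforce the tenant check declaratively; a hedged sketch:

-- TenantId leads the key; the composite FK turns a cross-tenant
-- reference into a constraint violation instead of an app-level check.
CREATE TABLE Customer (
    TenantId UNIQUEIDENTIFIER NOT NULL,
    Id       UNIQUEIDENTIFIER NOT NULL,
    CONSTRAINT PK_Customer PRIMARY KEY (TenantId, Id)
);

CREATE TABLE SalesOrder (
    TenantId   UNIQUEIDENTIFIER NOT NULL,
    Id         UNIQUEIDENTIFIER NOT NULL,
    CustomerId UNIQUEIDENTIFIER NOT NULL,
    CONSTRAINT PK_SalesOrder PRIMARY KEY (TenantId, Id),
    CONSTRAINT FK_SalesOrder_Customer
        FOREIGN KEY (TenantId, CustomerId)
        REFERENCES Customer (TenantId, Id)
);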
Reynaldi (371 rep)
Apr 17, 2015, 08:12 AM • Last activity: Dec 12, 2019, 07:14 PM
8 votes
2 answers
1705 views
Fastest way to split/store a long string for charindex function
I have a 1 TB string of digits. Given a 12-character sequence of digits, I want to get the start position of this sequence in the original string (the charindex function). I have tested this with a 1 GB string and a 9-digit substring using SQL Server, storing the string as a varchar(max). Charindex takes 10 secs. Breaking up the 1 GB string into 900-byte overlapping chunks and creating an indexed table (StartPositionOfChunk, Chunkofstring), with Chunkofstring in a binary collation, takes under 1 sec. The latter method, for a 10 GB string and a 10-digit substring, raises the charindex time to 1.5 min. I would like to find a faster storage method.

### Example

String of digits: 0123456789; substring to search: 345. charindex('345','0123456789') gives 4.

*Method 1*: I can now store this in a SQL Server table strtable consisting of one column colstr and perform:
select charindex('345',colstr) from strtable
*Method 2*: or I can make up a table **strtable2 (pos, colstr1)** by splitting up the original string: **1;012 | 2;123 | 3;234** and so on, and then we can have the query
select pos from strtable2 where colstr1='345'
*Method 3*: I can make up a table **strtable2 (pos2,colstr2)** by splitting up the original string into **larger chunks 1;01234 | 4;34567 | 7;6789** and then
select pos2+charindex('345',colstr2) from strtable2 where colstr2 like '%345%'
The first method is the slowest. The second method blows up the database storage size! *Method 3*: setting the colstr2 length to 900 bytes in a binary collation and creating an index on this column takes 1 sec for a 1 GB string and a 9-digit substring search. For a 10 GB string and a 10-digit substring it takes 90 secs. **Any other idea how to make this faster (maybe by exploiting that the string consists of digits, e.g. with long integers, ...)?** The search is always for a 12-digit substring in a 1 TB string of digits. SQL Server 2017 Developer Edition, 16 cores, 16 GB RAM. The primary goal is search speed! (10 digits in a 10 GB string is only for performance testing.)
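For reference, method 3's chunk table can be built set-based; a hedged sketch (the dbo.Numbers helper table is an assumption, and note that the absolute position is pos2 + charindex(...) - 1 when pos2 is the chunk's 1-based start):

-- Overlapping 900-byte chunks with stride 889, so any 12-digit
-- substring lies wholly inside at least one chunk.
DECLARE @chunk INT = 900, @needle INT = 12;
DECLARE @stride INT = @chunk - @needle + 1;

INSERT INTO strtable2 (pos2, colstr2)
SELECT n.n, SUBSTRING(s.colstr, n.n, @chunk)
FROM   strtable s
CROSS JOIN dbo.Numbers AS n           -- assumed table of integers 1..N
WHERE  n.n % @stride = 1
  AND  n.n <= LEN(s.colstr);

CREATE INDEX IX_strtable2_colstr2 ON strtable2 (colstr2);

-- Absolute position of a hit (off-by-one corrected):
SELECT pos2 + CHARINDEX('123456789012', colstr2) - 1
FROM   strtable2
WHERE  colstr2 LIKE '%123456789012%';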
Werner Aumayr (181 rep)
Feb 15, 2019, 07:34 PM • Last activity: Feb 19, 2019, 01:35 AM
2 votes
1 answer
759 views
How to Store Skip Lists on Disk
I am new to thinking in terms of databases. I have read about how B-trees are stored on disk, but I haven't seen an actual implementation or a detailed description of how that works. I'm wondering how one might store skip lists on disk. There are many descriptions of storing them with pointers in memory, but I haven't seen anything on disk storage. I'm wondering what it might look like at a high level (whether it's just one big file or many small files, whether there are different types of files, etc.).
Lance Pollard (221 rep)
Jul 1, 2018, 04:59 PM • Last activity: Dec 10, 2018, 08:51 AM