
T-SQL: Iteration vs. Set-Based Operations When Independence of Ancillary Data is Needed

4 votes
1 answer
222 views
BACKGROUND
----------

I am using Microsoft SQL Server Management Studio v18.9.1 on Windows 10.

I've created a database to store test results of our company's widget (not yet being produced). I am working on a stored procedure to populate that database with random test result records. The purpose of this is to have a data set to use for query development while waiting for real widgets to start being produced and tested.

In what follows, our test suite is composed of multiple subtest types. For this post, I've created only one subtest type. The example below runs the test suite one time (with a **[test_id]** of 0). Within a test, a given subtest may be run multiple times if the tester deems it necessary for any reason. In the example below, the one subtest I show is run five times for illustrative purposes.

The subtest I show has two parameters, the first of which can take on two discrete values and the second of which can take on three discrete values. All subtests must be run over all parameter combinations, so a subtest result consists of 2 * 3 = 6 partial results (one for each parameter combination). When a parameter combination is tested, either anomalies are not observed (pass) or one or more anomalies are observed (fail). If any of the six parameter combinations produces one or more anomalies, the subtest as a whole fails.

A subtest result record contains the following columns:

- **[test_id]**
- **[subtest_run_number]**
- **[parameter_1]**
- **[parameter_2]**
- **[anomaly_id]**
- **[anomaly_magnitude]**
- **[subtest_type]**

If testing with a given parameter combination produces no anomalies, this is noted by creating a single subtest record containing the two parameter values, an **[anomaly_id]** of 0, and an **[anomaly_magnitude]** of NULL. If testing with a given parameter combination produces N anomalies, this is noted by creating N subtest records. Each subtest record contains the parameter values that produced the anomaly, a 1-based anomaly ID for one of the anomalies seen, and the "magnitude" of that anomaly.

In producing this data set of random test results, I am trying to use set-based operations rather than iteration. As can be seen below, I did have to use iteration when 1) looping over the number of times the subtest would be run and 2) looping over the (randomly chosen) number of anomalies to be created for a given parameter combination. Where I was able to use set-based operations was in creating test result data for every parameter combination. This was a natural place to use a cross join, since each parameter has a table defining its valid values (a short sketch of that cross join appears below). However, there's a problem...

THE PROBLEM
-----------

Within a subtest run, every parameter combination has the same number of anomalies associated with it. What I want is for the anomaly count associated with each parameter combination to be able to vary independently.
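For reference, the cross join mentioned above looks like the minimal sketch below; the two parameter-value tables it reads from are created in SCRIPT 1 further down.

    SELECT
        p1.[parameter_1_value],
        p2.[parameter_2_value]
    FROM
        [dbo].[parameter_1_values] AS p1
        CROSS JOIN [dbo].[parameter_2_values] AS p2;
    -- Returns 2 * 3 = 6 rows, one per parameter combination.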
AN UNDESIRABLE SOLUTION
-----------------------

The only way I can think of accomplishing this is to forgo set-based operations and iterate as follows:

    DECLARE @parameter_1   AS INT = 1;
    DECLARE @parameter_2   AS INT = 1;
    DECLARE @anomaly_count AS INT;

    EXEC [dbo].[rand_int] @a = 0, @b = 2, @random_int = @anomaly_count OUTPUT;

    WHILE @parameter_1 <= ...

My current scripts are below. SCRIPT 1 creates the tables and stored procedures; SCRIPT 2 deletes any old data, generates a new data set of random test results, and displays it.

**SCRIPT 1**

    -- Script 1
    -- Create the tables ([dbo].[parameter_1_values], [dbo].[parameter_2_values],
    -- [dbo].[test_runs], [dbo].[subtest_type_1_data]) and the stored procedures

    ...

        CHECK ( ([anomaly_id] > 0 AND [anomaly_magnitude] IS NOT NULL)
             OR ([anomaly_id] = 0 AND [anomaly_magnitude] IS NULL) )
    );
    GO

    -- Stored Procedure
    CREATE PROC [dbo].[rand_int]
        @a          AS INT,
        @b          AS INT,
        @random_int AS INT OUTPUT
    AS
        SET NOCOUNT ON;
        SET @random_int = FLOOR(RAND() * (@b - @a + 1)) + @a;
    GO

    -- Stored Procedure
    CREATE PROC [dbo].[generate_random_subtest_type_1_data]
        @test_id AS VARCHAR(20)
    AS
        SET NOCOUNT ON;

        DECLARE @subtest_type_1_count AS INT = 5;
        DECLARE @subtest_type_1_loop  AS INT = 0;

        WHILE @subtest_type_1_loop < @subtest_type_1_count
        BEGIN
            -- One random anomaly count per subtest run, applied to every parameter combination
            DECLARE @anomaly_count  AS INT;
            DECLARE @subtest_result AS VARCHAR(4);

            EXEC [dbo].[rand_int] @a = 0, @b = 2, @random_int = @anomaly_count OUTPUT;

            IF @anomaly_count > 0
                SET @subtest_result = 'fail';
            ELSE
                SET @subtest_result = 'pass';

            INSERT INTO [dbo].[test_runs]
                ([test_id], [subtest_run_number], [subtest_type], [subtest_result])
            VALUES
                (@test_id, @subtest_type_1_loop, 'subtest_type_1', @subtest_result);

            IF @anomaly_count = 0
            BEGIN
                INSERT INTO [dbo].[subtest_type_1_data]
                    ([test_id], [subtest_run_number], [parameter_1], [parameter_2],
                     [anomaly_id], [anomaly_magnitude])
                SELECT
                    @test_id             AS [test_id],
                    @subtest_type_1_loop AS [subtest_run_number],
                    [parameter_1_value]  AS [parameter_1_value],
                    [parameter_2_value]  AS [parameter_2_value],
                    0                    AS [anomaly_id],
                    NULL                 AS [anomaly_magnitude]
                FROM
                    [dbo].[parameter_1_values]
                    CROSS JOIN [dbo].[parameter_2_values];
            END;
            ELSE
            BEGIN
                DECLARE @anomaly_loop AS INT;
                SET @anomaly_loop = 1;

                WHILE @anomaly_loop <= @anomaly_count
                BEGIN
                    -- Instead of RAND(), use ABS(CHECKSUM(NewId()) / 2147483647.0).
                    -- This is because RAND() gets invoked only once in the INSERT statement below.
                    --
                    -- By using an expression based on NewID(), every row will be assigned its
                    -- own random number.
                    INSERT INTO [dbo].[subtest_type_1_data]
                        ([test_id], [subtest_run_number], [parameter_1], [parameter_2],
                         [anomaly_id], [anomaly_magnitude])
                    SELECT
                        @test_id             AS [test_id],
                        @subtest_type_1_loop AS [subtest_run_number],
                        [parameter_1_value]  AS [parameter_1_value],
                        [parameter_2_value]  AS [parameter_2_value],
                        @anomaly_loop        AS [anomaly_id],
                        ABS(CHECKSUM(NewId()) / 2147483647.0) AS [anomaly_magnitude]
                    FROM
                        [dbo].[parameter_1_values]
                        CROSS JOIN [dbo].[parameter_2_values];

                    SET @anomaly_loop = @anomaly_loop + 1;
                END;
            END;

            SET @subtest_type_1_loop = @subtest_type_1_loop + 1;
        END;
    GO

**SCRIPT 2**

    -- Script 2
    -- Delete old data, generate a new data set of random test results, and display it
    DELETE FROM [dbo].[subtest_type_1_data];
    DELETE FROM [dbo].[test_runs];
    GO

    EXEC [dbo].[generate_random_subtest_type_1_data] @test_id = '0';
    GO

    SELECT * FROM [dbo].[test_runs];
    SELECT * FROM [dbo].[subtest_type_1_data];
    GO

EXAMPLE OUTPUT (current, incorrect)
-----------------------------------

In the example output below:

- All parameter combinations within subtest run 0 have the **same number** (0) of anomalies
- All parameter combinations within subtest run 1 have the **same number** (0) of anomalies
- All parameter combinations within subtest run 2 have the **same number** (1) of anomalies
- All parameter combinations within subtest run 3 have the **same number** (0) of anomalies
- All parameter combinations within subtest run 4 have the **same number** (2) of anomalies

I would like each parameter combination's anomaly count to be able to vary independently of any other parameter combination's anomaly count.
    test_id subtest_run_number  parameter_1 parameter_2 anomaly_id  anomaly_magnitude   subtest_type
    ------- ------------------  ----------- ----------- ----------  -----------------   ------------
    0       0                   1           1           0           NULL                subtest_type_1
    0       0                   1           2           0           NULL                subtest_type_1
    0       0                   1           3           0           NULL                subtest_type_1
    0       0                   2           1           0           NULL                subtest_type_1
    0       0                   2           2           0           NULL                subtest_type_1
    0       0                   2           3           0           NULL                subtest_type_1
    0       1                   1           1           0           NULL                subtest_type_1
    0       1                   1           2           0           NULL                subtest_type_1
    0       1                   1           3           0           NULL                subtest_type_1
    0       1                   2           1           0           NULL                subtest_type_1
    0       1                   2           2           0           NULL                subtest_type_1
    0       1                   2           3           0           NULL                subtest_type_1
    0       2                   1           1           1           0.8730268           subtest_type_1
    0       2                   1           2           1           0.5566615           subtest_type_1
    0       2                   1           3           1           0.4599889           subtest_type_1
    0       2                   2           1           1           0.9322677           subtest_type_1
    0       2                   2           2           1           0.3515796           subtest_type_1
    0       2                   2           3           1           0.872755            subtest_type_1
    0       3                   1           1           0           NULL                subtest_type_1
    0       3                   1           2           0           NULL                subtest_type_1
    0       3                   1           3           0           NULL                subtest_type_1
    0       3                   2           1           0           NULL                subtest_type_1
    0       3                   2           2           0           NULL                subtest_type_1
    0       3                   2           3           0           NULL                subtest_type_1
    0       4                   1           1           1           0.6965834           subtest_type_1
    0       4                   1           1           2           0.4588626           subtest_type_1
    0       4                   1           2           1           0.1284888           subtest_type_1
    0       4                   1           2           2           0.4331938           subtest_type_1
    0       4                   1           3           1           0.3083588           subtest_type_1
    0       4                   1           3           2           0.8907238           subtest_type_1
    0       4                   2           1           1           0.4016767           subtest_type_1
    0       4                   2           1           2           0.8041269           subtest_type_1
    0       4                   2           2           1           0.5932015           subtest_type_1
    0       4                   2           2           2           0.5389434           subtest_type_1
    0       4                   2           3           1           0.7058043           subtest_type_1
    0       4                   2           3           2           0.749176            subtest_type_1
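One way to see the problem at a glance: because **[anomaly_id]** is 1-based and 0 means "no anomalies", MAX([anomaly_id]) per parameter combination is that combination's anomaly count. A quick summary query (shown here just for illustration) makes it obvious that, within any one **[subtest_run_number]**, every combination currently gets the same count:

    SELECT
        [subtest_run_number],
        [parameter_1],
        [parameter_2],
        MAX([anomaly_id]) AS [anomaly_count]
    FROM
        [dbo].[subtest_type_1_data]
    GROUP BY
        [subtest_run_number], [parameter_1], [parameter_2]
    ORDER BY
        [subtest_run_number], [parameter_1], [parameter_2];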
EXAMPLE OUTPUT (desired)
------------------------

Below, I show an example of what desired output might look like. (Here, I show only rows associated with **[subtest_run_number]** = 0. Of course, what's being demonstrated here holds for all **[subtest_run_number]** values.)

In the example output below:

- The parameter combination (1, 1) has one anomaly (**[anomaly_id]** value is only 1)
- The parameter combination (1, 2) has zero anomalies (**[anomaly_id]** value is only 0)
- The parameter combination (1, 3) has two anomalies (**[anomaly_id]** values are 1, 2)
- The parameter combination (2, 1) has one anomaly (**[anomaly_id]** value is only 1)
- The parameter combination (2, 2) has zero anomalies (**[anomaly_id]** value is only 0)
- The parameter combination (2, 3) has three anomalies (**[anomaly_id]** values are 1, 2, 3)

The salient point is that the number of anomalies per parameter combination can vary within a given value of **[subtest_run_number]**. In the incorrect example output shown earlier, the number of anomalies per parameter combination could not vary within a given value of **[subtest_run_number]**.
    test_id subtest_run_number  parameter_1 parameter_2 anomaly_id  anomaly_magnitude   subtest_type
    ------- ------------------  ----------- ----------- ----------  -----------------   ------------
    0       0                   1           1           1           0.5095024           subtest_type_1
    0       0                   1           2           0           NULL                subtest_type_1
    0       0                   1           3           1           0.5062660           subtest_type_1
    0       0                   1           3           2           0.3940517           subtest_type_1
    0       0                   2           1           1           0.6216237           subtest_type_1
    0       0                   2           2           0           NULL                subtest_type_1
    0       0                   2           3           1           0.5802680           subtest_type_1
    0       0                   2           3           2           0.5673455           subtest_type_1
    0       0                   2           3           3           0.5517588           subtest_type_1
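Note that data shaped like this still supports the subtest-level pass/fail rule described in BACKGROUND; purely for illustration, a run's result could be derived like so:

    SELECT
        [test_id],
        [subtest_run_number],
        CASE WHEN MAX([anomaly_id]) > 0 THEN 'fail' ELSE 'pass' END AS [subtest_result]
    FROM
        [dbo].[subtest_type_1_data]
    GROUP BY
        [test_id], [subtest_run_number];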
MY QUESTION
-----------

Is there a way I can continue to use set-based operations but have each parameter combination have its own, independent anomaly count? Or, must I resort to iteration?
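For concreteness, the rough shape I'm hoping for is something like the untested sketch below, scoped to a single subtest run. It reuses the ABS(CHECKSUM(NEWID())) idea from my stored procedure to draw an independent anomaly count per combination, stages those counts in a table variable I've called @combo_counts, and then expands each combination with CROSS APPLY. The variable name, the ABS(CHECKSUM(NEWID()) % 3) mapping to a 0-2 count, and the hard-coded (VALUES (0), (1), (2)) tally capped at two anomalies are just assumptions for illustration.

    DECLARE @test_id            AS VARCHAR(20) = '0';
    DECLARE @subtest_run_number AS INT = 0;

    -- Stage one independent random anomaly count (0-2) per parameter combination.
    DECLARE @combo_counts TABLE
    (
        [parameter_1]   INT NOT NULL,
        [parameter_2]   INT NOT NULL,
        [anomaly_count] INT NOT NULL
    );

    INSERT INTO @combo_counts ([parameter_1], [parameter_2], [anomaly_count])
    SELECT
        p1.[parameter_1_value],
        p2.[parameter_2_value],
        ABS(CHECKSUM(NEWID()) % 3)   -- evaluated per row, so each combination gets its own count
    FROM
        [dbo].[parameter_1_values] AS p1
        CROSS JOIN [dbo].[parameter_2_values] AS p2;

    -- Expand each combination to one "pass" row ([anomaly_id] = 0, NULL magnitude)
    -- or to [anomaly_count] anomaly rows ([anomaly_id] = 1..N).
    INSERT INTO [dbo].[subtest_type_1_data]
        ([test_id], [subtest_run_number], [parameter_1], [parameter_2],
         [anomaly_id], [anomaly_magnitude])
    SELECT
        @test_id,
        @subtest_run_number,
        c.[parameter_1],
        c.[parameter_2],
        a.[anomaly_id],
        CASE WHEN a.[anomaly_id] = 0
             THEN NULL
             ELSE ABS(CHECKSUM(NEWID()) / 2147483647.0)
        END AS [anomaly_magnitude]
    FROM
        @combo_counts AS c
        CROSS APPLY
        (
            SELECT t.[anomaly_id]
            FROM (VALUES (0), (1), (2)) AS t([anomaly_id])
            WHERE (c.[anomaly_count] = 0 AND t.[anomaly_id] = 0)
               OR (t.[anomaly_id] BETWEEN 1 AND c.[anomaly_count])
        ) AS a;

Staging the counts in @combo_counts first is deliberate: it avoids depending on how SQL Server re-evaluates NEWID() when the same nondeterministic expression is referenced more than once in a single query. I don't know whether this is the right overall approach, which is what I'm asking.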
Asked by Dave (197 rep)
Nov 2, 2022, 01:58 PM
Last activity: Jun 25, 2025, 03:06 PM