Why MS SQL bigint type is implicitly mapped to float64 python type, and what is the best way to handle it?

Python's integer type has unlimited precision, so it is more than capable of holding an MS SQL bigint value (64 bit). Still, bigint is implicitly mapped to float64 when passed to an external script.

This can cause serious calculation errors for large integers.

So why is it mapped to float64?

My guess is:

R was added before Python via the [Extensibility architecture](https://learn.microsoft.com/en-us/sql/machine-learning/concepts/extensibility-framework?view=sql-server-ver15), and it has fixed-precision (32-bit) integers, so it can't hold bigints. So perhaps this is a compatibility issue.

What is the best practice to ensure precise calculations?

A simple but working idea: pass bigints as strings, then parse them as int.

I know this has a slim chance of causing problems in practice, but it is still good to know.
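
To make the string idea concrete, here is a minimal sketch of the Python side. It assumes the input query casts the column to text first (for example with `CAST(big_integer AS VARCHAR(20))`, which is my assumption, not something I have verified in the external script runtime); `raw_values` and `parse_bigints` are hypothetical names standing in for the column as it would arrive in the script.

```python
# Stand-in for a column that arrives as text because the query cast
# the BIGINT to VARCHAR (assumption for illustration).
raw_values = ["36028797018963968", "36028797018963969", None]


def parse_bigints(values):
    """Convert decimal strings to exact Python ints; None (SQL NULL) passes through."""
    return [None if v is None else int(v) for v in values]


parsed = parse_bigints(raw_values)

# Python ints are arbitrary precision, so the difference of 1 is preserved.
print(parsed[1] - parsed[0])
```

Because the values never pass through float64, the two neighbouring integers stay distinct. Results would presumably have to be converted back to strings before being returned, if the output mapping has the same limitation.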

# How it can be a problem

I wrote a simple example to demonstrate how it can be a problem:

    CREATE TABLE #test (
        big_integer BIGINT
    );
    
    INSERT INTO #test 
        (big_integer)
    VALUES
        (36028797018963968),
        (36028797018963968 + 1);
    
    EXECUTE sp_execute_external_script 
        @language = N'Python',
        @input_data_1 = N'SELECT big_integer FROM #test',
        @script = N'
    print(InputDataSet.dtypes)
    OutputDataSet = InputDataSet
    '

Executing this code on SQL Server 2019 gives the following result:

    | | (No column name)  |
    |---------------------|
    |1| 36028797018963970 |
    |2| 36028797018963970 |

and because of the print(InputDataSet.dtypes) statement we can see the following message:

    ...
    STDOUT message(s) from external script: 
    big_integer    float64
    dtype: object
    ...

So we got a floating-point rounding error. For large enough integers (above 2^53 in magnitude), the gap between adjacent float64 values is greater than 1, so neighbouring integers collapse into the same value. This is the root of the problem.
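
As a rough illustration of where the trouble starts, the sketch below checks whether an integer survives a round trip through float64. The helper name `survives_float64` is hypothetical, and `math.ulp` requires Python 3.9 or later.

```python
import math


def survives_float64(n: int) -> bool:
    """Return True if n is unchanged after a round trip through a 64-bit float."""
    return int(float(n)) == n


# float64 has a 53-bit significand, so every integer up to 2**53 is exact.
print(survives_float64(2**53))      # True
print(survives_float64(2**53 + 1))  # False

# Near 2**55 the spacing between adjacent floats is already 8.
print(math.ulp(float(2**55)))       # 8.0
```

A check like this could be used to detect which bigint values are at risk before deciding whether the string workaround is needed.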

Teaching floating-point arithmetic is beyond the scope of this question, but here are some good materials in case it is unclear what happened:

[Simple example - Stack Overflow](https://stackoverflow.com/questions/249467/what-is-a-simple-example-of-floating-point-rounding-error)

Floating Point Numbers - Computerphile 

The IEEE 754 Format - Oxford 

I am also sharing a small IPython session for experimenting with this (it is not a substitute for learning the theory behind it):

    In : import numpy as np
    
    In : a = 2**55
    
    In : a
    Out: 36028797018963968
    
    In : float(a) == float(a + 1)
    Out: True
    
    In : float(a)
    Out: 3.602879701896397e+16
    
    In : float(a + 1)
    Out: 3.602879701896397e+16
    
    In : np.nextafter(float(a), np.inf)
    Out: 3.6028797018963976e+16

# Note

To run my example T-SQL, some conditions must be met:

 - Machine Learning Services must be installed
 - External scripts must be enabled
 - The privilege to execute external scripts must be granted
 - You must have SQL Server 2017 CTP 2.0 or later

Asked by atevm (337 rep)

Feb 16, 2021, 02:19 PM
Last activity: Feb 17, 2021, 12:55 PM
