Why MS SQL bigint type is implicitly mapped to float64 python type, and what is the best way to handle it?
6
votes
1
answer
844
views
Python integer type has unlimited precision so it is more than capable to hold a bigint value of MS SQL (64 bit). Still it is implicitly mapped to float64 python type, when passed to an external script.
This can cause serious calculation errors for large integers.
So why is it mapped to float64?
My guess is:
R was added before Python via the [Extensibility architecture](https://learn.microsoft.com/en-us/sql/machine-learning/concepts/extensibility-framework?view=sql-server-ver15) and it has fixed precision integers (32 bit). So it can't hold bigints. So perhaps this is a compatibility issue.
What is the best practice to ensure precise calculations?
Simple but working idea: pass bigints as string then parse them as int.
I know it has a slim chance to cause problem in practice, but still good to know.
# How can it be a problem:
I wrote a simple example to demonstrate how can it be a problem:
CREATE TABLE #test (
big_integer BIGINT
);
INSERT INTO #test
(big_integer)
VALUES
(36028797018963968),
(36028797018963968 + 1);
EXECUTE sp_execute_external_script
@language = N'Python',
@input_data_1 = N'SELECT big_integer FROM #test',
@script = N'
print(InputDataSet.dtypes)
OutputDataSet = InputDataSet
'
Executing this code on SQL Server 2019 will give you the result of:
| | (No column name) |
|---------------------|
|1| 36028797018963970 |
|2| 36028797018963970 |
and because of the
print(InputDataSet.dtypes)
statement we can see the following message:
...
STDOUT message(s) from external script:
big_integer float64
dtype: object
...
So we got a floating point rounding error. The value of this error for big enough integers is greater than 1, which is the root of this problem.
It is out of the scope of this question to teach floating point arithmetics, but I link some good materials if you don't understand what did happen:
[Simple example - Stack Overflow](https://stackoverflow.com/questions/249467/what-is-a-simple-example-of-floating-point-rounding-error) .
Floating Point Numbers - Computerphile
The IEEE 754 Format - Oxford
I also share a small ipython sample if you want to experiment with this (which isn't a substitute of learning the theory behind this):
In : import numpy as np
In : a = 2**55
In : a
Out: 36028797018963968
In : float(a) == float(a + 1)
Out: True
In : float(a)
Out: 3.602879701896397e+16
In : float(a + 1)
Out: 3.602879701896397e+16
In : np.nextafter(float(a), np.inf)
Out: 3.6028797018963976e+16
# Note
To run my example T-SQL some conditions must be met:
- Machine Learning Services must be installed
- external scripts must be enabled
- Privilege to execute external scripts must be granted
- You must have SQL Server 2017 CTP 2.0 or later
Asked by atevm
(337 rep)
Feb 16, 2021, 02:19 PM
Last activity: Feb 17, 2021, 12:55 PM
Last activity: Feb 17, 2021, 12:55 PM