Database Administrators
Q&A for database professionals who wish to improve their database skills
Latest Questions
0
votes
1
answers
149
views
Producing JSON in Oracle 12c1
I need to produce a JSON document with several properties from a list of Oracle 12c1 production databases, and I was given read-only access in order to generate this data.
The problem is these databases were created over time and may store VARCHAR2 values in different character sets. My naive attempt at producing JSON is:
SELECT '{' ||
       '"content":"' || content || '", ' ||
       '"name":"' || name || '"}'
FROM my_table
WHERE ...
However, since the columns content and name may have all kinds of special characters and use non-trivial character sets, the query above may produce invalid JSON documents.
Is there any function that can convert a VARCHAR2 into Unicode from any character set? I think if the string is in Unicode it could be easy to manage. Unfortunately, I don't really know each column's character set in advance.
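For reference, a minimal sketch of one workaround on 12c Release 1 (which, as far as I recall, predates the built-in JSON generation functions added in 12.2): ASCIISTR() escapes non-ASCII characters as \XXXX in the database character set, which can then be massaged toward JSON's \uXXXX form. It assumes my_table and its columns from the query above, and it does not handle embedded quotes or backslashes in the data, so treat it as a starting point rather than a complete solution.
-- Sketch only: ASCIISTR escapes non-ASCII characters as \XXXX (UTF-16 code units);
-- rewriting the backslash gives JSON-style \uXXXX escapes. Quotes and backslashes
-- already present in the data still need their own handling.
SELECT '{' ||
       '"content":"' || REPLACE(ASCIISTR(content), '\', '\u') || '", ' ||
       '"name":"'    || REPLACE(ASCIISTR(name),    '\', '\u') || '"}'
FROM my_table
WHERE ...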
Joe DiNottra
(101 rep)
Oct 26, 2021, 10:42 PM
• Last activity: Jul 24, 2025, 07:02 AM
1
votes
1
answers
286
views
Can't set up Unicode support
I am creating a toy website locally with XAMPP and I want it to support Unicode.
index.php contains a simple form that I can use to add a new entry (a UTF-8 string) to a table "myTable" in the DB. In addition to that, I am also echoing with PHP every entry from "myTable" on the body of index.php, so that I can see if the UTF-8 strings that I am entering through the form make it all the way into the database and back without any issues.
Here are the measures that I took toward Unicode support:
1. Every file on my website (ex.: index.php) is encoded in UTF-8 without BOM;
2. <meta charset="utf-8"> everywhere applicable too;
3. mysqli_set_charset($myConnection, "utf-8"); in my "db.inc.php" file;
4. At the phpMyAdmin "control panel" (?), under "Database Server" I see "Server charset: UTF-8 Unicode (utf8)";
5. I created a database and a table for this specific purpose. Here is the result of SHOW CREATE TABLE myTable:
CREATE TABLE `myTable` (
  `row1` varchar(16) COLLATE utf8mb4_bin NOT NULL,
  `row2` varchar(32) COLLATE utf8mb4_bin NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
The outcome was as follows: when I print the entries of "myTable", everything does get displayed correctly on the browser indeed, however only owing to a stroke of luck, because apparently my strings are being internally represented badly. When I browse my table with phpMyAdmin I see mojibake, and I looked at myTable.ibd and I see the same mojibake.
The string joão e maría, for example, is displayed as joÃ£o e marÃa; for the first ã -> Ã£, it could be UTF-8 bytes being interpreted as latin-1, though I see Ã£ even when I tell npp to interpret the file as UTF-8.
Running the query SELECT row1, HEX(row1) FROM myTable WHERE 1 I see 6A6FC383C2A36F2065206D6172C383C2AD61 as the bytes of joÃ£o e marÃa, so in fact the mojibake is "correctly" encoded as UTF-8 (C3 83 C2 A3 is UTF-8 for Ã£).
Ideas?
UPDATE
After more careful inspection I verified that mysqli_set_charset($myConnection, "utf-8"); was returning false. I replaced it with mysqli_set_charset($myConnection, "utf8"); and voilà, strings with fancy characters that I enter through the form now make it all the way into the database unscathed. However, these new strings aren't correctly displayed by the browser.
For example: the string bênção is inserted into the DB through the form. I query the table with SELECT row1, HEX(row1) (...) and find out that it is stored as 62C3AA6EC3A7C3A36F inside the table, which is the correct UTF-8 representation of the string in question. I access index.php and I see b�n��o being displayed. I download the HTML of the webpage and inspect it, and I find that the offending bytes are respectively EA, E7, E3, which, sure enough, correspond to ê, ç and ã in latin-1.
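A minimal diagnostic sketch, assuming the same MySQL/MariaDB server and the myTable table from the question: compare the connection, server and column character sets, and look at the stored bytes, since C3 83 C2 A3 (instead of C3 A3) is the signature of double-encoded UTF-8.
-- Sketch only: check which character sets are in play on the connection.
SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';
-- A correctly stored 'ã' in utf8mb4 hexes as C3A3; seeing C383C2A3 means the
-- UTF-8 bytes were re-encoded as if they were latin-1 (double encoding).
SELECT row1, HEX(row1) FROM myTable;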
what a disgrace
(111 rep)
May 22, 2018, 07:16 AM
• Last activity: May 18, 2025, 08:11 PM
1
votes
1
answers
1078
views
Character Strings related : string_units db cfg changed from SYSTEM to CODEUNITS32
To handle a few special characters, we changed the string_units database configuration parameter from the default SYSTEM to CODEUNITS32.
Based on the IBM DB2 Knowledge Center, there are three methods to achieve this:
1. Set the NLS_STRING_UNITS global variable
2. Update the string_units database configuration parameter
3. Alter the column by defining a new character set (CODEUNITS16 or CODEUNITS32)
We have used the second option and restarted the instance. Still it's not working as expected. Have we missed any steps?
Is the reorg required on the table? What happens to the already existing data ?
db2level - 10.5 fp5
OS : Windows
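For context, a hedged sketch of the three options listed above (Db2 10.5 syntax as I understand it; mydb, myschema.mytable and mycol are placeholders, so verify the exact statements against the Knowledge Center before relying on them):
-- 1. Session level: set the built-in global variable.
SET NLS_STRING_UNITS = 'CODEUNITS32';

-- 2. Database level: update the configuration parameter, then reactivate the
--    database (CLP command, shown here as a comment):
--    db2 UPDATE DB CFG FOR mydb USING string_units CODEUNITS32

-- 3. Column level: redefine the string units of an existing column explicitly.
ALTER TABLE myschema.mytable
  ALTER COLUMN mycol SET DATA TYPE VARCHAR(100 CODEUNITS32);
As I understand it, changing string_units only affects columns defined afterwards; columns created before the change keep their original (OCTETS) definition, which can make the new setting look like it is "not working" on existing tables.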
shivrajk
(73 rep)
Jan 5, 2017, 06:09 AM
• Last activity: Apr 14, 2025, 06:04 AM
1
votes
1
answers
1501
views
How to display non-English character correctly in db2cmd?
On Db2 v11.5.7 on Linux/x86_64 I have a UTF-8 database.
Executing db2 get db cfg for test1 returns:

Database code page = 1208
Database code set = UTF-8

On my Windows 10 computer in Putty I have set: *Window | Translation | Remote character set* to UTF-8. I can properly see non-English characters in Putty.
I have also installed the DBeaver tool and I can also properly see non-English characters there.
The problem I have is the "db2cmd" tool.
In a Putty remote connection I created a table and inserted one character:

db2 "connect to test1 user db2inst1 using db2inst1"
db2 "create table admin.test8 (id int not null generated always as identity, a varchar(20))"
db2 "insert into admin.test8 (a) values ('č')"
db2 "select id, a, hex(a) as hex_a from admin.test8 order by id"

The output of the select statement is correct.
If I do the same in DBeaver:

insert into admin.test8 (a) values ('č');
select id, a, hex(a) as hex_a from admin.test8 order by id;

the result is also correct. Both Putty and DBeaver correctly insert the character into the database.
Now, saving the same commands into a file (a.sql) with UTF-8 encoding, from db2cmd I check the code page:

chcp

and the output is:

Active code page: 1252

Then I execute the commands:

db2 -tf a.sql

The output in column A is completely wrong. For ID=3, HEX_A is also completely wrong.
I tried several things like:
1. Changing the console code page: chcp 65001
2. db2set DB2CODEPAGE=1208

But in both cases I see the same result as the last command.
If I try to connect with the "-a" switch I get: Db2 server code page is 1208 (UTF-8) and Db2 client code page is 1252.
How do I configure db2cmd to properly handle the UTF-8 code page?
folow
(523 rep)
Oct 18, 2022, 01:08 PM
• Last activity: Mar 20, 2025, 10:03 PM
1
votes
1
answers
2041
views
How to fix unicode REPLACEMENT CHARACTER � in postgres database
Is there a way to fix the � problem in a Postgres database?
As you can see, there are names with accents and they are not displayed correctly in the database.
I would like to know if there is a way to fix this without dropping the database and restoring it... ?
When I execute
SELECT CAST (userfirstname AS bytea) FROM tab WHERE id = 42;
this is the result I get:

                bytea
--------------------------------------
 \x5374efbfbd7068616e696520284d6d6529
(1 ligne)
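The bytes ef bf bd in that dump are the UTF-8 encoding of U+FFFD itself, i.e. the replacement character was physically stored, so the original accented byte is no longer in the row. A small sketch, reusing the tab/userfirstname/id names from the question, to list the affected rows that would need re-importing or manual correction:
-- Sketch: find rows whose text already contains U+FFFD (stored bytes EF BF BD).
SELECT id, userfirstname
FROM tab
WHERE position(U&'\FFFD' IN userfirstname) > 0;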
rainman
(205 rep)
May 12, 2022, 03:10 AM
• Last activity: Jul 27, 2024, 02:55 AM
1
votes
0
answers
229
views
MySQL 8.0 fails to store 5.7-allowed symbols (Incorrect string value, Error 1366)
I'm upgrading MySQL 5.7 server to 8.0 with a bunch of user-input data that sometimes contain "malformed" unicode characters like
x'eda0bdedb98f'
. The latter seems to be 2 UTF-16 characters 0xD83D 0xDE4F
(4 bytes) of ['Folded Hands'-emoji](https://www.fileformat.info/info/unicode/char/1f64f/index.htm) encoded into 2 utf8mb3 characters (6 bytes).
[This](https://cryptii.com/pipes/hex-decoder) service correctly decodes the emoji, while the others consider it to be a wrong unicode character.
### Problem ###
This symbol can be inserted into MySQL 5.7 table, but fails to insert into MySQL 8.0 table. Depending on sql_mode
insert either fails or proceeds with a truncated string. Thus mysqldump + restore produces inconsistent data. The inplace upgrade works fine but I still cannot manipulate (insert/update) these characters. Changing the charset/collation doesn't help.
### Question ###
Is that possible to force MySQL 8.0 to store "malformed" unicode characters like MySQL 5.7 does?
### Steps to reproduce ###
Run latest MySQL 5.7 in Docker:
docker run --rm --name percona57 -e MYSQL_ALLOW_EMPTY_PASSWORD=1 percona:5.7 --character_set_server=utf8mb4 --collation-server=utf8mb4_general_ci &
docker exec -ti --user root percona57 mysql --default-character-set=utf8mb4
Run latest MySQL 8.0 in Docker:
docker run --rm --name percona80 -e MYSQL_ALLOW_EMPTY_PASSWORD=1 percona:8.0 --character_set_server=utf8mb4 --collation-server=utf8mb4_general_ci &
docker exec -ti --user root percona80 mysql --default-character-set=utf8mb4
Execute SQL commands (on both 5.7 and 8.0):
SET names utf8mb4 COLLATE utf8mb4_general_ci;
SHOW VARIABLES LIKE '%char%';
SHOW VARIABLES LIKE '%collat%';
SHOW VARIABLES LIKE 'sql_mode';
CREATE DATABASE IF NOT EXISTS test DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
SHOW CREATE DATABASE test \G
CREATE TABLE IF NOT EXISTS test.t1 (id int unsigned auto_increment primary key, text varchar(255)) CHARSET utf8mb4 COLLATE utf8mb4_general_ci;
SHOW CREATE TABLE test.t1 \G
INSERT INTO test.t1 (text) VALUES (concat('aaa', x'eda0bdedb98f', 'bbb'));
SHOW WARNINGS;
SELECT *, hex(text) FROM test.t1;
Results from MySQL 5.7:
mysql> INSERT INTO test.t1 (text) VALUES (concat('aaa', x'eda0bdedb98f', 'bbb'));
Query OK, 1 row affected (0.01 sec)
mysql> SHOW WARNINGS;
Empty set (0.00 sec)
mysql> SELECT *, hex(text) FROM test.t1;
+----+--------------+--------------------------+
| id | text | hex(text) |
+----+--------------+--------------------------+
| 1 | aaa������bbb | 616161EDA0BDEDB98F626262 |
+----+--------------+--------------------------+
1 row in set (0.00 sec)
Results from MySQL 8.0:
mysql> INSERT INTO test.t1 (text) VALUES (concat('aaa', x'eda0bdedb98f', 'bbb'));
ERROR 1366 (HY000): Incorrect string value: '\xED\xA0\xBD\xED\xB9\x8F...' for column 'text' at row 1
mysql> SHOW WARNINGS;
+-------+------+----------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+----------------------------------------------------------------------------------+
| Error | 1366 | Incorrect string value: '\xED\xA0\xBD\xED\xB9\x8F...' for column 'text' at row 1 |
+-------+------+----------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> SELECT *, hex(text) FROM test.t1;
Empty set (0.00 sec)
Execution log for both MySQL 5.7 and 8.0 + SHOW VARIABLES
statement: https://pastebin.com/MK0KUaV1
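For comparison, a hedged sketch (assuming the test.t1 table created above, and mirroring the original INSERT): the same emoji expressed as well-formed 4-byte UTF-8 is accepted by 8.0, which narrows the problem down to the CESU-8-style surrogate-pair bytes rather than 4-byte characters in general.
-- U+1F64F as proper UTF-8 is F0 9F 99 8F (instead of ED A0 BD ED B9 8F).
INSERT INTO test.t1 (text) VALUES (CONCAT('aaa', x'F09F998F', 'bbb'));
SELECT *, HEX(text) FROM test.t1;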
Alessandro Cattaneo
(19 rep)
May 20, 2024, 10:56 PM
• Last activity: May 29, 2024, 01:20 PM
1
votes
1
answers
905
views
sp_send_dbmail is populating 00 bits between all characters in a result set
I'm using sp_send_dbmail in a SQL Agent job triggered nightly. The job queries our database and checks for product price updates; if there are some, it sends them in an email as an attachment to our e-commerce vendor. They have some automated processes that will update our e-commerce platform; however, their automated process cannot deal with the file provided by sp_send_dbmail.
sp_send_dbmail seems to be putting null characters between all characters in the result set.
See result of opening the csv in a hex editor:
Viewed through a text editor I see as expected:
sp_send_dbmail query here:
SET @FileName = 'Update_' + CONVERT(VARCHAR(12), GETDATE(), 105) + '.csv'
SET @Query = 'Some Query'
EXEC msdb.dbo.sp_send_dbmail
    @recipients = 'me@domain.com',
    @query = @Query,
    @attach_query_result_as_file = 1,
    @query_attachment_filename = @FileName,
    @query_result_separator = ',',
    @query_result_no_padding = 1
END
What's going on here?
****EDIT FOR ADDITIONAL CLARIFICATION****
Using Microsoft SQL Server 2008 R2
Collation is Latin1_General_CI_AS
Leaving no_padding out leaves several trailing spaces in each returned field, each space (0x20) separated by a null (0x00).
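Those interleaved 00 bytes are consistent with the attachment being written out as UTF-16LE (Unicode) rather than ANSI, since the result set is NVARCHAR. A small illustration of the byte patterns involved (plain T-SQL, nothing specific to Database Mail):
-- ASCII text stored as NVARCHAR (UTF-16LE) carries a 0x00 high byte per character.
SELECT CONVERT(VARBINARY(20), N'ABC') AS utf16_bytes,  -- 0x410042004300
       CONVERT(VARBINARY(20), 'ABC')  AS ansi_bytes;   -- 0x414243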



Andrew
(13 rep)
Jan 10, 2017, 04:01 AM
• Last activity: May 27, 2024, 06:47 PM
4
votes
2
answers
11931
views
Nvarchar to varchar to nvarchar characters conversion
I had a table filled with some Greek text rows as nvarchar(2000).
Lately, I altered the type of the column to varchar(4000), but I realized that some Greek characters were shown as question marks.
So, I tried to change back to nvarchar(4000) in order to fix it, as I suppose that the characters' Unicode still remains the same.
I am just wondering, is there any way to fix this instead of restoring the backup I created before altering the table?
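For illustration, a sketch assuming the column sat under a Latin1-family collation such as SQL_Latin1_General_CP1_CI_AS: the NVARCHAR-to-VARCHAR conversion replaces characters that have no code-page equivalent, so the original code points are typically no longer in the column and converting back to nvarchar cannot recover them.
-- Greek text forced through VARCHAR under a Latin1 collation becomes '?'.
SELECT CAST(N'Ελλάδα' COLLATE SQL_Latin1_General_CP1_CI_AS AS VARCHAR(20)) AS lossy,  -- '??????'
       CAST(CAST(N'Ελλάδα' COLLATE SQL_Latin1_General_CP1_CI_AS AS VARCHAR(20)) AS NVARCHAR(20)) AS still_lossy;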
George Chondrompilas
(143 rep)
Sep 9, 2016, 06:32 PM
• Last activity: May 27, 2024, 06:47 PM
3
votes
1
answers
4565
views
How to insert a Unicode character verbose into a varchar DB?
I need to insert this character '●' into a VARCHAR column of a MSSQL database with collation set as SQL_Latin1_General_CP1_CI_AS (or at least mock what my Python + Windows MSSQL driver might have done). I tried to insert this from my client, but it gets inserted as ?.
The hex value as checked in Python is \xe2\x97\x8f, which is decimal 226, 151, 143.
The collation spec says that 226 is defined but 143 and 151 are undefined. So my best would be to insert 226?
The reason I am doing this exercise is that our app was inserting this character into the DB earlier. At this point, I don't know how it went in, but it went through a Python app with the Windows MSSQL driver, and our DB team wrote a script to correct it; apparently it got replaced with "?".
Now I am just trying to reproduce this scenario to ask them why it is getting replaced with "?", because we replace it with an empty string in the app. Since my app already has a fix for this issue (we remove anything greater than 127), I am trying to mock this directly from the back end to prove that it is getting replaced with '?', which is actually my claim, and I think it is done because it says "COLLATE" in the update query.
UPDATE pr_response
SET nur_respon =
REPLACE (nur_respon,
SUBSTRING(nur_respon, PATINDEX('%[^ !-~]%' COLLATE Latin1_General_BIN, nur_respon), 1),
'')
WHERE PATINDEX('%[^ !-~]%' COLLATE Latin1_General_BIN, nur_respon) > 0
Is this a fine script to remove Non-ASCII-Data > 127? Can someone explain this query in plain English?
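As an aside, a one-line illustration of the behaviour being reproduced (assuming the SQL_Latin1_General_CP1_CI_AS / code page 1252 setting from the question): U+25CF has no equivalent in that code page, so any implicit or explicit conversion to VARCHAR substitutes '?'.
-- Under a CP1252-based collation the bullet has no single-byte equivalent.
SELECT CAST(N'●' COLLATE SQL_Latin1_General_CP1_CI_AS AS VARCHAR(10)) AS converted;  -- returns '?'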
Nishant
(899 rep)
Nov 20, 2018, 12:42 PM
• Last activity: May 27, 2024, 06:47 PM
0
votes
1
answers
792
views
Generating MD5 Hash for NVARCHAR and VARCHAR columns across different databases
I am building a utility that generates a single MD5 hash across all CONCATENATED column values for any table row. To make this work, I have to eliminate NULL values using COALESCE and must CAST NUMBER and DATE types to VARCHAR using consistent patterns.
The purpose of this utility is to compare all the data across two different databases as part of a data migration. We wanted to make sure all of our converted ETL code (stored procedures) produced the same results with 100% accuracy after porting it from Netezza to Azure Synapse Analytics.
For the most part, this works extremely well and we were able to identify numerous bugs using this approach but still, this process isn't perfect ... We cannot use this with BLOB columns and we sometimes get slight differences with FLOAT types. But the real headache is how NVARCHAR is hashed differently between Netezza and Azure Synapse Analytics.
Here is an example... First, I will demonstrate what Netezza gives me.
show nz_encoding
-- NOTICE: UTF8
show server_encoding
-- Current server encoding is LATIN9
CREATE TABLE ENCODING_TEST (
VALUE1 CHARACTER VARYING(255),
VALUE2 NATIONAL CHARACTER VARYING(255)
);
-- The command completed successfully
INSERT INTO ENCODING_TEST (VALUE1, VALUE2) VALUES('très bien', 'très bien');
-- 1 rows affected
SELECT RAWTOHEX(HASH('très bien', 0)) FROM ENCODING_TEST
-- E5D128AFD34139A261C507DA18B3C558
SELECT RAWTOHEX(HASH(VALUE1, 0)) FROM ENCODING_TEST
-- E5D128AFD34139A261C507DA18B3C558
SELECT RAWTOHEX(HASH(VALUE2, 0)) FROM ENCODING_TEST
-- A54489E883CE7705CDBE1FDAA3AA8DF4
SELECT RAWTOHEX(HASH(CAST(VALUE2 AS VARCHAR(255)), 0)) FROM ENCODING_TEST
-- E5D128AFD34139A261C507DA18B3C558
And here is what Azure Synapse Analytics gives me...
CREATE TABLE WZ_BI.ENCODING_TEST (
VALUE1 CHARACTER VARYING(255),
VALUE2 NATIONAL CHARACTER VARYING(255)
);
-- Commands completed successfully.
INSERT INTO WZ_BI.ENCODING_TEST (VALUE1, VALUE2) VALUES('très bien', 'très bien');
-- (1 row affected)
SELECT HASHBYTES ('MD5', 'très bien') FROM WZ_BI.ENCODING_TEST
-- 0xE5D128AFD34139A261C507DA18B3C558
SELECT HASHBYTES ('MD5', VALUE1) FROM WZ_BI.ENCODING_TEST
-- 0xE5D128AFD34139A261C507DA18B3C558
SELECT HASHBYTES ('MD5', VALUE2) FROM WZ_BI.ENCODING_TEST
-- 0xC43534A6812499038457EDF545834866
SELECT HASHBYTES ('MD5', CAST(VALUE2 AS VARCHAR(255))) FROM WZ_BI.ENCODING_TEST
-- 0xE5D128AFD34139A261C507DA18B3C558
The question I have is: how come the MD5 hash for the NVARCHAR column in Netezza is different from the MD5 hash of the same type and value in Azure Synapse Analytics, when it treats the VARCHAR types the same? I really do not want to have to explicitly CAST all NVARCHAR types to VARCHAR to get these to work, but I have not found any other way to make them equivalent.
What am I missing here?
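One plausible piece of the puzzle, offered as a hedged illustration rather than a definitive diagnosis: HASHBYTES hashes exactly the bytes it receives, and an NVARCHAR value in Synapse is UTF-16LE, while Netezza (as I understand it) stores NVARCHAR as UTF-8, so identical text can hash differently unless both sides hash the same byte encoding, which is what the explicit CAST to VARCHAR effectively achieves.
-- T-SQL side: the same text yields different bytes (and therefore hashes) per type.
SELECT CONVERT(VARBINARY(40), 'très bien')  AS varchar_bytes,   -- code-page bytes
       CONVERT(VARBINARY(40), N'très bien') AS nvarchar_bytes,  -- UTF-16LE bytes
       HASHBYTES('MD5', 'très bien')        AS md5_varchar,
       HASHBYTES('MD5', N'très bien')       AS md5_nvarchar;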
Lauren_G
(69 rep)
Jun 29, 2023, 02:38 AM
• Last activity: Jan 12, 2024, 02:05 AM
0
votes
1
answers
612
views
sp_executesql changes non standard characters to?
I am trying to use sp_executesql to run commands, but I've come across an issue where non-English characters (Chinese and Japanese) are changed to ??? when run.
SQL Server 2008 R2 & 2012
**Example Code**
DECLARE @text nvarchar(30) = N'select ''J こんにちは C 你好'''
SELECT @text
SELECT N'J こんにちは C 你好'
exec sp_executesql @text
exec sp_executesql N'select ''J こんにちは C 你好'''
*J= Japanese 'Hi' C = Chinese 'Hi' as processed by google translate for testing purposes*
**Output**
------------------------------
select 'J こんにちは C 你好'
------------
J こんにちは C 你好
------------
J ????? C ??
------------
J ????? C ??
As you can see, when run naturally or when the code is selected, the output is returned as expected; however, once it is passed through sp_executesql it returns ? characters. It's worth noting that if you attempt to send a varchar instead of an nvarchar it errors out, so it definitely uses Unicode.
I have tried a few different collations for the input but that doesn't seem to change anything.
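A hedged observation, as a sketch only: in the examples above the inner literal inside the dynamic string has no N prefix, so within the inner batch it is a VARCHAR literal and the characters may be lost before the SELECT ever runs; prefixing the inner literal keeps it Unicode end to end (@text2 is just an illustrative variable name).
DECLARE @text2 nvarchar(60) = N'SELECT N''J こんにちは C 你好''';  -- note the inner N prefix
EXEC sp_executesql @text2;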
Ste Bov
(2101 rep)
Apr 11, 2018, 09:11 AM
• Last activity: Nov 3, 2023, 09:37 AM
9
votes
0
answers
758
views
Inconsistent equality checking of Unicode text in Postgres 9.5?
Using PostgreSQL 9.5. It seems like Postgres is being inconsistent with how it compares strings similar to
لى
. A unique constraint is considering some strings equal that GROUP BY
is considering different.
I have a select query that is using GROUP BY
on a TEXT
column and an aggregate function on some other column to ensure uniqueness of the TEXT
column in the output, and I'm inserting the results into a table with a primary key (and thus unique) constraint on the TEXT
column.
Looks similar to this; I've just changed the table names for simplicity:
INSERT INTO mytable (  -- mytable has string TEXT PRIMARY KEY, score INT
SELECT
    sq1.string string, sq2.score / sq1.score
FROM
    (
        SELECT n.string string, SUM(n.score) score
        FROM
            othertable1 n  -- has string TEXT (non-unique) and score INT
        GROUP BY string
    ) sq1,
    (
        SELECT n.string string, SUM(n.score) score
        FROM
            othertable2 n  -- has string TEXT (non-unique) and score INT
        GROUP BY string
    ) sq2
WHERE sq1.string = sq2.string
ORDER BY score DESC LIMIT 100000
);
This should never fail, right? It fails with a violation of the unique constraint on key لى.
And it has worked many times before with other data sets containing millions of rows in PG 9.3; I don't know whether لى
was in the data back then. I know Arabic has decorations you can put on the letters, so I wonder if that's tripping it up.
Does anyone have an alternative explanation, or should I report this as a bug once I can reproduce it more easily?
UPDATE: Confirmed that the query runs successfully on a PostgreSQL 9.3 server with the same data. There are some moving parts here, so I'm trying to find exactly what the problematic strings are so I can make a simple list of queries anyone can run to expose a bug.
UPDATE 2: Argh, I can't get my database to give me a set of strings I can copy into a table and expose a bug. I've been trying to do it with COPY TO
. Something along the way keeps stripping the Arabic text of the differences that are making it fail, I think. But I tried a simpler query, and it's also failing. It's more obvious that this should work:
INSERT INTO mytable (  -- mytable has string TEXT PRIMARY KEY, score INT
SELECT n.string string, SUM(n.score) score
FROM
    othertable2 n  -- has string TEXT (non-unique) and score INT
GROUP BY string
);
I'm still working towards getting something others can try because, of course, a bug report is useless if I say that it only works on my data.
UPDATE 3: I ran it again with different data and ran into the same problem with Cyrillic characters У
and В
. Making a table containing them didn't reveal anything. Same problem as with the Arabic text, I think. Something is getting stripped along the way.
UPDATE 4: This is definitely a bug. I'm still trying to figure out how to report this. I found a workaround:
WITH glitch(string, score) AS (
    SELECT n.string string, SUM(n.score) score
    FROM
        othertable2 n  -- has string TEXT (non-unique) and score INT
    GROUP BY string
)
INSERT INTO mytable (  -- mytable has string TEXT PRIMARY KEY, score INT
    SELECT DISTINCT ON(string) * FROM glitch
);
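A small diagnostic sketch that may help pin down the offending values (reusing the othertable2/string names above, with the problematic key as an illustrative filter): strings that the collation groups or sorts as equal can still differ at the byte level (combining marks, different normalization forms), and the raw bytes survive COPY better than the rendered text does.
-- Inspect the raw bytes of strings that collate as "the same".
SELECT string, convert_to(string, 'UTF8') AS raw_bytes
FROM othertable2
WHERE string LIKE '%لى%'
GROUP BY string, convert_to(string, 'UTF8');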
sudo
(1061 rep)
Feb 21, 2016, 06:49 AM
• Last activity: Aug 31, 2023, 06:57 AM
5
votes
2
answers
26234
views
Why do I get incorrect characters when decoding a Base64 string to NVARCHAR in SQL Server?
I've been looking at how to decode Base64 with SQL Server and after searching around many solutions online (some from here) seem to be based upon this sort of method.
SELECT CAST(CAST('Base64StringHere' as XML ).value('.','varbinary(max)') AS VARCHAR(250))
When I have ASCII text this works perfectly. However, when I have the following French text it gets corrupted (presumably due to the restrictions of VARCHAR):
Où est le café le plus proche?
T8O5IGVzdCBsZSBjYWbDqSBsZSBwbHVzIHByb2NoZT8=
And it provides the following output:
OÃ¹ est le cafÃ© le plus proche?
I thought the relatively simple fix would be to change the CAST to NVARCHAR but this results in corruption again.
SELECT CAST(CAST('T8O5IGVzdCBsZSBjYWbDqSBsZSBwbHVzIHByb2NoZT8=' as XML ).value('.','varbinary(max)') AS NVARCHAR(250) )
썏₹獥⁴敬挠晡꧃氠汰獵瀠潲档㽥
My search engine skills may be failing me, but I cannot seem to find someone else who has my problem.
Any thoughts?
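For what it's worth, the intermediate step shows where things go wrong: the Base64 payload decodes to UTF-8 bytes, which VARCHAR then reads in the database code page (the mojibake) and NVARCHAR reads as UTF-16LE (the CJK-looking garbage above). A sketch of just the decode step:
-- The XML trick only produces raw bytes; how they are CAST determines the damage.
SELECT CAST('T8O5IGVzdCBsZSBjYWbDqSBsZSBwbHVzIHByb2NoZT8=' AS XML
           ).value('.', 'varbinary(max)') AS decoded_utf8_bytes;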
Geesh_SO
(153 rep)
Jul 13, 2018, 02:45 PM
• Last activity: Aug 13, 2023, 08:41 AM
6
votes
2
answers
2548
views
Byte ordering for multibyte characters in SQL Server versus Oracle
I am currently in the process of migrating data from Oracle to SQL Server and I'm encountering an issue trying to validate the data post-migration.
Environment Details:
* Oracle 12 - AL32UTF8 character set
* Client - NLS_LANG - WE8MSWIN1252
* VARCHAR2 field
SQL Server 2016
* Latin1_General_CI_AS collation
* NVARCHAR field
I'm using DBMS_CRYPTO.HASH on Oracle to generate a checksum of the whole row, then copying to SQL and using HASHBYTES to generate a checksum of the whole row, which I'm then comparing to validate the data matches.
The checksums match for all rows, except those with multibyte characters.
For example, rows with this character: ◦ do not match in the checksums, even though the data is transferred correctly. When I use DUMP in Oracle or convert to VARBINARY in SQL Server the data matches exactly except for the bytes for this character.
In SQL Server, the bytes are 0xE625 and in Oracle they are 0x25E6.
Why are they ordered differently, and is there a reliable way to convert one to the other to ensure the checksum at the other end matches for strings with multi-byte characters?
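A short illustration of the SQL Server side (a sketch; the Oracle DUMP output depends on its own character set and byte order): NVARCHAR is UTF-16 little-endian, so the code point U+25E6 is stored with its bytes swapped, which is exactly the 0xE625 vs 0x25E6 difference described above. Hashing an agreed-upon canonical encoding on both sides before checksumming is one way to make the comparison stable for multi-byte characters.
SELECT CONVERT(VARBINARY(4), N'◦') AS utf16le_bytes,  -- 0xE625
       UNICODE(N'◦')               AS code_point;     -- 9702 = 0x25E6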
HandyD
(10432 rep)
Feb 7, 2019, 03:12 AM
• Last activity: Jun 27, 2023, 05:38 PM
3
votes
2
answers
17998
views
SQL Server error JSON text is not properly formatted. Unexpected character '?' is found at position 21
I am getting the below error while parsing any JSON string that contains special characters or emoji. How can I parse this?
> JSON text is not properly formatted. Unexpected character '?' is found at position 21.
Sample code:
DECLARE @OriginalJSON NVARCHAR(4000), @newvalue varchar(30),@path varchar(20)
Set @OriginalJSON='{
"test2": "ROYD\ロイド",
"test1": 0
}'
Select
@OriginalJSON,
JSON_VALUE(CONVERT(VARCHAR(MAX),@OriginalJSON), '$.test1') AS 'Updated JSON';
Is there any way to parse this data without modifying it? This is just sample code. I need to query from a table and cannot modify the data.
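A hedged sketch of the two details that matter in the repro above (not a statement about the real table data): the literal is assigned without an N prefix and then pushed through CONVERT(VARCHAR(MAX), ...), both of which turn ロイド into ?, and \ロ is not a legal JSON escape in the first place. Keeping the value NVARCHAR end to end and doubling the backslash lets JSON_VALUE parse it:
DECLARE @j NVARCHAR(4000) = N'{ "test2": "ROYD\\ロイド", "test1": 0 }';  -- \\ is a JSON-escaped backslash
SELECT JSON_VALUE(@j, '$.test1') AS test1,
       JSON_VALUE(@j, '$.test2') AS test2;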
Mish
(33 rep)
Feb 26, 2023, 04:50 PM
• Last activity: Feb 27, 2023, 03:29 PM
4
votes
1
answers
3175
views
Importing Chinese characters via SSIS
I'm trying to import a file that contains Chinese characters into a table. The file encoding is Big5 (Traditional).
Here is a sample file like one that I need to import: https://www.pastiebin.com/5d7782d9b63fa
Table where file has to be imported into has such a structure:
create table dbo.Test (
AccountId numeric(18, 0) not null
, Province nvarchar(50) null
, City nvarchar(50) collate Chinese_Hong_Kong_Stroke_90_CI_AI
, Country nvarchar(50) null
, Gender nvarchar(50) null
)
----------
If I import using OPENROWSET/BULK then all the data is transferred correctly:
select AccountId, Province, City, Country, Gender
from openrowset (
    bulk 'C:\chinese_sample.dat'
    , firstrow = 1
    , formatfile = 'C:\chinese_sample.xml'
) t
Here is a format file that I use: https://www.pastiebin.com/5d7783396f9e4
----------
But if I try to import the file using SSIS, then Chinese characters are not parsed correctly.
In the Flat File Source I use DataType string [DT_STR] and CodePage 950. Then I convert it to Unicode string [DT_WSTR].
Here is how the text is imported into the table: as we can see, some characters are parsed correctly and some are not.
What am I missing?
GriGrim
(322 rep)
Sep 10, 2019, 11:50 AM
• Last activity: Jan 23, 2023, 03:11 AM
0
votes
1
answers
544
views
Unable to translate Unicode character \\uDCC9 at index 5 to specified code page
I'm using MariaDB (and Adminer)
> MySQL version: 5.5.5-10.10.2-MariaDB-1:10.10.2+maria~ubu2204 through PHP extension MySQLi
And this is the output of show create table Pages:
CREATE TABLE `Pages` (
  `Id` bigint(20) NOT NULL AUTO_INCREMENT,
  `Guid` uuid NOT NULL DEFAULT uuid(),
  `Name` varchar(400) CHARACTER SET utf8mb3 COLLATE utf8mb3_general_ci NOT NULL,
  `AcceptsComment` bit(1) NOT NULL DEFAULT b'1',
  `Key` varchar(400) CHARACTER SET utf8mb3 COLLATE utf8mb3_general_ci NOT NULL DEFAULT uuid(),
  `IsVital` bit(1) DEFAULT NULL,
  `Slug` varchar(400) CHARACTER SET utf8mb3 COLLATE utf8mb3_general_ci NOT NULL DEFAULT uuid(),
  PRIMARY KEY (`Id`),
  UNIQUE KEY `IX_Pages_Unique_Guid` (`Guid`),
  UNIQUE KEY `IX_Pages_Unique_Key` (`Key`),
  UNIQUE KEY `IX_Pages_Unique_Slug` (`Slug`),
  CONSTRAINT `Ck_Pages_NonEmptyGuid` CHECK (`Guid` <> '00000000-0000-0000-0000-000000000000')
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci
I'm using [Bogus](https://github.com/bchavez/Bogus) to insert some fake data using MySqlBulkCopy
. But when I call bulkCopy.WriteToServer(table)
, I get this error:
> Unable to translate Unicode character \\uDCC9 at index 5 to specified code page.
I realized that [\uDCC9](https://charbase.com/dcc9-unicode-invalid-character) is a � character.
However, I use UTF for all of my columns.
I can't understand why this happens. What have I done wrong?
Saeed Neamati
(1515 rep)
Dec 15, 2022, 12:25 PM
• Last activity: Dec 15, 2022, 06:19 PM
4
votes
1
answers
652
views
Fix corrupted utf8 nvarchar value that was inserted without N prefix
Is it possible to restore the UTF8 data from a corrupted nvarchar that was used without the N prefix?
For example, in the following snippet, I'd like @n1_fixed to have the correct value based on @n1:
declare
@n1 nvarchar(10) = 'Ḿấxiḿứś',
@n2 nvarchar(10) = N'Ḿấxiḿứś';
declare
@n1_fixed nvarchar(10); -- somehow make it have the correct value, based on @n1
select iif(@n1_fixed = @n2, 1, 0)
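A short illustration of why this is generally not recoverable (a sketch, assuming a database whose default collation uses a Latin code page): without the N prefix the literal is interpreted as VARCHAR first, so characters outside the code page are replaced (best-fit or '?') before they ever reach @n1, and the original code points are no longer present to restore.
-- The damage happens at the literal, before assignment to the nvarchar variable.
SELECT 'Ḿấxiḿứś'  AS varchar_literal,   -- already lossy under a Latin code page
       N'Ḿấxiḿứś' AS nvarchar_literal;  -- original characters preserved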
HeyJude
(467 rep)
Oct 20, 2022, 09:16 AM
• Last activity: Oct 20, 2022, 11:28 AM
1
votes
1
answers
15786
views
How to replace multiple unicode characters
I have identified a couple of unicode characters in my database by means of the following query:
SELECT name FROM table WHERE name NOT LIKE '%'+CAST(name AS VARCHAR(1000))+'%'
...with which I get stuff like this:
MORRISON CPA LIMITED E¦¬Ç¦®EŒÆŒƑȨˆÅ¸«Äº‹Å‹™Æ‰€ÆŒ‰E™Å…¬Å¸
Is it possible to get the collation when using a REPLACE function? Something like:
UPDATE table SET name = REPLACE(name,NOT LIKE '%'+CAST(name AS VARCHAR(1000))+'%','')
I know I can individually replace them
REPLACE(name,'Æ','')
but is there a way to replace everything that is unicode?
Edit: I don't know why, but I get fewer rows if I use
SELECT name FROM table WHERE name != CAST(name AS VARCHAR(1000))
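A hedged sketch of one way to strip everything outside the printable ASCII range, using PATINDEX under a binary collation in the same spirit as the script quoted in the question; the variable and sample value are illustrative only, and a table-wide cleanup would need a loop or function around the same expression.
DECLARE @name NVARCHAR(1000) = N'MORRISON CPA LIMITED E¦¬Ç¦®';
-- Remove each character outside the space-to-tilde (0x20-0x7E) range.
WHILE PATINDEX('%[^ -~]%' COLLATE Latin1_General_BIN, @name) > 0
    SET @name = STUFF(@name, PATINDEX('%[^ -~]%' COLLATE Latin1_General_BIN, @name), 1, '');
SELECT @name;  -- 'MORRISON CPA LIMITED E'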
user151179
Jul 3, 2019, 07:03 PM
• Last activity: Aug 24, 2022, 12:08 PM
35
votes
1
answers
49989
views
How do I set a SQL Server Unicode / NVARCHAR string to an emoji or Supplementary Character?
I want to set a Unicode string variable to particular character based on its Unicode code point.
I want to use a code point beyond 65535, but the SQL Server 2008 R2 database has a collation of
SQL_Latin1_General_CP1_CI_AS
.
According to [Microsoft's NCHAR documentation](https://msdn.microsoft.com/en-us/library/ms182673.aspx) , the NCHAR
function takes an integer as follows:
> *integer_expression*
>
> When the collation of the database does not contain the supplementary
> character (SC) flag, this is a positive whole number from 0 through
> 65535 (0 through 0xFFFF). If a value outside this range is specified,
> NULL is returned. For more information about supplementary characters,
> see Collation and Unicode Support.
>
> When the collation of the database supports the supplementary
> character (SC) flag, this is a positive whole number from 0 through
> 1114111 (0 through 0x10FFFF). If a value outside this range is
> specified, NULL is returned.
So this code:
SELECT NCHAR(128512);
Returns NULL
in this database.
I would like it to return the same as this:
SELECT N'😀';
How can I set a Unicode string variable (e.g. nvarchar) to an emoji using code (without using the actual emoji character) in a database where the collation "does not contain the supplementary character (SC) flag"?
[Full list of emoji Unicode code points](http://www.unicode.org/emoji/charts/full-emoji-list.html)
(Ultimately I want any character to work. I just chose emoji for ease of reference.)
(Although the server is SQL Server 2008 R2, I'm also curious about any solutions for later versions.)
Assuming that there is no way, could I reference an inline user-defined function in another database which had an appropriate collation?
How do I find a collation which has the "supplementary character" flag?
This returns no records on our server:
SELECT * FROM sys.fn_helpcollations()
WHERE name LIKE 'SQL%[_]SC';
It seems like SQL Server 2012 introduced Latin1_General_100_CI_AS_SC
which would work. Can you install collations on older instances?
Collation References:
- [Answer to What is the difference between char, nchar, varchar, and nvarchar in SQL Server?](https://stackoverflow.com/a/17335317/2266979)
- [Microsoft's Supplementary Characters Collation Information](https://msdn.microsoft.com/en-us/library/ms143726.aspx#Anchor_2)
- [Microsoft's SQL Server 2008 R2 Collation List](https://msdn.microsoft.com/en-us/library/ms143508(v=sql.105).aspx)
Is there an explanation for why, regardless of collation, SQL Server can understand and deal with the extended characters except from the perspective of NCHAR
?
Riley Major
(1965 rep)
May 25, 2016, 08:02 PM
• Last activity: Jun 24, 2022, 03:35 PM
Showing page 1 of 20 total questions