
Is it possible to use OPENROWSET to import fixed width UTF8 encoded files?

9 votes
3 answers
3773 views
I have an example data file with the following contents, saved with UTF8 encoding.

    oab~opqr
    öab~öpqr
    öab~öpqr

The format of this file is fixed width, with columns 1 to 3 each allocated 1 character and column 4 allocated 5 characters.

I have created an XML format file, myformat_file.xml, for this layout.

Disappointingly, running the following SQL...

    SELECT *
    FROM OPENROWSET
    (
        BULK 'mydata.txt',
        FORMATFILE = 'myformat_file.xml',
        CODEPAGE = '65001'
    ) AS X

...produces the following results

    Col1 Col2 Col3 Col4
    ---- ---- ---- -----
    o    a    b    ~opqr
    �    �    a    b~öp
    �    �    a    b~öp

from which I conclude that LENGTH is counting bytes rather than characters.

Is there any way I can get this working correctly for fixed *character* widths with UTF8 encoding?

(Target environment is Azure SQL Database reading from Blob storage)

NB: It was suggested in the comments that adding COLLATION="LATIN1_GENERAL_100_CI_AS_SC_UTF8" to the FIELD elements might help, but the results remain unchanged with this.
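The byte-versus-character reading of these results can be checked outside SQL Server. The following is a minimal Python sketch, not part of the original setup: the helper name `slice_fixed_width` is hypothetical, and it assumes the ö in the data file is the precomposed character U+00F6 (two bytes in UTF-8) and that a partial byte sequence is displayed as �.

```python
def slice_fixed_width(line, widths=(1, 1, 1, 5)):
    """Split one line into fields twice: by byte widths and by character widths."""
    raw = line.encode("utf-8")
    by_bytes, by_chars = [], []
    pos = 0
    for width in widths:
        # Byte-based slicing can cut a multi-byte character in half;
        # errors="replace" turns each broken fragment into U+FFFD.
        by_bytes.append(raw[pos:pos + width].decode("utf-8", errors="replace"))
        # Character-based slicing is what the fixed-width layout intends.
        by_chars.append(line[pos:pos + width])
        pos += width
    return by_bytes, by_chars


# Assumption: precomposed o-umlaut (U+00F6), as in the second data line.
by_bytes, by_chars = slice_fixed_width("\u00f6ab~\u00f6pqr")
print(by_bytes)
print(by_chars)
```

Under those assumptions the byte-based split gives the same pattern as the second and third result rows (two replacement characters, then `a`, then `b~öp`), while the character-based split gives the intended `ö`, `a`, `b`, `~öpqr`.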
Asked by Martin Smith (87941 rep)
Dec 1, 2021, 01:56 PM
Last activity: Dec 10, 2021, 08:34 PM