Sample Header Ad - 728x90

load text file into db using temp variable as place holder for complex column value

0 votes
1 answer
246 views
Assume I have a text file with tab terminated fields as below. Where value in row m, column n as vmn. Vmn is a string which could wrapped in double quota. Generally this string variable could have any readable char such as '\' or '"', such as
"Bob the "king""TAB"c:\path\to\file"TAB
Here TAB is tab char '\t' They are not as
"Bob the \"king\""TAB"c:\\path\\to\\file"TAB
header1 header2 header3 ,,headerk,,,headern
v11     v12     v13     ,,v1k    ,,,v1n


...
vm1     vm2     vm3     ,,vmk    ,,,vmn
...
Now I also have a table as create as
|column1  |column2  |,,|columnv  |,,,|columnw  |
------------------------------------------------
t1        t2         ,,tv         ,,,tw
w is not equal to n, w could larger than or smaller than n. tk = function(vi1, vi2,...) Let us look at a real sample of text file.
"Bob the "King""      01/01/1010     "$100"     "1000"      "12/12/1090"
"Jenny the Queen"     11/11/1030     "$10.2"    "100"       "11/02/1070"
I use a slow way to load the data into table as
CREATE TABLE IF NOT EXISTS STATE
(
     KeyID INT UNSIGNED auto_increment primary key,
     Names               VARCHAR(32),
     DOBs                VARCHAR(16),
     Incomes             VARCHAR(16),
     Propertys           VARCHAR(16),       
     Deaths              VARCHAR(16),
     Name                VARCHAR(16),
     Title               VARCHAR(16),
     Ages                INT,
     Income              Double,
     Property            Double,
     DOB                 Date,
     Death               Date,
     Age                 INT,
     Value               Double 
);

LOAD DATA LOCAL INFILE 'state.tsv' IGNORE INTO TABLE STATE
COLUMNS TERMINATED BY '\t'  OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(
 Names,
 DOBs,
 Incomes,
 Propertys,       
 Deaths,
) 
SET
   Name   = SUBSTRING(Names, 0, POSITION("the" IN Names)),INSTR (ori_str, sub_str)
   Title  = CASE 
               WHEN INSTR (Names, "King") THEN "King" 
               WHEN INSTR (Names, "Queen") THEN "Queen" 
               ELSE "No Title" 
            END,
   DOB    = STR_TO_DATE(case when DOBs='' then null else DOBs end, '%m/%d/%Y' ),
   Death  = STR_TO_DATE(case when Deaths='' then null else Deaths end, '%m/%d/%Y' ),
   Income = NULLIF(Incomes,''),
   Property = NULLIF(Propertys,''),   
   Age = Death - DOB,
   Value = MyFunction(Income, Property);
Then I run an alter sql to drop columns like Names, Titles,DOBs, Deaths, Incomes, Propertys. Here MyFunction is a user function to calculate the value. This method is not smart and slow. I would like to have temp variables to hold the strings, then convert them in to table columns. No extra columns to drop afterwards. Any suggestion?
Asked by chans (1 rep)
Aug 11, 2021, 12:36 AM
Last activity: May 23, 2025, 06:05 PM