Sample Header Ad - 728x90

How to store a bunch of columns and where you don't know how many there will be, the columns change, and new columns are introduced

0 votes
1 answer
163 views
So the so source of the data is the internet, and random bots that submit values to my apache server. Periodically new columns appear, but they are not always present. The other problem is then you have a bunch of NULLs which you don't want to store at all if possible. Later, I would statistically analyze the data, but we are just trying to store it here. Currently there are 99 columns, with a minimum of 27 and a max of 59, but the max will probably go up slowly. I am currently using a single master table, but the downside is there are tons of NULLs stored for each record. Lots of wasted space. Presently the most common value is 64 NULLs per row. So I had an idea, and was wondering is anyone can improve on it. You need a master table, and every submitted column has a lookup table with an index and a value. master index num_columns table_index 1 27 1 There would be table for simplicity named tables27 to table100 table27 would be index , columns_names_csv, col1_index,col2_index,...col27_index table28 would be much the same except index, columns_names_csv, col1_index,col2_index,...col28_index All the indexes would be integers used to point to the correct values in the corresponding lookup table. Since they would all be integers there would be no need to worry about data types. So in addition to the master table and 99 (and growing) lookup tables you would have tables27 through, currently, 59 but that number could eventually increase. Is there a better way to do this? For fun here's the table +-----------------------------------+---------------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +-----------------------------------+---------------------+------+-----+---------+----------------+ | id | int(11) | NO | PRI | NULL | auto_increment | | lstREMOTE_ADDR | int(11) | NO | | NULL | | | lstCONTENT_LENGTH | int(11) | YES | | NULL | | | lstREQUEST_TIME_FLOAT | datetime(4) | NO | | NULL | | | lstUNIQUE_ID | varchar(128) | NO | | NULL | | | lstSERVER_PORT | int(5) | NO | | NULL | | | lstREMOTE_PORT | int(5) | NO | | NULL | | | lstREDIRECT_UNIQUE_ID | int(10) unsigned | YES | | NULL | | | lstREDIRECT_REQUEST_METHOD | int(10) unsigned | YES | | NULL | | | lstREDIRECT_STATUS | int(10) unsigned | YES | | NULL | | | lstHTTP_HOST | int(10) unsigned | YES | | NULL | | | lstHTTP_USER_AGENT | int(10) unsigned | YES | | NULL | | | lstHTTP_ACCEPT | int(10) unsigned | YES | | NULL | | | lstHTTP_ACCEPT_LANGUAGE | int(10) unsigned | YES | | NULL | | | lstHTTP_ACCEPT_ENCODING | int(10) unsigned | YES | | NULL | | | lstHTTP_CONNECTION | int(10) unsigned | YES | | NULL | | | lstHTTP_COOKIE | int(10) unsigned | YES | | NULL | | | lstHTTP_UPGRADE_INSECURE_REQUESTS | int(10) unsigned | YES | | NULL | | | lstHTTP_CACHE_CONTROL | int(10) unsigned | YES | | NULL | | | lstPATH | tinyint(3) unsigned | YES | | NULL | | | lstSERVER_SIGNATURE | tinyint(3) unsigned | YES | | NULL | | | lstSERVER_SOFTWARE | int(10) unsigned | YES | | NULL | | | lstSERVER_NAME | int(10) unsigned | YES | | NULL | | | lstSERVER_ADDR | int(10) unsigned | YES | | NULL | | | lstDOCUMENT_ROOT | tinyint(3) unsigned | YES | | NULL | | | lstREQUEST_SCHEME | int(10) unsigned | YES | | NULL | | | lstCONTEXT_PREFIX | int(10) unsigned | YES | | NULL | | | lstCONTEXT_DOCUMENT_ROOT | int(10) unsigned | YES | | NULL | | | lstSERVER_ADMIN | int(10) unsigned | YES | | NULL | | | lstSCRIPT_FILENAME | tinyint(3) unsigned | YES | | NULL | | | lstREDIRECT_URL | int(10) unsigned | YES | | NULL | | | lstGATEWAY_INTERFACE | int(10) unsigned | YES | | NULL | | | lstSERVER_PROTOCOL | int(10) unsigned | YES | | NULL | | | lstREQUEST_METHOD | int(10) unsigned | YES | | NULL | | | lstQUERY_STRING | int(10) unsigned | YES | | NULL | | | lstREQUEST_URI | int(10) unsigned | YES | | NULL | | | lstSCRIPT_NAME | tinyint(3) unsigned | YES | | NULL | | | lstPHP_SELF | int(10) unsigned | YES | | NULL | | | lstREDIRECT_spider | int(10) unsigned | YES | | NULL | | | lstspider | int(10) unsigned | YES | | NULL | | | lstHTTP_PRAGMA | int(10) unsigned | YES | | NULL | | | lstHTTP_FROM | int(10) unsigned | YES | | NULL | | | lstREDIRECT_QUERY_STRING | int(10) unsigned | YES | | NULL | | | lstHTTP_REFERER | int(10) unsigned | YES | | NULL | | | lstHTTP_DNT | int(10) unsigned | YES | | NULL | | | lstHTTP_X_REQUESTED_WITH | int(10) unsigned | YES | | NULL | | | lstCONTENT_TYPE | int(10) unsigned | YES | | NULL | | | lstREDIRECT_ERROR_NOTES | int(10) unsigned | YES | | NULL | | | lstHTTP_X_FORWARDED_FOR | int(10) unsigned | YES | | NULL | | | lstHTTP_IF | int(10) unsigned | YES | | NULL | | | lstREDIRECT_HTTPS | int(10) unsigned | YES | | NULL | | | lstREDIRECT_SSL_TLS_SNI | int(10) unsigned | YES | | NULL | | | lstHTTPS | int(10) unsigned | YES | | NULL | | | lstSSL_TLS_SNI | int(10) unsigned | YES | | NULL | | | lstPHP_AUTH_USER | int(10) unsigned | YES | | NULL | | | lstPHP_AUTH_PW | int(10) unsigned | YES | | NULL | | | lstHTTP_AMP_CACHE_TRANSFORM | int(10) unsigned | YES | | NULL | | | lstHTTP_ACCEPT_CHARSET | int(10) unsigned | YES | | NULL | | | lstHTTP_CLIENT_IP | int(10) unsigned | YES | | NULL | | | lstHTTP_X_REMOTE_IP | int(10) unsigned | YES | | NULL | | | lstHTTP_X_ORIGINATING_IP | int(10) unsigned | YES | | NULL | | | lstHTTP_X_REMOTE_ADDR | int(10) unsigned | YES | | NULL | | | lstHTTP_X_REAL_IP | int(10) unsigned | YES | | NULL | | | lstHTTP_REVERSE_VIA | int(10) unsigned | YES | | NULL | | | lstHTTP_X_VARNISH | int(10) unsigned | YES | | NULL | | | lstHTTP_X_UA_COMPATIBLE | int(10) unsigned | YES | | NULL | | | lstHTTP_X_POWERED_BY | int(10) unsigned | YES | | NULL | | | lstHTTP_TE | int(10) unsigned | YES | | NULL | | | lstHTTP_X_PIPER_ID | int(10) unsigned | YES | | NULL | | | lstHTTP_X_UCBROWSER_UA | int(10) unsigned | YES | | NULL | | | lstHTTP_X_WAP_PROFILE | int(10) unsigned | YES | | NULL | | | lstHTTP_X_EBO_UA | int(10) unsigned | YES | | NULL | | | lstHTTP_X_OPERAMINI_FEATURES | int(10) unsigned | YES | | NULL | | | lstHTTP_X_OPERAMINI_PHONE | int(10) unsigned | YES | | NULL | | | lstHTTP_X_OPERAMINI_ROUTE | int(10) unsigned | YES | | NULL | | | lstHTTP_DEVICE_STOCK_UA | int(10) unsigned | YES | | NULL | | | lstHTTP_X_OPERAMINI_PHONE_UA | int(10) unsigned | YES | | NULL | | | lstHTTP_FORWARDED | int(10) unsigned | YES | | NULL | | | lstHTTP_X_ACCEL_INTERNAL | int(10) unsigned | YES | | NULL | | | lstHTTP_WAP_CONNECTION | int(10) unsigned | YES | | NULL | | | lstHTTP_X_CONTENT_OPT | int(10) unsigned | YES | | NULL | | | lstHTTP_KEEP_ALIVE | int(10) unsigned | YES | | NULL | | | lstHTTP_APPID | int(10) unsigned | YES | | NULL | | | lstHTTP_CLIENTID | int(10) unsigned | YES | | NULL | | | lstHTTP_DID | int(10) unsigned | YES | | NULL | | | lstHTTP_PLATFORM | int(10) unsigned | YES | | NULL | | | lstHTTP_RISKUDID | int(10) unsigned | YES | | NULL | | | lstHTTP_SIGNTYPE | int(10) unsigned | YES | | NULL | | | lstHTTP_TRACKERID | int(10) unsigned | YES | | NULL | | | lstHTTP_WORKSPACEID | int(10) unsigned | YES | | NULL | | | lstHTTP_VIA | int(10) unsigned | YES | | NULL | | | lstHTTP_PROXY_CONNECTION | int(10) unsigned | YES | | NULL | | | lstHTTP_DEPTH | int(10) unsigned | YES | | NULL | | | lstHTTP_TRANSLATE | int(10) unsigned | YES | | NULL | | | lstHTTP_MIME_VERSION | int(10) unsigned | YES | | NULL | | | lstHTTP_X_VERMEER_CONTENT_TYPE | int(10) unsigned | YES | | NULL | | | lstHTTP_X_CNECTION | int(10) unsigned | YES | | NULL | | | lstHTTP_EXPECT | int(10) unsigned | YES | | NULL | | | lstHTTP_IF_MODIFIED_SINCE | int(10) unsigned | YES | | NULL | | +-----------------------------------+---------------------+------+-----+---------+----------------+
Asked by cybernard (151 rep)
Aug 24, 2019, 03:29 PM
Last activity: Jul 12, 2025, 10:01 AM