
Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

0 votes
1 answer
963 views
How to check user settings in ClickHouse
I have a user created by SQL command. I know there is the getSetting() function to check user settings, e.g.:
SELECT getSetting('async_insert');
But how to check other users' settings if you are a DBA? Is there any view/function/request for this purpose?
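A minimal sketch of where a DBA could look, assuming a reasonably recent ClickHouse with SQL-driven access control (some_user is a placeholder): settings attached to a user are visible via SHOW CREATE USER and in the system.settings_profile_elements system table.

```sql
-- Show a user's definition, including any SETTINGS clause
-- (needs access-management privileges):
SHOW CREATE USER some_user;

-- Per-user setting overrides are also exposed in a system table
-- ('some_user' is a placeholder):
SELECT setting_name, value, min, max
FROM system.settings_profile_elements
WHERE user_name = 'some_user';
```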
Mikhail Aksenov (430 rep)
Dec 12, 2023, 02:20 PM • Last activity: Aug 6, 2025, 09:06 AM
0 votes
1 answer
35 views
Clickhouse - Oracle ODBC connection error
I am trying to create a connection between my Oracle and ClickHouse databases, so I can query Oracle through ClickHouse like this: `SELECT * FROM odbc('DSN=OracleODBC-21', 'sys', 'test')`. I have successfully installed unixODBC, Oracle Instant Client, and the Oracle ODBC driver for the client. I also configured my `.odbc.ini` and `.ini` files, so I can access Oracle:
[oracle@host ~]$ isql -v OracleODBC-21
+---------------------------------------+
| Connected!                            |
...
SQL> select * from sys.test;
+-----------------------------------------+-----------------------------------------------------------------------------------------------------+
| ID                                      | DATA                                                                                                |
+-----------------------------------------+-----------------------------------------------------------------------------------------------------+
| 0                                       | 123                                                                                                 |
+-----------------------------------------+-----------------------------------------------------------------------------------------------------+
User `clickhouse` can also do this, but with some envs:
[oracle@host ~]$ sudo -u clickhouse bash -c "export LD_LIBRARY_PATH=/opt/oracle/instantclient_21_19; isql -v OracleODBC-21"
+---------------------------------------+
| Connected!                            |
...
But when I try to query Oracle in ClickHouse:
host :) select * from odbc('DSN=OracleODBC-21','sys','test');

SELECT *
FROM odbc('DSN=OracleODBC-21', 'sys', 'test')

Query id: d263cc54-bd51-4a97-94c0-085177149947


Elapsed: 9.529 sec.

Received exception from server (version 25.6.2):
Code: 86. DB::Exception: Received from localhost:9000. DB::HTTPException. DB::HTTPException: Received error from remote server http://127.0.0.1:9018/columns_info?use_connection_pooling=1&version=1&connection_string=DSN%3DOracleODBC-21&schema=sys&table=test&external_table_functions_use_nulls=1 . HTTP status code: 500 'Internal Server Error', body length: 267 bytes, body: 'Error getting columns from ODBC 'std::exception. Code: 1001, type: nanodbc::database_error, e.what() = contrib/nanodbc/nanodbc/nanodbc.cpp:1275: IM004: [unixODBC][Driver Manager]Driver's SQLAllocHandle on SQL_HANDLE_HENV failed  (version 25.1.5.31 (official build))'
'. (RECEIVED_ERROR_FROM_REMOTE_IO_SERVER)
I will be grateful for any advice.
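A hedged guess based on the symptoms above, not a confirmed fix: isql works for the clickhouse user only when LD_LIBRARY_PATH is exported, and IM004 on SQL_HANDLE_HENV is what the Oracle driver raises when it cannot load its libraries. The clickhouse-odbc-bridge process spawned by the server inherits the service environment, so one thing to try (assuming systemd and the Instant Client path from the question):

```bash
# Add the Instant Client library path to the clickhouse-server service
# environment, so the odbc-bridge it spawns can load the Oracle driver.
sudo systemctl edit clickhouse-server
# In the drop-in that opens, add:
#   [Service]
#   Environment="LD_LIBRARY_PATH=/opt/oracle/instantclient_21_19"
sudo systemctl restart clickhouse-server
```

An alternative with the same effect is registering the path system-wide via a file in /etc/ld.so.conf.d/ followed by ldconfig.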
pashkin5000 (101 rep)
Jul 22, 2025, 05:58 AM • Last activity: Jul 22, 2025, 06:14 PM
2 votes
1 answer
763 views
Correlated subqueries. Count Visits after Last Purchase Date
I'm pretty new to SQL and have been trying to solve this task for a while... still no luck. I would appreciate it if someone here could help me out.

I have a database with columns:

- ClientID
- VisitID
- Date
- PurchaseID (array)
- etc.

What I'm trying to achieve is to retrieve a list containing the following data:

- ClientID
- Last Visit Date
- First Visit Date
- Last Purchase Date
- Visits Count
- Purchases Count
- Visits After Last Purchase Count

Retrieving a value for Visits After Last Purchase Count is where I am stuck.

```sql
SELECT ClientID, FirstVisit, LastVisit, LastPurchaseDate, Visits, Purchases, VisitsAfterPurchase
FROM
(
    SELECT
        h.ClientID,
        max(h.Date) AS LastVisit,
        min(h.Date) AS FirstVisit,
        count(VisitID) AS Visits
    FROM s7_visits AS h
    WHERE Date > '2017-12-01'
    GROUP BY h.ClientID
    LIMIT 100
)
ANY LEFT JOIN
(
    SELECT
        d.ClientID,
        max(d.Date) AS LastPurchaseDate,
        sum(length(d.PurchaseID)) AS Purchases,
        sum((
            SELECT count(x.VisitID)
            FROM s7_visits AS x
            WHERE x.ClientID = d.ClientID
            HAVING x.Date >= max(d.Date)
        )) AS VisitsAfterPurchase
    FROM s7_visits AS d
    WHERE (length(PurchaseID) > 0) AND (Date > '2017-12-01')
    GROUP BY d.ClientID
) USING (ClientID)
```

The database system I'm using is Yandex ClickHouse. The USING syntax is absolutely normal for ClickHouse; it is used instead of the ON clause in other RDBMSs. This query is giving me the following error:

> DB::Exception: Column Date is not under aggregate function and not in GROUP BY.

Sample data:

+----------+---------+------------+------------+
| ClientID | VisitID | Date       | PurchaseID |
+----------+---------+------------+------------+
| 123      | 136     | 01.12.2017 |            |
| 123      | 522     | 05.12.2017 |            |
| 123      | 883     | 08.12.2017 |            |
| 123      | 293     | 09.12.2017 | ['345']    |
| 123      | 278     | 12.12.2017 |            |
| 123      | 508     | 12.12.2017 |            |
| 123      | 562     | 15.12.2017 |            |
| 123      | 523     | 21.12.2017 |            |
| 456      | 736     | 29.11.2017 |            |
| 456      | 417     | 03.12.2017 |            |
| 456      | 950     | 04.12.2017 |            |
| 456      | 532     | 05.12.2017 | ['346']    |
| 456      | 880     | 09.12.2017 |            |
| 456      | 296     | 12.12.2017 |            |
| 456      | 614     | 15.12.2017 |            |
+----------+---------+------------+------------+

And the result should be:

+----------+-----------------+------------------+--------------------+--------------+-----------------+----------------------------------+
| ClientID | Last Visit Date | First Visit Date | Last Purchase Date | Visits Count | Purchases Count | Visits After Last Purchase Count |
+----------+-----------------+------------------+--------------------+--------------+-----------------+----------------------------------+
| 123      | 21.12.2017      | 01.12.2017       | 09.12.2017         | 8            | 1               | 4                                |
| 456      | 15.12.2017      | 29.11.2017       | 05.12.2017         | 7            | 1               | 3                                |
+----------+-----------------+------------------+--------------------+--------------+-----------------+----------------------------------+
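A hedged sketch of one way around the error, not the asker's method: since ClickHouse dislikes the correlated subquery, compute each client's last purchase date in a subquery first and then count later visits with countIf(). Table and column names are taken from the question; everything else is an assumption.

```sql
SELECT
    ClientID,
    max(Date) AS LastVisit,
    min(Date) AS FirstVisit,
    any(LastPurchaseDate) AS LastPurchaseDate,
    count(VisitID) AS Visits,
    any(Purchases) AS Purchases,
    -- strict ">" matches the expected output (4 and 3 in the sample)
    countIf(Date > LastPurchaseDate) AS VisitsAfterPurchase
FROM s7_visits
ANY LEFT JOIN
(
    SELECT
        ClientID,
        max(Date) AS LastPurchaseDate,
        sum(length(PurchaseID)) AS Purchases
    FROM s7_visits
    WHERE length(PurchaseID) > 0
    GROUP BY ClientID
) USING (ClientID)
GROUP BY ClientID
```

For clients with no purchases, the joined LastPurchaseDate falls back to a default value, so VisitsAfterPurchase would count all their visits; a filter or if() guard may be needed in that case.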
Edgard Gomez Sennovskaya (21 rep)
Dec 25, 2017, 07:40 PM • Last activity: Apr 24, 2025, 06:02 AM
0 votes
0 answers
23 views
PeerDB Initial Snapshot Performance Impact on Standby PostgreSQL
I have set up a Change Data Capture (CDC) pipeline using PeerDB to mirror tables from a PostgreSQL standby read replica to ClickHouse.

- The PostgreSQL database contains terabytes of data.
- The initial snapshot of the existing data needs to be loaded into ClickHouse.
- PeerDB is configured to pull from the standby read replica.

Questions:

1. How long will the initial snapshot take? Are there any benchmarks or estimations based on database size?
2. Will the initial snapshot affect the standby PostgreSQL server's performance? Since it is a read replica, will PeerDB's snapshot queries (e.g., COPY, SELECT * FROM) put significant load on it? Would it impact replication lag from the primary database? (See the monitoring sketch after this list.)
3. Are there any best practices to optimize the initial snapshot process to minimize impact on the standby server?
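Not an answer to the sizing question, but a small sketch of how the replication-lag concern in question 2 could be watched while the snapshot runs (standard PostgreSQL functions; nothing here is PeerDB-specific):

```sql
-- On the standby: approximate apply lag.
SELECT now() - pg_last_xact_replay_timestamp() AS replay_lag;

-- On the primary: per-standby lag in bytes.
SELECT application_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;
```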
Tselmen Tugsbayar (1 rep)
Mar 17, 2025, 01:43 AM • Last activity: Mar 17, 2025, 06:12 AM
3 votes
2 answers
334 views
More efficient accumulator in SQL?
I'm writing a ledger system where every transaction can have multiple classifications. For example, if someone purchases a widget for $50, I can categorize that transaction as having an account of "Revenue" and an SKU of "SKU1". Users can then select the dimensions they wish to report on, and I can generate aggregates.

When my database has 10M+ transactions, the following query is prohibitively slow: after about 10 s I receive a Memory limit exceeded error on my 8 GB laptop. Hence the question: I don't actually care about the individual rows, I only care about the accumulation of these values. In my test, I only expect about 10 rows returned after aggregation.

Here is a fiddle: http://sqlfiddle.com/#!17/4a7d8/10/0
select
   year,
   sum(amount),
   t1.value as account,
   t2.value as sku
from 
    transactions 
left join
    tags t1 on transactions.id = t1.transaction_id and t1.name ='account'
left join
    tags t2 on transactions.id = t2.transaction_id and t2.name = 'sku'
group by
    year,
    t1.value,
    t2.value;
Here is the query plan:
Expression ((Projection + Before ORDER BY))
  Aggregating
    Expression (Before GROUP BY)
      Join (JOIN)
        Expression ((Before JOIN + (Projection + Before ORDER BY)))
          Join (JOIN)
            Expression (Before JOIN)
              ReadFromMergeTree (default.transactions)
            Expression ((Joined actions + (Rename joined columns + (Projection + Before ORDER BY))))
              ReadFromMergeTree (default.tags)
        Expression ((Joined actions + (Rename joined columns + (Projection + Before ORDER BY))))
          ReadFromMergeTree (default.tags)
And, finally, here is the schema:
CREATE TABLE default.transactions
(
    id Int32,
    date Date,
    amount Float32
)
ENGINE = MergeTree
PRIMARY KEY id
ORDER BY id
SETTINGS index_granularity = 8192

CREATE TABLE default.tags
(
    transaction_id Int32,
    name String,
    value String,
    INDEX idx_tag_value value TYPE set(0) GRANULARITY 4,
    INDEX idx_tag_name name TYPE set(0) GRANULARITY 4
)
ENGINE = MergeTree
PRIMARY KEY (transaction_id, name)
ORDER BY (transaction_id, name)
SETTINGS index_granularity = 8192
My questions are:

- Is there a different schema, or a different set of ClickHouse features, I might use?
- Should I instead pre-compute aggregates? (See the sketch after this list.)
- Is there a different DB which can perform this kind of calculation more efficiently?
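On the pre-aggregation question: a minimal sketch, assuming the schema above, of a SummingMergeTree rollup kept up to date by a materialized view. The target name tx_by_account_sku and the toYear(date) bucketing are my assumptions; note that a materialized view fires only on inserts into transactions and sees only the newly inserted block, so tags rows must already be present when their transaction arrives.

```sql
-- Rollup table: one row per (year, account, sku), summed on merge.
CREATE TABLE default.tx_by_account_sku
(
    year UInt16,
    account String,
    sku String,
    amount Float64
)
ENGINE = SummingMergeTree
ORDER BY (year, account, sku);

-- Feeds the rollup on every insert into transactions.
CREATE MATERIALIZED VIEW default.mv_tx_by_account_sku
TO default.tx_by_account_sku AS
SELECT
    toYear(date) AS year,
    t1.value AS account,
    t2.value AS sku,
    sum(amount) AS amount
FROM default.transactions
LEFT JOIN default.tags AS t1
    ON transactions.id = t1.transaction_id AND t1.name = 'account'
LEFT JOIN default.tags AS t2
    ON transactions.id = t2.transaction_id AND t2.name = 'sku'
GROUP BY year, account, sku;
```

Reports would then read from default.tx_by_account_sku with a plain GROUP BY, touching roughly the 10 expected result rows instead of 10M+ joined rows.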
poundifdef (141 rep)
Jun 10, 2022, 01:15 PM • Last activity: Mar 11, 2025, 05:02 AM
0 votes
0 answers
242 views
Calculate the sum of minutes between statuses Clickhouse
There is a table in ClickHouse that is constantly updated, format:
date_time           | shop_id | item_id | status | balance
---------------------------------------------------------------
2022-09-09 13:00:01 | abc     | 1234    | 0      | 0
2022-09-09 13:00:00 | abc     | 1234    | 1      | 3
2022-09-09 12:50:00 | abc     | 1234    | 1      | 10
The table stores statuses and balances for each item_id; when the balance changes, a new record with status, time, and balance is added. If the balance = 0, the status changes to 0. I need to calculate how much time (how many minutes) each item_id in the shop was available during the day. The status may change several times a day. Please help me calculate this.
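A sketch of one approach, assuming ClickHouse 21.x+ window functions; the table name status_log is hypothetical, and intervals still open at the end of a day would need to be capped at midnight in real use:

```sql
SELECT
    shop_id,
    item_id,
    toDate(date_time) AS day,
    sumIf(minutes_in_status, status = 1) AS minutes_available
FROM
(
    SELECT
        shop_id,
        item_id,
        date_time,
        status,
        -- minutes until the next status change for this item
        dateDiff('minute', date_time,
            leadInFrame(date_time)
                OVER (PARTITION BY shop_id, item_id ORDER BY date_time
                      ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
        ) AS minutes_in_status
    FROM status_log
)
GROUP BY shop_id, item_id, day
-- NB: the last row per item has no successor, so leadInFrame() returns a
-- zero default there; that open interval needs special handling.
```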
Kirill_K (1 rep)
Sep 12, 2022, 09:59 AM • Last activity: Sep 3, 2024, 11:17 AM
-1 votes
1 answer
606 views
How to create pre-computed tables in order to speed up queries
One of the issues I am currently encountering is that we have some very large tables (>10 million rows). When we reference these large tables or create joins, queries are extremely slow. One hypothesis for solving the issue is to create pre-computed tables, where the computation for the use cases is done in advance, so that instead of referencing the raw data we query the pre-computed table.

Are there any resources on how to implement this? Do we only use MySQL, or can we also use Pandas or other such modules to accomplish the same? Which is the optimal way?
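Since the question mentions MySQL: a minimal sketch of the pre-computed-table idea there; all table and column names are hypothetical.

```sql
-- Summary table holding one pre-aggregated row per day.
CREATE TABLE daily_sales_summary (
    sale_date   DATE PRIMARY KEY,
    order_count INT NOT NULL,
    revenue     DECIMAL(12,2) NOT NULL
);

-- Refresh periodically (cron, or a MySQL EVENT); REPLACE overwrites
-- existing days so the job is idempotent.
REPLACE INTO daily_sales_summary
SELECT DATE(created_at), COUNT(*), SUM(total)
FROM orders
GROUP BY DATE(created_at);
```

Reports then query daily_sales_summary instead of the raw table; Pandas can do the same computation, but keeping the rollup in the database avoids shipping 10M+ rows to the client on every refresh.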
databasequestion (1 rep)
Sep 7, 2022, 01:49 PM • Last activity: Sep 7, 2022, 05:30 PM
-1 votes
1 answer
228 views
ClickHouse MV is not working perfectly as I need
I'm new to ClickHouse and having an issue with an MV. I have a records table which is the data source; I'm inserting all the data there. Then I created another table called adv_general_report using the mv_adv_general_report materialized view. This is my schema. Records table data. The odd part is that after inserting data into the records table, the sum of impressions is correctly added to both adv_general_report and the mv_adv_general_report materialized view, but views and clicks always show zero. You can see this by running this query, which shows the amount of views:

SELECT sum(views) AS views FROM records;

But if you run:

SELECT sum(views) AS views FROM adv_general_report;

it's 0. Also, the SELECT query used for the materialized view shows the sum of views perfectly. Any idea why?
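A hedged guess, since the schema is only linked as an image: a very common cause of exactly this symptom is the MV's SELECT not producing column names that match the target table, in which case the unmatched target columns are filled with their defaults (zero). A sketch of the pattern with explicit aliases (all names assumed from the question):

```sql
CREATE MATERIALIZED VIEW mv_adv_general_report
TO adv_general_report AS
SELECT
    inventory_id,
    sum(impression) AS impression, -- without "AS impression" the output column
    sum(clicks) AS clicks,         -- would be named "sum(clicks)", so the
    sum(views) AS views            -- target's clicks/views stay at their default 0
FROM records
GROUP BY inventory_id;
```

The other classic cause: a materialized view only transforms rows inserted after it was created, so pre-existing data in records never reaches adv_general_report.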
Aniruddha Chakraborty (101 rep)
Aug 30, 2021, 02:39 PM • Last activity: Aug 31, 2021, 08:44 PM
1 votes
1 answer
649 views
How to backup clickhouse over SSH?
In postgreSQL, I usually run this command to backup and compress (since my country have really low bandwidth) from server to local: ``` mkdir -p tmp/backup ssh sshuser@dbserver -p 22 "cd /tmp; pg_dump -U dbuser -Fc -C dbname | xz - -c" \ | pv -r -b > tmp/backup/db_backup_`date +%Y-%m-%d_%H%M%S`.sql....
In PostgreSQL, I usually run this command to back up and compress (since my country has really low bandwidth) from server to local:
mkdir -p tmp/backup
ssh sshuser@dbserver -p 22 "cd /tmp; pg_dump -U dbuser -Fc -C dbname | xz - -c" \
 | pv -r -b > tmp/backup/db_backup_`date +%Y-%m-%d_%H%M%S`.sql.xz
and to restore:
fname=`ls -w 1 tmp/backup/*sql.xz | tail -n 1`
echo $fname

echo "select 'drop table \"' || tablename || '\" cascade;' from pg_tables WHERE schemaname = 'public';" |
psql -U dbuser |
 tail -n +3 |
 head -n 2 |
 psql -U dbuser

# sudo -u postgres dropdb dbname
# sudo -u postgres createdb --owner dbuser dbname
xzcat $fname | pg_restore  --clean --if-exists --no-acl --no-owner -U dbuser -d dbname
How do I do a similar thing in ClickHouse (backup, compressing on the fly to a file)?
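A per-table sketch mirroring the PostgreSQL flow above, using clickhouse-client's native format over SSH; dbname.tablename is a placeholder, and this covers data only (recreate the table from its SHOW CREATE TABLE output before restoring):

```bash
mkdir -p tmp/backup

# Backup one table, compressing on the fly on the server side:
ssh sshuser@dbserver -p 22 \
  "clickhouse-client --query='SELECT * FROM dbname.tablename FORMAT Native' | xz -c" \
  | pv -r -b > tmp/backup/tablename_`date +%Y-%m-%d_%H%M%S`.native.xz

# Restore (after recreating the table):
fname=`ls -w 1 tmp/backup/*.native.xz | tail -n 1`
xzcat $fname | clickhouse-client --query="INSERT INTO dbname.tablename FORMAT Native"
```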
Kokizzu (1403 rep)
Jul 29, 2021, 11:48 AM • Last activity: Aug 30, 2021, 01:58 PM
0 votes
0 answers
53 views
How do I design a schema with a proper DB engine to accumulate data for this need in ClickHouse or any other database?
We're a new adtech company and I was planning to design a database where I'll pull all the data into a single table and then make new tables with materialized views for others to generate multiple reports. Say we have inventory, impressions, and views, for multiple reasons.

Our main table looks like this; to recreate it:

CREATE TABLE report.empty_summing
(
    times DateTime64,
    inventory_id String,
    city Nullable(String),
    country Nullable(String),
    inventory Int32 DEFAULT 0,
    impression Int32 DEFAULT 0,
    views Int32 DEFAULT 0
)
ENGINE = SummingMergeTree()
PRIMARY KEY inventory_id;

When a request comes from Google ADX to our ad engine, it has a unique id, "inventory_id", and other parameters like country, city, and other string-type parameters are inserted. When the 3 types of data are inserted, every row has its own values, but I want them merged.

Our inventory request insert looks like this:

INSERT INTO report.empty_summing (times, inventory_id, country, city, inventory, impression, views) VALUES (now(), '7120426e6abd0b04ec8c777460a78bdf4b9de0', 'Bangladesh', 'Dhaka', 1, 0, 0);

Our impression insert looks like this:

INSERT INTO report.empty_summing (times, inventory_id, impression) VALUES (now(), '7120426e6abd0b04ec8c777460a78bdf4b9de0', 1);

Our view insert looks like this:

INSERT INTO report.empty_summing (times, inventory_id, views) VALUES (now(), '7120426e6abd0b04ec8c777460a78bdf4b9de0', 1);

You can see that "inventory_id" is the same for all these rows. Is there any DB engine or any technique I can use so that the data is merged into a single row per inventory_id? Help is much appreciated. Thanks in advance!
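A hedged sketch of the usual approach, not a definitive answer: SummingMergeTree does combine rows with the same sorting key, but only during background merges, which run at unspecified times. So the reliable pattern is to aggregate explicitly at query time and treat the engine's merging as a storage optimization:

```sql
-- Aggregate at read time; merges are eventual, so never assume they ran.
SELECT
    inventory_id,
    any(country) AS country,      -- aggregate functions skip NULLs, so the
    any(city) AS city,            -- values from the inventory row survive
    sum(inventory) AS inventory,
    sum(impression) AS impression,
    sum(views) AS views
FROM report.empty_summing
GROUP BY inventory_id;
```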
Aniruddha Chakraborty (101 rep)
Aug 24, 2021, 10:20 AM • Last activity: Aug 24, 2021, 10:48 AM
1 votes
1 answer
3083 views
Clickhouse OPTIMIZE performance for deduplication
I want to try and understand the performance of the OPTIMIZE query in ClickHouse. I am planning on using it to remove duplicates right after a bulk insert into a MergeTree, hence I have the options of:

OPTIMIZE TABLE db.table DEDUPLICATE

or

OPTIMIZE TABLE db.table FINAL DEDUPLICATE

I understand that the first statement only deduplicates the insert if it hasn't already merged, whereas the second will do it to the whole table. However, I am concerned about performance; from a dirty analysis of OPTIMIZE TABLE db.table FINAL DEDUPLICATE on tables of different sizes, I can see it getting worse than linearly as the table gets bigger (0.1 s for 0.1M rows, 1 s for 0.3M rows, 12 s for 10M rows). I assume OPTIMIZE TABLE db.table DEDUPLICATE scales with the insert size rather than the table size, so it should be more performant? Can anyone point to some literature on these performance characteristics?

In addition, do these problems go away if I replace the table with a ReplacingMergeTree? I imagine the same process happens under the hood, so it doesn't matter either way.
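A small sketch of the ReplacingMergeTree route mentioned at the end (table and column names are hypothetical): duplicates with the same ORDER BY key collapse during normal background merges, and FINAL deduplicates at read time, so no table-wide OPTIMIZE is needed after each bulk insert:

```sql
CREATE TABLE db.events
(
    id UInt64,
    payload String,
    version UInt32
)
ENGINE = ReplacingMergeTree(version)  -- keeps the row with the max version
ORDER BY id;

-- Read-time deduplication, without waiting for merges:
SELECT * FROM db.events FINAL WHERE id = 42;
```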
AmyChodorowski (113 rep)
Aug 19, 2021, 11:47 AM • Last activity: Aug 23, 2021, 04:40 AM
1 votes
1 answer
1690 views
Mounting Clickhouse data directory to another partition: DB::Exception: Settings profile `default` not found
I'm trying to move the ClickHouse data directory to another partition, /dev/sdb1. So here's what I've done:
sudo systemctl stop clickhouse-server
mv /var/lib/clickhouse /var/lib/clickhouse-orig
mkdir /var/lib/clickhouse
chown clickhouse:clickhouse /var/lib/clickhouse
mount -o user /dev/sdb1 /var/lib/clickhouse 
cp -Rv /var/lib/clickhouse-orig/* /var/lib/clickhouse/
chown -Rv clickhouse:clickhouse /var/lib/clickhouse
sudo systemctl start clickhouse-server
but it shows an error when it starts:
Processing configuration file '/etc/clickhouse-server/config.xml'.
Sending crash reports is disabled
Starting ClickHouse 21.6.4.26 with revision 54451, build id: 12B138DBA4B3F1480CE8AA18884EA895F9EAD439, PID 10431
starting up
OS Name = Linux, OS Version = 5.4.0-1044-gcp, OS Architecture = x86_64
Calculated checksum of the binary: 26864E69BE34BA2FCCE2BD900CF631D4, integrity check passed.
Setting max_server_memory_usage was set to 882.18 MiB (980.20 MiB available * 0.90 max_server_memory_usage_to_ram_ratio)
DB::Exception: Settings profile default not found
shutting down
Stop SignalListener thread
**EDIT**: apparently it doesn't start even without the new partition, so probably the config.xml or the macro.xml is the culprit.
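Not a confirmed fix, just a hedged checklist: the `default` settings profile comes from users.xml (or, for SQL-created entities, from the access/ directory inside the data path), so the error usually means the server can no longer read one of those after the move. Assuming the paths from the question:

```bash
# SQL-created users/roles/profiles live here; it must have survived the copy
# and be readable by the clickhouse user:
ls -l /var/lib/clickhouse/access

# Check which users config the server expects:
grep -n "users_config\|users.xml" /etc/clickhouse-server/config.xml

# rsync -a preserves ownership/permissions more faithfully than cp -Rv:
sudo rsync -a /var/lib/clickhouse-orig/ /var/lib/clickhouse/
```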
Kokizzu (1403 rep)
Jun 15, 2021, 07:50 AM • Last activity: Jun 15, 2021, 08:35 AM
1 votes
1 answer
1988 views
Clickhouse Replication without Sharding
How do I set up replication (1 master, 2 slaves, for example) in ClickHouse without sharding? All the examples I can find always include sharding: - [Altinity Presentation](https://www.slideshare.net/Altinity/introduction-to-the-mysteries-of-clickhouse-replication-by-robert-hodges-and-altinity-engine...
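A minimal sketch of the idea (names and macros are assumptions): ClickHouse replication is configured per table via ReplicatedMergeTree plus ZooKeeper, and sharding is entirely optional; with a single shard, every replica holds all the data.

```sql
-- On each of the three nodes, with {shard} = '01' everywhere and {replica}
-- set to that node's name in its macros config:
CREATE TABLE db.events
(
    id UInt64,
    ts DateTime
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}')
ORDER BY id;
```

Writes to any replica propagate to the others; there is no master/slave distinction, which is the closest ClickHouse gets to "1 master, 2 slaves".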
Kokizzu (1403 rep)
May 31, 2021, 08:52 AM • Last activity: May 31, 2021, 09:44 AM
0 votes
1 answer
117 views
Is Azure Managed Disks enough to ensure high-durability for a database?
I want to set up a database in a high-durability setup on Azure. I've previously relied on DB-as-a-service offerings, but can't do that in this case, so I'd like your feedback on the plan below. Is this enough to ensure reliable storage of data?

1) An Azure Web App takes in metric data from the web, does some minor processing and sampling, and sends the data in batches to VM2.
2) VM2 runs the ClickHouse database and stores data on an Azure Managed Disk.
3) Some periodic job takes snapshots of the disk using ClickHouse's built-in backup functionality and stores them to cold storage.

The periodic backup is meant to mitigate human error, i.e. accidentally running "DROP TABLE xx" on the wrong data. The big question is whether managed disks are an acceptable substitute for database replication, to ensure data durability. Azure Managed Disks are advertised as being a very durable form of storage, with built-in triple-redundant replication, and as good for database use. It seems that this should be enough to take away any concerns about data loss due to hardware failure. Is this correct? Do you see any potential problems with this?

The recovery plan is that if VM2 fails, some monitoring process catches this and spins up a new VM2 instance attached to the same managed disk. The Web App similarly restarts if it fails. I understand that this setup isn't high-availability; if a VM fails there will be some window of time before it is able to store new data. This is acceptable to me. But I want to ensure that data that gets stored will not be lost, i.e. is durably stored with very high probability. Is this enough to ensure that? Do you see any problems?
ServableSoup (3 rep)
Apr 5, 2021, 11:50 AM • Last activity: Apr 5, 2021, 12:27 PM
2 votes
1 answer
151 views
In what cases is using ClickHouseDb and the like a necessity?
An open source project for website analytics: https://github.com/plausible/analytics. They use PostgreSQL and ClickHouseDb. When it comes to web analytics, there are tons of events that need to be tracked. From the point of view of the database, why is using ClickHouseDb in this project a necessity? Why wouldn't PostgreSQL, which is a relational database, do alone? Yes, ClickHouseDb has been created specifically for analytical processing. But still, why wouldn't PostgreSQL **alone** do? Are PostgreSQL, MySQL and the like incapable of handling lots of inserts that occur simultaneously?
kosmosu05 (23 rep)
Aug 22, 2020, 05:02 AM • Last activity: Aug 22, 2020, 02:07 PM
0 votes
1 answer
2120 views
Clickhouse create database structure for json data
New to ClickHouse and stuck on the database creation structure for importing JSON data which is nested. Take for example the JSON data that looks like the following when it is populated:
"FirewallMatchesActions": [
    "allow"
  ],
  "FirewallMatchesRuleIDs": [
    "1234abc"
  ],
  "FirewallMatchesSources": [
    "firewallRules"
  ],
or
"FirewallMatchesActions": [
    "allow",
    "block"
  ],
  "FirewallMatchesRuleIDs": [
    "1234abc",
    "1235abb"
  ],
  "FirewallMatchesSources": [
    "firewallRules"
  ],
but there may be JSON data which doesn't have them populated:
"FirewallMatchesActions": [],
  "FirewallMatchesRuleIDs": [],
  "FirewallMatchesSources": [],
What would the ClickHouse database creation structure look like?
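A minimal sketch under the assumption that these three keys are what matters (the table name firewall_logs is hypothetical): JSON arrays of strings map to Array(String), and an empty JSON array simply becomes an empty ClickHouse array, so both populated and unpopulated records fit the same columns.

```sql
CREATE TABLE firewall_logs
(
    FirewallMatchesActions Array(String),
    FirewallMatchesRuleIDs Array(String),
    FirewallMatchesSources Array(String)
)
ENGINE = MergeTree
ORDER BY tuple();

-- Import would then use the JSONEachRow input format, e.g.:
-- clickhouse-client --query="INSERT INTO firewall_logs FORMAT JSONEachRow" < data.json
```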
p4guru (296 rep)
Jun 12, 2020, 01:52 AM • Last activity: Jun 12, 2020, 06:14 AM
3 votes
3 answers
283 views
Continuously move data from server A to Server B while deleting the data in server A?
I'm developing an ad server that is expected to handle on the order of a billion ad impressions/clicks per day. The most difficult challenge I am facing is moving data from one server to another. Basically the flow is like this:

1. Multiple front-facing load balancers distribute traffic (HTTP load balancing) to several servers called traffic handler nodes.
2. These traffic handler nodes store the click logs in a MySQL table (data like geo, device, offer id, user id, etc.) and then redirect traffic to the offer landing page.
3. Every minute a cron job runs on all traffic nodes which transfers the click logs to the reporting server (the server where all report generation is done) in batches of 10000 rows per minute, and then deletes the data after confirming that it was successfully received by the reporting server. The reporting server uses the ClickHouse database engine.

I need to replace the MySQL database engine on the traffic nodes, as I'm facing a lot of issues with MySQL. Between the heavy inserts and then the heavy deletes it's getting slow. Plus, the data is being transferred via cron job, so there is a 2-minute average delay. I can't use ClickHouse on these servers, as Yandex ClickHouse does not support updates and deletes yet and the click logs are supposed to be updated many times (how many events happened on the visit). I'm looking at Kafka, but again I'm not sure how to achieve one-way data transfer and then deletion of data. Maybe my whole approach is wrong. I would be very grateful for any expert to guide me in the right direction.
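On the Kafka idea: a hedged sketch of the common pattern (every name here is hypothetical). Traffic nodes publish click logs to a Kafka topic; ClickHouse consumes the topic through a Kafka engine table plus a materialized view, so the traffic nodes never store or delete anything themselves, and Kafka's retention policy handles cleanup:

```sql
-- Kafka consumer table: rows become readable as they arrive on the topic.
CREATE TABLE clicks_queue
(
    ts DateTime,
    offer_id UInt32,
    user_id String,
    geo String,
    device String
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'broker:9092',
         kafka_topic_list = 'clicks',
         kafka_group_name = 'ch_clicks',
         kafka_format = 'JSONEachRow';

-- Durable storage on the reporting side.
CREATE TABLE clicks
(
    ts DateTime,
    offer_id UInt32,
    user_id String,
    geo String,
    device String
)
ENGINE = MergeTree
ORDER BY (ts, offer_id);

-- Continuously drains the queue into the MergeTree table.
CREATE MATERIALIZED VIEW clicks_consumer TO clicks AS
SELECT * FROM clicks_queue;
```

Updates to a click (extra events on the visit) would arrive as new rows and be folded at query time or via a CollapsingMergeTree, since ClickHouse is not built for in-place updates.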
Sourabh Swarnkar (33 rep)
Mar 19, 2018, 05:28 PM • Last activity: Mar 21, 2018, 07:10 PM