optimizing MySQL for traffic analytics system
1 vote · 1 answer · 790 views
**Background:**
I've developed a URL-shortener system like Bitly with the same features, so the system also tracks clicker info and presents it as graphs (analytics data) to the person who shortened the link.
Currently I'm using MySQL and have a table that stores click info with this schema:

    visit_id (int)
    ip (int)
    date (datetime)
    country
    browser
    device
    os
    referrer (varchar)
    url_id (int) -- foreign key to the shortened URL

For now, only the `url_id` field has an index.
The system should present click analytics for whatever time period the user wants, e.g. the past hour, the past 24 hours, the past month, and so on.
For example, to generate the graphs for the past month, I run the following queries:
    SELECT DAY(date) AS period, COUNT(*)
    FROM (
        SELECT *
        FROM visits
        WHERE url_id = '$url_id'
    ) AS URL
    WHERE date > DATE_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 MONTH)
    GROUP BY DAY(date)

    -- another query to display clicker browsers in this period
    -- another query to display clicker countries in this period
    -- ...
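As a side note on the query shape above: the derived-table wrapper is unnecessary, since both filters can live in one `WHERE` clause. A minimal rewrite (same result, one pass, no materialized subquery; the literal `url_id` value is just an example):

```sql
-- flattened version of the monthly graph query
SELECT DAY(date) AS period, COUNT(*) AS clicks
FROM visits
WHERE url_id = 123
  AND date > DATE_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 MONTH)
GROUP BY DAY(date);
```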
**Issues:**
- For a shortened link with about 500,000 clicks, the first query alone takes about 3-4 seconds to calculate, so the full set of queries takes about 10-12 seconds, which is terrible.
- Running such queries needs a lot of memory and CPU.
**Questions:**
1- How can I improve and optimize the structure so that analytics for high-traffic links are shown in less than 1 second (like Bitly and similar web apps), with less CPU and RAM usage? Should I make an index on the fields `date`, `country`, `browser`, `device`, `os`, `referrer`? If yes, how do I do that for the `date` field, given that I sometimes need to group clicks by `DAY(date)`, sometimes by `HOUR(date)`, sometimes by `MINUTE(date)`, and so on?
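For the `date` part of question 1: as far as I know, the grouping granularity doesn't need its own index, because MySQL applies `DAY()`/`HOUR()`/`MINUTE()` only after the rows have been filtered. A composite index starting with `url_id` and then `date` (a hypothetical index, not in the schema above) can serve the equality filter and the date range with a single range scan, whatever the `GROUP BY` granularity:

```sql
-- one composite index backs all granularities
ALTER TABLE visits ADD INDEX idx_url_date (url_id, date);

-- only the GROUP BY expression changes between graphs
SELECT HOUR(date) AS period, COUNT(*) AS clicks
FROM visits
WHERE url_id = 123
  AND date > DATE_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
GROUP BY HOUR(date);
```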
2- Is MySQL suitable for this application? Assume that at maximum my application should handle 100 million links and 10 billion clicks on them in total. Should I consider switching to a NoSQL solution, for example?
3- If MySQL is OK, are my database design and table structure proper and well designed for my application's needs, or do you have better recommendations and suggestions?
**UPDATE:** I made an index on the `referrer` column, but it didn't help at all and even hurt performance; I think that's because of the column's low cardinality (the others are similar) and the large resulting index size relative to my server's RAM.
I now think indexing these columns would not solve my problem, so my idea is one of these:
1- If staying with MySQL, maybe generating statistics for high-traffic links in a background process is better than calculating them live at request time.
2- Using a caching solution like memcached to help MySQL with high-traffic links.
3- Using a NoSQL database such as MongoDB, and solutions like MapReduce, which I am only poorly familiar with and have never used.
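To make option 1 concrete, one common shape is a per-link, per-hour rollup table that a background job keeps up to date; the table and column names here are illustrative, not part of the original schema:

```sql
-- hypothetical rollup table: one row per link per hour
CREATE TABLE visit_stats_hourly (
    url_id     INT NOT NULL,
    hour_start DATETIME NOT NULL,
    clicks     INT NOT NULL DEFAULT 0,
    PRIMARY KEY (url_id, hour_start)
);

-- background job: fold the last hour's raw clicks into the rollup
INSERT INTO visit_stats_hourly (url_id, hour_start, clicks)
SELECT url_id,
       DATE_FORMAT(date, '%Y-%m-%d %H:00:00'),
       COUNT(*)
FROM visits
WHERE date >= DATE_SUB(NOW(), INTERVAL 1 HOUR)
GROUP BY url_id, DATE_FORMAT(date, '%Y-%m-%d %H:00:00')
ON DUPLICATE KEY UPDATE clicks = clicks + VALUES(clicks);
```

The graph queries would then read the small rollup table instead of scanning the raw click rows on every request.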
What do you think?
Asked by Aliweb
(146 rep)
Jul 11, 2013, 07:18 PM
Last activity: Apr 27, 2015, 12:29 AM