
Optimizing MySQL for a traffic analytics system

1 vote
1 answer
790 views
**Background:** I've developed a URL shortener like Bitly with the same features, so the system also tracks information about each click and presents it as graphs (analytics data) to the person who shortened the link. I'm currently using MySQL, with one table that stores click info using this schema:

    visit_id (int)
    ip (int)
    date (datetime)
    country
    browser
    device
    os
    referrer (varchar)
    url_id (int)  // foreign key to the shortened URL

For now, only the url_id field has an index.

The system should show click analytics for whatever time period the user wants, for example the past hour, the past 24 hours, the past month, and so on. To generate the graphs for the past month, for instance, I run the following queries:

    SELECT all DAY(date) AS period, COUNT( * )
    FROM ( SELECT * FROM visits WHERE url_id = '$url_id' ) AS URL
    WHERE DATE > DATE_SUB( CURRENT_TIMESTAMP( ) , INTERVAL 1 MONTH )
    GROUP BY DAY( DATE )

    //another query to display clicker browsers in this period
    //another query to display clicker countries in this period
    // ...

**Issues:**

- For a shortened link with about 500,000 clicks, the first query alone takes about 3-4 seconds, so all the queries together take about 10-12 seconds, which is terrible.
- These queries need a lot of memory and CPU.

**Questions:**

1- How can I improve and optimize the structure so that the analytics for high-traffic links are shown in less than 1 second (like Bitly and similar web apps), with less CPU and RAM usage? Should I create indexes on the fields date, country, browser, device, os, and referrer? If so, how should I do that for the date field, given that I sometimes group clicks by DAY(date), sometimes by HOUR(date), sometimes by MINUTE(date), and so on?
2- Is MySQL suitable for this application? Assume that at most my application should handle 100 million links and 10 billion clicks on them in total. Should I consider switching to a NoSQL solution, for example?
3- If MySQL is OK, are my database design and table structure well suited to my application's needs, or do you have better recommendations and suggestions?

**UPDATE:** I created an index on the referrer column, but it didn't help at all and actually hurt performance. I think that's because of the low cardinality of this column (and of the others) and because the resulting index is large relative to my server's RAM.

I don't think indexing these columns will solve my problem. My ideas are the following:

1- If using MySQL, maybe generating statistics with background processing for high-traffic links is better than calculating them live at the user's request.
2- Using a caching solution like memcached to help MySQL with high-traffic links.
3- Using a NoSQL database such as MongoDB and techniques like Map-Reduce, which I'm not very familiar with and have never used.

What do you think?
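To make idea 1 from the update more concrete, here is a minimal sketch of what a background-built statistics table might look like. It is only an illustration under several assumptions: Python's built-in sqlite3 stands in for MySQL so the example runs anywhere, the `visits_hourly` table and both function names are hypothetical, and the rollup is rebuilt from scratch rather than updated incrementally as a real background job probably would.

```python
import sqlite3


def build_hourly_rollup(conn):
    """Rebuild per-URL, per-hour click counts from the raw visits table.

    Assumes a `visits` table with at least `url_id` and `date` columns.
    `visits_hourly` is a hypothetical summary table, not part of the
    original schema.
    """
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS visits_hourly (
            url_id     INTEGER NOT NULL,
            hour_start TEXT    NOT NULL,  -- 'YYYY-MM-DD HH:00:00'
            clicks     INTEGER NOT NULL,
            PRIMARY KEY (url_id, hour_start)
        )
        """
    )
    # Full rebuild for simplicity; this is the expensive step that would
    # run in the background instead of at request time.
    conn.execute("DELETE FROM visits_hourly")
    conn.execute(
        """
        INSERT INTO visits_hourly (url_id, hour_start, clicks)
        SELECT url_id, strftime('%Y-%m-%d %H:00:00', date), COUNT(*)
        FROM visits
        GROUP BY url_id, strftime('%Y-%m-%d %H:00:00', date)
        """
    )
    conn.commit()


def clicks_per_day(conn, url_id, since):
    """Return [(day, clicks), ...] for one URL, read from the rollup only."""
    cursor = conn.execute(
        """
        SELECT substr(hour_start, 1, 10), SUM(clicks)
        FROM visits_hourly
        WHERE url_id = ? AND hour_start >= ?
        GROUP BY substr(hour_start, 1, 10)
        ORDER BY 1
        """,
        (url_id, since),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (url_id INTEGER, date TEXT)")
    conn.executemany(
        "INSERT INTO visits (url_id, date) VALUES (?, ?)",
        [
            (1, "2013-07-10 09:15:00"),
            (1, "2013-07-10 09:45:00"),
            (1, "2013-07-11 08:00:00"),
        ],
    )
    build_hourly_rollup(conn)
    print(clicks_per_day(conn, 1, "2013-07-01 00:00:00"))
```

The point of the sketch is that the graph query then reads at most one row per link per hour instead of one row per click, and daily totals can be derived from the hourly rows. Per-minute graphs would need a finer-grained table or a fallback to the raw data. In MySQL the bucketing expression would be written differently (for example with `DATE_FORMAT` instead of `strftime`).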
Asked by Aliweb (146 rep)
Jul 11, 2013, 07:18 PM
Last activity: Apr 27, 2015, 12:29 AM