MySQL database cluster vs. replication setup for large crawler indexes
0
votes
1
answer
230
views
What is the most efficient way to set up MySQL clustering or replication for an application that has a crawler/indexing engine performing thousands of reads/writes, plus a separate front end for public access?
It is kind of like building a mini Google search engine. The database that holds the latest crawl data has to be kept separate so that it does not impact the performance of the publicly accessed database. Eventually, however, the two databases need to be reconciled so that the publicly accessed database has all the latest data.
So my idea is that the crawler DB would be the master, while the publicly accessed database would be the slave. Edit/insert actions against the publicly accessed database would then really need to be performed on the master DB (the same one the crawler writes to).
Is this the best way to do such a thing? Is there a better setup?
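For reference, here is a minimal sketch of the classic asynchronous master/slave setup being proposed, assuming MySQL 5.7/8.0 with GTIDs enabled on both servers; the server IDs, hostname, and credentials below are placeholders, not values from the question:

```
# my.cnf on the master (crawler DB) -- hypothetical values
[mysqld]
server-id                = 1
log_bin                  = mysql-bin
binlog_format            = ROW
gtid_mode                = ON
enforce_gtid_consistency = ON

# my.cnf on the slave (public DB)
[mysqld]
server-id                = 2
read_only                = ON    # ordinary users cannot write directly to the slave
gtid_mode                = ON
enforce_gtid_consistency = ON
```

```sql
-- Run on the slave: point replication at the master and start it
CHANGE MASTER TO
  MASTER_HOST = 'crawler-db.internal',   -- placeholder hostname
  MASTER_USER = 'repl',                  -- placeholder replication account
  MASTER_PASSWORD = 'repl_password',     -- placeholder credential
  MASTER_AUTO_POSITION = 1;              -- GTID auto-positioning
START SLAVE;
```

With this in place, the crawler writes only to the master, the public front end reads from the slave, and any user-initiated edits are routed to the master and flow back down through replication.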
More information:
Currently there is a great deal of reading, updating, and inserting happening at a very fast pace while the crawler is running.
For example, the crawler could send 10 businesses, each with a location and a list of items, every 2-3 seconds. For each business, the receiving server then needs to:

1. Search the DB to see whether the business already exists (out of a few hundred thousand records); if not, create it.
2. Search whether that location already exists for that business (out of half a million records); if not, create it.
3. Loop through each of the business's items and check whether it exists (out of 40+ million records); update it if so, create it if not.

Old records are backed up and then deleted.
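As an aside on that write pattern: if unique keys exist on the natural identifiers, each search-then-create step can be collapsed into a single upsert on the master, saving a round trip per row and avoiding races between concurrent crawler writers. A minimal sketch with hypothetical table and column names (none of these come from the question):

```sql
-- Hypothetical schema:
--   businesses(id PK AUTO_INCREMENT, name, UNIQUE(name))
--   locations(id PK AUTO_INCREMENT, business_id, address, UNIQUE(business_id, address))
--   items(id PK AUTO_INCREMENT, business_id, sku, price, UNIQUE(business_id, sku))

-- Insert the business, or fetch the existing id on a duplicate key.
-- The LAST_INSERT_ID(id) idiom makes LAST_INSERT_ID() return the existing row's id.
INSERT INTO businesses (name)
VALUES ('Acme Hardware')
ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id);
SET @business_id = LAST_INSERT_ID();

-- Same pattern for the location.
INSERT INTO locations (business_id, address)
VALUES (@business_id, '123 Main St')
ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id);

-- Items: create if missing, otherwise update in place.
INSERT INTO items (business_id, sku, price)
VALUES (@business_id, 'SKU-001', 9.99)
ON DUPLICATE KEY UPDATE price = VALUES(price);
```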
Asked by zen
(121 rep)
Jan 26, 2020, 07:21 PM
Last activity: Jun 5, 2025, 09:07 PM