Proper way to store lots of links?

0 votes
2 answers
4627 views
I'm doing a small crawl over multiple sites, and I have a lot of links which are represented by IDs (example.com/foo = 354). I'm currently storing the link -> link references and the link text. So in the following table, page "2845" contains a link to 4479 with the text "About Us". Nothing big, just 3NF.

+----------+----------+-----------------+
| url_1_id | url_2_id | text            |
+----------+----------+-----------------+
|     2845 |     4479 | About Us        |
|     2845 |     4480 | Who We Are      |
|     2845 |     4481 | What We Do      |
|     2845 |     4482 | Core Principles |
|     2845 |     4483 | Research Staff  |
+----------+----------+-----------------+

However, by my calculations (most of the pages contain 500-1000 links each), I will have about 6 GB of link data after parsing only 200k pages.

Is there a better way to store link data? In particular, is there a good way to normalize navigation menus that repeat across all pages of a site?
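For reference, here is a rough sketch of the arithmetic behind that estimate. The page count, the 500-1000 links-per-page range, and the 6 GB total come from the figures above; taking the midpoint of 750 links per page and treating 1 GB as 10^9 bytes are assumptions made only for illustration, and the function names are hypothetical.

```python
# Back-of-envelope check of the storage estimate.
# Figures from the question: 200k pages, 500-1000 links per page, ~6 GB total.
# Assumptions (not from the question): midpoint of 750 links/page, 1 GB = 10**9 bytes.

PAGES = 200_000
LINKS_PER_PAGE_LOW = 500
LINKS_PER_PAGE_HIGH = 1_000
TOTAL_BYTES = 6 * 10**9


def total_rows(pages: int, links_per_page: int) -> int:
    """One row per (source page, target page, link text) triple."""
    return pages * links_per_page


def bytes_per_row(total_bytes: int, rows: int) -> float:
    """Average storage cost per link row implied by the total."""
    return total_bytes / rows


low_rows = total_rows(PAGES, LINKS_PER_PAGE_LOW)
high_rows = total_rows(PAGES, LINKS_PER_PAGE_HIGH)
mid_rows = total_rows(PAGES, (LINKS_PER_PAGE_LOW + LINKS_PER_PAGE_HIGH) // 2)

print(f"rows: {low_rows:,} to {high_rows:,} (midpoint {mid_rows:,})")
print(f"implied bytes per row at midpoint: {bytes_per_row(TOTAL_BYTES, mid_rows):.0f}")
```

Under those assumptions the table holds somewhere between 100 and 200 million rows, and 6 GB works out to roughly 40 bytes per row at the midpoint, so the row count rather than the row width is what drives the size.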
Asked by Xeoncross (109 rep)
Jan 5, 2017, 08:47 PM
Last activity: Oct 18, 2021, 08:35 AM