Proper way to store lots of links?
0 votes | 2 answers | 4627 views
I'm doing a small crawl over multiple sites, and I have a lot of links which are represented by IDs (example.com/foo = 354).
I'm currently storing the link -> link references and the link text. So in the table below, page 2845 contains a link to page 4479 with the text "About Us". Nothing big, just 3NF.
+----------+----------+-----------------+
| url_1_id | url_2_id | text            |
+----------+----------+-----------------+
|     2845 |     4479 | About Us        |
|     2845 |     4480 | Who We Are      |
|     2845 |     4481 | What We Do      |
|     2845 |     4482 | Core Principles |
|     2845 |     4483 | Research Staff  |
+----------+----------+-----------------+
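For reference, the table behind that output is defined roughly like this (the table name and column types here are just my shorthand, MySQL-ish syntax; the exact types aren't the point):

-- Current link table, roughly (names/types are illustrative):
CREATE TABLE page_link (
    url_1_id INT UNSIGNED NOT NULL,  -- id of the page the link appears on
    url_2_id INT UNSIGNED NOT NULL,  -- id of the page the link points to
    text     VARCHAR(255) NOT NULL   -- anchor text ("About Us", etc.)
);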
However, by my calculations (most pages contain 500-1000 links each), I'll have about 6GB of link data by the time I've parsed only 200k pages.
Is there a better way to store link data? In particular, is there a good way to normalize the navigation menus that repeat on every page of a site?
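One idea I've been toying with (just a sketch; all names and types below are invented, MySQL-ish syntax): hash the set of links that makes up a navigation block, store that block once, and have each page reference the block instead of repeating every one of its rows.

-- Sketch: store a repeated navigation block once, let pages point at it.
CREATE TABLE link_block (
    id         INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    block_hash BINARY(20) NOT NULL,       -- e.g. SHA-1 of the ordered (url_id, text) pairs
    UNIQUE KEY (block_hash)
);

CREATE TABLE link_block_item (
    block_id INT UNSIGNED NOT NULL,       -- which block this row belongs to
    url_id   INT UNSIGNED NOT NULL,       -- page the link points to
    text     VARCHAR(255) NOT NULL,       -- anchor text
    position SMALLINT UNSIGNED NOT NULL,  -- order of the link within the block
    PRIMARY KEY (block_id, position)
);

CREATE TABLE page_link_block (
    page_id  INT UNSIGNED NOT NULL,       -- page the block was seen on
    block_id INT UNSIGNED NOT NULL,       -- the shared block
    PRIMARY KEY (page_id, block_id)
);

With that, a 1,000-link menu shared by every page of a site would cost 1,000 rows once plus one row per page, instead of 1,000 rows per page. But I'm not sure whether this is overkill or whether there's a more standard approach.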
Asked by Xeoncross (109 rep)
Jan 5, 2017, 08:47 PM
Last activity: Oct 18, 2021, 08:35 AM