wget can't seem to download a renderable result, especially embedded images
I want to spider **plants.usda.gov** before it disappears due to budget cuts.
However, every combination of wget options I try produces a local copy that renders as a blank page.
I also checked archive.org, and there too the entries for **plants.usda.gov** are all blank.
One random example of many:
- https://web.archive.org/web/20250203022537/https://plants.usda.gov/
I watched the network tab in Chrome to look for all possible servers that might be involved in fulfilling requests and added them to the --domains argument, but something is still missing.
Here is my example wget command:
wget --adjust-extension --continue --convert-links \
--header 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:71.0) Gecko/20100101 Firefox/71.0' \
--mirror --no-clobber --page-requisites --random-wait --recursive --rejected-log=./rejected.txt \
--span-hosts --timestamping --verbose --wait 1 \
-D=dap.digitalgov.gov,gis.sc.egov.usda.gov,js.arcgis.com,nrcsgeoservices.sc.egov.usda.gov,plants.sc.egov.usda.gov,plants.usda.gov,plantsservices.sc.egov.usda.gov,server.arcgisonline.com \
-e robots=off \
https://plants.usda.gov/plant-profile/ARHI3
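For reference, the host list passed to -D does not have to be copied out of the network tab by hand. The following is only a rough sketch, assuming the network tab's requests were exported as a HAR file (JSON in which each entry records its request URL). The function name `har_domains` and the file name `plants.har` are made up for illustration, and the pattern matching is a plain text scan, not a real JSON parser.

```shell
#!/bin/sh
# Hypothetical helper (not part of wget): print the unique hostnames found in
# the "url" fields of a HAR file as one comma-separated line, which is the
# format wget's --domains option expects.
har_domains() {
    # 1. Pull out every  "url": "http(s)://host[:port]  prefix.
    # 2. Strip the key, the scheme, and any trailing port number.
    # 3. De-duplicate (C locale keeps the ordering predictable).
    # 4. Join the remaining lines with commas.
    grep -Eo '"url": *"https?://[^/"]+' "$1" |
        sed -E 's#^"url": *"https?://##; s#:[0-9]+$##' |
        LC_ALL=C sort -u |
        paste -sd, -
}
```

With that in place, the long -D argument could be written as `--domains="$(har_domains plants.har)"`. This only automates building the host list; it does not change what wget fetches from those hosts.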
I can see it downloading fonts and javascript and such so it can render the page, but the resulting local copy renders as blank.
Also, the actual image of the plant does not appear anywhere in the downloaded files.
For the example, this image should be present in the result:
- 

Asked by slashdottir
(169 rep)
Apr 28, 2025, 05:51 PM
Last activity: May 8, 2025, 09:02 AM