wget can't seem to download a renderable result, especially embedded images

wget

I want to spider **plants.usda.gov** before it disappears due to budget cuts. However, every wget combination I try ultimately produces a blank page. I also checked archive.org, and the entries for **plants.usda.gov** there are all blank too. One random example of many:

- https://web.archive.org/web/20250203022537/https://plants.usda.gov/

I watched the network tab in Chrome to find every server that might be involved in fulfilling requests and added them to the --domains argument. Here is my example wget command:

wget --adjust-extension --continue --convert-links \
  --header 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:71.0) Gecko/20100101 Firefox/71.0' \
  --mirror --no-clobber --page-requisites --random-wait --recursive --rejected-log=./rejected.txt \
  --span-hosts --timestamping --verbose --wait 1  \
  -D=dap.digitalgov.gov,gis.sc.egov.usda.gov,js.arcgis.com,nrcsgeoservices.sc.egov.usda.gov,plants.sc.egov.usda.gov,plants.usda.gov,plantsservices.sc.egov.usda.gov,server.arcgisonline.com \
  -e robots=off \
  https://plants.usda.gov/plant-profile/ARHI3
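
A side note on the domain list: the wget manual documents this option as `-D domain-list` or `--domains=domain-list`, so with `-D=...` the leading `=` probably ends up as part of the first host name. The following is a minimal sketch, assuming a POSIX shell, that builds the comma-separated list from the hosts in the command above and prints the resulting command rather than running it. The variable names `hosts` and `domain_list` are illustrative, and `--no-clobber` is left out on the assumption that it conflicts with the timestamping that `--mirror` enables. This only tidies the host filter; it is not a fix for the blank rendering.

```shell
#!/bin/sh
# Hosts seen in the browser's network tab, one per line (copied from the command above).
hosts='dap.digitalgov.gov
gis.sc.egov.usda.gov
js.arcgis.com
nrcsgeoservices.sc.egov.usda.gov
plants.sc.egov.usda.gov
plants.usda.gov
plantsservices.sc.egov.usda.gov
server.arcgisonline.com'

# Join the lines with commas, the separator wget expects in a domain list.
domain_list=$(printf '%s\n' "$hosts" | paste -sd, -)

# Print the command instead of running it, so the list can be checked first.
# --mirror already turns on --recursive and --timestamping.
echo wget --mirror --page-requisites --span-hosts \
  --adjust-extension --convert-links --wait 1 --random-wait \
  --rejected-log=./rejected.txt -e robots=off \
  "--domains=$domain_list" \
  https://plants.usda.gov/plant-profile/ARHI3
```

Dropping the leading `echo` would run the command for real.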

I can see it downloading fonts, JavaScript, and so on so that it can render the page, but the resulting local copy renders as blank. Also, the actual image of the plant does not appear anywhere in the downloaded files. For this example, the following image should be present in the result, but it is missing:

[image of some kind of wheat]

Asked by slashdottir (169 rep)

Apr 28, 2025, 05:51 PM
Last activity: May 8, 2025, 09:02 AM
