
wget -i: notifying when each file finishes (for processing purposes)

1 vote
1 answer
511 views
I'd like to be able to process *multiple* files downloaded by `wget -i` immediately after each one is downloaded, instead of waiting for the entire list to finish (i.e., for the whole wget process to exit). The trouble is that because wget downloads each file in place, I cannot be sure when a file is safe to process (fully downloaded).

Ideally, the principled approach is (I believe) to have wget initially download files into a temporary directory and then `mv` them into the actual destination directory when complete. Because the `mv` is atomic*, I can guarantee that any file present in the destination directory is completely downloaded and ready for processing. I've been through the man page, but can't seem to find anything to this end.

My current hacky approach is to use `fuser` to check whether wget no longer has the file open (a rough sketch of this polling loop is below). But this is very fragile (what if wget opens a file multiple times?), and I'd like to avoid it.

If there isn't a way to achieve this exactly, is there a workaround that can achieve the same effect? The files are HTML pages, if that's at all relevant.

*Addendum: Apparently [mv may not be atomic](https://unix.stackexchange.com/a/322074/8506) (although it is in my environment), but I don't think strict atomicity is needed. The only requirement is that once a file is renamed into the destination directory, it is completely downloaded (and the complete contents are immediately available at the new path).

Edit: Splitting the process up into multiple wget commands is also not ideal, because it precludes using some core features of wget (rate limiting, HTTP keep-alive, DNS caching, etc.).
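For reference, this is roughly what my current `fuser`-based hack looks like. Names like `dest` and `process_file` are placeholders for my actual setup, and this runs alongside the `wget -i` process:

```bash
#!/usr/bin/env bash
# Poll the download directory and treat a file as finished once no process
# (i.e. wget) has it open anymore. Fragile for the reasons noted above:
# if wget closes and reopens a file, it could be processed too early.

dest=./downloads          # hypothetical directory that wget -i writes into

process_file() {          # placeholder for the real per-file processing
    printf 'processing %s\n' "$1"
}

declare -A seen           # files we've already handed off
while :; do
    for f in "$dest"/*; do
        [ -f "$f" ] || continue
        [ -n "${seen[$f]}" ] && continue
        # fuser exits 0 if some process still has the file open
        if ! fuser -s "$f" 2>/dev/null; then
            process_file "$f"
            seen[$f]=1
        fi
    done
    sleep 1
done
```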
Asked by Bailey Parker (330 rep)
May 15, 2019, 02:13 PM
Last activity: May 15, 2019, 03:01 PM