using wget to download all audio files (over 100,000 pages on wikia)
2 votes · 1 answer · 3158 views
I am trying to download all audio files from Wookieepedia, the Star Wars wiki.
My first thought was something like this:
wget -r -A -nd .mp3 .ogg http://starwars.wikia.com/wiki/
This should download every .mp3 and .ogg file from the wiki without recreating the site's directory hierarchy (-nd). However, when I run it in the terminal I get:
>bash: http://starwars.wikia.com/wiki/ : No such file or directory
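From the wget man page, -A wants its accept list as a single comma-separated argument right after the flag, so in the command above it swallowed -nd and left the patterns dangling; and since the error comes from bash rather than wget, I suspect the URL also ended up on its own line when I pasted the command. My guess (untested) at what the invocation was meant to read:

wget -r -nd -A mp3,ogg http://starwars.wikia.com/wiki/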
The problem is that I can't use a for loop, since the URL of each wiki page is unique. For example:
http://starwars.wikia.com/wiki/Retcon
http://starwars.wikia.com/wiki/C-3PX
http://starwars.wikia.com/wiki/Star_Wars_Legends
Is it possible to download from URLs with this structure?
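One alternative I have been wondering about: Wikia runs MediaWiki, so the stock api.php endpoint should be able to list uploaded files by MIME type directly, skipping the page crawl entirely. A rough, untested sketch (the audio/ogg MIME value and the api.php path are assumptions, jq must be installed, and pagination via aicontinue is omitted):

# list uploaded files of a given MIME type via the MediaWiki API,
# extract the direct file URLs with jq, and feed them back to wget
wget -q -O - 'http://starwars.wikia.com/api.php?action=query&list=allimages&aiprop=url&aimime=audio/ogg&ailimit=500&format=json' \
  | jq -r '.query.allimages[].url' \
  | wget -nd -i -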
EDIT: This is the output I get when I run the command from the answer.
--2016-02-10 16:21:26-- http://starwars.wikia.com/wiki/
Resolving starwars.wikia.com (starwars.wikia.com)... 23.235.33.194, 23.235.37.194, 104.156.81.194, ...
Connecting to starwars.wikia.com (starwars.wikia.com)|23.235.33.194|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://starwars.wikia.com/wiki/Main_Page [following]
--2016-02-10 16:21:26-- http://starwars.wikia.com/wiki/Main_Page
Reusing existing connection to starwars.wikia.com:80.
HTTP request sent, awaiting response... 200 OK
Length: 569628 (556K) [text/html]
Saving to: ‘index.html’

100%[========================>] 569,628 217KB/s in 2.6s

2016-02-10 16:21:29 (217 KB/s) - ‘index.html’ saved [569628/569628]

Removing index.html since it should be rejected.

FINISHED --2016-02-10 16:21:29--
Total wall clock time: 2.7s
Downloaded: 1 files, 556K in 2.6s (217 KB/s)
ls gives me nothing; there are no files in the working directory.
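Looking at the log again: wget fetched Main_Page, removed it because of -A, and then stopped instead of recursing into the links it had parsed. As far as I understand, wget honors robots.txt during recursive retrieval, and I am guessing that is what cuts the crawl short on wikia.com. If so, something like this might get further (untested, and -e robots=off should be paired with a polite --wait):

wget -r -l 2 -nd -A mp3,ogg -e robots=off --wait=1 http://starwars.wikia.com/wiki/Main_Page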
Asked by user147855
Feb 6, 2016, 05:40 PM
Last activity: Jul 3, 2025, 01:03 PM