Download images from a list of URLs

  • Create a folder on your machine.

  • Place your text file of image URLs in the folder.

  • cd to that folder.

  • Run wget -i images.txt

  • You will find all your downloaded files in the folder.


On Windows 10/11 this is fairly trivial using

for /F "eol=;" %f in (filelist.txt) do curl -O %f

Note that including eol=; lets us mask individual exclusions: add ; at the start of any lines in filelist.txt that we do not want this time. If using the above in a batch file such as GetFileList.cmd, double those %'s (i.e. %%f).
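For instance, the one-liner saved as the batch file mentioned above might look like this (a minimal sketch; the comments restate the eol=; and %%f points):

```batch
@echo off
rem GetFileList.cmd - download every URL listed in filelist.txt
rem Inside a batch file the loop variable must be written %%f, not %f
rem Lines starting with ; in filelist.txt are skipped (eol=;)
for /F "eol=;" %%f in (filelist.txt) do curl -O %%f
```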

Windows 7 has an FTP command, but it can often throw up a firewall dialog requiring a user authorization response.

If you are currently running Windows 7 and want to download a list of URLs without installing wget.exe or another dependency like curl.exe (which would otherwise be the simplest first choice), the shortest compatible way is a PowerShell command (not my favorite for speed, but if needs must).

The file with the URLs is filelist.txt, and IWR (Invoke-WebRequest) is the near equivalent of wget in PowerShell.

The first command, which sets the security protocol, ensures we are using the modern TLS 1.2 protocol.

-OutF ... Split-Path ... means the filenames will be the same as the remote filenames, saved in the CWD (current working directory); for scripting you can cd /d folder first if necessary.

PS> [Net.ServicePointManager]::SecurityProtocol = "Tls12" ; GC filelist.txt | % {IWR $_ -OutF $(Split-Path $_ -Leaf)}

To run this from CMD, use a slightly different set of quotes around 'Tls12':

PowerShell -C "& {[Net.ServicePointManager]::SecurityProtocol = 'Tls12' ; GC filelist.txt | % {IWR $_ -OutF $(Split-Path $_ -Leaf)}}"

This needs to be made into a function with error handling, but it repeatedly downloads images for image classification projects:

    import pandas as pd
    import requests

    # Read the URL list into a dataframe; the URLs are in the last column
    urls = pd.read_csv('cat_urls.csv')

    rows = []
    for index, row in urls.iterrows():
        rows.append(row.iloc[-1])

    counter = 0
    for url in rows:
        file_name = 'cat' + str(counter) + '.jpg'
        print(file_name)
        response = requests.get(url)
        with open(file_name, 'wb') as file:
            file.write(response.content)
        counter += 1
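As a sketch of the "function with error handling" mentioned above (the function name and parameters here are my own invention, not part of the original), the download loop could be wrapped like this:

```python
import requests


def download_images(urls, prefix='cat', ext='.jpg'):
    """Download each URL to <prefix><N><ext>, skipping failures instead of crashing.

    Returns the number of files successfully saved.
    """
    saved = 0
    for counter, url in enumerate(urls):
        file_name = prefix + str(counter) + ext
        try:
            response = requests.get(url, timeout=10)
            response.raise_for_status()  # treat HTTP errors (404 etc.) as failures
        except requests.RequestException as err:
            print('skipping', url, '-', err)
            continue
        with open(file_name, 'wb') as f:
            f.write(response.content)
        saved += 1
    return saved
```

Called as download_images(rows) with the rows list built above, it keeps going past broken links rather than stopping at the first one.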