CentOS wget

wget is an incredibly useful command that transfers files from a remote host over HTTP, HTTPS, or FTP. Without any options, wget can only retrieve files that are publicly accessible. Files that require authentication will not be downloaded.

[user@localhost ~]$ wget http://host.net/Turnip.jpg
--2021-01-21 20:05:35--  http://host.net/Turnip.jpg
Resolving host.net (host.net)... FE80::abcd:abcd:abcd:abcd,
Connecting to host.net (host.net)|FE80::abcd:abcd:abcd:abcd... connected.
HTTP request sent, awaiting response... 200 OK
Length: 116772 (114K) [image/jpeg]
Saving to: ‘Turnip.jpg’

100%[======================================>] 116,772     --.-K/s   in 0.06s

2021-01-21 20:05:35 (1.77 MB/s) - ‘Turnip.jpg’ saved [116772/116772]

[user@localhost ~]$

In this example, wget retrieves the publicly visible file Turnip.jpg from host.net and saves it to the current working directory of the session. To access files that are not public, we need to supply some form of authentication. The following example uses wget in conjunction with FTP.

[user@localhost ~]$ wget ftp://user:password@host.net/nonPublicFile.txt

While this method works, it is considered bad practice, because the password is saved in clear text in your command-line history and is visible in the process list while the command runs.

[user@localhost ~]$ wget --ask-password ftp://user@host.net/nonPublicFile.txt

This use of the command is much more secure, as it prompts the user to enter the password at the time of execution.
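
For repeated or unattended downloads, another option is to keep the credentials in a ~/.netrc file, which wget can read login details from. The following is a minimal sketch rather than a definitive recipe: the helper name add_netrc_entry is hypothetical, and it assumes that the host, login, and password contain no whitespace. The password is read from standard input so that it never appears in the command itself.

```shell
# Hypothetical helper: append one host entry to a netrc-style file.
# Usage: add_netrc_entry FILE HOST LOGIN   (password is read from stdin)
add_netrc_entry() {
    netrc_file=$1
    host=$2
    login=$3

    # Hide typed characters only when the password comes from a terminal.
    if [ -t 0 ]; then stty -echo; fi
    IFS= read -r password
    if [ -t 0 ]; then stty echo; printf '\n'; fi

    # Create the file readable by the owner only, then append the entry.
    (
        umask 077
        printf 'machine %s login %s password %s\n' \
            "$host" "$login" "$password" >> "$netrc_file"
    )
    chmod 600 "$netrc_file"
}
```

After running add_netrc_entry ~/.netrc host.net user and typing the password, a plain wget ftp://host.net/nonPublicFile.txt should pick up the stored login without prompting.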

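Moving beyond downloading single files whose full names we know, we can use wildcards such as * (asterisk) in the path, much as we would when working with local files. Note that wget only expands wildcards for FTP URLs, and the URL should be quoted so that the local shell does not try to expand the pattern first.

[user@localhost ~]$ wget 'ftp://host.net/directory/*'

In this case, wget downloads all publicly visible files that reside in the given directory. Wildcards are not supported for HTTP URLs. There, recursive retrieval combined with an accept list (-A) serves a similar purpose. wget also has a number of its own options for retrieving multiple files.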

[user@localhost ~]$ wget -r host.net/directory/

In this case, the -r flag tells wget to retrieve all publicly visible files within the given directory recursively. While this improves on the preceding method, recursion is limited to a depth of 5 levels by default. The -l option changes that limit.
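
To make the depth limit concrete, here is a small sketch that only prints the wget command it would run, so it can be tried without touching the network. The function name recursive_wget_cmd is hypothetical. The -l (depth) and --no-parent options are standard wget options, and --no-parent is added here on the assumption that you do not want wget to climb above the starting directory.

```shell
# Hypothetical helper: print (do not run) a recursive wget command.
# Usage: recursive_wget_cmd URL [DEPTH]
recursive_wget_cmd() {
    url=$1
    depth=${2:-5}   # 5 matches wget's default maximum depth; "inf" removes the limit
    printf 'wget -r -l %s --no-parent %s\n' "$depth" "$url"
}

# Example: show the command for a depth of 8 levels.
recursive_wget_cmd host.net/directory/ 8
```

Running the printed command as shown would recurse up to 8 levels below host.net/directory/ instead of the default 5.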

[user@localhost ~]$ wget -m host.net/directory/

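In this case, the -m flag stands for mirror: wget retrieves all publicly visible files while reproducing the directory and file structure of the given path. Unlike plain -r, mirroring places no limit on recursion depth and turns on time-stamping, so files that have not changed are not downloaded again on later runs.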

An interesting note on the behavior of the -m flag is that even when it downloads nothing from levels above the specified path (over HTTP, add --no-parent to guarantee this), it still replicates the full folder structure leading to that path. Take the following example.

[user@localhost ~]$ wget -m host.net/htdocs/images

In this case, you still only get a mirror of the files and directories inside the images directory, but the folder that appears in your local directory is named host.net. It contains an htdocs directory, which in turn holds the mirror of the targeted images directory. Other directories or files in htdocs on the remote host are ignored.
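
The mapping from URL to local folder can be sketched with a few lines of shell. This is only an illustration of the naming described above, not wget's actual implementation: the function name mirror_local_dir is hypothetical, and it assumes a simple URL with no port, credentials, or query string.

```shell
# Hypothetical helper: print the local directory that mirrors the given URL.
# Usage: mirror_local_dir URL
mirror_local_dir() {
    url=$1
    path=${url#*://}   # drop a leading scheme such as http:// if present
    path=${path%/}     # drop one trailing slash if present
    printf '%s\n' "$path"
}

# Example: the mirrored files for this URL end up under this directory.
mirror_local_dir host.net/htdocs/images   # prints host.net/htdocs/images
```

If the extra host.net and htdocs levels are unwanted, wget's -nH and --cut-dirs options can be used to trim them from the local path.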
