Cloning an entire Website and Running in the Localhost
Sometimes, we need to copy a website for different types of testing purposes. In this tutorial, we will learn how to clone a website and then run that site on localhost.
Cloning a Website
Using Python
The pywebcopy
module is a popular one to download a website or a particular webpage. It crawls through all the pages and then save the webpages in a target directory.
from pywebcopy import save_website
import sys
kwargs = {'project_name': 'site-name'}
save_website(
url='http://test-site.com',
project_folder='test-site_folder_path',
**kwargs
)
If you want to have input from the terminal while running the code, just use sys
module to receive target arguments from the terminal.
from pywebcopy import save_website
import sys
kwargs = {'project_name': 'some-fancy-name'}
save_website(
url=str(sys.argv[1]),
project_folder=str(sys.argv[2]),
**kwargs
)
$ python clone.py "<target_website_name>" "<target_download_path>"
Using HTTrack
HTTrack is a commercial website cloning tool and widely used by people. It also has a free version. Here, I am using the free version.
First install HTTrack in your operating system. You can download the GUI version of the app. Or, you can use command-line for installing it in mac.
$ ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)" 2> /dev/null
$ brew install httrack
or in Ubuntu
$ sudo apt-get install httrack
Now, clone a website using the following command or following instructions in the GUI app.
$ httrack <target_website_name> -O <target_download_path>
Hosting the Website in Local Machine
Let’s create the Dockerfile
in the downloaded website folder and import a lightweight web server:
FROM nginx
COPY . /usr/share/nginx/html
Now, let’s build and run the docker container. Just replace the <target_name>
and <target_folder_path>
with the name you want to set on the container and the directory path you cloned the website.
$ docker build -t <target_name> <target_folder_path>
$ docker run -it -d -p 8080:80 <target_name>
Now, open the browser and enter http://localhost:8080/
in the url bar to view the downloaded site running in the localhost.
When done, just use docker stop
to stop the container. You can find the container ID using the command docker ps
.
$ docker ps -a -f status=running
$ docker stop -t 60 <CONTAINER_ID>
That’s all! Cheers!!!
Leave a comment