Write a Reverse Proxy Server in Python: Part 1 (Reverse Proxy Server)
In this tutorial we will learn how to build a reverse proxy server in Python. This is the first post in a series in which we will build a complete system that includes several servers and clients alongside the reverse proxy server.
After the basic script-based implementation, I will provide details on how to deploy the system on Amazon EC2 instances.
The series will have the following contents:
- Part 01: The Reverse Proxy Server
- Part 02: A Server in a Server Pool
- Part 03: A Client
- Part 04: Simulation with Packets
- Part 05: Deploy in Amazon EC2 instances
Reverse Proxy
A reverse proxy server is connected to a pool of servers and forwards user requests to any one of the servers in the pool. In a traditional client-server architecture, a client connects directly to a server to get some work done. That raises issues of server availability and of scaling the server end. The red connection in the figure depicts the direct communication between a client and a server.
To resolve the above-mentioned issues, a reverse proxy server handles all the incoming connections from the users or clients and then forwards each request to one of the servers. When the server finishes processing the data, it returns the data to the client via the reverse proxy again. In the figure below, the green two-way communication is the one established in a reverse proxy server-based system.
Functionalities
In our Python code, we will give the reverse proxy server the following functionalities:
- Keep a record of available servers
- Distinguish the servers based on policy
- Use threading to handle all processes separately
- Handle incoming connections from the clients
- Forward the user requests to the available servers in a round-robin fashion
- Receive processed data from the servers and return the data to the users
Communication Packets
We will be using communication packets in JSON format. JSON represents data much like a Python dictionary and is widely used in REST-based systems.
Client-side Packets
{
"type": 0, // 0 is a message from a client to a server
"srcid": 999, // source (client) id
"privPoliId": 999, // destination server’s privacy policy
"payloadsize": 999, // payload size
"payload": "xyz" // payload
}
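The `//` annotations in the packet above are only there for explanation; strict JSON has no comment syntax, so they must not appear in the data that is actually sent. As a minimal sketch, a client packet could be built and serialized as shown here. The helper name `make_client_packet` is hypothetical, and setting `payloadsize` to the length of the payload string is an assumption, since the packet format does not say how the size is measured.

```python
import json

def make_client_packet(srcid, policy_id, payload):
    """Build a type-0 (client to server) packet as a plain dict."""
    return {
        "type": 0,
        "srcid": srcid,
        "privPoliId": policy_id,
        "payloadsize": len(payload),  # assumption: size counted in characters
        "payload": payload,
    }

packet = make_client_packet(srcid=7, policy_id=2, payload="hello")
wire_bytes = json.dumps(packet).encode()   # bytes that would go over the socket
decoded = json.loads(wire_bytes.decode())  # what the receiving side would see
print(decoded)
```

Note that the client packet spells the policy field `privPoliId`, while the server setup packet uses `privPolyId`; the proxy code reads each packet with its own spelling.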
Server (Server Pool)-side Connection Setup Packets
{
"type": 1, // 1 is a connection setup message from a server
"id": 999, // id of the server
"privPolyId": 999, // privacy policy of the server
"listenport": 999 // port on which the server is listening
}
Server (Server Pool)-side User-data Processed Packets
{
"type": 2, // 2 is an ACK from a server to a client
"srcid": 999, // source (server) id
"destid": 999, // destination (client) id
"payloadsize": 999, // payload size
"payload": "xyz" // payload
}
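With three packet kinds in play, the receiver has to look at the `type` field before doing anything else. The following is a small sketch of that dispatch step, not the code used later in this post. The names `PACKET_KINDS` and `classify_packet` are made up for illustration, and converting the type with `int()` is an assumption that lets the sketch accept both `1` and `"1"`.

```python
import json

# Packet kinds as described above; the labels on the right are illustrative only.
PACKET_KINDS = {0: "client_data", 1: "server_setup", 2: "server_ack"}

def classify_packet(raw_bytes):
    """Decode a JSON packet and return (kind, packet_dict)."""
    packet = json.loads(raw_bytes.decode())
    # int() accepts both 1 and "1", in case a sender quotes the type field
    kind = PACKET_KINDS.get(int(packet["type"]), "unknown")
    return kind, packet

setup = json.dumps({"type": 1, "id": 101, "privPolyId": 2, "listenport": 9001}).encode()
ack = json.dumps({"type": "2", "srcid": 101, "destid": 7,
                  "payloadsize": 5, "payload": "HELLO"}).encode()

print(classify_packet(setup)[0])
print(classify_packet(ack)[0])
```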
Code in Python
Used Modules
Here, the most important module we are going to use is socket, which will help us establish socket communication between nodes. Let’s check which modules we are using and for what purpose:
- _thread and threading - for implementing thread-based isolated processes for each communication.
- json - handles packet data that is JSON-like.
- sys - handles options and arguments while running the scripts.
- pandas - keeps a table-based record of available servers.
- itertools - helps with the round-robin cycle.
- time - optional; you can use it to keep track of the sessions (not included).
# Import required modules
import socket
import _thread
import threading
import json
import sys
import time
import pandas as pd
import itertools
Check Available Arguments
The following function checks the options given on the command line and collects the input arguments needed to run the server. The only input we need is the port number.
It shows a usage message and exits if the user does not run the script as expected.
def option_check():
    # all available argument options
    avail_options = ["-port"]
    # receive user given options
    options = [opt for opt in sys.argv[1:] if opt.startswith("-")]
    # receive user given arguments
    args = [arg for arg in sys.argv[1:] if not arg.startswith("-")]
    # raise error if user given option is wrong
    for i in options:
        if i not in avail_options:
            raise SystemExit(f"Usage: {sys.argv[0]} -port <argument>...")
    # raise error if not all options or arguments are available
    if len(options) != 1 or len(args) != 1:
        raise SystemExit(f"Usage: {sys.argv[0]} -port <argument>...")
    return args
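For comparison, the same check can be sketched with the standard library's argparse module. This is an alternative, not the approach used in this tutorial; the function name `parse_port` is hypothetical. argparse accepts single-dash long options such as `-port`, converts the value to an integer, and prints its own usage message when the option is missing or malformed.

```python
import argparse

def parse_port(argv):
    """Parse a list of command-line arguments and return the port as an int."""
    parser = argparse.ArgumentParser(description="Reverse proxy (sketch)")
    parser.add_argument("-port", type=int, required=True, help="port to listen on")
    return parser.parse_args(argv).port

# In a real script you would pass sys.argv[1:] instead of a literal list.
port = parse_port(["-port", "8000"])
print(port)
```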
Round Robin
Our round-robin implementation is very simple. We take an itertools.cycle object as input, and our function returns one element each time using the next function. In practice we will pass a cycle built from the ids of the servers that share the same privacy policy.
def round_robin(iterable):
    return next(iterable)
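To see the rotation in action, here is a short example that repeats the function above and feeds it a cycle over three made-up server ids. After the last id, the cycle wraps around to the first one.

```python
import itertools

def round_robin(iterable):
    return next(iterable)

servers = itertools.cycle([101, 102, 103])  # hypothetical server ids
picks = [round_robin(servers) for _ in range(5)]
print(picks)  # [101, 102, 103, 101, 102]
```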
A dataframe to keep records
Here, we will use a pandas dataframe to store the type, id, privacy policy, listening port, and IP address of each available server.
# define the available table
column_names = ["type", "id", "privPolyId", "listenport", "ip_addr"]
updated_available_server_table = pd.DataFrame(columns = column_names)
Here we add an entry to the server record table each time a setup message is received, and rebuild the per-policy round-robin cycles.
def available_server(msg):
    global updated_available_server_table
    global policy_table
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    updated_available_server_table = pd.concat(
        [updated_available_server_table, pd.DataFrame([msg])], ignore_index = True)
    policy_list = set(updated_available_server_table["privPolyId"].tolist())
    # print(policy_list)
    # rebuild one cycle of server ids per privacy policy
    policy_table = {}
    for policy in policy_list:
        policy_table[policy] = itertools.cycle(set(updated_available_server_table\
            [updated_available_server_table["privPolyId"]==policy]["id"].tolist()))
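The idea behind the policy table can also be shown without pandas. The sketch below swaps the dataframe for a plain list of dicts, purely for illustration: it groups hypothetical setup packets by `privPolyId` and builds one cycle of server ids per policy. The helper name `build_policy_table` and the sample records are assumptions.

```python
import itertools
from collections import defaultdict

# Hypothetical setup packets, already tagged with the sender's IP address
registered = [
    {"type": 1, "id": 101, "privPolyId": 1, "listenport": 9001, "ip_addr": "127.0.0.1"},
    {"type": 1, "id": 102, "privPolyId": 1, "listenport": 9002, "ip_addr": "127.0.0.1"},
    {"type": 1, "id": 201, "privPolyId": 2, "listenport": 9003, "ip_addr": "127.0.0.1"},
]

def build_policy_table(servers):
    """Map each privacy policy id to a round-robin cycle of server ids."""
    ids_by_policy = defaultdict(list)
    for server in servers:
        ids_by_policy[server["privPolyId"]].append(server["id"])
    # sorted() keeps the rotation order deterministic
    return {policy: itertools.cycle(sorted(ids))
            for policy, ids in ids_by_policy.items()}

policy_table = build_policy_table(registered)
first_three = [next(policy_table[1]) for _ in range(3)]
print(first_three)  # [101, 102, 101]
```

One thing to keep in mind with either version: the cycles are rebuilt from scratch on every registration, so the rotation for each policy starts over whenever a new server joins.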
Receive New Connection
Now, the following function runs when our proxy server receives a new connection (whether from a server in the server pool or from a user client machine).
It first retrieves the incoming packet and checks the packet type. If it is 0, the node on the other side is a client. If it is 1, the connecting node must be a server.
If the node is a server, we add an entry to our table using the previously defined available_server function.
If the node is a client, we first retrieve the privacy policy from the packet, and then get the next server with the same policy using our round_robin function. Then our proxy server creates a new connection to the target server, forwards the user packet, receives the processed data, and sends it back to the client. You can think of the new socket connection as a nested socket communication.
# Lock acquired in the main loop for each accepted connection
# and released here when that connection ends
print_lock = threading.Lock()

# Establish connection with new client
def on_new_client(clientsocket,addr):
    while True:
        msg = clientsocket.recv(2048)
        if not msg:
            # lock released on exit
            print_lock.release()
            break
        json_msg = json.loads(msg.decode())
        # accept the type as either 1 or "1"; the packet formats above use integers
        msg_type = str(json_msg["type"])
        if msg_type == "1":
            ip, port = clientsocket.getpeername()
            print ("Received Connection from IP:", ip, "Port:", port)
            json_msg["ip_addr"] = ip
            print ("Received setup message from server id", json_msg["id"], "privacy policy",\
                json_msg["privPolyId"], "port", json_msg["listenport"])
            available_server(json_msg)
        elif msg_type == "0":
            print ('Received a message from client', json_msg["srcid"], \
                "payload", json_msg["payload"])
            policy = json_msg["privPoliId"]
            # print(policy)
            # pick the next server id registered under this policy
            target_host_id = round_robin(policy_table[policy])
            # print(target_host_id)
            server_name = updated_available_server_table.loc\
                [updated_available_server_table["id"]==target_host_id, "ip_addr"].values[0]
            server_port = int(updated_available_server_table.loc\
                [updated_available_server_table["id"]==target_host_id, "listenport"].values[0])
            # nested connection from the proxy to the chosen server
            server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            server_socket.connect((server_name,server_port))
            print("Forwarding a data message to server id", target_host_id, "server ip", server_name, \
                "port", server_port, "payload", json_msg["payload"])
            server_socket.send(json.dumps(json_msg).encode())
            recv_msg = server_socket.recv(2048)
            recv_json_msg = json.loads(recv_msg.decode())
            print ("Received a data message from server id", recv_json_msg["srcid"],\
                "payload", recv_json_msg["payload"])
            print("")
            server_socket.close()
            # relay the server's reply back to the client
            clientsocket.send(json.dumps(recv_json_msg).encode())
        else:
            pass
    clientsocket.close()
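The forwarding step can be tried out in isolation. The sketch below is not part of the proxy: it starts a stand-in backend on the loopback interface (hypothetical function `fake_backend`, which simply uppercases the payload), then forwards one client packet to it with a hypothetical `forward` helper and reads the reply. Like the proxy code, it assumes each packet fits in a single `recv(2048)` call, which holds for small messages but is not guaranteed by TCP in general.

```python
import json
import socket
import threading

def fake_backend(listener):
    """Accept one connection, uppercase the payload, reply with a type-2 packet."""
    conn, _ = listener.accept()
    with conn:
        request = json.loads(conn.recv(2048).decode())
        reply = {
            "type": 2,
            "srcid": 101,                  # hypothetical server id
            "destid": request["srcid"],
            "payloadsize": len(request["payload"]),
            "payload": request["payload"].upper(),
        }
        conn.sendall(json.dumps(reply).encode())

def forward(packet, host, port):
    """Open a short-lived connection to a backend, send the packet, return the reply."""
    with socket.create_connection((host, port), timeout=5) as upstream:
        upstream.sendall(json.dumps(packet).encode())
        return json.loads(upstream.recv(2048).decode())

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))    # port 0 lets the OS pick a free port
listener.listen(1)
backend_port = listener.getsockname()[1]
worker = threading.Thread(target=fake_backend, args=(listener,), daemon=True)
worker.start()

client_packet = {"type": 0, "srcid": 7, "privPoliId": 1,
                 "payloadsize": 5, "payload": "hello"}
reply = forward(client_packet, "127.0.0.1", backend_port)
worker.join(timeout=5)
listener.close()
print(reply["payload"])
```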
Main Program
Now, let’s create our main program, where we follow the mandatory socket communication steps (binding host and port, listening for clients, etc.). Here, we start a thread for each new socket connection.
if __name__ == "__main__":
    args = option_check()
    s = socket.socket() # Create a socket object
    # host = socket.gethostname() # Get local machine name
    host = 'localhost'
    port = int(args[0]) # Reserve a port for your service.
    print("Running the reverse proxy on port", port)
    # Binds to the port
    s.bind((host, port))
    # Allow up to 100 pending connections in the listen queue
    s.listen(100)
    while True:
        c, addr = s.accept() # Establish connection with client.
        # lock acquired by client
        print_lock.acquire()
        _thread.start_new_thread(on_new_client,(c,addr))
    s.close()
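The main loop above uses the low-level `_thread` module. As a rough sketch of the same one-thread-per-connection pattern with the higher-level `threading.Thread` class, the example below runs a tiny echo server on the loopback interface and connects to it twice. The `handle` and `serve` functions are hypothetical stand-ins, and the server stops after a fixed number of connections only so that the example terminates.

```python
import socket
import threading

def handle(conn):
    """Echo one message back, then close the connection."""
    with conn:
        data = conn.recv(2048)
        if data:
            conn.sendall(data)

def serve(listener, max_connections):
    """Accept a fixed number of connections, one thread per connection."""
    for _ in range(max_connections):
        conn, _addr = listener.accept()
        threading.Thread(target=handle, args=(conn,), daemon=True).start()

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))    # port 0 lets the OS pick a free port
listener.listen(5)
port = listener.getsockname()[1]
acceptor = threading.Thread(target=serve, args=(listener, 2), daemon=True)
acceptor.start()

replies = []
for text in (b"first", b"second"):
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        client.sendall(text)
        replies.append(client.recv(2048))

acceptor.join(timeout=5)
listener.close()
print(replies)
```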
The entire code is available on GitHub. It was a homework assignment in my networking course last semester; if you want to go through the details of the problem, just go through this document.
As this is a tutorial series, I will post the subsequent blog posts soon. Stay tuned.
The whole tutorial series is listed here:
- Write a Reverse Proxy Server in Python: Part 1 (Reverse Proxy Server)
- Write a Reverse Proxy Server in Python: Part 2 (Server Pool)
- Write a Reverse Proxy Server in Python: Part 3 (Client-side Script)
- Write a Reverse Proxy Server in Python: Part 4 (Shell Script for Automation)