How to Install Open WebUI on a Cloud-Hosted GPU Server

Introduction
Open WebUI is a powerful, self-hostable, open-source web interface designed to streamline the deployment and use of large language models (LLMs). It integrates seamlessly with Ollama and OpenAI-compatible APIs, and it supports custom AI models and plugins, offering flexibility for developers, researchers, and AI enthusiasts alike. This guide walks you through setting up Open WebUI on a cloud-hosted GPU instance, enabling you to run LLMs efficiently and with full control.
Requirements
The following is a list of items needed to complete the installation process successfully:
1) An Ubuntu 24.04 cloud GPU server with at least 8 GB of GPU memory.
2) The CUDA Toolkit and cuDNN installed.
3) Git installed: `sudo apt install git`
4) Root or sudo privileges.
5) An SSH client such as PuTTY or the macOS Terminal app.
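Before you begin, you can confirm that the GPU driver and CUDA toolkit from the requirements above are actually visible on the server. These are standard NVIDIA utilities; the versions reported will depend on your provider's image:

```bash
# Confirm the NVIDIA driver sees the GPU and check available GPU memory
nvidia-smi

# Confirm the CUDA compiler is installed and report its version
nvcc --version
```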
Step 1: Download and Install Ollama
Ollama is a streamlined platform designed to run open-source large language models such as Llama, Code Llama, Mistral, and Gemma. With its robust set of APIs, it enables easy creation, management, and customization of models, serving as a key tool for powering Open WebUI. To get started, follow the instructions below to install Ollama and load a sample model:
```bash
wget https://ollama.ai/install.sh
chmod +x install.sh
./install.sh
systemctl enable ollama
systemctl status ollama
```

Once this is done, run the following command to download an AI model such as llama3:8b:

```bash
ollama pull llama3:8b
```
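To confirm the model responds before moving on, you can query Ollama's REST API, which listens on port 11434 by default. The prompt below is just an example:

```bash
# Send a single non-streaming prompt to the local Ollama API
curl http://localhost:11434/api/generate -d '{
  "model": "llama3:8b",
  "prompt": "In one sentence, what is a large language model?",
  "stream": false
}'
```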
Step 2: Download and Install Open WebUI
Install the Python virtual environment and pip packages:

```bash
apt install python3-venv python3-pip
```

Create a Python virtual environment and activate it:

```bash
python3 -m venv venv
source venv/bin/activate
```

Install Open WebUI:

```bash
pip install open-webui
```

Upgrade the Pillow and pyopenssl modules:

```bash
pip install -U Pillow pyopenssl
```

Install additional libraries.
Note that ffmpeg is a system package rather than a Python library; the PyPI package of the same name does not provide the ffmpeg binary that Open WebUI's audio features rely on, so install it with apt:

```bash
apt install -y ffmpeg
pip install uvicorn
```

Run Open WebUI and verify it starts without errors:

```bash
open-webui serve &
```

Allow the Open WebUI port 8080 via UFW:

```bash
ufw allow 8080
```
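If you would rather not expose the dashboard to the whole internet, UFW can restrict the rule to a single trusted address. This is an optional hardening step; the IP below is a placeholder, so substitute the address you connect from:

```bash
# Allow port 8080 only from one trusted address (placeholder IP)
ufw allow from 203.0.113.10 to any port 8080 proto tcp

# Review the active rules
ufw status verbose
```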
Step 3: Access the Open WebUI Admin Dashboard
In a web browser, visit http://SERVER-IP:8080, replacing SERVER-IP with your server's public IP address. On first load, create an administrator account, then select the llama3:8b model you downloaded earlier to start chatting.
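Because `open-webui serve &` only runs for the life of your shell session, you may want a systemd unit so the dashboard survives reboots. The sketch below assumes the virtual environment from Step 2 was created in /root/venv; adjust the paths to match your setup:

```bash
# Create a systemd unit for Open WebUI (paths are assumptions; edit as needed)
cat >/etc/systemd/system/open-webui.service <<'EOF'
[Unit]
Description=Open WebUI
After=network-online.target ollama.service

[Service]
WorkingDirectory=/root
ExecStart=/root/venv/bin/open-webui serve
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

# Load the unit and start the service at boot
systemctl daemon-reload
systemctl enable --now open-webui
```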
Conclusion and Next Steps
Setting up Open WebUI on a cloud-hosted GPU instance gives you full control over running large language models with a flexible, customizable, and self-hosted interface. By following this guide, you’ve successfully installed Ollama, downloaded a model like llama3:8b, configured your environment, and deployed Open WebUI for use through a web-based dashboard. This powerful setup is ideal for developers, researchers, and AI hobbyists seeking a secure, scalable, and open-source solution for interacting with LLMs.
Next Steps:
1) Explore Additional Models: Use `ollama pull` to download and test other open-source models like `mistral`, `gemma`, or `codellama` to suit your specific use cases.
2) Extend Open WebUI’s capabilities by installing plugins or integrating APIs, including those compatible with OpenAI.
3) Set up HTTPS (see the reverse-proxy sketch after this list), configure firewall rules beyond port 8080, and enforce strong authentication policies to protect your instance.
4) Monitor GPU usage and tweak model and server settings to optimize inference speed and resource consumption.
5) Consider containerizing the setup with Docker (see the sketch after this list) or orchestrating deployments with Kubernetes if you’re planning multi-user access or high-availability scenarios.
6) Regularly check for updates to Ollama and Open WebUI to benefit from new features, performance improvements, and security patches.
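For the HTTPS suggestion in item 3, one common pattern is an Nginx reverse proxy with a free Let's Encrypt certificate from Certbot. This is a minimal sketch; openwebui.example.com is a placeholder domain that must already point at your server:

```bash
apt install -y nginx python3-certbot-nginx

# Proxy the placeholder domain to Open WebUI, with WebSocket upgrade headers
cat >/etc/nginx/sites-available/open-webui <<'EOF'
server {
    listen 80;
    server_name openwebui.example.com;

    location / {
        proxy_pass http://localhost:8080;
        proxy_set_header Host $host;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
EOF

ln -s /etc/nginx/sites-available/open-webui /etc/nginx/sites-enabled/
nginx -t && systemctl reload nginx

# Request a certificate and let Certbot rewrite the site config for HTTPS
certbot --nginx -d openwebui.example.com
```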
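For item 5, the Open WebUI project publishes a container image, which replaces the pip-based install above. A single-container sketch might look like the following; check the project's README for the current image tags and options:

```bash
# Run Open WebUI in Docker, persisting data in a named volume and reaching
# the host's Ollama instance through host.docker.internal
docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main
```

With this port mapping, the dashboard is served on port 3000 rather than 8080.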
With your system now online, you’re well-positioned to experiment, build, and deploy AI-driven applications powered by the latest in open-source LLM technology.