How to Run Ollama on VPS India — Host AI Models Without GPU
Run Ollama on a GigaNodes VPS in India from ₹800/mo. Cloud XS (4GB RAM) handles Phi-3 Mini, and Cloud S (8GB RAM, ₹1,440/mo) runs Llama 3 8B and Mistral 7B without a GPU. The install is a single command and takes under 5 minutes.
Ollama makes it possible to run large language models like Llama 3, Mistral, and Phi-3 entirely on your own server — no API fees, no rate limits, no data leaving your infrastructure. The catch is that most laptops don’t have the RAM or uptime to run a model 24/7 for an app or API. That’s where a VPS comes in.
This guide covers exactly which Ollama models fit which VPS plan, the install commands, and how to expose Ollama as an API for your own apps.
Why Run Ollama on a VPS Instead of Your Local Machine
Running Ollama locally works for testing, but it stops working the moment you need the model available 24/7, accessible from multiple devices, or callable from a production app. A laptop sleeps, reboots, and changes IP address whenever it switches networks. A VPS in India stays online continuously, has a static IP, and sits close to users in Delhi NCR and across India for low latency.
Running Ollama on a GigaNodes VPS India also means you control your data completely — prompts and outputs never leave your own server, which matters for businesses handling sensitive customer data under India’s data protection rules.
Ollama Model RAM Requirements vs GigaNodes Plans
| Model | Parameters | RAM Needed | GigaNodes Plan | Price |
|---|---|---|---|---|
| Phi-3 Mini | 3.8B | 3-4GB | Cloud XS | ₹800/mo |
| Llama 3 8B | 8B | 6-8GB | Cloud S | ₹1,440/mo |
| Mistral 7B | 7B | 6-8GB | Cloud S | ₹1,440/mo |
| Llama 2 13B | 13B | 14-16GB | Cloud M | ₹2,880/mo |
| Llama 3 70B | 70B | 48-64GB | Cloud XL | ₹11,520/mo |
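Before pulling a model, it helps to compare the VPS's actual memory with the table. The sketch below is a hypothetical helper, not part of Ollama: it treats the lower "RAM Needed" figure of each row as the minimum, and the Ollama library tags (`phi3:mini`, `llama3:8b`, `mistral:7b`, `llama2:13b`, `llama3:70b`) are assumptions about which download matches each row. Llama 3 itself is published in 8B and 70B sizes, so the 13B row is mapped to Llama 2.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ollama): list the models from the RAM
# table that should fit in a given amount of memory. Each line is
# "<ollama tag> <minimum GB>", using the lower "RAM Needed" figure per row.
MODEL_TABLE="phi3:mini 3
llama3:8b 6
mistral:7b 6
llama2:13b 14
llama3:70b 48"

# fits_models <ram_gb>: print every tag whose minimum RAM is <= ram_gb
fits_models() {
  _ram_gb=$1
  echo "$MODEL_TABLE" | while read -r tag min_gb; do
    if [ "$_ram_gb" -ge "$min_gb" ]; then
      echo "$tag"
    fi
  done
}

# total_ram_gb: total memory of this Linux machine in whole GiB, read from
# /proc/meminfo. It rounds down, so an 8GB plan may report 7.
total_ram_gb() {
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1048576 }' /proc/meminfo
}

# Example: which table models fit on the machine this runs on?
fits_models "$(total_ram_gb)"
```

On a Cloud S instance this would typically list `phi3:mini`, `llama3:8b` and `mistral:7b`; the thresholds are deliberately the optimistic end of each range, so leave extra headroom if the VPS also runs other services.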
Step by Step — Installing Ollama on Your VPS
These steps work on any Ubuntu 24.04 VPS, including every GigaNodes plan.
1. Connect to your VPS via SSH
2. Install Ollama
3. Start the Ollama service
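```
# start the service now and on every boot (the install script usually does this already)
sudo systemctl enable --now ollama
```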
4. Pull a model
5. Run the model
6. Optional — add a web UI with Docker
Open WebUI gives you a ChatGPT-style interface for your self-hosted models.
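The commands behind steps 1, 2 and 4 to 6 can be sketched as follows. This is a minimal sequence, assuming Ubuntu 24.04 with root or sudo access and Ollama's official install script; `203.0.113.10` is a placeholder address, and `llama3` (Llama 3 8B) stands in for whichever model from the table fits your plan. Step 1 runs on your own machine; everything after it runs on the VPS.

```shell
# Step 1: from your own machine, open an SSH session to the VPS
# (203.0.113.10 is a placeholder; use your server's IP and login user)
ssh root@203.0.113.10

# Step 2: on the VPS, install Ollama with the official script
# (it also registers an "ollama" systemd service)
curl -fsSL https://ollama.com/install.sh | sh

# Quick check: the server should answer "Ollama is running" on port 11434
curl http://localhost:11434

# Step 4: download a model that fits your plan's RAM (llama3 = Llama 3 8B)
ollama pull llama3

# Step 5: chat with it interactively in the terminal (type /bye to exit)
ollama run llama3

# Step 6 (optional): install Docker, then start Open WebUI on port 8080.
# --network=host lets the container reach Ollama on 127.0.0.1:11434.
curl -fsSL https://get.docker.com | sh
docker run -d --network=host \
  -v open-webui:/app/backend/data \
  -e OLLAMA_BASE_URL=http://127.0.0.1:11434 \
  --name open-webui --restart always \
  ghcr.io/open-webui/open-webui:main
```

Once the container is up, Open WebUI answers on port 8080 of the VPS. The first account registered becomes the administrator, so create it before sharing the address.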
Using the Ollama API
Once running, Ollama exposes a local REST API on port 11434 that any app can call.
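```
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Explain VPS hosting in one sentence"
}'
```
The reply streams back as one JSON object per line; add "stream": false to the request body to receive a single JSON object instead.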
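By default Ollama listens on localhost (127.0.0.1) only, and its API has no built-in authentication. If Nginx runs on the same VPS, keep that default and let Nginx proxy to 127.0.0.1:11434 with SSL and a password in front. If a client must reach Ollama directly from another machine, set OLLAMA_HOST=0.0.0.0 in a systemd override file at /etc/systemd/system/ollama.service.d/override.conf, run systemctl daemon-reload, restart the service, and restrict port 11434 with a firewall so it is not open to the whole internet.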
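Two ways to do this can be sketched as shell commands: proxying through Nginx while Ollama stays on localhost, or binding Ollama to all interfaces and firewalling port 11434. This assumes Ubuntu with `apt`, Nginx and Certbot from the Ubuntu repositories, and `ufw` as the firewall; `ollama.example.com`, the `apiuser` login and the client IP `198.51.100.7` are placeholders. The `Host localhost:11434` header follows the proxy example in Ollama's FAQ.

```shell
# Option A (recommended): keep Ollama on 127.0.0.1 and publish it through
# Nginx with HTTPS and basic auth. ollama.example.com is a placeholder
# domain whose DNS A record must point at this VPS.
sudo apt install -y nginx apache2-utils certbot python3-certbot-nginx
sudo htpasswd -c /etc/nginx/.ollama_htpasswd apiuser   # prompts for a password

sudo tee /etc/nginx/sites-available/ollama > /dev/null <<'EOF'
server {
    listen 80;
    server_name ollama.example.com;

    location / {
        auth_basic           "Ollama";
        auth_basic_user_file /etc/nginx/.ollama_htpasswd;
        proxy_pass           http://127.0.0.1:11434;
        # as in Ollama's own proxy example; Ollama may reject unfamiliar Host headers
        proxy_set_header     Host localhost:11434;
        proxy_buffering      off;   # pass streamed tokens through immediately
        proxy_read_timeout   300s;  # long CPU generations can take minutes
    }
}
EOF
sudo ln -s /etc/nginx/sites-available/ollama /etc/nginx/sites-enabled/ollama
sudo nginx -t && sudo systemctl reload nginx
sudo ufw allow 'Nginx Full'                  # only needed if ufw is enabled
sudo certbot --nginx -d ollama.example.com   # gets a certificate and adds HTTPS

# Option B: listen on all interfaces, then let only one trusted client IP
# reach port 11434. The ufw rule only restricts access once ufw is enabled
# (run "sudo ufw allow OpenSSH" before "sudo ufw enable").
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf > /dev/null <<'EOF'
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
EOF
sudo systemctl daemon-reload
sudo systemctl restart ollama
sudo ufw allow from 198.51.100.7 to any port 11434 proto tcp
```

With Option A, clients authenticate with basic auth over HTTPS, for example `curl -u apiuser https://ollama.example.com/api/tags` to list the installed models.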
Performance — AMD EPYC CPU vs GPU
CPU inference on the AMD EPYC 7C13 is slower than on a dedicated GPU but is fully usable for personal assistants, internal tools, and low-traffic APIs. Expect roughly 5-12 tokens per second from a 7B or 8B model on Cloud S: fast enough for chat-style interaction, but long generations take a while (a 500-token answer at 8 tokens per second needs about a minute). For production workloads serving many concurrent users, a GPU-backed setup is the better fit; for personal use, side projects, and small team tools, CPU inference on a GigaNodes VPS India is more cost-effective than renting GPU instances.
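To check the tokens-per-second figure on your own plan, one option is a non-streaming request to `/api/generate`: its JSON reply includes `eval_count` (tokens generated) and `eval_duration` (time spent generating, in nanoseconds). The helpers below are a hypothetical sketch that turns those two numbers into a rate and an estimated wait for a longer answer; they only do arithmetic, so you pass in the values from the response yourself.

```shell
#!/bin/sh
# tokens_per_sec <eval_count> <eval_duration_ns>
# Generation speed from the eval_count and eval_duration (nanoseconds)
# fields of an Ollama /api/generate response, printed to two decimals.
tokens_per_sec() {
  awk -v n="$1" -v ns="$2" 'BEGIN { printf "%.2f\n", n / (ns / 1e9) }'
}

# seconds_for <tokens> <tokens_per_sec>
# Rough wall-clock time to generate a reply of the given length, one decimal.
seconds_for() {
  awk -v t="$1" -v r="$2" 'BEGIN { printf "%.1f\n", t / r }'
}

# Example: 120 tokens generated in 15 s gives 8.00 tokens/s,
# so a 500-token answer would take about 62.5 s.
rate=$(tokens_per_sec 120 15000000000)
echo "rate: $rate tokens/s"
echo "500 tokens: $(seconds_for 500 "$rate") s"
```

To get the two input numbers, send a request such as `curl -s http://localhost:11434/api/generate -d '{"model": "llama3", "prompt": "Hi", "stream": false}'` and read `eval_count` and `eval_duration` from the reply. Alternatively, `ollama run llama3 --verbose` prints an "eval rate" line after every answer.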
AMD EPYC 7C13 · 141,108 IOPS (2.4x faster than DigitalOcean) · Cloudflare Magic Transit DDoS included free · UPI accepted · GST invoice · Yotta DC Noida · First Indian hosting company with Cloudflare Magic Transit · VPS from ₹400/mo
