# Server Security (VPS / Self-Hosted) - Security Checklist

VPS and self-hosted server hardening: SSH, firewall, fail2ban, patching, and least-privilege OS users.

Part of the TIGZIG security checklist (112 items across 12 categories, distilled from hardening 20+ live microservices). Full checklist: https://www.tigzig.com/security

### 9.1. SSH Key-Only Access

**THE RISK:** When you spin up a new server, password login is often enabled by default. Bots start scanning for SSH access within hours. If password login is enabled, they will try to brute-force it - weak or reused passwords fall quickly, and even strong ones stay exposed to sustained automated guessing across millions of combinations.

**THE SOLUTION:** Disable password login entirely and only allow SSH key authentication. SSH keys are effectively impossible to guess: even a compact Ed25519 key carries far more randomness than any password a person would type. Once configured, anyone trying to log in with a password is rejected instantly - they never get a chance to guess. This is a one-time setup during server provisioning.

**THE FIX:**
```
# /etc/ssh/sshd_config
PasswordAuthentication no
PubkeyAuthentication yes
PermitRootLogin prohibit-password

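# Validate the config, then apply changes
# (if the sshd unit is not found, as on some Debian/Ubuntu releases, restart "ssh" instead)
# Keep your current session open and test a new login before closing it
sudo sshd -t
sudo systemctl restart sshd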
```
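
A quick way to confirm the settings above are in place is to audit the config file. The sketch below is a hypothetical helper (`sshd_directive` and `audit_sshd_config` are illustrative names, not part of OpenSSH). It assumes simple `Keyword value` lines and does not follow `Include` files or `Match` blocks, so treat `sudo sshd -T` as the authoritative view of the effective configuration.

```shell
# Print the first value set for a directive in an sshd_config-style file.
# sshd uses the first value it reads for each keyword; keywords are case-insensitive.
sshd_directive() {
  awk -v key="$2" '
    /^[[:space:]]*#/ { next }                      # skip comment lines
    tolower($1) == tolower(key) { print tolower($2); exit }
  ' "$1"
}

# Return non-zero if password login or root password login is still allowed.
# Assumption: a missing PasswordAuthentication line counts as a failure,
# because the OpenSSH default for it is "yes".
audit_sshd_config() {
  local file="${1:-/etc/ssh/sshd_config}" status=0
  if [ "$(sshd_directive "$file" PasswordAuthentication)" != "no" ]; then
    echo "FAIL: PasswordAuthentication is not set to no"
    status=1
  fi
  if [ "$(sshd_directive "$file" PermitRootLogin)" = "yes" ]; then
    echo "FAIL: PermitRootLogin is set to yes"
    status=1
  fi
  [ "$status" -eq 0 ] && echo "OK: password logins are disabled in $file"
  return "$status"
}
```

Usage: `audit_sshd_config /etc/ssh/sshd_config`. On images that ship drop-in files under `/etc/ssh/sshd_config.d/`, a drop-in can override the main file, which is why the `sshd -T` check is worth running as well.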

### 9.2. fail2ban

**THE RISK:** Even with SSH key-only authentication, bots will keep trying to connect. Each failed attempt consumes CPU for the handshake and rejection. Thousands of simultaneous attempts can spike CPU to 100% and crash your server. This happened in production - the server went down from the sheer volume of rejected connections.

**THE SOLUTION:** Install fail2ban, which monitors your SSH log for failed login attempts. After a set number of failures (say 5) from the same IP within an hour, it bans that IP at the firewall level for 24 hours. The banned IP's connections are dropped before they even reach SSH, so sshd spends no CPU on them. Real-world results: 150+ IPs banned at any time, 6,000+ blocked attempts per week.

**THE FIX:**
```
sudo apt install fail2ban

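# /etc/fail2ban/jail.local
# Keep comments on their own lines: a trailing "#" is read as part of the value
[sshd]
enabled = true
maxretry = 5
# 1-hour observation window, in seconds
findtime = 3600
# 24-hour ban, in seconds
bantime = 86400

# Apply changes
sudo systemctl restart fail2ban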
```
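
To get a feel for what the `maxretry` threshold catches, the sketch below counts rejected login attempts per source IP in an sshd log. It is a hypothetical helper, not how fail2ban works internally: it assumes log lines of the form `Invalid user NAME from IP port N`, handles IPv4 only, and ignores the `findtime` window.

```shell
# Read sshd log lines on stdin; print "IP count" for every IPv4 address
# with at least MAXRETRY "Invalid user" attempts (default 5).
offenders() {
  local maxretry="${1:-5}"
  grep -oE 'Invalid user .* from [0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' \
    | awk '{ print $NF }' \
    | sort \
    | uniq -c \
    | awk -v max="$maxretry" '$1 >= max { print $2, $1 }'
}
```

Usage, assuming a Debian/Ubuntu system that logs to `/var/log/auth.log`: `sudo cat /var/log/auth.log | offenders 5`. For the bans fail2ban has actually applied, `sudo fail2ban-client status sshd` is the source of truth.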

### 9.3. Firewall - Close Unnecessary Ports

**THE RISK:** By default, many ports may be open on your server. Each open port is a potential entry point for attackers. Database ports (5432 for Postgres, 3306 for MySQL), admin panels, and debug servers should never be reachable from the internet.

**THE SOLUTION:** Only three ports need to be open: 22 (SSH for server management), 80 (HTTP), and 443 (HTTPS). Close everything else. Configure the firewall at the cloud provider level (Hetzner Firewall, OCI Security Lists) so traffic is dropped before it reaches your server - zero CPU cost. Services that need to talk to each other should use Docker internal networking, not exposed ports.

**THE FIX:**
```
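# OS-level firewall (ufw)
# Deny all other inbound traffic; allow SSH before enabling so you are not locked out
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow 22/tcp
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw enable

# Also configure at cloud provider level
# (Hetzner Firewall / OCI Security Lists)
# Ports published by Docker bypass ufw rules, so the provider firewall is the backstop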
```
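
One way to check the result is to list what is actually listening. The sketch below is a hypothetical filter over `ss -tlnH` output (listening TCP sockets, numeric, no header) that flags anything bound to a non-loopback address on a port other than 22, 80, or 443. It assumes the local address is the fourth column, which is how `ss` prints it by default.

```shell
# Read `ss -tlnH` output on stdin; print "port address" for each listener
# that is not loopback-only and not on an allowed port (22, 80, 443).
unexpected_listeners() {
  awk '
    $1 != "LISTEN" { next }                       # ignore a header or other states
    {
      n = split($4, parts, ":")
      port = parts[n]                             # text after the last colon
      addr = substr($4, 1, length($4) - length(port) - 1)
      if (addr ~ /^127\./ || addr == "[::1]") next    # loopback only
      if (port == 22 || port == 80 || port == 443) next
      print port, addr
    }'
}
```

Usage: `ss -tlnH | unexpected_listeners`. An entry in the output is a candidate for exposure, not proof of it, since a firewall may still block the port; scanning from another machine is the definitive check.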

### 9.4. Non-Root User

**THE RISK:** When you first set up a server, you're logged in as root with full control over everything. If an application vulnerability is exploited while running as root, the attacker owns your entire system - every file, every service, every secret on the machine.

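**THE SOLUTION:** Create a dedicated non-root user with sudo access and run all your applications under that user. This limits the damage if something goes wrong - an attacker who compromises an application gets only that user's permissions and still needs the user's password to escalate through sudo. Docker, Coolify, and your applications should all run under this restricted user, not root. Keep in mind that membership in the `docker` group is effectively root-equivalent, so only add the accounts that need it.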

**THE FIX:**
```
# Create a deploy user with sudo access
adduser deploy
usermod -aG sudo deploy

# Lock the root password (first confirm the deploy user can log in and use sudo)
passwd -l root

# Run apps as this user, not root
```
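
Before locking root, the new user needs a working SSH key, or you can lock yourself out. The sketch below is a hypothetical helper (`install_authorized_key` is an illustrative name) that creates `.ssh/authorized_keys` with the strict permissions sshd expects and appends a public key only if it is not already present. It assumes it runs as the target user; if run as root, ownership would need fixing with `chown` afterwards.

```shell
# install_authorized_key HOME_DIR "PUBLIC KEY LINE"
# Creates HOME_DIR/.ssh (mode 700) and authorized_keys (mode 600),
# then appends the key unless an identical line already exists.
install_authorized_key() {
  local home_dir="$1" pubkey="$2"
  local ssh_dir="$home_dir/.ssh"
  local keys="$ssh_dir/authorized_keys"
  mkdir -p "$ssh_dir"
  chmod 700 "$ssh_dir"
  touch "$keys"
  chmod 600 "$keys"
  grep -qxF -- "$pubkey" "$keys" || printf '%s\n' "$pubkey" >> "$keys"
}
```

Usage, as the deploy user, with `deploy_key.pub` standing in for your own public key file: `install_authorized_key "$HOME" "$(cat deploy_key.pub)"`. Log in as that user from a second terminal before running `passwd -l root`.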

### 9.5. Docker Image Cleanup on Deploy Servers

**THE RISK:** Every Nixpacks/Docker build creates a 1-1.5GB image, and old images are never cleaned up automatically. On a typical VPS with 75GB disk, you can fill the entire disk in a single batch deployment session (40+ apps). When the disk fills up, everything stops - containers can't write logs, databases can't write WAL files, and new deploys fail silently.

**THE SOLUTION:** Run `docker image prune -f` between deployments to remove dangling (untagged) images; add `-a` to also remove tagged images that no container uses. For batch redeploy sessions (like updating all apps at once), prune after every 5-10 deploys. Set up a cron job or deploy hook to prune weekly. Monitor disk usage - if your server has less than 20% disk free after a deploy, prune immediately.

**THE FIX:**
```
# After each deploy or between batch deploys
docker image prune -f

# Check disk usage
df -h /

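# Nuclear option: removes ALL stopped containers, unused images, networks,
# build cache, and unused volumes (including any data stored in them)
docker system prune -af --volumes

# Cron job: weekly cleanup, Sundays at 03:00 (add to deploy user's crontab)
# The log goes in the user's home because /var/log is normally writable only by root
0 3 * * 0 docker image prune -f >> "$HOME/docker-prune.log" 2>&1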
```
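
The 20% free-space rule can be turned into a small deploy hook. The sketch below is one possible shape, with hypothetical function names (`free_pct_from_df`, `prune_if_low`); it assumes POSIX-format `df -P` output, where the fifth column is the used percentage, and that the calling user is allowed to run `docker`.

```shell
# Read `df -P` output on stdin; print the free-space percentage
# of the first filesystem listed (100 minus the Capacity column).
free_pct_from_df() {
  awk 'NR == 2 { sub(/%/, "", $5); print 100 - $5 }'
}

# prune_if_low [THRESHOLD] [MOUNT]
# Prune dangling images when free space on MOUNT is below THRESHOLD percent.
prune_if_low() {
  local threshold="${1:-20}" mount="${2:-/}" free
  free="$(df -P "$mount" | free_pct_from_df)"
  if [ "$free" -lt "$threshold" ]; then
    echo "Only ${free}% free on ${mount}; pruning images"
    docker image prune -f
  else
    echo "${free}% free on ${mount}; no prune needed"
  fi
}
```

Usage: call `prune_if_low 20 /` at the end of each deploy, or every few deploys during a batch session.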

*This is especially critical on Coolify/Hetzner setups with limited disk. A 75GB disk filled to 100% from 43 images in one session - learned the hard way.*
