| Both sides previous revision Previous revision Next revision | Previous revision | ||
| doc:recipes:high-availability [2017/01/25 01:14] – [4. Configure conntrackd] aaronhauck | docs:guide-user:network:high-availability [2023/02/04 18:31] (current) – fixed missing 'globals' in alt config file lines nathhad | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| + | :!: Most of this assumes you are familiar with OpenWrt and basic networking concepts, and are comfortable working on the command line :!: | ||
| + | ====== High availability ====== | ||
| + | |||
| + | //High availability// | ||
| + | |||
| + | This page describes a simple two router setup, in an active/ | ||
| + | The two devices will share a virtual IP address that hosts on the LAN can use as a gateway to reach the internet. | ||
| + | If the active router fails or is rebooted, the backup router takes over. | ||
| + | |||
| + | We're using keepalived to implement healthchecking and ip failover, and conntrack-tools to implement firewall/ | ||
| + | |||
| + | Most (but not all) of the required OpenWrt configuration can also be done from the LuCI web UI. | ||
| + | ===== Preparation, | ||
| + | |||
| + | * You have two OpenWrt routers and a static WAN IP (this could also be a private IP + DMZ). | ||
| + | * If you're not doing NAT or connection tracking based firewalling, | ||
| + | * DHCP dynamic WAN IP is possible with keepalived, but requires extra scripting and is not going to be described here. | ||
| + | * Failing over VPNs and tunnel setups is not covered. | ||
| + | * Failing over a PPPoE WAN is not implemented here. Your best bet is to let the modem handle PPPoE and point its DMZ at your virtual WAN IP. | ||
| + | |||
| + | |||
| + | ===== Individual Router Configuration ===== | ||
| + | |||
| + | ==== 1. Configure 1st openwrt router ==== | ||
| + | |||
| + | * Internal LAN ip: 192.168.1.2/ | ||
| + | * WAN IP, gateway: static 192.168.0.2/ | ||
| + | * DHCP on defaults is fine, we'll configure it later. | ||
| + | |||
| + | ==== 2. Configure 2nd openwrt router ==== | ||
| + | |||
| + | * Interface LAN ip: 192.168.1.3/ | ||
| + | * WAN IP, gateway: static 192.168.0.3/ | ||
| + | * DHCP on defaults is fine for now, if you have any static leases in dhcp, or fixed host entries, make sure they' | ||
| + | |||
| + | == verification and troubleshooting == | ||
| + | * change a client to use gw 192.168.1.3 and dns 192.168.1.3, | ||
| + | * Hosts whose IPs were issued by one dnsmasq instance might not be resolvable through the other dnsmasq instance; assigning static leases helps. | ||
| + | |||
| + | ===== Both router configuration ===== | ||
| + | |||
| + | ==== 3. Configure keepalived ==== | ||
| + | |||
| + | **keepalived** is a Linux daemon that uses VRRP (Virtual Router Redundancy Protocol) to health-check the routers on the network and elect the one that will serve a particular IP. We'll be using a small subset of its features in our use case. | ||
| + | |||
| + | '' | ||
| + | |||
| + | Much work has been done to set up keepalived to use a uci config file, however this config file format has not yet been documented. The following example uses a keepalived.conf configuration, | ||
| + | |||
| + | The following configuration in ''/ | ||
| + | You will need to adjust the interfaces to match your device. | ||
| + | |||
| + | < | ||
| + | ! Configuration File for keepalived | ||
| + | |||
| + | ! failover E1 and I1 at the same time | ||
| + | vrrp_sync_group G1 { | ||
| + | group { | ||
| + | E1 | ||
| + | I1 | ||
| + | } | ||
| + | } | ||
| + | |||
| + | ! internal | ||
| + | vrrp_instance I1 { | ||
| + | state backup | ||
| + | interface br-lan | ||
| + | virtual_router_id 51 | ||
| + | priority 101 | ||
| + | advert_int 1 | ||
| + | virtual_ipaddress { | ||
| + | 10.9.8.4/24 | ||
| + | } | ||
| + | authentication { | ||
| + | auth_type PASS | ||
| + | auth_pass s3cret | ||
| + | } | ||
| + | nopreempt | ||
| + | } | ||
| + | |||
| + | ! external | ||
| + | vrrp_instance E1 { | ||
| + | state backup | ||
| + | interface eth0.2 | ||
| + | virtual_router_id 51 | ||
| + | priority 101 | ||
| + | advert_int 1 | ||
| + | virtual_ipaddress { | ||
| + | 192.168.0.4/ | ||
| + | } | ||
| + | virtual_routes { | ||
| + | src 192.168.0.4 to 0.0.0.0/0 via 192.168.0.1 dev eth0.2 metric 5 | ||
| + | } | ||
| + | authentication { | ||
| + | auth_type PASS | ||
| + | auth_pass s3cret | ||
| + | } | ||
| + | nopreempt | ||
| + | } | ||
| + | </ | ||
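To see which router currently holds a virtual IP, you can look for the address on the router's interfaces. The following is a minimal sketch, assuming the ''ip'' utility is installed and using the external address 192.168.0.4 from the config above. The helper name ''holds_vip'' is made up for illustration and is not part of keepalived.

```shell
#!/bin/sh
# holds_vip VIP
# Reads the output of `ip -4 addr show` on stdin and succeeds (exit 0)
# if VIP is assigned to any interface on this router.
# Hypothetical helper for illustration, not part of keepalived.
holds_vip() {
    vip="$1"
    # Match "inet <vip>/" as a fixed string so that 192.168.0.4
    # does not also match 192.168.0.40.
    grep -qF "inet ${vip}/"
}

# Usage on a router (adjust the address to your own setup):
#   ip -4 addr show | holds_vip 192.168.0.4 && echo "this router is active"
```

Running the usage line on both routers should report exactly one of them as active. If both or neither report it, check that the VRRP advertisements actually reach the other router.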
| + | |||
| + | To ensure `/ | ||
| + | |||
| + | < | ||
| + | config global_defs ' | ||
| + | | ||
| + | </ | ||
| + | |||
| + | In 21.02 and later: | ||
| + | |||
| + | < | ||
| + | config globals ' | ||
| + | | ||
| + | </ | ||
| + | |||
| + | This will tell the keepalived service to use the configuration file you wrote at / | ||
| + | |||
| + | ==== 4. Configure conntrackd ==== | ||
| + | |||
| + | This step is optional: keepalived fails over the IP address with or without conntrackd. However, NAT relies on a connection-tracking table that maps each external ip:port to an internal ip:port (per protocol, TCP or UDP), so established connections may break when traffic fails over to the backup OpenWrt instance. New connections (such as application-level reconnects) will work just fine. | ||
| + | This is because, without conntrackd, the backup instance does not have the active router's connection table and so does not know which internal host the packets of an existing connection belong to. | ||
| + | |||
| + | Below is a simple config file for conntrackd. It would be advisable to navigate to / | ||
| + | |||
| + | < | ||
| + | |||
| + | Sync { | ||
| + | Mode FTFW { | ||
| + | DisableExternalCache Off | ||
| + | CommitTimeout 1800 | ||
| + | PurgeTimeout 5 | ||
| + | } | ||
| + | |||
| + | UDP { | ||
| + | IPv4_address "ip addr of host router" | ||
| + | IPv4_Destination_Address "ip addr of partner router" | ||
| + | Port 3780 | ||
| + | Interface eth* | ||
| + | SndSocketBuffer 1249280 | ||
| + | RcvSocketBuffer 1249280 | ||
| + | Checksum on | ||
| + | } | ||
| + | } | ||
| + | |||
| + | General { | ||
| + | Nice -20 | ||
| + | HashSize 32768 | ||
| + | HashLimit 131072 | ||
| + | LogFile on | ||
| + | Syslog on | ||
| + | LockFile / | ||
| + | UNIX { | ||
| + | Path / | ||
| + | Backlog 20 | ||
| + | } | ||
| + | NetlinkBufferSize 2097152 | ||
| + | NetlinkBufferSizeMaxGrowth 8388608 | ||
| + | Filter From Userspace { | ||
| + | Protocol Accept { | ||
| + | TCP | ||
| + | UDP | ||
| + | ICMP # This requires a Linux kernel >= 2.6.31 | ||
| + | } | ||
| + | Address Ignore { | ||
| + | IPv4_address 127.0.0.1 # loopback | ||
| + | } | ||
| + | } | ||
| + | } | ||
| + | |||
| + | |||
| + | </ | ||
| + | |||
| + | Run these simple commands to verify that conntrackd is working: | ||
| + | |||
| + | < | ||
| + | |||
| + | # Dump conntrackd statistics (summary of tracked connections): | ||
| + | |||
| + | conntrackd -s | ||
| + | |||
| + | </ | ||
| + | |||
| + | < | ||
| + | |||
| + | # Request a resync from the other node: | ||
| + | |||
| + | conntrackd -n | ||
| + | |||
| + | </ | ||
| + | |||
| + | ==== 5. Configure dhcp ==== | ||
| + | |||
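| + | You'll want DHCP (dnsmasq) to serve the virtual IP address (192.168.0.4 in this example; use the virtual IP that keepalived assigns on your LAN interface) to hosts on the LAN, both as their gateway and as their DNS server. | ||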
| + | Here's an excerpt from ''/ | ||
| + | < | ||
| + | ... | ||
| + | config dhcp ' | ||
| + | ... | ||
| + | option force ' | ||
| + | list dhcp_option ' | ||
| + | list dhcp_option ' | ||
| + | ... | ||
| + | </ | ||
| + | option force ' | ||
| + | dhcp_option 3 is gateway, dhcp_option 6 is DNS. | ||
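The same two options can also be set from the command line with ''uci''. The sketch below only prints the commands so they can be reviewed before being run. The helper name ''dhcp_vip_commands'' is hypothetical, and the section name ''lan'' is an assumption (it is the usual default); check the section name in your own DHCP config.

```shell
#!/bin/sh
# dhcp_vip_commands VIP [SECTION]
# Prints the uci commands that would advertise VIP as the gateway
# (DHCP option 3) and as the DNS server (DHCP option 6).
# SECTION defaults to "lan", which is an assumption; adjust as needed.
# Hypothetical helper for illustration only; it changes nothing itself.
dhcp_vip_commands() {
    vip="$1"
    section="${2:-lan}"
    printf "uci add_list dhcp.%s.dhcp_option='3,%s'\n" "$section" "$vip"
    printf "uci add_list dhcp.%s.dhcp_option='6,%s'\n" "$section" "$vip"
    printf 'uci commit dhcp\n'
}

# Usage: print the commands for review, then run them by hand.
#   dhcp_vip_commands 192.168.0.4
```

Printing rather than executing keeps the sketch safe to run anywhere. After applying the commands on a router, dnsmasq has to be restarted for the change to take effect.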
| + | |||
| + | Now we need to configure synchronization of the DHCP leases. Both devices run a DHCP server and both assign dynamic IPs to clients, but each one only updates its own DHCP lease list. | ||
| + | |||
| + | Dnsmasq stores current leases in a text file called **/ | ||
| + | |||
| + | This is what it looks like on my OpenWrt router VM: | ||
| + | < | ||
| + | root@VM-router: | ||
| + | 1633703346 00: | ||
| + | 1633703352 c4: | ||
| + | 1633703161 c0: | ||
| + | 1633703141 e8: | ||
| + | </ | ||
| + | The first number is a timestamp (seconds since Unix " | ||
| + | |||
| + | So we add a simple and dumb script that just merges the lease files of both devices at a regular interval. It assumes that dnsmasq will automatically drop entries once their lease is up. | ||
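The merge step can be made slightly less dumb by keeping only one entry per client. The sketch below is one possible refinement, not the script used on this page. It assumes the lease file layout described above, with the expiry timestamp in the first field and the MAC address in the second. The function name ''merge_leases'' is hypothetical.

```shell
#!/bin/sh
# merge_leases FILE...
# Merges dnsmasq lease files and prints the result on stdout.
# For each MAC address (field 2) only the entry with the latest
# expiry timestamp (field 1) is kept. Output is sorted by timestamp.
# Hypothetical helper for illustration only.
merge_leases() {
    awk '
        # Skip blank or malformed lines; remember the newest line per MAC
        NF >= 3 && (!($2 in expiry) || $1 + 0 > expiry[$2]) {
            expiry[$2] = $1 + 0
            line[$2] = $0
        }
        END { for (mac in line) print line[mac] }
    ' "$@" | sort -n
}

# Usage (both file names are placeholders):
#   merge_leases local.leases remote.leases > merged.leases
```

Keeping the newest entry per MAC means that a lease renewed on one router replaces the stale copy from the other instead of sitting next to it in the merged file.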
| + | |||
| + | We must do the following on both routers. | ||
| + | |||
| + | Import the public SSH key of router 1 into router 2 (and vice versa) so they can scp to each other without entering a password: | ||
| + | this to read the current public key [[docs: | ||
| + | and this to write the key [[docs: | ||
| + | |||
| + | Then copy the following script to **/ | ||
| + | |||
| + | < | ||
| + | #!/bin/sh | ||
| + | #syncs contents of dnsmasq dhcp leases | ||
| + | |||
| + | other_router=192.168.11.254 | ||
| + | |||
| + | scp root@$other_router:/ | ||
| + | |||
| + | cat / | ||
| + | |||
| + | mv / | ||
| + | </ | ||
| + | |||
| + | Then make it executable: | ||
| + | < | ||
| + | chmod u+x / | ||
| + | </ | ||
| + | Then add a scheduled task to execute this script every minute and enable cron (scheduled tasks) service. (can be done from luci as well [[docs: | ||
| + | |||
| + | < | ||
| + | echo '*/1 * * * * / | ||
| + | echo ' | ||
| + | service cron start | ||
| + | </ | ||
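Because these setup steps tend to be repeated while tinkering, it can help to append the cron entry only when it is not already there. The following is a minimal sketch of that idea; the helper name ''add_line_once'' and the paths in the usage comment are placeholders, not names taken from this page.

```shell
#!/bin/sh
# add_line_once FILE LINE
# Appends LINE to FILE unless an identical line is already present,
# so running the setup twice does not create duplicate entries.
# Hypothetical helper for illustration only.
add_line_once() {
    file="$1"
    line="$2"
    # Make sure the file exists so grep does not fail on a missing file
    touch "$file"
    # -x: match the whole line, -F: treat LINE as a fixed string
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage (crontab path and script path are placeholders):
#   add_line_once /path/to/crontab '*/1 * * * * /path/to/sync-script'
```

The same helper could be reused for any other file that is edited by appending lines, which keeps repeated runs of the setup harmless.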
| + | |||
| + | ==== 6. Add directories to the sysupgrade backup ==== | ||
| + | |||
| + | Add the following directories to ''/ | ||
| + | < | ||
| + | ... | ||
| + | / | ||
| + | / | ||
| + | / | ||
| + | </ | ||
| + | |||
| + | ===== Testing and verification ===== | ||
| + | |||
| + | TODO(risk): restarting keepalived with logread -f open, pulling cables with ssh / telnet / http sessions open, forcing dhcp renewal with tcpdump running, ensure | ||