Unlocking Seamless Connectivity with Tailscale

I may be a bit late to the party, but I recently found out about Tailscale. I was looking for a remote access solution for my lab, which sits behind my ISP’s Carrier Grade NAT (CGNAT). This means I don’t have a publicly accessible IP address, so I needed a solution that could work around that configuration. I had heard of Tailscale from a colleague and figured I’d give it a spin. In very little time I went from no remote access, to full remote access to my entire network, and then to a site-to-site tunnel to another colleague’s lab. This post outlines the various steps along the way.

Step 1: Remote Access

To get started, I wanted to install the Tailscale client on a system in my lab and on a laptop connected to a different network. In this basic configuration, I assumed I could treat the lab system as a jump box, connect to it with SSH or RDP, and reach other devices on my network from there. This was super easy… just install the OS-specific application on each system and bam! This was the easy button for setting up remote access; it just worked.

Step 2: Subnet Router

While reading the Tailscale documentation, I noticed a feature called a Subnet Router. It is designed for devices where the Tailscale client can’t be installed, like random network printers, and allows those devices to be reached from devices on the tailnet. I deployed an Ubuntu 20.04 VM to act as my subnet router. The install was straightforward; on the Linux VM I just needed to run a few commands from the console:

curl -fsSL https://tailscale.com/install.sh | sh

echo 'net.ipv4.ip_forward = 1' | sudo tee -a /etc/sysctl.d/99-tailscale.conf
echo 'net.ipv6.conf.all.forwarding = 1' | sudo tee -a /etc/sysctl.d/99-tailscale.conf
sudo sysctl -p /etc/sysctl.d/99-tailscale.conf

sudo tailscale up --advertise-routes=192.168.0.0/17 --accept-routes=true --snat-subnet-routes=false

The Tailscale documentation at https://tailscale.com/kb/1019/subnets was also very helpful in describing these commands/steps.
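Before advertising routes, it can be worth confirming that the sysctl settings above actually took effect. A minimal Python sketch, assuming a Linux host where the kernel exposes the live values under /proc/sys; the helper name forwarding_enabled is hypothetical:

```python
from pathlib import Path

# Live kernel values behind the two sysctl keys written to 99-tailscale.conf
IPV4_FORWARD = Path("/proc/sys/net/ipv4/ip_forward")
IPV6_FORWARD = Path("/proc/sys/net/ipv6/conf/all/forwarding")


def forwarding_enabled(path: Path) -> bool:
    """Return True if the sysctl file at `path` contains 1 (forwarding on)."""
    try:
        return path.read_text().strip() == "1"
    except FileNotFoundError:
        # Key not present (not Linux, or IPv6 disabled): treat as off
        return False


if __name__ == "__main__":
    for p in (IPV4_FORWARD, IPV6_FORWARD):
        print(f"{p}: {'on' if forwarding_enabled(p) else 'off'}")
```

If either value reads off after running sysctl -p, the subnet router will accept the advertised routes but will not pass traffic between tailscale0 and the LAN.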

Step 3: Site-to-Site connectivity

While setting up the subnet router, I noticed the docs had some details on site-to-site networking (https://tailscale.com/kb/1214/site-to-site). This looked very interesting, as I had previously wanted to set up cross-site networking to demo VMware Site Recovery Manager. The only caveat I saw in the documentation was:

This scenario will not work on subnets with overlapping CIDR ranges

I pinged a colleague to see if they would be interested in peering networks, and if so, what IP addresses they used in their lab. It turned out we had a few overlapping segments, but luckily the segments on my side were internal-only, non-routed networks (dedicated to storage and vMotion). I narrowed the subnets I advertised on my side, added an iptables rule to clamp the TCP MSS to the path MTU, and added a couple of static routes within the physical network, as described in the Tailscale docs. The updated lines on my subnet router look like this:

sudo iptables -t mangle -A FORWARD -i tailscale0 -o eth0 -p tcp -m tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu

sudo tailscale up --advertise-routes=192.168.10.0/24,192.168.32.0/20,192.168.127.0/24 --accept-routes=true --snat-subnet-routes=false
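The MSS clamp is there because packets crossing the tunnel have less room than packets on the LAN. As a worked example of the arithmetic, assuming tailscale0 runs with an MTU of 1280 bytes (check with ip link show tailscale0) and ignoring TCP options, the largest TCP payload per segment is the MTU minus the IP header and the 20-byte TCP header. The function name below is illustrative:

```python
# Header sizes without options, in bytes
IPV4_HEADER = 20
IPV6_HEADER = 40
TCP_HEADER = 20


def mss_for_mtu(mtu: int, ipv6: bool = False) -> int:
    """Largest TCP payload that fits in a single packet of size `mtu`."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - ip_header - TCP_HEADER


# Assumed MTUs: 1500 on the LAN interface, 1280 on the tailnet interface
for name, mtu in (("eth0", 1500), ("tailscale0", 1280)):
    print(f"{name}: MTU {mtu} -> IPv4 MSS {mss_for_mtu(mtu)}, IPv6 MSS {mss_for_mtu(mtu, ipv6=True)}")
```

The iptables man page describes --clamp-mss-to-pmtu in the same terms (path MTU minus 40 for IPv4, minus 60 for IPv6). The rule therefore lowers the MSS that hosts negotiate, instead of relying on Path MTU Discovery, which can stall when ICMP is filtered.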

My colleague also deployed a subnet router with a very similar configuration and then also added some routes to his physical network.

curl -fsSL https://tailscale.com/install.sh | sh

echo 'net.ipv4.ip_forward = 1' | sudo tee -a /etc/sysctl.d/99-tailscale.conf
echo 'net.ipv6.conf.all.forwarding = 1' | sudo tee -a /etc/sysctl.d/99-tailscale.conf
sudo sysctl -p /etc/sysctl.d/99-tailscale.conf

sudo iptables -t mangle -A FORWARD -i tailscale0 -o eth0 -p tcp -m tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu

sudo tailscale up --advertise-routes=192.168.55.0/24,192.168.60.0/24 --accept-routes=true --snat-subnet-routes=false
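The overlapping-CIDR caveat is easy to check before peering. A small sketch using Python’s standard ipaddress module, with the route lists copied from the tailscale up commands in this post. The original 192.168.0.0/17 advertisement (192.168.0.0 through 192.168.127.255) contains both of the colleague’s /24s, while the narrowed list does not:

```python
import ipaddress
from itertools import product

# Route lists copied from the tailscale up commands in this post
original_site_a = ["192.168.0.0/17"]
updated_site_a = ["192.168.10.0/24", "192.168.32.0/20", "192.168.127.0/24"]
site_b = ["192.168.55.0/24", "192.168.60.0/24"]


def overlapping_routes(side_a, side_b):
    """Return every (a, b) pair of CIDRs that share at least one address."""
    nets_a = [ipaddress.ip_network(c) for c in side_a]
    nets_b = [ipaddress.ip_network(c) for c in side_b]
    return [(str(a), str(b)) for a, b in product(nets_a, nets_b) if a.overlaps(b)]


print(overlapping_routes(original_site_a, site_b))
# [('192.168.0.0/17', '192.168.55.0/24'), ('192.168.0.0/17', '192.168.60.0/24')]
print(overlapping_routes(updated_site_a, site_b))
# []
```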

With these configurations in place, we could each ping devices on the other’s network. I’ve created a diagram which includes the routes created in the physical network as a reference.

As a test, from a device in my lab, I ran a traceroute to a device on my colleague’s network.

> tracert 192.168.60.10

Tracing route to 192.168.60.10 over a maximum of 30 hops

  1     1 ms     1 ms     1 ms  192.168.127.252
  2    <1 ms    <1 ms    <1 ms  192.168.40.31
  3    30 ms    24 ms    26 ms  100.105.105.105
  4    25 ms    24 ms    27 ms  192.168.60.10

Trace complete.
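Hop 3 in the trace above is the only address outside 192.168.x.x. Tailscale assigns tailnet addresses from 100.64.0.0/10, the RFC 6598 shared address space (the same block many ISPs use for CGNAT), so a quick way to spot where traffic enters the tailnet is to test each hop against that range. A minimal sketch with the hop list copied from the trace; the function name is illustrative:

```python
import ipaddress

# Tailscale assigns tailnet IPs from the 100.64.0.0/10 shared address space
TAILNET_RANGE = ipaddress.ip_network("100.64.0.0/10")

# Hop addresses copied from the trace above
HOPS = ["192.168.127.252", "192.168.40.31", "100.105.105.105", "192.168.60.10"]


def label_hop(addr: str) -> str:
    """Classify a hop as a tailnet address, a private LAN address, or something else."""
    ip = ipaddress.ip_address(addr)
    if ip in TAILNET_RANGE:
        return "tailnet"
    if ip.is_private:
        return "private LAN"
    return "other"


for number, hop in enumerate(HOPS, start=1):
    print(number, hop, label_hop(hop))
```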

As we can see, traffic goes to my lab’s Layer 3 switch, is sent to my Tailscale subnet router, which forwards it across the tailnet to a 100.105.x.x address (the tailnet address of my colleague’s subnet router), and then reaches the IP on the remote site.

With IP connectivity established, the next step was to make name resolution work. Since we both have Pi-hole DNS servers on our networks, this was accomplished by adding conditional forwarding on each of our Pi-hole servers. With conditional forwarding in place, we can query our own DNS servers for the other lab’s domain names, and they in turn query the correct server. In the Tailscale admin console, Split DNS is configured the same way: requests for my lab’s domains go to my DNS servers, and requests for their domains go to their DNS servers.
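Conditional forwarding and Split DNS both come down to the same rule: choose a resolver based on the domain suffix of the name being looked up. A minimal Python sketch of that decision, with hypothetical domain names and documentation-range resolver IPs standing in for the real labs. It illustrates the lookup rule only, not how Pi-hole or Tailscale implement it internally:

```python
# Hypothetical domains and resolver addresses (documentation ranges), for illustration only
SPLIT_DNS = {
    "mylab.example": "192.0.2.53",        # my lab's Pi-hole (hypothetical)
    "theirlab.example": "198.51.100.53",  # colleague's Pi-hole (hypothetical)
}
DEFAULT_RESOLVER = "203.0.113.1"  # hypothetical upstream resolver for everything else


def pick_resolver(name: str) -> str:
    """Return the resolver for `name`, preferring the most specific matching domain."""
    labels = name.rstrip(".").lower().split(".")
    # Walk from the full name toward the TLD: a.b.c, then b.c, then c
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in SPLIT_DNS:
            return SPLIT_DNS[suffix]
    return DEFAULT_RESOLVER


print(pick_resolver("esx01.theirlab.example"))  # 198.51.100.53
print(pick_resolver("vcenter.mylab.example"))   # 192.0.2.53
print(pick_resolver("host.notmylab.example"))   # 203.0.113.1
```

Matching whole labels rather than raw string endings is what keeps a name like host.notmylab.example from being sent to the mylab.example resolver.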

What’s great is that with this third step configured, we not only have site-to-site connectivity, but can also reach both networks while remote, thanks to the Tailscale client installed on my mobile device. For example, while connected to a mobile hotspot, not connected directly to either lab, I’m able to traceroute to devices on each network.

> tracert 192.168.60.10
Tracing route to 192.168.60.10 over a maximum of 30 hops
  1   486 ms    73 ms    76 ms  agb-vpnrtr01.tail1234.ts.net. [100.105.105.105]
  2   239 ms    80 ms    84 ms  192.168.60.10
Trace complete.


> tracert 192.168.127.30
Tracing route to CORE-CONTROL-21 [192.168.127.30]
over a maximum of 30 hops:
  1    56 ms    11 ms     8 ms  net-vpnrtr-01.tail1234.ts.net. [100.123.123.123]
  2    24 ms    37 ms    55 ms  192.168.40.1
  3    61 ms     9 ms    13 ms  CORE-CONTROL-21 [192.168.127.30]
Trace complete.

Summary

I was surprised how easy it was to set up Tailscale, even in a fairly complex network with overlapping address space. The documentation was easy to follow, the setup was quick, and performance has been very good. I set out to solve one specific problem, solved it in short order, and expanded the lab to an entirely different site along the way.

Posted in Lab Infrastructure, Virtualization | Leave a comment

Jenkins & Java Upgrade on Windows

One service that I use quite frequently in my lab is Jenkins. I have it running on a Windows VM with a variety of tasks, some running on a schedule and others triggered by webhooks. For example, I have a job that Aria Automation calls to create records on my Windows DNS server for Linux/nested ESXi builds.

I was recently looking in the Manage Jenkins section and noticed two issues: Jenkins needed an upgrade, and a warning stated “You are running Jenkins on Java 11, support for which will end on or after Sep 30, 2024.”

Updating Jenkins was super easy. I created a snapshot of the VM, in case things went sideways, and then pushed the button to update Jenkins from within the web console. This took care of itself, and when Jenkins restarted it was current. I let this sit for a few days, ran a variety of tests, and when I was happy that everything was stable I deleted the VM snapshot.

The second task was to update Java, and I decided to do this a few days after the Jenkins update. That way, if something went wrong, it would be easier to tell whether it was a Jenkins or a Java issue. I’m glad I did, as I ran into two issues when updating Java, described below.

To start the upgrade process, I downloaded the latest version of Java 17 JDK from https://adoptium.net/temurin/releases/?os=windows&arch=x64&version=17. I also backed up my D:\Program Files\Jenkins and C:\Users\svc-jenkins\AppData\Local\Jenkins folders. I had done this prior to updating Jenkins and decided it would be wise to do again for the Java update. I then took a snapshot of the virtual machine as one last restore point.

With backups in place, I stopped the Jenkins service (from services.msc on the Windows VM) and uninstalled Java JDK 11 from Add/Remove Programs. This is the only application using Java, so I wasn’t worried about other application dependencies. I then installed JDK 17 into D:\Program Files\Eclipse Adoptium\jdk-17.0.9.9-hotspot, selecting the options to add an entry to my PATH variable, associate JAR files, and set the JAVA_HOME variable.

After the installation completed, I attempted to start the Jenkins service, but it stopped immediately. I then rebooted, as I had changed system environment variables and wanted to make sure those were in effect, but the service did not start on boot either. Since I knew that the path to java.exe had changed, I went looking for a Jenkins configuration that pointed at the old file system path. I found such an entry in D:\Program Files\Jenkins\jenkins.xml and updated the <executable> location. After doing so the service started successfully; however, I could only access it locally from the server console and not from a remote machine.
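If the Java path changes again, the same edit can be scripted. A minimal Python sketch, assuming the service definition in jenkins.xml has an <executable> element directly under its root element; the sample XML and the old java.exe path are placeholders, not the contents of the real file. Back up the file first and restart the service afterwards:

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical jenkins.xml; the old java.exe path is a placeholder
SAMPLE_XML = """<service>
  <id>jenkins</id>
  <executable>D:\\OldJava\\jdk-11\\bin\\java.exe</executable>
</service>"""

NEW_JAVA = r"D:\Program Files\Eclipse Adoptium\jdk-17.0.9.9-hotspot\bin\java.exe"


def set_executable(xml_text: str, java_path: str) -> str:
    """Return the service XML with its <executable> element pointed at java_path."""
    root = ET.fromstring(xml_text)
    exe = root.find("executable")
    if exe is None:
        raise ValueError("no <executable> element found under the root element")
    exe.text = java_path
    # ElementTree drops XML comments when parsing and may reformat the document
    return ET.tostring(root, encoding="unicode")


print(set_executable(SAMPLE_XML, NEW_JAVA))
```

Because ElementTree discards comments, a hand edit like the one described above may be preferable when the file contains notes worth keeping.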

I checked the Windows firewall and found an inbound rule for Jenkins that was restricted to a single program: the previous Java path used to run Jenkins. I updated the ‘This program’ value on the Programs and Services tab to D:\Program Files\Eclipse Adoptium\jdk-17.0.9.9-hotspot\bin\java.exe, which resolved the remote access issue.

Now my Jenkins & Java versions are up-to-date and everything is working as expected. Hopefully this article helps someone else who runs into issues with this upgrade.

Posted in Lab Infrastructure, Scripting | 1 Comment

Keeping pihole up to date with Aria Automation Config

I’ve recently begun keeping components of my lab up to date using Aria Automation Config. I’ve scheduled a daily job to inventory Linux packages that need to be updated and a weekly task to update Linux VMs and reboot if necessary. Both of these tasks leave a paper trail showing what updates were made, so I can refer back to them if needed.

I was recently checking the pihole admin interface and noticed text at the bottom of the page that said ‘Update available!’ Updating is easy: just SSH into the appliance and run pihole -up. However, since I’m keeping other systems up to date automatically, I wanted to add this service to the mix.

I debated whether I should tack this process onto the end of the current OS update state file or create a new state. I opted for the second option, but wrote the state so that it could run on any system and only run the command if pihole is present. I created a new state file named /updates/pihole.sls with the following contents:

{%- if salt['file.file_exists']('/usr/local/bin/pihole') %}
Update-pihole:
  cmd.run:
    - name: /usr/local/bin/pihole updatePihole
{%- endif %}

This is a pretty basic state: it checks for the presence of the pihole script, and if found, runs it with the updatePihole argument (the long form of pihole -up).

Before running the state on a test system, the footer looked like:

Pi-hole v5.17.2 FTL v5.23 Web Interface v5.20.2 · Update available!

The stdout of the minion return stated:
[i] Update local cache of available packages…\r\u001b[K [✓] Update local cache of available packages\n [i] Existing PHP installation detected : PHP version 7.3.31-1~deb10u5\n [i] Checking for git…\r\u001b[K [✓] Checking for git\n [i] Checking for iproute2…\r\u001b[K [✓] Checking for iproute2\n [i] Checking for dialog…\r\u001b[K [✓] Checking for dialog\n [i] Checking for ca-certificates…\r\u001b[K [✓] Checking for ca-certificates\n\n [i] Checking for updates…\n [i] Pi-hole Core:\tup to date\n [i] Web Interface:\tupdate available\n [i] FTL:\t\tup to date\n\n [i] Pi-hole Web Admin files out of date, updating local repo.\n [i] Check for existing repository in /var/www/html/admin…\r\u001b[K [✓] Check for existing repository in /var/www/html/admin\n [i] Update repo in /var/www/html/admin…HEAD is now at be05b0f v5.21 (#2860)\n\r\u001b[K [✓] Update repo in /var/www/html/admin\n\n [i] If you had made any changes in '/var/www/html/admin/', they have been stashed using 'git stash'\n [i] Local version file information updated.
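The raw stdout above is hard to read because the installer redraws its progress lines with a carriage return followed by ESC[K (erase to end of line), which the return shows as \r\u001b[K. A minimal Python sketch, assuming the string has already been JSON-decoded (for example with json.loads on the job return) so those escapes are real control characters. It keeps only what a terminal would end up displaying on each line:

```python
import re

# ESC [ K: "erase to end of line", used by the installer to redraw progress lines
ERASE_TO_EOL = re.compile(r"\x1b\[K")


def render_progress_output(raw: str) -> str:
    """Approximate the terminal view: per line, keep the text after the last carriage return."""
    cleaned = ERASE_TO_EOL.sub("", raw)
    return "\n".join(line.split("\r")[-1] for line in cleaned.split("\n"))


# Two fragments taken from the minion return above
sample = " [i] Checking for git…\r\x1b[K [✓] Checking for git\n [i] FTL:\t\tup to date"
print(render_progress_output(sample))
```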

After the state.apply operation completed, I refreshed the web interface and the footer changed to:

Pi-hole v5.17.2 FTL v5.23 Web Interface v5.21

We can see that the web interface was updated from v5.20.2 to v5.21.
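Comparing the footer before and after a run is a quick way to confirm what changed. A small sketch that parses footers like the two above into version tuples and reports which components moved. The regular expression is an assumption fitted to these two footer strings, not a documented format. Tuples compare numerically, so cases like 5.9 versus 5.10, which plain string comparison gets wrong, come out right:

```python
import re

# "<component> v<version>", e.g. "FTL v5.23" or "Web Interface v5.20.2" (fitted to the footers above)
COMPONENT = re.compile(r"([A-Za-z][A-Za-z -]*?)\s+v(\d+(?:\.\d+)*)")


def parse_footer(footer: str) -> dict:
    """Map each component name in a Pi-hole footer to its version as a tuple of ints."""
    return {name: tuple(int(part) for part in version.split("."))
            for name, version in COMPONENT.findall(footer)}


def changed_components(before: str, after: str) -> dict:
    """Return {component: (old_version, new_version)} for components whose version changed."""
    old, new = parse_footer(before), parse_footer(after)
    return {name: (old.get(name), ver) for name, ver in new.items() if old.get(name) != ver}


before = "Pi-hole v5.17.2 FTL v5.23 Web Interface v5.20.2 · Update available!"
after = "Pi-hole v5.17.2 FTL v5.23 Web Interface v5.21"
print(changed_components(before, after))  # {'Web Interface': ((5, 20, 2), (5, 21))}
```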

I created a job to apply this state file, then created two schedules to stagger the patching to different minions on different days. This was a pretty quick solution to keeping the pihole software up to date on a schedule, using the centralized scheduling & reporting of Aria Automation Config.

Posted in Lab Infrastructure, Scripting, Virtualization | Leave a comment

Keeping Linux up to date with Aria Automation Config — part 2

In a recent post (available here), we created a simple Aria Automation Config (formerly SaltStack Config) state file which reported on and applied available Linux OS updates. In this post we’ll make a minor change to this state file.

After creating the previous state, which applies available updates every Saturday morning, I noticed that logging into Linux VMs would sometimes return the message *** System restart required ***. I found that this text comes from the file /var/run/reboot-required, which is created when a package requires a system restart.

I’ve modified the state file applied by my scheduled job to accommodate this reboot as shown below:

update_pkg:
{% if grains['os'] == 'VMware Photon OS' %}
  pkg.uptodate:
    - refresh: True
{% else %}
  pkg.uptodate:
    - refresh: True
    - dist_upgrade: True
{% endif %}

{# Check if the system requires a reboot, and if so schedule it to happen in the next 15 minutes, randomized to prevent a boot storm #}
{# onlyif is evaluated when this state runs (after update_pkg), unlike a Jinja check, which runs at render time before any updates; range(1,16) yields 1-15 #}
Reboot-if-needed:
  module.run:
    - name: system.reboot
    - tgt: {{ grains.id }}
    - at_time: {{ range(1,16) | random }}
    - onlyif: test -f /var/run/reboot-required
    - require:
      - pkg: update_pkg

In this version, we continue to use pkg.uptodate to apply updates, but after doing so we check for the presence of /var/run/reboot-required. If found, we schedule a system reboot to happen at least one minute in the future (to give the salt-minion time to report back). In this case we are randomizing the time of these reboots to minimize a boot storm, with a maximum future time of 15 minutes.
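The per-minion decision the state makes can be modeled in a few lines. A minimal Python sketch (not Salt code; the function name and the max_minutes parameter are illustrative) that returns a random delay only when the flag file exists. Note that range() in both Jinja and Python excludes its end value, so range(1, 15) yields 1 through 14; random.randint, used here, includes both ends:

```python
import random
from pathlib import Path
from typing import Optional

# On Debian/Ubuntu, packages that need a restart create this flag file
REBOOT_FLAG = Path("/var/run/reboot-required")


def reboot_delay(flag: Path = REBOOT_FLAG, max_minutes: int = 15,
                 rng: Optional[random.Random] = None) -> Optional[int]:
    """Return a random delay in minutes (1..max_minutes) if a reboot is pending, else None."""
    if not flag.exists():
        return None
    rng = rng or random.Random()
    # randint includes both ends, unlike range(), whose end value is excluded
    return rng.randint(1, max_minutes)


print(reboot_delay())  # None unless this machine has a pending reboot
```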

Posted in Lab Infrastructure, Scripting, Virtualization | Leave a comment

vSphere Custom Images & How to Compare Image Profiles

Occasionally there is a need to create a custom ESXi image as either an installable ISO or a depot/zip bundle. For example, when setting up a new host, you may wish to automatically include specific drivers for a particular network card or storage adapter. There are a variety of ways to do this.

PowerCLI Image Builder Cmdlets

PowerCLI has been able to create custom images for many years. In this example, I plan to combine the ESXi 8.0 Update 2 image from VMware with the HPE Server addon (from https://www.hpe.com/us/en/servers/hpe-esxi.html). This specific image combination is already available directly from HPE, but the steps to manually combine the bundles should be the same if the combination is not available, for example if we wanted to include 8.0u2x (where x is a lettered patch release).

The first step is to get our two files, the stock VMware image (VMware-ESXi-8.0U2-22380479-depot.zip) and the HPE addon (HPE-802.0.0.11.5.0.6-Oct2023-Addon-depot.zip). We will add both of these depots to a PowerCLI session using the following:

Add-EsxSoftwareDepot -DepotUrl '.\VMware-ESXi-8.0U2-22380479-depot.zip','.\HPE-802.0.0.11.5.0.6-Oct2023-Addon-depot.zip'

When these depots are added, the depot URL will appear on screen, in the format zip:<localpath>depot.zip?index.xml. We’ll want to note the path listed for the HPE addon, as we will use it again shortly. With these depots added we can now query for image profiles. Only the ESXi image will have profiles, but there are likely multiple versions and we want to see what is available.

Get-EsxImageProfile

Name                           Vendor          Last Modified   Acceptance Level
----                           ------          -------------   ----------------
ESXi-8.0U2-22380479-no-tools   VMware, Inc.    9/4/2023 10:... PartnerSupported
ESXi-8.0U2-22380479-standard   VMware, Inc.    9/21/2023 12... PartnerSupported

As mentioned, multiple profiles are available: one includes VMware Tools (standard) and the other does not (no-tools). We will make a copy of the standard profile:

$newProfile = New-EsxImageProfile -CloneProfile 'ESXi-8.0U2-22380479-standard' -Name 'ESXi-8.0U2-22380479_HPE-Oct2023' -Vendor 'HPE'

We will now add all of the HPE addons to the copy of our image profile. This is where we’ll need that local depot path mentioned above.

Add-EsxSoftwarePackage -ImageProfile $newProfile -SoftwarePackage (Get-EsxSoftwarePackage -SoftwareDepot zip:D:\tmp\custom-image\HPE-802.0.0.11.5.0.6-Oct2023-Addon-depot.zip?index.xml)

In this example we added all of the packages from the depot, but we could have included only a subset of specific VIBs by name if desired. We could have also included other VIBs from different depots (for example, from a compute vendor AND other VIBs from a storage vendor).

With our custom image created, combining the VMware and HPE bits, we can now export as ISO or Bundle (ZIP). In this example I’ll export both. The Bundle (ZIP) will be used for some comparisons later.

Export-EsxImageProfile -ImageProfile $newProfile -ExportToIso -FilePath 'PowerCLI_ESXi-8.0U2-22380479_HPE-Oct2023.iso'
Export-EsxImageProfile -ImageProfile $newProfile -ExportToBundle -FilePath 'PowerCLI_ESXi-8.0U2-22380479_HPE-Oct2023.zip'

vCenter Image Managed Clusters

Starting in vSphere 7, clusters can be managed with a single image, and that image can be customized in the web interface. The screenshot below is from the workflow that comes up when creating a new cluster; we just need to pick the values from the provided drop-down lists.

Similar to the above PowerCLI example, we are going to create an image that combines the ESXi 8.0 U2 build with a specific HPE Vendor Add-on (802.0.0.11.5.0-6). Once the cluster creation is complete, the image can be exported from the UI. Select the ellipsis (…) > Export, then choose JSON (for a file showing the selections made), ISO (for an image that can be used for installation), or ZIP (for updating an existing installation). I’m going to download a ZIP to be used in the next step. This results in a file named OFFLINE_BUNDLE_52d9502b-7076-7cb2-49b9-cbee13c57f0a.zip.

Comparing Images

The above two processes attempted to create similar images with identical components (the same ESXi image and HPE addon). We may need to compare images like these, either by comparing the depot files or by comparing a depot file to a running ESXi host. This section focuses on those comparisons.

Since we have two ZIP archive files, the first inclination might be to simply compare the file size or MD5 checksum. However, if we look at the file size (the Length property below), we’ll notice that the files differ slightly. This difference can be explained by a number of things, such as the different strings used for various names.

Get-ChildItem PowerCLI*.zip,offline*.zip | Select-Object Name, Length

Name                                                       Length
----                                                       ------
PowerCLI_ESXi-8.0U2-22380479_HPE-Oct2023.zip            686582727
OFFLINE_BUNDLE_52d9502b-7076-7cb2-49b9-cbee13c57f0a.zip 686552303

What we really need to do is compare the VIB contents of these bundles to see if any files are missing or versions are inconsistent. This is easy to do in PowerCLI. The first step is to import these depots into our session:

Add-EsxSoftwareDepot PowerCLI_ESXi-8.0U2-22380479_HPE-Oct2023.zip,OFFLINE_BUNDLE_52d9502b-7076-7cb2-49b9-cbee13c57f0a.zip

With both bundles imported, we can check which image profiles are available. We should see two: one from Lifecycle Manager and the other using the name specified in our PowerCLI example. In this step we’ll create a variable for each profile to be used later:

Get-EsxImageProfile

Name                           Vendor          Last Modified   Acceptance Level
----                           ------          -------------   ----------------
VMware Lifecycle Manager Ge... VMware, Inc.    11/20/2023 4... PartnerSupported
ESXi-8.0U2-22380479_HPE-Oct... HPE             11/20/2023 5... PartnerSupported


$ipLCM = Get-EsxImageProfile -Name 'VMware Lifecycle Manager*'
$ipPCLI = Get-EsxImageProfile -Name 'ESXi-8.0U2-2*'

If we dig into the image profiles, we’ll find that each has a VibList property containing the included VIBs. Digging deeper, we’ll see that each VIB has a Guid that combines the vendor, VIB name, and version (for example, $ipLCM.VibList.Guid returns the list for one profile; a sample row looks like VMware_bootbank_esx-base_8.0.2-0.0.22380479). Now that we have a field with details on each VIB, we can have PowerShell compare them. The first command below will likely return nothing; the second should return all VIBs from our bundle:

Compare-Object $ipLCM.VibList.Guid $ipPCLI.VibList.Guid

Compare-Object $ipLCM.VibList.Guid $ipPCLI.VibList.Guid -IncludeEqual

With the above, we can confirm that our two bundles (ZIP files) have the same contents.
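For a quick check outside PowerCLI, the same idea can be applied directly to the ZIP files. A minimal Python sketch, assuming each bundle stores its VIB payloads as .vib entries inside the archive and that identical file names imply identical VIB names and versions. Unlike the PowerCLI comparison, it does not read the bundle metadata, so treat it as a first pass rather than a replacement. The side indicators mimic Compare-Object:

```python
import zipfile
from pathlib import PurePosixPath


def vib_files(bundle_path) -> set:
    """Return the .vib file names (folders stripped) stored in an offline bundle ZIP."""
    with zipfile.ZipFile(bundle_path) as zf:
        return {PurePosixPath(name).name for name in zf.namelist()
                if name.lower().endswith(".vib")}


def compare_bundles(left, right) -> list:
    """Compare-Object style: '<=' means only in left, '=>' means only in right."""
    a, b = vib_files(left), vib_files(right)
    return sorted([(vib, "<=") for vib in a - b] + [(vib, "=>") for vib in b - a])


# Usage with the two bundles from this post (an empty list means the VIB file sets match):
# compare_bundles("PowerCLI_ESXi-8.0U2-22380479_HPE-Oct2023.zip",
#                 "OFFLINE_BUNDLE_52d9502b-7076-7cb2-49b9-cbee13c57f0a.zip")
```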

Another question I’ve heard is whether we can confirm that a running ESXi host matches this bundle, or whether any changes are required. One option is esxcli software profile update --dry-run (documented here: https://docs.vmware.com/en/VMware-vSphere/8.0/vsphere-esxi-upgrade/GUID-8F2DE2DB-5C14-4DCE-A1EB-1B08ACBC0781.html). However, that typically requires the new bundle to be copied to the host. Since we already have this bundle locally, imported into a PowerCLI session, we can ask the ESXi host for a list of VIBs and do the comparison locally.

$esxcliVibs = (Get-EsxCli -VMHost 'test-vesx-71' -V2).software.vib.list.invoke()
Compare-Object $ipLCM.VibList.Guid $esxcliVibs.ID

The above example retrieves the list of VIBs from an ESXi host, then compares each ID value to the Guid values from the imported image. Any discrepancies will be listed. As with the earlier comparison of the two image files, we can add the -IncludeEqual switch to confirm that the command is actually returning data, since it will then list all of the VIBs instead of nothing.
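Compare-Object reports a VIB whose version differs as two separate rows, one per side. Grouping by VIB name makes that kind of drift easier to read. A minimal Python sketch, assuming IDs follow the vendor_bootbank_name_version pattern of the sample Guid shown earlier, with no underscores in the vendor or version fields. The two input lists could be exported from PowerShell, for example by writing $ipLCM.VibList.Guid and $esxcliVibs.ID to text files with Set-Content:

```python
from typing import Dict, Iterable, Optional, Tuple


def parse_vib_id(vib_id: str) -> Tuple[str, str, str]:
    """Split 'VMware_bootbank_esx-base_8.0.2-0.0.22380479' into (vendor, name, version)."""
    vendor, _bank, rest = vib_id.split("_", 2)  # _bank is e.g. 'bootbank'
    name, version = rest.rsplit("_", 1)
    return vendor, name, version


def versions_by_name(vib_ids: Iterable[str]) -> Dict[str, str]:
    """Map VIB name -> version for a list of IDs."""
    return {name: version for _vendor, name, version in map(parse_vib_id, vib_ids)}


def version_drift(image_ids: Iterable[str],
                  host_ids: Iterable[str]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {name: (image_version, host_version)} where they differ; None means absent on that side."""
    image, host = versions_by_name(image_ids), versions_by_name(host_ids)
    return {name: (image.get(name), host.get(name))
            for name in sorted(image.keys() | host.keys())
            if image.get(name) != host.get(name)}
```

An empty result means every VIB name in the image is on the host at the same version, and vice versa.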

Posted in Scripting, Virtualization | Leave a comment