When wondering how to pass the time, I decided to optimise my website. I took initial steps such as minifying my CSS and JS; however, in case you haven't seen it, my base site is actually very small (and in reality serves no purpose).
At the time my website was powered by 2 load balancers and 3 web servers located around the world. If you were curious, the setup was HAProxy, following this tutorial. Also, at this time, I used OpenBSD and had no idea how to use PF, so my HTTP connections were left exposed at the origin...
This worked; however, it wasn't as satisfying as saying "I deliver content from a local edge node". Not only that, but with big companies like Cloudflare having powered my websites from 2016 to 2019, the idea of creating my own "CDN" really appealed to me from both a privacy and a learning standpoint.
As it stood, my web payload was already tiny, and in practice the only real bottlenecks for loading times were DNS lookups and web server location. In fact, Google PageSpeed Insights suggested it was practically perfect (100%). But I knew it wasn't optimal for people viewing from Asia or America.
So the next logical step for me was to get some new PoPs. I am very thankful that LET is well known for its Black Friday deals! The aftermath of that was that I bought 4 VPS servers, 3 in America and 1 in Singapore. Shoutout to RackNerd and Cam for some awesome deals. At the time of writing, 2 of the American servers are not in use (in fact, 1 has not even been delivered, thanks Virmach).
What servers do I currently use?
- London, United Kingdom
- Gravelines, France
- Beauharnois, Canada
- Frankfurt, Germany
- Singapore, Singapore
- Falkenstein, Germany
- Helsinki, Finland
- Ashburn, Virginia, United States
- Dronten, The Netherlands (coming soon)
The next step was to figure out how I could utilise these. I actually considered GeoDNS from the very start; however, I needed to do further research before deciding it would be the route I took.
My criteria for a self-hosted CDN were the following:
- Cannot rely on external providers such as AWS (No thanks Route53)
- Had to be configured by myself from origin server to end server
- Had to be scalable. More on this later
- Some sort of learning involved
Posts such as this one from NGINX really helped to inspire me; I highly suggest you watch the presentation. Of course they operate at a much larger scale, but the principle is the same: serve content faster. I quickly found, though, that Google was not that useful, as it surfaced a lot of advertisements and/or solutions outside of my reach.
The big one that, to my understanding, is used in industry is BGP anycast. Of course, this is completely out of reach and not possible for a university student who doesn't have thousands of pounds available. Also, none of my providers allow BGP announcements either.
After going through multiple pages of search queries I eventually settled on the GeoDNS solution. I came across a gold mine of a page, GeoIP.site. Who would have thought someone wanted to achieve the same goal?
How does it work?
The author of that page explains that they wanted an in-house solution to move away from UltraDNS, and that this was what they built. While I encourage you to read the original post, to summarise: the scripts provided pull from MaxMind and other geolocation databases and format the IP ranges into ACLs compatible with BIND.
With these IP ranges matched to countries, it allows for custom DNS responses based on the source IP of the query. Honestly, such a simple solution amazed me. Of course this isn't a foolproof method and there are some flaws in it, but those will be covered later.
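To illustrate the idea, here is a rough sketch of how generated country ACLs can plug into BIND views. The ACL names, zone file paths, and IP ranges below are placeholders of my own (the ranges are documentation prefixes, not real geolocation data), not the exact output of the GeoIP.site scripts:

```
// Country ACLs generated from the geolocation databases (illustrative)
acl "SG" { 192.0.2.0/24; };
acl "US" { 198.51.100.0/24; };

// Each view answers with a zone file pointing at the nearest edge node
view "singapore" {
    match-clients { SG; };
    zone "example.com" { type master; file "zones/example.com.sg"; };
};

// Fallback for anyone not matched by a country ACL
view "default" {
    match-clients { any; };
    zone "example.com" { type master; file "zones/example.com.uk"; };
};
```

A Singaporean resolver matching the "SG" ACL is handed the Singapore A record, while everyone else falls through to the default view.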
How do my edge servers provide content?
NGINX. Reverse proxying is a beautiful technology which allows me to cache my website locally on each edge node once it has been fetched for the first time. I did look into Varnish; however, after researching the differences between the two approaches, I felt NGINX was the most appropriate.
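A minimal sketch of what such an edge node's configuration might look like (the cache path, zone name, and origin hostname here are placeholders, not my exact config):

```nginx
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=edge:10m
                 max_size=1g inactive=7d use_temp_path=off;

server {
    listen 80;
    server_name example.com;

    location / {
        # Fetch from the origin on a cache miss, then serve locally
        proxy_pass http://origin.example.com;
        proxy_cache edge;
        proxy_cache_valid 200 301 1h;
        # Serve stale content if the origin is unreachable
        proxy_cache_use_stale error timeout updating;
        # Handy for debugging: HIT/MISS/STALE on each response
        add_header X-Cache-Status $upstream_cache_status;
    }
}
```

The `proxy_cache_use_stale` line is a nice bonus of this setup: an edge node can keep serving a cached copy even if the origin goes down.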
I unfortunately didn't note down any previous measurements for speed. However, I can provide some figures of how this performs as of now.
Pingdom - London
Pingdom - Japan
Pingdom - United States (Washington)
WebPageTest - United States (Salt Lake City)
Uptrends - Netherlands
GTmetrix - Canada
Site 24x7 - Hong Kong
Downsides to this approach
DNS isn't a perfect technology, and I have noticed that some lookups do have exceedingly high times (6+ seconds). While this is not true across the board (as seen in the above screenshots), it has happened at least a couple of times. I believe this is down to my configuration of named and its ACL pool.
Of course, depending on the DNS resolver the client is using, the results can be seriously affected. For example, if someone uses Google or Cloudflare, the response may be based on the resolver's location or its last cached query rather than the client's actual location. However, since the majority of people use their ISP's nameservers, this should not be too much of a concern.
Another critique of this system is that it does not take advantage of city-based responses. For example, a US-EAST client connecting to my US-WEST server would have significantly higher loading times compared to if I had a server in US-EAST.
Finally, this approach does not offer great resolution when downtime occurs. I could create a custom script which updates my nameservers to push clients to another server; however, this does not solve the issue of DNS caching. This is where a solution such as a floating IP address could come into use, but I do not have any services capable of this.
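Such a script might look roughly like this. This is only a sketch under my own assumptions: the probe, the record format, and the zone-file handling are all hypothetical, and the real version would also need to bump the zone serial and reload named:

```shell
#!/bin/sh
# Sketch of a failover check: probe each edge node and keep only the
# A records of servers that respond. Hostnames/IPs are placeholders.

# is_up: succeeds if the host answers HTTP within 5 seconds
is_up() {
    curl -sf -o /dev/null --max-time 5 "http://$1/"
}

# emit_records: reads "ip status" lines on stdin and prints an A
# record for each address marked "up"
emit_records() {
    while read -r ip status; do
        [ "$status" = "up" ] && printf '@ IN A %s\n' "$ip"
    done
}

# In practice: run is_up against every node, feed the results into
# emit_records, splice the output into the zone file, bump the serial,
# and reload named (e.g. rndc reload). Even then, DNS caching means
# clients may keep hitting a dead node until their TTL expires.
```

Even with this in place, the TTL problem remains, which is why a floating IP would be the cleaner fix.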
As a personal issue with this approach, I do not have much content to actually put this system through its paces. This blog may act as one, but it is hardly something to break a sweat over. This is where I expect load balancers, in combination with this, to be required to ensure uptime.
In short, a geo-aware DNS resolver can absolutely help to improve page load times for content, especially if you have lots of PoPs. However, it does also have flaws. I would recommend it in combination with other technologies such as load balancing and anycast.
As a fun project though? Learning basic technologies? Sure. It kept me entertained for a couple of days. However, with technologies like Kubernetes around, I think this could have been optimised better on my edge nodes.