Building an ESXi White Box

This is the story of my migration from server rack equipment that was expensive to run to a VMware ESXi white box that was cheap both to build and to run.

Let's start with a quick run-down of my equipment to date. First off, it was all free, so as you can imagine the specs aren't great, but that didn't matter because it was good enough to learn on. I decided to make this move for a few reasons, first and foremost being that I wanted a host I could leave running 24/7 that wasn't too loud and didn't jump my electric bill up $100 in the process. I also wanted an excuse to build a new computer to play with. I set a strict budget of $500, which I did wind up going $15 over, but I'll discuss that later.

Design Considerations

Since I was building this with only $500, I knew I wasn't going to get everything I wanted or all top-of-the-line equipment. I had to decide what was important to the functionality of this device, which for me broke down to the following:

  • I wanted full dual parity of all data on this host and, if possible, to use it as a backup site for the rest of my network, which holds around 3 TB.
  • I wanted the system running 24/7 without breaking the bank, which I defined as $25 a month or less in electricity costs (I did not factor in heating and cooling); a quick back-of-envelope sketch of that math follows this list.
  • The new system needed to handle my current infrastructure plus reasonable future improvements.
  • I wanted to start running an OS-based firewall. I researched the specs needed for the various choices and finally decided on pfSense with Snort, which lets me start playing with a NIPS.
  • I planned to have the domain controller (DC) and WSUS running 24/7 so I could join my desktop PC and all other network devices to my local domain, which helps further improve the security posture of my network. While I was at it, I started creating VLANs to separate out traffic and isolate my publicly facing systems.
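
To keep myself honest on the electricity requirement, I did some quick back-of-envelope math along the lines of the Python sketch below. The $0.12/kWh rate and the 150 W draw are assumed examples, not measured values from my setup.

    # Rough monthly electricity cost for a host running 24/7.
    # The rate and wattage are assumed examples, not measurements.

    def monthly_cost(avg_watts, rate_per_kwh=0.12, hours=24 * 30):
        """Approximate monthly cost in dollars for a constant average draw."""
        kwh = avg_watts / 1000 * hours
        return kwh * rate_per_kwh

    # What average draw stays under the $25/month ceiling at $0.12/kWh?
    max_watts = 25.0 / (0.12 * 24 * 30 / 1000)

    print(f"150 W average draw costs about ${monthly_cost(150):.2f}/month")
    print(f"$25/month allows roughly {max_watts:.0f} W of average draw")

At that assumed rate, anything averaging under roughly 290 W stays inside the budget.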

Since this build is partly to make myself more familiar with ESXi, I want something that VMware has approved as compatible. This increases reliability and more closely represents an enterprise environment, making the skills and knowledge I pick up more applicable to my career.

With all that in mind, let's go over my existing infrastructure so we have a baseline for the equipment needs. My pre-migration setup consisted of the following systems:

Okama Game Server (OGS)

Dell 2650
Specs:
2x Intel Xeon x86 dual-core at 2.6 GHz
7 GB of ECC DDR-400
3x 34 GB 10K SCSI HDD

OS: CentOS 6.7
Software:

  • Apache
  • MySQL
  • Teamspeak3
  • XRDP
  • Massive Network Game Object Server
  • Battle.net v1 Emulator

None of these components really take a lot of computational power, but the Network Game Object Server (NGOS) does like to chew up whatever RAM you allow it to use. I had to limit its use to 4 GB and MySQL to 2 GB. The only time I really use a lot of CPU is when I recompile the game server with updates.

R9

Dell 6950
Specs:
4x AMD Opteron 8214
16 GB of ECC DDR2-800
3x 146 GB 15K SAS HDD
2x 73 GB 15K SAS HDD

OS: VMware ESXi 5.5
VMs:

  • Win2k8 R2 – Domain Controller
  • Win2k8 R2 – WSUS
  • Win2k8 R2 – Game Host – TF2 Server, Halo CE Server, Minecraft Servers (Lots of CPU and RAM)
  • Win2k8 R2 – Code Repo/Tool server
  • Win7 – Test Client 1
  • Win7 – Test Client 2
  • Linux – Test Client 3
  • Linux – Test Client

Since R9 pulls more power at idle than OGS does under load, I usually didn't keep R9 booted up. The game host VM was mostly used at the LAN parties I run for my friends and me, hosting servers for whatever games we wanted to play, and while that VM was running, all the test clients had to be shut down to free up RAM.

Analysis

The takeaway is that I usually need more RAM than anything else. I need some computational power, but the demand comes in spikes rather than as a sustained load. After spending about a week looking at what I needed to replace and considering projects that I wanted to do in the future, I calculated I could make do with a minimum of 24 GB of RAM, including ESXi overhead. I also decided that I should run either RAID 6 or RAID 10 for the sake of parity. I need at least three Gigabit Ethernet ports: one for WAN, one for LAN, and one for management. As for processing power, many of the applications I run are single-threaded and would benefit more from higher clock speeds than from additional cores. With this information and analysis, I wanted a minimum processor spec of a quad-core, hyper-threaded CPU at 3.2 GHz.
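
To sanity-check the RAID 6 versus RAID 10 question, I find it easiest to write out the usable-capacity arithmetic. The sketch below assumes equal-sized drives; the 4 x 3 TB figures are only an example of the kind of array I was considering, not a decision made at this point.

    # Usable capacity for a few RAID levels, assuming N equal-sized drives.
    # The 4 x 3 TB example is illustrative; swap in whatever drives you actually buy.

    def usable_tb(level, drives, size_tb):
        if level == "RAID 5":      # one drive's worth of parity
            return (drives - 1) * size_tb
        if level == "RAID 6":      # two drives' worth of parity
            return (drives - 2) * size_tb
        if level == "RAID 10":     # mirrored pairs, half the raw space
            return (drives // 2) * size_tb
        raise ValueError(f"unknown RAID level: {level}")

    for level in ("RAID 5", "RAID 6", "RAID 10"):
        usable = usable_tb(level, 4, 3)
        print(f"{level}: {usable} TB usable, {4 * 3 - usable} TB given to redundancy")

With four drives, RAID 6 and RAID 10 land on the same usable capacity, so the choice comes down to rebuild behavior and controller support rather than space.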

Finding the Parts

As I stated in the design phase, all the parts needed to be on VMware's compatibility list, but for consumer motherboards that is just not going to happen. VMware has no reason or motivation to verify that a Gigabyte, ASUS, or EPoX motherboard is compatible, so you either buy something on the list at commercial server prices, or you get creative.

The first thing I did was pull up NewEgg.com. I have been using them since 2005 because they have the best parts search on the market. Unfortunately, their prices over the last few years haven't been as competitive as other sites', but the search is still good enough that it's where I start any computer build.

I quickly found that there was no way I could build a white box this way within budget; enterprise-grade motherboards alone would eat most of the $500. I started looking for an alternative to solve the motherboard issue, since I knew it would be the most difficult part, which is why I started with it. I searched through sites like eBay and local Craigslist postings, but I just wasn't finding anything I liked that also fit my budget.

I decided to hit the internet, looking for others who had put together budget white box solutions. I found another tinkerer who had worked on a project similar to mine and had figured out how to match consumer hardware chipsets to their respective enterprise-level counterparts.

With a greater understanding of the motherboard issue, I headed back to NewEgg.com. It took a lot of research across a lot of motherboards to identify which ones had the correct chipsets; it turns out most mid-grade motherboards qualify for the compatibility list. The more you know, the more you realize how similar these devices are.
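
The practical trick is that VMware's compatibility list really keys off the underlying chipsets and their PCI vendor:device IDs, not the retail name on the box. If you already have the board, or can find its lspci -nn output posted online, something like the sketch below pulls those IDs out so you can search for them directly. The sample line is illustrative, not my actual hardware.

    import re

    # Extract PCI vendor:device IDs from `lspci -nn` style output so they can
    # be searched on VMware's compatibility guide. The sample line is illustrative.
    sample = ("02:00.0 Ethernet controller [0200]: "
              "Intel Corporation 82574L Gigabit Network Connection [8086:10d3]")

    def pci_ids(lspci_output):
        """Return (description, vendor_id, device_id) for each device line."""
        devices = []
        for line in lspci_output.splitlines():
            match = re.search(r"\]:\s*(.+?)\s*\[([0-9a-f]{4}):([0-9a-f]{4})\]", line)
            if match:
                devices.append(match.groups())
        return devices

    for desc, vendor, device in pci_ids(sample):
        print(f"{desc} -> search the compatibility guide for {vendor}:{device}")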

Once I felt there was a good selection of motherboards with compatible chipsets, I moved on to processors. Having built computers as long as I have, I know that on price-to-performance AMD hasn't really been beaten since the early 2000s, and that if you want the best performance or the latest features and support, you go with Intel. Looking at Intel, I quickly found what I expected: the Intel processors that met my criteria would also blow my budget. Sandy Bridge parts had the power I needed, but their high-end TDP would not fit my low running-cost requirement, and they were too expensive to purchase new; buying used was not something I wanted to do.

In the end, I went with AMD because of the lower TCO and the higher reliability of new processors versus refurbished or used ones. The only part I bought used was the RAID controller.

Final list of components:

  • AMD FX-8350 with 8 "half cores" (AMD's version of hyper-threading) @ 4 GHz
  • ASRock 970 Extreme3
  • 32 GB of DDR3-1866 running at 1600
  • Dell PERC H700 RAID Controller
  • Intel 82571EB Quad-port Ethernet Controller
  • 4x 3 TB SATA III HDD
  • 8 GB USB storage for the ESXi OS
  • 650 Watt Platinum-certified PSU (high efficiency)
  • An Old Case
  • An Old Video Card
  • Unused computer fans

In the end, I spent about a week researching each component, trawling through sites like eBay (mostly eBay) and computer surplus warehouses looking for parts that met my criteria. One thing I did mess up, and the reason I went over budget, was buying the wrong version of the RAID controller. I originally got a Dell PERC H600, which only supports drives up to 2 TB, so I had to fix the mistake and purchase the H700 instead.

Final thoughts

I spent 8 weeks researching, building, waiting, installing, and moving my entire infrastructure onto this one box. It has been running 24/7 for about 10 months since then with no issues other than user error. My firewall and NIPS are in place, powered by pfSense on a VM with the hostname Shields. My VLANs are managed almost entirely inside the box: inter-VLAN routing is all handled by Shields, and the Shields VM resides on my new ESXi 6.0 host, which I lovingly refer to as Holodeck1.

I learned a lot along the way about RAID controllers and the importance of documentation and design. Because of this project, I have created a lot of documentation detailing my network, its current design and layout, and how it will expand in the future, and I have already referred back to it while adding new servers.

I ended up with a RAID 5 configuration, giving me about 9 TB of data storage. It has worked well, but after further reading on the topic, I am looking to set up RAID-Z2 and use 10 GbE with iSCSI to provide the datastores, which will make things easier when I move to a cluster setup. I am also looking into getting an SSD for Holodeck1 for caching purposes. I haven't noticed any real issues except when booting four or more VMs at once.
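
Before committing to that rebuild, it is worth being explicit about the trade-off. The quick comparison below assumes the same four 3 TB drives and ignores ZFS metadata overhead, so treat the RAID-Z2 number as a ceiling.

    # Current array vs. the planned rebuild, using the same 4 x 3 TB drives.
    # ZFS metadata overhead is ignored, so these figures are upper bounds.

    drives, size_tb = 4, 3

    layouts = {
        "RAID 5 (current)":  {"parity_drives": 1, "failures_survived": 1},
        "RAID-Z2 (planned)": {"parity_drives": 2, "failures_survived": 2},
    }

    for name, info in layouts.items():
        usable = (drives - info["parity_drives"]) * size_tb
        print(f"{name}: ~{usable} TB usable, survives {info['failures_survived']} drive failure(s)")

So the move would buy a second drive's worth of failure tolerance at the cost of roughly 3 TB of usable space.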

There has been one downside, and it came about as a failure on my part: I still don't have enough RAM. I find myself keeping test clients and other systems I don't use frequently shut down to free it up. This is something I would have liked to address when I built the system, but I was constrained by budget, and if I had double the RAM I would probably be using it. I have looked into upgrading, and unfortunately I would have to move to 16 GB sticks, which my motherboard supports but which would cost around $450 for four sticks of 16 GB @ 1600 MHz. I can almost build another host for that cost, so I haven't made the jump yet and am weighing the benefits of the additional RAM against the benefits of another host. More hardware is never a bad thing!