My server has been in use for a few months, and has been performing admirably, running the usual self-hosted suspects (NextCloud, PleX, Unifi etc.), as well as facilitating my thesis for my master's in educational sociology.
This is going great, but there was one nagging annoyance: the fan curve had a bit of hysteresis in it that ramped the fans up, then down, then up, then down. My server closet is pretty much in the middle of the house as well, so noise becomes annoying fast. And it is a literal closet, so the thermal solution isn’t great. Oh, to have a basement or outhouse…
There is a fix, though: changing the hysteresis profile associated with thermal sensor 1 in the DL380e.
This is achieved by running
fan t 0 hyst 2
from the iLO prompt.
Hysteresis profile 2 was selected on the basis of being the profile of all other sensors, except the PSU sensors. The front ambient sensors ran hysteresis profile 3 previously.
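If you would rather not log in interactively, the same command can be sent over SSH in one go. A minimal sketch, assuming the user martin and host nas-ilo (placeholders, swap in your own) and that your iLO needs the legacy key exchange option; ilo_run and DRY_RUN are names made up for this example:

```shell
#!/bin/bash
# Placeholders: change these to match your environment.
ILO_USER="${ILO_USER:-martin}"
ILO_HOST="${ILO_HOST:-nas-ilo}"

# ilo_run (hypothetical helper): send one command to the iLO over SSH.
# With DRY_RUN=1 it only prints what it would run.
ilo_run() {
    local cmd="$1"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "ssh -o KexAlgorithms=+diffie-hellman-group1-sha1 ${ILO_USER}@${ILO_HOST} \"${cmd}\""
    else
        ssh -o KexAlgorithms=+diffie-hellman-group1-sha1 "${ILO_USER}@${ILO_HOST}" "${cmd}"
    fi
}

# Show the command that would put thermal sensor 0 on hysteresis profile 2.
DRY_RUN=1 ilo_run "fan t 0 hyst 2"
```

Drop the DRY_RUN=1 prefix once the printed command looks right.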
Fan utilization is now at a consistent 9.4-20-13-16-20-16 percent, which is decently quiet. The tradeoff, however, is higher component and ambient temperatures (all values in Celsius, at mostly idle):
Sensor    Now     Before
==========================
Ambient   28-33   26-32
Exhaust   47-50   44-46
P420      70-75   64-66
CPU       50-55   40-45
Disks     44-52   42-47
This, I’m willing to accept. YMMV. It is, however, imperative to actively cool the P420 - that’s a HOT chip.
Now, I want to investigate an R210ii fan swap. The higher ambient temps brought on by summer and, y’know, using this old thing, have set those fans (slightly) on fire. Those 40mm fans make their presence known!
You can change the threshold values with:
fan t <n> caut <offset from original value>
fan t <n> crit <offset from original caut>
But I haven’t had a chance or a need to mess around. There are also a lot of tunables per sensor regarding setpoints, gain values and somesuch. I get rather lost, rather quickly.
Last week, I had got my hands on a set of three Intel S3700 datacenter SSDs, and today was going to be the day I installed them.
The disk installation was straightforward enough. Legos and duct tape were involved. More on that some other time, perhaps.
So, I get disks in, I close up, and plug everything back in. And somehow I manage to kill the iLO management port along the way.
Well, the server still works. But now the iLO can’t be accessed over the network, which means my fan control scripts won’t work.
This greatly reduces the Everyone-In-The-House Acceptance Factor (EITHAF?), which is a problem, since I’ve actually, finally, started using the server for thesis word counting.
HPE has a utility for their servers called hponcfg, which allows you to set iLO parameters directly.
I goofed around with rpms and missing drivers in unRAID trying to get a sign of life from the iLO, but to no avail. It was dead.
Well, the network port was. No link light from either switch or iLO port, across multiple cables. A monitor allowed me to see what was going on; which indicated the iLO hardware otherwise was fine.
So: I needed to set the iLO to run on the onboard NIC of my 380e.
grml is the grumpy sound sysadmins make when they can’t automate every task in front of them. It is also a very capable live CD, based on Debian.
Download grml, and get it on a bootable USB somehow. I used Etcher.
Boot off the grml USB.
I added the boot parameters
grml ssh=secret
at the boot prompt to be able to ssh in.
hponcfg is available from HPE as part of the Management Component Pack.
To install it, first add the repository:
echo "deb http://downloads.linux.hpe.com/SDR/repo/mcp buster/current non-free" >> /etc/apt/sources.list.d/hp-mpc.list
and the GPG key:
curl http://downloads.linux.hpe.com/SDR/hpePublicKey2048_key1.pub | apt-key add -
Then pull a fresh list of packages, and install:
apt-get update && apt-get install hponcfg
I really don’t like XML as a representation of data.
hponcfg consumes XML.
The following snippet tells the iLO to use the onboard NIC instead of the management port:
<!-- HPONCFG VERSION = "5.5.0" -->
<!-- Generated 5/12/2020 23:15:55 -->
<RIBCL VERSION="2.1">
  <LOGIN USER_LOGIN="Administrator" PASSWORD="password">
    <RIB_INFO MODE="write">
      <MOD_NETWORK_SETTINGS>
        <SHARED_NETWORK_PORT VALUE="Y"/>
      </MOD_NETWORK_SETTINGS>
    </RIB_INFO>
  </LOGIN>
</RIBCL>
Save it on your grml live environment as, say, lom.xml, and apply the configuration with `hponcfg -f lom.xml`. After a bit of time, the iLO will reset, and will then be running off the onboard NIC.
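Put together, the write-and-apply step might look like the sketch below. It assumes hponcfg is installed as above. The `-w` call that saves the current settings first is my own addition rather than part of the procedure I actually ran, and write_lom_xml and ilo-backup.xml are made-up names. The credentials are the same placeholders as in the snippet.

```shell
#!/bin/bash
# write_lom_xml (hypothetical helper): write the shared-NIC RIBCL snippet to a file.
write_lom_xml() {
    cat > "$1" <<'EOF'
<RIBCL VERSION="2.1">
  <LOGIN USER_LOGIN="Administrator" PASSWORD="password">
    <RIB_INFO MODE="write">
      <MOD_NETWORK_SETTINGS>
        <SHARED_NETWORK_PORT VALUE="Y"/>
      </MOD_NETWORK_SETTINGS>
    </RIB_INFO>
  </LOGIN>
</RIBCL>
EOF
}

write_lom_xml lom.xml

# Only talk to the iLO if hponcfg is actually present on this machine.
if command -v hponcfg >/dev/null 2>&1; then
    hponcfg -w ilo-backup.xml   # assumption: keep a copy of the current config first
    hponcfg -f lom.xml          # apply the shared network port setting
else
    echo "hponcfg not found; wrote lom.xml only" >&2
fi
```

Having the backup around makes it easier to see what changed if the iLO comes back up somewhere unexpected.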
I really want these tools in base unRAID, or at least the drivers for them.
After an iLO reset, it had forgotten to run off the shared networking interface.
Running the iLO off the LOM means that the server itself cannot ssh in to the iLO.
In the unRAID subreddit I came across someone needing to set up a ssh key for accessing files remotely.
There is, however, a step zero to perform before this can work.
When unRAID boots, most of the filesystem is copied from the boot USB into RAM.
This means that any changes will be lost upon a reboot.
Thus, simply doing
mkdir /home/norseghost
for user settings won’t work.
Persistent home is still possible, though.
Create a home directory somewhere on the array. My location is /mnt/user/system/home. I’ve set the share system to be cache-prefer.
Install the User Scripts plugin. Create a new script - let’s call it
#!/bin/bash
mount -o bind /mnt/user/system/home /home
This bind-mounts the newly created home directory onto the system /home.
Set it to run on array start. And run it now, while you’re at it.
Create another one -
#!/bin/bash
umount /home
Set that one to run on array stop. This is to enable array shutdown without “device busy” errors.
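Running the start script twice stacks a second bind mount on top of the first, so a guarded version can be handy. A minimal sketch, assuming the `mountpoint` utility is available on your unRAID build (I have not checked every version); bind_home is a made-up name and the paths are the ones used above:

```shell
#!/bin/bash
# Paths from the text; can be overridden from the environment.
SRC="${SRC:-/mnt/user/system/home}"
DST="${DST:-/home}"

# bind_home (hypothetical helper): bind-mount SRC on DST unless something is
# already mounted there. Returns non-zero if SRC does not exist.
bind_home() {
    [ -d "$SRC" ] || { echo "missing $SRC, is the array started?" >&2; return 1; }
    if mountpoint -q "$DST"; then
        echo "$DST is already a mount point, nothing to do"
        return 0
    fi
    mount -o bind "$SRC" "$DST"
}
```

Add a plain `bind_home` call as the last line when using this as the array-start script.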
Now unRAID has a persistent home. Create a home directory for any user that would log in remotely:
root@unraid# mkdir /home/norseghost
root@unraid# chown norseghost:norseghost /home/norseghost
And now you can add passwordless ssh login, shell profile customizations, or whatever.
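For the passwordless login part, sshd is picky about permissions on the key files. Here is a minimal sketch of installing a public key; install_pubkey is a made-up helper, and the key shown in the usage note is a placeholder, not a real key.

```shell
#!/bin/bash
# install_pubkey (hypothetical helper): append a public key to a user's
# authorized_keys, with restrictive permissions on the directory and file.
#   $1 = home directory, $2 = public key line
install_pubkey() {
    local home="$1" key="$2"
    mkdir -p "$home/.ssh"
    chmod 700 "$home/.ssh"
    touch "$home/.ssh/authorized_keys"
    chmod 600 "$home/.ssh/authorized_keys"
    # Skip keys that are already present so re-running is harmless.
    grep -qxF "$key" "$home/.ssh/authorized_keys" || echo "$key" >> "$home/.ssh/authorized_keys"
}
```

Usage would be something like `install_pubkey /home/norseghost "ssh-ed25519 AAAA... me@laptop"`, followed by `chown -R norseghost:norseghost /home/norseghost/.ssh` if you ran it as root.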
After observing PLeX streams churning up some CPU time, I decided to get a GPU to offload transcoding tasks. I went with a GTX 1050 Ti 4GB. This is more or less the same chip as the Quadro P2000, which is the current ~budget~, uh, low-powered darling. With a sneaky workaround, the artificial two-transcode limit is easily circumvented. For 5 times less than a P2000, I’ll take that deal!
To pass your GPU through to your PLeX docker, there are preparatory steps needed.
From Community Applications install unRAID-nVidia. Go to Settings → unRAID-nVidia, Select the nVidia build for your version of unRAID, and install.
su
cd /boot
wget https://raw.githubusercontent.com/keylase/nvidia-patch/master/patch.sh
chmod +x patch.sh
mv patch.sh nvidia-patch.sh
# append the patch to unRAID's startup script so it is applied on every boot
cat /boot/nvidia-patch.sh >> config/go
Installing the card is straightforward. Remove the PCIe riser, insert the card in the 16x wide slot, reinsert, reboot.
Do note that the 16x slot is 8x electrical; but this does not matter for our purposes. My particular card did not need an extra GPU power cable. If yours does, you need the 10 pin to GPU power adapter from HP, or get this one from moddiy.com
…but first, go to Settings → unRAID-nVidia; and copy your GPU GUID somewhere convenient.
In the unRAID web UI navigate to Docker, and reconfigure the PLeX docker.
Switch to advanced view, and under “Extra Parameters” add
Under NVIDIA_VISIBLE_DEVICES add that GUID.
Save, restarting the PLeX docker.
In the PLeX webui, go to settings for your server. Under “Transcoding”, select “Use hardware transcoding when available”
And there you go!
Remember the fan control rain dance from the last entry in this series? And HPE’s aggressive stance towards fan control? Well, HPE’s not gonna let you forget. After installing the GPU, my fans were running at an… excessive > 60 %. And my previous fan hack - just setting every fan baseline to 1 - didn’t work anymore. This reddit post could point me in the right direction, though.
This process is also a little involved, so buckle up.
There’s a bug in the fan-control hacked firmware that makes it not display command outputs in SSH sessions beyond the first after a reset. And this output is important for the next step.
iLO is, as my son so eloquently put it, crying “stranger danger” on account of not recognizing the GPU.
This can be illustrated by SSHing into iLO, and running the command
fan info g.
A nice table like the following should be presented:
GROUPINGS
0: FASTEST Output: 63 [02*07 ...
1: FASTEST Output: 63 [02*07 ...
2: FASTEST Output: 35 [01 02*...
3: FASTEST Output: 36 [01 02 ...
4: FASTEST Output: 60 [01 03 ...
5: FASTEST Output: 60 [01 05 ...
(Example borrowed from the linked Reddit post, since I forgot to save my actual output)
Note that some numbers are marked with an asterisk.
This indicates that that is the sensor iLO is reading as the hottest - in my case, sensor 52.
To quiet down just that sensor, run
fan pid 52 hi 300 or some other low number.
And enjoy immediate relief, as your fans settle down somewhere around 10-15 %.
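Since the output only shows up in the first SSH session after a reset, it can help to save it to a file and pick out the starred entries afterwards. A minimal sketch, assuming the asterisk directly follows the sensor number as in the example above; starred_sensors is a made-up name, and the sample input is a cut-down copy of the borrowed Reddit table, not my own output.

```shell
#!/bin/bash
# starred_sensors (hypothetical helper): read "fan info g" output on stdin and
# print each number that is immediately followed by an asterisk, once.
starred_sensors() {
    grep -o '[0-9][0-9]*\*' | tr -d '*' | sort -u
}

# Sample input, truncated the same way as the table above.
starred_sensors <<'EOF'
GROUPINGS
0: FASTEST Output: 63 [02*07 ...
2: FASTEST Output: 35 [01 02*...
4: FASTEST Output: 60 [01 03 ...
EOF
```

With this sample it prints 02. Each number it prints is a candidate for the fan pid command above.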
Quick testing yielded two simultaneous 4K → 1080p transcodes, at ~1500 megabytes of GPU RAM each, alongside one 1080p → 720p transcode. Realistically, I won’t have much more than one 4K transcode at any given moment, if at all. Very nearly 0 CPU usage though, which was nice.
Feels good when a plan comes together.
Pursuant to my master’s in educational sociology I’ve been coding a fair bit of R recently. I’ve quickly run into resource bottlenecks though — the intersection of a fairly large dataset and a mere X1C6 turns out to… not be great. So… what better excuse to buy retired enterprise hardware? It’s for my degree!
| Base | HP DL380e gen 8 |
| Chassis | 12 LFF bays, 1x750W PSU |
| CPU | Dual Xeon E5-2450L |
| RAM | 96 GB (4x8 + 4x16) |
| HBA(ish) | P420, B120i |
| Data | 5x 6TB HGST NL-SAS |
| Cache | 500GB WD Red NAS SSD |
Went whole hog on the RAM, as that’s where my programming efforts are stymied. I’ve got room to grow - HPE reports I can increase up to 196 GB; while unRAID reports 384 GB max capacity.
The disks were used, and a steal at ~15 USD per terabyte. All report A-OK.
To use the B120i for cache SSDs, I needed to find extra power somewhere. My 380e came without the rear drive cage option; but did come with the rear drive cage cable. Measurements yielded this pinout:
    |-|
+---------+    1: 8v ground    4: 12/8v (yellow)
| 1  2  3 |    2: empty        5: 12v/5v ground
| 4  5  6 |    3: 1v ground    6: 5v/1v (red)
+---------+
Also, the cable fits a female 6-pin PCI Express connector perfectly. So, a massacre of a 6-to-8 pin adapter as well as a molex extender later, we have power!
The yellow and red leads on the molex adapter go to pins 4 and 6, and the two black leads meet and go to pin 5.
HP servers are notorious for having an… aggressive approach to fan profiles. This means they can be hard to share a small home with. But never fear! Nerds to the rescue — turns out, there’s a hack for that.
Do note: THIS IS A HACK. IT MAY NOT WORK. IT MAY BRICK YOUR SERVER.
I hate noise more than I have sense, and I was fine in the end. YMMV.
The P420 is a hot chip, and largely responsible for the baseline fan levels. iLO does all it can to keep it at or below 85 degrees C; which means running the fans hard. I ziptied a Noctua 40mm fan (the A4-N20FLX) to the heatsink, with great results. Powered from the rear drive bay power cable (another adapter in the chain), this keeps the RAID card at a comfortable 65-67 degrees C, and iLO can stop worrying.
Except that, in HPE’s infinite wisdom, any PCI card detected means fans 3, 4, and 5 will run at a minimum of 35-40 %. More steps must be taken!
wget https://downloads.hpe.com/pub/softlib2/software1/sc-linux-fw-ilo/p192122427/v112485/CP027911.scexe
sh ./CP027911.scexe --unpack=ilo
Install the 2.50 firmware however you like. I used the web interface.
git clone git@github.com:airbus-seclab/ilo4_toolbox.git
yay -S keystone hexdump
cd ilo4_toolbox/scripts/iLO4/exploits
# quote the URL so the shell leaves the & characters alone, and save under the name used below
wget -O ilo4_healthcommands.bin "https://uc2e993615a24a6915b40d722b8c.dl.dropboxusercontent.com/cd/0/get/A1CIhVjQEhr9ukukz8Qw_dHKizKB0RGgnFjfrp6z1rUtvBFclCvn4t6LErPcGVl0At3NQKzgezKAb8eV9-W5eg1P_0lRnZ47R-d5u0r4VvTpbmRBuItsv5RL2b2aKbyY7_M/file?_download_id=16760008867236312560412850928972566356913390752513665509633372074&_notify_domain=www.dropbox.com&dl=1"
python2 exploit_write_flash.py 250 ilo4_healthcommands.bin
I had the exploit stall the first time I ran it. Tried again, the planets were aligned.
This will reduce the base speed of the fans to a more bearable level across the board, while allowing the firmware to respond as designed to high temperatures.
It really quiets down around sensor 32; which, incidentally, is the P420.
Adding disks can cause the fans to spin up; which requires a re-run of the command.
Change user and iLO hostname to suit your environment.
If private keys are not set up, add
sshpass password after
for I in `seq 1 65`; do ssh -o KexAlgorithms=+diffie-hellman-group1-sha1 martin@nas-ilo "fan pid $I lo 125"; done
I decided to run this command periodically, in case the box gets confused and ramps up the fan profiles again.
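One way to do the periodic part is a small script that re-applies the floor on a timer. This is a sketch, not the exact script I run: the ten-minute interval is arbitrary, fan_commands and apply_fan_floor are made-up names, and user and hostname are the same placeholders as in the loop above.

```shell
#!/bin/bash
ILO_USER="${ILO_USER:-martin}"
ILO_HOST="${ILO_HOST:-nas-ilo}"
INTERVAL="${INTERVAL:-600}"   # seconds between runs; arbitrary choice

# fan_commands (hypothetical helper): print one "fan pid N lo 125" per sensor.
fan_commands() {
    local i
    for i in $(seq 1 65); do
        echo "fan pid $i lo 125"
    done
}

# apply_fan_floor (hypothetical helper): send each command in its own SSH
# session. The -n flag stops ssh from swallowing the rest of the command list
# that the while loop is reading from stdin.
apply_fan_floor() {
    fan_commands | while read -r cmd; do
        ssh -n -o KexAlgorithms=+diffie-hellman-group1-sha1 "${ILO_USER}@${ILO_HOST}" "$cmd"
    done
}

# Loop forever when invoked with "loop"; otherwise only define the functions.
if [ "${1:-}" = "loop" ]; then
    while true; do
        apply_fan_floor
        sleep "$INTERVAL"
    done
fi
```

Saved as, say, fan-floor.sh (a made-up name), it would be started with `./fan-floor.sh loop`. Scheduling a single `apply_fan_floor` run from the User Scripts plugin should work just as well.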