I've spent the past day working on my newest PowerEdge R620 acquisition, and trying to nail down what things I can do without checking. Google has shown me that everyone seems to be having similar issues regardless of brand or model. Gone are the days when a rack server could be fully booted in 90 seconds. A big part of my frustration has been when the USB memory sticks are inserted to get firmware updated before I put this machine in production, easily driving times up to 15-20 minutes just to get to the point where I find out if I have the right combination of BIOS/UEFI boot parameters for each individual drive image.

I currently have this machine down to 6:15 before it starts booting the OS, and a good deal of that time is spent sitting here watching it at the beginning, where it says it's testing memory but in fact hasn't actually started that process yet. It's a mystery what exactly it's even doing.

At this point I've turned off the lifecycle controller scanning for new hardware, no boot processes on the internal SATA or PCI ports, or from the NICs, memory testing disabled... and I've run out of leads. I don't really see anything else available to turn off sensors and such. I mean it's going to be a fixed server running a bunch of VMs so there's no need for additional cards although some day I may increase the RAM, so I don't really need it to scan for future changes at every boot.

Anyway, this all got me thinking... it might be fun to compare notes and see what others have done to improve their boot times, especially if you're also balancing your power usage (since I've read that allowing full CPU power during POST can have a small effect on the time). I'm sure different brands will have different specific techniques, but maybe there's some common areas we can all take advantage of? And sure, ideally our machines would never need to reboot, but many people run machines at home only while being used and deal with this issue daily, or want to get back online as quickly as possible after a power outage, so anything helps...

[-] SheeEttin@lemmy.world 24 points 1 year ago

I don't. Poweredges are slow to boot, not much you can do about that. They're designed to be very compatible, unlike the desktops. Any time I need to reboot a physical server, I go do something else for a while and come back.

If you want to avoid outages, consider a UPS or a second server for HA.

[-] stealthnerd@lemmy.world 12 points 1 year ago

I concur and it just gets worse the more hardware you have in them. 256G of memory and 24 disks? Might as well go have lunch while it boots.

[-] kalleboo@lemmy.world 3 points 1 year ago* (last edited 1 year ago)

And beyond the UEFI/boot stuff, it takes 10 minutes just for my ZFS pool to mount

[-] Shdwdrgn@mander.xyz 3 points 1 year ago

Damn are all 24 disks internal? That's some rig! I have the hardware on my latest NAS to connect up to 56 drives in hot-swap bays, and at one point while migrating data to the new drives I had 27 active units. Now that I've cleaned it up I'm only running 17 drives but it still seems like quite a stack.

[-] stealthnerd@lemmy.world 2 points 1 year ago

Yea they're internal. That's normal for a fully loaded 2u storage server. Some even have 2-4 extra disk slots in the rear to cram in a few more.

[-] Shdwdrgn@mander.xyz 1 points 1 year ago

Wow that's packing a lot in 2u. I've only ever had 1u servers so eight 2.5" slots is a lot for these.

[-] princessnorah@lemmy.blahaj.zone 2 points 1 year ago* (last edited 1 year ago)

To be fair, that’s for something like the R720xd, which drops the disk drive and tape drive slots to fit an extra 8 disks in the front. I have a regular ole R720 and it only has 16 bays. I didn’t need that many bays, and wanted better thermals for the GPU in it.

Edit: and I went with the 2U because it’s so much quieter.

[-] Shdwdrgn@mander.xyz 1 points 1 year ago

The 2u (R720) is quieter than the 1u (R620)? Or quieter than the R720xd?

Unfortunately the 720 wouldn't have worked for me as the majority of my drives are 3.5" (8x18TB + 5x6TB). What I ended up doing is designing a 3D-printed 16-drive rack using some cheap SATA2 backplanes. Speed tests showed the HDDs were slower than SATA2 anyway, so despite the apparent hardware limitation I actually still clock around 460MB/s transfer rates from those arrays. Then I use the internal 2.5" slots for SSDs. Seems to be working a hell of a lot better than my previous server (a PE 1950 which only had a PCIx 4x slot and topped out at about 75MB/s).

[-] 970372@sh.itjust.works 1 points 1 year ago

The new ones are actually reaaaaly fast with booting

[-] Awwab@kbin.social 16 points 1 year ago

6 min seems about right for an enterprise server; the more you have to initialize, like a RAID card, the longer it will be. Since these devices are designed to run for months or years without rebooting, it really doesn't matter that the reboot takes as long as it does.

[-] Shdwdrgn@mander.xyz 7 points 1 year ago

It's a bit of a shock to me. These are being used to replace some Poweredge 860's where POST time was pretty identical to that of a desktop, even though they too had PERC raid controllers in them. And sure, the NAS has the PERC plus a pair of 16-port LSI cards to initialize, but that doesn't seem to make a difference on the boot time between the other machines with only the onboard PERC.

[-] Appoxo@lemmy.dbzer0.com 1 points 1 year ago

Sorry but not really.
My workplace is an HPE shop and our DL3XX Gen8 and above can boot in about 3-4 min to the OS part.

[-] macgyver@federation.red 10 points 1 year ago

Your only real answer is don’t buy a “server” motherboard. They inherently perform more tasks during POST to ensure stability. If you want fast POST times get a desktop and a PiKVM.

[-] nicman24@kbin.social 10 points 1 year ago

On Linux kexec is reboot without rebooting

[-] Shdwdrgn@mander.xyz 3 points 1 year ago

Ohhhh... I remember reading years ago about a tool in development to perform in-place kernel reloads, but I never heard of it being completed. Thanks for the info, I'll be digging into this!

[-] nicman24@kbin.social 2 points 1 year ago

that most likely is kpatch. kexec is a bit different but not really

[-] Shdwdrgn@mander.xyz 2 points 1 year ago

Thanks for that, because I just realized kexec doesn't work with systemd. More rabbit-holes to go down!

[-] nicman24@kbin.social 2 points 1 year ago

i think it does with the latest version

[-] saiarcot895@programming.dev 2 points 1 year ago

Second this. If you don't need to go into the UEFI or do a full hardware reboot, and you're running Linux, kexec will be much better for you.
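For anyone wanting to try the kexec route suggested above, here is a minimal sketch of the two steps involved: load the new kernel with `kexec -l`, then hand over to it with `systemctl kexec`, which skips firmware POST entirely. The helper name `kexec_commands` is made up for illustration, and the Debian-style file names (`vmlinuz-<release>` and `initrd.img-<release>` under `/boot`) are an assumption; other distros name these files differently. The function only builds the commands and does not run them.

```python
from pathlib import PurePosixPath


def kexec_commands(kernel_release: str, boot_dir: str = "/boot") -> list[list[str]]:
    """Build (but do not run) the commands for a kexec-based reboot.

    Assumes Debian-style kernel/initrd file names; adjust for your distro.
    """
    boot = PurePosixPath(boot_dir)
    kernel = boot / f"vmlinuz-{kernel_release}"
    initrd = boot / f"initrd.img-{kernel_release}"
    # Step 1: stage the new kernel, reusing the running kernel's command line.
    load = [
        "kexec", "-l", str(kernel),
        f"--initrd={initrd}",
        "--reuse-cmdline",
    ]
    # Step 2: let systemd stop services cleanly, then jump into the staged kernel.
    switch = ["systemctl", "kexec"]
    return [load, switch]
```

Printing the result for the running kernel (for example with `platform.release()`) gives two command lines to review before running them as root. Firmware and BIOS changes still need a full reboot, so this only helps for kernel and OS restarts.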

[-] TechAdmin@lemmy.world 6 points 1 year ago

Unfortunately I can't help with boot speed. Cold boot on enterprise servers tends to be on the slower side even for latest servers at my work across all major vendors. For rebooting the newer ones are faster but the older ones (around same age as R620) are slow to boot no matter what.

For the firmware that system is end of support life so once they are caught up to latest you are done, just an FYI. Do you have a single or multiple Dell servers?

I don't have much experience with single server environments so I'd recommend you research & verify everything before attempting to install any firmware. Dell OpenManage Server Administrator looks like it could be helpful. Failing that you can use the iDRAC web interface for some of the firmware installs. You'll need to research to learn which ones can be installed there & the proper order to do them. If your iDRAC has the fancy remote console & media features available you could use those features to handle the rest of the firmware updates as well as install any OS you want on it. If it doesn't and you have some budget available then I'd say look on eBay (or equivalent) for an iDRAC Enterprise card and license if needed.

If you have multiple Dell servers I would recommend using the OpenManage Enterprise virtual appliance they make. It's free and makes firmware updates on Dell servers quick and easy. It can also handle installing firmware in the correct order when necessary. It will need access to the iDRAC interface.

[-] nowwhatnapster@lemmy.world 3 points 1 year ago

Agree with this post here. Adding to this thought:

Outside of initial provisioning and firmware updates, this server should only need to reboot once a month for OS/firmware security updates and maintenance. Maybe less or more depending on your organization's security posture. OP said they're running VMs, so I don't understand the concern with the boot time. Once you provision the host you don't really tinker with any settings unless you're adding hardware or updating firmware/OS.

If the boot time is really that big a deal, get a second host and setup replication/vmotion with your VM's to eliminate the host boot time from affecting your uptime entirely.
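To put the "once a month" point in numbers, here is a rough back-of-the-envelope sketch. It assumes a 30-day month and counts only the 6:15 of pre-OS time OP reported; OS boot and VM startup would add to the downtime. The function name `monthly_availability` is just for illustration.

```python
def monthly_availability(boot_minutes: float,
                         reboots_per_month: int = 1,
                         days_in_month: int = 30) -> float:
    """Fraction of the month the host is up, counting only reboot time as downtime."""
    total_minutes = days_in_month * 24 * 60
    downtime = boot_minutes * reboots_per_month
    return 1.0 - downtime / total_minutes


# OP's reported 6:15 before the OS starts booting, one reboot per month.
post_minutes = 6 + 15 / 60
print(f"{monthly_availability(post_minutes):.4%}")  # prints 99.9855%
```

Under those assumptions a single monthly reboot costs about 0.014% of the month, which is why the slow POST matters much more for machines that are powered on and off daily than for an always-on VM host.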

[-] Shdwdrgn@mander.xyz 1 points 1 year ago

I am currently running four servers, and I think the R620 is pretty much beyond any future updates. My process has been to bump them to the last BIOS rev, then flash the PERC controller to IT mode, then I can start loading the operating system. So that's at least four different flash keys to go through (and stupid me, I didn't even write down the specifics until setting up this last new machine).

I don't expect to ever have to touch any of these items again in their usable lifetime, but I do want to keep some basic hardware checks in place. So like if there's anything I could shut down that is actively looking for updates, I could nix that. Checking PCI cards for bootable devices when the system has 8 internal bays I'm making use of? Yeah that's a waste of time. The problem is we have limited control over what checks can be disabled for our specific needs, and it's not always easy to decipher the meaning of those options. I never would have thought it would be a good idea to disable my SATA controller, until I noticed a line at boot saying no AHCI devices found.

Speaking of iDRAC... I've never used this feature before, but my understanding is that using this is the same as doing it directly on the console -- if you make any changes then the server is forced to reboot. Is that correct? Or is there a way to save the new settings but hold off on applying them until the next intentional reboot? Like I would love to get the settings on all of my machines identical now that I've somewhat figured out what I need to do, but I don't want to reboot them until the next good window (and then I'd rather reboot ALL of them together instead of messing with each one at a time). I'm just curious if any of this is possible or if there's really no advantage except remote management?

[-] Awwab@kbin.social 2 points 1 year ago

iDRAC just lets you remote access the device and tweak bios settings or whatever remotely rather than having to use a physical kvm. I know dell and hp have utilities to let you modify bios settings from windows but I'm not sure if that extends to their server platforms as well.

[-] Shdwdrgn@mander.xyz 1 points 1 year ago
[-] TechAdmin@lemmy.world 2 points 1 year ago

IMO, management interfaces like iDRAC are very nice extra to have when using enterprise servers for homelab.

The base iDRAC allows you to control power state, monitor & configure hardware, and view hardware system event log. The remote console and media features cost extra as part of the Enterprise iDRAC. Remote console lets you access server just like if you were physically in front of it. Remote media lets you mount images over the network to the server and boot from bootable ones too.

It has in band and out of band connectivity methods but I only have experience with out of band.

[-] Shdwdrgn@mander.xyz 2 points 1 year ago

Cool, thanks! Sounds like something I should at least hook up and check out.

[-] TechAdmin@lemmy.world 1 points 1 year ago

I hope you like it :)

[-] Decronym@lemmy.decronym.xyz 2 points 1 year ago* (last edited 1 year ago)

Acronyms, initialisms, abbreviations, contractions, and other phrases which expand to something larger, that I've seen in this thread:

Fewer Letters    More Letters
HA               Home Assistant automation software
~                High Availability
NAS              Network-Attached Storage
RAID             Redundant Array of Independent Disks for mass storage
SATA             Serial AT Attachment interface for mass storage
SSD              Solid State Drive mass storage

5 acronyms in this thread; the most compressed thread commented on today has 6 acronyms.

[Thread #124 for this sub, first seen 10th Sep 2023, 04:35]

[-] const_void@lemmy.ml 0 points 1 year ago* (last edited 1 year ago)

Reflash to Coreboot. Failing that, disable SecureBoot, disable splash screen, disable PXE boot, disable all other boot order options that might try and fail before hitting the OS drive, remove any RAID cards or network cards you're not using. Remove any drives you're not using.

[-] Shdwdrgn@mander.xyz 1 points 1 year ago

What brand of server do you have? I don't remember seeing anything about SecureBoot options, which may be why I'm not familiar with CoreBoot. I don't use UEFI on my drives but maybe that's related?

[-] bdesk@kbin.social -2 points 1 year ago* (last edited 1 year ago)

I make all my requests early in the morning: 0800 < 1400 - instant time reduction there!

[-] scott@lem.free.as 0 points 1 year ago
[-] bdesk@kbin.social -4 points 1 year ago

Oh am sorry I don't build fences

[-] PuppyOSAndCoffee@lemmy.ml -3 points 1 year ago

Microsoft says gtfo with your bios settings, they know what’s best, and that means all the checks you say you don’t want. I am guessing that’s your OS vendor…

[-] Shdwdrgn@mander.xyz 7 points 1 year ago

Oh god no, I haven't had a Windows machine since 2006. Everything in this house, even my wife's laptop, runs linux.

this post was submitted on 10 Sep 2023
34 points (94.7% liked)
