koldfront

New nvme ssd in the home server #hardware

🕠︎ - 2024-07-15 - 🟊 1

It's been more than 5 years since I last upgraded my home server.

Today the kernel barfed out a lot of log lines which seemed to pertain to the NVMe SSD it is running on, which is five years and 11 weeks old:

pcieport 0000:00:01.3: AER: Corrected error message received from 0000:06:00.0
nvme 0000:06:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
nvme 0000:06:00.0:   device [8086:f1a8] error status/mask=00000001/0000e000
nvme 0000:06:00.0:    [ 0] RxErr                  (First)
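
To get a feel for how many of these messages there are and whether any are worse than "Corrected", the log can be tallied by severity. This is only a rough sketch: the function name aer_severity_counts is made up for illustration, and it assumes the kernel log has been saved to a file first.

```shell
#!/bin/sh
# Tally AER messages in a saved kernel log by their severity= field.
aer_severity_counts() {
    grep -o 'severity=[A-Za-z]*' "$1" | sort | uniq -c | awk '{ print $2 ": " $1 }'
}

# Demo on a throwaway file holding two of the lines quoted above; on a live
# system the input might instead be saved from dmesg or journalctl -k.
log=$(mktemp)
cat > "$log" <<'EOF'
pcieport 0000:00:01.3: AER: Corrected error message received from 0000:06:00.0
nvme 0000:06:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
EOF
counts=$(aer_severity_counts "$log")
echo "$counts"
rm -f "$log"
```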

Although these were all "Corrected", and although smart-log reads out:

available_spare				: 100%
available_spare_threshold		: 10%
percentage_used				: 2%

it does also say:

power_on_hours				: 45,474

and as the disk usage currently is at:

Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1p1  1.9T  1.3T  608G  68% /

I am taking the opportunity to upgrade the old 2TB Intel 660p NVMe SSD to a new 4TB Samsung 990 Pro NVMe SSD. This should also give a decent boost in performance, from 1800 MB/s read/write and 220K IOPS to 7450/6900 MB/s read/write and 1400/1550K IOPS.
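
As a sanity check, the power_on_hours reading can be converted to years and set next to the drive's age. The sketch below assumes 365.25-day years, and the helper name hours_to_years is invented for illustration.

```shell
#!/bin/sh
# Convert an NVMe power_on_hours reading to years of continuous operation.
# Accepts the thousands separator as printed by smart-log (e.g. "45,474").
hours_to_years() {
    hours=$(printf '%s' "$1" | tr -d ',')
    # 8766 hours = 365.25 days * 24 hours
    LC_ALL=C awk -v h="$hours" 'BEGIN { printf "%.2f\n", h / 8766 }'
}

years=$(hours_to_years 45,474)
echo "$years"
```

This prints 5.19, which is close to the five years and 11 weeks (about 5.21 years) since the drive went in, so the server has been powered on nearly the whole time.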

Currently I have the new NVMe SSD connected via a Dezen USB-thing, and I am running rsync -varSHx --progress --stats --exclude /mnt / /mnt/ to transfer everything. When it has run through, I will boot the machine in single-user mode, run the rsync again, and then switch over after adjusting the fstab and running grub-install.

Ok, that went a little less smoothly than expected.

The new SSD is 4TB, so a DOS partition table was out. I created a GPT table, not realizing that then grub needs a BIOS boot partition.
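
The reason a DOS partition table is out can be checked with a little arithmetic: it stores sector addresses as 32-bit values. The sketch below assumes 512-byte logical sectors and a nominal 4 TB (decimal) drive; both are assumptions, not values read from the actual disk.

```shell
#!/bin/sh
# DOS (MBR) partition tables store sector addresses as 32-bit values.
# Assuming 512-byte logical sectors, the largest addressable size is:
mbr_limit_bytes=$((4294967296 * 512))            # 2^32 sectors * 512 bytes
disk_bytes=$((4 * 1000 * 1000 * 1000 * 1000))    # nominal 4 TB drive

echo "MBR limit: $mbr_limit_bytes bytes"
echo "Disk size: $disk_bytes bytes"
if [ "$disk_bytes" -gt "$mbr_limit_bytes" ]; then
    echo "too large for a DOS partition table, use GPT"
fi
```

The limit works out to 2 TiB (about 2.2 TB), so the old 2TB disk fit and the new 4TB one does not.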

So the server was down for like 20 minutes, then up again while I managed to create the needed partition, and then down again for 40 minutes while I made the actual switch.
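
For reference, one way to add a BIOS boot partition to a GPT disk is with sgdisk. How the partition was actually created here is not recorded, so the following is only a hedged sketch: the helper bios_boot_cmd, the device /dev/sdX and the partition number 2 are placeholders, and the function merely prints the command rather than running it.

```shell
#!/bin/sh
# Print (not run) an sgdisk command that adds a 1 MiB BIOS boot partition.
# The disk and partition number are placeholders; check them against the
# real layout (for example with "sgdisk -p") before running anything.
bios_boot_cmd() {
    disk=$1
    partnum=$2
    # -n num:start:end  creates the partition (start 0 = default start)
    # -t num:ef02       sets the type code for a BIOS boot partition
    echo "sgdisk -n ${partnum}:0:+1M -t ${partnum}:ef02 ${disk}"
}

cmd=$(bios_boot_cmd /dev/sdX 2)
echo "$cmd"
```

Printing the command first is a deliberate choice, since a mistyped device name in a partitioning tool is hard to undo.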

The commands I used were:

rsync -varSHx --progress --stats --exclude /mnt / /mnt/

to transfer all the files - once while running, then once while booted in single-user mode.

Then I did:

mount -t proc proc /mnt/proc
mount -t sysfs /sys /mnt/sys
mount --bind /dev /mnt/dev
mount --bind /dev/pts /mnt/dev/pts
chroot /mnt
blkid
jove /etc/fstab # to update the UUID in /etc/fstab
grub-install

And then I shut down and swapped the new SSD into the server.

This sort of worked, except grub was still trying the old UUID, so I booted with root=/dev/nvme0n1p1 single and ran update-grub.
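
The likely explanation is that grub-install only writes the boot loader, while the grub.cfg copied over by rsync still named the old disk; update-grub regenerates that file. A quick way to spot this before rebooting is to compare the UUIDs in grub.cfg with the one blkid reports. In the sketch below the function name grub_root_uuids and both UUIDs are made up for illustration.

```shell
#!/bin/sh
# List the distinct root=UUID=... values referenced in a grub.cfg-style file.
grub_root_uuids() {
    grep -o 'root=UUID=[0-9A-Fa-f-]*' "$1" | sort -u | sed 's/^root=UUID=//'
}

# Demo on a throwaway file; on a real system the file would typically be
# /boot/grub/grub.cfg and the expected UUID would come from blkid.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
linux /boot/vmlinuz root=UUID=11111111-2222-3333-4444-555555555555 ro quiet
linux /boot/vmlinuz root=UUID=11111111-2222-3333-4444-555555555555 ro single
EOF
expected="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"   # made-up UUID of the new disk

found=$(grub_root_uuids "$cfg")
if [ "$found" != "$expected" ]; then
    echo "grub.cfg still points at $found; regenerate it with update-grub"
fi
rm -f "$cfg"
```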

Then I reassembled the machine and put it back on the hat rack, where it belongs.

Only thing I had to fix so far after that was the /usr/bin/ping binary:

setcap cap_net_raw+ep /usr/bin/ping

as I run it as a non-privileged user in various simple monitoring checks.
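
The capability most likely went missing because file capabilities are stored in an extended attribute (security.capability), and rsync -a does not copy extended attributes unless -X is added. To check whether anything else lost its capabilities, listings from the old and new root can be compared. This is a minimal sketch: the helper missing_caps is hypothetical, and the listings in the demo are made up.

```shell
#!/bin/sh
# Given two saved "getcap -r" listings (old root, new root), print the paths
# that had capabilities on the old system but have none on the new one.
# Only the first field (the path) is compared, so paths containing spaces
# are not handled.
missing_caps() {
    old_paths=$(mktemp)
    new_paths=$(mktemp)
    awk '{ print $1 }' "$1" | sort -u > "$old_paths"
    awk '{ print $1 }' "$2" | sort -u > "$new_paths"
    comm -23 "$old_paths" "$new_paths"
    rm -f "$old_paths" "$new_paths"
}

# Demo with made-up listings; real ones could come from: getcap -r / 2>/dev/null
old=$(mktemp)
new=$(mktemp)
printf '%s\n' '/usr/bin/ping cap_net_raw=ep' '/usr/bin/example cap_net_admin=ep' > "$old"
printf '%s\n' '/usr/bin/example cap_net_admin=ep' > "$new"
missing=$(missing_caps "$old" "$new")
echo "$missing"
rm -f "$old" "$new"
```

Running the copy as root with -X (and -A, if ACLs matter) should carry these attributes over in the first place.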

- Adam Sjøgren 🕘︎ - 2024-07-15
