this post was submitted on 09 Nov 2023

Data Hoarder


I am archiving a vast amount of media files that are rarely accessed. I'm writing large sequential files, at peaks of about 100MB/s.

I want to maximise storage space primarily; I have 20x 18TB HDDs.

I've been told that large (e.g. 20 disk) vdevs are bad because resilvers will take a very long time, which creates higher risk of pool failure. How bad of an idea is this?

top 7 comments
[–] Big_Expression7231@alien.top 1 points 1 year ago

I always understood it as a balancing act in your vdev sizing. Too big = long rebuild times, with so many disks spinning up every time. Too small = wasted $, with TB lost to redundancy (raidz2 with 3 20TB disks only utilizes 33% of the TB purchased).
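To put rough numbers on that trade-off, here is a minimal sketch of the usable-capacity arithmetic. It assumes the idealised rule that a raidz vdev of n disks with p parity disks holds data on n - p of them, and it ignores padding, metadata and reserved space, so a real pool will come in somewhat lower. The function name is made up for illustration.

```python
def usable_fraction(disks: int, parity: int) -> float:
    """Idealised fraction of raw capacity left for data in one raidz vdev."""
    if parity >= disks:
        raise ValueError("need more disks than parity disks")
    return (disks - parity) / disks

# The 3-disk raidz2 example above: 1 data disk out of 3.
small = usable_fraction(3, 2)

# The OP's 20 x 18 TB disks as one wide raidz2 or raidz3 vdev.
wide_z2_tb = 20 * 18 * usable_fraction(20, 2)
wide_z3_tb = 20 * 18 * usable_fraction(20, 3)

# The same disks split into two 10-wide raidz2 vdevs.
two_z2_tb = 2 * (10 * 18 * usable_fraction(10, 2))

print(f"3-disk raidz2: {small:.0%} usable")
print(f"20-wide raidz2: {wide_z2_tb:.0f} TB, 20-wide raidz3: {wide_z3_tb:.0f} TB")
print(f"2 x 10-wide raidz2: {two_z2_tb:.0f} TB")
```

Under these assumptions the single wide vdev gains a few disks' worth of space over the split layout, which is the "wasted $" side of the balance.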

I've always felt you should calculate how many disks you need to saturate your connection and go from there. If you have a 10gig trunk on your network, you'll want 1 vdev to be able to saturate a 10gig line. Any larger than that and you don't get any benefit.
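That sizing rule can be sketched as a quick calculation. This assumes streaming throughput scales roughly with the number of data disks in the vdev, which is a simplification, and the 200 MB/s per-disk figure is a placeholder rather than a measurement; substitute whatever your drives actually sustain.

```python
import math

def data_disks_for(target_mb_s: float, per_disk_mb_s: float) -> int:
    """Data disks needed for aggregate sequential throughput to reach the target."""
    return math.ceil(target_mb_s / per_disk_mb_s)

ASSUMED_DISK_MB_S = 200          # assumption, not a measured figure
TEN_GBIT_MB_S = 10 * 1000 / 8    # 10 Gbit/s is 1250 MB/s before protocol overhead

disks_for_10g = data_disks_for(TEN_GBIT_MB_S, ASSUMED_DISK_MB_S)
disks_for_op = data_disks_for(100, ASSUMED_DISK_MB_S)  # the OP's ~100MB/s peak

print(disks_for_10g, disks_for_op)
```

By this rough measure the OP's 100MB/s write peak is far below what even a narrow vdev can stream, so throughput is unlikely to be what decides the layout here.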

[–] zrgardne@alien.top 1 points 1 year ago (1 children)
[–] Sertisy@alien.top 1 points 1 year ago

Draid was released just a month after I built my last raid set with vdevs. Really hoping there's an in-place migration path someday, assuming nobody finds any bugs in the next couple years.

[–] EchoGecko795@alien.top 1 points 1 year ago

I have a few 24x drive RAIDz3 pools, and as long as you can live with the longer scrub and resilver times they make a good archive or backup pool, but I would not really want one as an always-on active pool. If you want to know the estimated failure rate, here is a calculator.

Not sure if it's broken or if my mobile Firefox browser just doesn't like it, but I seem to be getting a 0% failure rate, which looks like an error. There are other calculators if you google them, though.

https://www.servethehome.com/raid-calculator/raid-reliability-calculator-simple-mttdl-model/

> I've been told that large (e.g. 20 disk) vdevs are bad because resilvers will take a very long time, which creates higher risk of pool failure. How bad of an idea is this?

I normally only have to replace 1 drive at a time, but with RAIDz3 you have to lose 4 drives at the same time for data loss to happen. If you are using mixed batches of drives (not all from the same production run), the chance of this happening is very low, and when it does happen it is usually due to some other event (overheating, fire, a cow attacking the disk shelf). In the 5 years I have had these pools, the worst was losing 1 drive and having errors pop up on another drive, which were still corrected because RAIDz3 has 3 drives of protection.
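For a feel of why this is so unlikely, here is a back-of-the-envelope sketch, not the model used by the linked calculator. It assumes drives fail independently at a constant annual failure rate, which is exactly what shared batches or an overheating shelf would violate. The 2% AFR and 3-day resilver window are placeholder values, and the function name is made up for illustration.

```python
from math import comb

def p_more_failures(remaining: int, tolerated: int, afr: float, window_days: float) -> float:
    """P(more than `tolerated` of `remaining` drives fail within the window)."""
    # Per-drive probability of failing inside the window, from the annual rate.
    q = 1 - (1 - afr) ** (window_days / 365)
    # Sum the binomial tail directly to avoid subtracting from a value near 1.
    return sum(
        comb(remaining, k) * q**k * (1 - q) ** (remaining - k)
        for k in range(tolerated + 1, remaining + 1)
    )

# 24-wide vdev with one drive already dead: 23 drives left.
# raidz1/2/3 can tolerate 0/1/2 further failures during the resilver.
for level, tolerated in (("raidz1", 0), ("raidz2", 1), ("raidz3", 2)):
    risk = p_more_failures(23, tolerated, afr=0.02, window_days=3)
    print(f"{level}: {risk:.2e}")
```

Each extra parity drive cuts the estimate by orders of magnitude under the independence assumption; correlated failures are what keep the real-world risk from being that small.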

[–] Dagger0@alien.top 1 points 1 year ago

> I've been told that large (e.g. 20 disk) vdevs are bad because resilvers will take a very long time

They won't really take any longer than on narrower vdevs unless you're hitting CPU or controller throughput limits.
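One way to see this: if nothing else is the bottleneck, the floor on resilver time is set by how fast the replacement disk can be written, and vdev width does not appear in that calculation at all. A minimal sketch, where the 200 MB/s sustained write speed is an assumed placeholder and the function name is invented for illustration:

```python
def resilver_hours_floor(disk_tb: float, fill: float, write_mb_s: float) -> float:
    """Lower bound on resilver time: data to rewrite / replacement disk write speed."""
    data_mb = disk_tb * 1_000_000 * fill  # only allocated data is resilvered
    return data_mb / write_mb_s / 3600

# An 18TB disk, completely full versus half full, at an assumed 200 MB/s.
full_hours = resilver_hours_floor(18, 1.0, 200)
half_hours = resilver_hours_floor(18, 0.5, 200)

print(f"full: {full_hours:.1f} h, half full: {half_hours:.1f} h")
```

Real resilvers run slower than this floor when the pool is busy or fragmented, or when the CPU or controller limits mentioned above come into play.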

[–] dr100@alien.top 1 points 1 year ago

That's the ideal use case for snapraid.

[–] kwarner04@alien.top 1 points 1 year ago

Mergerfs + snapraid

If a drive fails, you only need the parity disk to restore, not the whole array. Also, if for some reason you can’t restore, you only lose data on the failed drive.

ZFS is great, and for real NAS data I’m a fan. But for large media files and such that are write once, read many, mergerfs + snapraid is a much better option, I think.

Mergerfs is just to present all 20 drives as a single mount point so you aren’t searching thru 20 drives when you want to view.