this post was submitted on 04 Dec 2023
3 points (100.0% liked)

Data Hoarder


We are digital librarians. Among us are represented the various reasons to keep data -- legal requirements, competitive requirements, uncertainty of permanence of cloud services, distaste for transmitting your data externally (e.g. government or corporate espionage), cultural and familial archivists, internet collapse preppers, and people who do it themselves so they're sure it's done right. Everyone has their reasons for curating the data they have decided to keep (either forever or For A Damn Long Time (tm) ). Along the way we have sought out like-minded individuals to exchange strategies, war stories, and cautionary tales of failures.

founded 1 year ago

I read many posts talking about the importance of having multiple copies. But the problem is, even if you have multiple copies, how do you make sure that EVERY FILE in each copy is good? For instance, imagine you want to view a photo taken a few years ago. When you check copy 1 of your backup, you find it is already corrupted. Then you turn to copies 2 and 3 and find the photo is good there. OK, you happily discard copy 1 and keep copies 2 and 3. The next day you want to view another photo, and find that it is dead in copy 2 but good in copy 3, so you keep copy 3 and discard copy 2. Then some day you find something wrong in copy 3, and you no longer have any copy with everything intact.

Someone may say: when we find that some files in copy 1 are dead, we make a new copy 4 from copy 2 (or 3). But the problem is that copy 2 may already contain dead files of its own, so the new copy would not solve the issue above.

Just wondering how you guys deal with this issue? Any ideas would be appreciated.

top 11 comments
[–] Individual_Brick5537@alien.top 1 points 11 months ago

Many storage systems have a feature called "data scrubbing" https://en.wikipedia.org/wiki/Data_scrubbing , which Synology discusses here: https://blog.synology.com/how-data-scrubbing-protects-against-data-corruption

This will correct errors on drives, and potentially give some early warning that a drive may fail. You will also want to run SMART tests on your drives: quick tests often (I do daily), extended tests occasionally (I do monthly).

The backup software should also have a way to verify the accuracy of the data, and to check that the data can be restored. On Synology, Hyper Backup has a backup integrity check https://kb.synology.com/en-us/DSM/tutorial/What_is_backup_integrity_check_for_Hyper_Backup_tasks .

[–] hobbyhacker@alien.top 1 points 11 months ago (1 children)

If you use real backups, and not just simple copies, then your backup software has a verify function. For simple copies you should use hash files, or something that can build a hash database and verify against it. By the way, you should already be using hash checking for your live data anyway. For archiving you can create WinRAR archives with a 10% recovery record, so they can self-verify and self-repair easily.
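The "hash database" idea above can be sketched in a few lines of Python. This is a minimal illustration, not any particular tool's implementation; the function names are made up for the example:

```python
import hashlib
import os

def sha256_of(path, bufsize=1 << 20):
    """Stream a file through SHA-256 so large files don't load into RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()

def build_db(root):
    """Walk a directory tree and record a hash for every file."""
    db = {}
    for dirpath, _, names in os.walk(root):
        for name in names:
            p = os.path.join(dirpath, name)
            db[os.path.relpath(p, root)] = sha256_of(p)
    return db

def verify(root, db):
    """Return the relative paths whose current hash differs from the database."""
    return [rel for rel, digest in db.items()
            if sha256_of(os.path.join(root, rel)) != digest]
```

Build the database right after making the copy (while the data is known good), store it somewhere safe, and re-run the verify step periodically. Dedicated tools and backup programs do essentially this, plus handling of new/deleted files.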

[–] Silencer306@alien.top 1 points 11 months ago (1 children)

What do you mean by “real backups”?

[–] hobbyhacker@alien.top 1 points 11 months ago

Dedicated software that can create verifiable historical backup files, like Veeam or Macrium, or the new generation like Duplicacy, Arq, Borg, etc. All of them have integrity verification built in.

[–] Far_Marsupial6303@alien.top 1 points 11 months ago

Ideally you would have generated and saved a hash before you copied your files, as a control. Otherwise, it's just a probability game. If the hashes of copies 1 and 2 match but copy 3 doesn't, then in all probability 1 and 2 are correct. If all three differ, you toss a coin.

If you're on Windows, I recommend using TeraCopy for all your file copying (always copy, never move!) with verify turned on, which will perform a CRC check and generate a hash you can then save. You can also use it to test your files after the fact and generate a hash.
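The "probability game" described above is a majority vote across copies. A minimal sketch in Python, assuming all backup copies share the same directory layout (the function names here are invented for the example):

```python
import hashlib
from collections import Counter
from pathlib import Path

def file_hash(path, bufsize=1 << 20):
    """SHA-256 of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()

def majority_check(rel_path, copy_roots):
    """Hash the same file in every backup copy and vote.

    Returns (winning_hash, suspect_copies). If no hash reaches a strict
    majority, winning_hash is None and every copy is suspect -- the
    "toss a coin" case.
    """
    digests = {root: file_hash(Path(root) / rel_path) for root in copy_roots}
    best, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(copy_roots) // 2:
        return None, list(copy_roots)
    return best, [r for r, d in digests.items() if d != best]
```

Note the vote only tells you which copies *agree*; a saved hash from before the first copy is the only real control, since in principle a majority of copies could all have inherited the same bad data.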

[–] WikiBox@alien.top 1 points 11 months ago

I use snapraid as one of my backup methods, mainly for long-term, mostly static archive backups: things that no longer change but are added to, and that I still want accessible read-only. Not for daily backups, for frequently changing files or folders, or for "permanent" offline cold storage.

https://www.snapraid.it/

I use 8 storage drives and two snapraid parity drives.

Using snapraid I can then easily verify that all backed up files are 100% OK, exactly as they were when I had just backed them up.

Snapraid can detect and fix bitrot (which has never happened to me so far), undelete accidentally deleted files or folders, and even recreate up to two failed drives.

When I back up/archive files, I simply copy them to one of the storage drives and then ask snapraid to update the parity.

Done!

[–] iMainQuake@alien.top 1 points 11 months ago

how do you make sure that EVERY FILE in each copy is good.

Checksums.

Personally, I use TeraCopy to safely copy a folder/file from my main drive to my backups (there’s even an option on there that will save a checksum of said folder/file on the backup so that I can later run that checksum and see if anything has corrupted).

What do I do if there's corruption? Simply delete the corrupted files and replace them with good copies from other backups.
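That repair step (find an intact copy, overwrite the bad ones) can be sketched in Python. This assumes you kept an expected hash per file, as suggested elsewhere in the thread; the function name is made up for the example:

```python
import hashlib
import shutil
from pathlib import Path

def repair_file(rel_path, expected_hash, copy_roots):
    """Overwrite corrupted copies of rel_path with the first copy whose
    SHA-256 matches expected_hash. Returns the roots that were repaired."""
    def digest(p):
        h = hashlib.sha256()
        with open(p, "rb") as f:
            while chunk := f.read(1 << 20):
                h.update(chunk)
        return h.hexdigest()

    states = {r: digest(Path(r) / rel_path) for r in copy_roots}
    good = next((r for r, d in states.items() if d == expected_hash), None)
    if good is None:
        raise RuntimeError(f"no intact copy of {rel_path} left")
    bad = [r for r, d in states.items() if d != expected_hash]
    for r in bad:
        # copy2 preserves timestamps along with the file contents
        shutil.copy2(Path(good) / rel_path, Path(r) / rel_path)
    return bad
```

The `RuntimeError` branch is exactly the OP's nightmare scenario: if you only discover corruption after every copy has rotted, no amount of copying helps, which is why the checks need to run regularly.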

[–] dr100@alien.top 1 points 11 months ago

This is why you check your backups periodically and replace the bad copies with good ones. If you're asking how you know what's good and what's bad: traditionally and fundamentally, even if many people here dismiss it, the storage already has checksums, and the chances of sneaky bitrot, where the storage gives you slightly altered data instead of reporting an error, are so small that most people will never encounter it. Of course, serious data hoarders use checksumming file systems and keep extra checksums for any archived data; all archiving and backup formats also have their own checksums, if one uses those instead of just dropping files into the regular file system.

[–] DTLow@alien.top 1 points 11 months ago (1 children)

Versioning.
My backups are incremental (Mac Time Machine and Arq).
If I find a file is corrupted, I can restore an earlier version.

[–] Silencer306@alien.top 1 points 11 months ago

How do you do this on Windows or Linux?

[–] Melodic-Look-9428@alien.top 1 points 11 months ago

It's something I need to look into more, if I'm honest. I checked all my media in VLC, looking at durations, and replaced anything with no duration that wouldn't play.

I've got an old backup I can refer to, and when I sync to my Synology the deleted items don't get removed, so if something gets removed by mistake I have a couple of places to refer back to.

There's still the risk of ongoing corruption/bit rot so I installed Checkrr last weekend to try to flag problematic files.

Take a look: Checkrr