Organising Terabytes Fast

Imagine for a second that you have six hard drives filled with data. Some are four terabytes, others are one terabyte each, and you’ve already sorted personal videos and photos from other media files. Each drive is moved to its own folder on a single large volume.

Two Terabyte Seagate has gone from being a drive to a folder. Bob 2017 has also become a folder, rather than a drive. Samantha 2018 is also a folder, rather than a drive. There is a reason for gathering the media from smaller drives onto one large volume, and that is speed.

If you want to organise photos into a “photos” folder and videos into a “videos” folder, and then organise them by year, moving files and folders from location A to location B on the same drive takes seconds, because the data stays on the same volume and does not have to be copied.
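Here is a rough Python sketch of that first pass. It is only an illustration: the extension lists, the function names and the choice of the file's modification time as "the year" are all my assumptions, and the modification time is not always the date a photo was taken. It defaults to a dry run that only lists what it would move.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional

# Assumed extension lists; adjust them to whatever your drives contain.
PHOTO_EXTS = {".jpg", ".jpeg", ".png", ".heic"}
VIDEO_EXTS = {".mp4", ".mov", ".avi", ".mkv"}


def target_folder(path: Path, root: Path) -> Optional[Path]:
    """Return root/photos/<year> or root/videos/<year>, or None for other files."""
    ext = path.suffix.lower()
    if ext in PHOTO_EXTS:
        kind = "photos"
    elif ext in VIDEO_EXTS:
        kind = "videos"
    else:
        return None
    # Assumption: the modification time is a good enough guess at the year.
    year = datetime.fromtimestamp(path.stat().st_mtime).year
    return root / kind / str(year)


def sort_by_type_and_year(source: Path, root: Path, dry_run: bool = True):
    """Plan (and optionally perform) moves from source into root/<kind>/<year>."""
    moves = []
    planned = set()
    # sorted() lists everything up front, so moving files does not disturb the walk.
    for path in sorted(source.rglob("*")):
        if not path.is_file():
            continue
        folder = target_folder(path, root)
        if folder is None:
            continue
        dest = folder / path.name
        # Never overwrite: name clashes are left where they are for a later look.
        if dest.exists() or dest in planned:
            continue
        planned.add(dest)
        moves.append((path, dest))
        if not dry_run:
            folder.mkdir(parents=True, exist_ok=True)
            shutil.move(str(path), str(dest))
    return moves
```

Calling `sort_by_type_and_year(Path("Bob 2017"), Path("."))` from the top of the big volume would list the planned moves, and adding `dry_run=False` would carry them out. The folder name comes from the example above; the call itself is hypothetical.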

To be more specific, you have volumes A, B and C with files in them, and there is a lot of overlap between those files. You could compare the files in three locations to make sure that there are no duplicates across a spaghetti junction of physical drives, but by centralising everything you have a single volume, whether it’s a RAID or a hard drive.

When everything is on a single volume, organising files becomes as easy as creating folders and moving files to the right folder. Video files from 2024 all go into videos/2024, videos from 2016 go to videos/2016. You may notice that I’m not going through the year-month-day yet. That’s because I find that iteration is faster. The aim is to identify the duplicates fast, and delete them from the aggregation/consolidation drive. Having three or four copies across three or four drives makes sense for data recovery. On a single drive it’s a waste of space.
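One way to spot the duplicates once everything sits in year folders is to compare file contents rather than names. The sketch below is my own illustration, not a specific tool: it groups files by size first, then hashes only the candidates with SHA-256. The function names are hypothetical, and it only reports groups of identical files, it never deletes anything.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large videos do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path):
    """Return lists of paths under root whose contents are identical."""
    # Files of different sizes cannot be identical, so group by size first.
    by_size = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    # Only files that share a size with another file are worth hashing.
    by_digest = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            by_digest[file_digest(path)].append(path)

    return [sorted(group) for group in by_digest.values() if len(group) > 1]
```

Running it on one year folder at a time, for example `find_duplicates(Path("videos/2016"))`, keeps each pass short, which fits the idea of iterating quickly before worrying about file names.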

Once duplicates have been identified and removed, time can be spent standardising date formats and file names. Remember, eventually you can move the well-organised files back to a smaller drive as a low-cost backup.

10 Terabytes Become Four

This figure is an imaginary one. The point is that if you have data across 6-8 drives, and their storage amounts to 10-12 terabytes, but a lot of that data is duplicates, then the real space needed and used is lower. You should have two local backups and one off-site backup.
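The arithmetic behind that heading can be sketched in a few lines of Python. The numbers are as imaginary as the figure above, and `space_summary` is a hypothetical helper: it takes pairs of a content identifier (a hash, for instance) and a size, counts every copy towards the total, and counts each distinct content only once.

```python
def space_summary(files):
    """Summarise total, unique and reclaimable space for (content_id, size) pairs."""
    total = 0
    unique_sizes = {}
    for content_id, size in files:
        total += size                    # every copy takes up space
        unique_sizes[content_id] = size  # but each content is only needed once
    unique = sum(unique_sizes.values())
    return {"total": total, "unique": unique, "reclaimable": total - unique}


# Made-up example, sizes in terabytes: three copies of the same 3 TB of video
# plus 1 TB of photos that exists only once.
example = [("videos", 3), ("videos", 3), ("videos", 3), ("photos", 1)]
summary = space_summary(example)
print(summary)  # {'total': 10, 'unique': 4, 'reclaimable': 6}
```

With real drives the sizes would be in bytes and the identifiers would come from hashing the files, but the sum works the same way.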

By copying data from multiple drives to a single drive it becomes easy to get everything organised, and once it is organised you can dump that data, in an organised manner, back to external drives.

For example you could have a drive for photos from 2010-2024, and another for videos from 2020-2024, and so on. Usually I don’t print labels for drives so it can get confusing. That’s where I like to use post-it notes. They’re cheap, and versatile. They need to last only until you finish organising your files. In theory you could print proper labels for drives but post-its are quick, cheap, and easy to use.

If I had a label maker I might print labels. I would consider printed labels once things are finalised, rather than when they’re in flux. Post-its are good for constant change.

Knowing What You Have and Where

If you back up from your laptop to drive A when you run out of space, and then to drive B when you run out of space again, you end up with duplicates, triplicates, maybe even five or six copies of the same file. The problem is that because everything is decentralised, it’s easier to back everything up again and be safe than to assume a file or folder is backed up when it isn’t.

By aggregating smaller drives to a big drive you gain control of your former chaos. You go from thinking you need ten terabytes to realising you need four to six terabytes instead.

Two Motivations

The first motivation for finally doing this properly is that I noticed a few years ago that I had lost track of hundreds of files from the uni years and I wanted to recover them. Between Picasa, iPhoto, Aperture, Google Photos and other solutions I lost track of these files. Now that I have regained track of them I can take advantage.

My second motivation is that PhotoPrism and Immich look like interesting solutions. In the good old days I had so few photos that they fit on my laptop’s drive with ease, but in the age of having a camera with us everywhere we go we end up with thousands of photos. These take up space, and with a self-hosted solution like PhotoPrism or Immich we can keep track of these images with ease.

Estimating Cost

For the sake of argument let’s say that you have twenty drives, and they vary from 750 gigabytes to 8 terabytes in size. In theory you would assume that you need a 30+ terabyte RAID to back up all that data. The issue is that this 30 terabyte RAID costs hundreds, if not thousands, of francs. If you first regain control of your data on smaller drives, you get a better idea of how much storage you really need.

An empty four-bay Synology device costs over four hundred francs, and that’s before you get disk drives. With the disk drives you get to 1500 CHF. I am not against getting a Synology or other device. I’m encouraging people to get into good habits, to ensure that there are two local copies, and a third off-site backup, rather than 15 drives that all have similar files.

And Finally

I chose to write this blog post today because yesterday I suddenly felt overwhelmed by the amount of data I felt I still had to consolidate and how little space I had, relative to the requirement. Initially my idea was to dump all the data from the smaller drives to the central server but I don’t have that much storage.

By organising files into photos, documents, and videos, and then organising them by year, I can quickly detect and delete duplicates. This helps me streamline how much storage I need, and I can then back up that data with photos on one drive, video on another, and documents on a third, for example. I could also organise them chronologically.

By organising the video and photo files on a single volume though I prepare it to be indexed and catalogued by either PhotoPrism, or Immich, or both, to see which one I prefer over time.

Keeping Twitter Private

Twitter has three options. You can tweet to the world without barriers and anyone can read and respond. This is great when you want to grow your network and have conversations. The second option is to send DMs to specific individuals or groups (if I remember correctly). The third option is to make your account private. The only people who can read your tweets are the people who were following you when you made the account private.


The weakness of a private account is that Twitter is a social medium, and as such any time we @ or retweet someone they will be unable to see our answers. Any answer we write to those people will be unseen and so we will be tweeting into the wind.


My two reasons for keeping Twitter private are:


A) More freedom. If we approve the people who can read what we write we can first warn them that we may be cheeky. We may say something that we only think for long enough to write a tweet, and by the time it’s published we have already changed our mind.


B) The people we’re tweeting with are also private. If we answer a private tweet publicly then people may intuit what the conversation is about. We could use another IM platform but WhatsApp is part of Facebook and other IM networks have their own problems. People tend to be spread across platforms.


Twitter, Facebook and LinkedIn are three different types of social networks. LinkedIn is serious. I keep my profile up to date but not much more. Facebook is the network of former university friends. Due to this, I need to trust those I add. When Twitter was a network of friends waiting to meet at tweetups, everyone was accountable to the community, so everyone had reason to behave a specific way. Now that trolls, hashtags and other issues are present, keeping an account private keeps them away.


140+ characters is excellent to tell people how we feel but terrible for context. Blogging, forums and other long-form discussion websites are better suited to being public because you spend half an hour to an hour developing your idea, modifying it, and then sharing. That is long enough for an irrational tweet to become a rational post.


I’d rather have one to three blog posts by the end of a given day than twenty-five to two hundred tweets ;-). I haven’t tweeted like that in years for a reason.