Is De-Cluttering Your Electronic Storage Worth It?

Updated on September 14, 2018
Simon Kravis

Simon has been involved in software development since the days of paper tape.

More storage or clean up?

In 1958, Professor C. Northcote Parkinson observed that ‘Work expands to fill the time available for its completion’. A digital derivative of this adage is that ‘Data expands to fill the storage space available’. As the marginal cost of storage moves towards zero, so does the incentive to tidy up storage. It’s far more appealing to add a few more terabytes of storage or buy a new, more capacious piece of hardware than to address the task of de-cluttering by deleting or archiving unwanted files.

So why should you bother tidying up your storage? It will:

  • Save you the trouble and expense of a hardware upgrade.
  • Improve the performance of your hardware if its storage is almost full.
  • Help you locate data: it’s easier to find one photo out of 100 than one out of 10,000.

With search now ubiquitous, if you want it, you can find it with a few clicks, or so vendors would like you to believe. Search works excellently on text but has a long way to go on photos and videos, which constitute the bulk of domestic storage use nowadays. Time ordering, geo-location and automatic image analysis, such as that provided by Google Photos, certainly help in this task and will improve over time, but their limitations become apparent when you start on an actual task. Reducing the number of files you have to look through will always help.

Mobile Phones

Once you’ve decided to clean up, how do you do it? With their focus on apps, mobile phones report the storage used by each app together with its associated files. This makes it obvious what is using most of the space, and the answer is almost always photos and videos. Going through these and deleting those you don’t want to keep gives a quick and easy increase in space. If you want to archive them, first connect the phone to a computer. Photos and videos can usually be seen directly and copied to archive media before being deleted from the phone. For iPhones connecting to PCs, you will need to have iTunes installed and running.

Tablets

Tablets are half-way between mobile phones and PCs. They provide a view of the internal folder structure and an analysis of storage use from a built-in app. Budget tablets often have little storage space (as little as 8 GB) and although they provide extra storage potential via an SD card socket, this storage is not usable for all purposes in the way that the built-in storage is. The amount of space used by apps is often significant and the easiest savings may be from removing unwanted apps. Use of cloud storage can relieve storage pressure on your tablet or phone, but unless you are connected to Wi-Fi you will be using your mobile data allowance every time you retrieve a file, and this can end up being quite costly.

Domestic PCs

Modern PCs often come with a terabyte or more of rotating disk storage, but solid-state storage is becoming more widespread, especially for laptops where its use can result in a slimmer device. Solid-state storage provision is generally less than for rotating disks, but half a terabyte is now not uncommon, although older devices had much less. PC operating systems have grown over the years, with 64-bit Windows 10 now requiring 20 GB, but storage provision has increased much more, so if you’re short of space on a modern PC, the major culprits will be applications and media files. The Windows Disk Cleanup utility is a good place to start. Windows upgrades often leave large files behind, which are identified by Disk Cleanup as possible candidates for removal.

Disk Cleanup Results

Right-clicking on the OS (C:) icon in This PC and selecting Properties shows the amount of space free on the C: drive:

Disk Properties

If Disk Cleanup doesn’t offer much in the way of savings, then Windows Search can locate large files using the Size: gigantic option as shown below.

Search Results for Gigantic Files

The results of a search with a size filter are not ordered. If you want to see files ordered by descending size, the entire search will be run again, which may take some time. Large files tend to take up most of the space in electronic storage, so identifying these can be very helpful. The largest files are frequently C:\pagefile.sys and C:\hiberfil.sys, which are Windows system files. The first is the paging file that Windows uses as overflow for memory. When its size is managed by the system it can grow while memory is under pressure, and restarting your computer may return it to a smaller size and free up some space. The second holds the contents of memory during hibernation. Its size is tied to the amount of installed memory, and it is removed only if hibernation is turned off.

If you see a large file that seems to be a candidate for deletion, type its name into Google before you remove it to see what its purpose is. Removing it may have unintended consequences.
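The large-file search described above can also be sketched in a few lines of Python. This is only an illustration, not what Windows Search does internally: the function name largest_files and its parameters are made up for this example, and it lists files without deleting anything.

```python
import heapq
import os


def largest_files(root, count=10):
    """Return up to `count` (size_in_bytes, path) pairs under root, largest first."""
    sizes = []
    for folder, _subfolders, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            try:
                # lstat reports on a link itself, so link targets are not counted.
                sizes.append((os.lstat(path).st_size, path))
            except OSError:
                continue  # skip files that vanish or cannot be read
    # nlargest returns the results already sorted in descending order.
    return heapq.nlargest(count, sizes)
```

Calling largest_files("C:\\", 20) would list the twenty biggest files the current user is allowed to see. System folders that the user cannot read are skipped silently.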

Windows offers the option to compress large files to save storage space – folders with their contents compressed are shown in blue. Compression may result in large space savings for log files and files with a high proportion of repeated content, but savings will be much smaller for media files, which are often in compressed format anyway.
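The difference in savings between repetitive files and media files can be seen in a small experiment. The sketch below uses Python's zlib module as a stand-in compressor (it is not the algorithm Windows uses), a made-up log line repeated many times, and random bytes as an assumed proxy for media that is already compressed.

```python
import os
import zlib


def compressed_fraction(data):
    """Return compressed size divided by original size (smaller means more saving)."""
    if not data:
        return 1.0
    return len(zlib.compress(data, 6)) / len(data)


# A log-like sample: the same (invented) line repeated many times.
log_like = b"2018-09-14 12:00:00 INFO request served in 12 ms\n" * 2000

# Random bytes stand in for already-compressed media such as JPEG or MP4 files.
media_like = os.urandom(len(log_like))

if __name__ == "__main__":
    print("log-like sample:   %.3f" % compressed_fraction(log_like))
    print("media-like sample: %.3f" % compressed_fraction(media_like))
```

The log-like sample shrinks to a tiny fraction of its original size, while the random sample does not shrink at all, which is why compressing a folder of photos or videos rarely frees much space.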

If you have folders containing large numbers of non-gigantic files which consume significant space, these folders can be difficult to identify with native Windows functionality. However, there are a number of applications which can help you find these. TreeSize Free gives a very rapid overview of the amount of space used by folders:

TreeSize Output

Clicking on any of the folders shows the space usage of its subfolders. This application can rapidly home in on just what is consuming your storage space. The Professional version (costing 46.95 Euros) includes actioning, age profiles and many other features.
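The kind of per-folder total that such tools report can be approximated with a short script. This is a rough sketch rather than a description of how TreeSize works: the function names are invented for the example, and it adds up logical file sizes, which can differ from the space actually occupied on disk.

```python
import os


def tree_size(folder):
    """Total size in bytes of all files at any depth below folder."""
    total = 0
    for current, _subfolders, names in os.walk(folder):
        for name in names:
            try:
                total += os.lstat(os.path.join(current, name)).st_size
            except OSError:
                continue  # unreadable or vanished file
    return total


def folder_report(root):
    """Return (path, bytes) pairs for root's immediate subfolders, largest first."""
    totals = []
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                totals.append((entry.path, tree_size(entry.path)))
    return sorted(totals, key=lambda item: item[1], reverse=True)
```

Running folder_report on the largest folder it returns, and then on that folder's largest subfolder, mimics clicking down through the tree.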

WinDirStat is a free open-source application giving similar data, but with the addition of visualization of file and folder size:


WinDirStat Results

Networked PCs and Shared Storage

Most workplaces now have networked PCs, usually with some shared storage. This often takes the form of one or more group drives, accessible to all users or to a group of users, and Home drives, each accessible only to an individual user. Storage quotas may be applied to restrict the amount of space available on group and Home drives. A common scenario is that only shared storage is backed up. Storage on local machines may or may not be accessible to individual users. Making shared drives the only storage that users can access can ensure that all documents created and stored are backed up. The role of individual PCs then becomes similar to that of the ‘dumb terminals’, which had no local storage and were widely used before the advent of the PC.

File ownership may be problematic on shared storage. The tie between file ownership and Windows accounts on file servers means that when accounts are removed as staff leave, ownership of large numbers of files may become indeterminate.

Cloud-based systems have the advantage that capacity can be easily increased, and content is available via the Web. Both of these features are attractive to organizations but come at a significant cost, both for software licensing and data movement.

Wherever shared storage exists, its management is the responsibility of IT staff rather than individual users. Management is often by exhortation, along the lines of “The G: drive is 98% full. Can users please remove any unneeded files”. Such exhortations may result in massive amounts of time being wasted by users as they examine small files whose removal will only minimally reduce storage demand. Many users have no idea of file size, further complicating management. Storage quotas may curb the profligate use of shared storage by the handful of users with large holdings, but most users have very small shared storage usage, so a quota policy makes poor use of available capacity and may result in users storing important documents outside the backup umbrella on local drives or removable storage.

Another approach to shared storage management is to remove all files, or all files of types which are commonly large (such as Microsoft Access databases or media files), at a specified date and to restore only those that users ask for. This process is certainly effective but may cause considerable disruption and reveal problems with the backup/restore process.

Given the difficulties of managing shared storage, the path of expanding capacity is usually taken. The only pressure to clean up usually comes from legal departments of organizations who are concerned about liability – if an organization becomes involved in legal action they may be required to disclose all the relevant documents in their possession, whether or not they were required to retain them. The default policy of ‘keep everything forever’ can lead to increased legal exposure and is one of the drivers for the introduction of document management systems, where disposal of documents which no longer need to be retained is straightforward.

Document Management Systems

Document management systems may also be used to store files and may run into storage limitations. These systems are often cloud-based, sometimes completely replacing the user desktop so that all documents and communications are stored automatically. Document ownership and permissions can be managed more effectively in most document management systems, but performance may be poor for large media files, which are increasingly common. Migration from a shared drive to a document management system may be problematic due to difficulty in mapping permissions. File and folder naming rules may also be much more restricted in a document management system. Poor performance may result in an increase in ‘off-system’ processing, which may negate the advantages of the document management system.

However, one virtue of document management systems is that they record the dates on which files are checked out by users, making it possible to track file usage. The first check-in date can be used to set the start of a retention period (or sentence) for the files, making it possible to implement a policy to remove documents in a particular category after a set period of time. This automatic removal process means that storage problems are less likely to arise, as files whose retention period has expired can be detected easily and removed. However, storage for document management systems is commonly in a database, which requires much higher performance than a shared disk drive, so adding capacity is likely to be much more expensive.
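The retention rule described above amounts to a simple date calculation. The sketch below is illustrative only: the categories, the retention periods in RETENTION_DAYS and the tuple layout are all invented for the example and do not come from any particular document management system.

```python
from datetime import date, timedelta

# Hypothetical retention periods, in days, for each document category.
RETENTION_DAYS = {"invoice": 7 * 365, "draft": 365}


def expired(documents, today):
    """Return names of documents whose retention period ended before today.

    Each document is a (name, category, first_check_in_date) tuple.
    """
    result = []
    for name, category, first_check_in in documents:
        days = RETENTION_DAYS.get(category)
        if days is None:
            continue  # no policy for this category, so keep the document
        if first_check_in + timedelta(days=days) < today:
            result.append(name)
    return result
```

Documents in a category with no stated retention period are kept by default here, which is the cautious choice when a policy is incomplete.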

Duplication and De-Duplication

A common concern of many computer users is file duplication, or the retention of multiple copies of the same electronic document. The most exact definition of duplication is that duplicate files contain the same pattern of bits. Duplication can be established simply by calculating a checksum of each file’s binary content and comparing the checksums of the two files. Duplicated files will have the same checksum, and with a strong checksum it is extremely unlikely that two different files will share one.
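A minimal sketch of checksum-based duplicate detection follows. It assumes SHA-256 as the checksum (the article does not name one) and uses invented function names. Files are grouped by size first, since files of different sizes cannot be duplicates and need not be read at all.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def duplicate_clusters(root):
    """Return sorted lists of paths under root whose contents share a digest."""
    by_size = defaultdict(list)
    for folder, _subfolders, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            try:
                by_size[os.path.getsize(path)].append(path)
            except OSError:
                continue  # unreadable or vanished file
    clusters = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_digest = defaultdict(list)
        for path in paths:
            try:
                by_digest[file_digest(path)].append(path)
            except OSError:
                continue
        clusters.extend(sorted(group) for group in by_digest.values() if len(group) > 1)
    return clusters
```

The script only reports clusters. Deciding which copy in each cluster to keep is still a manual step.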

Given that most files stored are small, high levels of exact duplication seldom affect storage use. In the author’s experience of profiling shared storage in commercial, government and not-for-profit organizations, removing all duplicate files rarely reduces storage use by more than 15%. As the numbers of duplicate clusters of files are extremely large, deciding which files in a cluster to keep and which to delete is a highly laborious task yielding only a small increase in available storage.

A complication of de-duplication is that humans may perceive as duplicates electronic documents which do not have the same bit pattern. Two photos taken from a hand-held camera of the same scene will have different pixels due to the camera position being slightly different for each, and thus will not have the same bit pattern. The same Word document saved by two different people will have different bit patterns, as Word stores metadata about the user and time of saving inside the file. A PDF document containing electronic text will not have the same bit pattern as a scan of the same document, and two different scans of the same document will not have the same bit pattern due to different placement of the original on the scanner.

DupeGuru is a sophisticated free program for identifying duplicate files and folders. It uses file names and checksums for comparisons. It does not flag as duplicates files with the same text content but different bit patterns. DupeGuru can detect folders with duplicated content, which can easily be created and are the “low-hanging fruit” of de-duplication. Picture mode addresses the problem of detecting visually identical photos with different bit patterns by creating a very low-resolution version of each photo and comparing pixels to give a percentage match.
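The general idea of comparing very low-resolution versions of two pictures can be sketched as follows. This is not DupeGuru's actual code. It is a simplified illustration that assumes each image has already been decoded into a grid of grayscale values from 0 to 255, at least as large as the comparison grid, and the grid size and tolerance are arbitrary choices.

```python
def downsample(pixels, grid=8):
    """Average a grayscale image (a list of rows of 0-255 values) into grid x grid cells.

    Assumes the image is at least `grid` pixels wide and high.
    """
    height, width = len(pixels), len(pixels[0])
    small = []
    for gy in range(grid):
        y0, y1 = gy * height // grid, (gy + 1) * height // grid
        row = []
        for gx in range(grid):
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def percent_match(first, second, grid=8, tolerance=10):
    """Percentage of low-resolution cells whose brightness differs by at most tolerance."""
    a, b = downsample(first, grid), downsample(second, grid)
    close = sum(
        1
        for row_a, row_b in zip(a, b)
        for cell_a, cell_b in zip(row_a, row_b)
        if abs(cell_a - cell_b) <= tolerance
    )
    return 100.0 * close / (grid * grid)
```

Because each cell is an average over many pixels, small shifts in camera position or slight noise change the low-resolution values very little, so near-identical photos score close to 100% even though their bit patterns differ.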

File Age

Files which are no longer being accessed are a much greater problem than duplication. From the author’s experience over many types of shared storage and domestic computers, 50% of files are likely to have a Modified date more than 3 years before the current date. In one instance, 50% of files had a Modified date of more than 8 years before the date of scanning. An example of file date and count profiles is shown below:

File Count and Size Profiles by Modified Date

Unfortunately, the Modified date of a file does not indicate the date at which it was last accessed by a user, only the date at which it was last changed. Examples of files which are in frequent use but not changed include office floor plans, document templates, and logos. These may have old Modified dates but are frequently used as read-only files. Any action to increase storage space by removing files with old Modified dates runs the risk of removing such files unless its application is restricted to folders where these files are unlikely to be found. However, the volume savings are substantial: removing files with a Modified date before May 2014 in the example shown above would save 50% of storage volume.
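To see what a given Modified-date threshold would mean for a particular folder before acting on it, the share of files and of bytes older than the threshold can be measured directly. The sketch below is an assumption-laden illustration: the function name and the three-year default are chosen for the example, and it only reads file metadata without moving or deleting anything.

```python
import os
import time


def age_profile(root, years=3.0, now=None):
    """Return (fraction_of_files, fraction_of_bytes) last modified more than `years` ago."""
    now = time.time() if now is None else now
    cutoff = now - years * 365.25 * 24 * 3600
    files = old_files = size = old_size = 0
    for folder, _subfolders, names in os.walk(root):
        for name in names:
            try:
                info = os.lstat(os.path.join(folder, name))
            except OSError:
                continue  # unreadable or vanished file
            files += 1
            size += info.st_size
            if info.st_mtime < cutoff:
                old_files += 1
                old_size += info.st_size
    return (
        old_files / files if files else 0.0,
        old_size / size if size else 0.0,
    )
```

Running it with several values of years gives a crude version of a file age profile, and running it on individual folders helps to spot those, such as template or logo folders, where old Modified dates do not mean the files are unused.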

If email archives are stored locally, the opposite problem may occur: archive files can be very large, and their Modified date is updated to the current date each time the mail application checks for new mail, so they never appear old.

Files do store a Last Accessed date, but it may be set by a backup program, virus scanner or even the operating system as well as by the parent application. This makes it of little use for storage management.

If you want to do more than guess an age threshold for an archive-by-modified date policy, the FolderSizes program provides comprehensive filesystem analysis, including a file age histogram as shown below:

FolderSizes File Age Histogram

FolderSizes also provides a visualization of folder size similar to WinDirStat and many other useful displays for storage analysis and management. FolderSizes costs US$60 for a single user license, with discounts for multiple users. It has a 15-day free evaluation period.

To De-clutter or not to De-clutter?

If you are a domestic PC user seeing the message “There is not enough disk space to complete this operation” then de-cluttering is probably the best way to go. On Windows PCs, Disk Cleanup should be your first step, followed by some of the actions described in this article if required. If that does not create enough space, terabyte removable disk drives are readily available at very modest cost, so deleting some files or moving them onto removable storage is the next step.

If you manage a collection of networked PCs in an organization with shared storage running out, any de-cluttering action needs to be carefully thought out to minimize disruption to users and avoid discouraging them from using backed-up storage. If disruption occurs, its cost can exceed the cost of expanding storage capacity. If a security breach results from increased use of off-system storage or processing, the consequences may be serious.

