In today’s business world, organizations are amassing and storing more data than ever before. We truly are in the age of Big Data, which often presents as many challenges for growing companies as it does benefits.
Most of the data being gathered by your organization is going to be used to improve something about the way you do business. Whether it’s information about how your users are utilizing your product, results gathered from your marketing efforts, or internal statistics about your development processes, your company’s constantly growing data is a major asset that, with the correct analysis, can increase your bottom line.
But along with that valuable data, your company is almost certainly also storing an increasing amount of data that has no real tactical value at all. Gartner has dubbed this unmanaged information “Dark Data.” Sure, it sounds a bit dramatic, but realistically, storing ever-larger amounts of this unstructured information is a costly and potentially risky practice that some believe could become a major speed bump along the Big Data highway.
Let’s take a look at what dark data actually is, how it could impact your organization, and what steps you can take to manage it.
What is Dark Data?
As with many buzz terms that float around the Web, the exact definition of “Dark Data” can be hard to nail down. According to Gartner, which originally coined the term, dark data is defined as, “the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes.”
By this definition, much – if not most – of the information your organization stores could be referred to as dark data. This is because, as useful as data can be, the majority of the information we tend to hold on to is simply collateral: we feel the need to keep it in case we ever have to prove that something occurred in the past, but it is almost entirely obsolete for any other use.
Specific examples of dark data will vary widely from company to company, but any of the following could fall under this fairly broad term if they are outdated or unstructured:
- Customer Information
- Log Files
- Account Information
- Previous Employee Data
- Financial Statements
- Raw Survey Data
- Email Correspondences
- Notes or Presentations
- Old Versions of Relevant Documents
What’s the Problem with Dark Data?
There are many issues associated with dark data that can become more prevalent as time goes by. If you think of dark data as the clutter amassed inside a hoarder’s house, the first problem becomes obvious: space. As that unorganized data continues to grow, it takes up storage that could otherwise be used for your valuable assets. More storage means more overhead costs, which – particularly in the era of Big Data – are already a significant concern in most organizations.
Aside from increased storage costs, having large amounts of unstructured or unorganized data can lead to serious security risks. Along with outdated and seemingly useless documents, dark data will likely also contain sensitive, proprietary information. If you haven’t seen the news, data breaches – like the one that just rocked Sony Pictures – are becoming more and more prevalent each week. Just because employees at your organization don’t want to take the time to go through piles of old information doesn’t mean that hackers aren’t willing to mine that data for years-old embarrassments that your company had hiding in the basement.
On the other end of the spectrum, your organization may also be missing out on some great opportunities by allowing dark data to steadily build up in your database. Along with extremely sensitive information that could be potentially harmful in the case of a breach, there’s likely going to be a lot of untapped potential inside that mass of information. As with the hoarder and their overabundance of useless stuff, it’s difficult for your company to find the information that could be truly valuable amid a giant mass of unstructured legacy data.
How Can Dark Data be Managed?
While you’ll likely never be able to completely rid yourself of legacy data, that’s not necessarily a bad thing. Your goal shouldn’t be to toss out any information you’re not currently using. Rather, it should be to have a process in place that allows you to manage and organize your legacy data in order to keep the risks and costs associated with dark data within reasonable limits.
Audit and Prune your Content Store
Do regular audits of your entire content store and make sure you have a process for getting rid of the old, unneeded data. Nail that down as early as possible, and stick to it moving forward. This won’t necessarily make up for the lack of organization of your previous information, but it will surely slow the build-up of new dark data, which will be helpful in the future.
Part of that process should include the pruning of old data. This isn’t necessarily data dumping, but it’s a bit more than mining for hidden gems. The goal here is simply to provide more structure to your legacy data over time so you can easily decipher what is necessary to hang on to and for how long.
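If part of your content store lives on an ordinary file system, a first audit pass could be as simple as listing the files nobody has touched in a long time. The sketch below is only an illustration of that idea, not a complete audit tool: the function name `find_stale_files`, the one-year threshold, and the use of last-modified time as a stand-in for "unused" are all assumptions made for the example.

```python
import os
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def find_stale_files(root, max_age_days, now=None):
    """Return (path, age_in_days, size_in_bytes) for files older than max_age_days.

    Hypothetical helper for illustration: "stale" here just means the file
    has not been modified recently, which is only a rough proxy for unused.
    """
    now = time.time() if now is None else now
    cutoff_seconds = max_age_days * SECONDS_PER_DAY
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.stat()
            except OSError:
                # Skip files that vanish or cannot be read during the walk.
                continue
            age_seconds = now - info.st_mtime
            if age_seconds > cutoff_seconds:
                age_days = int(age_seconds // SECONDS_PER_DAY)
                stale.append((str(path), age_days, info.st_size))
    # Oldest files first, so the review starts with the likeliest candidates.
    stale.sort(key=lambda item: item[1], reverse=True)
    return stale


if __name__ == "__main__":
    # Assumed threshold: anything untouched for a year gets flagged for review.
    for path, age_days, size in find_stale_files(".", max_age_days=365):
        print(f"{age_days:>6} days  {size:>12} bytes  {path}")
```

Note that the script only reports candidates; deciding whether each one is archived, restructured, or deleted is still a job for whoever owns the data.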
Anytime you can find a new use for old data, it’s a big win – it’s like finding €5 in that pair of pants you hadn’t worn in a month. That’s why, rather than dumping old data, I’d recommend simply finding a manageable format for it. That way, when (or if) you ever actually need that information, you’ll have exactly what you need within arm’s reach.