Dark Data & The Data Disconnect

I ran across a blog post the other day that used the term “dark data” to describe an organization’s data that is collected during standard business activities but never really gets used for any purpose other than its original one. Dark data is data that gets created, collected, and stored but is rarely revisited after its initial use.

Before we get into the world of dark data, let me take a quick detour to set the stage for this topic. I’ve written about Shadow IT many times in the past and specifically touched on the topic of “dark data” in a post titled Data Disconnect. In that post, I wrote the following about the “data disconnect” problem:

You can do your job. And…you can do your job well.  But…your work is being done in a vacuum.

Your data might remain in that vacuum. But worse…whatever knowledge that data might create (or has created) will remain in that vacuum as well.

The data disconnect problem is real and is one that hasn’t been addressed much in recent years. The same can be said of Shadow IT, although many organizations I speak with tell me they have been able to get in front of Shadow IT recently.

So, back to dark data. Do you think the data disconnect creates dark data?

While it may not be the only reason dark data exists, it is a main driver of the phenomenon in my experience.

Let’s look at a scenario based on a real-world example.

The Scenario

You are the director of marketing for a mid-sized business.

One of your goals for this year is to revamp your web analytics engine so you can better understand your site visitors and make better-informed decisions about how well your on-site marketing is working and which types of content work best for your clients.

You reach out to your IT group for help and receive very little in the way of support. The first responses you get are the standard “it isn’t in the budget this year” or “we’ll need to run that through our portfolio management system.” You talk to the CIO, who says budget and process drive this type of thing and there’s not much he can do.

What do you do? You do the same thing every marketing person has done for years: you start talking to web analytics software vendors and consulting companies to see how they can help.

After a few months and a selection process, you decide to go with company X for their analytics platform. The platform requires very little integration with your website; only a few lines of code need to be added to the site. You do need the IT group to add those lines, though (which takes much longer than you thought it should).

Your new analytics engine is in place, and you can now start asking and answering better questions about how visitors are consuming your website.

Your goal of implementing a new system has been met. Your Marketing VP is ecstatic with your performance and gives you a nice bonus and some well-deserved kudos.

Within a few months, other folks within the organization start asking you questions about the website. You can quickly and easily tell them how many visitors you’ve had, where they come from, what they do while on the site, and more. You are using your analytics data for its intended purpose.

Then one day you hear of a new project. This project is looking at all areas of customer “touch” and analyzing how well those touch points are working and whether they can be improved. The CIO asks you how to connect to your analytics data so it can be used alongside other analytical data to study the organization’s touch points with clients.

You ask your analytics provider how you can integrate their data with your local data. The answer: “We don’t provide that capability.”

You’ve just realized you are trapped in the data disconnect. You have data that you can’t use anywhere else in the organization. You have data that is useless outside of your web analytics engine. You have dark data.

Moral of the Story

This is a real story from a real organization.

In this case, the marketing department had to backtrack and switch to another analytics engine that integrated with the other systems used throughout the organization. That backtrack left about 6 months of web analytics data sitting in the cloud, in a provider’s system, with no way to analyze it outside the provider’s tool.

This sort of thing happens every day in business. People go out and purchase their own applications and systems without IT involvement because the cloud makes it easy to do so. This is Shadow IT.

There are approaches that can reduce Shadow IT, but you can never fully stop it. It is simply too easy to plunk down a credit card number and order cloud services today.

The IT group can help reduce Shadow IT by being a bit more proactive, as well as a bit more interactive, when dealing with the rest of the organization. It’s no longer good enough to say “no,” “the process is…,” or “our budget doesn’t allow….” The organization is looking for solutions, and the IT group needs to start being the group that says “yes,” or at least “let’s figure out how to make this work.”

Shadow IT can lead to a data disconnect, which in turn can lead to one form of dark data: data that lives in your applications but cannot be (or at least is not) used for any other analytical purpose.

How is your organization fighting the data disconnect and dark data?

Image Credit: Data Packets on Flickr