
Friday, October 8, 2010

Unravelling the Data – Ill-Formatted Data

 

Read the background to this post.

When Bad Data Is Required

Fixing the data in the legacy system was not something that could be done in place.  What I would refer to as ‘bad’ data was in some cases the glue that held reports together and made things like billing work.

This was one of the first things I had to resolve.  My original plan was to “self-heal” the data through a combination of regular expressions, string replacements and templated hints and helpers.  Given the sheer number of discrepancies, that approach was DOA, and manual intervention was required.

A Side Order of Data Fixin’

I took a snapshot of the database and added additional columns to the tables where combined data was present.  To understand ‘combined data’, a little background will help.

At various points in the application lifecycle, management had decided they weren’t going to use certain fields for their original purpose and started using them for a new one.  In other scenarios, they decided to use the fields in one context for some customers and in a different context for other customers.

Depending on the customer and how long it took employees to shake old habits, these fields were used in differing ways over extended periods of time.  Furthermore, even if there had been a clear cut-over point, none of the records in the database had a last-modified date or any kind of audit log revealing when a customer record was changed in a meaningful way.

Thus, my side order approach faced another problem: there was no clean cut-off in the data, and the existing applications needed to keep running.  A snapshot of the data today wouldn’t help in the transition six months down the road.

The Birth of the Transition Platform

The solution was to create an ASP.NET MVC application, hosted only on the intranet, that used many of my original approaches to identifying bad data but left the “healing” to an end user.

Where possible, I used jQuery to look up context-based fixes through controller actions and mashed up some save functionality by POSTing to the legacy ASP pages of the original application.  Where it wasn’t possible (where functionality would be affected by changes to the data), I created proxy tables to house the ‘corrected’ version of the data and wrote some monitors to periodically check that the proxy data was up to date.
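On the controller side, those jQuery lookups didn’t need to be anything more than actions returning JSON.  As a rough sketch only (the FixSuggestion and ISuggestionService names here are my own invention, not the actual code):

    using System.Collections.Generic;
    using System.Web.Mvc;

    // Hypothetical types - not from the original application.
    public class FixSuggestion
    {
        public string Original { get; set; }
        public string Suggested { get; set; }
    }

    public interface ISuggestionService
    {
        IEnumerable<FixSuggestion> For(int customerId, string field);
    }

    // An action jQuery can call to fetch context-based fix suggestions.
    public class FixSuggestionController : Controller
    {
        private readonly ISuggestionService _suggestions;

        public FixSuggestionController(ISuggestionService suggestions)
        {
            _suggestions = suggestions;
        }

        // GET: /FixSuggestion/ForField?customerId=42&field=BillingAddress
        public JsonResult ForField(int customerId, string field)
        {
            // MVC refuses JSON over GET by default, so opt in explicitly.
            return Json(_suggestions.For(customerId, field), JsonRequestBehavior.AllowGet);
        }
    }

jQuery could then hit /FixSuggestion/ForField with the customer and field in question and render the suggestions next to the offending value.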

I grouped the functionality of the fixes into distinct controllers.  For instance, anything related to a billing address lived in the BillingAddressController, with actions to support the corrections required for errors in that area.  The models were view-model representations of the “bad data”, and I used repositories not only to connect to the legacy system but also to maintain a worklog of outstanding and completed tasks.
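To give a feel for that layout, here is a minimal sketch of what one of those controllers could look like; the repository contract and view model below are invented for illustration and aren’t the original code:

    using System.Web.Mvc;

    // Hypothetical view model shaped around the bad data, not the legacy schema.
    public class BillingAddressFix
    {
        public int CustomerId { get; set; }
        public string RawAddress { get; set; }
        public string CorrectedAddress { get; set; }
    }

    // Hypothetical repository: reads the flagged legacy rows and records
    // worklog entries as corrections are completed.
    public interface IBillingAddressRepository
    {
        BillingAddressFix GetNextOutstanding();
        void SaveCorrection(BillingAddressFix fix);
    }

    public class BillingAddressController : Controller
    {
        private readonly IBillingAddressRepository _repository;

        public BillingAddressController(IBillingAddressRepository repository)
        {
            _repository = repository;
        }

        // Show the next record that still needs a human to correct it.
        public ActionResult Next()
        {
            return View(_repository.GetNextOutstanding());
        }

        // Persist the user's correction and mark the worklog entry complete.
        [HttpPost]
        public ActionResult Correct(BillingAddressFix fix)
        {
            _repository.SaveCorrection(fix);
            return RedirectToAction("Next");
        }
    }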

This worked great, as I was also able to say, at any given point, how far along we were, percentage-wise, in correcting any given set of data.
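Because the repositories tracked both outstanding and completed tasks, that percentage fell out of the worklog almost for free.  Something along these lines (again, invented names, just a sketch):

    // Hypothetical worklog query: completion percentage for a set of fixes.
    public interface IWorklogRepository
    {
        int CountCompleted(string fixSet);
        int CountOutstanding(string fixSet);
    }

    public static class WorklogProgress
    {
        // Returns 0-100; an empty set counts as done to avoid dividing by zero.
        public static double PercentComplete(IWorklogRepository worklog, string fixSet)
        {
            int completed = worklog.CountCompleted(fixSet);
            int total = completed + worklog.CountOutstanding(fixSet);
            return total == 0 ? 100.0 : (100.0 * completed) / total;
        }
    }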

This process continues today, and time is devoted to cleaning data each week.  All three of the legacy systems continue to get (mis)used, though accuracy has been greatly improved.  As users became aware of the expected formats, they also became more conscious of how they were entering data into the older software.

This first win made the next steps more plausible.

Next up: Data that Could be Auto-Corrected

2 comments:

  1. What happened to #2 of this series?

  2. This is number 2!

    Is the title confusing? That's ill as in 'bad' not III as in 'the third'. ;o)

    -ChipOne
