Great Link: Testing in Production

The other day, I ran into this interesting article:

OpenSource.com: Testing in production: Yes, you can (and should)

This article was written about generic software development. I thought it would be a good exercise to apply it to PeopleSoft development specifically.

Points 1 and 2: You already do it

The PeopleSoft world differs from the author’s world. Nearly every PeopleSoft shop keeps multiple clones of its production system. Aside from possible hardware differences, those clones are usually close to identical to production. If it works in Development, there’s a very good chance it is going to work in Production.

Still, to the author’s point, no system can be exactly the same. We would do well to go through the exercise of thinking through what is different between Production and our test systems. What are we actually testing in production?

Third-party integrations are a great place to start. Do your benefits providers give you test systems to test integrations before each push to production? What about bank interfaces? I’ve worked with enough outside interfaces to know that some vendors don’t offer a test integration path. Many times they provide a system to test with during development, but once you are live, there’s no going back to testing.

Emails are another thing that can only be fully tested in production. Most people blank out email addresses in all non-production systems. Sure, you can test one or two transactions by setting your own email address on a test case. Usually that’s good enough.
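Here is a rough sketch of that idea: blank out every address after a refresh, except for a short list of testers who still need to receive mail. This is plain Python for illustration only, not a PeopleSoft tool. The `email` field name, the `keep` list, and both function names are assumptions I made up for the example.

```python
def scrub_email(address: str, keep: set[str], placeholder: str = "") -> str:
    """Blank out an address unless it belongs to a tester on the keep list."""
    normalized = address.strip().lower()
    return normalized if normalized in keep else placeholder


def scrub_rows(rows: list[dict], keep: set[str]) -> list[dict]:
    """Return copies of the rows with the hypothetical 'email' field scrubbed."""
    return [{**row, "email": scrub_email(row.get("email", ""), keep)} for row in rows]


if __name__ == "__main__":
    # Made-up sample data; the keep list holds lowercase tester addresses.
    sample = [
        {"emplid": "1", "email": "Tester@Example.com"},
        {"emplid": "2", "email": "real.person@example.com"},
    ]
    for row in scrub_rows(sample, keep={"tester@example.com"}):
        print(row)
```

The point is that the scrub is the default and the tester list is the exception, so a forgotten address can't send mail to a real employee from a test system.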

Data is a big thing to consider. How often can you refresh your test instances? Data can make a big difference in testing. If a data scenario doesn’t exist in the test environment, you can’t test that condition. As your test data ages, the chance increases that production contains scenarios your test environment lacks, and the “exact” clone of production that we take for granted in the PeopleSoft world isn’t what we think. You should probably consider building a culture of constant refreshes with your development team.

Point 3: It’s probably fine

The author deals strictly with percentages. Sure, it’s true: catching the first 80% of the bugs is easier than catching the last 10 or 20%. I think we need to consider risk in that calculation.

For example, a new hire that gets loaded into the wrong department might not be that big of a deal. A supervisor catches it later on in the onboarding process, HR corrects the problem, and the development team is notified and quickly corrects that new hire process. That’s no big deal.

What about payroll? If a check goes out for the wrong amount or taxes aren’t collected because of a wrong setting, that’s a bigger deal. I once had a client fail to test an interface that autoloaded the FICA Status. They had months of incorrect taxes to reconcile! Those kinds of bugs are a big deal.

All that to say, not all bugs are equal. A simple percentage isn’t a good measure. There are two elements to consider. First, how quickly can the issue be caught? Maybe we can increase audits and checks and balances to catch issues more quickly and reduce risk. Second, how big is the impact? Maybe we should increase testing practices for those critical issues.
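To make those two elements concrete, here is a minimal sketch that multiplies time-to-detect by impact and ranks scenarios by the result. The formula, the 1-to-5 impact scale, and the numbers for the two scenarios are all assumptions for illustration; they don't come from the article or from any real measurement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BugScenario:
    name: str
    days_to_detect: int  # assumed: how long before someone notices
    impact: int          # assumed scale: 1 (minor) to 5 (severe)


def risk_score(scenario: BugScenario) -> int:
    """A deliberately simple score: slow detection and high impact both raise it."""
    return scenario.days_to_detect * scenario.impact


def rank_by_risk(scenarios: list[BugScenario]) -> list[BugScenario]:
    """Highest-risk scenarios first, so they get the extra testing and audits."""
    return sorted(scenarios, key=risk_score, reverse=True)


if __name__ == "__main__":
    # Illustrative numbers only.
    scenarios = [
        BugScenario("New hire in wrong department", days_to_detect=5, impact=1),
        BugScenario("Wrong FICA status loaded", days_to_detect=90, impact=5),
    ]
    for s in rank_by_risk(scenarios):
        print(f"{risk_score(s):>4}  {s.name}")
```

Either lever lowers the score: an audit that catches the FICA problem in one pay period shrinks `days_to_detect`, while heavier testing up front is how you respond to a high `impact`.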

Point 4: Bigger Prob

My biggest complaint in the PeopleSoft world is that we aren’t “shipping code everyday” for fear of “causing self-inflicting damage”. All too often, I’ve worked with users who have a problem that is either already fixed or easy to fix, but we can’t get the fix to production where the user can take advantage of it.

I had to look up what “canarying” means. It is basically rolling out new code and changes to a small group first before rolling them out to everyone. It’s not really something we can feasibly do in the PeopleSoft world, because everyone runs the same codeline. But, if we think about it, we can at times break our changes into smaller phases that affect only targeted groups of people at a time. Or, for new components, we can set up security for targeted users before opening it up to other people. Maybe we can steal the concept of canarying for our world.
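The phased idea can be sketched in a few lines. This is plain Python showing the logic only; in PeopleSoft the real gate would be your security setup, not code like this. The phase groups, department names, and the `is_enabled` function are all hypothetical.

```python
# Hypothetical rollout plan: each phase adds more departments.
PHASES: list[set[str]] = [
    {"HR"},                 # phase 0: the small pilot group
    {"FINANCE", "IT"},      # phase 1: a wider group
    {"SALES", "FACILITIES"} # phase 2: everyone else
]


def is_enabled(department: str, current_phase: int, phases: list[set[str]] = PHASES) -> bool:
    """True if the department belongs to any phase up to and including the current one."""
    if current_phase < 0:
        return False
    return any(department in group for group in phases[: current_phase + 1])


if __name__ == "__main__":
    for dept in ("HR", "IT", "SALES"):
        print(dept, is_enabled(dept, current_phase=1))
```

Moving `current_phase` forward opens the change to the next group, and a problem found in the pilot only ever touched the pilot.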

Standardizing and automating deployment is something we can always work on. The more cleanly we can get things into production, the more often and confidently we can do it. When a critical patch or last-minute tax update comes up, you’ll be glad for the efficiency.

The author talks about “exploring production”. In the PeopleSoft world, maybe we can work to create checks and balances that will detect problems sooner. Like I said earlier, one of the worst things that can happen is that a person gets overpaid each paycheck for six months.
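As one example of such a check, here is a minimal sketch of an audit that flags anyone whose latest paycheck differs sharply from their own recent history. It assumes you can export net pay per employee per pay period. The `history` structure, the 25% tolerance, and the function name are assumptions for illustration, not a PeopleSoft feature.

```python
from statistics import median


def flag_pay_outliers(history: dict[str, list[float]], tolerance: float = 0.25) -> list[str]:
    """Return employee IDs whose latest net pay deviates from the median of
    their earlier checks by more than the given fraction.

    history maps an employee ID to net pay amounts, oldest first (assumed shape).
    """
    flagged = []
    for emplid, amounts in history.items():
        if len(amounts) < 2:
            continue  # nothing to compare against yet
        *previous, latest = amounts
        baseline = median(previous)
        if baseline > 0 and abs(latest - baseline) / baseline > tolerance:
            flagged.append(emplid)
    return sorted(flagged)


if __name__ == "__main__":
    # Made-up sample data.
    sample = {
        "A": [2000.0, 2000.0, 2000.0, 2600.0],  # 30% jump
        "B": [1500.0, 1500.0, 1510.0],          # normal variation
        "C": [3000.0],                          # first check, skipped
    }
    print(flag_pay_outliers(sample))
```

A check like this won't tell you why a paycheck changed, and legitimate raises will show up too. It just puts the outliers in front of a person after one pay period instead of six months later.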
