Considerations for planning an Azure Data Factory enterprise deployment: what to release, how to deploy.
Looking at my blog today, you would think no progress has been made on the Azure Stream Analytics ALM series I'm publishing there. In terms of writing, that is correct. But to be fair to myself, there is a lot I need to prepare before I can write the next post. And it is taking more time than I expected (to nobody’s surprise). Below I share what needs to be done before I can finish writing the ALM series.
The next article in the series will cover unit-testing, both local and running in the build phase of a CI pipeline. The main difficulty on this topic is that at the moment, unit testing is not supported natively for ASA in either VSCode or Visual Studio. The good news is that the Visual Studio ASA extension provides building blocks that can be wired together to offer the basics of it.
Since I was having a ton of fun with PowerShell recently, I decided to do just that, and hacked a solution that I published on GitHub. It’s basically a PowerShell script that leverages a fixture to run tests programmatically. It’s a bit hacky, but it does its thing. As I was polishing it and adding a couple of features, I realized that if it's a very good prototype, it's not production ready at all. Ironically it's missing unit-testing and a proper release pipeline. I couldn’t let that fly.
To unit-test a PowerShell script, it needs to be designed and built the right way: the way the PowerShell authors planned it originally. So I took the time to sit down and learn PowerShell for real. It helped me rewrite the script in a much cleaner way. Then I tried to unit-test it with Pester, but if the internals were clean, the modularity of it (interface and flow of operations) was still wrong. It was awkward to test anything, and I had to mock way too many things to get any results.
My script is just that, a script, that did everything in one single file (class?). So I went back in research mode and this time read the PowerShell Toolmaking book, and it helped a lot. What also helped was that I was going through A Philosophy of Software Design at the same time, which was reiterating the same ideas from a more agnostic point of view.
That's where I am today.
Now I needed to re-design my solution with the following principles in mind:
- One controller script calling multiple tool scripts
- The controller script is the user interface
- It needs to be narrow and deep
- It’s about context, it knows about ASA and what we’re trying to accomplish
- The tool scripts do only one thing each
- No context, no hard-coded values, as generic and reusable as possible
- Use verb-noun naming convention and only get data via parameter binding
- Are testable because self-contained
- The controller script is the user interface
The way to design either forms of script is to:
- Map business requirements via example calls with parameters (UX)
- Derive its unique signature from those calls + the constraints above
- Build+test it at the same time
I tried that for a single tool script I had already build (
New-AutAsaprojXML) and it was weird but good. That tool converts a JSON autoproj (config file generated by the ASA extension for VSCode) into an equivalent XML one (generated by Visual Studio, expected for a scripted run). It’s awkward because it still carries context – I don’t want to build a generic JSON to XML converter, so it knows about the asaproj details – but I remodeled it enough to be testable (object as input parameter, string as output parameter, no reading/writing files). And it’s now unit-tested, and it’s awesome. I already caught a couple of bugs thanks to it.
Now I must refactor the existing tool scripts, and extract the other tools from the controller script so I can test most of it independently. It sounds obvious but it isn't. Defining the boundaries between controller and tools is hard, at least to me. I'm not trained in that specific space, so I need a bit of practice to make more informed decisions. Going back to the tool above, should I strive to make it completely agnostic, and pass the whole JSON template as an input parameter? Both books tell me a soft yes, but it's an investment (of my time and brainpower) that I may never get the return on.
The whole conversation also opens another box: how am I supposed to structure my solution (as in files and folders). It needs to be easy to develop, test and also integrate well in the release pipeline. PowerShell offers a lot of options on how to package scripts (functions, modules…), choosing one is not straightforward for a beginner.
Finally, if I apply the design process to the controller script, thinking about how I want my end users to experience the solution, then comes another source of requirements for the refactoring. Because I don’t want them to have to run multiple scripts, in specific sub-folders, scripts they got from a unknown GitHub repo… as is the case now. The best UX I can imagine would look like:
Import-Module Asa.Unittest New-AutFixture -Path “C:\...” | Start-AutRun -AsaPath “C:\...”
That would be awesome! But that means publishing the tool in the PowerShell gallery, which comes with its own set of guidelines and best practices. Which also means a strong release pipeline, a project in itself.
All that for no visible feature shipped. Isn't that the whole story of software planning and delivery unfolding at the scale of my little product? I love it!