Passing Parameters To U-SQL Scripts From Azure Data Factory V2
Sometimes you write Part 2 of your documentation before you write Part 1. This is one of those times.
We’ll post Part 1 – which is ACTUALLY CONFIGURING Data Factory to have the authority to execute these scripts once we have finished with our 16 pages of documentation and steps that it takes. In the meantime, this post is meant to document the steps required to pass a global pipeline parameter to the downstream U-SQL script and to configure the number of AUs needed to execute the script.
Spoiler: It took a while to figure it out.
SECTION 1 – PASSING A GLOBAL PARAMETER
First and foremost – when you have selected the pipeline itself, you have three nav items to choose from: General, Parameters and Output. Under the Parameters tab, you have the ability to add a new parameter by clicking “+ New” and filling out a name, type and default value. Remember this name – case sensitive.
Once you have done this, select the USQL Task by clicking on it (or dragging on over) in the pipeline.
Which will provide you with more options at the bottom of the screen. Lets assume you have already configured your ADLA Account and linked to your script that will be executed (again, this is coming in the sequel to this post which will be titled “Part 1”). Select the “Advanced” tab and at this point you are going to want to insert the JSON formatted parameter that gets merged into the code behind the executable pipeline.
As you can see from above, we will want this to become a part of the “typeProperties” array in an explicitly defined parameter called “state”. A few items to note:
The value I am assigning this to is called “state” and this will get added to the top of the U-SQL script I would have configured for this task. You read that right – the ADF task will ADD A LINE to the top of your U-SQL Script containing a DECLARATION to the new variable. If you want to avoid conflicts, your ADLA script will need to include DECLARE EXTERNAL [varname] so that value can be overwritten by what ADF creates. This was immensely confusing to me at first… there is now better documentation for the concept of EXTERNAL U-SQL variables here: https://msdn.microsoft.com/en-us/azure/data-lake-analytics/u-sql/declare-variables-u-sql
The reference to a global, pipeline parameter will always be in the “@pipeline().parameters.[$VariableName]” format. That is how you will reference a pipeline variable in downstream application.
By nesting the parameters within the “typeProperties” array – the code behind will nicely place the parameters portion where it belongs.
See – nicely added to the code behind:
SECTION 2 – ADJUSTING AUS FOR CONSUMPTION
This part was a little less intuitive because there is actually a section for configuring “Degrees of Parallelism” in the “Parameters” tab of the actual ADLA Task. But this doesn’t work. Really.
For whatever reason, this creates a mess of JSON and arrays that are read as strings and can’t be executed from the ADLA process. But! Since the Advanced tab allows you to overwrite fairly easily the code-behind JSON, you can navigate back to the Advanced tab and toggle the degreeOfParallelism in the correct format from here.
This will add the degreeOfParallelism into the code behind in the correct format nested within the typeProperties and will execute correctly in ADLA.
That’s it! As promised we will follow up later with instructions on configuring ADF to execute the scripts!