Creating run dependencies

You can control the order of node execution, beyond the basic left-to-right structure of your data flow, by adding wait or re-run dependencies (clocks and events) to your nodes. Wait dependencies are described under Clocks, and re-run dependencies under Events.

Clocks

You can use clocks to enforce sequential dependencies between nodes that are not explicitly sequential as a result of the structure of your data flow.

For example, to delay the execution of the Order Info and Product Info nodes until the Transactions node has finished running:


  1. Select the Transactions, Order Info and Product Info nodes.
  2. Right-click, and select Run Dependencies > Wait for Completed Run by.
  3. Select the node that you want to base the dependency on, in this case, the Transactions node. This means that the Transactions node will complete its run before the Order Info and Product Info nodes begin running.

Input clocks and output clocks indicating the wait dependency are shown on the canvas.

Output clocks are displayed on the right of the node that is set to run first, and indicate that the execution of all connected nodes will be delayed until this node has finished running.

Input clocks are displayed on the left of all dependent nodes, and indicate that the execution of the node is dependent on another node having completed its execution first.
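The effect of a wait dependency on run order can be illustrated outside the product. The following is a minimal Python sketch, not Data360 Analyze code: it assumes the three nodes from the example above and models each input clock as a "must wait for" relationship. The `wait_for` mapping and the `run_order` helper are hypothetical names introduced only for this illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical model: each key is a node, and each value is the set of
# nodes whose run it must wait for (the nodes behind its input clock).
wait_for = {
    "Transactions": set(),
    "Order Info": {"Transactions"},
    "Product Info": {"Transactions"},
}

def run_order(dependencies):
    """Return one execution order that respects every wait dependency."""
    return list(TopologicalSorter(dependencies).static_order())

order = run_order(wait_for)
print(order)
```

In this model, Transactions always comes first, and the relative order of Order Info and Product Info is left open because neither waits on the other.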

Tip: By combining clocks with the Meta Check node, you can set up conditional paths within your data flow so that runtime decisions about the data determine whether or not certain nodes are run. For example, you may want to run a set of nodes only if the input data contains a specific field, or only if it contains a minimum number of records. See Conditional execution.

Events

You can use events to trigger the re-run of a node (and its dependencies) at a given point within the data flow. The rescheduled status indicates that a node will be re-run.

Caution: You can only base a re-run dependency on a node that has a property that allows you to enter script.

An "Example Re-Run Dependency" data flow is provided with your installation and shows how you can use re-run dependencies to add a loop to your data flow.

Import the "Example Re-Run Dependency" data flow into your installation of Data360 Analyze by using the Import Data Flows/Library Nodes command in the Directory (see the topic Sharing documents for more details).

When you open the data flow, you will see that the nodes have already been connected and configured, but at this point, the data flow does not contain any re-run dependencies.

To trigger a re-run of the Switch node after the Logic node has finished running, edit the example data flow as follows:

  1. Select the Switch and Logic nodes.
  2. Right-click, and select Run Dependencies > Re-Run on Event from.
  3. Select the node that you want to base the dependency on, in this case, the Logic node. This means that the Switch node and its dependencies will re-run when the Logic node has completed its execution.

The Create Data node contains the following data:

Text:string,Iteration:int
Test,0

The Increment loop node contains the following Script to cause a counter to increment:

emit Text,Iteration+1 as Iteration

The Logic node contains the following Script to determine whether or not to trigger a re-run of the Switch node:

if Iteration == 10
then setSuccessReturnCode(0)
else setSuccessReturnCode(101)
emit *

An input event and an output event indicating the re-run dependency are shown on the canvas.

When the data flow runs, the Switch node outputs the data from its first input to its output. At this point, the Logic node has not yet run, so there is no data on the optional input pin of the Switch node, and therefore it will only pass through the data from its mandatory input, in this case the Create Data node.

When the Logic node has executed and the value of Iteration is not 10, it returns code 101 (see Success return codes). This code signals the Switch node to reschedule and preserves the data on output 1 of the Logic node, which the Switch node can then take as its input. The Switch node, and any nodes that are downstream of the Switch node, are rescheduled and re-run using the data on the Switch node's optional input.

Each time they run, the Iteration Count property on the individual nodes will also increment by 1. The rescheduled nodes will continue to execute until the value of Iteration is equal to 10, at which point the Logic node will return a 0 code, which indicates successful execution, and the nodes downstream of the loop will execute.
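To make the loop behavior easier to follow, here is a small Python simulation. It is a sketch under stated assumptions, not the product's execution engine: it assumes the data passes from the Switch node through the Increment loop node to the Logic node on every pass, and the function names `increment`, `logic`, and `run_loop` are hypothetical.

```python
def increment(record):
    # Mirrors "emit Text,Iteration+1 as Iteration" in the Increment loop node.
    return {"Text": record["Text"], "Iteration": record["Iteration"] + 1}

def logic(record):
    # Mirrors the Logic node script: code 0 ends the loop, code 101 requests
    # a re-run and preserves the record on output 1.
    return_code = 0 if record["Iteration"] == 10 else 101
    return return_code, record

def run_loop(initial):
    # First pass: the Switch node passes through its mandatory input
    # (assumed here to be the Create Data record).
    record = initial
    passes = 0
    while True:
        passes += 1
        code, record = logic(increment(record))
        if code == 0:
            return record, passes
        # Code 101: the Switch node is rescheduled and takes the preserved
        # record as its optional input on the next pass.

final, passes = run_loop({"Text": "Test", "Iteration": 0})
print(final, passes)
```

Starting from the Create Data record (Test, 0), this model makes ten passes through the loop before the Logic step returns 0.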

Tip: To execute a single iteration of your loop, select and run only the node with the output event, in this example, the Logic node. The execution will stop when it reaches the Logic node and all nodes that need to be re-run will be put into the rescheduled state.


Success return codes

Return code 100 signals all connected nodes to change to the rescheduled state. Return codes 101 to 116 do the same and additionally preserve one output of the current node: the output number, from 1 to 16, is the return code minus 100. For example, return code 101 preserves the data on output 1 of the current node and signals the node attached to its output event pin to change to the rescheduled state. Return code 102 preserves the data on output 2 of the current node while signaling the node attached to its output event pin to reschedule.

These return codes can be set using the Script setSuccessReturnCode operator in the Transform (Superseded) node.
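The mapping from return code to behavior can be summarized in a few lines of Python. This is an illustrative sketch only: `interpret_return_code` is a hypothetical helper, not part of Data360 Analyze, and treating any other code as an error is an assumption made for this example because this topic does not describe other codes.

```python
def interpret_return_code(code):
    """Return (reschedule, preserved_output) for a success return code.

    preserved_output is the output number (1 to 16) whose data is kept,
    or None when no output is preserved.
    """
    if code == 0:
        # Successful execution, no re-run requested.
        return (False, None)
    if code == 100:
        # Reschedule connected nodes without preserving an output.
        return (True, None)
    if 101 <= code <= 116:
        # Reschedule and preserve output number (code - 100).
        return (True, code - 100)
    # Assumption for this sketch: other codes are not covered here.
    raise ValueError(f"return code {code} is not covered by this sketch")

print(interpret_return_code(101))
print(interpret_return_code(102))
```

The two calls correspond to the examples in the text: code 101 preserves output 1 and code 102 preserves output 2, and both request a reschedule.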

Creating run dependencies inside composites

You may want to configure run dependencies between nodes at the top level of your data flow and nodes that are contained within a composite.

To do this, you will first need to create a dependency between the top level node and the composite. Then, you can select the relevant contained nodes within the composite that you want to link the top level node to.

Clocks

Inside a composite, the Run Dependencies menu contains two additional options:

  • Composite > Wait for Upstream - connects the selected contained nodes to a composite-level input clock, which is connected via a clock to one or more nodes at the top level of your data flow, meaning that the execution of the selected contained nodes is dependent on another node (outside of the composite) having completed its execution first.
  • Composite > Downstream Waits - connects the selected contained nodes to a composite-level output clock, which is connected via a clock to one or more nodes at the top level of your data flow, meaning that the selected contained nodes will run before all connected nodes.

Events

Inside a composite, the Run Dependencies menu contains two additional options:

  • Composite > Re-Run on Upstream - connects the selected contained node to a composite-level input event, meaning that the re-run of this node will be triggered following the completed execution of another node (outside of the composite).
  • Composite > Downstream Re-Runs - connects the selected contained node to a composite-level output event, which is connected via an event to one or more nodes at the top level of your data flow, meaning that when the selected contained node runs, it will trigger a re-run of all connected nodes.

Tip: If you create a run dependency between two contained nodes (i.e. both nodes are within the composite), the input and output clocks are connected as usual, with no connections to composite-level input or output clocks.

Note: You may notice a performance degradation if you create a run dependency between two composites that each have a large number of contained nodes. If you run into this situation, see Composite performance for a workaround.

Removing run dependencies

To remove a run dependency, right-click the blue dependency connection line, and select Delete Connection.