DeepSQL

Analyzes the supplied SQL and produces a set of nodes and edges to enable data lineage.

 

To run the DeepSQL node, configure the Body property by typing a SQL query, or select the (from Field) variant of the property to reference an input field containing SQL code. You can use this node to analyze SQL code from the Database Metadata node.

 

The DeepSQL node analyzes the supplied SQL and then provides the following outputs:

  • Nodes - Provides a list of node names and node types. These are the objects in the database.
  • Edges - Provides a list of edges. These are the relationships between objects in the database.
  • Node Types - Provides a list of node types, for example, tables and columns.
  • Edge Types - Provides a list of edge types.
  • Errors - If you see records on the Errors output pin, check that the SQL code that you have entered in the Body property is correct.

Example

You have the following SQL code:

select emp.first, emp.last, emp.id, sum(vac.hours)
from employee emp, vacation vac
where emp.id = vac.id

 

In this example, there are two tables (employee and vacation). The employee table has columns for first, last and id. The vacation table has columns for id and hours. There is also a 'Result Set' (RS-1) which consists of four columns (first, last, id, and sum(vac.hours).

You can use the DeepSQL node to analyze the SQL, and a set of nodes and edges will be constructed to generate data lineage.

The Nodes output pin contains all of the nodes reported as a result of the analysis, with fields for the following:

  • The name and type of the parent of the node (where applicable).
  • The name and type of the node.
  • The source code corresponding to the node.
  • A unique ID of the node.

In this example, the first column on the employee table would have a parent name of 'employee', a parent type of 'table', a node name of 'first', and a node type of 'column'.

 

The Edges output pin contains all of the edges reported by the analysis, with a row for every edge in the analysis.

In this example, the first column on the Result Set is a direct relationship from the first column on the employee table. This will be captured as an edge of type 'fdd' which indicates that this is a direct relationship from the source to the target.

For more details on the nature of the supported relationships and their meaning, see https://github.com/sqlparser/sqlflow_public/blob/master/dbobjects_relationship.md

Properties

DbType

Select a database type to connect to.

Choose the (from Field) variant of this property to specify the name of an input field containing the database type.

A value is required for this property.

Body

Supply the SQL source code used as input to generate lineage information.

Choose the (from Field) variant of this property to specify the name of an input field containing SQL source code.

A value is required for this property.

PassThroughFields

Optionally specify which input fields will "pass through" the node unchanged from the input to the output, assuming that the input exists. The input fields specified will appear on those output records which were produced as a result of the input fields. Choose from:

  • All - Passes through all the input data fields to the output.
  • Used - Passes through all the fields that the node used to create the output.
  • Unused - Passes through all the fields that the node did not use to create the output.
  • None - Passes none of the input data fields to the output; as such, only the fields created by the node appear on the output.

If a naming conflict exists between a pass-through field and an explicitly named output field, an error will occur.

The default value is None.

FailureBehavior

Optionally specify what to do when the SQL is invalid. Choose from:

  • Error - Report error and stop further processing.
  • Log - Log a warning message and skip the input.
  • Ignore - Skip the input.

Inputs and outputs

Inputs: 1 optional.

Outputs: Nodes, Edges, Node Types, Edge Types, Errors.