Database Metadata

Uses third-party JDBC drivers to connect to and query the metadata of the database.

 

You can use this node to extract SQL code from a database which you can then feed into the DeepSQL node for lineage analysis. To configure the Database Metadata node:

  1. Configure the required properties to connect to your database. If you are connecting to a database that uses a driver which is pre-installed with Data360 Analyze, follow the Simple database connection steps. Or, if you are connecting to a database that uses a driver which is not shipped with Data360 Analyze, see Advanced database connection.

    Tip: For a list of the drivers which are included with your installation, see Acquiring data from a database.

  2. If you are working with a large database, it is recommended that you filter your queries, see Filtering your queries.

 

The Database Metadata node has the following outputs:

  • Metadata - Provides metadata about the data source that you are connecting to.
  • Catalogs - Provides a list of catalogs in the database.
  • Schemas - Provides a list of schemas in the database.
  • Tables - Provides a list of tables and views in the database.
  • Table Columns - Provides a list of tables, views and columns in the database.
  • Procedures - Provides a list of procedures in the database.
  • Procedure Parameters - Provides a list of procedure parameters in the database.
  • Functions - Provides a list of functions in the database.
  • Function Parameters - Provides a list of function parameters in the database.
  • Triggers - Provides a list of triggers in the database.
  • Object Definitions - Provides a list of SQL statements which you can use as input to the DeepSQL node. These SQL statements define the objects of the database that you are connected to. The Notes field on this output pin provides the following information:
    • 0 - Body is valid
    • 1 - Body is invalid (more than 4000 bytes)
    • 2 - Body is unavailable (for example, it is not supported)
    • 3 - Body is encrypted

In this topic:

Example

You are using the Database Metadata node to connect to and query a SQL Server database.

Click the Object Definitions output pin to expand the data viewer:

 

Each record shows a database object and the corresponding SQL statements that are needed to create the object. In the following example, the ObjectCode field contains the required SQL statements to create a view, a procedure, a trigger and a table from the SQL Server database:

Object Definitions output pin

 

The Object Definitions output pin is connected to a DeepSQL node, which analyzes the SQL statements to produce a set of nodes and edges to enable data lineage.

Database Metadata node

See DeepSQL for an example of working with the DeepSQL node.

Filtering your queries

  1. When working with a large database, initially, it is recommended that you set the ShowDBInfoOnly property to True. This will allow you to obtain database metadata and schema information.
  2. Click the Schemas output pin to view the schema information in the data viewer. Identify the schema that you want to analyze.
  3. Specify the schema in the SchemaFilter property to filter the information that is retrieved when you next run the node.

Note: The Database Metadata node contains a number of properties that you can use to filter your queries. In most cases, you can either type a name, and an exact match will be performed, or you can type a string followed by a % character, and in this case a stem match will be performed. For the CatalogFilter property, the only option is to type the name of a catalog and an exact match is performed.

To filter by more than one item, specify the filters in a comma separated list. For example, to retrieve only the 'INVOICES' and 'CUSTOMERS' tables, set the TableViewFilter property to INVOICES,CUSTOMERS.

Note: System catalogs and system schemas are excluded by default.

Database-specific information

The information that is output by the Database Metadata node varies depending on the database that you are connecting to. For example, Oracle does not support catalogs, so the Tables output pin will display NULL for each record in the Catalog column.

The following table outlines which database components are supported by some of the most common database connections:

  Database-supported components
Database Schemas

Catalogs

Tables Views Functions Procedures

Triggers

DB2 P O P P P P P
Hana P O P P P P P
Hive P O P P O O O
MS Access O O P O P P O
MySQL / MariaDB O P P P P P P
Oracle P O P P P P P
PostgreSQL P

P

P P P P P
RedShift P

P

P P P P O
Snowflake P

P

P P P P O
SQL Server P

P

P P P P P
Sybase P

P

P P P P P
Teradata P O P P P P P

DB2

Notes:

  • Supports z/OS. If you are using z/OS, you will require an additional license file to connect which is typically named: db2jcc_license_cisuz.jar

    Place the license file in the following directory: <site directory>/lib/java/db/db2

    For example: C:/users/<username>/AppData/Local/Data360Analyze/site-7731/lib/java/db/db2

    Then, configure the DbDriverClasspath property to reference this, as follows:

    %ls.brain.javaLibDir%/db/db2

    %ls.appDataDir%/lib/java/db/db2/db2jcc_license_cisuz.jar

  • Modules are not supported in Data360 Analyze or Data360 Govern. This means that, for example, if there are two functions with the same name in two different modules it would not be possible to differentiate the two unless the module was part of the path.

Permissions:

  • No additional permissions are required.

System schemas for non-z/OS systems:

  • NULLID, SQLJ, SYSCAT, SYSFUN, SYSIBM, SYSIBMADM, SYSIBMINTERNAL, SYSIBMTS, SYSPROC, SYSPUBLIC, SYSSTAT, SYSTOOLS

System schemas for z/OS systems:

  • ADB, CLARKG, DSN8110, DSN81110, DSN811SA, DSN8BQRY, DSNRGCOL, IBMUSER, Q, RUNSTATS, SYSIBM, SYSIBMTS, SYSTOOL

Hana

Notes:

  • Does not support macros or parameterized views.

Permissions, method one:

  1. Create a role by running the following command, where <rolename> is replaced with the name of the role: create role <rolename>
  2. Grant the role to the required user by running the following command, where <rolename> is replaced with the name of the role and <user> is replaced with the name of the user: grant <rolename> to <user>
  3. Grant permission to the role by running the following command, where <rolename> is replaced with the name of the role: grant catalog read to <rolename>

Permissions, method two:

  1. Grant a special internal role to the user by running the following command where <user> is replaced with the name of the user: grant SAP_INTERNAL_HANA_SUPPORT to <user>

System schemas:

  • HANA_XS_BASE, SAPDBCTRL, SAP_REST_API, SAP_XS_LM, SAP_XS_LM_PE, SAP_XS_LM_PE_TMP, SAP_XS_USAGE, SYS, _SYS_AFL, _SYS_AUDIT, _SYS_BI, _SYS_BIC, _SYS_DATA_ANONYMIZATION, _SYS_EPM, _SYS_EPM_DATA, _SYS_LDB, _SYS_PLAN_STABILITY, _SYS_REPO, _SYS_RT, _SYS_SECURITY, _SYS_SQL_ANALYZER, _SYS_STATISTICS, _SYS_TASK, _SYS_TELEMETRY, _SYS_WORKLOAD_REPLAY, _SYS_XS

Hive

Notes:

  • Catologs display " " instead of NULL.

Permissions:

  • No additional permissions are required.

System schemas:

  • information_schema, sys

MS Access

Notes:

  • The Object Definitions pin only includes tables.

MySQL / MariaDB

Notes:

  • MariaDB is an open source version of MySQL.

Permissions:

  • Run the following two commands for each database where metadata is needed, replacing <database> and <user> with the name of the database and the name of the user:
    • grant select, execute, trigger, show view on <database>.* to '<user>'@'%'
    • grant select on mysql.proc to '<user>'@'%'

MariaDB system catalogs:

  • information_schema, mysql, performance_schema

MySQL system catalogs:

  • information_schema, mysql, performance_schema, sys

Oracle

Notes:

  • Packages are not supported, that is, the node will not output any object code defined in a package. Materialized / parameterized views are not supported.
  • The Procedures output pin shows both functions and procedures. The Functions output pin also shows functions.
  • The "select any dictionary" system privilege may be required when connecting to Oracle.
  • Oracle has the ability to define objects that are invalid. These objects are not included in the Object Definitions pin.
  • Oracle has the ability to specify whether triggers are enabled or disabled. Disabled triggers are not included on the Object Definitions pin.
  • Triggers that are greater than 4000 characters will have a null value for the ObjectCode field and a value of 1 in the Notes field.

Permissions:

  • Use the following commands where metadata is needed, replacing <user> with the name of the user:
    • grant select any table to <user>
    • grant execute any procedure to <user>
    • grant select_catalog_role to <user>
    • grant create any trigger to <user>

System schemas:

  • SYSTEM, XDB, CTXSYS, ANONYMOUS, DIP, SYS, DBSNMP, RDSADMIN, OUTLN

PostgreSQL

Notes:

  • Catalogs associated with schemas are commonly NULL, implying the default catalog.

Permissions:

  • Use the following command where metadata is needed, replacing <user> with the name of the user:

    grant trigger on all tables in schema public to <user>

System schemas:

  • information_schema, pg_catalog

RedShift

Notes:

  • Due to a known issue with the Redshift driver, there is no support for procedure parameters or function parameters.

Permissions:

Run the following commands, where <user> is replaced with the name of the user:

  • grant select on all tables in schema public to <user>
  • grant execute on all procedures in schema public to <user>
  • grant execute on all functions in schema public to <user>

System schemas:

  • information_schema, pg_catalog

Snowflake

Permissions:

  1. Create a role.
  2. Assign the role to the user by running the following command, where <role> is replaced with the name of the role and <user> is replaced with the name of the user: grant role <role> to user <user>
  3. Grant permissions to the role by running the following commands, replacing <role>, <user>, <database> and <schema> with the relevant values:
    • grant ownership on all procedures in database <database> to role <role>
    • grant ownership on all views in database <database> to role <role>
    • grant ownership on all functions in database <database> to role <role>
    • grant select on all tables in database <database> to role <role>
    • grant usage on warehouse COMPUTE_WH to role <role>

      Note: COMPUTE_WH is the default warehouse of the database. Change this value if necessary.

    • grant usage on schema <schema> to role <role>
  4. Grant permission to all objects that will be created in the future by running the following commands, replacing <role>, <user>, <database> and <schema> with the relevant values:
    • grant ownership on future tables in database <database> to role <role>
    • grant ownership on future views in database <database> to role <role>
    • grant ownership on future procedures in database <database> to role <role>
    • grant ownership on future functions in database <database> to role <role>
    • grant usage on future schemas in database <database> to role <role>

System catalogs:

  • SNOWFLAKE_SAMPLE_DATA, SNOWFLAKE, UTIL_DB

System schemas:

  • INFORMATION_SCHEMA

SQL Server

Notes:

  • SQL Server has the ability to have encrypted functions, procedures and views. The table sys.syscomments contains a column called encrypted which has a value of 0 or 1, where 1 means that the object is encrypted. The ObjectCode field on the Object Definitions pin will populate the notes column with a value of 3 if the object is encrypted.

Permissions:

  • Run one of the following two commands for each database where metadata is needed, replacing <database> with the name of the database and <user> with the name of the user:
    • grant view definition on database::[<database>] to [<user>]

      Or

    • grant select, execute, alter on database::[<database>] to [<user>]

    Note: The first option (grant view definition on database::[<database>] to [<user>]) is more restrictive and therefore should be used in most circumstances.

System catalogs:

  • master, tempdb, model, msdb, rdsadmin

System schemas:

  • db_accessadmin, db_backupoperator, db_datareader, db_datawriter, db_ddladmin, db_denydatareader, db_denydatawriter, db_owner, db_securityadmin, INFORMATION_SCHEMA, sys, guest

Sybase

Permissions:

  • Run the following commands for each database where metadata is needed, replacing <database> with the name of the database and <user> with the name of the user:
    • use <database>
    • sp_adduser <user>
    • select 'grant select, execute on ' + name + ' to <user>' from sysobjects where type in ('U','P')

System catalogs:

  • model, sybsystemprocs, sybsystemdb, tempdb, sybsecurity, sybmgmtdb

Teradata

Notes:

  • Teradata does not support the retrieval of procedure bodies, so the Data360 Govern Database Connector will not show relationships based on procedures.
  • Teradata reports users as schemas.
  • Does not support macros.

Permissions:

  • No additional permissions are required.

System schemas:

  • DBC, dbcmngr, LockLogShredder, SQLJ, SYSJDBC, SYSBAR, SYSLIB, SYSSPATIAL, SYSUDTLIB, SYSUIF, Sys_Calendar, SysAdmin, SystemFe, TDMaps, TDQCD, TDStats, TD_SERVER_DB, TD_SYSFNLIB, TD_SYSXML, tdwm

Hints and tips

Properties

DbUser

Optionally specify the database username to connect to the database.

DbPassword

Optionally specify the password used to connect to the database.

DbType

Select a database type to connect to.

If you are connecting to a database that uses a driver which is not pre-installed with Data360 Analyze, then this property is not required. In this case, specify the connection details in the Advanced property group.

DbName

Specify the name of the database to connect to.

Note: If you're connecting to MS Access, enter the file path to your MS Access database. If you're connecting to an Oracle database, enter the system identifier (SID).

If you are connecting to a database that uses a driver which is not pre-installed with Data360 Analyze, then this property is not required. In this case, specify the connection details in the Advanced property group.

DbHost

Specify the database host to connect to e.g. localhost. Leave this property blank if connecting to MS Access.

If you are connecting to a database that uses a driver which is not pre-installed with Data360 Analyze, then this property is not required. In this case, specify the connection details in the Advanced property group.

DbUrl

Optionally specify the driver's database URL.

If a database is selected from the DbType property and a DbName and DbHost are specified, then this property is not required.

Otherwise, specify the URL in the following format: jdbc:subprotocol:subname. For example, jdbc:sqlserver://localhost:1433;databaseName=CustomerData. Provided by the driver's vendor.

See the help topic Acquiring data from a database for more information on configuring this property.

DbDriver

Optionally specify the class name of your JDBC driver.

If a database is selected from the DbType property and a DbName and DbHost are specified, then this property is not required.

Otherwise, specify the class name of your JDBC driver, as provided by the driver's vendor.

See the help topic Acquiring data from a database for more information on configuring this property.

DbOptions

Optionally specify database connection properties e.g. key=val&key=val2 or key=value entry per line.

DbDriverClasspath

Optionally specify the database driver classpath.

If a database is selected from the DbType property and a DbName and DbHost are specified, then this property is not required.

Otherwise, specify the classpath with each classpath entry on a new line.

Each classpath entry must be the location on the file system of either a single jar file, or a directory containing jars or class files.

ShowDBInfoOnly

Optionally specify whether to extract database metadata and Schema information only.

The default value is True.

CatalogFilter

Optionally specify a list (comma separated) of names to constrain the Catalogs retrieved.

Note: For the CatalogFilter property, you must type the name of a Catalog and an exact match is performed. For the other filter properties, you can type either a name, or you can type a string followed by the % character.

Note: You can either use CatalogFilter or ExcludedCatalogs, but not both.

SchemaFilter

Optionally specify a list (comma separated) of filters to constrain the Schemas retrieved. When using a filter, "%" means match any substring of 0 or more characters, and "_" means match any one character.

Note: You can either use SchemaFilter or ExcludedSchemas, but not both.

ExcludedCatalogs

Optionally specify a list (comma separated) of catalogs that will be ignored.

Note: You can either use CatalogFilter or ExcludedCatalogs, but not both.

ExcludedSchemas

Optionally specify a list (comma separated) of schemas that will be ignored.

Note: You can either use SchemaFilter or ExcludedSchemas, but not both.

TableViewFilter

Optionally specify a list (comma separated) of filters to constrain the Tables/Views retrieved. When using a filter, "%" means match any substring of 0 or more characters, and "_" means match any one character.

ProcedureFilter

Optionally specify a list (comma separated) of filters to constrain the Procedures retrieved. When using a filter, "%" means match any substring of 0 or more characters, and "_" means match any one character.

FunctionFilter

Optionally specify a list (comma separated) of filters to constrain the Functions retrieved. When using a filter, "%" means match any substring of 0 or more characters, and "_" means match any one character.

TriggerFilter

Optionally specify a list (comma separated) of filters to constrain the Triggers retrieved. When using a filter, "%" means match any substring of 0 or more characters, and "_" means match any one character.

ExcludeSystemCatalogs

Optionally specify whether to exclude the system predefined Catalogs or not.

The default value is True.

Tip: The Database-specific information section of this topic lists the catalog information that would be excluded for each database type if this property is set to True.

ExcludeSystemSchemas

Optionally specify whether to exclude the system predefined Schemas or not.

The default value is True.

Tip: The Database-specific information section of this topic lists the schema information that would be excluded for each database type if this property is set to True.

ShowObjectDefinition

Optionally specify whether the code associated with Views, Procedures, Functions, and Triggers should be output.

The default value is False.

DefaultToUnicode

Optionally specify whether string columns should default to Unicode type.

If set to False, string columns default to string type.

The default value is True.

UnicodeBinding

Optionally specify how Unicode strings should be passed to the JDBC driver. Choose from:

  • CHARACTER - Bind the string using the mechanism for CHAR and VARCHAR types.
  • NATIONAL CHARACTER - Bind the string using the mechanism for NCHAR and NVARCHAR types.

SqlDateType

Optionally select a datetype in which to map a SQL DATE. Choose from:

  • date
  • datetime

Inputs and outputs

Inputs: 1 optional.

Outputs: Metadata, Catalogs, Schemas, Tables, Table Columns, Procedures, Procedure Parameters, Functions, Function Parameters, Triggers, Object Definitions.