|
Database Metadata
Uses third-party JDBC drivers to connect to and query the metadata of the database.
You can use this node to extract SQL code from a database which you can then feed into the DeepSQL node for lineage analysis. To configure the Database Metadata node:
- Configure the required properties to connect to your database. If you are connecting to a database that uses a driver which is pre-installed with Data360 Analyze, follow the Simple database connection steps. Or, if you are connecting to a database that uses a driver which is not shipped with Data360 Analyze, see Advanced database connection.
Tip: For a list of the drivers which are included with your installation, see Acquiring data from a database.
- If you are working with a large database, it is recommended that you filter your queries, see Filtering your queries.
Note: When connecting to Teradata, it is recommended that you use the latest version of the Teradata driver. Driver versions older than 17.10.0.26 can cause potential issues with this node where views reference objects which no longer exist.
The Database Metadata node has the following outputs:
- Metadata - Provides metadata about the data source that you are connecting to.
- Catalogs - Provides a list of catalogs in the database.
- Schemas - Provides a list of schemas in the database.
- Tables - Provides a list of tables and views in the database.
- Table Columns - Provides a list of tables, views and columns in the database.
- Procedures - Provides a list of procedures in the database.
- Procedure Parameters - Provides a list of procedure parameters in the database.
- Functions - Provides a list of functions in the database.
- Function Parameters - Provides a list of function parameters in the database.
- Triggers - Provides a list of triggers in the database.
- Object Definitions - Provides a list of SQL statements which you can use as input to the DeepSQL node. These SQL statements define the objects of the database that you are connected to. The Notes field on this output pin provides the following information:
- 0 - Body is valid
- 1 - Body is invalid (more than 4000 bytes)
- 2 - Body is unavailable (for example, it is not supported)
- 3 - Body is encrypted
In this topic:
Example
You are using the Database Metadata node to connect to and query a SQL Server database.
Click the Object Definitions output pin to expand the data viewer:
Each record shows a database object and the corresponding SQL statements that are needed to create the object. In the following example, the ObjectCode field contains the required SQL statements to create a view, a procedure, a trigger and a table from the SQL Server database:
The Object Definitions output pin is connected to a DeepSQL node, which analyzes the SQL statements to produce a set of nodes and edges to enable data lineage.
See DeepSQL for an example of working with the DeepSQL node.
Filtering your queries
- When working with a large database, initially, it is recommended that you set the ShowDBInfoOnly property to True. This will allow you to obtain database metadata and schema information.
- Click the Schemas output pin to view the schema information in the data viewer. Identify the schema that you want to analyze.
- Specify the schema in the SchemaFilter property to filter the information that is retrieved when you next run the node.
Note: The Database Metadata node contains a number of properties that you can use to filter your queries. In most cases, you can either type a name, and an exact match will be performed, or you can type a string followed by a %
character, and in this case a stem match will be performed. For the CatalogFilter property, the only option is to type the name of a catalog and an exact match is performed.
To filter by more than one item, specify the filters in a comma separated list. For example, to retrieve only the 'INVOICES' and 'CUSTOMERS' tables, set the TableViewFilter property to INVOICES,CUSTOMERS
.
Note: System catalogs and system schemas are excluded by default.
Database-specific information
The information that is output by the Database Metadata node varies depending on the database that you are connecting to. For example, Oracle does not support catalogs, so the Tables output pin will display NULL
for each record in the Catalog column.
The following table outlines which database components are supported by some of the most common database connections:
Database-supported components | |||||||
---|---|---|---|---|---|---|---|
Database | Schemas |
Catalogs |
Tables | Views | Functions | Procedures |
Triggers |
DB2 | P | O | P | P | P | P | P |
Hana | P | O | P | P | P | P | P |
Hive | P | O | P | P | O | O | O |
MS Access | O | O | P | O | P | P | O |
MySQL / MariaDB | O | P | P | P | P | P | P |
Oracle | P | O | P | P | P | P | P |
PostgreSQL | P |
P |
P | P | P | P | P |
RedShift | P |
P |
P | P | P | P | O |
Snowflake | P |
P |
P | P | P | P | O |
SQL Server | P |
P |
P | P | P | P | P |
Sybase | P |
P |
P | P | P | P | P |
Teradata | P | O | P | P | P | P | P |
DB2
Notes:
- Supports z/OS. If you are using z/OS, you will require an additional license file to connect which is typically named:
db2jcc_license_cisuz.jar
Place the license file in the following directory:
<site directory>/lib/java/db/db2
For example:
C:/users/<username>/AppData/Local/Data360Analyze/site-7731/lib/java/db/db2
Then, configure the DbDriverClasspath property to reference this, as follows:
{{%ls.brain.javaLibDir%}}/db/db2
{{%ls.appDataDir%}}/lib/java/db/db2/db2jcc_license_cisuz.jar
- Modules are not supported in Data360 Analyze or Data360 Govern. This means that, for example, if there are two functions with the same name in two different modules it would not be possible to differentiate the two unless the module was part of the path.
Permissions:
- No additional permissions are required.
System schemas for non-z/OS systems:
- NULLID, SQLJ, SYSCAT, SYSFUN, SYSIBM, SYSIBMADM, SYSIBMINTERNAL, SYSIBMTS, SYSPROC, SYSPUBLIC, SYSSTAT, SYSTOOLS
System schemas for z/OS systems:
- ADB, CLARKG, DSN8110, DSN81110, DSN811SA, DSN8BQRY, DSNRGCOL, IBMUSER, Q, RUNSTATS, SYSIBM, SYSIBMTS, SYSTOOL
Hana
Notes:
- Does not support macros or parameterized views.
Permissions, method one:
- Create a role by running the following command, where
<rolename>
is replaced with the name of the role:create role <rolename>
- Grant the role to the required user by running the following command, where
<rolename>
is replaced with the name of the role and<user>
is replaced with the name of the user:grant <rolename> to <user>
- Grant permission to the role by running the following command, where
<rolename>
is replaced with the name of the role:grant catalog read to <rolename>
Permissions, method two:
- Grant a special internal role to the user by running the following command where
<user>
is replaced with the name of the user:grant SAP_INTERNAL_HANA_SUPPORT to <user>
System schemas:
- HANA_XS_BASE, SAPDBCTRL, SAP_REST_API, SAP_XS_LM, SAP_XS_LM_PE, SAP_XS_LM_PE_TMP, SAP_XS_USAGE, SYS, _SYS_AFL, _SYS_AUDIT, _SYS_BI, _SYS_BIC, _SYS_DATA_ANONYMIZATION, _SYS_EPM, _SYS_EPM_DATA, _SYS_LDB, _SYS_PLAN_STABILITY, _SYS_REPO, _SYS_RT, _SYS_SECURITY, _SYS_SQL_ANALYZER, _SYS_STATISTICS, _SYS_TASK, _SYS_TELEMETRY, _SYS_WORKLOAD_REPLAY, _SYS_XS
Hive
Notes:
- Catologs display
" "
instead ofNULL
.
Permissions:
- No additional permissions are required.
System schemas:
- information_schema, sys
MS Access
Notes:
- The Object Definitions pin only includes tables.
MySQL / MariaDB
Notes:
- MariaDB is an open source version of MySQL.
Permissions:
- Run the following two commands for each database where metadata is needed, replacing
<database>
and<user>
with the name of the database and the name of the user:grant select, execute, trigger, show view on <database>.* to '<user>'@'%'
grant select on mysql.proc to '<user>'@'%'
MariaDB system catalogs:
- information_schema, mysql, performance_schema
MySQL system catalogs:
- information_schema, mysql, performance_schema, sys
Oracle
Notes:
- Packages are not supported, that is, the node will not output any object code defined in a package. Materialized / parameterized views are not supported.
- The Procedures output pin shows both functions and procedures. The Functions output pin also shows functions.
- The "select any dictionary" system privilege may be required when connecting to Oracle.
- Oracle has the ability to define objects that are invalid. These objects are not included in the Object Definitions pin.
- Oracle has the ability to specify whether triggers are enabled or disabled. Disabled triggers are not included on the Object Definitions pin.
- Triggers that are greater than 4000 characters will have a null value for the ObjectCode field and a value of 1 in the Notes field.
Permissions:
- Use the following commands where metadata is needed, replacing
<user>
with the name of the user:grant select any table to <user>
grant execute any procedure to <user>
grant select_catalog_role to <user>
System schemas:
- SYSTEM, XDB, CTXSYS, ANONYMOUS, DIP, SYS, DBSNMP, RDSADMIN, OUTLN
PostgreSQL
Notes:
- Catalogs associated with schemas are commonly
NULL
, implying the default catalog.
Permissions:
- Use the following command where metadata is needed, replacing
<user>
with the name of the user:grant trigger on all tables in schema public to <user>
System schemas:
- information_schema, pg_catalog
RedShift
Notes:
- Due to a known issue with the Redshift driver, there is no support for procedure parameters or function parameters.
Permissions:
Run the following commands, where <user>
is replaced with the name of the user:
grant select on all tables in schema public to <user>
grant execute on all procedures in schema public to <user>
grant execute on all functions in schema public to <user>
System schemas:
- information_schema, pg_catalog
Snowflake
Permissions:
- Create a role.
- Assign the role to the user by running the following command, where
<role>
is replaced with the name of the role and<user>
is replaced with the name of the user:grant role <role> to user <user>
- Grant permissions to the role by running the following commands, replacing
<role>
,<user>
,<database>
and<schema>
with the relevant values:grant ownership on all procedures in database <database> to role <role>
grant ownership on all views in database <database> to role <role>
grant ownership on all functions in database <database> to role <role>
grant select on all tables in database <database> to role <role>
grant usage on warehouse COMPUTE_WH to role <role>
Note:
COMPUTE_WH
is the default warehouse of the database. Change this value if necessary.grant usage on schema <schema> to role <role>
- Grant permission to all objects that will be created in the future by running the following commands, replacing
<role>
,<user>
,<database>
and<schema>
with the relevant values:grant ownership on future tables in database <database> to role <role>
grant ownership on future views in database <database> to role <role>
grant ownership on future procedures in database <database> to role <role>
grant ownership on future functions in database <database> to role <role>
grant usage on future schemas in database <database> to role <role>
System catalogs:
- SNOWFLAKE_SAMPLE_DATA, SNOWFLAKE, UTIL_DB
System schemas:
- INFORMATION_SCHEMA
SQL Server
Notes:
- SQL Server has the ability to have encrypted functions, procedures and views. The table
sys.syscomments
contains a column calledencrypted
which has a value of 0 or 1, where 1 means that the object is encrypted. The ObjectCode field on the Object Definitions pin will populate the notes column with a value of 3 if the object is encrypted.
Permissions:
- Run one of the following two commands for each database where metadata is needed, replacing
<database>
with the name of the database and<user>
with the name of the user:grant view definition on database::[<database>] to [<user>]
Or
grant select, execute, alter on database::[<database>] to [<user>]
Note: The first option (
grant view definition on database::[<database>] to [<user>]
) is more restrictive and therefore should be used in most circumstances.
System catalogs:
- master, tempdb, model, msdb, rdsadmin
Note: When you run against an Azure Synapse database, only metadata results for the current catalog are supported. Summary information as to which catalogs are in the system will be provided, but tables, views, procedures, functions, triggers and object definitions will only be retrieved for the current catalog.
System schemas:
- db_accessadmin, db_backupoperator, db_datareader, db_datawriter, db_ddladmin, db_denydatareader, db_denydatawriter, db_owner, db_securityadmin, INFORMATION_SCHEMA, sys, guest
Sybase
Permissions:
- Run the following commands for each database where metadata is needed, replacing
<database>
with the name of the database and<user>
with the name of the user:use <database>
sp_adduser <user>
select 'grant select, execute on ' + name + ' to <user>' from sysobjects where type in ('U','P')
System catalogs:
- model, sybsystemprocs, sybsystemdb, tempdb, sybsecurity, sybmgmtdb
Teradata
Notes:
- Teradata does not support the retrieval of procedure bodies, so the Data360 Govern Database Connector will not show relationships based on procedures.
- Teradata reports users as schemas.
- Does not support macros.
Permissions:
- No additional permissions are required.
System schemas:
- DBC, dbcmngr, LockLogShredder, SQLJ, SYSJDBC, SYSBAR, SYSLIB, SYSSPATIAL, SYSUDTLIB, SYSUIF, Sys_Calendar, SysAdmin, SystemFe, TDMaps, TDQCD, TDStats, TD_SERVER_DB, TD_SYSFNLIB, TD_SYSXML, tdwm
Hints and tips
- The user that is connecting to the database must have the required access permissions.
- If you are having trouble connecting to a database, see Troubleshooting database connection errors.
- For additional information, see JDBC Query and Acquiring data from a database.
Properties
DbUser
Optionally specify the database username to connect to the database.
DbPassword
Optionally specify the password used to connect to the database.
DbType
Select a database type to connect to.
If you are connecting to a database that uses a driver which is not pre-installed with Data360 Analyze, then this property is not required. In this case, specify the connection details in the Advanced property group.
DbName
Specify the name of the database to connect to.
Note: If you're connecting to MS Access, enter the file path to your MS Access database. If you're connecting to an Oracle database, enter the system identifier (SID).
If you are connecting to a database that uses a driver which is not pre-installed with Data360 Analyze, then this property is not required. In this case, specify the connection details in the Advanced property group.
DbHost
Specify the database host to connect to e.g. localhost
. Leave this property blank if connecting to MS Access.
If you are connecting to a database that uses a driver which is not pre-installed with Data360 Analyze, then this property is not required. In this case, specify the connection details in the Advanced property group.
DbUrl
Optionally specify the driver's database URL.
If a database is selected from the DbType property and a DbName and DbHost are specified, then this property is not required.
Otherwise, specify the URL in the following format: jdbc:subprotocol:subname
. For example, jdbc:sqlserver://localhost:1433;databaseName=CustomerData
. Provided by the driver's vendor.
See the help topic Acquiring data from a database for more information on configuring this property.
DbDriver
Optionally specify the class name of your JDBC driver.
If a database is selected from the DbType property and a DbName and DbHost are specified, then this property is not required.
Otherwise, specify the class name of your JDBC driver, as provided by the driver's vendor.
See the help topic Acquiring data from a database for more information on configuring this property.
DbOptions
Optionally specify database connection properties e.g. key=val&key=val2
or key=value
entry per line.
DbDriverClasspath
Optionally specify the database driver classpath.
If a database is selected from the DbType property and a DbName and DbHost are specified, then this property is not required.
Otherwise, specify the classpath with each classpath entry on a new line.
Each classpath entry must be the location on the file system of either a single jar file, or a directory containing jars or class files.
ShowDBInfoOnly
Optionally specify whether to extract database metadata and Schema information only.
The default value is True.
CatalogFilter
Optionally specify a list (comma separated) of names to constrain the Catalogs retrieved.
Note: For the CatalogFilter property, you must type the name of a Catalog and an exact match is performed. For the other filter properties, you can type either a name, or you can type a string followed by the %
character.
Note: You can either use CatalogFilter or ExcludedCatalogs, but not both.
SchemaFilter
Optionally specify a list (comma separated) of filters to constrain the Schemas retrieved. When using a filter, "%" means match any substring of 0 or more characters, and "_" means match any one character.
Note: You can either use SchemaFilter or ExcludedSchemas, but not both.
ExcludedCatalogs
Optionally specify a list (comma separated) of catalogs that will be ignored.
Note: You can either use CatalogFilter or ExcludedCatalogs, but not both.
ExcludedSchemas
Optionally specify a list (comma separated) of schemas that will be ignored.
Note: You can either use SchemaFilter or ExcludedSchemas, but not both.
TableViewFilter
Optionally specify a list (comma separated) of filters to constrain the Tables/Views retrieved. When using a filter, "%" means match any substring of 0 or more characters, and "_" means match any one character.
ProcedureFilter
Optionally specify a list (comma separated) of filters to constrain the Procedures retrieved. When using a filter, "%" means match any substring of 0 or more characters, and "_" means match any one character.
FunctionFilter
Optionally specify a list (comma separated) of filters to constrain the Functions retrieved. When using a filter, "%" means match any substring of 0 or more characters, and "_" means match any one character.
TriggerFilter
Optionally specify a list (comma separated) of filters to constrain the Triggers retrieved. When using a filter, "%" means match any substring of 0 or more characters, and "_" means match any one character.
ExcludeSystemCatalogs
Optionally specify whether to exclude the system predefined Catalogs or not.
The default value is True.
Tip: The Database-specific information section of this topic lists the catalog information that would be excluded for each database type if this property is set to True.
ExcludeSystemSchemas
Optionally specify whether to exclude the system predefined Schemas or not.
The default value is True.
Tip: The Database-specific information section of this topic lists the schema information that would be excluded for each database type if this property is set to True.
ShowObjectDefinition
Optionally specify whether the code associated with Views, Procedures, Functions, and Triggers should be output.
The default value is False.
DefaultToUnicode
Optionally specify whether string columns should default to Unicode type.
If set to False, string columns default to string type.
The default value is True.
UnicodeBinding
Optionally specify how Unicode strings should be passed to the JDBC driver. Choose from:
- CHARACTER - Bind the string using the mechanism for CHAR and VARCHAR types.
- NATIONAL CHARACTER - Bind the string using the mechanism for NCHAR and NVARCHAR types.
SqlDateType
Optionally select a datetype in which to map a SQL DATE. Choose from:
- date
- datetime
Inputs and outputs
Inputs: 1 optional.
Outputs: Metadata, Catalogs, Schemas, Tables, Table Columns, Procedures, Procedure Parameters, Functions, Function Parameters, Triggers, Object Definitions.
Note: The second pin (Catalogs) always contains all the catalogs in the database, regardless of the catalog filter. The third pin (Schemas) contains all the schemas in the database, but respects the catalog filters. This means that Schemas only displays the ones that are present in the filtered catalogs, while ignoring schema filtering. The other pins respect all filters.