Acquiring data from a database

You can use the JDBC Query node to connect to and acquire data from an external database.

 

  1. Firstly, ensure that you have the required JDBC drivers.

    The following JDBC drivers are included with your installation in the following directory: <Data360Analyze installation directory>/lib/java

    DriverJAR file(s)
    Amazon Redshift

    RedshiftJDBC42-1.2.37.1061.jar

    MariaDB (can be used to access MySQL)

    mariadb-java-client-2.6.0.jar

    MS Access (UCanAccess)

    ucanaccess-4.0.4.jar

    Oracle

    ojdbc8-19.3.0.0.0.jar

    Postgres

    postgresql-42.2.10.jar

    Snowflakesnowflake-jdbc-3.12.6.jar
    SQL Server

    mssql-jdbc-9.4.0.jre8.jar

    IBM DB2jcc-11.5.0.0.jar

    All other JDBC drivers must be downloaded from the database vendor and installed separately:

    1. Download the required drivers from the database vendor. Refer to the vendor's driver installation guide for details of the required JAR files for your use case.
    2. It is recommended that you install any third party drivers that you have downloaded in the site configuration directory to ensure that the files are retained during an upgrade. Drivers for different databases should be stored in separate sub-directories, you will need to create a new sub-folder for each driver:

      <site directory>/lib/java/db/<driverName>

      For example: <site directory>/lib/java/db/redshift/RedshiftJDBC42-1.2.12.1017.jar

      Tip: The JAR files are often version-dependent, so you must ensure that the JAR version matches the version of the database that you want to connect to. Also note that if the database vendor provides you with a ZIP file containing the JAR file(s), you will need to unpack the ZIP file and copy only the JAR file(s).

  2. Next, use the JDBC Query node to import your data. See the JDBC Query node help for information on how to configure the node.

Note: If you are connecting to IBM DB2 and your instance requires an additional license file, for example sb2jcc_license_cisuz.jar, place this file in <site directory>/lib/java/db/db2, and ensure that this directory is referenced in the DbDriverClasspath property of the JDBC Query node.

Tip: If you are having trouble connecting to a database, see Troubleshooting database connection errors.

If you are connecting to a database that uses a driver which does not ship with Data360 Analyze, you can establish a connection with your database by configuring the properties in the Advanced property group, as described in the JDBC Query node help. The examples in the table below will help you to configure the DbUrl and DbDriver properties.

In general, JDBC connection URLs begin with jdbc:, then the subprotocol and subname vary depending on the driver. The general format for specifying the database connection URL is as follows:

jdbc:<subprotocol>:<subname>

When adding JDBC URL parameters, you have to override the DbUrl. This is the case for most connection strings. The following is an example:

jdbc:sqlserver://{{^DbHost^}}:<Database Port>;databaseName={{^DbName^}};user={{^DbUser^}};password={{^DbPassword^}};domain=<Domain Value>;integratedSecurity=true;authenticationScheme=NTLM

The following table shows some examples of how to format common database connection URLs and their corresponding driver classes (case sensitive), where host is the hostname of the database server, port is the port which the database server listens to and dbname is the name of the database you are connecting to. Following the standard connection information, you can specify additional behavior by specifying properties as key-value pairs in the DbOptions property.

Product

Database connection URL

(specified in the DbUrl node property)

JDBC driver class

(specified in the DbDriver node property)

Amazon Redshift* jdbc:redshift://<host>:<port>/<dbname> com.amazon.redshift.jdbc42.Driver
Cloudera

jdbc:hive2://<host>:<port>/<dbname>;[auth=noSasl] / [user=<username>;password=<password>]

If you are connecting to a Hive server 1 instance, replace hive2 with hive. By default, Hive uses port 10000.

You can use the AuthMech property to specify an alternative authentication method. For example, to configure Kerberos authentication, set the AuthMech property to 1.

 

org.apache.hive.jdbc.HiveDriver

 

Example with Kerberos authentication enabled:

jdbc:hive2://database-server@example.com:10000/default;principal=hive/hive2_<host>;AuthMech=1;KrbServiceName=hive;KrbHostFQDN=_mycluster.example.com;KrbRealm=example.com;LogLevel=4;LogPath=C:\Users\<user>\Downloads

com.cloudera.hive.jdbc4.HS2Driver

Cloudscape (co-hosted) jdbc:cloudscape:<dbname> COM.cloudscape.core.JDBCDriver
Cloudscape RMI (remote hosted) jdbc:rmi://<host>:<port>/jdbc:cloudscape:<dbname> RmiJdbc.RJDriver

DB2*

(A driver for the first option is shipped with your installation)

jdbc:db2://<host>:<port>/<dbname>

 

jdbc:as400://<host>:<port>/<dbname>

com.ibm.db2.jcc.DB2Driver

 

com.ibm.as400.access.AS400JDBCDriver

Derby JDBC (co-hosted) jdbc:derby:/<dbname> org.apache.derby.jdbc.EmbeddedDriver
Derby (network)

jdbc:derby://<host>:<port>/<dbname>;user=<username>;password=<password>

- or -

jdbc:derby://<host>:<port>/<path-dir>/<dbname>;user=<username>;password=<password>

org.apache.derby.jdbc.ClientDriver
Google BigQuery

jdbc:bigquery://<host>:<port>;ProjectId=<BigQuery project name>;OAuthType=<driver authentication type>

com.simba.googlebigquery.jdbc42.Driver
Hana jdbc:sap://host:port

Note: If you are connecting to Hana in the cloud you will need to add ?encrypt=true to the connection URL.

com.sap.db.jdbc.Driver
Hypersonic SQL (file)

jdbc:hsqldb:file:C:/data/<dbname>

Specify a file path that is either a relative or absolute path to the database file.

org.hsqldb.jdbcDriver
Hypersonic SQL (server) jdbc:hsqldb:hsql://localhost org.hsqldb.jdbcDriver
Hypersonic SQL (in-memory) jdbc:hsqldb:mem:aname org.hsqldb.jdbcDriver
Informix jdbc:informix-sqli://<host>:<port>/MyDatabase:INFORMIXSERVER=<dbname> com.informix.jdbc.IfxDriver
Ingres jdbc:edbc://<host>:<port>/<dbname> ca.edbc.jdbc.EdbcDriver
Interbase jdbc:firebirdsql:<host>/<port>:<dir_path>/<dbname>.gdb org.firebirdsql.jdbc.FBDriver
jTDS

jdbc:jtds:sqlserver://<host>/<dbname>;instance=<instancename>

 

To use authenticated security, append the domain and userNTLMv2 properties as follows:

jdbc:jtds:sqlserver://<host>/<dbname>;instance=<instancename>;domain=<domainname>;useNTLMv2=true

net.sourceforge.jtds.jdbc.Driver
Jturbo jdbc:JTurbo://<host>:<port>/<dbname> com.newatlanta.jturbo.driver.Driver

MariaDB*

(can be used to access MySQL)

jdbc:mariadb://<host>:<port>/<dbname>

 

To use SSH, append "useSSL=true&serverSslCert=<path to server SSL certificate>" for example: jdbc:mariadb://<host>:<port>/<dbname>;useSSL=true&serverSslCert=<path to server SSL certificate>

org.mariadb.jdbc.Driver
Microsoft SQL Server*

jdbc:sqlserver://[serverName[\instanceName][:portNumber]][;property=value[;property=value]]

 

To access SQL Server in Windows Authentication mode, append ;authenticationScheme=NTLM;domain=<Domain Value>;integratedSecurity=true and enter your Windows credentials in the DbUser and DbPassword properties.

For example:

jdbc:sqlserver://localhost:63369;authenticationScheme=NTLM;domain=<domain value>;integratedSecurity=true

Using this format allows you to specify a Windows domain, username and password that are different to the signed-in user of the PC, or the user running Data360 Analyze, if desired.

If you are connecting to a database in Azure, you must also include ;authentication=ActiveDirectoryIntegrated in the connection string. Additionally, to allow the system user's ID to be accessible to the Azure Active Directory system, the on-premise Active Directory must be federated with the Azure Active Directory.

When using Windows Authentication, you can also simply append ;integratedSecurity=true, and leave the DbUser and DbPassword properties empty, to use the current user for Windows authentication. 

In this case, the signed-in user of the PC is the current user, for a desktop instance. For a server instance, the service account used to run the Data360 Analyze application is the current user, as opposed to the end-user accessing the server instance through their browser.

com.microsoft.sqlserver.jdbc.SQLServerDriver
Microsoft SQL Server (JTurbo) jdbc:JTurbo://<host>:<port>/<dbname> com.ashna.jturbo.driver.Driver
Microsoft SQL Server (Sprinta) jdbc:inetdae7:<host>:<port>?database=<dbname> com.inet.tds.TdsDrive
MS Access*

jdbc:ucanaccess://<path-to-db>

For example:

jdbc:ucanaccess://c:/users/temp/myDB.mdb

net.ucanaccess.jdbc.UcanloadDriver
MySQL jdbc:mysql://<host>:<port>/<dbname> com.mysql.jdbc.Driver
Netezza

jdbc:netezza://<host>:<port>/<dbname>

org.netzezza,Driver
Oracle*

There are several ways to connect to an Oracle database over the thin client:

  1. Use Oracle SID:

    jdbc:oracle:thin:@//<host>:<port>:<system identifier (SID)>

    For example, if the database to which you want to connect resides on host prodHost, at port 1521, and the system identifier is ORCL:

    jdbc:oracle:thin:@prodHost:1521:ORCL

    - or -

  2. Use a Service Name:

    jdbc:oracle:thin:@//<host>:<port>/<service name>

    For example: jdbc:oracle:thin:@//myhost:1521/myservicename

    - or -

  3. Use the full TNS keyword-value syntax:

    For example if the database to which you want to connect resides on host prodHost, at port 1521, and the system identifier is ORCL:

    jdbc:oracle:thin:@(description=(address_list=(address=(protocol=tcp)(port=1521)(host=prodHost)))(connect_data=(INSTANCE_NAME=ORCL)))

Alternatively, to connect using the Oracle TNSnames system, you can do the following:

  1. Enter your query and credentials.
  2. In the DbType property, select Oracle.
  3. Do not enter any value in the DbName or DbHost properties.
  4. In the DbUrl property, enter: jdbc:oracle:thin:@<TNSName>
  5. In the JvmProperties property, add a reference to the folder that contains the tnsnames.ora file, for example: oracle.net.tns_admin=/path/to/oracle/network/admin/

Tip: The path that you enter in JvmProperties must contain the final forward slash (/).

Tip: Some installations use /home/<appuser>/.tnsnames.ora as their default tnsnames file (with the dot before tnsnames). If this is the case, you must rename it to tnsnames.ora (without the dot) in order for it to be recognized by the Oracle JDBC driver. On linux systems, you can deal with this by adding a symbolic link with the new name, pointing to the old file.

oracle.jdbc.driver.OracleDriver
PointBase jdbc:pointbase:embedded:<dbname> com.pointbase.jdbc.jdbcUniversalDriver
PostgresSQL* jdbc:postgresql://<host>:<port>/<dbname> org.postgresql.Driver
Snowflake*

jdbc:snowflake://<account_name>.snowflakecomputing.com/?<connection_params>

<account_name> is the full name of your account, provided by Snowflake.

<connection_params> is a series of one or more parameters in the format <parameter>=<value>

Include your schema details in the URL, or in the DbOptions property, in the following format: schema=<YourSchema>

For more information, see the Snowflake documentation: https://docs.snowflake.net/manuals/user-guide/jdbc-configure.html#jdbc-driver-connection-string

net.snowflake.client.jdbc.SnowflakeDriver
Sybase jdbc:sybase:Tds:<host>:<port>/<dbname> com.sybase.jdbc2.jdbc.SybDriver
Teradata

jdbc:teradata://<host>/<Param1>,<Param2>

<Param1> and <Param2> are as per the Teradata JDBC Driver User Guide, see: https://www.info.teradata.com/download.cfm?ItemID=1002565

Note: Teradata databases older than version 14.10 are not supported.

com.teradata.jdbc.TeraDriver

 

* Shipped with your installation.

Tip: To walk through an example of how to connect to a database which uses a pre-installed driver see Acquiring data from MS Access. For an example of the alternative connection method for drivers which are not shipped with , see Connecting to a Cloudera Hadoop cluster.

Troubleshooting database connection errors

Error Possible fix

The following classpath entry for the node does not exist: [folder].
Error Code: brain.execution.java.node.classpathNotFound

Check that the required JAR file(s) exist in the following location:

<Data360Analyze installation directory>/lib/java

Unable to load db driver: [driver] Check that you have correctly entered the name of the JDBC driver class in the DbDriver property.
Unable to connect: Error initialized or created transport for authentication: java.net.UnknownHostException Check that you have entered the correct connection URL in the DbUrl property, and ensure that your firewall allows you to connect to the database server.
Unable to connect: No suitable driver found for [url]

Check that you have correctly configured the DbUrl property.

For example, if you receive the error:
Unable to connect: No suitable driver found for jdbc:hive2://server:port

In this case, the correct protocol needed was jdbc:impala://server:port

Unable to connect: The UserId, Password or Account is invalid Check that you have entered the correct user ID and password in the DbUser and DbPassword properties. Also verify that the specified user has the required permissions to sign into the database.

SQL Server connection

When connecting to a SQL Server database using JDBC, if you receive an error stating that the TCP/IP connection was refused, you may need to enable the TCP/IP protocol for SQL Server, as follows:

  1. From the Windows Start menu, select All Programs > Microsoft SQL Server > Configuration Tools and open the SQL server Configuration Manager.
  2. Under SQL Server Network Configuration, click Protocols for SQL Server.
  3. Right-click TCP/IP and select Properties.
  4. Select the Protocol tab and set Enabled to Yes.
  5. Select the IP Addresses tab, scroll down to IP All and set the TCP Port to 1433.

 

You can read more about the specific functions of the JDBC nodes in the following node help topics: