Create Data

Reads a small set of data stored within the data flow. This node is typically used when a small amount of unchanging data needs to exist within a data flow, often for test cases or examples.

The content should be standard CSV-formatted (RFC 4180) data, with a proper header line (name:type), and data properly separated and escaped.

If the fields in your data set are not comma-separated, enter the delimiter in the FieldDelimiter property. For example, if the fields are tab-delimited, enter \t.
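The layout described above (a name:type header line followed by delimited records) can be sketched outside the product with Python's standard csv module. This is only an illustration of the text format, not the node's implementation; the function name parse_create_data is hypothetical.

```python
import csv
import io


def parse_create_data(text, delimiter=","):
    """Split Create Data-style text into (fields, rows).

    fields is a list of (name, type) pairs taken from the name:type header.
    This helper is hypothetical and only illustrates the text layout.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    # Split on the last colon so that a field name may itself contain a colon.
    fields = [tuple(cell.rsplit(":", 1)) for cell in header]
    # csv.reader yields an empty list for a blank line; skip those.
    rows = [row for row in reader if row]
    return fields, rows


# Tab-delimited input, as in the FieldDelimiter example (\t).
sample = "First Name:string\tLast Name:string\nCharles\tDickens\nJane\tAusten\n"
fields, rows = parse_create_data(sample, delimiter="\t")
print(fields)
print(rows)
```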

To quickly create a large data set to use as a test case, enter an integer in the Copies property to specify how many times the data is repeated in the output. See Autogenerating test data.

Example

You have the following records in the Data property:

First Name:string,Last Name:string
Charles,Dickens
Jane,Austen
William,Shakespeare
George,Orwell

The data is comma-separated, so you do not need to specify a different value in the FieldDelimiter property (comma is the default).

You want to output each record twice, so in the Copies property you type 2.

The node produces the following output:

First Name    Last Name
string        string
Charles       Dickens
Jane          Austen
William       Shakespeare
George        Orwell
Charles       Dickens
Jane          Austen
William       Shakespeare
George        Orwell
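The effect of the Copies property in this example can be imitated in a few lines of Python. The sketch assumes, based on the output shown above, that the whole block of records is repeated in order rather than each record being duplicated in place; repeat_records is a hypothetical helper, not part of the product.

```python
def repeat_records(records, copies=1):
    """Return the records repeated `copies` times, block after block."""
    if copies < 1:
        raise ValueError("copies must be a positive integer")
    return list(records) * copies


records = [
    ("Charles", "Dickens"),
    ("Jane", "Austen"),
    ("William", "Shakespeare"),
    ("George", "Orwell"),
]

# Copies = 2: the four records appear once, then all four appear again.
for first, last in repeat_records(records, copies=2):
    print(first, last)
```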

See also Data types.

Autogenerating test data

You can use the Create Data node to generate a random data set for testing:

  1. In the first line of the Data property, enter the header information of the data set that you want to create.
  2. In the second line of the Data property, enter instructions for how to generate the data, in the following format:

    <<<TYPE|keyword=value|keyword=value|...>>>

    For more information and examples, see Supported types.

  3. In the Copies property, type the number of rows that you want to generate.

For example, you could generate 100,000 rows of random data by typing 100000 in the Copies property and entering the following text in the Data property:

ID:unicode,First:unicode,Last:unicode,Email:unicode,Amount Owed:unicode,Last Payment Date:unicode,Client Since:unicode,Status:unicode
<<<ID|low=10000|format=%07d>>>,<<<NAME.FIRST>>>,<<<NAME.LAST>>>,<<<EMAIL>>>,<<<DOUBLE|low=100.0|high=500.0|format=%.2f|nulls=.1>>>,<<<DATE|low=2018-01-01 00:00:00|high=2019-03-31 00:00:00|format=dd/MM/yyyy>>>,<<<DATE|low=2012-01-01 00:00:00|high=2019-03-31 00:00:00|format=dd/MM/yyyy>>>,<<<ENUM|values=Platinum^Gold^Silver>>>
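To make the instruction format easier to read, the following Python sketch breaks an instruction row into its types and keyword=value pairs. It assumes only the <<<TYPE|keyword=value|...>>> syntax shown above; the names TOKEN, parse_spec and parse_instruction_row are hypothetical, and the node's real parser may behave differently.

```python
import re

# One instruction: everything between <<< and >>>.
TOKEN = re.compile(r"<<<(.+?)>>>")


def parse_spec(spec):
    """Split 'TYPE|keyword=value|...' into (type_name, keywords)."""
    type_name, *pairs = spec.split("|")
    # Split each pair on the first '=' only, so values may contain '='.
    keywords = dict(pair.split("=", 1) for pair in pairs)
    return type_name, keywords


def parse_instruction_row(line):
    """Return one (type_name, keywords) tuple per instruction in the row."""
    return [parse_spec(match.group(1)) for match in TOKEN.finditer(line)]


row = (
    "<<<ID|low=10000|format=%07d>>>,<<<NAME.FIRST>>>,"
    "<<<DATE|low=2018-01-01 00:00:00|high=2019-03-31 00:00:00|format=dd/MM/yyyy>>>,"
    "<<<ENUM|values=Platinum^Gold^Silver>>>"
)
for type_name, keywords in parse_instruction_row(row):
    print(type_name, keywords)
```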

Considerations

  • Each variable must be used in a 1:1 relation to a field. It is not possible to concatenate multiple variables into a single field.
  • Following the header, there can only be a single row of variables. It is not possible to have alternating outputs by specifying two rows of variables.
  • Each column is calculated independently. For example, if you have name and email columns, the randomly generated names and the names within the email addresses will not match. Likewise, zip code and state columns will yield inconsistent combinations. The generated data is neither validated nor reconciled across columns during creation. The goal is to provide randomly generated, pseudo-accurate data for testing purposes.

Supported types

The following entries describe each valid type, its supported keywords, and an example:

ID
An auto-incrementing number.
Supported keywords:
  • low (default: 1)
  • format (default: %07d)
Example: <<<ID|low=10000|format=%07d>>>

DATE
A random Date, DateTime or Time value.
Supported keywords:
  • low
  • high
  • format (default: yyyy-MM-dd HH:mm:ss)
Example: <<<DATE|low=2018-01-01 00:00:00|high=2019-03-31 00:00:00|format=dd/MM/yyyy>>>

LONG
A random long-valued integer.
Supported keywords:
  • low
  • high
  • format (default: %d)
Example: <<<LONG|low=1000000000000000000|high=9223372036854775807|format=%d|nulls=.1>>>

DOUBLE
A random floating-point value.
Supported keywords:
  • low
  • high
  • format (default: %f)
Example: <<<DOUBLE|low=100.0|high=500.0|format=%.2f|nulls=.1>>>

ENUM
A random value from a simple enumerated type, for example Platinum, Gold, Silver.
Supported keywords:
  • values (the candidate values, separated by ^)
Example: <<<ENUM|values=Platinum^Gold^Silver>>>
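As a rough illustration of what the ID, DOUBLE and ENUM instructions describe, the following Python sketch produces comparable values. It rests on several assumptions: that the printf-style format strings behave like Python's % operator, that the first ID equals low, that DOUBLE values are drawn uniformly between low and high, and that ENUM values are equally likely. The function names are hypothetical.

```python
import itertools
import random


def id_values(low=1, fmt="%07d"):
    """Yield formatted, auto-incrementing IDs starting at `low` (assumed)."""
    for number in itertools.count(low):
        yield fmt % number


def double_value(rng, low, high, fmt="%f"):
    """Return a formatted random float between low and high."""
    return fmt % rng.uniform(low, high)


def enum_value(rng, values):
    """Pick one entry from a caret-separated list such as Platinum^Gold^Silver."""
    return rng.choice(values.split("^"))


rng = random.Random(42)  # seeded so that repeated runs give the same data
ids = id_values(low=10000, fmt="%07d")
for _ in range(3):
    print(
        next(ids),
        double_value(rng, 100.0, 500.0, "%.2f"),
        enum_value(rng, "Platinum^Gold^Silver"),
    )
```

With low=10000 and format=%07d, the first ID in this sketch is 0010000: the number is zero-padded to seven digits.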

Note: The nulls keyword is supported for all types. Set it to a number between 0.0 and 1.0 to specify the fraction of generated values that are null rather than a value; for example, .1 means 10% nulls. The default is 0.0 (no nulls).
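One way to picture the nulls keyword is as a per-value probability. The Python sketch below wraps any value generator so that it returns None with the given probability; treating nulls as an independent probability for each row is an assumption, and with_nulls is a hypothetical helper.

```python
import random


def with_nulls(value_factory, nulls=0.0, rng=random):
    """Wrap value_factory so that roughly a `nulls` fraction of calls return None."""
    if not 0.0 <= nulls <= 1.0:
        raise ValueError("nulls must be between 0.0 and 1.0")

    def generate():
        # rng.random() is in [0.0, 1.0), so nulls=0.0 never yields None
        # and nulls=1.0 always does.
        return None if rng.random() < nulls else value_factory()

    return generate


rng = random.Random(0)
amount = with_nulls(lambda: "%.2f" % rng.uniform(100.0, 500.0), nulls=0.1, rng=rng)
sample = [amount() for _ in range(10)]
print(sample)
```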

The following types are also supported and generate random values:

  • AIRPORT_CODE.IATA
  • COUNTRY.ISO-3166-2
  • COUNTRY.ISO-3166-3
  • COUNTRY.TEXT_EN
  • CREDIT_CARD_TYPE
  • CURRENCY_CODE.ISO-4217
  • EMAIL
  • GENDER.TEXT_EN
  • GUID
  • IPADDRESS.IPV4
  • IPADDRESS.IPV6
  • LANGUAGE.ISO-639-2
  • LANGUAGE.TEXT_EN
  • NAME.FIRST
  • NAME.FIRST_LAST
  • NAME.LAST
  • NAME.LAST_FIRST
  • STREET_ADDRESS_EN
  • TELEPHONE
  • URI.URL

If the locale is en-US, the following types are also supported:

  • MONTH.ABBR_en-US
  • MONTH.FULL_en-US
  • POSTAL_CODE.ZIP5_US
  • STATE_PROVINCE.PROVINCE_CA
  • STATE_PROVINCE.STATE_PROVINCE_NA
  • STATE_PROVINCE.STATE_US

Properties

Data

Specify the delimited data to output into the data flow. The content should be standard CSV-formatted (RFC 4180) data, optionally with an alternative field delimiter. The data requires a proper header line (name:type), and the values must be properly separated and escaped.

FieldDelimiter

Optionally specify the field delimiter character to use.

The default delimiter is the comma character (,).

Copies

Optionally specify, as an integer, the number of times to repeat the Data in the output.

The default value is 1.

Inputs and outputs

Inputs: None.

Outputs: Sample Data.