Modify Fields

Modifies field metadata by including, excluding, renaming and changing the types of input fields.

Example data flow

A number of sample data flows are available from the Data360 Samples workspace, found in the Directory page.

In the Directory under the /Data360 Samples/Node Examples/ folder, you will find "Modify Fields", which shows examples of how to use this node.

Note: Upgrades will overwrite all data flows in the workspace. If you want to make changes to one of the sample data flows, we recommend that you create a copy and save it elsewhere, using Save as...

Unless otherwise specified, the examples below use the default data from the Create Data node.

Examples

Renaming fields

You can rename all the output fields by using Rename by Pattern, as shown below. Use * to reference the current field name, for example *_New or new_*.

Note: By default, all the input fields are included in the output. You can change to an Exclude pattern by clicking the green plus icon and selecting Exclude.

To rename a single field, type the new name for the field, as shown below.

If the renamed fields result in a conflict because two fields would have an identical field name, the conflict is highlighted in the table, as shown.

Changing field type

To change the type for an output field, choose a field Type from the drop-down list. The list of available types depends on the type of the input field. Fields that have a modified type are highlighted. In the screenshot below, for example the id field has had its type changed from int to string.

Automatic field type conversion

If an input field has a type that is either string or unicode, you can use the Auto type option to change the output field type based on the content of the input field. To use automatic type conversion, ensure that the Auto check box is ticked. When it is ticked, fields where both the input and output types are string or unicode will have their output type set to Auto.

Note: If you have modified the value of Type for an output field, when you select Auto the field type will not change to Auto. To include the field in auto type detection, you first need to revert to the properties of the input field, by clicking the undo icon in the corresponding row of the OutputFields table.

Null handling

When using automatic field type conversion, if all values in the input are NULL and no format is specified, the default format will be used. The following table shows the default format for automatic field type conversions:

Input data type Target data type Default format
String / Unicode Date 2020-06-06
String / Unicode DateTime

ISO_OFFSET_DATE_TIME for example:

2020-06-01T12:00:00+01:00

String / Unicode Time 12:00:00
Int / Long Time NULL

If the SampleSize property is set, and the records in the sample are NULL and no format is specified, the default format will be used. If there is a record outside of the sample which is not NULL, and if the value does not match the default format, this record will be output on the errors pin.

When converting Int / Long fields to Time, if all records in the sample are NULL and no format is specified, if any non-NULL value is detected outside of the sample, the record will be output on the errors pin.

Converting to a date type

When converting a field from string or unicode to a date, time or datetime data type, the Modify Fields node can determine the format of the date in the input field from a sample of the data. For example, the fields in the following table can be converted to date types:

dayFirst (string) monthFirst (string)
14/01/2010 01/14/2010
21/11/2010 11/21/2010

In the dayFirst column, the day appears before the month, and in the monthFirst column, the month appears first. Converting the type to date produces the following result:

dayFirst (date) monthFirst (date)
2010-01-14 2010-01-14
2010-11-21 2010-11-21

Sometimes, the date format can be ambiguous. For example, dayFirst could contain the values '01/01/2010' and '01/11/2010'. In this case, you can use the AmbiguousDateResolutionMode option to determine the format that should be assumed.

When your input contains values that are not already in a date format, these can also be converted to date/time fields. The value in the output depends on the determined data type of the input field. For example, the fields in the following table can be converted to date types:

integer (int) bigString (string)
123456789 123456789000

The value in the integer field is treated as an int and parsed as epoch time in seconds, and the value of the bigString field is treated as long and is parsed as epoch time in milliseconds. In both cases, the output field can be converted to a date type, and has the value '1973-11-29'.

Using a format to modify an output field

You can modify the content of an output field by applying a format to the value of the field.

Formatting follows standard Java conversions. You can use %s to output the input field value as a simple string, or convert the value to a different format before displaying the output as a string or unicode, for example by using the %X conversion to convert the value to a hexadecimal integer, which is then converted to upper case according to the locale under which the server is running. For example, you could apply the following formats to the default data from the Create Data node.

Name Type Format Example output
color string color: %s color: Red
id string id_%d id_10
id string id_%X id_A
dueDate string EEEE, MMM dd yyyy Tuesday, Jan 02 1996
rand string %09d 000017809
create string D 138
rand string %H 3C80

For more information about string output formats, see the Conversions section of the java Formatter docs.

For more information about date and time formatting, see Patterns for Formatting and Parsing in the Java DateTimeFormatter docs.

Reverting modifications

To undo any changes to a field and revert to the properties of the input field, click the undo icon in the corresponding row of the table for the field.

Properties

OutputFields

A table containing existing fields to be modified. In the top row of the table, a renaming pattern can be applied when including fields. Each subsequent row in the table represents a different field that can be modified before being written to the output, or excluded from the output.

When the checkbox is checked, the specified field will be included on the output.

Patterns and Fields

Choose whether to include or exclude all fields by default. By default, all the input fields are included in the output. You can change to an Exclude pattern by clicking the green plus icon and selecting Exclude.

To include a field in the output, make sure the box is ticked in the corresponding row for the field.

New Name

To change the name of a single field, enter the new name for the output field in the New Name column of the corresponding row for the field.

To add a pattern that will be applied to every field that is included in the output, enter a pattern in the Rename by Pattern box. See Renaming fields for examples.

Type

Choose a field Type for the output field from the drop-down list.

Possible values are: Unicode, String, Double, Integer, Long, Date, Datetime, Time, Boolean. The options available for a field depend on the input field Type.

Format

Enter a format to be applied to one or more fields in the data.

Formats for parsing string or unicode fields as date, time or datetime fields must be specified according to: https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html.

Formats for printing date, time or datetime fields as string or unicode fields must be specified according to: https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html.

Formats for numeric fields must be specified according to: https://docs.oracle.com/javase/8/docs/api/java/text/DecimalFormat.html.

All other formats for printing fields as string values must be specified according to: https://docs.oracle.com/javase/8/docs/api/java/util/Formatter.html#syntax.

See Using a format to modify an output field for examples of applying a format to a field.

RoundingMode

Optionally specify the method used for rounding floating point numbers to integral (long/int) output fields.

Choose one of the following from the drop-down list:

  • Ceiling - Rounds towards positive infinity.
  • Down - Rounds towards zero.
  • Floor - Rounds towards negative infinity.
  • Half Down - Rounds towards "nearest neighbor" unless both neighbors are equidistant, in which case, rounds down.
  • Half Even - Rounds towards "nearest neighbor" unless both neighbors are equidistant, in which case, rounds towards the even neighbor. Also known as "Banker's Rounding" and statistically minimizes cumulative error.
  • Half Up - Rounds towards "nearest neighbor" unless both neighbors are equidistant, in which case, rounds up.
  • Up - Rounds away from zero.

This value is also used if the type of an input field is string based, but the contents of the string field have a floating point representation. For example, writing an input string field with a value of "1.45" to an int type uses the RoundingMode property. For more information about the different RoundingMode options, see the RoundingMode documentation .

The default value is Half Even.

AmbiguousDateResolutionMode

Optionally specify the method used for automatic determination of the date format when converting from string or unicode and there is ambiguity as to where the day and month field appear in the date format.

Choose one of the following from the drop-down list:

  • Day First - Assumes that the day appears before the month.
  • MonthFirst - Assumes that the month appears before the day.
  • Auto - Uses the information from the locale, which is based on the Locale property, to determine the correct order.

For example, if you are converting an input string field containing the values "1/2/2009" and "1/3/2009" to a date field, and no format is provided:

  • If AmbiguousDateResolutionMode is set to Day First, then the determined format is d/M/yyyy and the values are parsed as February 1st, 2009 and March 1st, 2009.
  • If AmbiguousDateResolutionMode is set to Month First, then the determined format is M/d/yyyy and the values are parsed as January 2nd, 2009 and January 3rd, 2009
  • If AmbiguousDateResolutionMode is set to Auto, and the locale is US English (en_US), the determined format is M/d/yyyy and the values are parsed as January 2nd, 2009 and January 3rd, 2009.

  • If AmbiguousDateResolutionMode is set to Auto, and the locale is British English (en_GB), the determined format is "d/M/yyyy" and the values are parsed as Feburary 1st, 2009 and March 1st, 2009.

The default value is Auto.

GeneratedFieldsPrefix

Optionally specify the prefix to use on generated fields in the error output.

The default value is the underscore character _.

ConvertLeadingZeroes

Optionally specify whether or not string and unicode fields that contain leading zeros will be automatically converted to long or int output fields when the "Auto" conversion option is selected.

The default value is False.

Locale

Optionally specify the Locale to be used for parsing string fields into other types.

The Locale must be specified in the format <language code>[_<country code>[_<variant code>]].

For example, to set the locale to English, the Locale property should be set to en. For US English, the Locale property should be set to en_US. For Austrian German, the Locale property should be set to de_AT.

This is especially important in the parsing of strings that represent floating point numbers. However, it does not preclude parsing of floating point numbers in an international computing format.

For example, with the Locale property set to de_DE for German-speaking Germany, the following input data "1.234,56", "543.111,01" will be parsed as 1234.56 and 543111.01 respectively.

However, if the input data provided is of the form: "1234.56", "543111.01" the node will determine that these should still be parsed as 1234.56 and 543111.01 respectively.

The default value is the default Locale under which the server is running.

SampleSize

Optionally specify the maximum number of records to analyze, including null values, in order to determine date/time and numeric formats for type conversion.

Analysis is required in the following cases:

  • Conversions from string or unicode fields to date, time, datetime, int, long or double output fields, and where no format is provided.
  • Conversions from long and int to time, to determine whether the epoch time is specified in seconds or milliseconds.

In these cases, the number of input records and the value of SampleSize are compared, and the lower number is used to determine the value of SampleSize.

Setting the value to -1 means that the node performs any required analysis on all records.

The default value is 1000.

DetectionThreshold

Set the percentage for the DetectionThreshold - typically between 60 and 100. If set to 100 this insists on all values being of the appropriate type.

The default value is 95.

DefaultValueForText

Optionally specify the value to be used for output string and unicode fields when there is an error in calculating the output field, but the WriteErrorRecordsToMainOutput property is set to True.

The default value is Null.

DefaultValueForNumber

Optionally specify the value to be used for output int, long, and double fields when there is an error in calculating the output field, but the WriteErrorRecordsToMainOutput property is set to True.

The default value is Null.

DefaultValueForBoolean

Optionally specify the value to be used for output boolean fields when there is an error in calculating the output field, but the WriteErrorRecordsToMainOutput property is set to True.

The default value is Null.

DefaultValueForDate

Optionally specify the value to be used for output date fields when there is an error in calculating the output field, but the WriteErrorRecordsToMainOutput property is set to True.

If specified, the Date value must be specified in the format yyyy-MM-dd.

The default value is Null.

DefaultValueForTime

Optionally specify the value to be used for output time fields when there is an error in calculating the output field, but the WriteErrorRecordsToMainOutput property is set to True.

If specified, the Date value must be specified in the format HH:mm:ss.

The default value is Null.

DefaultValueForDateTime

Optionally specify the value to be used for output datetime fields when there is an error in calculating the output field, but the WriteErrorRecordsToMainOutput property is set to True.

If specified, the Date value must be specified in the format yyyy-MM-dd HH:mm:ss.

The default value is Null.

NumericConversionErrorBehavior

Optionally specify the node behavior when there are issues during conversion from one numeric field to a different numeric output type.

Choose one of the following from the drop-down list:

  • Error - The issues will be treated as any other conversion and subject to the rules specified in the WriteErrorRecordsToMainOutput, ErrorThresholdCount and ErrorThresholdPercentage properties.
  • Log - A warning will be logged by the node, however the issues do not contribute to the node error count and thresholds. No record will be written to the error output pin and the record will be written to the main output.
  • Ignore - No warning will be logged by the node. The issues do not contribute to the node error count and thresholds. No record will be written to the error output pin and the record will be written to the main output.

This property specifically covers the cases of:

  • NaN - How to handle cases when converting from a double NaN value to an int or long field.

  • Positive/Negative Infinity - How to handle cases when converting from a double +/- Infinity value to an int or long field.

  • Overflow - How to handle cases when converting from a double or long value to an int field and the value is too large to fit into the int field.

The default value is Error.

WriteErrorRecordsToMainOutput

Optionally specify whether records that have conversion errors should also be written to the first output of the node.

When set to False, a record will be written to the errors output, but not to the main output if there is a conversion error on the corresponding input record.

When set to True, a record will be written to both the errors output and the main output. For any fields that have conversion errors, the corresponding Default properties will be used to determine the output field value.

The default value is False.

ErrorThresholdCount

Optionally specify the number of input records that can result in expression errors before the node fails.

For each input record, all expressions in the table are evaluated. Therefore, for each input record there could potentially be an error for each row in the Output Fields table. This threshold, however, applies to the number of input records against which errors were reported while evaluating the expressions.

Specify a valid, non-negative integer value for the field, or one of the following values:

  • 0 - the node fails on the first error encountered.
  • -1 - the node never fails due to an expression error against an input record

The default value is -1.

Tip: Typically you would set a a value for either the ErrorThresholdCount or the ErrorThresholdPercentage but not both.

ErrorThresholdPercentage

Optionally specify the percentage of input records that can result in expression errors before the node fails.

For each input record, all expressions in the table are evaluated. Therefore for each input record there could be potentially an error for each row in the Output Fields table. This threshold however, applies to the the percentage of input records against which errors were reported while evaluating the expressions.

The threshold is only applied after at least 100 records have been processed, or all records have been processed if there are less than 100 input records.

Specify a valid integer for the field between 1 and 100 (inclusive), or one of the following values:

  • 0 - the node will never fail due to an expression error against an input record.
  • -1 - the threshold is not applied.

The default value is 100 - meaning that the node will fail if all input records or all of the first 100 input records have errors.

Tip: Typically you would set a value for either the ErrorThresholdCount or the ErrorThresholdPercentage but not both.

ErrorOutputIncludesExcludedFields

Optionally specify whether the error output includes fields that have been excluded from the main output.

When set to False, if an input field is excluded from the output it does not appear on the Error output.

When set to True all input fields appear on the Error output.

Note: Input fields are written to the Error output with their original input field name. Field renaming is not applied on the Error output.

The default value is False.

SingleErrorRecordPerInputRecord

Optionally specify that for each input record there will only ever be a maximum of one record on the Error output.

When set to False, a record appears on the Error output for each field conversion error. This can produce multiple error records for any input record.

When set to True there will be a maximum of one record on the error output per input record. In this case, information about all of the errors encountered for the input record is included, but the output will not include separate fields identifying the input and output field names and types.

The default value is False.

Inputs and outputs

Inputs: in1.

Outputs: data, errors.