Modify Fields

Modifies field metadata by including, excluding, renaming and changing the types of input fields.

Example data flow

A number of sample data flows are available from the Data360 Samples workspace, found in the Directory page.

In the Directory under the /Data360 Samples/Node Examples/ folder, you will find "Modify Fields", which shows examples of how to use this node.

Note: Upgrades will overwrite all data flows in the workspace. If you want to make changes to one of the sample data flows, we recommend that you create a copy and save it elsewhere, using Save as...

Unless otherwise specified, the examples below use the default data from the Create Data node.

Renaming fields

You can rename all the output fields by using Rename by Pattern, as shown below. Use * to reference the current field name, for example *_New or new_*.

Note: By default, all the input fields are included in the output. You can change to an Exclude pattern by clicking the green plus icon and selecting Exclude.

To rename a single field, type the new name for the field, as shown below.

If the renamed fields result in a conflict because two fields would have an identical field name, the conflict is highlighted in the table, as shown.

Changing field type

To change the type for an output field, choose a field Type from the drop-down list. The list of available types depends on the type of the input field. Fields that have a modified type are highlighted. In the screenshot below, for example the id field has had its type changed from int to string.

Automatic field type conversion

If an input field has a type that is either string or unicode, you can use the Auto type option to change the output field type based on the content of the input field. To use automatic type conversion, ensure that the Auto check box is ticked. When it is ticked, fields where both the input and output types are string or unicode will have their output type set to Auto.

Note: If you have modified the value of Type for an output field, when you select Auto the field type will not change to Auto. To include the field in auto type detection, you first need to revert to the properties of the input field, by clicking the undo icon in the corresponding row of the OutputFields table.

Null handling

When using automatic field type conversion, if all values in the input are NULL and no format is specified, the default format will be used. The following table shows the default format for automatic field type conversions:

Input data type	Target data type	Default format
String / Unicode	Date	2020-06-06
String / Unicode	DateTime	ISO_OFFSET_DATE_TIME for example: 2020-06-01T12:00:00+01:00
String / Unicode	Time	12:00:00
Int / Long	Time	NULL

If the SampleSize property is set, and the records in the sample are NULL and no format is specified, the default format will be used. If there is a record outside of the sample which is not NULL, and if the value does not match the default format, this record will be output on the errors pin.

When converting Int / Long fields to Time, if all records in the sample are NULL and no format is specified, if any non-NULL value is detected outside of the sample, the record will be output on the errors pin.

Converting to a date type

When converting a field from string or unicode to a date, time or datetime data type, the Modify Fields node can determine the format of the date in the input field from a sample of the data. For example, the fields in the following table can be converted to date types:

dayFirst (string)	monthFirst (string)
14/01/2010	01/14/2010
21/11/2010	11/21/2010

In the dayFirst column, the day appears before the month, and in the monthFirst column, the month appears first. Converting the type to date produces the following result:

dayFirst (date)	monthFirst (date)
2010-01-14	2010-01-14
2010-11-21	2010-11-21

Sometimes, the date format can be ambiguous. For example, dayFirst could contain the values '01/01/2010' and '01/11/2010'. In this case, you can use the AmbiguousDateResolutionMode option to determine the format that should be assumed.

When your input contains values that are not already in a date format, these can also be converted to date/time fields. The value in the output depends on the determined data type of the input field. For example, the fields in the following table can be converted to date types:

integer (int)	bigString (string)
123456789	123456789000

The value in the integer field is treated as an int and parsed as epoch time in seconds, and the value of the bigString field is treated as long and is parsed as epoch time in milliseconds. In both cases, the output field can be converted to a date type, and has the value '1973-11-29'.

Using a format to modify an output field

You can modify the content of an output field by applying a format to the value of the field.

Formatting follows standard Java conversions. You can use %s to output the input field value as a simple string, or convert the value to a different format before displaying the output as a string or unicode, for example by using the %X conversion to convert the value to a hexadecimal integer, which is then converted to upper case according to the locale under which the server is running. For example, you could apply the following formats to the default data from the Create Data node.

Name	Type	Format	Example output
color	string	color: %s	color: Red
id	string	id_%d	id_10
id	string	id_%X	id_A
dueDate	string	EEEE, MMM dd yyyy	Tuesday, Jan 02 1996
rand	string	%09d	000017809
create	string	D	138
rand	string	%H	3C80

For more information about string output formats, see the Conversions section of the java Formatter docs.

For more information about date and time formatting, see Patterns for Formatting and Parsing in the Java DateTimeFormatter docs.

Reverting modifications

To undo any changes to a field and revert to the properties of the input field, click the undo icon in the corresponding row of the table for the field.

Properties

OutputFields

A table containing existing fields to be modified. In the top row of the table, a renaming pattern can be applied when including fields. Each subsequent row in the table represents a different field that can be modified before being written to the output, or excluded from the output.

When the checkbox is checked, the specified field will be included on the output.

Patterns and Fields

Choose whether to include or exclude all fields by default. By default, all the input fields are included in the output. You can change to an Exclude pattern by clicking the green plus icon and selecting Exclude.

To include a field in the output, make sure the box is ticked in the corresponding row for the field.

New Name

To change the name of a single field, enter the new name for the output field in the New Name column of the corresponding row for the field.

To add a pattern that will be applied to every field that is included in the output, enter a pattern in the Rename by Pattern box. See Renaming fields for examples.

Type

Choose a field Type for the output field from the drop-down list.

Possible values are: Unicode, String, Double, Integer, Long, Date, Datetime, Time, Boolean. The options available for a field depend on the input field Type.

Format

Enter a format to be applied to one or more fields in the data.

Formats for parsing string or unicode fields as date, time or datetime fields must be specified according to: https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html.

Formats for printing date, time or datetime fields as string or unicode fields must be specified according to: https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html.

Formats for numeric fields must be specified according to: https://docs.oracle.com/javase/8/docs/api/java/text/DecimalFormat.html.

All other formats for printing fields as string values must be specified according to: https://docs.oracle.com/javase/8/docs/api/java/util/Formatter.html#syntax.

See Using a format to modify an output field for examples of applying a format to a field.

RoundingMode

Optionally specify the method used for rounding floating point numbers to integral (long/int) output fields.

Choose one of the following from the drop-down list:

Ceiling - Rounds towards positive infinity.
Down - Rounds towards zero.
Floor - Rounds towards negative infinity.
Half Down - Rounds towards "nearest neighbor" unless both neighbors are equidistant, in which case, rounds down.
Half Even - Rounds towards "nearest neighbor" unless both neighbors are equidistant, in which case, rounds towards the even neighbor. Also known as "Banker's Rounding" and statistically minimizes cumulative error.
Half Up - Rounds towards "nearest neighbor" unless both neighbors are equidistant, in which case, rounds up.
Up - Rounds away from zero.

This value is also used if the type of an input field is string based, but the contents of the string field have a floating point representation. For example, writing an input string field with a value of "1.45" to an int type uses the RoundingMode property. For more information about the different RoundingMode options, see the RoundingMode documentation .

The default value is Half Even.

AmbiguousDateResolutionMode

Optionally specify the method used for automatic determination of the date format when converting from string or unicode and there is ambiguity as to where the day and month field appear in the date format.

Choose one of the following from the drop-down list:

Day First - Assumes that the day appears before the month.
MonthFirst - Assumes that the month appears before the day.
Auto - Uses the information from the locale, which is based on the Locale property, to determine the correct order.

For example, if you are converting an input string field containing the values "1/2/2009" and "1/3/2009" to a date field, and no format is provided:

If AmbiguousDateResolutionMode is set to Day First, then the determined format is d/M/yyyy and the values are parsed as February 1st, 2009 and March 1st, 2009.
If AmbiguousDateResolutionMode is set to Month First, then the determined format is M/d/yyyy and the values are parsed as January 2nd, 2009 and January 3rd, 2009
If AmbiguousDateResolutionMode is set to Auto, and the locale is US English (en_US), the determined format is M/d/yyyy and the values are parsed as January 2nd, 2009 and January 3rd, 2009.
If AmbiguousDateResolutionMode is set to Auto, and the locale is British English (en_GB), the determined format is "d/M/yyyy" and the values are parsed as Feburary 1st, 2009 and March 1st, 2009.

The default value is Auto.

GeneratedFieldsPrefix

Optionally specify the prefix to use on generated fields in the error output.

The default value is the underscore character _.

ConvertLeadingZeroes

Optionally specify whether or not string and unicode fields that contain leading zeros will be automatically converted to long or int output fields when the "Auto" conversion option is selected.

The default value is False.

Locale

Optionally specify the Locale to be used for parsing string fields into other types.

The Locale must be specified in the format <language code>[_<country code>[_<variant code>]].

For example, to set the locale to English, the Locale property should be set to en. For US English, the Locale property should be set to en_US. For Austrian German, the Locale property should be set to de_AT.

This is especially important in the parsing of strings that represent floating point numbers. However, it does not preclude parsing of floating point numbers in an international computing format.

For example, with the Locale property set to de_DE for German-speaking Germany, the following input data "1.234,56", "543.111,01" will be parsed as 1234.56 and 543111.01 respectively.

However, if the input data provided is of the form: "1234.56", "543111.01" the node will determine that these should still be parsed as 1234.56 and 543111.01 respectively.

The default value is the default Locale under which the server is running.

SampleSize

Optionally specify the maximum number of records to analyze, including null values, in order to determine date/time and numeric formats for type conversion.

Analysis is required in the following cases:

Conversions from string or unicode fields to date, time, datetime, int, long or double output fields, and where no format is provided.
Conversions from long and int to time, to determine whether the epoch time is specified in seconds or milliseconds.

In these cases, the number of input records and the value of SampleSize are compared, and the lower number is used to determine the value of SampleSize.

Setting the value to -1 means that the node performs any required analysis on all records.

The default value is 1000.

DetectionThreshold

Set the percentage for the DetectionThreshold - typically between 60 and 100. If set to 100 this insists on all values being of the appropriate type.

The default value is 95.

DefaultValueForText

Optionally specify the value to be used for output string and unicode fields when there is an error in calculating the output field, but the WriteErrorRecordsToMainOutput property is set to True.