Calculate Fields

Performs calculations on input field values, producing new output fields.

Uses Python scripting to apply an expression to fields in data attached to the input node, generating new fields in your data to store the results of the calculation.

Examples

The examples below use the default data from the Create Data node. See:

Overwriting an existing field

If a calculated field has the same name as an existing field in your data, the field is overwritten with the contents of the calculated field. For example, using the data from the Create Data node, add a Calculated Field to alter the value of the rand field.

Name Type Expression
rand int rand * 2

Running the node against the default data replaces the rand field in the original data, with a new rand field, displayed as column one in the data, where for each row the value is double the value from the input.

Reusing a calculated field

You can create a new calculated field and then reuse that field in a subsequent expression in the same table. For example, the previous example can be extended to create a new field called 'randx2'. This new field can then be used in a new calculation:

Name Type Expression
double rand int rand * 2
and add one int fields['double rand'] + 1

Running the node against the default data creates two new fields, randx2 and randx4. The value of randx4 is based on the value of randx2.

Condition-based values

You can set a value for a calculated field depending on whether or not a condition is met in an existing field.

Name Type Expression
penalty int 100 if IOU > 400 else 10

Using datetime

You can use the imported Python datetime module to calculate values based on existing datetime fields. The following examples can be applied to the default data.

Name Type Expression
nextAnnual date datetime.date(year=dueDate.year+1,month=dueDate.month,day=dueDate.day)
startOfMonth date datetime.date(year=dueDate.year,month=dueDate.month,day=1)
sinceDue int (datetime.datetime.now().date()-dueDate).days

Data can be automatically converted from datetime to date or time formats, or from date to datetime format. For example, if you have a field in datetime format, you can use the Calculate Fields node to create a new field in date or time format from the existing field:

Name Type Expression Result
createDate date create 2004-01-01
createTime time create 12:35:41

See Python module support for more information about how datetime conversion handles your data.

Using math

You can use the imported Python math module to use mathematical functions defined by the C standard, for example math.ceil(), math.floor(), math.sqrt(), math.log().

Name Type Expression
rootIOU double math.sqrt(IOU)
rootRand double math.sqrt(rand)
cleanRootRand double math.sqrt(rand) if rand > 0 else 0

Note that applying the second example to the default data produces 18 calculated fields and two errors, because two of the rows in the default data contain negative values for the rand field. You can use the example in the third row to remove these errors.

Using regular expressions

You can use the imported Python re module to apply regular expressions to a field.

Name Type Expression
Match_start boolean re.match(r"An", name)
Match_any boolean re.search(r"An", name, flags=re.IGNORECASE)

The first expression returns true for any row where the name field starts with 'An'. Note that the regular expression is case-sensitive.

The second expression returns true for any row where the name field contains 'An' at any position. Note that this expression is also not case-sensitive, so it returns true for 'Anatoli' and 'Jordan'.

Replacing null values

You can use the optional DefaultValueForNull* properties to define a default value for a NULL field of a given data type. This can be useful if you need to overcome a situation where a NULL value would otherwise cause the expression that you enter in the Calculated Fields table to fail.

For example, you have the following date type input data:

dueDate
2022-10-10
2022-06-01
2022-05-05
NULL
NULL
NULL
2022-04-23
2022-04-19
NULL
NULL

 

In the Calculated Fields table, define a date field called dueDate, to overwrite an existing dueDate field:

Name Type Expression
dueDate date dueDate

 

In the DefaultValueForNullDate property, set the value to Today.

This example uses the DefaultValueForNullDate property because the input field that is being calculated contains date type data. This example was created on 2022-03-02 (today), therefore this is the date value that will be used to replace NULL values when the DefaultValueForNullDate property is set to Today.

After running the Calculate Fields node, the output is as follows:

dueDate
2022-10-10
2022-06-01
2022-05-05
2022-03-02
2022-03-02
2022-03-02
2022-04-23
2022-04-19
2022-03-02
2022-03-02

Note: The DefaultValueForNull* properties are applied to calculations on any fields of the selected data type that are generated as a result of expressions in the Calculated Fields table.

Using built-in variables

You can use the following built-in variables to add new fields to your output:

  • node.firstExec - Boolean object that is True for the first input record. Allows variable values to be set when processing the first record.
  • node.lastExec - Boolean object that is True for the last input record. Allows values to be derived when all records have been processed.
  • node.execCount - Provides a count of the current record number.
Name Type Expression
recordID long

node.execCount

delta int

0 if node.firstExec else 100

increment long

delta * node.execCount

completed boolean

node.lastExec

Result

recordID delta increment completed
1 0 0 false
2 100 200 false
3 100 300 false
4 100 400 true

Referencing property values in your calculation - advanced example

You can use the node.properties built-in variable to add a new field from a calculation that references a property value on the node.

 

This example uses the following input data:

net_price (double)
10.0
12.5
22.34
10.5

 

The node has a Tax Multiplier property which is set to 1.2.

 

Calculated Fields (where net_price is the name of the input field, tax.multiplier is the run property name for the Tax Multiplier property):

Name Type Expression
gross_price double

net_price * float(node.properties.getString("Tax Multiplier", "tax.multiplier"))

Result

gross_price (double) net_price (double)
12.0 10.0
15.0 12.5
26.808 22.34
12.6 10.5

 

The same result could also be achieved by using textual substitution:

net_price * float("{{^Tax Multiplier^}}")

Properties

Calculated Fields

A table containing rows of fields to be created from expressions applied to existing fields. Populate the table to create new output fields, or to override input fields with calculated values based on the data from the node's input.

Each row in the table represents a different field to be written to the output.

The table must be populated, and each column in the table must be filled for each of the table rows.

Calculated fields can be referenced in any row in the table after they are declared.

Name

Enter a name for the output field.

If the name of the output field is the same as that of an input field, the input field is overwritten.

Type

Choose a field Type from the drop-down list.

Possible values are: unicode, double, int, long, date, datetime, time, boolean, string. The value must match the data type of the evaluated expression.

Expression

Enter an expression to be applied to one or more fields in the data.

The expression should be entered as a Python expression.

To include a reference to a field in the data in your expression, enter the name of the field.

Tip: Use the notation fields['<FIELDNAME>'] if either of the following conditions applies:

  • The field name is a reserved Python keyword.
  • The field name contains a space.

If you enter a conditional expression, the expression must specify the values to be used for both True and False conditions, for example 100 IF IOU > 400 else 10. If the Type of the calculated field is boolean, enter an expression to be matched for a true condition, for example IOU > 100

The following Python modules are imported by default for use in expressions:

  • datetime - imported as datetime, allowing you to perform basic calculations on date, time, and datetime fields.
  • fn - imported as fn, providing null-safe comparisons and manipulation of strings that might be null. For more information, see The null-safe function module "fn".
  • math - imported as math, providing access to Python mathematical functions. For more information, see the Python math module documentation.
  • re - imported as re, allowing you to specify regular expressions as part of an expression.

See Examples for sample usage.

DefaultValueForNullText

Optionally specify the value to be used for Null input string or unicode fields when they are referenced in an expression in the Calculated Fields table.

Either choose the Null value from the drop-down list, or enter a different default value.

The default value is an empty string.

DefaultValueForNullNumber

Optionally specify the value to be used for Null input numeric fields when they are referenced in an expression in the Calculated Fields table.

Choose one of the following from the drop-down list, or enter a different value:

  • 0
  • 1
  • Null - Null numeric values are left null when used in an expression.

The default value is 0.

Note: This property is applied to all numeric fields; floating point numbers cannot be provided.

DefaultValueForNullBoolean

Optionally specify the value to be used for Null input boolean fields when they are referenced in an expression in the Calculated Fields table.

Choose from one of the following options:

  • True
  • False
  • Null

The default value is False.

DefaultValueForNullDate

Optionally specify the value to be used for Null input date fields when they are referenced in an expression in the Calculated Fields table.

Choose one of the following from the drop-down list, or enter a different default date value in the format yyyy-MM-dd:

  • Today - Null date values that are used in an expression are set to the date when the node began execution.
  • Null - Null date values are left null when used in an expression.
  • 1970-01-01

The default value is 1970-01-01.

DefaultValueForNullTime

Optionally specify the value to be used for Null input time fields when they are referenced in an expression in the Calculated Fields table.

Choose one of the following from the drop-down list, or enter a different default time value in the format HH:mm:ss:

  • Now - Null time values that are used in an expression are set to the time when the node began execution.
  • 00:00:00
  • Null - Null time values are left null when used in an expression.

The default value is 00:00:00.

DefaultValueForNullDateTime

Optionally specify the value to be used for Null input datetime fields when they are referenced in an expression in the Calculated Fields table.

Choose one of the following from the drop-down list, or enter a different default datetime value in the format yyyy-MM-dd HH:mm:ss:

  • Today - Null datetime values that are used in an expression are set to the start of the day when the node began execution.
  • Now - Null datetime values that are used in an expression are set to the value of datetime when the node began execution.
  • Null - Null datetime values are left null when used in an expression.
  • 1970-01-01 00:00:00

The default value is 1970-01-01 00:00:00.

WriteErrorRecordsToMainOutput

Optionally specify whether records are written to the first output when errors were encountered evaluating expressions against the corresponding input record.

For each input record, all expressions in the table are evaluated.

For any expression that has an error evaluating against a given input record, an error record is written to the Errors output.

Choose one of the following values:

  • False - no output record is written to the first output.
  • True - an output record is written to the first output, with all of the output fields populated that did not have expression errors.

The default value is False.

ErrorThresholdCount

Optionally specify the number of input records that can result in expression errors before the node fails.

For each input record, all expressions in the table are evaluated. Therefore, for each input record there could potentially be an error for each row in the Calculated Fields table. This threshold, however, applies to the number of input records against which errors were reported while evaluating the expressions.

Specify a valid, non-negative integer value for the field, or one of the following values:

  • 0 - the node fails on the first error encountered.
  • -1 - the node never fails due to an expression error against an input record

The default value is -1.

Tip: Typically you would set a a value for either the ErrorThresholdCount or the ErrorThresholdPercentage but not both.

ErrorThresholdPercentage

Optionally specify the percentage of input records that can result in expression errors before the node fails.

For each input record, all expressions in the table are evaluated. Therefore for each input record there could be potentially an error for each row in the Calculated Fields table. This threshold however, applies to the the percentage of input records against which errors were reported while evaluating the expressions.

The threshold is only applied after at least 100 records have been processed, or all records have been processed if there are less than 100 input records.

Specify a valid integer for the field between 1 and 100 (inclusive), or one of the following values:

  • 0 - the node will never fail due to an expression error against an input record.
  • -1 - the threshold is not applied.

The default value is 100 - meaning that the node will fail if all input records or all of the first 100 input records have errors.

Tip: Typically you would set a value for either the ErrorThresholdCount or the ErrorThresholdPercentage but not both.

Inputs and outputs

Inputs: in1.

Outputs: calculated fields, errors.