Python module support

The following modules are automatically imported and bound to the names "datetime" and "fn"; each is described in the sections below.

Note: Additional core Python libraries can be imported into either the ConfigureFields or ProcessRecords script; see Installing pure Python modules. Third-party libraries are not currently supported. Importing additional Python libraries may work in the script properties; however, the mechanism is expected to change in the future and is therefore not currently supported. For performance reasons, it is recommended to perform any necessary imports in ConfigureFields. Modules imported there are then available within the ProcessRecords script, provided that the name into which the module is imported does not conflict with the name of a variable (for example, an input or output name) that is bound into ProcessRecords for each invocation of that script.
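
For example, a core module such as re could be imported in ConfigureFields and then reused in ProcessRecords. This is a minimal sketch only; the field name phoneNumber and output field areaCode are illustrative, not part of the product documentation:

#ConfigureFields
#Import a core Python module once, up front
import re

#Pass through the input fields and add an illustrative output field
out1 += in1
out1.areaCode = unicode

#ProcessRecords
#The module imported in ConfigureFields is still bound to the name "re"
out1 += in1
match = re.match(r'\((\d{3})\)', fields.phoneNumber)
out1.areaCode = match.group(1) if match else Null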

A set of grouping functions is also bound and can be accessed via the object "group"; see Grouping functions.

The datetime module

You can use the datetime module in nodes to manipulate fields that contain date, time, or datetime data. Conversion between these date and time formats is handled automatically.

Conversion is handled using the Analyze server timezone. The value is stored in the relevant format: date (dd/mm/yyyy), time (hh:mm:ss), or datetime (a timestamp in milliseconds since the epoch), and is localized before being displayed.

The next sections explain how conversion is handled.

Conversion from datetime to date or time

Where you have input fields or expressions that yield a datetime result, the result can be handled automatically by an output field with a date or time format; the conversion invokes the date() or timetz() method of the Python datetime object as appropriate.
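
The equivalent conversion can be written explicitly in ordinary Python; the following is only a sketch of what the automatic handling does, using the standard datetime module:

import datetime

dt = datetime.datetime(2004, 1, 1, 12, 35, 41)

#Date portion only, as used for a date output field
d = dt.date()      #datetime.date(2004, 1, 1)

#Time portion (including any timezone info), as used for a time output field
t = dt.timetz()    #datetime.time(12, 35, 41)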

Converting from date to datetime

If an input date field is mapped to a datetime output field, or an expression yields a date result that is sent to a datetime output field, the conversion is handled automatically by adding '00:00:00' as the time component to the date value from the field.
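
In plain Python terms this is equivalent to combining the date with a midnight time component (a sketch only; the node performs this for you):

import datetime

d = datetime.date(2004, 1, 1)

#Add '00:00:00' as the time component
dt = datetime.datetime.combine(d, datetime.time(0, 0, 0))
#dt is datetime.datetime(2004, 1, 1, 0, 0)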

Examples

Using the default data from the Create Data node, you can use the Calculate Fields node to create new fields from the 'create' field with the following expressions:

Name                  Type      Expression              Result
createDate            date      create                  2004-01-01
createTime            time      create                  12:35:41
createDateTime        datetime  createDate              2004-01-01T00:00:00+0000
createString          unicode   createDate              2004-01-01 12:35:41
createDateString      unicode   createDateTime          2004-01-01
createDateTimeString  unicode   createDateTime          2004-01-01
createTimeStr         unicode   createTime              12:35:41
createTimePlusStr     unicode   createTimeStr + 'aaa'   12:35:41aaa

Note that the conversion is 'lossy', in that converting from a datetime format to a date format and then back to datetime format means the time element of the original data is replaced by the default time element of '00:00:00'.
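
The following sketch shows the loss in plain Python:

import datetime

original = datetime.datetime(2004, 1, 1, 12, 35, 41)

#datetime -> date drops the time element
as_date = original.date()

#date -> datetime restores only the default time element of '00:00:00'
round_tripped = datetime.datetime.combine(as_date, datetime.time(0, 0, 0))
#round_tripped is datetime.datetime(2004, 1, 1, 0, 0); the original 12:35:41 is lost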

The null-safe function module "fn"

Note: In all of the comparison methods described below, Null will be considered to be less than any value, except for another Null. All Null values are considered equal.
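
Given this ordering, the comparison functions described below behave as in the following illustrative sketch:

#Null compares as less than any non-Null value
fn.lt(Null, 0)       #True
fn.gt(0, Null)       #True

#All Null values are considered equal
fn.eq(Null, Null)    #True
fn.cmp(Null, Null)   #0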

gt(f1, f2)

  • Returns (bool) whether or not the specified value f1 is greater than the specified value f2.
  • Example:
    if fn.gt(fields.firstField, fields.secondField):

gte(f1, f2)

  • Returns (bool) whether or not the specified value f1 is greater than or equal to the specified value f2.
  • Example:
    if fn.gte(fields.firstField, fields.secondField):

eq(f1, f2)

  • Returns (bool) whether or not the specified value f1 equals the specified value f2.
  • Example:
    if fn.eq(fields.firstField, fields.secondField):

lt(f1, f2)

  • Returns (bool) whether or not the specified value f1 is less than the specified value f2.
  • Example:
    if fn.lt(fields.firstField, fields.secondField):

lte(f1, f2)

  • Returns (bool) whether or not the specified value f1 is less than or equal to the specified value f2.
  • Example:
    if fn.lte(fields.firstField, fields.secondField):

cmp(f1, f2)

  • Returns (int) a negative number if f1 is less than f2.
  • Returns a positive number if f1 is greater than f2.
  • Returns 0 if f1 equals f2.
  • Example:
    if fn.cmp(fields.firstField, fields.secondField) > 0:

substr(strVal, from[, to])

  • Returns (str) the substring of the provided string (or Null) argument starting at the specified from index (inclusive) until the specified to index (exclusive).
  • If strVal is Null, then the substr function will always return Null.
  • If no to argument is supplied, returns the substring of the provided string argument starting at the specified from index until the end of the string.
  • If the from or to index extends beyond the end of the string, the string length is used instead (that is, the minimum of the string length and the supplied index is used).
  • Example:
    #Get the two characters of a string starting from the 3rd character in the string
    twoChars = fn.substr("abcdef", 2, 4) #twoChars will be equal to "cd"
    #Get all characters of a string starting from the 3rd character in the string
    last = fn.substr("abcdef", 2) #last will be equal to "cdef"

lower(strVal)

  • Returns (str) the provided string converted to lower case.
  • If strVal is Null, then the lower function will always return Null.
  • Example:
    #lowerCase will be equal to "mixedcasestring"
    lowerCase = fn.lower("mixedCaseString")

upper(strVal)

  • Returns (str) the provided string converted to upper case.
  • If strVal is Null, then the upper function will always return Null.
  • Example:
    #upperCase will be equal to "MIXEDCASESTRING"
    upperCase = fn.upper("mixedCaseString")

desc(field)

  • Only for use in properties for grouping and sorting (e.g. SortBy on the Sort node, GroupBy on the Transform node or Aggregate node).
  • When used in a record key property indicates that the provided field should be sorted in descending order.
  • Example (used in SortBy property in Sort node):
    #Sort is to be performed on field1 in descending order, then on field2 in ascending order
    fn.desc(fields.field1), fields.field2

groupFunctions(node)

  • Returns an object containing the grouping functions, see Grouping functions.
  • The argument must be the node:
    groupFunctions = fn.groupFunctions(node)
    out1.average = groupFunctions.avg(fields.myAverageField)

Grouping functions

Special aggregation functions are available in the group function object. These are pre-built instructions that perform an aggregation operation and return a value. You can use the built-in "group" variable to reference these functions e.g. group.count().

You can use the grouping functions in both the ConfigureFields and ProcessRecords properties. To apply an aggregation over the values on an input field – whether this is over all values, or only some of the values in the records for that input field – use the grouping function in the ConfigureFields property. In this case, logic in an updatePredicate or in the ProcessRecords script can be used to specify whether or not the values from the input field are to be included in the aggregate. If the aggregation is to apply on derived or calculated field values, use the function in the ProcessRecords property.

While there are more complex ways to use the grouping functions (as outlined in the table below and in Advanced use of grouping functions), the aggregate functions can generally be used in their simple form by using the following pattern:

  1. In ConfigureFields, specify the field over which the aggregation is to be performed, and assign the aggregate to an output field, e.g.

    out1.AverageSaleAmount = group.avg(fields.SaleAmount)

  2. In ProcessRecords, use the input record to update the aggregation and assign fields to the output record, e.g.

    out1 += in1

Tip: In the pattern above, the ConfigureFields property is only defining the aggregation function to use and to which output field this should be written. The script in the ProcessRecords property then provides the observations from each of the input records to the aggregate function and sets the resulting aggregate value on the output field.

The grouping functions often take the field to aggregate on as the first argument and can take additional optional arguments. In most cases, the second argument can either be a bool value specifying whether or not Null values are to be included in the aggregation, or you can specify a lambda function to use as a predicate which will then be called to determine if the new value should contribute to the aggregation. Full details of the format and function of the updatePredicate are covered in Advanced use of grouping functions.
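
For example, the two forms of the second argument look like this in ConfigureFields (a sketch; the field name Amount and the output names are illustrative):

#Form 1: a bool - include Null values in the aggregation or not
out1.TotalA = group.sum(fields.Amount, False)

#Form 2: a lambda predicate - agg is the current aggregate value, newValue is
#the value from the current record; return True to include the new value
out1.TotalB = group.sum(fields.Amount, lambda agg, newValue: fn.gt(newValue, 0))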

The following aggregation functions are available:

Note: These aggregation functions are only applicable to nodes which define some form of data grouping. For example, they can be used in the GroupBy property of the Aggregate and Transform nodes.

These aggregation functions are not available on nodes that do not have a GroupBy property, including the Merge node.

count()

  • Counts the number of records within the group.
  • Example:

    #Configure fields
    #Count the number of records in the group and assign to output field "groupCount"
    out1.groupCount = group.count()

    #Process Records
    #Update the aggregation and set the values to the output record
    out1 += in1

count(field[, includeNulls[, initialValue]])

  • Count the number of times the specified field is seen within the group.
  • The optional includeNulls argument specifies whether or not the Null field values are to contribute to the count. If provided, this argument must be a bool value. If not specified, includeNulls defaults to True.
  • The field can be of any type.
  • The result of the aggregate will be a long.
  • The optional initialValue argument can be provided to specify the value from which counting starts. If not specified, initialValue defaults to 0.
  • Example:

    #Configure fields
    #Count the number of records in the group where the "StartDate" field is not Null
    #Assign the count to the output field "NumStarted"
    out1.NumStarted = group.count(fields.StartDate, False)

    #Process Records
    #Update the aggregation and set the values to the output record
    out1 += in1

count(field[, updatePredicate[, initialValue]])

  • Count the number of times the specified field is seen within the group.
  • The field can be of any type.
  • The result of the aggregate will be a long.
  • An optional updatePredicate can be provided to specify whether or not the count is to be incremented based on the value of the field in the current record and the current value of the count. If provided, the updatePredicate must be a function (provided in the form of a lambda function) and must match the requirements described in Advanced use of grouping functions.
  • The initialValue argument can be provided to specify the value from which counting starts. If not specified, initialValue defaults to 0.
  • Example:

    #Configure fields
    #Count the number of records in the group where the "CallStartTime" field is
    #between 16:00:00 and 20:00:00 and assign to the output field "NumEveningCalls"
    out1.NumEveningCalls = group.count(fields.CallStartTime, lambda agg, newValue: fn.gte(newValue, datetime.time(16,00,00)) and fn.lte(newValue, datetime.time(20,00,00)))

    #Process Records
    #Update the aggregation and set the values to the output record
    out1 += in1

sum()

sum(field[, includeNulls[, initialValue]])

  • Calculate the sum of all of the values for the specified field within the group.
  • The field must be a numeric type (int, long, float).
  • The result of the aggregate will be either a long (for int or long input fields) or a float.
  • The optional includeNulls argument can be provided to specify whether or not the Null fields are to be included in the sum. If provided, this argument must be a bool value. If not specified, includeNulls defaults to False. For the sum function, this argument does not have any impact as Null values would contribute 0 to the sum.
  • The optional initialValue argument can be provided to specify the value at which the sum starts. If not specified, initialValue defaults to 0.
  • Example:

    #Configure fields
    #Calculate the sum of all "Sales" within the group and assign to the output field "SalesTotal"
    out1.SalesTotal = group.sum(fields.Sales)

    #Process Records
    #Update the aggregation and set the values to the output record
    out1 += in1

sum(field[, updatePredicate[, initialValue]])

  • Calculate the sum of all of the values for the specified field within the group.
  • The field must be a numeric type (int, long, float).
  • The result of the aggregate will be either a long (for int or long input fields) or a float.
  • An updatePredicate can be provided to specify whether or not the value of the field in the current record should contribute to the sum based on that field value and the current value of the sum. If provided, the updatePredicate must be a function (provided in the form of a lambda function) and must match the requirements described in Advanced use of grouping functions.

  • The initialValue argument can be provided to specify the value at which the sum starts. If not specified, initialValue defaults to 0.
  • Example:

    #Configure fields
    #Calculate the sum of all "Sales" values within the group which are greater
    #than 1 million and assign to the output field "TotalBigSales"
    out1.TotalBigSales = group.sum(fields.Sales, lambda agg, newValue: fn.gt(newValue, 1000000))

    #Process Records
    #Update the aggregation and set the values to the output record
    out1 += in1

avg()

avg(field[, includeNulls])

  • Calculate the average (mean) of all of the values for the specified field within the group.
  • The field must be a numeric type (int, long, float).
  • The result of the aggregate will be a float.
  • The optional includeNulls argument can be provided to specify whether or not the Null fields are to be included in the average. If provided, this argument must be a bool value. If not specified, includeNulls defaults to False. If includeNulls is True, Null values are treated as the value 0 by the avg function.
  • Example:

    #Configure fields
    #Calculate the average "Age" within the group and assign to the output field "AvgAge"
    out1.AvgAge = group.avg(fields.Age)

    #Process Records
    #Update the aggregation and set the values to the output record
    out1 += in1

avg(field[, updatePredicate])

  • Calculate the average (mean) of all of the values for the specified field within the group.
  • The field must be a numeric type (int, long, float).
  • The result of the aggregate will be a float.
  • An updatePredicate can be provided to specify whether or not a value is to contribute to the average based on the value of the field in the current record and the current value of the sum used to calculate the average. If provided, the updatePredicate must be a function (provided in the form of a lambda function) and must match the requirements described in Advanced use of grouping functions.
  • Example:

    #Configure fields
    #Calculate the average "Age" within the group for everyone 18 years or older
    #and assign to the output field "AverageAdultAge"
    out1.AverageAdultAge = group.avg(fields.Age, lambda agg, newValue: fn.gte(newValue, 18))

    #Process Records
    #Update the aggregation and set the values to the output record
    out1 += in1

stdev()

stdev(field, isPopulation[, includeNulls])

  • Calculate the standard deviation of all of the values for the specified field within the group.
  • The field must be a numeric type (int, long, float).
  • The result of the aggregate will be a float.
  • The required isPopulation bool argument specifies whether or not the aggregate should calculate a population based standard deviation (when True), or a sample standard deviation (when False).
  • The optional includeNulls argument can be provided to specify whether or not the Null fields are to be included in the standard deviation calculation. If provided, this argument must be a bool value. If not specified, includeNulls defaults to False. If includeNulls is True, Null values are treated as the value 0 by the stdev function.
  • Example:

    #Configure fields
    #Calculate the sample standard deviation of the "Age" within the group
    #and assign to the output field "StdevAge"
    out1.StdevAge = group.stdev(fields.Age, False)

    #Process Records
    #Update the aggregation and set the values to the output record
    out1 += in1

     

stdev(field, isPopulation[, updatePredicate])

  • Calculate the standard deviation of all of the values for the specified field within the group.
  • The field must be a numeric type (int, long, float).
  • The result of the aggregate will be a float.
  • The required isPopulation bool argument specifies whether or not the aggregate should calculate a population based standard deviation (when True), or a sample standard deviation (when False).
  • An updatePredicate can be provided to specify whether or not a value is to contribute to the standard deviation based on the value of the field in the current record and the current value of the sum of distances from the mean used to calculate the standard deviation. If provided, the updatePredicate must be a function (provided in the form of a lambda function) and must match the requirements described in Advanced use of grouping functions.
  • Example:

    #Configure fields
    #Calculate the population standard deviation of the "Age" within the group for
    #everyone 18 years or older and assign to the output field "AdultAgeStdev"
    out1.AdultAgeStdev = group.stdev(fields.Age, True, lambda agg, newValue: fn.gte(newValue, 18))

    #Process Records
    #Update the aggregation and set the values to the output record
    out1 += in1

variance()

variance(field, isPopulation[, includeNulls])

  • Calculate the variance of all of the values for the specified field within the group.
  • The field must be a numeric type (int, long, float).
  • The result of the aggregate will be a float.
  • The required isPopulation bool argument specifies whether or not the aggregate should calculate a population based variance (when True), or a sample variance (when False).
  • The optional includeNulls argument can be provided to specify whether or not the Null fields are to be included in the variance calculation. If provided, this argument must be a bool value. If not specified, includeNulls defaults to False. If includeNulls is True, Null values are treated as the value 0 by the variance function.
  • Example:

    #Configure fields
    #Calculate the sample variance of the "Age" within the group and assign to
    #the output field "AgeVar"
    out1.AgeVar = group.variance(fields.Age, False)

    #Process Records
    #Update the aggregation and set the values to the output record
    out1 += in1

variance(field, isPopulation[, updatePredicate])

  • Calculate the variance of all of the values for the specified field within the group.
  • The field must be a numeric type (int, long, float).
  • The result of the aggregate will be a float.
  • The required isPopulation bool argument specifies whether or not the aggregate should calculate a population based variance (when True), or a sample variance (when False).
  • An updatePredicate can be provided to specify whether or not a value is to contribute to the variance based on the value of the field in the current record and the current value of the sum of distances from the mean used to calculate the variance. If provided, the updatePredicate must be a function (provided in the form of a lambda function) and must match the requirements described in Advanced use of grouping functions.
  • Example:

    #Configure fields
    #Calculate the population variance of the "Age" within the group for
    #everyone 18 years or older and assign to the output field "AdultAgeVar"
    out1.AdultAgeVar = group.variance(fields.Age, True, lambda agg, newValue: fn.gte(newValue, 18))

    #Process Records
    #Update the aggregation and set the values to the output record
    out1 += in1

min()

min(field[, includeNulls[, initialValue]])

  • Calculate the minimum value of the specified field within the group.
  • The field may be of any type.
  • The result of the aggregate will have the same type as the field specified.
  • The optional includeNulls argument can be provided to specify whether or not the Null fields are to be considered in the calculation of the minimum. If provided, this argument must be a bool value. If not specified, includeNulls defaults to False. The comparison used for the minimum calculation is the same as that used in the "lt" function in the fn module – meaning Null is less than any non-null value.
  • The initialValue argument can be provided to specify an initial minimum value. If not specified, there is no initial minimum.
  • Example:

    #Configure fields
    #Calculate the minimum "Salary" within the group, ignoring any Null values,
    #and assign to the output field "MinSalary"
    out1.MinSalary = group.min(fields.Salary, False)

    #Process Records
    #Update the aggregation and set the values to the output record
    out1 += in1

 

min(field[, updatePredicate[, initialValue]])

  • Calculate the minimum value of the specified field within the group.
  • The field may be of any type.
  • The result of the aggregate will have the same type as the field specified.
  • An updatePredicate can be provided to specify whether or not a value is to be considered a potential minimum based on the value of the field in the current record and the current minimum value. If provided, the updatePredicate must be a function (provided in the form of a lambda function) and must match the requirements described in Advanced use of grouping functions.
  • The initialValue argument can be provided to specify an initial minimum value. If not specified, there is no initial minimum.
  • Example:

    #Configure fields
    #Calculate the minimum positive "Profit" value within the group and assign to
    #the output field "MinProfit"
    out1.MinProfit = group.min(fields.Profit, lambda agg, newValue: fn.gt(newValue, 0))

    #Process Records
    #Update the aggregation and set the values to the output record
    out1 += in1

max()

max(field[, includeNulls[, initialValue]])

  • Calculate the maximum value of the specified field within the group.
  • The field may be of any type.
  • The result of the aggregate will have the same type as the field specified.
  • The optional includeNulls argument can be provided to specify whether or not the Null fields are to be considered in the calculation of the maximum. If provided, this argument must be a bool value. If not specified, includeNulls defaults to False. The comparison used for the maximum calculation is the same as that used in the "gt" function in the fn module – meaning Null is less than any non-null value.
  • The initialValue argument can be provided to specify an initial maximum value. If not specified, there is no initial maximum.
  • Example:

    #Configure fields
    #Calculate the largest "Profit" value within the group and assign to
    #the output field "MaxProfit"
    out1.MaxProfit = group.max(fields.Profit)

    #Process Records
    #Update the aggregation and set the values to the output record
    out1 += in1

 

max(field[, updatePredicate[, initialValue]])

  • Calculate the maximum value of the specified field within the group.
  • The field may be of any type.
  • The result of the aggregate will have the same type as the field specified.
  • An updatePredicate can be provided to specify whether or not a value is to be considered a potential maximum based on the value of the field in the current record and the current maximum value. If provided, the updatePredicate must be a function (provided in the form of a lambda function) and must match the requirements described in Advanced use of grouping functions.
  • The initialValue argument can be provided to specify an initial maximum value. If not specified, there is no initial maximum.
  • Example:

    #Configure fields
    #Calculate the oldest "Age" value within the group less than 18 and assign to
    #the output field "OldestChildAge"
    out1.OldestChildAge = group.max(fields.Age, lambda agg, newValue: fn.lt(newValue, 18))

    #Process Records
    #Update the aggregation and set the values to the output record
    out1 += in1

first()

first(field[, includeNulls[, initialValue]])

  • Return the first value of the specified field within the group.
  • The field may be of any type.
  • The result of the aggregate will have the same type as the field specified.
  • The optional includeNulls argument can be provided to specify whether or not the Null fields can be used in the result. If provided, this argument must be a bool value. If not specified, includeNulls defaults to True.
  • The initialValue argument can be provided to specify an initial first value. If not specified, there is no initial value.
  • Example:

    #Configure fields
    #Retrieve the first "SalePrice" value within the group,
    #ignoring any Null values, and assign to the output field "FirstSalePrice"
    out1.FirstSalePrice = group.first(fields.SalePrice, False)

    #Process Records
    #Update the aggregation and set the values to the output record
    out1 += in1

 

first(field[, updatePredicate[, initialValue]])

  • Return the first value of the specified field within the group.
  • The field may be of any type.
  • The result of the aggregate will have the same type as the field specified.
  • An updatePredicate can be provided to specify whether or not a value is eligible to be the "first" value based on the value of the field in the current record and the current first value.
  • The use of an updatePredicate for the first function is effectively telling the function to return the first field within the group which matches the criteria specified in the update predicate. If provided, the updatePredicate must be a function (provided in the form of a lambda function) and must match the requirements described in Advanced use of grouping functions.
  • The initialValue argument can be provided to specify an initial first value. If not specified, there is no initial value.
  • Example:

    #Configure fields
    #Retrieve the first positive "Price" value within the group and assign to
    #the output field "FirstPlusPrice"
    out1.FirstPlusPrice = group.first(fields.Price, lambda agg, newValue: fn.gt(newValue, 0))

    #Process Records
    #Update the aggregation and set the values to the output record
    out1 += in1

last()

last(field[, includeNulls[, initialValue]])

  • Return the last value of the specified field within the group.
  • The field may be of any type.
  • The result of the aggregate will have the same type as the field specified.
  • The optional includeNulls argument can be provided to specify whether or not the Null fields can be used in the result. If provided, this argument must be a bool value. If not specified, includeNulls defaults to True.
  • The initialValue argument can be provided to specify an initial last value. If not specified there is no initial value.
  • Example:

    #Configure fields
    #Retrieve the last "SalePrice" value within the group, ignoring any Null
    #values, and assign to the output field "LastSalePrice"
    out1.LastSalePrice = group.last(fields.SalePrice, False)

    #Process Records
    #Update the aggregation and set the values to the output record
    out1 += in1

 

last(field[, updatePredicate[, initialValue]])

  • Return the last value of the specified field within the group.
  • The field may be of any type.
  • The result of the aggregate will have the same type as the field specified.
  • An updatePredicate can be provided to specify whether or not a value is eligible to be the "last" value based on the value of the field in the current record and the current last value.
  • The use of an updatePredicate for the last function is effectively telling the function to return the last field within the group which matches the criteria specified in the update predicate. If provided, the updatePredicate must be a function (provided in the form of a lambda function) and must match the requirements described in Advanced use of grouping functions.
  • The initialValue argument can be provided to specify an initial last value. If not specified, there is no initial value.
  • Example:

    #Configure fields
    #Retrieve the last positive "Price" value within the group and assign to the
    #output field "LastPlusPrice"
    out1.LastPlusPrice = group.last(fields.Price, lambda agg, newValue: fn.gt(newValue, 0))

    #Process Records
    #Update the aggregation and set the values to the output record
    out1 += in1

joinStrings()

joinStrings(field[, includeNulls[, delim[, initialValue]]])

  • Concatenate the strings in the specified field within the group.
  • The field must be a str or unicode type.
  • The result of the aggregate will have the same type as the field specified.
  • The optional includeNulls argument can be provided to specify whether or not the Null fields can be used in the result. If provided, this argument must be a bool value. If not specified, includeNulls defaults to False. If Null values are included, they are treated as the empty string ('').
  • The delim argument can be provided to specify the delimiter to use when joining the string values. The delim argument, if supplied, must be a str or unicode type. If not supplied, the default delimiter "," is used.
  • The initialValue argument can be provided to specify an initial string value. If not specified there is no initial value. The initialValue argument, if supplied, must be a str or unicode type. This value is effectively prepended to the joinStrings result.
  • Example:

    #Configure fields
    #Concatenate all of the non-Null "State" fields within the group together,
    #separated by the semicolon ';' character, and prepended with 'Locations:'.
    #Assign this to the output field "States".
    out1.States = group.joinStrings(fields.State, False, ';', 'Locations:')

    #Process Records
    #Update the aggregation and set the values to the output record
    out1 += in1

 

joinStrings(field[, updatePredicate[, delim[, initialValue]]])

  • Concatenate the strings in the specified field within the group.
  • The field must be a str or unicode type.
  • The result of the aggregate will have the same type as the field specified.
  • An updatePredicate can be provided to specify whether or not a value should be included in the concatenated string based on the value of the field in the current record and the current last value. If provided, the updatePredicate must be a function (provided in the form of a lambda function) and must match the requirements described in Advanced use of grouping functions.
  • The delim argument can be provided to specify the delimiter to use when joining the string values. The delim argument, if supplied, must be a str or unicode type. If not supplied, the default delimiter "," is used.
  • The initialValue argument can be provided to specify an initial string value. If not specified, there is no initial value. The initialValue argument, if supplied, must be a str or unicode type. This value is effectively prepended to the joinStrings result.
  • Example:

    #Configure fields
    #Concatenate all of the 3-letter "ProductCode" fields within the group
    #together, separated by the colon ':' character.
    #Assign this to the output field "ProductCodes".
    out1.ProductCodes = group.joinStrings(fields.ProductCode, lambda agg, newValue: newValue is not Null and len(newValue) == 3, ':')

    #Process Records
    #Update the aggregation and set the values to the output record
    out1 += in1

 

In addition, the following functions are also available on the group function object, though they do not perform any aggregation:

appliesTo()

appliesTo(aggregateName, valueType)

  • Returns (bool) whether or not the specified aggregate function is applicable for the specified valueType.
  • The aggregateName argument must be one of the aggregate function names ('count', 'sum', 'avg', 'stdev', 'variance', 'min', 'max', 'first', 'last').
  • The valueType argument must be a Python type.
  • Example:
    #ConfigureFields
    #Pass through all input fields
    out1 += in1
    
    #Check if the "stdev" function would be applicable for the input field "Data"
    #based on the type of the field "Data"
    stdevApplies = group.appliesTo('stdev', fields.Data.type())

    #If this is applicable, then calculate the sample standard deviation on that
    #field and assign to the output field "DataStdev"
    if stdevApplies:
        out1.DataStdev = group.stdev(fields.Data, False)
    
    #Process Records
    #Update the aggregation and set the values to the output record
    out1 += in1

keyFields()

keyFields(keyFieldsPropName, keyFieldsRunTimePropName, targetInput)

  • Returns a list of field metadata for all of the fields referenced in the specified record key property on the specified input.
  • A "record key" property must either be a simple field list, or a field list with optional substr, lower, upper and desc functions applied – as can be used in GroupBy and SortBy style properties.
  • The keyFieldsPropName argument must be a str or unicode value. This should reference the name of the record key property on the node. The keyFieldsPropName supplied is not used to retrieve the property and is only used for error messages if there are any errors retrieving the key fields.
  • The keyFieldsRunTimePropName argument must be a str or unicode value. This must reference the run time property name of the record key property on the node.
  • The targetInput argument must be an int value. This specifies the index (0-indexed) of the input on which the fields referenced in the key fields property are to be searched.
  • Example:
    #Configure fields
    #Configure to output all the fields referenced on the first input by the
    #property "GroupBy", which has the runtime property name "ls.brain.nodes.transform.groupBy"
    out1 += group.keyFields('GroupBy', 'ls.brain.nodes.transform.groupBy', 0)
    
    #Process Records
    #Copy over the fields referenced in the GroupBy and set up on the output in ConfigureFields
    
    out1 += in1

Advanced use of grouping functions

This section covers more advanced uses of the grouping functions, including the details of the format and function of the updatePredicate. If you have not yet covered the basics, see Grouping functions.

 

When calling one of the group functions, the result is an aggregation object. The aggregation object maintains the internal state of the aggregation value.

The aggregation object can then be presented with observations to update the internal aggregate value. Whenever a new observation is provided to the aggregation object, the object first checks if the new value should be included in the aggregation. If the aggregation object has been configured using the includeNulls argument, the new value is checked to verify whether or not it is Null. If the aggregation object has been configured using the updatePredicate argument, the predicate function is called to determine whether or not the value should be included in the aggregation. Based on the result of these checks, if the new value is to be used as an observation in the aggregation, then the aggregate value is updated.

 

When an aggregation object is constructed within the ConfigureFields property and assigned to an output field on the output MetadataBuilder object, then most of the aggregation work is done implicitly. In this case, within the ProcessRecords script, each time the output record is updated with the input record (using e.g. out1 += in1) the node finds all aggregation objects that were defined on the MetadataBuilder. For each of these objects, it locates the current value on the input record corresponding to the input field configured in the aggregation and provides that value as an observation to the aggregation object. Then, the current value of the aggregation is set on the corresponding output field in the output record.

UpdatePredicate function

When an aggregation object is constructed using an updatePredicate argument, that argument must be a function. The function must take two arguments.

The first argument is the current value of the aggregation. For example, in the "max" function the current value would be the maximum value seen prior to the new observation within the current group. The type of the aggregation value depends on both the aggregation function used and the type of field over which the aggregation is defined.

The second argument is the value of the new observation. The type of the new observation will be the same as the type of the field over which the aggregation is defined.

The function must return a bool value specifying whether or not the new value should be used as an observation in the aggregate.

Since the updatePredicate argument must be provided as a function, it is typically supplied as a lambda function to the group aggregate function.
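
As an illustrative sketch (the field name Amount and output field PositiveTotal are assumptions, not taken from the product documentation), a predicate that only accepts non-Null, positive values could be defined in ConfigureFields and passed to an aggregate; ProcessRecords would still update the aggregate with out1 += in1 as usual:

#ConfigureFields
#agg is the current aggregate value; newValue is the field value from the current record.
#Return True to include newValue as an observation in the aggregate.
includePositive = lambda agg, newValue: newValue is not Null and fn.gt(newValue, 0)

#Equivalent to passing the lambda inline, as in the earlier examples
out1.PositiveTotal = group.sum(fields.Amount, includePositive)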

Constructing aggregates within ProcessRecords

The most common way to use the aggregate object is to simply assign it to an output field in the ConfigureFields property (see the examples in the Grouping functions section). When using this mechanism, as long as ProcessRecords contains the script to use the input record to update the aggregation (using e.g. out1 += in1), then the node will update the aggregation variables, setting the aggregate values on the output fields and resetting the aggregation at the end of a group.

However, when constructing the aggregation within the ProcessRecords script as opposed to within ConfigureFields, or when you wish to provide the update values to the aggregate directly (as opposed to using input fields), the approach is a little different.

All of the overloads mentioned in the previous section which take a "field" argument can also be constructed using a "type" argument.

When using a type argument, the specified type is the type of the resulting aggregation value. If assigned to an output field, this will be used as the type of the output field. The type must be valid for the type of aggregation used, e.g. the avg function will only accept a float type.

Using this mechanism, as there is no input field specified, the function must be provided with the update values which are to contribute to the aggregation. The includeNulls and updatePredicate arguments will still ensure that only the desired values will contribute to the aggregation, and the aggregate value will be reset by the node on group end as long as the update is invoked for each input record.

The following shows an example of how to use an aggregation directly within ProcessRecords.

Here, rather than averaging an existing input field directly, we want the average duration in seconds between two datetime fields "start" and "end".

ConfigureFields

#map all of the fields from input pin in1 onto output pin out1
out1 += in1

#setup the output field for the average duration
out1.avgDurationSeconds = float

ProcessRecords

#setup the aggregate on first exec
if node.firstExec:
    durationAgg = group.avg(float, False)

#copy all records over
out1 += in1

#Calculate the value to update in the aggregate
duration = Null
if fields.end is not Null and fields.start is not Null:
    duration = (fields.end - fields.start).total_seconds()

#Update the aggregate – must be called for each record if reset on end of group is to work
durationAgg += duration

#Set the agg value on the output field
out1.avgDurationSeconds = durationAgg