API and script bindings

For both the ConfigureFields and ProcessRecords Python scripts the following variables are available:

Variable name Type Description
node Node The node object used for logging and failure control, properties access and explicit record writing.
inputs

When used in ConfigureFields, "inputs" is a collection of Metadata.

When used in ProcessRecords, "inputs" is a collection of Record objects.

Please note that in the Transform node these lists will only contain one item as there is only one input.

Access via: inputs['InputName'] or inputs[inputIndex]

outputs

When used in ConfigureFields, "outputs" is a collection of MetadataBuilder objects.

When used in ProcessRecords, "outputs" is a collection of output Record objects.

Access via: outputs['OutputName'] or outputs[outputIndex]
patterns Pattern

Used in ConfigureFields to build patterns to search for input fields.

Can also be used in record key properties (i.e. in the Advanced tab of the Match Keys, GroupBy and SortBy properties).

fields

When used in ConfigureFields, "fields" contains the metadata for all of the fields in the inputs. In the Transform and Aggregate nodes, this is just one input, but in other nodes, this would contain references to all the fields against all inputs (unless explicitly restricted to a particular input).

When used in ProcessRecords, "fields" references the fields on input records, as opposed to field metadata.

Access the field metadata using fields['fieldName'] or fields.fieldName. In this example, the metadata for the input field "fieldName" is referenced.

 

fields.todict() can be used to construct a dictionary representation of the fields.

If there are multiple fields with the same name across the inputs to the node, any attempt to reference that field directly, or to use the todict() function will raise an error due to an ambiguous field reference. Otherwise, the key is the name of the field, and the value is the field metadata (in ConfigureFields) or the value of the field on the input record (in ProcessRecords).

fn Null-safe functions module. A set of Null-safe functions for data comparisons and string manipulation.
group Grouping module.

A set of functions for easy aggregation of data.

Note: The aggregation functions are only applicable to nodes which define some form of data grouping. For example, they can be used in the GroupBy property of the Aggregate and Transform nodes.

The aggregation functions are not available on nodes that do not have a GroupBy property, including the Merge node.

In addition, each input and output will be directly bound as a variable available within the Python script (e.g. in1, out1), as long as the input or output name meets the following criteria:

  • Is a valid python identifier (e.g. does not contain spaces).
  • Does not conflict with any of the other bound variables (node, inputs, outputs, patterns).
  • Does not conflict with any Python built-in type or keyword (e.g. print, raise, str, int).
  • Does not conflict with an imported module provided with Data360 Analyze.

If the input or output does not meet the above criteria, it must be accessed from the corresponding inputs or outputs collection, respectively.
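
The naming criteria above can be approximated with a small check. This is only a sketch: the helper name can_bind_directly and the RESERVED_BINDINGS set are hypothetical, it is not the check Data360 Analyze itself performs, and the test for modules provided with Data360 Analyze is left out because that list is not given here.

```python
import keyword
import re

try:
    import builtins  # Python 3
except ImportError:  # Python 2
    import __builtin__ as builtins

# The bound variable names listed in the criteria above (hypothetical constant).
RESERVED_BINDINGS = {"node", "inputs", "outputs", "patterns"}

# Plain ASCII identifier: a letter or underscore, then letters, digits, underscores.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def can_bind_directly(name):
    """Return True if an input/output name passes the checks sketched here."""
    if not _IDENTIFIER.match(name):
        return False  # for example, the name contains spaces
    if name in RESERVED_BINDINGS:
        return False  # clashes with another bound variable
    if keyword.iskeyword(name) or hasattr(builtins, name):
        return False  # for example: raise, str, int
    return True
```

A name that fails this kind of check, such as 'my input', would still be reachable as inputs['my input'].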

The bound input and output variables will have the same type as the corresponding entry in the inputs/outputs collection – Metadata/Record for input in ConfigureFields/ProcessRecords respectively, and MetadataBuilder/Record for output in ConfigureFields/ProcessRecords respectively.

Each of these variables will be bound into ConfigureFields and then bound into ProcessRecords prior to each invocation of the ProcessRecords script. The scope of the local variable space is shared between the script in ConfigureFields and ProcessRecords. This means that any variable, function definition or import made within ConfigureFields will be available for use within the ProcessRecords script so long as the name does not conflict with one of the variables that are directly bound into ProcessRecords. This means, for example, that you can define a local variable "myVariable" in ConfigureFields then use that variable in the ProcessRecords script. However, if there is an input or output on the node named "myVariable", then the variable will be overwritten by the input/output bound into ProcessRecords.
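
One way to picture the shared scope is a single namespace dictionary in which both scripts run. The sketch below mimics this with plain Python exec; it is an illustration of the scoping rule only, not how Data360 Analyze is implemented, and the script strings and the bind helper are hypothetical.

```python
# A single dictionary stands in for the scope shared by both scripts.
namespace = {}

# Stand-in for a ConfigureFields script: defines a variable and a function.
configure_fields = """
myVariable = 10

def double(x):
    return 2 * x
"""

# Stand-in for a ProcessRecords script: uses names defined in ConfigureFields.
process_records = "result = double(myVariable)"


def bind(ns, **bound):
    """Mimic the node binding its variables into the scope before a script runs."""
    ns.update(bound)


bind(namespace, in1={"amount": 21})
exec(configure_fields, namespace)

bind(namespace, in1={"amount": 21})
exec(process_records, namespace)
first_result = namespace["result"]  # double(10)

# If an input were itself named "myVariable", binding it would replace the
# value that the ConfigureFields script assigned.
bind(namespace, myVariable={"amount": 21})
overwritten = namespace["myVariable"]
```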

 

For more information on the different object types, see the following sections.

Node objects

   
Members

properties

  • Used to access node properties.
  • Type: Properties

logger

  • Used to write messages to the node log.
  • Type: Logger

firstExec

  • Returns True for the first time the ProcessRecords script is invoked (on the first input record), thereafter returns False.
  • Type: bool

lastExec

  • Returns True for the last time the ProcessRecords script is invoked (on the last input record), otherwise returns False.
  • Type: bool

firstInGroup

  • Returns True each time the ProcessRecords script is invoked on a record that starts a new group (based on the GroupBy property); otherwise returns False. If no GroupBy is specified, this is the same as firstExec.
  • Type: bool

lastInGroup

  • Returns True each time the ProcessRecords script is invoked on a record that ends a group (based on the GroupBy property); otherwise returns False. If no GroupBy is specified, this is the same as lastExec.
  • Type: bool

execCount

  • Returns a count of the current record number in the input data set.
  • Type: long
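
The relationship between the four boolean members can be sketched in plain Python. The function exec_flags below is hypothetical and only models the descriptions above; it assumes the records arrive already ordered so that each group is contiguous.

```python
def exec_flags(group_keys):
    """Return one dict of flag values per record, given each record's group key."""
    flags = []
    last_index = len(group_keys) - 1
    for i, key in enumerate(group_keys):
        flags.append({
            "firstExec": i == 0,
            "lastExec": i == last_index,
            # A group starts when the key differs from the previous record's key.
            "firstInGroup": i == 0 or group_keys[i - 1] != key,
            # A group ends when the key differs from the next record's key.
            "lastInGroup": i == last_index or group_keys[i + 1] != key,
        })
    return flags


# Three records: two in group "A", one in group "B".
flags = exec_flags(["A", "A", "B"])
```

With a single group key for every record (the no-GroupBy case), firstInGroup and lastInGroup in this model reduce to firstExec and lastExec.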
Methods

write(outputName|outputIndex, outputRecord)

  • Writes the given record to the specified node output.
  • The output must be specified as the name (str) of the output or the index (int) of the output.
  • Arguments:
    outputName – Type: str
    outputIndex – Type: int
    outputRecord – Type: Record
  • Return type: None

fail()

  • Marks the node as failed.
  • For readability, recommended use is: raise node.fail()
  • Indicates to the node that an error has occurred, and all necessary logging has been performed. No additional entries will be written to the log. Therefore, recommended usage is to first use node.logger.error(<message>) prior to calling node.fail()
  • Raises an exception when called. It is also declared to return the exception, which is what allows the raise node.fail() form.
  • Return type: Exception
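
The recommended log-then-fail pattern can be tried outside the product with stand-in objects. In the sketch below, StubNode, StubLogger and NodeFailure are hypothetical stand-ins (the real exception type is not named in this document); only the node.logger.error(...) and raise node.fail() calls mirror the API described above.

```python
class NodeFailure(Exception):
    """Hypothetical exception type used only by this sketch."""


class StubLogger(object):
    """Stand-in logger that records messages instead of writing a node log."""

    def __init__(self):
        self.messages = []

    def error(self, message):
        self.messages.append(("ERROR", message))


class StubNode(object):
    """Stand-in for the bound 'node' variable, for illustration only."""

    def __init__(self):
        self.logger = StubLogger()

    def fail(self):
        # Mirrors the described behavior: calling fail() itself raises.
        raise NodeFailure()


def check_amount(node, amount):
    """Log an error and fail the node if the amount is negative."""
    if amount < 0:
        node.logger.error("Negative amount: %s" % amount)
        raise node.fail()  # 'raise' is for readability; fail() raises anyway
    return amount
```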

Metadata objects

Metadata objects are a container of FieldMetadata.

The metadata object largely emulates a Python dict type with the key being the name of the field and the value, the field metadata.

The __len__, __getitem__, __iter__, __reversed__ and __contains__ functions are implemented on the Metadata type as defined in the Python documentation available at: https://docs.python.org/2/reference/datamodel.html#emulating-container-types* .

This means that along with being able to use the reversed and contains methods directly, the following operations are supported from the Python dict interface as defined in https://docs.python.org/2/library/stdtypes.html#mapping-types-dict*:

  • len(d)
  • d[key]
  • key in d
  • key not in d
  • iter(d)
   
Members

<fieldName>

  • Each field on the metadata is accessible directly as a member as long as it does not contain spaces.
  • The field metadata can be used in the += and -= operators on a MetadataBuilder to construct the output metadata. In addition, FieldMetadata can be directly assigned to an output field on a metadata builder, for example by using the following syntax: out1.newField = in1.inputField
  • Type: FieldMetadata

 

The Metadata itself is also just a collection of fields such that individual fields can be accessed using either of the following forms:

  • metadata['fieldName']
  • metadata[fieldIndex]
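
To make the container behavior concrete, the following sketch implements the same special methods on a small class. FieldContainer is hypothetical and is not the Metadata implementation; in particular, iterating over field names (as a dict iterates over its keys) is an assumption.

```python
class FieldContainer(object):
    """Hypothetical read-only container keyed by field name, in field order."""

    def __init__(self, names, values):
        self._names = list(names)
        self._values = dict(zip(self._names, values))

    def __len__(self):
        return len(self._names)

    def __getitem__(self, key):
        # Accept either a field name or a positional index.
        if isinstance(key, int):
            key = self._names[key]
        return self._values[key]

    def __iter__(self):
        return iter(self._names)

    def __reversed__(self):
        return reversed(self._names)

    def __contains__(self, key):
        return key in self._values


d = FieldContainer(["id", "name"], [int, str])
```

With these methods in place, len(d), d['name'], d[0], 'id' in d, iter(d) and reversed(d) all work on the object.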
Methods

todict()

  • Constructs and returns a new dictionary representing the Metadata.
  • The keys in the dictionary are the field names.
  • The values in the dictionary are the corresponding FieldMetadata objects.
  • Return type: dictionary

intersection(other)

  • Generally, the intersection method is only useful on nodes with multiple inputs (e.g. the Merge node).
  • The "other" argument must be either a Metadata or a MetadataBuilder object.
  • When a MetadataBuilder object is provided as the "other" argument, the intersection operation operates on all of the output fields which exist on the MetadataBuilder at that time.
  • A field within this object is considered to be part of the intersection if there is a field in the "other" object with the same (case-insensitive) field name.
  • The list of FieldMetadata returned can be used to update an output MetadataBuilder or provided as an argument to the rename method on FieldPattern objects for renaming purposes.
  • Returns a list of FieldMetadata objects containing all of the fields in this metadata which are also in the provided object.
  • Arguments:
    other – Type: Metadata or MetadataBuilder
  • Return type: list
  • Example:

     

    # Add all of the fields from the input 'leftInput' to the output 'matches'
    matches += leftInput

    # Locate all fields in input 'rightInput' which are also in 'leftInput'
    conflicts = rightInput.intersection(leftInput)

    # Add these fields to the "matches" output, renaming them to have the prefix "Right."
    matches += patterns.all(conflicts).rename('Right.$0')

difference(other)

  • Generally, the difference method is only useful on nodes with multiple inputs (e.g. the Merge node).
  • The "other" argument must be either a Metadata or a MetadataBuilder object.
  • When a MetadataBuilder object is provided as the "other" argument, the difference operation operates on all of the output fields which exist on the MetadataBuilder at that time.
  • A field within this object is considered to be part of the difference if there is no field in the "other" object with the same (case-insensitive) field name.
  • The list of FieldMetadata returned can be used to update an output MetadataBuilder or provided as an argument to the rename method on FieldPattern objects for renaming purposes.
  • Returns a list of FieldMetadata objects containing all of the fields in this metadata which are not in the provided object.
  • Arguments:
    other – Type: Metadata or MetadataBuilder
  • Return type: list
  • Example:

     

    # Add all of the fields from the input 'leftInput' to the output 'matches'
    matches += leftInput

    # Add all of the fields from the input 'rightInput' which are not on the
    # 'leftInput' to the output 'matches'
    matches += rightInput.difference(leftInput)
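
The case-insensitive name matching used by intersection and difference can be modelled on plain lists of field names. This is a sketch of the matching rule only: the two functions below are hypothetical and work on strings, whereas the real methods return FieldMetadata objects.

```python
def intersection(these, others):
    """Names in 'these' that also appear in 'others', ignoring case."""
    other_names = {name.lower() for name in others}
    return [name for name in these if name.lower() in other_names]


def difference(these, others):
    """Names in 'these' that do not appear in 'others', ignoring case."""
    other_names = {name.lower() for name in others}
    return [name for name in these if name.lower() not in other_names]


right_input = ["ID", "Amount", "Region"]
left_input = ["id", "Customer"]

conflicts = intersection(right_input, left_input)
only_right = difference(right_input, left_input)
```

Here 'ID' counts as a conflict with 'id' because the comparison ignores case, so only 'Amount' and 'Region' remain in the difference.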

 

FieldMetadata objects

   
Methods

name()

  • Returns the name of the field.
  • Return type: str

 

type()

  • Returns the type of the field (for example, str).
  • Return type: Python type

MetadataBuilder objects

The metadata builder is used for building an output metadata.

The metadata builder object largely emulates a Python dict type with the key being the name of the field and the value, the field metadata.

The __len__, __getitem__, __iter__, __reversed__, __setitem__, __delitem__ and __contains__ functions are implemented on the MetadataBuilder type as defined in: https://docs.python.org/2/reference/datamodel.html#emulating-container-types*

This means that along with being able to use the reversed and contains methods directly, the following operations are supported from the Python dict interface as defined in https://docs.python.org/2/library/stdtypes.html#mapping-types-dict*:

  • len(d)
  • d[key]
  • d[key] = value
  • del d[key]
  • key in d
  • key not in d
  • iter(d)

The node takes care of constructing the output metadata from the builder, therefore there are generally no members and only a few methods required on the metadata builder. However, there are various useful operations, as follows:

   
Operations

metadataBuilder = metadataBuilder + obj

metadataBuilder += obj

  • Configures the MetadataBuilder to add all of the fields from the provided obj to the output metadata.
  •  Allowable obj types are:
    • Metadata – uses all fields from the provided input metadata
      • out1 += in1
    • String – treats the string as a pattern and includes all input fields matching that pattern
      • out1 += 'Account*'
    • FieldMetadata – includes the provided field
      • out1 += in1.Account
    • FieldPattern (wildcard/regex/all pattern) – adds all input fields matching the pattern, with optional renaming of the output fields via a rename pattern
      • out1 += patterns.regex('.*?(due|present).*?Date')
      • out1 += patterns.all(in2).rename('Right.\g<0>')
    • List – adds all elements from within the list type provided, where each element must be a type that can be appended to the metadata builder
      • out1 += [in1.name, in1.address]
      • out1 += in1.difference(in2)
    • Dictionary – adds each of the elements of the dictionary, where the key must be the output field name to add, and the value must be either a FieldMetadata or a type object
      • out1 += {'newField' : str, 'newField2' : in1.inputField}

Note: When using the += operator, an error will be raised if any of the fields being added to the output would have the same (case-insensitive) output field name as one which already exists.

When adding a new field, if a field with the same (case-insensitive) name already exists, to remove the existing field and add the new field, explicitly name the new field by using the following syntax:
out1.<fieldname> = obj

 

metadataBuilder = metadataBuilder - obj

metadataBuilder -= obj

  • Configures the MetadataBuilder to remove all of the fields in the provided obj from the output metadata.
  • The allowable types and operation follow the same rules as the '+' and '+=' cases outlined above.

 

metadata.<name> = type

  • Configures the MetadataBuilder to add a new field with the specified name and type
    • out1.newField = str

 

metadata.<name> = FieldMetadata

  • Sets up the mapping from the provided input field metadata to the new output field
    • out1.newField = in1.oldField
Methods

todict()

  • Constructs and returns a new dictionary representing the MetadataBuilder.
  • The keys in the dictionary are the field names.
  • The values in the dictionary are the corresponding FieldMetadata objects.
  • Return type: dictionary

 

Record objects

Record objects are a container of field values.

The record object largely emulates a Python dict type, with the key being the name of the field and the value being the value of that field on the record.

The __len__, __getitem__, __iter__, __reversed__, __setitem__ and __contains__ functions are implemented on the Record type as defined in: https://docs.python.org/2/reference/datamodel.html#emulating-container-types*

This means that along with being able to use the reversed and contains methods directly, the following operations are supported from the Python dict interface as defined in https://docs.python.org/2/library/stdtypes.html#mapping-types-dict*:

  • len(d)
  • d[key]
  • d[key] = value
  • key in d
  • key not in d
  • iter(d)
   
Members

<fieldName>

  • The value of each field on the record is accessible directly as a member as long as it does not contain spaces.

The Record itself is also just a collection of field values such that individual field values can be accessed using either of the following forms:

  • record['fieldName']
  • record[fieldIndex]
Operations

record = record + obj

record += obj

  • Operation only valid on output records. Input records are read-only.
  • Configures the record to add all the field values from the provided obj.
  •  Allowable obj types are:
    • Record – sets all of the fields on the output record from the provided input record. The mapping from the input corresponding to the specified record must have been defined in the metadata builder for the output corresponding to this record
      • out1 += in1
    • Dictionary – adds each of the elements of the dictionary, where the key must be the name of the output field and the value is the value to set. All of the specified fields must have been set up in the metadata builder for the output corresponding to this record
      • out1 += {'myStrField' : 'foo', 'myIntField' : 3}

 

record.fieldName = value

record['fieldName'] = value

record[fieldIndex] = value

  • Sets the specified field (via index or name) to the provided value.
Methods

metadata()

  • Returns the metadata for the record.
  • Return type: Metadata

todict()

  • Constructs and returns a new dictionary representing the Record.
  • The keys in the dictionary are the field names.
  • The values in the dictionary are the corresponding field values on the Record.
  • Return type: dictionary

 

Logger objects

   
Methods

debug(message)

  • Writes a debug message to the node log.
  • Arguments:
    message – Type: str or unicode
  • Return type: None

info(message)

  • Writes an info message to the node log.
  • Arguments:
    message – Type: str or unicode
  • Return type: None

warn(message)

  • Writes a warning message to the node log.
  • Arguments:
    message – Type: str or unicode
  • Return type: None

error(message)

  • Writes an error message to the node log.
  • Arguments:
    message – Type: str or unicode
  • Return type: None

Properties objects

   
Methods

All property methods work by taking both the Run Time Property name of the property and the property name.

The node retrieves the property via the Run Time Property name; however, any errors on property retrieval are reported against the (often more user-friendly) property name.

 

isSet(propName, runtimePropName)

  • Returns (bool) whether or not the specified property is set.
  • Arguments:
    propName – Type: str or unicode
    runtimePropName – Type: str or unicode
  • Return type: bool

 

getString(propName, runtimePropName[, default])

  • Returns the string value of the specified property.
  • If the property does not exist and no default is provided an error is raised, otherwise the default is returned.
  • Arguments:
    propName – Type: str or unicode
    runtimePropName – Type: str or unicode
    default – Type: str or unicode
  • Return type: str

getInt(propName, runtimePropName[, default])

  • Returns the integer value of the specified property.
  • If the property does not exist and no default is provided an error is raised, otherwise the default is returned.
  • If the property exists but is not a valid integer, an error is raised.
  • Arguments:
    propName – Type: str or unicode
    runtimePropName – Type: str or unicode
    default – Type: int
  • Return type: int

getBool(propName, runtimePropName[, default])

  • Returns the boolean value of the specified property.
  • If the property does not exist and no default is provided an error is raised, otherwise the default is returned.
  • This will treat any value matching "true" case-insensitively as True. Any other value will be treated as False.
  • Arguments:
    propName – Type: str or unicode
    runtimePropName – Type: str or unicode
    default – Type: bool
  • Return type: bool
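
The default handling and the "true"-only rule described for getBool can be sketched against a plain dictionary of raw string values. The get_bool function and the props dictionary below are hypothetical and do not use the real Properties object or its two-name signature; they only model the rules stated above.

```python
_NO_DEFAULT = object()  # sentinel, so that any value can be passed as a default


def get_bool(properties, name, default=_NO_DEFAULT):
    """Look up 'name' in a plain dict of raw string property values."""
    if name not in properties:
        if default is _NO_DEFAULT:
            raise KeyError("Property not set: %s" % name)
        return default
    # Only "true" (in any letter case) is True; "yes", "1" and so on are False.
    return properties[name].lower() == "true"


props = {"Enabled": "TRUE", "Strict": "yes"}
```

Note that under this rule a value such as "yes" does not raise an error; it is silently treated as False.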

Patterns objects

   
Methods

The Patterns object has simple methods for constructing and returning FieldPattern objects which can then be provided to the MetadataBuilder += and -= operators to include or exclude input fields that match the pattern.

The Patterns object can also be used in record key properties (in the Advanced tab of the Match Keys, GroupBy and SortBy properties). This allows you to specify multiple input fields on which to join, group or sort the input data based on fields that match the pattern. For example, in the Aggregate node you can specify a wildcard pattern in the Advanced tab of the GroupBy property to group the input data by all fields that match the pattern. Note: If a pattern is specified it must match at least one input field name.

 

wildcard(pattern[, metadata])

  • Constructs and returns a FieldPattern which matches fields in the specified input metadata – or across all input metadata if no metadata is provided - using the provided wildcard pattern.
  • The "pattern" argument must be a string pattern and must be a valid wildcard pattern as can be used by the Python "fnmatch" module.
  • The "metadata" argument must be a Metadata, MetadataBuilder object or a list of FieldMetadata objects.
  • Arguments:
    pattern – Type: str
    metadata – Type: Metadata, MetadataBuilder or list of FieldMetadata
  • Return type: FieldPattern
  • Example:

     

    # Add all fields from input 'in1' which are prefixed with 'Total' to output 'out1'
    out1 += patterns.wildcard('Total*', in1)
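
Since wildcard patterns follow the Python fnmatch module, a pattern can be checked against sample field names before it is used in a node. The field names below are made up for illustration, and the use of the case-sensitive fnmatchcase function is an assumption about how matching treats letter case.

```python
import fnmatch

# Hypothetical field names for illustration.
field_names = ["TotalSales", "TotalCost", "SubTotal", "Region"]

# 'Total*' only matches names that start with 'Total'.
matched = [name for name in field_names if fnmatch.fnmatchcase(name, "Total*")]

# '*Total*' matches 'Total' anywhere in the name.
anywhere = [name for name in field_names if fnmatch.fnmatchcase(name, "*Total*")]
```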

all([metadata])

  • Constructs and returns a FieldPattern to match all fields in the specified input metadata – or match all input fields across all input metadata if no metadata argument is provided.
  • The "metadata" argument must be a Metadata, MetadataBuilder object or a list of FieldMetadata objects.
  • Arguments:
    metadataType: Metadata, MetadataBuilder or FieldMetadata
  • Return type: FieldPattern
  • Example:

     

    # Add all fields from all inputs
    out1 += patterns.all()

regex(pattern[, metadata])

  • Constructs and returns a FieldPattern which matches fields in the specified input metadata – or across all input metadata if no metadata is provided - using the provided regular expression.
  • The "pattern" argument must be a valid string regular expression pattern as could be used in the Python "re" module.
  • The "metadata" argument must be a Metadata, MetadataBuilder object or a list of FieldMetadata objects.
  • Arguments:
    pattern – Type: str
    metadata – Type: Metadata, MetadataBuilder or list of FieldMetadata
  • Return type: FieldPattern
  • Example:

     

    # Add all fields from input 'in1' which have 'Total' anywhere in their name
    # followed by a whitespace character
    out1 += patterns.regex('.*?Total\\s.*', in1)
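
The regular expression from the example can be tried directly with the Python re module. The field names below are made up, and matching from the start of the name with re.match is an assumption; the document does not say which matching function the node applies.

```python
import re

# Same pattern as in the example; the doubled backslash yields the regex \s.
pattern = re.compile('.*?Total\\s.*')

# Hypothetical field names for illustration.
field_names = ["Grand Total USD", "Total", "Total Cost", "Subtotal"]

matched = [name for name in field_names if pattern.match(name)]

# A raw string is an equivalent way to write the same pattern.
same_pattern = r'.*?Total\s.*' == '.*?Total\\s.*'
```

'Total' on its own is not matched because no whitespace character follows it, and 'Subtotal' is not matched because the pattern is case-sensitive.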

FieldPattern objects

   
Methods

A FieldPattern object is returned as the result of the various method calls available on the Patterns object. FieldPattern objects are generally used as arguments to the MetadataBuilder += and -= operators to include or exclude input fields that match the pattern, and can also be used in record key properties (in the Advanced tab of the Match Keys, GroupBy and SortBy properties). The FieldPattern object also allows for pattern renaming.

 

rename(renamePattern)

  • Augments and returns this FieldPattern object by adding renaming functionality using the specified renamePattern.
  • The "renamePattern" argument must be a standard Python string renaming pattern – as could be used by the "sub" method in the "re" module. For example, the '\g<0>' rename pattern will return the entire matched string.
  • The resulting FieldPattern object can be used in MetadataBuilder += operations to add all of the fields matching the pattern used to construct this FieldPattern object, renamed on the output metadata using the provided renamePattern, and can also be used in record key properties.
  • Arguments:
    renamePattern – Type: str
  • Return type: FieldPattern
  • Example:

     

    # Add all fields on input 'in1' starting with the string 'Total', renamed to
    # replace 'Total' with 'SalesAmount' on the output 'out1'
    out1 += patterns.regex('^(Total)(.*)', in1).rename('SalesAmount\\2')
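
Because the rename pattern follows the conventions of re.sub, its effect can be previewed with the re module. The sketch below uses made-up field names; it also shows why a group reference such as \2 needs a raw string or a doubled backslash in a Python string literal.

```python
import re

# Replace the leading 'Total' with 'SalesAmount', keeping the rest (group 2).
renamed = re.sub(r'^(Total)(.*)', r'SalesAmount\2', 'TotalQ1')

# \g<0> refers to the entire match, so this prefixes the whole field name.
prefixed = re.sub(r'^.*$', r'Right.\g<0>', 'Account')

# Without the raw-string prefix (or a doubled backslash), '\2' is an octal
# escape for the control character chr(2), not a group reference.
non_raw = 'SalesAmount\2'
is_control_char = non_raw == 'SalesAmount' + chr(2)
```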

Tip: Standard Python regular expressions are available via the "re" module.

(* links correct at time of publishing).

Null handling

All Null fields on data records are bound in as special Null objects in Python — not the Python None value.

 

To check if a value is Null, use the following syntax:

if in1.MyField is Null:
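
The difference between the Null object and Python's None can be illustrated with a stand-in sentinel. The _NullType class and the describe function below are hypothetical; inside a node the Null name is provided for you and should not be redefined.

```python
class _NullType(object):
    """Hypothetical stand-in for the Null object bound into the scripts."""

    def __repr__(self):
        return "Null"


Null = _NullType()  # a single shared instance, compared with 'is'


def describe(value):
    """Report whether a field value is the Null object."""
    if value is Null:
        return "null"
    return "not null"
```

In this model a check against None, such as "value is None", would not detect a Null field value, which is why the identity test against Null is used.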

Non-ASCII characters

When working with the Python-based nodes, it is important to be aware of the following Python language notation if your data contains non-ASCII characters:

Tip: Python literals which include non-ASCII characters should be prefixed with a 'u' character to convert the string to a Unicode string.

For example, if you had the field name 'colör' as input to a Transform node, the following code would cause the node to fail:

out1.color = str(fields['colör'])

Instead, you would need to use the 'u' character prefix to convert the colör field to a Unicode string:

out1.color = str(fields[u'colör'])