IWD ETL Jobs
The data in iWD Data Mart is made available through a process that is called Extract, Transform and Load—or ETL, for short. The system that is used to create, configure, and execute the ETL process is Kettle, which is part of the Pentaho reporting suite. Kettle top-level objects are known as jobs. Jobs are a sequence of steps that are executed according to success/failure criteria. One of the steps that is used by iWD reporting is to transform steps.
The following list describes the preconfigured ETL jobs that are responsible for creating and populating the iWD Data Mart and for calculating the various aggregates and dimensions that are described in the iWD Data Mart schema. Jobs names are those displayed in iWD Manager.
Job Function | Attribute | Description | ||
---|---|---|---|---|
Initialize iWD Data | Job Name | Initialize | ||
Function | Initializes the necessary data structures and populates static dimensions, such as the AGE, BUSINESS_VALUE, EVENT_DATE, EVENT_TIME, DATE_TIME, PRIORITY, STATUS, TASK_EVENT_TYPE, and TIMEZONE dimensions. | |||
Schedule | Runs once. | |||
Load Configuration | Job Name | Load Configuration | ||
Function | Loads iWD configuration updates from the following ETL parameters:
into the ETL_CUSTOM_MAP and CUSTOM_DIM Data Mart tables. This function loads iWD configuration updates that have been gathered from the iWD Manager configuration database into the following Data Mart tables: | |||
Schedule | Configurable through service properties; typically, runs on a 15-minute cycle, but not more frequently than a 15-minute cycle. | |||
Load Intraday | Job Name | Load Intraday | ||
Function | Loads updates from tables in the Interaction Server database and event log into the following core fact tables:
as well as the following dimensions: | |||
Schedule | Configurable through Service Properties in iWD Manager; recommended that it be scheduled to run after the Load Configuration job ends through the Job Dependency scheduling option. | |||
Aggregate Intraday | Job Name | Aggregate Intraday | ||
Function | Aggregates data that previously was loaded into fact tables by the Load Intraday job into the aggregation tables. | |||
Schedule | It is recommended that this job be scheduled immediately after the Load Intraday job has completed—typically, running every 15 minutes. The frequency of running this aggregate job does not have any bearing on the current 15-minute aggregate that is being populated. | |||
Aggregate Statistics | Job Name | Aggregate Statistics | ||
Function | Generate extended statistics by executing statistics plug-ins. | |||
Schedule | It is recommended that this job be scheduled immediately following completion of the Aggregate Intraday job, because most of the statistics plug-ins user aggregated facts. | |||
Load Historical | Job Name | Load Historical | ||
Function | Moves noncurrent data from the intraday fact tables to their corresponding historical fact tables. (“Noncurrent” refers to data other than today’s data [CREATED_DATE_KEY < today].)
Important Task facts must also be finalized (having reached Completed, Canceled, or Rejected state) before they can be moved from the intraday fact tables regardless of duration in the intraday tables. | |||
Schedule | Runs daily through the schedule that is defined in Service Properties. | |||
Aggregate Historical | Job Name | Aggregate Historical | ||
Function | Aggregates data for the historical DAY aggregation tables (H_TASK_aggr_FACT_DAY). | |||
Schedule | Runs once a day, after the Load Historical job. | |||
Maintain iWD | Job Name | Maintain | ||
Function | Removes expired facts from the following tables based on the values that are set in the detailsExpirationDays and aggregation15min ExpirationDays parameters:
(These parameters are defined as rules on the ETL Service property in iWD Manager.) This job also adds to DATE_TIME ensuring that next-year values are present in this table. | |||
Schedule | Runs once a day, after the Aggregate Historical job. | |||
Prune events and interactions | Job Name | Prune | ||
Function | Removes events from the Event Log database when tasks are archived—that is, when they are moved or deleted from Interaction Server. | |||
Schedule | Runs once a day, after the Maintain job. |
You can schedule ETL jobs to run:
- On a recurring basis by using a CRON expression.
- Manually.
- Upon the successful completion of a dependent service.
For more information on the configuration of scheduling ETL jobs, refer to the intelligent Workload Distribution 8.1 Deployment Guide.