Optimizing Database Performance: Database Tuning
The performance of the Genesys Info Mart jobs is greatly affected by the performance of the RDBMS server. Before you create the target database, tune your RDBMS server for optimal performance.
This page provides recommended parameter settings and tuning guidelines that you can use to improve the Info Mart ETL execution time. It also includes suggestions for your database administrator for managing the target database after it is deployed.
Tuning the Info Mart Database
Several database parameters must be set appropriately for the ETL jobs to load the Info Mart database successfully. This section provides the recommended parameter settings for each RDBMS:
- Tuning the Info Mart database on Microsoft SQL Server
- Tuning the Info Mart database on Oracle
- Tuning the Info Mart database on PostgreSQL
See also Additional Considerations.
Procedure: Tuning the Info Mart database on Microsoft SQL Server
Purpose: To set Microsoft SQL Server RDBMS parameters so that the ETL jobs load the Info Mart database successfully.
Prerequisites
- Create the Info Mart database schema, as instructed in Preparing the Info Mart Database.
- Use Microsoft SQL Server Enterprise Manager to review the settings of the Microsoft SQL Server properties.
Steps
Allocate sufficient memory.
The ETL jobs issue many complex SQL queries against several Info Mart database tables. The amount of memory that you allocate to the database server is critical to the performance of these SQL queries. Allocate as much memory as possible to Microsoft SQL Server without causing paging. Optimal settings for your environment depend on the hardware and data volumes. As a quick approximation:
- Start with the total real memory on the database server.
- Subtract the memory that is required by the operating system and any other applications.
- Set the maximum memory that is allocated to Microsoft SQL Server to the result.
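For example (illustrative figures only, not a recommendation): on a database server with 64 GB of RAM where the operating system and other applications need about 16 GB, you would cap Microsoft SQL Server at the remaining 48 GB. A minimal T-SQL sketch of that setting:
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 49152;  -- 48 GB, per the approximation above
RECONFIGURE;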
Select relevant server settings.
Some of the SQL commands that the ETL jobs issue are long running. To ensure that Microsoft SQL Server does not restrict them based on their estimated cost, disable the Use query governor to prevent queries exceeding specified cost setting.
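In T-SQL, this setting corresponds to the query governor cost limit server option; a minimal sketch (assumes the show advanced options flag is enabled, as in the previous example):
EXEC sp_configure 'query governor cost limit', 0;  -- 0 disables the query governor
RECONFIGURE;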
Configure connections.
The ETL jobs use many concurrent database connections. Set Maximum concurrent user connections to 0 (unlimited), but see also Note about connections for extraction, below.
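In T-SQL, this maps to the user connections server option; as a sketch:
EXEC sp_configure 'user connections', 0;  -- 0 = unlimited user connections
RECONFIGURE;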
- Review the settings of the following database properties:
- Data Files: Select Automatically grow file and Unrestricted file growth.
- Transaction Log: Select Automatically grow file and Unrestricted file growth.
- Options — Settings: Select Auto update statistics and Auto create statistics.
- Options — Recovery Model: Take into account the following considerations.
The ETL jobs generate large amounts of database activity, with a correspondingly large database log space. The amount of log space that is needed depends on the recovery model and on the frequency of log file backups. The full-recovery model provides the most protection against data loss due to failures, but it requires the most log space. Performing daily log file backups can help limit the size of the transaction logs.
In general, you control the size of a transaction log in one of the following ways:
- If you are maintaining a log backup sequence for full or bulk-logged recovery models, schedule BACKUP LOG statements to occur at intervals that will keep the transaction log from growing past the desired size.
- If you are not maintaining a log backup sequence, specify the simple recovery model.
For a more complete description of the recovery models, and for information about how to manage transaction logs, see the Microsoft SQL Server documentation.
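For illustration, the two approaches translate to T-SQL along the following lines (gim_db and the backup destination are placeholders for your own names):
-- Simple recovery model, if you are not maintaining a log backup sequence:
ALTER DATABASE gim_db SET RECOVERY SIMPLE;
-- Or keep the full recovery model and back up the log on a schedule to cap its growth:
BACKUP LOG gim_db TO DISK = N'D:\Backups\gim_db_log.trn';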
(Optional, but recommended) Configure the database to use the READ COMMITTED isolation level.
In Microsoft SQL Server deployments, the transformation job might record duplicate rows in Info Mart fact tables, because Microsoft SQL Server cannot always complete a rollback correctly when a deadlock occurs during the rollback. To minimize locking contention, Genesys recommends that you use the READ COMMITTED isolation level, with the READ_COMMITTED_SNAPSHOT database option set to ON.
To configure this database setting, execute the following SQL statement:
ALTER DATABASE <name_of_gim_db> set READ_COMMITTED_SNAPSHOT ON;
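Note that this statement requires exclusive access to the database; run it when no other sessions are connected, or it will wait until they disconnect.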
- Consult with your database administrator to further fine-tune these and other parameters if you find the performance of the ETL jobs in your environment unacceptable. See also Additional Considerations.
Next Steps
Configure the required DAPs. For more information, see Configuring Required DAPs.
Procedure: Tuning the Info Mart database on Oracle
Purpose: To set Oracle RDBMS parameters so that the ETL jobs load the Info Mart database successfully.
Prerequisites
- Create the Info Mart database schema, as instructed in Preparing the Info Mart Database.
Steps
- Set the Oracle initialization parameters:
- filesystemio_options = ASYNCH
- processes = at least 1000, but see also Note about connections for extraction, below
- sessions = at least 1000, but see also Note about connections for extraction, below
- open_cursors = at least 1000
(For use with database links) open_links = at least the value of extract-data-thread-pool-size.
If your deployment uses database links, configure the open_links parameter to allow a sufficient number of connections through the database links. For more information about how to configure your deployment to use database links, see Optimizing Database Performance: Database Links.
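As a sketch, the equivalent ALTER SYSTEM statements follow; most of these parameters are static, so they are set in the SPFILE and take effect only after an instance restart. The open_links value shown assumes the default extract-data-thread-pool-size of 32.
ALTER SYSTEM SET filesystemio_options = 'ASYNCH' SCOPE=SPFILE;
ALTER SYSTEM SET processes = 1000 SCOPE=SPFILE;
ALTER SYSTEM SET sessions = 1000 SCOPE=SPFILE;
ALTER SYSTEM SET open_cursors = 1000 SCOPE=BOTH;
ALTER SYSTEM SET open_links = 32 SCOPE=SPFILE;  -- only if your deployment uses database links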
Allocate sufficient memory.
The ETL jobs issue many complex SQL queries against several Info Mart database tables. The amount of memory that you allocate to the database server buffers is critical to the performance of these SQL queries. The optimal settings for your environment depend on the hardware and data volumes. As a quick approximation:
- Start with the total real memory on the database server.
- Subtract the amount of memory that is required by the operating system and any other applications.
- Split the result between the buffer cache and the PGA aggregate target.
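For example (illustrative figures and an assumed 2:1 split, not a recommendation): on a 64 GB server where the operating system and other applications need about 16 GB, you might give 32 GB of the remaining 48 GB to the SGA, which contains the buffer cache, and 16 GB to the PGA aggregate target:
ALTER SYSTEM SET sga_target = 32G SCOPE=SPFILE;
ALTER SYSTEM SET pga_aggregate_target = 16G SCOPE=BOTH;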
Increase the size of the listener queue for TCP/IP connection requests.
The need to modify the listener configuration depends on deployment-specific factors such as hardware and the configured or desired degree of parallelism. See the Oracle documentation for information about the conditions under which you should increase the queue size in the listener configuration file, listener.ora, as well as the syntax for doing so. See also Performance tuning for a summary of Genesys Info Mart configuration options that control the degree of parallelism for ETL processing.
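If you do determine that a larger queue is needed, it is set with the QUEUESIZE attribute on the listening address in listener.ora; a minimal sketch, in which the host, port, and queue depth are placeholders:
LISTENER =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = dbhost.example.com)(PORT = 1521)(QUEUESIZE = 100)))
Reload or restart the listener (for example, with lsnrctl reload) for the change to take effect.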
- Genesys recommends that you use case-sensitive settings for NLS_SORT and NLS_COMP.
- Ensure that you allocate sufficient space for the SYSTEM tablespace to accommodate long-term operations.
- Consult with your database administrator to further fine-tune these and other parameters, including JVM startup parameters such as queryParallelism (see Modifying JVM Startup Parameters), if you find the performance of the ETL jobs in your environment unacceptable. See also Additional Considerations.
Next Steps
Configure the required DAPs. For more information, see Configuring Required DAPs.
Procedure: Tuning the Info Mart database on PostgreSQL
Purpose: To set PostgreSQL RDBMS parameters so that the ETL jobs load the Info Mart database successfully.
Prerequisites
- Create the Info Mart database schema, as instructed in Preparing the Info Mart Database.
Steps
Set the database server configuration parameters in the postgresql.conf file:
- checkpoint_segments = 64
- synchronous_commit = off
- wal_buffers = 256MB
- max_connections = 1000, but see also Note about connections for extraction, below
- constraint_exclusion = partition
- default_statistics_target = 100
- enable_mergejoin = off
- temp_buffers = 128MB
- max_prepared_transactions = 1000
- work_mem = A value in the range 256MB–1GB*
- maintenance_work_mem = A value in the range 64MB–256MB*
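Note that several of these parameters (max_connections, max_prepared_transactions, wal_buffers) take effect only after a server restart, and that checkpoint_segments applies to PostgreSQL releases earlier than 9.5, where it was replaced by max_wal_size. As a sketch, with a placeholder data directory and database name:
pg_ctl restart -D /var/lib/pgsql/data        # apply restart-only parameters
psql -d gim_db -c "SHOW max_connections;"    # verify that a setting took effect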
Allocate sufficient memory.
The ETL jobs issue many complex SQL queries against several Info Mart database tables. The amount of memory that you allocate to the database server buffers is critical to the performance of these SQL queries. The optimal settings for your environment depend on the hardware and data volumes. As a quick approximation:
- Start with the total real memory on the database server.
- Subtract the amount of memory that is required by the operating system and any other applications. The result is the available memory for PostgreSQL.
- Set effective_cache_size to 75 percent of available memory and shared_buffers to 25 percent of available memory.
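For example (illustrative figures only): on a server with 32 GB of RAM where the operating system and other applications need about 8 GB, the available memory is 24 GB, which yields the following postgresql.conf settings:
effective_cache_size = 18GB   # 75 percent of the 24 GB available to PostgreSQL
shared_buffers = 6GB          # 25 percent of the 24 GB available to PostgreSQL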
Configure autovacuum settings.
Autovacuum is a background process that performs several important tasks, including producing table statistics for the query planner, recovering disk space for changed rows, and preventing transaction ID wraparound. To balance these tasks with the needs of the ETL jobs, set the following combination of autovacuum parameters in the postgresql.conf file:
- autovacuum = on
- autovacuum_analyze_threshold = 10000
- autovacuum_freeze_max_age = 1000000000
- autovacuum_max_workers = 10
- autovacuum_naptime = 20s
- autovacuum_vacuum_cost_delay = 10ms
- autovacuum_vacuum_cost_limit = 1000
- autovacuum_vacuum_scale_factor = 0.3
- autovacuum_vacuum_threshold = 100000
- vacuum_freeze_min_age = 10000000
- vacuum_freeze_table_age = 800000000
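Most of these settings can be applied with a configuration reload rather than a restart (autovacuum_max_workers is the exception; it requires a restart). A sketch for reloading and verifying from psql:
SELECT pg_reload_conf();  -- re-read postgresql.conf
SELECT name, setting FROM pg_settings
WHERE name LIKE 'autovacuum%' OR name LIKE 'vacuum_freeze%';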
- Consult your database administrator to further fine-tune these and other parameters if you find the performance of the ETL jobs in your environment unacceptable. See also Performance tuning for a summary of Genesys Info Mart configuration options that control the degree of parallelism for ETL processing, as well as Additional Considerations.
Next Steps
Configure the required DAPs. For more information, see Configuring Required DAPs.
Note about connections for extraction
For reasons described under Database Connections, the extraction job in particular uses a large number of connections to the Info Mart database and IDBs. Genesys strongly recommends the connection-related settings specified above. However, if you want to refine your estimates of resource usage, the following calculation approximates the theoretical maximum number of processes and sessions (for Oracle) or connections (for Microsoft SQL Server and PostgreSQL) that the extraction job might use:
min(80,extract-data-thread-pool-size) * Number of IDBs + 10 + extract-data-thread-pool-size
where:
- extract-data-thread-pool-size is the value of the extract-data-thread-pool-size configuration option.
- Number of IDBs equals the number of extraction DAPs in the Genesys Info Mart application's connections.
- The formula adds:
- 10 connections used for merging extracted data, as well as for other ETL needs.
- The extract-data-thread-pool-size value, to account for the connections to the Info Mart database that are used to write data extracted from IDBs into the Info Mart database.
If the IDBs and Info Mart database are hosted by the same RDBMS instance, double the number of IDB connections.
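For example, with the default extract-data-thread-pool-size of 32 and two IDBs, the estimate is min(80, 32) * 2 + 10 + 32 = 64 + 10 + 32 = 106 connections; if those IDBs and the Info Mart database are hosted by the same RDBMS instance, the IDB term doubles, for a total of 128 + 10 + 32 = 170.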
Additional Considerations
In addition to the previously listed database tuning requirements, you might also need to do the following:
- Periodically update statistics on the Info Mart fact tables. Failure to update them periodically can have a negative impact on the performance of end-user queries. Enable the automatic gathering of statistics on Info Mart tables if your RDBMS supports this feature.
- Have database administrators actively manage Genesys Info Mart after it is deployed.