Author: Kun Cheng, Lonnye Bower, Konstantin Dotchkoff, Roger Toren
Reviewers: Peter Carlin, Nicholas Dritsas, Prem Mehra
 
Background
TEMENOS T24 is a complete banking solution designed to meet the challenges faced by financial institutions in today’s competitive market. By working with Microsoft, Temenos was able to take advantage of the latest Windows and SQL Server technologies to tune T24 and run it well on Microsoft platform. The lessons learned in the tech note were derived from T24 solution tuning engagement, but most of them apply to other typical OLTP workloads as well.
Benchmark Overview

The benchmark environment, created to reflect real-world retail banking activity volumes, was made up of 25 million accounts and 15 million customers across 2,000 branches. At peak performance, the system processed 3,437 transactions per second (TPS) in online business testing and averaged a record-breaking 5,203 interest accrual and capitalizations per second during COB testing, processing 25 million accounts in less than two hours. The maximum CPU utilization of the database server during the peak hour did not exceed 70%, providing considerable additional capacity. In addition, the testing demonstrated near linear scalability (95 percent) in building up toward the final the hardware configuration.

 

T24 Architecture

The T24 solution consists of several layers, as shown below, including:
• User access
• Presentation (clients)
• Messaging/connectivity (web servers)
• Application (application servers)
• Database (database servers)
The application layer accepts messages in a Temenos-specific format called Open Financial Services (OFS). All requests, from a web browser or from a non-web client, are translated into the OFS format and then submitted to the application layer. The communication between the messaging/connectivity layer and the application layer depends on the specific deployment and can use various channels, including message queues, web services, and a native direct connection between the two layers.
T24 was originally designed to use jBASE, a multidimensional database that uses records consisting of fields, multi-values (multi-valued lists), and sub-values. OFS messages are transformed into the internal record format and processed by the application layer; the records are then stored in a jBASE database. When SQL Server, a supported database system, is used, the jBASE records are transformed into XML format (or in some cases left as BLOBs) and are stored in the database.
 

1.    SQL Server File Configuration

1.1        Configure Data Files

The filegroup used for the T24 data should be composed of multiple files. Best practice is to use one file for every two CPU cores on computer systems with 32 or more cores. On computer systems with less than 32 cores, use the same number of files as the number of CPU cores (the ratio should be 1:1). The data files should be equal in size. Note that the out-of-the-box configuration uses only one file in the primary filegroup, so you need to add additional files for optimal configuration.
Pre-allocate enough space in the data files based on the initial size of the computer system. Monitor the database free space and if necessary extend each file simultaneously so that all of the files have the same amount of free space. SQL Server optimizes writes by spreading its write operations across the files based on the ratio of free space among the files, so extending all files at once maintains this optimization.
Leave the autogrowth setting on as an “insurance policy” so that SQL Server does not stop when it runs out of space; however, do not rely on autogrowth to extend the database files as a standard way of operating. While you should not allocate space for the data files in small units, if you allocate in very large units during autogrowth, the application must wait (possibly several minutes) while the space is allocated. Since you cannot control when autogrowth engages, allocate only by the space needed for a few days of operations.

1.2        Configure Log File

The transaction log file, generally a sequentially written file, must be written as quickly as possible—even before the data is written to the data files (the data portion can be rebuilt from the log if necessary). While there is no performance benefit from using more than one file, multiple files can be beneficial for maintenance purposes (for example, if you are running out of space on the log drive). Adding physical devices to support the LUN can benefit performance.

1.3        Configure tempdb Files

SQL Server tempdb files are used for the storage of temporary data structures. The tempdb files are responsible for managing temporary objects, row versioning, and online index rebuilds. T24 uses a read-committed snapshot isolation level as its default isolation level, which uses row versioning. For more information, see Isolation Levels in the Database Engine.
To ensure efficient tempdb operation:

·         Create one tempdb file per physical CPU core.
This reduces page free space (PFS) contention.

·         Pre-size the tempdb files, and make the files equal in size.

·         Do not rely on autogrow.

·         Use startup trace flag 1118.
For more information about this SQL Server trace flag, see the article
Concurrency Enhancements for the tempdb Database.

For information on how to set startup settings for SQL Server, see the article Configure Server Startup Options (SQL Server Configuration Manager.
For further information, see the MSDN article Optimizing tempdb Performance.

2.     SQL Server Memory Configuration

2.1        SQL Server Memory Settings

Configure the SQL Server “max server memory (MB)” setting by taking the amount of memory allocated to the database system and subtracting one GB for every four cores (round up). This leaves the operating system with enough memory to work efficiently without having to “grab” memory back from SQL Server. For example, if the server has 64 GB of RAM and 24 cores, set the maximum memory to 58 GB (64 GB minus 6 [24 cores divided by 4]).

2.2        Lock Pages in Memory

To reduce SQL Server paging, you can grant the SQL Server service account “Lock Pages in Memory” privilege through the Windows Group Policy editor.
For detailed instructions, see How to reduce paging of buffer pool memory in the 64-bit version of SQL Server on the Microsoft® Support site.

3.     Recovery Interval Change

Increasing the recovery interval server configuration option causes the checkpoint process to occur less often. This can reduce the I/O load driven by checkpoints and improve the overall performance. During lab testing, a recovery interval of 510 minutes has been determined to be the best setting for T24.
Before changing the recovery interval, you should consider its implication on the mean time to recovery and recovery point objectives. Note that when using failover clustering, a longer recovery interval also influences the failover time of the database instance.
For more information about the recovery interval option, see the article Recovery Interval Option.

4.     Use Trace Flag 834

On computer systems with 64 or more CPU cores, use startup trace flag 834. When this trace flag is set, SQL Server uses Windows large-page memory allocations for the buffer pool. Allocating buffer pages is expensive, and turning on trace flag 834 boosts performance.
For more information about this SQL Server trace flag, see Microsoft Support Article 920093

5.     Enable Receive-Side Scaling

You should enable Receive-Side Scaling (RSS) on the SQL Server network interface card (NIC) that is serving the application servers. This setting is found on the Advanced Property tab of the network card. Also, be sure that offloading options are enabled. See the Microsoft® Developer Network (MSDN®) articles Introduction to Receive-Side Scaling and Receive-Side Scaling Enhancements in Windows Server 2008 for more information. If your NIC does not support these options, consider replacing it with one that does.
You should configure the maximum number of RSS processors by setting the MaxNumRssCpus registry key value to 8 on a computer system with 32 or more CPU cores. For computer systems with less than 32 cores, use the default setting.
The RSS base CPU number (RssBaseCpu) is the CPU number of the first CPU that RSS can use. RSS cannot use the CPUs that are numbered below the base CPU number. You should set RssBaseCpu carefully so it does not overlap with the starting CPU.
Lab testing has shown good results with setting both registry key values to 8 (on a computer system with more than 32 cores); this means that 8 RSS processors are used starting with core number 8 to process network traffic.
Note: You should use the Windows RSS registry keys to configure these values instead of NIC settings because NIC settings can be overridden by the Windows registry keys.

6.     Index Fill-factor Change

In high-volume deployments (installations with 10 million accounts and more) of T24, you should consider a lower fill factor with PAD_INDEX on for indexes on hot tables with high latch contention. Consider a lower fill factor only if there is need to improve the performance and if excessive latch contention has been observed. Lab testing has shown good results using a fill factor of 50% for hot tables.
Page latch contention can be identified by examining the “SQL Server: Wait Statistics – Page Latch waits” performance counter and querying the dynamic management view sys.dm_os_wait_stats using this query:
SELECT * FROM sys.dm_os_wait_stats
WHERE wait_type LIKE 'PAGELATCH%'
To identify which tables and which pages experience latch contention, you can use the following queries:
SELECT *
FROM sys.dm_db_index_operational_stats (DB_ID('T24'), NULL, NULL, NULL)
ORDER BY [page_latch_wait_in_ms] DESC, tree_page_latch_wait_in_ms DESC
and
SELECT * FROM sys.dm_os_waiting_tasks
WHERE wait_type LIKE 'PAGELATCH%'
 
For more information on the fill-factor option for indexes, see the article Fill Factor.

7.     Optimizing T24 XQueries – Promote Key Attributes/Elements to Relational Columns

To improve query performance, start by identifying slow-running queries. The following query selects the top 50 SQL Server statements ordered by the total CPU time (i.e., total amount of CPU time, in microseconds, for all executions of each statement):
SELECT TOP 50 
   SUM(query_stats.total_worker_time) AS "total CPU time",
SUM(query_stats.total_worker_time)/SUM(query_stats.execution_count) AS "avg CPU Time",
SUM(query_stats.execution_count) AS "executes",
SUM(query_stats.total_logical_reads) AS "total logical reads",
SUM(query_stats.total_logical_reads)/SUM(query_stats.execution_count) AS "avg logical reads",
SUM(query_Stats.total_logical_writes) AS "total logical writes",
SUM(query_Stats.total_logical_writes)/SUM(query_stats.execution_count) AS  "avg logical writes",
   MIN(query_stats.statement_text) AS "statement text"
FROM
    (SELECT QS.*,
    SUBSTRING(ST.text, (QS.statement_start_offset/2) + 1,
    ((CASE statement_end_offset
        WHEN -1 THEN DATALENGTH(ST.text)
        ELSE QS.statement_end_offset END
            - QS.statement_start_offset)/2) + 1) AS statement_text
     FROM sys.dm_exec_query_stats AS QS
     CROSS APPLY sys.dm_exec_sql_text(QS.sql_handle) as ST) AS query_stats
GROUP BY query_stats.query_hash
ORDER BY 1 DESC
For T24, we have identified some of the XQueries on the list of the top list. An example of T24 XQuery is:
SELECT t.RECID,t.XMLRECORD
FROM
F_HOLD_CONTROL t
WHERE
t.XMLRECORD.exist(N'/row/c2[.="NEW.LOAN.REPORT"]') = 1

For XQuery like this, we can use scalar promotion to reduce the query runtime. A single-value field (or even a specific value of a multi-valued field) that is part of the XMLRECORD can be “promoted” as computed column of the table and be used in relational search conditions. Further, a relational index can be created on the computed column to improve the query performance. The detailed steps to promote a single-value XML field are as follows:

 

1.)    Create a persisted computed column for the specific field.
Create a user-defined function that evaluates the value of the field. The return value of the function should be a single scalar value. Using this function, the computed column should be added to the table and persisted.


 
-- scalar promotion of single valued field
CREATE
FUNCTION udf_HOLD_CONTROL_C2(@xmlrecord XML)
RETURNS nvarchar(35)
WITH SCHEMABINDING
BEGIN
      RETURN @xmlrecord.value('(/row/c2/text())[1]', 'nvarchar(35)')
END
 
ALTER TABLE F_HOLD_CONTROL
ADD C2 AS dbo.udf_HOLD_CONTROL_C2(XMLRECORD) PERSISTED

2.)     Create non-clustered index on the computed column.
After creating the persisted computed column, create an index for this column:

-- example 1
CREATE
INDEX ix_HOLD_CONTROL_C2 ON F_HOLD_CONTROL(C2)


Verify optimizations

Verify that the changes are successful and measure the impact of the optimizations.

·         For scalar promotion (promoted and indexed fields):

ü  Verify the query translation.

Without scalar promotion, T24 uses a query syntax such as:

SELECT t.RECID,t.XMLRECORD
FROM
F_HOLD_CONTROL t
WHERE
t.XMLRECORD.exist(N'/row/c2[.="NEW.LOAN.REPORT"]') = 1

The execution of this query usually uses a table scan to retrieve the results.

After promoting the field “c2”, query should become:

SELECT t.RECID,t.XMLRECORD
FROM F_HOLD_CONTROL t
WHERE t.c2 = 'NEW.LOAN.REPORT'

In this case, index lookup on ix_HOLD_CONTROL_C2 is used.

ü  Prove that the index is used by reproducing the query and verifying the actual execution plan. You can run the query in SQL Server Management Studio and activate the icon “Include Actual Execution Plan” on the SQL Editor toolbar.

Alternatively, you can use the SET STATISTICS PROFILE ON statement to display execution plan information.

ü  Verify the performance of the query has improved.

ü  After using the application for a period of time (e.g., couple of hours or days), use the sys.dm_db_index_usage_stats dynamic management view to verify the index usage. Consider the ratio between index reads and index writes, keeping in mind that an index usually improves the performance for read operations but slows down modifications (i.e., inserts, updates, deletes) at the same time.

ü  Consider the number of promoted columns and indexes per table. Too many indexes may degrade the overall performance. As a general rule, you should avoid creating more than seven indexes on a table for T24.

ü  Do not create XML indexes on T24 XMLRECORD fields. The impact on transaction latency is too high, and the benefit in query performance is usually not significant.

 

Reference

Microsoft and Temenos Prove Scale of Support for the Largest Retail Banking Operations
 
Best Practices for Running TEMENOS T24 on Microsoft SQL Server and Windows Server