SQL Server Customer Advisory Team - SQL Server Best Practices

Enabling SQL Server customers to navigate the most challenging frontiers of large scale data management.

Technical Notes

Moving Large Amounts of Data Between Oracle and SQL Server: Findings and Observations

Author: Nicholas Dritsas
Contributers: Hae Jin Park
Technical Reviewers: Denny Lee, Burzin Patel

 

Case

A large online gaming company tested a number of methods to quickly and efficiently move data between an Oracle 9i/UNIX system and SQL Server 2005. Their goal was to move 200‑GB of daily user activity data within two hours. The data was to move without the use of transformations from Oracle’s OLTP system to the SQL Server Operational Data Store (ODS) database. An ODS database is used in data warehouse architecture to hold an exact and intact copy of the source data, representing the only true data and history. The ODS data would be used later by SSIS packages to transform and load data into a staging database, a DW database, and an OLAP cube for reporting purposes.

Note   This Technical Note does not describe a suggested method to fully migrate data from Oracle to SQL Server in all cases. Depending on the application, third-party tools and wizards may be a better fit for your migration.

Process

The company started by using a 64-bit server with 32 and 64 cores, testing both x64 and ia64. They installed SQL Server, Analysis Services and SSIS on that server. Since it was a 64-bit platform, the Microsoft OLE DB Provider for Oracle was ruled out because it does not yet support 64-bit environments. The project team decided to use Oracle’s 10.2.0.2.2 version provider. Here are the initial tests and findings:

Solution 1:  The first solution they tried was using the SQL Server Integration Services (SSIS) OLE DB Source task to move the Oracle 200-GB data into SQL Server. However, the speed was very slow and they could not meet their goal of two hours so they stopped the test. In this scenario, they used the 64-bit server and the 64-bit provider from Oracle and the projected time of completion was about 18 hours.

Solution 2: The next solution was to use a separate 32-bit server (8 cores, 20-GB RAM) for SSIS only and 32-bit Oracle’s OLEDB Provider for Oracle. They downloaded the Oracle data into a binary file (using the network drive on the 64-bit server) by using the SSIS Raw File destination task. From there they uploaded the data into SQL Server staging tables, and then converted it into SQL ODS tables by using the SSIS Raw File source task.

Result: 2 hours 47 min elapsed.

 

Issue : Large staging file and staging DB (not ODS) size

Even though the size of the Oracle data  was 200 GB, the binary file size was over 600 GB. Then, the 600 GB of data had to be uploaded into SQL Server tables without data conversion. After that, it was necessary to convert it into SQL Server target ODS table format. The process required at least 1.4 terabytes of additional disk space for the binary file and staging tables.

Note    Less overall time would have been required if they had used Oracle SQL*Plus commands to spool the data into a file. That way, they would not have had to use the slow provider to extract into a raw file. They did not have time to test this scenario but it is worth mentioning.  Here is a helpful link: http://www.orafaq.com/faq/is_there_a_sql_unloader_to_download_data_to_a_flat_file.

Solution 3: Next they tried using DataDirect Technologies’ Oracle provider with wire protocol on the x64 server (it seems DataDirect does not have an ia64 provider). The achieved speed results were very impressive. In a comparison test, they loaded 10,000 records from Oracle to SQL Server. In that test, the DataDirect provider took 2 seconds versus 22 seconds when using the Oracle provider. However, the overall test failed because of conversion errors.

Issue:  They tried to retrieve the Oracle data by using the SSIS OLE component with DataDirect’s provider (x64) but there was a code page conversion error for the SQL Server code page CP949 (Korean character set).  They tried to use Unicode conversion by using an output column (in SSIS), but this did not work either.  They plan to open a support case with DataDirect. It seems there are some special Korean characters that do not get mapped properly in CP949 and the driver fails.

Solution 4: The company finally decided to use  the SSIS OLE DB source and target components to move data from Oracle to SQL Server directly through an 8‑core 32‑bit server using Oracle’s32‑bit OLEDB provider (ver. 10.2.0.2.2) and Korean_90_Bin collation (with the SQL Server ODS database). When Korean_90_bin is used, they do not have to convert Korean data explicitly in Oracle, thus reducing response time.

Result: The response time was 2 hr 33m 32 sec (without the need of an additional 1.4 terabytes of disk space).

Findings

·         Oracle OLEDB Provider performance appears to be better optimized on 32‑bit compared to x64 or IA64. (Oracle driver speed: x86 > x64 > IA64). So, the company had to use 32‑bit middle server to meet the target time.

·         It seems that the 32-bit middle server's spec is important to the data move time. Based on this test result, if they used a 32-bit server with more than 8 cores, it is possible they could reduce the time even more. But, based on the tests, when they went from 8 cores to 16 cores, the difference was small. The total time went from 2h33min to 2h11min.

·         To achieve even better performance, they used the Fast Load option under the Data Access mode of the OLE DB Destination task. They decided not to use the SQL Server destination, even though it is supposed to be faster, because it requires that SSIS and SQL Server be on the same server. By using the OLE DB Destination, they have the option to deploy the packages on a separate SSIS server.·         Regarding the Oracle to SQL ODS step (using a direct load via OLEDB and an x86 server), they found that performance on a 16 dual-core x64 is better than on a 32 dual-core x64. (The gap is 10min.)  The x64 server hosts SQL Server and the ODS database.Detailed Test Results

Following are the test results on an HP superdome IA64 with 16- and 32-way plus a Unisys ES7000 16‑way and 32-way x64 based on EMC and Hitachi storage. The SQL Database DW size is 47 GB and the Analysis Services database with 20 cubes is 19 GB. It appears that a Unisys x64 32-way EMC may be the right option for their requirements.

Unisys x64
Case Step Server Elapsed Time CPU

 

DW Middle

 

case1 Oracle ODS -> Staging ODS (With SSISOLE) 32way - HDS 2:53:34 AVG 7%
MAX 11%
AVG 81%
MAX 99%

 

16way - HDS 2:52:02 AVG 13%
MAX 27%
AVG 82%
MAX 99%

 

32way - EMC 2:43:35 AVG 7%
MAX 11%
AVG 86%
MAX 100%

 

16way - EMC 2:33:32 AVG 15%
MAX 27%
AVG 92%
MAX 100%

 

case2 ETL (Fact, Mart) 32way - HDS 1:35:30 AVG 23%
MAX 44%
 

 

16way - HDS 1:42:57 AVG 36%
MAX 76%
 

 

32way - EMC 1:34:13 AVG 28%
MAX 50%
 

 

16way - EMC 1:40:34 AVG 37%
MAX 74%
 

 

case3 Cube build 32way - HDS 0:53:40 AVG 13%
MAX 42%
 

 

16way - HDS 0:56:03 AVG 27%
MAX 72%
 

 

32way - EMC 0:52:29 AVG 16%
MAX 44%
 

 

16way - EMC 0:55:13 AVG 27%
MAX 72%
 

 

 
HP IA64

 

Case Step Server Elapsed Time CPU

 

DW Middle

 

case1 Oracle ODS -> Staging ODS (with Raw files) 32way - HDS 2:53:00 AVG : 13%
MAX : 44%
AVG : 71%
MAX : 99%

 

16way - HDS 2:47:07 AVG : 25%
MAX : 87%
AVG : 75%
MAX : 99%

 

32way - EMC 2:51:58 AVG : 16%
MAX : 42%
AVG : 77%
MAX : 100%

 

16way - EMC 2:52:38 AVG : 27%
MAX : 85%
AVG : 73%
MAX : 99%

 

case2 ETL (Fact, Mart) 32way - HDS 1:52:14 AVG : 18%
MAX : 45%
 

 

16way - HDS 2:05:24 AVG : 30%
MAX : 82%
 

 

32way - EMC 1:55:19 AVG : 20%
MAX : 50%
 

 

16way - EMC 1:57:37 AVG : 34%
MAX : 81%
 

 

case3 Cube build 32way - HDS 1:03:07 AVG : 13%
MAX : 41%

 

16way - HDS 1:00:29 AVG : 25%
MAX : 84%
 

 

32way - EMC 1:01:01 AVG : 13%
MAX : 38%
 

 

16way - EMC 59:01 AVG : 25%
MAX : 96%
 

 

 Conclusion

Based on these results for the specific requirements of this company it appears the best solution is Solution 4. The company plans to get the following servers to maximize price/performance ratio:

·         A 4 dual-core 32‑bit server with 20‑GB of RAM using Oracle’s latest 32‑bit OLE DB Provider as a dedicated SSIS server

·         An 8 dual-core x64 server with 32‑GB of RAM and EMC SAN to host SQL Server RBMS and Analysis Services

For more information about SSIS connectivity options, see Connectivity and SQL Server 2005 Integration Services.

Comments

 

Moving Large Amounts of Data Between Oracle and SQL Server: Findings… | Online Gaming said:

Pingback from  Moving Large Amounts of Data Between Oracle and SQL Server: Findings… |  Online Gaming

August 9, 2008 11:31 AM
 

All things SQL Server Related.... said:

Oh no...did I just say the O word, oh well I have to live in real world and accept that there are many

December 1, 2008 5:04 AM

About Nicholas Dritsas

Nicholas Dritsas has been working with SQL Server and Business Intelligence for the past 11+ years as part of Microsoft Consulting Services and SQL Server Product Group. In his time at MCS, he acted as a Lead Architect to a number of Fortune 500 SQL Server and BI implementations, including Hilton Hotels, Nestle USA and Walt Disney. Since 2006, he joined the SQL Server’s Customer Advisory Team working with the largest and most complex SQL Server and BI projects in Asia. Nicholas has presented numerous times in internal and external events and conferences around BI subjects and best practices.
sql, server, best practices, whitepapers, analysis services, data mining, olap, datawarehouse, datawarehousing, availability, clustering, capacity, collation, data types, data warehouse, database, design, index, mirroring, optimization, partitions, performance, precision, processing, querying, scalability, security, reporting services, integration services