Friday, August 21, 2009
ETL Testing Process Overview
This is a compilation of my friend Rekha Ram Burra's experiences while testing ETL applications. I borrowed this information from him to share publicly on my blog. Thanks a lot to Rekha Ram for being kind enough to allow me to reuse this content on my blog.
This is a compilation of the test activities performed while testing an ETL application. Though ETL stands for Extract, Transform and Load, the final target of the test team is to validate the data in the destination database, so the test team only needs to verify data between source and destination. It is not compulsory for the test team to verify how the data is extracted and transformed; we can consider this Black Box Testing.
Assumptions
There are four different databases (on the same server or on different servers) in place: Delta, Staging, Factory and Warehouse.
There can be changes to the above architecture; e.g., the delta and staging databases can be combined into one single database, still with different tables for delta and staging. The general architecture would be to maintain the Staging, Factory and Warehouse databases separately (if not on different servers) to reduce the load on the warehouse.
Test approach
Verify the data type of the columns between source and destination
Between Source and Delta
The test team should make sure that the data type of each column of each table is as per the Functional Specification. If no specific details about the table schema are mentioned in the Functional Specification (FS), then the test team should make sure the column data types are consistent between the source and delta tables; a listing query that helps with this check is sketched below.
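A minimal sketch, assuming SQL Server on both sides; <DeltaDatabaseName> and <DeltaTableName> are placeholders. It lists each column's declared type so the output can be compared against the FS, or against the same listing run on the source table:
SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH, IS_NULLABLE
FROM <DeltaDatabaseName>.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = '<DeltaTableName>'
ORDER BY ORDINAL_POSITION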
Between Source and Factory
Between Factory and Warehouse
The schema of the tables will be exactly the same between Factory and Warehouse. The query below can be used to check the table schema. Please note that it is a linked-server query, assuming the Factory and Warehouse databases are on different servers. You may have to add the servers as linked servers using sp_addlinkedserver, or remove the linked-server prefixes if the databases are on the same server.
DECLARE @FactColCount int, @RepColCount int, @MatchCount int
SELECT @FactColCount = COUNT(*) FROM <FactoryServerName>.<FactoryDatabaseName>.dbo.sysobjects so INNER JOIN <FactoryServerName>.<FactoryDatabaseName>.dbo.syscolumns sc ON so.id = sc.id WHERE so.type = 'U' AND so.name = '<Factory Table name to check>'
SELECT @RepColCount = COUNT(*) FROM <WarehouseServerName>.<WarehouseDatabaseName>.dbo.sysobjects so INNER JOIN <WarehouseServerName>.<WarehouseDatabaseName>.dbo.syscolumns sc ON so.id = sc.id WHERE so.type = 'U' AND so.name = '<Warehouse Table name to check>'
SELECT @MatchCount = COUNT(*)
FROM <FactoryServerName>.<FactoryDatabaseName>.dbo.sysobjects so1, <FactoryServerName>.<FactoryDatabaseName>.dbo.syscolumns sc1, <WarehouseServerName>.<WarehouseDatabaseName>.dbo.sysobjects so2, <WarehouseServerName>.<WarehouseDatabaseName>.dbo.syscolumns sc2
WHERE so1.type = 'U' AND so1.id = sc1.id AND so1.name = '<Factory Table name to check>'
AND so2.type = 'U' AND so2.id = sc2.id AND so2.name = '<Warehouse Table name to check>'
AND sc1.name = sc2.name AND sc1.colorder = sc2.colorder AND sc1.length = sc2.length AND sc1.isnullable = sc2.isnullable
IF (@FactColCount <> @RepColCount)
SELECT 'ERROR: The number of columns (' + CONVERT(varchar(20), @FactColCount) + ') in Factory does not match the number of columns (' + CONVERT(varchar(20), @RepColCount) + ') in Warehouse'
ELSE IF (@FactColCount <> @MatchCount)
SELECT 'ERROR: The number of columns (' + CONVERT(varchar(20), @RepColCount) + ') in Factory and Warehouse matches, but some of the column attributes (name, order, length or nullability) do not match'
ELSE
SELECT 'SUCCESS: The tables are identical in structure'
GO
Note: The above query uses sysobjects and syscolumns, which are deprecated compatibility views from SQL Server 2008 onwards. Consider switching to sys.objects and sys.columns (or INFORMATION_SCHEMA.COLUMNS) as per your need.
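As one hedged alternative using the newer catalog views, assuming both databases sit on the same server (change the prefixes to linked-server form otherwise; the table names are placeholders), EXCEPT returns any Factory columns whose name, position, type, length or nullability differ from the Warehouse; run it in both directions, and an empty result both times means the schemas match:
SELECT c.name, c.column_id, t.name AS data_type, c.max_length, c.is_nullable
FROM <FactoryDatabaseName>.sys.columns c
INNER JOIN <FactoryDatabaseName>.sys.types t ON c.user_type_id = t.user_type_id
WHERE c.object_id = OBJECT_ID('<FactoryDatabaseName>.dbo.<FactoryTableName>')
EXCEPT
SELECT c.name, c.column_id, t.name, c.max_length, c.is_nullable
FROM <WarehouseDatabaseName>.sys.columns c
INNER JOIN <WarehouseDatabaseName>.sys.types t ON c.user_type_id = t.user_type_id
WHERE c.object_id = OBJECT_ID('<WarehouseDatabaseName>.dbo.<WarehouseTableName>')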
Verifying the data
Check the number of records in source and destination
A simple SELECT COUNT(*) FROM <Table Name>, with the needed WHERE condition, should be enough to check whether the source and destination have the same number of records. But we cannot rely on just this query. The reason: suppose one record is duplicated in the destination in place of another record. The record counts would still match, yet one source record is missing from the destination.
We may encounter two kinds of scenarios:
a. When the data in source and destination tables can be related using a Key
DECLARE @SrcRowCount int, @DestRowCount int
DECLARE @SrcExtraRows int, @DestExtraRows int
SELECT @SrcRowCount = COUNT(gcd.<ID Column>) FROM <Source Server>.<Source Database>.dbo.<SourceTable> gcd /* plus any other JOIN and WHERE conditions that are needed */
SELECT @DestRowCount = COUNT(*) FROM <WarehouseServerName>.<WarehouseDatabaseName>.dbo.<WarehouseTableName>
SELECT @SrcExtraRows = COUNT(*) FROM <Source Server>.<Source Database>.dbo.<SourceTable> gcd /* plus any other JOIN conditions that are needed; extra WHERE conditions can be ANDed below */
WHERE gcd.<ID Column> NOT IN
(
SELECT <ID Column> FROM <WarehouseServerName>.<WarehouseDatabaseName>.dbo.<WarehouseTableName> /* do not forget to filter out the Zero Key if the table has one */
)
SELECT @DestExtraRows = COUNT(*) FROM <WarehouseServerName>.<WarehouseDatabaseName>.dbo.<WarehouseTableName> WHERE <ID Column> NOT IN
(
SELECT gcd.<ID Column> FROM <Source Server>.<Source Database>.dbo.<SourceTable> gcd /* plus any other JOIN and WHERE conditions that are needed */
)
IF (@SrcRowCount <> @DestRowCount)
PRINT 'ERROR: The number of rows in the destination table does not match the number of rows in the source.'
ELSE IF (@SrcExtraRows > 0)
PRINT 'ERROR: Not all records of the source table have been migrated to the destination table.'
ELSE IF (@DestExtraRows > 0)
PRINT 'ERROR: There are extra Ids in the destination table that are not present in the source.'
ELSE
PRINT 'SUCCESS: The tables have the same row counts and the same Ids'
b. When the data in the source and destination tables cannot be related using a key (there may be cases where the destination is simply truncated and repopulated with the source data every time). In this case, the rows have to be compared on the full set of columns instead of a key; one possible approach is sketched below.
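A minimal sketch, assuming the source and destination tables expose the same column list (the table names are placeholders). EXCEPT returns the rows of the first SELECT that are absent from the second, so both directions need to run, and an empty result in both means the rowsets match:
SELECT * FROM <Source Server>.<Source Database>.dbo.<SourceTable>
EXCEPT
SELECT * FROM <WarehouseServerName>.<WarehouseDatabaseName>.dbo.<WarehouseTableName>
SELECT * FROM <WarehouseServerName>.<WarehouseDatabaseName>.dbo.<WarehouseTableName>
EXCEPT
SELECT * FROM <Source Server>.<Source Database>.dbo.<SourceTable>
Note that EXCEPT removes duplicate rows before comparing, so a separate COUNT(*) check is still needed to catch duplicated records.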
Verifying Straight Map columns
If there is only one source table for a destination warehouse table, identify all the columns that are just copied from the source and stored in the destination. The query below can be used to verify the column values.
SELECT TOP 10 'Exp StraightMapColumn1' = Source.StraightMapColumn1, 'Act StraightMapColumn1' = Destination.StraightMapColumn1
FROM <Source Server>.<Source Database>.dbo.<Source Table> Source
JOIN <WarehouseServerName>.<WarehouseDatabaseName>.dbo.<Destination Table> Destination ON Source.<Id Column> = Destination.<Id Column>
WHERE Source.StraightMapColumn1 <> Destination.StraightMapColumn1
If there is more than one source table from which a few columns are just copied into the destination table, we can group the columns by source table and use the above query for each group.
Verifying Derived Columns
Simple derivations like COALESCE can be tested directly using the same straight-map query with a few changes. If there are complex calculations for a column, it is always suggested to create a temporary table with only two columns: an Id column that can be used to map to the source and destination tables, and the actual column that needs to be tested. Following the logic given in the Functional Specification (FS), we populate the temporary table (this is similar to what the developers do, but the test team should also do it independently to validate the column values). Once the temporary table is populated, we can use the same straight-map query above to compare data between the temporary table and the destination table, as in the sketch below.
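A minimal sketch, assuming a hypothetical derived column that the FS defines as the COALESCE of two source columns; <ColumnA>, <ColumnB>, <DerivedColumn> and the other bracketed names are placeholders:
CREATE TABLE #Expected (<Id Column> int, DerivedValue varchar(100))
INSERT INTO #Expected (<Id Column>, DerivedValue)
SELECT gcd.<Id Column>, COALESCE(gcd.<ColumnA>, gcd.<ColumnB>, 'Unknown') /* populate the expected values per the FS logic */
FROM <Source Server>.<Source Database>.dbo.<SourceTable> gcd
SELECT TOP 10 'Exp DerivedColumn' = e.DerivedValue, 'Act DerivedColumn' = d.<DerivedColumn>
FROM #Expected e
JOIN <WarehouseServerName>.<WarehouseDatabaseName>.dbo.<WarehouseTableName> d ON e.<Id Column> = d.<Id Column>
WHERE e.DerivedValue <> d.<DerivedColumn>
DROP TABLE #Expected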
Verifying Zero Key row
There will usually be a record (though not necessarily in every warehouse) with a surrogate key value of 0. This record contains the default values for each column and must be validated against the values given in the Functional Specification (FS).
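A minimal check, assuming <Surrogate Key Column> is the warehouse table's surrogate key (a placeholder); the returned default values are then compared against those listed in the FS:
SELECT * FROM <WarehouseServerName>.<WarehouseDatabaseName>.dbo.<WarehouseTableName>
WHERE <Surrogate Key Column> = 0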
Conclusion
This document gives a basic idea of the approach that can be followed to test the finally loaded data; there may be many other approaches for this kind of application testing. Once you get used to it, you can refine the approach further by automating the test cases to pick up the column names automatically using dynamic SQL, as sketched below. Once accustomed to the queries, you can also start looking into the code for any kind of discrepancies that might occur due to data (e.g., looking for queries that might generate duplicate data, etc.).
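As one hedged illustration of that automation idea (the table and column names are placeholders, the variable-concatenation trick relies on common but undocumented T-SQL behavior, and <> will not flag rows where both sides are NULL), a per-column mismatch count can be generated from the catalog and executed dynamically:
DECLARE @sql nvarchar(max)
SET @sql = N''
SELECT @sql = @sql + N' UNION ALL SELECT ''' + c.name + N''' AS ColumnName, COUNT(*) AS Mismatches FROM dbo.<SourceTable> s JOIN dbo.<WarehouseTableName> d ON s.<Id Column> = d.<Id Column> WHERE s.' + QUOTENAME(c.name) + N' <> d.' + QUOTENAME(c.name)
FROM sys.columns c
WHERE c.object_id = OBJECT_ID('dbo.<WarehouseTableName>')
SET @sql = STUFF(@sql, 1, 11, N'') /* strip the leading ' UNION ALL ' */
EXEC sp_executesql @sql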
Monday, April 13, 2009
Project Vs Product Testing
Imagine a situation where you have been testing a project and have just completed it. Now you have a new assignment, and this time you are going to test a product.
What kind of shift you need to make in your approach, scope and areas of importance between testing a project and testing a product is what this whitepaper discusses.
Normally a project will be specific to a single client's requirements. Everything in a project revolves around that one client: his requirements, pain points, etc.
A project can be executed using any of the SDLC models, but the most preferred models for executing a project are Iterative, Prototype and Agile.
Notice that the Waterfall model is not among the preferred SDLCs for executing a project. This is because the client's requirements keep changing as time passes, and these changes need to be absorbed by the project team so that the final delivery is made as per the changed requirements.
Listed below are the various types of testing to be given importance and taken into consideration when testing a project.
Functionality Testing:
The functionality of the application is of utmost importance in the case of a project, and it should exactly match the project specifications or requirements.
User Interface Testing:
The user interface has to be tested for boundaries, tab order, any limitations on the data to be entered, punctuation, the formats to be followed for data entry, ease of use, spelling and grammar, etc.
Security Testing: Nowadays more emphasis is given to security due to the various security threats a project has to deal with once it is live in production. The application has to be tested for security loopholes and unauthorized access in both the front end and the back end. Some examples of security checks are cross-site scripting and SQL injection.
Performance and Load Testing:
These should be given due importance while testing a project. Performance benchmarks and the expected user load, along with the mix of users, should be clearly identified and defined before starting the testing. Load and performance test tools are best suited for this kind of testing.
Concurrency Testing: Concurrency plays a vital role in financial-domain applications, and it is important for non-financial-domain applications as well. This testing is a little difficult to perform manually, so using automation tools is a good option. Concurrency scenarios should be clearly defined, and the automation tool should be configured properly to get the best results.
When we talk about a product, the requirements will be a common pool gathered from several clients/end users, then prioritized and frozen for a release/version.
The most preferred SDLC models for executing a product are Waterfall and Incremental.
Listed below are the various types of testing to be taken into consideration when testing a product, besides the normal functional, UI, load and performance testing.
Installation Testing:
A product should be tested for proper installation, focusing on whether it installs correctly based on the instructions provided in the installation guide. The installation should be simple and should involve minimal human intervention for greater customer satisfaction.
Fail Over/Recovery Testing: Every product should handle failures gracefully and recover from them with no or minimal loss of data. The main focus of this testing is to simulate the possible failure scenarios and observe how each failure is handled, how recovery is made once the system is back to normal, and whether there is any data loss in the failure and recovery process.
Help/User Guide:
For any product, the Help document or User Guide is its heart, as the product can be purchased and used by anyone, and the first thing a user will refer to, whether for using the product or for solving a problem while using it, is the Help or User Guide. These documents should therefore be tested thoroughly for completeness and clarity, and all navigation, redirections and links should be checked for proper functioning.
Portability Testing: In this testing we check whether the product works correctly on various configurations, operating systems, hardware, etc.
User Friendliness/Usability: In this testing we check how easy or difficult the product is for the user to use. Can people with disabilities use the product easily? Are any tools or options provided so that disabled users can use the application without problems?
Copyright Information and Packaging: Good packaging of a product attracts the interest of more users. In the case of a product, copyright information is of utmost importance; the organization can land in legal trouble if the copyright information is not proper.
Privacy and Confidentiality: These days a lot of emphasis is given to the privacy and confidentiality of the information handled by a product, so this aspect should be given due importance when testing a product.
Parallel Testing:
This type of testing comes into the picture, and should be given due importance, when a product developed in one technology is redeveloped or migrated to a new technology. In this testing we compare the functional output of the new product with the output of the old product: for the same functional scenarios, we should get the same output in both versions.
Below is a snapshot of some differences between a project and a product:
Project | Product
Requirements will be specific to a single client | Requirements will be derived from a common pool of several clients/end users
Iterative, Prototype and Agile SDLC models are most preferred | Waterfall and Incremental SDLC models are most preferred
Requirements will be dynamic and may change very frequently | Requirements will be constant most of the time
The project will be delivered to the client directly | The product will be released in the market and can be purchased by anyone
Test automation is not encouraged where the requirements change very frequently | Test automation helps immensely in testing a product
Projects are planned for immediate use, with less focus on future extensibility or versions | Products are planned to have several newer versions covering years to come
Project durations are shorter in comparison with products (maintenance/sustenance are exceptions) | Product durations are generally longer in comparison with projects