Recently I was reviewing an SSIS package written by one of my team members. The package was pretty simple: it connected to a single SQL Server database and had two tasks:
- Execute SQL Task: a few SQL statements that set up some data used by the next task.
- Data Flow Task: selecting data from one table and loading it into another.
Interestingly, I found that the “Execute SQL Task” contained a few SQL statements with no transaction control, i.e., no BEGIN TRAN or COMMIT/ROLLBACK TRAN. So in this blog I want to highlight the impact of writing SQL in the task without transaction control. This applies to SQL code wherever it can be written (stored procedures, queries in SSMS, SSIS tasks), but the impact is especially likely to go unnoticed in an SSIS task. For details, read on…
Let’s say we have a batch of multiple SQL statements, without transaction control, executed from the Execute SQL Task. Now we are going to analyze the impact on data quality if this batch fails at some point. Here we face a challenge: it is difficult to determine what was committed and what was rolled back, because SQL Server treats compile-time errors and run-time errors differently. What I mean by this:
1. SQL Server terminates the batch as soon as a compile-time error occurs.
2. SQL Server may or may not terminate the batch when a run-time error occurs. Note the “may or may not”.
Let’s see this in action…
Create a test table as follows:
CREATE TABLE Test_ExecSQLTask ( ID INT);
Let’s execute the following statements:
insert into Test_ExecSQLTask values ( 10 );
insert into Test_ExecSQLTask values ( 999999999999999 );
insert into Test_ExecSQLTask values ( 20 );
insert into Test_ExecSQLTask values ( 'Thirty' );
insert into Test_ExecSQLTask values ( 40 );
Let’s see what is in the table:
select * from Test_ExecSQLTask
Only the 1st and 3rd statements went through. When the 2nd failed, the 3rd still executed; but when the 4th failed, the 5th did not. Why did this happen?
This is due to what is happening in the background:
The default transaction mode for the Database Engine is autocommit. In this mode, each SQL statement is executed in its own transaction: every statement that succeeds is committed immediately, and every statement that fails is rolled back individually. So in a batch of five insert statements, if the 4th statement fails, the rows from the earlier successful statements are already committed by then and show up in the table.
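To make such a batch all-or-nothing, we can manage the transaction explicitly. Here is a minimal sketch over the same test table (TRY/CATCH is available in SQL Server 2005 and later):

```sql
BEGIN TRY
    BEGIN TRAN;
    insert into Test_ExecSQLTask values ( 10 );
    insert into Test_ExecSQLTask values ( 999999999999999 );  -- fails here
    insert into Test_ExecSQLTask values ( 20 );
    COMMIT TRAN;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRAN;  -- undo any rows inserted before the error
END CATCH;
-- The overflow error transfers control to the CATCH block, the
-- transaction is rolled back, and no rows from this batch remain.
```

With this pattern, the question “what was committed before the failure?” has a simple answer: nothing.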
- insert into Test_ExecSQLTask values ( 999999999999999 );
This failed because the value is too big to be stored in the INT data type. It is a run-time error, but it did NOT stop the batch: the arithmetic overflow terminates only that statement, and execution continues with the next one.
- insert into Test_ExecSQLTask values ( 'Thirty' );
This failed for another reason: the string 'Thirty' cannot be converted to INT. This is also a run-time error, but it DID stop the batch; conversion errors are batch-aborting, which is why the 5th statement never ran.
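One way to remove this inconsistency is SET XACT_ABORT ON, which makes run-time errors terminate the batch and roll back any open transaction. A sketch, again over the same test table:

```sql
SET XACT_ABORT ON;

BEGIN TRAN;
insert into Test_ExecSQLTask values ( 10 );
insert into Test_ExecSQLTask values ( 999999999999999 );  -- batch aborts here
insert into Test_ExecSQLTask values ( 20 );
COMMIT TRAN;
-- With XACT_ABORT ON, the overflow aborts the whole batch and rolls
-- back the open transaction, so none of these rows are committed.
```

Under XACT_ABORT, both the overflow and the conversion error behave the same way, so we no longer need to memorize which errors are statement-terminating and which are batch-aborting.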
Thus, as a good practice, when using the Execute SQL Task in SSIS, we should:
- Manage the transaction explicitly in the Execute SQL Task. If we choose not to, that should be a conscious decision: either a package failing in the middle of execution has no adverse impact on the data, or at least we know what the impact is.
- Examine the impact of a failing Execute SQL Task, especially one with no transaction control, in the big scheme of the complete package. As packages grow in complexity and multiple developers work on the same package over time, we may lose sight of these problems.
- If a stored procedure is called from the “Execute SQL Task”, make sure the stored procedure manages the transaction. If the transaction is not managed inside the stored procedure, we should wrap the call in a transaction in the SSIS task. This is important because the same stored procedure may be called by other modules/apps.
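Here is a sketch of wrapping such a call in the Execute SQL Task, assuming a hypothetical procedure dbo.usp_LoadStagingData that does not manage its own transaction (THROW requires SQL Server 2012 or later; use RAISERROR on older versions):

```sql
SET XACT_ABORT ON;  -- any run-time error rolls back the whole transaction

BEGIN TRY
    BEGIN TRAN;
    EXEC dbo.usp_LoadStagingData;  -- hypothetical procedure name
    COMMIT TRAN;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRAN;
    THROW;  -- re-raise so the Execute SQL Task (and the package) fails
END CATCH;
```

Re-raising the error in the CATCH block matters: if we swallow it, the Execute SQL Task reports success and the package carries on as if nothing went wrong.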
Please let me know your feedback …