SQL Server - How to skip a bad-data row in the input CSV file

Asked By John Kotuby on 11-Mar-09 04:02 PM
Hi all,
I have previously created some DTS packages, in fact a rather large complex
project, in SQL 2000. I am trying to create a package in SSIS 2005 and am
finding it to be more difficult than the older DTS designer.

I have what should be a simple problem to overcome but cannot find the
solution in SSIS. Importing a comma delimited CSV flat file into a SQL table
is failing because a few rows do not have the correct number of Row
delimiters.

I would like to (for now) simply skip the bad rows and let the package
continue. I have changed the property of "Fail Package on Error" to "NO". I
have set the allowable errors to 50. But still the package fails on the
first bad source row.

I don't have the time to get a "formal" education in SSIS, as that would
cover many more topics than I actually need to know to get my current
project completed.

Can anyone tell me the simple way to allow the package to skip source rows
if they have bad data that cause an error?

I tried to create an OnError event handler but I didn't get very far with
that.

I suppose I can always re-install SQL Server 200 on my development machine
just to get the job done...but I hope that somebody can help me.

Thanks...




Todd replied on 11-Mar-09 05:01 PM
John:
Try this:
Add a Variable of Type Int32 to your Package. You will use it later.
Add a Row Count transform to the Data Flow.
Drag the RED arrow coming from the Source adapter to the Row Count. You will
get a dialog box pop up. Set every option to "Redirect Row"
In the Row Count, set it to the Variable (last property on the page)

see if that works.
--
Todd C

[If this response was helpful, please indicate by clicking the appropriate
answer at the bottom]
John Kotuby replied on 11-Mar-09 05:18 PM
I will try that Todd and let you know of my results
Thanks...
John Kotuby replied on 12-Mar-09 03:39 PM
Todd,
No luck with that attempt. I'm pretty sure I did everything you mentioned.
I tried redirecting failed rows to a Row Count transform which redirects the
output to a variable of INT32.
The red line from the mdc_msc.csv source is even labelled "Flat File Source
Error Output".

However, the package is stall halted with a Fatal error.
It is clear to me after looking at the CSV file in a text editor that on row
1032 (of 543,000) there is a
missing comma delimeter. Of course I could fix that line only to discover
another 20 or so bad records.

This package must be run daily to update a working database. It must run
unattended and merely keep track of
failed rows.

I get the "input" files from a 3rd party so cannot fix whatever might be
causing the error in the CSV file
used for input.

Below I have typed the error messages.

Error: The column delimeter for column "Column 13" was not found.
Error: An error occurred while processing file "mdc_msc.csv" on data row
2003.
Error: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED. The PrimeOutput method on
component "Source -
mdc_msc_csv"(1) returned error code 0xC0202092. The componenet returned a
failure code when the pipeline
engine called PrimeOutput(). The meaning of the failure code is defined by
the component, but the error is
fatal and the pipeline stopped executing. There may be error messages posted
before this with more
information about the failure.
Error: SSIS Error code DTS_E_THREADFAILED. Thread "SourceThread0" has exited
with error code 0xC0047038.

Thanks for the input. I wish I could present this to a Microsoft SSIS
expert.
Hrock replied on 12-Mar-09 04:41 PM
I have the same issue in fact i changed from using the bulk insert task to
using an execute sql tasks that calls a stored proc that does the bulk
inserts and in the stored proc i speficy maxerros= as welll as errorfile=.

bulk insert dumps the bad rows into the datafile and the bulk insert
succeeds however the ouput from bulk insert list all the errors that it
encountered and dumped into the error file and ssis doesnt know how to handle
this information and it hangs on me.

So. if anyone can figure out how to keep ssis running after the bulk insert
errors that would be of great help! thanks!
John Kotuby replied on 12-Mar-09 05:30 PM
Hrocks,

Thanks for recommending a workaround.
I will try what you have suggested and hope we both get an answer.
I just read about the fact that .NET 3.5 SP1 can break some existing
packages causing failure and generating eror messages very similar to the
ones I receive.

Did you have to stet up a "Shared Server" for the flat-file input in order
to get at it with T-SQL code?
Hrock replied on 12-Mar-09 07:54 PM
Just a connection manager that points to the server that holds my stored
proc. in the execute sql task i specified the connection and in the sql
statement is where i exec the stored proc and pass in the params for filename
etc.

I keep all my file names, table names etc in a table in the db and i just
load those into a result set using an execute sql task and later map the
variables in whatever task i am running.
John Kotuby replied on 16-Mar-09 08:56 AM
Thanks so much,
I got the Bulk Insert to work once I had defined a Format File for the CSV
that accounted for different Field Terminators due to the fact that the
character data in the CSV is delimited by quotes. Thus for an integer field
followed by a character field the terminator would be ' ," ' (comma-quote),
2 consecutive character fields are defined ' "," ' (quote-comma-quote) and
because the last field in the row is character ' "\r\n '  (quote-carriage
return-line feed).

Unfortunately when I use a Format file I don't get an error output file. But
at least the import runs to completion.