SQL Server - How to Remove Control Characters and Junk from Column

Asked By Russell Mangel on 09-Mar-09 07:57 AM
Hi,

I'm using SQL Server 2005 Standard with SP3.

I have some VARCHAR columns that have some control characters in them
0x0D,0x0A,0x09, (CR, LF, Tab). I would like to UPDATE the
column by replacing them with nothing ''.

Question:
======
How would I make a query check for all control characters,
from 0x00-0x1F or (0 - 31) Decimal, and replace with nothing ''?

The following query works but it only does the 3 previously mentioned
control characters.
Maybe you have a shorter, or smarter query, for removing control characters?

UPDATE
I2DE
SET DesNM = REPLACE(REPLACE(REPLACE(DesNM, CHAR(0x0D), ''), CHAR(0x0A), ''),
CHAR(0x09), '')
FROM I2DE
WHERE (CHARINDEX(CONVERT(char(1),0x09), DesNM) > 0) OR
(CHARINDEX(CONVERT(char(1),0x0D), DesNM) > 0) OR
(CHARINDEX(CONVERT(char(1),0x0A), DesNM) > 0)

Thanks

Russell Mangel
Las Vegas, NV




Tom Moreau replied on 09-Mar-09 08:37 AM
Consider writing a CLR-based, scalar User-Defined Function (UDF) that strips
out the control characters and returns an appropriate string.  Then use that
in the SET clause of your UPDATE statement.

--
Tom

----------------------------------------------------
Thomas A. Moreau, BSc, PhD, MCSE, MCDBA, MCITP, MCTS
SQL Server MVP
Toronto, ON   Canada
https://mvp.support.microsoft.com/profile/Tom.Moreau


Hi,

I'm using SQL Server 2005 Standard with SP3.

I have some VARCHAR columns that have some control characters in them
0x0D,0x0A,0x09, (CR, LF, Tab). I would like to UPDATE the
column by replacing them with nothing ''.

Question:
======
How would I make a query check for all control characters,
from 0x00-0x1F or (0 - 31) Decimal, and replace with nothing ''?

The following query works but it only does the 3 previously mentioned
control characters.
Maybe you have a shorter, or smarter query, for removing control characters?

UPDATE
I2DE
SET DesNM = REPLACE(REPLACE(REPLACE(DesNM, CHAR(0x0D), ''), CHAR(0x0A), ''),
CHAR(0x09), '')
FROM I2DE
WHERE (CHARINDEX(CONVERT(char(1),0x09), DesNM) > 0) OR
(CHARINDEX(CONVERT(char(1),0x0D), DesNM) > 0) OR
(CHARINDEX(CONVERT(char(1),0x0A), DesNM) > 0)

Thanks

Russell Mangel
Las Vegas, NV
Plamen Ratchev replied on 09-Mar-09 08:45 AM
Here is one method. Note this assumes the key column in the table is
named 'keycol', just replace with the real key column.

WITH Num1 (n) AS (SELECT 1 UNION ALL SELECT 1),
Num2 (n) AS (SELECT 1 FROM Num1 AS X, Num1 AS Y),
Num3 (n) AS (SELECT 1 FROM Num2 AS X, Num2 AS Y),
Num4 (n) AS (SELECT 1 FROM Num3 AS X, Num3 AS Y),
Nums (n) AS (SELECT ROW_NUMBER() OVER(ORDER BY n) FROM Num4),
UpdateCTE
AS
(SELECT keycol, DesNM,
(SELECT CASE WHEN ASCII(SUBSTRING(DesNM, n, 1))
BETWEEN 0x00 AND 0x1F
THEN ''
ELSE SUBSTRING(DesNM, n, 1)
END + ''
FROM I2DE AS B
JOIN Nums
ON n <= LEN(DesNM)
WHERE B.keycol = A.keycol
FOR XML PATH('')) AS DesNM_clean
FROM I2DE AS A)
UPDATE UpdateCTE
SET DesNM = DesNM_clean;

--
Plamen Ratchev
http://www.SQLStudio.com
Russell Mangel replied on 09-Mar-09 09:49 AM
Yeah,
this kind of thing is probably best done in C#. It's nasty to do in TSQL.

The query will just be used to clean up the columns after I fix the real
problem, and that is validating data before it gets to the database.
It's a ten-year-old schema / program, which I didn't write, you know the
story.

Thanks for reminding me of the CLR.

Russ.
Russell Mangel replied on 09-Mar-09 09:54 AM
Very interesting query.

it is been awhile since I have seen a query that I did not understand.

I will have to study up on what you have done here. You got me scratching my
head.

Thanks

Russ
Plamen Ratchev replied on 09-Mar-09 10:04 AM
There are a couple parts to it, one is generating a table with numbers
on the fly (using the NumX CTEs), then slicing the column to individual
characters, checking each if it is in the range then skipping it,
finally concatenating back to a single value using FOR XML PATH.

--
Plamen Ratchev
http://www.SQLStudio.com
Julie Collins replied to Plamen Ratchev on 29-Mar-10 02:23 PM
That solved a problem for me Plamen. Thank you.