SQL Server - delete only 9 of 10 rows where colX = 'abc'

Asked By Ric on 25-Mar-08 11:48 AM
tblX contains 1000 rows.   10 of the rows contain the value 'abc' in colX.  I
need to delete 9 of these 10 rows.

Delete tblX Where colX = 'abc'

This will delete all 10 rows.  How can I make it only delete 9 of the 10
rows? Or if there were 23 rows, I would need to delete 22 rows leaving only 1
distinct row.   What is the tsql to do this?

Thanks,
Rich




--CELKO-- replied on 28-Mar-08 12:31 AM
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. What does the key for the table look like?  Do you
even have a key? etc.

If you want to learn how to ask a question on a Newsgroup, look at:
http://www.catb.org/~esr/faqs/smart-questions.html
Plamen Ratchev replied on 25-Mar-08 11:54 AM
You did not define if there are any other columns that can determine which
rows need to be deleted. Here is one generic way on SQL Server 2005:

;WITH Dups (rn)
AS
(SELECT ROW_NUMBER() OVER(
PARTITION BY colX
ORDER BY colX)
FROM tblX)
DELETE Dups
WHERE rn > 1;

HTH,

Plamen Ratchev
http://www.SQLStudio.com
dave ballantyne replied on 25-Mar-08 11:59 AM
Are you trying to clean the table of all duplicates ???
Or only where ColX='abc'

Untested but ...

Declare @DelRows integer
Select @DelRows = count(*)
from  tblX where ColX = 'abc'

Select @DelRows = @DelRows -1
if(@DelRows >0) begin
set rowcount @DelRows
delete from tblX where ColX = 'abc'
set rowcount 0
end


will do what you ask but may not be 'best'

Dave
Ric replied on 25-Mar-08 12:10 PM
Sql Server 2000

Create Table tblX(colX varchar(3))
Insert Into tblX
Select 'abc' union all
select 'abc' union all
Select 'abc' union all
select 'abc' union all
Select 'abc' union all
select 'abc' union all
Select 'abc' union all
select 'abc' union all
Select 'abc' union all
Select 'abc' union all
Select 'def' union all
Select 'def' union all
Select 'def' union all
Select 'def' union all
Select 'def' union all
Select 'def' union all
Select 'def' union all
Select 'def' union all
Select 'def' union all
Select 'def' union all
Select 'def' union all
Select 'def'

I need to delete all but one of the rows where colX contains 'abc'.
Aaron Bertrand [SQL Server MVP] replied on 25-Mar-08 12:18 PM
Your question is still not specific enough.  It could be one of at least
three options:

a) regardless of what is in any other row, what you want to end up with is a
table with exactly one row, containing 'abc'
b) you want to delete all the 'abc' values except one, leaving all the 'def'
duplicates intact
c) you want to delete all the 'abc' values except one, all the 'def' values
except one, etc. etc.

If it is a) then here is a silly suggestion:

DELETE tblX;
INSERT tblX SELECT 'abc';

If it is b) then another silly suggestion:

DELETE tblX WHERE colX = 'abc';
INSERT tblX SELECT 'abc';

If it is c) then you can do b) in a loop, repeating for each value of colX
that you want to remove duplicates for.
Plamen Ratchev replied on 25-Mar-08 12:41 PM
Assuming you mean to clean duplicates for all values (not only 'abc'), you
can select the distinct value into a temp table, clear the original table,
and then reinsert the clean values back:

SELECT DISTINCT colX
INTO #Temp
FROM tblX

DELETE tblX

INSERT INTO tblX (colX)
SELECT colX
FROM #Temp

If you need to remove duplicates only for the 'abc' value then add WHERE
colX = 'abc' to the first two queries.

HTH,

Plamen Ratchev
http://www.SQLStudio.com
Ric replied on 25-Mar-08 12:47 PM
Thank you all for your replies.  Sorry about the ambiguity of the question.
Yes, I wan to remove duplicates.  It is actually a simple table.  I just
wanted to make sure that there wasn't some sophisticated tsql I was not aware
of before I did the spaghetti code thing (looping).  But it looks like that
is how to do it.


Actually (and sorry for being too lazy to include it) my table has an
Identity column, and I was actually thinking that maybe there was a join
thing I could do  where I could select top N -1 from tblx as t2 Join tblX as
t1 on t2.Ident = t1.Ident

pseudo code here

Declare @A int
Select @A =  Count(*) From tblX Where colX = 'abc'
Delete from tblx from tblx t1 join (Select top @A - 1 t2 from tblx) t2 on
t1.Ident = t2.Ident

But it looks like I would still have to loop for all the different values of
colX.  It also looks like I would have to using dynamic sql for the Select
top @A - 1 part.  Is there any potential for this method?
--CELKO-- replied on 28-Mar-08 12:31 AM
This is not a table by definition; there is no key in it.  What you
should have for this kind of thing is:

CREATE TABLE Foobar
(silly_string CHAR(3) NOT NULL PRIMARY KEY,
occurrence_cnt INTEGER DEFAULT 1 NOT NULL
CHECK (occurrence_cnt > 0));

INSERT INTO Foobar
VALUES ('abc', 10), ('def', 13);

Now you update the table:
UPDATE Foobar
SET occurrence_cnt = 1
WHERE silly_string = 'abc';

You are missing the whole idea of RDBMS and are thinking in very
physical terms.
Aaron Bertrand [SQL Server MVP] replied on 25-Mar-08 12:50 PM
Why not just

SELECT DISTINCT colX INTO #foo FROM tblX;
TRUNCATE TABLE tblX; -- resets identity too
INSERT tblX(colX) SELECT colX FROM #foo;
DROP TABLE #foo;


Now, you probably want to add a unique constraint to colX so that you don't
have to repeat this exercise tomorrow, the next day, etc.

A
Aaron Bertrand [SQL Server MVP] replied on 25-Mar-08 12:53 PM
Hey Joe, did you try this on SQL Server 2000?  This type of construct is
supported in SQL Server 2008, but the op stated that he is not using SQL
Server 2008.
Plamen Ratchev replied on 25-Mar-08 01:03 PM
You can remove the duplicates with a single query based on the key column:

DELETE tblX
WHERE EXISTS
(SELECT *
FROM tblX AS A
WHERE A.colX = tblX.colX
AND A.ident < tblX.ident)

HTH,

Plamen Ratchev
http://www.SQLStudio.com
Aaron Bertrand [SQL Server MVP] replied on 25-Mar-08 01:05 PM
Good one.  And if you want to keep the minimum vs. maximum ident value for
each unique value of colX then you can swap < and > in the last line...
Tony Rogerson replied on 25-Mar-08 01:11 PM
And just how on earth is occurance_cnt ever going to be > 1?

How on earth does your ddl prevent duplicates from getting in?


You need to learn the ability to do rudementary testing as well as adjust
your ignorant, arrogant and condecending attitude on this forum.

Oh - try posting stuff that works on MICROSOFT SQL SERVER might help as
well.

--
Tony Rogerson, SQL Server MVP
http://sqlblogcasts.com/blogs/tonyrogerson
[Ramblings from the field from a SQL consultant]
http://sqlserverfaq.com
[UK SQL User Community]
Plamen Ratchev replied on 25-Mar-08 01:15 PM
Works fine but this can be very slow on a large table with many duplicates.
Then a better approach would be to SELECT INTO with MIN or MAX value for the
key to another table.

Plamen Ratchev
http://www.SQLStudio.com
Aaron Bertrand [SQL Server MVP] replied on 25-Mar-08 01:20 PM
Right, that's what I suggested.  And since the use of IDENTITY doesn't
really imply that those values mean anything, I assumed it was ok to reset
it and start over (especially since the op didn't seem to care which
identity value was retained).
Ric replied on 25-Mar-08 01:36 PM
Thank you all again for these great replies.  It looks like what I had in
mind was something along this suggestion

DELETE tblX
WHERE EXISTS
(SELECT *
FROM tblX AS A
WHERE A.colX = tblX.colX
AND A.ident < tblX.ident)


Actually, I am creating a whole new system for my place from scracth.  So
the table (tables) is/are fairly new and don't have too much data.  Of
course, I will be adding all sorts of constraints to prevent these
duplicates, but in the initial development phase I had to allow for dups.
While I am at it -- I was trying to retrieve @@Identity from a
dataAdapter.InsertCommand using text rather than a stored procedure:

da.InsertCommand.CommandType  = CommandType.Text
da.InsertCommand.CommandText = "Insert Into tblCompanies(CoID, CoName)
Select @CoID, @CoName; Select @@Identity"
...
i = da.Update(ds, "tblx")

This did not capture @@Identity.   It seems to only work if I use a
SelectCommand

i = CInt(da.SelectCommand.ExecuteScalar)

So I went with the Store Proc and used an output param to capture
@@Identity.  Anyone know if it is possible to retrieve @@Identity using
da.InsertCommand using CommandType.Text instead of
CommandType.StoredProcedure?  The goal is to minimize how many SPs I add to
this new DB.  The old one had hundreds (maybe more than that) -- difficult to
manage.
--CELKO-- replied on 28-Mar-08 12:32 AM
I know, I know; but I work with other SQLs that have been up to
Standards for years and forget to "dumb down" my code sometimes.
--CELKO-- replied on 28-Mar-08 12:32 AM
Get a book on data modeling and look at "commodity entities" (or some
such thing -- authors differ on the terms) in the index.  The proper
question, is "how can occurance_cnt ever going to be < 0?"  and "What
is the scale that measures it?"

The idea is that some entities are, for our purposes, identical.  The
individuals are not identified, but an aggregation is given an
identifier and a quantifier (unfortunate term because it conflicts
with predicate logic terms).   English  distinguishes between separate
units measured on an absolute scale (machine screws, "many", a count)
and consolidated entities (water and sand, "more", a volume) measured
on a ratio scale.  I am assuming an absolute scale given no other
information.


You really don't understand the commodity data concept, do you? I
cannot post a 1-2 hour lecture on scales and measurements in a
Newsgroup, but you can Google something.


Yes, I can clearly see who is doing those things in this posting :)


It is that 2000, 2005 or 2008 MS SQL Server? :)
Tony Rogerson replied on 25-Mar-08 04:14 PM
Given your DDL....

CREATE TABLE Foobar
(silly_string CHAR(3) NOT NULL PRIMARY KEY,
occurrence_cnt INTEGER DEFAULT 1 NOT NULL
CHECK (occurrence_cnt > 0));

INSERT INTO Foobar
VALUES ('abc', 10), ('def', 13);

Now you update the table:
UPDATE Foobar
SET occurrence_cnt = 1
WHERE silly_string = 'abc';

Demonstrate to me how that prevents duplicates which is what the OP is
trying to do.


Demonstrate to me how your DDL prevent duplicates from getting in.


What, 2008 released? Wow - I missed that one.

Show me how that syntax works on the currently released product and not some
Community Technical Preview or are you telling us that it's fine to put
systems into production on beta software?

--
Tony Rogerson, SQL Server MVP
http://sqlblogcasts.com/blogs/tonyrogerson
[Ramblings from the field from a SQL consultant]
http://sqlserverfaq.com
[UK SQL User Community]
Tony Rogerson replied on 25-Mar-08 04:16 PM
Really, you spend 99% of your time on this forum which is MICROSOFT SQL
SERVER; do you have a program that blanks that out or something?

Why do you find it so difficult to post code that works with the product
this forum is based around?

Only the truly ignorant and arrogant would continue to make this fundemental
error.

--
Tony Rogerson, SQL Server MVP
http://sqlblogcasts.com/blogs/tonyrogerson
[Ramblings from the field from a SQL consultant]
http://sqlserverfaq.com
[UK SQL User Community]
Uri Dimant replied on 26-Mar-08 04:11 AM
Rich
http://dimantdatabasesolutions.blogspot.com/2007/02/dealing-with-duplicates.html
david replied on 28-Mar-08 12:33 AM
I think the trick was making silly_string a primary key.    : )

David