SQL Server - Check constraint advice for date range overlaps

Asked By srowlan on 08-Jun-07 05:48 PM
I have this table: (I know how Mr. Celko likes the DDL!)

CREATE TABLE [dbo].[ProgramYears](
[ProgramYear] [int] NOT NULL,
[StartDate] [smalldatetime] NOT NULL,
[EndDate] [smalldatetime] NOT NULL,
[TS] [timestamp] NOT NULL,
CONSTRAINT [PK_ProgramYears] PRIMARY KEY CLUSTERED
(
[ProgramYear] ASC
) ON [PRIMARY]
) ON [PRIMARY]

GO
USE [HCSWAP]
GO
ALTER TABLE [dbo].[ProgramYears]  WITH CHECK ADD  CONSTRAINT
[EnsureNoOverlaps] CHECK  (([dbo].[WillCreateOverlap]([ProgramYear],
[StartDate], [EndDate]) = 0))
GO
ALTER TABLE [dbo].[ProgramYears]  WITH NOCHECK ADD  CONSTRAINT
[StartDateLessThanEndDate] CHECK  (([StartDate] <= [EndDate]))
GO
ALTER TABLE [dbo].[ProgramYears]  WITH CHECK ADD  CONSTRAINT
[YearIsInRange] CHECK  (([ProgramYear] >= datepart(year,[StartDate])
and [ProgramYear] <= datepart(year,[EndDate])))

INSERT INTO dbo.ProgramYears (ProgramYear, StartDate, EndDate) VALUES
(1994, '10/1/1993 12:00 AM','9/30/1994 11:59 PM')
INSERT INTO dbo.ProgramYears (ProgramYear, StartDate, EndDate) VALUES
(1995, '10/1/1994 12:00 AM','9/30/1995 11:59 PM')
INSERT INTO dbo.ProgramYears (ProgramYear, StartDate, EndDate) VALUES
(1996, '10/1/1995 12:00 AM','9/30/1996 11:59 PM')
INSERT INTO dbo.ProgramYears (ProgramYear, StartDate, EndDate) VALUES
(1997, '10/1/1996 12:00 AM','9/30/1997 11:59 PM')
INSERT INTO dbo.ProgramYears (ProgramYear, StartDate, EndDate) VALUES
(1998, '10/1/1997 12:00 AM','9/30/1998 11:59 PM')
INSERT INTO dbo.ProgramYears (ProgramYear, StartDate, EndDate) VALUES
(1999, '10/1/1998 12:00 AM','9/30/1999 11:59 PM')
INSERT INTO dbo.ProgramYears (ProgramYear, StartDate, EndDate) VALUES
(2000, '10/1/1999 12:00 AM','9/30/2000 11:59 PM')
INSERT INTO dbo.ProgramYears (ProgramYear, StartDate, EndDate) VALUES
(2001, '10/1/2000 12:00 AM','9/30/2001 11:59 PM')

The EnsureNoOverlaps constraint calls a SQL Scalar function that
checks to make sure that the date range being inserted (or updated)
doesn't overlap (and is not overlapped by) any other date ranges,
returns a 1 if it will create an overlap, 0 if it won't.
Here is the TSQL of the function:

ALTER FUNCTION [dbo].[WillCreateOverlap]
(
@ProgramYear  INT,
@StartDate DATETIME,
@EndDate DATETIME
)
RETURNS BIT
AS
BEGIN

DECLARE @Result BIT


IF
(@EndDate < ALL
(SELECT StartDate
FROM dbo.EAPProgramYears
WHERE
ProgramYear <> @ProgramYear
AND
@StartDate < StartDate
)
)
AND
(@StartDate > ALL
(SELECT EndDate
FROM dbo.EAPProgramYears
WHERE
ProgramYear <> @ProgramYear
AND
@EndDate > EndDate
)
)
BEGIN
SET @Result = 0
END
ELSE
BEGIN
SET @Result = 1
END

RETURN @Result

END

I really don't like the idea of calling a function from a constraint
(for one thing, I have to drop the constraint from the table if I need
to make a change to it). But if I try to just put the logic of the
function in the expression for the constraint, I get the "Subqueries
are not allowed in contraints..." error.

Is there anyway to implement this sort of logic just in the contraint
expression?
I need to make sure noone enteres something like:
INSERT INTO dbo.ProgramYears (ProgramYear, StartDate, EndDate) VALUES
(2002, '10/1/2000 12:00
AM','9/30/2002 11:59 PM')
or
INSERT INTO dbo.ProgramYears (ProgramYear, StartDate, EndDate) VALUES
(2002, '8/1/2001 12:00 AM','9/30/2002 11:59 PM')
Thank you!




--CELKO-- replied on 09-Jun-07 02:08 PM
Everyone likes DDL.  So, let me yell at you for the lack of ISO-8601
temporal formats in the same data to keep up my "Simon Kowl" image.

I would also add a table constraint that (start_date < end_date).
Also, if start and end dates are each unique, that removes some
overlapping situations, where durations start and/or end together.

This is a nice piece of code.  You know about the ALL() predicate,
which most SQL programmers do not! I am impressed.  Yet, you write
code with procedural IF-THEN-ELSE constructs and low-level BIT data
types as if you were still in an old 3GL.  Let's make it more SQL and
less T-SQL so the code will port and the optimizer can do its job
better.

CREATE FUNCTION WillCreateOverlap
(@my_program_year INTEGER,
@my_start_date DATETIME,
@my_end_date DATETIME)
RETURNS CHAR(1); -- portable data type
AS
RETURN
(CASE WHEN
@my_end_date
FROM EAP_ProgramYears
WHERE program_year <> @my_program_year
AND @my_start_date < start_date)
AND @my_start_date
FROM EAP_ProgramYears
WHERE program_year <> @my_program_year
AND @my_end_date > end_date)
THEN 'F' ELSE 'T' END);

Standard SQL would allow you to put this in a CHECK() constraint.
Having said this, you are stuck in T-SQL.  You will probably want to
use an INSERT trigger and a NOT EXISTS() predicate with a ROLLBACK
action in it.

You should also download the Rick Snodgrass book in PDF from the
University of Arizona on temporal queries in SQL.  It is free and has
a lot of code.
AlejandroMes replied on 09-Jun-07 04:36 PM
srowland@roeing.com,

I think it is ok to use a UDF in this case, if we can not use subqueries in
constraints.

Long time ago, somebody (do not remember who) posted a code in this ng, to
check for date overlaping, and the idea was to check if it exists instead if
it doesn't. May be it could yield better performance for your function.

create FUNCTION [dbo].[WillCreateOverlap]
(
@ProgramYear  INT,
@StartDate DATETIME,
@EndDate DATETIME
)
RETURNS BIT
AS
BEGIN

DECLARE @Result BIT

if exists (
select
*
from
dbo.EAPProgramYears
where
ProgramYear <> @ProgramYear
and @StartDate < [EndDate]
and @EndDate > [StartDate]
)
set @Result = 1
else
set @Result = 0

RETURN @Result
END
go


AMB
Alex Kuznetsov replied on 09-Jun-07 08:41 PM
An excerpt from my unfinished article:

When Process Steps Cannot Overlap.

Consider the following tables which store processes and their steps:

CREATE TABLE dbo.Processes(ProcessID INT NOT NULL,
Description VARCHAR(30)
CONSTRAINT PK_Processes PRIMARY KEY(ProcessID)
)
go
DROP TABLE dbo.ProcessSteps
go
CREATE TABLE dbo.ProcessSteps(ProcessID INT NOT NULL,
Status VARCHAR(20),
StartedAt DATETIME NOT NULL,
FinishedAt DATETIME NOT NULL,
PreviousFinishedAt DATETIME NULL,
--  CONSTRAINT PK_ProcessSteps PRIMARY KEY(ProcessID, StartedAt),
CONSTRAINT PK_ProcessSteps_ProcessID_FinishedAt PRIMARY
KEY(ProcessID,
FinishedAt),
CONSTRAINT UNQ_ProcessSteps_ProcessID_PreviousFinishedAt
UNIQUE(ProcessID,
PreviousFinishedAt),
CONSTRAINT FK_ProcessSteps_ProcessID FOREIGN KEY(ProcessID)
REFERENCES
dbo.Processes(ProcessID),
CONSTRAINT FK_ProcessSteps_ProcessID_PreviousFinishedAt
FOREIGN KEY(ProcessID, PreviousFinishedAt)
REFERENCES dbo.ProcessSteps(ProcessID, FinishedAt),
CONSTRAINT CHK_ProcessSteps_StartedAt_Before_FinishedAt
CHECK(StartedAt <= FinishedAt),
CONSTRAINT CHK_ProcessSteps_PreviousFinishedAt_Before_StartedAt
CHECK(PreviousFinishedAt <= StartedAt)
)
go

Let me add some data so that I can demostrate how these constraints
work:

INSERT INTO dbo.Processes(ProcessID, Description) VALUES(1,
'Something')
go

INSERT INTO dbo.ProcessSteps(ProcessID, Status, StartedAt, FinishedAt,
PreviousFinishedAt)
VALUES(1, 'Pending', '20070101', '20070103', NULL)

The unique constraint UNQ_ProcessSteps_ProcessID_PreviousFinishedAt
guarantees
that every process has only one initial step (i.e. step for which
PreviousFinishedAt  IS NULL).

INSERT INTO dbo.ProcessSteps(ProcessID, Status, StartedAt, FinishedAt,
PreviousFinishedAt)
VALUES(1, 'Pending', '20070104', '20070105', NULL)

Server: Msg 2627, Level 14, State 2, Line 1
Violation of UNIQUE KEY constraint
'UNQ_ProcessSteps_ProcessID_PreviousFinishedAt'.
Cannot insert duplicate key in object 'ProcessSteps'.
The statement has been terminated.

This unique constraint also guarantees that every step has at most one
next
step.
The foreign key constraint
FK_ProcessSteps_ProcessID_PreviousFinishedAt
guarantees that if the process step is not initial (i.e.
PreviousFinishedAt IS
NOT NULL), then its PreviousFinishedAt column matches FinishedAt for
the
previous step.

INSERT INTO dbo.ProcessSteps(ProcessID, Status, StartedAt, FinishedAt,
PreviousFinishedAt)  VALUES(1, 'Pending', '20070104', '20070105',
'20070103')
INSERT INTO dbo.ProcessSteps(ProcessID, Status, StartedAt, FinishedAt,
PreviousFinishedAt)  VALUES(1, 'Opened', '20070104', '20070109',
'20070105')

Server: Msg 547, Level 16, State 1, Line 1
INSERT statement conflicted with TABLE CHECK constraint
'CHK_ProcessSteps_PreviousFinishedAt_Before_StartedAt'. The conflict
occurred in
database 'RiskCenter', table 'ProcessSteps'.
The statement has been terminated.

The check constrint
CHK_ProcessSteps_PreviousFinishedAt_Before_StartedAt
guarantees that process steps do not overlap. To make sure that there
are
neither gaps nor overlaps, simply change (PreviousFinishedAt <=
StartedAt) to
(PreviousFinishedAt = StartedAt).

While adding new process steps after existing ones is easy, inserting
a process
step between two existing ones is more involved.
Let me add one more row so that there is a gap between January 5 and
January 14:

INSERT INTO dbo.ProcessSteps(ProcessID, Status, StartedAt, FinishedAt,
PreviousFinishedAt)
VALUES(1, 'Opened', '20070114', '20070119', '20070105')

Filling this gap with one more step would require two different
operations:
- insert a row to fill the gap
- update the row immediately ofter the gap, having its
PreviousFinishedAt column
point to the newly inserted step
Clearly you can accomplish both goals in one MERGE statement. Because
MS SQL
Server does not provide MERGE yet,  you need to use the following
workaraund:
1. Add a new step after teh last one for the process:

INSERT INTO dbo.ProcessSteps(ProcessID, Status, StartedAt, FinishedAt,
PreviousFinishedAt)
VALUES(1, 'Reviewed', '20070125', '20070129', '20070119')

2. In one and the same update statement,
- modify the step that you just added so that it fills the gap
- update the row immediately ofter the gap, having its
PreviousFinishedAt
column point to the step that filled the gap

UPDATE dbo.ProcessSteps SET StartedAt = CASE WHEN FinishedAt =
'20070129' THEN
'20070105' ELSE StartedAt END,
FinishedAt = CASE WHEN FinishedAt = '20070129' THEN '20070114' ELSE
FinishedAt
END,
PreviousFinishedAt = CASE WHEN FinishedAt = '20070129' THEN
'20070105' ELSE
'20070114' END
WHERE ProcessID = 1 AND FinishedAt IN('20070129', '20070119')

Now the gap is filled:

SELECT ProcessID, Status, CONVERT(CHAR(8), StartedAt, 112) StartedAt,
CONVERT(CHAR(8), FinishedAt, 112) FinishedAt,
CONVERT(CHAR(8), PreviousFinishedAt, 112) PreviousFinishedAt
FROM dbo.ProcessSteps ORDER BY FinishedAt

/*
ProcessID   Status               StartedAt FinishedAt
PreviousFinishedAt
----------- -------------------- --------- ----------
------------------
1           Pending              20070101  20070103   NULL
1           Pending              20070104  20070105   20070103
1           Reviewed             20070105  20070114   20070105
1           Opened               20070114  20070119   20070114

(4 row(s) affected)
*/
Steve Dassin replied on 11-Jun-07 02:21 AM
I am not. Real sql programmers know All involves a max and the
subquery can be avoided.

For example:
Bonehead sql:

select ShipCountry,sum(Freight) as Maxsumfrt
from Orders
group by ShipCountry
having sum(Freight)
All
(select sum(Freight)
from Orders
group by ShipCountry)

ShipCountry     Maxsumfrt
--------------- ---------------------
USA             13771.2900

vs.

Nice neat and expedient sql:

select top 1 shipcountry,sum(freight)over(partition by shipcountry) as
Maxsumfrt
from Orders
order by sum(freight)over(partition by shipcountry) desc

You don't even need a group by let alone a subquery!

www.beyondsql.blogspot.com
--CELKO-- replied on 11-Jun-07 09:32 AM
Finding the extrema on a column with a unique constraint is not a
problem. Depending on the underlying implementation, the optimizer
will convert this to an EXISTS() with an index or do a hash to look
for equality.


SELECT ship_country, SUM(freight_amt) AS maxsumfrt
FROM Orders
GROUP BY ship_country
HAVING SUM(freight_amt)
FROM Orders
GROUP BY ship_country);


This is straw man.  I think the query is "find which country shipped
the most total freight."


Unfortunately, what you posted is not SQL, but local dialect.

SELECT TOP 1 ship_country, -- proprietary syntax!!
SUM(freight)OVER(PARTITION BY ship_country) AS maxsumfrt
FROM Orders
ORDER BY SUM(freight)OVER(PARTITION BY ship_country) DESC;
-- proprietary syntax again; you can use maxsumfrt in the ORDER BY
clause of the cursor

We can do this with Standard syntax and a simple CTE:

WITH FX(ship_country, frt_tot)
AS
(SELECT ship_country, SUM(freight_amt)
FROM Orders
GROUP BY ship_country)
SELECT FX.ship_country, FX.frt_tot
FROM FX
WHERE F1.frt_tot
= (SELECT MAX(frt_tot) FROM FX);

A good optimizer will materialize the FX table once and collect the
sums and extrema in the process.

As an aside, since the OLAP functions are functions, you can write
things like:

MAX (SUM(freight)OVER(PARTITION BY ship_country)) AS maxsumfrt

But they can get weird and slow because of scoping.
Steve Dassin replied on 11-Jun-07 03:09 PM
I'm using the All subquery as it is generally used to find a max.
You understand this yet say I'm inventing a extraneous problem,
a straw man. This is incoherent on your part.


Of course it's 'local', the server is on the pc I'm using.


Talk about a straw-berry!



Really.
select MAX (SUM(freight)OVER(PARTITION BY shipcountry)) AS maxsumfrt
from Orders

Server: Msg 4109, Level 15, State 1, Line 1
Windowed functions cannot be used in the context of another windowed
function or aggregate.

No you can't. Yet get a rasp-berry.
--CELKO-- replied on 11-Jun-07 07:19 PM
Now, that is probably true, but it is bad programming :)  Let's ignore
the fact that functions are not predicates.  The MAX() aggregate
function throws out the NULLs before looking for its value and the
different.


LOL!  I mean that "SELECT TOP n .. ORDER BY.." is local T-SQL dialect
and not ANSI/ISO Standard syntax.  For this to fall into the Standard
SQL language model, the TOP subclause would have to be part of the
cursor level ORDER BY, like the LIMIT extension in other products.


Yes, it was sweet and simple.  A good optimizer will materialize the
FX table once and collect the sums and extrema in the process. Right
now, SQL Server is not doing a good job with the CTE code and tends to
insert them as in-line macros rather than seeing what is actually used
from the CTE.


The OLAP stuff is another area where SQL Server is still weak.  For
example I cannot even use "MAX(freight_amt)OVER(PARTITION BY
ship_country)" like DB2 and Oracle.  Programmers should know about the
full power so that they can be ready when MS catches up.
srowlan replied on 12-Jun-07 10:10 AM
Thank you for your information, I have considered your critiques.

types as if you were still in an old 3GL.  Let's make it more SQL and
less T-SQL so the code will port and the optimizer can do its job
better. "

I originally did code it pretty much the same way you suggested, as I
am not a big fan of programming constructs in database code either.
However, I looked at the query plan for both of these and it created
essentially the same plan both ways, plus this table should never have
more than 100 or so rows, so I am not terribly concerned with
performance and optimization for this. This is a somewhat "static"
table, the users don't have any way to add rows to it. A DBA will be
the one adding rows to it. I just want all of the constraints on it
because we will be doing a lot of queries that involve getting the
program year for a particular date, and I don't want to take a chance
on that returning multiple rows and throwing off reporting
aggregations.
In the end, it looked, to me anyway, to be more readable with the IF-
THEN-ELSE then the case statement, so that is what I chose.
My goal for posting this was to see if it could be done in the
expression of a check constraint, instead of the check constraint
calling a scalar function.
If we have to port this database to another product, this table and
its constraints will be the least of my worries. We tend to create SQL
server databases specifically, because that is what we have the most
experience with. In 10 years of doing this for this company, I have
yet to have to port any database to another product. That being said,
who knows, maybe our next project will have that requirement, in which
case I will certainly spend a considerable amount of time ensuring
that the database is portable.

I downloaded the pdf you mentioned on temporal databases, I am always
interested in learning whatever I can. Although I am not sure when I
will have time to read a 528 page document!
Thanks