SQL Server - Data Modeling: "None Selected" in Many-to-Many Scenario

Asked By Bob Johnson on 13-Feb-07 07:01 PM
A fellow developer and I considering a basic data modeling scenario and
would like additional perspective. For the sake of this question, please
consider one of the classic many-to-many relationships we find in
introductory database design articles/books: Students and Classes entities.

Of course we'd expect 3 tables to implement the many-to-many relationship
between Students and Classes (e.g., Students, StudentsXClasses, and Classes;
I hope you don't think you need DDL for this question :-)

Now we all know that the StudentsXClasses table would "tell us" which
students are enrolled in which classes. No problem there. But what about the
situation in which a student is enrolled in zero classes... or the similar
situation in which a Class has zero students. In both of those cases we
would expect the StudentsXClasses table to have zero rows in it for the
particular Student/Class combinations (where zero are enrolled).

To my thinking, that's the "end of the line" on the implementation. If we
want to find any class with zero students, then we query to find the classes
with zero rows in StudentsXClasses, for example... same with students
enrolled in zero classes.

But my fellow developer wants to have a "class" (row in the Classes table)
named "none" or something like that... and when a student is enrolled in
zero "real classes", then the StudentsXClasses table gets one row that
points to the student and to the "none class." The argument in favor of
doing this is that we have an explicit way for users to indicate - "yes,
this student is in fact enrolled in zero classes" (i.e., by creating that
entry)... as opposed to simply inferring "zero classes enrolled" from a lack
of expected row(s) in StudentsXClasses.

If we go with his desired implementation (i.e., having a pseudo class,
zero classes), then I'm thinking we'd have to create a table constraint or
trigger that prevents the "none class" from being entered into the
StudentsXClasses table for a given student along with other "real classes" -
as that would be an invalid combination of rows.

While that implementation seems unnecessarily complicated to me (having the
invalid row combinations), my fellow developer is telling me that the
business currently spends a lot of time trying to answer to the question;
some explicit "flag" that is there to answer that question would be valuable
to the business... therefore we have a "good reason" to do something we
otherwise would not want to do.

Thoughts? Perspective? Opinions?

Thanks!




Tom Cooper replied on 13-Feb-07 08:16 PM
Don't add a row that says this student has no classes or that a class has no
students.

If a student has no rows in the StudentsXClasses table, then your database
has said that this student has no classes.  If you add a row that also says
this student has no classes, you have stored the same information twice, a
clear violation of normalization principles.

If the processes that put data into these tables are correct, then the data
will be correct.  And if the processes that put data into these tables
insert incorrect data, then what you are talking about won't help, you'll
still have incorrect data.  Fix the processes that are generating incorrect
data, do not add this kludge.

You say you don't trust the data in the StudentsXClasses table when the
student has no classes.  Why, then, do you trust the data when the student
has classes?  If it says the student is taking English and chemistry, how do
you know he is not also taking mathematics?

Tom