Thursday, January 14, 2010

The Relational Data Model( database)

The Relational Data Model
The Relational Data Model has the relation at its heart, but then a whole series of rules governing keys, relationships, joins, functional dependencies, transitive dependencies, multi-valued dependencies, and modification anomalies.

The RelationThe Relation is the basic element in a relational data model.



A relation is subject to the following rules:
1. Relation (file, table) is a two-dimensional table.
2. Attribute (i.e. field or data item) is a column in the table.
3. Each column in the table has a unique name within that table.
4. Each column is homogeneous. Thus the entries in any column are all of the same type (e.g. age, name, employee-number, etc).
5. Each column has a domain, the set of possible values that can appear in that column.
6. A Tuple (i.e. record) is a row in the table.
7. The order of the rows and columns is not important.
8. Values of a row all relate to some thing or portion of a thing.
9. Repeating groups (collections of logically related attributes that occur multiple times within one record occurrence) are not allowed.
10. Duplicate rows are not allowed (candidate keys are designed to prevent this).
11. Cells must be single-valued (but can be variable length). Single valued means the following:
o Cannot contain multiple values such as 'A1,B2,C3'.
o Cannot contain combined values such as 'ABC-XYZ' where 'ABC' means one thing and 'XYZ' another.
A relation may be expressed using the notation R(A,B,C, ...) where:
• R = the name of the relation.
• (A,B,C, ...) = the attributes within the relation.
• A = the attribute(s) which form the primary key.
Keys
1. A simple key contains a single attribute.
2. A composite key is a key that contains more than one attribute.
3. A candidate key is an attribute (or set of attributes) that uniquely identifies a row. A candidate key must possess the following properties:
o Unique identification - For every row the value of the key must uniquely identify that row.
o Non redundancy - No attribute in the key can be discarded without destroying the property of unique identification.
4. A primary key is the candidate key which is selected as the principal unique identifier. Every relation must contain a primary key. The primary key is usually the key selected to identify a row when the database is physically implemented. For example, a part number is selected instead of a part description.
5. A superkey is any set of attributes that uniquely identifies a row. A superkey differs from a candidate key in that it does not require the non redundancy property.
6. A foreign key is an attribute (or set of attributes) that appears (usually) as a non key attribute in one relation and as a primary key attribute in another relation. I say usually because it is possible for a foreign key to also be the whole or part of a primary key:

o A many-to-many relationship can only be implemented by introducing an intersection or link table which then becomes the child in two one-to-many relationships. The intersection table therefore has a foreign key for each of its parents, and its primary key is a composite of both foreign keys.
o A one-to-one relationship requires that the child table has no more than one occurrence for each parent, which can only be enforced by letting the foreign key also serve as the primary key.
7. A semantic or natural key is a key for which the possible values have an obvious meaning to the user or the data. For example, a semantic primary key for a COUNTRY entity might contain the value 'USA' for the occurrence describing the United States of America. The value 'USA' has meaning to the user.
8. A technical or surrogate or artificial key is a key for which the possible values have no obvious meaning to the user or the data. These are used instead of semantic keys for any of the following reasons:
o When the value in a semantic key is likely to be changed by the user, or can have duplicates. For example, on a PERSON table it is unwise to use PERSON_NAME as the key as it is possible to have more than one person with the same name, or the name may change such as through marriage.
o When none of the existing attributes can be used to guarantee uniqueness. In this case adding an attribute whose value is generated by the system, e.g from a sequence of numbers, is the only way to provide a unique value. Typical examples would be ORDER_ID and INVOICE_ID. The value '12345' has no meaning to the user as it conveys nothing about the entity to which it relates.
9. A key functionally determines the other attributes in the row, thus it is always a determinant.
10. Note that the term 'key' in most DBMS engines is implemented as an index which does not allow duplicate entries.

Relationships
One table (relation) may be linked with another in what is known as a relationship. Relationships may be built into the database structure to facilitate the operation of relational joins at runtime.
1. A relationship is between two tables in what is known as a one-to-many or parent-child or master-detail relationship where an occurrence on the 'one' or 'parent' or 'master' table may have any number of associated occurrences on the 'many' or 'child' or 'detail' table. To achieve this the child table must contain fields which link back the primary key on the parent table. These fields on the child table are known as a foreign key, and the parent table is referred to as the foreign table (from the viewpoint of the child).
2. It is possible for a record on the parent table to exist without corresponding records on the child table, but it should not be possible for an entry on the child table to exist without a corresponding entry on the parent table.
3. A child record without a corresponding parent record is known as an orphan.
4. It is possible for a table to be related to itself. For this to be possible it needs a foreign key which points back to the primary key. Note that these two keys cannot be comprised of exactly the same fields otherwise the record could only ever point to itself.
5. A table may be the subject of any number of relationships, and it may be the parent in some and the child in others.
6. Some database engines allow a parent table to be linked via a candidate key, but if this were changed it could result in the link to the child table being broken.
7. Some database engines allow relationships to be managed by rules known as referential integrity or foreign key restraints. These will prevent entries on child tables from being created if the foreign key does not exist on the parent table, or will deal with entries on child tables when the entry on the parent table is updated or deleted.

Determinant and Dependent


The terms determinant and dependent can be described as follows:
1. The expression X Y means 'if I know the value of X, then I can obtain the value of Y' (in a table or somewhere).
2. In the expression X Y, X is the determinant and Y is the dependent attribute.
3. The value X determines the value of Y.
4. The value Y depends on the value of X.

Functional Dependencies (FD)

A functional dependency can be described as follows:
1. An attribute is functionally dependent if its value is determined by another attribute which is a key.
2. That is, if we know the value of one (or several) data items, then we can find the value of another (or several).
3. Functional dependencies are expressed as X Y, where X is the determinant and Y is the functionally dependent attribute.
4. If A (B,C) then A B and A C.
5. If (A,B) C, then it is not necessarily true that A C and B C.
6. If A B and B A, then A and B are in a 1-1 relationship.
7. If A B then for A there can only ever be one value for B.

Transitive Dependencies (TD)


A transitive dependency can be described as follows:

1. An attribute is transitively dependent if its value is determined by another attribute which is not a key.
2. If X Y and X is not a key then this is a transitive dependency.
3. A transitive dependency exists when A B C but NOT A C.

Multi-Valued Dependencies (MVD)A multi-valued dependency can be

described as follows:
1. A table involves a multi-valued dependency if it may contain multiple values for an entity.
2. A multi-valued dependency may arise as a result of enforcing 1st normal form.
3. X Y, ie X multi-determines Y, when for each value of X we can have more than one value of Y.
4. If A B and A C then we have a single attribute A which multi-determines two other independent attributes, B and C.
5. If A (B,C) then we have an attribute A which multi-determines a set of associated attributes, B and C.


Types of Relational Join


A JOIN is a method of creating a result set that combines rows from two or more tables (relations). When comparing the contents of two tables the following conditions may occur:
• Every row in one relation has a match in the other relation.
• Relation R1 contains rows that have no match in relation R2.
• Relation R2 contains rows that have no match in relation R1.
INNER joins contain only matches. OUTER joins may contain mismatches as well.


Inner Join


This is sometimes known as a simple join. It returns all rows from both tables where there is a match. If there are rows in R1 which do not have matches in R2, those rows will not be listed. There are two possible ways of specifying this type of join:
SELECT * FROM R1, R2 WHERE R1.r1_field = R2.r2_field;
SELECT * FROM R1 INNER JOIN R2 ON R1.field = R2.r2_field
If the fields to be matched have the same names in both tables then the ON condition, as in:

ON R1.fieldname = R2.fieldname
ON (R1.field1 = R2.field1 AND R1.field2 = R2.field2)

can be replaced by the shorter USING condition, as in:

USING fieldname
USING (field1, field2)

Natural Join
A natural join is based on all columns in the two tables that have the same name. It is semantically equivalent to an INNER JOIN or a LEFT JOIN with a USING clause that names all columns that exist in both tables.
SELECT * FROM R1 NATURAL JOIN R2

The alternative is a keyed join which includes an ON or USING condition.
Left [Outer] Join

Returns all the rows from R1 even if there are no matches in R2. If there are no matches in R2 then the R2 values will be shown as null.
SELECT * FROM R1 LEFT [OUTER] JOIN R2 ON R1.field = R2.field

Right [Outer] Join

Returns all the rows from R2 even if there are no matches in R1. If there are no matches in R1 then the R1 values will be shown as null.

SELECT * FROM R1 RIGHT [OUTER] JOIN R2 ON R1.field = R2.field


Full [Outer] Join

Returns all the rows from both tables even if there are no matches in one of the tables. If there are no matches in one of the tables then its values will be shown as null.


SELECT * FROM R1 FULL [OUTER] JOIN R2 ON R1.field = R2.field

Self Join


This joins a table to itself. This table appears twice in the FROM clause and is followed by table aliases that qualify column names in the join condition.
SELECT a.field1, b.field2 FROM R1 a, R1 b WHERE a.field = b.field

Cross Join


This type of join is rarely used as it does not have a join condition, so every row of R1 is joined to every row of R2. For example, if both tables contain 100 rows the result will be 10,000 rows. This is sometimes known as a cartesian product and can be specified in either one of the following ways:

SELECT * FROM R1 CROSS JOIN R2
SELECT * FROM R1, R2

No comments:

Post a Comment