Thursday, January 14, 2010

Advantages&Disadvantages of DBMSs

Advantages of DBMSs• Control of data redundancy
• Data consistency
• More information from the same amount of data
• Sharing of data
• Improved data integrity
• Improved security
• Enforcement of standards
• Economy of scale
• Balanced conflicting requirements
• Improved data accessibility and responsiveness
• Increased productivity
• Improved maintenance through data independence
• Increased concurrency
• Improved backup and recovery services

Disadvantages of DBMSs

Complexity
• Size
• Cost of DBMS
• Additional hardware costs
• Cost of conversion
• Performance
• Higher impact of a failure

Components of the Database Environment

Components of the Database Environment

Hardware
– Can range from a PC to a network of computers containing secondary storage volumes and hardware processor(s) with associated main memory that are used to support execution of database management system
• Software
– DBMS, operating system, network software (if necessary) and also the application programs.
• Data
– Used by the organization and a description of this data called the schema. The data as discussed above is integrated and shared
– By integrated it is meant that the data is actually a unification of several files with redundancy among files partially eliminated
– By Shared it is meant that individual pieces of data in the database can be shared among different users.
• Procedures
– Instructions and rules that should be applied to the design and use of the database and DBMS.
• People
– The people that participate in the database environment.
– Including
• Application Programmers who are responsible for writing database applications
• End Users are people who interact with database system from workstations and terminals in order view and use data to complete their routine tasks.
• Data Administrator is the person who is responsible for deciding what data is important and should be recorded. This person belongs to senior management level (normally not a technician) and understands what is important for the enterprise. He is also responsible for defining various policies related to data including security policy.
• Database Administrator: This is a technical person responsible for implementing the policies defined by the data administrator. DBA is also responsible for ensuring that the system operates with adequate performance and for providing a variety of technical services.

INTRO OF Database Management System & VIEWS

Database Management System (DBMS)• A software system that enables users to define, create, and maintain the database and that provides controlled access to this database.


Views
Allows each user to have his or her own view of the database.
• A view is essentially some subset of the database.
• Benefits include:
• Reduce complexity;
• Provide a level of security;
• Provide a mechanism to customize the appearance of the database;
• Present a consistent, unchanging picture of the structure of the database, even if the underlying database is changed.

Database Approach

Database Approach

• Arose because:
– Definition of data was embedded in application programs, rather than being stored separately and independently.
– No control over access and manipulation of data beyond that imposed by application programs.
• Result:
– the database and Database Management System (DBMS).

Database Approach

• Data definition language (DDL).
– Permits specification of data types, structures and any data constraints.
– All specifications are stored in the database.
• Data manipulation language (DML).
– General enquiry facility (query language) of the data.
• Controlled access to database may include
– A security system.
– An integrity system.
– A concurrency control system.
– A recovery control system.
– A user-accessible catalog.
• A view mechanism.
– Provides users with only the data they want or need to use.

File-based Systems& Limitations of File-Based Approach

File-based Systems• Collection of application programs that perform services for the end users (e.g. reports).
• Each program defines and manages its own data
• E.g. is a C++ system that accepts and stores data. In such case the sequence in which the fields are recorded is coded in the program, not in the file.

Limitations of File-Based Approach

• Separation and isolation of data
– Each program maintains its own set of data.
– Users of one program may be unaware of potentially useful data held by other
programs.
– Duplication of data
– Same data is held by different programs.
– Wasted space and potentially different values and/or different formats for the same item.

• Duplication of data
– Same data is held by different programs.
– Wasted space and potentially different values and/or different formats for the same item.

Introducing Database

Introducing Database:

A collection of computerized data files. In simple words it is computerized record keeping.
Examples of Database Applications
Purchases from the supermarket
• Purchases using your credit card
• Booking a holiday at the travel agents
• Using the local library
• Taking out insurance
• Using the Internet
• Studying at university


Formal definition of Database• Shared collection of logically related data (and a description of this data), designed to meet the information needs of an organization.
• System catalog (metadata) provides description of data to enable program–data independence.
• Logically related data comprises entities, attributes, and relationships of an organization’s information.

The Relational Data Model( database)

The Relational Data Model
The Relational Data Model has the relation at its heart, but then a whole series of rules governing keys, relationships, joins, functional dependencies, transitive dependencies, multi-valued dependencies, and modification anomalies.

The RelationThe Relation is the basic element in a relational data model.



A relation is subject to the following rules:
1. Relation (file, table) is a two-dimensional table.
2. Attribute (i.e. field or data item) is a column in the table.
3. Each column in the table has a unique name within that table.
4. Each column is homogeneous. Thus the entries in any column are all of the same type (e.g. age, name, employee-number, etc).
5. Each column has a domain, the set of possible values that can appear in that column.
6. A Tuple (i.e. record) is a row in the table.
7. The order of the rows and columns is not important.
8. Values of a row all relate to some thing or portion of a thing.
9. Repeating groups (collections of logically related attributes that occur multiple times within one record occurrence) are not allowed.
10. Duplicate rows are not allowed (candidate keys are designed to prevent this).
11. Cells must be single-valued (but can be variable length). Single valued means the following:
o Cannot contain multiple values such as 'A1,B2,C3'.
o Cannot contain combined values such as 'ABC-XYZ' where 'ABC' means one thing and 'XYZ' another.
A relation may be expressed using the notation R(A,B,C, ...) where:
• R = the name of the relation.
• (A,B,C, ...) = the attributes within the relation.
• A = the attribute(s) which form the primary key.
Keys
1. A simple key contains a single attribute.
2. A composite key is a key that contains more than one attribute.
3. A candidate key is an attribute (or set of attributes) that uniquely identifies a row. A candidate key must possess the following properties:
o Unique identification - For every row the value of the key must uniquely identify that row.
o Non redundancy - No attribute in the key can be discarded without destroying the property of unique identification.
4. A primary key is the candidate key which is selected as the principal unique identifier. Every relation must contain a primary key. The primary key is usually the key selected to identify a row when the database is physically implemented. For example, a part number is selected instead of a part description.
5. A superkey is any set of attributes that uniquely identifies a row. A superkey differs from a candidate key in that it does not require the non redundancy property.
6. A foreign key is an attribute (or set of attributes) that appears (usually) as a non key attribute in one relation and as a primary key attribute in another relation. I say usually because it is possible for a foreign key to also be the whole or part of a primary key:

o A many-to-many relationship can only be implemented by introducing an intersection or link table which then becomes the child in two one-to-many relationships. The intersection table therefore has a foreign key for each of its parents, and its primary key is a composite of both foreign keys.
o A one-to-one relationship requires that the child table has no more than one occurrence for each parent, which can only be enforced by letting the foreign key also serve as the primary key.
7. A semantic or natural key is a key for which the possible values have an obvious meaning to the user or the data. For example, a semantic primary key for a COUNTRY entity might contain the value 'USA' for the occurrence describing the United States of America. The value 'USA' has meaning to the user.
8. A technical or surrogate or artificial key is a key for which the possible values have no obvious meaning to the user or the data. These are used instead of semantic keys for any of the following reasons:
o When the value in a semantic key is likely to be changed by the user, or can have duplicates. For example, on a PERSON table it is unwise to use PERSON_NAME as the key as it is possible to have more than one person with the same name, or the name may change such as through marriage.
o When none of the existing attributes can be used to guarantee uniqueness. In this case adding an attribute whose value is generated by the system, e.g from a sequence of numbers, is the only way to provide a unique value. Typical examples would be ORDER_ID and INVOICE_ID. The value '12345' has no meaning to the user as it conveys nothing about the entity to which it relates.
9. A key functionally determines the other attributes in the row, thus it is always a determinant.
10. Note that the term 'key' in most DBMS engines is implemented as an index which does not allow duplicate entries.

Relationships
One table (relation) may be linked with another in what is known as a relationship. Relationships may be built into the database structure to facilitate the operation of relational joins at runtime.
1. A relationship is between two tables in what is known as a one-to-many or parent-child or master-detail relationship where an occurrence on the 'one' or 'parent' or 'master' table may have any number of associated occurrences on the 'many' or 'child' or 'detail' table. To achieve this the child table must contain fields which link back the primary key on the parent table. These fields on the child table are known as a foreign key, and the parent table is referred to as the foreign table (from the viewpoint of the child).
2. It is possible for a record on the parent table to exist without corresponding records on the child table, but it should not be possible for an entry on the child table to exist without a corresponding entry on the parent table.
3. A child record without a corresponding parent record is known as an orphan.
4. It is possible for a table to be related to itself. For this to be possible it needs a foreign key which points back to the primary key. Note that these two keys cannot be comprised of exactly the same fields otherwise the record could only ever point to itself.
5. A table may be the subject of any number of relationships, and it may be the parent in some and the child in others.
6. Some database engines allow a parent table to be linked via a candidate key, but if this were changed it could result in the link to the child table being broken.
7. Some database engines allow relationships to be managed by rules known as referential integrity or foreign key restraints. These will prevent entries on child tables from being created if the foreign key does not exist on the parent table, or will deal with entries on child tables when the entry on the parent table is updated or deleted.

Determinant and Dependent


The terms determinant and dependent can be described as follows:
1. The expression X Y means 'if I know the value of X, then I can obtain the value of Y' (in a table or somewhere).
2. In the expression X Y, X is the determinant and Y is the dependent attribute.
3. The value X determines the value of Y.
4. The value Y depends on the value of X.

Functional Dependencies (FD)

A functional dependency can be described as follows:
1. An attribute is functionally dependent if its value is determined by another attribute which is a key.
2. That is, if we know the value of one (or several) data items, then we can find the value of another (or several).
3. Functional dependencies are expressed as X Y, where X is the determinant and Y is the functionally dependent attribute.
4. If A (B,C) then A B and A C.
5. If (A,B) C, then it is not necessarily true that A C and B C.
6. If A B and B A, then A and B are in a 1-1 relationship.
7. If A B then for A there can only ever be one value for B.

Transitive Dependencies (TD)


A transitive dependency can be described as follows:

1. An attribute is transitively dependent if its value is determined by another attribute which is not a key.
2. If X Y and X is not a key then this is a transitive dependency.
3. A transitive dependency exists when A B C but NOT A C.

Multi-Valued Dependencies (MVD)A multi-valued dependency can be

described as follows:
1. A table involves a multi-valued dependency if it may contain multiple values for an entity.
2. A multi-valued dependency may arise as a result of enforcing 1st normal form.
3. X Y, ie X multi-determines Y, when for each value of X we can have more than one value of Y.
4. If A B and A C then we have a single attribute A which multi-determines two other independent attributes, B and C.
5. If A (B,C) then we have an attribute A which multi-determines a set of associated attributes, B and C.


Types of Relational Join


A JOIN is a method of creating a result set that combines rows from two or more tables (relations). When comparing the contents of two tables the following conditions may occur:
• Every row in one relation has a match in the other relation.
• Relation R1 contains rows that have no match in relation R2.
• Relation R2 contains rows that have no match in relation R1.
INNER joins contain only matches. OUTER joins may contain mismatches as well.


Inner Join


This is sometimes known as a simple join. It returns all rows from both tables where there is a match. If there are rows in R1 which do not have matches in R2, those rows will not be listed. There are two possible ways of specifying this type of join:
SELECT * FROM R1, R2 WHERE R1.r1_field = R2.r2_field;
SELECT * FROM R1 INNER JOIN R2 ON R1.field = R2.r2_field
If the fields to be matched have the same names in both tables then the ON condition, as in:

ON R1.fieldname = R2.fieldname
ON (R1.field1 = R2.field1 AND R1.field2 = R2.field2)

can be replaced by the shorter USING condition, as in:

USING fieldname
USING (field1, field2)

Natural Join
A natural join is based on all columns in the two tables that have the same name. It is semantically equivalent to an INNER JOIN or a LEFT JOIN with a USING clause that names all columns that exist in both tables.
SELECT * FROM R1 NATURAL JOIN R2

The alternative is a keyed join which includes an ON or USING condition.
Left [Outer] Join

Returns all the rows from R1 even if there are no matches in R2. If there are no matches in R2 then the R2 values will be shown as null.
SELECT * FROM R1 LEFT [OUTER] JOIN R2 ON R1.field = R2.field

Right [Outer] Join

Returns all the rows from R2 even if there are no matches in R1. If there are no matches in R1 then the R1 values will be shown as null.

SELECT * FROM R1 RIGHT [OUTER] JOIN R2 ON R1.field = R2.field


Full [Outer] Join

Returns all the rows from both tables even if there are no matches in one of the tables. If there are no matches in one of the tables then its values will be shown as null.


SELECT * FROM R1 FULL [OUTER] JOIN R2 ON R1.field = R2.field

Self Join


This joins a table to itself. This table appears twice in the FROM clause and is followed by table aliases that qualify column names in the join condition.
SELECT a.field1, b.field2 FROM R1 a, R1 b WHERE a.field = b.field

Cross Join


This type of join is rarely used as it does not have a join condition, so every row of R1 is joined to every row of R2. For example, if both tables contain 100 rows the result will be 10,000 rows. This is sometimes known as a cartesian product and can be specified in either one of the following ways:

SELECT * FROM R1 CROSS JOIN R2
SELECT * FROM R1, R2