File systems of varying degrees of sophistication satisfied the need for information storage and processing for several years. However, large enterprises tended to build many independent files containing related and even overlapping data, and data-processing activities frequently required the linking of data from several files. It was natural, then, to design data structures and database management systems that supported the automatic linkage of files. Three database models were developed to support the linkage of records of different types. These are: (1) the hierarchical model, in which record types are linked in a treelike structure (e.g., employee records might be grouped under a record describing the departments in which employees work); (2) the network model, in which arbitrary linkages of record types may be created (e.g., employee records might be linked on one hand to employees’ departments and on the other hand to their supervisors—that is, other employees); and (3) the relational model, in which all data are represented in simple tabular form.
- Types of database models
In the relational model, the description of a particular entity is provided by the set of its attribute values, stored as one row of the table, or relation. This linkage of attribute values to provide a meaningful description of a real-world entity or a relationship among such entities forms a mathematical n-tuple; in database terminology, it is simply called a tuple. The relational approach also supports queries (requests for information) that involve several tables by providing automatic linkage across tables by means of a “join” operation that combines records with identical values of common attributes. Payroll data, for example, could be stored in one table and personnel benefits data in another; complete information on an employee could be obtained by joining the tables on the employee’s identification number. To support any of these database structures, a large piece of software known as a database management system (DBMS) is required to handle the storage and retrieval of data (via the file management system, since the data are physically stored as files on magnetic disk) and to provide the user with commands to query and update the database. The relational approach is currently the most popular, as older hierarchical data management systems, such as IMS, the information management system produced by IBM, are being replaced by relational database management systems such as IBM’s large mainframe system DB2 or the Oracle Corporation’s DBMS, which runs on large servers. Relational DBMS software is also available for workstations and personal computers.
The need for more powerful and flexible data models to support nonbusiness applications (e.g., scientific or engineering applications) has led to extended relational data models in which table entries need not be simple values but can be programs, text, unstructured data in the form of binary large objects (BLOBs), or any other format the user requires. Another development has been the incorporation of the object concept that has become significant in programming languages. In object-oriented databases, all data are objects. Objects maybe linked together by an “is-part-of” relationship t o represent larger, composite objects. Data describing a truck, for instance, may be stored as a composite of a particular engine, chassis, drive train, and so forth. Classes of objects may form a hierarchy in which individual objects may inherit properties from objects farther up in the hierarchy. For example, objects of the class “motorized vehicle” all have an engine; members of subclasses such as “bus” or “airplane” will then also have an engine. Furthermore, engines are also data objects, and the engine attribute of a particular vehicle will be a link to a specific engine object. Multimedia databases, in which voice, music, and video are stored along with the traditional textual information, are becoming increasingly important and also are providing an impetus toward viewing data as objects, as are databases of pictorial images such as photographs or maps. The future of database technology is generally perceived to be a merging of the relational and object-oriented views.
- Data integrity
Integrity is a major database issue. In general, integrity refers to maintaining the correctness and consistency of the data. Some integrity checking is made possible by specifying the data type of an item. For example, if an identification number is specified to be nine digits, the DBMS may reject an update attempting to assign a value with more or fewer digits or one including an alphabetic character. Another type of integrity, known as referential integrity, requires that an entity referenced by the data for some other entity must itself exist in the database. For ‘example, if an airline reservation is requested for a particular flight number, then the flight referenced by that number must actually exist. Although one could imagine integrity constraints that limit the values of data items to specified ranges (to prevent the famous “computer errors” of the type in which a N10 cheque is accidentally issued as N10,000), most database management systems do not support such constraints but leave them to the domain of the application program.
Access to a database by multiple simultaneous users requires that the DBMS include a concurrency control mechanism to maintain the consistency of the data in spite of the possibility that a user may interfere with the updates attempted by another user. For example, two travel agents may try to book the last seat on a plane at more or 1ess the same time. Without concurrency control, both may think they have succeeded, while only one booking is actually entered into the database.
A key concept in studying concurrency control and the maintenance of database correctness is the transaction, defined as a sequence of operations on the data that transform the database from one consistent state into another. To illustrate the importance of this concept, consider the simple example of an electronic transfer of funds (say N500 using Western Union Money Transfer facility) from bank account A to account B. The operation that deducts N500 from account A leaves the database inconsistent in that the total over all accounts is N500 short. Similarly, the operation that adds N500 to account B in itself makes the total N500 too much. Combining these two operations, however, yields a valid transaction. The key to maintaining database correctness is therefore to ensure that only complete transactions are applied to the data and that multiple concurrent transactions are executed (under a concurrency control mechanism) in such a way that a serial order can be defined that would produce the same results. A transaction- oriented control mechanism for database access becomes difficult in the case of so-called long transactions—for example, when several engineers are working, perhaps over the course of several days, on a product design that may not reach a consistent state until the project is complete. The best approach to handling long transactions is a current area of database research.
As discussed above, databases may be distributed, in the sense that data reside at different host computers on a network. Distributed data may or may not be replicated, but in any case the concurrency-control problem is magnified. Distributed databases must have a distributed database management system to provide overall control of queries and updates in a manner that ideally does not require that the user know the location of the data. The attainment of the ideal situation, in which various databases fall under the unified control of a distributed DBMS, has been slowed both by technical problems and by such practical problems as heterogeneous hardware and software and database owners who desire local autonomy.
A closely related concept is interoperability, meaning the ability of the user of one member of a group of disparate systems (all having the same functionality) to work with any of the systems of the group with equal ease and via the same interface. In the case of database management systems, interoperability means the ability of users to formulate queries to any one of a group of independent, autonomous database management systems using the same language, to be provided with a unified view of the contents of all the individual databases, to formulate queries that may require fetching data via more than one of the systems, and to be able to update data stored under any member of the group. Many of the problems of distributed databases are the problems of distributed systems in general. Thus distributed databases may be designed as client-server systems, with middleware easing the heterogeneity problems
- Database Structure
Databases are structured to facilitate the storage, retrieval, modification, and deletion of data in conjunction with various data -processing operations. Databases can be stored on magnetic disk or tape, optical disk, or some other secondary storage device.
A database consists of a file or a set of files. The information in these files may be broken down into records, each of which consists of one or more fields. Fields are the basic units of data storage, and each field typically contains information pertaining to one aspect or attribute of the entity described by the database. Using keywords and various sorting commands, users can rapidly search, rearrange, group, and select the fields in many records to retrieve or create reports on particular aggregates of data.
Database records and files must be organized to allow retrieval of the information. Early systems were arranged sequentially (i.e., alphabetically, numerically, or chronologically); the development of direct-access storage devices made possible random access to data via indexes. Queries are the main way users retrieve database information. Typically, the user provides a string of characters, and the computer searches the database for a corresponding sequence and provides the source materials in which those characters appear; a user can request, for example, all records in which the contents of the field for a person’s last name is the word Shehu.
The many users of a large database must be able to manipulate the information within it quickly at any given time. Moreover, large business and other organizations tend to build up many independent files containing related and even overlapping data, and their data-processing activities often require the linking of data from several files. Several different types of database management systems have been developed to support these requirements: flat, hierarchical, network, relational, and object-oriented.
In flat databases, records are organized according to a simple list of entities; many simple databases for personal computers are flat in structure. The records in hierarchical databases are organized in a treelike structure, with each level of records branching off into a set of smaller categories. Unlike hierarchical databases, which provide single links between sets of records at different levels, network databases create multiple linkages between sets by placing links, or pointers, to one set of records in another; the speed and versatility of network databases have led to their wide use in business. Relational databases are used where associations among files or records cannot be expressed by links; a simple flat list becomes one row of a table, or “relation,” and multiple relations can be mathematically associated to yield desired information. A popular example of this type of database program among Nigerian software developers is dBase.
Fig. 281 Visual dBase 5.5Relational Database
Object-oriented databases store and manipulate more complex data structures, called “objects,” Which are organized into hierarchical classes that may inherit properties from classes higher in the chain; this database structure is the most flexible and adaptable.
The information in many databases consists of natural-language texts of documents; number-oriented databases primarily contain information such as statistics, tables, financial data, and raw scientific and technical data. Small databases can be maintained on personal-computer systems and may be used by individuals at home. These and larger databases have become increasingly important in business life. Typical commercial applications include airline reservations, production management functions, medical records in hospitals, and legal records of insurance companies.
The largest databases are usually maintained by governmental agencies, business organizations, and universities. These databases may contain texts of such materials as abstracts, reports, legal statutes, wire services, newspapers and journals, encyclopedias, and catalogs of various kinds. Reference databases contain bibliographies or indexes that serve as guides to the location of information in books, periodicals, and other published literature. Thousands of these publicly accessible databases now exist, covering topics ranging from law, medicine, and engineering to news and current events, games, classified advertisements, and instructional courses. Professionals such as scientists, doctors, lawyers, financial analysts, stockbrokers, and researchers of all types increasingly rely on these databases for quick, selective access to large volumes of information.
In this unit, you have learnt about Database Management Systems and how they can be used to create massive access points for vast arrays of data.