Database Systems

Learning Computing History

A Brief History of Database Systems

Data are raw facts that constitute the building blocks of information. A database is a collection of information together with a means of manipulating the data in a useful way; it must provide proper storage for large amounts of data, easy and fast access, and facilities for processing the data. A Database Management System (DBMS) is a set of software used to define, store, manipulate and control the data in a database. From the early flat-file systems to relational and object-relational systems, database technology has gone through several generations in its 40-year history.

The Evolution of the Database

Ancient History: Data are not stored on disk; the programmer defines both the logical data structure and the physical structure, such as storage structure, access methods and I/O modes. There is one data set per program, so data redundancy is high. There is no persistence; random access memory (RAM) is expensive and limited, and programmer productivity is low.

1968 File-Based: The predecessor of the database; data are maintained in flat files. Processing characteristics are determined by the common use of magnetic tape as the storage medium.

Data are stored in files, with an interface between programs and files. Logical files are mapped to physical files, and one file corresponds to one or several programs.
Various access methods exist, e.g., sequential, indexed, random.
Requires extensive programming in a third-generation language such as COBOL or BASIC.
Limitations:
- Separation and isolation: Each program maintains its own set of data; users of one program may not be aware of data held or locked by other programs.
- Duplication: The same data are held by different programs, which wastes space and resources.
- High maintenance costs, such as ensuring data consistency and controlling access
- Sharing granularity is very coarse
- Weak security

1968-1980 Era of non-relational database: A database provides an integrated and structured collection of stored operational data that can be used or shared by application systems. The prominent hierarchical database was IBM’s first DBMS, called IMS. The prominent network database model was the CODASYL DBTG model; IDMS was the most popular network DBMS.

Hierarchical data model

Mid 1960s: Rockwell partnered with IBM to create the Information Management System (IMS); IMS DB/DC led the mainframe database market in the 70’s and early 80’s.
Based on trees (not restricted to binary trees): logically represented by an upside-down tree, with a one-to-many relationship between parent and child records.
Advantages: efficient searching; less redundant data; data independence; database security and integrity
Disadvantages:
- Complex implementation
- Difficult to manage and lacking standards; for example, it is awkward to add empty nodes, and many-to-many relationships cannot be handled easily.
- Lacks structural independence, which adds to the complexity of application programming and use.
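
The parent-child structure described above can be illustrated with a minimal Python sketch. It is not how IMS is implemented; the `Record` class and the department/employee/skill names are hypothetical, chosen only to show that each record has exactly one parent and that a many-to-many fact (two employees sharing a skill) ends up stored twice.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Record:
    """Hypothetical hierarchical record: one parent, any number of children."""
    name: str
    # Excluded from repr/compare to avoid infinite recursion through the parent link.
    parent: Optional["Record"] = field(default=None, repr=False, compare=False)
    children: List["Record"] = field(default_factory=list)

    def add_child(self, name: str) -> "Record":
        """Create a child record whose only parent is this record."""
        child = Record(name, parent=self)
        self.children.append(child)
        return child

    def path(self) -> str:
        """Follow the single parent chain up to the root to build the access path."""
        parts = []
        node: Optional["Record"] = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


department = Record("Department")
ann = department.add_child("Employee:Ann")
bob = department.add_child("Employee:Bob")

# A skill shared by two employees cannot have two parents,
# so it has to be stored once under each employee.
ann_sql = ann.add_child("Skill:SQL")
bob_sql = bob.add_child("Skill:SQL")

print(ann_sql.path())
print(bob_sql.path())
```

Every record is reached by a single path from the root, which is what makes searching efficient and many-to-many relationships awkward.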

Network data model

Early 1960s: Charles Bachman developed the first DBMS, the Integrated Data Store (IDS), at General Electric, whose computer business was later acquired by Honeywell.
It was standardized in 1971 by the CODASYL group (Conference on Data Systems Languages).
Represented as a directed graph with nodes and edges
Identified 3 database components: network schema (the database organization); subschema (views of the database per user); data management language (low-level and procedural)
Each record can have multiple parents:
- Composed of set relationships; a set represents a one-to-many relationship between the owner and the members
- Each set has an owner record and member records
- A member may have several owners, one in each set it belongs to
Main problems: system complexity, which makes it difficult to design and maintain; lack of structural independence
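
The owner/member sets described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not CODASYL syntax: the `SetType` class, the set names and the student/course data are all hypothetical. Each set is one-to-many, and a link record that is a member of two different sets has two owners, which is how the network model expresses a many-to-many relationship.

```python
from collections import defaultdict


class SetType:
    """Hypothetical set type: one owner has many members; a member has one owner per set."""

    def __init__(self, name):
        self.name = name
        self.members_of = defaultdict(list)  # owner -> list of members
        self.owner_of = {}                   # member -> its single owner in this set

    def connect(self, owner, member):
        """Attach a member to an owner, enforcing one owner per member within this set."""
        if member in self.owner_of:
            raise ValueError(f"{member!r} already has an owner in set {self.name}")
        self.owner_of[member] = owner
        self.members_of[owner].append(member)


# Two set types that share the same member records (the enrollment links).
student_enrollments = SetType("STUDENT-ENROLLMENT")
course_enrollments = SetType("COURSE-ENROLLMENT")


def enroll(student, course):
    """Create a link record owned by a student in one set and by a course in the other."""
    link = (student, course)
    student_enrollments.connect(student, link)
    course_enrollments.connect(course, link)
    return link


def courses_of(student):
    """Navigate from a student owner to its member links, then read off the course."""
    return [course for (_, course) in student_enrollments.members_of[student]]


def students_in(course):
    """Navigate from a course owner to its member links, then read off the student."""
    return [student for (student, _) in course_enrollments.members_of[course]]


enroll("Ann", "Databases")
enroll("Ann", "Networks")
enroll("Bob", "Databases")

print(courses_of("Ann"))
print(students_in("Databases"))
```

The queries are written as explicit navigation from owner to member, which reflects the low-level, procedural style of the data management language.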

The distinction between storing data in files and in databases is that databases are intended to be used by multiple programs and types of users.

1970-present Era of relational database and Database Management System (DBMS): Based on relational algebra and calculus. A database is a shared collection of logically related data, together with a description of this data, designed to meet the information needs of an organization. The system catalog (metadata) provides the description of the data that enables program-data independence; the logically related data comprise the entities, attributes, and relationships of an organization’s information. Data abstraction provides a view level, a way of presenting data to a group of users, and a logical level, how the data are understood when writing queries.

1970: Ted Codd at IBM’s San Jose Lab proposed the relational model.
Two major projects started, and both were operational in the late 1970s
- INGRES at the University of California, Berkeley became commercial and was followed by POSTGRES, which was incorporated into Informix.
- System R at the IBM San Jose Lab later evolved into DB2, which became one of the first DBMS products based on the relational model. (Oracle produced a similar product just prior to DB2.)
1976: Peter Chen defined the Entity-Relationship (ER) model
1980s: Maturation of relational database technology; more relational DBMSs were developed, and the SQL standard was adopted by ISO and ANSI.
1985: Object-oriented DBMSs (OODBMS) are developed. They had little commercial success because the advantages did not justify the cost of converting billions of bytes of data to a new format.
1990s: Incorporation of object-orientation in relational DBMSs; new application areas such as data warehousing and OLAP, the web and Internet, text and multimedia, enterprise resource planning (ERP) and manufacturing resource planning (MRP)
- 1992: Microsoft ships Access, a personal DBMS created as an element of Windows, which gradually supplanted all other personal DBMS products.
- 1995: First Internet database applications
- 1997: XML applied to database processing, which addresses some long-standing database problems. Major vendors begin to integrate XML into DBMS products.

Relational DBMS at a glance:

Fundamental Relational Database Characteristics

- The internal structure of an operational database is basically fixed in the “row” direction
- The user interacts with a logical view of the data and need not know anything about the actual internal structure.

Database Schema (the description of the user data in the database)

- Conceptual schema: logically describes all data in the database
- Internal schema (physical schema): describes how data are actually stored.
- External schema (user view): describes the data that are of interest to a user.

DBMS Functions

- Data dictionary management
- Data storage management
- Data transformation and presentation
- Security management
- Multi-user access control
- Backup and recovery management
- Data integrity management
- Database language and application programming interfaces
- Database communication interfaces

Database Approach

- Data definition language (DDL): defines database schemas
- Data manipulation language (DML): retrieves, inserts, deletes and updates data in the database. Query languages are part of DML
- Data control language (DCL): controls access to data.

Advantages and disadvantages of DBMSs

Advantages:

- Control of data redundancy, consistency, abstraction, sharing
- Improved data integrity, security, enforcement of standards and economy of scale.
- Balanced conflicting requirements
- Improved data accessibility, responsiveness, maintenance
- Increased productivity, concurrency, backup and recovery services.

Disadvantages:

- Complexity, size, cost of DBMSs
- Higher impact of a failure
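
The DDL/DML split and the schema levels listed above can be tried out with Python's built-in sqlite3 module. This is a minimal sketch, not a recommendation of any product: the `employee` table, the `employee_directory` view and the sample rows are hypothetical. SQLite has no GRANT or REVOKE statements, so DCL is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# DDL: define the schema (a base table at the conceptual level).
conn.execute(
    "CREATE TABLE employee ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " dept TEXT NOT NULL,"
    " salary REAL NOT NULL)"
)
# DDL: an external schema (user view) that hides the salary column.
conn.execute("CREATE VIEW employee_directory AS SELECT name, dept FROM employee")

# DML: insert, update and delete rows.
conn.executemany(
    "INSERT INTO employee (name, dept, salary) VALUES (?, ?, ?)",
    [("Ann", "Sales", 5000.0), ("Bob", "IT", 6000.0)],
)
conn.execute("UPDATE employee SET dept = ? WHERE name = ?", ("IT", "Ann"))
conn.execute("DELETE FROM employee WHERE name = ?", ("Bob",))

# DML (query): the user sees only the logical view, not the storage layout.
directory = conn.execute(
    "SELECT name, dept FROM employee_directory ORDER BY name"
).fetchall()
print(directory)

# The system catalog describes the data; SQLite exposes it as sqlite_master.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name"
).fetchall()
print(catalog)
```

The program never states how rows are laid out on disk; it works only with names recorded in the catalog, which is the program-data independence the database approach aims for.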

The main players:
- Microsoft Corp.: SQL Server
- Oracle: Oracle 9i
- IBM: IMS/DB, DB2

Relational vendors were challenged by “object-oriented DB” companies and countered with “object-relational” systems, which retain the relational core while allowing type extension as in OO systems.

Advanced database technology, together with the Internet, has provided faster communication and worldwide connectivity; ubiquitous publishing seems to have led to information overload, and still, I can’t find a thing!

Last modified: 2004 December 5