
A Brief History of Database Systems
Data are raw facts that constitute the building blocks of information. A
database is a collection of information together with a means of manipulating
that data in a useful way; it must provide proper storage for large amounts of
data, easy and fast access, and support for processing the data. A Database
Management System (DBMS) is a set of software used to define, store, manipulate
and control the data in a database. From early flat-file systems to relational
and object-relational systems, database technology has gone through several
generations over its 40-year history.
The Evolution of the Database
Ancient History:
Data are not stored on disk. The programmer defines both the logical data
structure and the physical structure, such as storage structure, access methods
and I/O modes. Each program has its own data set, so data redundancy is high.
There is no persistence; random access memory (RAM) is expensive and limited,
and programmer productivity is low.
1968 File-Based:
The predecessor of the database: data are maintained in flat files. Processing
characteristics are determined by the common use of magnetic tape as the
storage medium.
- Data are stored in files, with an interface between programs and files.
  Logical files are mapped to physical files; one file corresponds to one or
  several programs.
- Various access methods exist, e.g., sequential, indexed, random.
- Requires extensive programming in a third-generation language such as COBOL
  or BASIC.
- Limitations:
  - Separation and isolation: each program maintains its own set of data, and
    users of one program may not be aware of data held or blocked by other
    programs.
  - Duplication: the same data is held by different programs, which wastes
    space and resources.
  - High maintenance costs, such as ensuring data consistency and controlling
    access.
  - Sharing granularity is very coarse.
  - Weak security.
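The duplication and consistency problem listed above can be illustrated with a
small sketch. The two in-memory "files", the customer rows and the helper name
load are all hypothetical, invented only for this example; real file-based
systems of the period used tape or disk files and languages such as COBOL.

```python
import csv
import io

# Hypothetical flat files: each program keeps its own copy of the customer data.
billing_file = io.StringIO("id,name,city\n1,Ada,London\n2,Grace,New York\n")
shipping_file = io.StringIO("id,name,city\n1,Ada,London\n2,Grace,New York\n")


def load(flat_file):
    """Read a flat file into a dict keyed by customer id."""
    flat_file.seek(0)
    return {row["id"]: row for row in csv.DictReader(flat_file)}


billing = load(billing_file)
shipping = load(shipping_file)

# The billing program records a change of address; the shipping program,
# which owns a separate copy, is never told about it.
billing["2"]["city"] = "Arlington"

# The two copies of the same customer now disagree.
inconsistent = [cid for cid in billing
                if billing[cid]["city"] != shipping[cid]["city"]]
print(inconsistent)
```

Nothing in either program detects the mismatch; keeping the copies in step is
left to application code, which is the maintenance cost the list refers to.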
1968-1980 Era of non-relational database:
A database provides an integrated and structured collection of stored
operational data that can be used or shared by application systems. The
prominent hierarchical database model was IBM’s first DBMS, called IMS. The
prominent network database model was the CODASYL DBTG model; IDMS was the most
popular network DBMS.
Hierarchical data model
- In the mid 1960s Rockwell partnered with IBM to create the Information
  Management System (IMS). IMS DB/DC led the mainframe database market in the
  1970s and early 1980s.
- Based on trees. Logically represented by an upside-down tree, with a
  one-to-many relationship between parent and child records.
- Advantages: efficient searching; less redundant data; data independence;
  database security and integrity.
- Disadvantages:
  - Complex implementation.
  - Difficult to manage and lacking standards; for example, it is hard to add
    empty nodes, and many-to-many relationships cannot be handled easily.
  - Lacks structural independence, which adds to the complexity of application
    programming and use.
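The parent/child structure of the hierarchical model, and its trouble with
many-to-many relationships, can be sketched as follows. This is a toy
illustration, not how IMS is implemented; the class Record and the sample
names are assumptions made up for the example.

```python
class Record:
    """A node in a toy hierarchical database: one parent, many children."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def add_child(self, child):
        # The hierarchical model allows exactly one parent per record.
        if child.parent is not None:
            raise ValueError(
                f"{child.name} already belongs to {child.parent.name}")
        child.parent = self
        self.children.append(child)
        return child

    def path(self):
        """Return the route from the root down to this record."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


company = Record("Company")
sales = company.add_child(Record("Sales"))
support = company.add_child(Record("Support"))
alice = sales.add_child(Record("Alice"))

print(alice.path())

# A many-to-many case: Alice also works for Support. The tree cannot express
# this without duplicating her record under the second parent.
try:
    support.add_child(alice)
    shared = True
except ValueError:
    shared = False
```

Every record is reached by walking down from the root, which is why searching
along the hierarchy is efficient and why programs become tied to the shape of
the tree.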
Network data model
- In the early 1960s, Charles Bachman developed the first DBMS, the Integrated
  Data Store (IDS), at General Electric, whose computer business later passed
  to Honeywell.
- It was standardized in 1971 by the CODASYL group (Conference on Data Systems
  Languages).
- Directed acyclic graph with nodes and edges.
- Identified three database components: network schema (database
  organization); subschema (views of the database per user); data management
  language (low level and procedural).
- Each record can have multiple parents:
  - Composed of set relationships; a set represents a one-to-many relationship
    between the owner and the members.
  - Each set has an owner record and member records.
  - A member may have several owners.
- Main problems: system complexity, difficulty of design and maintenance, and
  lack of structural independence.
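The owner/member sets described above can be sketched in a few lines. This is
a simplified illustration under stated assumptions, not the CODASYL DBTG
interface: the class NetworkDB, its method names and the sample records are
all hypothetical. Each named set type links one owner to many members, and a
member gains several owners by taking part in several set types.

```python
from collections import defaultdict


class NetworkDB:
    """Toy network-model store built from named owner/member sets."""

    def __init__(self):
        self._members = defaultdict(list)  # (set_name, owner) -> members
        self._owner = {}                   # (set_name, member) -> owner

    def connect(self, set_name, owner, member):
        # Within one set type a member has a single owner (one-to-many).
        key = (set_name, member)
        if key in self._owner:
            raise ValueError(f"{member} already has an owner in {set_name}")
        self._owner[key] = owner
        self._members[(set_name, owner)].append(member)

    def members(self, set_name, owner):
        return list(self._members.get((set_name, owner), []))

    def owners(self, member):
        # Across set types the same member may have several owners.
        return {s: o for (s, m), o in self._owner.items() if m == member}


db = NetworkDB()
# An order line is owned by its order and also by the product it refers to.
db.connect("order_lines", "order-100", "line-1")
db.connect("product_lines", "widget", "line-1")
db.connect("order_lines", "order-101", "line-2")
db.connect("product_lines", "widget", "line-2")

print(db.owners("line-1"))
print(db.members("product_lines", "widget"))
```

The many-to-many relationship between orders and products is expressed through
the shared member records, something the hierarchical tree cannot do directly.
The price is that programs must navigate the sets explicitly.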
The distinction between storing data in files and in databases is that
databases are intended to be used by multiple programs and multiple types of
users.
1970-present Era of relational database and Database Management System (DBMS):
A relational database is based on relational calculus. It is a shared
collection of logically related data, together with a description of this data,
designed to meet the information needs of an organization. The system catalog
(metadata) provides the description of the data that enables program-data
independence. Logically related data comprises the entities, attributes, and
relationships of an organization’s information. Data abstraction provides a
view level, a way of presenting data to a group of users, and a logical level,
how the data is understood when writing queries.
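The ideas of a system catalog and of a view level on top of the logical level
can be tried out with Python's built-in sqlite3 module. SQLite is used here
only because it needs no setup; the table, view and column names are
hypothetical, and other DBMS products expose their catalogs under different
names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Logical level: the full table as query writers understand it.
    CREATE TABLE employee (
        id     INTEGER PRIMARY KEY,
        name   TEXT,
        dept   TEXT,
        salary REAL
    );
    INSERT INTO employee (name, dept, salary)
    VALUES ('Ada', 'Engineering', 120.0), ('Grace', 'Sales', 95.0);

    -- View level: what one group of users sees (no salary column).
    CREATE VIEW employee_directory AS
        SELECT name, dept FROM employee;
""")

directory = conn.execute(
    "SELECT name, dept FROM employee_directory ORDER BY name").fetchall()
print(directory)

# The catalog is itself queryable: SQLite keeps its metadata in sqlite_master.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name").fetchall()
print(catalog)
```

Because programs read the description of the data from the catalog instead of
hard-coding it, the stored structure can change without every program changing
with it.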
- 1970: Ted Codd at IBM’s San Jose Lab proposed the relational model.
- Two major projects started, and both were operational in the late 1970s:
  - INGRES at the University of California, Berkeley became commercial and was
    followed by POSTGRES, which was incorporated into Informix.
  - System R at IBM’s San Jose Lab later evolved into DB2, which became one of
    the first DBMS products based on the relational model. (Oracle produced a
    similar product just prior to DB2.)
- 1976: Peter Chen defined the Entity-Relationship (ER) model.
- 1980s: Maturation of relational database technology; more relational DBMSs
  were developed and the SQL standard was adopted by ISO and ANSI.
- 1985: Object-oriented DBMSs (OODBMS) are developed. They had little
  commercial success because the advantages did not justify the cost of
  converting billions of bytes of data to a new format.
- 1990s: Incorporation of object orientation in relational DBMSs; new
  application areas such as data warehousing and OLAP, the web and the
  Internet; interest in text and multimedia; enterprise resource planning (ERP)
  and manufacturing resource planning (MRP).
- 1991: Microsoft ships Access, a personal DBMS created as an element of
  Windows, which gradually supplanted all other personal DBMS products.
- 1995: First Internet database applications.
- 1997: XML applied to database processing, which solves long-standing database
  problems. Major vendors begin to integrate XML into DBMS products.
Relational DBMS at a glance:
Fundamental Relational Database Characteristics
- The internal structure of an operating database is basically fixed in the
  ‘row’ direction.
- The user interacts with a logical view of the data and need not know anything
  about the actual internal structure.
Database Schema (the description of the user data in the database)
- Conceptual schema: logically describes all data in the database.
- Internal schema (physical schema): describes how data are actually stored.
- External schema (user view): describes the data that are of interest to a
  particular user.
DBMS Functions
- Data dictionary management
- Data storage management
- Data transformation and presentation
- Security management
- Multi-user access control
- Backup and recovery management
- Data integrity management
- Database language and application programming interfaces
- Database communication interfaces
Database Approach
- Data definition language (DDL): defines database schemas.
- Data manipulation language (DML): retrieves, inserts, deletes and updates
  data in the database. Query languages are part of the DML.
- Data control language (DCL): controls access to the data.
Advantages and disadvantages of DBMSs
Advantages:
- Control of data redundancy, consistency, abstraction, sharing
- Improved data integrity, security, enforcement of standards and economy of
  scale
- Balancing of conflicting requirements
- Improved data accessibility, responsiveness, maintenance
- Increased productivity, concurrency, and backup and recovery services
Disadvantages:
- Complexity, size, cost of DBMSs
- Higher impact of a failure
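The split between DDL and DML listed in the overview above can be seen in a
short session with Python's built-in sqlite3 module. This is a sketch with a
hypothetical product table. DCL statements such as GRANT are not shown because
SQLite, chosen here only for convenience, does not implement them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the schema.
conn.execute("""
    CREATE TABLE product (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        price REAL NOT NULL
    )
""")

# DML: insert rows.
conn.executemany(
    "INSERT INTO product (name, price) VALUES (?, ?)",
    [("widget", 2.5), ("gadget", 10.0), ("gizmo", 7.0)],
)

# DML: update and delete.
conn.execute("UPDATE product SET price = price * 2 WHERE name = ?", ("widget",))
conn.execute("DELETE FROM product WHERE name = ?", ("gizmo",))

# DML (query language): retrieve what is left.
rows = conn.execute(
    "SELECT name, price FROM product ORDER BY name").fetchall()
print(rows)
```

The statements say what data is wanted, not how to reach it; choosing the
access path is left to the DBMS, in contrast to the navigational style of the
hierarchical and network systems.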
The main players:
- Microsoft Corp.: SQL Server
- Oracle: Oracle 9i
- IBM: IMS/DB, DB2
Relational companies were challenged by “object-oriented DB” companies and
countered with “object-relational” systems, which retain the relational core
while allowing type extension as in OO systems.
Advanced database technology, along with the Internet, has provided faster
communication and worldwide connectivity; ubiquitous publishing seems to have
led to information overload, and still, I can’t find a thing!