Chapter 7

Section 7.2: Full Text

Chapter Contents | View Full Text | View Bullet Text

THE DATABASE APPROACH TO DATA MANAGEMENT

Database technology can cut through many of the problems a traditional file organization creates. A more rigorous definition of a database is a collection of data organized to serve many applications efficiently by centralizing the data and controlling redundant data. Rather than storing data in separate files for each application, data are stored so as to appear to users as being stored in only one location. A single database services multiple applications. For example, instead of a corporation storing employee data in separate information systems and separate files for personnel, payroll, and benefits, the corporation could create a single common human resources database. Figure 7-4 illustrates the database concept.

FIGURE 7-4 The contemporary database environment
A single human resources database serves multiple applications and also enables a corporation to easily draw together all the information for various applications. The database management system acts as the interface between the application programs and the data.

Database Management Systems

A database management system (DBMS) is simply the software that permits an organization to centralize data, manage them efficiently, and provide access to the stored data by application programs. The DBMS acts as an interface between application programs and the physical data files. When the application program calls for a data item, such as gross pay, the DBMS finds this item in the database and presents it to the application program. Using traditional data files, the programmer would have to specify the size and format of each data element used in the program and then tell the computer where they were located. A DBMS eliminates most of the data definition statements found in traditional programs.

The DBMS relieves the programmer or end user from the task of understanding where and how the data are actually stored by separating the logical and physical views of the data. The logical view presents data as they would be perceived by end users or business specialists, whereas the physical view shows how data are actually organized and structured on physical storage media. The database management software makes the physical database available for different logical views presented for various application programs. The logical description of the entire database showing all the data elements and relationships among them is called the conceptual schema, whereas the specifications of how data from the conceptual schema are stored on physical media is termed the physical schema or internal schema. The specific set of data from the database, or view, that is required by each user or application program is termed the subschema. For example, for the human resources database illustrated in Figure 7-4 an employee retirement benefits program might use a subschema consisting of the employee’s name, address, social security number, pension plan, and retirement benefits data.

A database management system has three components:

A data definition language
A data manipulation language
A data dictionary

          The data definition language is the formal language programmers use to specify the structure of the content of the database. The data definition language defines each data element as it appears in the database before that data element is translated into the forms required by application programs.

          Most DBMS have a specialized language called a data manipulation language that is used in conjunction with some conventional third- or fourth-generation programming languages to manipulate the data in the database. This language contains commands that permit end users and programming specialists to extract data from the database to satisfy information requests and develop applications.

           The most prominent data manipulation language today is Structured Query Language, or SQL. End users and information systems specialists can use SQL as an interactive query language to access data from databases, and SQL commands can be embedded in application programs written in conventional programming languages.

           The third element of a DBMS is a data dictionary. This is an automated or manual file that stores definitions of data elements and data characteristics, such as usage, physical representation, ownership (who in the organization is responsible for maintaining the data), authorization, and security. Many data dictionaries can produce lists and reports of data use, groupings, program locations, and so on.

           Figure 7-5 illustrates a sample data dictionary report that shows the size, format, meaning, and uses of a data element in a human resources database. A data element represents a field. In addition to listing the standard name (AMT-PAY-BASE), the dictionary lists the names that reference this element in specific systems and identifies the individuals, business functions, programs, and reports that use this data element.

FIGURE 7-5 Sample data dictionary report
The sample data dictionary report for a human resources database provides helpful information, such as the size of the data element, which programs and reports use it, and which group in the organization is the owner responsible for maintaining it. The report also shows some of the other names that the organization uses for this piece of data.

           By creating an inventory of data contained in the database, the data dictionary serves as an important data management tool. For instance, business users could consult the dictionary to find out exactly what pieces of data are maintained for the sales or marketing function or even to determine all the information maintained by the entire enterprise. The dictionary could supply business users with the name, format, and specifications required to access data for reports. Technical staff could use the dictionary to determine what data elements and files must be changed if a program is changed.

           Most data dictionaries are entirely passive; they simply report. More advanced types are active; changes in the dictionary can be automatically used by related programs. For instance, to change ZIP codes fromfive to nine digits, one could simply enter the change in the dictionary without having to modify and recompile all application programs using ZIP codes.

           In an ideal database environment, the data in the database are defined only once and used for all applications whose data reside in the database, thereby eliminating data redundancy and inconsistency. Application programs, which are written using a combination of the data manipulation language of the DBMS and a conventional programming language, request data elements from the database. Data elements called for by the application programs are found and delivered by the DBMS. The programmer does not have to specify in detail how or where the data are to be found.

HOW A DBMS SOLVES THE PROBLEMS OF THE TRADITIONAL FILE ENVIRONMENT

A DBMS can reduce data redundancy and inconsistency by minimizing isolated files in which the same data are repeated. The DBMS may not enable the organization to eliminate data redundancy entirely, but it can help control redundancy. Even if the organization maintains some redundant data, using a DBMS eliminates data inconsistency because the DBMS can help the organization ensure that every occurrence of redundant data has the same values. The DBMS uncouples programs and data, enabling data to stand on their own. Access and availability of information can be increased and program development and maintenance costs can be reduced because users and programmers can perform ad hoc queries of data in the database. The DBMS enables the organization to centrally manage data, their use, and security.

          The Window on Organizations illustrates some of the benefits of a DBMS for information management. Procter & Gamble had to manage massive amounts of product data for its more than 300 brands. Because these data were stored in 30 separate repositories, the company could not easily bring together information about the company’s various products and their components, degrading operational efficiency. Management of these data improved once P&G created a common set of technical standards for all its products and organized its data in a single global database.

Return to Top

Types of Databases

Contemporary DBMS use different database models to keep track of entities, attributes, and relationships. Each model has certain processing advantages and certain business advantages.

RELATIONAL DBMS

The most popular type of DBMS today for PCs as well as for larger computers and mainframes is the relational DBMS. The relational data model represents all data in the database as simple two-dimensional tables called relations. Tables may be referred to as files. Information in more than one file can be easily extracted and combined.

Figure 7-6 shows a supplier table and a part table. In each table the rows represent unique records and the columns represent fields, or the attributes that describe the entities. The correct term for a row in a relation is tuple. Often a user needs information from a number of relations to produce a report. Here is the strength of the relational model: It can relate data in any one file or table to data in another file or table as long as both tables share a common data element.

FIGURE 7-6 The relational data model
Each table is a relation, each row is a tuple representing a record, and each column is an attribute representing a field. These relations can easily be combined and extracted to access data and produce reports, provided that any two share a common data element. In this example, the PART and SUPPLIER files share the data element Supplier_Number.

To demonstrate, suppose we wanted to find in the relational database in Figure 7-6 the names and addresses of suppliers who could provide us with part number 137 or part number 152. We would need information from two tables: the supplier table and the part table. Note that these two files have a shared data element: Supplier_Number.

In a relational database, three basic operations, as shown in Figure 7-7, are used to develop useful sets of data: select, project, and join. The select operation creates a subset consisting of all records in the file that meet stated criteria. In our example, we want to select records (rows) from the part table where the part number equals 137 or 152. The join operation combines relational tables to provide the user with more information than is available in individual tables. In our example, we want to join the now-shortened part table (only parts numbered 137 or 152 will be presented) and the supplier table into a single new table.

FIGURE 7-7 The three basic operations of a relational DBMS
The select, project, and join operations enable data from two different tables to be combined and only selected attributes to be displayed.

The project operation creates a subset consisting of columns in a table, permitting the user to create new tables (also called views) that contain only the information required. In our example, we want to extract from the new table only the following columns: Part_Number, Supplier_Number, Supplier_Name, and Supplier_Address (see Figure 7-7).

The SQL statements for producing the new resultant table in Figure 7-7 would be as follows:

SELECT PART.Part_Number, SUPPLIER.Supplier_Number, SUPPLIER.Supplier_Name,
SUPPLIER.Supplier_Address
FROM PART, SUPPLIER
WHERE PART.Supplier_Number = SUPPLIER.Supplier_Number AND Part_Number = 137 OR Part_Number = 152;

           You can learn more about SQL and how to create a SQL query in the Hands-On Guide to MIS at the end of this text.

          Leading mainframe relational database management systems include IBM’s DB2 and Oracle from the Oracle Corporation. DB2, Oracle, and Microsoft SQL Server are used as DBMS for midrange computers. Microsoft Access is a PC relational database management system, and Oracle Lite is a DBMS for small handheld computing devices.

Internet Connection

The Internet Connection for this chapter directs you to a series of Web sites where you can complete an exercise to evaluate various commercial database management system products.

HIERARCHICAL AND NETWORK DBMS

You can still find older systems that are based on a hierarchical or network data model. The hierarchical DBMS is used to model one-to-many relationships, presenting data to users in a treelike structure. Within each record, data elements are organized into pieces of records called segments. To the user, each record looks like an organizational chart with one top-level segment called the root. An upper segment is connected logically to a lower segment in a parent–child relationship. A parent segment can have more than one child, but a child can have only one parent.

          Figure 7-8 shows a hierarchical structure that might be used for a human resources database. The root segment is Employee, which contains basic employee information such as name, address, and identification number. Immediately below it are three child segments: Compensation (containing salary and promotion data), Job Assignments (containing data about job positions and departments), and Benefits (containing data about beneficiaries and benefit options). The Compensation segment has two children below it: Performance Ratings (containing data about employees’ job performance evaluations) and Salary History (containing historical data about employees’ past salaries). Below the Benefits segment are child segments for Pension, Life Insurance, and Health, containing data about these benefit plans.

FIGURE 7-8 A hierarchical database for a human resources system
The hierarchical database model looks like an organizational chart or a family tree. It has a single root segment (Employee) connected to lower level segments (Compensation, Job Assignments, and Benefits). Each subordinate segment, in turn, may connect to other subordinate segments. Here, Compensation connects to Performance Ratings and Salary History. Benefits connects to Pension, Life Insurance, and Health. Each subordinate segment is the child of the segment directly above it.

Whereas hierarchical structures depict one-to-many relationships, network DBMS depict data logically as many-to-many relationships. In other words, parents can have multiple children, and a child can have more than one parent. A typical many-to-many relationship for a network DBMS is the student–course relationship (see Figure 7-9). There are many courses in a university and many students. A student takes many courses, and a course has many students.

FIGURE 7-9 The network data model
This illustration of a network data model showing the relationship the students in a university have to the courses they take represents an example of logical many-to-many relationships.

           Hierarchical and network DBMS are considered outdated and are no longer used for building new database applications. They are much less flexible than relational DBMS and do not support ad hoc, English language–like inquiries for information. All paths for accessing data must be specified in advance and cannot be changed without a major programming effort. For instance, if you queried the human resources database illustrated in Figure 7-8 to find out the names of the employees with the job title of administrative assistant, you would discover that there is no way the system can find the answer in a reasonable amount of time. This path through the data was not specified in advance.

           Relational DBMS, in contrast, have much more flexibility in providing data for ad hoc queries, combining information from different sources, and providing capability to add new data and records without disturbing existing programs and applications. However, these systems can be slowed down if they require many accesses to the data stored on disk to carry out the select, join, and project commands. Selecting one part number from among millions, one record at a time, can take a long time. Of course, the database can be tuned to speed up prespecified queries.

           Hierarchical DBMS can still be found in large legacy systems that require intensive high-volume transaction processing. Banks, insurance companies, and other highvolume users continue to use reliable hierarchical databases, such as IBM’s Information Management System (IMS) developed in 1969. As relational products acquire more muscle, firms will shift away completely from hierarchical DBMS, but this will happen over a long period of time.

OBJECT-ORIENTED DATABASES

Conventional database management systems were designed for homogeneous data that can be easily structured into predefined data fields and records organized in rows and columns. But many applications today and in the future will require databases that can store and retrieve not only structured numbers and characters but also drawings, images, photographs, voice, and full-motion video. Conventional DBMS are not well suited to handling graphics-based or multimedia applications. For instance, design data in a computer-aided design (CAD) database consist of complex relationships among many types of data. Manipulating these kinds of data in a relational system requires extensive programming to translate complex data structures into tables and rows. An objectoriented DBMS, however, stores the data and procedures that act on those data as objects that can be automatically retrieved and shared.

           Object-oriented database management systems (OODBMS) are becoming popular because they can be used to manage the various multimedia components or Java applets used in Web applications, which typically integrate pieces of information from a variety of sources. OODBMS also are useful for storing data types such as recursive data. (An example would be parts within parts as found in manufacturing applications.) Finance and trading applications often use OODBMS because they require data models that must be easy to change to respond to new economic conditions.

           Although object-oriented databases can store more complex types of information than relational DBMS, they are relatively slow compared with relational DBMS for processing large numbers of transactions. Hybrid object-relational DBMS are now available to provide capabilities of both object-oriented and relational DBMS. A hybrid approach can be accomplished in three different ways: by using tools that offer object-oriented access to relational DBMS, by using object-oriented extensions to existing relational DBMS, or by using a hybrid object-relational database management system.

Return to Top

A DATABASE HELPS P&G MANAGE PRODUCT INFORMATION

Procter & Gamble (P&G) is one of the largest consumer goods companies in the world, with more than $43 billion in annual sales. P&G sells more than 300 brands worldwide, including major brands such as Tide, Mr. Clean, Ivory Soap, Crest, Pringles, Pampers, Clairol, and Prell. P&G has five global business units in more than 80 countries, including research and development and contract manufacturing operations. This dispersed business deals with 100,000 suppliers.

          The fundamental product development process at P&G begins with the creation of a set of technical specifications for a product, raw material, packaging material, piece of artwork, or analytical and process standards. Each P&G product has its own set of technical standards and specifications. For example, the formula card (equivalent to a list of ingredients or instructions) for a single size and fragrance of one Olay product defines technical standards for 30 raw material specifications, 20 test methods, 3 packaging standards, 8 packaging material standards, 4 artwork standards, manufacturing instructions, a set of quality acceptance criteria, 1 process standard, 2 additional formula cards tied to the same packaging standard, and 15 substitute ingredient standards for producing a product in a different location.

           Because P&G is a global company with some products tailored to regional markets and consumer tastes, P&G has many functionally similar products with significant variations in actual specifications for different regions. There were no global data standards or “pick lists” across all P&G divisions. Consequently, P&G has more than 600,000 specifications and a mountain of product data to manage.

           These massive quantities of technical standards data used to be stored in 30 separate data repositories, preventing information sharing among researchers. Difficulties integrating and accessing this information made material acquisition more inefficient and expensive. For example, P&G had perhaps 12 different sets of specifications for blue dye, and it was placing 12 different orders for the same blue dye to the same supplier at 12 different prices. One large P&G business unit was using more than 50 different types of adhesives around the world with scores of suppliers, even though the specifications for several adhesives were identical with different commercial names and only three adhesive types were needed to meet all global needs.

           P&G made the management of product information easier by cataloging the technical standards for all of its products and organizing them in a single global database called the Corporate Standards System (CSS). The system is based on eMatrix from MatrixOne, a provider of product life cycle management software. This database is now available to 8,200 P&G employees and will be made available to some suppliers and contract manufacturers.

           The database organizes any and every piece of information that goes into making a particular product. Once this information is in the database, data can be analyzed from many different perspectives to deliver information to serve specific functions. For instance, designers and engineers can use the data to design a new product or improve an old one. Purchasing can use the data to consolidate materials orders. The data even help retailers determine the proper shelf height for storing an item.

           P&G’s investment in the database is producing returns. With all product data in one place, P&G can obtain a unified company-wide picture of what materials it needs and use this information to consolidate purchases to secure better prices. This information helped P&G cut down on total adhesive use and develop new processes for applying adhesives more economically.

           The database also enables P&G researchers who need constant access to standards to work more efficiently and facilitates reuse of technical standards that have already been defined. A researcher in one facility may have already defined a technical standard for a hair-coloring gel, while another researcher may be working on a similar product or a product that may interact with that brand of hair color. The second researcher will be able to reuse the coloring product’s technical standards and know precisely how an ingredient in that product might react with other ingredients. In heavily regulated industries such as health and beauty products, ingredients and their proportions must be exact. Precision and consistency among technical standards are critical.

           When P&G’s research and development group is working on new products, it can use this information without repeating study to show that there is no problem with using that color in a new product. The database helped P&G achieve a tenfold reduction in the number of colorants and a reduction in colorant suppliers to one for colors and one for whites, which enables new package development teams to select from this pre-approved palette.

           CSS is being enhanced to create more structured data from technical standards that can be easily shared with more groups in the company. Specification data will be delivered to users through specialized interfaces that are tailored for specific business needs. CSS will define and catalog technical standards for any new product, whether that product was developed internally or through an acquisition.

Sources: Lafe Low, “They Got It Together,” CIO Magazine, February 15, 2004; Deborah D’Agostino, “PLM: The Means of Production,” eWeek, February 27, 2004; “Procter & Gamble Awarded Enterprise Value Award by CIO Magazine for Work with MatrixOne on Innovative Product Lifecycle Management Application,” MatrixOne, February 23, 2004; and “Procter & Gamble,” www.matrixone.com, accessed May 20, 2004.

To Think About: How does P&G’s CSS database provide value for the company? What management, organization, and technology issues had to be addressed when developing this database?

Return to Top