Databases and Database Users
1 Introduction
2 Database Characteristics
3 Database Actors
4 Data Models, Schemas, and Instances
Database System Concepts
5 Three-Schema Architecture and Data Independence
6 Database Languages
Databases and Database Users
Many daily activities involve the use of applications that interact with a database, for example making a bank operation or an hotel reservations, borrowing a book, purchasing tickets or anything else online.
A database application manages information that is to be stored in a database:
- traditional applications use databases to store textual or numeric data
- social-media web sites use big data storage systems or NOSQL systems to store text data and non conventional data such as images and video data
1 Introduction
Databases definition:
- a database is a collection of related data
- for example, a collection of names, addresses and phone numbers is an address book database
- a mobile phone can store a database of phone numbers
- for example, a collection of brands, manufacturer and addresses is a grocery's brands database
- a computer can store a grocery's brands database using a software such as Microsoft Excel
The definition of database can be improved as follows:
- a database represents an aspect of the real world, called miniworld or universe of discourse
- a database is a consistent collection of data with some meaning
- a database is designed, built, populated and updated for a specific purpose. It has a group of users which employ the database through predesigned applications
Database Interactions
- the data in the database is derived from some aspect of the real world
- user are interested in content of the database and perform transactions, which change the data in the database: for example, you buy a mobile phone from Amazon
- events in the real world must be reflected in database and change its content as well: for example, a new mobile phone model is a available for sale on Amazon
Database Size
- the size of database can be of a hundreds of record, for example an address book database
- a million entries, for example a library database
- billions entries, for example the database of the users of a social media company
Large Database Example: Amazon.com
- Amazon.com has 60 million users and millions of books, DVDs, electronics and other items:
- the database has 42 terabytes of data and is stored on hundreds of server computers
- the database has millions of daily visitors who make a purchase
- the database is continually updated as new products are added to Amazon's shop inventory and product stock quantities are updated
Database Management System (DBMS) Definition
- A database management system is a software system that allows to create and maintain a database
- The DBMS enable database users to define, create, manipulate and share database among other users and applications
Database Management System DBMS Functionalities
- Defining a database involves to specify data types, structures and constraints on the data to be stored in the database
- the database definition or database metadata is stored on the database in the form of database catalog or dictionary
- Constructing a database is the process in which the DBMS stores information on a strorage medium
- Manipulating a database include functions such as querying the database to retrieve information, updating the database to reflect changes in the miniworld
- Sharing a database allows a number of users and programs to access the database concurrently
- an application program sends requests to the DBMS, which executes queries or transactions on the database. (A query causes some data to be retrieved. A transaction causes some data to be read and some data to be written on the database)
Other DBMS functions
- A DBMS provides database system protection: against hardware and software malfunctions and against unauthorized or malicious access
- A DBMS should enable users to maintain the database allowing the system to evolve as the requirements change over the time.
Database System
- the term Database System indicates the database and DBMS software (see figure)
2 Database Characteristics
Database approach vs File processing approach
- In the file processing approach, each user defines and implements the files needed for a particular software application.
For example, suppose a University use two applications: the administrative office uses an application to store information about students and grades, the accounting office uses an application to store information about students and payments. The two offices make use of different files to store information about students: thus data about students (student_number, name, surname, ...) are redundant. - In the database approach, a single repository maintains data that is defined once and accessed by multiple users through different application programs
The characteristics of the database approach are:
- database systems are self-describing
- isolation between data and programs
- support for multiple views of the data
- sharing of data among multiple users and support for multiuser transactions
2.1 Database System Are Self-Describing
Database systems contain a database and also the catalog of the database
- the database's catalog stores definition of structure and constraints of the database
- catalog's data is the definition of the structure of each file, type and format of each data element and constraints on data
- information stored in catalog is called meta-data.
- A DBMS is a general-purpose software which has to work with every database application
- for this reason, the DBMS refers to the catalog to know the structure of the file in a specific database
- the catalog is used by the DBMS and also by users who seek information about the structure of database
- Note: NOSQL database systems do not use meta-data, they store data in a structure that contains both the name of data element and data value
In file processing, without using a DBMS, application programs define the data structure and access the database files
- the data structure is declared in the application program, for example a struct or a class in a C++ application
- as a consequence file-processing application can access only a specific database, whereas DBMS software can access diverse database as long they extract database definition form the catalog
2.2 Isolation between Programs and Data - Data Abstraction
File processing applications programs and data are strictly coupled
- the structure of the files in embedded in the application program
- any change to the structure of the file, require changing all the programs that access that file
DBMS applications have PROGRAM-DATA INDEPENDENCE
- the structure of the files is stored in DBMS catalog, not in the application program
- changes to the structure of the database, do not require to change the application program that access the database
DBMS applications have PROGRAM-OPERATION INDEPENDENCE
- in object-relational database systems, users can specify operations on data
- users application programs operate on data invoking operations, regardless of how the operations are implemented
Program-data independence and program-operation independence is also called DATA ABSTRACTION:
- a DBMS provides users with a conceptual representation of data that is abstract
- a conceptual representation does not include details of how data is stored and of how operations are implemented
- a DBMS uses a data model, as data abstraction, to provide the conceptual representation
- a data model is based on logical concepts, such as objects, object properties and relationships between objects
- the data model hides the concepts of storage and implementation details
- logical concepts are easier to understand than computer storage concepts
Database's files internal representation
- a database end user is not concerned with the location of each data item within the record:
- a database user is concerned that when he references the name of STUDENT, a correct value is returned
- the internal representation of the STUDENT file, for the University database, may be defined by:
- the length of its record
- the lenght of each data item in the record
- the starting byte of each data item in the record
- In the database approach, the structure and organization of files are stored in tha catalog
- database users and application programs refer to the conceptual representation of the files
- the DBMS extracts the details of file storage from the catalog, when the DBMS need to access the files
In object-oriented and object-relational databases, the abstraction process include not only the data structure but also the operations on the data
2.3 Support of Multiple Views of the Data
A database has different types of users
- each user requires a different perspective or view of the database
- a view may be a subset of the database
- a view may contain virtual data that is derived from the database files but not explicitly stored
- a DBMS has to provide facilities for defining multiple views
- for example, for the university database a user is interested only in printing the transcript of each student
2.4 Sharing of Data and Multiuser Transaction Processing
A multiuser DBMS has to allow multiple users to access the database at the same time
- the DBMS includes software that controls concurrent access
- if multiple users are trying to update the same data, the DBMS allows them to do so, without errors
- for example, if several reservation agents are trying to assign a seat to different passengers, a concurrency control module ensure that each seat is accessed by only one agent at a time
- this type of applications are called online transactions processing applications (OLTP)
- the DBMS must ensure that concurrent transactions operate properly
Database Transactions
- a transaction is a program or process that includes one or more database accesses, such as reading and updating database records
- each transaction is supposed to work without interference from other transactions
- the DBMS must enforce transaction properties such as isolation and atomicity
3 Database Actors
What are the people whose jobs involve the use of large database?
3.1 Database Administrators
A company's database is a shared resource accessed by many people who work in the same organization
- there is a need for a administrator to oversee and control the database and DBMS resources
- the database administartor (DBA) is responsible of authorizing access to database, coordinating and monitoring its use
3.2 Database Designers
Database designers task are undertaken before the database is actually implemented and populated with data
- database designers identify data to be stored in the database and choose appropriate structures to represent and store this data
- database designers communicate with users to understand their requirements and to create a design that meets these requirements
3.3 End Users
End users are the people who access to the database for querying, updating and generating reports
There are many categories of database users:
- end users without database knowledge constitute the biggest portion of database end users
- they interact with the database using standard types of queries and updates, called canned transactions, that have been previously programmed
- example of tasks that such user perform are:
- bank customers and tellers check account balance and execute deposits and withdrawals
- customers make reservations for hotels, airlines and car rental companies
- social media users read and post items on social media web sites
- end users with database knowledge include:
- engineers and scientists who have a complete knowledge of the facilities provided by the DBMS
- they implement their own applications which meet their complex requirements
- managers who occasionally access the database
- they use sophisticated database query interfaces to specify their requests
3.4 System analysts and application programmers
Software developers are system analyst and application programmers
- system analyst establish the requirements of end users, especially inexperienced end users, and write specifications for standard canned transactions that meet these requirements
- applications programmmers implement these specifications as programs
Database System Concepts
Database Systems Architecture
- monolithic systems architecture
- client-server system architecture
- In a basic client-server DBMS architecture, there are two types of modules:
- a client module which runs an application program and handles user interaction through a user interface
- a server module that handles data storage, access to data and other functionalities
- distributed database systems in cloud computing environments
- distribute architecture with thousands of computers can handle a great amount of data
- cloud computing vendors host thousands of servers to control big data for web users
4 Data Models, Schemas, and Instances
DBMS data abstraction
- data abstraction means showing only essential features of data and hiding the details of how data is stored
- data abstraction allows users to percieve data with different level of details
DBMS data models
- DBMSs use data models to do data abstraction
- a data model is a set of concepts used to describe the structure of data
- a data model provides data types, relationships, constraints and a set of operations that apply to the data
Data model's operations
- the data model include basic operations to insert, retrieve, modify and delete an object
- in addition to basic operation, object-oriented data models include user-defind operations, to specify behavior of objects
- for example, for the University database, a software developer may write a
compute_grade_point_averageoperation on a STUDENT object
4.1 Data Models Classification
Database data models can be classified according to the concepts they use to describe the structure of the database
- high-level or conceptual data models use concepts that are similar to how end users perceive data
- implementation or representational data models use concepts which are similar to the way data is structured on storage media, but can be also understood by end users
- low-level or physical data models use concepts that describe the details of how data is stored on storage media and are meant for specialists not for end users
High-level conceptual data models
- the most popular high level conceptual data model is the entity–relationship model
- the entity–relationship data model uses concepts such as entities, attributes and relationships
- an entity represents an object from the subject (or miniworld) you want to describe in the database
- an attribute represents a property that characterizes the entity
- a relationship between two or three entities represents an association among the entities
Representational data models
- representational or implementation data models are the models implemented by commercial DBMS
- representational data models include the relational data model, the object data model and the legacy network and hierarchical models
- representational data models represent data by using record structures
- records or record structures group data elements together as one data structure
- record structures are similar to C data structures
Physical (low-level) data models
- physical data models describe how data is stored in files on storage media
- physical data models describe storage structure and access mechanism
- information at this level is record formats, record ordering and access paths. An access path is a search structure used to search for database records.
Self-describing data models
- self-describing data models combine the descriptions of data with the data values
- in traditional DBMS, the schema describes the data and is separated from the data
- examples of self-describing data models are: XML, many key-value stores and NOSQL systems
4.2 Schemas, Instances, and Database State
What is a database schema?
- the database schema is different from the database itself
- the database schema is a description of the database and is specified during the design phase
- each object of the schema is called schema construct
What is a schema diagram?
- Some data models adopt conventions for displaying database schemas as diagrams
- a schema diagram is the drawn version of a database schema
Example: schema diagram for the ROSTER database
_PLAYER_________________________________________________ | | | | | | ID | NAME | POSITION | SALARY | |_____________|_____________|_____________|______________| _TEAM-PLAYER_______________ | | | | PLAYER_ID | TEAM_ID | |_____________|_____________| _TEAM___________________________________________________ | | | | | | ID | CITY | NAME | LEAGUE_ID | |_____________|_____________|_____________|______________| _LEAGUE_________________________________________________ | | | | | | ID | DTYPE | NAME | SPORT | |_____________|_____________|_____________|______________| The schema contains the schema constructs: Player, Team and League
A schema diagram only displays some aspects of a database schema:
- a schema diagram shows the names of the records and data items and some constrains
- a schema diagram does not show data type of data items and relationship among files
Database state
- the database state or snapshot is the data in the database in a particular instant in time
- data in a database changes every time you insert a record or change the value of a data item within a record
- in a given database state, each schema construct holds a set of occurrences or instances.
- for example, the PLAYER construct holds a set of player entities or records as its instances
- the state of the database is the result of the current set of instances in the database
Database schema VS database state
- there is a distinction between database schema and database state
- when you define a new database and specify the database schema to the DBMS
- database state is the empty state, with no data in the database
- when the database is initially populated
- the database is in its the initial state
- every time you update the database
- the database changes its state to another one
- a valid state is a one that satisfies the structure and constraints specified in the schema
- the DBMS stores the definition of schema constructs and constraints in the DBMS catalog
- the DBMS is responsible for making sure that the database is always in a valid state
- the database schema is also called the intension, a database state is also called an extension of the schema
5 Three-Schema Architecture and Data Independence
The three schemas architecture
- the three schemas architecture for database systems serves to achieve the three characteristics of databases
- the aforementioned three characteristics of database are: 1) database are self-describing due to storing schema to catalog, 2) database isolates programs from data, 3) database supports multiple user views.
5.1 The Three-Schema Architecture
The goal of the three-schema architecture is to separate the user applications from the physical database.
This three-schema architecture define schemas at three levels:
- at external or view level, the external schemas or user views describe a part of the database for a particular group of users:
- the external schema is implemented using a representational data model
- at conceptual level, the conceptual schema describes the structure of the whole database for database users:
- the representational data model is used to describe the conceptual schema
- at internal level, the internal schema describes the physical structure of the database:
- the internal schema uses a physical data model
The three schema ANSI architecture
- the three schema ANSI architecture allows to visualize the schema levels in a database system
- the three schema architecture is important in database design, because this architecture separates the three levels: users' levels, conceptual level and internal storage level
How the three level architecture works:
- the three schemas are only descriptions of data, but the actual data is stored at the physical level only.
- each user specifies its request on an external schema, the DBMS is responsible for transforming user's request into a request against a conceptual schema and then into a request on the internal schema for being processed over the stored database
- if the request retrieves data from the database, the result data must be reformatted to match the external view
- the processes of transforming requests and results between levels are called mappings
5.2 Data Independence
The three schema architecture enables DBMS data independence
- data independence is the capacity to change the schema at one level of the database without having to change the schema at higher level
- there are two types of data independence:
- logical data independence is the capacity to change the conceptual schema without having to change the external schemas or application programs:
- for example, you may add a record type or a data item to the database
- if the DBMS supports logical data independence only the view definition and mappings need to be changed
- physical data independence is the capacity to change the internal schema without having to change the conceptual schema
- for example, you may need to reorganize physical files, creating new access structures, to improve retrieval performance
- if you do not add new data to database, there is no need to change the conceptual schema
6 Database Languages
A DBMS is accessed by a variety of users and the DBMS provides a language and interface for each category of users
Database design and DBMS languages
- To design a database specify conceptual schema, internal schema and mapping between the two schemas
- a data definition language (DDL) is used to define internal and conceptual schemas
- BDMSs use the same language at the conceptual and internal levels because they mantain no separation between different levels
- At implementation time, DDL statements are processed by DBMS DDL compiler, which identify schema constructs and store the schema description to DBMS catalog
Database design languages types
- if the DBMS wanted to maintain a separation between the three levels, it should use:
- DDL (Data Definition Language) to specify the conceptual schema
- SDL (Storage Definition Language) to specify the internal schema
- DDL or SDL to specify the mappings between conceptual and internal schemas
- VDL (View Definition Language) to specify user views and their mappings
- commercial relational DBMS do not implement SDL and VDL languages
- they use function and specifications related to storage of files to specify the internal schema
- they use SQL in the role of DDL to define both conceptual and external schemas
Users manipulate the database using a data manipulation language (DML)
- common manipulations include insertion, retrieval, modification and deletion of the data
in commercial DBMSs, only one language, the SQL language is used in place of DDL and VDL languages
- SQL include constructs for conceptual schema definition, view definition and data manipulation
- SQL include statements for constraints specification
- SQL does not have storage language capabilities
There are two types of data manipulation languages
- high-level or nonprocedural DML is used as standalone language to specify complex database operations
- DML statements are entered in the command line or embedded in other programmming languages
- DML statements retrieve many records with a single statement
- the high-level DML standalone language is called a query language
- a query specifies which data to retrieve rather than how to retrieve it, therefore the language is also called declarative
- low-level or procedural DML must be embedded in a programming language
- this type of DML statements tipically retrieve records from the database and process each separatly
- the general purpose language is called the host language, the DML is called the sublanguage

No comments:
Post a Comment