Classification of databases according to the nature of the stored information. Informatics lesson "basic database concepts"

Data banks (abbreviated SDB in this text) are complex systems, and they can be classified both as a whole and by each component separately (Fig. 9). The central component of an SDB is the database, and most classification criteria refer to it.

By the form of information presentation, systems are divided into visual, audio, and multimedia. This classification reflects the form in which information is stored in the database and delivered to users.

By the nature of data organization, databases can be divided into unstructured, partially structured, and structured.

Unstructured databases include those organized as semantic networks.

Partially structured databases include plain-text and hypertext systems.

Structured databases require preliminary design and a description of their structure.

Structured databases are divided by the type of model used into

· hierarchical,

· network,

· relational,

· mixed and

· multi-model.

This classification also applies to DBMS.

Structured databases contain several levels of information units (IUs), nested one inside another.

Most systems support:

· field: the smallest semantic unit of information;

· record: a set of fields (or more complex IUs);

· database file: a set of records of the same type.

Many DBMSs also explicitly support the database level, a collection of interrelated database files.

By the type of information stored, databases are divided into

· factual,

· documentary and

· lexicographic.

Factual databases store factual information: numerical or textual characteristics of objects presented in a formalized form. In response to a query, information about the object of interest is returned.

In documentary databases the unit of storage is a document, and the user receives either a reference to the document or the document itself. Documentary databases may or may not store the document itself on machine-readable media. Those that do not include bibliographic databases, abstract databases, and index databases that reference the source of the information. Systems that store the full text of a document are called full-text systems. A variety of these is the database of document forms, in which a document is looked up in order to be used as a template.

Lexicographic databases include various dictionaries (classifiers, multilingual dictionaries, dictionaries of word stems, etc.).

By the organization of data storage and access, databases are divided into

· local (personal),

· shared (integrated, centralized) and

· distributed databases (Fig. 10).

Fig. 10. Classification of databases by the nature of data storage and access

A personal database is intended for local use by a single user. Local databases can be created by each user independently or extracted from a shared database.

Integrated and distributed databases allow several users to access information simultaneously (multi-user access mode). The parts of a distributed database are physically located on different computers but logically form a single whole.

Other components of the SDB can also be distributed across network nodes, while the database itself may not be distributed. A distinction is therefore made between:

· distributed databases and

· distributed SDBs (in which at least one component is distributed).

Some sources distinguish extensional and intensional databases. The former are built by explicitly storing data in the database, the latter by using rules that determine their content.

Databases are also classified by size. A special place is occupied by so-called very large databases. For such databases, the problems of storing information efficiently and of processing it are posed differently.

DBMS classification

By communication language, DBMSs are divided into

· open,

· closed and

· mixed.

Open systems use universal languages to access the database. Closed systems have their own languages for communicating with SDB users.

By function, DBMSs are divided into

· informational and

· operational.

Informational DBMSs organize the storage of information and access to it; more complex processing requires special programs. Operational DBMSs perform complex processing and can change processing algorithms.

By the scope of possible application, DBMSs are divided into

· universal and

· specialized (problem-oriented DBMS).

The set of supported data types differs from one DBMS to another. A number of DBMSs allow the developer to add new data types and new operations. Such systems are called extensible database systems. A further development is object-oriented database systems, which have powerful capabilities for modeling complex objects.

By power, DBMSs are divided into

· desktop (Dbase, FoxBase/FoxPro, Clipper, Paradox, Access, Approach) and

· corporate (Oracle, DB2, Sybase, Informix, Ingres, Progress).

The former are characterized by low hardware requirements, an orientation toward the end user, and low cost.

The latter support work in a distributed environment, offer high performance, and have advanced administration tools and extensive facilities for maintaining integrity. They are complex, expensive, and require significant resources.

DBMSs that occupy an intermediate position between desktop and corporate (industrial) systems include InterBase and Microsoft SQL Server. In recent years there has been a tendency for the boundary between desktop and professional systems to blur.

DBMSs can also be distinguished by the predominant category of users they are aimed at.

The basis of the information system is the database.

The purpose of any information system is to process data about objects of the real world.

In the broad sense of the word, a database is a collection of information about specific objects of the real world in any subject area.

In addition, a database is a shared data store. Automating human activity means transferring part of the real world into electronic form. To do this, a part of the real world is singled out and analyzed for the possibility of automation. This part is called the subject area; it strictly delimits the range of objects that are studied, measured, evaluated, and so on. As a result, the objects of automation are identified, along with the attributes by which these objects are evaluated.

The result of this process is a database that describes a specific part of the real world from strictly defined positions.

Databases perform two main functions. They group data by information objects and their relationships and provide this data to users.

Data is a formalized representation of information available for processing, interpretation and exchange between people or in automatic mode.

DB classification

There are many types of databases, which differ by various criteria. Consider the main classifications.

Database classification by data model:

The hierarchical database model consists of objects with pointers from parent to child objects, linking related information together. A hierarchical database can be represented as a tree of objects at different levels: the top level is occupied by a single object, the second level by its child objects, and so on;

The network database model is similar to the hierarchical database model, except that it has pointers in both directions that connect related information;

The relational model (from the English word "relation") organizes data in two-dimensional tables, also called relational tables. A record in one table can be linked to one or more records in another table.

26. When working with the DBMS, the working field and the control panel are displayed on the screen. The control panel includes a menu, an auxiliary control area and a hint line. The location of these areas on the screen can be arbitrary and depends on the features of a particular program. Some DBMS allow you to display a directive window (command window) or a command line.

The menu bar contains the main modes of the program. Selecting one of them opens a drop-down submenu with a list of its commands. Selecting some of these commands opens further submenus.

Auxiliary control area includes:

status bar;

Toolbars;

Vertical and horizontal scroll bars.

In the status bar (status line) the user finds information about the current mode of operation of the program, the file name of the current database, and so on.

The toolbar (pictographic menu) contains a number of buttons (icons) for quickly invoking certain menu commands and program functions. To display areas of a database table, form, or report that are not currently visible on the screen, use the vertical and horizontal scroll bars.

The hint line displays messages to the user about the actions possible at the moment.

An important feature of the DBMS is the use of an intermediate storage buffer when performing a number of operations. The buffer is used during copy and move commands to temporarily store copied or moved data, after which it is sent to a new address. When data is deleted, it is also buffered. The contents of the buffer are kept until a new piece of data is written to it.

DBMS programs provide a large number of commands, each of which may have various options. Together with additional operations, this system of commands forms a menu with its own features for each type of DBMS. A command is selected from the menu in one of two ways:

Moving the cursor to the command with the cursor keys and pressing the Enter key;

Typing the first letter of the command on the keyboard.

You can get additional information about the commands that make up the DBMS menu and their use by entering the help mode.

Despite the features of the DBMS, the set of commands made available to the user by some average database management system can be divided into the following typical groups:

1. creating new database objects; saving and renaming previously created objects; opening existing databases; closing previously opened objects; and printing database objects. Printing begins with the selection of a printer driver, since each type of printer requires its own driver. The next step is to set the page parameters, define the headers and footers, and select the typeface and font size. Finally, the number of copies, the print quality, and the page or pages of the document to be printed are set.

2. editing commands: entering data and changing the content of any fields of database tables, or components of screen forms and reports, is carried out with a group of editing commands, the main ones being move, copy, and delete. A special place among the editing commands is occupied by find and replace, which work on user-defined text within the whole document or a selected part of it, and by undo, which cancels the last command entered (rollback).

3. formatting commands. The visual presentation of data during output is important. Most DBMS provide the user with a large number of commands related to the design of output information. With these commands, the user can change the data alignment direction, font types, line thickness and arrangement, letter height, background color, etc.

4. commands for working with windows. Most DBMSs make it possible to open many windows at the same time, thus organizing a "multi-window mode" of work. In this case, some windows will be visible on the screen, others will be under them. By opening several windows, you can work with several tables at once, quickly moving from one to another. There are special commands that allow you to open a new window, move to another window, change the relative position and size of windows on the screen. In addition, the user has the ability to split the window into two parts to view different parts of a large table at the same time, or to fix some part of the table that will not disappear from the screen when moving the cursor to the far parts of the table.

5. commands for working in the main DBMS modes (table, form, query, report);

6. obtaining help. Database management systems include electronic reference manuals that give the user instructions for performing basic operations, information on specific menu commands, and other reference data. A distinctive feature of such help is that it is context-sensitive: the information provided depends on the situation in which the user finds themselves.

7. commands for working with files:

In some DBMS, the group of commands under consideration includes commands that provide the ability to export-import and attach tables created by other software tools.

Along with the above operations, many DBMS programs can insert a diagram, drawing, and so on, including objects created in other software environments, and establish links between objects.

Database models.

The classification of data models is based on the concept of relationships between objects. Four types of relationships can exist between database tables: one-to-one, one-to-many, many-to-one, and many-to-many.
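As an illustration of these relationship types, here is a minimal sketch using Python's standard sqlite3 module. All table names, column names, and sample rows are hypothetical and chosen only for this example. A one-to-many relationship (one class, many students) is expressed with a reference from the "many" side, and a many-to-many relationship (students and circles) with an intermediate table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")        # throwaway in-memory database
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce references

conn.executescript("""
-- One-to-many: one class has many students, each student has one class.
CREATE TABLE classes  (class_name TEXT PRIMARY KEY);
CREATE TABLE students (
    file_number TEXT PRIMARY KEY,
    surname     TEXT NOT NULL,
    class_name  TEXT NOT NULL REFERENCES classes(class_name)
);

-- Many-to-many: a student may join several circles and a circle has
-- several students, so the pairs live in a separate linking table.
CREATE TABLE circles (circle_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE membership (
    file_number TEXT    REFERENCES students(file_number),
    circle_id   INTEGER REFERENCES circles(circle_id),
    PRIMARY KEY (file_number, circle_id)
);

INSERT INTO classes    VALUES ('4A');
INSERT INTO students   VALUES ('P-69', 'Petrov', '4A'), ('I-12', 'Ivanov', '4A');
INSERT INTO circles    VALUES (1, 'Theater'), (2, 'Chess');
INSERT INTO membership VALUES ('P-69', 1), ('P-69', 2), ('I-12', 1);
""")

# "Many" side of one-to-many: all students of one class.
students_in_4a = [row[0] for row in conn.execute(
    "SELECT surname FROM students WHERE class_name = '4A' ORDER BY surname")]

# Many-to-many, read in both directions through the linking table.
petrov_circles = [row[0] for row in conn.execute(
    "SELECT c.title FROM circles c JOIN membership m ON m.circle_id = c.circle_id "
    "WHERE m.file_number = 'P-69' ORDER BY c.title")]
theater_members = [row[0] for row in conn.execute(
    "SELECT s.surname FROM students s JOIN membership m ON m.file_number = s.file_number "
    "WHERE m.circle_id = 1 ORDER BY s.surname")]

print(students_in_4a, petrov_circles, theater_members)
```

Read from the student side, the class link is many-to-one; a one-to-one relationship would be obtained by making the referencing column unique as well.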

Hierarchical model. It organizes data as a tree structure, that is, a hierarchy of elements. At the top level of the structure is the root of the tree. A tree can have only one root; all other elements are nodes, called child nodes. Each child node has exactly one parent node above it.
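The tree structure described here can be sketched in a few lines of Python. This is only an illustration under assumed names: the Node class, its methods, and the sample data are hypothetical and do not correspond to any particular hierarchical DBMS.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare nodes by identity, not by their contents
class Node:
    """One element of a hierarchical (tree) structure."""
    name: str
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

    def add_child(self, name: str) -> "Node":
        # A child gets exactly one parent: the node it is attached to.
        child = Node(name, parent=self)
        self.children.append(child)
        return child

    def path(self) -> str:
        # Walk upward through the parent links until the root is reached.
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


# Hypothetical data: the root is a school, with a class and a student below it.
root = Node("School")
class_4a = root.add_child("Class 4A")
student = class_4a.add_child("Petrov")

print(student.path())  # School/Class 4A/Petrov
```

Because every node stores a single parent link, any element can be reached from the root by exactly one path, which is the defining property of the hierarchical model.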

Network model. The model is based on network structures in which any element can be connected to any other element. The information constructs of the model are relations and fan relations. The relations in a fan relation are divided into main and dependent. A fan relation W(R, S) is a pair of relations R and S together with a link between them such that each value of S is associated with a single value of R. The relation R is called the original (main) relation, and S the generated (dependent) relation.

Relational model. The data structure of this model is based on the apparatus of relational algebra and normalization theory. The model assumes the use of two-dimensional tables (relations).

28. Structural elements of a relational database.
1. In relational databases, any data sets are presented in the form of two-dimensional tables (relations), similar to the list of students described above. Each table consists of a fixed number of columns and a certain (variable) number of rows. The description of the columns is called the layout of the table.
2. Each column of the table represents a field, the elementary unit of the logical organization of data. A field corresponds to an indivisible unit of information: an attribute of the data object (for example, a student's last name or address).
The following characteristics are used to describe the field:
field name (for example, personal file number, Surname);
field type (for example, character, date);
additional characteristics (field length, format, precision).
For example, the Date of Birth field can be of type "date" and length 8 (6 digits and 2 dots separating day, month, and year in a date entry).
3. Each row of the table is called a record. The record logically combines all fields that describe one data object, for example, all fields in the first row of the above table describe data about student Petrov Ivan Vasilievich born on March 12, 1989, living at ul. Gorky, 12-34, studying in class 4A, personal file number - P-69. The system numbers the records in order: 1,2, ..., n, where n is the total number of records (rows) in the table at the moment. In contrast to the number of fields (columns) in the table, the number of records during the operation of the database can vary in any way (from zero to millions). The number of fields, their names and types can also be changed, but this is already a special operation called changing the table layout.
4. The record structure of a file specifies the fields whose values form a simple key that identifies a record instance. An example of such a simple key in the Students table is the file number field: its value uniquely identifies one object in the table, one student, since no two students in the table have the same file number.
5. Each field can be included in several tables (for example, the Surname field can also appear in a table listing the members of the theater circle).
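The elements described above (table layout, fields, records, simple key) can be tried out with Python's standard sqlite3 module. The sketch below is illustrative only: the table and column names are assumptions, the first record reuses the student from the example in the text, and the second record and the added phone field are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# The table layout: field names, field types, and the simple key.
conn.execute(
    """
    CREATE TABLE students (
        file_number TEXT PRIMARY KEY,  -- simple key, unique per student
        surname     TEXT NOT NULL,
        birth_date  TEXT,              -- stored as ISO text, e.g. 1989-03-12
        address     TEXT,
        class_name  TEXT
    )
    """
)

# One record (row) describes one data object: one student.
conn.execute(
    "INSERT INTO students VALUES (?, ?, ?, ?, ?)",
    ("P-69", "Petrov", "1989-03-12", "ul. Gorky, 12-34", "4A"),
)

# A second record with the same key value is rejected by the DBMS.
try:
    conn.execute(
        "INSERT INTO students VALUES (?, ?, ?, ?, ?)",
        ("P-69", "Ivanov", "1989-05-01", "ul. Lenina, 1", "4A"),  # invented row
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Changing the table layout is a separate operation from editing records.
conn.execute("ALTER TABLE students ADD COLUMN phone TEXT")

field_names = [row[1] for row in conn.execute("PRAGMA table_info(students)")]
record_count = conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]
print(field_names, record_count, duplicate_rejected)
```

The number of records changes with ordinary inserts and deletes, while the number of fields changes only through the explicit layout operation, which mirrors the distinction drawn in item 3.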

29. Operations on relations can be described in one of two ways: by specifying the list of operations whose execution leads to the desired result (procedural approach), or by describing the properties that the resulting relation must satisfy (declarative approach).
The system of relations and operations on them forms a relational algebra. Let us consider the more familiar procedural approach. The list of operations includes projection, selection, union, intersection, difference, join, and division.
The selection operation is performed on one relation (table). The resulting relation contains the subset of tuples (rows) that satisfy some condition.
The projection operation copies the specified attributes (fields) from the source relation into the resulting relation.
The union operation is performed on two relations. The resulting relation includes all tuples of the first relation and those tuples of the second relation that are not already present in the first.
The intersection operation yields the tuples of the first relation that are also present in the second relation.
The difference operation yields the tuples of the first relation that are not present in the second relation.
The join operation is performed on two relations, in each of which an attribute is designated on which the join is made. The resulting relation includes all the attributes of the original relations, with their rows concatenated according to the join condition.
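The operations listed above (except division) can be sketched in plain Python, modelling a relation as a list of dictionaries. This is a teaching sketch, not how a DBMS implements them; the function names and the sample relations are assumptions made for the example.

```python
# A relation is modelled as a list of dicts: each dict is one tuple (row),
# and its keys are the attribute (field) names.

def select(relation, predicate):
    """Selection: keep the tuples that satisfy the condition."""
    return [t for t in relation if predicate(t)]

def project(relation, attributes):
    """Projection: keep only the listed attributes, dropping duplicate rows."""
    result = []
    for t in relation:
        row = {a: t[a] for a in attributes}
        if row not in result:
            result.append(row)
    return result

def union(r, s):
    """Union: all tuples of r plus the tuples of s that r lacks."""
    return r + [t for t in s if t not in r]

def intersection(r, s):
    """Intersection: tuples of r that are also in s."""
    return [t for t in r if t in s]

def difference(r, s):
    """Difference: tuples of r that are not in s."""
    return [t for t in r if t not in s]

def join(r, s, attribute):
    """Equi-join on one shared attribute (kept once in the merged row)."""
    return [{**t, **u} for t in r for u in s if t[attribute] == u[attribute]]


# Hypothetical sample relations.
students = [
    {"file_number": "P-69", "surname": "Petrov", "class_name": "4A"},
    {"file_number": "I-12", "surname": "Ivanov", "class_name": "4B"},
]
theater = [{"file_number": "P-69", "surname": "Petrov", "class_name": "4A"}]
classes = [
    {"class_name": "4A", "teacher": "Sidorova"},
    {"class_name": "4B", "teacher": "Orlova"},
]

print(select(students, lambda t: t["class_name"] == "4A"))
print(project(students, ["class_name"]))
print(difference(students, theater))
print(join(students, classes, "class_name"))
```

Union, intersection, and difference only make sense for relations with the same set of attributes, which is why students and theater share one layout here.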

30. MS-ACCESS database screen interface. MS-ACCESS database components
The Microsoft Access system is distinguished by versatility, a wide range of visual development tools, the ability to integrate with other software products of the Microsoft Office package, as well as with programs that support OLE technology.
The program is launched using the Start, Programs, MS-Access commands. The prompt window that appears offers two options: creating a new database or opening a previously created one. When you select the "New Database" radio button, Access prompts you to enter a name for the database. You must specify a database name and click OK. A new database can also be created with the File, New command.
After starting the program and creating a new database, the main window of the system appears. Traditionally, the window contains a title, which indicates the name of the program - Microsoft Access, the next line contains the program menu, and below - the toolbar.
The work area of the window contains sections corresponding to the types of objects that the database can contain. These objects are Tables, Queries, Forms, Reports, Pages, Macros, and Modules.
The title of the window contains the name of the database file.
The interface for working with database objects is unified, it has standard modes of operation - "View", "Designer", "Create".

A query is a means of retrieving information from a database, where the data may be spread across multiple tables. To build queries, MS Access uses the method known as query by example (QBE). With this tool, the necessary data is extracted from one or more tables on the basis of a visual specification.
Macros are designed to automate frequently performed operations. A macro contains one or more macro commands that perform a specific action (such as opening a form or printing a report).

31. Creating tables in the MS-ACCESS database. Working with forms in the MS-ACCESS database. Reporting methods in MS-ACCESS.
To create a table, open the database window, go to the Tables tab, and select one of the following modes in the dialog box:

Creating a table in data entry mode;

Creating a new table in Design view;

Creating a new table in Wizard mode.

After selecting the mode of further work, it is necessary to create a table structure and, after specifying its name, save it.
The File, External Data, Import command allows you to import tables from an external file into the current database;
The command File, External data, Link to tables allows you to create tables linked to tables from external files.
Setting field properties. The field name is entered in the Field name column. When naming fields, the following rules must be followed:

The field name must contain no more than 64 characters;

The field name can contain letters, numbers, spaces, and special characters, except for periods, exclamation points, accent graves, square brackets, and ASCII control characters;

The field name cannot start with a space;

Two fields in the same table cannot have the same name.

The data type of the field is entered in the Data Type column. The following data types are valid in Access: Text, Number, Currency, AutoNumber (counter), Date/Time, Yes/No (Boolean), Memo (variable-length fields that can contain up to 65,535 characters), OLE Object, Hyperlink, and Lookup Wizard. Each data type has its own properties, which are displayed in the Field Properties section of the Design window.
A primary key is one or more fields that uniquely identify each record in a table. Having a key helps you find and sort records faster. The fields used as the primary key are indexed automatically, but you can create a separate index for other fields as well. By default, Access creates an ID field of type AutoNumber. You can also define the primary key yourself: select the field that you want to use as the primary key and click the "Primary Key" button on the Table Design toolbar, or right-click the selected field and choose the "Primary Key" command from the context menu. An icon with a key image will appear in the field marking area.
Entering data into a table using Datasheet view is the easiest way. When entering data into a table, the TAB key is used to move to the next field.
Working with forms in the MS-Access database
Creating forms. The data in the database can be viewed in various modes. However, Form view provides the most flexible and convenient way to view, add, edit, and delete data.
A form allows you to display all the fields of one or more records at the same time. Datasheet view also allows you to view multiple records at once, but it may not always be possible to display all fields at once. An optimally designed form can hold up to 100 fields on one screen, and if there are many more fields, you can create a multi-page form for each record.
The following tools are used to create forms:

Design view - allows you to create a new form yourself;
Form Wizard - allows you to automatically create a form based on the fields selected from the table (using one of the following layouts: columnar, tabular, datasheet, or justified);
AutoForm: Columnar - automatically creates a form with fields arranged in one column;

AutoForm: Tabular - automatically creates a tabular form;

AutoForm: Datasheet - automatically creates a datasheet form;

Chart - creates a form with a chart;
PivotTable - creates a form with an Excel PivotTable.
The listed tools are also available through the Insert, Form command or the New Object: AutoForm button on the toolbar.
To create a form, you need to open the database window, select the Forms tab, click the "Create" button, select the form option in the dialog box and follow the instructions in the dialog box.

The form is printed using the "Print" button on the Standard toolbar or the menu command File, Print. The completed form can be opened in Form view, or in Design view to modify it.

6.10. Working with objects in the MS-Access database

You can insert pictures, video clips, sound files, business charts, Excel spreadsheets, and Word documents into MS Access. Any OLE object can be associated with forms and reports. Such objects can be not only used in Access but also edited directly in the form. Objects can be embedded in bound and unbound object frames, as well as in an image frame. Embedding places an object in the Access database, where it is stored in a form, a report, or a table record.

Embedding an unbound object. You can use either of the following two methods to embed an unbound object in a form or report:

Insert an object into a form or report; this creates an image object or an unbound object frame;

First create an image object or an unbound object frame, and then insert the object or picture into this frame.

Embedding a picture. To embed OLE objects or pictures in an unbound object frame or image frame, you must:

Open the form in Design view;

Click the Picture button on the toolbar;

Create a picture frame by dragging with the Picture tool.

When you create a picture frame, the Select Picture dialog box appears, listing the picture files contained in the current folder. Select an image and click the OK button. The picture is then embedded and displayed.

Working with reports in MS-Access

MS Access offers flexible and powerful tools for creating reports:

Design view, in which you develop your own reports with specified properties;

Report Wizard, which allows you to quickly create a report based on the selected fields;

AutoReport: Columnar, which allows you to create a report with fields arranged in one or more columns;

AutoReport: Tabular, which allows you to automatically create a tabular report;

Chart Wizard, which creates a report containing a chart display of data;

Label Wizard, which generates a report formatted for printing mailing labels.

To view the database report, you can use:

the File, Print Preview command of the main menu;

the Print Preview command of the context menu;

the Print Preview button on the toolbar.

To print the created report from the Report Designer window or the Database window, you must:

Run the commands File, Print, this will open the Print dialog box, allowing you to set the necessary print options;

Click the "Print" button on the toolbar, in which case the report will be printed with the current settings.

To modify a previously created report:

In the database window, go to the Reports tab;

Place the mouse pointer on the report to be modified;

Click the Design button.
To save the report, use the File, Save or File, Save As/Export commands, or click the Save button on the Standard toolbar. If the report is being saved for the first time, or with the Save As/Export command, you must specify the name of the report. Access saves only the design of the report, not the data or the generated report itself.
To open a report, execute the commands File, Open or click the corresponding button on the toolbar.

Basic concepts and classification of database management systems

A database (DB) is a set of structured data stored in the memory of a computer system that reflects the state of objects and their relationships in the subject area under consideration. The logical structure of the data stored in the database is called the data representation model. The main data representation models (data models) are hierarchical, network, and relational.

A database management system (DBMS) is a set of language and software tools designed for creating and maintaining a database and sharing it among many users. DBMSs are usually distinguished by the data model used; for example, DBMSs based on the relational data model are called relational DBMSs.

To work with a database, the tools of the DBMS itself are often sufficient. However, if the database must be made convenient for unskilled users, or the DBMS interface does not suit users, applications can be developed. Creating them requires programming. An application is a program or set of programs that automates the solution of an applied task. Applications can be created inside or outside the DBMS environment, in the latter case using a programming system with database access tools, such as Delphi or C++ Builder. Applications developed in the DBMS environment are often called DBMS applications, and those developed outside it external applications.

The data dictionary is a database subsystem designed for the centralized storage of information about data structures, relationships between database files, data types and presentation formats, the ownership of data by users, security and access control codes, and so on.

Information systems based on databases usually operate in a client-server architecture. In this case the database is located on the server computer and is shared.
A server of a certain resource in a computer network is a computer (program) that manages this resource; a client is a computer (program) that uses this resource. Computer network resources include, for example, databases, files, printing services, and mail services. The advantage of organizing an information system on a client-server architecture is that it successfully combines centralized storage, maintenance, and collective access to common corporate information with individual user work.

According to the basic principle of the client-server architecture, data is processed only on the server. The user or application forms queries that are sent to the database server as SQL statements. The database server searches for and extracts the necessary data, which is then transferred to the user's computer. The advantage of this approach over earlier ones is a noticeably smaller amount of transmitted data.

There are the following types of DBMS:
* full-featured DBMSs;
* database servers;
* tools for developing programs that work with the database.

Full-featured DBMSs are traditional DBMSs. These include dBase IV, Microsoft Access, Microsoft FoxPro, and others.

Database servers are designed to organize data processing centers in computer networks. Database servers process requests from client programs, usually expressed as SQL statements. Examples of database servers are Microsoft SQL Server and InterBase. In general, DBMSs, spreadsheets, word processors, e-mail programs, and others can be used as client programs.

Tools for developing programs that work with the database can be used to create the following programs:
* client programs;
* database servers and their individual components;
* custom applications.

By the nature of their use, DBMSs are divided into multi-user (industrial) and local (personal). Industrial DBMSs are the software basis for developing automated control systems for large economic entities.
Industrial DBMSs must meet the following requirements:
* the possibility of organizing joint parallel work of many users;
* scalability;
* portability to various hardware and software platforms;
* tolerance of failures of various kinds, including a multi-level backup system for the stored information;
* security of the stored data and a well-developed, structured system of access to it.

A personal DBMS is software that is aimed at solving the problems of a local user or a small group of users and is intended for use on a personal computer. This explains its second name: desktop. The defining characteristics of desktop systems are:
* relative ease of use, which makes it possible to build working user applications on top of them;
* relatively limited requirements for hardware resources.

By the data model used, DBMSs are divided into hierarchical, network, relational, object-oriented, and others. Some DBMSs can support several data models simultaneously.

The following types of languages are used to work with the data stored in the database:
* data description language: a high-level non-procedural language of a declarative type, designed to describe the logical data structure;
* data manipulation language: a set of constructs that implement the basic operations on data: input, modification, and selection of data by query.

These languages may differ from one DBMS to another. The two most widely used standardized languages are QBE (Query By Example) and SQL (Structured Query Language). QBE mainly has the properties of a data manipulation language, while SQL combines the properties of both types of languages.

The DBMS implements the following basic low-level functions:
* data management in external memory;
* management of RAM buffers;
* transaction management;
* logging of changes in the database;
* ensuring the integrity and security of the database.
The data management function in external memory organizes the management of resources in the OS file system. Data buffering is needed because the amount of RAM is smaller than the amount of external memory. Buffers are areas of RAM designed to speed up exchange between external memory and RAM. They temporarily hold database fragments whose data is expected to be used when the DBMS is accessed or is to be written to the database after processing.

The transaction mechanism is used in a DBMS to maintain the integrity of the data in the database. A transaction is an indivisible sequence of operations on database data that the DBMS tracks from beginning to end. If for any reason (equipment faults and failures, software errors, including errors in the application) the transaction remains incomplete, it is rolled back. Transactions have three main properties:

· atomicity (either all operations in the transaction are performed or none are);

· serializability (transactions executed at the same time do not affect one another);

· durability (even a system crash does not lose the results of a committed transaction).

An example of a transaction is a transfer of money from one account to another in a banking system. First the money is withdrawn from one account, then it is credited to the other. If either action fails, the result of the operation is incorrect and the accounts no longer balance.

Change logging is performed by the DBMS to keep data storage reliable in the presence of hardware and software failures.

Ensuring the integrity of the database is a necessary condition for its successful operation, especially when it is used over a network.
The integrity of a database is the property that it contains complete and consistent information that adequately reflects the subject area. The integrity of the database state is described by integrity constraints, that is, conditions that the data stored in the database must satisfy. Security is achieved in a DBMS by data encryption, password protection, and support for access levels to the database and its individual elements (tables, forms, reports, etc.).
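The money-transfer transaction and the idea of an integrity constraint can be illustrated with a short sketch in Python, using the standard sqlite3 module. This is only an illustration under assumptions: the table name accounts, the helper names make_db, transfer and balances, and the rule "a balance may not be negative" are hypothetical and not taken from any particular system.

```python
import sqlite3

def make_db():
    # In-memory database with an integrity constraint (assumed rule):
    # a balance may never become negative.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return True if committed."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint was violated; the credit made above is undone too.
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

In this sketch the credit is deliberately executed before the debit, so a failed debit shows that the already-applied credit is rolled back together with it: either both updates take effect or neither does.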

When developing application programs, the following stages are distinguished: problem statement, mathematical description and choice of a method for solving a problem, algorithmization of a problem solution, drawing up a program and its adaptation.

Formulation of the problem involves a description of the problem being solved, a description of the input, output and reference information, as well as a description of the test case.

The characteristics of the selected task include: determination of the goal of solving the problem; establishing the composition and forms of presentation of input, intermediate and result information, establishing the frequency of solving the problem and the relationship of the problem being solved with other tasks, determining the forms and methods for monitoring the reliability of information.

The description of input operational information includes: the name of the input message, the source of information - a document or an array, the form of information presentation, the timing and frequency of information receipt.

Description of reference information includes classification of this type of information and the content of the directories used.

The description of the output information includes: the list of received output messages, the form of the message presentation (document or array), the timing and frequency of issuing messages, the purpose of the forms of the output information, the recipients of the output information.

The test case description includes: demonstration of the procedure for solving the problem in the traditional way, reflection of all forms of initial data, enumeration of all normal and abnormal situations that arise when solving the problem and a description of the user's actions in each case.

Mathematical description and choice of method for solving the problem. The mathematical notation of the problem statement provides a reflection of its essence, conciseness of the notation, unambiguous understanding. For problems that allow a mathematical description, a numerical method of solution is chosen, and for non-numerical problems, a schematic diagram of the solution is developed.

Algorithmization of the problem solution. An algorithm is a precise prescription that defines a computational process leading from varying initial data to the desired result. The same problem can usually be solved by several algorithms that differ in complexity, in the volume of computational and logical operations, in the composition of the initial and intermediate information, and in the accuracy of the results. The algorithm itself can be written in verbal form, graphically, using decision tables, etc.

Writing, debugging and testing of programs. The program is written (coded) using the operators of a programming language. In general, a programming language is a formalized language for describing an algorithm that solves a problem on a computer, or a fixed notation for describing algorithms and data structures.

Debugging a program involves a set of actions aimed at eliminating errors, and testing is intended either to demonstrate the absence of errors in the developed programs or to detect them.

The word "algorithm" arose from a distorted rendering, in translations from Arabic into European languages, of the name of the 9th-century Central Asian scholar Al-Khorezmi (al-Khwarizmi), who set out the rules for arithmetic operations on numbers in the positional decimal number system. These rules came to be called algorithms.

An algorithm is a system of precisely formulated rules that determine the process of transforming the available initial data (input information) into the desired result (output information) in a finite number of steps.

The algorithm has a number of mandatory properties (attributes):

Discreteness - provides for breaking down the information processing process into simpler stages (execution steps);

Certainty (or determinism) - characterizes the uniqueness of the implementation of each individual step of information transformation;

Effectiveness (or finiteness) - implies that the algorithm as a whole completes in a finite number of steps;

Mass applicability - characterizes the suitability of the algorithm for solving a whole class of problems.
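These properties can be seen in a classic example, Euclid's algorithm for the greatest common divisor. The sketch below is in Python; the function name gcd and the input checks are illustrative choices, and the comments point to the property each part is meant to show.

```python
def gcd(a: int, b: int) -> int:
    """Greatest common divisor of two non-negative integers (not both zero)."""
    if a < 0 or b < 0 or (a == 0 and b == 0):
        raise ValueError("expected non-negative integers, not both zero")
    # Discreteness: the work is split into simple, repeated steps.
    while b != 0:
        # Certainty: each step is fully determined by the current pair (a, b).
        a, b = b, a % b
    # Finiteness: b strictly decreases on every step, so the loop must stop.
    return a
```

The same function works for any admissible pair of numbers, not just one input, which is the mass-applicability property.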

LECTURE

DB classification. Factographic and documentary databases.

Database of operational and retrospective information.

Data warehouses. Local and distributed databases. Correlation of the main requirements and properties of a DBMS: a system of compromises

2.1. Classification of databases

Databases and data banks can be classified by various criteria, relating to different components and aspects of database operation, among them the following (slide 2).

By the form of information representation, one can distinguish factual, documentary and multimedia databases, which correspond to a greater or lesser degree to numeric, symbolic and other (non-numeric and non-symbolic) forms of representing information in a computing environment. The last group includes cartographic, video, audio, graphic and other databases.

By the type of stored (non-multimedia) information, one can distinguish factual, documentary and lexicographic databases. Lexicographic databases are classifiers, codifiers, dictionaries of word stems, thesauri, rubricators, etc., which are usually used as reference material together with documentary or factual databases. Documentary databases are subdivided by the level of information representation into full-text ones (so-called "primary" documents) and bibliographic-abstract ones ("secondary" documents, which reflect the primary document at the level of address and content).

By the type of data model used, there are three classical classes of databases: hierarchical, network and relational. The development of data processing technologies has led to the emergence of post-relational, object-oriented and multidimensional databases, which relate to the three classical models to a greater or lesser degree.

By data storage topology, local and distributed databases are distinguished.

By the type of access and the nature of use of the stored information, databases can be divided into specialized and integrated.

By functional purpose (the nature of the tasks solved with the database and, accordingly, the nature of data use), one can distinguish operational and reference-information databases. The latter include retrospective databases (electronic library catalogs, statistical databases, etc.), which are used for information support of the main activity and do not involve changes to existing records, for example as a result of that activity. Operational databases are designed to manage various technological processes. In this case the data is not only retrieved from the database but also changed (including added), among other things as a result of that use.

By scope of possible application one can distinguish between universal and specialized (or problem-oriented) systems.

By accessibility, one can distinguish publicly accessible databases and databases with restricted user access. In the latter case one speaks of controlled access, which determines individually not only the set of available data but also the operations available to the user.

It should be noted that the presented classification is not complete and exhaustive. It largely reflects the historical state of affairs in the field of activity related to the development and use of databases.

Typology of databases in terms of information processes

Databases can be related to different levels of information processes: the information technology (IT) level, the information system (IS) level, and the information resources (IR) level (slide 3).

At the information technology level, a database is defined as an interconnected set of OS files containing data about the subject area of the problem being solved. Here the main focus is on the physical structure of the database.

At the information system level, the database is regarded as a component that serves as an information model of the subject area. Here the most important issue is the logical structure of the database.

At the information resources level, the database is treated as an element of the world's information resources. The main feature here is the content of the database, although data structures also matter.

2.2. Factual and documentary databases

The main difference between factual and documentary databases is the structure of the information storage unit.

By an information storage unit we mean a set of data that is a single whole from the point of view of the information system. The storage unit determines the integrity and consistency properties of the data.

From the point of view of the structure of the storage unit, it is customary to distinguish between well-structured data and weakly structured data.

Well-structured data is data in which each storage unit can be represented as a finite set of attributes, each of which takes a precisely defined value.

Weakly structured data is data in which each storage unit is also represented by a finite number of attributes, but the value of an attribute is not precisely defined, depends on the context of use, and can itself have a complex structure.

Factual databases are databases oriented toward storing well-structured data. The storage unit in such databases is the description of a "fact" by a finite, well-defined set of characteristic properties.

When a conceptual model of such a database is built, the subject area is naturally decomposed into objects and the relationships between them. Each characteristic property of an object has an atomic value that does not depend on the context of use.

Documentary databases are designed to store weakly structured data. Here the storage unit is a document specified by a finite (but not fixed) set of fields, in general of arbitrary length.

When documentary databases are built, the subject area is usually represented as a collection of objects that, in general, do not interact. The set of characteristic properties of an object is finite but not fixed. The value of a characteristic property can be multiple and may depend on the context of use (slide 4).

From the point of view of search methods and algorithms, factual databases are considered as information support for data retrieval, and documentary databases as information support for information retrieval.

The differences between these two types of search are presented on the slide (slide 5).

In data retrieval, one usually looks for an exact match between the query and a data element. Results are inferred by simple deduction, for example: if A and B, then C. Information retrieval is much closer to induction: relationships are described only by a degree of certainty or uncertainty. In information retrieval the search strategy is, as a rule, built on progressively narrowing the initial search results, which follows the logic "from general to particular". Hence the data retrieval model is described deterministically and the information retrieval model probabilistically.

In information retrieval, the presence of an attribute is not always necessary and sufficient for a record to be included in the result set. This means that each record (document) relates to some part of the user's information need. This property of a document matching the need is called relevance. A distinction is made between formal and true relevance. The former usually has a numerical expression and is computed by the search engine; the latter is the user's own assessment of how well the document meets the real need generated by the problem situation in the user's main activity.

In data retrieval, all data found to match the query is returned to the user. In information retrieval, almost every document in the database may be considered relevant to the query to some degree, so the documents are ordered, for example by formal relevance, and only the first few are returned.

The query language for data retrieval is usually artificial, with a strict syntax and a limited vocabulary. For information retrieval, natural language is preferable, though with some exceptions, and at present "natural language" is reduced to a list of keywords. In data retrieval, a query is usually a complete specification of what needs to be found and in what form to show it; in information retrieval the query is incomplete, and many actions are performed by the information retrieval system by default.
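The contrast between the two kinds of search can be sketched in a few lines of Python. Everything here is a toy assumption made for illustration: the sample records, the function names, and in particular the scoring rule (the share of query words found in the title), which merely stands in for whatever formal relevance measure a real search engine computes.

```python
RECORDS = [
    {"id": 1, "title": "relational database design"},
    {"id": 2, "title": "network database model"},
    {"id": 3, "title": "design of distributed database systems"},
]

def data_retrieval(records, field, value):
    """Deterministic search: ids of records whose field equals value exactly."""
    return [r["id"] for r in records if r[field] == value]

def information_retrieval(records, query, top_k=2):
    """Rank records by a toy formal relevance and return the top_k (score, id) pairs."""
    words = query.lower().split()
    if not words:
        return []
    scored = []
    for r in records:
        title_words = set(r["title"].lower().split())
        # Toy relevance: share of query words that occur in the title.
        score = sum(w in title_words for w in words) / len(words)
        if score > 0:
            scored.append((score, r["id"]))
    # Highest relevance first; ties are broken by id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top_k]
```

The first function returns every record that matches and nothing else; the second orders partially matching records and cuts the list off, which mirrors the behaviour described above.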

2.3. Databases of operational and retrospective information. Data warehouses

From the point of view of the main features of the subject area and the tasks to be solved, two main classes of databases can be distinguished: databases of operational information and databases of retrospective information.

Databases of operational information are the basis of so-called OLTP applications (On-Line Transaction Processing). Typical examples of OLTP applications are warehouse accounting systems, ticketing systems, banking systems that perform money transfers, etc. The main function of such systems is to execute a large number of short transactions simultaneously. A transaction is a complete block of data manipulation operations, for example: "withdraw a certain amount of money from account A and add this amount to account B", or "sell a passenger a ticket for a given train, for a given seat, on a given date". Completeness of a transaction means that if an error occurs, the transaction must be rolled back entirely and the database returned to the state it was in before the transaction started (there must never be a situation where the money is withdrawn from account A but not credited to account B).

Key features of OLTP applications:

1. A large number of transactions are executed simultaneously per unit of time (several thousand users may be connected to the system and working at the same time).

2. Almost all database queries that must be executed in real time consist of insert, update and delete commands.

3. Select queries are mainly intended to let users choose from various reference lists, and most of these queries are known in advance, at the design stage.

Thus, what is critical for OLTP applications is the speed and reliability of short data update operations.

Databases of retrospective information are part of documentary information systems oriented toward information retrieval tasks, as well as of OLAP applications (On-Line Analytical Processing). OLAP is a general term that characterizes the principles of building decision support systems (DSS), as well as data warehouses and data mining systems. Such systems are designed to establish dependencies between data (for example, one can try to determine how the sales volume of goods is related to the characteristics of potential buyers) or to carry out "what if ..." analysis.

Databases of retrospective information have the following features:

1. New data is added to the database relatively rarely, in large blocks.

2. Data is usually never deleted from the database.

3. Queries are ad hoc and usually quite complex. Very often an analyst formulates a new query to refine the result obtained from the previous one.

4. The speed of query execution is important, but not critical.

For OLAP applications it is typical that data undergoes various "cleaning" procedures before loading, because one database can receive data from many sources that use different formats for the same data, and the data may be incorrect, erroneous, etc.
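A cleaning step of this kind might look roughly like the following Python sketch. The input layout (pairs of date text and amount text), the three date formats, the decimal-comma handling and the rule that negative amounts are rejected are all assumptions made for the example, not a description of any particular loading tool.

```python
from datetime import datetime

# Hypothetical date formats used by different source systems.
DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%m/%d/%Y")

def normalize_date(text):
    """Return an ISO date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

def clean(rows):
    """Split raw (date, amount) text pairs into loadable records and rejects."""
    loaded, rejected = [], []
    for date_text, amount_text in rows:
        date = normalize_date(date_text)
        try:
            # Some sources are assumed to use a decimal comma.
            amount = float(amount_text.replace(",", "."))
        except ValueError:
            amount = None
        if date is None or amount is None or amount < 0:
            rejected.append((date_text, amount_text))
        else:
            loaded.append({"date": date, "amount": amount})
    return loaded, rejected
```

Rows that cannot be brought to the common format are set aside rather than silently dropped, so they can be inspected before the next load.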

Data warehouses

The huge amount of information accumulated in operational databases makes it possible, for example, to set the task of using it in decision support systems. However, since online processing systems are most often designed without any regard for such requirements, converting ordinary OLTP systems into a decision support system usually proves to be an extremely difficult task. A typical organization has many different transaction processing systems with overlapping and sometimes even conflicting definitions, for example with different types chosen to represent the same data. The main task is to turn the accumulated data archives into a source of new knowledge, in such a way that the user is given a single integrated, consolidated view of the organization's data. The data warehouse concept was conceived as a technology capable of meeting the requirements of decision support systems and based on information coming from several different sources of operational data.

The data warehouse concept was originally proposed as a solution that provides access to data accumulated in non-relational systems. It was assumed that such a warehouse would enable organizations to use their data archives to solve business problems effectively. However, because the systems built in the early stages were extremely complex and performed poorly, the first attempts to create information warehouses were on the whole unsuccessful. The concept has been revisited again and again since then, but only in recent years has data warehousing technology come to be seen as a valuable and viable solution.

A data warehouse is a subject-oriented, integrated, time-bound and immutable collection of data designed to support decision making.

In this definition, the listed characteristics of the data are understood as follows (slide 6).

Subject orientation. The data warehouse is organized around the main objects (or subjects) of the organization (for example, customers, goods and sales) rather than around application areas (customer invoicing, inventory control and sales of goods). This property reflects the need to store data intended for decision support rather than ordinary operational application data.

Integration. Operational application data usually comes from different sources, which often represent the same data inconsistently, for example in different formats. To give the user a single consolidated view of the data, an integrated source must be created that ensures the consistency of the stored information.

Time binding. The data in the warehouse is accurate and valid only when it is tied to a certain moment or interval of time: the stored information is in effect a set of snapshots of the state of the data.

Immutability. The data is not updated in online mode but is only replenished at regular intervals with information from the operational processing systems. New data never replaces the previous data but only supplements it. Thus the warehouse database continually absorbs new data, which is integrated step by step with the information already accumulated.
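Time binding and immutability together amount to an append-only store of dated snapshots. A minimal sketch in Python with the standard sqlite3 module follows; the table stock_snapshot, its columns and the helper names are hypothetical, chosen only to make the idea concrete.

```python
import sqlite3

def open_warehouse():
    conn = sqlite3.connect(":memory:")
    # Each row is a snapshot of one product's stock as of one load date.
    conn.execute(
        "CREATE TABLE stock_snapshot ("
        " product TEXT NOT NULL,"
        " loaded_on TEXT NOT NULL,"   # ISO date of the periodic load
        " quantity INTEGER NOT NULL,"
        " PRIMARY KEY (product, loaded_on))"
    )
    return conn

def load_batch(conn, loaded_on, quantities):
    """Append one periodic batch; existing snapshots are never updated."""
    with conn:
        conn.executemany(
            "INSERT INTO stock_snapshot VALUES (?, ?, ?)",
            [(p, loaded_on, q) for p, q in quantities.items()],
        )

def history(conn, product):
    """Return the time series of snapshots for one product, oldest first."""
    return conn.execute(
        "SELECT loaded_on, quantity FROM stock_snapshot"
        " WHERE product = ? ORDER BY loaded_on",
        (product,),
    ).fetchall()
```

Because the load date is part of the primary key, a later load adds a new row instead of overwriting the earlier value, and an attempt to reload the same date is rejected by the database.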

The ultimate goal of creating a data warehouse is to integrate corporate data in a single repository that users can query, build reports from, and analyze. In summary, data warehousing is a technology for data management and analysis.

Comparison of OLTP systems and data warehouses

A DBMS designed to support online transaction processing (OLTP) is generally considered unsuitable for data warehousing because the two types of systems have very different requirements. For example, OLTP systems are designed for the most intensive possible processing of a fixed set of transactions, whereas data warehouses are designed primarily to process one-off ad hoc queries. The slide (slide 7) lists the main characteristics of typical OLTP systems and data warehouses for comparison.

Problems of developing and maintaining data warehouses

The potential problems associated with developing and maintaining data warehouses are listed below (slide 8).

· Underestimating the resources needed to load data: many developers tend to underestimate the time it takes to extract, clean and load data into the warehouse.

· Hidden problems of data sources: problems with the data sources that supply information to the warehouse may not be discovered until several years after they go into operation.

· Lack of required data in the available archives: data warehouses often need information that was not captured by the operational systems that serve as data sources. In this case the organization must decide whether to modify the existing OLTP systems or to create a new system to collect the missing data.

· Increasing end user requirements.

· Data unification: building a large-scale data warehouse can be a major data unification effort, but unification can reduce the value of the collected information.

· High resource requirements: a huge amount of disk space may be required.

· Data ownership: creating a data warehouse may require changing the status of end users with respect to data ownership.

· Complex maintenance: any reorganization of business processes or data sources may affect the operation of the data warehouse.

· Long-term nature of projects.

· Difficulties of integration.

Local and distributed databases

In general, the modes of operation with the database can be classified according to the following criteria:

- multitasking - single-user or multi-user;

- request service rule - serial or parallel;

- data placement scheme - centralized or distributed database.

It should be noted that the general trend in the development of data processing technologies is consistent with the stages in the development of computer technology and information technology, and primarily network technologies. In this sense, two classes should be distinguished: distributed data processing systems and distributed database systems.

Distributed data processing systems basically reflect the structure and properties of multi-user operating systems with a database hosted on a large central computer (mainframe). Until recently, this was the only possible computing environment for implementing large databases. Client places in this case were implemented either in the form of terminals or mini-computers, which mainly provide data input / output and do not have their own computing resources for function-oriented processing of the received data.

The development of network technologies, combined with the widespread use of personal computers and the introduction of open systems standards, has led to the emergence of database systems hosted on a network of different types of computers. Such distributed database systems support distributed queries, in which the processing of a single query uses database resources located on different computers in the network. A distributed database system consists of nodes, each of which is a DBMS, and the nodes interact so that the database of any node is available to the user as if it were local. The distributed database architecture is shown on the slide (slide 9).

Correlation of the main requirements and properties of a DBMS: a system of compromises (slide 10)

In general, we can say that the main tasks of data processing, solved on the basis of database concepts, are reduced to the following questions:

1). How to represent complex non-linear data structures in the form of linear ones that are most consistent with the principle of sequential representation (storage) in machine memory.

2). How to organize data so that data can be entered, deleted, and edited efficiently.

3). How to organize the data so that the use of memory space (data density) is quite rational, and the speed of access to data records is high.

4). How to organize the data so that the search is efficient and allows you to search for records by multiple keys.
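The last question, searching by several keys, is commonly answered with one auxiliary index per key. The Python sketch below illustrates the idea on in-memory records; the function names build_indexes and lookup and the record layout are assumptions made for the example.

```python
from collections import defaultdict

def build_indexes(records, keys):
    """Map each key name to an index: value -> list of record positions."""
    indexes = {k: defaultdict(list) for k in keys}
    for pos, rec in enumerate(records):
        for k in keys:
            indexes[k][rec[k]].append(pos)
    return indexes

def lookup(records, indexes, **criteria):
    """Return the records that satisfy all criteria, using one index per criterion."""
    positions = None
    for k, v in criteria.items():
        found = set(indexes[k].get(v, []))
        # Intersect the position sets so that every criterion must hold.
        positions = found if positions is None else positions & found
    return [records[p] for p in sorted(positions or [])]
```

Each index speeds up lookups on its key but stores extra information about every record, which is exactly the kind of redundancy-versus-flexibility trade-off that database design has to balance.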

Creating a database is essentially an attempt to find a compromise in several directions at once, between combinations of factors that pull in opposite directions (in their effect on overall system efficiency), including the following (slide 11):

1) Efficiency - simplicity;

2) Retrieval speed - cost (complexity) of hardware;

3) Retrieval speed - complexity of access procedures;

4) Data density - access time and complexity of procedures;

5) Data independence - performance;

6) Flexibility of search tools - data redundancy;

7) Search flexibility - search speed;

8) Complexity of access procedures - ease of maintenance.

1. The concept of a database

A database (DB) is a collection of data arrays and files organized according to certain rules that provide standard principles for describing, storing and processing data regardless of its type. Alternatively, a database (DB) is a set of organized information relating to a specific subject area, intended for long-term storage in the external memory of a computer and for constant use.

Classification by data models

    Hierarchical

    Network

    Relational

    Object and object-oriented

    Object-relational

    Functional

Classification by persistent storage medium

    In secondary memory, or conventional (conventional database): the persistent storage medium is peripheral non-volatile memory (secondary storage), typically a hard disk drive. In RAM the DBMS keeps only the cache and the data currently being processed.

    In RAM (in-memory database, memory-resident database, main memory database): all data resides in RAM while the database is running.

    In tertiary memory (tertiary database): the persistent storage medium is a mass storage device detached from the server (tertiary storage), usually based on magnetic tapes or optical discs. The server's secondary memory holds only the catalog of the tertiary storage data, the file cache and the data currently being processed; loading the data itself requires a special procedure.

Content classification

    Geographic

    Historical

    Multimedia

Classification according to the degree of distribution

    Centralized, or concentrated (centralized database): a database that is fully maintained on a single computer.

    Distributed (distributed database): a database whose components are located at different nodes of a computer network according to some criterion.

      Heterogeneous (heterogeneous distributed database): the fragments of the distributed database at different network nodes are maintained by more than one DBMS.

      Homogeneous (homogeneous distributed database): the fragments of the distributed database at different network nodes are maintained by the same DBMS.

      Fragmented, or partitioned (partitioned database): the data is distributed by fragmentation (partitioning), vertical or horizontal.

      Replicated (replicated database): the data is distributed by replication.
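The difference between horizontal and vertical fragmentation can be shown on a small in-memory table. The Python sketch below is illustrative only: the sample rows, the column names and the helper functions are assumptions, and a real distributed DBMS would place the fragments on different network nodes rather than in lists.

```python
ROWS = [
    {"id": 1, "name": "Ann", "city": "Oslo", "salary": 500},
    {"id": 2, "name": "Bob", "city": "Rome", "salary": 400},
    {"id": 3, "name": "Eve", "city": "Oslo", "salary": 450},
]

def horizontal_fragments(rows, key):
    """Split rows into fragments by the value of `key`; each fragment keeps all columns."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[key], []).append(row)
    return fragments

def vertical_fragments(rows, column_groups, pk="id"):
    """Split columns into groups; every fragment repeats the primary key."""
    return [
        [{c: row[c] for c in (pk, *group)} for row in rows]
        for group in column_groups
    ]

def rejoin(fragments, pk="id"):
    """Reassemble vertically fragmented rows by joining on the primary key."""
    merged = {}
    for fragment in fragments:
        for part in fragment:
            merged.setdefault(part[pk], {}).update(part)
    return [merged[k] for k in sorted(merged)]
```

Horizontal fragments are subsets of rows, while vertical fragments are subsets of columns that must all carry the key so that the original rows can be reconstructed.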

Other types of database

    Spatial (spatial database): a database that supports the spatial properties of entities in the subject area. Such databases are widely used in geographic information systems.

    Temporal (temporal database): a database that supports some aspect of time, not counting user-defined time.

    Spatio-temporal (spatial-temporal database): a database that simultaneously supports one or more dimensions of both space and time.

    Cyclic (round-robin database): a database whose volume of stored data does not change over time, because the same records are reused cyclically as data is saved.
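The cyclic reuse of records can be sketched in Python with a bounded deque from the standard library. The class name RoundRobinStore and its methods are hypothetical; the sketch only shows the fixed-capacity behaviour, not the on-disk layout of a real round-robin database.

```python
from collections import deque

class RoundRobinStore:
    """Fixed-capacity store: once full, each new sample displaces the oldest one."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops items from the opposite end when full.
        self._slots = deque(maxlen=capacity)

    def add(self, timestamp, value):
        self._slots.append((timestamp, value))

    def samples(self):
        """Return the retained samples, oldest first."""
        return list(self._slots)
```

After more samples than the capacity have been added, only the most recent ones remain, so the amount of stored data stays constant.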

The DBMS has software, technical and organizational components.

The software includes a control system that provides input and output, processing and storage of information, and the creation, modification and testing of the database. The internal programming languages of a DBMS are usually classed as fourth-generation languages; general-purpose languages such as C, C++, Pascal and Object Pascal are also used for development. Database languages are used to create applications, databases and user interfaces, including screen forms, menus and reports.

2. Database management system (DBMS) - a set of software and linguistic tools for general or special purposes that manage the creation and use of databases.

DBMS classifications

By data model

    Hierarchical

    Relational

    Object Oriented

    Object-relational

According to the degree of distribution

    Local DBMS (all parts of the local DBMS are hosted on the same computer)

    Distributed DBMS (parts of the DBMS can be hosted on two or more computers).

By way of accessing the database

    File Server

In file-server DBMS, data files are located centrally on the file server. The DBMS is located on each client computer (workstation). The DBMS accesses the data through the local network. Synchronization of reads and updates is carried out by means of file locks. The advantage of this architecture is the low CPU load of the file server. Disadvantages: potentially high local network load; difficulty or impossibility of centralized control; the difficulty or inability to provide such important characteristics as high reliability, high availability and high security. They are used most often in local applications that use database management functions; in systems with low data processing intensity and low peak loads on the database.

At the moment, file-server technology is considered obsolete.

    Client-server

The client-server DBMS is located on the server together with the database and accesses the database directly, in exclusive mode. All client requests for data processing are processed centrally by the client-server DBMS. The disadvantage of client-server DBMS is the increased requirements for the server. Advantages: Potentially lower local network load; convenience of centralized management; the convenience of providing important features such as high reliability, high availability and high security.

    Embedded

Embedded DBMS - a DBMS that can be delivered as a component of a software product without requiring its own installation procedure. An embedded DBMS is designed to store the data of its application locally and is not intended to be shared over a network. Physically, an embedded DBMS is most often implemented as a linked library. The application can access the data through SQL or through special programming interfaces.
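As an illustration of the embedded approach, the sketch below uses SQLite through Python's standard sqlite3 module, where the DBMS runs as a library inside the application process. The table name settings and the function demo_embedded are assumptions made for this example.

```python
import sqlite3

def demo_embedded(path=":memory:"):
    """Store and read back one application setting using an embedded DBMS."""
    # The DBMS runs inside this process as a library; there is no server to install or start.
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE settings (name TEXT PRIMARY KEY, value TEXT)")
    conn.execute("INSERT INTO settings VALUES (?, ?)", ("language", "en"))
    conn.commit()
    row = conn.execute(
        "SELECT value FROM settings WHERE name = ?", ("language",)
    ).fetchone()
    conn.close()
    return row[0]
```

Passing a file name instead of ":memory:" would keep the data in a local file next to the application, which matches the local, non-shared use described above.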

The choice of a database management system (DBMS) is a complex multi-parameter task and is one of the important steps in the development of database applications. The selected software product must meet both the current and future needs of the enterprise, while taking into account the financial costs of acquiring the necessary equipment, the system itself, developing the necessary software based on it, as well as training personnel. In addition, you need to make sure that the new DBMS can bring real benefits to the enterprise.

The simplest approach when choosing a DBMS is based on an assessment of the extent to which existing systems satisfy the basic requirements of the information system project being created. A more complex and expensive option is to create a test project based on several DBMS and then select the most suitable of the candidates. But even in this case, it is necessary to limit the range of possible systems based on certain selection criteria. Generally speaking, the list of DBMS requirements used in the analysis of a particular information system may vary depending on the goals set. Nevertheless, several groups of criteria can be distinguished:

    Data Modeling

    Architectural features and functionality

    System operation control

    Application development features

    Performance

    Reliability

    Work environment requirements

    Mixed criteria

3. Database architecture

Information about a particular subject area is represented in the database by models of several levels. According to the number of levels in the architecture, one-level, two-level, three-level systems are distinguished. Different levels of DBMS architecture support different levels of data abstraction. At present, the most common is the three-level database organization system proposed by the American Committee for Standardization ANSI (American National Standards Institute). When designing databases, there are three levels: conceptual, internal and external.

1. The level of external models - the highest level, where each model has its own "vision" of the data. This level determines the point of view on the database of individual applications. Each application sees and processes only the data that is necessary for this particular application. For example, the work distribution system uses information about the qualifications of an employee, but it is not interested in information about the salary, home address and telephone number of the employee, and vice versa, it is this information that is used in the HR subsystem.

2. The conceptual level is the central control link. Here the database is presented in its most general form, which combines the data used by all applications that work with the given database. In effect, the conceptual level reflects the generalized logical model of the subject area for which the database was created. Like any model, the conceptual model reflects only those features of the objects of the subject area that are essential from the point of view of processing. The conceptual model is a logical-level model and does not depend on the features of the DBMS used. Singling out the conceptual level made it possible to develop an apparatus for centralized database management.

3. Physical level - the actual data located in files or in page structures located on external storage media. The physical representation of the database refers to the internal level. It describes ways to organize data on external storage media (in the form of file or page structures) and is designed to achieve optimal performance and efficiency in the use of computing system resources. The description of the physical structure of the database is called the storage schema, and the corresponding phase of the database design is called the physical design.

Database design consists of two main phases: logical and physical modeling. During the logical modeling phase, the developer collects the requirements for the database being developed, compiles a description of the subject area, and develops a model that does not depend on a particular DBMS. During the physical modeling phase, the developer creates a model optimized for the DBMS and specific user applications. Currently, the internal level is almost completely provided by the DBMS. The main emphasis in the design of the database is transferred to the creation of a model of the conceptual level. This architecture allows for logical (between levels 1 and 2) and physical (between levels 2 and 3) independence when working with data.

Logical independence implies the ability to change one application without adjusting other applications that work with the same database and without reorganizing the mechanism for accessing physical data.

Physical independence implies the possibility of transferring stored information from one media to another while maintaining the performance of all applications working with the database.
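The relationship between the external and conceptual levels can be sketched with SQL views. This is a minimal sketch, assuming SQLite via Python's sqlite3 module; the employee table, the two view names and the helper columns are hypothetical and follow the work distribution and HR example given above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: one table that aggregates the data used by all applications.
conn.execute("""CREATE TABLE employee (
    id INTEGER PRIMARY KEY, name TEXT, qualification TEXT,
    salary REAL, home_address TEXT, phone TEXT)""")
conn.execute(
    "INSERT INTO employee VALUES (1, 'Ann', 'welder', 1500.0, '1 Main St', '555-0100')"
)

# External level: each application sees only the columns it needs.
conn.execute("CREATE VIEW work_distribution AS SELECT id, name, qualification FROM employee")
conn.execute("CREATE VIEW hr AS SELECT id, name, salary, home_address, phone FROM employee")

def columns(relation):
    """Return the column names visible through a table or view (illustrative helper)."""
    cur = conn.execute(f"SELECT * FROM {relation}")
    return [d[0] for d in cur.description]
```

If a column is later added to employee for some new application, the existing views keep exposing the same columns, so the applications built on them do not have to change. That is one simple reading of logical independence.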

In the simplest case, a client-server information system consists of two main components:

1. A database server, which manages data storage, access, protection and backup, monitors data integrity in accordance with business rules and, most importantly, fulfills client requests;

2. A client, which provides the user interface, executes application logic, validates data, sends requests to the server and receives its responses.

In addition, we should not forget about the network and communication software that interacts between the client and the server through network protocols.

The client is the user's application. It is also called the client application.

The client and server interact as follows:

1. The client generates queries for reading or changing data and sends them to the server hosting the database. These queries are written in SQL.

2. The remote network server passes the request to the database server program (SQL server).

Benefits of client-server architecture.

    A relational access method is used to work with data. This reduces the load on the network, since only the necessary information travels across it.

    For example, if five records must be selected from a table containing a million, the client application sends a query to the server; the server compiles, optimizes and executes it, and then transmits only the result (those five records, not the entire table) back to the workstation. As a first approximation, the client does not even need to know whether an index exists that could speed up the search: if one exists, the server will use it; if not, the query will still be executed, although most likely more slowly.

    The application does not manage the database directly; only the server does. This increases the degree of information security.

    Client applications become simpler, because they contain no code for controlling the database or access to it.
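The "five records out of many" example can be tried out with a short sketch. SQLite is an embedded engine, not a client-server DBMS, so it is used here only as a stand-in to show two points from the text: the query returns just the matching rows, and the engine decides by itself whether an index can be used. The orders table, the index name and the reduced size of 100,000 rows (chosen to keep the example fast) are assumptions.

```python
import sqlite3

def build(n_rows=100_000):
    """Create an in-memory table in which every customer_id occurs exactly five times."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
        ((i % (n_rows // 5), float(i)) for i in range(n_rows)),
    )
    return conn

QUERY = "SELECT id, amount FROM orders WHERE customer_id = ?"

def plan(conn):
    # EXPLAIN QUERY PLAN reports whether the engine scans the table or searches an index.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, (42,)))

conn = build()
rows_without_index = conn.execute(QUERY, (42,)).fetchall()
plan_before = plan(conn)

conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
rows_with_index = conn.execute(QUERY, (42,)).fetchall()
plan_after = plan(conn)
```

The query text is identical in both runs; only the execution plan changes once the index exists, which is why the client code does not need to care about it.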

Database life cycle.

The process of designing, implementing, and maintaining a database system is called the database life cycle (DBLC). The procedure for creating a system is called the system life cycle (SLC).

The DBLC consists of the following steps:

1. Preliminary planning - database planning, performed in the process of developing a strategic database plan. The following information is collected during the planning process:

    what applications are used and what functions they perform;

    what files are associated with each of these applications;

    what new applications and files are in the pipeline.

This information helps to determine how application information is used, to determine future requirements for the database system.

The information of this stage is documented in the form of a generalized data model.

2. Feasibility check. Here the technological, operational and economic feasibility of the database creation plan is determined, i.e.:

    Technological feasibility: is there a technology to implement the planned database?

    Operational feasibility: are the tools and experts needed to implement the database plan successfully available?

    Economic feasibility: can the benefits be determined? Will the planned system pay for itself? Can costs and benefits be estimated?

3. Definition of requirements includes the choice of database objectives, the clarification of information requirements for the system and the requirements for hardware and software. Thus, at this stage of data collection and requirements definition, a general information model is created, expressed in the following tasks:

    The goals of the system are determined by analyzing information needs. It must also be stated what kind of database should be created (distributed or integrated) and what communication tools are needed. The output document is a memo that describes the goals of the system.

    Definition of user requirements: documentation in the form of generalized information (comments, reports, surveys, questionnaires, etc.); fixing the functions of the system and determining the application systems that will fulfill these requirements. The data are presented in the form of relevant documents.

    Determination of general requirements for hardware and software related to maintaining the desired level of performance (finding out the number of users of the system, the number of input messages per day, the number of printouts). This information is used to select the types of computers and DBMS, the disk capacity and the number of printers. The data of this stage are set out in a report containing approximate hardware and software configurations.

    Development of a plan for the phased creation of the system, including the selection of initial applications.

4. Conceptual design - creation of a conceptual database schema. Specifications are developed to the extent necessary to move to implementation.

The main output document is a single infological model (or database schema at the conceptual level). When developing this model, the information and functions that the system must perform, determined at the stage of collecting and determining the requirements for the system, are used. At this stage, it is also desirable to define: 1) rules for the data; 2) rules for processes; 3) rules for the interface.

5. Implementation - the process of turning a conceptual model into a functional database. It includes the following steps.

1) Selection and acquisition of the necessary DBMS.

2) Transformation of the conceptual (infological) database model into a logical and physical data model:

    on the basis of the infological data model, build a data schema for the specific DBMS; if necessary, denormalize the database in order to speed up query processing in all time-critical applications;

    determine which application processes need to be implemented in the data schema as stored procedures;

    implement constraints designed to ensure data integrity and enforce data rules;

    design and generate triggers to implement all centrally defined data rules and data integrity rules that cannot be specified as constraints;

    develop an indexing and clustering strategy; perform sizing of all tables, clusters, and indexes;

    define user access levels, develop and implement security and auditing rules; create roles and aliases to provide multi-user access with consistent levels of access permissions;

    develop a database network topology and a mechanism for seamless access to remote data (replicated or distributed database).

3) Building a data dictionary, which defines the storage of definitions of the database data structure. The data dictionary also contains information about access rights, data protection rules, and data control.

4) Filling the database.

5) Creation of application programs, management control.

6) User training.

6. Evaluation and improvement of the database schema. Includes a user survey to identify unmet functional needs. Changes are made as needed, adding new programs and data items as needs change and expand.

Thus, the DBLC includes:

    Study of the subject area and submission of relevant documentation (1-3).

    Building an infological model (4).

    Implementation (5).

    Evaluation of work and support of the database (6).
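Several of the implementation tasks listed under step 5 (constraints, triggers for rules that constraints cannot express, and an indexing strategy) can be sketched together. This is a minimal sketch, assuming SQLite via Python's sqlite3 module; the account and audit_log tables, the trigger, the index and the helper try_overdraw are all hypothetical names chosen for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (
    id      INTEGER PRIMARY KEY,
    owner   TEXT NOT NULL,
    balance REAL NOT NULL CHECK (balance >= 0)   -- data rule expressed as a constraint
);
CREATE TABLE audit_log (account_id INTEGER, old_balance REAL, new_balance REAL);

-- A rule that a plain constraint cannot express: record every balance change.
CREATE TRIGGER log_balance_change AFTER UPDATE OF balance ON account
BEGIN
    INSERT INTO audit_log VALUES (OLD.id, OLD.balance, NEW.balance);
END;

CREATE INDEX idx_account_owner ON account (owner);  -- part of the indexing strategy
""")
conn.execute("INSERT INTO account (owner, balance) VALUES ('Ann', 100)")
conn.execute("UPDATE account SET balance = 70 WHERE owner = 'Ann'")

def try_overdraw():
    """Return True if the CHECK constraint rejects a negative balance."""
    try:
        conn.execute("UPDATE account SET balance = -1 WHERE owner = 'Ann'")
    except sqlite3.IntegrityError:
        return True
    return False
```

Because the rules live in the schema rather than in each application, every program that touches the account table is subject to them, which is the point of defining data rules centrally.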

Database design steps

When developing a database, the following stages of work can be distinguished.

I stage. Formulation of the problem.

At this stage, the task of creating a database is formulated. It describes in detail the composition of the database and the purpose and goals of its creation, and lists the types of work to be carried out in this database (selection, addition, data modification, printing or output of a report, etc.).

II stage. Object analysis.

At this stage, it is considered what objects the database can consist of, what are the properties of these objects. After splitting the database into separate objects it is necessary to consider the properties of each of these objects, or, in other words, to establish what parameters describe each object. All this information can be arranged in the form of separate records and tables. Next, you need to consider the data type of each individual unit of record. Information about data types should also be entered in the table being compiled.

III stage. Model synthesis.

At this stage, according to the above analysis, it is necessary to choose a specific database model. Further, the advantages and disadvantages of each model are considered and compared with the requirements and tasks of the created database. After such an analysis, the model that can maximize the implementation of the task is selected. After choosing a model, it is necessary to draw its diagram indicating the relationships between tables or nodes.

IV stage. The choice of ways to present information and software tools.

After creating the model, it is necessary, depending on the selected software product, to determine the form of information presentation.

In most DBMS, data can be entered in two ways:

    using forms;

    without using forms.

A form is a user-created graphical interface for entering data into a database.

V stage. Synthesis of a computer model of the object.

In the process of creating a computer model, some stages typical for any DBMS can be distinguished.

Stage 1. Launching the DBMS, creating a new database file or opening a previously created database.

Stage 2. Creation of the initial table or tables.

When creating a source table, you must specify the name and type of each field. Field names must not be repeated within the same table. In the process of working with the database, you can supplement the table with new fields. The created table must be saved under a name that is unique within the database being created. The following rules should be observed when designing tables.

1. Information in a table should not be duplicated, and there should be no repetition between tables. When a piece of information is stored in only one table, it has to be changed in only one place. This makes the work more efficient and also eliminates the possibility of mismatched information in different tables. For example, customer addresses and phone numbers should be stored in a single table.

2. Each table should contain information on only one topic. Information on each topic is processed much more easily if it is kept in independent tables. For example, it is better to store customer addresses and customer orders in separate tables, so that when an order is deleted the customer information remains in the database.

3. Each table must contain the required fields. Each field in a table should contain a separate piece of information about the topic of the table. For example, a customer data table might contain fields with company name, address, city, country, and phone number. When designing the fields for each table, remember that each field must be associated with the topic of the table. It is not recommended to include data in the table that is the result of an expression. The table should contain all the necessary information. Information should be split into the smallest logical units (for example, the fields "First name" and "Last name" rather than a single field "Full name").

4. Each table must have a primary key. This is necessary so that the DBMS can link data from different tables, for example, data about a client and his orders.
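The four rules above can be illustrated with a small two-table schema. This is a sketch under stated assumptions: it uses SQLite through Python's sqlite3 module, and the customer and customer_order tables, their columns and the helper function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE customer (                      -- one topic: customers
    customer_id INTEGER PRIMARY KEY,         -- rule 4: primary key
    first_name  TEXT NOT NULL,               -- rule 3: smallest logical units
    last_name   TEXT NOT NULL,
    address     TEXT,
    phone       TEXT
);
CREATE TABLE customer_order (                -- another topic: orders
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer (customer_id),
    item        TEXT NOT NULL
);
INSERT INTO customer VALUES (1, 'Ann', 'Lee', '1 Main St', '555-0100');
INSERT INTO customer_order VALUES (10, 1, 'desk');
""")

def order_for_unknown_customer_rejected():
    """Return True if an order that points at a missing customer is refused."""
    try:
        conn.execute("INSERT INTO customer_order VALUES (11, 99, 'lamp')")
    except sqlite3.IntegrityError:
        return True
    return False

# Deleting the order leaves the customer data in place (rule 2).
conn.execute("DELETE FROM customer_order WHERE order_id = 10")
remaining = conn.execute("SELECT first_name, last_name FROM customer").fetchall()
```

The address is stored once, in customer, so changing it requires a single update (rule 1), and the primary key is what lets orders refer to the customer they belong to.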

Stage 3. Creation of screen forms.

Initially, you need to specify the table on the basis of which the form will be created. It can be created using the form wizard, specifying what form it should have, or independently. When creating a form, you can specify not all the fields that the table contains, but only some of them. The name of the form can be the same as the name of the table on which it was created. Based on one table, you can create several forms, which may differ in the type or number of fields used from this table. Once created, the form must be saved. The created form can be edited by changing the location, size and format of the fields.

Stage 4. Filling in the database.

The database can be filled in two ways: in table view or through a form. Numeric and text fields can be filled in in table view, while MEMO and OLE fields can be filled in through a form.

VI stage. Working with the created database.

Working with the database includes the following steps:

    search for the necessary information;

    data sorting;

    data selection;

    printout;

    change and addition of data.

Understanding the DBLC and approaching it properly is very important and deserves detailed consideration, because it is based on a data-driven approach. Data items are more stable than the system functions performed on them. Creating the right data structure requires careful analysis of the classes of data items and the relationships between them. Once a sound logical database schema has been built, any number of functional systems can later be created using this schema. The function-oriented approach is suitable only for temporary systems designed for a short period of operation.

Conceptual (infological) design

Conceptual (infological) design is the building of a semantic model of the subject area, that is, an information model at the highest level of abstraction. Such a model is created without reference to the data model of any particular DBMS. The terms "semantic model", "conceptual model" and "infological model" are synonymous. In addition, the words "database model" and "domain model" (for example, "conceptual database model" and "conceptual domain model") can be used interchangeably in this context, since such a model is both an image of reality and an image of the database designed for this reality.

The specific form and content of the conceptual database model is determined by the formal apparatus chosen for this. Graphical notations similar to ER diagrams are commonly used.

The most common conceptual database model includes:

    description of information objects, or concepts of the subject area and relationships between them.

    description of integrity constraints, i.e. requirements for valid data values and relationships between them.

Logical (datalogical) design

Logical (datalogical) design is the creation of a database schema based on a specific data model, for example, the relational data model. For the relational data model, a datalogical model is a set of relation schemas, usually with primary keys, together with the "links" between relations, which are foreign keys.

The transformation of a conceptual model into a logical model, as a rule, is carried out according to formal rules. This step can be largely automated.

At the stage of logical design, the specifics of a particular data model are taken into account, but the specifics of a particular DBMS may not be taken into account.
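The remark that the conceptual-to-logical transformation follows formal rules and can be largely automated can be sketched as follows. The sketch applies only two simplified rules, and everything in it is an assumption made for illustration: the dictionary format of the conceptual model, the function to_relational_schema, and the choice of TEXT columns and surrogate keys.

```python
import sqlite3

# Hypothetical conceptual model: entities with attributes, plus one-to-many relationships.
entities = {
    "department": ["name"],
    "employee": ["full_name", "position"],
}
one_to_many = [("department", "employee")]  # one department has many employees

def to_relational_schema(entities, one_to_many):
    """Apply two formal rules:
    entity -> table with a surrogate primary key;
    one-to-many relationship -> foreign key column on the 'many' side."""
    statements = []
    for name, attributes in entities.items():
        columns = [f"{name}_id INTEGER PRIMARY KEY"]
        columns += [f"{attr} TEXT" for attr in attributes]
        for parent, child in one_to_many:
            if child == name:
                columns.append(
                    f"{parent}_id INTEGER REFERENCES {parent} ({parent}_id)"
                )
        statements.append(f"CREATE TABLE {name} ({', '.join(columns)})")
    return statements

ddl = to_relational_schema(entities, one_to_many)
```

The generated statements use only generic relational constructs (tables, primary keys, foreign keys), which reflects the point above that logical design depends on the data model rather than on a particular DBMS. Real design tools handle many more cases, such as many-to-many relationships and attribute types.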

Database (DB): a named set of data that reflects the state of objects and their relationships in a given subject area. Examples: admissions office, accounting.

To work with a database, operations such as entering, updating and deleting information, as well as returning information in response to queries, must be provided. A DBMS makes it possible to implement these operations quickly and efficiently.

Data bank (BnD): a system of software, language, organizational and technical means, based on database technology, designed for centralized accumulation and shared use of data.

DBMS: a set of language and software tools designed to create, maintain and share a database among many users. Examples: FoxPro 2.5, Visual FoxPro 8.0, Paradox, Access.

Knowledge base (KB): a formalized system of information about a certain subject area, containing data on the properties of objects and the patterns of processes and phenomena, as well as rules for using these data in given situations to make new decisions.

DB classification:

1. By type of information systems

Local - provide automation of individual functions (workstation of an accountant)

Corporate - provide automation of all functions at all levels of management across the enterprise, corporation.

2. By the nature of data organization and access to them

Local (personal)

General (centralized, integrative)

Distributed

3. According to the processing method

Online transaction processing (OLTP) systems - include databases used to automate management processes in subject areas such as a bank or a warehouse. Such databases are characterized by a large number of update transactions. A transaction is a sequence of data manipulation operations that is indivisible from the point of view of its effect on the database.

Online analytical processing (OLAP) systems.

Deductive (logical) databases - used in intelligent systems. Example: deriving new information using the rules of logic.

4. By type of stored data

Unstructured (semantic networks)

Partially structured (hypertext)

Structured (DB FVP)

5. Structured databases are classified according to the type of data model used

Hierarchical,

network,

relational,

Multidimensional

6. By the form of data presentation to the user

video systems,

audio systems,

Multimedia

7. By type of stored information

Factual databases (structured)

Documentary (text information)

Lexicographic (various dictionaries)

8. According to economic and organizational features:

8.1. According to the terms of service:

free,

Paid

8.2. By form of ownership:

State,

Non-state

8.3. By availability:

public,

With a limited circle of users.
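The definition of a transaction given under item 3 of the classification (an indivisible sequence of data manipulation operations) can be illustrated with a short sketch. It assumes SQLite via Python's sqlite3 module; the account table and the transfer and balances functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between accounts; both updates succeed or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account"))
```

In a failed transfer the first update has already credited the destination account when the second one violates the CHECK constraint; the rollback undoes the credit as well, so the database never shows half of a transfer.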

    Stages of development of the concept of a database (DB).

Stage 1 is associated with the beginning of the development of computing technology. The initial period is characterized by a close coupling between programs and data. Data stored in computer memory and on external storage devices are called physical, and the level at which data are represented in computer memory is called the physical level. Programs worked directly with the physical storage level, so there was no data independence. Basic concepts of the physical level: physical record, physical blocks of data on magnetic media. Conclusions: the information base is understood as a set of unrelated arrays; there is a single, physical level of data representation; there is no independence.

The second stage (1960s) is associated with the appearance of second-generation computers, built on transistors. Programming languages and operating systems began to appear, which made it possible to work at the logical level: programming languages allowed programs to refer to data by name rather than by address. A new level of data representation appeared, the logical level, and physical data independence was achieved. At this time computers began to be used to solve economic problems. A database came to mean a set of files stored on external storage devices and used to solve complex problems. Basic concepts of the logical level:

A file is a named collection of logical records of a single type. A logical record is a named collection of related data fields. A data field is the smallest unit of (stored) data; it has a name, a type and a length. The development of various automated information systems (AIS) began. Conclusions: a database is a collection of files; it has two levels, logical and physical; physical independence is provided.

The third stage begins with the advent of third-generation computers, which were equipped with storage devices on magnetic tapes and magnetic disks. It became possible to store large amounts of data and access them quickly. As the number of tasks grew, the shortcomings of file systems became more and more obvious: data dependency; rigidity; static structure; lack of integration; duplication of data (unmanaged redundancy); inconsistency and unreliability of data; inability to share data; inefficiency; inability to process non-standard queries. A need arose for centralized data management. This is how the database concept took shape and the first DBMS implementing this concept appeared.

In J. Martin's book “Database Organization in Computing Systems” (Mir, 1980), a definition is given that has become a classic:

A database is a collection of related data stored together, with a minimum amount of redundancy such that it can be used optimally for one or more applications.

The data is stored in such a way that it is independent of the programs using this data, a general controlled method is used to add new or modify existing data, as well as to search for data in the database.

This definition formulates the main provisions of the modern database concept: integrated storage; differentiated use; minimal redundancy; data independence; centralized management.

The first DBMS supported hierarchical data structures, which naturally reflected the structure of the subject area: IMS, ADABAS, OKA, INES and others. Network DBMS followed: SETOR, SEDAN. Gradually a common view took shape of universal DBMS that implement basic data models, that is, formal representations of data. The ideas of the relational data model were formed, but the first relational DBMS were only experimental. The result of the third stage was the formation of the database concept, the emergence of the first DBMS and the development of the theory of data models. Minicomputers appeared.

The fourth stage (late 1980s to early 1990s) is characterized by the emergence of personal computers and universal relational DBMS and by the development of the theory of data models and database design methods. Logical independence proved the most difficult to implement. The idea of a three-level DBMS architecture emerged. Computer networks spread, and distributed data processing (local, homogeneous) appeared alongside centralized processing. The most significant achievements are the relational data model, relational query languages and relational DBMS. A network level of data representation appeared in local networks. Databases came to occupy a central place in design methodology.

The fifth stage is the modern stage. The main problem is the integration of heterogeneous networks based on the client-server architecture, together with the combination of centralized and distributed databases. A new level appears, the network level, and with it independence from the data source: users do not care where the necessary data is located, in what form, or whether the data is duplicated on the network. New approaches to design appear: object-oriented and component-based. A number of new requirements are placed on the DBMS:

support for a wide range of data representations and operations on them (including factual, documentary, graphic and video data); multidatabase management; management of distributed databases, heterogeneous in the general case; natural and efficient representation in data models of the various objects and relationships of the subject area (for example, spatio-temporal ones, with visualization); development of knowledge base technology; ensuring integrity and security. Application systems require a significant increase in the volume of information stored in the database, higher reliability of operation, and a significant increase in performance. A new level of representation appears, in the global network. New technologies are used: intranet, client-server.

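Summary of the stages (hardware, software, database concept, number of levels, independence)

Stage 1. Software: machine languages. Database concept: unrelated data arrays (files). Independence: none.

Stage 2. Hardware: second-generation computers, magnetic tape. Software: first-generation programming languages, OS. Database concept: data files. Number of levels: 2 (physical, logical). Independence: physical.

Stage 3. Hardware: third-generation computers, magnetic disks. Software: second-generation programming languages, OS, DBMS. Database concept: centralized database. Independence: logical, physical.

Stage 4. Hardware: fourth-generation computers, PCs, networks. Software: third-generation programming languages, network OS, OS, DBMS. Database concept: relational databases, distributed databases, multi-model. Independence: logical, physical, data-source independence.

Stage 5. Hardware: networks, client-server architecture, WWW. Software: fourth-generation programming languages, OS, DBMS, Internet and intranet technologies. Database concept: distributed and centralized databases, OODB, hypertext. Number of levels: object level plus network level. Independence: logical, physical, data-source independence.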


3. Database design goals, database requirements. Structure of the design process

The term "design" refers to all types of work related to the creation of the final product.

The main design goals are to:

1) provide users with complete, up-to-date and reliable data necessary for the performance of official duties;

2) provide access to data in a reasonable time.

The task of the design process is to develop a database that must meet all the requirements arising from the current stage of development of the concept (technology) of the database.

These requirements are as follows:

    Adequacy of the database to the subject area. The DB should adequately reflect the objects and processes of the subject area.

    Flexibility and adaptability of the structure, that is, the possibility of development and adaptation to changes in the subject area and in user requirements.

    Performance. Meeting the requirements for the execution time of user requests.

    Efficiency and reliability of operation. This means minimizing the costs of operating, restoring and developing the system.

    Simplicity and ease of use (from the point of view of users).

    Support for interaction with users of different categories and in different modes.

    Integration, independence, minimal data redundancy. The conceptual representation of the data should be uniform.

    Integrity, consistency, recoverability of data.

Integrity. A database has the integrity property if it satisfies certain constraints on data values and retains this property under all modifications. An integrity constraint is a statement about the allowable values of individual information units and the relationships between them. It is determined by the characteristics of the subject area.

Consistency. The database has the property of consistency with respect to a certain set of users if at any time the database responds to their requests in the same way. It is implemented by a locking mechanism.

Recoverability. The ability to restore integrity after any system failure. It has an impact on efficiency (copying is expensive).

    Security - protecting data from unauthorized access, modification or destruction.
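Recoverability, as described above, relies on copies of the data. The sketch below shows one simple form of it, a full backup followed by a restore, assuming SQLite via Python's sqlite3 module and its Connection.backup method; the grades table and the snapshot and restore helpers are hypothetical.

```python
import sqlite3

def snapshot(source):
    """Copy the whole database into a new in-memory database (the backup copy)."""
    copy = sqlite3.connect(":memory:")
    source.backup(copy)
    return copy

def restore(backup_copy):
    """Build a fresh working database from the backup copy."""
    restored = sqlite3.connect(":memory:")
    backup_copy.backup(restored)
    return restored

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE grades (student TEXT PRIMARY KEY, grade INTEGER CHECK (grade BETWEEN 1 AND 5))"
)
db.execute("INSERT INTO grades VALUES ('Ann', 5)")
db.commit()
saved = snapshot(db)

db.execute("DELETE FROM grades")   # simulate damage after the backup was taken
db.commit()
db = restore(saved)
```

Every snapshot copies the entire database, which illustrates the remark that recoverability affects efficiency; production systems usually combine periodic copies with a log of changes rather than copying everything each time.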
