Sunday, 10 July 2011

Firebird's Indices, part 2: How to speed up indices

It is described above that application of indices can greatly speed up the execution of queries. It is really so for the most cases, but there are certain stipulations. First, we will answer the question frequently arising among those who have become familiar with indices. If indices quicken retrieval from a database, why would not we index all the fields in the table? There are two moments, blocking general indexing, - a disk space and costs when modifying the data in the table.

Every created index has a size equal to a data size in the indexed field, plus data size of records allocation. If we create indices for every field in the table, their total size will be more than the data size in the table! Therefore, creation of a large number of indexes leads to a huge expenditure of a disk space.

The second moment is more important. These are outlays when modifying the data in the table. In a relational DBMS, as you know, records in tables are unordered and consequently adding/ deleting of records go without significant outlays of resources of a server. Even if a record is deleted from the middle of a database, there is no moving of data sizes to fill this emptiness, – it is not required: the server will simply mark the empty place and will write something there when necessary. As to addition, in most cases it is executed in the end of the table.

However, though a server does not move the main data in the table when modifying, the data stored in indexes, is reordered every time when adding/ deleting the records! In other words, the server has to rebuild the index when adding a record to the middle of the table. Certainly, index implementation somehow is intended for frequent reorganizations, but these operations nevertheless take time and resources of the processor and when there is a large number of indexes in the table, data modification within it may be much slower than in the same table without indexes!

These are two main reasons, which interfere with general indexing. Besides them, there are some more remarks restricting index application. The first is a rule of 20 %. It says that if the retrieval query returns more than 20 % of records from the table, index using can slow down data retrieval! Certainly, the situation depends on a concrete query and the conditions set to the retrieval on, but we should remember that 20 % of records are a threshold when efficiency of using indices becomes doubtful. The second remark is not formulated so clear. It is connected with the work of Firebird optimizer.

The optimizer is a collection of mechanisms, which develop the schedule of executing the query. When the user gives any SQL query to Firebird, he specifies what server should return after executing the inquiry, but does not define, HOW server should fulfill the inquiry.

The optimizer on the basis of the given query creates the schedule of its execution, i.e. from where and in what order the data for executing the query will be taken, what indices will be used at that. When the server analyzes the retrieval conditions (these are mainly parts of expression WHERE, ORDER BY, etc.) for every field included in the condition, the server tries to use index. Unfortunately, the algorithm of creating the schedule is incomplete and the optimizer frequently uses indexes that are not too effective for the concrete query because of what execution time can be slowed down essentially.

Therefore, creation of unnecessary indices can lead to creation of nonoptimal schedules.

It should be marked that in the latest Firebird versions this problem is solved due to using modern algorithms of making schedules. The third case when index is not necessary are fields with the limited set of values - for example, the field storing the information about the sex of the person and contains only two possible values - "F" and "M"; it is no point in indexing this field. So, we have considered main constraints creating indices. Now we should cover the problem, when it is necessary to use indices to achieve improvement of productivity. There are 3 main cases when a field has to be indexed:

1. When this field is used under the conditions of retrieval in queries
2. When joins of tables use this field
3. When this field is used in the statement of sorting ORDER BY

If the field is applied in the way mentioned above, creating the index for it can lead to improvement of query productivity.

Let's consider a syntax of creating indexes. Here is a complete format of DDL command that allows to create indexes:
INDEX index ON table (col [, col ...]);

The minimum expression creating the index is the following:
CREATE INDEX my_index ON Table_example(ID)

In this example index with name my_index is created for table Table_example, and ID field is the indexed field. The index is ascending, i.e. values in it are ordered by ascension, as well as non-unique, and it means that ID field can have several identical values. It is certainly the simplest example of index - the most common.

As we can see from the description of syntax, index may contain not one, but a few fields. Such index is used when queries are frequently fulfilled and contain a combination of indexed fields under the conditions of search or sorting. For example, if we have a table containing fields the Surname, the Name, the Patronymic, such index will be applied when making the query that uses sorting by Surname, Name, and Patronymic.

In general, it is not necessary to specify the conditions for all 3 fields applied in index to use its advantages. If we want to sort the result of the query, the index will be used in case the first field in a condition of sorting coincides with the first field in the index. For example, our index will be applied in case of sorting by Surname and Name.

According to documentation for optimization of query execution containing in statement WHERE a join of fields with OR condition, we should use not the aggregate index, but a few single ones for all fields included in OR condition.

As to the question of index sort order, it can be either ascending or descending. Why do we need different sort orders? Obviously, for different sortings! If we wish to sort people by the surname in ascending order, we create the ascending index (ASC), and if in descending (from Z to A) – then descending! If we want both, we have to create both indices.

Tuesday, 5 July 2011

Firebird's Indices, part 1

The concept assumed as the basis of indices is simple and visual and is one of the most important bases of databases design. On the basis of indices many database basic objects are grounded, and furthermore the correct use of indices is key to an improvement of the productivity of databases applications. However, what is index? Index is an ordered pointer of the records in the table. Pointer means that the index contains values of one or several fields in the table and the addresses of data pages where these values are located. In other words, index consists of pairs of values "field value" – "physical location of this field".

Thus, by value of the field (or fields), included in the index, using index we can find quickly that place in the table where the record containing this value is allocated. Ordered means that values of the fields stored in the index are ordered.

Very often index is compared with a library catalogue, in which all the books are recorded to cards and are ordered in some way: according to the alphabet or themes, and in every card contains the information where exactly the given book is allocated in storage.
Why do we need indices?

The only thing indices promote is speeding up of record retrieval by its indexed field (indexed - means included in the index). The main function of indices is to provide fast record retrieval in the table. Any index using comes to this.

How this retrieval function is realized? At the input of this function we have the value of indexed field (or several fields). As a result of retrieval we should receive the whole record, in which the indexed field has a preset value. First in the index (to put it more precisely, in the ordered array of values of the indexed field) the required value is searched, then the address of data page is taken where the required record is located, the server goes to this page and reads the found record. It looks rather inconvenient, however search, using index, is many times faster, than sequential enumeration of all values from the table.

If we continue to make the analogy between index and library catalogue, we will see that the record retrieval, using index, is very similar to the book search, using card. When we find a book in a rather small catalogue (in comparison with the whole library storage), we immediately receive the information about where exactly the book is stored and we can go straight there. Search without using index can be compared with sequential enumeration of all the books in the library!

Enumeration of all records in the table is called direct or natural. We should say that despite of powers of modern computers natural enumeration may be very long if the table contains a great number of records.
How are they organized?

Index is not a part of the table, it is a separate object connected with the table and other database objects. This is a very important point of DBMS implementation allowing to separate information storage from its representation.

Firebird as any other relational database stores records in tables in the unordered way, i.e. does not care at all how records are physically allocated in the table.

Unordered storing means that two records added to the table one by one may not be next to each other. Moreover, the data extracted from the table also have no order apart from the one that should be explicitly specified by the user making a retrieval query.

However, we cannot go without ordering the stored data: end-users of applications want to see the data in the defined order - for example, people's last names according to the alphabet. Indices solve the problem of data representation in the ordered way. Field values included in the index are ordered and represented in the special view, optimized for searching the required values (namely, this is essential for creating ordered sequences).

Separating data storage from their representation gives additional benefits in comparison with direct sorting – maybe you will need to sort the initial table in different ways. Then indices will help you – there can be up to 64 indices for every table!

If we speak about implementation of indices at the physical level, they represent a binary tree which nodes represent pairs "field value in index " – "data allocation in the table ". Retrieval of the required record in index is performed using the mechanism of hash-search - one of the fastest search algorithms.
Index application

Now, when it is clear what we can demand from indices, it is time to know about their function in a database. Indices are used in three main cases:

1. Speeding up of inquiry execution. Indices are created for the fields used under the conditions of search of SQL-inquiries.

2. Support of uniqueness of values in fields; a primary key constraint (about which it was told in the chapter "Tables. Primary keys ") demands that in the table there will be no two identical values of the fields included in a primary key. In order to meet this condition, when inserting a new record you should search for the same value that will be inserted. For record retrieval the special variety of index is used - a unique index (see below).

3. Support of reference integrity. Foreign keys constraints (which are considered in the chapter "Database constraints") are used to check that the values inserted into the table necessarily exist in other table. When creating a foreign key index is automatically created. This index is applied for speeding up the inquiries using join of tables, as well as for checking the conditions of the foreign key. We have briefly covered all possible index applications. Now we will consider the peculiarities of every case in more detail and we will answer most frequently arising questions concerning index application.

Read in the part 2:

* How to speed up SQL queries with indices
* Support of reference integrity
* Optimization of index performance


About Me

My photo
IBSurgeon was established in 2002, 10 years we recover databases and save Firebird/InterBase data.