Sunday, 10 July 2011

Firebird's Indices, part 2: How to speed up indices

It is described above that application of indices can greatly speed up the execution of queries. It is really so for the most cases, but there are certain stipulations. First, we will answer the question frequently arising among those who have become familiar with indices. If indices quicken retrieval from a database, why would not we index all the fields in the table? There are two moments, blocking general indexing, - a disk space and costs when modifying the data in the table.

Every created index has a size equal to a data size in the indexed field, plus data size of records allocation. If we create indices for every field in the table, their total size will be more than the data size in the table! Therefore, creation of a large number of indexes leads to a huge expenditure of a disk space.

The second moment is more important. These are outlays when modifying the data in the table. In a relational DBMS, as you know, records in tables are unordered and consequently adding/ deleting of records go without significant outlays of resources of a server. Even if a record is deleted from the middle of a database, there is no moving of data sizes to fill this emptiness, – it is not required: the server will simply mark the empty place and will write something there when necessary. As to addition, in most cases it is executed in the end of the table.

However, though a server does not move the main data in the table when modifying, the data stored in indexes, is reordered every time when adding/ deleting the records! In other words, the server has to rebuild the index when adding a record to the middle of the table. Certainly, index implementation somehow is intended for frequent reorganizations, but these operations nevertheless take time and resources of the processor and when there is a large number of indexes in the table, data modification within it may be much slower than in the same table without indexes!

These are two main reasons, which interfere with general indexing. Besides them, there are some more remarks restricting index application. The first is a rule of 20 %. It says that if the retrieval query returns more than 20 % of records from the table, index using can slow down data retrieval! Certainly, the situation depends on a concrete query and the conditions set to the retrieval on, but we should remember that 20 % of records are a threshold when efficiency of using indices becomes doubtful. The second remark is not formulated so clear. It is connected with the work of Firebird optimizer.

The optimizer is a collection of mechanisms, which develop the schedule of executing the query. When the user gives any SQL query to Firebird, he specifies what server should return after executing the inquiry, but does not define, HOW server should fulfill the inquiry.

The optimizer on the basis of the given query creates the schedule of its execution, i.e. from where and in what order the data for executing the query will be taken, what indices will be used at that. When the server analyzes the retrieval conditions (these are mainly parts of expression WHERE, ORDER BY, etc.) for every field included in the condition, the server tries to use index. Unfortunately, the algorithm of creating the schedule is incomplete and the optimizer frequently uses indexes that are not too effective for the concrete query because of what execution time can be slowed down essentially.

Therefore, creation of unnecessary indices can lead to creation of nonoptimal schedules.

It should be marked that in the latest Firebird versions this problem is solved due to using modern algorithms of making schedules. The third case when index is not necessary are fields with the limited set of values - for example, the field storing the information about the sex of the person and contains only two possible values - "F" and "M"; it is no point in indexing this field. So, we have considered main constraints creating indices. Now we should cover the problem, when it is necessary to use indices to achieve improvement of productivity. There are 3 main cases when a field has to be indexed:

1. When this field is used under the conditions of retrieval in queries
2. When joins of tables use this field
3. When this field is used in the statement of sorting ORDER BY

If the field is applied in the way mentioned above, creating the index for it can lead to improvement of query productivity.

Let's consider a syntax of creating indexes. Here is a complete format of DDL command that allows to create indexes:
INDEX index ON table (col [, col ...]);

The minimum expression creating the index is the following:
CREATE INDEX my_index ON Table_example(ID)

In this example index with name my_index is created for table Table_example, and ID field is the indexed field. The index is ascending, i.e. values in it are ordered by ascension, as well as non-unique, and it means that ID field can have several identical values. It is certainly the simplest example of index - the most common.

As we can see from the description of syntax, index may contain not one, but a few fields. Such index is used when queries are frequently fulfilled and contain a combination of indexed fields under the conditions of search or sorting. For example, if we have a table containing fields the Surname, the Name, the Patronymic, such index will be applied when making the query that uses sorting by Surname, Name, and Patronymic.

In general, it is not necessary to specify the conditions for all 3 fields applied in index to use its advantages. If we want to sort the result of the query, the index will be used in case the first field in a condition of sorting coincides with the first field in the index. For example, our index will be applied in case of sorting by Surname and Name.

According to documentation for optimization of query execution containing in statement WHERE a join of fields with OR condition, we should use not the aggregate index, but a few single ones for all fields included in OR condition.

As to the question of index sort order, it can be either ascending or descending. Why do we need different sort orders? Obviously, for different sortings! If we wish to sort people by the surname in ascending order, we create the ascending index (ASC), and if in descending (from Z to A) – then descending! If we want both, we have to create both indices.

No comments:

Post a Comment


About Me

My photo
IBSurgeon was established in 2002, 10 years we recover databases and save Firebird/InterBase data.