Sunday 10 July 2011

Firebird's Indices, part 2: How to speed up indices

It is described above that application of indices can greatly speed up the execution of queries. It is really so for the most cases, but there are certain stipulations. First, we will answer the question frequently arising among those who have become familiar with indices. If indices quicken retrieval from a database, why would not we index all the fields in the table? There are two moments, blocking general indexing, - a disk space and costs when modifying the data in the table.

Every created index has a size equal to a data size in the indexed field, plus data size of records allocation. If we create indices for every field in the table, their total size will be more than the data size in the table! Therefore, creation of a large number of indexes leads to a huge expenditure of a disk space.

The second moment is more important. These are outlays when modifying the data in the table. In a relational DBMS, as you know, records in tables are unordered and consequently adding/ deleting of records go without significant outlays of resources of a server. Even if a record is deleted from the middle of a database, there is no moving of data sizes to fill this emptiness, – it is not required: the server will simply mark the empty place and will write something there when necessary. As to addition, in most cases it is executed in the end of the table.

However, though a server does not move the main data in the table when modifying, the data stored in indexes, is reordered every time when adding/ deleting the records! In other words, the server has to rebuild the index when adding a record to the middle of the table. Certainly, index implementation somehow is intended for frequent reorganizations, but these operations nevertheless take time and resources of the processor and when there is a large number of indexes in the table, data modification within it may be much slower than in the same table without indexes!

These are two main reasons, which interfere with general indexing. Besides them, there are some more remarks restricting index application. The first is a rule of 20 %. It says that if the retrieval query returns more than 20 % of records from the table, index using can slow down data retrieval! Certainly, the situation depends on a concrete query and the conditions set to the retrieval on, but we should remember that 20 % of records are a threshold when efficiency of using indices becomes doubtful. The second remark is not formulated so clear. It is connected with the work of Firebird optimizer.

The optimizer is a collection of mechanisms, which develop the schedule of executing the query. When the user gives any SQL query to Firebird, he specifies what server should return after executing the inquiry, but does not define, HOW server should fulfill the inquiry.

The optimizer on the basis of the given query creates the schedule of its execution, i.e. from where and in what order the data for executing the query will be taken, what indices will be used at that. When the server analyzes the retrieval conditions (these are mainly parts of expression WHERE, ORDER BY, etc.) for every field included in the condition, the server tries to use index. Unfortunately, the algorithm of creating the schedule is incomplete and the optimizer frequently uses indexes that are not too effective for the concrete query because of what execution time can be slowed down essentially.

Therefore, creation of unnecessary indices can lead to creation of nonoptimal schedules.

It should be marked that in the latest Firebird versions this problem is solved due to using modern algorithms of making schedules. The third case when index is not necessary are fields with the limited set of values - for example, the field storing the information about the sex of the person and contains only two possible values - "F" and "M"; it is no point in indexing this field. So, we have considered main constraints creating indices. Now we should cover the problem, when it is necessary to use indices to achieve improvement of productivity. There are 3 main cases when a field has to be indexed:

1. When this field is used under the conditions of retrieval in queries
2. When joins of tables use this field
3. When this field is used in the statement of sorting ORDER BY

If the field is applied in the way mentioned above, creating the index for it can lead to improvement of query productivity.

Let's consider a syntax of creating indexes. Here is a complete format of DDL command that allows to create indexes:
CREATE [UNIQUE] [ASC[ENDING] | DESC[ENDING]]
INDEX index ON table (col [, col ...]);


The minimum expression creating the index is the following:
CREATE INDEX my_index ON Table_example(ID)

In this example index with name my_index is created for table Table_example, and ID field is the indexed field. The index is ascending, i.e. values in it are ordered by ascension, as well as non-unique, and it means that ID field can have several identical values. It is certainly the simplest example of index - the most common.

As we can see from the description of syntax, index may contain not one, but a few fields. Such index is used when queries are frequently fulfilled and contain a combination of indexed fields under the conditions of search or sorting. For example, if we have a table containing fields the Surname, the Name, the Patronymic, such index will be applied when making the query that uses sorting by Surname, Name, and Patronymic.

In general, it is not necessary to specify the conditions for all 3 fields applied in index to use its advantages. If we want to sort the result of the query, the index will be used in case the first field in a condition of sorting coincides with the first field in the index. For example, our index will be applied in case of sorting by Surname and Name.

According to documentation for optimization of query execution containing in statement WHERE a join of fields with OR condition, we should use not the aggregate index, but a few single ones for all fields included in OR condition.

As to the question of index sort order, it can be either ascending or descending. Why do we need different sort orders? Obviously, for different sortings! If we wish to sort people by the surname in ascending order, we create the ascending index (ASC), and if in descending (from Z to A) – then descending! If we want both, we have to create both indices.

Tuesday 5 July 2011

Firebird's Indices, part 1

The concept assumed as the basis of indices is simple and visual and is one of the most important bases of databases design. On the basis of indices many database basic objects are grounded, and furthermore the correct use of indices is key to an improvement of the productivity of databases applications. However, what is index? Index is an ordered pointer of the records in the table. Pointer means that the index contains values of one or several fields in the table and the addresses of data pages where these values are located. In other words, index consists of pairs of values "field value" – "physical location of this field".

Thus, by value of the field (or fields), included in the index, using index we can find quickly that place in the table where the record containing this value is allocated. Ordered means that values of the fields stored in the index are ordered.

Very often index is compared with a library catalogue, in which all the books are recorded to cards and are ordered in some way: according to the alphabet or themes, and in every card contains the information where exactly the given book is allocated in storage.
Why do we need indices?

The only thing indices promote is speeding up of record retrieval by its indexed field (indexed - means included in the index). The main function of indices is to provide fast record retrieval in the table. Any index using comes to this.

How this retrieval function is realized? At the input of this function we have the value of indexed field (or several fields). As a result of retrieval we should receive the whole record, in which the indexed field has a preset value. First in the index (to put it more precisely, in the ordered array of values of the indexed field) the required value is searched, then the address of data page is taken where the required record is located, the server goes to this page and reads the found record. It looks rather inconvenient, however search, using index, is many times faster, than sequential enumeration of all values from the table.

If we continue to make the analogy between index and library catalogue, we will see that the record retrieval, using index, is very similar to the book search, using card. When we find a book in a rather small catalogue (in comparison with the whole library storage), we immediately receive the information about where exactly the book is stored and we can go straight there. Search without using index can be compared with sequential enumeration of all the books in the library!

Enumeration of all records in the table is called direct or natural. We should say that despite of powers of modern computers natural enumeration may be very long if the table contains a great number of records.
How are they organized?

Index is not a part of the table, it is a separate object connected with the table and other database objects. This is a very important point of DBMS implementation allowing to separate information storage from its representation.

Firebird as any other relational database stores records in tables in the unordered way, i.e. does not care at all how records are physically allocated in the table.

Unordered storing means that two records added to the table one by one may not be next to each other. Moreover, the data extracted from the table also have no order apart from the one that should be explicitly specified by the user making a retrieval query.

However, we cannot go without ordering the stored data: end-users of applications want to see the data in the defined order - for example, people's last names according to the alphabet. Indices solve the problem of data representation in the ordered way. Field values included in the index are ordered and represented in the special view, optimized for searching the required values (namely, this is essential for creating ordered sequences).

Separating data storage from their representation gives additional benefits in comparison with direct sorting – maybe you will need to sort the initial table in different ways. Then indices will help you – there can be up to 64 indices for every table!

If we speak about implementation of indices at the physical level, they represent a binary tree which nodes represent pairs "field value in index " – "data allocation in the table ". Retrieval of the required record in index is performed using the mechanism of hash-search - one of the fastest search algorithms.
Index application

Now, when it is clear what we can demand from indices, it is time to know about their function in a database. Indices are used in three main cases:

1. Speeding up of inquiry execution. Indices are created for the fields used under the conditions of search of SQL-inquiries.

2. Support of uniqueness of values in fields; a primary key constraint (about which it was told in the chapter "Tables. Primary keys ") demands that in the table there will be no two identical values of the fields included in a primary key. In order to meet this condition, when inserting a new record you should search for the same value that will be inserted. For record retrieval the special variety of index is used - a unique index (see below).

3. Support of reference integrity. Foreign keys constraints (which are considered in the chapter "Database constraints") are used to check that the values inserted into the table necessarily exist in other table. When creating a foreign key index is automatically created. This index is applied for speeding up the inquiries using join of tables, as well as for checking the conditions of the foreign key. We have briefly covered all possible index applications. Now we will consider the peculiarities of every case in more detail and we will answer most frequently arising questions concerning index application.


Read in the part 2:

* How to speed up SQL queries with indices
* Support of reference integrity
* Optimization of index performance

Friday 17 June 2011

Security enhancement in Firebird 2.5: reading password from the file

Issue:
======
All command-line utilities which support -password parameter are
vulnerable to password sniffing, especially when they're run from
scripts. Since 2.1, all Firebird utilities replace argv[PASSWORD]
with *, but better solution for hiding password from others in
process list should be reading it from file or asking for it on
stdin.

Scope:
======
Security issue.

Document author:
=================
Alex Peshkov (peshkoff@mail.ru)

Document date: 2008-11-30
==============


All utilities have new switch
-fetch_password
which may be abbreviated according with utility rules.
The exception is QLI, where -F should be used.

Switch has required parameter - name of file with password. I.e.:
isql -user sysdba -fet passfile server:employee
will load password form file "passfile", using its first line
as password.

One can specify "stdin" as file name to make password be read
from stdin. If stdin is terminal, prompt:
Enter password:
will be printed.

For posix users - if you specify '-fetch /dev/tty' you will also
be promted. This may be useful if you need to restore from stdin:
bunzip2 -c emp.fbk.bz2 | gbak -c stdin /db/new.fdb -user sysdba -fetch /dev/tty

Thursday 17 December 2009

Firebird at raw devices

Issue:
======
When database in forced writes mode grows rapidly, filesystem disk operations produce a serious overhead, making performance up to 3 times lower compared with turned off forced writes.

Scope:
======
Mainly affects Linux, because Linux misses appropriate system call to make database file grow efficiently.

Document author:
=================
Alex Peshkov (peshkoff@mail.ru)

Document date: 2007/11/21
==============


To make firebird have better performance under such circumstances you may place your database not in a regular file on some filesystem, but on raw device. Any type of block device is supported.

For example:

gbak -c my.fbk /dev/sda7

will restore your database on the third logical of extended partition of your SCSI(SATA) disk0.

Known issue:
To be able to do physical (using nbackup utility) copy of database you MUST specify explicit name of difference file:

# isql /dev/sda7
SQL> alter database add difference file '/tmp/dev_sda7';

This is required because default location of difference file will be in /dev, which is surely not what you need. It's also better to know how many blocks on block device are actually occupied (or you will have to copy all data on raw device, which can make size of your copy abnormally large). To obtain real size of database, you should use '-S' switch of nbackup:

# nbackup -s -l /dev/sda7
77173

Where 77173 is a number of pages, occupied by database. Take care - this is database's page size, not disk physical block size! If unsure, use

# gstat -h /dev/sda7
Database "/dev/sda7"
Database header page information:
Flags 0
Checksum 12345
Generation 43
Page size 4096 <== that's what you need
ODS version 11.1
. . . . . . .

You may use nbackup output directly in a script, performing database backup:

# DbFile=/dev/sda7
# DbSize=`nbackup -L $DbFile -S` || exit 1
# dd if=$DbFile ibs=4k count=$DbSize | # compress and record DVD
# nbackup -N $DbFile

Or perform physical backup using nbackup:

# nbackup -B 0 /dev/sda7 /tmp/lvl.0

In all other aspects raw devices do not have known specific in use.
Tip: it's good idea to have raw devices in aliases.conf - in case of HW reconfiguration of your server you will not need to change connection strings.

Wednesday 16 December 2009

Firebird 2.1 Manual Installation

--------------------------------------
Firebird 2.1 manual installation notes
--------------------------------------

Contents
--------

o Installation as a service
o Execution as an application
o Installation with a different service name
o Uninstallation


Installation as a service
-------------------------

Firebird can be manually set up to run as a service in a windows
environment by executing the following steps:

- Make sure that an existing version of Firebird is not running.

- Unzip the archive into the new directory.

- Set the FIREBIRD env var to point to the root of this directory.
(It is also recommended that the FIREBIRD env var also be used in
the PATH env var, rather than hard-coding a path.)

- change the current directory to \bin (here and below
is the directory where v2.1 files are located)

- Execute either install_super.bat or install_classic.bat depending
on architecture required.

- optionally, you can copy both fbclient.dll and gds32.dll
to the OS system directory. To do so, use the provided instclient.exe
tool.


Execution as an application
---------------------------

Alternatively, Firebird can be run as an application. Most of the instructions
for installing as a service should be followed. However, you should skip the
step to run the batch file.

Firebird is run as application by executing it with the -a switch:

fbserver.exe -a


Installation with a different service name
------------------------------------------

A new feature of Firebird 2.1 is the option to run multiple Firebird services
simultaneously. To do this you need to make appropriate changes to the
firebird.conf file and then run instsvc with the -name option. For example:

instsvc install -auto -superserver -guardian -name secondservice
instsvc start -name secondservice

The install_super.bat and install_classic.bat will do this automatically for
you. After modifying the firebird.conf file just pass the servicename as a
parameter:

install_super.bat secondservice

NOTE: If you are running multiple Firebird services it is essential that you
do not configure the FIREBIRD environment variable at the system level.


Uninstallation
--------------

- To remove FB 2.1 you should run the uninstall.bat.

- Pass the service name as a parameter if you have installed Firebird to
use a different service name.

- If you used instclient to deploy a library to system32 you should run

instclient.exe remove

- Delete installation directory. You may wish to keep the log file, security
database, firebird.conf and aliases.conf.

Followers

About Me

My photo
IBSurgeon was established in 2002, 10 years we recover databases and save Firebird/InterBase data.