News 2006

Z PostgreSQL
Skočit na navigaci Skočit na vyhledávání

translated by Marie Buřvalová

The year of intensive PostgreSQL development was finished by developers’ release of version 8.2. For the full version we will have to wait till Christmas. However, now we can create a certain idea of this version 8.2. This version does not contain any special surprise as the previous ones do. Probably the most expected support of bitmap indexes and editable views is missing. In spite of that, it does not mean that the changeover to this version will not pay. This version is markedly more efficient than the foregoing one. Moreover, the modification of FILLFACTOR parameter can influence the speed of doing UPDATE and DELETE operations. COPY is clearly faster (about 30% more).

In the new version we can adjust data pages fillfactor. This value determines how many percent of this data page will be used for new records, and how much space will be left there for contingent updated instruction commands UPDATE or DELETE. The fillfactor is determined in percentages from 10 to 100. The smaller, the possibility that the updated copy of a row will stay in the same page as the original one is higher. And this is, from the access point of view, more effective than its location in another page. Due to it, we will not have to activate VACUUM tables so often in some cases.

CREATE TABLE FOO(...) WITH (FILLFACTOR = 80);

Finding an optimum value might be a really difficult task. Experimentally, I have tried to adjust 50% in intensely modified tables used in pgbenchi and surprisingly, PostgreSQL operation markedly lowered. On one hand, the inquiry speed to modified rows is faster, but on the other hand, the number of data pages is higher, too, and so the sequence reading is slower. One of the reasons for not using pg_autovacuum is its dependence on operation statistics. These were allowed, in the previous versions, to have costs up to 20% in its full load. And this might have caused problems in many places. In 8.2 the costs of operation statistic are inconsiderable, too, but they are half (maximum 10%). Initial PostgreSQL8.2 configuration is a bit more realistic as for the present memory administration parameters. In the previous versions, configuration parameters used to be automatically several times extended. To have the full picture, I will put forward the performance in pgbenchi in the last five versions. Configuration of older versions is made up to correspond to the initial configuration of version 8.2.

Verze 7.3.15 7.4.13 8.0.8 8.1.4 8.2.beta1
tps 311 340 334 398 423

Please, take these numbers roughly. Pgbench (TPC-B) is more likely brute force testing. It will not show, for example, more sophisticated indexes using or more carefully worked-out optimalization of an operating plan. It was measured on an average notebook P(M)1.6G 512 MB RAM, Fedora6. The test would be more interesting on a more processor server, where the difference between versions 8.x and 7.x should be more obvious.

And what do I like best about version 8.2? I do not have a clear favourite. Nevertheless, a bit more intelligent inquiry planning and faster arranging is pleasant. I have noticed sequence reading, which is much faster. There is one really good thing which will lighten my life there, and this is NULL support in arrays (I am a programmer). Now, when the domains are fully supported in PL/pgSQL, I am beginning to think about their more intensive usage. I will spare myself some work with not calling ASSERT procedures.

CREATE DOMAIN pos_int int CHECK (VALUE >=0);
CREATE OR REPLASE FUNCTION test (p pos_int) REURNS pos_int AS $$
DECLARE v pos_int;
BEGIN v := p - 1;
  RETURN v - 1;
END; $$ LANGUAGE plpgsql IMMUTABLE;

Another pleasant thing is the justification of intervals:

justify_interval(interval '3 days 52 hours 3 minutes 2 seconds') ->5 days 04:03:02

Psql is regularly innovated. A presentation of a really large table could fail because of the low memory. We can avoid this by the activation of result reading with a mouse pointer. Psql better presents columns containing a text with more rows. Commands with more rows are stored as one block in history so the work with history is much better, e.g. this is just now when it is possible to work with commands having more rows. Now, in the system views it is possible to find time when the last performed operations VACUUM and ANALYZE were done. The list of functions is displayed by the return type and the list of arguments including their title and type:

postgres=# \df test
                           List of functions
 Schema | Name | Result data type  |        Argument data types
--------+------+-------------------+------------------------------------
 public | test | character varying | a integer, OUT b character varying
(1 row)

In all places, where the extent of int4 (LIMIT, OFFSET) was not sufficient enough, int8 replaced int4. PostgreSQL enables to define its more parametric aggregate functions now. The support of new SQL2003 aggregate binary functions (corr, regr_sxy, ...) is connected with it, as well. Most present installation scripts suffer from false error reporting when there is an attempt to remove still nonexistent object. In future, this problem is solved by the extension of the DROP command with the IF EXISTS phrase. The TRUNCATE command was extended by the CASCADE marker which is an effective way of complete database cleaning. The error in ILIKE was removed, too. This error did not allow using ILIKE in dimensions with more bytes.

Windows users and especially enemies of gcc translator will surely be pleased, that it is possible to translate PostgreSQL in Microsoft Visual Studio. Apart from probably better performance, mainly, it is possible to use Visual Studio setting. Slowly but surely, the port on WIN NT platform is going to be very similar to its UNIX original as for its reliability and performance. QNX and BEOS users will fail abysmally. These systems are not supported any more. A naive LDAP support should make a life to dba, who work hard with PostgreSQL on WinNT, pleasant. More effective tuning on Sunech should be possible thanks to built-in DTrace backing.

Joe Conway and Tom Lane brought in so called multi value insert. Due to generally viewed solution so called value constructor is supported now, too:

postgres=# select a from (values(1),(2)) a(a);
 a
---
 1
 2
(2 rows)

The multi value insert should not replace the COPY TO command. It is slower and using a lot of memory. Its effect is in simplification of imports from databases which generate dump in this format (e.g. MySQL). As far as I know, PostgreSQL is the only database which supports the table value constructor according to SQL2003.

A real blockbuster is the extension of DML commands with the RETURNING part. Syntax is more compatible with Oracle. What does it mean? If we use implicit values or terms in these commands, in fact we do not know the right answer. Very often, after these commands, there is an enquiry, where we can find the required values (e.g. PK from SERIAL columns). The RETURNING Phrase modifies INSERT, UPDATE and DELETE commands in the way of returning the table containing new values, or any term. CREATE TABLE users(id SERIAL PRIMARY KEY, inserted timestamp DEFAULT CURRENT_TIMESTAMP, ... );

--8.2.
INSERT INTO users (name, surname, ....) VALUES(...) RETURNING *;
--8.1
INSERT INTO users (name, surname, ....) VALUES(...);
SELECT id, inserted, name, surname WHERE id = lastval();

For these commands, the support in PL/pgSQL is not missing:

CREATE OR REPLACE FUNCTION testa() RETURNS VOID AS $$
DECLARE _a integer; _b integer; _c integer;
BEGIN
  FOR _a, _b, _c IN INSERT INTO foo VALUES (10,20,30),(10,11,12) LOOP
    RAISE NOTICE '% % %', _a, _b, _c;
  END LOOP;
END;
$$ LANGUAGE plpgsql;

Unfortunately, it is not possible to write the SQL command as in the following example, yet.

INSERT INTO archive SELECT * FROM (DELETE FROM dt RETURNING *)

Together with PostgreSQL, there have been finished or developed several extensions from contrib directory or from pgfoundry repository. At random, I will present some of them: tsearch2 (fulltext) supports UTF8 and should be markedly faster and at the same time, it also enables using OpenOffice dictionaries, pgstattuple (monitoring of dead records in a table), pgcrypto (cryptography features), orafunc (implementation of several tens of rdbms Oracle functions).

Large changes were made in PL/Python. Now, the Python support is on the same or even higher level than the Perl support. In Python, we can write functions using folded types, we can return tables or use named parameters too. Java is not supported straight in the main tree, so it does not have to do anything with beta. However, the Java support grew up and that is why we can design our own data types. It is the only programming language except C, where this is possible. SPI interface is made accessible through modified JDBC driver. Everything is respected by ANSI SQL 2003 SQLJ (so theoretically, the stored procedures should be compatible with Oracle, DB2, etc.).

package foo.fee;
import java.util.Iterator;

public class Bar
{
    public static Iterator getNames()
    {
        ArrayList names = new ArrayList();
        names.add("Lisa");
        names.add("Bob");
        names.add("Bill");
        names.add("Sally");
        return names.iterator();
    }
}

CREATE FUNCTION javatest.getNames()
  RETURNS SETOF varchar
  AS 'foo.fee.Bar.getNames'
  IMMUTABLE LANGUAGE java;

It is impossible not to notice how "commercialized" PostgreSQL development is. After the unsuccessful Great Bridge attempt, EnterpriseDB is trying to do the same. At the moment EnterpriseDB is employing most core developers. Some others are situated in RedHat, GreenPlum, SkyPe. There are only few developers (e.g. from universities) who are still independent. After all, there is nothing to be surprised with. It is impossible to work on such extensive and high quality software in a part-time job. SUN is trying to gain its share in the PostgreSQL success, too. SUN is building professional support teams and together with other companies it is offering a commercial support.

Nevertheless, not all users’ requests were fulfilled. Briefly, I will give you some examples of functions we will have to wait for in other versions: bitmap indexes, editable views, recursive and analytical inquiries, SQL MERGE command, collations support, SQL/XML support.

Unfortunately, I have not mentioned all the news you can try in PostgreSQL beta, now. And to those I have presented, I have not paid as much time as they would deserve. Beta is already persistent, as it is usual in PostgreSQL, so I do not see any reason for not recommending it to you to test.