PostgreSQL SQL Tricks

Z PostgreSQL
Přejít na: navigace, hledání

SQL is a language where one task can be solved multiple ways with different efficiency. You can see effective use of PostgreSQL's possibilities on this page.

Obsah

Logout all connections

Author: Pavel Stěhule

Execute query as user postgres:

postgres=# select * from pg_stat_activity ;
 datid | datname  |  pid  | usesysid | usename  | application_name | client_addr | client_hostname | client_port |         backend_start         |         
-------+----------+-------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+
 12894 | postgres | 21091 |       10 | postgres | psql             |             |                 |          -1 | 2012-03-06 09:16:02.466983+01 |
 12894 | postgres | 21103 |    16384 | pavel    | psql             |             |                 |          -1 | 2012-03-06 09:17:02.827352+01 |         
 12894 | postgres | 21104 |    16384 | pavel    | psql             |             |                 |          -1 | 2012-03-06 09:17:12.176979+01 |         
(3 rows)

postgres=# select pg_terminate_backend(pid) 
               from pg_stat_activity 
              where pid <> pg_backend_pid() ;
 pg_terminate_backend 
----------------------
 t
 t
(2 rows)

postgres=# select * from pg_stat_activity ;
 datid | datname  |  pid  | usesysid | usename  | application_name | client_addr | client_hostname | client_port |         backend_start         |         
-------+----------+-------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+
 12894 | postgres | 21091 |       10 | postgres | psql             |             |                 |          -1 | 2012-03-06 09:16:02.466983+01 |
(1 row)

Taking first unlocked row from table

Sometimes we need first unlocked row (queue implementation). We can use following trick:

postgres=# select * from queue;
 id | processed |  cmd  
----+-----------+-------
  1 | f         | task1
  2 | f         | task2
  3 | f         | task2
  4 | f         | task3
(4 rows)

Usage:

-- consument1
postgres=# begin;
BEGIN
postgres=# select * from queue where not processed and pg_try_advisory_xact_lock(id) for update limit 1;
 id | processed |  cmd  
----+-----------+-------
  1 | f         | task1
(1 row)

postgres=# update queue set processed = true where id = 1;
UPDATE 1
postgres=# commit;
COMMIT

-- consument2
postgres=# begin;
BEGIN
postgres=# select * from queue where not processed and pg_try_advisory_xact_lock(id) for update limit 1;
 id | processed |  cmd  
----+-----------+-------
  2 | f         | task2
(1 row)

postgres=# update queue set processed = true where id = 2;
UPDATE 1
postgres=# commit;
COMMIT

With advisory locks we can implement unblocked queue parallel processing.

Searching gaps in time series with window analytic functions

Source: Magnus Hagander http://blog.hagander.net/archives/203-Finding-gaps-in-partitioned-sequences.html

Function lag can be used for any sorted series:

SELECT * FROM (
   SELECT
      gropid,
      year,
      month,
      seq, 
      seq-lag(seq,1) OVER (PARTITION BY groupid, year, month ORDER BY seq) AS gap FROM mytable
) AS t
WHERE NOT (t.gap=1)
ORDER BY groupid, year, month, seq

Delete duplicate rows with window analytic functions

We can use a window function for very effective removing of duplicate rows:

DELETE FROM tab 
  WHERE id IN (SELECT id 
                  FROM (SELECT row_number() OVER (PARTITION BY column_with_duplicate_values), id 
                           FROM tab) x 
                 WHERE x.row_number > 1);

Some PostgreSQL's optimized version (with ctid):

DELETE FROM tab 
  WHERE ctid = ANY(ARRAY(SELECT ctid 
                  FROM (SELECT row_number() OVER (PARTITION BY column_with_duplicate_values), ctid 
                           FROM tab) x 
                 WHERE x.row_number > 1));

LIKE to list of patterns

PostgreSQL cannot to use (I)LIKE together list of patterns (in conformance with ANSI SQL):

-- this is not valid
WHERE somecolumn LIKE ('%some%','%someelse%') 

But we can use arrays or regular expressions:

-- this is valid
WHERE somecolumn ~~* any(array['%some%', '%someelse']));

Author: Tom Lane

Enforcing unique name for cursor in PL/pgSQL

PL/pgSQL cursors uses PostgreSQL SQL cursors. When we use PL/pgSQL cursor, then PL/pgSQL runtime creates SQL cursor with same name. This behave is unwanted sometimes. But when we know, so PL/pgSQL cursor is +/- text variable with SQL name, we can assign NULL to this variable. Then PL/pgSQL runtime cannot to use a predefined name and it will generate a unique name:

DECLARE
  mycursor CURSOR FOR SELECT * FROM int4_tbl WHERE f1 > id;
  newid INTEGER;
  out  TEXT;
BEGIN
  out := id::text || ' ';
  mycursor := null;
  OPEN mycursor;
  ..

Conversion BLOB to Bytea

Source: http://snipplr.com/view/62576/convert-oid-to-bytea/

CREATE OR REPLACE FUNCTION merge_oid(val oid)
RETURNS bytea AS $$
DECLARE 
  merged bytea; 
  arr bytea;
BEGIN
  FOR arr IN SELECT DATA 
                FROM pg_largeobject 
               WHERE loid = val 
               ORDER BY pageno 
  LOOP
    IF merged IS NULL THEN
      merged := arr;
    ELSE
      merged := merged || arr;
    END IF;
  END LOOP;
  RETURN merged;
END
$$ LANGUAGE plpgsql;

Function for decoding of url code

Author: Marti Raudsepp

CREATE OR REPLACE FUNCTION url_decode(input text) RETURNS text
LANGUAGE plpgsql IMMUTABLE STRICT AS $$
DECLARE
 bin bytea = '';
 byte text;
BEGIN
 FOR byte IN (select (regexp_matches(input, '(%..|.)', 'g'))[1]) LOOP
   IF length(byte) = 3 THEN
     bin = bin || decode(substring(byte, 2, 2), 'hex');
   ELSE
     bin = bin || byte::bytea;
   END IF;
 END LOOP;
 RETURN convert_from(bin, 'utf8');
END
$$;

Usage:

ohs=# select url_decode('Hell%C3%B6%20World%21');
  url_decode  
──────────────
 Hellö World!
(1 row)

Emacs's configuration for PLpgSQL syntax highlighting when function is edited from console

Author: Peter Eisentraut

modify file .profile

PSQL_EDITOR=emacs;
export PSQL_EDITOR.

modify file .emacs


(add-to-list 'auto-mode-alist
            '("/psql.edit.[0-9]+\\'" . sql-mode))

Cast bytea to text

When we need cast content of bytea variable to adequate text value directly (content of bytea is char codes), then following function is solution:

CREATE OR REPLACE FUNCTION bytea_to_text(bytea) 
RETURNS text AS $$
SELECT convert_from($1, current_setting('server_encoding'))
$$ LANGUAGE sql;

xxx=# SELECT bytea_to_text('žluťoučký kůň se napil žluté vody'::bytea);
           bytea_to_text           
───────────────────────────────────
 žluťoučký kůň se napil žluté vody
(1 row)

Simply age calculation from birth date

postgres=# SELECT CURRENT_DATE;
    date    
------------
 2011-09-20
(1 row)

postgres=# SELECT EXTRACT(YEAR FROM age('1972-08-20'::date));
 date_part 
-----------
        39
(1 row)

Is current server master or slave?

When we need to identify server's state, then we can use a function:

SELECT pg_is_in_recovery();

Author: Simon Riggs

Dynamic modification some record's field

PL/pgSQL can modify record's field only when field specification is static. Some fast C libraries are generally available, but when we have no necessary rights, then we cannot use it. Then we can use function:

CREATE OR REPLACE FUNCTION x.setfield2(anyelement, text, text)
 RETURNS anyelement
 LANGUAGE plpgsql
AS $function$
DECLARE 
  _name text;
  _values text[];
  _value text;
  _attnum int;
BEGIN
  FOR _name, _attnum
     IN SELECT a.attname, a.attnum
           FROM pg_catalog.pg_attribute a 
          WHERE a.attrelid = (SELECT typrelid
                                 FROM pg_type
                                WHERE oid = pg_typeof($1)::oid) 
  LOOP
    IF _name = $2 THEN
      _value := $3;
    ELSE
      EXECUTE 'SELECT (($1).' || quote_ident(_name) || ')::text' INTO _value USING $1;
    END IF;
    _values[_attnum] :=  COALESCE('"' || replace(replace(_value, '"', '""'), '''', '''''') || '"', '');
  END LOOP;
  EXECUTE 'SELECT (' || pg_typeof($1)::text || '''(' || array_to_string(_values,',') || ')'').*' INTO $1; 
  RETURN $1;
END;
$function$;

postgres=# select setfield2(mypoint '(10,)', 'b', '33');
 setfield2 
───────────
 (10,33)
(1 row)

Time: 9,480 ms

Author: Pavel Stěhule

Erwin Brandstetter designed better (more simply, faster) solution:

-- Erwin 1
CREATE OR REPLACE FUNCTION x.setfield3(anyelement, text, text)
RETURNS anyelement
AS $body$
DECLARE
 _list text;

BEGIN
_list := (
   SELECT string_agg(x.fld, ',')
   FROM   (
      SELECT CASE WHEN a.attname = $2
              THEN quote_literal($3)
              ELSE quote_ident(a.attname)
             END AS fld
      FROM   pg_catalog.pg_attribute a 
      WHERE  a.attrelid = (SELECT typrelid
                           FROM   pg_type
                           WHERE  oid = pg_typeof($1)::oid) 
      ORDER BY a.attnum
   ) x
);

EXECUTE '
SELECT ' || _list || '
FROM   (SELECT $1.*) x'
USING  $1
INTO   $1;

RETURN $1;
END;
$body$ LANGUAGE plpgsql;

Iteration over RECORD variable inside trigger

Author: Tom Lane (for PostgreSQL 8.4 and higher)

CREATE OR REPLACE FUNCTION dynamic_trigger()
RETURNS TRIGGER
LANGUAGE plpgsql
AS $$
DECLARE
    ri RECORD;
    t TEXT;
BEGIN
    RAISE NOTICE E'\n    Operation: %\n    Schema: %\n    Table: %',
        TG_OP,
        TG_TABLE_SCHEMA,
        TG_TABLE_NAME;
    FOR ri IN
        SELECT ordinal_position, column_name, data_type
        FROM information_schema.columns
        WHERE
            table_schema = quote_ident(TG_TABLE_SCHEMA)
        AND table_name = quote_ident(TG_TABLE_NAME)
        ORDER BY ordinal_position
    LOOP
        EXECUTE 'SELECT ($1).' || ri.column_name || '::text' INTO t USING NEW;
        RAISE NOTICE E'Column\n    number: %\n    name: %\n    type: %\n    value: %.',
            ri.ordinal_position,
            ri.column_name,
            ri.data_type,
            t;
    END LOOP;
    RETURN NEW;
END; $$;

Note: This technique needs one dynamic SQL per one record's field - functions from PL Toolbox library or from Hstore contrib module are more effective.

Enforcing some chars in to_char function

Sometimes we needs to show some chars, that are same like special chars (has some semantic) from mask. Text in quotes (in format mask) is not interpreted and it is showed without changes:

postgres=# select to_char(current_timestamp,'YYYY-MM-DDTHH24:MI:SS');
        to_char        
-----------------------
 2010-02-07THH24:38:10
(1 row)

postgres=# select to_char(current_timestamp,'YYYY-MM-DD"T"HH24:MI:SS');
       to_char       
---------------------
 2010-02-07T07:38:22
(1 row)

Author: Milen A. Radev

Barrier against to unwanted conversion char(n) to text

Almost all string functions have parameters of text type. PostgreSQL ensures automatic conversion from char(n) to varchar. But there are a few cases where this behave causes problems - mainly for older applications, where char(n) type was used (there is a clean advice - don't use this type). In some legacy application is comparison between char(n) type and literal constant with spaces and it doesn't work in PostgreSQL now:

SELECT ...
  WHERE substr(somecolumn,1,4) = 'ab  ';

It doesn't work, because substr function has text parameter - and returns text and 'ab' is not equal to 'ab '. But we can overload substr function with char(n) datatype (it is possible because varchar and char uses same binary format) - and it doesn't do unwanted conversion from char(n) to text:

create or replace function substr(character, int, int) returns character as $$
select substr($1::cstring::text,$2,$3)
$$ language sql;

Tome Lane wrote more effective proposal:

create function substr(char,int,int) returns char
 strict immutable language internal as 'text_substr' ;

It is little bit dirty trick, but 100% functional.

postgres=# create table f(a character(5));
CREATE TABLE
postgres=# insert into f values('a'),('ab'),('abc');
INSERT 0 3
postgres=# select * from f;
  a
-------
 a
 ab
 abc
(3 rows)

postgres=# select * from f where substr(a,1,3) = 'a  ';
  a
-------
 a
(1 row)

postgres=# select * from f where substr(a,1,3) = 'ab  ';
  a
-------
 ab
(1 row)

Get functions's DDL command

PostgreSQL allows simple way to get full source code of any function:

postgres=# select oid from pg_proc where proname = 'fu';
  oid  
-------
 16389
(1 row)

postgres=# select pg_get_functiondef(16389);
                         pg_get_functiondef                          
---------------------------------------------------------------------
 CREATE OR REPLACE FUNCTION public.fu(anyelement)                   
  RETURNS anyelement                                                
  LANGUAGE plpgsql                                                  
 AS $function$
begin 
  raise notice '%', $1; 
  return $1; 
end; $function$+
 
(1 row)

from Russian alphabet (Cyrillic) to ASCII

source: http://leha.papulovskiy.ru/blog/

CREATE OR REPLACE FUNCTION urltranslit(text) RETURNS text as $$
SELECT
regexp_replace(
	replace(
		replace(
			replace(
				replace(
					replace(
						replace(
							replace(
								translate(
									lower($1),
									'абвгдеёзийклмнопрстуфхыэъь',
									'abvgdeezijklmnoprstufhye'
								), 'ж',	'zh'
							), 'ц',	'ts'
						), 'ч',	'ch'
					), 'ш',	'sh'
				), 'щ',	'sch'
			), 'ю', 'yu'
		), 'я',	'ya'
	)
	,
	'[^a-z]+',
	'-',
	'g'
)
$$ LANGUAGE SQL;

to_string, to_array functions

Author: Pavel Stěhule. Functions string_to_array and array_to_string doesn't handle NULL value well. So I designed new functions with better NULL handling.

CREATE OR REPLACE FUNCTION to_string(anyarray, sep text, nullstr text DEFAULT '') 
RETURNS text AS $$
SELECT array_to_string(ARRAY(SELECT coalesce(v::text, $3) 
                                FROM unnest($1) g(v)),
                       $2)
$$ LANGUAGE sql;

postgres=# select to_string(array[1,2,3,4,null,5],','); 
 to_string  
------------
 1,2,3,4,,5
(1 row)

postgres=# select to_string(array[1,2,3,4,null,5],',','<NULL>');
    to_string     
------------------
 1,2,3,4,<NULL>,5
(1 row)

CREATE OR REPLACE FUNCTION to_array(text, sep text, nullstr text DEFAULT '') 
RETURNS text[] AS $$ 
  SELECT ARRAY(SELECT CASE 
                           WHEN v = $3 THEN NULL::text 
                           ELSE v END 
                  FROM unnest(string_to_array($1,$2)) g(v)) 
$$ LANGUAGE sql;

postgres=# select to_array('1,2,3,4,,5',',');
     to_array     
------------------
 {1,2,3,4,NULL,5}
(1 row)

postgres=# select to_array('1,2,3,4,<NULL>,5',',','<NULL>');
     to_array     
------------------
 {1,2,3,4,NULL,5}
(1 row)

postgres=# select to_array('1,2,3,,5',',')::int[];
    to_array    
----------------
 {1,2,3,NULL,5}
(1 row)

Cast between "unix timestamp" and timestamp

Author:??

CREATE OR REPLACE FUNCTION convert_timestamp_to_xtime(dt timestamp) 
RETURNS integer AS $$
  SELECT EXTRACT(EPOCH FROM $1)::integer
$$ LANGUAGE sql;

CREATE OR REPLACE FUNCTION convert_xtime_to_timestamp(xdt integer) 
RETURNS timestamp AS $$
  SELECT to_timestamp($1)::timestamp
$$ LANGUAGE sql;

Note: You can use casting to abstime type, but: "The types abstime and reltime are lower precision types which are used internally. You are discouraged from using these types in applications; these internal types might disappear in a future release."

Domain for time zone

David E. Wheleer sent a nice trick. PostgreSQL missing a data type for time zone and doesn't contain any function for time zone verification. But we can use operator AT:

CREATE OR REPLACE FUNCTION is_timezone( tz TEXT ) RETURNS BOOLEAN as $$
BEGIN
 PERFORM now() AT TIME ZONE tz;
 RETURN TRUE;
EXCEPTION WHEN invalid_parameter_value THEN
 RETURN FALSE;
END;
$$ language plpgsql STABLE;

CREATE DOMAIN timezone AS CITEXT
CHECK ( is_timezone( value ) );

Non-interactive entering a password for psql

psql missing any switch for password. People often ask, how can use psql inside scripts, when is necessary enter any password. One possibility is use a system variable PGPASSWORD.

postgres=# CREATE USER tom;
CREATE ROLE

postgres=# ALTER USER tom PASSWORD 'tiger';
ALTER ROLE
...
postgres=> \q

[pavel@nemesis ~]$ PGPASSWORD=tiger psql postgres -U tom
Welcome to psql 8.3.8, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit

postgres=> ...

The sense of this method is hiding a password in process list. Warning: On Linux (at least) this is still visible through /proc/<pid>/environ

Another possibility is a .pgpass file in your home directory.

Faster table's list

If your database has a high number of tables - thousands and more, you will like next trick:

alter function pg_table_is_visible(oid) cost 10;

by Tom Lane for 8.3 and 8.4. For 8.5 and newer it is default.

Array of arrays

PostgreSQL supports n-dimensional arrays, but doesn't support array of arrays, so we cannot use cumulative aggregate functions over arrays. Here is workaround:

postgres=# create type intarr as (f int[]);
CREATE TYPE

postgres=# select * from t;
      a      
─────────────
 {10,20,30}
 {1,2,3,4,5}
(2 rows)

postgres=# select array_agg(a) from t;
ERROR:  could not find array type for data type integer[]

postgres=# select array_agg(distinct row(a)::intarr) from t;
                array_agg                 
──────────────────────────────────────────
 {"(\"{1,2,3,4,5}\")","(\"{10,20,30}\")"}
(1 row)

postgres=# select (unnest(array_agg(distinct row(a)::intarr))::intarr).f from t;
      f      
─────────────
 {1,2,3,4,5}
 {10,20,30}
(2 rows)

Sent by Sam Mason.

MySQL function group_concat in PostgreSQL

In MySQL we can find very useful aggregate function group_concat. For this function we can define separator and ordering. This isn't possible in PostgreSQL. But we can use workaround via array functions:

postgres=# SELECT * FROM x;
 cat |  town  
-----+---------
   1 | Benešov
   1 | Tábor
   1 | Písek
   2 | Praha
   2 | Hradec
   3 | Cheb
   3 | Aš
(7 rows)

postgres=# SELECT cat, array_to_string(ARRAY(SELECT unnest(array_agg(town)) 
                                               ORDER BY 1),',') 
              FROM x 
             GROUP BY cat;
 kat |   array_to_string   
-----+---------------------
   1 | Benešov,Písek,Tábor
   3 | Aš,Cheb
   2 | Hradec,Praha
(3 rows)

Since PostgreSQL 9.0 this is natively available as string_agg()

MySQL function field in PostgreSQL

Source: http://stackoverflow.com/questions/1309624/simulating-mysqls-order-by-field-in-postgresql

When we can explicitly to specify some order, we should to use function field in MySQL.

select * from pet order by field(species, 'cat', 'dog', 'bird') desc;

+----------+--------+---------+------+------------+------------+
| name     | owner  | species | sex  | birthday   | death      |
+----------+--------+---------+------+------------+------------+
| Fluffy   | Harold | cat     | f    | 1993-02-04 | NULL       | 
| Claws    | Gwen   | cat     | m    | 1994-03-17 | NULL       | 
| Buffy    | Harold | dog     | f    | 1989-05-13 | NULL       | 
| Fang     | Benny  | dog     | m    | 1990-08-27 | NULL       | 
| Bowser   | Diane  | dog     | m    | 1989-08-31 | 1995-07-29 | 
| Chirpy   | Gwen   | bird    | f    | 1998-09-11 | NULL       | 
| Whistler | Gwen   | bird    | NULL | 1997-12-09 | NULL       | 
+----------+--------+---------+------+------------+------------+

This functionality is missing in PostgreSQL - we could to use CASE statement or we could to write own function. It's possible - we have variadic functions:

select * 
   from pet 
  order by species, case species when 'cat' then 1 
                                 when 'dog' then 2 
                                 when 'bird' then 3 
                                 else 0 end desc;

Source code field function:

CREATE OR REPLACE FUNCTION field(varchar, VARIADIC text[]) 
RETURNS int AS $$
  SELECT i 
     FROM generate_subscripts($2,1) g(i) 
    WHERE $1 = $2[i] 
  UNION ALL
  SELECT 0 
  LIMIT 1 1
$$ LANGUAGE sql;
Complement of function field is function elt:
CREATE OR REPLACE FUNCTION elt(int, VARIADIC text[])
RETURNS text AS $$
  SELECT $2[$1];
$$ LANGUAGE sql;

The first from groups (with windows functions)

This exercise should be solved variously. One way is based on windows functions:

postgres=# SELECT * FROM staff;
┌───────────┬───────────┬────────────┬──────────┐
│   name    │ surname   │ department │  salary  │
├───────────┴───────────┴────────────┴──────────┤
│ Pavel     │ Stehule   │          1 │ 10000.00 │
│ Zdenek    │ Stehule   │          1 │  9000.00 │
│ Vladimira │ Stehulova │          2 │  9000.00 │
└───────────────────────────────────────────────┘
(3 rows)

postgres=# SELECT * 
              FROM (SELECT *, max(salary) OVER (PARTITION BY department) 
                       FROM staff) x 
             WHERE max = salary;
┌───────────┬───────────┬────────────┬──────────┬──────────┐
│   name    │ surname   │ department │  salary  │   max    │
├───────────┴───────────┴────────────┴──────────┴──────────┤
│ Pavel     │ Stehule   │          1 │ 10000.00 │ 10000.00 │
│ Vladimira │ Stehulova │          2 │  9000.00 │  9000.00 │
└──────────────────────────────────────────────────────────┘
(2 rows)

Bigint to ip

By Jasen Betts:

create function bigint_to_inet(bigint) returns inet as $$
select (($1>>24&255)||'.'||($1>>16&255)||'.'||($1>>8&255)||'.'||($1>>0&255))::inet
$$ language sql;

Marking files as orig (for difforig command) based on patch

PostgreSQL hackers like commands cporig and difforig. cporig creates copy file with .orig extension. difforig search files with extension .orig and does cumulative diff i.e. patch. Sometimes we need mark modified files from patch. This process can be simplified with unix commands:

cat mnnotation.diff | egrep -o -e "^\*\*\*.*(sql|out|h|c|sgml|y|lex|Makefile)" | replace "*** " "" | xargs cporig

Safe string to number conversion

Sometimes we do not want to raise an exception when the converted value is not a number. We can prevent the exception by testing if the input value is a number or not:

CREATE OR REPLACE FUNCTION read_int(varchar) 
RETURNS int AS $$ 
  SELECT CASE WHEN $1 ~ e'^\\d+$' THEN $1::int END; 
$$ LANGUAGE SQL IMMUTABLE STRICT;

Dropping milliseconds from timestamp

Usually we don't need timestamp value in maximum precision. For mostly people only seconds are significant. Timestamp type allows to define precision - and we could to use this feature:

postgres=# select current_timestamp;
             now              
------------------------------
 2009-05-23 20:42:21.57899+02
(1 row)

Time: 196,784 ms
postgres=# select current_timestamp::timestamp(2);
          now           
------------------------
 2009-05-23 20:42:27.74
(1 row)

Time: 51,861 ms
postgres=# select current_timestamp::timestamp(0);
         now         
---------------------
 2009-05-23 20:42:31
(1 row)

Time: 0,729 ms

Attention on IS NULL and IS NOT NULL operators for composite types

One may think that !(x IS NULL) = x IS NOT NULL is true in all cases. But there is an exception - composite types. When one field of a composite value is NULL and another field is NOT NULL, then result of both operators is false. IS NULL is true, only when all fields are NULL. IS NOT NULL is true, only when all fields are NOT NULL. For any case in between, then both operators return false.

CREATE OR REPLACE FUNCTION test_isnull()
RETURNS TABLE (a int, b int, isnull bool, isnotnull bool) AS $$
DECLARE r foo;
BEGIN
  isnull := r IS NULL; isnotnull := r IS NOT NULL; RETURN NEXT;
  a := NULL; b := 10; r := ROW(a, b);
  isnull := r IS NULL; isnotnull := r IS NOT NULL; RETURN NEXT;
  a := 10; b := 10; r := ROW(a, b);
  isnull := r IS NULL; isnotnull := r IS NOT NULL; RETURN NEXT;
  RETURN;
END;
$$ LANGUAGE plpgsql;
SELECT * FROM test_isnull();

 a  | b  | isnull | isnotnull
----+----+--------+-----------
    |    | t      | f
    | 10 | f      | f
 10 | 10 | f      | t
(3 rows)

Faster execution of PL/pgSQL function with explicit casting

PL/pgSQL execute right part of assignment statement independently on left part. Result of evaluation is checked and converted to destination type every time. This conversion should be little bit slow. So we could improve assignment statement via explicit casting to ensure type equality of left and right parts assignment statement. Use this technique carefully! And only when is it necessary.

create or replace function test1() 
returns int as $$
declare s int := 0; 
begin 
  for i in 1..100000 loop s := 4e3; end loop; -- numeric constant
  return s; 
end; 
$$ language plpgsql immutable;

create or replace function test2() 
returns int as $$
declare s int := 0; 
begin 
  for i in 1..100000 loop s := 4e3::int; end loop; -- int 
  return s; 
end; 
$$ language plpgsql immutable;

postgres=# select test1();
 test1 
-------
  4000
(1 row)

Time: 176,623 ms

postgres=# select test2();
 test2 
-------
  4000
(1 row)

Time: 47,673 ms

Same problem is implicit conversion in return statement:

create or replace function test1() returns int as $$begin return 4e1; end; $$ language plpgsql; 
create or replace function test2() returns int as $$begin return 4e1::int; end; $$ language plpgsql;

postgres=# select count(test1()) from generate_series(1,100000);
 count  
--------nebo v konverzi návratové hodnoty
 100000
(1 row)

Time: 682,005 ms
postgres=# select count(test2()) from generate_series(1,100000);
 count  
--------
 100000
(1 row)

Time: 528,099 ms

Protection from parallel run of some PL function

Author: Pavel Stěhule

Theoretically we should to detect an run of any function - but these information are not accessible from user interface (without C coding). We should to look to list of processed queries or we should to use advisory locks. Test based on select from pg_stat_activity view is useful only for function directly called by user. Functions called from other functions are invisible.

create or replace function long_run() 
returns void as $$
begin 
  if exists(select procpid 
               from pg_stat_activity 
              where procpid <> pg_backend_pid() 
                    and current_query like '%long_run(%') 
  then 
    raise notice 'procedure is running'; 
    return; 
  end if; 
  perform pg_sleep(10); 
end$$ 
language plpgsql volatile;
CREATE FUNCTION

Next possibility is using advisory locks http://www.postgresql.org/docs/8.3/interactive/functions-admin.html#FUNCTIONS-ADVISORY-LOCKS.

create or replace function long_run()
returns void as $$
declare foid oid:= 'long_run'::regproc::oid;
begin
  if pg_try_advisory_lock(foid::bigint) then
    perform pg_sleep(10); 
    perform pg_advisory_unlock(foid::bigint);   
  else
    raise notice 'procedure is running';
  end if;
  return;
end;
$$ language plpgsql;

Using scientific format

PostgreSQL doesn't support EEEE format in to_char function still (there is a patch for 8.5). We should to use external plperl function:

create or replace function floating_format(double precision, int) 
  returns varchar as $$
   sprintf("%.$_[1]e", $_[0])
$$ language plperl;

create or replace function floating_format(double precision) 
  returns varchar as $$
   sprintf("%e", $_[0])
$$ language plperl;

postgres=# select floating_format(10.223);
 floating_format 
-----------------
 1.022300e+01
(1 row)

postgres=# select floating_format(10.223,2);
 floating_format 
-----------------
 1.02e+01
(1 row)

postgres=# select floating_format(10.223,4);
 floating_format 
-----------------
 1.0223e+01
(1 row)

Terminal's configuration

I using two interfaces for access to PostgreSQL - emacs and psql. These application should be configured for more comfortable work:

  • emacs (.emacs)
(setq-default truncate-lines t)
(ansi-color-for-comint-mode-on)
(setq lazy-lock-defer-on-scrolling t)
(setq inhibit-startup-echo-area-message t)
(setq inhibit-startup-message t)
(show-paren-mode t)
(setq show-paren-style 'mixed)
(fset 'yes-or-no-p 'y-or-n-p)
(global-set-key "\M-g" 'goto-line)
(setq sql-database "postgres")  ;; name of most often used database
;; don't use pager, empty title
(setq sql-postgres-options '("-P" "pager=off" "-P" "title= ")) 
;; interesting idea is setting of start of every new command on new line
;; (setq sql-postgres-options '("-P" "pager=off" "-P" "title= " "-v" "PROMPT1=%/.%n%#\n" "-v" "PROMPT2=" "-v" "PROMPT3="))
  • psql
export PAGER="less -RSX"
psql ... -P pager=always

Moving tables from one schema to second schema

We have to use plpgsql function:
create or replace function mvtable(src_schema varchar, dest_schema varchar, mask varchar)
returns void as $$
declare r record;
begin
  for r in
          select *
             from information_schema.tables
            where table_schema = src_schema and table_name like mask
  loop
    execute 'alter table ' || quote_ident(r.table_schema)
                   || '.' || quote_ident(r.table_name)
                   || ' set schema ' || quote_ident(dest_schema);
  end loop;
  return;
end;
$$ language plpgsql;

Encoding number to any numeral system with CTE

Autor: Pavel Stehule

postgres=# create function to_base(num int, base int = 10) 
           returns varchar as $$
             with recursive z as (
                                  select $1 as a, null::int as b, 0 as i 
                                  union all 
                                  select a/$2, a % $2, i+1 
                                     from z 
                                  where a > 0
                                 ) 
                select array_to_string(array(select substring('0123456789abcdef' from b + 1 for 1) 
                   from z 
                  where i > 0  
                 order by i desc),'');
           $$ language sql immutable;

postgres=# select to_base(255);
 to_base 
---------
 255
(1 row)

postgres=# select to_base(255,16);
 to_base 
---------
 ff
(1 row)

postgres=# select to_base(255,8);
 to_base 
---------
 377
(1 row)

postgres=# select to_base(255,2);
 to_base  
----------
 11111111
(1 row)

Get default values from table definition

One typical query is getting the default values for a record. Since these definitions are stored in the system catalogs (tables), they are accessible.

CREATE OR REPLACE FUNCTION eval(varchar) 
RETURNS varchar AS $$
DECLARE result varchar;
BEGIN
 EXECUTE 'SELECT ' || $1 INTO result;
 RETURN result;
END;$$ LANGUAGE plpgsql STRICT;

CREATE OR REPLACE FUNCTION defaults(text, 
                                    OUT attname name, OUT type varchar, OUT default_val varchar)
RETURNS SETOF RECORD AS $$
SELECT a.attname,
       pg_catalog.format_type(a.atttypid, a.atttypmod),
       (SELECT eval(pg_catalog.pg_get_expr(d.adbin, d.adrelid))
           FROM pg_catalog.pg_attrdef d
          WHERE d.adrelid = a.attrelid AND d.adnum = a.attnum AND a.atthasdef)
   FROM pg_catalog.pg_attribute a
  WHERE a.attrelid = $1::regclass::oid AND a.attnum > 0 AND NOT a.attisdropped
  ORDER BY a.attnum
$$ LANGUAGE SQL STRICT;

Second solution, but not always usable is using RETURNING clause.

Function to_bin

PostgreSQL doesn't provide to_bin function. Implementation (based on casting to bit type) is simple and fast:

create or replace function to_bin(int) 
returns varchar as $$
declare r varchar; 
begin 
  execute 'select ' || $1 || '::bit(' || case when $1 = 0 then 1 else trunc(log(2, $1)) + 1 end || ')' into r; 
  return r; 
end;
$$ language plpgsql immutable strict;

create or replace function bin(varchar) 
returns varchar as $$
declare r varchar; 
begin 
  execute 'select b''' || $1 || '''::int' into r; 
  return r; 
end;
$$ language plpgsql immutable strict;

postgres=# select to_bin(5), bin('101');
 to_bin | bin 
--------+-----
 101    | 5
(1 row)

SQL code for hierarchical break-down of domains

based on Depesc's ideas from http://www.depesz.com/index.php/2008/12/01/getting-list-of-most-common-domains/#comment-27066

postgres=# create or replace function domain_list(varchar[]) 
           returns setof varchar as $$
              select array_to_string($1,’.') 
              union all 
              select domain_list($1[2:array_upper($1,1)]) 
                 where array_upper($1,1) > 1 
           $$ language sql immutable strict;
CREATE FUNCTION
postgres=# select domain_list(array['a','b','c','d']);
domain_list
————-
a.b.c.d
b.c.d
c.d
d
(4 rows)

postgres=# create or replace function domain_list(varchar) 
           returns setof varchar as $$ 
              select domain_list(string_to_array($1, ‘.’))
           $$ language sql immutable strict;
CREATE FUNCTION
postgres=# select domain_list(’a.b.c.d’);
domain_list
————-
a.b.c.d
b.c.d
c.d
d

Solution for migration to 8.3 (problem with tsearch2)

PostgreSQL documentation has a section with some advice that should to help with migration of older application (based on TSearch2) to PostgreSQL 8.3 and higher (with integrated fulltext). This works well when you migrate databases individually, but it fails when you use pg_dump. As documented, you should do some pre steps - import TSearch compatible module into template1. But complete dump generates databases from template0 (you can't to change it). So you cannot load this dump, and you have to edit dump (advice - dump structure and data to two different files).

so, when you find bug:

ERROR:  type "tsvector" already exists

You have to add import of tsearch2.sql to every creation database part in dump.

DROP DATABASE brigady;
CREATE DATABASE brigady WITH TEMPLATE = template0  ENCODING = 'UTF8';
--
-- PostgreSQL database dump complete
--

\connect brigady
-- new line
\i /usr/local/pgsql/share/contrib/tsearch2.sql

SET search_path = public, pg_catalog;
...

Expect SQL functions to be really fast

Much faster than PL/pgSQL - especially when they are wrappers around integrated features (in this example - integrated qsort). PL/pgSQL is usually fast to interpret - I know only about two slow operations - array update and string update. It's based on the internal architecture - every change generate a new object - it isn't significant for small objects (less than 1000), but for large objects this operations needs significant time. However, we can go a different way, using integrated functionality (this way is usually the fastest in any embedded language).

PostgreSQL use internal qsort -so, I wrote qsort in PL/pgSQL too (results will be comparable):

CREATE OR REPLACE FUNCTION quicksort(l integer, r integer, a int[])
RETURNS int[] AS $$
DECLARE akt int[] = a;
  i integer := l; j integer := r; x integer = akt[(l+r) / 2]; 
  w integer;
BEGIN
  LOOP
    WHILE akt[i] < x LOOP i := i + 1; END LOOP;
    WHILE x < akt[j] loop j := j - 1; END LOOP;
    IF i <= j THEN
      w := akt[i];
      akt[i] := akt[j]; akt[j] := w;
      i := i + 1; j := j - 1;
    END IF;
    EXIT WHEN i > j;
  END LOOP;
  IF l < j THEN akt := quicksort(l,j,akt); END IF;
  IF i < r then akt := quicksort(i,r,akt); END IF;
  RETURN akt;
END;
$$ LANGUAGE plpgsql STRICT IMMUTABLE;

postgres=# SELECT array_upper(quicksort(1,10000,array_agg(a)),1) FROM test;
 array_upper
-------------
       10000
(1 row)
Time: 5918,531 ms

Then I wrote same function (note. some 8.4 features are used) in SQL:

CREATE OR REPLACE FUNCTION sort(anyarray) 
RETURNS anyarray AS $$
  SELECT array(SELECT * FROM unnest($1) ORDER BY 1); 
$$ language sql;

postgres=# SELECT array_upper(sort(array_agg(a)),1) FROM test;
 array_upper
-------------
       10000
(1 row)
Time: 35,980 ms

Nota bene - The SQL function is about 150 times faster than the PL/pgSQL function. This isn't surprising. Integrated sorting is well optimized and doesn't do a lot of large memory operations. For information: I tested bublesort too, and I stopped its execution after five minutes.

subscript shift to zero

Implementation of these features is simple in C and it's little bit slow in PL/pgSQL (based on iterated shift). However, we can use the SQL parser capability to recognize an array range:

CREATE OR REPLACE FUNCTION shift_idx(anyarray) 
RETURNS anyarray AS $$
DECLARE 
  mi int := array_upper($1,1); 
  offset int := array_lower($1,1); 
BEGIN
  RETURN '[0:' || mi - offset || '] = ' || $1::varchar; 
END 
$$ LANGUAGE plpgsql IMMUTABLE STRICT;
CREATE FUNCTION

postgres=# SELECT shift_idx(array[1,2,3]);
   shift_idx   
---------------
 [0:2]={1,2,3}
(1 row)

Author: Pavel Stěhule

Record evolution

When we have functions that return a record but cannot use derived tables, we must use following syntax:

CREATE FUNCTION foo(OUT a int, OUT b int, IN c int)
RETURNS record AS $$
BEGIN
  a := c + 1; b := c + 2;
  RETURN;
END;
$$ LANGUAGE plpgsql;

CREATE TABLE t(c int);
INSERT INTO t VALUES(10),(20);

postgres=# SELECT c, foo(c) FROM t;
 c  |   foo   
----+---------
 10 | (11,12)
 20 | (21,22)
(2 rows)

postgres=# SELECT c, (foo(c)).* FROM t;
 c  | a  | b  
----+----+----
 10 | 11 | 12
 20 | 21 | 22
(2 rows)

Conversion between hex and dec numbers

source: http://www.varlena.com/GeneralBits/104.php.

=# select to_hex(11);
    to_hex
   --------
    b
   (1 row)
postgres=# create or replace function to_dec(text) 
           returns integer as $$
           declare r int; 
           begin 
             execute E'select x\''||$1|| E'\'::integer' into r; 
             return r; 
           end
           $$ language plpgsql;
CREATE FUNCTION
postgres=# select to_dec('ff');
 to_dec 
--------
    255
(1 row)

Faster variant (Michael Glaesemann):

CREATE FUNCTION hex2dec(in_hex TEXT)
RETURNS INT
IMMUTABLE STRICT LANGUAGE sql AS $body$
  SELECT CAST(CAST(('x' || CAST($1 AS text)) AS bit(8)) AS INT);
$body$;

Variant that is able to handle arbitrary length input (output as varchar, adapt if needed):

CREATE OR REPLACE FUNCTION hex2dec (varchar)
RETURNS varchar as 
$$
DECLARE 
	_x numeric;
	_count int;
	_digit int;
BEGIN
	_x := 0;
	for _count in 1..length($1) loop 
		EXECUTE E'SELECT x\''||substring($1 from _count for 1)|| E'\'::integer' INTO _digit;
		_x := _x * 16 + _digit ;
	end loop;
	return _x::varchar;
end
;
$$ language plpgsql immutable
;

Fast interval (of time or ip addresses) searching with spatial indexes

Searching intervals is usually slow, because the optimizer don't use an index. The reason lies in the dependency between the start and end columns. One solution is based on spatial indexes: it allows working with two dependent values as if they were a single value:

postgres=# EXPLAIN ANALYZE SELECT * FROM testip WHERE 19999999 BETWEEN startip AND endip;
                           QUERY PLAN
----------------------------------------------------------------
 Seq Scan on testip  (cost=0.00..19902.00 rows=200814 width=12) (actual time=3.457..434.218 rows=1 loops=1)
   Filter: ((19999999 >= startip) AND (19999999 <= endip))
 Total runtime: 434.299 ms
(3 rows)

Time: 435,865 ms

postgres=# CREATE INDEX ggg ON testip USING gist ((box(point(startip,startip),point(endip,endip))) box_ops);
CREATE INDEX
Time: 75530,079 ms
postgres=# EXPLAIN ANALYZE 
              SELECT * 
                 FROM testip 
                WHERE box(point(startip,startip),point(endip,endip)) @> box(point (19999999,19999999), point(19999999,19999999));
                                                                                                QUERY PLAN                                                            
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on testip  (cost=60.50..2550.14 rows=1000 width=12) (actual time=0.169..0.172 rows=1 loops=1)
   Recheck Cond: (box(point((startip)::double precision, (startip)::double precision), point((endip)::double precision, (endip)::double precision)) @> '(19999999,19999999),(19999999,19999999)'::box)
   ->  Bitmap Index Scan on ggg  (cost=0.00..60.25 rows=1000 width=0) (actual time=0.152..0.152 rows=1 loops=1)
         Index Cond: (box(point((startip)::double precision, (startip)::double precision), point((endip)::double precision, (endip)::double precision)) @> '(19999999,19999999),(19999999,19999999)'::box)
 Total runtime: 0.285 ms
(5 rows)

Time: 2,805 ms

Short form for insert ROW variables to table

postgres=# CREATE TABLE foo(a integer, b integer);
CREATE TABLE
postgres=# CREATE OR REPLACE FUNCTION fx() 
           RETURNS void as $$
             DECLARE r foo; 
           BEGIN 
             SELECT INTO r * FROM foo; 
             INSERT INTO foo VALUES(r.*); 
             RETURN; 
           END; 
           $$ LANGUAGE plpgsql;
CREATE FUNCTION
postgres=# SELECT fx();
 fx 
----
 
(1 row)

ALTER TABLE ALTER COLUMN USING

I hadn't known about the USING clause in ALTER TABLE ALTER COLUMN. I thought the old type had to be castable to the new type. But it's not true. Czech boolean literals cannot be cast directly to PostgreSQL's bool type, but, with the USING clause, I can do an ALTER COLUMN TYPE from varchar to bool:

postgres=# CREATE TABLE foo(a varchar);
CREATE TABLE
postgres=# INSERT INTO foo VALUES ('ano'); -- czech yes
INSERT 0 1
postgres=# ALTER TABLE foo ALTER COLUMN a TYPE boolean ;
ERROR:  column "a" cannot be cast to type "pg_catalog.bool"
postgres=# ALTER TABLE foo 
              ALTER COLUMN a TYPE boolean 
             USING CASE a 
                       WHEN 'ano' THEN true 
                       ELSE false END;
ALTER TABLE
postgres=# SELECT * FROM foo;
 a 
---
 t
(1 row)

Quote_ident for schema.name

Using single quotes is one protection against SQL injection. The function quote_ident checks its argument and, when the argument contains problematic chars, it returns the argument surrounded by single quotes. It's quite easy and useful. Problem is with schema.name pair, because the dot is a watched char:

postgres=# select quote_ident('public.foo');
 quote_ident  
--------------
 "public.foo"
(1 row)

postgres=# select * from "public.foo";
ERROR:  relation "public.foo" does not exist
postgres=# select * from public."foo";
 a 
---
(0 rows)

postgres=# select * from "public"."foo";
 a 
---
(0 rows)

We can get around this limitation with custom functions:

CREATE OR REPLACE FUNCTION quote_array(text[]) 
RETURNS text AS $$
SELECT array_to_string(array(SELECT quote_ident($1[i]) 
                                FROM generate_series(1, array_upper($1,1)) g(i)),
                       '.') 
$$ LANGUAGE SQL IMMUTABLE;

CREATE OR REPLACE FUNCTION quote_schema_ident(text) 
RETURNS text AS $$
SELECT quote_array(string_to_array($1,'.'))
$$ LANGUAGE SQL IMMUTABLE;

postgres=# select quote_schema_ident('public.foo tab');
 quote_schema_ident 
--------------------
 public."foo tab"
(1 row)

Autor: Pavel Stehule.

Fast searching of longer prefix

Searching longest prefix is common task - mainly in communications. We search for the longest prefix for some number from from set of prefixes. I solved this task some years ago with stored procedure and modified method of halving of interval. Maybe little bit slower but simple and robust method was described by Hubert Lubaciewski (Depesz).

--table contains about 5000 prefixes
postgres=# select * from prefixesx limit 5;
   costcode_name    | costcode 
--------------------+----------
 Afghanistan        | 93
 Afghanistan Mobile | 9370
 Afghanistan Mobile | 9379
 Afghanistan Mobile | 9380
 Alaska (USA)       | 1907
(5 rows)

Simple but slow solution with LIKE:

postgres=# SELECT * 
              FROM prefixesx 
             WHERE '420724181000' LIKE costcode || '%' 
             ORDER BY length(costcode) DESC 
             LIMIT 1;
          costcode_name          | costcode 
---------------------------------+----------
 Czech Republic Mobile - EuroTel | 42072

Fast solution is generating all possible prefixes from number and search this prefixes:

CREATE OR REPLACE FUNCTION prefixes(varchar) 
RETURNS varchar[] AS $$
SELECT ARRAY(SELECT substring($1 FROM 1 FOR i) 
                FROM generate_series(1, length($1)) g(i))::varchar[]; 
$$ LANGUAGE sql IMMUTABLE;

and use query:

postgres=# SELECT * 
              FROM prefixesx 
             WHERE costcode = ANY (prefixes('420724191000')) 
             ORDER BY length(costcode) DESC 
             LIMIT 1;
          costcode_name          | costcode 
---------------------------------+----------
 Czech Republic Mobile - EuroTel | 42072

On 5000 rows is this query 4 times faster than LIKE.

Little bit slower is query with LIKE and LIKE firstchar || %

SELECT * 
   FROM prefixesx
  WHERE '420724181000' LIKE costcode || '%'
    AND costcode LIKE substring('420724191000' FROM 1 FOR 1) || '%'  
  ORDER BY length(costcode) DESC 
  LIMIT 1;

Get count of substrings in string

This function is based on popular trick - using replace and length function without cycle, that is typical for interprets where built-in functions are faster than iteration:

CREATE OR REPLACE FUNCTION CountInString(text,text)
RETURNS integer AS $$
 SELECT(Length($1) - Length(REPLACE($1, $2, ''))) / Length($2) ;
$$ LANGUAGE SQL IMMUTABLE;

Sent by Rodrigo E. De León Plicet. Another my (array based) solution:

CREATE OR REPLACE FUNCTION CountInString(text, text) 
RETURNS integer AS $$
  SELECT Array_upper(String_to_array($1,$2),1) - 1;
$$ LANGUAGE SQL IMMUTABLE;

Timestamp recalculation for different timezone and its indexing

The simplest specification of time at some timezone is using AT TIME ZONE operator. We can't indexing timestamp with timezone, because we can change current timezone, so we can create index only for timestamp at some specified timezone. Source: pgsql-general, Tom Lane

CREATE INDEX created_tz_paris ON data((created_on AT TIME ZONE 'Europe/Paris'));

Using IMMUTABLE functions as hints for the optimizer

For highly specific data distributions, we can have a problem with prediction and generation of suboptimal execution plans. These problems we can solve by breaking a SQL query into more dependent queries and evaluating step by step. The following queries use real results (not estimations) precedent queries, and queries are executed optimally. Tom Lane's proposal is based on using IMMUTABLE functions that are evaluated before generating an execution plan:

CREATE OR REPLACE FUNCTION getTypesLieuFromTheme(codeTheme text) 
RETURNS text[] AS $f$
SELECT ARRAY(SELECT codetylieu::text FROM rubtylieu WHERE codeth = $1);
$f$ LANGUAGE SQL IMMUTABLE;

in query
SELECT ...
  WHERE ... AND gl.codetylieu = ANY(getTypesLieuFromTheme('RES'))

look execution plan doesn't contain function call and query to table rubtylieu too
->  Bitmap Heap Scan on genrelieu gl (cost=2.23..6.22 rows=88 width=4) (actual time=0.022..0.075 rows=88 loops=1)
       Recheck Cond: ((codetylieu)::text = ANY ('{RES}'::text[]))
       ->  Bitmap Index Scan on ind_genrelieu2 (cost=0.00..2.21 rows=88 width=0) (actual time=0.016..0.016 rows=88 loops=1)
                Index Cond: ((codetylieu)::text = ANY ('{RES}'::text[]))

Use this trick as last step. Test, please, by increasing the statistic number beforehand.

Autonomous transaction in PostgreSQL

Autonomous transaction is usable feature mainly for logging. This feature isn't supported in PostgreSQL yet. With untrusted languages we are able to emulate it. You can see solution by Jon Roberts.

CREATE OR REPLACE FUNCTION fn_log_error(p_function varchar, p_location int, p_error varchar) 
RETURNS void AS $$
DECLARE
  v_sql varchar;
  v_return varchar;
  v_error varchar;
BEGIN
  PERFORM dblink_connect('connection_name', 'dbname=...');

  v_sql := 'INSERT INTO error_log (function_name, location, error_message, error_time) '
           || 'VALUES (''' || p_function_name || ''', ' 
           || p_location || ', ''' || p_error || ''', clock_timestamp())';
  SELECT INTO v_return * 
     FROM dblink_exec('connection_name', v_sql, false);

  --get the error message
  SELECT INTO v_error * 
     FROM dblink_error_message('connection_name');

  IF position('ERROR' in v_error) > 0 OR position('WARNING' in v_error) > 0 THEN
    RAISE EXCEPTION '%', v_error;
  END IF;

  PERFORM dblink_disconnect('connection_name');
EXCEPTION
  WHEN others THEN
    PERFORM dblink_disconnect('connection_name');
    RAISE EXCEPTION '(%)', SQLERRM;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;

Optimization of repetitive assignment to varchar variable with arrays

plpgsql isn't a good language for intensive non SQL operations. It's best as glue for SQL statements or for prototyping. But sometimes plperl or plpython can be significantly faster. Plpgsql does not like cumulative iterations over varchar or array variables. When we can't use Perl, .. we could use SQL:

CREATE OR REPLACE FUNCTION SlowList(int) -- slow function, usable for N <= 100
RETURNS varchar AS $$
DECLARE s varchar = '';
BEGIN
  FOR i IN 1..$1 LOOP
    s := '<item>' || i || '</item>';  -- slow is s := s || ..
  END LOOP;
  RETURN s;
END; $$ LANGUAGE plpgsql IMMUTABLE;

CREATE OR REPLACE FUNCTON FastList(int) -- fast function
RETURNS varchar AS $$
BEGIN
  RETURN array_to_string(ARRAY(SELECT '<item>' || i || '</item>'
                                  FROM generate_series(1, $1) g(i)),
                         '');
END; $$ LANGUAGE plpgsql IMMUTABLE;

For N < 100 execution time is less then 4ms (so isn't any reason for dirty trick). For bigger N is execution time different (SlowList(10000) - 4000ms, FastList(10000) - 52ms). Author - Pavel Stěhule.

Use IS DISTINCT FROM without COALESCE

Sim Zacks reported slow queries after migration to 8.2.4. This version has problem with prediction of result of expression COALESCE(column, false) = false, and then execution plan is suboptimal. After correction PostgreSQL choose optimal execution plan.

SELECT ...
   FROM some_tab
        LEFT JOIN
        some_tab2
        ON sometab2.col IS DISTINCT FROM true; --> join false or NULL 

So don't use COALESCE in WHERE clause

Next week somebody reported similar problem:

-- execution plan is suboptimal
((COALESCE((at_type)::integer, 1) = 1) AND (COALESCE(at_language, 0::numeric) = 0::numeric))

-- execution plan is optimal
(at_type = 1 or at_type is null) AND (at_language = 0 or at_language is null)

Port to Oracle, problems with to_date function

Peter Eisentraut reported incompatibility to_date function between PostreSQL and Oracle.

SQL> select to_date('31-DEC-200700:00:00', 'dd-mon-yyyy hh24:mi:ss') from
dual;

TO_DATE('
---------
31-DEC-07

On PostgreSQL:

select to_date('31-DEC-200700:00:00', 'dd-mon-yyyy hh24:mi:ss');
  to_date
--------------
 200700-12-31

Oracle ignores spaces in format string, but PostgreSQL requires exact format. This behavior can be changed in custom function that drops spaces (Author: Jon Roberts):

CREATE OR REPLACE FUNCTION fn_to_date(p_date varchar, p_format varchar)
RETURNS timestamp AS $$
 SELECT to_timestamp(replace($1, ' ', ''), replace($2, ' ', ''));
$$ LANGUAGE SQL STRICT IMMUTABLE;

# select fn_to_date('31-DEC-200700:00:00', 'dd-mon-yyyy hh24:mi:ss');

    fn_to_date
---------------------
 2007-12-31 00:00:00
(1 row)

Identification and correction of wrong chars in PostgreSQL dump

8.2 is stricter than 8.1 for correct chars in dump. Reason is in SQL injection security fix (via incorrect UTF8 chars). This change of behavior can cause problems with migration. Fix is simple. We have to find wrong char and fix it.

$ iconv -f utf-8 -t utf-8 dump.sql > /dev/null
iconv: illegal input sequence at position 1000
$ head -c 1010 dump.sql | tail -c 20 | od -c
0000000 h . B u t i s n 222 t i t
0000020 h i s ?
0000024

Source: http://lca2007.linux.org.au/att_data/Miniconfs(2f)PostgreSQL/attachments/professional.pdf

Be carefull when using the NOT IN operator with NULL

Some SQL developer dislike NULL. Other like it. I like it, too. NULL is necessary and saves a lot of work. NULL in a result expression clearly signals some problem in the input data. People who dislike NULL don't understand it and use the following argument, citing an "anomaly":

postgres=# SELECT * 
              FROM (VALUES(10),(20),(30)) v(a) 
             WHERE a IN (10, 20, NULL); --> all is ok
 a  
----
 10
 20
(2 rows)

postgres=# SELECT * 
              FROM (VALUES(10),(20),(30)) v(a) 
             WHERE a NOT IN (10, 20, NULL);
 a 
---
(0 rows) --> anomaly, expects 30

But it isn't an anomaly. The NOT IN operator checks each item of a list, because the left-hand value must be different than all of them. Since we can never have a comparison with NULL that results in anything but NULL, no value can fulfill this predicate.

Fast specification of first rows of some table field

My task is specific. I have address book with 100 000 items and I have to get all first chars of surnames. This task is one from a few tasks, where correlated subquery is faster than everything else. Because seq scan and agg function need scan all 100 000 records, but correlated subquery need only 32 access to functional index:

SELECT * 
   FROM (
         SELECT substring('abcdefghijklmnopqrstuvwxyzěščřžýáíé' 
                          FROM i For 1) 
            FROM generate_series(1,32) g(i)
        ) p(onechar) 
  WHERE EXISTS(
               SELECT 1 
                  FROM address_book 
                 WHERE substring(surname from 1 for 1) = p.onechar  
                   and active
              );

Version for support czech char CH (and some others: úůňľó):

   SELECT * 
   FROM
     ( 
         SELECT substring('abcdefghijklmnopqrstuvwxyzěščřžýáíéúůňľó' FROM i For 1) FROM generate_series(1,40) g(i) UNION SELECT 'ch' 
     ) p(onechar) 
   WHERE
     EXISTS( SELECT 1 FROM a_osoba WHERE lower(substring(prijmeni from 1 for char_length(p.onechar))) = p.onechar );

An employee with highest compensation

It's typical school query: select the highest-paid employees in some group. This query can be solved with derivated table or with correlated subquery. We could use the nonstandard clause DISTINCT ON in PostgreSQL:

postgres=# SELECT * FROM employees ;
   name    | surname   | department | payment  
-----------+-----------+-==---------+-------
 Pavel     | Stehule   |          1 | 10000
 Zdenek    | Stehule   |          1 |  9000
 Vladimira | Stehulova |          2 |  9000
(3 rows)

postgres=# SELECT DISTINCT ON department * 
              FROM employees 
             ORDER BY department, payment DESC;
   name    | surname   | department | payment  
-----------+-----------+------------+-------
 Pavel     | Stehule   |          1 | 10000
 Vladimira | Stehulova |          2 |  9000
(2 rows)

Attention, DISTINCT ON isn't portable. A big disadvantage of this solution is that it shows only one person from every department, even when more employees has the same, highest, payment, yielding an incomplete result.

Any other session variables

PostgreSQL doesn't support server session variables. This mean, so we have to write stored procedures more often, because there are variables. Some substitution is module variables. These variables has to allowed in configuration. Sent by Andreas Kretschmer:


- define in your postgresql.conf:
 custom_variable_classes = 'myvar'

- use within psql:

test=# set myvar.benutzer = 'foo';
SET
test=*# select * from foo;
 id | name
----+------
 1 | foo
 2 | bar
(2 rows)

test=*# select * from foo where name=current_setting('myvar.benutzer');
 id | name
----+------
 1 | foo
(1 row)

test=*# set myvar.benutzer = 'none';
SET
test=*# select * from foo where name=current_setting('myvar.benutzer');
 id | name
----+------
(0 rows)

xpath function indexing

8.3 has integrated xpath function. There is gap in XML support, because XML type isn't supported with GIST or GIN index. So xpath function returns array of xml values. But we can write custom casting to int array:

CREATE OR REPLACE FUNCTION xml_list_to_int(xml[])
RETURNS int[] AS $$
SELECT ARRAY(SELECT to_number(($1[i])::text,'999999.99')::int
                FROM generate_series(1, array_upper($1,1)) g(i))
$$ LANGUAGE SQL IMMUTABLE;

CREATE CAST (xml[] AS int[]) WITH FUNCTION xml_list_to_int(xml[]);

-- array of integers are supported with GIST
CREATE INDEX fx ON foo USING
GIN((xpath('//id/text()',order_in_xml)::int[]));

psql together with less

psql has paging support, but it isn't comparable with the options offered by less. Here is example on how to setup psql to use less as the pager. Author: Merlin Moncure

#to profile
export PAGER=less
export LESS="-iMSx4 -FX"

#to .psqlrc
\timing
\pset pager always

Intersection of arrays

Very nice trick by David Fetter. It's based well known trick (universal sort of array):

CREATE OR REPLACE FUNCTION array_intersect(ANYARRAY, ANYARRAY)
RETURNS ANYARRAY
LANGUAGE SQL
AS $$
  SELECT ARRAY(
               SELECT $1[i] AS "the_intersection"
                  FROM generate_series(
                                       array_lower($1,1),
                                       array_upper($1,1)
                                      ) AS i
               INTERSECT
               SELECT $2[j] AS "the_intersection"
                  FROM generate_series(
                                       array_lower($2,1),
                                       array_upper($2,1)
                                      ) AS j
              );
$$;

An alternative and possibly faster way:

CREATE OR REPLACE FUNCTION array_intersect(ANYARRAY, ANYARRAY)
RETURNS ANYARRAY
LANGUAGE SQL
AS $$
  SELECT ARRAY(SELECT UNNEST($1) INTERSECT SELECT UNNEST($2))
$$;

VOID function in SQL language

SQL functions can contain only SQL statements. The results of the last statements are results of the functions. The type of a result have to be compatible with the function's type. Because VOID is fictive type, no statement can return this type with one exception. We can cast NULL to VOID.

note: void returning functions are not allowed to be used through the binary protocol

CREATE OR REPLACE FUNCTION none() 
RETURNS VOID AS $$ SELECT NULL::VOID; $$
LANGUAGE SQL;

Fast select of next row

For older application (ISAM appliaction works on series of rows) we have be able to search next row specified with some combination of values (a1, b1, c1). Traditional solution:

SELECT * 
   FROM data
  WHERE (a > a1)
     OR (a = a1 AND b > b1)
     OR (a = a1 AND b = b1 AND c > c1)
  ORDER BY a, b, c
  LIMIT 1;

Merlin Moncure noted simplified form that use row equation. This form is more readable and allows using multicolumn index:

SELECT *
   FROM data
  WHERE (a, b, c) > (a1, b1, c1)
  ORDER BY a, b, c
  LIMIT 1; 

ISAM access downgrade SQL serer performance. For effective SQL we have to operate over sets, but when we cannot rewrite application, we have not choice.

Slow queries in stored procedures

Sometimes some users reports slow queries in stored procedure. Direct execution of same query is fast. Reason is simple. Stored procedures uses prepared statements and optimalization of prepared statements runs without knowledge of real parameters. This behave ensure safety against sql injection, but carry some problems, planner is blind. This problem isn't too often and solution is simply - we have to use dynamic queries. But we cannot to forget carefully check against sql injection. This kind of problems we can find with analyzing of prepared statements.

t2=# PREPARE pp(integer) AS SELECT count(*) FROM foo WHERE a BETWEEN $1 + 100 AND $1 + 200; 
t2=# EXPLAIN ANALYZE execute pp(100);
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=438.60..438.61 rows=1 width=0) (actual time=16.149..16.150 rows=1 loops=1)
   ->  Bitmap Heap Scan on foo  (cost=13.39..437.35 rows=500 width=0) (actual time=1.391..5.458 rows=1033 loops=1)
         Recheck Cond: ((a >= ($1 + 100)) AND (a <= ($1 + 200)))
         ->  Bitmap Index Scan on fx  (cost=0.00..13.26 rows=500 width=0) (actual time=1.131..1.131 rows=1033 loops=1)
               Index Cond: ((a >= ($1 + 100)) AND (a <= ($1 + 200)))
 Total runtime: 16.340 ms
(6 rows)
t2=# EXPLAIN ANALYZE SELECT count(*) FROM foo WHERE a BETWEEN 100 + 100 AND 100 + 200; 
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=432.21..432.22 rows=1 width=0) (actual time=15.930..15.932 rows=1 loops=1)
   ->  Bitmap Heap Scan on foo  (cost=22.21..429.78 rows=971 width=0) (actual time=1.179..13.338 rows=1033 loops=1)
         Recheck Cond: ((a >= 200) AND (a <= 300))
         ->  Bitmap Index Scan on fx  (cost=0.00..21.97 rows=971 width=0) (actual time=0.917..0.917 rows=1033 loops=1)
               Index Cond: ((a >= 200) AND (a <= 300))
 Total runtime: 16.058 ms
(6 rows)

Execution plans are nearly identical, so we can use prepared statements.

Slow DISTINCT

Current implementation of DISTINCT needs sort, that can be slow. We should to use GROUP BY clause - this clause can to use hash aggregate - and it's significantly faster than DISTINCT.

postgres=# select count(*) from (select distinct i from g) a;
 count 
-------
 10001
(1 row)

Time: 1563,109 ms

postgres=# select count(*) from (select i from g group by i) a;
 count 
-------
 10001
(1 row)

Time: 594,481 ms

This topis is related to PostgreSQL 8.2 and older. PostgreSQL 8.3 and higher can to use a hash aggregation for distinct.

Taking quarter from interval

Interval arithmetic is little bit alchemy. Internally type interval carries days, months and years. Input value isn't normalized, it stays in entered units (i.e. 200 days are stored as 0 years, 0 months and 200 days). Some functions can't to work well with similar entered values (etc. EXTRACT QUARTER returns first quarter). We have to do normalization first with justify_interval function (so interval 200 days will be stored as 0 years, 6 months and 20 days). From this value we take correct information about quarter:

postgres=# SELECT interval '200 days', EXTRACT(QUARTER FROM interval '300 days');
 interval | date_part 
----------+-----------
 200 days |         1
(1 row)
postgres=# SELECT justify_interval(interval '200 days'),
                  EXTRACT(QUARTER FROM justify_interval('200 days'));
 justify_interval | date_part 
------------------+-----------
 6 mons 20 days   |         3
(1 row)

Author: Tom Lane from pgsql-bugs

Select random row

Hubert Lubaczewski showed in his [blog] problem with select random row from table. When I did one database audit I found typical but slow solution (from FAQ) ORDER BY random() LIMIT 1. This method never uses index and every similar query do: complete scan of table and sort of table. Next possible method WHERE id >= (random()*C+1)::int LIMIT 1 is much faster, but when series has some gaps then prefer first rows after gaps. I used with success method based on using set of random numbers. Disadvantage of this method is risk of returning zero rows (then we have to call statement again):

SELECT id, ...
   FROM data
  WHERE id = ANY(ARRAY(
                       SELECT (random()*max_id)::int 
                          FROM generate_series(1,20)))
  LIMIT 1;
-- max_id is constant equal max(id) from table + 20% reserve
-- id is primary key of table data

Use exception carefully

Trapping of exceptions has some overhead. So don't use it too often. In majority of cases this overhead is insignificant, but that can change when you are using cycles. Next two procedures add new item into table of unique records. If item exists, it returns false, else true.

CREATE OR REPLACE FUNCTION addnew1(integer) 
RETURNS boolean AS $$
BEGIN
  -- test(a) .. primary key
  INSERT INTO test(a) VALUES($1);
  RETURN true;
EXCEPTION WHEN OTHERS
  RETURN false
END; $$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION addnew2(integer)
RETURNS boolean AS $$
BEGIN
  IF NOT EXISTS(SELECT a FROM test WHERE a = $1) THEN
    INSERT INTO test(a) VALUES($1);
    RETURN true;
  END IF;
  RETURN false;
END; $$ LANGUAGE plpgsql;

Procedure addnew1 is about 2 times slower than addnew2, but addnew2 is more robust. Every protected block in plpgsql creates new subtransaction. And in this case time for creating new subtrans + INSERT > SELECT + INSERT.

Fast compare variables NEW and OLD in trigger's body

We can run trigger effectively if trigger's body is executed only when any columns are changed. This test can be simple and fast with new (from 8.2) row operator IS DISTINCT FROM.

IF ROW(NEW.*) IS DISTINCT FROM ROW(OLD.*) THEN
  -- continue only if some is changed
  IF NEW.a1 IS DISTINCT FROM OLD.a1 
     OR NEW.a2 IS DISTINCT FROM OLD.a2 THEN
     -- continue only if important columns are changed
     ...
  END IF;
END IF;
RETURN NEW;

or effectively:

IF ROW(NEW.a1, NEW.a2) IS DISTINCT FROM ROW(OLD.a1, OLD.a2) THEN
  -- important columns are changed
  ...
END IF;
RETURN NEW;

From 8.4 you can compare record values simply:

IF NEW IS DISTINCT FROM OLD THEN
  -- continue only if some is changed
  ...
END IF;
RETURN NEW;

Fast first n rows removing

PostgreSQL doesn't support LIMIT for UPDATE or DELETE statements. Sometimes we would to use it (for queue emulation, ...). Classic solution with IN and subquery isn't too effective:

postgres=# explain analyze delete from del where ctid  in (select ctid from del limit 10 offset 0);
                                                      QUERY PLAN                                                      
----------------------------------------------------------------------------------------------------------------------
 Merge IN Join  (cost=804.79..854.94 rows=10 width=6) (actual time=63.392..63.464 rows=10 loops=1)
   Merge Cond: (public.del.ctid = "IN_subquery".ctid)
   ->  Sort  (cost=804.39..829.39 rows=10000 width=6) (actual time=63.205..63.217 rows=11 loops=1)
         Sort Key: public.del.ctid
         Sort Method:  quicksort  Memory: 646kB
         ->  Seq Scan on del  (cost=0.00..140.00 rows=10000 width=6) (actual time=0.045..27.366 rows=9970 loops=1)
   ->  Sort  (cost=0.41..0.43 rows=10 width=6) (actual time=0.172..0.187 rows=10 loops=1)
         Sort Key: "IN_subquery".ctid
         Sort Method:  quicksort  Memory: 17kB
         ->  Limit  (cost=0.00..0.14 rows=10 width=6) (actual time=0.021..0.127 rows=10 loops=1)
               ->  Seq Scan on del  (cost=0.00..140.00 rows=10000 width=6) (actual time=0.016..0.030 rows=10 loops=1)
 Total runtime: 63.661 ms
(12 rows)

Tom Lane's proposal is based on using of array:

postgres=# explain analyze delete from del where ctid  = any (array(select ctid from del limit 10));
                                                    QUERY PLAN                                                    
------------------------------------------------------------------------------------------------------------------
 Tid Scan on del  (cost=0.14..40.37 rows=10 width=6) (actual time=0.200..0.237 rows=10 loops=1)
   TID Cond: (ctid = ANY ($0))
   InitPlan
     ->  Limit  (cost=0.00..0.14 rows=10 width=6) (actual time=0.046..0.139 rows=10 loops=1)
           ->  Seq Scan on del  (cost=0.00..140.00 rows=10000 width=6) (actual time=0.040..0.065 rows=10 loops=1)
 Total runtime: 0.376 ms

The core of this trick is protection from JOIN, what is not (in this case) effective as seq. scan with filter. This trick is usable in other cases. Generally we should to say, so for small N (N < 100) is predicate = ANY better than predicate IN (for PostgreSQL 8.2 and 8.3). But real efficiency depends on indexes.

Getting list of comma separated items

D. Dante Lorenso sent note about getting list from content of table:

SELECT a.id, a.name,
    array_to_string(ARRAY(
      SELECT b.name
         FROM b
        WHERE b.id = a.id
        ORDER BY b.name ASC
    ), ',') AS b_names
   FROM a
  ORDER BY a.id ASC;

[table a]
 id | name
----+------
1   | one
2   | two
3   | three
4   | four

[table b]
 id | name
----+------
1   | pizza
1   | hot dog
2   | gorilla
2   | monkey
3   | apple
4   | cheese
4   | milk
4   | eggs

--result
 id | name  | b_names
----+-------+---------
1   | one   | pizza,hot dog
2   | two   | gorilla,monkey
3   | three | apple
4   | four  | cheese,milk,eggs

Using functions pgstattuple and pgstatindex

You can find nice contrib package pgstattuple in PostgreSQL 8.1 and higher. This contrib module contains some diagnostics functions for detection of fraction of dead tuples and for detection of index's fragmentation. Using of these functions is more practical with next two table generating functions:

CREATE OR REPLACE FUNCTION print_table_dead_tp(OUT table_name varchar, OUT tuple_count int8, OUT table_len varchar, OUT free_space varchar,
                                               OUT dead_tuple_percent numeric(5,2), OUT dead_tuple_count integer) 
RETURNS SETOF RECORD AS $$
DECLARE r record; s record; 
BEGIN 
  FOR r IN SELECT c.oid, n.nspname || '.' || c.relname as "tablename"
              FROM pg_catalog.pg_class c
                   LEFT JOIN 
                   pg_catalog.pg_namespace n 
                   ON n.oid = c.relnamespace
             WHERE c.relkind = 'r'
               AND n.nspname NOT IN ('pg_catalog', 'pg_toast')
               AND pg_catalog.pg_table_is_visible(c.oid)
             ORDER BY 2
  LOOP
    s := pgstattuple(r.oid);
    table_name := r.tablename;
    tuple_count := s.tuple_count; dead_tuple_percent :=  s.dead_tuple_percent; 
    dead_tuple_count := s.dead_tuple_count;
    table_len :=  pg_size_pretty(s.table_len);
    free_space := pg_size_pretty(s.free_space);
    RETURN NEXT; 
  END LOOP; 
  RETURN;
END; $$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION print_indexes_info(OUT index_name varchar, OUT indexsize varchar, 
                                              OUT leaf_pages integer, OUT deleted_pages integer, 
                                              OUT leaf_fragmentation double precision, OUT avg_leaf_density double precision) 
RETURNS SETOF RECORD AS $$
DECLARE r record; o pgstatindex_type;
BEGIN
  FOR r IN SELECT schemaname || '.' || indexname AS indexname FROM pg_indexes LOOP
    BEGIN
      o := pgstatindex(r.indexname);
      index_name := r.indexname; 
      indexsize := pg_size_pretty(o.index_size); deleted_pages := o.deleted_pages;
      leaf_pages := o.leaf_pages; avg_leaf_density := o.avg_leaf_density;
      leaf_fragmentation := o.leaf_fragmentation; 
  
      RETURN NEXT;
    EXCEPTION WHEN OTHERS THEN
      indexsize := NULL; deleted_pages := NULL; leaf_fragmentation := NULL;
      index_name := r.indexname;
      RETURN NEXT;
    END;
  END LOOP;
  RETURN;
END; $$ LANGUAGE plpgsql;

Diacritic removing

We can simply remove diacritic with calling to_ascii function. There is small problem. This function doesn't support UTF8 encoding, so we have to use convert function.

postgres=# select to_ascii(convert('Příliš žlutý kůň' using utf8_to_iso_8859_2),'latin2');
     to_ascii     
------------------
 Prilis zluty kun
(1 row)

Note for 7.4, name of conversion is "utf_8_to_iso_8859_2". For 8.3 and 9.0 we have to use small workaround:

CREATE FUNCTION to_ascii(bytea, name) 
RETURNS text AS 'to_ascii_encname' LANGUAGE internal;

SELECT to_ascii(convert_to('Příliš žlutý kůň', 'latin2'),'latin2');

Predicate IN optimalization

I found note pg_performance about optimalization possibility of IN predicate when list of values is longer than eighty numbers. For longer list is better create constant subqueries with using multi values:

SELECT * 
   FROM tab 
  WHERE x IN (1,2,3,..n); -- n > 70

-- faster case
SELECT * 
   FROM tab
  WHERE x IN (VALUES(10),(20));

Using VALUES is faster for bigger number of items, so don't use it for small set of values.

BAR code validation

SQL Guru Joe Celko described DBAzinu interesting technique for BAR code validation with multi values.

CREATE FUNCTION barcode_sum(text)
RETURNS bigint AS 
$$
  SELECT ABS(SUM(CAST(SUBSTRING($1 FROM Weights.seq FOR 1) AS INTEGER) * Weights.wgt))
     FROM (VALUES (CAST(1 AS INTEGER), CAST(-1 AS INTEGER)),
                  (2, +1), (3, -1), (4, +1), (5, -1), 
                  (6, +1), (7, -1), (8, +1), (9, -1), (10, +1), 
                  (11,-1), (12, +1)) AS weights(seq, wgt)
$$ LANGUAGE sql;
Our function can be used like:
CREATE TABLE products(
  barcode char(13) NOT NULL
  CONSTRAINT valid_checkdigit
    check (MOD(barcode_sum(barcode),10) = CAST(substring(barcode from 13 for 1) AS integer))
  CONSTRAINT all_numeric_checkdigit CHECK(barcode NOT SIMILAR TO '%[^0-9]%')
);

INSERT INTO products VALUES('2837232811227');

This trick is based on behave of char type. Shorted text than 13 chars is filled with spaces, so test NOT SIMILAR ensure correct number of digits.

Using hash functions for ensuring uniqueness of texts

Classic unique index isn't well solution for longer texts. It's natural using some of hash functions. Output of these functions is 32 char long text. This is hexadecimal number, so we can cut it by casting to bytea type. Attention: hash doesn't ensure 100% uniqueness.

root=# CREATE TABLE test(a text);
CREATE TABLE

root=# CREATE UNIQUE INDEX uidx ON test((decode(md5(a),'hex')));
CREATE INDEX
root=# CREATE UNIQUE INDEX uidx ON test((decode(md5(lower(replace(a,' ',''))),'hex')));
CREATE INDEX
root=# INSERT INTO test VALUES('příliš žluťoučký kůň');
INSERT 0 1
root=# INSERT INTO test VALUES('příliš žluťoučký kůň ');
ERROR:  duplicate key violates unique constraint "uidx"
root=# INSERT INTO test VALUES('Příliš žluťoučký kůň');
ERROR:  duplicate key violates unique constraint "uidx"

-- less risk, using two hash functions
-- CREATE UNIQUE INDEX uidx ON TEST((decode(md5(a),'hex')),(hashtext(a)));

Getting OID of function

Every database object in PostgreSQL has Oid - unique identifier. We can find it in database system's tables or we can get it with casting:

CREATE OR REPLACE FUNCTION a(integer, varchar) RETURNS void AS 
$$
  BEGIN
  END;
$$ LANGUAGE plpgsql;

root=# SELECT 'a'::regproc::int;
ERROR:  more than one function named "a"
root=# SELECT 'a(integer, varchar)'::regprocedure::int;
 int4  
-------
 57507
(1 row)

root=# SELECT 57507::regprocedure;
         regprocedure         
------------------------------
 a(integer,character varying)
(1 row)

Table's Oid we could to get with cast to regclass type:

SELECT 'oo'::regclass::int;

General array sort

Very nice exhibition of PostgreSQL's features is general procedure for sorting by David Fetter.

CREATE OR REPLACE FUNCTION array_sort (ANYARRAY)
RETURNS ANYARRAY LANGUAGE SQL
AS $$
SELECT ARRAY(
    SELECT $1[s.i] AS "foo"
    FROM
        generate_series(array_lower($1,1), array_upper($1,1)) AS s(i)
    ORDER BY foo
);
$$;

Array transformation in PL/Perl

PL/Perl doesn't support OUT arrays. We can use undocumented procedure _plperl_to_pg_array:

CREATE OR REPLACE FUNCTION foo(OUT a integer[], OUT b integer[]) AS $$
  return { a=> _plperl_to_pg_array([1,2,3]),
           b=> _plperl_to_pg_array([4,5,6]) };

UPC code validation

This is sample of validation of code with control number in Perl (PL/perl). I used this function for compare with PL/pgSQL and C languages. Perl is about 20% faster, but C is 10 times faster than Perl.

CREATE OR REPLACE FUNCTION ean_perl (
    TEXT
) RETURNS boolean AS $_$
    my $ean = length $_[0] == 12 ? "0$_[0]" : $_[0];
    # Make sure we really have an EAN.
    return 'false' unless $ean =~ /^\d{13}$/;
    my @nums = split '', $ean;
    return 10 - (
        # Sum even numerals.
        (   (   $nums[1] + $nums[3] + $nums[5] + $nums[7] + $nums[9]
                    + $nums[11]
            ) * 3 # Multiply total by 3.
        # Add odd numerals except for checksum (12).
        ) + $nums[0] + $nums[2] + $nums[4] + $nums[6] + $nums[8] + $nums[10]
    # Compare to the checksum.
    ) % 10 == $nums[12] ? 'true' : 'false';
$_$ LANGUAGE plperl immutable;

I found interesting solution by Joe Celko in article. After rewriting to PostgreSQL:

SELECT (
   SELECT mod(
      sum(
         CAST(substring('2837232811227' from s.seq for 1) AS integer) * 
         CASE mod(s.seq,2) WHEN 0 THEN 1 ELSE -1 END),
      10) FROM generate_series(1,12) as s(seq)) =
   CAST(substring('2837232811227' FROM 13 FOR 1) AS integer);

Show only every n row

When we would to skip some rows from table, we should to use temporary sequences and modulo function:

  CREATE TEMP SEQUENCE number; 
  SELECT * FROM ( SELECT *, nextval('number') AS number FROM datatable ) foo 
    WHERE foo.number % 5 = 0;

With 8.4 this can be done without the use of a temporary sequence

  SELECT * FROM ( SELECT *, row_number() over () AS number FROM datatable ) foo 
    WHERE foo.number % 5 = 0;

Array to table

We can simply unpack array with generate_series function:

 SELECT (ARRAY[1,3,4,6,7])[idx.n] FROM generate_series(1,7) idx(n);

CREATE OR REPLACE FUNCTION unpack(anyarray)
RETURNS SETOF anyelement AS $$ 
SELECT $1[i] 
   FROM generate_series(array_lower($1,1), 
                        array_upper($1,1)) g(i);
$$ LANGUAGE sql STRICT IMMUTABLE;

postgres=# select unpack(array['a','b','c']);
 unpack 
--------
 a
 b
 c
(3 rows)


With 8.4, we can use the unnest()-function:

test=*# select unnest(array['a','b','c']);
 unnest
--------
 a
 b
 c
(3 rows)

Additionally, we can extend the unnest() function to return one row with a default value if the array is empty or null:

create or replace function unnest(anyarray, anyelement)
returns setof anyelement as $$
select unnest(case when array_upper($1, 1) > 0 then $1 else array[$2] end);
$$ language sql immutable;

gp=> create temp table foo (x text[], y int) distributed by (y);
CREATE TABLE
gp=> insert into foo values
(array['a','b','c'], 1),
(null, 2),
('{}'::text[],3);
INSERT 0 3

gp=> select *,array_upper(x,1) as size from foo;
    x    | y | size
---------+---+------
 {}      | 3 |
         | 2 |
 {a,b,c} | 1 |    3
(3 rows)

gp=> select unnest(x),y from foo;
 unnest | y
--------+---
 a      | 1
 b      | 1
 c      | 1
(3 rows)

gp=> select unnest(x,'empty'),y from foo;
 unnest | y
--------+---
 empty  | 3
 empty  | 2
 a      | 1
 b      | 1
 c      | 1
(5 rows)

LIKE optimalization

PostgreSQL use index for LIKE only when:

  • pattern doesn't start with % and _ symbols,
  • database cluster is initialized with C locale.

Last constraint we can pass by special index:

CREATE INDEX like_index ON people(surname varchar_pattern_ops);

When we work wit pattern like %some (for internet domain searching) we should to use trick:

  • create functional index on mirrored values.
  • search with mirrored pattern

PostgreSQL doesn't allow reverse (mirror) function. We can use rvrs function from Orafunc package or we can use PL/Perl.

CREATE OR REPLACE FUNCTION reverse(varchar) RETURNS varchar AS $$ 
  $reversed = reverse $_[0]; 
  return $reversed; 
$$ LANGUAGE plperlu IMMUTABLE;

CREATE INDEX rev_email ON users( (reverse(email) ) varchar_pattern_ops );
SELECT * FROM _users WHERE reverse(email) LIKE reverse ('%.cz');

Without plperl:

CREATE OR REPLACE FUNCTION reverse(TEXT) RETURNS TEXT AS $$
 SELECT 
    array_to_string( 
      ARRAY
        ( SELECT substring($1, s.i,1) FROM generate_series(length($1), 1, -1) AS s(i) ), 
      '');
$$ LANGUAGE SQL IMMUTABLE;

Note: PostgreSQL 9.1 has buildin function for reverse string.

Deferred constraints

PostgreSQL run all constrains immediately. SQL know deferred constraints, that allows delay tests to time closely to commit. PostgreSQL hasn't full support. You can deferred only referential integrity constraints. But PostgreSQL supports undocumented deferred triggers, so we should use it. Syntax is similar to trigger definition:

  CREATE CONSTRAINT TRIGGER sc
  AFTER INSERT 
  ON fieldtrip_students
  INITIALLY DEFERRED
  FOR EACH ROW EXECUTE PROCEDURE field_trip();

Removing of duplicate rows

Any table created without parameter WITHOUT OIDS has row unique identifier OID. We can use it for removing duplicate rows:

 DELETE FROM people WHERE 
   NOT oid IN (SELECT MIN(oid) FROM people GROUP BY name, surname);

But new tables are created with this parameter usually. So we can't use OID. Every PostgreSQL row has another unique identifier - ctid. For ctid type we have not defined any aggregate functions, we have to use correlated subquery. I found this trick in Greg Mullane presentation.

 DELETE FROM lidi WHERE
   NOT ctid = (SELECT ctid FROM lidi l WHERE 
     prijmeni=l.prijmeni AND jmeno=l.jmeno LIMIT 1);

Select first n rows from group

   SELECT * FROM people WHERE id IN (
     SELECT id FROM people s 
       WHERE people.category = s.category 
       ORDER BY age LIMIT 2) 
     ORDER BY category, age;

Other solution without correlated subquery is SELF JOIN. I have to look on problem from different perspective. I have to ask on persons for which only one person in group is older.

SELECT s1.* 
   FROM people s1
        LEFT JOIN
        people s2
        ON s1.category = s2.category AND s1.age < s2.age
  GROUP BY s1.id, s1.category
  HAVING COUNT(s2.id) <= 1
  ORDER BY s1.category, COUNT(s2.id);

Cast to varchar

Every data type in PostreSQL has input and output functions. We can use these functions for support casting to varchar (if we miss it). Don't use this trick on PostgreSQL 8.3 and higher

  testdb011=# SELECT '(1,1)'::point::varchar;
  ERROR:  cannot cast type point to character varying

but

 testdb011=# SELECT textin(point_out('(1,1)'::point))::varchar;
  textin
 --------
  (1,1)
 (1 row)

Create Aggregates

I needed to be able to do a fast

SELECT count(*) WHERE field='x'
instead of a simple
SELECT count (*)

Both of the above queries initiate sequential scans on a table (which takes too much time). Here is what I created:

CREATE TABLE aggregate AS 
     (SELECT result,count(result) AS count FROM original_table 
      GROUP BY result ORDER BY count DESC);

Further, my data contains many one-offs which I don't want to put into my aggregate table (I assume that if a value isn't in my aggregate table that it's too small to be useful)... here is what I do to leave those values out:

CREATE TABLE aggregate AS 
     (SELECT * FROM 
          (SELECT result,count(result) AS count FROM original_table 
           GROUP BY result ORDER BY count DESC) 
     AS countquery WHERE count > 3);

This method is fast enough. Because I have an index on the "result" field, It runs in under a minute against 3 million records. It takes approximately the same amount of time to generate counts for all the result values as it does to count the total number of records.

Now I can do a simple (very fast) query to get a count:

SELECT * FROM aggregate where result='xxxxx'
Osobní nástroje
Jmenné prostory

Varianty
Akce
Navigace
Google AdSense
Nástroje