PostgreSQL SQL Tricks III

Z PostgreSQL
Verze z 15. 9. 2013, 07:13, kterou vytvořil imported>Pavel
(rozdíl) ← Starší verze | zobrazit aktuální verzi (rozdíl) | Novější verze → (rozdíl)
Skočit na navigaci Skočit na vyhledávání

SQL is a language where one task can be solved multiple ways with different efficiency. You can see effective use of PostgreSQL's possibilities on this page.

Google AdSense

Older tricks Fresh tricks

to_string, to_array functions

Author: Pavel Stěhule. Functions string_to_array and array_to_string doesn't handle NULL value well. So I designed new functions with better NULL handling.

CREATE OR REPLACE FUNCTION to_string(anyarray, sep text, nullstr text DEFAULT '') 
RETURNS text AS $$
SELECT array_to_string(ARRAY(SELECT coalesce(v::text, $3) 
                                FROM unnest($1) g(v)),
                       $2)
$$ LANGUAGE sql;

postgres=# select to_string(array[1,2,3,4,null,5],','); 
 to_string  
------------
 1,2,3,4,,5
(1 row)

postgres=# select to_string(array[1,2,3,4,null,5],',','<NULL>');
    to_string     
------------------
 1,2,3,4,<NULL>,5
(1 row)

CREATE OR REPLACE FUNCTION to_array(text, sep text, nullstr text DEFAULT '') 
RETURNS text[] AS $$ 
  SELECT ARRAY(SELECT CASE 
                           WHEN v = $3 THEN NULL::text 
                           ELSE v END 
                  FROM unnest(string_to_array($1,$2)) g(v)) 
$$ LANGUAGE sql;

postgres=# select to_array('1,2,3,4,,5',',');
     to_array     
------------------
 {1,2,3,4,NULL,5}
(1 row)

postgres=# select to_array('1,2,3,4,<NULL>,5',',','<NULL>');
     to_array     
------------------
 {1,2,3,4,NULL,5}
(1 row)

postgres=# select to_array('1,2,3,,5',',')::int[];
    to_array    
----------------
 {1,2,3,NULL,5}
(1 row)

This issue is fixed in modern versions, that allow third parameter for NULL substitution.

postgres=# SELECT array_to_string(ARRAY[10,10,NULL,10], ',','');
 array_to_string 
─────────────────
 10,10,,10
(1 row)

Cast between "unix timestamp" and timestamp

Author:??

CREATE OR REPLACE FUNCTION convert_timestamp_to_xtime(dt timestamp) 
RETURNS integer AS $$
  SELECT EXTRACT(EPOCH FROM $1)::integer
$$ LANGUAGE sql;

CREATE OR REPLACE FUNCTION convert_xtime_to_timestamp(xdt integer) 
RETURNS timestamp AS $$
  SELECT to_timestamp($1)::timestamp
$$ LANGUAGE sql;

Note: You can use casting to abstime type, but: "The types abstime and reltime are lower precision types which are used internally. You are discouraged from using these types in applications; these internal types might disappear in a future release."

Domain for time zone

David E. Wheleer sent a nice trick. PostgreSQL missing a data type for time zone and doesn't contain any function for time zone verification. But we can use operator AT:

CREATE OR REPLACE FUNCTION is_timezone( tz TEXT ) RETURNS BOOLEAN as $$
BEGIN
 PERFORM now() AT TIME ZONE tz;
 RETURN TRUE;
EXCEPTION WHEN invalid_parameter_value THEN
 RETURN FALSE;
END;
$$ language plpgsql STABLE;

CREATE DOMAIN timezone AS CITEXT
CHECK ( is_timezone( value ) );

Non-interactive entering a password for psql

psql missing any switch for password. People often ask, how can use psql inside scripts, when is necessary enter any password. One possibility is use a system variable PGPASSWORD.

postgres=# CREATE USER tom;
CREATE ROLE

postgres=# ALTER USER tom PASSWORD 'tiger';
ALTER ROLE
...
postgres=> \q

[pavel@nemesis ~]$ PGPASSWORD=tiger psql postgres -U tom
Welcome to psql 8.3.8, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit

postgres=> ...

The sense of this method is hiding a password in process list. Warning: On Linux (at least) this is still visible through /proc/<pid>/environ

Another possibility is a .pgpass file in your home directory.

Faster table's list

If your database has a high number of tables - thousands and more, you will like next trick:

alter function pg_table_is_visible(oid) cost 10;

by Tom Lane for 8.3 and 8.4. For 8.5 and newer it is default.

Array of arrays

PostgreSQL supports n-dimensional arrays, but doesn't support array of arrays, so we cannot use cumulative aggregate functions over arrays. Here is workaround:

postgres=# create type intarr as (f int[]);
CREATE TYPE

postgres=# select * from t;
      a      
─────────────
 {10,20,30}
 {1,2,3,4,5}
(2 rows)

postgres=# select array_agg(a) from t;
ERROR:  could not find array type for data type integer[]

postgres=# select array_agg(distinct row(a)::intarr) from t;
                array_agg                 
──────────────────────────────────────────
 {"(\"{1,2,3,4,5}\")","(\"{10,20,30}\")"}
(1 row)

postgres=# select (unnest(array_agg(distinct row(a)::intarr))::intarr).f from t;
      f      
─────────────
 {1,2,3,4,5}
 {10,20,30}
(2 rows)

Sent by Sam Mason.

MySQL function group_concat in PostgreSQL

In MySQL we can find very useful aggregate function group_concat. For this function we can define separator and ordering. This isn't possible in PostgreSQL. But we can use workaround via array functions:

postgres=# SELECT * FROM x;
 cat |  town  
-----+---------
   1 | Benešov
   1 | Tábor
   1 | Písek
   2 | Praha
   2 | Hradec
   3 | Cheb
   3 | Aš
(7 rows)

postgres=# SELECT cat, array_to_string(ARRAY(SELECT unnest(array_agg(town)) 
                                               ORDER BY 1),',') 
              FROM x 
             GROUP BY cat;
 kat |   array_to_string   
-----+---------------------
   1 | Benešov,Písek,Tábor
   3 | Aš,Cheb
   2 | Hradec,Praha
(3 rows)

Since PostgreSQL 9.0 this is natively available as string_agg()

MySQL function field in PostgreSQL

Source: http://stackoverflow.com/questions/1309624/simulating-mysqls-order-by-field-in-postgresql

When we can explicitly to specify some order, we should to use function field in MySQL.

select * from pet order by field(species, 'cat', 'dog', 'bird') desc;

+----------+--------+---------+------+------------+------------+
| name     | owner  | species | sex  | birthday   | death      |
+----------+--------+---------+------+------------+------------+
| Fluffy   | Harold | cat     | f    | 1993-02-04 | NULL       | 
| Claws    | Gwen   | cat     | m    | 1994-03-17 | NULL       | 
| Buffy    | Harold | dog     | f    | 1989-05-13 | NULL       | 
| Fang     | Benny  | dog     | m    | 1990-08-27 | NULL       | 
| Bowser   | Diane  | dog     | m    | 1989-08-31 | 1995-07-29 | 
| Chirpy   | Gwen   | bird    | f    | 1998-09-11 | NULL       | 
| Whistler | Gwen   | bird    | NULL | 1997-12-09 | NULL       | 
+----------+--------+---------+------+------------+------------+

This functionality is missing in PostgreSQL - we could to use CASE statement or we could to write own function. It's possible - we have variadic functions:

select * 
   from pet 
  order by species, case species when 'cat' then 1 
                                 when 'dog' then 2 
                                 when 'bird' then 3 
                                 else 0 end desc;

Source code field function:

CREATE OR REPLACE FUNCTION field(varchar, VARIADIC text[]) 
RETURNS int AS $$
  SELECT i 
     FROM generate_subscripts($2,1) g(i) 
    WHERE $1 = $2[i] 
  UNION ALL
  SELECT 0 
  LIMIT 1 1
$$ LANGUAGE sql;

Complement of function field is function elt:

CREATE OR REPLACE FUNCTION elt(int, VARIADIC text[])
RETURNS text AS $$
  SELECT $2[$1];
$$ LANGUAGE sql;

The first from groups (with windows functions)

This exercise should be solved variously. One way is based on windows functions:

postgres=# SELECT * FROM staff;
┌───────────┬───────────┬────────────┬──────────┐
│   name    │ surname   │ department │  salary  │
├───────────┴───────────┴────────────┴──────────┤
│ Pavel     │ Stehule   │          1 │ 10000.00 │
│ Zdenek    │ Stehule   │          1 │  9000.00 │
│ Vladimira │ Stehulova │          2 │  9000.00 │
└───────────────────────────────────────────────┘
(3 rows)

postgres=# SELECT * 
              FROM (SELECT *, max(salary) OVER (PARTITION BY department) 
                       FROM staff) x 
             WHERE max = salary;
┌───────────┬───────────┬────────────┬──────────┬──────────┐
│   name    │ surname   │ department │  salary  │   max    │
├───────────┴───────────┴────────────┴──────────┴──────────┤
│ Pavel     │ Stehule   │          1 │ 10000.00 │ 10000.00 │
│ Vladimira │ Stehulova │          2 │  9000.00 │  9000.00 │
└──────────────────────────────────────────────────────────┘
(2 rows)

Bigint to ip

By Jasen Betts:

create function bigint_to_inet(bigint) returns inet as $$
select (($1>>24&255)||'.'||($1>>16&255)||'.'||($1>>8&255)||'.'||($1>>0&255))::inet
$$ language sql;

Marking files as orig (for difforig command) based on patch

PostgreSQL hackers like commands cporig and difforig. cporig creates copy file with .orig extension. difforig search files with extension .orig and does cumulative diff i.e. patch. Sometimes we need mark modified files from patch. This process can be simplified with unix commands:

cat mnnotation.diff | egrep -o -e "^\*\*\*.*(sql|out|h|c|sgml|y|lex|Makefile)" | replace "*** " "" | xargs cporig

Safe string to number conversion

Sometimes we do not want to raise an exception when the converted value is not a number. We can prevent the exception by testing if the input value is a number or not:

CREATE OR REPLACE FUNCTION read_int(varchar) 
RETURNS int AS $$ 
  SELECT CASE WHEN $1 ~ e'^\\d+$' THEN $1::int END; 
$$ LANGUAGE SQL IMMUTABLE STRICT;

Dropping milliseconds from timestamp

Usually we don't need timestamp value in maximum precision. For mostly people only seconds are significant. Timestamp type allows to define precision - and we could to use this feature:

postgres=# select current_timestamp;
             now              
------------------------------
 2009-05-23 20:42:21.57899+02
(1 row)

Time: 196,784 ms
postgres=# select current_timestamp::timestamp(2);
          now           
------------------------
 2009-05-23 20:42:27.74
(1 row)

Time: 51,861 ms
postgres=# select current_timestamp::timestamp(0);
         now         
---------------------
 2009-05-23 20:42:31
(1 row)

Time: 0,729 ms

Attention on IS NULL and IS NOT NULL operators for composite types

One may think that !(x IS NULL) = x IS NOT NULL is true in all cases. But there is an exception - composite types. When one field of a composite value is NULL and another field is NOT NULL, then result of both operators is false. IS NULL is true, only when all fields are NULL. IS NOT NULL is true, only when all fields are NOT NULL. For any case in between, then both operators return false.

CREATE OR REPLACE FUNCTION test_isnull()
RETURNS TABLE (a int, b int, isnull bool, isnotnull bool) AS $$
DECLARE r foo;
BEGIN
  isnull := r IS NULL; isnotnull := r IS NOT NULL; RETURN NEXT;
  a := NULL; b := 10; r := ROW(a, b);
  isnull := r IS NULL; isnotnull := r IS NOT NULL; RETURN NEXT;
  a := 10; b := 10; r := ROW(a, b);
  isnull := r IS NULL; isnotnull := r IS NOT NULL; RETURN NEXT;
  RETURN;
END;
$$ LANGUAGE plpgsql;
SELECT * FROM test_isnull();

 a  | b  | isnull | isnotnull
----+----+--------+-----------
    |    | t      | f
    | 10 | f      | f
 10 | 10 | f      | t
(3 rows)

Faster execution of PL/pgSQL function with explicit casting

PL/pgSQL execute right part of assignment statement independently on left part. Result of evaluation is checked and converted to destination type every time. This conversion should be little bit slow. So we could improve assignment statement via explicit casting to ensure type equality of left and right parts assignment statement. Use this technique carefully! And only when is it necessary.

create or replace function test1() 
returns int as $$
declare s int := 0; 
begin 
  for i in 1..100000 loop s := 4e3; end loop; -- numeric constant
  return s; 
end; 
$$ language plpgsql immutable;

create or replace function test2() 
returns int as $$
declare s int := 0; 
begin 
  for i in 1..100000 loop s := 4e3::int; end loop; -- int 
  return s; 
end; 
$$ language plpgsql immutable;

postgres=# select test1();
 test1 
-------
  4000
(1 row)

Time: 176,623 ms

postgres=# select test2();
 test2 
-------
  4000
(1 row)

Time: 47,673 ms

Same problem is implicit conversion in return statement:

create or replace function test1() returns int as $$begin return 4e1; end; $$ language plpgsql; 
create or replace function test2() returns int as $$begin return 4e1::int; end; $$ language plpgsql;

postgres=# select count(test1()) from generate_series(1,100000);
 count  
--------nebo v konverzi návratové hodnoty
 100000
(1 row)

Time: 682,005 ms
postgres=# select count(test2()) from generate_series(1,100000);
 count  
--------
 100000
(1 row)

Time: 528,099 ms

Protection from parallel run of some PL function

Author: Pavel Stěhule

Theoretically we should to detect an run of any function - but these information are not accessible from user interface (without C coding). We should to look to list of processed queries or we should to use advisory locks. Test based on select from pg_stat_activity view is useful only for function directly called by user. Functions called from other functions are invisible.

create or replace function long_run() 
returns void as $$
begin 
  if exists(select procpid 
               from pg_stat_activity 
              where procpid <> pg_backend_pid() 
                    and current_query like '%long_run(%') 
  then 
    raise notice 'procedure is running'; 
    return; 
  end if; 
  perform pg_sleep(10); 
end$$ 
language plpgsql volatile;
CREATE FUNCTION

Next possibility is using advisory locks http://www.postgresql.org/docs/8.3/interactive/functions-admin.html#FUNCTIONS-ADVISORY-LOCKS.

create or replace function long_run()
returns void as $$
declare foid oid:= 'long_run'::regproc::oid;
begin
  if pg_try_advisory_lock(foid::bigint) then
    perform pg_sleep(10); 
    perform pg_advisory_unlock(foid::bigint);   
  else
    raise notice 'procedure is running';
  end if;
  return;
end;
$$ language plpgsql;

Using scientific format

PostgreSQL doesn't support EEEE format in to_char function still (there is a patch for 8.5). We should to use external plperl function:

create or replace function floating_format(double precision, int) 
  returns varchar as $$
   sprintf("%.$_[1]e", $_[0])
$$ language plperl;

create or replace function floating_format(double precision) 
  returns varchar as $$
   sprintf("%e", $_[0])
$$ language plperl;

postgres=# select floating_format(10.223);
 floating_format 
-----------------
 1.022300e+01
(1 row)

postgres=# select floating_format(10.223,2);
 floating_format 
-----------------
 1.02e+01
(1 row)

postgres=# select floating_format(10.223,4);
 floating_format 
-----------------
 1.0223e+01
(1 row)

Terminal's configuration

I using two interfaces for access to PostgreSQL - emacs and psql. These application should be configured for more comfortable work:

  • emacs (.emacs)
(setq-default truncate-lines t)
(ansi-color-for-comint-mode-on)
(setq lazy-lock-defer-on-scrolling t)
(setq inhibit-startup-echo-area-message t)
(setq inhibit-startup-message t)
(show-paren-mode t)
(setq show-paren-style 'mixed)
(fset 'yes-or-no-p 'y-or-n-p)
(global-set-key "\M-g" 'goto-line)
(setq sql-database "postgres")  ;; name of most often used database
;; don't use pager, empty title
(setq sql-postgres-options '("-P" "pager=off" "-P" "title= ")) 
;; interesting idea is setting of start of every new command on new line
;; (setq sql-postgres-options '("-P" "pager=off" "-P" "title= " "-v" "PROMPT1=%/.%n%#\n" "-v" "PROMPT2=" "-v" "PROMPT3="))
  • psql
export PAGER="less -RSX"
psql ... -P pager=always

Moving tables from one schema to second schema

We have to use plpgsql function:

create or replace function mvtable(src_schema varchar, dest_schema varchar, mask varchar)
returns void as $$
declare r record;
begin
  for r in
          select *
             from information_schema.tables
            where table_schema = src_schema and table_name like mask
  loop
    execute 'alter table ' || quote_ident(r.table_schema)
                   || '.' || quote_ident(r.table_name)
                   || ' set schema ' || quote_ident(dest_schema);
  end loop;
  return;
end;
$$ language plpgsql;

Encoding number to any numeral system with CTE

Autor: Pavel Stehule

postgres=# create function to_base(num int, base int = 10) 
           returns varchar as $$
             with recursive z as (
                                  select $1 as a, null::int as b, 0 as i 
                                  union all 
                                  select a/$2, a % $2, i+1 
                                     from z 
                                  where a > 0
                                 ) 
                select array_to_string(array(select substring('0123456789abcdef' from b + 1 for 1) 
                   from z 
                  where i > 0  
                 order by i desc),'');
           $$ language sql immutable;

postgres=# select to_base(255);
 to_base 
---------
 255
(1 row)

postgres=# select to_base(255,16);
 to_base 
---------
 ff
(1 row)

postgres=# select to_base(255,8);
 to_base 
---------
 377
(1 row)

postgres=# select to_base(255,2);
 to_base  
----------
 11111111
(1 row)

Get default values from table definition

One typical query is getting the default values for a record. Since these definitions are stored in the system catalogs (tables), they are accessible.

CREATE OR REPLACE FUNCTION eval(varchar) 
RETURNS varchar AS $$
DECLARE result varchar;
BEGIN
 EXECUTE 'SELECT ' || $1 INTO result;
 RETURN result;
END;$$ LANGUAGE plpgsql STRICT;

CREATE OR REPLACE FUNCTION defaults(text, 
                                    OUT attname name, OUT type varchar, OUT default_val varchar)
RETURNS SETOF RECORD AS $$
SELECT a.attname,
       pg_catalog.format_type(a.atttypid, a.atttypmod),
       (SELECT eval(pg_catalog.pg_get_expr(d.adbin, d.adrelid))
           FROM pg_catalog.pg_attrdef d
          WHERE d.adrelid = a.attrelid AND d.adnum = a.attnum AND a.atthasdef)
   FROM pg_catalog.pg_attribute a
  WHERE a.attrelid = $1::regclass::oid AND a.attnum > 0 AND NOT a.attisdropped
  ORDER BY a.attnum
$$ LANGUAGE SQL STRICT;

Second solution, but not always usable is using RETURNING clause.

Function to_bin

PostgreSQL doesn't provide to_bin function. Implementation (based on casting to bit type) is simple and fast:

create or replace function to_bin(int) 
returns varchar as $$
declare r varchar; 
begin 
  execute 'select ' || $1 || '::bit(' || case when $1 = 0 then 1 else trunc(log(2, $1)) + 1 end || ')' into r; 
  return r; 
end;
$$ language plpgsql immutable strict;

create or replace function bin(varchar) 
returns varchar as $$
declare r varchar; 
begin 
  execute 'select b''' || $1 || '''::int' into r; 
  return r; 
end;
$$ language plpgsql immutable strict;

postgres=# select to_bin(5), bin('101');
 to_bin | bin 
--------+-----
 101    | 5
(1 row)

SQL code for hierarchical break-down of domains

based on Depesc's ideas from http://www.depesz.com/index.php/2008/12/01/getting-list-of-most-common-domains/#comment-27066

postgres=# create or replace function domain_list(varchar[]) 
           returns setof varchar as $$
              select array_to_string($1,’.') 
              union all 
              select domain_list($1[2:array_upper($1,1)]) 
                 where array_upper($1,1) > 1 
           $$ language sql immutable strict;
CREATE FUNCTION
postgres=# select domain_list(array['a','b','c','d']);
domain_list
————-
a.b.c.d
b.c.d
c.d
d
(4 rows)

postgres=# create or replace function domain_list(varchar) 
           returns setof varchar as $$ 
              select domain_list(string_to_array($1, ‘.’))
           $$ language sql immutable strict;
CREATE FUNCTION
postgres=# select domain_list(’a.b.c.d’);
domain_list
————-
a.b.c.d
b.c.d
c.d
d

Solution for migration to 8.3 (problem with tsearch2)

PostgreSQL documentation has a section with some advice that should to help with migration of older application (based on TSearch2) to PostgreSQL 8.3 and higher (with integrated fulltext). This works well when you migrate databases individually, but it fails when you use pg_dump. As documented, you should do some pre steps - import TSearch compatible module into template1. But complete dump generates databases from template0 (you can't to change it). So you cannot load this dump, and you have to edit dump (advice - dump structure and data to two different files).

so, when you find bug:

ERROR:  type "tsvector" already exists

You have to add import of tsearch2.sql to every creation database part in dump.

DROP DATABASE brigady;
CREATE DATABASE brigady WITH TEMPLATE = template0  ENCODING = 'UTF8';
--
-- PostgreSQL database dump complete
--

\connect brigady
-- new line
\i /usr/local/pgsql/share/contrib/tsearch2.sql

SET search_path = public, pg_catalog;
...

Expect SQL functions to be really fast

Much faster than PL/pgSQL - especially when they are wrappers around integrated features (in this example - integrated qsort). PL/pgSQL is usually fast to interpret - I know only about two slow operations - array update and string update. It's based on the internal architecture - every change generate a new object - it isn't significant for small objects (less than 1000), but for large objects this operations needs significant time. However, we can go a different way, using integrated functionality (this way is usually the fastest in any embedded language).

PostgreSQL use internal qsort -so, I wrote qsort in PL/pgSQL too (results will be comparable):

CREATE OR REPLACE FUNCTION quicksort(l integer, r integer, a int[])
RETURNS int[] AS $$
DECLARE akt int[] = a;
  i integer := l; j integer := r; x integer = akt[(l+r) / 2]; 
  w integer;
BEGIN
  LOOP
    WHILE akt[i] < x LOOP i := i + 1; END LOOP;
    WHILE x < akt[j] loop j := j - 1; END LOOP;
    IF i <= j THEN
      w := akt[i];
      akt[i] := akt[j]; akt[j] := w;
      i := i + 1; j := j - 1;
    END IF;
    EXIT WHEN i > j;
  END LOOP;
  IF l < j THEN akt := quicksort(l,j,akt); END IF;
  IF i < r then akt := quicksort(i,r,akt); END IF;
  RETURN akt;
END;
$$ LANGUAGE plpgsql STRICT IMMUTABLE;

postgres=# SELECT array_upper(quicksort(1,10000,array_agg(a)),1) FROM test;
 array_upper
-------------
       10000
(1 row)
Time: 5918,531 ms

Then I wrote same function (note. some 8.4 features are used) in SQL:

CREATE OR REPLACE FUNCTION sort(anyarray) 
RETURNS anyarray AS $$
  SELECT array(SELECT * FROM unnest($1) ORDER BY 1); 
$$ language sql;

postgres=# SELECT array_upper(sort(array_agg(a)),1) FROM test;
 array_upper
-------------
       10000
(1 row)
Time: 35,980 ms

Nota bene - The SQL function is about 150 times faster than the PL/pgSQL function. This isn't surprising. Integrated sorting is well optimized and doesn't do a lot of large memory operations. For information: I tested bublesort too, and I stopped its execution after five minutes.

Older tricks Fresh tricks


Google AdSense