Automatic execution plan caching in PL/pgSQL

Z PostgreSQL
(přesměrováno z AutomaticExecutionPlanCachig)
Skočit na navigaci Skočit na vyhledávání

Translate by Hana Kabilková

Few years ago, when I wrote article about PL/pgSQL I found in documentation recommendation to not use string 'now' in this language. I accepted this, I tried it and it didn't work, but I realy didn't know why. I thought that it was because of bad transtation to byte-code of actually pseudofunction value, which made this problem (time returned of string 'now' agreed with function translation).

INSERT INTO (..) VALUES('now') -- don't use with plpgsql!

Here I first deviate. Lexical and syntactic analysis PL/pgSQL of functions is making only once, in first call function within the scope of login. The result is syntactic tree which is saves in session cache. Something like translation to byte-code in PL/pgSQL doesn't exist, syntactic tree is input for interpreter, it means that PL/pgSQL is classic interpreter which stands on lex, yacc generator. Power PL/pgSQL isn't in speed but in his accouplement with SQL. Because of this accouplement implementation PL/pgSQL can be as easy as it doesn't contain even simple evaluation of expressions. Everithing possible is send to interpreter SQL. Interpreter PL/pgSQL resolves only variables and operative construction.

In saving SQL term in PL/pgSQL is using type PLpqSQL_expr(plpgsql.h). For another description are important only array char *query (contains text sql statement) and void *plan (indicator on cach execute plan SQL statement). Execute plan is generating only once, in first requirement evaluation SQL expression.

if (expr->plan == NULL) 
  exec_prepare_plan(estate, expr) 
rc = SPI_execute_plan(expr->plan, ...

Caching is important. In easier expression generating of plan can be longer than expression evaluation. Measure difference between first and second start of PL/pgSQL function. You can write this time on score now generating execute plans. Caching solved problem with effective executing PL/pgSQL function and gave us two problems: Caching execute plans aren't shared and persistanced too (star application are slower, increasing of memory usage (resolution is f.e. pgool), caching execute plans can be sometimes not adequate and their executing makes run-time error.

When the plan is ready, it's save in cache and there it can be changed until log off or compilation on function. In PostgreSQL there is no attribute of analogy attribute WITH RECOMPILE MSSQL. Fortunately wrong executing plans errors are exception and only because of two reasons.

The first reason is change of database structure - when we canceled some of data objects (table, sequence) which was used in function, after the first called, the following called of function ends with error. The resolution isn't new object with the same name and type, because new object get new (different) oid (object identificator). Nobody think about canceled tabel during the operation, but every body want canceled temporary table - and because of this, is this problem bonded in ToDo with temporary tables. The resolution is all of temporary tables makes befor first called PL/pgSQL function and then not canceled, only clean. (Sometimes it trepan, especially when you convered from MSSQL to PostgresSQL, where the mechanism of transfering recordsets of procedurs is very different).

We can make temporary tables in the body of function because executing plan saves into the cache in the first moment of using the objet, not in the time of translation.

    PERFORM 1 
       FROM pg_catalog.pg_class c 
            pg_catalog.pg_namespace n ON n.oid = c.relnamespace 
      WHERE c.relkind IN ('r','') AND c.relname = 'tmptab' 
        AND pg_catalog.pg_table_is_visible(c.oid) AND n.nspname LIKE 'pg_temp%'; 
      TRUNCATE tmptab; 
      CREATE TEMP TABLE tmptab(... 
    END IF;

The second reason is dynamic questions. Their executing plan isn't caching, but their result can make error on difrent place.

  DECLARE _t varchar[] = '{integer, varchar}'; 
    _v varchar; _r record; 
    FOR _i IN 1 .. 2 LOOP 
      FOR _r IN EXECUTE 'SELECT 1::'||_t[_i]||' AS _x' LOOP 
        _v := _r._x; 
      END LOOP; 
    END LOOP; 
END; $$ LANGUAGE plpgsql; 
select foo();

Start of function ends with error (_v := _r._x;) in second passing of cycle FOR _i IN .. ERROR: type of "_r._x" does not match that when preparing the plan.

Why? Command of assigning contains SQL expression (every expression in PL/pgSQL is SQL expression). In first interace it makes executing plan, which supposed that value_r._x is integer, in second interace is type varchar, so the plan in cach is not adequate. The resolution is have as many assigning expressions as many possible executing plans combinations. At the firs look absurd code of function is right.

FOR _i IN 1 .. 2 LOOP 
  FOR _r IN EXECUTE 'SELECT 1::'||_t[_i]||' AS _x' LOOP 
    IF _i = 1 THEN _v := _r._x; 
    ELSIF _i = 2 THEN _v := _r._x; 
    END IF; 

I tell you that this behavior of PL/pgSQL is free for me. I know reason of the problem and I can set up for that. But definitive resolution (regeneration of plan when the error was detected) can be make. I tried macras PG_TRY(), PG_CATCH() and PG_END_TRY() for fixation of errorand regenerated wrong plan without loss of speed. It's only the question of time, when someone catch this resolution and make patch. Probablty not in version 8.1.

Second divagation : to show where is the problem with 'now' I have to mention proces of generation executing plan SQL expression.

One of a period of preparing the plan is reduction constants and simplification functions (f.e. 2+2=4, True Or whatever=True, alternative immutable functions with constant argument of function consequence - and this is 'now' case (backend/optimizer/util/clause.c - evaluate_function()), alternative value NULL STRICT function, if some of their arguments is NULL, etc.

In our case was 'now' replaced by argument which was saved in executing plan and it was still same repeated evaluated. Normaly it don't have to make some problems. Expect PL/pgSQL PostgreSQL doesn't contain any tools which can cache executing plans. Exception was PL/pgSQL, where this problem was first detected (and we still have to be carefull about this).

I can't remember if I ever used 'now'. I automatically use magic parameters CURENT_DATE a CURRENT_TIMESTAMP which haven't got problem with cache. And if I only for nostalgia want to use 'now' , i have to do this only with parameter.

DECLARE d date; BEGIN d := 'now'; INSERT INTO (..) VALUES(d); ...

Why? For function with parameter is optimizer so short ( here function datein()). PL/pgSQL doesn't do something like optimalization, so it can't detected that it he is the parameter, and string 'now' will have to retype on corresponding type only in time of evaluating expression and consequence will be corresponding time of evaluating expression.