Project of UDF and its realization at C for PostgreSQL

Z PostgreSQL
Skočit na navigaci Skočit na vyhledávání

Translated by Jiří Krejčí

Project of functions

Enlarging of PostgreSQL by the help of C-functions is not too complicated (it is comparable to project of applications for other toolkits), nevertheless it's complicated than project of function at PL/pgSQL or at other of supported languages. At least they are more complicated translation and testing of functions. Practically importable code together with higher work difficulty is evidently reason of the fact, that the libraries write in C are very rare. On the other hand we create our own functions (User Defined Functions) at C much faster and more simply. Except our own functions we have possibility to design personal data type in PostgreSQL. By using personal data type we practically give up portability of our application on other RDBMS, in return our application may be more effective and readable (unfortunately only for us).

Any function which we want to use in PostgreSQL must come out from following schema.

PG_FUNCTION_INFO_V1 (name_of_function);

Date name_of_function (PG_FUNCTION_ARGS)
{
  BpChar *arg1 = PG_GETARG_BPCHAR_P (0);

  PG_RETURN_BOOL (false);
}

It is so-called first version of calling conventions. Because using of so-called zero variants is unrecommended, only thing I will tell you about it is, that it is classic definition of function at C.

We can transmit parameters of function by link or value. Macros for reference of value by link ends _P. Number represents order of obtained parameter. We can reference value only by types with fixed size whose size isn't bigger than 4 bytes (or sizeof(Datum)). Datum has nothing to do with date (number of day), it is so-called universal data type.

For entirety I mention advanced form of maker with example.

extern Pg_finfo_record * pg_finfo_check_rc (void);
Pg_finfo_record * pg_finfo_check_rc (void) 
{
  static Pg_finfo_record my_finfo = { 1 };
  return &my_finfo; 
}

Date name_of_function (FunctionCallInfo fcinfo)
{
  BpChar *arg1 = ((BpChar *) pg_detoast_date ((struct varlena *) 
	 ((Pointer) ((fcinfo->arg[0]))));

  return ((Date) (((bool) 0)) ? 1 : 0));
}

If we untitled function like isStrict during the declaration at PostgreSQL, we will have to test every parameter on value NULL before using (and in case of need return value NULL).

IF (PG_ARGISNULL (0)) PG_RETURN_NULL ();

After translation and copying of library to the directory of PostgreSQL (usually /usr/local/pgsql/lib), we must create function in PostgreSQL.

CREATE OR REPLACE FUNCTION name_of_function (char) RETURNS bool
  AS 'library.so', 'name_of_function ' LANGUAGE 'C';

Project of function is complicated by fact that PostgreSQL do not use (more or less) classical string which is ended by zero. PostgreSQL use string with fixed size, i.e. size of the string is carried by the first 4 bytes of the string (including heading). Fortunately we have several makers for operations with strings (we can use that for types BpChar*, VarChar* and text* which respond in PostgreSQL to types char, varchar and text). Macros stands above the structure of varlena (it contains two fields va_header and va_content)

// obtaining of size and own string

BpChar *arg1 = PG_GETARG_BPCHAR_P (0);
long len = VARSIZE (arg1) - VARHDRSZ;
char *sf = VARDATA (arg1); // be careful it is not sz

// creation of string and its filling by one mark

BpChar *result = palloc (VARHDRSZ + 1);
VARATT_SIZEP (result) = VARHDRSZ + 1;
*(VARDATA (result)) = 'M';

For allocation of memory we use function palloc instead of usual malloc. Memory which is allocated by this function (palloc) is automatically released after termination of operation.

Type DateADT, which is used below, is used in PostgreSQL to storage of date. PostgreSQL use for storage of date whole long signed number, which is interpreted like number of days from 1st January 2000. Simply it is possible to obtain this number by conversion to the Julian calendar.

// transmission 10.2.1973 to DateADT

DateADT d = date2j (1973, 2, 10) - date2j (2000, 1, 1);

// obtaining of year, month and day from DateADT

int year, month, day; DateADT d = -9821;
j2date (d + date2j (2000, 1, 1), &year, &month, &day);

You can find functions date2j and j2date at src/backend/utils/adt/datetime.c in source files of PostgreSQL. Format of date is not control during conversion, i.e. it convert date 32.13.2002 without suggestions.

Implementation of functions dateseria and timeserial

We start with simple functions. PostgreSQL standardly does not contain functions dateserial and timeserial, which are known from Visual Basic or VBA. These functions are expendable for us, but we must perform relatively complicated and slow reformat of trinity of numbers to string in format, which is readable for PostgreSQL.

With basic knowledge we can create following functions (null variant - non-check)

#include "postgres.h"
#include "fmgr.h"
#include "utils/date.h"
#include "utils/nabstime.h"

PG_FUNCTION_INFO_V1 (dateserial);
PG_FUNCTION_INFO_V1 (timeserial);

Datum dateserial (PG_FUNCTION_ARGS)
{
  int32 year = PG_GETARG_INT32(0);
  int32 month = PG_GETARG_INT32(1);
  int32 day = PG_GETARG_INT32(2);

  DateADT d = date2j (year, month, day) - POSTGRES_EPOCH_JDATE;
  PG_RETURN_DATEADT(d);
}

Datum timeserial (PG_FUNCTION_ARGS)
{
  int32 hour = PG_GETARG_INT32(0);
  int32 minutes = PG_GETARG_INT32(1);
  int32 seconds = PG_GETARG_INT32(2);

  TimeADT result = (((hour * 60) + minutes) * 60) + seconds;  
  PG_RETURN_TIMEADT(result);
}

It is very hard to use these functions really because they have no control of parameters. But if we set parameters in correct intervals, this functions will work well. There is nothing complicated on these functions.

Correct setting of parameters of translation can ensure Makefile in following form

PG_INCLUDE=`pg_config --includedir-server`
PG_LIBDIR=`pg_config --libdir`
PG_PACKAGE=`pg_config --pkglibdir`

all: datefunc

datefunc:
	gcc -ggdb -fpic -I$(PG_INCLUDE) datefunc.c -c -o datefunc.o
	gcc -ggdb -shared -o datefunc.so datefunc.o
	cp datefunc.so $(PG_PACKAGE)/datefunc.so

After completion of necessary verifications we can make these functions available for other users. Code of verifications stem from code of modulus datetime.c and pick some variables, functions and macros. Code was in addition completed with functions for conversion of date from utc format and MSSQL format to type of PostgreSQL timestamp. We use these functions especially when we must safely transfer date-stamp items among application layers, for example javascript->asp-> PostgreSQL. Especially using of asp extorting using of these functions, because conversion of string datum depends on setting of national environment of server.

PG_FUNCTION_INFO_V1 (dateserial);
PG_FUNCTION_INFO_V1 (timeserial);
PG_FUNCTION_INFO_V1 (utc_timestamp);
PG_FUNCTION_INFO_V1 (mssql_timestamp);

Datum dateserial (PG_FUNCTION_ARGS)
{
  int32 year = PG_GETARG_INT32(0);
  int32 month = PG_GETARG_INT32(1);
  int32 day = PG_GETARG_INT32(2);

  if (year >= 0 && year < 73) 
    year += 2000;
  else if (year >= 0 && year < 100)
    year += 1900;

  if (month > 12 || day > 31)
    {
      ereport(ERROR, 
	      (errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
	       errmsg("month or day out of range")));
    }
  else if (day > day_tab[isleap(year)][month - 1]) 
    {
      ereport(ERROR, 
	      (errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
	       errmsg("day out of range")));
    }
  else if (IS_VALID_UTIME(year, month, day)) 
    {
      DateADT d = date2j (year, month, day) - POSTGRES_EPOCH_JDATE;
      PG_RETURN_DATEADT(d);
    } 
  else
    {
      ereport(ERROR, 
	      (errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
	       errmsg("day out of range")));
    }
}

Datum timeserial (PG_FUNCTION_ARGS)
{
  int32 hour = PG_GETARG_INT32(0);
  int32 minutes = PG_GETARG_INT32(1);
  int32 seconds = PG_GETARG_INT32(2);

  if (hour < 0 || hour > 59 || minutes < 0 || minutes > 59 
      || seconds < 0 || seconds > 59)
      ereport(ERROR, 
	      (errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
	       errmsg("time out of range")));
  else 
    {
      TimeADT result = (((hour * 60) + minutes) * 60) + seconds;  
      PG_RETURN_TIMEADT(result);
    }
}

Datum utc_timestamp (PG_FUNCTION_ARGS)
{
  int64 utc = PG_GETARG_INT64(0);

  Timestamp result = utc /1000 - 
    ((POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * 86400);

  PG_RETURN_TIMESTAMP(result);
}

Datum mssql_timestamp (PG_FUNCTION_ARGS)
{
  float mstime = PG_GETARG_FLOAT4(0);

  Timestamp result = ((mstime - 25567.0) * 86400.0) - 
    ((POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * 86400);

  PG_RETURN_TIMESTAMP(result);
}

Registration of SQL file can have following form for above-mentioned functions

 CREATE FUNCTION mssql_timestamp(float4) RETURNS timestamp 
     AS 'datefunc.so', 'mssql_timestamp' LANGUAGE 'C' STRICT;

   CREATE FUNCTION utc_timestamp(bigint) RETURNS timestamp 
     AS 'datefunc.so', 'utc_timestamp' LANGUAGE 'C' STRICT;

   CREATE FUNCTION timeserial(int, int, int) RETURNS time 
     AS 'datefunc.so', 'timeserial' LANGUAGE 'C' STRICT;

   CREATE FUNCTION dateserial(int, int, int) RETURNS date 
     AS 'datefunc.so', 'dateserial' LANGUAGE 'C' STRICT;

   GRANT EXECUTE ON FUNCTION dateserial(int, int, int) TO PUBLIC;
   GRANT EXECUTE ON FUNCTION timeserial(int, int, int) TO PUBLIC;
   GRANT EXECUTE ON FUNCTION utc_timestamp(bigint) TO PUBLIC;
   GRANT EXECUTE ON FUNCTION mssql_timestamp(float) TO
   PUBLIC;

Verification of personal number

Following example includes function for verification of personal number (it is test for null modulus 11), obtaining of date of birth from personal number and identification of sex.

We begin with function for parsing of personal number. It is possible to order personal number with or without mark of slash, which separate date of birth and index. Function returns zero or issue of mistake. Function tolerate 90 spaces at the most behind personal number (so I can use function for type CHAR(n) too).

char *error [] = {
  "Personal number is too long",
  "Symbol / is on a bad position",
  "Personal number contains inadmissible mark",
  "Personal number has wrong format",
  "Personal number set in before 1954 can contains only nine digits",
  "Modulo 11 of personal number is not zero",
  "Personal number can contains only nine or ten digits"};

#define MAXRCLEN     100

int parsen_rodne_cislo (char *rc, int parts[3], long length)
{
  int i = 0, s2 = 0; char c; 
  int64 s = 0;
  
  while (length--)
    {
      c = *rc++;
      if (c == ' ')
        break;
      if (i > 10)
        return 1;
      if (c == '/')
        if (i == 6)
	  continue;
	else
	  return 2;
      if (c <'0' || c > '9')
        return 3;
      s  = (s  * 10) + (c - '0');
      s2 = (s2 * 10) + (c - '0');
      i++;
      
      if (i == 2) {parts[0] = s2; s2 = 0;}
      if (i == 4) {parts[1] = s2; s2 = 0;}
      if (i == 6) {parts[2] = s2; s2 = 0;}           
    }
    
  while (length-- > 0)
    {
      if (i > MAXRCLEN)
        return 1;
      if (*rc++ != ' ')
        return 4;
      i++;
    }
  if (i == 9)
    if (parts [0] > 53)
      return 5;
    else
       parts [0] += 1900;
  else
    if (i == 10)
      {
        if (parts [0] > 53)
          parts [0] += 1900;
        else
          parts [0] += 2000;
        if ((s % 11) != 0)
	  return 6;
      }
    else
      return 7;

  parts[3] = s2;
  return 0;
}

There are following rules:

  • Personal numbers which were set in before 1954 have index only with three digits and it is impossible to test them for modulus 11.
  • Personal numbers which were set in after 1954 have index with four digits and it is possible to test them for modulus 11.
  • We can identified century when personal number was set in because it depends on the quantity of digits in index. If I have initial digits, for example 19, and index with three digits, then it will be year 1919. If I have index with four digits, it will be year 2019.

On the following lines I will speak about functions, which were called by PostgreSQL. They must be declared according to the v1 convention.

PG_FUNCTION_INFO_V1 (check_rc);
PG_FUNCTION_INFO_V1 (birth_date);
PG_FUNCTION_INFO_V1 (sex);

Datum
check_rc (PG_FUNCTION_ARGS)
{
  int parts [] = {0, 0, 0, 0}, result;
  BpChar *rc = PG_GETARG_BPCHAR_P (0);
  result = parsen_rodne_cislo (VARDATA (rc), parts, VARSIZE (rc) - VARHDRSZ);

  if (result == 0)
    PG_RETURN_BOOL (true);
  else
    {
      elog (WARNING, error [result - 1]);
      PG_RETURN_BOOL (false);
    }
}

Date
birth_date (PG_FUNCTION_ARGS)
{
  int parts [] = {0, 0, 0, 0}, result; DateADT d;
  BpChar *rc = PG_GETARG_BPCHAR_P (0);
  result = parsen_personal_number (VARDATA (rc), parts, VARSIZE (rc) - VARHDRSZ);

  if (result != 0)
    elog (ERROR, error [result - 1]);

  if (parts[1] > 50)
    parts [1] -= 50;

  d = date2j (parts[0],parts[1], parts[2]) - date2j (2000, 1, 1);  
  PG_RETURN_DATEADT (d);
}

Date
sex (PG_FUNCTION_ARGS)
{
  int parts [] = {0, 0, 0, 0}, result; char s;
  BpChar *rc = PG_GETARG_BPCHAR_P (0);
  result = parsen_rodne_cislo (VARDATA (rc), parts, VARSIZE (rc) - VARHDRSZ);
  BpChar *res;

  if (result != 0)
    elog (ERROR, error [result - 1]);

  if (parts[1] > 50) s = 'F'; else s = 'M';

  res = palloc (VARHDRSZ + 1);
  VARATT_SIZEP (res) = VARHDRSZ + 1;
  *(VARDATA (res)) = s;

  PG_RETURN_BPCHAR_P (res);
}

Function elog is good for display on user's consol. At levels of ERROR it leads to interruption of performing function. At levels of WARNING it leads only to extraction of notification. Using is same as at PL/pgSQL construction RAISE.

SQL commands, which are established to PostgreSQL are following. To perform these commands can only user of PostgreSQL.

CREATE OR REPLACE FUNCTION check_rc (char) RETURNS bool
  AS 'rc.so', 'check_rc' LANGUAGE 'C' STRICT;

CREATE OR REPLACE FUNCTION birth_date (char) RETURNS date
  AS 'rc.so', 'birth_date' LANGUAGE 'C' STRICT;

CREATE FUNCTION sex (char) RETURNS char
  AS 'rc.so', 'sex' LANGUAGE 'C' STRICT;

I used simple Makefile (I suppose that source files for PostgreSQL are in /usr/src/postgresql-7.3b1/src/include.

PG_INCLUDE = '/usr/src/postgresql-7.3b1/src/include'
PG_LIBDIR = `pg_config --libdir`
PG_PACKAGE = `pg_config --pkglibdir`

all:
	gcc -ggdb -fpic -I$(PG_INCLUDE) personal_number_fce.c -c -o rc.o
	gcc -ggdb -shared -o rc.so rc.o
	cp rc.so $(PG_PACKAGE)/rc.so

If everything is all right, you will be able to try created functions.

// personal number is fictional
SELECT check_rc('806115/0262');
SELECT birth_date('806115/0262');
SELECT sex('806115/0262');

Debugging

Note to debugging. Developing of functions in C has more pleasant sites too. One of them is possibility of stepping during the functions debugging (thanks for kick in conference). For tuning you may use common gdb debugger or some of his followers, which is more comfortable (for example ddd). It is necessary to know, that function is performing on the server, which is working below another account than client. We can tune application only below the user, who is owner of the tuned process. In this case it is user of PostgreSQL. Whole process should be evident from following commands.

  • I run new client of psql and I will let perform minimally once
  • tested function, to download library
[paul@localhost]$su postgres
[paul@localhost]$ps -ax | grep postgres
1940 ? S  0:00 postgres: paul testdb011 [local] idle

[paul@localhost]$gdb
(gdb)attach 1940
(gdb)break check_rc
Breakpoint 1 at 0x404040fa: file rc.c line 50.
# at psql we execute the performing of function
(gdb)list 50
47
48 Datum
49 check_rc (PG_FUNCTION_ARGS)
50 {
51   int parts [3], result;
52   BpChar *rc = PG_GETARG_BPCHAR (0);
        ...

(gdb)step
51   int parts [3], result;
(gdb)step
52   BpChar *rc = PG_GETARG_BPCHAR (0);
(gdb)...
(gdb)continue
(gdb)detach
Detaching from program /usr/local/pgsql/bin/postgres, process 1940
(gdb)quit
[postgresql@localhost]$

Project of personal data types

From last chapter we dispose with functions for determination of validity, date of birth and sex from personal number. Meanwhile I suppose, that personal number is saving like CHAR(11). Evident disadvantage is, that every time when I want to determine some attribute of personal number, it leads to parsing of string. Because of effectivity I can not use personal number like an index and I can not sort according to it, because I may but I need not to use symbol of slash in personal number.

We can solve these problems by creation of personal data type (but it increase charges during displaying of value). Creation of data type rests in:

  • definition of type
  • creation of converse functions between our type and type of cstring
  • creation of functions for operators
  • registration of type and registration of operators

Type personal number consists from date of birth and index. By reason that index is always smaller than 32767 I can use the highest bit for determination of sex.

typedef struct personal_number {
  DateADT born;
  int16   index;
} personal_number;

Size of type is 8 byte, therefore it will be transmitted to all functions by link. I will define macros (it is just type cast of registers).

#define DateGetPersonalNumberP(X)    ((personal_number*) DatumGetPointer(X))
#define PersonalNumberPGetDate(X)    PointerGetDatum(X)
#define PG_GETARG_RC_P(n)            DateGetPersonalNumberP(PG_GETARG_DATUM(n))
#define PG_RETURN_RC_P(X)            return PersonalNumberPGetDatum(X)

Bases are converse functions from and to the type of cstring. PostgreSQL use them automatically during displaying and setting of the concrete value. It is advised to called these functions typ_out and typ_in. In the example there is used function parsen_personal_number from previous part.

PG_FUNCTION_INFO_V1 (personal_number_in);
PG_FUNCTION_INFO_V1 (personal_number_out);

Date
personal_number_in (PG_FUNCTION_ARGS)
{
  int parts [] = {0, 0, 0, 0}, result, sex;
  char *rcstr; personal_number *rc;

  if (PG_ARGISNULL (0)) 
    PG_RETURN_NULL ();
  
  rcstr = PG_GETARG_CSTRING (0);
  
  result = parsen_personal_number (rcstr, parts, strlen (rcstr));
  if (result != 0)
    elog (ERROR, error [result - 1]);

  rc = (personal_number*) palloc (sizeof (personal_number));

  if (parts [1] > 50)
    {
      sex = 0;
      parts [1] -= 50;
    }
  else
    sex = 1;  

  rc -> born = date2j (parts [0], parts [1], parts [2]) -
    date2j (2000, 1, 1);
  rc -> index = (parts [3] | sex <<15);

  PG_RETURN_RC_P (rc);
}

char *sts_ito2d (int i, char *temp);
  
Date
personal_number_out (PG_FUNCTION_ARGS)
{
  int r, m, d, r2, length;
  char temp [10], *rcstr;
  personal_number *rc;

  if (PG_ARGISNULL (0)) 
    PG_RETURN_NULL ();

  rc = PG_GETARG_RC_P (0);

  j2date (rc -> date + date2j (2000, 1, 1), &r, &m, &d);

  if ((rc -> index & 32768) == 0) m += 50;

  if      (r > 2000) r2 = r - 2000;
  else if (r > 1900) r2 = r - 1900;
  else if (r > 1800) r2 = r - 1800;

  rcstr = (char*) palloc (12); rcstr [0] = '\0';

  strcat (rcstr, sts_ito2d (r2, temp));
  strcat (rcstr, sts_ito2d (m,  temp));
  strcat (rcstr, sts_ito2d (d,  temp));

  sprintf (temp, "%d", rc -> index & 32767);
  length = strlen (temp);

  if (r <= 1953)
    {
      strcat (rcstr, "/000");
      strcpy (&rcstr [10 - l], temp);
    }
  else
    {
      strcat (rcstr, "/0000");
      strcpy (&rcstr [11 - l], temp);
    }
  
  PG_RETURN_CSTRING (rcstr);
}

char *sts_ito2d (int i, char *buf)
{
  if (i <10)
    sprintf (buf, "0%d", i);
  else
    sprintf (buf, "%d",  i);
  return buf;
}

Now we can already declare these functions and type personal_number in PostgreSQL.

DROP TYPE personal_number CASCADE;

CREATE OR REPLACE FUNCTION personal_number_in (cstring) 
  RETURNS personal_number AS 'rc.so','personal_number_in' LANGUAGE 'C';

CREATE OR REPLACE FUNCTION rc_out (personal_number) 
  RETURNS cstring AS 'rc.so', 'personal_number_out' LANGUAGE 'C';

-- at first warning that type personal_number do not exist is wrote out

CREATE TYPE personal_number (
  internallength = 8,
  input          = rc_in,
  output         = rc_out
);

Now we are ready to create table of person and insert one note into it.

CREATE TABLE persons (
  rc personal_number, 
  name VARCHAR(20), 
  surname VARCHAR(20)
);

INSERT INTO persons VALUES ('7307150807','Pavel','Stěhule');

We can write functions, which are analogue to both functions sex and birth_date from previous part of serial. It would be better to write our own type cast (at least in the event of date of birth), then we obtain date of birth by type cast of personal number to date and sex to char.

Conversions must be realized by function, where parameter is value in original type and result is value in new type.

PG_FUNCTION_INFO_V1 (personal_number_todate);
PG_FUNCTION_INFO_V1 (personal_number_dobpchar);

Date
personal_number_todate (PG_FUNCTION_ARGS)
{
  personal_number *rc = PG_GETARG_RC_P (0);

  PG_RETURN_DATEADT (rc -> born);
}

Date
personal_number_tobpchar (PG_FUNCTION_ARGS)
{
  personal_number* rc = PG_GETARG_RC_P (0);
  BpChar *result = (BpChar *) palloc (VARHDRSZ + 1);
  VARATT_SIZEP (result) = VARHDRSZ + 1;
  if (rc -> index & 32768)
    *(VARDATA (result)) = 'M';
  else
    *(VARDATA (result)) = 'F';

  PG_RETURN_BPCHAR_P (result);
}

In SQL we declare new functions and type cast.

CREATE OR REPLACE FUNCTION personal_number_todate (personal_number)
  RETURNS date AS 'rc.so', 'personal_number_todate' 
  LANGUAGE 'C' STRICT IMMUTABLE;

CREATE OR REPLACE FUNCTION personal_number_tobpchar (personal_number)
  RETURNS char AS 'rc.so', 'personal_number_tobpchar' 
  LANGUAGE 'C' STRICT IMMUTABLE;

DROP CAST (personal_number AS date);

CREATE CAST (personal_number AS date) 
  WITH FUNCTION personal_number_todate (personal_number);

DROP CAST (personal_number AS char);

CREATE CAST (personal_number AS char)
  WITH FUNCTION personal_number_tobpchar (personal_number);

Functions, which we want to use for type cast must be IMMUTABLE, it means that result of function depends on parameters of function. Thanks to STRICT, if parameter is NULL, then result of functions will be automatically NULL. That is the reason why it is not necessary to control input parameters at NULL.

We can try type cast by command.

SELECT CAST ('7307150807'::personal_number AS date);

Creating of operators for personal data type

In last chapter I demonstrated how to create personal data type in PostgreSQL. I finish project of type personal_number in the middle of process. We can not searching, sorting and indexing according personal number yet.

We always miss basic operators =, <>, <, >, <=, >=. Test on equality or inequality is the simplest so I start with project of operators = and <>.

PG_FUNCTION_INFO_V1 (personal_number_eq);
PG_FUNCTION_INFO_V1 (personal_number_ne);

Date
personal_number_eq (PG_FUNCTION_ARGS)
{
  personal_number *rc1 = PG_GETARG_RC_P (0);
  personal_number *rc2 = PG_GETARG_RC_P (1);

  PG_RETURN_BOOL ((rc1 -> born == rc2 -> born) 
               && (rc1 -> index   == rc2 -> index));
}

Date
personal_number_ne (PG_FUNCTION_ARGS)
{
  personal_number *rc1 = PG_GETARG_RC_P (0);
  personal_number *rc2 = PG_GETARG_RC_P (1);

  PG_RETURN_BOOL ((rc1 -> born != rc2 -> born) 
               || (rc1 -> index   != rc2 -> index));
}

In SQL we create operators = and <> by following commands.

CREATE OR REPLACE FUNCTION personal_number_eq (personal_number, personal_number)
  RETURNS bool AS 'rc.so', 'personal_number_eq' LANGUAGE 'C' STRICT;

CREATE OR REPLACE FUNCTION personal_number_ne (personal_number, personal_number)
  RETURNS bool AS 'rc.so', 'personal_number_ne' LANGUAGE 'C' STRICT;

DROP OPERATOR = (personal_number, personal_number);

CREATE OPERATOR = (
  leftarg   = personal_number,
  rightarg  = personal_number,
  procedure = personal_number_eq
);

DROP OPERATOR <> (personal_number, personal_number);

CREATE OPERATOR <> (
  leftarg   = personal_number,
  rightarg  = personal_number,
  procedure = personal_number_ne
);

Project of others operators is little bit complicated. To exclude typing error I will create function rodne_cislo_cmp. It will returns -1 first parameter is bigger than second one, 0 if they are equivalent and 1 if second one is bigger then first one.

int sts_intcmp (long i1, long i2);

int personal_number_cmp (personal_number *rc1, personal_number *rc2)
{
  int r1, m1, d1, r2, m2, d2, r;
  long d2000 = date2j (2000, 1, 1);

  // if there is same sex, it will be enought to compare born and index
  if ((rc1 -> index> 32768) == (rc2 -> index> 32768))
    {
      r = sts_intcmp (rc1 -> born, r2 -> born);
      if (r != 0) return r;
      return sts_intcmp (rc1 -> index, rc2 -> index);
    }

  j2date(rc1 -> born + d2000,&r1,&m1,&d1);  
  j2date(rc2 -> born + d2000,&r2,&m2,&d2);  

  r = sts_intcmp (r1,r2); if (r != 0) return r;

  if ((rc1 -> index& 32768) == 0) m1 += 20;
  if ((rc2 -> index& 32768) == 0) m2 += 20;

  r = sts_intcmp (m1,m2); if (r != 0) return r;
  r = sts_intcmp (d1,d2); if (r != 0) return r;

  return sts_intcmp (rc1 -> index, rc2 -> index);
}

int sts_intcmp (long i1, long i2)
{
  if (i1 == i1)
     return 0;
  return (i1 <i2? -1 : 1);
}

It is easy to obtain C functions for comparing of operators. Therefore they are similar, I will not mentione all of them.

PG_FUNCTION_INFO_V1 (personal_number_lt);
PG_FUNCTION_INFO_V1 (personal_number_gt);
PG_FUNCTION_INFO_V1 (personal_number_le);
PG_FUNCTION_INFO_V1 (personal_number_ge);

Date
personal_number_lt (PG_FUNCTION_ARGS)
{
  personal_number *rc1 = PG_GETARG_RC_P (0);
  personal_number *rc2 = PG_GETARG_RC_P (1);

  PG_RETURN_BOOL (personal_number_cmp (rc1, rc2) <0);
}

Date
personal_number_le (PG_FUNCTION_ARGS)
{
  personal_number *rc1 = PG_GETARG_RC_P (0);
  personal_number *rc2 = PG_GETARG_RC_P (1);

  PG_RETURN_BOOL (personal_number_cmp (rc1, rc2) <= 0);
}

Declaration in SQL.

CREATE OR REPLACE FUNCTION personal_number_lt (personal_number, personal_number)
  RETURNS bool AS 'rc.so', 'personal_number_lt' LANGUAGE 'C' STRICT;

CREATE OR REPLACE FUNCTION personal_number_le (personal_number, personal_number)
  RETURNS bool AS 'rc.so', 'personal_number_le' LANGUAGE 'C' STRICT;

DROP OPERATOR <(personal_number, personal_number);

CREATE OPERATOR <(
  leftarg   = personal_number,
  rightarg  = personal_number,
  procedure = personal_number_le
);

DROP OPERATOR <= (personal_number, personal_number);

CREATE OPERATOR <= (
  leftarg   = personal_number,
  rightarg  = personal_number,
  procedure = personal_number_le
);

To every operator we can supply other attributes (if they exist), which can speed up performance of request (optimalization of request) which contains given operator.

  • Operation A will be commutator of operation B if for random x and y is valid formula: (x A y) = (y B x). For example = is commutator of itself. As well <>. For < is commutator > and vice versa. + is commutator of itself. - or / have not commutators.
  • Operation A will be negator of operation B if for random x and y is valid formula: (x A y) = NOT (x B y). Result of both operations must be only logical value. Negator of = is <> and vice versa, negator of < is >= and vice versa.
  • Clause RESTRICT describe behaviour of operator in face of constant during its using. We have several possibilities on selection:
eqsel for = result is small subset of table
neqsel for <> result is almost whole table
scalarltsel for < or <= result depends on position of constant in the table, smaller values are taken
scalargtsel for > or >= the same but in opposite direction.
  • Clause JOIN refer operations to function of estimation of selectivity. Sense is similar as with RESTRICT. Available functions are eqjoinsel, neqjoinsel, scalarltjoinsel a scalargtjoinsel.

After supplementation of clause mentioned above, definition of operator of comparison of personal number will looks like in following example.

DROP OPERATOR = (personal_number, personal_number);

CREATE OPERATOR = (
  leftarg    = personal_number,
  rightarg   = personal_number,
  procedure  = personal_number_eq,
  commutator = =,
  negator    = <>,
  restrict   = eqsel,
  join       = eqjoinsel   
);

DROP OPERATOR <> (personal_number, personal_number);

CREATE OPERATOR <> (
  leftarg   = personal_number,
  rightarg  = personal_number,
  procedure = personal_number_ne,
  commutator = <>,
  negator    = =,
  restrict   = neqsel,
  join       = neqjoinsel
);

If you try index according to rc in table of person, you will probably obtain error message, that for given data type there is not determined class of operator for access to "btree".

The reason is, that in PostgreSQL every type must have determined one (most frequent case) or several so-called classes of operators (for example for complex numbers I can make index for absolute size of number, for real or imaginary part). Class of operators determine which operators will be used for indexing and what semantic meaning they have, so-called Strategy Number.

For B-tree there are determined this Strategy Number:

Strategy Number
lesser than 1
lesser or equal 2
equal 3
bigger or equal 4
bigger than 5

PostgreSQL allows to use several methods of indexing (R-tree, Hash, GiST). Every method has own table of Strategy Numbers. Except that it is important to define so-called Support skill. For B-tree index there is only one (Strategy Number is 1). Its parameters are two keys and result is integer number. Negative number, if first parameter is lesser than second one, zero, if they are equal and positive number, if first parameter is bigger than second one.

We have basis of Support function. Now we must to make it available in PostgreSQL.

PG_FUNCTION_INFO_V1 (personal_number__cmp);

Date
personal_number__cmp (PG_FUNCTION_ARGS)
{
  personal_number *rc1 = PG_GETARG_RC_P (0);
  personal_number *rc2 = PG_GETARG_RC_P (1);

  PG_RETURN_INT32 (personal_number_cmp (rc1, rc2));
}

CREATE OR REPLACE FUNCTION personal_number_cmp (personal_number, personal_number)
  RETURNS int AS 'rc.so', 'personal_number__cmp' LANGUAGE 'C' STRICT;

CREATE OPERATOR CLASS personal_number_ops
  DEFAULT FOR TYPE personal_number USING btree AS
    OPERATOR  1	   <>,
    OPERATOR  2	   <=,
    OPERATOR  3	   = ,
    OPERATOR  4	   >=,
    OPERATOR  5	   >,
    FUNCTION  1	   personal_number_cmp (personal_number, personal_number);

In this moment our type personal_number is full-value data type, which we can use like primary key. For practical using it is necessary to remark that personal numbers which were set in before 1954 may not be unique, it was brought to light during the coupon privatization. If you are users of PostgreSQL v7.3 then you will have to set appropriate laws to other users (in this case it is probably PUBLIC) by the command GRANT.

Acess to internal functions of server

In previous chapters we designed functions, their results depended only on parameters of function, and which worked only with PostgreSQL data structures. But we dispose with SPI (Server Programming Interface) and it allows us much more.

Remember for example on aggregation functions html_list which were implemented in PL/pgSQL. This function returns list of items in column, formatted like html list. Identical function will have function html_list. Its parameters will be SQL request, column, and we want to save its items to list, maximum number of processed lines and size of block of allocated memory.

For our example, we need only five basic functions: SPI_connect, SPI_finish, SPI_exec, SPI_fnumber and SPI_getvalue. With this set of functions we can just read result of request, nevertheless it is enought in this moment. Function SPI_connect initiatives interface, therefore it must be called first. On the other hand SPI_finish releases all memory allocated during running of functions palloc, therefore it must be called at last. If our function returns register on memory allocated in procedure (at all values which are transferred by subquest) you will not able to allocate memory by command palloc. The reason is that SPI_finish releases memory with this value, before we can transfer value to calling function. This "unsolvable" problem is solved by alternative procedures for work with memory: SPI_palloc a SPI_replloc. Calling SPI_finish has no influence on memory, which is allocated by these functions. If it reach to termination of function which is calling elog (ERROR,...) then SPI_finish is calling automatically.

If error is caused then SPI function will usually returns negative number. After activation of interface we can let perform SQL command by function SPI_exec. The second parameter of function SPI_exec is maximum number of returned registers. If it is zero, then it will use all registers. After performance without error we can use global variable SPI_processed, which contains number of returned registers and global register SPI_typtable.

Register SPI_typtable is register on structure which contains field with returned lines and link on description of structure of returned dates. For obtained values in column we need to know order of column in request. Order is returned by function SPI_fnumber, for example:

column = SPI_fnumber(SPI_tuptable->tupdesc, "name");
if (column <= 0)
  elog (ERROR, "SPI_ERROR_NOATTRIBUTE");

Access to column number n of line number x is provided by function SPI_getvalue. It is necessary to know, that all values returned by SPI_getvalue are textual. Obtaining of the first value of the first line is provided by command:

printf ("%s", SPI_getvalue(
			SPI_tuptable->vals[0], 
			SPI_tuptable->tupdesc, 1));

To accelerate executing of function, we do not allocate memory at every addition of string, but block-by-block (or page-by-page) whose size is established by us.

text *sts_strcatex2 (char *str, text *txt, 
                     long *wrote, long *have_left, long page)
{
  long len = strlen (str);
  if (len > page)
    elog (ERROR, "String is bigger than size of page");
  if (*have_left <len)
    {
      long nsize = (((*wrote + len) / page) + 1) * page;
      txt = (text *) SPI_repalloc (txt, nsize + VARHDRSZ);
      *have_left = nsize - *wrote;
    }
  memcpy ((char*)VARDATA (txt) + *wrote, str, len);
  *wrote += len; VARATT_SIZEP (txt) = *have_left + VARHDRSZ;
  *have_left -= len;
  return (txt);
}

We can allocate memory by calling of function SPI_repalloc and not repalloc because if I use palloc, function SPI_finish will release this memory. Subsequent calling PG_RETURN_TEXT_P would returns invalid pointer, and it leads to fall of backend (in 80% of all cases).

Basis of function html_list is following:

SPI_connect ();

  if ((ret = SPI_exec (query, max)) <0)
    elog (ERROR, "SPI_exec return %d", ret);
  
  column = SPI_fnumber(SPI_tuptable->tupdesc, jm_column);

  if (column <= 0)
    elog (ERROR, "SPI_ERROR_NOATTRIBUTE");
  
  wrote = 0; have_left = page;
  txt = (text*) SPI_palloc (VARHDRSZ + have_left);

  txt = sts_strcatex2 ("<ul>\n", txt,&wrote,&have_left, page);

  for (line = 0; line <SPI_processed; line++)
    {
      txt = sts_strcatex2 ("<li>", txt,&wrote,&have_left, page);
      txt = sts_strcatex2 (SPI_getvalue (SPI_tuptable->vals[line], 
      					SPI_tuptable->tupdesc, column),
    			  txt,&wrote,&have_left, page);
      txt = sts_strcatex2 ("</li>\n", txt,&wrote,&have_left, page);
    }
  txt = sts_strcatex2 ("</ul>", txt,&wrote,&have_left, page);
  SPI_finish ();
  PG_RETURN_TEXT_P(txt);

There are no notes to text. In cycle, we are reading single values of column (SPI_getvalue returns register on string every time), which we add to structure txt of type text.

After translation, we can register function in PostgreSQL.

CREATE OR REPLACE FUNCTION html_list (cstring, cstring, int, int)
  RETURNS text AS 'html_list.so','html_list' LANGUAGE 'C';

Table, which I have used for test, contains 192 registrations, the result is text with length slightly bigger than 6KB. For Measuring of case I wrote short script in Python. By reason that I wanted to use profiler of Python, I divided commands to separate functions.

#!/usr/local/bin/python
import psycopg, time

def plain(cur):
    cur.execute ("SELECT name from names")

def html_list(cur):
    cur.execute ("SELECT html_list ('select name from names','name',0,8000)")
    rows = cur.fetchall ()
    print rows[0][0]

def pyt(cur):
    cur.execute ("SELECT name from names")
    rows = cur.fetchall ()
    print "<ul>"
    for r in rows:
        print "<li>%s</li>" % r[0]
    print "</ul>"


def main ():
    for i in range(100):
        con = psycopg.connect ("dbname=testdb011")
        cur = con.cursor ()
        plain(cur)
        html_list(cur)
        pyt(cur)

import profile
profile.run ('main()')
<pre>
This table is the result:
<pre>
 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    4.670    4.670 <string>:1(?)
        1    0.090    0.090    4.760    4.760 profile:0(main())
        0    0.000             0.000          profile:0(profiler)
      100    3.670    0.037    3.670    0.037 test.py:12(pyt)
        1    0.620    0.620    4.670    4.670 test.py:20(main)
      100    0.250    0.003    0.250    0.003 test.py:4(plain)
      100    0.130    0.001    0.130    0.001 test.py:7(html_list)

It stands to reason that creation of list of functions html_list is in comparison with "normal" composition of list is 28 times quicker. The reason are probably slower operations of Python with strings, but also bigger charges connected with browsing of field in Python.

Table functions

Table Function API allows to design personal functions which returns tables (below Table Functions = TF). TF are able to returns table which contains scalar or composite (contains more columns) types. TF are specific because its repeated calling during the returning of table - TF is calling for every returned line.

If we work with functions with more columns (and we usually will), we will have to be able to create value of composite type. Procedure is following: we create a form, and its helps us to create space for data, and at last we fill this space by data. We provide data in format of cstring and then functions of PostgreSQL ensure correct conversion (according to form) to types of PostgreSQL. If there exist composite type, it may comes two basic cases

tupdesc = RelationNameGetTupleDesc ("_tabsin");

or if composite type is dynamically definite

    tupdesc = CreateTemplateTupleDesc (8 , false);
    TupleDescInitEntry (tupdesc,  1, "Type",    CHAROID,    -1, 0, false);
    TupleDescInitEntry (tupdesc,  2, "Laws",    VARCHAROID, -1, 0, false);
    TupleDescInitEntry (tupdesc,  3, "User",    VARCHAROID, -1, 0, false);
    TupleDescInitEntry (tupdesc,  4, "Group",   VARCHAROID, -1, 0, false);
    TupleDescInitEntry (tupdesc,  5, "Size",    INT4OID,    -1, 0, false);
    TupleDescInitEntry (tupdesc,  6, "Created", ABSTIMEOID, -1, 0, false);
    TupleDescInitEntry (tupdesc,  7, "Modif",   ABSTIMEOID, -1, 0, false);
    TupleDescInitEntry (tupdesc,  8, "Name",    VARCHAROID, -1, 0, false);

We can find more id types in catalog/pg_type.h. Next procedure is same for both mentioned variants:

  slot = TupleDescGetSlot (tupdesc);
  attinmeta = TupleDescGetAttInMetadata (tupdesc);
      ... filling of values values ....
  tuple = BuildTupleFromCStrings (attinmeta, values);
  result = TupleGetDatum (slot, tuple);

Last command perform conversion of type Tuple on Date which only one we can return. TF must use V1 calling convention. Conversion into the native types of PostgreSQL is performed by function BuildTupleFromCStrings. Parameter is structure of type AttInMetadata (form for creation of value) and field with registers on single values in text format. If some of the registers is NULL, we will use NULL like converted value. It is possible to repeatedly use (and we should use) structures tabdesc, slot and attinmeta for every line. It is ineffective to create them for every returned line.

Before I will devote to table functions, it is necessary to mentioned way how PostgreSQL give out dynamic memory. We can dynamically allocate memory space by the calling of function palloc or change its size by function repalloc. Memory is give out from so-called actual memory context. Memory context is standardly created during the start of function and cancelled during the finishing of function. Thanks to it, it is ensured that all dynamic memory ,allocated by called functions, will be released.

But in the case of table functions this behaviour is unwanted. Table functions repeatedly starts for every line of returned table. But we need space for saving of data structures, which will exist for all the time of repeated calling of table function (of above mentioned structure). Therefore we have multicall memory context, which is released after last calling of tabular function (it is when function return no data). Every memory context must be activated by calling of function MemoryContextSwichTo.

if (SRF_IS_FIRSTCALL ())
  {
    MemoryContext  oldcontext;
    funcctx = SRF_FIRSTCALL_INIT ();
    oldcontext = MemoryContextSwitchTo (funcctx->multi_call_memory_ctx);

       ... obtaining of users from multicall context

    MemoryContextSwitchTo (oldcontext);
       ... from this moment everything is as before
  }
funcctx = SRF_PERCALL_SETUP ();

Function SRF_IS_FIRSTCALL will returns nonzero value, if function starts at first time (within the disposing of one request for table, not for a lifetime of library in memory). SRF_FIRSTCALL_INIT creates multicall memory context. From structure functx we will use field user_fctx, which we will use for saving of register on our own "statistic data".

The first example of TF is function fcetabsin, which returns table with values of function sin. Parameters of functions has meaning: from, to and step.

#include "postgres.h"
#include "funcapi.h"
#include <math.h>

typedef struct fcetabsin_info {
  float8 iter;
  float8 step;
  float8 max;
} fcetabsin_info;

PG_FUNCTION_INFO_V1 (fcetabsin);

Date
fcetabsin (PG_FUNCTION_ARGS)
{
  FuncCallContext *funcctx;
  TupleDesc        tupdesc;
  TupleTableSlot  *slot;
  AttInMetadata   *attinmeta;
  fcetabsin_info  *fceinfo;

  if (SRF_IS_FIRSTCALL ())
    {
      MemoryContext  oldcontext;
      funcctx = SRF_FIRSTCALL_INIT ();
      oldcontext = MemoryContextSwitchTo (funcctx->multi_call_memory_ctx);

      tupdesc = RelationNameGetTupleDesc ("_tabsin");
      slot = TupleDescGetSlot (tupdesc);
      funcctx -> slot = slot;
      attinmeta = TupleDescGetAttInMetadata (tupdesc);
      funcctx -> attinmeta = attinmeta;

      fceinfo = (fcetabsin_info*) palloc (sizeof (fcetabsin_info));
      fceinfo -> iter = PG_GETARG_FLOAT8 (0);
      fceinfo -> max  = PG_GETARG_FLOAT8 (1);
      fceinfo -> step = PG_GETARG_FLOAT8 (2);
      funcctx -> user_fctx = (void*) fceinfo;

      MemoryContextSwitchTo (oldcontext);
    }
  funcctx = SRF_PERCALL_SETUP ();
  fceinfo = (fcetabsin_info*) funcctx -> user_fctx;
  if (fceinfo -> iter <= fceinfo -> max)
    {
      Date result; char **values;
      HeapTuple tuple;

      values = (char **) palloc (2 * sizeof (char *));
      values [0] = (char*) palloc (16 * sizeof (char));
      values [1] = (char*) palloc (16 * sizeof (char));

      snprintf (values [0], 16, "%f", fceinfo -> iter);
      snprintf (values [0], 16, "%f", sin(fceinfo -> iter));
      fceinfo -> iter += fceinfo -> step;

      tuple = BuildTupleFromCStrings (funcctx -> attinmeta, values);
      result = TupleGetDatum (funcctx -> slot, tuple);

      SRF_RETURN_NEXT (funcctx, result);
    }
  else
    {
      SRF_RETURN_DONE (funcctx);
    }
}

We will register function by SQL commands:

CREATE TYPE _tabsin AS (i float8, j float8);

CREATE OR REPLACE FUNCTION fcetabsin (float8, float8, float8)
 RETURNS SETOF _tabsin AS 'tab_fce_sin.so', 'fcetabsin' LANGUAGE 'C';

and test it by command:

SELECT * FROM fcetabsin (0.1, 3, 0.1);

Example is very academic, but thanks to this, it is very transparent. Similar example (but more meaningful) we will find in contrib, where are functions which returns tables with numerical values in a given division etc.

The second command is function ls, which returns transcript of directory.

#include <sys/types.h>
#include <sys/stat.h>
#include <sys/dir.h>
#include <stdio.h>
#include "postgres.h"
#include "funcapi.h"
#include "catalog/pg_type.h"
#include <pwd.h>
#include <time.h>
#include <grp.h>

char *modes [] = {
  "---", "--x", "-w-", "-wx",
  "r--", "r-x", "rw-", "rwx"
};

typedef struct UsrFctx {
  DIR *dp;
  char *name;
  char **values;
} UsrFctx;

char **fill_array (char **item, char *directory, struct direct *dir)
{
  char whole_way [1024]; struct stat sbuf;
  char laws [10]; int i, j; 
  struct passwd *pw; struct group *gw;
  struct tm *loctime;

  sprintf (whole_way , "%s/%s", directory, dir-> d_name);
  stat (whole_way ,&sbuf);

  switch (sbuf.st_mode& S_IFMT) 
    {
    case S_IFREG: snprintf (item [0], 2, "-"); break; 
    case S_IFDIR: snprintf (item [0], 2, "d"); break; 
    case S_IFCHR: snprintf (item [0], 2, "c"); break; 
    case S_IFBLK: snprintf (item [0], 2, "b"); break; 
    default:      snprintf (item [0], 2, "?"); break;
    }

  laws [0] = '\0';
  for (i = 2; i >= 0; i--)
    {
      j = (sbuf.st_mode >> (i *3))& 07;
      strcat (laws, modes [j]);
    }
  if ((sbuf.st_mode> S_ISUID) != 0) laws [2] = 's';
  if ((sbuf.st_mode> S_ISGID) != 0) laws [5] = 's';
  if ((sbuf.st_mode> S_ISVTX) != 0) laws [8] = 't';

  snprintf (item [1], 16, "%s", laws);
  if ((pw = getpwuid (sbuf.st_uid)) != 0)
    snprintf (item [2], 32, "%s", pw -> pw_name);
  else
    snprintf (item [2], 32, "%d", sbuf.st_uid);
  if ((gw = getgrgid (sbuf.st_gid)) != 0)
    snprintf (item [3], 32, "%s", gw -> gr_name);
  else
    snprintf (item [3], 32, "%d", sbuf.st_gid);
  snprintf (item [4], 16, "%d", sbuf.st_size);
  loctime = localtime (&sbuf.st_ctime);
  strftime (item [5], 32, "%F %T", loctime);
  loctime = localtime (&sbuf.st_mtime);
  strftime (item [6], 32, "%F %T", loctime);
  snprintf (item [7], 255, "%s", dir -> d_name);

  return item;
}

PG_FUNCTION_INFO_V1 (ls);

Date
ls (PG_FUNCTION_ARGS)
{
  FuncCallContext *funcctx; TupleDesc        tupdesc;
  TupleTableSlot  *slot;    AttInMetadata   *attinmeta;
  UsrFctx *usrfctx; struct direct *dir;

  if (PG_ARGISNULL (0))
    elog (ERROR, "Parameter of function must be directory");

  if (SRF_IS_FIRSTCALL ())
    {

      MemoryContext  oldcontext;
      funcctx = SRF_FIRSTCALL_INIT ();
      oldcontext = MemoryContextSwitchTo (funcctx->multi_call_memory_ctx);
      
      usrfctx = (UsrFctx*) palloc (sizeof (UsrFctx));
      usrfctx -> name = PG_GETARG_CSTRING (0);  

      if ((usrfctx -> dp = opendir (usrfctx -> name)) == NULL)
	elog (ERROR, "%s cannot be opened",usrfctx ->  name);

      usrfctx -> values = (char **) palloc (8 * sizeof (char *));
      usrfctx -> values  [0] = (char*) palloc   (2 * sizeof (char));
      usrfctx -> values  [1] = (char*) palloc  (16 * sizeof (char));
      usrfctx -> values  [2] = (char*) palloc  (32 * sizeof (char));
      usrfctx -> values  [3] = (char*) palloc  (32 * sizeof (char));
      usrfctx -> values  [4] = (char*) palloc  (16 * sizeof (char));
      usrfctx -> values  [5] = (char*) palloc  (32 * sizeof (char));
      usrfctx -> values  [6] = (char*) palloc  (32 * sizeof (char));
      usrfctx -> values  [7] = (char*) palloc (255 * sizeof (char));

      funcctx -> user_fctx = (void *)usrfctx;
   
      tupdesc = CreateTemplateTupleDesc (8 , false);
      TupleDescInitEntry (tupdesc,  1, "Type",        CHAROID,    -1, 0, false);
      TupleDescInitEntry (tupdesc,  2, "Laws",        VARCHAROID, -1, 0, false);
      TupleDescInitEntry (tupdesc,  3, "User",        VARCHAROID, -1, 0, false);
      TupleDescInitEntry (tupdesc,  4, "Group",       VARCHAROID, -1, 0, false);
      TupleDescInitEntry (tupdesc,  5, "Size",        INT4OID,    -1, 0, false);
      TupleDescInitEntry (tupdesc,  6, "Created",     ABSTIMEOID, -1, 0, false);
      TupleDescInitEntry (tupdesc,  7, "Modified",    ABSTIMEOID, -1, 0, false);
      TupleDescInitEntry (tupdesc,  8, "Name",        VARCHAROID, -1, 0, false); 


      slot = TupleDescGetSlot (tupdesc); 
      funcctx -> slot = slot;

      attinmeta = TupleDescGetAttInMetadata (tupdesc);
      funcctx -> attinmeta = attinmeta;

      MemoryContextSwitchTo (oldcontext);
      
    }
  funcctx = SRF_PERCALL_SETUP ();
  usrfctx = (UsrFctx*) funcctx -> user_fctx;

  if ((dir = readdir (usrfctx -> dp)) != NULL)
    {
      Datum result;  HeapTuple tuple;
      while (dir->d_ino == 0)
	{
	  if ((dir = readdir (usrfctx -> dp)) != NULL)
	    {
	      closedir (usrfctx -> dp);
	      SRF_RETURN_DONE (funcctx);      
	    }
	}
      fill_array (usrfctx -> values, usrfctx -> name, dir);

      tuple = BuildTupleFromCStrings (funcctx -> attinmeta,
				      usrfctx -> values);
      result = TupleGetDatum (funcctx -> slot, tuple);
      SRF_RETURN_NEXT (funcctx, result);
    }
  else
    {
      closedir (usrfctx -> dp);
      SRF_RETURN_DONE (funcctx);      
    } 
}

You will register function:

CREATE OR REPLACE FUNCTION ls (cstring) RETURNS SETOF record
   AS 'ls.so', 'ls', LANGUAGE 'C';

And then you can write out content of directory.

testdb011=#select * from ls ('/home/') as 
  (type "char", laws varchar, user varchar, group varchar, 
size int4, created abstime, modified abstime, name varchar);
 type |   laws   | user     | group    |     size |         created        |    modified         | name
 ----+-----------+----------+----------+----------+------------------------+------------------------+----------
 d   | rwxr-xr-x | root     | root     |     4096 | 2002-09-03 12:04:09-04 | 2002-09-03 12:04:09-04 | .
 d   | rwxr-xr-x | root     | root     |     4096 | 2002-11-19 20:47:43-05 | 2002-11-19 20:47:43-05 | ..
 d   | rwxr-xr-x | pavel    | users    |     4096 | 2002-11-19 22:25:56-05 | 2002-11-19 22:25:56-05 | pavel
 d   | rwx------ | zdenek   | zdenek   |     4096 | 2002-09-03 08:52:54-04 | 2002-09-03 08:52:54-04 | zdenek
 d   | rwx------ | postgres | postgres |     4096 | 2002-11-19 22:26:02-05 | 2002-11-19 22:26:02-05 | postgres
(5 rows)

This function is more useful. Thanks to it we can from PL/pgSQL monitoring content of directories, by transcript of directory /tmp we can find out, when and how long is PostgreSQL initialized etc. Certainly we can write TF for import of CSV or XML files, for dismembering of MIME files, for access to IMAP, POP3 or LDAP servers.


This is translation of Czech article. You can find original version at http://www.pgsql.cz/index.php/N%C3%A1vrh_a_realizace_UDF_v_c_pro_PostgreSQL. I am sorry for possible innacuracy or grammatical mistakes.