PL/Perlu - Untrusted Perl (en)

Z PostgreSQL
Verze z 16. 11. 2010, 21:27, kterou vytvořil imported>Pavel
(rozdíl) ← Starší verze | zobrazit aktuální verzi (rozdíl) | Novější verze → (rozdíl)
Skočit na navigaci Skočit na vyhledávání

Translated by Barbora Řeháková

The question is if the database should have the access to the external sources, access to the files or to the application. Why not? From this moment PostgreSQL stops to be RDBMS and begins to be the application server for me. What is it all about? In other words: I transfer a part of code from 'bussineslevel' (which is non-interactive) down to the database. The more simple is the bussineslevel the more complicated is db level. When I move the accent to the first or second level, I do not change work difficulty of application development. It is necessary to write an amount of code solving a given problem and this amount is not usually depend on application architecture.

Well, what do I mean when I talk about application server? Application server (abb. AS) is a combination of sw and hw instruments which enable an activationand organization of information services. AS arranges IO by unit process between services available from outside, enables bilateral communication between informatin services, unified parameterization, automation of startup, planning, control etc. When we think about this definition, we find out, this definition includes everything from IT and it is not surprising. Using AS for realization user applications is not the newest idea. Using AS in this way means that I do not create one piece but a set of applications (services), which will uniformly communicate between each other and with users. Application container is for me a synonym of application server. Here are some examples: academical project Oberon, EJB, Python-based Zope and in one way also Apache and PHP.

If I accept the idea that 'the most important are data and data stream', I can accept the idea of AS at the level of database. Anyone who saw what can Microsoft do with Data transformation services will agree with me. We cannot talk about AS, when we mention DTS: MsSQL has not the acces to the external sources (it has to use vbscript and COM), the definition of the services is stored in database, we can design DTS in administrative interface, database cares about running and controlling DTS. Microsoft used to enforce the idea that everything is a source of data – it can be handled with a lot of data like a database via ADODB, e.g.it existed both COM library and ADODB controller, so the mails could be treated via application with ADODB interface etc. An analogy does not exist in Linux. SPI in PL/Perl can supply it (in PostgreSQL). I use Perl like a stick (analogy to vbscript) and join CPAN modules and transform the input and output of modules.

How do I create AS from PostgreSQL? It is easy. I identify stored procedures as single aplication (services). Arguments of the function program running service. Output is in a form of table, square or scalar. Here are some reasons for transfering function from aplication to stored procedures: I can activate the service everywhere I can reach to database. I can set up access privileges. In addition the compatibility of data types between databases and basis is not a problem nowadays – I can not care of forms of data. The database layer provides me code rewrite, right scheme and adaptation to national habits. I can use the language SQL for filtration and formate of outputs. The disadvantages are: more complicated debugging, the need of code rewrite of output data for SQL, bigger load of a server, lower transfer load to the clients or to the server with bussineslevel and the bigger dependence on data basis.

The combination of Perl and PostgreSQL has also the restrictions, which are caused by immaturity of Pl/Perl in PostgreSQL. The interface has not developed for a long time, but they started to work on it in the latest version 8.0. The data is transfer between Perl and PosgreSQL by a single application and also in the case of SFR function. If SETOF record comes back, the function in Perl makes the square of hash which contains the whole finite aggregate of records and then this square is passed to PostgreSQL (PL/plPGSQL passed values to single records). This solution is suitable for smaller tables (up to 10 000 lines). We have to use the function of return_next for bigger tables.

CREATE OR REPLACE FUNCTION perl_set()
RETURNS SETOF testrowperl AS $$
    return_next({ f1 => 1, f2 => 'Hello', f3 => 'World' });
    return_next({ f1 => 2, f2 => 'Hello', f3 => 'PostgreSQL' });
    return_next({ f1 => 3, f2 => 'Hello', f3 => 'PL/Perl' });
    return undef;
$$ LANGUAGE plperl

The services are the fundamentals of any application server. The described and inplemented services do not have the unified character. I did not design them to solve certain task, but to try the possibilities of PL/Perl or to simplify my life and for server administration. I will be glad if I can inspire by your suggestions. CPAN is a source of knowledge for me.

  1. accessing the list of users
  2. access to any files (e.g. a list of processes)
  3. processing of log
  4. achieving of uuid
  5. universal notification about the change by email
  6. access to information about a process
  7. access to items about the process PostgreSQL
  8. finding items in external base (SOAP, Amazon)
  9. joining a table from another database (mysql via DBI)
  10. acquiring information from external base (http, HTML)
  11. universal generating XML following the enquiry
  12. universal import XML

I wrote the majority of services to try and demonstrate certain kind of communication in Perl. Its profit is controversal, if there is any. I consider this things useful: accessing the list of users, access to any files, universal notification (development), joining a table from another database (slow, but good updating), generating XML.

You have certainly noticed, that I like the architecture of AS when I speak about more complicated projects. It does not matter, if the technology of Java (Tomcat) or SQL (PostgreSQL, Firebird) is used.On the other side I understand only PostgreSQL, so I use this RDB. In addition, I do not understand Java. I think that writing in Tomcat is more complicated – of course, the posibilities in Java are different. The importatnt thing is the change in design of application, let’s say its decomposition to interactive and non-interactive blocks and use the rule of maximum re-use in imlementation of non-interactive part.

The services, which are from parsing of file

These are the easiest sort of sevices. Every single line is processed after opening the file. The output can be both the table of records (log, passwd, cpuinfo, status) or the value (uuid). The input can be the real file (log, password), the tube (the parameter ‘ps |‘) or the virtual files from /proc/ name space (status, uuid, cpuinfo). There is something interesting: the way of generating of output square of SETOF functions. The safty/unsafty of these services is also worth mentionig. It is necessary to use untrusted perl for access to files frame. But only superuser can use this.

CREATE OR REPLACE FUNCTION ext.rtab(varchar, varchar) RETURNS SETOF RECORD AS $$
   open FILE, $_[0]; my @cntn = ();
   while (<FILE>) { chop; 
      my @items = map {s/^\s+|\s+$//g; $_;}  split ($_[1]);
      my %iitems; @iitems{map 'col'.$_, 0..$#items} = @items;
      push @cntn, \%iitems; }
   close FILE;
   return \@cntn; 
$$ LANGUAGE plperlu;

CREATE VIEW ext.passwd AS SELECT col0 AS account, col1 AS passwd, col2 AS uid,
       col3 AS gid, col4 AS description, col5 AS home,
       col6 AS shell FROM ext.rtab('/etc/passwd',':') 
  AS (col0  varchar, col1 varchar, col2 integer,
      col3 integer, col4 varchar, col5 varchar,
      col6 varchar);

The function runs on the server, where the outcome also is. Thats why we have to také data from perl in the correct coding. The statement SET client_encoding is worthless for this procedure. E.g., we have to enlarge the procedure with coding of loaded data, when we have the data in latin2 and the database in UTF-8. The third argument is coding input file.

CREATE OR REPLACE FUNCTION rtab(varchar, varchar, varchar) RETURNS SETOF RECORD AS $$
   use Encode 'from_to';
   my $rv = spi_exec_query("SELECT current_setting('server_encoding')",1);
   my $encoding = lc($rv->{rows}[0]->{current_setting});
   open FILE,$_[0]; my @cntn = ();
   while (<FILE>) { chop;
      from_to($_, $_[2], $encoding);  
      my @items = map {s/^\s+|\s+$//g; $_;}  split ($_[1]);
      my %iitems; @iitems{map 'col'.$_, 0..$#items} = @items;
      push @cntn, \%iitems; }
   close FILE;
   return \@cntn;
$$ LANGUAGE plperlu;

Joining a table from another database

It is uneffective, but on the other hand it is also useful. The users of PostgreSQL have not an alternative possibility to access to the another non-postgresql database. It is easy to connect to Oracle, MsSQL, mySQL, Informix, Firebirtd (by changing DNS) thanks to perl. This can be the end of loops like bump and import, if I want to get data from non pg rdbms to PostgreSQL. The DBI support ADO, CSV, DB2, Informix, Oracle, ODBC, Sybase, Xbase (ADO only with win32), JDBC, LDAP, MySQL, MaxDB, SQLLite, Solid. The disadvantage is the speed, it does not fit for big consequent aggregate. It needs to do the work flow.

CREATE OR REPLACE FUNCTION ext.rmysql(varchar, varchar, varchar,
  varchar) RETURNS SETOF RECORD AS $$
   use DBI;
   my $dbh = DBI->connect('dbi:mysql:'.$_[0],$_[1],$_[2], 
      { RaiseError => 1, AutoCommit = > });
   $dbh->do("set character_set_results='latin2'");
   my $sth = $dbh->prepare($_[3]);
   $sth->execute(); my $myref; 
   while ($dat = $sth->fetchrow_hashref) {push @$myref, $dat; }
   $sth->finish(); $dbh->disconnect();
   return $myref;
$$ LANGUAGE plperlu;

CREATE VIEW ext.names AS SELECT * FROM ext.rmysql('instalation:dirty','dbuser','****',
  'SELECT ixcontact, name, surname, title, email, tel1 FROM contact') AS
  (ixkontakt integer, name varchar, surname varchar, 
   title varchar, email varchar, tel1 varchar);

Finding data in external base

I wrote two services. The first one can obtain entries about the domain from web NIC, the second one find the publications of Amazon following the key words. The result of these functions is the chart. The first service is quite complicated. To obtain the data from classical html data form is more complicated then a SOAP calling. The library WWW::Mechanize is used to simplification of processing the data form – for automatization of filling the web forms – testing self forms, automatization of others forms. The library HTML::TreeBuilder is used for conversion HTML chart to 2D square. SOAP is optically longer, more arguments are transfered. I would use the module WWW::Mechanize for external platforms (e.g. trigger, filling the register form).

CREATE OR REPLACE FUNCTION ext.amazon_search(varchar) RETURNS SETOF ext.amazon_lst AS $$
   my $dev_token='insert developer token'; my $af_tag='insert associate tag';
   my $amazon_wdsl = "http://soap.amazon.com/schemas2/AmazonWebServices.wsdl";
   use strict; use SOAP::Lite; my @listbooks = (); 

   my $amazon_search = SOAP::Lite->service("$amazon_wdsl");
   my $results = $amazon_search ->

      KeywordSearchRequest(SOAP::Data->name("KeywordSearchRequest")
         ->type("KeywordRequest")
            ->value(\SOAP::Data->value(
                SOAP::Data->name("keyword" => $_[0]), SOAP::Data->name("page" => "1"), 
                SOAP::Data->name("mode" => "books"), SOAP::Data->name("tag" => $af_tag), 
                SOAP::Data->name("type" => "lite"), SOAP::Data->name("devtag" => $dev_token),))
   );

   foreach my $result (@{$results->{Details}}){
      $result->{UsedPrice} =~ s/\$//g;
      push @listbooks, {
         productname => $result->{ProductName}|| "no title", 
         authors => "{".join (', ', @{$result->{Authors}}) . "}",
         price => $result->{UsedPrice},asin => $result->{Asin}}}
   return \@listbooks;
$$ LANGUAGE plperlu;

SELECT * FROM ext.amazon_search('xml perl');

plperlu=# SELECT productname, authors FROM ext.amazon_search('postgresql');
                      productname                      |                    authors                     

-------------------------------------------------------+------------------------------------------------
 Beginning Databases with PostgreSQL                   | {"Richard Stones","Neil Matthew"}
 PostgreSQL                                            | {"Korry Douglas","Susan Douglas"}
 PHP and PostgreSQL Advanced Web Programming           | {"Ewald Geschwinde","Hans-Juergen Schoenig"}
 PostgreSQL Developer's Handbook (Developer's Library) | {"Ewald Geschwinde","Hans-JĂźergen SchĂśenig"}
 PostgreSQL Essential Reference                        | {"Barry  Stinson","Barry Stinson"}
 PostgreSQL: Introduction and Concepts                 | {"Bruce Momjian"}
 PostgreSQL                                            | {"Jeff Perkins"}
(7 rows)

plperlu=# SELECT * FROM ext.read_nic('jetspeed.cz');
-[ RECORD 1 ]+-----------------------------------------------------------------
domain       | jetspeed.cz
definition   |
situation    | The domain is booked us and is in CZ
glue         | {"A - ns.inway.cz","A - ns.inway.net"}
key          |
registered   | 2004-03-09
expiration   | 2005-03-09
registrator  | {{REG-GENERAL-REGISTRY,"GENERAL REGISTRY, s.r.o.",2004-11-13, },
             |  {REG-INWAY,"InWay, a.s.",2004-03-09,"2004-11-13 10:25"}}
owner        | {{IWS-PAVEL_STEHULE,"Ing. Pavel Stehule",2004-03-09, }}
techspr      | {{INWAY-TECH,"InWay, a. s.",2004-03-09, }}

Universal notification about the change of data via email

The changed items are in xml forms and are sent in the enclosure of notification email. This is the sample of universal trigger in PL/Perl, which can work with random chart.

if ($_TD->{event} eq 'INSERT') {
   foreach $key (keys %{$_TD->{new}}) {
      my $knode = $doc->createElement($key);
      $row->appendChild($knode);
      if (defined($_TD->{new}{$key})) {
        $knode->appendChild($doc->createTextNode($_TD->{new}{$key}));
      } else {
         $knode->setAttribute('null','true');
      }
   }
}

and the example of compilation of email and its sending:

$msg = MIME::Lite->new (From => $_TD->{args}[0], To => $_TD->{args}[1],
   Subject => $_TD->{args}[2], Type =>'multipart/mixed');
$msg->attach (Encoding => 'binary', Type => 'TEXT; charset=iso-8859-2',
   Data => "In database: $dbname is in $activated change of data.\n" .
           "The detail description is in the enclosure in the XML form");
$msg->attach(Type => "text/xml; charset=$encoding", Data => $doc->toString,
   Disposition => 'attachment', Filename => 'change.xml');   

$msg->send_by_sendmail();

Generating, processing of XML

XML is one of the format which is supported by every database. PostgreSQL fails to satisfy the specification SQL/XML (only DB2 supports it fully) and probably it will not support it (a sort of support is in accessories (contrib) – equalizing of libXML2). Perl has more possibilities and it is natural to use it to generate XML. If the database can organize XML, the application level can be simplified. The making of charts (this is the major target of db application) confines itself on sending SQL directions and following transformaton XML to charts via XSLT (easier option on the client side, more complicated on the server side). The current CPAN XML::generator::DBI does not support styles, it is necessary to use patch:

55a56,59
>     if (defined($proxy->{Stylesheet})) {
>        $proxy->SUPER::processing_instruction( {Target => 'xml-stylesheet',

>      Data => "type='text/xsl' href='$proxy->{Stylesheet}'"});
>     }

CREATE OR REPLACE FUNCTION ext.xml(varchar, varchar) RETURNS text AS $$
   use DBI;
   use XML::Generator::DBI;
   use XML::Handler::YAWriter;
   my $rv = spi_exec_query("SELECT current_setting('client_encoding'), " .
                           "current_database()", 1);
   my $encoding = lc($rv->{rows}[0]->{current_setting});
   my $cdb = $rv->{rows}[0]->{current_database};
   my $dbh = DBI->connect ("DBI:Pg:dbname=$cdb", "", "", { RaiseError => 1, PrintError => 0});
   my $ya = XML::Handler::YAWriter->new (Encoding=>$encoding, 
        Pretty => {PrettyWhiteNewline => 1, PrettyWhiteIndent => 1,});
   my $gen = XML::Generator::DBI->new (Handler => $ya, dbh => $dbh,
                                       ByColumnName => 1, ShowColumns => 1,
                                       Stylesheet => $_[1],);
   $gen->execute ($_[0]); $dbh->disconnect ();
   return join('',@{$ya->{Strings}});
$$ LANGUAGE plperlu;
SELECT ext.xml('SELECT * FROM ext.jmena LIMIT 10','x01.xsl');

PHP script:

require("common.inc");
header('Content-type: text/xml');
header('');
echo getSkalar("SELECT ext.xml('SELECT * FROM ext.cpu','x01.xsl')");
pg_close($cnh);

x01.xsl is universal XSL script, this means for any xml generating by function ext.xml. The result of transformation can be reviewed on screenshot.

<?xml version="1.0" encoding="UTF-8"?>

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<head>
<style type="text/css">
BODY      {font-family:Arial,sans-serif; font-size:8pt; font-weight:100; margin-top:0; margin-left:6; margin-right:0}
TD        {font-family:Arial,sans-serif; font-size:8pt; font-weight:100;}
TH        {font-family:Arial,sans-serif; font-size:12px; font-weight:bold; text-align:left}
.r1      {background-color:#E9F2F2; cursor:hand; xline-height:17px;}
.r0      {background-color:#C9E5E7; cursor:hand; xline-height:17px;}
.colhead {color:white; padding-right:0; cursor:hand;
          background-color:#003366; height:12; 
          text-align:left; font-weight:bold; border-top:1px solid white; border-right:1px solid white;}

</style>
</head>
<body>
<table cellspacing="1" cellpadding="2">
  <tr class="colhead">
  <xsl:for-each select="database/select/columns/column/name">
    <th><xsl:value-of select="text()"/></th>

  </xsl:for-each>
  </tr>
  <xsl:for-each select="database/select/row">
    <tr class="r{position() mod 2}">
        <xsl:for-each select="*">

      <td><xsl:value-of select="text()"/></td>
        </xsl:for-each>
    </tr>

  </xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>

Contrariwise, XML document can be one value (the structured data are the content), e.g. record of SOAP communication, or the chart. We safe our time in both cases. Validation of structure the values is reppraised to DTD (or XML scheme) and only loading of values is being solved. The XML Parser supports only coding like iso-8859-2 or windows-1250, it disallows the synonyms of terms. If we want to get the document with different coding, it is necessary to create new map – statements make_encmap and compile_encoding, or to use another parser.

CREATE OR REPLACE FUNCTION ext.rxml(varchar) RETURNS SETOF RECORD AS $$
   use XML::Parser; use Encode;
   my %rec; my @tref = ();
   my $parser = new XML::Parser (
      Handlers => {
         Start => sub {
        my ($p, $tag) = @_; 
            if ($tag eq "row" && $p->depth==2) { %rec = ();}},
         End   => sub {
            my ($p, $tag) = @_;
            if ($tag eq 'row') {push @tref, +{ %rec }; }},
         Char  => sub {
            my ($p, $data) = @_; 
            my $tag = $p->current_element ();
            if ($p->depth == 4) {
               chomp $data; $data =~ s/\^s+|\s+$//g;
               $rec{$tag} .= encode("iso-8859-2", $data) if $data ne "";}}, 
      });
   $parser->parsefile ($_[0]);
   return \@tref;
$$ LANGUAGE plperlu;

CREATE VIEW ext.xmlj AS SELECT * FROM ext.rxml('/tmp/names.xml') AS (
  ixcontact integer, name varchar, surname varchar, 
  title varchar, email varchar, tel1 varchar);