ramblings on PHP, SQL, the web, politics, ultimate frisbee and what else is on in my life
back

To serialize or to not serialize?

In my own framework I have decided that there is plenty of structured data that I will never query on that I will just stick into the database as a serialized array. To make life easier I introduced a "serialize" datatype into the MDB2 datatype module of the WebBuilder2 framework. That way MDB2 will handle the (de-)serialization in your DML or SELECT SQL statements.


<?php
$array = array('foo');
$mdb2->quote($array); // returns a serialized string because no datatype was giving
$mdb2->quote($array, 'text') // would probably give a notice due to an implict array to string cast done by php
$mdb2->quote($array, 'serialize') // returns a serialized string as requested

$result = $mdb2->query('SELECT someint, somearray FROM foo', array('integer', 'serialize'));
$row = $result->fetchRow() // would return an array with an integer and whatever was serialized into the somearray column
?>

Now the other day I modified the auto type handling in MDB2's quote() method to automatically serialize array's if no type is explicitly passed to the quote() method. Matthias then mailed me, why I do not just add the serialize datatype from my framework to the official MDB2 version.

I send out a question to pear-dev@ and it was not unanimously well received. So I sat down and pondered a more elegant approach. Actually there is already one approach implemented in MDB2 since ages. As I mentioned before the datatype handling is actually a separate module. What this allows is that someone can simply wrap around that module via php5's overloading and thereby easily add new functionality. The following is an untested example:


<?php
$mdb2->loadModule('MyDatatypeModule', 'datatype', false);

class MyDatatypeModule extends MDB2_Module
{
  var $datatype;

  function MyDatatypeModule()
  {
    $db =& $this->getDBInstance();
    if (PEAR::isError($db)) {
        return $db;
    }
    $this->datatype = $mdb2->loadModule('Datatype', null, true);
  }

  function __call($method, $params)
  {
    // implement any custom magic here ..
    return call_user_func_array(array(&$this->datatype, $method), $params);
  }
} 
?>

I also explored and already implemented another solution. I recently got a bug report for PostgreSQL where a user was using one of the custom datatypes supported by PostgreSQL and ran into issues when preparing a statement with a placeholder on this parameter. I therefore added a new option called "datatype_map" to handle these cases. I decided to see if I could extend this solution some more. The following is a working solution based on the current CVS code.


<?php
function serialize_callback(&$db, $method, $parameters)
{
  switch($method) {
    case 'quote':
      $value = serialize($parameters['value']);
      return $db->datatype->quote($value, 'text');
    case 'convertResult':
      $value = $db->datatype->convertResult($parameters['value'], 'text');
      return @unserialize($value);
    case 'getDeclaration':
      return $db->datatype->getDeclaration('text', $parameters['name'], $parameters['field']);
    case 'mapPrepareDatatype':
      return 'text';
  }
}

$mdb2->setOption('datatype_map', array('serialize' => 'serialize', 'telephone' => 'text'));
$mdb2->setOption('datatype_map_callback', array('serialize' => 'serialize_callback'));

$location = $mdb2->quote(array('foo' => 'bar'), 'serialize');
$telephone = $mdb2->quote('(030) 23615467', 'telephone');
$query = "INSERT INTO branches VALUES ($location, $telephone)";
var_dump($query); // 'INSERT INTO branches VALUES ('a:1:{s:3:\"foo\";s:3:\"bar\";}', '(030) 23615467')' (length=80)
$result = $mdb2->exec($query);
$query = "SELECT location, telephone FROM branches";
$result = $mdb2->queryAll($query, array('serialize', 'telephone'), MDB2_FETCHMODE_ASSOC);
var_dump($result);
/*
array
  0 =>
    array
      'location' =>
        array
          'foo' => 'bar' (length=3)
      'telephone' => '(030) 23615467' (length=14)
*/
?>

I know that a datatype "telephone" is probably not the coolest thing you can think of. But the idea is that you may still want to maybe use that "telephone" type inside some validation handling. However something like "email", "url" might be more exciting when this is interfaced with your form generators validation rules. Also note that with both solutions you will be able to use these custom datatypes all the way down to your MDB2_Schema based xml database schema files. So you could specify your data structure in a single place and generate your database schema, forms and validation rules from that single definition.

So I am just wondering what people think of these solutions? Should I just add the "serialize" datatype as a new native datatype? Is the ability to overload the datatype module sufficient? Or should I keep the extended "datatype_map" functionality in, even though it will add an additional if statement in a few places?

Comments



Re: To serialize or to not serialize?

Hey, this seems to be a good solution!

This allows to extend datatypes as you need. I think adding the serialize datatype is no longer 'subject of discussion', as it is no 'real database type'. If people need it (or any other special/magic type) they can simply use the nice datatype mapping.

I also like the idea of integrating validation rules into custom datatypes - but how could this "error" be checked if my validation rule says 'invalid'?

Re: To serialize or to not serialize?

Validation itself is beyond the scope of MDB2. However I am sure you can put something together using Quickforms or Validate PEAR packages that will do the necessary work for you.

Re: To serialize or to not serialize?

No, sorry.. that was not the question - believe me, I know how to validate form data, even without QF ;-)

Imagine, if you validate $parameters['value'] within your callback, how will you get the result of the validation?

Ok, I guess I have to care for it by myself in the callbacks and create a check-function, e.g.

<?php
$quoted_email = $mdb2->quote($email, 'email');
if (my_validation_has_errors()) {
echo "invalid form data";
} else {
$mdb2->exec($query);
}

?>

?

Re: To serialize or to not serialize?

What I meant is that validation does not fit anywhere in MDB2, that includes this callback mechanism. However what you would do is something like the following generated from an MDB2_Schema xml schema file:


<?php
class TableFooBar
{
  var $fields = array(
    'foo' => 'integer'
    'bar' => 'telephone'
  )
}
?>

Then you can pass "telephone" to MDB2 and it will be handled like a "text" datatype after having validated the field using Validate QF "telephone" validation rule. So again the beauty is that you say column "bar" is of type "telephone" and every layer will know what to do automatically. MDB2 and MDB2_Schema will handle it like a "text" column and the validation and form handlers will handle it with more specific "telephone" rules. So the form handler might automatically generate 3 input columns. One for the international and one for the city code and finally one for the rest of the phone number. The validation handler will check that its a valid international and city code etc.

Re: To serialize or to not serialize?

Acknowledged - where is the "buy now button"? ;-)

Re: To serialize or to not serialize?

I wonder if this is possible / useful / part of datatype mapping !?


<?php
$address = array(
  'streetname' => 'MyStreet',
  'housenumber' => '1',
  'pcode' => '12345',
  'city' => 'Berlin'
);
$quoted_address = $mdb2->quote($address, 'address');
$sql_address = $mdb2->buildSqlInsert($address, 'address');

$query = 'INSERT INTO my_table (' . $sql_address . ') VALUES ' . $quoted_address;

// ...
?>

Or maybe $address could be an object with a given interface for attribute -> field mapping !?

;-)

Re: To serialize or to not serialize?

That is the job of an entirely separate layer implementing the active record pattern or something like that. We already had a proposal for that in PEAR, but it was removed to due IP issues (the developer did not have the right to release code as open source).

Re: To serialize or to not serialize?

Hi. Function serialize_callback from third sample are erroneous in some situations.

For example, when we use mysqli driver and prepare()+execute() with placeholders and parameters, we have double escaped serialized value in DB. Mysql executes queries like this:

/*Prepare [1]*/ insert into branches values(?,?);
/*Execute [1]*/ insert into branches
 values('\'a:1:{s:3:\\\"foo\\\";s:3:\\\"bar\\\";}\'','333-33-33')

Fixed version of serialize_callback() :


<?php
function serialize_callback(&$db, $method, $parameters)
{
  switch($method) {
    case 'quote':
      $value = serialize($parameters['value']);
      //return $db->datatype->quote($value, 'text'); -- lost 2 parameters
      return $db->datatype->quote($value, 'text', $parameters['quote'], $parameters['escape_wildcards']);
    case 'convertResult':
      //$value = $db->datatype->convertResult($parameters['value'], 'text');  -- lost 1 parameter
      $value = $db->datatype->convertResult($parameters['value'], 'text', $parameters['rtrim']);
      return @unserialize($value);
    case 'getDeclaration':
      return $db->datatype->getDeclaration('text', $parameters['name'], $parameters['field']);
    case 'mapPrepareDatatype':
        //return 'text'; -- error in mysqli_stmt_bind_param(), needs 's', but not 'text'
        return $db->datatype->mapPrepareDatatype('text');
  }
}
?>

Re: To serialize or to not serialize?

And (for completeness) we need cases for 'getValidTypes' and 'compareDefinition' calls in serialize_callback().

For example:


<?php
    //........
    case 'compareDefinition':
        $parameters['current']['type']='text';
        if($parameters['previous']['type']=='serialize') $parameters['previous']['type']='text';
        return $db->datatype->compareDefinition($parameters['current'], $parameters['previous']);
    case 'getValidTypes':
        return $db->datatype->valid_default_values['text'];
    //.......
?>