
Microbenchmarking caching solutions

Observant readers of the PHP internals mailing list might have noticed me crying about the lack of support in ext/soap for getting associative arrays instead of stdClass, which in turn causes issues with caching SOAP replies in APC.

I am working in a setup where a backend JBoss server provides the PHP frontend with information via a SOAP API. Depending on what information I am fetching, the backend can set a caching strategy (as in, which parameters are relevant for determining the cache key) as well as an expiration time. For most services the expiration time is set to 1 hour or more, and some even go up to a week. We are expecting quite a lot more reads than writes (though our preliminary user tracking analysis is still a bit lacking to get proper numbers). So all I am looking for is a decently fast way to write into a cache, with a very fast way to read from it.
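To make the caching strategy a bit more concrete, here is a minimal sketch of how a key could be derived from only the relevant parameters, together with an expiry check. The function names (buildCacheKey, wrapEntry, isFresh), the key layout and the sample service name are all hypothetical and not taken from the actual setup.

```php
<?php
// Hypothetical helper: build a cache key from only those parameters the
// backend has declared relevant for a given service.
function buildCacheKey(string $service, array $params, array $relevantNames): string
{
    // Keep only the relevant parameters, then sort by name so that the
    // order in which arguments were passed does not change the key.
    $relevant = array_intersect_key($params, array_flip($relevantNames));
    ksort($relevant);

    return $service . ':' . md5(serialize($relevant));
}

// Hypothetical wrapper that stores the expiry timestamp next to the payload.
function wrapEntry($payload, int $ttlSeconds, int $now): array
{
    return ['expires' => $now + $ttlSeconds, 'payload' => $payload];
}

// An entry is usable as long as its expiry timestamp has not been reached.
function isFresh(array $entry, int $now): bool
{
    return $now < $entry['expires'];
}

// Usage: the session id is ignored because only 'id' is declared relevant.
$key = buildCacheKey('getUser', ['id' => 7, 'session' => 'abc'], ['id']);
$entry = wrapEntry(['name' => 'demo'], 3600, time());
```

Passing the current time in explicitly keeps the expiry logic easy to test; a real cache such as APC would normally handle the TTL itself.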

So in an effort to understand the performance implications of the various options, I wrote a micro benchmark to give me a better idea. What I found are some surprising results. So surprising that I am actually wondering if I screwed up the benchmark itself. As you can see in the code, I tested the following options:

  1. write a serialized variable with nested instances of stdClass into a file, read contents of file and unserialize
  2. write a serialized variable with nested arrays into a file, read contents of file and unserialize
  3. write a nested array via var_export into a file, for reading include the file (*)
  4. write a jsoned variable with nested instances of stdClass into a file, read contents of file and decode
  5. write a jsoned variable with nested arrays into a file, read contents of file and decode
  6. write a nested array into APC, read via APC
  7. write a serialized variable with nested instances of stdClass into APC, read via APC and unserialize
  8. write a serialized variable with nested arrays into APC, read via APC and unserialize

(*) Note that using var_export() on an object (including stdClass instances) does not produce proper PHP code you can dump into a file and then include.
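For readers who want to try something similar, the following is a minimal sketch of the read side for the file-based array variants (options 2, 3 and 5). It is not the benchmark that produced the numbers in this post: the sample data, the timeIt helper, the temp files and the iteration count are all assumptions, and the APC variants are left out because they need the extension loaded.

```php
<?php
// Run a callable a number of times and return the elapsed wall time in seconds.
function timeIt(callable $fn, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

// Assumed sample data; real SOAP replies will be larger and more deeply nested.
$data = ['id' => 42, 'tags' => ['php', 'soap'], 'owner' => ['name' => 'demo']];

$dir = sys_get_temp_dir();
$serFile = tempnam($dir, 'ser');
$jsonFile = tempnam($dir, 'jsn');
$phpFile = tempnam($dir, 'inc');

// Write each variant once up front so that only reads are timed.
file_put_contents($serFile, serialize($data));
file_put_contents($jsonFile, json_encode($data));
file_put_contents($phpFile, '<?php return ' . var_export($data, true) . ';');

$readers = [
    'serialize' => function () use ($serFile) {
        return unserialize(file_get_contents($serFile));
    },
    'json' => function () use ($jsonFile) {
        return json_decode(file_get_contents($jsonFile), true);
    },
    'var_export' => function () use ($phpFile) {
        return include $phpFile;
    },
];

$results = [];
foreach ($readers as $name => $reader) {
    $results[$name] = [
        'intact'  => $reader() === $data,      // the data survives the round trip
        'seconds' => timeIt($reader, 1000),    // time for 1000 reads
    ];
    printf("%-12s %.6f s\n", $name, $results[$name]['seconds']);
}

// Remove the temp files again.
array_map('unlink', [$serFile, $jsonFile, $phpFile]);
```

Absolute timings from a sketch like this depend heavily on the machine, the PHP version and whether a byte code cache is active, so only the relative order is worth looking at.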

Now here are the results (all writes were performed 1000 times to give a better idea of the importance of write versus read performance):

Option                                Write        Read
1. file + serialized objects          14591     7986593
2. file + serialized arrays           20503     6717205
3. file + arrays via var_export       26798    14187598
4. file + jsoned objects              27704     9648394
5. file + jsoned arrays               16713     9716892
6. apc as is                         415110     2753591
7. apc + serialized objects           22196     3677606
8. apc + serialized arrays            16188     2719998
Now there are a number of surprises here. First, writing a nested array into APC without prior serialization takes quite some time. Less surprising is that reading this data back is then among the fastest options, on par with reading serialized arrays from APC. I also did not expect the read performance for the include to be this slow. This seems to imply that if you are using PHP arrays for configuration, it's faster to store them serialized in a file.

At any rate, the solution that best fits my needs atm is writing serialized nested instances of stdClass into APC, since I have no way of easily casting all the stdClass instances returned by the various SOAP services into arrays.
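For reference, a generic recursive conversion pass from nested stdClass instances to arrays would look roughly like the sketch below. It is an illustration only, not something the benchmark measured, and the function name toArray is made up for this example. Note that it adds a full traversal of every reply at write time.

```php
<?php
// Recursively convert stdClass instances (and arrays containing them)
// into plain nested arrays. Objects of any other class are left untouched.
function toArray($value)
{
    if ($value instanceof stdClass) {
        $value = get_object_vars($value);
    }
    if (is_array($value)) {
        foreach ($value as $key => $item) {
            $value[$key] = toArray($item);
        }
    }
    return $value;
}

// Usage with a small hand-built structure standing in for a SOAP reply.
$reply = new stdClass;
$reply->id = 1;
$reply->owner = new stdClass;
$reply->owner->names = ['a', 'b'];

$asArray = toArray($reply);
```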

It is also clear that if ext/soap could get an option to return arrays, things would be faster still, but Dmitry is not convinced that this is a good idea. The best solution, though, would be if APC could treat stdClass instances as glorified associative arrays. Gopal hinted on IRC that he might be inclined to do this, though he is currently more concerned about QA'ing APC than adding features like this.

UPDATE 11/06/07
I also added JSON. As you can see, it's slower than PHP's native serialize, but still not too shabby for something a lot more portable. I have updated the entire benchmark result listing. Generally I should note that I did not take down other services on the machine while running these tests, but while the numbers jump around a bit, the order of what is fast and slow remains unchanged. They are just a bit closer now and then.

Comments



Re: Microbenchmarking caching solutions

Hmm, I just realized that I probably need to revisit the var_export() solution to ensure that the generated file is properly picked up by the byte code cache. But I am actually pretty certain that it should have ended up in the byte code cache on the first read iteration.

BTW: I should also explain that in the micro benchmark, I wrote once to all variants before running the benchmarks, because I did not want to have the write performance screwed up by file or cache entry creation, since usually there will be a prior version that will be updated.

Re: Microbenchmarking caching solutions

It doesn't surprise me that include is slow. I didn't test with any cache engaged, but unserialize is faster than json_decode, which is in turn faster than include, for reading a bunch of nested arrays in 5.1.x.

I'm pretty sure unserialize will always win because it stores string-lengths, so that it doesn't have to parse every character. Although that may change for PHP6 when you need to unescape the Unicode.
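The length prefixes are easy to see by putting both encodings of the same small array side by side. This is just an illustration of the two formats; the sample value is made up.

```php
<?php
$value = ['greeting' => 'hello'];

// PHP's native format prefixes every string with its byte length
// (s:8 and s:5), so the reader can jump straight past the contents.
$serialized = serialize($value);   // a:1:{s:8:"greeting";s:5:"hello";}

// JSON only marks strings with quotes, so the decoder has to scan each
// character to find the closing quote and handle escapes.
$json = json_encode($value);       // {"greeting":"hello"}
```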

Re: Microbenchmarking caching solutions

Have you tried sending the PHP serialized format from the Java app as the response to the SOAP request instead of a structured response? In Java, assembling a PHP-serialized string should be as fast as assembling an XML SOAP response. Then the return value of your SOAP request would only be a string that you can directly store into whatever caching solution you want.

Gaylord

Re: Microbenchmarking caching solutions

Well, not exactly, but it might be worth investigating a bit, since this way we might be able to get arrays instead of objects out of the SOAP replies. The serialization is not the expensive or difficult part, since we do far fewer writes than reads. The problem is rather finding some way to get the data out quickly. But if we only get arrays, we could unserialize the reply and stick the nested arrays into APC, which is the fastest solution for reads.

Re: Microbenchmarking caching solutions

Well I just talked to the Java guys and they said that it would require hooking into the AXIS framework in order to change the serialization. Instead of PHP serialization we would probably then explore JSON, so I might try to add JSON to my little micro benchmark.

Re: Microbenchmarking caching solutions

I'm not sure I'm following what you are trying to do but is not using the ext/soap classmap support a way to get out of the stdClass stuff you get back?

--Tony

Re: Microbenchmarking caching solutions

Well, with the classmap I can force it to use other classes. However, I do not want classes, I want arrays. On top of that, I am not sure if I can force everything to be cast to a single specific class that would then implement some magic to get me arrays (not sure what that magic would be at any rate).