ramblings on PHP, SQL, the web, politics, ultimate frisbee and what else is on in my life

Short HipHop blurp

Well, we have all read the announcement and chatted, twittered or whatever about it. I agree that it's cool this is being open sourced. I do fear however that this means there is now one less company looking after APC (mainly leaving just Yahoo then?). For now I think this is mainly interesting for "researchers" willing to work on improving the compatibility and, more importantly, the workflow around HipHop. I do not expect that anyone besides FB can get a meaningful return on investment from making this their business anytime soon (unless they can sell people on the hype).

Overall I am not so impressed with the numbers. Even a 50% speed gain across the board .. what does that really mean? Let's say Facebook spends one third of the request (and I expect this to be even lower) on network and disk I/O. If the whole request now takes half the time and the I/O part is unchanged, the PHP part went from two thirds of the original request time down to one sixth, which means they managed a performance increase by a factor of 4 in the pure PHP processing parts. Note that I talked to some of the guys at the presentation and they said that the noted 50% speed up was not a result of switching the webserver (the binary generated by way of the C++ transformation includes a multi threaded webserver).
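To make that arithmetic explicit, here is a minimal sketch in Python. It assumes that "50% speed gain" means the whole request takes half as long and that the I/O time is untouched; the function name and the one-third figure are only illustrative, not measured numbers.

```python
from fractions import Fraction


def php_speedup(io_share, overall_time_ratio):
    """Speedup of the non-I/O (PHP) part implied by an overall time reduction.

    io_share: fraction of the original request time spent on I/O (assumed unchanged).
    overall_time_ratio: new request time divided by the old request time.
    """
    php_before = 1 - io_share
    php_after = overall_time_ratio - io_share
    if php_after <= 0:
        raise ValueError("I/O alone already exceeds the new request time")
    return php_before / php_after


# One third I/O, whole request takes half as long.
print(php_speedup(Fraction(1, 3), Fraction(1, 2)))  # 4
```

The smaller the I/O share really is, the closer the implied PHP speedup gets to plain 2x, so the factor of 4 depends entirely on the one-third guess.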

This is of course a nice speedup, but wouldn't we expect at least a 2 fold .. or more increase if we could transform our PHP code into a C extension? This approach seems a lot more feasible for the masses. We could just drop in an ext/symfony or ext/ZF and thanks to autoloading magic our setup would continue to run unchanged. While it might not get that 4 fold speed up .. I would be willing to wager that it would get a 2-3 fold speed increase in the pure PHP processing.
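For a rough feel of what a 2-3 fold gain in the PHP part would mean for the whole request, the same back-of-the-envelope model can be run forwards. This is only a sketch: it reuses the one-third I/O share guessed earlier in the post and assumes the I/O time does not change; the function name is made up for illustration.

```python
from fractions import Fraction


def overall_speedup(io_share, php_speedup):
    """Whole-request speedup when only the non-I/O part gets faster (Amdahl's law)."""
    new_time = io_share + (1 - io_share) / php_speedup
    return 1 / new_time


for s in (2, 3, 4):
    print(s, float(overall_speedup(Fraction(1, 3), s)))
```

Under these assumptions a 2-3 fold gain in PHP processing makes the whole request roughly 1.5 to 1.8 times faster, compared to 2 times for the 4 fold case.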

So again I welcome any company open sourcing stuff that solves real world problems, especially if they are also willing to support the community in understanding their solution. But to me this solution will only matter for the ultra large companies where every single percent of saved CPU cycles matters. For the rest of us I think we need to rely on APC for a bit longer. But maybe with this translator out there, other people will start working on the much more feasible approach of writing a PHP code to C extension translator.

Update: Belorussian translation of this article


Re: Short HipHop blurp

What is this? http://www.phpcompiler.org/
Isn't that a compiler that can transform PHP code to a C extension?

Btw. I also think that APC is still important as it is much easier to set up. So.. I hope that APC development will not stop because of HipHop..

Re: Short HipHop blurp

Yeah, something along those lines. I mean, even if we can afford the CPU and energy .. don't we owe it to the earth to conserve energy where we can? But of course we still have to choose our battles in this capitalistic society .. so either write efficient code and use efficient deployment tools .. or pay your way out by getting a hoster that uses solar energy :)

Re: Short HipHop blurp

It seems the most important enhancement was not noticed. It is that HipHop generates a Web server executable that is multi-threaded.

This means that it will handle all the HTTP requests in a multi-threaded fashion. There will be only one memory pool shared by all the simultaneous requests.

This is a great victory when compared to the Apache pre-forked model because it will save a lot of RAM. Let me explain.

In the Apache pre-forked model, each process has its own memory pool. If a process has a memory usage spike, the allocated memory is not returned to the OS until the process is killed. After a while you notice many processes wasting a lot of RAM that cannot be reused.

In multi-threaded Web servers the memory pool is shared, so memory spikes do not prevent other threads from reusing the RAM. Overall this makes the whole Web server consume less RAM.

If the Web server consumes less RAM to handle the same number of simultaneous requests, it can handle more simultaneous requests without exhausting the server's available RAM.
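As a rough illustration of the argument above, here is a toy model in Python. It is only a sketch under loud assumptions: all numbers are hypothetical, every prefork process is assumed to have spiked once and kept that memory, and things like copy-on-write sharing, fragmentation, per-thread stacks and process recycling are ignored.

```python
def prefork_ram(workers, baseline_mb, spike_mb):
    """Worst case for prefork: every process has spiked once and kept the memory."""
    return workers * (baseline_mb + spike_mb)


def shared_pool_ram(workers, baseline_mb, spike_mb, concurrent_spikes):
    """Shared pool: only the spikes happening at the same time need extra RAM."""
    return workers * baseline_mb + concurrent_spikes * spike_mb


# Hypothetical numbers: 100 workers, 20 MB baseline each, 80 MB spikes,
# at most 5 spikes happening at the same moment.
workers, baseline_mb, spike_mb = 100, 20, 80
print(prefork_ram(workers, baseline_mb, spike_mb))         # 10000
print(shared_pool_ram(workers, baseline_mb, spike_mb, 5))  # 2400
```

The gap between the two numbers comes entirely from how many spikes overlap in time, so real savings depend on the workload.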

The bottom line is that the HipHop approach allows you to fit more users on fewer servers, thus cutting the costs of scaling up.

Why didn't they explain this in great detail? I believe it is not a trivial matter and most people would not understand.

It is easier to say it will take 50% less CPU, as most people understand that means it is 2x faster. With an easier to explain argument, it gets more attention from the majority of PHP developers.

Now, the hard part is for core developers to accept that Zend Engine is made obsolete with this approach. As you may have noticed from the reactions, everybody is seeking an excuse to discourage the adoption of HipHop.

HipHop needs some work, probably borrowed from RoadSend, to eliminate the excuses raised by those that for some reason oppose HipHop, but believe me it is possible. RoadSend did it, so can HipHop.

Re: Short HipHop blurp

Even a 50% speed gain across the board .. what does that really mean?

It's not about that. Facebook spends a LOT of money on PHP servers and datacenter space. Reducing that cost by half is really huge. Lots of $$$ ... Lots

Re: Short HipHop blurp

The point about memory consumption that Manuel brings up is very important, maybe more than CPU in some cases. Even though RAM is cheap, we see that many times, especially with sites based on frameworks, memory consumption is one of the main problems when traffic spikes. If HipHop means less memory usage, it will be a really big improvement.

Re: Short HipHop blurp

What would interest me the most is if it supports FastCGI rather than HTTP. If that is the case it would be trivial to phase it into production by simply making a specific rule directing specific URIs to the FastCGI running the HipHop code. I imagine compiling a small script and making it available via FastCGI to test rather than an entire application.

It's unlikely I'd ever be willing to run HipHop code on its own as a webserver, and I'm not a fan of calling it via a proxy.

Re: Short HipHop blurp

Yeah, reduced memory usage is indeed something that was apparently mentioned at the event. This is a result of not only the fact that they have a multi threaded webserver, but more importantly because they do their best to pick more specific datatypes which are obviously way less flexible than zvals. Actually the latter has been mentioned more often as the cause of the reduced memory overhead.

If transforming to an extension this benefit would again also be visible, but not as significantly. I am not sure how things look for Project Zero in this department, since they again can run in a multi threaded environment and they could in theory also try to pick more optimal non one size fits all data types internally.

Then again I do not know how they can really predict the possible values for a datatype and what happens if you keep changing from int to string to array (I often explode, then process and then implode again for example) and if this will hurt memory usage or CPU usage.

Re: Short HipHop blurp

@Manuel: It's not mentioned too much since it's irrelevant -- Apache can run with a threaded model just fine, since version 2, which came out ages ago.

The primary reason why PHP users usually don't run with the threaded MPM and use prefork instead is simply that there's a plethora of PHP extensions out there that are not thread safe, and most would rather take the risk than do an assessment of all the external stuff they use.

/* Steinar */

Re: Short HipHop blurp

Right, and since they do not use the original libs that these extensions are based on, but instead use C++ libs which are supposedly thread safe, they can run multi threaded .. then again imho multi threading should really be left to small code paths .. it's just too hard to get it right for long running complex stuff.