ramblings on PHP, SQL, the web, politics, ultimate frisbee and what else is on in my life

Session clustering, who is online and replication lag

So I need to create a portal site that will require multiple frontends. Like most portals, we need to store some state information inside a session. We also need to show how many users are online and, more importantly, be able to filter searches in the member database by who is online (we do not need to filter by how long ago the last site interaction was, but you never know with changing requirements).

Now, in order to filter by online users in the member search, we need to have some information about active sessions by user id inside the MySQL database. But if we were to put the entire session data inside the database, we would have to hit the master on every request to update the session timeout. More importantly, we would have to deal with replication lag.

The alternative approach to working around replication lag is using memcache. But with memcache we would not have access to who is currently online in an easily queryable way when we are doing our member searches. So my idea is to combine the two: use memcache for the session management, but also write the session id and user id to the database on session creation and clean this up when the session is destroyed. This way there would be no replication lag for subsequent requests on the same session. We would also take load off the master database, which would no longer have to deal with all the writes on a CLOB field, as we would only hit the master on session creation and whenever we trigger the session garbage collector.
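To make the idea a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: a plain dict stands in for memcache, an in-memory SQLite database stands in for the MySQL master, and the table and function names (online_sessions, create_session and so on) are made up.

```python
import sqlite3
import uuid

# Stand-ins (assumptions): a dict plays the role of memcache and an
# in-memory SQLite database plays the role of the MySQL master.
cache = {}
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE online_sessions ("
    "session_id TEXT PRIMARY KEY, user_id INTEGER NOT NULL)"
)


def create_session(user_id, data):
    """Store the session data in the cache and register it in the database."""
    session_id = uuid.uuid4().hex
    cache[session_id] = dict(data, user_id=user_id)
    db.execute(
        "INSERT INTO online_sessions (session_id, user_id) VALUES (?, ?)",
        (session_id, user_id),
    )
    db.commit()
    return session_id


def touch_session(session_id, **changes):
    """Per-request update: only the cache is written, never the database."""
    cache[session_id].update(changes)


def destroy_session(session_id):
    """Remove the session from the cache and from the database."""
    cache.pop(session_id, None)
    db.execute("DELETE FROM online_sessions WHERE session_id = ?", (session_id,))
    db.commit()


def online_user_ids():
    """The query the member search could join against."""
    rows = db.execute(
        "SELECT DISTINCT user_id FROM online_sessions ORDER BY user_id"
    )
    return [row[0] for row in rows]
```

Only create_session and destroy_session touch the database, so normal page requests stay on the cache. A garbage collector would still be needed for sessions that expire without being destroyed explicitly.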

The only tricky aspect of memcache is that we obviously need to provide enough RAM to not run out of space if we have a lot of active sessions. So we will have to sit down and properly figure out how large our sessions will likely be and how many users we can handle with whatever RAM we allocate. We will obviously have to monitor this continuously. While googling around I also just came across Sharedance, which seems to store things on disk, but I am not yet clear on how that would be better than using NFS or whatever.
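The sizing itself is simple arithmetic. A rough sketch, where the per-item overhead and the headroom factor are placeholder assumptions that would have to be replaced with measured values:

```python
def max_sessions(ram_bytes, avg_session_bytes, overhead_bytes=100):
    """Rough upper bound on how many sessions fit into the allocated RAM.

    overhead_bytes is an assumed per-item bookkeeping cost (key, metadata),
    not a measured figure.
    """
    return ram_bytes // (avg_session_bytes + overhead_bytes)


def ram_needed(sessions, avg_session_bytes, overhead_bytes=100, headroom=1.5):
    """RAM to allocate for a target number of concurrent sessions.

    headroom is an assumed safety factor for peaks and uneven session sizes.
    """
    return int(sessions * (avg_session_bytes + overhead_bytes) * headroom)
```

For example, ram_needed(50000, 4096) gives an estimate for 50,000 concurrent sessions of about 4 KB each. The real numbers have to come from monitoring.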

Comments



Re: Session clustering, who is online and replication lag

A solution I like to use is this:
Use a heap/memory table in MySQL to store id (for joining), login and expiry time stamp. You keep a session variable which stores the last time that table was updated for that user. There is no reason to update on every request. Just updating the expiry time stamp (e.g. now + 20 min expiry time) every 3 to 5 minutes should be fine. You then select all the users based on expiry > now. You can clean up the table if you want, using a cron job to delete all expired entries.

Less tricky. Easy to maintain.
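A minimal sketch of this scheme in Python, under a few assumptions: an in-memory SQLite table stands in for the MySQL heap/memory table, a dict stands in for the session, and the names (online, touch, online_logins, purge) and the exact intervals are made up for illustration.

```python
import sqlite3

EXPIRY = 20 * 60        # a user counts as online for 20 minutes
REFRESH_EVERY = 5 * 60  # write to the table at most every 5 minutes

db = sqlite3.connect(":memory:")  # stand-in for a MySQL MEMORY table
db.execute(
    "CREATE TABLE online ("
    "user_id INTEGER PRIMARY KEY, login TEXT, expires_at INTEGER)"
)


def touch(session, now):
    """Refresh the expiry only if the last refresh is old enough.

    Returns True when a database write actually happened.
    """
    last = session.get("online_updated_at")
    if last is not None and now - last < REFRESH_EVERY:
        return False
    db.execute(
        "INSERT OR REPLACE INTO online (user_id, login, expires_at) "
        "VALUES (?, ?, ?)",
        (session["user_id"], session["login"], now + EXPIRY),
    )
    db.commit()
    session["online_updated_at"] = now
    return True


def online_logins(now):
    """Select everyone whose expiry lies in the future."""
    rows = db.execute(
        "SELECT login FROM online WHERE expires_at > ? ORDER BY login", (now,)
    )
    return [row[0] for row in rows]


def purge(now):
    """What the cron job would run to drop expired entries."""
    db.execute("DELETE FROM online WHERE expires_at <= ?", (now,))
    db.commit()
```

Timestamps are passed in as plain integers so the behaviour is easy to check; in a real handler they would come from the current time.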

Re: Session clustering, who is online and replication lag

Depending on how important your session data is, meaning if you can't lose any session data under "any" circumstances, I wouldn't use a memcached-only solution for session storage. It's after all just a cache, and that's what it's good at. Too many things can happen where data gets lost (a planned reboot of a node, for example).

Re: Session clustering, who is online and replication lag

@Joel: Hmm this is more or less what I had planned with the exception that I planned to use memcache to manage the expiry, though thinking about it now, I am not sure I really can use memcache for this. So your suggestion sounds good.

@Chregu: Losing a few sessions now and then is not a big deal for this application.

Re: Session clustering, who is online and replication lag

You probably don't want to hit the database and do a SELECT COUNT(*) FROM Logged_In_Users if you're planning on having lots of logged in users and/or lots of page views.

You could keep a fuzzy estimate of the # of logged in users, and cache this result.

Either keep this updated in memcached (need some work to keep it in cache, but if you're using it on every log in/log off/page hit it should be kept pretty fresh depending on memcached's replacement policy)

OR

Keep sessions in MySQL (good idea anyway), do the SELECT COUNT(*), cache the result with a short TTL (a few minutes?) in memcached.
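The second option could look roughly like this. It is only a sketch: the cached value lives in a Python object instead of memcached, and the class name CachedCount and the injectable clock are assumptions made so the example is self-contained.

```python
import time


class CachedCount:
    """Cache the result of an expensive count for a short TTL."""

    def __init__(self, compute, ttl_seconds, clock=time.monotonic):
        self._compute = compute      # e.g. a function running SELECT COUNT(*)
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._expires_at = None

    def get(self):
        """Return the cached count, recomputing it once the TTL has passed."""
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._compute()
            self._expires_at = now + self._ttl
        return self._value
```

With a TTL of a few minutes the database sees one count query per interval, no matter how many pages display the number.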

Re: Session clustering, who is online and replication lag

Storing sessions in MySQL is generally not a great idea. If your site becomes busy it can be a bit difficult to scale (although there is a PHP Distributed Session Manager Class out there somewhere which uses multiple standalone MySQL nodes ;)) and you may run into locking issues.

Have you looked at Zend's Session Clustering solution? It is part of Zend Platform. ZP is a commercial product. If you can't/don't want to purchase a license, it wouldn't be too difficult to implement such an approach to clustering yourself. In a nutshell: when a session is created it appends the webserver's internal IP to the session identifier. If the next request is routed to another webserver, that server will fetch the session data from the original node. You could use memcache/tegula/sharedance/mcache/etc. on each node to store the session data (or just let PHP manage it in memory or on disk).

You certainly don't want to use NFS or a single memcache/sharedance node to store *all* your sessions. You want to make it scalable and depending on your application you might even want to make it redundant so that in case of a failure of a session management node the session data is not lost.

I'd really recommend you to read a little bit about Zend's approach; it is simple and scales well.
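A do-it-yourself version of the "in a nutshell" description above might be sketched like this. This is not Zend's actual implementation or id format: the Node class, the "-" separator and the dict that stands in for the network between webservers are all assumptions for illustration.

```python
import uuid


class Node:
    """One webserver that keeps the sessions it created in local storage."""

    def __init__(self, ip, cluster):
        self.ip = ip
        self.sessions = {}       # local session storage on this node
        self.cluster = cluster   # dict ip -> Node, stand-in for the network
        cluster[ip] = self

    def create_session(self, data):
        """Create a session whose id ends with this node's internal IP."""
        session_id = f"{uuid.uuid4().hex}-{self.ip}"
        self.sessions[session_id] = dict(data)
        return session_id

    def load_session(self, session_id):
        """Load locally if we own the session, else ask the owning node."""
        owner_ip = session_id.rsplit("-", 1)[1]
        if owner_ip == self.ip:
            return self.sessions[session_id]
        # In a real deployment this would be a network call to the owner.
        return self.cluster[owner_ip].load_session(session_id)
```

Because the owner is encoded in the id, no node needs a central lookup to find the session data. Redundancy against a failed node is not covered by this sketch.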

Re: Session clustering, who is online and replication lag

It's not really a free product, but you may want to have a look at Zend Platform's session clustering feature. It uses sockets to communicate the session data between servers, and does not depend on a single database/file/piece of memory to store its sessions, so it scales surprisingly well.

Re: Session clustering, who is online and replication lag

Nice one Ivo, at the exact same time :)

Re: Session clustering, who is online and replication lag

Here is the Zend Session Clustering Whitepaper:
http://www.zend.com/content/download/1416/7891/version/1/file

Re: Session clustering, who is online and replication lag

If you're storing sessions in MySQL you already have a custom session handler, so why not make the session handler smart enough to not update the session time on every page load?

Figure out what you want your margin for error to be and only update the timestamp that often.

I do that, plus I hash the session data and only update the session data if something has changed, so it's very feasible for my session handler to only update the timestamp every X minutes if the session data is static.
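A rough sketch of such a handler's save logic, assuming a dict in place of the MySQL table and made-up names (ThrottledSessionStore, touch_interval); the choice of SHA-1 over a JSON dump is also just one possible way to fingerprint the data.

```python
import hashlib
import json


class ThrottledSessionStore:
    """Write session data only when it changed or the timestamp is stale."""

    def __init__(self, touch_interval):
        self.touch_interval = touch_interval  # the "X minutes", in seconds
        self.rows = {}    # session_id -> row; stands in for the table
        self.writes = 0   # counts simulated database writes

    @staticmethod
    def _digest(data):
        """Stable fingerprint of the session data."""
        payload = json.dumps(data, sort_keys=True).encode("utf-8")
        return hashlib.sha1(payload).hexdigest()

    def save(self, session_id, data, now):
        digest = self._digest(data)
        row = self.rows.get(session_id)
        if row is None or row["hash"] != digest:
            # New session or changed data: full write.
            self.rows[session_id] = {
                "data": dict(data), "hash": digest, "touched_at": now,
            }
            self.writes += 1
        elif now - row["touched_at"] >= self.touch_interval:
            # Data unchanged, but the timestamp is older than the margin.
            row["touched_at"] = now
            self.writes += 1
        # Otherwise nothing is written at all.
```

The margin for error on the "last seen" time is at most touch_interval, which is the trade-off described above.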
