ramblings on PHP, SQL, the web, politics, ultimate frisbee and what else is on in my life

Re: Dealing with uploads in a cluster

For a social networking site I implemented a nasty hack for pushing files to a remote content delivery network once they had been uploaded to the web server. Basically, the upload handler processes the images and makes a note in the database of what has been uploaded. Another script runs in the background: it checks for new metadata, marks a file as being uploaded, uploads it, marks it as uploaded and moves on to the next file.
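A rough sketch of that background loop, in case it helps someone. This is not the code from the site: the database rows are swapped for marker files in three queue directories, and the directory layout, `process_queue` and the upload command are names made up for illustration.

```shell
#!/bin/sh
# Sketch of the background uploader. Assumption: the "note in the database"
# is modelled as marker files in queue directories, not as real DB rows.
# Hypothetical layout: $QUEUE/new, $QUEUE/uploading, $QUEUE/uploaded

# process_queue QUEUE UPLOAD_CMD
# UPLOAD_CMD is a placeholder for whatever pushes one file to the CDN.
process_queue() {
    queue=$1
    upload_cmd=$2
    mkdir -p "$queue/new" "$queue/uploading" "$queue/uploaded"
    for marker in "$queue"/new/*; do
        [ -e "$marker" ] || continue      # empty queue: the glob did not match
        name=$(basename "$marker")
        # Claim the file first so a second worker skips it
        # (mv is atomic within one filesystem).
        mv "$marker" "$queue/uploading/$name" 2>/dev/null || continue
        if "$upload_cmd" "$queue/uploading/$name"; then
            mv "$queue/uploading/$name" "$queue/uploaded/$name"
        else
            # Put it back so the next run retries it.
            mv "$queue/uploading/$name" "$queue/new/$name"
        fi
    done
}
```

Run from cron or a loop with something like `process_queue /var/spool/cdn my_upload_command`. The "mark as being uploaded" step is what keeps two workers from grabbing the same file.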

Re: Dealing with uploads in a cluster

Thank you all for your feedback. I am still going through all the recommendations. Right now I am still hoping not to have to modify the application for this, so MogileFS and database-based hacks (http://pooteeweet.org/blog/909/924) are out for now. Buying a SAN is also currently not on the table, so stuff like GFS2 is out as well.

So one approach could be to improve the current NFS-based setup by triggering the rsync through some notification layer. It also seems that there are ways to enhance NFS with local caches.
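To make the notification idea concrete, here is a minimal sketch that assumes inotify-tools is installed and that rsync can reach the other nodes over passwordless ssh. The peer names, the watched path and the function names are placeholders, not part of the current setup.

```shell
#!/bin/sh
# Sketch: push a changed directory to each peer as soon as a change is seen.
# Peer names and the watched path are hypothetical.
# RSYNC can be overridden (for example with "echo") for a dry run.
RSYNC=${RSYNC:-rsync}

# sync_dir DIR PEER... : mirror DIR to the same path on every peer.
sync_dir() {
    dir=$1
    shift
    for peer in "$@"; do
        # -a keeps permissions and times, --delete propagates removals.
        "$RSYNC" -a --delete "$dir/" "$peer:$dir/"
    done
}

# watch_and_sync DIR PEER... : needs inotifywait from inotify-tools.
watch_and_sync() {
    watched=$1
    shift
    # --format '%w' prints only the directory in which the event happened.
    inotifywait -mrq -e close_write -e create -e delete -e move \
        --format '%w' "$watched" |
    while read -r changed_dir; do
        sync_dir "${changed_dir%/}" "$@"
    done
}
```

Something like `watch_and_sync /var/www/uploads web2 web3` would then replace the periodic rsync. Only the directory that changed is synced, but rsync still recurses below it, so a change at the top level walks the whole tree.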

I am not sure if I understand or trust the quick-hack networked RAID solution. But a few solutions seem promising at this point: the FUSE-based ChironFS and GlusterFS, as well as the Starfish Distributed Filesystem. Maybe once the MogileFS developers have finished the FUSE wrapper I can also consider it.

Re: Dealing with uploads in a cluster

You might want to take a look at CouchDB, which you can think of as something like Amazon S3, except that you host it yourself.

CouchDB is a clustered document store, which sounds like exactly what you need.

Re: Dealing with uploads in a cluster

Hey Lukas - why not use HTTP_Session2? I am using it myself on a fairly popular website. I'd be very interested in your feedback.

In regard to the Amazon S3 suggestions - I had the same problem and created a package called File_S3. I am trying to finish it and polish it up for a proposal. Hit me up if you want a sneak peek.

I'm using both on a very busy website and they work amazingly well.

Till

Re: Dealing with uploads in a cluster

I didn't read all the comments, but also check out ZFS if you want a cheaper cluster FS.

Btw, I think I saw a wrapper for MogileFS somewhere in the MediaWiki code.

Re: Dealing with uploads in a cluster

i use csync2 and the inotify-tools package to get as close to real-time *full* replication as one can without dropping $80K on a SAN... and it is extremely fast.

http://inotify-tools.sourceforge.net/
http://oss.linbit.com/csync2/

my setup:
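#!/bin/sh
# watch /var/www recursively and run a csync2 sync on every change
inotifywait -mrq -e close_write -e create -e delete -e move /var/www | while read -r file; do
    # the event text in $file is unused; csync2 -x checks and syncs everything in the config
    csync2 -x >/dev/null 2>&1
done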

my csync2.cfg contains standard includes & excludes. one really cool thing about csync2 is that you can configure "action items" that run commands on the other nodes in the cluster. so for example, if you modify the httpd.conf file on one node, it will replicate the change to all other nodes, followed by an "apachectl graceful" command... smooth!
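for anyone who has not seen one, an action item looks roughly like the fragment below. this is a sketch, not the actual config from this setup: the group name, host names, key file and paths are made up, so check them against the csync2 documentation for your version.

```
# /etc/csync2.cfg (illustrative fragment, all names are placeholders)
group webcluster
{
    host web1 web2 web3;
    key /etc/csync2.key_webcluster;

    include /var/www;
    include /etc/httpd/conf/httpd.conf;
    exclude *~ .*;

    # run on a node after a matching file has been updated there
    action
    {
        pattern /etc/httpd/conf/httpd.conf;
        exec "/usr/sbin/apachectl graceful";
        logfile "/var/log/csync2_action.log";
    }
}
```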

installation on centos 5 was simple, provided you install flex and bison -- both available via standard yum repositories... also, following this simple guide helped get through the xinetd stuff:
http://zhenhuiliang.blogspot.com/2006/04/csync2-is-so-cool.html

... but check release versions to make sure you're up to date.

Re: Dealing with uploads in a cluster

Here are two options:

1) (I have this working, but it is limited to 2 servers) Two servers mirroring a block device via DRBD. NFS runs on the primary, with heartbeat failover to the secondary. Because NFS is stateless, a failure of the primary node only causes a short timeout on the NFS clients while the secondary starts its services. Clients then continue with no data/session loss.

2) (Currently investigating, but highly scalable) Multiple pairs of servers, each mirroring a block device via DRBD. Each pair exports one or more GNBD/iSCSI targets, which are 'merged' using clvm. The clvm volumes are formatted with gfs2 and mounted on all nodes in the cluster that need access. This scenario allows easy expansion of the storage, both in terms of space and of network access speed.
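For the 'merge' step in option 2, a sketch of the command sequence as I understand it, assuming the GNBD/iSCSI targets are already imported as local block devices and the cluster stack (clvmd, DLM) is running. The device names, volume names, cluster name and journal count are placeholders, and by default the function only prints the commands.

```shell
#!/bin/sh
# Sketch of the volume setup in option 2, printed as a dry run by default.
# Device names, vg_shared/lv_shared, the cluster name and the journal count
# are all placeholders. Set RUN= (empty) to execute instead of printing.
RUN=${RUN-echo}

# build_gfs2_volume CLUSTER FSNAME JOURNALS DEVICE...
build_gfs2_volume() {
    cluster=$1
    fsname=$2
    journals=$3
    shift 3
    # Each DEVICE is an imported GNBD/iSCSI target backed by one DRBD pair.
    $RUN pvcreate "$@"
    # -c y marks the volume group as clustered so clvmd coordinates it.
    $RUN vgcreate -c y vg_shared "$@"
    $RUN lvcreate -l 100%FREE -n lv_shared vg_shared
    # GFS2 needs one journal per node that mounts the filesystem.
    $RUN mkfs.gfs2 -p lock_dlm -t "$cluster:$fsname" -j "$journals" \
        /dev/vg_shared/lv_shared
}
```

For example, `build_gfs2_volume web uploads 3 /dev/sdb /dev/sdc` prints the four commands for two targets and three mounting nodes. Adding another server pair later should then come down to vgextend, lvextend and gfs2_grow.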
