This is just a quick start for a brainstorming of what we all hate in today's CMS software out there (I am including portal/community software here as well, and I guess most of it also applies to web shops). I wrote a very small CMS application myself ages ago, so I do not have experience in what it's really like to write and maintain a big one. All I know is that it's insanely painful to deal with any of them, though if your site is all about admins managing tons of static content or end users wanting to interact, there is little way around these ugly beasts. I guess it all boils down to how to persist changes made through an admin panel. Somewhat related is the issue of scalability, which to me mainly boils down to how easily the storage logic can be changed without changing the business logic on top.
The biggest gripe that results from these ever so powerful admin panels is the tendency to have their settings stored in the database. In combination with non-universal, auto-generated primary keys, this makes it hard to keep development and production in sync. Some things done inside the admin panel are classic content, but a fair bit is configuration, which we traditionally want to store inside our source code management (SCM) repository. Not sure how, but my dream CMS would know the difference and commit configuration to my SCM, and it would not depend on integer IDs that only have meaning inside a single database.
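To make the idea a bit more concrete, here is a minimal Python sketch of exporting configuration keyed by a natural key instead of a database ID. The setting names (`site.theme`, `comments.enabled`), the row layout and both function names are made up for illustration and are not taken from any existing CMS.

```python
import json


def export_config(settings):
    """Serialize settings keyed by their natural key, ignoring database IDs.

    Sorting the keys gives a stable file that diffs cleanly in an SCM.
    """
    by_key = {row["key"]: row["value"] for row in settings}
    return json.dumps(by_key, indent=2, sort_keys=True) + "\n"


def import_config(text):
    """Load a previously exported configuration file."""
    return json.loads(text)


# The same two settings as stored on two machines: the auto-generated
# integer IDs differ, the natural keys do not.
dev_settings = [
    {"id": 17, "key": "site.theme", "value": "dark"},
    {"id": 4, "key": "comments.enabled", "value": True},
]
prod_settings = [
    {"id": 2, "key": "comments.enabled", "value": True},
    {"id": 9, "key": "site.theme", "value": "dark"},
]

dev_export = export_config(dev_settings)
prod_export = export_config(prod_settings)
```

Because the export never mentions the integer IDs, `dev_export` and `prod_export` come out identical, so the file can be committed and deployed without any ID juggling.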
The other big thing I dread is the fact that while most people these days have bought into the MVC concept, few actually choose a technology that enables their model layer to truly separate business logic from how the data is persisted. I think we all agree that one of the key selling points of all the currently popular CMSs is the large number of out-of-the-box features and available plugins. However, for the most part they end up using the EAV pattern if they use an RDBMS. In the PHP space I am not sure if we even have a feature-rich CMS based on document stores yet, then again I do not see these as the holy grail either. Document stores are nice in a lot of ways, mainly because they offer a clean solution for dealing with hierarchical data and, more importantly, flexibility in managing different content types. But at the same time they lose a lot of the power of an RDBMS, which allows for powerful reporting and data manipulation via SQL. Of course the same can be achieved by custom coding, but not with the same speed in either development or execution.
Anyways I digress, all I want to say is that my dream CMS gives me as much flexibility as possible in managing data persistence. So no stupid EAV pattern (*), no hardcoded relational tree algorithms or ways to leverage RDBMS-specific features of the chosen RDBMS, be it Oracle, MySQL, PostgreSQL or whatever. But also no insisting that a document store is the holy grail for storing any kind of data. As such I see much appeal in content stores (we are developing a PHP implementation of a content store here at Liip for this very reason) or at the very least in using a real ORM like Doctrine (and no, none of your active records on steroids qualify).
I guess the last one is very obvious, and existing CMS solutions have done it well to varying degrees: define a solid API for adding additional functionality. I think the biggest challenge here is figuring out how to balance the want for a low barrier to entry for non-developers against enabling developers to properly design their software. Simple usually means few hooks that are insanely (and increasingly) wide. At the other end you get a situation where everything is done inside layers and layers of inheritance, which means you need to open/customize a bazillion files to do anything (that cannot be done through the admin panel).
This point is the easiest to fail at, since it's very grey-ish. At least to me it's obviously bogus to mix content with configuration or to use EAV to store custom content types. But it's not so easy to balance ease of use against flexibility. Either things are too hard for non-developers or they are too simple for real developers. Either the app does not gain a large enough user base or it gets too many non-developers destroying any sense of consistency (resulting in mass suicide of the only people who are able to fix things). I guess nobody can get it right the first time, then again looking at the current CMSs it seems that nobody managed to get to the right balance through evolution either.
(*) Yes my answer to all people using the EAV pattern is to allow the admin panel to generate and execute DDL.
You seem to assume it's possible to create a perfect, universal CMS. The reality is that "perfect" and "universal" are opposite goals. Universality is always born from compromise and choice.
compromise and redundancy*
No, I don't assume that at all. Just like you I fear that it's not possible. But I still find it surprising that all the big ones fall short on all 3 of the above-mentioned points. It seems like PHP CMSs are doomed to all succeed/fail the same way. Would be nice to have another big one that gets the above 3 right and finds new ways to fail :)
(*) So what's your answer for EAV + multi tenant?
For one I must admit that a lot of the "short cuts" I hate about CMSs today are probably there because they try to please shared hosting setups.
You kept your question quite short, so I am assuming you are asking how to maintain multiple different configurations for different clients. Above I am advocating to not keep configuration in the database, or rather I am advocating to keep it separate and to not use auto-generated surrogate keys. In general I would just generate/maintain separate configuration files for each client.
As for the data itself (let's say a tree structure of content pages), each client would have their own root node. However, the root node would be identified by a natural key (like the domain name or project name). Below that, data can reference itself just as any relational data does. Do note that, as I said above, I would appreciate it if tree structure algorithms could easily be switched (adjacency list model, materialized path, nested set etc.) via some config setting (plus some code generation) but without having to change the actual business logic.
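A rough Python sketch of what I mean by switching the tree algorithm without touching business logic. Everything here is hypothetical and in-memory: the class names, the `add`/`ancestors` interface and the `breadcrumb` function are invented for illustration, and a real implementation would sit on top of the database.

```python
from typing import Dict, List, Optional, Protocol


class TreeStore(Protocol):
    """The only interface the business logic is allowed to rely on."""

    def add(self, key: str, parent: Optional[str] = None) -> None: ...
    def ancestors(self, key: str) -> List[str]: ...


class AdjacencyListTree:
    """Each node stores a reference to its parent."""

    def __init__(self) -> None:
        self._parent: Dict[str, Optional[str]] = {}

    def add(self, key: str, parent: Optional[str] = None) -> None:
        self._parent[key] = parent

    def ancestors(self, key: str) -> List[str]:
        chain = []
        node = self._parent[key]
        while node is not None:  # walk up one parent at a time
            chain.append(node)
            node = self._parent[node]
        return list(reversed(chain))


class MaterializedPathTree:
    """Each node stores its full path from the root."""

    def __init__(self) -> None:
        self._path: Dict[str, List[str]] = {}

    def add(self, key: str, parent: Optional[str] = None) -> None:
        prefix = self._path[parent] if parent is not None else []
        self._path[key] = prefix + [key]

    def ancestors(self, key: str) -> List[str]:
        return self._path[key][:-1]  # everything on the path except the node


def breadcrumb(tree: TreeStore, key: str) -> str:
    """Business logic: identical no matter which storage model is configured."""
    return " > ".join(tree.ancestors(key) + [key])


def build(tree: TreeStore) -> TreeStore:
    tree.add("example.org")  # root node identified by a natural key
    tree.add("products", parent="example.org")
    tree.add("widgets", parent="products")
    return tree


adjacency = build(AdjacencyListTree())
materialized = build(MaterializedPathTree())
```

Both `breadcrumb(adjacency, "widgets")` and `breadcrumb(materialized, "widgets")` go through the same function; which class gets instantiated is the part I would like to pick via a config setting.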
Again for getting around EAV I am advocating giving some admin process access to a database user that can execute DDL statements at run time so that every model can use a proper normalized and relational approach to data persistence.
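As a sketch only, here is roughly what such an admin process could do, written in Python against SQLite so it runs anywhere. The `build_ddl` function, the `event` content type, the `slug` natural key column and the small type whitelist are all assumptions for illustration; a real CMS would also need migrations, permissions and so on.

```python
import re
import sqlite3

# Field types the (hypothetical) admin panel is allowed to offer.
ALLOWED_TYPES = {"text": "TEXT", "integer": "INTEGER", "real": "REAL"}
IDENTIFIER = re.compile(r"^[a-z][a-z0-9_]*$")


def build_ddl(type_name, fields):
    """Turn a content type definition into a CREATE TABLE statement.

    Identifiers cannot be bound as SQL parameters, so they are validated
    against a strict pattern before being placed into the statement.
    """
    for name in [type_name, *fields]:
        if not IDENTIFIER.match(name):
            raise ValueError(f"invalid identifier: {name!r}")
    columns = ["slug TEXT PRIMARY KEY"]  # natural key instead of a sequence
    for name, kind in fields.items():
        if kind not in ALLOWED_TYPES:
            raise ValueError(f"unsupported field type: {kind!r}")
        columns.append(f"{name} {ALLOWED_TYPES[kind]} NOT NULL")
    return f"CREATE TABLE {type_name} ({', '.join(columns)})"


conn = sqlite3.connect(":memory:")
conn.execute(build_ddl("event", {"title": "text", "seats": "integer"}))
conn.execute("INSERT INTO event VALUES (?, ?, ?)", ("launch-party", "Launch party", 300))
conn.execute("INSERT INTO event VALUES (?, ?, ?)", ("workshop", "Workshop", 25))

# Plain SQL reporting works, with no EAV pivoting required.
total_seats = conn.execute("SELECT SUM(seats) FROM event").fetchone()[0]
```

The point is that the custom content type ends up as a normal table with typed columns, so reporting and constraints work the way they do for any hand-written schema.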
"Auto-generated surrogate keys" are not a problem by themselves, and I wish people would stop presenting them as one.
I know of at least one distributed version control system (Fossil) that uses auto-generated surrogate keys in its database schema.
Could you elaborate on how Fossil is relevant here? It's not immediately obvious to me. The issue with auto-generated (sequential) surrogate keys is that if you try to sync up your dev boxes and production databases you inevitably end up with ID clashes. This is the well-known problem we also see when doing master-master replication. Adding an additional column to store the originating machine's name is not really nice and probably not supported in 99.99% of the CMSs out there.
So usually you end up having to generate new IDs on the production machine (sort of reserving them in production during development), which means that developers have to jump through all sorts of hoops to adjust any IDs generated while using the admin panel locally. It gets worse if you want to maintain a set of pre-packaged modules that you want to be able to deploy even into existing installations.
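A tiny Python illustration of the clash. The record slugs are invented, and deriving a UUID from the natural key via `uuid.uuid5` is just one possible workaround that I am assuming here for the sake of the example, not something any particular CMS does.

```python
import itertools
import uuid

# Arbitrary, made-up namespace for deriving stable IDs from natural keys.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "cms.example")


def sequential_ids(slugs, start=1):
    """Simulate an auto-increment column local to one database."""
    counter = itertools.count(start)
    return {next(counter): slug for slug in slugs}


def derived_ids(slugs):
    """Derive the ID from the natural key, so every machine agrees on it."""
    return {uuid.uuid5(NAMESPACE, slug): slug for slug in slugs}


def clashes(a, b):
    """IDs that exist on both machines but point at different records."""
    return sorted(key for key in a.keys() & b.keys() if a[key] != b[key])


# Records created independently on a dev box and in production.
dev = ["newsletter-block", "promo-banner"]
prod = ["spring-sale", "promo-banner"]

sequential_clashes = clashes(sequential_ids(dev), sequential_ids(prod))
derived_clashes = clashes(derived_ids(dev), derived_ids(prod))
```

With sequential IDs both machines hand out ID 1 to different records, so `sequential_clashes` contains it. With derived IDs the shared `promo-banner` record gets the same ID on both sides and `derived_clashes` stays empty.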
The point I was trying to make with Fossil is that it handles replication at the application level, and not through any built-in database replication.
This is far more preferable, it seems to me, as there can be more than just a database to sync, anyway.
I know exactly what you're talking about. I have tried a lot of the CMSs out there, but none of them satisfied me.
Creating a new one which meets the requirements you mentioned would be challenging, but time is scarce.. ;)