ramblings on PHP, SQL, the web, politics, ultimate frisbee and what else is on in my life

Gimme a schema for the schema-less

One of the key features of NoSQL is the fact that it's schema-less. Awesome. Of course I could just dump a serialized string of my "document" into an RDBMS and end up with more or less the same thing, but the big difference is that NoSQL (to me key-value stores do not fall under the NoSQL umbrella) still supports non-hacky ways to interact with individual values inside a document, as well as indexing. But while at first it might seem great to not have the database enforce a specific schema, the app developer had better have a good idea of his schema. Otherwise one developer might call a field "is_active", the next one might call it "isActive" and another one "enabled". I have little to no experience with CouchDB, MongoDB etc., but I am not really all that thrilled about schema-less for the above reason: what I want is no-cost-for-schema-changes, I do want a schema!
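
To make the naming problem concrete, here is a minimal Python sketch. The documents and the `find_active` helper are made up for illustration and are not tied to any particular store; the point is only that a query written against one spelling silently misses the others.

```python
# Hypothetical documents written by three different developers.
documents = [
    {"name": "alice", "is_active": True},
    {"name": "bob", "isActive": True},
    {"name": "carol", "enabled": True},
]


def find_active(docs, field="is_active"):
    """Return the names of documents whose `field` is present and truthy."""
    return [doc["name"] for doc in docs if doc.get(field)]


# Only documents that happen to use the "is_active" spelling are found.
active = find_active(documents)

# Collect every spelling of the flag that is actually in use.
field_names = sorted({key for doc in documents for key in doc if key != "name"})

print(active)
print(field_names)
```

Nothing fails here, which is exactly the issue: without a schema, the store has no way of knowing that the other two documents were meant to match.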

This is why I was quite thrilled back when IBM came out with top-level XML support in DB2 Viper. Basically, in DB2 you can store your documents as XML strings. You can query and manipulate these XML strings right inside the database, and you can even define indexes to speed up access. But, and here comes the big but for me, you can also optionally define an XML schema that the given XML needs to adhere to. IIRC DB2 supports storing the XML string as is or in an optimized (binary?) representation. I never really used DB2, as it didn't really feel like the answer to the web problem, but I wish MySQL or PostgreSQL would provide similar capabilities. Maybe I should give DB2 a second look; their express edition is much less restricted compared to Oracle's offering too. Also, I wonder if any of the NoSQL stores support validating documents against some centrally managed data structure definition, aka a schema, of course with no cost for making changes to that schema.

PS: I tweeted about this a few days ago, but the Doctrine developer team is looking for people to help out with writing a driver.

Comments



Re: Gimme a schema for the schema-less

I always find it interesting that people argue for the need for a schema in development, because apparently developers are unguided sheep and screw things up otherwise. :)

IMHO, the great thing about "NoSQL" is (not the name but) the different perspective these projects bring to the table.

Give it a try some time!

Re: Gimme a schema for the schema-less

Do you use an editor that tells you if there is a syntax error? That's all I am asking for here. I want to have my database tell me if I dump in an unexpected data structure. Heck, for PHP we all know the syntax, and it is fairly well defined, but for a NoSQL database the structures are ad hoc, meaning that if you have more than one developer on the team (I guess that's the definition of a team) you are bound to have to communicate the data structures you are using. Why would it be a feature to not have a native way to do this, and instead require every team to come up with their own solution?

Re: Gimme a schema for the schema-less

Yeah, but the lack of schema really is the beauty. I can show you a couple use cases next time you're in town. They are pretty legit.

NoSQL is not the answer for everything, but for some. :)

And as for validation or errors - I can only comment on CouchDB but there are validation functions:
http://books.couchdb.org/relax/design-documents/validation-functions

Must admit I've never used one, but anyway -- that could be your DTD.

And re: your editor question, I do use source code highlighting always, I rarely ever have true syntax validation inside my editor. I guess I still haven't found -the- IDE. ;-) Different topic, but you asked.

Re: Gimme a schema for the schema-less

Hi Lukas,

I agree with what you say about the benefit of having schemas. It seems to me that the proponents of "schemaless" would actually really like to have a flexible schema (at least more flexible than in an RDBMS).

I have recently been looking at Freebase and its query language MQL a lot. They manage to strike a pretty interesting balance by having clearly defined schemas that define properties and relationships (which have a well-defined data type). At the same time, the solution is still flexible, because the entities can belong to many types at the same time.

Give it a look - I think having MQL grafted onto an RDBMS could be a pretty interesting solution for these problems.

Re: Gimme a schema for the schema-less

@till: Thanks for the link, this is exactly the sort of thing I was looking for, though it is more powerful than what I was even asking for. With a simpler syntax that just lets me specify what is required and what is optional, say in YAML, I could handle most cases and still be able to quickly skim the definition, rather than having to read code. But I guess I could create a helper function for this.

Do MongoDB and the various other NoSQL databases have something similar?
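
The helper function mentioned above could look roughly like the following Python sketch. Everything in it is an assumption made for illustration: the `SCHEMA` layout, the field names and the `validate` function are hypothetical and are not part of CouchDB, MongoDB or any other store. The specification is a plain dictionary that could just as well be loaded from a YAML file.

```python
# Hypothetical field specification; names and types are illustrative only.
SCHEMA = {
    "required": {"title": str, "is_active": bool},
    "optional": {"tags": list},
}


def validate(doc, schema=SCHEMA):
    """Return a list of problems; an empty list means the document passes."""
    errors = []
    required = schema["required"]
    allowed = {**required, **schema["optional"]}

    # Every required field must be present.
    for field in required:
        if field not in doc:
            errors.append(f"missing required field: {field}")

    # Every field that is present must be known and have the expected type.
    for field, value in doc.items():
        if field not in allowed:
            errors.append(f"unexpected field: {field}")
        elif not isinstance(value, allowed[field]):
            errors.append(f"wrong type for field: {field}")

    return errors


print(validate({"title": "Hello", "is_active": True}))
print(validate({"title": "Hello", "isActive": True}))
```

Changing the schema here means editing one dictionary, so there is no migration cost, yet a document that uses "isActive" instead of "is_active" is reported rather than silently stored.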

Re: Gimme a schema for the schema-less

I work on MongoDB so I can only speak for how our users handle schema. In general the schema lives in code rather than in the database. A lot of the "ODMs" (ORM for documents) focus heavily on schema validation.

Since the schema is in your codebase and (hopefully) under version control, it's much easier to work with than if it lives in the database. You can use all the techniques you know, like grep, git blame, and plain old comments, to make your intent for each field clear to other devs (including the future you).

Also, even with relational dbs where the schema is kept in the database, it is common for ORMs to require you to also keep a copy of the schema in the code. This can cause issues where the code and the db don't match, such as when you check out an old version of the code to test.

Re: Gimme a schema for the schema-less

Sure sure, but that seems like a somewhat fragile assumption. Plus there might be multiple applications talking to the same MongoDB instance as you are converting legacy code. I am not the stored procedure kind of guy, but some basic level of schema would be nice to have as close to the storage layer as possible. Heck I might disable it in production.

Re: Gimme a schema for the schema-less

(Sidenote: Your comments are somewhat annoying. ;( No notification, the login always redirects me to your blog homepage, etc.. Replace them with disqus for a quickfix. =))

People mistake schema-less for no schema. In theory each "document" could be different, but really -- that's not the case in most apps. Schema-less gives you flexibility though.

I disagree on the "too powerful" assertion -- schemas and XML are by no means simple and just plain annoying. (IMHO, of course! :D)

I don't want to generalize for NoSQL -- in CouchDB, this kind of work is pushed into the client/app. That's all, and that's probably one of the differences some people have to get used to when they come from a traditional RDBMS.

On the other hand, as Mathias mentioned, in many cases the ORM requires you to duplicate all kinds of logic in code anyway.

Re: Gimme a schema for the schema-less

"what I want is no-cost-for-schema-changes, I do want a schema!"

well... you got it > http://www.redbeanphp.com/
