The main project I have been working on for the past year never had automated testing. I must admit I am not a testing guru, and none of the other people on the team were experienced with it either. Furthermore, I am not that convinced that I have to change all of my code to be more testable, or that everything that isn't easily testable is necessarily wrong anyway. For the most part the application consists of various callable methods that execute SQL queries, ranging from a single query to a series of highly complex dynamic ones. Personally, I am not a believer in maintaining SQL dumps that map to returned data; I want to query the real database. I also felt that making all the various methods unit testable would make the actual code more complex. So I really wanted functional testing, but I didn't see an efficient way to do it.
The returned data for most methods is usually JSON, but most of the time it contains a fair bit of HTML. That last bit always made things complicated for me: when the HTML changes, I do not want to have to change all my tests, and frontend testing is a totally different beast. So as the pain of minor issues in the various permutations of the dynamic queries finally got big enough, I had the idea that we could simply use a different "theme" during functional testing. In those templates I essentially just return the parameters that were handed to the template, via var_export() or some similar method.
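Such a test template can be trivial. Here is a minimal sketch of the idea (the file name is made up, and get_defined_vars() is just one way to grab whatever was handed to the template):

```php
<?php
// tests/theme/list.php — a hypothetical test template that replaces
// the production one. Instead of rendering HTML, it just dumps the
// parameters that were extracted into its scope, so the tests can
// match on the data structure rather than on markup.
echo var_export(get_defined_vars(), true);
```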
Now it's quite easy to write functional tests that just match patterns on the returned data structures. I am obviously not testing the HTML output this way. But it's now quite easy to call the various module methods in all their relevant permutations, check if there are any SQL errors etc., and also check if the data passed to the templates matches the expectations. The only differences are that the tests run on the CLI, use a different template, and use a partially mocked authentication container. It's of course important to remember this, since those parts need special attention given that they are not tested.
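A test along these lines could look like the following sketch, assuming PHPUnit; the module class, helper functions and asserted keys are all made up for illustration:

```php
<?php
use PHPUnit\Framework\TestCase;

class NewsModuleTest extends TestCase
{
    public function testListFiltersByCategory(): void
    {
        // Hypothetical helpers: a real DB connection and a partially
        // mocked authentication container, as described above.
        $module = new NewsModule(getTestConnection(), getMockedAuthContainer());
        $output = $module->render('list', ['category' => 2]);

        // With the test theme active, $output is an exported array,
        // not HTML, so simple string/pattern matches suffice.
        $this->assertStringContainsString("'category' => 2", $output);
        $this->assertMatchesRegularExpression("/'items' =>\s+array/", $output);
    }
}
```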
One of the issues I ran into was how to manage the actual test data. Since the application internally does not use any transactions, I can wrap every test method inside a transaction that I roll back at the end. However, auto id generators are not rolled back this way. The solution I am using now requires a script that can reset the schema, including all auto increment counters etc., and another script that inserts the test data. Since I can easily reset the counters whenever I need to, I can hardcode the auto generated IDs in the tests wherever necessary. The idea, however, is to write the tests in such a way that I only need to reset the counters before running the test suite again, if at all, and not between every test method, since that would slow down test execution way too much. That said, I might eventually separate the identity resetting into its own script that I run before every test method.
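The transaction wrapping itself could look something like this sketch, assuming PDO and PHPUnit; the DSN and credentials are placeholders:

```php
<?php
use PHPUnit\Framework\TestCase;

abstract class DatabaseTestCase extends TestCase
{
    protected PDO $pdo;

    protected function setUp(): void
    {
        $this->pdo = new PDO('mysql:host=localhost;dbname=app_test', 'user', 'pass');
        // Wrap every test method in a transaction.
        $this->pdo->beginTransaction();
    }

    protected function tearDown(): void
    {
        // Undo everything the test method did to the data. Note that
        // auto increment counters survive this rollback, which is why
        // the separate schema reset script is still needed.
        $this->pdo->rollBack();
    }
}
```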
Don't stay too long with your fixed real-life dataset. In the end it hurts more than going the whole other way. Been there ;)
Well, what is the other way in this context? Testing live data sets? Wouldn't I need to write a lot of code to make my test cases sufficiently dynamic for that purpose? For example, how do you deal with a search that sorts based on relevance? How can you test that with a production data set without essentially re-implementing the relevance sort in the test suite?