How do you test things whose input data has to come from the real world (and so differs every time)?

Yes, include your tests even if they are about your local data. Your tests document and enforce what you have actually written your code to do -- others wanting to use your code will want to see them, to see for themselves if their data is enough like yours that your code will do what they need, or if perhaps they need to add their own tests, or even discover via their own tests that your code is not going to work with their data.

Next are the technical issues of how you make your tests runnable for someone who doesn't have your environment. If the connections to "the environment" are over HTTP, vcr is pretty great. If they are not, what I would do is: cleanly separate in your code architecture the code that actually connects to your "environment" from the code that validates the data. Have tests that can run without your environment, with some kind of fake/mock environment -- if your code does local_environment.get_data, then you can provide a fake/mock object that also responds to #get_data, but returns made-up sample data for testing. The default suite of tests runs like this. But you could have an env variable, as in RUN_AGAINST_REAL_ENVIRONMENT=1 rake test, that triggers the tests against the real environment, which you still want too. Most people who don't have an environment just like yours won't run them, but you still can.
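Here is a rough sketch of that split, assuming Minitest. The class names (DataValidator, FakeEnvironment, RealEnvironment) and the "records need an id" rule are made up for illustration; only #get_data and RUN_AGAINST_REAL_ENVIRONMENT come from the description above.

```ruby
require "minitest/autorun"

# Hypothetical validator: it only knows that its environment responds to
# #get_data, not where the data actually comes from.
class DataValidator
  def initialize(environment)
    @environment = environment
  end

  # Returns the records whose :id is missing or blank (an assumed rule).
  def invalid_records
    @environment.get_data.select { |record| record[:id].to_s.strip.empty? }
  end
end

# Fake environment: same #get_data interface, made-up sample data.
class FakeEnvironment
  def get_data
    [{ id: "a1", name: "ok" }, { id: "", name: "missing id" }]
  end
end

# Placeholder for the code that really talks to your local environment.
class RealEnvironment
  def get_data
    raise NotImplementedError, "connect to your real environment here"
  end
end

class DataValidatorTest < Minitest::Test
  # Part of the default suite: runs anywhere, no real environment needed.
  def test_flags_records_without_id_using_fake_data
    validator = DataValidator.new(FakeEnvironment.new)
    assert_equal ["missing id"], validator.invalid_records.map { |r| r[:name] }
  end

  # Opt-in: skipped unless the env variable is set.
  def test_against_real_environment
    unless ENV["RUN_AGAINST_REAL_ENVIRONMENT"] == "1"
      skip "set RUN_AGAINST_REAL_ENVIRONMENT=1 to run against the real environment"
    end
    validator = DataValidator.new(RealEnvironment.new)
    assert_kind_of Array, validator.invalid_records
  end
end
```

Because the validator takes the environment as a constructor argument, swapping the fake for the real one is a one-line change in the test, and anyone without your environment still gets a passing default run with one skipped test.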

That proposed solution is a compromise for sure -- with even better-designed code and environment, it might not be necessary. In a better world, your code might make fewer assumptions about the environment, to be more flexible, and you might indeed have a script to set up the environment. But we don't always have the luxury of that, and it's what I'd do as the best compromise fit for the real-world kludge that you sometimes need to deal with.

/r/ruby Thread