PostgreSQL anti-patterns: Unnecessary json

It's not that its weird, it's just completely unrelated to what you want to do.

The problem that always comes up is that we want to solve a problem and order data in a way, and then realize that it wasn't what we wanted.

Say that you have a catalog of pictures, and all you know how/can use is the hierarchical file-system. I ask you to find me pictures of a certain subject: I might ask for pictures of kittens, or puppies, or such. So you divide your directory into folders, first with sort of general themes "animals", "machines" and then go into more specifics "cats", "dogs", "clocks", "cars". That's great, until you have a picture that has both a cat and a car on it (maybe the cat is driving) so what you do is you put the picture on one folder, and then symlink it to another one. Easy right?

Except that one day I get a picture of a cat that makes me sad, and I didn't really want a sad picture. So I tell you, we also need to know if pictures are sad, or funny, or cute, or neutral, etc. At this point you can either make a separate theme directory structure per emotion, or a separate emotion per theme. Since you already divided by theme, you decide on the latter (though it doesn't really matter), so now our cat pictures are divided into the emotions they elicit, and as always symlinks are used for things that fall into multiple categories.

Then I start seeing pictures that are like 8 years old. So I tell you, we have to also filter pictures by their age. Now remember: you don't have tags, you don't have search, all you have is the directory structure. So you can now divide the directory by age. Again we decide that the easiest solution is to make the sub-directories within the emotional sub-directories.

It's annoying, though, because this structure was born ad hoc, and it might not be the optimal. Especially when we are told that pictures older than a certain amount should be deleted. Oh I hope your symlinks were actually hard links, otherwise if you delete a picture that has symlinks that are still "valid" you won't be able to access that info.

The relational model is one that solves this. It's the way in which data relates from an "objective" view, that is one that is not related to any specific problem. It's just the data, not a solution, it doesn't solve the part were you actually do something with the data, it just sits there. The wisdom of the relational model is that it's not related to any specific use or problem, it just explains what the data is, not how it's going to be used.

The problem is that we must realize that the way we store data is completely orthogonal to what we are solving. It means that we actually have to sit down and solve a problem that has nothing to do with what we are doing and then solving that. It's not intuitive at all. Ultimately as people create really bad databases and try to alter they end up "recreating" a simpler, more limited, relational system (EAV, and now the Key-JSON blob). It's relearning the lessons of before, except without the hundreds of people researching on it for years.

/r/programming Thread Parent Link - blog.2ndquadrant.com