Java ORM: lessons learned

By mirleid in Op-Ed
Mon Mar 13, 2006 at 01:30:33 PM EST
Tags: Software (all tags)

ORM stands for Object Relational Mapping. At its most basic level, it is a technique geared towards providing an application with an object-based view of the data that it manipulates.

I have been using ORM in the scope of my professional activities for the better part of three years. I can't say that it has been a smooth ride, and I think other people might benefit from an account of my experiences.

Hence this story.


1. ORM: a primer

The basic building block of any application written in an OO language such as Java is the object. As such, the application is basically a large collection of interacting objects. This paradigm works relatively well up to a point. It is when such an application is required to deal with something with a completely different worldview, such as a database, that the brown matter definitely hits the revolving propeller-shaped implement. The term Object-Relational impedance mismatch was coined to describe this difference in worldviews.

The basic purpose of ORM is to allow an application written in an object oriented language to deal with the information it manipulates in terms of objects, rather than in terms of database-specific concepts such as rows, columns and tables. In the Java world, ORM's first appearance was in the form of entity beans.

There are some problems with entity beans:

  • They are J2EE constructs: as such, they cannot be used in a J2SE application
  • The fact that they require the implementation of specific interfaces (and life-cycle methods) pollutes the domain model that you are trying to build
  • They had serious shortcomings in terms of what could be achieved with them (the whole Fast Lane Reader (anti-)pattern issue and others like it)

The first problem does not kill you, but it also does not make you stronger. In fact, the dependency on a container implies that proper unit testing of entity beans is convoluted and difficult. The second problem is where the real pain lies: the programming model and the sheer number of moving parts will make sure that building a moderately complex, working domain model expressed as entity beans becomes a frustrating and tortuous exercise.

Enter transparent persistence: this is an approach to object persistence that asserts that designers and developers should never have to use anything other than POJOs (Plain Old Java Objects), freeing you from the obligation to implement life-cycle methods. The most common frameworks that claim to provide transparent persistence for Java objects today are JDO, Hibernate and TopLink. At this point, I'd like to clarify that I am not about to discuss the great JDO vs EJBernate 3.0 religious wars, so, don't even think about it.

Hibernate and TopLink are reflection-based frameworks, which basically means that they use reflection to create objects and to access their attributes. JDO on the other hand is a bytecode instrumentation-based framework. While this difference might not seem to be immediately relevant to you, please bear with me: its significance will become apparent in due course.

2. ORM: the 50,000 foot view

At a high level, you need to perform the following tasks when using an ORM framework:

  • Design and code your domain model (as in, the POJO, JavaBean-like classes that represent the data that your application requires)
  • Derive your database schema from the domain model (I can hear the protests: again, please bear with me)
  • Create the metadata describing how the objects map to the database and what their relationships are

Assuming that you have sufficiently detailed requirements and use cases, the first step is a well-understood problem with widely accepted techniques available for its solution. As such, we'll consider the first step as a given and not dwell on it.

The second step is more controversial. The easy way to do it is to create a database schema that mimics the domain model: each class maps to its own table, each class attribute maps to a column in that table, and relationships are represented as foreign keys. The problem with this is that the database's performance is highly dependent on how "good" the schema is, and this "straight" way of creating one generates, shall we say, sub-optimal solutions. If you add to that the fact that you will be constrained (by the very nature of the ORM framework that you are using) in terms of the database optimisation techniques that you can use, and that the one-class-one-table approach will tend to generate a disproportionately large number of tables, you realise pretty soon that the schema you have is, by DBA standards, a nightmare.

The only way to solve this conundrum is to compromise. From both ends of the spectrum. Therefore, using an ORM tool does not really gel with waterfall development, for you'll need to continually revisit your domain model and your database schema. If you're doing it right, changes at the database schema level will only imply changes at the metadata level (more on this later). Obviously, and by the same token, changes at the domain model level should only imply changes in the metadata and application code, but not to the database (at least not significant changes).

Creating the metadata for mapping your domain model to the database is where it gets interesting. At a high level, the basic construct available to you is something called a mapping. Depending on which framework you use, you might have different mapping types available that do all kinds of interesting stuff, but there is a set that is commonly available:

  • Direct to field
  • Relationship

A direct to field mapping is the basic type of mapping that you use when you want to map a class attribute of some basic type such as string directly onto a VARCHAR column. A relationship mapping is the one that you use when you have an attribute of a class that holds a reference to an instance of some other class in your domain model. The most common types of relationship mappings are "one to one", "one to many" or "many to many".

At this juncture, we need an example to illustrate the use of these mappings. Let us consider the domain model for accounts in a bank; you'll need:

  • A Bank class
  • An Account class
  • A Person class
  • A Transaction class

The relationships between them are as follows (a rough sketch of the classes follows the list):

  • Bank has a "one to many" relationship with Account, meaning that a bank holds a lot of accounts, but that the accounts can only belong to one bank. This translates into the Bank class having an attribute of type List holding references to its accounts and that the Account class has an attribute of type Bank holding a reference to the owning Bank instance (this reference is commonly called the "back link" in ORM-speak, because it is used to generate the SQL that will populate the list on Bank: something like SELECT * FROM ACCOUNT WHERE BACK_LINK_TO_BANK_INSTANCE = BANK_INSTANCE_PRIMARY_KEY)
  • Account has a "many to many" relationship with Person, meaning that an account belongs to one or more persons and that a person may have one or more accounts. In code terms this translates into Account having an attribute of type List holding references to instances of Person and Person having an attribute of type List holding references to instances of Account. It should be noted that this relationship has database schema side effects: it normally requires the creation of a relation table that holds the primary keys of Account and Person objects.
  • Account has a "one to many" relationship with Transaction (see the relationship between Bank and Account)
  • Transaction has a "one to one" relationship with Account, meaning that a transaction is to be executed against a single account (this is a simplification for the sake of this example). In code terms, this means that Transaction has an attribute of type Account that holds the reference to the Account instance it is to be performed against.
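
To make the example concrete, here is a minimal sketch of what those POJOs might look like. The field names and the surrogate id are illustrative, and getters, setters and constructors are mostly elided:

import java.util.ArrayList;
import java.util.List;

// A bank owns many accounts (one-to-many).
public class Bank {
    private Long id;                                   // surrogate primary key
    private List<Account> accounts = new ArrayList<Account>();

    public List<Account> getAccounts() { return accounts; }
}

// An account carries the "back link" to its owning bank, is held by one or
// more persons (many-to-many) and owns its transactions (one-to-many).
class Account {
    private Long id;
    private Bank bank;                                 // back link to the owning Bank
    private List<Person> holders = new ArrayList<Person>();
    private List<Transaction> transactions = new ArrayList<Transaction>();
}

// A person may hold several accounts (the other side of the many-to-many).
class Person {
    private Long id;
    private List<Account> accounts = new ArrayList<Account>();
}

// A transaction is executed against exactly one account.
class Transaction {
    private Long id;
    private Account account;
}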

Please note that the terminology that I am about to use is somewhat TopLink-biased, but you should be able to find out the appropriate Hibernate or JDO equivalent without too much trouble. Anyway, once you have figured out the relationships between classes in your domain model, you need to create the metadata to represent them. Prior to Tiger (JDK 5.0), this was typically done via a text file containing something vaguely XML-like describing the mappings (and a lot of other stuff). If you are lucky, you'll have access to a piece of software that facilitates the creation of the metadata file (TopLink provides you with something called the Mapping Workbench, and I understand that there are Eclipse plug-ins for Hibernate).

With TopLink and Hibernate, once you have the metadata file you are away. If you are using JDO, there is an extra step required, which is to instrument your class files (remember that JDO uses bytecode instrumentation), but this is relatively painless, since most JDO implementations provide you with Ant tasks that automate it for you.

3. IRL ORM

What follows is an account of my experience (and exploits) with TopLink. Some of the issues encountered will be, as such, somewhat specific, but I think that most of them are generic enough to hold for most ORM frameworks.

The first problem that you face is documentation. It is not very good: it is ambiguous, and it only covers the basics of the framework and its use. Obviously, this problem can be solved by getting an expert from Oracle (they own TopLink): I guess that sort of explains why the documentation isn't very good.

The second problem that you face is that if you are doing something real (as in, not just playing around with the tool, but actually building a system with it), you typically have more than one person creating mappings. You would have thought that Oracle would have considered that when creating the Mapping Workbench. They did not. It is designed to be used by one person at a time, and there's no chance that you can use it in a collaborative development environment. Additionally, it represents the mapping project (the Mapping Workbench name for the internal representation of your mapping data) in such a huge collection of files that storing them in a VCS is an exercise in futility. So, mapping your domain model becomes a project bottleneck: only one person at a time can edit the project, after all. As such, the turnaround time for model changes and updates impinges quite a lot on the development teams, since they can play around with the domain model in memory, but they can't actually test their functionality by trying to store stuff to the database.

When you finally get a metadata file that holds what you need to move your development forward, and you run your code, you start receiving angry e-mails from the project DBA, reading something like "What in the name of all that is holy do you think you are doing to my database server?"

At first, you don't know what he is talking about: you are only reading a couple of instances from the database, updating something and saving it. You decide to look at the SQL logs, thinking that the guy must be having problems at home or something. That is when you receive the shock of a lifetime: when you read those two instances, the ORM framework is actually reading half of the database into memory, giving the database server the machine equivalent of a grand mal seizure in the process.

After some digging around, you realise that that happens because the ORM framework will not, by default, lazy load relationships. This means that if the metadata contains an association between two classes, and you read in one instance of the association's source class, the framework will read in the target of that association as well so as to resolve it. It will then happily proceed to apply the same principle to the class that it just read in (and which you did not ask for, since you weren't even planning to de-reference that association in your code), ad nauseam.

It's "back to the mappings" time, at this point. The ORM framework offers lazy loading, but it can't be blanket-turned on. It needs to be switched on, one association mapping at a time. You remember what I said about the creation of the mappings being a bottleneck? It just got worse: you are now receiving death threats from the people that need to go through your entire domain model and switch it on. You only ask for a couple of them to be turned on, but the mappings people decide to be proactive, and turn it on for every single association that they can find: they don't want to have to go through this stuff again, so, they might as well bite the bullet right here and now and get it over with.

Eventually, you get the new metadata, you run your code, and the net result is that the tone of the DBA's e-mails goes from angry to downright shitty. There's no pleasing some people, you think while you scan the SQL logs. That's when it hits you. When you are finally able to breathe again, you seriously consider suicide. Since the mappings team decided to be proactive (read: overzealous), all the mappings are now lazily loaded, and the problem is now that every time you (or somebody else, for that matter) follow an object reference or cycle through a list, there's a database hit.

You guessed it: "back to the mappings" time. This time, though, in order to make a call on whether a relationship should be lazily loaded, you need to trawl through all the use cases, involve a bunch of people, and come up with the most likely usage and access scenarios for each of the classes in your domain model. That takes a lot of time and money and the PMO is not amused. Eventually, the problem is sorted to the point that the abuse that you get from the DBA and the mappings team recedes to background-noise level (the occasional "arsehole" muttered as you walk past one of them in the corridor).

This is when the people managing the servers come to you with a purchase order for enough memory to comfortably run SkyNet in. Trying not to let the tick in your left eye show, you politely ask what this is about. For your sins, they tell you. They have been monitoring the performance and memory usage, and they figure from their projections that that is what you need to see you through the next 6 months. You reply that you need to do some due diligence, and that you'll get back to them. After some investigation, you realise that they are right. The problem is that reflection-based ORM frameworks figure out what needs to be flushed to the database (as in, what you created or updated, and what SQL needs to be generated) by comparing a reference copy that they keep against the instance that you modified. As such, and at the best of times, you are looking at having twice as many instances of a class in memory as you think you should have. The problem gets further compounded by the fact that you are using a multithreaded architecture, so it is a case of each of the threads holding at least twice as much memory as the code running in them directly manipulates.

(Side note 1: It's actually worse than that as far as TopLink is concerned. It actually holds 2 reference copies.)

(Side note 2: JDO is much more efficient in this respect. When they instrument the classes, they add the equivalent of boolean "dirty" attributes for each attribute that you originally had. So, even though the actual class that is being run is somewhat fatter than the original, it goes nowhere near having multiple copies of it in memory)
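
Neither framework's internals are being quoted here, but a rough sketch of the two strategies (comparison against a backup copy versus instrumented dirty flags) looks something like this:

import java.lang.reflect.Field;

// What a reflection-based framework does, conceptually: keep a backup copy of
// each object as it was read, and diff it field by field at commit time. The
// backup copies are where the extra memory goes.
class ComparisonDirtyChecker {
    boolean isDirty(Object backup, Object current) throws IllegalAccessException {
        for (Field f : current.getClass().getDeclaredFields()) {
            f.setAccessible(true);
            Object before = f.get(backup);
            Object after = f.get(current);
            if (before == null ? after != null : !before.equals(after)) {
                return true;
            }
        }
        return false;
    }
}

// What bytecode instrumentation does, conceptually: the enhanced class records
// which fields were touched as they are touched, so no second copy is needed.
class InstrumentedAccount {
    private java.math.BigDecimal balance;
    private boolean balanceDirty;                      // added by the enhancer

    public void setBalance(java.math.BigDecimal newBalance) {
        this.balance = newBalance;
        this.balanceDirty = true;
    }
}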

At a high level, the only way that you can get around this is to actually figure out which classes are read-only from the application standpoint. TopLink will stop its copying process at the point that it finds a reference to a class that is thus marked, which should save you a considerable amount of memory. This basically amounts to identifying which classes are reference data, and which are routinely modified by the application (leaving aside the issue that you need your application to be able to modify its reference data in some way). You guessed it: meetings with everybody and their dog first, "back to mappings" time later. And since nothing comes for free, the downside of marking some classes read-only is that, once they get into the ORM framework's cache, they won't be read from the database again. Well, you say, that has got to be good for performance, right?

Right. But did I mention that we run in a clustered environment (two boxes, two application server instances, live-live)? And that reference data is sometimes modified? If you put all that together, you get some quite dysfunctional scenarios whereby one application server instance's view of the world is quite different from the other's, where reference data is concerned. And none of it is predictable: it depends on whether one of them had already read that particular piece of data and the other had not, on which of them processed the reference data change, and on whether that night is going to be a full moon.

Oh. Did I mention that the Operations Support team have been on your back for a number of weeks now, basically saying with increasing levels of venom that your application is, from their point of view, unsupportable? Well, they kind of have a point: they are used to fixing problems by firing up SQL*Plus and tweaking a couple of database tables. They just can't do it now: the schema is not human-readable, for it is a reflection of the domain model and the ORM framework's requirements. Adding insult to injury, surrogate keys are used consistently everywhere. This means that every primary (and, by extension, foreign) key is actually a very long number, meaning nothing to anybody unless they know which domain class was being manipulated at the time that the problem occurred. The final nail in your coffin is the fact that, since your domain model uses inheritance liberally (as any self-respecting object model geared towards behaviour and code reuse should), you have spurious tables with cryptic-looking keys all over the shop. On top of that, they can't just change a reference data value in the database: the application might never know that it had been changed (see above).

At this point, life in the streets having breakfast out of a bottle wrapped in a brown paper bag starts becoming your definition of heaven.

4. Looking to windward

Using ORM is difficult. However attractive the notion of having your application communicate with the database using a paradigm that it is very familiar with (object construction, destruction and modification), there are a number of downsides. In general, these are the things that you should consider before jumping on the ORM bandwagon:

  • It does not make the project cheaper. If you are lucky, it will cost the same as implementing your code using straight JDBC and something like the DAO pattern. If project price is your selling point for ORM, YFI.
  • It does not perform better than carefully tuned JDBC. If performance of the solution is your selling point for ORM, as above.
  • It is not a silver bullet for technical inadequacies in the project team. If you think that you can dumb down the workforce (as in, hire more inexperienced people because, after all, all they need to know is bog standard Java, none of that JDBC and DBMS skills BS), as above.

On the other hand, there are a number of things that you gain from using ORM:

  • It makes your code simpler to understand (well, assuming that your domain model is worth a damn). Additionally, if you do it right, you write less of it. This means that maintainability of a system created using ORM should be significantly higher than one created using straight JDBC calls.
  • It gives you a number of technical options that would be quite complex to code from scratch. I am talking about stuff like lazy loading, predictable SQL generation order (important in deadlock avoidance for clustered applications) and query by example (a sketch of which follows this list).
  • It gives you database independence. Your application does not even know what the database is; its only connection to it is via the ORM framework and the JDBC driver. This means that your development environment can be a cheaper version of the production environment (as in, using an open source database for development would save you all that Oracle development licenses cash), and that you can confidently upgrade database versions without expecting to have to modify the application at all.
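
As an illustration of the query-by-example point, this is roughly how Hibernate's Criteria API exposes it (the Account class is the one from the banking example above, not code from any project described here):

import java.util.List;
import org.hibernate.Session;
import org.hibernate.criterion.Example;

class AccountQueries {
    // Query by example: fill in a prototype object and ask the framework for
    // everything in the database that looks like it.
    List findSimilarAccounts(Session session, Account prototype) {
        return session.createCriteria(Account.class)
                      .add(Example.create(prototype))
                      .list();
    }
}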

In short: if you have to make a decision either way, carefully examine the problem, and make informed decisions. Simple, really.

Related Links
o Object-Relational impedance mismatch
o Fast Lane Reader
o JDO
o Hibernate
o TopLink
o EJBernate 3.0
o surrogate keys


Java ORM: lessons learned | 123 comments (78 topical, 45 editorial, 0 hidden)
+1 FP....I program... (2.00 / 7) (#4)
by terryfunk on Sat Mar 11, 2006 at 12:12:15 PM EST

in Ruby and got to wondering if there is an ORM for Ruby. Did a quick Google search and found Nitro and Og. So, thanks to your article, I may have found a solution to several problems on a couple of projects I am working on.

Thanks!

I like you, I'll kill you last. - Killer Clown
The ScuttledMonkey: A Story Collection

ActiveRecord? (none / 1) (#59)
by birdsong on Mon Mar 13, 2006 at 03:38:25 PM EST

ActiveRecord is the ORM to use for Ruby on Rails. I'm not sure if ActiveRecord is available for Ruby in general.

[ Parent ]
ActiveRecord (none / 1) (#89)
by richieb on Tue Mar 14, 2006 at 03:31:53 PM EST

ActiveRecord can be used outside rails. Here is an in depth article on Active Record.

...richie
It is a good day to code.
[ Parent ]

wrapping so easier than mapping (none / 0) (#111)
by echarp on Sun Mar 19, 2006 at 08:51:38 AM EST

I can't believe I had to put up with so much crap, while ActiveRecord is so simple. It in fact seems *too* simple. Has anybody found its drawbacks yet? -- http://leparlement.org

[ Parent ]
So how does ORM work with transactions? (1.50 / 2) (#6)
by ksandstr on Sat Mar 11, 2006 at 03:35:33 PM EST

The database kind. You know, the kind that most MySQL people to this day aren't convinced are necessary, and that an even greater proportion doesn't use correctly.

Transaction brackets I can already guess; we've been using a method based on anonymous subclassing for quite a while at work. (Anonymous delegates would probably work just as well in C#.) Simply disallowing write operations when the thread is not in a transaction bracket through some sort of a checking protocol and making serializable the default isolation level would result in reasonable safety; use of the unsafe bits can be grepped for in the source.
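
For readers who have not seen the idiom, a rough plain-JDBC sketch of the transaction bracket described above might look like the following; the TransactionBlock name is made up, and a real implementation would also deal with isolation levels and retries:

import java.sql.Connection;
import java.sql.SQLException;

// The "transaction bracket via anonymous subclassing" idiom: the bracket owns
// commit/rollback, the caller supplies the body as an anonymous subclass, and
// write operations are only ever issued inside a bracket.
abstract class TransactionBlock {
    protected abstract void body(Connection con) throws SQLException;

    public final void run(Connection con) throws SQLException {
        boolean oldAutoCommit = con.getAutoCommit();
        con.setAutoCommit(false);
        try {
            body(con);
            con.commit();
        } catch (SQLException e) {
            con.rollback();
            throw e;
        } finally {
            con.setAutoCommit(oldAutoCommit);
        }
    }
}

// Usage, somewhere in the data access layer:
//
//     new TransactionBlock() {
//         protected void body(Connection con) throws SQLException {
//             con.prepareStatement(
//                 "UPDATE ACCOUNT SET BALANCE = BALANCE - 100 WHERE ID = 42")
//                 .executeUpdate();
//         }
//     }.run(connection);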

Assuming that something like this exists in the top ten ORM frameworks, how does one do transaction restarts in the face of, say, a serialization failure or when chosen to be the losing party in a deadlock? When does the error get reported if and when a database constraint is violated? Does anything protect against dumbass junior programmers who don't understand the "no side effects, your code may be called multiple times" rule? Does any ORM framework provide a method to pass hints on an impending read-modify-write cycle (i.e. SELECT ... FOR UPDATE)?

Pardon my skeptical tone. I've thought about rolling my own ORM layer in C# for a while now, and I've come to think that trying to go too far overboard on the ORM stuff is indeed, or very very close to, bugfuck insane. I mean things like coming up with an object model first and letting the Tool Of The Day write your schema for you, or ignoring SQL constraints because the tools don't support them. (My current approach is to tag classes and fields with table and column names, thus making the OO side of the system slave to the database schema, As It Should Be.)

My conclusion, far as I've gleaned from this article and others not unlike it, is that ORM is just another rung in the Java ladder of thick mittens for the fresh-out-of-college junior programmers. I have to wonder though, is it worth the cost of pissing off the server and database administrators? One can learn SQL and the fundamentals of how a SQL database works with regard to the application in about a year or two. How long does it take to learn J. Random Object-Relational Mapping System with all its quirks, including how to write code that is at least as readable as your average JDBC code?

Fin.

Transactions in ORM... (3.00 / 2) (#10)
by mirleid on Sat Mar 11, 2006 at 04:02:47 PM EST

Well, this is sort of a complicated matter, and I did not go into any sort of detail in the piece on that front because it would have made it even longer than it already is.

If you are using ORM in the scope of a J2EE container, you normally use CMP (container managed transactions). This means that it is the container's responsibility to start and then subsequently commit or roll back the transaction depending on what happens during processing. All the ORM frameworks that I have mentioned integrate with the container's transaction management (normally via the TransactionController's callback interface). This means that, at the start of each transaction, the ORM framework registers a generic callback handler with the container's Transaction Controller and then, at commit time, when that callback is executed, it proceeds to identify what was created, modified and deleted in the scope of the transaction in question so as to be able to generate the required SQL. This actually makes the database access profile look like "a couple of queries, a bit of silence, and then a storm of insert/update/delete statements" when the transaction actually commits.

TopLink actually gives you mechanisms to add hints to the SQL (if the target database is Oracle) that is generated, but that is not a standard ORM feature, AFAIK.

Generally, acquiring hard locks (such as the ones acquired via SELECT...FOR UPDATE) on the database is frowned upon in the J2EE development world. You normally make do with optimistic locking.

Transaction restarts are normally handled by the container itself: if you use asynchronous invocation of business processes via Message Driven Beans (as in, activated by JMS messages), and the transaction rolls back, then the container normally offers you configuration options that allow you to configure redelivery of the message. This means that if your transaction fails because of an Optimistic Locking exception, the message is redelivered by the container after some grace period and reprocessed. It normally succeeds on the second attempt. You can also configure a redelivery threshold, whereby if the message is delivered n times without committing, then the message is moved to some error queue for inspection by human operators.

Chickens don't give milk
[ Parent ]
With Hibernate... (3.00 / 5) (#30)
by skyknight on Sun Mar 12, 2006 at 02:49:02 PM EST

you open a session, which means getting back a session object from a factory, with which you will associate new objects and from which you will load existing objects. Such object manipulations are surrounded by the initiation and termination of transactions, for which you can specify the isolation level. I don't know about other frameworks, but Hibernate does take transactions seriously.
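
For anyone who has not seen it, the shape of a Hibernate unit of work is roughly the following sketch (Hibernate 3-era API; the Account class and its balance property are assumed for illustration):

import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;

class CreditService {
    // A typical Hibernate unit of work: open a session, bracket the object
    // manipulation in a transaction, commit or roll back, close the session.
    void credit(SessionFactory factory, Long accountId, java.math.BigDecimal amount) {
        Session session = factory.openSession();
        Transaction tx = session.beginTransaction();
        try {
            Account account = (Account) session.get(Account.class, accountId);
            account.setBalance(account.getBalance().add(amount));
            tx.commit();               // dirty checking flushes the UPDATE here
        } catch (RuntimeException e) {
            tx.rollback();
            throw e;
        } finally {
            session.close();
        }
    }
}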

ORM is definitely not a tool that people should use if they don't have a solid understanding of relational database technology and issues, or perhaps more generally an understanding of computer architecture. Rather, it should be used by people who have substantial experience writing database applications and have, after much hard-won experience, gotten tired of the grinding tedium of manually persisting and loading objects to and from relational databases. You need the understanding of relational databases so that you can get good performance from an ORM, and without it you'll have the horrible performance that the original piece characterizes in its anecdotes.

I've been dealing with the ORM problem for 5+ years, with a brief escape for grad school. I've written raw SQL. I've used home grown ORM frameworks written by other people. I've written my own substantial ORM frameworks in each of Perl, Python and Java. I've actually done it twice in Perl, with my latest instantiation being pretty good, and yet still being dwarfed in capability by Java's Hibernate. As such, I've recently started learning Hibernate. Hibernate is extremely complicated, and most certainly not for the weak of heart or for a junior programmer with no relational database experience, but it is also extremely powerful. In learning Hibernate I've been very appreciative of many of the hard problems that it solves, problems with which I have struggled for years, in many cases unsuccessfully.

Mind you, even with Hibernate, ORM is still ugly. The fact that you need to persist your objects to a database is largely an artifact, an accidental component of your development process stemming from limitations in today's technology, not an intrinsic facet of the thing that you're trying to accomplish. Also, ORM is inherently duplicative, in that you end up defining your data model twice, as well as a mapping between the two instantiations of it. Such is life... It would be nice if we had "object servers", as well as cheap and performant non-volatile RAM, but we don't, and we aren't going to have such things for well over a decade at least, not in reliable versions anyway.

As someone who has slogged through implementing his own ORM on a few occasions, I can say that it is a great learning experience, but if your goal is a production quality system, then you should probably use something like Hibernate. The existence of Hibernate alone is probably a strong argument for using Java when writing an application that requires complex ORM. I don't know that C# has solved the problem, but I haven't looked, honestly.



It's not much fun at the top. I envy the common people, their hearty meals and Bruce Springsteen and voting. --SIGNOR SPAGHETTI
[ Parent ]
EJB 3.0 helps here (3.00 / 2) (#63)
by ttfkam on Mon Mar 13, 2006 at 08:43:32 PM EST

New versions of Hibernate implement the EJB 3.0 EntityManager interface. So instead of separate XML schema definition and mapping files, you simply annotate the POJOs and go.
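
In concrete terms, the mapping metadata moves onto the entity itself, something like this sketch (table, column and property names are illustrative):

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.ManyToOne;
import javax.persistence.Table;

// With EJB 3.0 / JPA annotations the mapping metadata lives on the POJO itself
// rather than in a separate XML file.
@Entity
@Table(name = "ACCOUNT")
public class Account {
    @Id
    @GeneratedValue
    private Long id;

    private String number;        // maps to a column of the same name by default

    @ManyToOne
    private Bank bank;            // the "back link" from the article's example

    // getters and setters elided
}

@Entity
class Bank {
    @Id
    private Long id;
}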

The downside is that persistence info is in your Java source. The upside is, well, that persistence info is in your Java source.

And using EJB 3.0 means that you can swap between Hibernate in standard J2SE apps, JBoss, Glassfish, and the others simply.

My $0.02

If I'm made in God's image then God needs to lay off the corn chips and onion dip. Get some exercise, God! - Tatarigami
[ Parent ]

Hoooray!!! (none / 0) (#73)
by mirleid on Tue Mar 14, 2006 at 04:51:22 AM EST

Now, would you please try to explain that to trhurler? Help greatly appreciated...

Chickens don't give milk
[ Parent ]
and Spring makes it even better (none / 0) (#114)
by Lars Rosenquist on Mon Mar 20, 2006 at 09:35:31 AM EST

By using the TransactionProxyFactoryBean and the HibernateTransactionManager to manage your transactions. Use the OpenSessionInViewInterceptor to support lazy initializing outside of a Hibernate Session scope (e.g. web context).

[ Parent ]
But we do... (none / 0) (#115)
by ckaminski on Mon Mar 20, 2006 at 10:52:26 AM EST

Native object storage in ObjectStore http://www.objectstore.com, Java and C++.  

[ Parent ]
Library code means some things are left to you (none / 1) (#62)
by Ufx on Mon Mar 13, 2006 at 07:36:06 PM EST

Assuming that something like this exists in the top ten ORM frameworks, how does one do transaction restarts in the face of, say, a serialization failure or when chosen to be the losing party in a deadlock? When does the error get reported if and when a database constraint is violated?
This should, without a doubt, be left up to the application developer or the developer in charge of writing your data access layer. Our strategy is to catch a few well-known exceptions and automatically resubmit the transaction for these. With our system, the transaction batch can simply be resubmitted to an execution method, so it's literally a generic one-liner to do a restart. YMMV with other systems.

Some exceptions represent legitimate errors that need to be passed up to calling code - The ORM shouldn't be doing anything aside from throwing an exception if there's a serialization failure, for example. Your libraries should provide hooks so that you can build your own error logging.

Still other kinds of exceptions should be passed up to the application layer because they represent a decision the user must make. Concurrency violations might ask the user to overwrite or abort their changes.

[ Parent ]
Also, what the hell? (1.33 / 3) (#7)
by ksandstr on Sat Mar 11, 2006 at 03:43:33 PM EST

Reading through the article a second time, this little nugget jumps out.

Bank has a "one to many" relationship with Account, meaning that a bank holds a lot of accounts, but that the accounts can only belong to one bank. This translates into the Bank class having an attribute of type List holding references to its accounts and that the Account class has an attribute of type Bank holding a reference to the owning Bank instance (this reference is commonly called the "back link" in ORM-speak)

Does this mean that in addition to the back link (which one would naturally use in an ordinary SQL database schema) there is another table besides Bank and Account which establishes a one-to-many relationship from 1 bank to N accounts, even though the same information existed in the Account "bank_id" back link?

If so, ewww! No wonder you pissed the DBA off -- that's duplicated information right there, and a really nice spot for referentially inconsistent data to sneak in (barring constraints).

Fin.

Nope... (none / 0) (#8)
by mirleid on Sat Mar 11, 2006 at 03:48:21 PM EST

The only association type that has database schema implications is the "many to many", which requires you to create a relation table to hold the (source foreign key, target foreign key) pairs that represent the relationship at database level. "one to many" relationships only require the "back link" column in the target table...

If you think that the sentence is ambiguous, could you please give some pointers as to how it could be improved?

Thanks for reading...

Chickens don't give milk
[ Parent ]
(Should've made that one editorial.) (none / 0) (#9)
by ksandstr on Sat Mar 11, 2006 at 03:57:02 PM EST

Oh well, mea culpa.

You might point out that the back link is how the one-to-many relationship is encoded, or something like that. Now it seems as if the one-to-many relationship, because it's mentioned next to a List field in the Bank class slash table and seemingly separately from the back link, were distinct from Account's back link in the schema. (This is just another instance of me thinking "it's written in Javur, Javur is stupid, therefore the program is stupid and produces redundant schema components"...)

Apart from that, no, I haven't got further suggestions.

[ Parent ]

Added some stuff...Please check...[] (none / 0) (#11)
by mirleid on Sat Mar 11, 2006 at 04:09:34 PM EST



Chickens don't give milk
[ Parent ]
Mileage varies (2.00 / 3) (#18)
by Scrymarch on Sat Mar 11, 2006 at 10:21:53 PM EST

I've used Hibernate on a few projects now and been pretty happy with it.  I've found it a definite productivity increase on raw JDBC - there's simply less boilerplate, and hence fewer stupid typo errors.  The overwhelmingly most common class -> table relationship is 1:1, so you cut out a lot of code of the

account.setAccountTitle( rs.getString(DataDictionary.ACCOUNT_TITLE) );
account.setAccountBalance( rs.getInt(DataDictionary.ACCOUNT_BALANCE) );
collection.add(account);

... variety.

It does irritate me that you end up with HQL strings everywhere, but you ended up with SQL strings everywhere before, so shrug.  Really the syntax should be checked at compile time, instead of implicitly by unit tests. Such a tool shouldn't even be that hard to write, but I guess I'm lazy. I'd be uneasy letting devs near HQL without a decent knowledge of SQL.  For mapping, we used XDoclet or hand-edited the schema-generated XML files. Usually the same developer would be adding tables or fields, the relevant domain objects, and the required mappings.
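
For comparison, the Hibernate version of a fetch like the one above collapses to something along these lines (the HQL string and property names are illustrative, not taken from the projects mentioned):

import java.util.List;
import org.hibernate.Session;

class AccountDao {
    // The Hibernate equivalent of the JDBC boilerplate above: the query is an
    // HQL string against the mapped class, and the ResultSet-to-setter copying
    // disappears.
    List findAccountsWithBalanceOver(Session session, int minimum) {
        return session.createQuery(
                    "from Account a where a.accountBalance > :minimum")
                .setParameter("minimum", new Integer(minimum))
                .list();
    }
}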

Now I think about it though, every time I've used ORM I've laid out the data model first, or had legacy tables I had to deal with. Inheritance in the domain model tended to be the interface-driven variety rather than involving a lot of implementation inheritance. Relational databases have a pretty good track record on persistence; maybe you could let them have a little more say.

We did still get burnt a bit by lazy loading. We were working with DAOs which had been written with each method opening and closing a session. So sometimes objects would make it out to a higher tier without having the right dependent detail-style attributes loaded, which throws a lazy loading exception. We got around this by moving the session control up into the business layer over time. This is really where it should have been in the first place; not being able to go:

session.open(); // or txn.start or whatever
data
data
think
data
think
session.close()

is kind of crazy.

These projects were with small teams on the scale of half a dozen developers. Sounds like you were on a bigger project, had higher interpersonal communication overheads, etc. Just to put all my bias cards on the table, I gave a little yelp of pain when you said "waterfall".

Waterfall... (3.00 / 3) (#24)
by mirleid on Sun Mar 12, 2006 at 01:59:06 AM EST

The client that I am (still) working for does not use Waterfall, but neither are they too adept at using Iterative. You end up with a bit of a mess, as in, most of the problems of either, and none of the benefits.

You sound like you are using Hibernate in a J2SE environment, or, if you are using a container, you are using BMP. We have always used CMP, so, by definition, the session is started by the container at the very top of the stack. This means no lazy loading exceptions at all (not to mention that TopLink does not throw lazy loading exceptions a la Hibernate).

Our domain model is quite complex; in fact, it is an adaptation of a commercially available standard model for finance and trading organizations. The most common relationship type is "one to many", with a few "many to many" thrown in for good measure, so, you can guess what that means database schema-wise.

I take your point about HQL; the thing is, TopLink does not have something like that, their expression language is fully programmatic. You build expressions by creating an expression object that you can nest/combine with others. So, in that sense, you don't have JDBC-like strings all over the shop, but you get some pretty cryptic pieces of code that build query criteria.

Chickens don't give milk
[ Parent ]
Splashing around in circles (3.00 / 2) (#47)
by Scrymarch on Sun Mar 12, 2006 at 08:49:02 PM EST

Yeah, it was Hibernate with J2SE or BMP (different projects). CMP would skip that problem.

If you have inheritance hassles you can just not tell the persistence mechanism about it. Views can even help you here. Eg SavingsAccount may be a subclass of Account, but as far as TopLink is concerned it's a separate entity. It's a bit hacky I guess, but ORM is a work in progress.

I think I'd prefer HQL to programmatic query building.  (In fact you can do programmatic query building in Hibernate, but I chose to avoid it.) HQL just makes it painfully obvious you should be able to check the syntax at compile time.

Thanks for the tech article.

[ Parent ]

HQL (none / 0) (#113)
by Lars Rosenquist on Mon Mar 20, 2006 at 09:22:07 AM EST

You don't have to use HQL if you don't want to; you can use Criteria and Restriction objects instead.
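
That is, something along these lines (Hibernate 3 Criteria API; the property names are illustrative):

import java.util.List;
import org.hibernate.Session;
import org.hibernate.criterion.Order;
import org.hibernate.criterion.Restrictions;

class OverdrawnAccountsQuery {
    // The programmatic alternative to an HQL string: build the query out of
    // Criteria and Restriction objects.
    List overdrawnAccounts(Session session) {
        return session.createCriteria(Account.class)
                      .add(Restrictions.lt("accountBalance", new Integer(0)))
                      .addOrder(Order.asc("accountTitle"))
                      .list();
    }
}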

[ Parent ]
Sessions, do you like them? (none / 0) (#122)
by rbygrave on Sun Jan 14, 2007 at 01:55:15 AM EST

Hi,

I was wondering what you (and anyone else) thinks of Sessions in ORM (Hibernate Session, JPA EntityManager etc).

In short, I created an ORM that was like JPA/Hibernate but without requiring sessions. This, in my opinion, makes it much easier to use and understand.

Automatic lazy loading, and save() / delete() instead of the merge() / flush() type mechanisms in JPA and Hibernate.

If you'd like to check it out and send back feedback I'd be very interested.

Ebean ORM at  http://www.avaje.org

Thanks, Rob.

[ Parent ]

JAVA! (1.11 / 9) (#21)
by k31 on Sun Mar 12, 2006 at 12:01:40 AM EST

I used to work doing Java development. If you must do something for a salary, then it is not as mind-numbing as I imagine mining would be, but Java is a tool to sell Sun servers, and thus does not even come close to its potential as a platform.  UnrealScript and UnrealEd, and of course, Unreal Tournament, are a much better example of software development languages, tools, and platforms/VMs/applications which are ... better than the traditional plain-old-C / procedural programming paradigm stuff.

Your dollar is you only Word, the wrath of it your only fear. He who has an EAR to hear....
You know Java (none / 0) (#100)
by mettaur on Wed Mar 15, 2006 at 08:20:13 AM EST

So why can't you learn English?
--
[Applying business theory to trolling]
[ Parent ]
+2, I love coffee! (1.10 / 10) (#25)
by nostalgiphile on Sun Mar 12, 2006 at 10:39:35 AM EST

My impression of a K5 null0 is over. I will sit back down now. But what did you say an ORM was again?

"Depending on your perspective you are an optimist or a pessimist[,] and a hopeless one too." --trhurler
-1, programming $ (1.07 / 14) (#26)
by Psychology Sucks on Sun Mar 12, 2006 at 12:40:40 PM EST



Imagine a world... (2.33 / 6) (#31)
by skyknight on Sun Mar 12, 2006 at 03:04:45 PM EST

where there is cheap and performant non-volatile RAM, such that we can do away with hard drives and keep entire databases in memory. Further imagine that there exist "object servers", namely database engines that allow applications to dynamically bind to and detach from regions of object graphs, perhaps manipulating them, and doing so in a transactional fashion. I suspect that there would be no place for ORM-ish technology in such a world.

Unfortunately, while we'll almost certainly get there eventually, it's probably a long way off, say ten, fifteen or twenty years. For the time being, anyone who wants to write a complex multi-user application that requires persistence and deals with a non-trivial amount of data is stuck pushing and pulling data from a disk as he goes from object representation to relational database representation of his domain model. It is an odious thing to have to do, but unavoidable. I suspect that one day I'll be telling the next generation of programmers about ORM and eliciting a reaction much like the one my father gets whenever he tells me about working with punch cards.

This makes me somewhat leery of investing too much mindshare in ORM, as ultimately it will prove to be a temporary problem, but it's probably going to be with us for such a long time, and fill such an important niche for the kind of work that I often do, that I have little choice but to embrace it. This, of course, is just a specific instance of the more general problem of making one's knowledge timeless. You need to know enough about today's systems to be useful, while simultaneously not indulging overly much in knowledge that will eventually become obsolete.



It's not much fun at the top. I envy the common people, their hearty meals and Bruce Springsteen and voting. --SIGNOR SPAGHETTI
Er (2.66 / 3) (#36)
by trhurler on Sun Mar 12, 2006 at 03:50:57 PM EST

Really, huge SRAM arrays are not THAT expensive these days. And if you really want to, you can serialize objects to disk instead of using a database - but nobody does it that way because doing it halfway makes it hard and doing it balls out is outside the mainstream, which means "dangerous" if you're a corporate IT manager.

The problem is not technology, but the ways people choose to use it. Which are overly conservative.

--
'God dammit, your posts make me hard.' --LilDebbie

[ Parent ]
They are indeed approaching affordability... (none / 1) (#37)
by skyknight on Sun Mar 12, 2006 at 04:10:31 PM EST

but you'll still need software to leverage it, and given how long it took to create robust relational database systems, I suspect that we're at least a decade away from robust "object servers". Also, I'm not sure what you mean by "serialize objects to disk". That sounds like what Java provides with Serializable, or Perl provides with Data::Dumper, both of which are nice for very simple persistence tasks, but completely fall over when you need to have multiple users and the entire object graph is so gigantic that you'd only ever want a small piece of it. Did you mean something else? What we really need is a way to bind to and detach from an object graph without constantly having to marshal and unmarshal it. As far as I know, there is no such technology in existence, at least not at a sufficient level of robustness to survive industrial use.

It's not much fun at the top. I envy the common people, their hearty meals and Bruce Springsteen and voting. --SIGNOR SPAGHETTI
[ Parent ]
Well, (none / 1) (#41)
by trhurler on Sun Mar 12, 2006 at 05:22:48 PM EST

First of all, you need to think closely about what you mean by "object server." Even if you use a gigantic SRAM array, the problems related to multiple users and large scale are basically identical to what they'd be if you write to disk; the only difference is that with disk you'll be using memory caches to speed things up. I would agree that commonly available serialization mechanisms are not robust enough, BUT:

It does not follow that they can't be or that the needed technology is massively complex. In fact, I think it could be fairly simple. Really, all you need is common serialization with software transactional memory and memory-mapped disk or an SRAM array. You can avoid constantly serializing and unserializing objects by keeping an object cache, by arranging your in-memory object format so that objects never need anything more than alignment padding and byte-ordering changes when they come from storage, or both.

Naturally this means you need some sort of object ID setup and an "object table" that corresponds conceptually to page tables in a virtual memory system. Not a big deal.

This would all be smoother and faster with OS support, but that's not as hard as people act; most people think it is hard because they've never worked on an OS before, but other than a few stack constraints and a need to be more careful to keep debugging time somewhat reasonable, it isn't all that different really. (Than the way good programmers do other stuff; bad programmers are bad at everything:) If you don't want to do OS modification, you could just arrange to manage a disk device yourself the way Oracle does. A bit more complexity in your system, but you can hide all of it in a fairly small section of the code.

--
'God dammit, your posts make me hard.' --LilDebbie

[ Parent ]
ObjectStore... (none / 0) (#116)
by ckaminski on Mon Mar 20, 2006 at 11:05:05 AM EST

http://www.objectstore.com

Almost 20 years old.  About as robust as you can get.

[ Parent ]

you mean a world (2.75 / 4) (#48)
by creativedissonance on Sun Mar 12, 2006 at 11:19:52 PM EST

where you have titanic servers you can never reboot?

sounds like a great idea to me


ay yo i run linux and word on the street
is that this is where i need to be to get my butt stuffed like a turkey - br14n
[ Parent ]

Well, maybe (3.00 / 5) (#51)
by jolly st nick on Mon Mar 13, 2006 at 10:33:31 AM EST

but while a huge amount of work has gone into RDBMS engines with respect to getting data on and off of slow persistent devices, RDBMS aren't really optimal for that either. It's necessary for what they are for, but contrary to common belief it is not what they are for.

The reason that the relational model is important is that it separates the representation of persistent data from the application. In short -- it promotes reuse of data across applications. Of course, reuse of both data and logic would be the goal of a persistent object store, and there is nothing that can be represented in a relational model that cannot likewise be represented in an object store.

However, just because the object graph is more representationally powerful doesn't mean it's automatically better.

There are two advantages to the relational model. The first is that the data is separated from the application logic, which is the exact opposite idea of an object representation. However I feel this facilitates data reuse across applications. Most class designs, while ideally not tied to a specific application, are certainly tied to a specific pattern which may not be what we want or need in every future context for using the data. We'd end up needing some sort of object->object data mapping, only it would be specific to the pairing of the original application to the new application, as opposed to being generalized from one programming model to another.

The second reason is that relational calculus and algebra are simple, regular and closed. All operations on relations (or tables if you prefer) produce other tables, allowing arbitrary data sources to be combined and provide results which can be further worked on. This is very powerful when optimizing complex and unforeseen operations that combine data from different sources.

By contrast, there are no predefined operations that are guaranteed to work on objects of varying classes, other than equality and identity. Thus persistent object models do not have the property of closure, which is critical to long-term, cross-application storage of data.

Now it so happens that many types of applications may have no need for this. There is no reason that a web log must be stored in a relational database; a persistent object graph would serve as well or better. However, when it comes to business information, cross-application re-use is critical, and if the relational model does not survive long into the future, it will be replaced by a different, likewise constrained model with similar properties, and programmers using an object model will find themselves mapping to that.

[ Parent ]

separate issues (none / 1) (#52)
by balsamic vinigga on Mon Mar 13, 2006 at 12:15:58 PM EST

there's nothing about disc storage that forces the relational database model.

There's no reason you can't write out objects to disc.

OMG YOU'VE NEVER HEARD OF OBJECT DATABASES?

relational database models are still the popular choice because of their time-tested performance for doing complicated queries and such. Simplified schemas, easy browsability by non-programmers.. etc etc etc.

---
Please help fund a Filipino Horror Movie. It's been in limbo since 2007 due to lack of funding. Please donate today!
[ Parent ]

fyi: (1.00 / 6) (#53)
by mercury poisoning on Mon Mar 13, 2006 at 01:30:55 PM EST

Your vote (1) was recorded.
This story currently has a total score of 70.

You're the straw that broke the camel's back!
Your vote put this story over the threshold, and it should now appear on the front page. Enjoy!

Some people will hate you for that...[] (none / 1) (#54)
by mirleid on Mon Mar 13, 2006 at 01:35:03 PM EST



Chickens don't give milk
[ Parent ]
Hey! But I was the ... (none / 1) (#56)
by terryfunk on Mon Mar 13, 2006 at 01:44:03 PM EST

first one to vote FP! Excellent article too.

I like you, I'll kill you last. - Killer Clown
The ScuttledMonkey: A Story Collection

[ Parent ]
Congrats on FP! /nt (1.20 / 5) (#55)
by terryfunk on Mon Mar 13, 2006 at 01:41:41 PM EST



I like you, I'll kill you last. - Killer Clown
The ScuttledMonkey: A Story Collection

Ta...[] (none / 0) (#57)
by mirleid on Mon Mar 13, 2006 at 01:46:22 PM EST



Chickens don't give milk
[ Parent ]
Spring Framework data access classes... (1.00 / 2) (#58)
by claes on Mon Mar 13, 2006 at 01:50:14 PM EST

seem to help a lot.

It seems as if every time I start a new project I go around and around with exactly the same issues -- how "close to the database" or "how close to the object model" to put the interface.

Currently I think if you know something about databases you're better off starting with a schema that really, truly, reflects the actual "business objects" of your application. Then wrap this in a couple of DAO (buzzword, blech) classes (The Spring Framework classes help here), and deal with it.
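
As a sketch of that approach, a DAO wrapped around plain SQL with Spring's JdbcTemplate might look like this (the table, columns and Account class are made up for illustration):

import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.List;
import javax.sql.DataSource;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.core.RowMapper;

// A schema-first DAO: plain SQL against the real business tables, wrapped
// once, with JdbcTemplate removing the connection/statement/ResultSet
// boilerplate.
class AccountDao {
    private final JdbcTemplate jdbc;

    AccountDao(DataSource dataSource) {
        this.jdbc = new JdbcTemplate(dataSource);
    }

    List findByBranch(String branchCode) {
        return jdbc.query(
            "SELECT ACCOUNT_NUMBER, BALANCE FROM ACCOUNT WHERE BRANCH_CODE = ?",
            new Object[] { branchCode },
            new RowMapper() {
                public Object mapRow(ResultSet rs, int rowNum) throws SQLException {
                    Account a = new Account();                       // hypothetical class
                    a.setNumber(rs.getString("ACCOUNT_NUMBER"));
                    a.setBalance(rs.getBigDecimal("BALANCE"));
                    return a;
                }
            });
    }
}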

Once you get the schema right, it tends to stay put. In our current major project I end up actually doing something at the SQL> prompt about once a month or so, mostly for debugging. That tells me that the model is good -- matching both the underlying logic, as well as being easy to get at from the Java classes.

To go up a level, there are a couple of things like ORM (HTML page generation is another) where there are constantly new frameworks, new buzzwords, and "better" ways to do things. My feeling is that when this goes on for a while, it just means that the problem is just plain hard, and if magic frameworks haven't fixed it in the past, they aren't going to fix it in the future.

Thanks for the write up.

-- claes

Hibernate (none / 1) (#60)
by bugmaster on Mon Mar 13, 2006 at 05:22:23 PM EST

I've had some very happy experiences with Hibernate. It lets you specify lazy-loading strategies, not-null constraints, caching strategies, subclass mapping (joined-subclass, union-subclass, table-per-class-hierarchy), associations, etc... All in the mapping file. And it has the Hibernate Query Language that looks, feels and acts just like SQL, but is about 100x shorter. Hibernate rules.
>|<*:=
Comments from a successful ORM-using architect (2.50 / 6) (#61)
by Ufx on Mon Mar 13, 2006 at 06:10:35 PM EST

I've had the pleasure of using an ORM for multiple projects, and my experiences have compelled me to reply to your article.  Our platform is .Net, which offers some more flexibility in the reflection space due to generic type information being preserved.  This is particularly useful when figuring out what type your lists contain.

First, let me first state the requirements of one of my projects.  There is a client application that must support the following:
1) Occasional disconnections from the master database, usually caused by an internet service interruption.  While disconnected, all changes must be stored, and as much work as possible must be allowed to continue.  Upon reconnection, changes must be merged with the master database and the user notified of any conflicts.
2) Extremely granular security of virtually every user-updateable field.
3) General safety against malicious messages.  Par for the course here - nobody must hit our webservices unless expressly authorized to do so.
4) Excellent performance.  The application will be in a setting where work must be done very quickly.

As an architect and developer working this project, I added my own required features above those required by the application:
1) Minimal configuration.  The point is to reduce work, not generate more of a different kind.
2) SOA data transaction batches.  We have code that should function either on the client side or in our own internal network.  Transactions must be handled in both cases transparently.
3) Simple optimizations for lazy and eager loading.  I don't want to create custom DTOs just because we need a little more of the object graph than usual.
4) Transparent security.  Outside of the UI layer, security breaches are an exceptional condition and shouldn't need boilerplate nastiness to verify that a transaction is authorized.
5) Transparent disconnected functioning.  Except in very specific circumstances, data access code should not care whether or not it is being executed in the disconnected environment.
6) Transparent concurrency control.  Again, code should generally not care about handling concurrency errors unless there is a specific exception, and these should be handled in a generic fashion.
7) Ability to execute user-defined functions or stored procedures when necessary.
8) Transparent cache control.  Accessing an object ought to have one interface, regardless of whether or not the object is cached.

We currently use an ORM that meets all of the above requirements.  Allow me to share my thoughts on some very big issues we had to solve regarding these requirements, and mix in some responses to your issues.

As far as configuration is concerned, the system uses a configurable conventions class that allows programmatic configuration of defaults, and it uses attribute decorators for all other configuration.  I know that this ties our domain model to a specific data model, but in our situation that tradeoff wasn't so bad.  Furthermore, my experience is that data schema and object schema are usually versioned together anyway.  Contention for configuration is exactly the same as contention for the domain objects themselves, so there are rarely problems.  I'm surprised that your system did not allow you to split the configuration files by some logical boundaries that would've reduced the contention issues you had.

The key factor to easy mapping is consistency.  Most mapping issues arise out of essentially idiomatic models being twisted for no good reason.  Put your foot down: Everything must follow the idioms unless there is a *very* good reason not to.  Usually that reason is performance, and when the need arises you most likely have to introduce an extra step in your mapping in the form of a DTO.  While this reduces the transparency of the system, in my experience the need for these is rare enough not to have to worry about it as the vast majority of the system should be plenty performant by default.  If it isn't, you're using the wrong technology!

Developer productivity is the most important resource.  The more time you save with an excellent model, the more time you have to work on optimizations when the need arises.  Normally you should not be coding with consistency-breaking optimizations in mind.  Typically when confronted with a performance problem, we need to either eagerly load something, or we need to cache something, or we need to create an index.  Most performance issues can be resolved in a matter of minutes.

Your statement about reflection-based ORM frameworks needing to keep copies of the object in memory isn't entirely accurate.  Our system does not do this, and instead relies on user code to tell the data layer what to send back to the database.  I find this works rather well, because most of the time you definitely know what has changed.

Security-wise, the rich metadata that an ORM provides was a godsend when writing our security system.  Because there is exactly one gateway in which all transactions flow, and this gateway had all of the information necessary to perform a verification, even our extremely granular security was easy to implement in a generic manner.

Our disconnected architecture was also aided by the metadata and design of the ORM.  When we went into disconnected mode, queries were simply re-routed to a SqlLite driver instead of the default webservices driver.  Also, the single point of entry for all returned data allowed for easy caching of that data.

Most good ORM systems can use natural or surrogate key structures.  My preference is for surrogate keys, because let's face it: Natural keys change, and developers are not always experts in their application's domain.  Not every developer can tell the difference between a key that is natural 100% of the time and one that is only natural 95% of the time.  It's far easier to drop a unique constraint than it is to change your key structure when the inevitable happens.

I understand that our requirements are quite different from those that exist in the web-based world.  The world will be a happier place for us developers when we can dump the web as an application platform and replace it with an application delivery platform.

.Net ORM (none / 1) (#71)
by mirleid on Tue Mar 14, 2006 at 02:31:01 AM EST

As far as configuration is concerned, the system uses a configurable conventions class that allows programmatic configuration of defaults, and it uses attribute decorators for all other configuration. I know that this ties our domain model to a specific data model, but in our situation that tradeoff wasn't so bad. Furthermore, my experience is that data schema and object schema are usually versioned together anyway. Contention for configuration is exactly the same as contention for the domain objects themselves, so there are rarely problems. I'm surprised that your system did not allow you to split the configuration files by some logical boundaries that would've reduced the contention issues you had.
Well, data and object schema are not necessarily versioned together at a stage in the project when you are doing database performance optimization (as much as you can and the ORM framework will allow you). At this stage, the database schema will tend to evolve while (if you are doing it right) the object model will remain the same. There is a measure of separation of configuration files that is allowed by TopLink, but it is very convoluted and involves having multiple copies of the same class' mappings around, which is a nightmare for consistency. Also, the Mapping Workbench does not help.
Your statement about reflection-based ORM frameworks needing to keep copies of the object in memory isn't entirely accurate. Our system does not do this, and instead relies on user code to tell the data layer what to send back to the database. I find this works rather well, because most of the time you definitely know what has changed.
Well, if your code needs to tell the ORM framework what to send back, how is that different from the good old DAO pattern? One of the points of using an ORM framework is that you get metadata-driven persistence. And how do you know, in general, what has changed? Basically, if you modify something in an object that causes side-effects somewhere else (as in, adding a Transaction to an Account causes another entity called Balance to be updated), how do you know that this has happened without trawling through the entire codebase?

Just out of curiosity: which ORM framework are you using?

Chickens don't give milk
[ Parent ]
RE: .Net ORM (none / 0) (#81)
by Ufx on Tue Mar 14, 2006 at 08:38:23 AM EST

Well, data and object schema are not necessarily versioned together at a stage in the project when you are doing database performance optimization (as much as you can and the ORM framework will allow you). At this stage, the database schema will tend to evolve while (if you are doing it right) the object model will remain the same.
Right, I said in another paragraph that performance is usually the reason for consistency-breaking changes. This is one of the reasons that I advocate a strong data layer even when talking to the ORM. When this inevitability occurs, the data layer can simply translate reasonable changes to the pristine domain model. In my experience an ORM is not a tool that will get you 100% of the way there - but done right, it'll be very close and well worth the effort.
Well, if your code needs to tell the ORM framework what to send back, how is that different from the good old DAO pattern? One of the points of using an ORM framework is that you get metadata-driven persistence. And how do you know, in general what has changed?
An ORM can be considered a generic superset of the DAO pattern. FooDAO.Save(fooInstance) is semantically similar to ORM.Save(fooInstance).

The UI generally keeps track of changes anyway so that it can provide undo support and change highlighting. It's not a stretch to extend this ability to change batching.
Basically, if you modify something in an object that causes side-effects somewhere else (as in, adding a Transaction to an Account causes another entity called Balance to be updated), how do you know that this has happened without trawling through the entire codebase?
Basically, every alteration is submitted to a transaction batch. This is essentially a function of our domain, and it provides natural change tracking at the object level. The ORM orders the operations inside a batch, so dependencies are inserted without causing constraint violations.

In general though, I would shy away from such side-effects. I want my developers to know exactly what effect they are having on the system.
Just out of curiosity: which ORM framework are you using?
We're a special case - I've tinkered with building ORMs in the past, and so I've had past experience with the architectural challenges they present. I and another developer decided to write an ORM to our specification over the course of a few weekends and nights for fun, because we couldn't find an out of the box system that did what we needed easily. We'll probably be releasing the code later on because despite its featureset, it encodes neither application logic nor major architectural decisions.

[ Parent ]
Semantic similarities... (none / 0) (#82)
by mirleid on Tue Mar 14, 2006 at 08:59:38 AM EST

While FooDAO.Save(fooInstance) is semantically similar to ORM.Save(fooInstance), the basic difference is that you can go ORM.Save(barInstance) if you add metadata for bar, but in the DAO case, you need to create new code (the barDAO). Accordingly, if you need to change the way foo is saved, you need to change fooDAO, whereas with an ORM framework, you only need to change the foo metadata.

Or am I missing something wrt your approach?

Chickens don't give milk
[ Parent ]
ORMs and DAOs (none / 0) (#88)
by Ufx on Tue Mar 14, 2006 at 02:41:49 PM EST

While FooDAO.Save(fooInstance) is semantically similar to ORM.Save(fooInstance), the basic difference is that you can go ORM.Save(barInstance) if you add metadata for bar, but in the DAO case, you need to create new code (the barDAO). Accordingly, if you need to change the way foo is saved, you need to change fooDAO, whereas with an ORM framework, you only need to change the foo metadata.

Or am I missing something wrt your approach?
No, I think you've sufficiently captured how the two patterns differ. I simply like to think of an ORM as a generic DAO that (given metadata) will execute the same code you would have written for that DAO.

I think one of the major problems with tools like this is that people think they're to be used 100% or 0% of the time. Even though I've designed and built ORMs from scratch, I don't think they're a replacement for DAOs. In my ideal design, an ORM is what the DAO uses under the hood to remove as much boilerplate as possible. When that corner case comes up and the tool gets in your way, by all means write your own mapping code.
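To make that concrete (our shop is .Net, so take the Java names and the Hibernate calls below as an illustrative sketch rather than our actual code): the DAO keeps its plain interface, and the ORM does the boilerplate underneath.

    public class HibernateAccountDao implements AccountDao {
        private final Session session;

        public HibernateAccountDao(Session session) {
            this.session = session;
        }

        // The ORM generates the INSERT or UPDATE from its metadata...
        public void save(Account account) {
            session.saveOrUpdate(account);
        }

        public Account findById(Long id) {
            return (Account) session.get(Account.class, id);
        }

        // ...and when the tool gets in the way, a method here can drop down to
        // hand-written SQL without callers ever noticing.
    }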

[ Parent ]
The question not answered is "Why?" (1.75 / 4) (#64)
by expro on Mon Mar 13, 2006 at 09:04:23 PM EST

Why go to all the trouble of ORM, when a simple hierarchical property approach would seem to satisfy as well? Why store the info in tables at all? Is it worth all the trouble, or are there other solutions to the problems you thought SQL was the answer to, given all the mess it creates? Once you have seen the typical abuse that this sort of approach suffers, putting many new things into tables that never belonged in tables just because tables are your nail and SQL is your hammer, why did you do it?

Yeh. (none / 0) (#68)
by NoMoreNicksLeft on Tue Mar 14, 2006 at 01:49:10 AM EST

Until you have 9 million records, and you constantly want to pull just a single one out of the bunch. Not everything belongs in a database, but the things that do belong just don't work anywhere else.

--
Do not look directly into laser with remaining good eye.
[ Parent ]
Please elaborate (none / 0) (#72)
by expro on Tue Mar 14, 2006 at 04:41:26 AM EST

It could be 9 billion records, and pulling a single one out of a bunch doesn't seem to necessitate relational/SQL-style tables. Good file systems can do this without SQL, as does the web, etc., especially if you are planning to use something like EJB to get your caching anyway (not the only sane approach). SQL is one way of doing this, but SQL does not seem particularly well-suited in its structure to hold all the data of an application model. You have also substituted the word "database" for SQL, as though the terms were interchangeable. File systems are one other form of database. So is an index in a caching file-based BTree implementation, and neither one will encourage me to pull the sorts of tricks used to stuff garbage into relational tables, as is frequently seen since that is all developers think they have available. There are certainly other ideas possible.

If I asked why you use a locomotive to carry messages to Alaska, arguing that the distance requires some travel-related machinery does not seem to answer the question, nor does the argument that a bicycle would take too long.

SQL relational databases seem to often be out of place in the process of making this sort of thing work.



[ Parent ]
Well, if we're going to get nitpicky. (none / 0) (#76)
by NoMoreNicksLeft on Tue Mar 14, 2006 at 05:57:25 AM EST

Yes, I used "database" as if it meant "SQL". But even SQL is a pisspoor implementation of what a decent query language would be.

Once you start talking down to the bare wire stuff, algorithms and whatnot, it's probably a debate well out of my league.

Still, I'm not sure where you think you're going with this.

Suppose we have a bunch of customer data, and yet more data on each of the products they purchased. Do you slap it into two files, relating records from one to another? (And reinvent the wheel?)

Or do you store the purchase data in some XMLized bullshit underneath each customer's record? I wish I was more than just a lousy hack, but doesn't this incur some performance inefficiency when you go to add more purchase info to a customer record?

How does this speed up data inserts? How does this speed up anything? How does it make it simpler?

--
Do not look directly into laser with remaining good eye.
[ Parent ]

No. (none / 0) (#77)
by expro on Tue Mar 14, 2006 at 06:44:14 AM EST

Still, I'm not sure where you think you're going with this.

There are lots of places to go besides relational tables. I have experience with a number of them, but I was not particularly preferring any one of them, just questioning the value of stuffing all the data into relational tables.

Suppose we have a bunch of customer data, and yet more data on each of the products they purchased this. Do you slap it into two files, relating records from one to another? (And reinvent the wheel?)

No. I'd give the data particles independent hierarchical identity. A property can then trivially refer to any other by its identity. Then, as one approach, choose a file system that creates and handles files efficiently and store everything in many well-identified small files. This allows easy caching of name/value pairs of particles on an as-used basis, without preferring one application's object subset over another.

Or do you store the purchase data in some XMLized bullshit underneath each customer's record? I wish I was more than just a lousy hack, but doesn't this see some performance inefficiency when you go to add more purchase info to a customer record?

Why use "XML bullshit" or SQL bullshit when it is not really important to the task at hand. One simple direct approach is small property files for sibling properties, one to a line, of a good data hierarchy containing name-value pairs, which can be trivially appended. The leaves are in small files and the branches are directories. Indexes are easy enough to produce where needed.

How does this speed up data inserts? How does this speed up anything? How does it make it simpler?

It clearly speeds them up over all the SQL/relational nonsense. Appending a file is the most basic operation, and if done reasonably, it is practically corruption-proof, easy to make transactional, easy to diagnose, etc., and it seems much more natural to the data in question. This is one approach of many possibilities.

The question is why design such contorted mechanisms to use relational tables and SQL databases? You are making it difficult: SQL was never intended as an object store in the first place, so you have to do all the extra mapping.

You can also more easily address data requirements that can be hard to meet using relational tables, like (usually hierarchical) security, and being able to trivially log all changes and trivially see snapshots of, or roll back to, an arbitrary date/time (admittedly I don't know if you can get this in MySQL, but it is trivial using property files or other mechanisms).



[ Parent ]
Sorry, that doesn't make sense... (none / 1) (#78)
by mirleid on Tue Mar 14, 2006 at 07:11:38 AM EST

If the attribute values are expressed as name/value pairs in flat files, I would guess that you'd express the value of an attribute that references another class as a string pathname, right? If so, then it would be exceedingly difficult to maintain referential integrity when doing deletes, for it would imply a scan of the entire filesystem to determine what is referencing the instance being deleted.

If you really want a database that thinks hierarchically, then you can go the hierarchical database route. One of the problems with HDBMSs was that they did not allow cross-referencing of a single record by two separate trees, which led to problems where referential integrity and data consistency were concerned.

Additionally, another problem that you face with your filesystem approach is that it is quite difficult to implement queries with anything remotely resembling joins in them.

Chickens don't give milk
[ Parent ]
I was only questioning SQL (none / 0) (#80)
by expro on Tue Mar 14, 2006 at 07:44:23 AM EST

If the attribute values are expressed as name/value pairs in flat files, I would guess that you'd express the value of an attribute references another class as a string pathname, right? If so, then it would be exceedingly difficult to maintain referencial integrity when doing deletes, for it would imply a scan of the entire filesystem to determine who is referencing the instance being deleted.

If unwritten or deleted values are null, then the references are just referring to a nulled property. This does not seem like such a problem that it cannot be dealt with. In at least some cases, I would want the reference to persist, and nulling the reference does not seem to solve the issue any better than a reference to a nulled-out property. Do not recycle IDs if you don't want collisions, but if it is not a generated ID, there may be an advantage to being able to reconstitute it.

If you really want a database that thinks hierarchically, then you you can go the hierarchical database route.

I am not advocating hierarchical databases or property files in general, only questioning the RDB route for this type of data; and the lack of referential integrity you cite does not seem like a problem to me. Just don't recycle IDs if you don't want collisions.

One of the problems with HDBMSs was that they did not allow cross-referencing of a single record by two separate trees, which lead to problems where referencial integrity and data consistency was concerned.

You are worrying more about these dangling references than I would care to, in my experience. If they are going to dangle, then deal with it. Just use basic synchronization if you need to ensure unique IDs, etc. and dangling is probably as good as anything else you could expect to happen in a complex object system. Nulling references does not let the application understand the full story.

Additionally, another problem that you face with your filesystem approach is that it is quite difficult to implement queries with anything remotelly resembling joins in them.

Remind me again, why I need joins to persist and otherwise handle the objects of my system? I will just go out and get the data as I need it and cache it using a simple mechanism for caching. If a property value is the identity of another property in the hierarchy, I may go get other properties based upon that identity. Perhaps you can construct a more concrete example of how a direct hierarchical storage of properties fails. I do not consider lack of SQL queries a disadvantage at this point.



[ Parent ]
If you store it as an XMLType... (none / 0) (#79)
by mirleid on Tue Mar 14, 2006 at 07:13:13 AM EST

...column in Oracle, you can even use XPath expressions in your SQL when querying. Granted that this is an Oracle hack, and not really part of the SQL standard, but still...
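From memory, something along these lines (the table, column and XPath are invented, and extractValue is the Oracle function name as I recall it, so double-check against your Oracle version):

    PreparedStatement ps = connection.prepareStatement(
        "SELECT extractValue(c.profile, '/customer/email') " +
        "FROM customer c WHERE c.customer_id = ?");
    ps.setLong(1, 42L);
    ResultSet rs = ps.executeQuery();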

Chickens don't give milk
[ Parent ]
Object Database (none / 1) (#65)
by grant7 on Mon Mar 13, 2006 at 09:30:11 PM EST

why not use an object database like Zope's ZODB? or an XML database? far more efficient for development... then you can piggyback on a relational structure later and save yourself a *lot* of effort

I like Zope (none / 0) (#84)
by jolly st nick on Tue Mar 14, 2006 at 09:46:59 AM EST

I like the fact that it follows the principle of making easy things easy, and hard things possible. It's the (or at least a) right answer for a whole universe of applications. However, it's not the right answer for the universe of ALL applications. For example, if you have a distributed, transactional application with high scalability and concurrency requirements right from the get-go, it wouldn't be my first choice.

[ Parent ]
this story is embarrassing (2.25 / 4) (#74)
by th0m on Tue Mar 14, 2006 at 05:05:44 AM EST

well written, deserves FP, but just shows how convoluted and awful software has become. never mind: we'll look back on these things and laugh.

Well... (none / 0) (#75)
by mirleid on Tue Mar 14, 2006 at 05:30:27 AM EST

...there's a bit of 'artistic license' in there (I had to make it more or less entertaining, otherwise people would lose interest halfway through - it's kind of long), but all the problems described are genuine ones.

Having said that, I still think that ORM is fundamentally a good thing, but, as with most 'new' stuff, it still has some way to go in terms of addressing common usage issues...

Chickens don't give milk
[ Parent ]
Comments (1.50 / 2) (#83)
by jolly st nick on Tue Mar 14, 2006 at 09:40:46 AM EST

It makes your code simpler to understand (well, assuming that your domain model is worth a damn). Additionally, if you do it right, you write less of it. This means that maintainability of a system created using ORM should be significantly higher than one created using straight JDBC calls.

I am skeptical of this claim. Your own story is a good counterexample: ORM appears in your case to have swept a great deal of detail about the dynamic behavior of your application under the rug; unfortunately, since some of those details were elephant-sized, the result lacks neatness. Furthermore, having wrestled with the problem today is no guarantee against unpleasant surprises tomorrow. By contrast, coding direct to JDBC may not be such a bad thing. The main problem from a design standpoint is that it litters your code with implementation details, for example syntax that is supported by one platform or another. It also requires that programmers understand your data model (which in turn requires having a data model that makes sense, but that's a good thing). Perhaps DAOs generated by abstract factories are the sweet spot here. Since you have to code the CRUD operations somewhere, it makes sense to abstract the policies and implementations.

In any case, I think the lesson is that all the complexity of persisting objects has to go somewhere, and you still have to deal with the consequences of how you handle this. So in that sense, I'd say that ORM doesn't really make an application more maintainable. If it is true, as you say, that ORM-based projects are not cheaper than JDBC-based projects, I can't see how they could be claimed more maintainable. Personally, I think apps using ORM can be, but stirring another framework into the mix is never going to make your application more maintainable all by itself. That's still your job.

It gives you a number of technical options that would be quite complex to code from scratch. I am talking about stuff like lazy loading, predictable SQL generation order (important in deadlock avoidance for clustered applications) and query by example.

I agree completely with this.

It gives you database independence. Your application does not even know what the database is;

But you can get this with the DAOs generated by abstract factories.

A few other comments on points that you raise in your excellent article:

The problem with this is that the database's performance is highly dependent on how "good" the schema is

This raises two points in my mind. The first is that the essence of good design is clean separation. Any framework or component that creates a mess in some other part of your system has to be considered crap. Secondly, I'd say your statement is not precisely true. One of the goals of the relational model might be stated as consistently mediocre performance. The assumption is that we live in a world where programming time is, relative to machine time, very expensive. Therefore, allowing the programmer to write a query or update in a way that simply looks reasonable, and have the results come out good enough, is a very good trade-off. Therefore a "good" schema is primarily one that forces the fewest constraints on the programmer, either in the application at hand or any future application that might use this data. And indeed, optimizers these days are very sophisticated. They're a lot like modern automatic transmissions. A highly skilled driver who is paying attention can outperform them, but a typical driver under typical conditions is better off without the stick.

Of course, this utopia of physical data independence turns out to be nowhere near as perfect as we would wish. So there is a separate level of design, physical design, which allows the DBA to fine-tune the performance of the database. DBAs from the SQL Server world may not be able to do much, but with a product like, say, Oracle, the DBA can implement a lot of the performance strategies that a programmer would otherwise do, such as keeping sets of data that are frequently used together in the same database blocks, or prioritizing current data by placing it on faster devices, etc. Of course, he can't do any of this with a schema that makes no sense; at the very least it's much harder.

Eventually, you get the new metadata, you run your code, and the net result is that e-mail tone from the DBA goes from angry to downright shitty

One lesson you neglected to draw from this experience is this: befriend your DBA. The Golden Rule applies here. You wouldn't like it if you got called in on a weekend because somebody out of the blue dropped a pile of shit on your work, so don't do it to him.

I've been in this business now for over twenty years. New technologies are "hot", but mature technologies are what make the new ones usable. In the 1980s, RDBMS technology was extremely hot. People went to training and tried to put as many RDBMS buzzwords on their resumes as possible. Now RDBMS technology is mature, and a lot of people don't really pay much attention to it or understand it very well.

So, your DBA may not get the respect that a professional of his skill (presuming he is competent) deserves. But if he knows his business, you would do well to consult him earlier, rather than later. Ideally before you choose your ORM tools. He may bring perspectives that are foreign to you. For example, the reason he doesn't like cryptic or even meaningless tables is that from the DBA's perspective, data doesn't belong to an application, it belongs to the organization. So, while you may get your application out the door, you may be creating a big problem for the next project or feature that needs that data. So, if your DBA says your ORM generates a crappy schema, you should listen to him. The person who thanks you for this may be your future self.

On your comments... (none / 0) (#85)
by mirleid on Tue Mar 14, 2006 at 11:11:58 AM EST

On Maintainability

My claim on maintainability is based on the fact (supported by my experience, but your mileage may vary) that, on the one hand, you write less code, so, you have a smaller codebase to contend with when doing system maintenance; on the other hand, as you rightly wrote, you do not pollute your code with the ins and outs of writing JDBC code (platform dependencies, catching SQLException all over the shop, etc). Additionally, your codebase is written in terms of a domain model, which can be successfully discussed with the business owners, rather than on technical concerns such as DAOs and database tables. In my experience, the latter really helps with regards to making sure that everybody is talking about the same kind of fruit. I don't think that it is a bad thing if developers understand the data model, but, IMHO, more importantly, they should be made to understand the domain model, because, at the end of the day, DAO pattern or not, those java bean-like constructs are what they are expected to deal with.

Also, it should be noted that there's a bunch of queries that people tend to write repeatedly (like "find all objects of a given class", or "find this object given this primary key") that are supplied pretty much natively by ORM frameworks, which further reduces the amount of code that you have.
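For instance, with Hibernate those two stock queries are one-liners (the class name is invented for illustration):

    // "find this object given this primary key"
    Account account = (Account) session.get(Account.class, accountId);

    // "find all objects of a given class"
    List allAccounts = session.createQuery("from Account").list();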

But you can get this with the DAOs generated by abstract factories.

Yes, granted, but you still have to code those DAOs, adding to the application's codebase, whereas with an ORM framework, it is a given.

On schema independence

Please do not get me wrong: I am sure that it is possible (in fact, I am certain) that you can get a good schema and still use an ORM framework. My point (maybe I should have been clearer) was that it is not a given; the default schema that you get is at best not very good. There are several schema and query optimisation techniques that you can use (and I am keeping those for a follow-up story at some point in the future on ORM hints and tips) that go a long way towards making sure that your schema is as good as it can be. Just do not expect it to be as good as if you were using JDBC, and do not expect that the exercise is going to be painless or quick.

On DBAs

Completely agree.

On a final note, I am actually kind of surprised that nobody has picked up on the "unsupportable" thread of discussion. I decided not to go into the mitigations that you can put in place to address that so as not to make the piece even longer than it already is, but it would certainly make for some interesting discussions.



Chickens don't give milk
[ Parent ]
A lot depends on your application & mission (none / 0) (#86)
by jolly st nick on Tue Mar 14, 2006 at 12:59:26 PM EST

Yes, granted, but you still have to code those DAOs, adding to the application's codebase, whereas with an ORM framework, it is a given.

Naturally.

The question is, however, one of complexity. The two big bugaboos in maintenance are (1) where do I look for something? and (2) if I change this, what's going to break? To my mind, coding DAOs against JDBC does not add much, if anything, maintenance-wise. Queries of the "find all objects of a given class", or "find this object given this primary key" sort may be neatly handled by the ORM, but they're such trivial operations that you're more than compensated by not having to deal with another framework and any bugs/pitfalls that that entails. Where things get messy hand-coding the SQL is just where they get messy using an ORM mapper: when you start dealing with composition and loading collections. The ORM mapper DOES have the considerable advantage of having a pre-defined mechanism for handling this messiness.

Of course, it's not acceptable practice to enter parameters into code; parameters belong in files. However, the DAO approach doesn't presuppose that everything is hard coded. Furthermore, there's no reason to assume your DAO doesn't use an ORM framework underneath. This in turn can be configured through IOC. That is why issues with handling SQL exceptions don't, to my mind, signify. They should be handled in the part of the application that handles persistence, and if necessary converted to abstract exceptions that are meaningful to the controller.

The only place where push comes to shove is when you decide to do non-trivial queries: find me objects that meet certain (non-trivial) criteria. Trivial criteria include finding all, finding all matches by a single attribute, finding all matches that are equal to an instance of an object, and, possibly, finding by composite primary keys depending on the logic of the application. But beyond the point of trivial queries, your underlying persistence framework is bound to bleed through. You're going to have to choose a query language. While SQL has many limitations, it's very powerful, so it works for me.

There's probably not any "one true way" to do this. I should say that I probably work on very different kinds of applications than you do. We don't have scalability or transaction volume problems to speak of, but we have extremely complex query requirements. I get engineers with general kinds of app development experience, and they tend to go cross-eyed when they get a first look at some of the queries we do. Which, I suppose, one could argue means that the relational model is not maintainable for programmers, except that by the time they generate a procedural program to do the same thing, it's much larger, harder to understand, and often has subtle bugs that are hard to track down. Better for us to get a little practice using SQL. Studying a few of the finer points of relational theory that are common stumbling blocks is not a bad idea either (e.g. what "null" signifies and how it's handled in various contexts).

That said, I'm currently looking at iBatis; it seems to be about right for my kind of application.

I'll add one more thing. I gather from your post that for you, the object model is the linchpin design product, and everything else flows from this. I won't say it's wrong, but I will assert it's not the right choice for every situation. Partly this is a matter of my generation, but I tend to start with the data model first, and build things around that. I'm not sure that many people these days are even aware that this approach can be used and works quite well. It leads to particular challenges and problems of course, but so does every strategy.

Again it's not the only way to do this, nor is it necessarily right for the applications you do. But it does seem to be right for the kind of applications I do, which tend to be heavy on data analysis. For my applications, the database is a key, long term asset, data that goes in is expected to take part in many data products, most of which probably haven't been thought of yet.

[ Parent ]

Logical model as linchpin design product (none / 0) (#87)
by Ufx on Tue Mar 14, 2006 at 02:25:39 PM EST

I'll add one more thing. I gather from your post that for you, the object model is the linchpin design product, and everything else flows from this. I won't say it's wrong, but I will assert it's not the right choice for every situation. Partly this is a matter of my generation, but I tend to start with the data model first, and build things around that. I'm not sure that many people these days are even aware that this approach can be used and works quite well. It leads to particular challenges and problems of course, but so does every strategy.
I don't think that either the object model or the data model should be the linchpin design product. IMHO, the logical model is the most important piece of the puzzle. It is from this model that both the data and object models ought to be derived.

The most popular development environments today are supersets of the relational model. OOP allows the encoding of relational constructs as well as other abstractions like inheritance trees and behavioral functionality. For this reason the object model will usually resemble the logical model more than the data model does (for various definitions of resemble ;)). The problem I see is that the average developer encodes too much application-specific knowledge into the object model, making it less functional or downright dysfunctional for other applications. Thus we end up with shared data but not much shared functionality.

[ Parent ]
Well this can go pretty far afield I guess (none / 0) (#90)
by jolly st nick on Tue Mar 14, 2006 at 03:53:57 PM EST

I don't think that either the object model or the data model should be the linchpin design product. IMHO, the logical model is the most important piece of the puzzle.

...

The most popular development environments today are supersets of the relational model.

This reminds me of a brain science prof I had who had a picture in his office of a figure from, I think, a Buddhist temple. It looked like a guy wearing a sombrero. When I asked about it, he said the sombrero represented the ten thousand parts of your mind, all of which need to be enlightened if you are to reach enlightenment.

There are many design products, each of which at some stage in the process becomes critical. Screwing up on any one of them spells disaster at some point in the game. The data model may not be the most important thing for getting the project out the door. In fact, critical problems in the data model may not emerge until the project is successfully delivered and you, hopefully, are long gone. But for many applications it is the one piece that you will live with longest. Data often outlives requirements.

[ Parent ]

I've been out on the piss tonight... (none / 0) (#91)
by mirleid on Tue Mar 14, 2006 at 05:02:32 PM EST

...hope it is OK with you if I reply tomorrow (that is, if you actually insist that the reply makes sense)...

Chickens don't give milk
[ Parent ]
Non-trivial queries... (none / 0) (#99)
by mirleid on Wed Mar 15, 2006 at 05:44:42 AM EST

Apologies for the on-the-piss post. It was one of those nights.

Anyway, I completely agree with your statement on non-trivial queries. The thing is that there are techniques to overcome this. Assume that you have a monster, two-sides-of-A4 query that you need to run for generating a report. This query is a nightmare to do in anything other than SQL. So, most of the time, you do just that (or you shove it into a stored proc and call that so that you don't have to spend app time preparing the query and so forth). With an ORM framework, you have another option, which is to create a view based on the query, and create a read-only object mapped onto the view. This way, you have access to the report data without running anything other than a trivial query (including possible report refinement criteria such as from date x to date y), and you are leveraging the database's strengths while still maintaining the OO paradigm at the application level. And like this one, there are a number of other things that you can do...
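A hedged sketch of the trick (all names invented): the monster SQL lives in a database view, a read-only class is mapped onto it, and the application-side query stays trivial:

    // AccountSummary is mapped onto the ACCOUNT_SUMMARY_V view with
    // mutable="false", so the framework never tries to write it back;
    // fromDate and toDate are the report refinement criteria.
    List rows = session
        .createQuery("from AccountSummary s where s.reportDate between :from and :to")
        .setDate("from", fromDate)
        .setDate("to", toDate)
        .list();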

I wouldn't say that the object model is the linchpin product, but it is certainly central to the design tasks. The client that I am currently working for uses RUP, so, as would be expected from any Use Case-driven design methodology, the analysis (and design) object models play a central part in that activity. Having said that, I am sure that there's a happy medium that can be struck between both approaches.

Chickens don't give milk
[ Parent ]
Core differences are the problem (none / 1) (#101)
by bslade on Wed Mar 15, 2006 at 12:51:51 PM EST

[rant] By definition objects are actions (er, methods) associated with a piece of data.   Relating data together just doesn't fit in that definition.

You can see how OOD falls apart as soon as you try and add object1+object2.  Which object does the "plus" belong to?  Probably it's declared as belonging to one of the two objects, but it really belongs to the parent class of both objects.

This basic incompatibility extends into relating any kinds of data together. The logic-only relation operators (which feel sort of procedural) need to be defined separately from the objects.

Building objects to implement the data relationships is not OOD, because the key aspect of the object is the procedural logic involved; there's no data the object holds.

Saying it's an impedance mismatch is like saying the extension cord is a problem for an electric car.   It's not a fine tuning issue, it's a conceptual difference.

Ben in DC
PublicMailbox at benslade.com (put 030516 anywhere in the subj to get thru)
"It's the mark of an educated mind to be moved by statistics" Oscar Wilde

Perhaps your perception of OOD is the problem (none / 0) (#102)
by Ufx on Wed Mar 15, 2006 at 01:56:35 PM EST

You can see how OOD falls apart as soon as you try and add object1+object2. Which object does the "plus" belong to? Probably it's declared as belonging to one of the two objects, but it really belongs to the parent class of both objects.
I think you're a bit misguided here. The lack of a feature in some OO languages does not cause OO designs to fall apart. What you're looking for is something like double dispatch or multimethods, features which are indeed supported by some OO languages. The answer to the question "which object does a binary operator belong to" is not the parent object, but both acting objects. Multimethod behavior can be simulated when this feature does not exist in your language of choice, and the lack of hard support definitely makes implementations less elegant, but it hardly undermines designs at a core level.
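A minimal Java sketch of the simulation (all names invented): object1.plus(object2) is resolved against the runtime types of both operands, so the operation is not owned by either object alone.

    abstract class Amount {
        abstract Amount plus(Amount other);   // first dispatch, on 'this'
        abstract Amount addTo(Cash cash);     // second dispatch, on 'other'
        abstract Amount addTo(Credit credit);
    }

    class Cash extends Amount {
        final long cents;
        Cash(long cents) { this.cents = cents; }
        Amount plus(Amount other)   { return other.addTo(this); }
        Amount addTo(Cash cash)     { return new Cash(cents + cash.cents); }
        Amount addTo(Credit credit) { return new Cash(cents - credit.cents); }
    }

    class Credit extends Amount {
        final long cents;
        Credit(long cents) { this.cents = cents; }
        Amount plus(Amount other)   { return other.addTo(this); }
        Amount addTo(Cash cash)     { return new Cash(cash.cents - cents); }
        Amount addTo(Credit credit) { return new Credit(cents + credit.cents); }
    }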
This basic incompatibility extends into relating any kinds of data together. The logic only relation operators (feels sort of procedural) need to be defined separately from the objects.

Building objects to implement the data relationships is not OOD because the key aspect of the object is the procedural logic involved, there's no data the object holds.
If you're building objects that handle relationships between other objects then I would say you are undoubtedly still within the realm of OOD. Just because a particular object is all methods and no data does not mean you've circumvented OOD, because it most likely still acts upon other objects which encapsulate both data and operations.

A dynamic set of raw data can definitely be represented as an object. Microsoft's DataSet implementation does just that. There's a lot of power to be gained by representing data as objects: They can expose events to be fired whenever the data changes, enforce relationship constraints, provide sorting and filtering hooks, merge like data by intelligently examining changesets and primary keys, and even perform set-based operations such as joins and unions.

[ Parent ]
Programming (none / 0) (#106)
by dogeye on Thu Mar 16, 2006 at 04:39:28 PM EST

This article makes me sad. I used to be a solid programmer. I have a bachelor's and a master's in computer science. I quit programming 5 years ago for another career, and this kind of article makes me feel like I'd have to do years more studying to reenter the world of programming.

Not really. (none / 0) (#109)
by porkchop_d_clown on Fri Mar 17, 2006 at 09:34:39 AM EST

It depends on where/what you want to do.

Clearly, my Java 1.2 certification is worthless at this point - but at the same time I haven't done anything professionally in any language except straight C in years.

I'm not saying OO doesn't have its place - yesterday I was sorely wishing for a Dictionary class to replace the unsorted linear array of strings some idiot put into a program I was maintaining.


People who think "clown" is an insult have never met any.
[ Parent ]

That's why we wrote Mr. Persister (none / 0) (#108)
by jjenkov on Fri Mar 17, 2006 at 07:59:46 AM EST

Many of the problems mentioned with the common ORMs are the reason why we developed our own ORM, Mr. Persister. Essentially it just makes the most common JDBC tasks a lot easier, but leaves you with all the power and flexibility of JDBC / SQL. I don't need lazy loading. I don't need to automatically read total object graphs. What I do need is not to have to write all the boilerplate ResultSet --> Object copying code, the opening and closing of connections, statements and result sets, and the commit/rollback of transactions. Mr. Persister does that for us now. The SQL and table design is up to you. Any SQL query can be read into objects. iBatis is another similar solution that doesn't pose restrictions on the models or query language, but does most of the boilerplate object reading and connection handling that is the main reason (I think) that developers look to ORMs.
Jakob Jenkov
Relax people (none / 0) (#110)
by Roman on Sat Mar 18, 2006 at 11:20:31 AM EST

nothing beats a simple, well-designed approach (a compact sketch of the factory follows the list):
  • package:
    com.somefirm.someproject.domain
    classes:
    Account (a business specific data structure that possibly mirrors a db table called account)
    Client (a business specific data structure that possibly mirrors a db table called client)
     
  • package:
    com.somefirm.someproject.dao
    classes:
    DAOFactory (abstract getter methods for each specific DAO)

    (assume Oracle database)

    ORADAOFactory (getter methods of actual specific DAOs)

    AccountDAO (Account db API)
    ClientDAO (Client db API)

    ORAAccountDAO (Account db implementation)
    ORAClientDAO (Client db implementation)
     
  • DAOFactory defines getAccountDAO(), getClientDAO()
    DAOFactory also defines: getORADAOFactory()
     
  • ORADAOFactory implements getAccountDAO() and getClientDAO()

    Example:
    public AccountDAO getAccountDAO() {
        return new ORAAccountDAO(this);
    }

     
  • ORAAccountDAO and ORAClientDAO implement a constructor that takes ORADAOFactory as a parameter. This parameter is kept in the instance of the specific DAO and allows the DAO to get connection references (there can be different types of connections for different schemas; this is business-specific).
     
  • ORADAOFactory implements ORADAOFactory(Connection con) constructor, so that a unit test can instantiate this factory outside of an application server.

    ORADAOFactory also implements ORADAOFactory() constructor, which initializes a connection in a different way, for example gets it from the JNDI.
     
  • If there is a need for a transaction, it can be done either with a Session EJB or with just a user transaction if there is no application server.

    A transactional method starts by instantiating a needed DAOFactory. Oracle example:

    DAOFactory daoFactory = DAOFactory.getORADAOFactory();

    Now to run any specific DAO method, just do this: daoFactory.getAccountDAO().insertAccount(account); or, for example, daoFactory.getClientDAO().updateClient(client);

    Whether it is a Session EJB that handles transactions or user transaction code is irrelevant.
     
  • A unit test would do this:

    1. use some property manager to load db properties from some file.
    2. use a db connection manager to load db sources.
    3. on setup of a unit test use the connection manager to create a new connection (or retrieve one from a pool,) it is possible now to just set the needed connections as private members of the unit test.
    4. run the unit tests, rely on the private connections and ORADAOFactory(Connection con) constructor to setup the connections.
    5. run the DAOs needed for this unit test.
    6. on teardown just roll back the connection(s). This allows you to run the same set of unit tests over and over again, especially when complex setup procedures are used by the tests, such as deleting all data from a table, setting up mock data, etc.
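
A compact sketch of the factory piece described above (the class names follow the list; the connection handling is only indicative, and the DAO interfaces are assumed to exist as described):

    import java.sql.Connection;

    abstract class DAOFactory {
        public abstract AccountDAO getAccountDAO();
        public abstract ClientDAO getClientDAO();

        public static DAOFactory getORADAOFactory() {
            return new ORADAOFactory();               // connection comes from JNDI inside
        }
    }

    class ORADAOFactory extends DAOFactory {
        private final Connection connection;

        ORADAOFactory()               { this.connection = lookupFromJndi(); } // container case
        ORADAOFactory(Connection con) { this.connection = con; }              // unit tests

        public AccountDAO getAccountDAO() { return new ORAAccountDAO(this); }
        public ClientDAO getClientDAO()   { return new ORAClientDAO(this); }

        Connection getConnection() { return connection; }

        private static Connection lookupFromJndi() { /* JNDI lookup elided */ return null; }
    }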


Oracle licenses (none / 0) (#112)
by cerberusss on Mon Mar 20, 2006 at 06:38:46 AM EST

using an open source database for development would save you all that Oracle development licenses cash

AFAIK Oracle licenses are free for development purposes. Just a sidenote.

Depends on your site licensing terms... (none / 0) (#118)
by mirleid on Tue Mar 21, 2006 at 09:49:28 AM EST

They're normally presented as being free, but if you use Oracle in a large project, for a large site, the price of the site license will reflect how many developers you are expected to have. That premium could be kept and used for other stuff...

Chickens don't give milk
[ Parent ]
Any ORM users try an OODB instead of an RDBMS? (none / 0) (#117)
by nullchar on Tue Mar 21, 2006 at 12:59:34 AM EST

Anyone out there tried an object oriented database for "mapping" objects to "relations"?  You might not need a mapping or a relation.  The DB would simply store your objects.

A relational database implicitly stores relations between data -- not relationships between objects.

That's why the ORM "solutions" baffle me so much.  It seems that your OO code can use objects to do object things, and a RDBMS to do data things.

Assuming you have a sane schema, how hard is it to just:  select attribute(s) from table(s) where some condition is met  ?  Then your objects just use that data as necessary.


Simple answer (none / 0) (#120)
by ttfkam on Fri Mar 24, 2006 at 10:55:49 AM EST

Assuming you have a sane schema, how hard is it to just: select attribute(s) from table(s) where some condition is met? Then your objects just use that data as necessary.
In this simple case, not hard at all -- which is why we have the ORM library do it instead of writing yet another boilerplate layer. Grabbing an object from Hibernate and others is just that: a select statement with the properties of an object or set of objects assigned a table's values.

They even support lazy loading, fetching designated properties on demand instead of with the rest of the object, which is useful for blobs/clobs.


If I'm made in God's image then God needs to lay off the corn chips and onion dip. Get some exercise, God! - Tatarigami
[ Parent ]

My Object Repository Framework.. (none / 0) (#119)
by benow on Thu Mar 23, 2006 at 08:11:09 PM EST

I've played with this problem a bit, and have come to the oft-reached conclusion that less is more. I've built up a set of interfaces for persistence... ObjectRepository, ObjectRepositoryConnection, PersistentObject, etc.

I've built several implementations, the most complex and slow of which is a castor jdo wrapper which uses xml object descriptors and reflection population. A modification to castor allows for population of private and protected fields.

The implementation I use most often is a jdbc implementation, which is nice and light... when objects are persisted (myObject.update(tx)) it passes the object to a registered handler and the handler implements the basic handling (add, remove, update). In these handlers I define common queries, centralizing the db code. Queries contain the expected result class, and the handlers are called to populate from the result. I do member loading through populate methods... query.createContext().getObject(tx).populateAuthor(tx). A blanket Populatable interface can be implemented and dependent objects brought in through calls to populateX from within the populate method. All data objects are implementations of interfaces (AuthorImpl, TopicImpl, etc) and extend PersistentObjectImpl, which takes care of key handling and persistence tie-in. All this works well... faster than reflection, but still too slow when accessing 1000s of objects. In such cases (complex pages, etc) I grab the connection directly from the BasicObjectRepository and work with typical (fast) jdbc statement/row/col.

My next step is to make a light Castor replacement: cached, reflection-based object population guided by metadata. Reflection caching can really speed things up. Ideally, moving towards round-tripping would be nice: change an object and have the change show up in the metadata... though human intervention is sometimes nice. Auto-population of fields on access would be very nice, but I fear it may add too much bloat unless going the bytecode markup route.

If you want the best of the best, are only using objects (with no SQL-level inter-app data sharing) and have some cash to spare, go with a real OODB, like Poet or Objectivity. They can be horrifically expensive, though.

Yes ... and no (none / 0) (#121)
by fupeg on Thu Mar 30, 2006 at 06:19:41 PM EST

The article hints at the bigger problems, such as:
  • Many developers simultaneously working on object/domain models and database schemas
  • DBAs
  • "people managing the servers"
  • Operations Support
In other words, lots of cooks in the kitchen and no head chef. You need one person to define your models and schemas. That person probably has a grandiose title like "architect". Is said architect a bottleneck? Absolutely. That's why he/she needs to be a superstar. This person should work with the other folks (DBAs, sys admins, support) and had better already understand the implications of their design BEFORE doing so.

If you have a complex system (and given the presence of DBAs, sys admins, and support, this system had better be complex or people need to get canned) and have lots of developers all doing their own thing, using a technology they don't fully understand, then of course you are going to have problems. That's not a product of the technology, that's a product of poor organization and lack of leadership.

Personally, I have worked on several large, complex projects using Hibernate for ORM. In some cases, there were existing JDBC-based apps being replaced. The Hibernate-based app was definitely quicker to develop, easier to maintain, and did not change the way that tech support did things. We were definitely able to use less-experienced (including offshore) developers by putting all the data access code (including the Hibernate code) behind a facade and just having the junior devs work against the facade's interface. Of course the data access code could have been JDBC (and in a few special cases it actually was); that made no difference to the junior devs. Using Hibernate made it much more possible for the architect to manage all this responsibility and code, since much of it could be generated and then tweaked.

As for performance, it's always going to be a little slower, maybe a lot slower. Of course just using Java means that a lot of things are going to always be a little bit slower, maybe a lot slower. At the end of the day, hardware is a lot cheaper than developers.

Once you know ORMs, most of these problems go away (none / 0) (#123)
by Wolf Keeper on Tue Jul 10, 2007 at 11:49:18 AM EST

I'm familiar with Hibernate.  I can't speak for the others.  The learning curve is steep, but once you've got it you can build applications very fast.
1. "Derive your database schema from the domain model."
Hibernate is flexible enough to work with an existing schema.  You can write your Hibernate objects to work with your database, and not vice versa.  You lose the ability to use polymorphism in some of your database objects, but ORM remains very handy.
2. "The first problem that you face is documentation."
The Hibernate website is a font of information, and the Hibernate In Action book remains the only IT tome I have read cover to cover.  It is outstanding.
3. "you typically have more than one person creating mappings."
Hibernate handles that fine.  The XML files governing Hibernate behavior are human-readable, and can be team developed just like any other Java code.
4. "you realise that that happens because the ORM framework will not, by default, lazy load relationships"  
The importance of lazy-loading relationships is well documented in Hibernate, and it has been the default behavior for a few years now.
5. "This time, though, in order to make a call on whether a relationship should be lazily loaded, you need to trawl through all the use cases, involve a bunch of people, and come up with the most likely usage and access scenarios for each of the classes in your domain model."
You have to do this whether you use an ORM or just JDBC.  Your JDBC code can just as easily hit the database too often.  Either way, the research should be done before the application is built and not after.
6. " The problem is that reflection-based ORM frameworks figure out what needs to be flushed to the database (as in, what you created or updated, and what SQL needs to be generated) by comparing a reference copy that they keep against the instance that you modified. As such, and at the best of times, you are looking at having twice as many instances of a class in memory as you think you should have."
I believe Hibernate simply tracks whether an object has been changed and does not keep a reference copy. Regardless, there are well-documented guidelines for evicting objects you don't need from memory to cap memory use. And RAM is cheap.
7. "At a high level, the only way that you can get around this is to actually figure out which classes are read-only from the application standpoint."
Again, whether you use ORM or SQL and JDBC, identifying read-only classes is part of application development.  Setting up read-only caches of objects that don't change is easy.
8. "Surrogate keys"
I have to disagree that surrogate keys are a drawback.  Put a unique constraint on the columns you would have preferred as a primary key (i.e. what would have been the "intelligent keys").  Then you can do joins, updates, deletes, etc... using the intuitive key columns, and the application can chug right along with surrogate keys.

It's also worth mentioning that Hibernate also has easy tools for using straight SQL and for using prepared statements and languages like Transact SQL, PL/SQL, or PL/pgSQL.

I can't say ORM is the best solution for mapping between object oriented languages and databases. But for big applications, it's much easier than rolling your own JDBC code for all of the database work.  Someone skilled in Hibernate with input into your project planning could have made life tremendously easier.
