CCK & Views, My First Experiences

and probably my last one ..

I mentioned a couple of weeks ago that I was looking into a way of converting my static page with published papers, articles and presentations in a more dynamic page where I could create RSS feeds from the updates and feed them into another site for everybody to use.

Some people suggested that I'd have a look at CCK & Views for this. While up till now when creating an app I had usually written my own Drupal module with my own database schema and manually written SQL Queries. This indeed looked like the perfect opportunity to dig into the CCK and Views thingie.

Now I must admit that I`m not really fond of "generated queries"

I've had nightmares before when having to debug 2Mb Hibernate queries that could be reduced to a single select value from table where query but Hibernate used a zillion join on tables I didn't need.
So I was a bit afraid of what Views would do ..

So here's what happened, I created 2 pages .. presentations. and publications. Personally I'd go for 1 table each with some smart indexes and then the only thing you want is SELECT * FROM table ORDER BY date

Now if I'd create a special content type for this I probably get some more benefits.
And I could get the data from the tables with a rather simple query like this :

  1. SELECT node.nid,
  2. content_type_presentation.field_datum_value,
  3. content_type_presentation.field_location_value,
  4. content_type_presentation.field_country_value ,
  5. content_type_presentation.field_event_link_title,
  6. content_type_presentation.field_event_link_url
  7. FRON content_type_presentation, node
  8. WHERE node.type="presentation" AND node.nid = content_type_presentation.nid
  9. ORDER BY content_type_presentation.field_datum_value desc;

However when using Views , this is what is being created.

  1. SELECT DISTINCT(node.nid) AS nid,
  2. node_data_field_country.field_datum_value AS node_data_field_country_field_datum_value,
  3. node_data_field_country.nid AS node_data_field_country_nid,
  4. node.type AS node_type,
  5. node_data_field_country.field_event_link_url AS node_data_field_country_field_event_link_url,
  6. node_data_field_country.field_event_link_title AS node_data_field_country_field_event_link_title,
  7. node_data_field_country.field_event_link_attributes AS node_data_field_country_field_event_link_attributes,
  8. node_data_field_country.field_location_value AS node_data_field_country_field_location_value,
  9. node_data_field_country.field_country_value AS node_data_field_country_field_country_value,
  10. node_data_field_link.field_link_url AS node_data_field_link_field_link_url,
  11. node_data_field_link.field_link_title AS node_data_field_link_field_link_title,
  12. node_data_field_link.field_link_attributes AS node_data_field_link_field_link_attributes,
  13. node_data_field_link.nid AS node_data_field_link_nid
  14. FROM node node
  15. LEFT JOIN content_type_presentation node_data_field_country ON node.vid = node_data_field_country.vid
  16. LEFT JOIN content_field_link node_data_field_link ON node.vid = node_data_field_link.vid
  17. WHERE UPPER(node_data_field_country.field_country_value) != UPPER('')
  18. ORDER BY node_data_field_country_field_datum_value DESC

So there's some joins, a zillion more fields that I don't really need being fetched from my database.
The explain also tells me some interesting things.. like the Using temporary AND Using Filesort

  1. | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
  2. +----+-------------+-------------------------+--------+---------------+---------+---------+-------------------------------------------+------+----------------------------------------------+
  3. | 1 | SIMPLE | node_data_field_country | ALL | PRIMARY | NULL | NULL | NULL | 33 | Using where; Using temporary; Using filesort |
  4. | 1 | SIMPLE | node | eq_ref |vid | vid | 4 |buytaert_blog.node_data_field_country.vid | 1 | |
  5. | 1 | SIMPLE | node_data_field_link | eq_ref | PRIMARY | PRIMARY | 4 | buytaert_blog.node.vid | 1 | |
  6. +----+-------------+-------------------------+--------+---------------+---------+---------+-------------------------------------------+------+----------------------------------------------+

Now my simpler query uses the same amount of rows to return the result but doesn't build a temporary table.

  1. +----+-------------+---------------------------+------+-----------------------+------+---------+---------------------------------------------+------+----------------+
  2. | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
  3. +----+-------------+---------------------------+------+-----------------------+------+---------+---------------------------------------------+------+----------------+
  4. | 1 | SIMPLE | content_type_presentation | ALL | nid | NULL | NULL | NULL | 33 | Using filesort |
  5. | 1 | SIMPLE | node | ref | PRIMARY,node_type,nid | nid | 4 | buytaert_blog.content_type_presentation.nid | 1 | Using where |
  6. +----+-------------+---------------------------+------+-----------------------+------+---------+---------------------------------------------+------+----------------+

In this small example there isn't much difference in performance to be noticed, but with bigger tables you will see other results. However I still prefer my personal query approach.

The other thing that bugs me is the excessive use of aliases. The MySQL internal experts can shed a light on the performance impact of using them, but if you are running queries remotely you at least have an increase in bandwidth usage. Not a problem now, but who knows later when you are big and famous :)

Now Drupal seems to do it pretty ok .. it's aliasing table.field to table_field which still makes sense. Not unlike some other tools that alias table.field to dynamically_generated_unreproducable_unreadable_key_word_with_no_relevant_meaning , try debugging that for a change. And yes I see valuable use for resultset aliasing, but I also se it is often overused.

But the short summary ... given my background I prefer writing the module and crafting my own queries over having Views create
a query for me.

Comments

Simon Hobbs's picture

#1 Simon Hobbs : You should be comparing views

You should be comparing views with other query builders, not the infinitely complex coder's mind.


Pasqualle's picture

#2 Pasqualle : Views module

Come on, don't be stupid, do not even try using Drupal without views. I am sorry, it is only your fault, that the generated query does not look like as you expected..

First thing everyone should do when starting a new site is to enable the views module. No excuses..


gaele's picture

#3 gaele : @greggles: don't forget 3)

@greggles: don't forget

3) Maintenance.

Who's going to maintain the custom queries after the developer has left? Who's going to do the upgrade to a next Drupal version? You did fully document your own code, didn't you?


Larry Garfield's picture

#4 Larry Garfield : Three reasons for CCK+Views

While hand-crafted anything will generally get you better performance than a generic solution, I will almost always favor CCK+Views as a solution, even if I have to do more work to set it up initially with a custom plugin to one of them. There are three main reasons for that:

1) Non-programmers can manipulate and change it. More than half of my company is non-programmers. Having 90% of the company rather than 40% able to build new data types and queries against them instead of 40% is an absolute godsend. That occasionally a query is a bit less performant is well worth it for me as a programmer to be doing 80% less grunt work.

2) Because of the way CCK works with multiple value fields and shared fields, it is possible for the database structure to change out from under me. If I write my own queries, those queries will then die a horrible death. CCK, however, will transparently update the info it feeds to Views and Views will therefore automagically work with the new table structure.

3) Perhaps the least appreciated part of Views, especially in Views 2, is its many-layered theming. The way different plugin configurations can let you dip into the query results at any level to override things locally or globally, or swap out entire rendering engines, is simply phenominal. Again, it goes back to non-programmers being able to do work that you say you don't need Views to do because you're a programmer. That's well and good, but if my theming team can do 7/10 of the work building a site without having to talk to me about code, that means that I as a programmer have done my job properly. It also means I can focus on the cool stuff, rather than just writing simple queries all day.

You can have my CCK+Views when you pry them from my cold dead hands. :-)


Bram's picture

#5 Bram : Re that "hybernate" book

Re that "hybernate" book cover..Let me put it gently: garbage in, garbage out. If you use hibernate like a retard, then yes, you'll end up with monsters of queries fetching all sorts of crap from your database that no sane person would ever need. If, however, you use your brain and tell hibernate to fetch only what you need, then it will gladly comply. Of course, this functionality is somewhat poorly document and actually requires more than a 5 minute google search followed by a quick copy/paste.

Here's a gentle hint:
"SELECT new foo.bar.MyClass(foo.foo, bar.bar FROM foo LEFT JOIN bar.property bar.WHERE blablabla)".
All you have to do is be willing to think about what exactly you need from which model and fetch it. Only a retard fetches the whole frigging database for no reason whatsoever.

Of course, it all starts with your hibernate model..if you screw that up with billions of eager-loaded collections and joins, then you deserve whatever crappy queries you get.

As for aliases...I think they're supposed to improve query performance on mysql, although do correct me if I'm wrong, brain's still a bit frozen after the long walk to work :-)


Mikkel Høgh's picture

#6 Mikkel Høgh : Hardly a fair comparison

As Earl says, this is hardly a fair comparison. I have seen a lot of Views-generated queries in my time, and none of them joined more tables or pulled in more fields than it had to be able to fulfill the requirements of the Views configuration.

You may prefer that 2-5% extra performance you gain by carefully hand-crafting every bit of SQL, but I'm just grateful that we have talented people like Earl that created powerful tools for us to use.


dalin's picture

#7 dalin : I have to agree with both

I have to agree with both Earl and Greg on this one. This post might be classified as FUD.


Caleb G's picture

#8 Caleb G : Worthwhile post

Thanks for this article - despite any imperfections it provides information that isn't going be obvious to everyone, and raises issues which are very interesting.

Personally, I too prefer to write my own queries/modules/content types whenever possible, but at the same time I value having Views and CCK available for any number of reasons (mostly having to do with *massive* time savings and/or the slew of other modules which hook/are-hooked by them).


Scott Reynolds's picture

#9 Scott Reynolds : I would be interested to see

I would be interested to see if you remove the DISTINCT flag from the your view if it prevents the temporary table. (And any further analysis you care to provide).

With most views I create I start with how I would write the query, compare it to how views wrote the query, and see what I can improve it (or in some cases why my first query was wrong) and why Views wrote it the way it did. I find that I learn a lot about how Views works and better ways to write my own queries.


greggles's picture

#10 greggles : caching and value of time

There are at least two good reasons to use views+cck.

1) If you are doing any caching (including the Drupal page caching) then the results of the query will be cached so it doesn't matter how bad the performance is as long as it can be done at least once.

2) There are two major sources of costs on a project: consulting time and server resources. Consulting time is usually 90% of the cost. Server resources something less than 10%, and other more minor things even less. I value my time quite highly (as do my clients) so if I can save a few hours using Views/CCK then it justifies any necessary spending on additional server resources...


merlinofchaos's picture

#11 merlinofchaos : Hmm.

While I agree with you that generated queries are by default less efficient than hand build queries, I find your example actually disingenuous and overemphasizing it.

Let's start with this:

FRON content_type_presentation, node

This is equivalent to doing a INNER JOIN, so you've used an abbreviation to make your query seem a little simpler than it is.

Also, the fact that the data is stored in separate tables, and not all together in one table is both not Views' fault, and in fact, it's YOUR fault because CCK only stores the data that way if you do something that requires a single field to be shared across node types, or if it has the 'multiple' option checked, since you can't have multiple values in a single table without doing things kind of strangely.

Finally, any field being fetched is needed by something you added to your view, for whatever reason. Saying you don't really need it may be quite wrong. Perhaps for reasons you don't understand, but not understanding something isn't a reason for it not to happen.

I will say that you have more fields in the generated query than you have in your query and I suspect it's because you have more fields in the view than you talk about.

node_data_field_link.field_link_url AS node_data_field_link_field_link_url,
node_data_field_link.field_link_title AS node_data_field_link_field_link_title,
node_data_field_link.field_link_attributes AS node_data_field_link_field_link_attributes,
node_data_field_link.nid AS node_data_field_link_nid

That table, in particular, isn't something you mention; but it's only in there because you added something to the view that wants that data.

Finally, using aliases is important for a variety of reasons, not the first of which is that Drupal's prefixing makes it important to refer to the aliases, but also because there are so many opportunities for collisions that automatic aliasing makes things work a lot smoother. Perhaps there are some issues with aliasing but enh. They're worth it.

You're perfectly welcome to build your own stuff without CCK or Views, but I feel your public post about it rather unfairly treats what it generates, especially when you blame Views for things that it doesn't deserve being blamed for.

Really, if you actually expected an open source freely available query generator to write something as efficiently as you would've written it without having to worry about infinite combinations, but only doing what you want it to do, you started from an incredibly foolish starting point and you pretty much set up the results. The fact that you published them publicly is just sad.