Book: "Pentaho Kettle Solutions", Matt Casters, Roland Bouman, & Jos van Dongen, Wiley 2010 Book: "Pentaho Solutions", Roland Bouman & Jos van Dongen, Wiley 2009

Thursday, March 29, 2007

Window WTF: This Device Driver Software is Not Digital

I'm currently enjoying a 10Mbps glassfibre internet connection from the Dutch Internet Service Provider XMS.

They have a contract with the company from which we rent our house in Leiden. Basically, the deal is that we can try this internet service free of charge (! YEP) for 6 months. After that we can decide to subscribe to a package deal that includes television, radio, phone and internet and costs 50 euro/month (which is way cheaper than the sum of all the separate services we are paying for now BTW).

The only downside from switching is that I have had to move my wireless router near the modem, and that I had to go out and buy some hardware for the family desktop to enable wireless internet. I settled for a Sweex USB wireless adapter, because the distances are so small that I don't need to worry much about the USB stick not having an antenna, and I figured an USB stick is more flexible.

To my surprise, the Windows XP desktop desperately wanted me to install the drivers myself. So, I'm not scared of that...or only a little....Anyway, installing the drivers led to new complaints as you can see:

niet-digitaal2

See, that's why I had to install the drivers manually. The encircled text reads "This device driver software is not digital". So, that's why I had to install htem myself. You see, somehow, windows does make sense after all ;)

Tuesday, March 13, 2007

So, are Database Stored Procedures Good or Bad?

A little less than a week ago, I opened a zoomerang survey about database stored procedures. In this post, I'm presenting the results.

In just a few days, the survey was visited 232 times, and no less than 155 people completed the survey! (There was a very small (2) number of people that answered some, but not all the items, and these are excluded from the results proper.)

I didn't really know what to expect, but I did not expect as much as this! So, I'm quite pleased, and I want to thank everybody that took the time to complete the survey. Thank You very, very much, I appreciate your efforts a lot!

Before we Begin


Before I present the results, I want to explain a few things about the survey.

First of all, I want to stress that the results I am about to present are not to be generalized. There is no way to discover to what extent the responses are representative for a larger group.

To put things in perspective, one should realize that I actively solicited response by posting announcements on my own blog, the O'Reilly Database blog and the MySQL stored procedure forum. In addition, I posted to the PlanetMySQL Google Group.

If you want to use the results somewhere, feel free but be sure to link back here so people can judge for themselves what the status is of the results. Thank you.

Secondly, why do I need this survey? Isn't it clear when and where to use stored procedures?

Well, I have the impression that there are quite a few myths concerning stored procedures in general. Stored procedures seem to be the kind of topic that deeply divides database professionals in two camps. The discussion whether stored procedures are "Good" or "Bad" seems to have revived ever since MySQL started supporting them.

I happen to believe that stored procedures are Good in some cases, but Bad in others. I also think that in most cases, it is possible to deduce whether one should use a stored procedure - and when not. I've been wanting to write an article about this for some time now. However, there are already a lot of articles about it, and I decided it would be fun to link it to an inquiry so it would be easier for people to relate to it.

(The article is going to have to wait until next time - I'm just presenting the results of the survey -your opinion- today)

Now - without further ado - the results!

Performance


The first item in the survey was:
Certain application tasks can be implemented either as a server-side stored procedure or using client-side application code and 'pure' SQL statements. What is your opinion on the difference in performance between these two approaches?
Three alternative answers were offered:
  • Server side stored procedures are faster than client side application code and pure SQL statements (72%; 112)
  • Server side stored procedures are slower than client side application code and pure SQL statements (8%; 12)
  • I have a different opinion (20%; 31)
Here's a pie chart of the results:

Database Stored Procedures - Good or Bad-RawData_html_m368298f8

I must say, this one really surprised me. There seems to be quite a large group of people that have the opinion that procedures outperform normal, plain SQL statements. The people that think the opposite are in the smallest group. There is quite a substantial group that did not want to choose a particular stand and chose to formulate a different opinion.

In the group that chose "I have a different opinion", the vast majority (25 respondents, 16% of the total) provided an answer to the effect that it all depends on the specific situation. In some cases, a reason was given, usually to indicate that stored procedures could help to reduce network roundtrips.

One respondent had a very original and interesting answer:
Maybe faster but a severe CPU scalability risk.
If I understand correctly, what is meant is that although executing an individual stored procedure maybe faster than executing the equivalent application code, stored procedures can have an adverse effect on scaling up the solution.

Because stored procedures reside in the database, and are executed on the database server, the application will be limited by the processing power of the database server. This can be remedied by physically separating the processing tasks from the database tasks. This allows one to grow the application layer more independently from the data layer - something which will become more and more important as the number of application users grows.

Two persons gave a response that was very MySQL specific. Here are the responses:
Neither. Depends with MySQL on num of things...
and
*MySQL* SP are slower than normal SQL
I find the latter comment interesting because MySQL stored procedures are indeed different from, say, Oracle or MS SQL procedures in this respect. MySQL compiles stored procedures very differently than Oracle or MS SQL does. I'm planning to illustrate this in more detail next time.

A minority (2 respondents) just didn't have a specific opinion, and 1 person argued that it didn't matter at all, and that stored procedure performance would be about the same as that of application code running ordinary SQL statements.

Portabilty

The second item in the survey:
What is your opinion on the effect that stored procedures have for the portability of database applications?
Again, three alternative answers were offered:
  • Using stored procedures increases the portability of database applications (21%; 33)
  • Using stored procedures decreases the portability of database applications (68%; 105)
  • I have a different opinion (11%; 17)
And the pie chart of the results:

Database Stored Procedures - Good or Bad-RawData_html_m7d5cfeeb

There's a large majority that thinks stored procedures make applications less portable. The second largest group thinks the opposite. The group of people that have another opinion is now much smaller as compared to the previous item, but still substantial.

The largest (7) group of people with another opinion say something to the effect of "It depends". I was a bit surprised that only two people answered:
Portability of what - application or database?
Three individuals answered that portability is indeed hampered, but they argued that portability just isn't an issue at all, because they don't port. (Presumably, they mean they don't ever port the database)

Appropriateness

The third item tried to probe how people felt about database stored procedures in general. Given that databases are supposed to offer an implementation of the relational model, some people argue that stored procedures are actually completey alien and should not be a part of them. So my third item was:
Which of the following statements best matches your own opinion regarding stored procedure support in relational database products?
Again, three alternatives:

  • A stored procedure is a completely non-relational thing. Therefore, stored procedures do not belong inside a relational database management system. (12%; 19)

  • Stored procedures offer functionality that is complementary to the relational features of a relational database management system. Therefore, stored procedures are a useful feature for building database applications. (83%; 127)

  • I have a different opinion (5%; 7)

Here's a piechart of the results:

Database Stored Procedures - Good or Bad-RawData_html_42b9a6c1

So, this certainly is the least controversial item when looking at the numbers: a clear majority seems convinced that stored procedures are a useful tool for building applications. Although it is still considerable, it is the minority that feels stored prodedures are inappropriate (the survey does not grant extra points for passionate defense of the opinons ;).

In this case the group of people with different opinions was small, at least as compared to the other items. It was also not really possible to categorize these responses any further: everyone had their own reason for not choosing one of the other alternatives.

Programming Languages

The survey also inquired after the programming languages people use for their stored procedures and application. These item are a bit different as they ask about a fact rather than an opinion. Also, people could select multiple answers in addition to adding extra options.

First of all, I wanted to know which stored procedure languages are used. The item:
Which stored procedure language do you use regularly to build databases/database applications? Please check only those languages from which you use procedural constructs (rather than pure SQL constructs)
People could choose from these languages:

  • DB2 Stored Procedures - IBM DB2 (5%; 8)

  • java stored procedures - Oracle (5%; 8)

  • MySQL Stored Procedures - MySQL (45%; 65)

  • PL/pgSQL - Postgres (18%; 26)

  • PL/SQL - Oracle (36%; 52)

  • T-SQL - Sybase (7%; 10)

  • T-SQL - MS SQL Server (33%; 48)

  • .NET stored procedures - MS SQL Server (6%; 9)

  • Other, specify database vendor and language name (10%; 15)
Here's a bar chart, sorted by frequency:

Database Stored Procedures - Good or Bad-RawData_html_1d6d0594

So, most people that did the survey use MySQL. This is not very surprising as it is my usual topic (and I expect most respondents to be reader of my blog). Still, a considerable number of respondents use other languages (too); Oracle PL/SQL, and T-SQL (MS SQL) are really not that far behind. Beyond that, the numbers become considerably smaller, although PL/pgSQL (Postgres) and "Other" are still quite significant.

The "Other" category indicated two important omissions on my part: 6 respondents (4%) named "Firebird", "Interbase" or some combination of both. I feel I should've included that in the survey and I'm ashamed to say it did not cross my mind when I created the survey. Second to that, 5 respondents (3%) specified that they used no stored procedures at all, or at least not regularly. I should have made a separate category for that too I guess.

What is maybe one of the most interesting obervations is the relatively small but still substantial amount of general purpose languages in the stored procedure arena. Oracle java stored procedures (8 respondents; 5%) and MS SQL .NET stored procedures (9 respondents; 6%) show that these things are actually used. It would be very interesting to find out if these respondents used that in addition to or instead of traditional PL/SQL and T-SQL.

The last item in the survey was about application programming languages. The item:
Which computer languages do you use regularly for programming database applications?
The options were:

  • C (17%; 25)

  • C++ (15%; 23)

  • C# .NET (19%; 28)

  • COBOL (1%; 1)

  • Delphi (5%; 8)

  • Java (38%; 57)

  • Javascript (14%; 21)

  • LISP (0%; 0)

  • Perl (33%; 50)

  • Python (15%; 23)

  • PHP (53%; 80)

  • Smalltalk (1%; 2)

  • VB (9%; 13)

  • VB .NET (8%; 12)

  • VBScript (5%; 8)

  • .NET - other (1%; 1)

  • Other (15%; 23)
And the bar chart:

Database Stored Procedures - Good or Bad-RawData_html_15ca2d92

Clearly, PHP is the most popular language among the respondents, but Java and Perl are also quite strong. After that, there are quite some languages that certainly are used, but not by far as much as PHP, Java and Perl.

The "other" category showed quite a few recurring responses, again indicating I forgot to mention a few important ones. Ruby, Rails, and combinations of both were mentioned 13 times (8%) which is too substantial to ignore. PL/SQL was mentioned 4 times (3%) and it would be interesting to see if these respondents mean they use PL/SQL to create full blown end-user (web)applications (this is a feature I heavily used when I was working as application developer) or 'just' utilities, ETL and the like.

Summary



  • Most repondents think stored procedures are faster than pure SQL and application code

  • Most repondents think stored procedures make the application less portable

  • Most repondents think stored procedures are a useful addition to the database

  • Most repondents use MySQL, PL/SQL and/or MS SQL stored procedures

  • Most repondents use PHP, Java and/or Perl


Stuff that remains to be done


These are just the results. I'm planning to do a real analysis of the results to see if I can identify groups of respondents. For example, it seems obvious that respondents that use PHP and MySQL stored procedures are also the ones that think stored procedures are a good tool for application development, and that they are faster but less portable (it's a safe assumption, as these sum up all majorities ;). But how well do these groups really hang together? This an other things shortly on this blog.

Wednesday, March 07, 2007

Stored procedures - Good or Bad?

The topic of database stored procedures does not cease to divide the community of database application developers and DBAs. There are many articles around, expressing various opinions about this subject. But what do you think?

Please let me know and take this survey. The survey contains 5 multiple choice questions, and will take about 1 to 5 minutes to complete.

The results will be posted soon here on my blog at http://rpbouman.blogspot.com/. Thank you in advance!

(Disclaimer: I'm an employee of MySQL AB, but this survey does not in any way relate to my position with my employer. Also, I have made an utmost effort to write an unbiased survey. If you feel that something is wrong in this regard, please post a comment on this post.)

Thursday, March 01, 2007

MySQL Cluster Certification and Study Guide

About a week ago, I wrote about the upcoming Certification for MySQL Cluster and that you can take the exam at the MySQL Conference and Expo. I promised then to write some more about why we are offering this certification and also about the cluster certification study guide.

Cluster Certification, Why?


MySQL just released the MySQL Carrier Grade Edition, which is a special edition of MySQL Cluster that is especially targeted at the telecom market. A large number of people attended an online webinar about this edition, clearly showing that MySQL Cluster fills a gap in the marketplace. (Download webinars here)

In MySQL version 5.1, a number or important additions and improvements have been made to the cluster technology. MySQL Cluster is becoming increasingly popular, and new features such as disk-based storage and cluster master/slave replication are likely to increase its popularity even more.

Thanks to Jon Stephens, the MySQL Cluster documentation in the reference manual has radically improved and expanded, and even the cluster API as been documented.

All in all, the time is right for MySQL AB to introduce a MySQL Cluster certification product, in addition to developing a new training course for MySQL Cluster 5.1.

Who should be interested?


By offering a Cluster certification, we are providing an instrument for employers to recognize those professionals that are sufficiently equipped to install, design, configure, tune and monitor a MySQL 5.1 Cluster. Any MySQL DBA that has set up a MySQL cluster once, can testify that it takes more than just traditional MySQL DBA skills to manage a production MySQL Cluster.

Based on the judgments of our own MySQL Cluster developers, our experienced trainers as well as MySQL Cluster DBAs from the community, we feel that passing the cluster certification exam is a pretty solid indication that a DBA will be successful in managing a MySQL cluster. Failing the exam almost certainly means that the necessary knowledge and experience are lacking in certain areas, and that it will take more training, time and efforts to acquire these.

MySQL Cluster certification can be attained by MySQL 5.0 Certified DBAs that want to distinguish themselves from traditional DBAs. We require that candidates are already MySQL 5.0 Certified DBAs, because a lot of the skills required to manage MySQL Cluster necessitate deep knowledge and experience about MySQL database administration. Simply put, we believe that it is impossible to be a good MySQL Cluster database administrator without being a good MySQL database administrator. The exam would become too lengthy if we would have to cover the normal DBA topics such as backup and recovery, security and performance tuning too, and therefore we require proof that these skills have already been acquired.

The MySQL 5.1 Cluster Certification exam


I expect the exam to be generally available in March 2007. As soon as the details are known we will announce cluster certification in a more formal manner, but the price and the procedures for signing up and taking the exam will be pretty much as ususal.

Like all exams, the MySQL 5.1 Cluster Certification exam will cost 200 $US. Candidates will be able to sign up for a certification exam on the Pearson/VUE website, and take it in one of their testing centers. If you are coming to the MySQL Users conference, you can take the exam for 25 $US instead of 200 $US.

The exam itself consists of 70 multiple choice questions, which have to be answered within 90 minutes. An important difference with the prior exams is that the cluster exam is completely homogenous. Each question has exactly 4 answers, and only of those must be selected. So, no more "check all that apply", it's always: "pick-one-out-of-four".

For an outline of the exam contents, take a look at the relevant section of the MySQL Certification Candidate Guide

The MySQL 5.1 Cluster Certification Study Guide


I already mentioned a study guide is being prepared for the cluster certification. It is a great pleasure for me to introduce you to the authors:

  • Jon Stephens, technical writer for MySQL AB. Jon is also author of the MySQL Cluster documentation in the reference manual and the NDB API documentation.

  • Mike Kruckenberg, Co-Author of Pro MySQL and a professional Database Administrator

  • Solomon Chang, founding member of LAMPSIG of Los Angeles and a professional Database Administrator

  • Stewart Smith, Software Engineer for MySQL AB and part of the MySQl Cluster development team.


I'm really greatful for this mix of members from the community and employees from MySQL AB. It just could not've turned out any better! Note that both Stewart and Mike will also be speaking at the MySQL Users Conference and Expo: Mike will be talking about the MySQL information schema, and Stewart will be doing quite a number of sessions on MySQL Cluster.

Currently, the study guide is being reviewed by the external reviewers. We can't disclose all details concerning publication, pricing and availability yet, but to give you an idea, here are the contents at a glance:

  • Part I: Foreword and Introduction

    • Preface


  • Part II: Components and Processes

    • Cluster Concepts

    • Architecture and Organization

    • Cluster Processes

  • Part III: Usage

    • Software Configuration

    • Deployment

    • Cluster Management


  • Part IV: Internals

    • The NDB Storage Engine

    • Performance and Tuning


  • Part V: Data Safety

    • Cluster Backup and Recovery

    • Cluster External Replication

  • Part VI: Problems

    • Security Issues

    • Troubleshooting and Problem Solving

  • Part VII: Appendices

    • Errors

    • Schemas and Configurations used in Examples and Exercises

    • MySQL Cluster Glossary