C++ considered Harmful (and why E is better)

30-12-2008, by Chris Handley

http://cshandley.co.uk/email

This UNFINISHED article is my attempt to comprehensively catalogue most of C++'s main failings, since I haven't yet come across anything that does so to my satisfaction. While it is based upon several years of thought on the subject, I may have made some mistakes, since I tend to steer clear of actual use of C++ wherever possible. So please let me know if I have erred anywhere :-) . You can get my email address from the web page given at the top.

With C++, there isn't really one single feature that makes it bad, but rather it is the case of death by a thousand cuts. So it is only by cataloguing C++'s many (sometimes small) problems that a proper explanation can be given.

For examples of how C++ could have done things better, I will tend to refer to AmigaE, which is my favourite programming language. But AmigaE has not been developed in 10 years, so it isn't fair to make this a direct comparison between C++ & AmigaE. In cases where AmigaE has been left behind, I may refer to PortablE (my re-implementation of AmigaE, which improves upon it) or Java.

To make replies & discussion easy, I have used a numbered chapter for each point.

Problems which can be categorised

Let's start with BS's (Bjarne Stroustrup's) overall design philosophy for C++, for which I will paraphrase him:

I believe that his choices are very bad for a *general-purpose* programming language that is intended to be used by *all levels* of experience. C++ may still be the best programming language in certain domains, or when only used by experts who know the many pitfalls & work-arounds. My alternative philosophy is:

As these may seem a bit abstract, and since I believe arguments should be made using specific examples, I will provide some examples and use them to explain why my philosophy is better than BS's:




1. 'Where there is a choice, the language should default to the most efficient (fastest) case.'

Whereas I say 'Where there is a choice, the language should default to the safest (or otherwise most common) case. Extra effort should be required to make choices that are more likely to be wrong.' This seems to be accepted by many new languages that are gaining in popularity.

1.1. Virtual method calls

The obvious example for point 1 is OOP method calls (member function calls) for an object/class. C++ defaults to statically-dispatched method calls (which are decided at compile-time, based upon the object's static type), and you must use the "virtual" keyword before each method declaration to indicate that it should be a dynamically-dispatched method call (which is decided at run-time by the object's actual type). Static dispatch is slightly faster, because there isn't an additional level of pointer indirection through the object's vtable - but on most modern CPUs the difference should be small, and in fact negligible for all but a few inner loops.

As OOP allows (in fact encourages) polymorphism, where child objects can be used when their parents are expected, a non-virtual method call will cause the code associated with the parent (and not the child) to be called - and oh dear, we probably have a bug (because the child object is now being treated as if it was a parent object, which probably violates some new invariant that the child enforces). Of course, if the programmer knows that this case will never happen, then static dispatch is faster (at least by a little bit).
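To make this concrete, here is a minimal sketch of the problem. Parent, Child and the helper functions are names I invented for illustration; the only difference between the two methods is the "virtual" keyword.

```cpp
#include <cassert>
#include <string>

// Hypothetical classes, invented purely for illustration.
struct Parent {
    std::string plainName() const { return "Parent"; }            // statically dispatched
    virtual std::string virtualName() const { return "Parent"; }  // dynamically dispatched
    virtual ~Parent() = default;
};

struct Child : Parent {
    std::string plainName() const { return "Child"; }             // hides, does not override
    std::string virtualName() const override { return "Child"; }  // overrides
};

// The callers only know the static type (Parent&), as in any polymorphic code.
std::string viaPlain(const Parent& p)   { return p.plainName(); }
std::string viaVirtual(const Parent& p) { return p.virtualName(); }

void demoNonVirtual() {
    Child c;
    assert(viaPlain(c) == "Parent");   // the parent's code runs on a Child object
    assert(viaVirtual(c) == "Child");  // the child's code runs, as intended
}
```

Note that the compiler is not required to complain about Child::plainName() at all, which is exactly why this mistake is so easy to make.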

My argument here is that BS wrongly assumes that all programmers will know at all times when virtual calls are unneeded - often it is *not* clear, and may require a deep understanding of the system and/or how the object may be used in the future. Certainly for beginners this will cause many problems, and even if they are taught to write "virtual" before every method declaration, that is a significant extra effort that most programmers will try to avoid for "efficiency" reasons - and then make mistakes. And worse, he does this for what is a negligible speed-up in most cases.

What he should have done is make virtual methods the default, as this works correctly in all situations, and is only marginally slower. Ideally optimisations (such as adding a "nonvirtual" keyword) should be left until after the program is working. Certainly it would be an extra effort to make the mistake of making it non-virtual. And even better, the default ensures that objects/classes can be extended (via polymorphism) without problem, rather than relying on the foresight of the designer to see that this might be useful (duh! In most cases this is true). Neither AmigaE nor Java even offer the option of non-virtual methods, simply because they are very rarely a good idea, and in most cases they do not offer a significant speed-up.

Also, Java allows classes to be declared "final", which means that its method calls will automatically be statically-dispatched - but without the possibility of bugs. Yes, this means that ALL of the class's methods must be statically-dispatched, unlike C++, but the whole point is that making just a few methods statically-dispatched is very error prone, not least because you are assuming nobody will ever want to override certain specific methods. Thus you should only make a class final when there is absolutely no likelihood of anyone wanting to override any of its methods.

1.2. The infamous Switch statement

(This was moved to a later chapter, as it doesn't really relate to the current point.)

1.3. Virtual inheritance

C++ provides the option of virtual inheritance as a solution to the "Deadly Diamond of Death" (which I describe in the "Multiple inheritance" sub-chapter). To use it one must put the "virtual" keyword before the classes being inherited.

Its actual effect is to combine the implementation of several (related) classes, where their shared base class would otherwise be duplicated. Access to a member or method of any such combined implementation has an overhead, although this is apparently small on some modern compilers - just a few machine instructions.

In any case, BS opted to avoid virtual inheritance unless it was explicitly requested, in line with his principle of defaulting to the most efficient case (however small that improvement may be). Again, I believe that was a mistake, albeit a less serious one than for virtual methods:

If you really must have multiple inheritance (see the next chapter), the general case is that you *do* want virtual inheritance when the "Deadly Diamond of Death" occurs, since otherwise you get ambiguity over which duplicated base class should be used, so it should default to virtual inheritance. And as it happens, virtual inheritance isn't much of an overhead anyway.




2. 'Where there is a choice, the language should give the programmer all the options, even if it's possible for him to make a very bad choice.'

Whereas I say 'Where there is a choice, the language should not provide options which are rarely useful but easily misunderstood or misused.'

2.1. Virtual method calls

Example 1.1 fits the bill here too - it is arguable that non-virtual methods are of so little benefit most of the time, while easily causing many problems unless you know exactly what you are doing, that they shouldn't even be provided (as AmigaE does not). But this case isn't as clear-cut as for point 1, so I won't make a big thing of it, and would prefer if other examples are used to argue point 2.

Java has a smarter alternative to non-virtual methods - declaring that a class is final (and so can't have any polymorphic children), so that all its methods are automatically non-virtual. I will probably add this to PortablE.

2.2. Multiple inheritance

Multiple inheritance is a fairly good example of BS giving the programmer enough rope to thoroughly hang himself. I will admit that multiple inheritance can be useful on a few occasions, but it is likely to get used far more often than is needed (there are alternatives) - and it is quite easy to get the famous "Deadly Diamond of Death" where class A inherits classes B & C, both of which inherit class D, resulting in two copies of D being inherited in one object, and thus ambiguity over which copy should be used for any D method calls (or member accesses) on the object or if the object is used polymorphically where D is expected.

C++ actually compounds the problem by providing an *optional* solution (virtual inheritance), which thus multiplies the possible complexities (and misuse) even further.
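Here is a minimal sketch of the diamond, using the same letters as above (the VA/VB/VC variants and the "value" member are my own additions for illustration). It shows both the duplicated base class and what the optional "virtual" keyword changes.

```cpp
#include <cassert>

// Hypothetical classes named after the letters used in the text.
struct D { int value = 0; };

// Without "virtual": A ends up containing two separate D sub-objects.
struct B : D {};
struct C : D {};
struct A : B, C {};

// With "virtual": VA ends up containing a single shared D sub-object.
struct VB : virtual D {};
struct VC : virtual D {};
struct VA : VB, VC {};

void demoDiamond() {
    A a;
    // a.value = 1;       // would not compile: ambiguous, which D is meant?
    a.B::value = 1;       // the path to D must be named explicitly
    a.C::value = 2;
    assert(a.B::value == 1 && a.C::value == 2);  // two independent copies

    VA va;
    va.value = 3;         // unambiguous: there is only one D
    assert(va.VB::value == 3 && va.VC::value == 3);
}
```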


Even if there was no alternative to multiple inheritance, it might be best to simply leave such a dangerous feature out of the language altogether, but as it is Java has demonstrated one solution - that of interface inheritance, so that only method interfaces are multiply-inherited and not the implementations themselves (thus the implementation must be done each time an interface is inherited). I'm not entirely happy with Java's version of interfaces, but it's still a hell of a lot safer than multiple inheritance, while providing most of the benefits. I hope to add interface inheritance to PortablE...

A less flexible alternative is to simply avoid inheritance altogether (which is over-used & over-rated anyway), and use the mechanism of composition instead (where objects are members rather than parents). A fancy version of composition is delegation, where the method call of an object just makes a similar call to an object member - and combining this with Java-style inheritance gives most of the benefits of multiple-inheritance (and none of the problems), at the cost of some additional run-time overhead (which can be optimised away by the compiler anyway if the class is final).
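A minimal sketch of composition plus delegation, written in C++ only because that is the language under discussion. Printable, Engine and Car are hypothetical names, and the abstract class merely stands in for a Java-style interface.

```cpp
#include <cassert>
#include <string>

// Hypothetical interface: only a method signature, no implementation.
struct Printable {
    virtual std::string describe() const = 0;
    virtual ~Printable() = default;
};

// A concrete helper that we want to reuse without inheriting from it.
class Engine {
public:
    std::string describe() const { return "engine"; }
};

// Composition: Car *has* an Engine (a member) rather than *being* one (a parent).
// Delegation: Car::describe() forwards part of its work to that member.
class Car : public Printable {
public:
    std::string describe() const override { return "car with " + engine_.describe(); }
private:
    Engine engine_;
};

void demoDelegation() {
    Car car;
    const Printable& p = car;  // usable wherever the interface is expected
    assert(p.describe() == "car with engine");
}
```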

In summary, multiple inheritance is far too easily misused, while most of its benefits can be provided in far safer ways.

2.3. Function overloading

Function overloading is a nice idea in principle, but it's just so damn easy to overuse (resulting in confusion as to which procedure is being called), and to misuse (so that a new overloaded procedure actually replaces calls to an old procedure, by accident). Overloading does not even provide any new functionality, so it can be classed as "syntactic sugar". And in some cases overloading can be replaced by default parameters. As such, I feel that the potential for overuse & misuse greatly outweighs the fairly small benefit. AmigaE does not provide overloading (although Java does).
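The accidental replacement can be sketched as follows. The two namespaces stand for the same code base before and after someone adds an overload; show() and call() are hypothetical names.

```cpp
#include <cassert>
#include <string>

// The code base before the new overload was added.
namespace before {
    std::string show(double) { return "double"; }
    std::string call() { return show(42); }  // the int 42 is converted to double
}

// The same code base after a well-meaning addition.
namespace after {
    std::string show(double) { return "double"; }
    std::string show(int) { return "int"; }  // added later
    std::string call() { return show(42); }  // identical source line, different procedure
}

void demoOverloadHijack() {
    assert(before::call() == "double");
    assert(after::call() == "int");  // the old call site silently changed its target
}
```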

2.4. Casting

C++ provides 5 kinds of casting - (i) the original C kind, (ii) dynamic_cast, (iii) reinterpret_cast, (iv) static_cast, and (v) const_cast. I think that BS took his philosophy to the extreme here, I mean, do we really need five kinds of casts? Talk about trying to confuse the programmer! Even if we exclude the original C casts, which are pretty similar to reinterpret casts, that's still 4 kinds of casts.

I think that a strong case can be made that having just two kinds of casts (i.e. static + dynamic) is more than sufficient for the majority of cases; so either static_cast or reinterpret_cast should be removed, depending on the language's aims.
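For readers who have not met them all, here is a rough sketch of how the casts differ in practice. Base, Derived and Other are hypothetical classes, and this is only meant to show the flavour of each cast, not every rule.

```cpp
#include <cassert>
#include <cstdint>

struct Base { virtual ~Base() = default; };
struct Derived : Base { int payload = 7; };
struct Other : Base {};

void demoCasts() {
    Derived d;
    Base* b = &d;

    // static_cast: checked at compile time only; it trusts the programmer.
    Derived* s = static_cast<Derived*>(b);
    assert(s->payload == 7);

    // dynamic_cast: checked at run time; yields a null pointer on a wrong guess.
    assert(dynamic_cast<Derived*>(b) != nullptr);
    assert(dynamic_cast<Other*>(b) == nullptr);

    // reinterpret_cast: reinterprets the bits, e.g. pointer <-> integer.
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(b);
    assert(reinterpret_cast<Base*>(bits) == b);

    // C-style cast: the compiler picks whichever named cast happens to fit.
    Derived* c = (Derived*)b;
    assert(c == s);
}
```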

AmigaE doesn't really provide any kind of casting, since it is (almost) a typeless language. PortablE is a properly typed language, in line with C++ & Java. Java only does dynamic casts, which seems reasonable for a slightly slow language that is designed to be "safe".

2.5. Object passing semantics

C++ provides 3 ways to pass objects - (i) pass-by-pointer, (ii) pass-by-copy, and (iii) pass-by-reference. While C already provided the first two, BS felt it necessary to add a third, rather than removing pass-by-copy as I think he should have done (and as Java did). As with casts, this is providing too many ways to do essentially the same thing. But I will go into more detailed arguments about this in 3.1...



3. 'Where there are multiple programming paradigms, the language should offer as many of them as possible.'

Whereas I say 'Where there are multiple programming paradigms, the language should only offer the one that is *usually* best, to avoid interfaces becoming a mish-mash of semi-incompatible paradigms. This also reduces the search space that a programmer needs to consider for the best solution, which means that he is more likely to find a good solution. Multiple paradigms should only be provided when there is truly no "one good paradigm for most cases".'

3.1. Object passing semantics

Take example 2.5, where there are 3 ways to pass objects. Languages like BASIC traditionally only supported pass-by-copy, which is easy to understand & use, but it puts limits on what you can achieve, and is quite inefficient in general. Languages like assembler only really support pass-by-pointer. For some reason C's designers decided to support both kinds, which means that "pointer semantics" and "value/copy semantics" are mixed within the same language, resulting in what can only be described as "messy semantics":

Have a statically allocated object, but need to pass it to a procedure which expects a pointer? You have to convert it using the lovely & (address-of) operator. Got a pointer to an object, but need to pass it to a procedure which expects an object? What luck, we have to convert it again, this time with the * (dereference) operator! The end result is that in any C expression, there is a good chance that you will need to carefully think whether or not you have a pointer or value, and whether or not a pointer or a value is expected. And of course you end up with a smattering of * and & all over the place. C++ makes things even worse, by providing object references, which are kind of half way between pointers and static objects, thus confusing things even more, not to mention complicating the syntax further & adding yet more rules to remember.
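The three passing styles, and the conversions between them, look roughly like this. Point and the three procedures are hypothetical names made up for this sketch.

```cpp
#include <cassert>

struct Point { int x; };

void byCopy(Point p)       { p.x = 1; }   // changes a private copy only
void byPointer(Point* p)   { p->x = 2; }  // changes the caller's object
void byReference(Point& p) { p.x = 3; }   // also changes the caller's object

void demoPassing() {
    Point local{0};
    Point* ptr = &local;

    byCopy(local);      assert(local.x == 0);  // the caller's object is untouched
    byPointer(&local);  assert(local.x == 2);  // value -> pointer needs &
    byCopy(*ptr);       assert(local.x == 2);  // pointer -> value needs *
    byReference(local); assert(local.x == 3);  // looks just like byCopy at the call site
    byReference(*ptr);  assert(local.x == 3);  // pointer -> reference needs * too
}
```

Notice that byCopy(local) and byReference(local) are indistinguishable where they are called, even though only one of them can modify "local".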

These mixed semantics make learning to use pointers far harder with C/C++, than with a language like AmigaE or Java that only uses pointer semantics (which are more technically known as "reference semantics", but I don't want you to confuse that with C++'s "references"). And the sad fact is that most C programmers don't even realise the unnecessary hoops that they are being made to jump through, so I better explain the difference:

Taking my example from two paragraphs ago, AmigaE only allows you to declare pointers to objects for parameters, so you don't have to worry about whether you need to pass a pointer to an object or a static object - you know it must be a pointer. But doesn't that mean you can't pass a static object where a pointer is expected? No! Since static objects are still pointers, just to something stored on the stack rather than the heap, passing a static object automatically passes its pointer - with no & in sight! And of course, no * either. What do you lose? Only the inefficient & potentially incorrect pass-by-copy ability for objects, and really this is no great loss - plus it can be compensated for by a clone() method, which is more flexible anyway.

*WORSE STILL*, because C++'s broken type system treats pointers & values (i.e. non-pointers) as fundamentally different entities, due to the mixing of pointer & value semantics, there is no type-safe way to write a single ordinary (non-templated) function or class that handles both pointers & non-pointers. So even the simple idea of a general-purpose linked-list (which can handle both numbers and pointers to objects) cannot be written once as ordinary code! The only way out is templates, where (as far as I can tell) a templated linked-list can be instantiated for both numbers and pointers, but only because the compiler generates a separate copy of the list for each element type. It simply shouldn't be that hard. While AmigaE doesn't have a problem, because it doesn't do any type checking, PortablE is type-checked but still allows this by virtue of the fact that its VALUE type is inherited by both pointers & numbers.
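For what it is worth, here is a minimal sketch of the template route (List and demoList are names I made up for illustration). It does compile for both a number and a pointer element type, although the compiler generates a separate copy of the class for each, and I have not tried to make it a complete container.

```cpp
#include <cassert>
#include <cstddef>

// Minimal singly linked list that only supports pushing at the front.
template <typename T>
class List {
    struct Node { T item; Node* next; };
    Node* head_ = nullptr;
    std::size_t size_ = 0;
public:
    List() = default;
    List(const List&) = delete;             // keep the sketch simple: no copying
    List& operator=(const List&) = delete;
    ~List() { while (head_) { Node* n = head_; head_ = n->next; delete n; } }

    void push(const T& item) { head_ = new Node{item, head_}; ++size_; }
    const T& front() const { return head_->item; }
    std::size_t size() const { return size_; }
};

void demoList() {
    List<int> numbers;     // the element type is a number
    numbers.push(5);
    assert(numbers.front() == 5);

    int target = 9;
    List<int*> pointers;   // the element type is a pointer
    pointers.push(&target);
    assert(*pointers.front() == 9);
    assert(pointers.size() == 1);
}
```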

3.2. Overloading & default parameters

C++ provides both overloading & default parameters, even though overloading does not mix well with default parameters. This happens because default parameters effectively create a collection of overloaded procedures (one for each default) that have varying numbers of parameters - and these procedures may conflict with real overloading of the procedure. While the compiler will always report any such ambiguity (as an error at the ambiguous call), it may be too late to easily solve because some of the conflicting functions could be widely used. Sure, the problem is quite easy to understand, but that doesn't stop it from being quite likely to happen, simply because default parameters (effectively) create so many overloaded procedures, that you have a very high chance of a conflict when overloading is actually used.
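A small sketch of the clash, using a hypothetical area() procedure. The conflicting overload is left commented out so that the example still compiles.

```cpp
#include <cassert>

// Hypothetical procedure with a default parameter: callable as area(w) or area(w, h).
int area(int width, int height = 1) { return width * height; }

// A later overload with a different parameter type: this one is harmless.
int area(double side) { return static_cast<int>(side * side); }

// But this overload would clash with area(int, int = 1):
//   int area(int side) { return side * side; }
// Declaring it is legal, yet every existing call such as area(3) would then
// be ambiguous and stop compiling.

void demoDefaultsVsOverloads() {
    assert(area(3) == 3);      // picks area(int, int = 1)
    assert(area(3, 4) == 12);
    assert(area(2.0) == 4);    // picks area(double)
}
```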

The simple solution is to provide *either* overloading or default parameters, *but not both*. There are good arguments for choosing either one, but I favour default parameters because of the problems with overloading outlined in 2.3 - and AmigaE just happens to do this too.



4. 'The language should be concise, to reduce the amount of typing necessary.'

Whereas I say 'The language should be as clear & obvious as possible, to avoid the possibility of confusion, and to make it easier to learn. The best case is that the language would mirror English sentences, and at the very least it should obey a left-to-right order of interpretation.'

This is somewhat subjective, but I still honestly believe that "The language should be as clear & obvious as possible, to avoid the possibility of confusion, and to make it easier to learn." My main reason for placing this above typing time is that programmers typically spend rather more time reading & thinking about their code, than they do writing it, so optimising for typing time is a false economy. Making the language closer to the English language also reduces learning time - while it may seem obvious now that && means "logical And", I can assure you that it wasn't when you first learnt the language!

4.1. Symbols not keywords

C/C++ is obsessed with using symbols instead of words, and then re-using the same symbols in different contexts for completely different meanings. The infamous {curly brackets} are the most obvious case, but perhaps not the most damning. Why is it necessary that * is used to mean "dereference this pointer", "this is a pointer", and "multiply", while & is used to mean "get the address-of", "this is a reference" and "bitwise And"? And then && is used to mean "logical And"! (Yes there is some vague relationship between the first two meanings of each pair, but it's pretty tenuous, and most importantly is completely unnecessary.) Then we have << (and >>) overloaded to mean "shift left" and "output" (or is it "input"? I can never remember...).
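A few lines are enough to show all of these meanings side by side. This is just an illustrative sketch, and the variable names are arbitrary.

```cpp
#include <cassert>

void demoSymbols() {
    int x = 6;
    int* p = &x;                     // '*' declares a pointer, '&' takes an address
    int& r = *p;                     // '&' declares a reference, '*' dereferences
    int product = x * *p;            // the first '*' multiplies, the second dereferences
    int masked = x & 3;              // '&' is bitwise And
    bool both = (x > 0) && (r > 0);  // '&&' is logical And
    int shifted = x << 1;            // '<<' shifts left (and is also stream output)

    assert(product == 36);
    assert(masked == 2);
    assert(both);
    assert(shifted == 12);
}
```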

It's like the designer of C had an allergy to keywords, and BS caught that allergy too - why else would someone want to write && instead of AND, or || (which I couldn't even find on my Apple keyboard!) instead of OR? Actually, this has to be true, because as Ian Joyner says: "C++ saves on keywords by overloading one keyword in several contexts".

Getting back to those curly brackets, is this C++ statement:

  while (foo) {
     bar;
  }

really better than this AmigaE statement:

  WHILE foo
     bar
  ENDWHILE

The latter needs absolutely no symbols, is more readable, and only needs 3 extra characters! Plus it's easier to find a matching ENDWHILE in nested loops, rather than a matching } . "Ah," you say, "but it needs three lines, even if the body can fit on one line. Curly brackets are more flexible!":

  while (foo) {bar;}

No, sorry, AmigaE has a matching equivalent that is actually 2 characters *smaller* than the equivalent C++ code:

  WHILE foo DO bar

And I'm not even counting the extra typing needed for every trailing semi-colon, which must really add up. As far as I can see, the only benefit of using symbols is to make programming look harder than it really is, to keep too many people from learning to program, thus helping to keep programmer salaries high... (Not to mention that all those C++ gotchas really help pay the bills.)



5. Problems which have NOT been categorised

Having finished poking holes in BS's design philosophy, I'll now list some other problems with C++ that I haven't yet had time to classify properly:

5.1. Keyword overloading

I stole this criticism from Ian Joyner's war-and-peace-sized criticism of C++: "C++ saves on keywords by overloading one keyword in several contexts, even though the uses have different or even opposite meanings. Static is another case, which is used in three different contexts. The keyword count metric does not show that C++ is a small non-complex language: less keywords have made C++ more complex and confusing."

I will add that "virtual" has two different uses, but that they are similar enough to cause quite a bit of confusion - and contrary to popular belief, "virtual inheritance" has absolutely nothing to do with "virtual methods"!

AmigaE uses new keywords where it makes sense, but this doesn't clash with the namespace of variables/etc, because E's keywords are always uppercase, while variables/etc are not allowed to be. Now yes, AmigaE requires that constants are uppercase too (roughly speaking), and in AmigaE this means that constants cannot be keywords, *but* this is only because of AmigaE's primitive implementation - PortablE works perfectly fine, despite reserving only about 6 constants for keywords.

5.2. Type system

C++ inherits a truly abysmal type system from C, and if anything makes it worse. But let's focus on C's contribution: When talking about primitive types, there isn't really a type system at all, it's more a set of ad-hoc rules for automatically converting between them (aka "promotion"), and this happens to approximate a proper type system (with inheritance between types) if you squint *really* hard:

As a general rule smaller types are promoted to larger types, but anything smaller than an "int" (which includes "short") effectively does not exist, because C++ always promotes it to an "int" when operated upon. Add two bytes together? You got an int! Bitwise AND two bools together? You got an int! This pretty-much renders any primitive (numerical) type-checking as non-existent, unless you use floating point maths (which is pretty rare) or store the result of logical boolean operations (which is not used too often).
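You can ask the compiler to confirm this. The following sketch uses static_assert to check the type of each expression at compile time; the variable names are arbitrary.

```cpp
#include <cassert>
#include <type_traits>

void demoPromotion() {
    signed char a = 100, b = 100;
    short s = 1;
    bool t = true, f = false;

    // The result types are checked at compile time.
    static_assert(std::is_same<decltype(a + b), int>::value, "byte + byte is an int");
    static_assert(std::is_same<decltype(s + s), int>::value, "short + short is an int");
    static_assert(std::is_same<decltype(t & f), int>::value, "bool & bool is an int");
    static_assert(std::is_same<decltype(t && f), bool>::value, "bool && bool stays a bool");

    assert(a + b == 200);  // no wrap-around: the sum was computed as an int
    assert((t & f) == 0);
}
```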

While "short" is guaranteed to be at least 16 bits, and "long" is guaranteed to be at least 32 bits, "int" is a vague pseudo-type that can be anywhere between "short" & "long" depending on the CPU & the compiler. On typical 32-bit systems an "int" is as large as a "long".

Given the previous two paragraphs, you might as well use "long" for all your numerical types, whatever they hold, and not keep any illusions that the C/C++ compiler might actually prevent you trying to pass a larger value than is allowed by a function in many (if not most) situations. (I need to check this, because I may not be entirely correct. So please do not treat it as fact.)


And then there are the types themselves: "char" is actually a byte, so that there isn't actually an abstract OS-portable type for a character. And even if the OS uses 16-bit (Unicode) characters, C's immediate characters (like 'c') are still 8-bits! Basically C++ is stuck in an ASCII time-warp, where one char is one byte, even when it isn't. So what happens if you do want to use Unicode? You must replace every reference to the "char" type with the wonderfully named "wchar_t" type, and then carefully prefix *every instance* of an immediate string/character with an "L". I think the long (and still incomplete) switch to Unicode can mostly be blamed on the prevalent use of the C++ language.

Now OK, AmigaE isn't really any better, because its CHAR type is used to represent a byte, and it has no type-checking to speak of. But PortablE does, and it's *way* better most of the time - not only does it have a BYTE type, and CHAR is really a character, but smaller types inherit larger types, so that it can actually do type checking on numerical stuff. And if you combine two BOOLs, you still get a BOOL - genius huh? But if you don't want your parameters to be type-checked, then simply don't specify any type, so that PortablE will simply use the largest value type (called VALUE).

5.3. The infamous Switch statement

Although BS inherited this from C, he could have easily provided a safer alternative (as he did for casts). The basic problem is that it is easy to forget to put a "break;" statement at the end of each case statement, and forgetting it means that the code in the next case statement gets executed too - ooops!
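A minimal sketch of the mistake, using two hypothetical classifier procedures that differ only by one "break;":

```cpp
#include <cassert>
#include <string>

// Hypothetical classifier with a forgotten "break" after case 'a'.
std::string classifyBuggy(char c) {
    std::string result;
    switch (c) {
    case 'a':
        result += "letter";
        // missing break: execution falls through into the next case
    case '0':
        result += "digit";
        break;
    default:
        result += "unknown";
        break;
    }
    return result;
}

// The same classifier with the break in place.
std::string classifyFixed(char c) {
    std::string result;
    switch (c) {
    case 'a':
        result += "letter";
        break;
    case '0':
        result += "digit";
        break;
    default:
        result += "unknown";
        break;
    }
    return result;
}

void demoSwitch() {
    assert(classifyBuggy('a') == "letterdigit");  // both cases ran
    assert(classifyFixed('a') == "letter");
}
```

The buggy version is perfectly legal C++, so at best the compiler may offer an optional warning.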

Of course, sometimes this behaviour is desirable (so that several cases can use the same code), but it is unfortunate that it is the default! A "nonbreak" (or perhaps "continue") keyword should have caused this behaviour instead. Or an even better solution would have been the one used by AmigaE's SELECT statement, which has the added advantage of being easily understood by the compiler:

  SELECT 256 OF character
  CASE "a" TO "z"      ; PrintF('This is a letter.\n')
  CASE "0" TO "9", "." ; PrintF('This is a number or decimal point.\n')
  DEFAULT              ; PrintF('Unknown\n')
  ENDSELECT

Here I hope you can understand what is happening - we have explicitly told the compiler which cases should be handled by what code, so that there is no possibility of a mistake. Even better, unlike C++ compilers which must analyse all the variables & cases involved to try to infer whether a jump table would be suitable (and then construct one without making a mistake), the programmer has explicitly said that the cases fall within a specific range (that would be used for the jump table's size), plus the relevant cases have already been explicitly stated, so it has no need to deduce them for a jump table. And it's easier & safer to boot!

5.4. Bad operators

The "::" operator has two problems. First is that it is used to call the inherited super class by name, which is fine *until* you want to restructure your class hierarchy - then you must manually search & replace all instances. AmigaE & Java get this right, by simply saying that the SUPER class should be called, but not actually naming it.
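Here is a sketch of how a restructuring can go wrong. The shape classes are hypothetical, and the "Super" alias in FixedSquare is a common workaround rather than a language feature.

```cpp
#include <cassert>
#include <string>

struct Shape {
    virtual std::string name() const { return "shape"; }
    virtual ~Shape() = default;
};

struct Polygon : Shape {
    std::string name() const override { return "polygon/" + Shape::name(); }
};

// Rectangle was inserted between Polygon and Square during a restructuring.
struct Rectangle : Polygon {
    std::string name() const override { return "rectangle/" + Polygon::name(); }
};

struct Square : Rectangle {
    // Stale call: it still names the old parent, so Rectangle is silently skipped.
    std::string name() const override { return "square/" + Polygon::name(); }
};

struct FixedSquare : Rectangle {
    using Super = Rectangle;  // workaround: the parent is named in one place only
    std::string name() const override { return "square/" + Super::name(); }
};

void demoSuperCall() {
    assert(Square().name() == "square/polygon/shape");  // compiles, but is wrong
    assert(FixedSquare().name() == "square/rectangle/polygon/shape");
}
```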


The second (lesser) problem is that the class's name is put before the variable, for example "foo.bar.class::meef". This breaks the natural left-to-right flow for reading an expression (or sentence), because: After reading "foo.bar" there is a "." telling us to expect a member access, but inexplicably this is followed by the class we are casting "foo.bar" to, yet we do not find out this is a cast until reading the "::" after the class name!

Contrast this with the equivalent AmigaE expression "foo.bar::class.meef", where after "foo.bar" we reach "::class" which says to treat (or cast) the preceding "foo.bar" as being of type "class", and then continue with ".meef" on the object which is being treated as of type "class". In this case there is no break in the left-to-right flow in the expression.


Maybe I'll add more operators later, but that was really the worst operator choice.

5.5. Operator overloading

### to do ###

5.6. Templates

Time for something a little more controversial :-)

### to do ### (Generics vs templates, but Java's Generics kind of suck. Inheritance vs templates, when you can use boxing.)


5.7. Other criticisms of C++

And if all that wasn't enough, here are some rather more amusing (but mostly so very true) takes on why C++ sucks so much:

http://www.mischel.com/rants/cpprant.htm

"If you haven't read Scott Meyers's Effective C++, you should. In fact, if you're writing C++ programs and haven't yet read the book, you're committing malpractice. Drop what you're doing, go find a copy, and read it. .... If you're still writing C++ programs, you're committing malpractice with malice aforethought."


http://xenomachina.com/2004/05/c-rant.html

"Another major frustration is the way C++ creates default copy constructors and assignment operators which are almost always incorrect. (they're virtually guaranteed to be incorrect any time you've got member variables which are pointers)"

"I think a lot of this might stem from Stroustrup not thinking very clearly about the problems involved. ... I used to think, despite the problems with C++, that Stroustrup must be a pretty smart guy. Then I went to a talk of his"


http://etymon.blogspot.com/2004/07/impressive-rant.html
http://groups.google.com/group/comp.lang.lisp/msg/917737b7cc8510e3

"I don't claim to be typical. for more than a year after first exposed to C++ in a six-month project, I was actively considering another career because C++ was becoming the required language and I couldn't stomach the monumental waste of human resources that this language requires" (I think this guy hates C++ more than me, and that takes some doing!)

"I actually think C++ is ideal only for programmers without any ethics. ... one can only imagine what having to lie to your compiler _every_ day and to make unwarranted guesses"

"the 'improved' strong typing is designed to cause errors if you make simple mistakes, forcing you to use a lot more of your brain on remembering silly stuff, but then you get to templates and all hell breaks loose because suddenly you have _no_control_at_all_ over where things fly."

"people who come from the C++ camp are positively obsessed with protecting their data ... are they paranoid? I think [they] have _become_ paranoid because C++ makes you go nuts if you don't -- it's a natural, psychological defense mechanism against criminally bad language design: the flip side of having to feign certainty about your guesses is that you lose confidence in your real certainty, too."


http://www.phy.duke.edu/resources/computing/brahma/Resources/c++_interview/c++_interview.html

"Interviewer: Yes, but C++ is basically a sound language.

Stroustrup: You really believe that, don't you? Have you ever sat down and worked on a C++ project? Here's what happens: First, I've put in enough pitfalls to make sure that only the most trivial projects will work first time. Take operator overloading. At the end of the project, almost every module has it, ... The same operator then means something totally different in every module. Try pulling that lot together, when you have a hundred or so modules."

(note: This is most likely a spoof, but it still makes some good points, in particular about programmer pay.)



And if you really have time to spare (I haven't yet;), then Ian Joyner wrote a truly encyclopedic critique of C++, even though it was only occasionally insightful from what I saw:

http://burks.brighton.ac.uk/burks/pcinfo/progdocs/cppcrit/