Generics and Type Erasure in Java

If you've ever written a generic data structure in Java, you've probably run into a line that looks completely reasonable and yet refuses to compile:

T[] data = new T[capacity];   // does not compile

It feels like it should work. You can write new String[capacity] and new int[capacity], so why not new T[capacity]? The answer is a single design decision Java made back in 2004, and once you understand it, a whole cluster of generic "gotchas" stops being mysterious. That decision is called type erasure. Let's work up to it.

The generic array problem

Say we're building a small generic container, the kind of thing you write once and use with any element type:

public class Box<T> {
  private T[] data;

  public Box(int capacity) {
    data = new T[capacity];   // we'd love to write this
  }
}

The compiler stops us cold on that new T[capacity]. The usual fix is to allocate an array of Object and cast it to T[]:

@SuppressWarnings("unchecked")
public Box(int capacity) {
  data = (T[]) new Object[capacity];
}

The cast is unchecked: the compiler can't verify it, so it warns us. We suppress the warning because data is private and we control every value that goes into it. As long as the array itself never escapes the class, the box behaves from the outside exactly as if it held a real T[].
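To see the idiom end to end, here's a minimal sketch of the whole class. The get and set methods and the BoxDemo class are additions for illustration; the snippet above only shows the constructor.

```java
// A sketch of the complete container. get and set are hypothetical
// accessors added for illustration.
class Box<T> {
  private final T[] data;

  @SuppressWarnings("unchecked")
  Box(int capacity) {
    // Really an Object[] at runtime. Safe only because the array
    // never leaves this class typed as T[].
    data = (T[]) new Object[capacity];
  }

  void set(int index, T value) {
    data[index] = value;   // the compiler has already checked that value is a T
  }

  T get(int index) {
    return data[index];    // callers get a compiler-inserted cast to their T
  }
}

class BoxDemo {
  public static void main(String[] args) {
    Box<String> box = new Box<>(2);
    box.set(0, "hello");
    String s = box.get(0);
    System.out.println(s);   // prints "hello"
  }
}
```

Note that only single elements cross the class boundary. A method that returned data itself as a T[] would hand callers an Object[] in disguise, and a caller expecting a String[] would get a ClassCastException.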

This is a standard idiom. Java's own ArrayList does something very similar: it stores its elements in an Object[] and casts each element to the element type on the way out. But it raises an obvious question. Why can't we just write new T[capacity] and skip the dance?

Type erasure

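Here's the rule that explains it. Type erasure means generic type parameters exist only at compile time. The compiler uses them to check your code, then replaces each one with its bound (Object, for an unbounded T) and inserts casts where they're needed. By the time your program runs, the parameters are gone.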

So when you write this:

Box<String> b = new Box<String>(10);

the compiler checks every use of b against String, and then erases the type argument. At runtime, what's left behaves as if you'd written:

Box b = new Box(10);

The angle brackets are gone. The running program has no idea that b was ever a "box of String." And that is precisely why new T[capacity] is illegal: building an array requires knowing its component type at runtime, and by the time the program runs, T has been erased. There's no concrete type left for the array creation to use, so the language disallows the syntax outright rather than let you create something unsound.
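If array creation needs the component type at runtime, one way around the restriction is to hand that type over explicitly. Here's a sketch, assuming a hypothetical TypedBox class that takes a Class<T> token and passes it to the standard java.lang.reflect.Array.newInstance method:

```java
import java.lang.reflect.Array;

// TypedBox is a hypothetical variant of Box, named for this example only.
class TypedBox<T> {
  private final T[] data;

  @SuppressWarnings("unchecked")
  TypedBox(Class<T> elementType, int capacity) {
    // Array.newInstance returns Object, so a cast is still required, but
    // the array it creates really has elementType as its component type.
    // Assumes elementType is a reference type, not int.class and friends.
    data = (T[]) Array.newInstance(elementType, capacity);
  }

  Class<?> componentType() {
    return data.getClass().getComponentType();
  }
}

class TypedBoxDemo {
  public static void main(String[] args) {
    TypedBox<String> box = new TypedBox<>(String.class, 4);
    System.out.println(box.componentType());   // prints "class java.lang.String"
  }
}
```

The cost is that every caller now has to pass String.class or similar, which is one reason the Object[] cast is often the more convenient choice.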

What else erasure explains

The array restriction is the one people hit first, but it's not alone. Several of Java's generic limitations are the same fact wearing different hats:

  • All instantiations share one class. Box<String> and Box<Integer> compile to the same bytecode. There is only one Box class at runtime, not one per type argument.
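  • instanceof can't ask about a type parameter. You can't write x instanceof T, and you can't ask an arbitrary Object whether it is a Box<String>: obj instanceof Box<String> is rejected. (Recent Java versions accept a parameterized type here only when the compiler can already prove the type argument statically.) The type argument isn't there at runtime to test against, so the check would be meaningless.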
  • You can't do new T() either, for the same reason new T[] fails: there's no concrete type to construct.

None of these are arbitrary rules someone added to make your life harder. They all fall out of one decision: the type argument is not around at runtime to act on.
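The first two points are easy to check for yourself. A small sketch, using java.util lists in place of Box (ErasureDemo and its method names are made up for this example):

```java
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {
  // Two different type arguments, one runtime class.
  static boolean sameRuntimeClass() {
    List<String> strings = new ArrayList<>();
    List<Integer> numbers = new ArrayList<>();
    return strings.getClass() == numbers.getClass();
  }

  // The unbounded wildcard is allowed, because it only asks
  // "is this some kind of List?" and never mentions an element type.
  static boolean isSomeList(Object o) {
    return o instanceof List<?>;
  }

  public static void main(String[] args) {
    System.out.println(sameRuntimeClass());                     // prints "true"
    System.out.println(isSomeList(new ArrayList<Integer>()));   // prints "true"
    System.out.println(isSomeList("not a list"));               // prints "false"
  }
}
```

Swapping List<?> for List<String> in isSomeList turns the method into a compile error, which is the instanceof restriction in action.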

Why Java erases

Other languages made different choices. C# keeps generic type information at runtime; C++ templates generate a separate concrete class for each type you use. So why did Java erase?

Because of history. Generics arrived in Java 5, in 2004, nearly a decade after the language shipped. By then there was already an enormous body of pre-generic code in the wild, code full of raw types like List and ArrayList with no angle brackets at all. The language designers wanted new generic code and old raw code to keep working together, in the same program, without a painful migration. Erasure made that possible: because Box<String> and a raw Box are the same type underneath, the new and the old interoperate seamlessly.
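Here's roughly what that interoperability looks like in practice. In this sketch, legacyAppend stands in for pre-generics library code; both it and the InteropDemo class are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class InteropDemo {
  // Stands in for pre-Java-5 code: a raw List, no type arguments.
  @SuppressWarnings({"rawtypes", "unchecked"})
  static void legacyAppend(List list) {
    list.add("added by legacy code");
  }

  static int sizeAfterLegacyCall() {
    List<String> modern = new ArrayList<>();
    modern.add("added by generic code");
    legacyAppend(modern);   // a List<String> is accepted where a raw List is expected
    return modern.size();
  }

  public static void main(String[] args) {
    System.out.println(sizeAfterLegacyCall());   // prints "2"
  }
}
```

The flip side is that the raw method could just as easily add an Integer to the list, and nothing would complain until some later code read it back as a String. That's why the compiler warns about raw types.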

The limitations we walked through (the array problem, the missing runtime type, the broken instanceof) are the price Java paid for that backward compatibility. Whether it was the right call is a fun thing to argue about. But knowing that it was the call, and why, turns a pile of confusing compiler errors into a single idea you can reason about.

So the next time new T[capacity] won't compile, you'll know it isn't the compiler being difficult. It's just type erasure, doing exactly what it was designed to do.