Maximum number of values ​​in enum Part II

Part One, Theoretical  | Part two, practical



We continue to search for the maximum possible number of values ​​in the enumeration.
This time we will focus on the practical side of the issue and see how the IDE, the compiler and the JVM will respond to our achievements.

Content


  
  Javac Tools
  Extract method
  Dynamic Class-File Constants
    Sudden Difficulties
    Bright Future
  Unsafe
  Test
    Javac and Switch Performance
  Conclusion
  Additional Resources


Tools


Javac takes care of us: it cuts out characters that it doesn’t like from identifiers and forbids inheriting from it java.lang.Enum, so for experiments we need other tools.

We will test hypotheses using asmtools - assembler and disassembler for JVM, and generate class-files on an industrial scale - using the ASM library .

For simplicity of understanding, the essence of what is happening will be duplicated in a java-like pseudocode.


Javac


As a starting point, it is logical to take the best result, achievable without tricks, with the help of only one javac. Everything is simple here - we create the source file with the enumeration and add elements to it until javac refuses to compile it with the “code too large” curse.

Quite a long time, since Java 1.7, this number has been kept at the level of 2_746 elements. But somewhere after Java 11, there were changes in the algorithm for storing values ​​in the constant pool and the maximum number decreased to 2_743. Yes, yes, just because of changing the order of the elements in the pool of constants!

We will focus on the best of the values.


Extract method


Since one of the limiting factors is related to the size of the bytecode in the static initialization block, we will try to make the latter as easy as possible.

Recall how it looks on the example of the enumeration FizzBuzzfrom the first part. Comments provide appropriate assembly instructions.

static {}
static  {
    Fizz = new FizzBuzz("Fizz", 0);
    //  0: new           #2                  // class FizzBuzz
    //  3: dup
    //  4: ldc           #22                 // String Fizz
    //  6: iconst_0
    //  7: invokespecial #24                 // Method "<init>":(Ljava/lang/String;I)V
    // 10: putstatic     #25                 // Field Fizz:LFizzBuzz;
    Buzz = new FizzBuzz("Buzz", 1);
    // 13: new           #2                  // class FizzBuzz
    // 16: dup
    // 17: ldc           #28                 // String Buzz
    // 19: iconst_1
    // 20: invokespecial #24                 // Method "<init>":(Ljava/lang/String;I)V
    // 23: putstatic     #30                 // Field Buzz:LFizzBuzz;
    FizzBuzz = new FizzBuzz("FizzBuzz", 2);
    // 26: new           #2                  // class FizzBuzz
    // 29: dup
    // 30: ldc           #32                 // String FizzBuzz
    // 32: iconst_2
    // 33: invokespecial #24                 // Method "<init>":(Ljava/lang/String;I)V
    // 36: putstatic     #33                 // Field FizzBuzz:LFizzBuzz;

    $VALUES = new FizzBuzz[] {
    // 39: iconst_3
    // 40: anewarray     #2                  // class FizzBuzz
        Fizz, 
    // 43: dup
    // 44: iconst_0
    // 45: getstatic     #25                 // Field Fizz:LFizzBuzz;
    // 48: aastore
        Buzz, 
    // 49: dup
    // 50: iconst_1
    // 51: getstatic     #30                 // Field Buzz:LFizzBuzz;
    // 54: aastore
        FizzBuzz
    // 55: dup
    // 56: iconst_2
    // 57: getstatic     #33                 // Field FizzBuzz:LFizzBuzz;
    // 60: aastore
    };
    // 61: putstatic     #1                  // Field $VALUES:[LFizzBuzz;
    // 64: return
}


The first thing that comes to mind is to put the creation and filling of the array $VALUESinto a separate method.

$VALUES = createValues();

Developing this idea, the creation of instances of enumeration elements can be transferred to the same method:

static  {
    FizzBuzz[] localValues = createValues();

    int index = 0;
    Fizz = localValues[index++];
    Buzz = localValues[index++];
    FizzBuzz = localValues[index++];

    $VALUES = localValues;
}

private static FizzBuzz[] createValues() {
    return new FizzBuzz[] {
        new FizzBuzz("Fizz", 0), 
        new FizzBuzz("Buzz", 1), 
        new FizzBuzz("FizzBuzz", 2)
    };
}

Already better, but each capture of an array element and the subsequent index increment cost 6 bytes, which is too expensive for us. Put them out in a separate method.


private static int valueIndex;

static  {
    $VALUES = createValues();

    valueIndex = 0;
    Fizz = nextValue();
    Buzz = nextValue();
    FizzBuzz = nextValue();
}

private static FizzBuzz nextValue() {
    return $VALUES[valueIndex++];
}

It takes 11 bytes to initialize $VALUES, valueIndexand return from the static initialization block , and 65_524 more bytes remain to initialize the fields. Initialization of each field requires 6 bytes, which enables us to create an enumeration of 10_920 elements.

Almost four times growth compared to javac must definitely be celebrated by code generation!

Generator source code: ExtractMethodHugeEnumGenerator.java Generated
class example: ExtractMethodHugeEnum.class

Dynamic Class-File Constants


It's time to remember about JEP 309 and its mysterious dynamic constants .

The essence of innovation in a nutshell:

To already existing types supported by a pool of constants added another one CONSTANT_Dynamic. When loading a class, the type of such a constant is known, but its value is unknown. The first loading of a constant leads to a call to the bootstrap method specified in its declaration.

The result of this method becomes a constant value. There are no ways to change the value associated with an already initialized constant. Which is quite logical for a constant.

If you also thought of Singleton, then forget it immediately. The specification separately emphasizes that there is no guarantee of thread safety in this case, and the initialization method in multi-threaded code can be called more than once. It is only guaranteed that in the case of several calls to the bootstrap method for the same constant, the JVM will throw a coin and select one of the calculated values ​​for the role of the constant value, and the others will be sacrificed to the garbage collector.

Behaviorally, a CONSTANT_Dynamic constant is resolved by executing its bootstrap method on the following parameters:

  1. a local Lookup object,
  2. the String representing the name component of the constant,
  3. the Class representing the expected constant type, and
  4. any remaining bootstrap arguments.

As with invokedynamic, multiple threads can race to resolve, but a unique winner will be chosen and any other contending answers discarded.

To load values ​​from the pool of constants in the bytecode, commands are provided ldc, ldc_wand ldc2_w. Of interest to us is the first of them - ldc.

It, unlike the others, is able to load values ​​only from the first 255 slots of the constant pool, but it takes 1 byte less in bytecode. All this gives us savings of up to 255 bytes and an 255 + ((65_524 - (255 * 5)) / 6) = 10_963element in the enumeration. This time the growth is not so impressive, but it is still there.

Armed with this knowledge, let's get started.

In the static initialization block, instead of method calls, nextValue()we will now load the value of the dynamic constant. The value of the ordinalordinal index of the enumeration element will be passed explicitly, thus getting rid of the field valueIndex, the factory methodnextValue()and doubts about the thread safety of our implementation.

As a bootstrap method, we will use a special subtype of MethodHandle that imitates the behavior of an operator newin Java. The standard library provides a MethodHandles.Lookup :: findConstructor () method for obtaining such a method handle , but in our case, the JVM will take care of the construction of the necessary method handle.

To use the constructor of our enumeration as a bootstrap method, it will have to be slightly modified by changing the signature. The parameters required for the bootstrap method will be added to the traditional constructor of the name enumeration element and serial number:

private FizzBuzz(MethodHandles.Lookup lookup, String name, Class<?> enumClass, int ordinal) {
    super(name, ordinal);
}

In the form of pseudo-code, initialization will look like this:

static  {
    Fizz = JVM_ldc(FizzBuzz::new, "Fizz", 0);
    Buzz = JVM_ldc(FizzBuzz::new, "Buzz", 1);
    FizzBuzz = JVM_ldc(FizzBuzz::new, "FizzBuzz", 2);

    $VALUES = createValues();
}

In the example above, the instructions are ldcdesignated as method calls JVM_ldc(), in the bytecode in their place will be the corresponding JVM instructions.

Since now we have a separate constant for each element of the enumeration, the creation and filling of the array $VALUEScan also be implemented through a dynamic constant. The bootstrap method is very simple:

private static FizzBuzz[] createValues(MethodHandles.Lookup lookup, String name, Class<?> clazz, FizzBuzz... elements) {
    return elements;
}

All the trick in the list of static parameters for this dynamic constant, there we will list all the elements that we want to put in $VALUES:

BootstrapMethods:
  ...
  1: # 54 REF_invokeStatic FizzBuzz.createValues: (Ljava / lang / invoke / MethodHandles $ Lookup; Ljava / lang / String; Ljava / lang / Class; [LFizzBuzz;) [LFizzBuzz;
    Method arguments:
      # 1 # 0: Fizz: LFizzBuzz;
      # 2 # 0: Buzz: LFizzBuzz;
      # 3 # 0: FizzBuzz: LFizzBuzz;

The JVM spoils the array from these static parameters and passes it to our bootstrap method as a vararg parameter elements. The maximum number of static parameters is traditional 65_535, so it is guaranteed to be enough for all elements of the enumeration, no matter how many there are.

For transfers with a large number of elements, this change will reduce the size of the resulting class file, and in the case when, due to the large number of elements, the method createValues()had to be split into several parts, it also saves slots in the constant pool.
And in the end, it's just beautiful.


Sudden difficulties


Which we heroically overcome by generating classes manually.

High-level libraries provide a convenient interface in exchange for some restriction of freedom of action. The ASM library we use to generate class files is no exception. It does not provide mechanisms for directly controlling the contents of the pool of constants. This is usually not very important, but not in our case.

As you remember, we need the first 255 elements of the constant pool to save precious bytes in the static initialization block. When dynamic constants are added in a standard way, they will be located at random indices and mixed with other elements that are not so critical for us. This will prevent us from reaching the maximum.

Fragment of a pool of constants formed in the traditional way
Constant pool:
   # 1 = Utf8 FizzBuzz
   #2 = Class              #1             // FizzBuzz
   #3 = Utf8               java/lang/Enum
   #4 = Class              #3             // java/lang/Enum
   #5 = Utf8               $VALUES
   #6 = Utf8               [LFizzBuzz;
   #7 = Utf8               valueIndex
   #8 = Utf8               I
   #9 = Utf8               Fizz
  #10 = Utf8               LFizzBuzz;
  #11 = Utf8               Buzz
  #12 = Utf8               FizzBuzz
  #13 = Utf8               values
  #14 = Utf8               ()[LFizzBuzz;
  #15 = NameAndType        #5:#6          // $VALUES:[LFizzBuzz;
  #16 = Fieldref           #2.#15         // FizzBuzz.$VALUES:[LFizzBuzz;
  #17 = Class              #6             // "[LFizzBuzz;"
  #18 = Utf8               clone
  #19 = Utf8               ()Ljava/lang/Object;
  #20 = NameAndType        #18:#19        // clone:()Ljava/lang/Object;
  #21 = Methodref          #17.#20        // "[LFizzBuzz;".clone:()Ljava/lang/Object;
  ...
  #40 = NameAndType        #9:#10         // Fizz:LFizzBuzz;
  #41 = Dynamic            #0:#40         // #0:Fizz:LFizzBuzz;
  #42 = Fieldref           #2.#40         // FizzBuzz.Fizz:LFizzBuzz;
  #43 = NameAndType        #11:#10        // Buzz:LFizzBuzz;
  #44 = Dynamic            #0:#43         // #0:Buzz:LFizzBuzz;
  #45 = Fieldref           #2.#43         // FizzBuzz.Buzz:LFizzBuzz;
  #46 = NameAndType        #12:#10        // FizzBuzz:LFizzBuzz;
  #47 = Dynamic            #0:#46         // #0:FizzBuzz:LFizzBuzz;
  #48 = Fieldref           #2.#46         // FizzBuzz.FizzBuzz:LFizzBuzz;



Fortunately, there is a workaround - when creating a class, you can specify a sample class from which a pool of constants and an attribute with a description of bootstrap methods will be copied. Only now we have to generate it manually.

In fact, it is not as difficult as it might seem at first glance. The format of the class-file is quite simple and its manual generation is a somewhat tedious process, but not at all complicated.

The most important thing here is a clear plan. To enumerate from the COUNTelements we need:

  • COUNTtype records CONSTANT_Dynamic- our dynamic constants
  • COUNTtype records CONSTANT_NameAndType- pairs of links to the name of the enumeration element and its type. The type will be the same for everyone, this is the class type of our enumeration.
  • COUNTtype records CONSTANT_Utf8- directly the names of the enumeration elements
  • COUNTrecords of type CONSTANT_Integer- serial numbers of enumeration elements passed to the constructor as a parameter valueordinal
  • names of the current and base classes, attributes, method signatures and other boring implementation details. Those interested can look in the source code of the generator.

There are a lot of constituent elements in the pool of constants that refer to other elements of the pool by index, so all the indices we need should be calculated in advance, elementNamesis a list of the names of the elements of our enumeration:

int elementCount = elementNames.size();

int baseConDy = 1;
int baseNameAndType = baseConDy + elementCount;
int baseUtf8 = baseNameAndType + elementCount;
int baseInteger = baseUtf8 + elementCount;
int indexThisClass = baseInteger + elementCount;
int indexThisClassUtf8 = indexThisClass + 1;
int indexSuperClass = indexThisClassUtf8 + 1;
int indexSuperClassUtf8 = indexSuperClass + 1;
int indexBootstrapMethodsUtf8 = indexSuperClassUtf8 + 1;
int indexConDyDescriptorUtf8 = indexBootstrapMethodsUtf8 + 1;
int indexBootstrapMethodHandle = indexConDyDescriptorUtf8 + 1;
int indexBootstrapMethodRef = indexBootstrapMethodHandle + 1;
int indexBootstrapMethodNameAndType = indexBootstrapMethodRef + 1;
int indexBootstrapMethodName = indexBootstrapMethodNameAndType + 1;
int indexBootstrapMethodDescriptor = indexBootstrapMethodName + 1;

int constantPoolSize = indexBootstrapMethodDescriptor + 1;

After that, we begin to write.

At the beginning - the signature of the class file, the four bytes known to everyone 0xCA 0xFE 0xBA 0xBEand the file format version:

// Class file header
u4(CLASS_FILE_SIGNATURE);
u4(version);

Then - a pool of constants:

Pool of constants
// Constant pool
u2(constantPoolSize);

// N * CONSTANT_Dynamic
for (int i = 0; i < elementCount; i++) {
    u1u2u2(CONSTANT_Dynamic, i, baseNameAndType + i);
}

// N * CONSTANT_NameAndType
for (int i = 0; i < elementCount; i++) {
    u1u2u2(CONSTANT_NameAndType, baseUtf8 + i, indexConDyDescriptorUtf8);
}

// N * CONSTANT_Utf8
//noinspection ForLoopReplaceableByForEach
for (int i = 0; i < elementCount; i++) {
    u1(CONSTANT_Utf8);
    utf8(elementNames.get(i));
}

// N * CONSTANT_Integer
for (int i = 0; i < elementCount; i++) {
    u1(CONSTANT_Integer);
    u4(i);
}

// ThisClass
u1(CONSTANT_Class);
u2(indexThisClassUtf8);

// ThisClassUtf8
u1(CONSTANT_Utf8);
utf8(enumClassName);

// SuperClass
u1(CONSTANT_Class);
u2(indexSuperClassUtf8);

// SuperClassUtf8
u1(CONSTANT_Utf8);
utf8(JAVA_LANG_ENUM);

// BootstrapMethodsUtf8
u1(CONSTANT_Utf8);
utf8(ATTRIBUTE_NAME_BOOTSTRAP_METHODS);

// ConDyDescriptorUtf8
u1(CONSTANT_Utf8);
utf8(binaryEnumClassName);

// BootstrapMethodHandle
u1(CONSTANT_MethodHandle);
u1(REF_newInvokeSpecial);
u2(indexBootstrapMethodRef);

// BootstrapMethodRef
u1u2u2(CONSTANT_Methodref, indexThisClass, indexBootstrapMethodNameAndType);

// BootstrapMethodNameAndType
u1u2u2(CONSTANT_NameAndType, indexBootstrapMethodName, indexBootstrapMethodDescriptor);

// BootstrapMethodName
u1(CONSTANT_Utf8);
utf8(BOOTSTRAP_METHOD_NAME);

// BootstrapMethodDescriptor
u1(CONSTANT_Utf8);
utf8(BOOTSTRAP_METHOD_DESCRIPTOR);


After the pool constant talking about access modifiers and flags ( public, final, enun, and so on), the class name and its ancestor:

u2(access);
u2(indexThisClass);
u2(indexSuperClass);

The dummy class we generated will have no interfaces, no fields, no methods, but there will be one attribute with a description of bootstrap methods:

// Interfaces count
u2(0);
// Fields count
u2(0);
// Methods count
u2(0);
// Attributes count
u2(1);

And here is the body of the attribute itself:

// BootstrapMethods attribute
u2(indexBootstrapMethodsUtf8);
// BootstrapMethods attribute size
u4(2 /* num_bootstrap_methods */ + 6 * elementCount);
// Bootstrap method count
u2(elementCount);

for (int i = 0; i < elementCount; i++) {
    // bootstrap_method_ref
    u2(indexBootstrapMethodHandle);
    // num_bootstrap_arguments
    u2(1);
    // bootstrap_arguments[1]
    u2(baseInteger + i);
}

That's all, the class is formed. We take these bytes and create from them ClassReader:

private ClassReader getBootstrapClassReader(int version, int access, String enumClassName, List<String> elementNames) {
    byte[] bootstrapClassBytes = new ConDyBootstrapClassGenerator(
        version,
        access,
        enumClassName,
        elementNames
    )
    .generate();

    if (bootstrapClassBytes == null) {
        return null;
    } else {
        return new ClassReader(bootstrapClassBytes);
    }
}

It was not so difficult.

Generator Source Code: ConDyBootstrapClassGenerator.java

Bright future


We briefly digress from our listings:


public class DiscoverConstantValueAttribute {

    public static final String STRING = "Habrahabr, world!";

    public static final Object OBJECT = new Object();

}


In the static initialization block of this class, there will suddenly be only one write operation, in the field OBJECT:


static {
    OBJECT = new Object();
    //  0: new           #2                  // class java/lang/Object
    //  3: dup
    //  4: invokespecial #1                  // Method java/lang/Object."<init>":()V
    //  7: putstatic     #7                  // Field OBJECT:Ljava/lang/Object;
    // 10: return
}


But what about STRING?
The team will help shed light on this riddle javap -c -s -p -v DiscoverConstantValueAttribute.class, here is the fragment that interests us:


public static final java.lang.String STRING;
  descriptor: Ljava/lang/String;
  flags: (0x0019) ACC_PUBLIC, ACC_STATIC, ACC_FINAL
  ConstantValue: String Habrahabr, world!


The value of the static final field has moved from the initialization block to a separate attribute ConstantValue. Here's what they write about this attribute in JVMS11 §4.7.2 :

A ConstantValue attribute represents the value of a constant expression (JLS §15.28), and is used as follows:
  • If the ACC_STATIC flag in the access_flags item of the field_info structure is set, then the field represented by the field_info structure is assigned the value represented by its ConstantValue attribute as part of the initialization of the class or interface declaring the field (§5.5). This occurs prior to the invocation of the class or interface initialization method of that class or interface (§2.9.2).
  • Otherwise, the Java Virtual Machine must silently ignore the attribute.


If such an attribute occurs at the same time staticand final(although the latter is not explicitly spelled out here) in a field, then such a field is initialized with the value from this attribute. And this happens even before the static initialization method is called.

It would be tempting to use this attribute to initialize the elements of the enumeration, in our chapter before last there were just constants, albeit dynamic ones.

And we are not the first to think in this direction, there is a mention in JEP 309 ConstantValue. Unfortunately, this mention is in the Future work chapter:

Future work

Possible future extensions include:

...
  • Attaching dynamic constants to the ConstantValue attribute of static fields


In the meantime, we can only dream about the times when this feature will move from the “good to do” state to “ready”. Then the restrictions on the size of the code in the initialization block will lose their influence and the maximum number of elements in the enumeration will determine the limitations of the constant pool.

According to rough estimates, in this case we can hope for an 65 489 / 4 = 16_372element. Here 65_489is the number of unoccupied slots of the constant pool, 46 of the theoretically possible 65_535 went to overhead. 4- the number of slots required for the declaration of one field and the corresponding dynamic constant.

The exact number, of course, can be found out only after the release of the JDK version with support for this feature.


Unsafe


Our enemy is the linear growth of the initialization block with an increase in the number of enumeration elements. If we had found a way to curtail initialization in a loop, thereby removing the relationship between the number of elements in the enumeration and the size of the initialization block, we would make another breakthrough.

Unfortunately, none of the standard public APIs allows writing to static finalfields even inside a static initialization block. Neither Reflection nor VarHandles will help here. Our only hope is great and terrible sun.misc.Unsafe.

An unsafe execution of FizzBuzz might look something like this:

Unsafe FizzBuzz
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public enum FizzBuzz {

    private static final FizzBuzz[] $VALUES;

    public static final FizzBuzz Fizz;
    public static final FizzBuzz Buzz;
    public static final FizzBuzz FizzBuzz;

    public static FizzBuzz[] values() {
        return (FizzBuzz[]) $VALUES.clone();
    }

    public static FizzBuzz valueOf(String name) {
        return (FizzBuzz) Enum.valueOf(FizzBuzz.class, name);
    }

    private FizzBuzz(String name, int ordinal) {
        super(name, ordinal);
    }

    private static FizzBuzz[] createValues() {
        return new FizzBuzz[] {
            Fizz,
            Buzz,
            FizzBuzz
        }
    }

    static  {
        Field unsafeField = Unsafe.class.getDeclaredField("theUnsafe");
        unsafeField.setAccessible(true);
        Unsafe unsafe = (Unsafe) unsafeField.get(null);

        String[] fieldNames = "Fizz,Buzz,FizzBuzz".split(",");
        for(int i = 0; i < fieldNames.length; i++) {
            String fieldName = fieldNames[i];
            Field field = FizzBuzz.class.getDeclaredField(fieldName);
            long fieldOffset = unsafe.staticFieldOffset(field);
            unsafe.putObject(FizzBuzz.class, fieldOffset, new FizzBuzz(fieldName, i));
        }

        $VALUES = createValues();
    }

}


This approach allows us to create an enumeration with approximately 21 thousand elements; for more, the capacity of the pool of constants is not enough.

The documentation on Enum :: ordinal () requires that its value matches the sequence number of the corresponding element in the enumeration declaration, so you have to explicitly store the list of field names in the correct order, thereby almost doubling the size of the class file.

public final int ordinal ()

Returns the ordinal of this enumeration constant (its position in its enum declaration, where the initial constant is assigned an ordinal of zero).

Here the public API to the contents of the pool of constants could help, we already know how to fill it in the order we need, but there is no such API and it is unlikely to ever be. The Class :: getConstantPool () method available in OpenJDK is declared as package-private and it would be rash to rely on it in user code.

The initialization block is now quite compact and almost independent of the number of elements in the enumeration, so you createValues()can refuse it by embedding its body in the loop:

static  {
    Field unsafeField = Unsafe.class.getDeclaredField("theUnsafe");
    unsafeField.setAccessible(true);
    Unsafe unsafe = (Unsafe) unsafeField.get(null);

    String[] fieldNames = "Fizz,Buzz,FizzBuzz".split(",");
    FizzBuzz[] localValues = new FizzBuzz[fieldNames.length];
    for(int i = 0; i < fieldNames.length; i++) {
        String fieldName = fieldNames[i];
        Field field = FizzBuzz.class.getDeclaredField(fieldName);
        long fieldOffset = unsafe.staticFieldOffset(field);
        unsafe.putObject(
            FizzBuzz.class,
            fieldOffset,
            (localValues[i] = new FizzBuzz(fieldName, i))
        );
    }

    $VALUES = localValues;
}

Here an avalanche-like process happens: along with the method createValues(), instructions for reading fields of enumeration elements disappear, type records Fieldreffor these fields become unnecessary , and therefore type NameAndTyperecords for type records Fieldref. In the constant pool, 2 * < >slots are freed up that can be used to declare additional enumeration elements.

But not everything is so rosy, tests show a significant performance drawdown: initializing an enumeration class with 65 thousand elements takes unthinkable one and a half minutes. As it turned out pretty quickly, "reflex slows down."

The implementation of Class :: getDeclaredField () in OpenJDK has a linear asymptotic behavior of the number of fields in the class, and our initialization block is quadratic because of this.

Adding caching improves the situation somewhat, although it does not completely solve it:

static  {
    Field unsafeField = Unsafe.class.getDeclaredField("theUnsafe");
    unsafeField.setAccessible(true);
    Unsafe unsafe = (Unsafe) unsafeField.get(null);

    String[] fieldNames = "Fizz,Buzz,FizzBuzz".split(",");
    Field[] fields = FizzBuzz.class.getDeclaredFields();
    HashMap<String, Field> cache = new HashMap<>(fields.length);

    for(Field field : fields) {
        cache.put(field.getName(), field);
    }

    for (int i = 0; i < fieldNames.length; i++) {
        String fieldName = fieldNames[i];
        Field field = cache.get(fieldName);
        long fieldOffset = unsafe.staticFieldOffset(field);
        unsafe.putObject(
            FizzBuzz.class,
            fieldOffset,
            (localValues[i] = new FizzBuzz(fieldName, i))
        );
    }    

    $VALUES = localValues;
}

The unsafe approach described in this chapter allows you to create transfers with the number of elements up to 65_410, which is almost 24 times more than the result achievable with javac and is quite close to the theoretical limit of 65_505 elements calculated by us in the previous publication of the cycle.


Check performance


For tests, we take the largest enumeration, generating it using the command java -jar HugeEnumGen.jar -a Unsafe UnsafeHugeEnum. As a result, we get a class file with a size of 2 megabytes and 65_410 elements.

Create a new Java project in IDEA and add the generated class as an external library.

Almost immediately, it becomes apparent that IDEA is not ready for such a stress test:



Auto-completion of an enumeration element takes tens of seconds both on the ancient mobile i5 and on the more modern i7 8700K. And if you try using quick fix to add the missing elements to the switch, then IDEA even stops redrawing the windows. I suspect that temporarily, but failed to wait for completion. Responsiveness during debugging also leaves much to be desired.

Let's start with a small number of elements in switch:

public class TestFew {

    public static void main(String... args) {
        for(String arg : args) {
            System.out.print(arg + " : ");

            try {
                UnsafeHugeEnum value = UnsafeHugeEnum.valueOf(arg);

                doSwitch(value);
            } catch(Throwable e) {
                e.printStackTrace(System.out);
            }
        }
    }

    private static void doSwitch(UnsafeHugeEnum value) {
        switch(value) {
            case VALUE_00001:
                System.out.println("First");
                break;
            case VALUE_31415:
                System.out.println("(int) (10_000 * Math.PI)");
                break;
            case VALUE_65410:
                System.out.println("Last");
                break;
            default:
                System.out.println("Unexpected value: " + value);
                break;
        }
    }

}

There are no surprises here, compilation and launch are regular:

$ java TestFew VALUE_00001 VALUE_00400 VALUE_31415 VALUE_65410
VALUE_00001 : First
VALUE_00400 : Unexpected value: VALUE_00400
VALUE_31415 : (int) (10_000 * Math.PI)
VALUE_65410 : Last

What about more items in switch? Can we, for example, process switchall of our 65 thousand elements in one at once?

switch(value) {
    case VALUE_00001:
    case VALUE_00002:
        ...
    case VALUE_65410:
        System.out.println("One of known values: " + value);
        break;
    default:
        System.out.println("Unexpected value: " + value);
        break;
}

Alas, no. When we try to compile, we get a whole bunch of error messages:

$ javac -fullversion
javac full version "14.0.1+7"

$ javac TestAll.java
TestAll.java:18: error: code too large for try statement
        switch(value) {
        ^
TestAll.java:65433: error: too many constants
                break;
                ^
TestAll.java:17: error: code too large
    private static void doSwitch(UnsafeHugeEnum value) {
                        ^
TestAll.java:1: error: too many constants
public class TestAll {
       ^
4 errors



Javac and switch


To understand what is happening, we have to figure out how the translation switchof the elements of the enumeration occurs .

The JVM specification has a separate chapter in JVMS11 §3.10 Compiling Switches , the recommendations of which boil down to switchusing one of the two bytecode instructions, tableswitchor lookupswitch. switchWe will not find any references to strings or enumeration elements in this chapter.

The best documentation is code, so it's time to dive into the source javac.

The choice between tableswitchand lookupswitchoccurs in Gen :: visitSwitch () and depends on the number of options in switch. In most cases, wins tableswitch:

// Determine whether to issue a tableswitch or a lookupswitch
// instruction.
long table_space_cost = 4 + ((long) hi - lo + 1); // words
long table_time_cost = 3; // comparisons
long lookup_space_cost = 3 + 2 * (long) nlabels;
long lookup_time_cost = nlabels;
int opcode =
    nlabels > 0 &&
    table_space_cost + 3 * table_time_cost <=
    lookup_space_cost + 3 * lookup_time_cost
    ?
    tableswitch : lookupswitch;

The header tableswitchis approximately 16 bytes plus 4 bytes per value. Thus, switchunder no circumstances can there be more ( 65_535 - 16 ) / 4 = 16_379elements.

Indeed, after reducing the number of branches casein the body switchto 16 thousand, there remains only one compilation error, the most mysterious:

TestAll.java:18: error: code too large for try statement
        switch(value) {
        ^

In search of the source of the error, we will return a little earlier, to the stage of getting rid of syntactic sugar. The switchmethods are responsible for the translation visitEnumSwitch(), mapForEnum()and the class EnumMappingin Lower.java .

There we also find a small documentary comment:

EnumMapping JavaDoc
/** This map gives a translation table to be used for enum
 *  switches.
 *
 *  <p>For each enum that appears as the type of a switch
 *  expression, we maintain an EnumMapping to assist in the
 *  translation, as exemplified by the following example:
 *
 *  <p>we translate
 *  <pre>
 *          switch(colorExpression) {
 *          case red: stmt1;
 *          case green: stmt2;
 *          }
 *  </pre>
 *  into
 *  <pre>
 *          switch(Outer$0.$EnumMap$Color[colorExpression.ordinal()]) {
 *          case 1: stmt1;
 *          case 2: stmt2
 *          }
 *  </pre>
 *  with the auxiliary table initialized as follows:
 *  <pre>
 *          class Outer$0 {
 *              synthetic final int[] $EnumMap$Color = new int[Color.values().length];
 *              static {
 *                  try { $EnumMap$Color[red.ordinal()] = 1; } catch (NoSuchFieldError ex) {}
 *                  try { $EnumMap$Color[green.ordinal()] = 2; } catch (NoSuchFieldError ex) {}
 *              }
 *          }
 *  </pre>
 *  class EnumMapping provides mapping data and support methods for this translation.
 */


The mysterious tryturns out to be part of an automatically generated helper class TestAll$0. Inside - a declaration of a static array and code to initialize it.

The array fixes the correspondence between the names of the enumeration elements and the switchnumerical values assigned to them during compilation , thereby protecting the compiled code from the harmful effects of refactoring.

When reordering, adding new ones, or deleting existing enumeration elements, some of them may change the value ordinal()and this is what an additional level of indirection protects from.

try {
    $SwitchMap$UnsafeHugeEnum[UnsafeHugeEnum.VALUE_00001.ordinal()] = 1;
    //  9: getstatic     #2                  // Field $SwitchMap$UnsafeHugeEnum:[I
    // 12: getstatic     #3                  // Field UnsafeHugeEnum.VALUE_00001:LUnsafeHugeEnum;
    // 15: invokevirtual #4                  // Method UnsafeHugeEnum.ordinal:()I
    // 18: iconst_1
    // 19: iastore
}
// 20: goto          24
catch(NoSuchFieldError e) { }
// 23: astore_0

The initialization code is simple and consumes from 15 to 17 bytes per element. As a result, the static initialization block accommodates the initialization of no more than 3_862 elements. This number turns out to be the maximum number of enumeration elements that we can use in one switchwith the current implementation javac.


Conclusion


We saw that the use of even such a simple technique as allocating the creation of enumeration elements and initializing an array $VALUESinto a separate method allows you to increase the maximum number of elements in an enumeration from 2_746 to 10_920.

The constant dynamic results against the background of previous achievements do not look very impressive and allow you to get only 43 elements more, but with this approach it’s much more elegant to add new properties to the enumeration - just modify the constructor and pass it the necessary values ​​through the static parameters of the dynamic constant.

If sometime in the future attribute ConstantValuewill be taught to understand the dynamic constants, this number could rise to 10 thousand to 16.

Usesun.misc.Unsafeallows you to make a giant leap and increase the maximum number of elements to 65_410. But do not forget that Unsafethis is a proprietary API that may disappear over time and its use is a considerable risk, as javac directly warns:

Test.java:3: warning: Unsafe is internal proprietary API and may be removed in a future release
import sun.misc.Unsafe;
               ^

But, as it turned out, it is not enough to generate a giant enumeration, you also need to be able to use it.

Currently, there are problems with the support of such enumerations both from the IDE and at the Java compiler level.

A large number of fields in the class can degrade the responsiveness of the IDE both during editing and during debugging. Sometimes up to a complete hang.

The restrictions imposed by the class file format and implementation details of javac make it impossible to use switchmore than 3_862 elements in the code at the same time. Of the positive aspects, it is worth mentioning that these can be arbitrary 3_862 elements.

Further improvement of the results is possible only through refinement of the Java compiler, but this is a completely different story.


Additional materials


GitHub source code: https://github.com/Maccimo/HugeEnumGeneratorArticle

Collected JAR file: https://github.com/Maccimo/HugeEnumGeneratorArticle/releases/tag/v1.0

Supported Startup Help

Huge enumeration generator

    https://github.com/Maccimo/HugeEnumGeneratorArticle

Additional information (in Russian):

    https://habr.com/ru/post/483392/
    https://habr.com/ru/post/501870/

Usage:
    java -jar HugeEnumGen.jar [ <options> ] <enum name>

    <enum name>
        An enumeration class name.
        Should be a valid Java identifier. May contain package name.

Options:

    -d <directory>
        Output directory path.
        Current working directory by default.

    -e <item list file>
        Path to UTF8-encoded text file with list of enumeration item names.
        Item names will be autogenerated if absent.
        Mutually exclusive with the -c option.

    -c <count>
        Count of autogenerated enumeration item names.
        Mutually exclusive with the -e option.
        Default value: Algorithm-depended

    -a <algorithm>
        Enumeration generation algorithm.
        Supported algorithms:
          ConDy          - Employ Constant Dynamic (JEP 309) for enum elements initialization
          ExtractMethod  - Extract enum elements initialization code to separate method
          Unsafe         - Employ sun.misc.Unsafe for enum elements initialization

        Default algorithm: ExtractMethod

    -h / -?
        Show this help page.

Example:

    java -jar HugeEnumGen.jar -d ./bin -c 2020 com.habr.maccimo.HugeEnum2020



All Articles