xiaobaoqiu Blog

Think More, Code Less

你真的了解String吗

String是Java中最重要最常用的类,没有之一。但是我们很多人可能只知道String的存在和简单使用,并不了解其背后的实现以及为什么这么实现。 这里我就简单的挑选几个重要的节点讲解。

本文基于JDK 1.7

1
2
3
4
xiaobaoqiu@xiaobaoqiu:~ java -version
java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)

实现

Java其实就是char数组的一个包装,String真实的内容就是存储在char数组中,数组的一个特性就是在逻辑和存储上都是连续的、可随机访问的。

1
2
/** The value is used for character storage. */
private final char value[];

String的操作其实都是围绕底层的char数组来的,比如trim实现,trim的其实是value的内容

1
2
3
4
5
6
7
8
9
10
11
12
13
public String trim() {
    int len = value.length;
    int st = 0;
    char[] val = value;    /* avoid getfield opcode */

    while ((st < len) && (val[st] <= ' ')) {
        st++;
    }
    while ((st < len) && (val[len - 1] <= ' ')) {
        len--;
    }
    return ((st > 0) || (len < value.length)) ? substring(st, len) : this;
}

需要注意的是,String中的很多实现都是使用 System.arraycopy 来实现数组数据的快速拷贝,System.arraycopy是一个native方法:

1
2
3
4
5
public char[] toCharArray() {
    char result[] = new char[value.length];
    System.arraycopy(value, 0, result, 0, value.length);
    return result;
}

final & immutable

String基本约定中最重要的一条是Immutable,Immutable的意思就是对String的每次修改都是生成一个新的String,原始对象不受影响,类似于copy-on-write的思想。

先具体看一下什么是Immutable。String不可变很简单,如下图,给一个已有字符串"abcd",第二次赋值成"abcedl",不是在原内存地址上修改数据,而是重新指向一个新对象,新地址。

那String怎么做到不可变?主要通过三个手段 (1).String类声明为final(不会被继承改掉这种Immutable特性); (2).底层的value数组声明为final(value这个引用地址不可变); (3).所有String的方法里很小心的没有去动value数组里的元素,没有暴露内部成员字段;

其实String是不可变的关键都在底层的实现,而不是一个final修饰符。

不可变对象带来的最大好处就是线程安全,因为不会被改写。

hashcode

String内部属性除了value数组外,还有一个 hash,用于记录String的hash值。 String能这样讲hash值cache下来的原因就是上面提到的

1
2
/** Cache the hash code for the string */
private int hash; // Default to 0

每次调用hashCode,先检测 hash 是否已经计算好了,如果计算了,就不要重新计算。

1
2
3
4
5
6
7
8
9
10
11
12
public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;

        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}

hash32

String中有一个hash32()的方法,它的结果被缓存在成员变量int hash32 中。

1
2
3
4
/**
 * Cached value of the alternative hashing algorithm result
 */
private transient int hash32 = 0;
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
/**
 * Calculates a 32-bit hash value for this string.
 *
 * @return a 32-bit hash value for this string.
 */
int hash32() {
    int h = hash32;
    if (0 == h) {
       // harmless data race on hash32 here.
       h = sun.misc.Hashing.murmur3_32(HASHING_SEED, value, 0, value.length);

       // ensure result is not zero to avoid recalcing
       h = (0 != h) ? h : 1;

       hash32 = h;
    }

    return h;
}

关于 hash32 为什么存在,可以看下面官网的说明:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
Java SE 7u6 introduces an improved, alternative hash function for the following map and map-derived collection implementations:

HashMap
Hashtable
HashSet
LinkedHashMap
LinkedHashSet
WeakHashMap
ConcurrentHashMap

The alternative hash function improves the performance of these map implementations when a large number of key hash collisions are encountered.

For Java SE 7u6, this alternative hash function is implemented as follows:

The alternative hash function is only applied to maps with a capacity larger than a specified threshold size. By default, the threshold is -1. This value disables the alternative hash function. To enable the alternative hash function, set the jdk.map.althashing.threshold system property to a different value. The recommended value is 512. Setting this system property to 512 causes all maps with a capacity larger than 512 entries to use the alternative hash function. You can set this system property to 0, which causes all maps to use the alternative hash function.

The following describes the jdk.map.althashing.threshold system property in more detail:

Value type: Integer
Value default: -1
Value range: From -1 to 2147483647, inclusive
Description: Threshold capacity at which maps use the alternative hash function. The value -1 is a synonym for 2147483647 (which is easier to remember). All other values correspond to threshold capacity.
For example, the following command runs the Java application MyApplication and sets the jdk.map.althashing.threshold system property to 512:

java -Djdk.map.althashing.threshold=512 MyApplication
If the alternative hash function is being used, then the iteration order of keys, values, and entities vary for each instance of HashMap, Hashtable, HashSet, and ConcurrentHashMap. This change in iteration order may cause compatibility issues with some programs. This is the reason that the alternative hash function is disabled by default. Hashing improvements will be investigated in future releases. In the meantime, the system property jdk.map.althashing.threshold is experimental. It is strongly recommended that you test your applications with the alternative hashing function enabled (by setting jdk.map.althashing.threshold to 0) to determine if your applications are affected by iteration order. If there is an impact, you should fix your applications as soon as possible because there is no guarantee of iteration order.

简单说就是为了解决 HashMap 这种hash 碰撞严重时候的性能,因为HashMap使用链地址法解决Hash碰撞,当链很长的时候,性能会受影响。 但是这个问题在Java 8 中已经解决了,因为Java 8中引入一个新的策略,当链比较长的时候(默认阈值10),会转化为采用红黑树实现后面的链表,可以讲时间从O(N)降到O(lgN).

参考: http://stackoverflow.com/questions/23716836/what-is-use-of-hash32-in-string-class http://docs.oracle.com/javase/7/docs/technotes/guides/collections/changes7.html http://stackoverflow.com/questions/23926312/difference-between-hash32-and-hash-in-java-string-object http://stackoverflow.com/questions/23926312/difference-between-hash32-and-hash-in-java-string-object/23926405#23926405

hash32这个方法最大的变化是,同一个字符串在不同的JVM上执行hash32()的结果可能不同(确切的说,多数情况下都会不同,因为其内部分别调用了一次System.currentTimeMillis()和两次System.nanoTime()用于初始化seed)。因此,某些容器在每次运行程序时迭代顺序都不同。

CaseInsensitiveComparator

在String中,有一个护额略大小写的比较方法,它的实现就是我们这里要说的CaseInsensitiveComparator

1
2
3
4
5
public int compareToIgnoreCase(String str) {
    return CASE_INSENSITIVE_ORDER.compare(this, str);
}

public static final Comparator<String> CASE_INSENSITIVE_ORDER = new CaseInsensitiveComparator();

CaseInsensitiveComparator的实现如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
private static class CaseInsensitiveComparator implements Comparator<String>, java.io.Serializable {
    // use serialVersionUID from JDK 1.2.2 for interoperability
    private static final long serialVersionUID = 8575799808933029326L;

    public int compare(String s1, String s2) {
        int n1 = s1.length();
        int n2 = s2.length();
        int min = Math.min(n1, n2);
        for (int i = 0; i < min; i++) {
            char c1 = s1.charAt(i);
            char c2 = s2.charAt(i);
            if (c1 != c2) {
                c1 = Character.toUpperCase(c1);
                c2 = Character.toUpperCase(c2);
                if (c1 != c2) {
                    c1 = Character.toLowerCase(c1);
                    c2 = Character.toLowerCase(c2);
                    if (c1 != c2) {
                        // No overflow because of numeric promotion
                        return c1 - c2;
                    }
                }
            }
        }
        return n1 - n2;
    }
}

我们发现比较时候需要现将字符转化为大小比较,再转化为小写比较。