Code points and code units are both related to string characters and their encoding. In this tutorial, we shall look at the difference between the two and see sample code for the same.

Code Unit

A code unit, as one would imagine, is the basic unit used for representing a character string in a particular encoding. More specifically, a code unit is the smallest fixed-size group of bits that the encoding works with, and every character is stored as a sequence of one or more code units. The size of the code unit depends on the encoding being used. For example, UTF-8, UTF-16 and UTF-32 have code units of 8, 16 and 32 bits respectively. In Java, a char value is one 16-bit UTF-16 code unit.

Having said that, some encodings use more than one code unit for representing characters which are outside the range of a single code unit. For example, with UTF-16 a single 16-bit code unit can represent the values from U+0000 to U+FFFF. This range of single code units is also known as the Basic Multilingual Plane (BMP). Unicode as a whole allows a maximum of 10FFFF+1 code points, and anything above U+FFFF is represented with 32 bits by making use of two 16-bit code units, known as a surrogate pair.
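To make the one-versus-two code unit case concrete, here is a minimal sketch. The class name CodeUnitDemo and the helper methods codeUnits and codePoints are made up for this illustration; only length(), codePointCount(), charAt() and Character.toChars() come from the Java API. U+1F600 is used simply as an example of a character outside the BMP.

```java
class CodeUnitDemo {

    // Number of UTF-16 code units (Java char values) in the string.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String bmp = "A"; // U+0041, inside the BMP
        String supplementary = new String(Character.toChars(0x1F600)); // U+1F600, outside the BMP

        System.out.println(codeUnits(bmp) + " " + codePoints(bmp));                     // 1 1
        System.out.println(codeUnits(supplementary) + " " + codePoints(supplementary)); // 2 1

        // The two 16-bit code units that together encode U+1F600
        System.out.println(Integer.toHexString(supplementary.charAt(0))); // d83d
        System.out.println(Integer.toHexString(supplementary.charAt(1))); // de00
    }
}
```

The same character counts as one code point but two code units, which is why length() alone cannot be trusted as a character count.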

Code Point

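A code point, on the other hand, is the number that identifies the actual character being represented, independent of the number of bits used by the encoding. It is up to the character set (Unicode, in the case of Java) to decide which code point represents which character, and up to the encoding to decide how each code point is turned into code units. Unicode code points range from U+0000 to U+10FFFF. Further, when code units wider than one byte are written out as bytes, they can be stored in big-endian or little-endian order (for example UTF-16BE and UTF-16LE). Byte order is a property of the serialized code units, not of the code points.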
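As a rough illustration of one code point being stored differently by different encodings, the sketch below prints the bytes produced for the euro sign (U+20AC). The class name EncodingDemo and the helper hex are hypothetical names chosen for this example; String.getBytes(Charset) and StandardCharsets are part of the standard library.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDemo {

    // Encodes the string with the given charset and returns the bytes as hex.
    static String hex(String s, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF)); // mask to print the byte as unsigned
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String euro = "\u20AC"; // a single code point, U+20AC

        System.out.println(hex(euro, StandardCharsets.UTF_8));    // E2 82 AC (three 8-bit code units)
        System.out.println(hex(euro, StandardCharsets.UTF_16BE)); // 20 AC (one 16-bit code unit, big-endian)
        System.out.println(hex(euro, StandardCharsets.UTF_16LE)); // AC 20 (same code unit, little-endian)
    }
}
```

The code point stays the same in all three cases. Only the code units, and the order of their bytes, change.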

Code points and code units in Java API

The Java String API has methods which can be used to get the code point of a particular character in the String. The “codePointAt” method takes an index, counted in char values (code units), and returns the code point value for the character at that position in the String. The following program demonstrates the use of the “codePointAt” method in the String class:

package com.example;

public class Main {

	public static void main(String[] args) {
		String str = "Test";
		// Index 1 is the character 'e', whose code point is U+0065 (decimal 101)
		System.out.println(str.codePointAt(1));
	}
}

Output:
101

Please note that “codePointAt” is one of several code point related methods in the String class (these are separate methods, not overloads of one another):

public int codePointAt(int index) -- Returns the character (Unicode code point) at the specified index. The index refers to char values (Unicode code units) and ranges from 0 to length() - 1
public int codePointBefore(int index) -- Returns the character (Unicode code point) before the specified index. The index refers to char values (Unicode code units) and ranges from 1 to length()
public int codePointCount(int beginIndex, int endIndex) -- Returns the number of Unicode code points in the specified text range of this String
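The sketch below exercises all three methods on a string that contains a character outside the BMP. The class name CodePointMethodsDemo and the helper codePointsOf are hypothetical names for this example. The loop is one possible way to walk a string code point by code point, not the only one.

```java
class CodePointMethodsDemo {

    // Collects the code points of a string, stepping over surrogate pairs.
    static int[] codePointsOf(String s) {
        int[] result = new int[s.codePointCount(0, s.length())];
        int n = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            result[n++] = cp;
            i += Character.charCount(cp); // advance 1 char for BMP, 2 for supplementary
        }
        return result;
    }

    public static void main(String[] args) {
        // 'a', U+1F600 written as a surrogate pair, 'b'
        String str = "a\uD83D\uDE00b";

        System.out.println(str.length());                        // 4 code units
        System.out.println(str.codePointCount(0, str.length())); // 3 code points

        System.out.println(Integer.toHexString(str.codePointAt(1)));     // 1f600 (index 1 starts the pair)
        System.out.println(Integer.toHexString(str.codePointAt(2)));     // de00 (index 2 is the second half only)
        System.out.println(Integer.toHexString(str.codePointBefore(3))); // 1f600 (pair ending before index 3)

        for (int cp : codePointsOf(str)) {
            System.out.println(String.format("U+%04X", cp)); // U+0061, U+1F600, U+0062
        }
    }
}
```

Note that codePointAt(2) lands in the middle of the pair and returns only the second surrogate, so indexes passed to these methods should fall on code point boundaries.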

Difference between CodePoint and CodeUnit in Java Strings admin Core Java