UTF8

class UTF8

The UTF8 class encapsulates a utf8 encoded array of characters and allows for easy encoding and decoding.

Public Functions

UTF8 &Assign(UTF8 &&in_utf8)

Moves the source UTF8 object to this object. This method is functionally equivalent to the overloaded assignment operator.

Parameters:in_utf8 – The source of the move.
Returns:A reference to this object.
UTF8 &Assign(UTF8 const &in_utf8)

Copies the source UTF8 object to this object. This method is functionally equivalent to the overloaded assignment operator.

Parameters:in_utf8 – The source of the copy.
Returns:A reference to this object.
inline char At(size_t in_index) const

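Retrieves the byte at the specified index of the utf8 encoded character array. Because a single code point can occupy more than one byte, the returned byte may be only part of a code point.

Parameters:in_index – The index of the byte to retrieve.
Returns:The byte at the specified index.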
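
To illustrate why indexing by byte can land in the middle of a code point, the following minimal sketch classifies a single byte using only the standard library. SequenceLength is a hypothetical helper, not part of the UTF8 class, and it does not reject overlong or out-of-range lead bytes such as 0xC0 or 0xF5.

```cpp
#include <cstddef>

// Hypothetical helper (not part of the UTF8 class).
// Returns the number of bytes in the UTF-8 sequence introduced by in_byte:
// 1-4 when in_byte matches a lead-byte pattern, 0 when it is a
// continuation byte (10xxxxxx) or matches no lead-byte pattern.
inline std::size_t SequenceLength(char in_byte)
{
    unsigned char const byte = static_cast<unsigned char>(in_byte);
    if (byte < 0x80)           return 1; // 0xxxxxxx: ASCII
    if ((byte & 0xE0) == 0xC0) return 2; // 110xxxxx
    if ((byte & 0xF0) == 0xE0) return 3; // 1110xxxx
    if ((byte & 0xF8) == 0xF0) return 4; // 11110xxx
    return 0;                            // continuation byte or no match
}
```

A result of 0 for a byte obtained from At() would suggest that the index points inside a multi-byte code point rather than at its start.
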
void Clear()

Resets all string data.

inline bool Empty() const

Indicates whether this utf8 string is empty.

Returns:true if the UTF8 string is empty, false otherwise.
inline char const *GetBytes() const

Retrieves the raw, utf8 encoded character array.

Returns:The utf8 encoded character array.
size_t GetHash() const

Returns a hash code for the utf8 encoded characters.

Returns:The size_t hash code.
inline size_t GetLength() const

Retrieves the number of bytes in the utf8 encoded string up to but not including the null terminator. This will return 0 if the utf8 object is uninitialized.

Returns:The number of bytes.
inline size_t GetWStrLength() const

Retrieves the number of wide characters in the wchar_t string up to but not including the null terminator. This will return 0 if the utf8 object is uninitialized.

Returns:The number of wide characters.
inline bool IsValid() const

Indicates whether this utf8 string has been initialized.

Returns:true if the UTF8 string has been initialized, false otherwise.
inline operator char const*() const

Allows typecasting to const char * by retrieving the raw, utf8 encoded character array.

Returns:The utf8 encoded character array.
inline bool operator!=(char const *in_utf8) const

This function is used to check a utf8-encoded character string for equivalence to this.

Parameters:in_utf8 – The utf8-encoded character string to compare to this object.
Returns:true if the objects are not equivalent, false otherwise.
inline bool operator!=(UTF8 const &in_utf8) const

This function is used to check an object for equivalence to this.

Parameters:in_utf8 – The object to compare to this.
Returns:true if the objects are not equivalent, false otherwise.
UTF8 operator+(char const *in_utf8) const

Creates a new UTF8 object by appending a utf8 encoded string to the end of this object.

Parameters:in_utf8 – A string, assumed to be utf8 encoded, used as the tail end of the new string.
Returns:A new UTF8 object representing the concatenation of 2 strings.
UTF8 operator+(UTF8 const &in_utf8) const

Creates a new UTF8 object by appending a UTF8 object to the end of this object.

Parameters:in_utf8 – The tail end of the new string.
Returns:A new UTF8 object representing the concatenation of 2 strings.
UTF8 &operator+=(char const *in_utf8)

Appends a utf8 encoded string to the end of this object.

Parameters:in_utf8 – A string, assumed to be utf8 encoded, used as the tail end of the new string.
Returns:A reference to this object.
UTF8 &operator+=(UTF8 const &in_utf8)

Appends a UTF8 object to the end of this object.

Parameters:in_utf8 – The tail end of the new string.
Returns:A reference to this object.
inline UTF8 &operator=(UTF8 &&in_utf8)

The move assignment operator takes control of the underlying data from the source utf8 string.

Parameters:in_utf8 – The source of the move.
Returns:A reference to this object.
inline UTF8 &operator=(UTF8 const &in_utf8)

Copies the source UTF8 object to this object.

Parameters:in_utf8 – The source of the copy.
Returns:A reference to this object.
bool operator==(char const *in_utf8) const

This function is used to check a utf8-encoded character string for equivalence to this.

Parameters:in_utf8 – The utf8-encoded character string to compare to this object.
Returns:true if the objects are equivalent, false otherwise.
bool operator==(UTF8 const &in_utf8) const

This function is used to check an object for equivalence to this.

Parameters:in_utf8 – The object to compare to this.
Returns:true if the objects are equivalent, false otherwise.
inline void Reset()

Resets this object to its initial, uninitialized state.

size_t ToWStr(wchar_t *out_wide_string) const

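Decodes a utf8 encoded string into a wide character buffer.

Parameters:out_wide_string – The wide character buffer that receives the decoded string. It must be large enough to hold the decoded string.
Returns:The number of wide characters (code points) in the wide string.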
size_t ToWStr(WCharArray &out_wide_string) const

Decodes a utf8 encoded string into a wide character buffer.

Parameters:out_wide_string – The wide character array that receives the decoded string.
Returns:The number of wide characters (code points) in the wide string.
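
As a rough illustration of what decoding utf8 into code points involves, the sketch below decodes into a std::u32string so that it behaves the same regardless of the platform's wchar_t width. DecodeUtf8 is a hypothetical function, not the library's implementation, and it assumes well-formed input without diagnosing errors.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not the library's implementation).
// Decodes well-formed UTF-8 and appends the code points to out_code_points.
// Returns the number of code points appended.
inline std::size_t DecodeUtf8(std::string const &in_utf8, std::u32string &out_code_points)
{
    std::size_t const start = out_code_points.size();
    std::size_t i = 0;
    while (i < in_utf8.size())
    {
        unsigned char const lead = static_cast<unsigned char>(in_utf8[i]);
        std::size_t extra = 0;       // continuation bytes expected after the lead byte
        char32_t code_point = lead;  // already correct for ASCII
        if ((lead & 0xE0) == 0xC0)      { extra = 1; code_point = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; code_point = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; code_point = lead & 0x07; }
        ++i;
        // Each continuation byte contributes its low six bits.
        for (; extra > 0 && i < in_utf8.size(); --extra, ++i)
            code_point = (code_point << 6) | (static_cast<unsigned char>(in_utf8[i]) & 0x3F);
        out_code_points.push_back(code_point);
    }
    return out_code_points.size() - start;
}
```

Like ToWStr(), the sketch reports how many characters it produced, so a caller can compare that count with the byte length of the input.
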
UTF8()

The default constructor creates an empty UTF8 string.

UTF8(char const *in_string, char const *in_locale = 0)

This constructor can be used to encode a string from any known locale to utf8. Be careful not to re-encode a string that’s already utf8 encoded.

Parameters:
  • in_string – The string to be encoded.
  • in_locale – A string identifying the source locale of in_string. If none is specified, the default locale on the local machine will be used. If in_string is already utf8 encoded, specify the locale as “utf8” to prevent re-encoding.
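
The warning about re-encoding can be made concrete with a small standard-library sketch. Latin1ToUtf8 is a hypothetical helper, and Latin-1 (ISO-8859-1) is chosen only as an example source encoding; the conversion that the constructor actually performs depends on the locale and is not shown here.

```cpp
#include <string>

// Hypothetical helper (not part of the UTF8 class).
// Encodes a Latin-1 (ISO-8859-1) byte string as UTF-8.
inline std::string Latin1ToUtf8(std::string const &in_latin1)
{
    std::string utf8;
    for (char c : in_latin1)
    {
        unsigned char const byte = static_cast<unsigned char>(c);
        if (byte < 0x80)
        {
            utf8.push_back(c); // ASCII maps to itself
        }
        else
        {
            utf8.push_back(static_cast<char>(0xC0 | (byte >> 6)));   // lead byte
            utf8.push_back(static_cast<char>(0x80 | (byte & 0x3F))); // continuation byte
        }
    }
    return utf8;
}
```

Encoding the Latin-1 byte 0xE9 ("é") once yields the two bytes C3 A9. Passing that result through the same function again treats each byte as a separate Latin-1 character and yields C3 83 C2 A9, which displays as "Ã©". This is the kind of corruption that specifying "utf8" as the locale avoids for input that is already utf8 encoded.
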
UTF8(UTF8 &&in_that)

The move constructor takes control of the underlying data from the source utf8 string.

Parameters:in_that – The source of the move.
UTF8(UTF8 const &in_that)

The copy constructor copies the source utf8 string.

Parameters:in_that – The source to be copied.
UTF8(wchar_t const *in_string)

This constructor can be used to encode a wide character string to utf8.

Parameters:in_string – The string to be encoded.
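
The following sketch shows, under stated assumptions, how a single code point maps to one to four utf8 bytes. AppendUtf8 is a hypothetical helper, not the library's implementation. It takes a char32_t code point and assumes a valid Unicode scalar value; on platforms where wchar_t holds UTF-16, surrogate pairs would have to be combined into one code point first, which this sketch does not do.

```cpp
#include <string>

// Hypothetical helper (not the library's implementation).
// Appends the UTF-8 encoding of one code point to out_utf8.
// Assumes in_code_point is at most U+10FFFF and is not a surrogate.
inline void AppendUtf8(char32_t in_code_point, std::string &out_utf8)
{
    if (in_code_point < 0x80)
    {
        out_utf8.push_back(static_cast<char>(in_code_point));
    }
    else if (in_code_point < 0x800)
    {
        out_utf8.push_back(static_cast<char>(0xC0 | (in_code_point >> 6)));
        out_utf8.push_back(static_cast<char>(0x80 | (in_code_point & 0x3F)));
    }
    else if (in_code_point < 0x10000)
    {
        out_utf8.push_back(static_cast<char>(0xE0 | (in_code_point >> 12)));
        out_utf8.push_back(static_cast<char>(0x80 | ((in_code_point >> 6) & 0x3F)));
        out_utf8.push_back(static_cast<char>(0x80 | (in_code_point & 0x3F)));
    }
    else
    {
        out_utf8.push_back(static_cast<char>(0xF0 | (in_code_point >> 18)));
        out_utf8.push_back(static_cast<char>(0x80 | ((in_code_point >> 12) & 0x3F)));
        out_utf8.push_back(static_cast<char>(0x80 | ((in_code_point >> 6) & 0x3F)));
        out_utf8.push_back(static_cast<char>(0x80 | (in_code_point & 0x3F)));
    }
}
```

Calling the helper once per character of a wide string builds the kind of byte array that the methods above operate on.
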
~UTF8()

A destructor for a UTF8 string.

Friends

inline friend bool operator!=(char const *in_left, UTF8 const &in_right)

This function is used to check a utf8-encoded character string for equivalence to a UTF8 object.

Parameters:
  • in_left – A utf8-encoded character string.
  • in_right – A UTF8 object.
Returns:true if the objects are not equivalent, false otherwise.

inline friend bool operator!=(wchar_t const *in_left, UTF8 const &in_right)

This function is used to check a wide character string for equivalence to a UTF8 object.

Parameters:
  • in_left – A wide character string.
  • in_right – A UTF8 object.
Returns:true if the objects are not equivalent, false otherwise.

inline friend UTF8 operator+(char const *in_left, UTF8 const &in_right)

Creates a new UTF8 object by appending a UTF8 object to the end of a utf8-encoded character string.

Parameters:
  • in_left – A string, assumed to be utf8 encoded, used as the head end of the new string.
  • in_right – A UTF8 object used as the tail end of the new string.
Returns:A new UTF8 object representing the concatenation of 2 strings.

inline friend UTF8 operator+(wchar_t const *in_left, UTF8 const &in_right)

Creates a new UTF8 object by appending a UTF8 object to the end of a wide character string.

Parameters:
  • in_left – A wide character string used as the head end of the new string.
  • in_right – A UTF8 object used as the tail end of the new string.
Returns:A new UTF8 object representing the concatenation of 2 strings.

inline friend bool operator==(char const *in_left, UTF8 const &in_right)

This function is used to check a utf8-encoded character string for equivalence to a UTF8 object.

Parameters:
  • in_left – A utf8-encoded character string.
  • in_right – A UTF8 object.
Returns:true if the objects are equivalent, false otherwise.

inline friend bool operator==(wchar_t const *in_left, UTF8 const &in_right)

This function is used to check a wide character string for equivalence to a UTF8 object.

Parameters:
  • in_left – A wide character string.
  • in_right – A UTF8 object.
Returns:true if the objects are equivalent, false otherwise.