Character encoding

Note

If you are not using Japanese, Chinese, Cyrillic, Greek or Hebrew text then this section can be safely skipped.

The core problem is that the library has no way of knowing which encoding an input string uses. It is therefore sometimes necessary to tell the library which input encoding is in use, so that it can convert the text to UTF-8 (or UTF-16) as needed to render the TTF fonts properly. The specific encoding options for each major supported locale are explained below.

By default all JpGraph library files and examples are encoded in UTF-8.

All defines mentioned below can be found in the file "jpgraph_ttf.inc.php".

Japanese encoding options

There is only one possible option that can be specified.

Table 8.4. Japanese encoding options

Symbolic define | Possible values | Description
ASSUME_EUCJP_ENCODING | true/false | Assumes that Japanese text has been entered in EUC-JP encoding. If this define is true, the library automatically converts from EUC-JP to UTF-8 using the mbstring module in PHP. Note that the multibyte extension in PHP is not normally enabled.


Otherwise it is assumed that the input characters are encoded in UTF-8. Remember that to show the Japanese character sets (Kanji, Hiragana and Katakana) one of the Japanese font families (FF_MINCHO, FF_PMINCHO, FF_GOTHIC or FF_PGOTHIC) must be specified.
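
As a rough illustration of what an EUC-JP to UTF-8 conversion with mbstring amounts to, the sketch below calls mb_convert_encoding() directly. It is not the library's internal code: the helper name eucjpToUtf8() is hypothetical, and the sample assumes the script file itself is saved as UTF-8.

```php
<?php
// Hypothetical helper (not part of JpGraph): convert EUC-JP bytes to UTF-8.
function eucjpToUtf8(string $text): string
{
    if (!function_exists('mb_convert_encoding')) {
        throw new RuntimeException('The mbstring extension is not enabled.');
    }
    return mb_convert_encoding($text, 'UTF-8', 'EUC-JP');
}

// Build an EUC-JP sample from a UTF-8 literal so the example is self-contained.
$utf8  = '日本語';
$eucjp = mb_convert_encoding($utf8, 'EUC-JP', 'UTF-8');

// Converting back should reproduce the original UTF-8 string.
$roundTrip = eucjpToUtf8($eucjp);
var_dump($roundTrip === $utf8);
```

If the function_exists() check fails, the mbstring extension needs to be enabled before ASSUME_EUCJP_ENCODING can be expected to work.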

An example of using Japanese locale together with Windrose plots can be seen in Localizing the default names for the compass directions.

Chinese encoding options

There are no specific settings that control the encoding. The following rules apply, depending on which font is specified.

  1. If the font is specified as FF_SIMSUN the built-in library conversion from GB2312 to UTF-8 will be used. This translation table is stored in the file jpgraph_gb2312.inc.php.

  2. If the font is specified as FF_CHINESE then no conversion is made, since it is assumed that the input character string is already in UTF-8. This only has the effect of changing the font to the default Chinese font family.

  3. If the font is specified as FF_BIG5 then it is assumed that the input character string is encoded in BIG5 and the internal translation to UTF-8 is done by the iconv() function. This means that PHP must be built with iconv() support. By default this is not compiled into PHP (it needs the "--with-iconv" option when PHP is configured). For more on building PHP with the right options see Appendix I. Compiling PHP. If iconv() is not available the library will generate an error message.

An example of using Chinese encoding with Windrose plots can be seen in Figure 21.11. Using chinese fonts (windrose_ex6.1.php)
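
The BIG5 rule above can be sketched as follows. This is only an illustration of a BIG5 to UTF-8 conversion with iconv(), not the library's own code: the helper name big5ToUtf8() and the exception messages are hypothetical, and the sample assumes the script file is saved as UTF-8.

```php
<?php
// Hypothetical helper (not part of JpGraph): convert BIG5 bytes to UTF-8 with iconv().
function big5ToUtf8(string $text): string
{
    if (!function_exists('iconv')) {
        // The library reports its own error in this situation; this sketch just throws.
        throw new RuntimeException('PHP was built without iconv() support.');
    }
    $converted = iconv('BIG5', 'UTF-8', $text);
    if ($converted === false) {
        throw new RuntimeException('Input is not valid BIG5.');
    }
    return $converted;
}

// Build a BIG5 sample from a UTF-8 literal so the example is self-contained.
$utf8 = '中文';
$big5 = iconv('UTF-8', 'BIG5', $utf8);

// Converting back should reproduce the original UTF-8 string.
$back = big5ToUtf8($big5);
var_dump($back === $utf8);
```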

Cyrillic encoding options

To translate Cyrillic text to Unicode correctly, the LANGUAGE_CYRILLIC define should be set to true. If you are running the library in a multi-user environment it might also be necessary to adjust the LANGUAGE_CHARSET define as described below.

Table 8.5. Cyrillic encoding options

Symbolic define | Possible values | Description
LANGUAGE_CYRILLIC | true/false | Special Unicode Cyrillic language support.
CYRILLIC_FROM_WINDOWS | true/false | If this define is set to true the conversion will assume that the input text is encoded in windows-1251; if false it will assume koi8-r.
LANGUAGE_CHARSET | string | This constant is used to auto-detect whether Cyrillic conversion is really necessary when it is enabled. Set it to the input character encoding of the application calling JpGraph, either as a literal such as 'windows-1251' or through a variable that holds the encoding name. A typical such string would be 'UTF-8' or 'utf-8'; the comparison is case-insensitive. If this charset is not a 'koi8-r' or 'windows-1251' derivative then no conversion is done. This constant can be very important in multi-user, multi-language environments where some users need Cyrillic conversion while for users of other languages the conversion would only corrupt the text. Example: in the free project management software dotproject.net, $locale_char_set is set dynamically from the language environment the user has chosen. Usage: define('LANGUAGE_CHARSET', $locale_char_set); where $locale_char_set is a GLOBAL (string) variable from the application including JpGraph.
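
The decision described for LANGUAGE_CHARSET can be sketched roughly as below. This is a hypothetical illustration, not JpGraph's actual implementation: the function cyrillicToUtf8() and its parameters are invented for the example, with $charset standing in for LANGUAGE_CHARSET and $fromWindows for CYRILLIC_FROM_WINDOWS. It assumes iconv() is available and that the script file is saved as UTF-8.

```php
<?php
// Hypothetical sketch (not JpGraph code) of the charset check and conversion.
function cyrillicToUtf8(string $text, string $charset, bool $fromWindows): string
{
    // Case-insensitive check: only koi8-r or windows-1251 based charsets are converted.
    if (stripos($charset, 'koi8-r') === false
        && stripos($charset, 'windows-1251') === false) {
        return $text; // e.g. 'UTF-8': leave the text untouched
    }
    $source    = $fromWindows ? 'WINDOWS-1251' : 'KOI8-R';
    $converted = iconv($source, 'UTF-8', $text);
    // Fall back to the original text if the bytes cannot be converted.
    return $converted === false ? $text : $converted;
}

// A windows-1251 sample built from a UTF-8 literal.
$win = iconv('UTF-8', 'WINDOWS-1251', 'Привет');

$converted = cyrillicToUtf8($win, 'Windows-1251', true);  // converted to UTF-8
$untouched = cyrillicToUtf8('Grüße', 'UTF-8', true);      // returned unchanged
var_dump($converted === 'Привет', $untouched === 'Grüße');
```

The second call shows why the charset check matters: without it, UTF-8 text from a non-Cyrillic user would be run through a windows-1251 conversion and come out garbled.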


Hebrew encoding options

There are no user-adjustable settings. The conversion is made from ISO to Unicode with the help of the PHP function hebrev(), which converts logical Hebrew text to visual text. This conversion is done automatically when the font is one of FF_DAVID, FF_MIRIAM or FF_AHRON.
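
To give a feel for what hebrev() does, the sketch below converts a short logical-order string to visual order and then to UTF-8. It is an illustration only, not the library's code, and it assumes that the single-byte input encoding is ISO-8859-8, that iconv() is available, and that the script file is saved as UTF-8.

```php
<?php
// Logical-order Hebrew text as ISO-8859-8 bytes (assumed input encoding),
// built from a UTF-8 literal so the example is self-contained.
$logical = iconv('UTF-8', 'ISO-8859-8', 'שלום');

// hebrev() reorders logical Hebrew text into visual (display) order.
$visual = hebrev($logical);

// Convert the visual-order bytes to UTF-8 for rendering with a TTF font.
$visualUtf8 = iconv('ISO-8859-8', 'UTF-8', $visual);

// For a run of Hebrew letters only, visual order is the byte-reversed string.
var_dump($visual === strrev($logical));
```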

Greek encoding options

To translate Greek text to Unicode correctly, the LANGUAGE_GREEK define should be set to true.

Table 8.6. Greek encoding options

Symbolic define | Possible values | Description
LANGUAGE_GREEK | true/false | Special Unicode Greek language support.
GREEK_FROM_WINDOWS | true/false | If this define is set to true the conversion of Greek characters will assume that the input text is windows 1251.
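
As a configuration sketch, enabling Greek support would mean editing the two existing defines in "jpgraph_ttf.inc.php" along these lines. The values shown are only an example, and the exact layout of the defines in your copy of the file may differ.

```php
<?php
// Example values only: edit the existing defines in jpgraph_ttf.inc.php
// rather than adding a second set, since a constant can only be defined once.
define('LANGUAGE_GREEK', true);       // enable Greek to Unicode conversion
define('GREEK_FROM_WINDOWS', false);  // set to true if the input text uses the Windows encoding
```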