This is a new and improved version of https://www.oracle.com/database/technologies/faq-nls-lang.html.
Last updated: 2022-07-22
- NLS_LANG Parameter Fundamentals
- Common NLS_LANG Myths
- The NLS_LANG parameters need to be exactly the same as the client’s locale parameters
- NLS_LANG will change my client host’s character set
- NLS_LANG is also a server-side database parameter
- NLS_LANG needs to be the same as the database character set
- NLS_LANG always needs to be the same as your client host’s character set
- If you don’t set NLS_LANG on the client, it uses the NLS_LANG of the server
- The NLS_LANG LANGUAGE and TERRITORY components control what characters can be stored in the database
- The value of NLS_LANG is case sensitive
- Checking the current NLS_LANG setting
- The Priority of NLS Parameters related to NLS_LANG
- Scopes of NLS Parameters
- An example of a wrong NLS_LANG setup
- How to set up the NLS_LANG properly
- Other Frequently asked questions regarding NLS_LANG
- What does the LANGUAGE component of the NLS_LANG parameter control?
- What does the TERRITORY component of the NLS_LANG parameter control?
- How can I see what’s really stored in the database?
- Where is the Character Conversion done?
- Windows SQL*Plus is not showing all my extended characters
- Why do I get a question mark or inverted question mark back when selecting just inserted characters?
- How do I check the code points managed by a UNIX Operating System?
- What about command line tools like SQL*Loader, Import/Export, etc.?
- What about database links?
- What about multiple Oracle Homes on Windows?
- Is there an Oracle Unicode Client on Windows?
- What is a Character Set or Code Page?
- Why are there different Character Sets?
- What is the difference between 7-bit, 8-bit and Unicode Character Sets?
- How to choose the right database character set?
NLS_LANG
Parameter Fundamentals
A locale
is a set of information addressing linguistic and cultural requirements that correspond to a given language and country. Traditionally, the data associated with a locale
provides support for formatting and parsing of dates, times, numbers, currencies, etc. Providing current and correct locale
data has historically been the responsibility of each platform owner or vendor, leading to inconsistencies and errors in locale
data.
Setting the NLS_LANG
operating system environment variable is the simplest way to specify locale
behavior for Oracle software. It sets the language and territory used by the client application and the database server host. It also indicates the client’s character set, which corresponds to the character set for data to be entered or displayed by a client program.
NLS_LANG
is set as a local environment variable on UNIX platforms. NLS_LANG
is set in the registry on Windows platforms.
The NLS_LANG
variable has three components: LANGUAGE
, TERRITORY
, and CHARACTERSET
. It needs to be specified it in the following format, including the punctuation:
NLS_LANG = LANGUAGE_TERRITORY.CHARACTERSET
If NLS_LANG
is not set, it defaults to NLS_LANG=LANGUAGE_TERRITORY.US7ASCII
with the LANGUAGE_TERRITORY
components derived from the NLS Oracle Database instance parameters.
Each component of the NLS_LANG
variable controls the operation of a subset of globalization support features:
Language
Specifies conventions such as the language used for Oracle error messages, sorting, day names, and month names. Each supported language has a unique name, for example, AMERICAN
, FRENCH
, or GERMAN
. The language component specifies default values for the territory and character set components. If the language is not specified, then the value defaults to AMERICAN
.
Territory
Specifies conventions such as the default date, monetary, and numeric formats. Each supported territory has a unique name, for example, AMERICA
, FRANCE
, or AUSTRIA
. If the territory component is not specified, then the value is derived from the language value.
Characterset
Specifies the character set used by the client application (normally the Oracle-provided character set that corresponds to the user’s/application’s terminal character set or the OS character set). Each supported character set has a unique acronym, for example, UTF8
, WE8ISO8859P1
, WE8DEC
, WE8MSWIN1252
, or JA16EUC
. Each language has a default character set associated with it.
Setting partial components only
All components of the NLS_LANG
definition are optional; any item that is not specified uses its default value. If you want to specify only territory and/or character set, then you must include the preceding delimiter [underscore (_
) for territory, period (.
) for character set]. Otherwise, the value is parsed as a language name.
For example, to set only the character set component of NLS_LANG
, use the following format: NLS_LANG=.AL32UTF8
To set only the territory component of NLS_LANG
, use the following format: NLS_LANG=_AUSTRIA
The remainder of this document will focus on the characterset component of the NLS_LANG
environment variable, as it is the least understood and most important piece to set correctly.
Common NLS_LANG
Myths
The NLS_LANG
parameters need to be exactly the same as the client’s locale
parameters
Many different operating systems use different values for their locale
parameters. The Oracle Database software hence uses its own naming schemes that differ from those of the operating system. For example, an operating system setting on Linux of LC_ALL=en_US.utf8
would be equivalent to NLS_LANG=AMERICAN_AMERICA.AL32UTF8
. Setting NLS_LANG
to the same value of LC_ALL
of en_US.utf8
would be incorrect!
For a complete list of language, territory and character set names, see the Oracle Database Globalization Support Guide – Appendix A – Locale Data.
NLS_LANG
will change my client host’s character set
The character set defined in the NLS_LANG
variable does NOT CHANGE your client host’s character set. It is used to let the Oracle Database software on the server-side know what character set you are using on the client-side so that Oracle can do the proper character set conversion, if needed. You cannot change the character set of your client host itself by using a different NLS_LANG
character set! Each operating system provides its own mechanisms how to change the character set it is using.
NLS_LANG
is also a server-side database parameter
The short answer is NO! Many believe that setting the NLS_LANG
will also impact the behavior of the Oracle Database instance itself, but NLS_LANG
is only ever considered by the Oracle client software connecting to an Oracle Database so that it can convert data from and to the database character set. The database instance does not look at NLS_LANG
. Neither does NLS_LANG
have an impact on the database itself. The database always operates in the character set that it was created with. You can set NLS_LANG
on the database server host which will impact Oracle clients started on the server host, such as running SQL*Plus or other client tools on the server host that connect to the database. This is applicable if the database host OS itself uses a different character set than the database, however, such a setup is not recommended!
NLS_LANG
needs to be the same as the database character set
NO! Setting the NLS_LANG
to the character set of the database MAY be correct but IS often not correct. DO NOT assume that NLS_LANG
needs to be the same as the database character set. THIS IS OFTEN NOT TRUE! NLS_LANG
is there to tell the database instance in what locale
the client is operating in.
When the client NLS_LANG
character set is set to the same value as the database character set, Oracle assumes that the data being sent or received is of the same (correct) encoding, so no conversions or validations occur for performance reasons. The data is just stored inside the database as delivered by the client, bit by bit, and sent to the client as it is stored inside the database, bit by bit.
Note that Oracle recommends using Unicode (AL32UTF8
) as the character set for the database itself!
NLS_LANG
always needs to be the same as your client host’s character set
This is not necessarily always the case. For example, if you execute a file in SQL*Plus via @<file>
, that file could be encoded in a different character set than what your client host operates with. In that case, NLS_LANG
will have to (at least) match the CHARACTERSET
component with the encoding of the file so that the data within the file is converted correctly.
If you don’t set NLS_LANG
on the client, it uses the NLS_LANG
of the server
This is NOT true! NLS_LANG
is an Oracle client software environment variable. If it is not set then its value is by default AMERICAN_AMERICA.US7ASCII
. In other words, the language component is AMERICAN
, the territory component is AMERICA
, and the character set component is US7ASCII
.
The NLS_LANG
LANGUAGE
and TERRITORY
components control what characters can be stored in the database
Setting the LANGUAGE
and TERRITORY
components of NLS_LANG
have nothing to do with the ability to store characters in a database. A NLS_LANG
set to JAPANESE_JAPAN.WE8MSWIN1252
will not allow you to store Japanese characters, as WE8MSWIN1252
doesn’t support Japanese characters. However a NLS_LANG
set to AMERICAN_AMERICA.JA16SJIS
will allow you to store Japanese characters providing the input data is truly JA16SJIS
and that the database is also in a character set that can store Japanese characters, like UTF8
or JA16SJIS
). See further above what the LANGUAGE
and TERRITORY
components control.
The value of NLS_LANG
is case sensitive
The NLS_LANG
variable values are not case sensitive:
$ export NLS_LANG=gERMan_Austria.al32utf8
$ sqlplus / as sysdba
SQL*Plus: Release 21.0.0.0.0 - Production on Sa Jul 16 18:17:03 2022
Version 21.3.0.0.0
Copyright (c) 1982, 2021, Oracle. All rights reserved.
Verbunden mit:
Oracle Database 21c Express Edition Release 21.0.0.0.0 - Production
Version 21.3.0.0.0
SQL> SELECT parameter, value
2 FROM NLS_SESSION_PARAMETERS
3 WHERE parameter IN ('NLS_LANGUAGE','NLS_TERRITORY');
PARAMETER VALUE
------------- -------
NLS_LANGUAGE GERMAN
NLS_TERRITORY AUSTRIA
Furthermore, the database will prevent you from connecting if you had specified invalid values for NLS_LANG
:
$ export NLS_LANG=Gibts_Goa.net
$ sqlplus / as sysdba
SQL*Plus: Release 21.0.0.0.0 - Production on Sat Jul 16 18:20:51 2022
Version 21.3.0.0.0
Copyright (c) 1982, 2021, Oracle. All rights reserved.
ERROR:
ORA-12705: Cannot access NLS data files or invalid environment specified
Enter user-name:
Checking the current NLS_LANG
setting
In some cases the NLS_LANG
has already been set during the Oracle software installation or thereafter manually. To be sure, you can use these methods to get back the value of NLS_LANG
for SQL*Plus:
On UNIX
SQL> host echo $NLS_LANG
This returns the value of the environment variable if it is set.
On Windows
On Windows you have two possible options, normally the NLS_LANG
is set in the registry, but it can also be set in the environment, however, the latter is often not the case. The value in the environment takes precedence over the value in the registry and is used for ALL Oracle_Homes on the server. Also note that any USER environment variable takes precedence over any SYSTEM environment variable (this is Windows behavior, and has nothing to do with Oracle).
To check if NLS_LANG
is set in the environment, you can use the following method:
SQL> host echo %NLS_LANG%
If NLS_LANG
is set in the environment, this reports something like:
GERMAN_AUSTRIA.WE8ISO8859P1
If NLS_LANG
is not set in the environment, you will just get %NLS_LANG%
back.
To check the value in the registry, use the following command:
SQL> @.[%NLS_LANG%].
The value is present in the registry if you get something like this back:
Unable to open file ".[ENGLISH_UNITED KINGDOM.WE8ISO8859P1]."
The “file name” between the square brackets is the value of the registry parameter.
Just like with the environment variable approach, NLS_LANG
is not present in the registry if you get this back:
Unable to open file ".[%NLS_LANG%]."
Note: the @.[%NLS_LANG%].
technique reports the NLS_LANG
known to the SQL*Plus executable, it will not read the registry itself. But if you run the HOST
command first and the NLS_LANG
is not set in the environment variable, then you can be sure the variable is set in the registry itself, if the @.[%NLS_LANG%].
returns a valid value.
All other NLS parameters can be retrieved via: SELECT * FROM NLS_SESSION_PARAMETERS;
Important:
SELECT USERENV ('language') FROM DUAL;
gives the session’s LANGUAGE_TERRITORY
but the DATABASE character set, not the session’s, so the value returned is not the client’s complete NLS_LANG
setting!
The Priority of NLS Parameters related to NLS_LANG
There are 3 levels at which you can set NLS parameters: Database, Instance and Session. If a parameter is defined at more than one level then the rules on which one takes precedence are quite straightforward:
- NLS session settings
- NLS instance settings
- NLS database settings
This means that the NLS session setting takes precedence over the NLS instance setting which takes precedence over the NLS database setting.
Scopes of NLS Parameters
Session Parameters
You can retrieve the settings used for the current database session via this SQL statement:
SELECT * FROM NLS_SESSION_PARAMETERS;
These reflect (in this order):
- The values of NLS parameters set by the
ALTER SESSION
command, e.g.ALTER SESSION SET NLS_DATE_FORMAT = 'DD/MM/YYYY';
- If there were no explicit
ALTER SESSION
statements done, then it reflects the setting of the corresponding NLS parameter on the client derived from the client-sideNLS_LANG
variable. - If
NLS_LANG
does not include theLANGUAGE
component, thenAMERICAN
is used as default. So if you setNLS_LANG=_BELGIUM.UTF8
then the SQL statement above will report the following (note the difference betweenNLS_LANG=_BELGIUM.UTF8
(correct) andNLS_LANG=BELGIUM.UTF8
(incorrect), you need to set the “_
” as the component separator):
PARAMETER VALUE
---------------- --------
NLS_LANGUAGE AMERICAN
NLS_TERRITORY BELGIUM
NLS_CURRENCY €
NLS_ISO_CURRENCY BELGIUM
- If
NLS_LANG
does not include theTERRITORY
component, then it defaults based on theLANGUAGE
component. So if you setNLS_LANG=ITALIAN_.WE8MSWIN1252
then you get this (note the difference between NLS_LANG=ITALIAN_.WE8MSWIN1252 (correct) and NLS_LANG=ITALIAN.WE8MSWIN1252 (incorrect), you need to set the “_” as the component separator):
PARAMETER VALUE
---------------- -------
NLS_LANGUAGE ITALIAN
NLS_TERRITORY ITALY
NLS_CURRENCY €
NLS_ISO_CURRENCY ITALY
- If
NLS_LANG
does not include theLANGUAGE_TERRITORY
components, then these default toAMERICAN_AMERICA
. So if you setNLS_LANG=.WE8MSWIN1252
then you get this (note the difference betweenNLS_LANG=.WE8MSWIN1252
(correct) andNLS_LANG=WE8MSWIN1252
(incorrect), you need to set the “.
” as the component separator):
PARAMETER VALUE
---------------- --------
NLS_LANGUAGE AMERICAN
NLS_TERRITORY AMERICA
NLS_CURRENCY $
NLS_ISO_CURRENCY AMERICA
- If
NLS_LANG
is set (either way like in points 3, 4, or 5) then parameters likeNLS_SORT
,NLS_DATE_FORMAT
,NLS_ISO_CURRENCY
, etc. can be set as a “standalone” setting viaALTER SESSION
and will overrule the defaults derived from theNLS_LANG
LANGUAGE_TERRITORY
components. So if you setNLS_LANG=AMERICAN_AMERICA.UTF8
andALTER SESSION SET NLS_ISO_CURRENCY=FRANCE
then you get this:
PARAMETER VALUE
---------------- --------
NLS_LANGUAGE AMERICAN
NLS_TERRITORY AMERICA
NLS_CURRENCY $
NLS_ISO_CURRENCY FRANCE
Defaults:
- If
NLS_DATE_LANGUAGE
orNLS_SORT
are not set, then they are derived fromNLS_LANGUAGE
. - If
NLS_CURRENCY
,NLS_DUAL_CURRENCY
,NLS_ISO_CURRENCY
,NLS_DATE_FORMAT
,NLS_TIMESTAMP_FORMAT
,NLS_TIMESTAMP_TZ_FORMAT
,NLS_NUMERIC_CHARACTERS
are not set, then they are derived fromNLS_TERRITORY
.
- If
NLS_LANG
is not set at all, then it defaults toLANGUAGE_TERRITORY.US7ASCII
. The values used forLANGUAGE_TERRITORY
components are the ones found in the NLS instance parameter settings (NLS_INSTANCE_PARAMETERS
).
Parameters likeNLS_SORT
defined as “standalone” environment variables on the client side are ignored.
Important
- If set, session parameters (
NLS_SESSION_PARAMETERS
) always take precedence over NLS instance parameters (NLS_INSTANCE_PARAMETERS
) and NLS database parameters (NLS_DATABASE_PARAMETERS
). - This behavior cannot be disabled on/from the server, so a parameter set on the client always has precedence above an instance or database parameter.
NLS_LANG
cannot be changed byALTER SESSION
, however, theLANGUAGE
andTERRITORY
components can via theNLS_LANGUAGE
andNLS_TERRITORY
session parameters. Note thatNLS_LANGUAGE
and/orNLS_TERRITORY
cannot be set as “standalone” environment or registry variables!- The values in
NLS_SESSION_PARAMETERS
for the current session are NOT visible for other sessions. If you need to trace the values of these then you have to use a logon trigger and create your own logging table to store the current values of your session parameters. - The
CHARACTERSET
component ofNLS_LANG
is NOT shown in any database system table or view. - On Windows you have two possible options, normally the
NLS_LANG
is set in the registry, but it can also be set as an environment variable, however, the latter is not often done and generally not recommended to do so. The value in the environment takes precedence over the value in the registry and is used for ALL Oracle_Homes on the server if it is defined as a system environment variable. - NLS_LANGUAGE also declares the language for the client error messages.
- You cannot “set” an NLS parameter in a SQL script; you need to use
ALTER SESSION
.
Instance Parameters
You can retrieve the NLS instance settings via this SQL statement:
SELECT * from NLS_INSTANCE_PARAMETERS;
These are the settings either set in the database instance parameter file (spfile.ora/init.ora
) at the moment that the database instance was started or set through the ALTER SYSTEM
command.
If a parameter is not explicitly set in the database instance parameter file or via ALTER SYSTEM
then its value is NOT derived from a “higher” level parameter (we are talking about parameters like NLS_SORT
that derive a default from NLS_LANGUAGE
in NLS_SESSION_PARAMETERS
, this is NOT the case for NLS_INSTANCE_PARAMETERS
).
Important
- You cannot define
NLS_LANG
in the spfile.ora/init.ora as a database instance parameter. - You cannot define the client character set as a database instance parameter (spfile.ora/init.ora). The client character set is defined by the
NLS_LANG
environment variable on the client OS. NLS_LANG
is not a database (spfile.ora/init.ora) parameter, however,NLS_LANGUAGE
andNLS_TERRITORY
are. If you would like to set either of these as database parameters, you need to set them separately via theNLS_LANGUAGE
andNLS_TERRITORY
database instance parameters.- You cannot define the database character set as a database instance parameter. The database (national) character set
NLS_(NCHAR)_CHARACTERSET
) is defined by theCREATE DATABASE
command executed during initial database creation. - These settings take precedence above the
NLS_DATABASE_PARAMETERS
. - These values are used for the
NLS_SESSION_PARAMETERS
if the clientNLS_LANG
is NOT set. - Oracle strongly recommends that you set the
NLS_LANG
on the client at least toNLS_LANG=.CHARACTERSET
- The
NLS_LANGUAGE
value in the instance parameters also declares the language for the server error messages in alert.log and in the trace files.
Database Parameters
You can retrieve the NLS settings for the database via this SQL statement:
SELECT * from NLS_DATABASE_PARAMETERS;
The database NLS LANGUAGE_TERRITORY
components default to AMERICAN_AMERICA
if there are no parameters explicitly set in the spfile.ora/init.ora during database creation time. There is no way to change these settings after the database creation. Do NOT attempt to update system tables to bypass these settings! These settings are used to give the database itself default values if neither the INSTANCE nor SESSION parameters are set. If you need different values, set the NLS INSTANCE parameters accordingly in the database instance parameter file.
Important
- The
NLS_DATABASE_PARAMETERS
settings are overridden byNLS_INSTANCE_PARAMETERS
andNLS_SESSION_PARAMETERS
. - The
NLS_CHARACTERSET
andNLS_NCHAR_CHARACTERSET
parameters cannot be overridden by instance or session parameters.
They are defined by the values specified in the CREATE DATABASE command executed during initial database creation and are not intended to be changed afterward dynamically. Do NOT update system tables to change the character set. This can corrupt your database and potentially make it impossible to open the database again. - Setting the
NLS_LANG
during the creation of the database does not influence theNLS_DATABASE_PARAMETERS
. - The
NLS_LANG
set during the database creation has NO impact on the database National Character Set.
Additional SELECT
statements
SELECT name, value$ FROM sys.props$ WHERE name LIKE '%NLS%';
This gives the same info asNLS_DATABASE_PARAMETERS.
You should useNLS_DATABASE_PARAMETERS
instead ofprops$
.
Note the UPPERCASE'%NLS%'
.SELECT * FROM v$nls_parameters;
This view shows the current session parameters and the DATABASE character set as seen in theNLS_DATABASE_PARAMETERS
view.SELECT name, value FROM v$parameter WHERE name LIKE '%nls%';
This view gives the same information asNLS_INSTANCE_PARAMETERS
.
Note the lowercase'%nls%'
.SELECT USERENV('language') FROM dual;
andSELECT SYS_CONTEXT('userenv','language') FROM dual;
Both these SELECT statements give the session’s LANGUAGE_TERRITORY and the DATABASE character set. The database character set is not the same as the character set of theNLS_LANG
variable that you started this session/database connection with! Do not be fooled, although the output of this query looks like the value of aNLS_LANG
variable, it is NOT!SELECT USERENV('lang') FROM dual;
This SELECT gives the shortcode that Oracle uses for the language defined by theNLS_LANGUAGE
setting for this session. IfNLS_LANGUAGE
is set toGERMAN
then this will return “D
“, ifNLS_LANGUAGE
is set toENGLISH
then this will return “GB
“, ifNLS_LANGUAGE
is set toAMERICAN
then this will return “US
“, and so on…SHOW parameter nls%
This will give the same as theNLS_INSTANCE_PARAMETERS
.
An example of a wrong NLS_LANG
setup
In the following scenario, a database is created on a Linux system with the US7ASCII
character set. A Windows client connecting to the database works with the WE8MSWIN1252
character set (Regional settings -> Western Europe/ACP 1252) and the DBA administering the database uses a UNIX shell with the ROMAN8
character set to work on the database. The NLS_LANG
is set to AMERICAN_AMERICA.US7ASCII
on the clients and the server host.
So we have:
- Database character set:
US7ASCII
- Database server
NLS_LANG=AMERICAN_AMERICA_US7ASCII
- Database server
- Windows client:
WE8MSWIN1252
- Windows client
NLS_LANG=AMERICAN_AMERICA_US7ASCII
- Windows client
- DBA UNIX shell:
ROMAN8
- DBA UNIX client
NLS_LANG=AMERICAN_AMERICA_US7ASCII
- DBA UNIX client
Note: This is an INCORRECT setup to explain character set conversion, don’t use this in your environment!
A very important point, as mentioned in the MYTH section:
When the client NLS_LANG
character set is set to the same value as the database character set, Oracle assumes that the data being sent or received is of the same (correct) encoding, so no conversions or validations occur for performance reasons. The data is just stored inside the database as delivered by the client, bit by bit, and sent to the client as it is stored inside the database, bit by bit.
Now let’s assume that from the Windows client you insert an 'é'
(LATIN SMALL LETTER E WITH ACUTE) into a table NLS_TEST
containing one column TEST
of the type CHAR(1)
.
As long as you insert into and select the column on Windows with the WE8MSWIN1252 character set everything runs smoothly. No conversion is done because NLS_LANG on the client machine matches the database character set (US7ASCII) and the 8 bits for the character are inserted and read back, even if the character set of the database itself is defined as 7 bits. This happens because a byte is 8 bits and Oracle is ALWAYS using 8 bits even with a 7-bit character set. In a correct setup, the Most Significant Bit is just not used and only 7 bits are taken into account.
For one reason or another, you need to connect to the database from the UNIX server. When you SELECT
from the table NLS_TEST
where you previously inserted the data via the Windows client you now get a 'Õ'
(LATIN CAPITAL LETTER O WITH TILDE) instead of the 'é'
on the UNIX server.
Additionally, if you insert the 'é'
on the UNIX server and you SELECT the row on the Windows client you now get an 'Å'
(LATIN CAPITAL LETTER A WITH RING ABOVE) back.
Why is this happening? Because you have INCORRECT data in the database. You stored the numeric value for 'é'
of the WE8MSWIN1252
character set in the database but you told Oracle this was US7ASCII
data, so Oracle did NOT convert anything and just stored the numeric value (again: Oracle thinks that the client is giving US7ASCII
codes because the NLS_LANG
on the client is set to US7ASCII
, and the database character set is also US7ASCII
, hence no conversion is done).
When you SELECT the same row back on the UNIX server, Oracle is again expecting that the value is correct and passes the value to the UNIX terminal without any conversion.
Now the problem is that in the WE8MSWIN1252
character set the 'é'
has the hexadecimal value E9
and in the ROMAN8
character set the hexadecimal value for 'é'
is C5
. Oracle just passes the value stored in the database (E9
) to the UNIX terminal, and the UNIX terminal thinks this is the letter '
because in its (Õ
'ROMAN8
) character set the hexadecimal value E9
is representing the letter '
. So instead of the Õ
''é'
you get '
on the UNIX terminal screen.Õ
'
The inverse (the insert on the UNIX and the SELECT on the Windows client) is the same story, but you get other results.
The solution is to create the database with a character set that contains the letter 'é'
(UTF-8
, WE8MSWIN1252
, WE8ISO89859P1
, etc.) and setting the NLS_LANG
on the Windows client to WE8MSWIN1252
and on the UNIX server to WE8ROMAN8
. If you then insert an 'é'
on both sides, you will get an 'é'
back regardless of where you SELECT them. Oracle then knows that a hexadecimal value of C5
is inserted by the UNIX server and a hexadecimal value of E9
from the Windows client and converts both values to the value of 'é'
represented in the database character set (the value in the database depends on the character set you have chosen) and vice versa when selecting the character again.
Note that you do not have to switch between UNIX, Windows, or other OS clients to run into this kind of problem. The same problem appears if you add multiple Windows clients that are using different character sets and have an incorrect NLS_LANG
environment variable set.
How to set up the NLS_LANG
properly
For UNIX
To specify the locale
behavior of your Oracle client software, you have to set your NLS_LANG
environment variable. It sets the language, territory and also character set that your client is operating in. You need to check the locale
environment settings of the client operating system to set your NLS_LANG
3rd component (character set) in accordance with it. To do this, use the locale
command like this:
$ locale
LANG=de_AT
LC_CTYPE="de_AT.iso885915@euro"
LC_NUMERIC="de_AT.iso885915@euro"
LC_TIME="de_AT.iso885915@euro"
LC_COLLATE="de_AT.iso885915@euro"
LC_MONETARY="de_AT.iso885915@euro"
LC_MESSAGES="de_AT.iso885915@euro"
LC_PAPER="de_AT.iso885915@euro"
LC_NAME="de_AT.iso885915@euro"
LC_ADDRESS="de_AT.iso885915@euro"
LC_TELEPHONE="de_AT.iso885915@euro"
LC_MEASUREMENT="de_AT.iso885915@euro"
LC_IDENTIFICATION="de_AT.iso885915@euro"
LC_ALL=
The output of this command is not exactly the same in all the UNIX environments. On some platforms, it can be useful to use the locale LC_CTYPE | head
command to have more details about the character set code page that is really used:
$ locale LC_CTYPE | head
upper;lower;alpha;digit;xdigit;space;print;graph;blank;cntrl;punct;alnum;combining;combining_level3
toupper;tolower;totitle
16
1
ISO-8859-15
72
86
1
0
1
In both cases above, the NLS_LANG
character set component should be set to WE8ISO8859P15
. Note that on Solaris, AIX, and TRU64 this command doesn’t give interesting complementary information. To find more details about these locale
environment settings, see:
- Linux:
/usr/share/locale/locale.alias
- Solaris:
/usr/lib/locale
- AIX:
/usr/lib/nls/README
- TRU64:
/usr/lib/nls
- HP-UX:
/usr/lib/nls/config
To set a chosen value for these locale
settings, it’s needed to know which values are available. You can retrieve that with the following command locale -a
:
$ locale -a
aa_DJ
aa_DJ.iso88591
aa_DJ.utf8
aa_ER
aa_ER@saaho
aa_ER.utf8
aa_ER.utf8@saaho
aa_ET
aa_ET.utf8
af_ZA
af_ZA.iso88591
af_ZA.utf8
am_ET
am_ET.utf8
an_ES
an_ES.iso885915
an_ES.utf8
...
Once you have decided on which character set you want to use, for example UTF-8
, you can set it e.g. on Linux via export LC_ALL=de_AT.utf8
. Below is the output of locale
and locale LC_CTYPE | head
after the change to UTF-8
has happened:
$ export LC_ALL=de_AT.utf8
$ locale
LANG=de_AT
LC_CTYPE="de_AT.utf8"
LC_NUMERIC="de_AT.utf8"
LC_TIME="de_AT.utf8"
LC_COLLATE="de_AT.utf8"
LC_MONETARY="de_AT.utf8"
LC_MESSAGES="de_AT.utf8"
LC_PAPER="de_AT.utf8"
LC_NAME="de_AT.utf8"
LC_ADDRESS="de_AT.utf8"
LC_TELEPHONE="de_AT.utf8"
LC_MEASUREMENT="de_AT.utf8"
LC_IDENTIFICATION="de_AT.utf8"
LC_ALL=de_AT.utf8
$ locale LC_CTYPE | head
upper;lower;alpha;digit;xdigit;space;print;graph;blank;cntrl;punct;alnum;combining;combining_level3
toupper;tolower;totitle
16
6
UTF-8
72
86
1
0
1
In this example, because the client is using UTF-8
, the 3rd component (character set) of NLS_LANG
should equally be set to AL32UTF8
(as explained earlier, Oracle uses its own names for character sets as well) to let the Oracle Database software on the database server know which character set the client itself is using. Note that the LANGUAGE_TERRITORY
components can remain different. It is perfectly fine if the client host operates in German within the Austrian territory settings but the database session to the Oracle Database operates in English and within the American territory settings. What is important, however, is that the character set that the client host uses matches the one specified in NLS_LANG
. So this is permissive:
$ export NLS_LANG=American_America.AL32UTF8
For Windows and DOS Code Pages
On Windows systems, the encoding scheme (character set) is specified by a code page. Code pages are defined to support specific languages or groups of languages, which share common writing systems. From Oracle’s point of view, the terms code page and character set mean the same. Note that in non-Chinese-Japanese-Korean environments, the Windows GUI and DOS command prompt do not use the same code page.
As a result, Windows uses 2 different character sets for the ANSI (sqlplusw.exe) and the OEM (DOS box – sqlplus.exe) environments.
Where to set the NLS_LANG
in Windows
In the Registry
On Windows systems, you should make sure that you have set an NLS_LANG
registry subkey for each of your Oracle Homes:
You can easily modify this subkey with the Windows Registry Editor:Start -> Run...
Type regedit
, and click ok
Edit the following registry entry:
For Oracle version 7
HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE
For Oracle Database versions 8, 8i and 9i
HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\HOMEx\
where x
is the unique number identifying the Oracle home. HOME0
is the first installation.
For Oracle Database 10g and later
HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\KEY_
There you have an entry with name NLS_LANG
.
When starting an Oracle tool, like SQL*Plus, it will read the content of the oracle.key file located in the same directory to determine which registry tree will be used, therefore which NLS_LANG
subkey will be used.
Important
Some people are confused by finding a NLS_LANG
set to NA
in HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE
when no version 7 was installed. This is used for backward compatibility and can be ignored.
As a System or User Environment Variable, in System properties
Although the Registry is the primary repository for settings on Windows, it is not the only place where parameters can be set. Even if not at all recommended, you can set the NLS_LANG
as a System or User Environment Variable in the System properties.
This setting will be used for ALL Oracle Homes.
To check and modify them:
Right-click the My Computer
icon -> Properties
and select the Advanced
Tab -> Click on Environment Variables
The User Variables
list contains the settings for the specific OS user currently logged on and the System Variables
contains system-wide variables for all users.
Since these environment variables take precedence over the parameters already set in your Registry, you should not set Oracle parameters at this location unless you have a very good reason!
As an Environment Variable defined in the command prompt
With this approach, before using an Oracle command line tool you need to MANUALLY SET the NLS_LANG
parameter. In an MS-DOS command prompt, use the set command, for example:
C:\> set NLS_LANG=American_America.WE8PC850
Determine your Windows ANSI code page
Now that you know what the NLS_LANG
is currently set to, you can check to see if it properly agrees with the current ANSI code page. The ACP (ANSI Code Page) is defined by the “default locale” setting of Windows, so if you have a UK Windows 7 client and you want to input Cyrillic (Russian) you need to change the ACP (by changing the “default locale”) in order to be able to input Russian.
You’ll find its value in the registry:
Start -> Run...
Type regedit
, and click ok
Browse the following registry entry: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\NLS\CodePage\
There you have (all the way down) an entry with the name ACP. The value of ACP is your current GUI code page, for the mapping to the Oracle name. Since there are many registry entries with very similar names, please make sure that you are looking at the right place in the registry.
OEM
= the command line code pageANSI
= the GUI code page
Find the correspondent Oracle client character set
Find the Oracle client character set in the table below based on the ACP you found above. Note that there is only ONE CORRECT value for a given ACP.
ANSI CodePage (ACP) | Oracle Client character set (3rd component of NLS_LANG ) |
---|---|
1250 | EE8MSWIN1250 |
1251 | CL8MSWIN1251 |
1252 | WE8MSWIN1252 |
1253 | EL8MSWIN1253 |
1254 | TR8MSWIN1254 |
1255 | IW8MSWIN1255 |
1256 | AR8MSWIN1256 |
1257 | BLT8MSWIN1257 |
1258 | VN8MSWIN1258 |
874 | TH8TISASCII |
932 | JA16SJIS |
936 | ZHS16GBK |
949 | KO16MSWIN949 |
950 | ZHT16MSWIN950 – except for Hong Kong (see below) |
This is the character set used by the GUI SQL*Plus (sqlplusW.exe
) that you start through the Windows start menu. Please note the difference between the GUI SQL*Plus and the “DOS mode” SQL*Plus.
You can use UTF-8
as Oracle client character set (via NLS_LANG
) on Windows NT, 2000 and XP but you will be limited to using only client programs that explicitly support this configuration. This is because the user interface of Win32 is not UTF-8
, therefore the client programs have to perform explicit conversions between UTF-8
(used on Oracle side) and UTF-16
(used on Win32 side).
Use the Windows Registry Editor to set up NLS_LANG
in your Oracle Home with the value you have just found above.
The correct NLS_LANG
for Windows Command Line Operations
MS-DOS mode uses, with a few exceptions like CJK, (Japanese, Korean, Simplified Chinese, and Traditional Chinese) a different code page (called OEM code page) than Windows GUI (ANSI code page). This means that before using an Oracle command line tool such as SQL*Plus (sqlplus.exe
) in a command prompt, you need to MANUALLY SET the NLS_LANG
parameter as an environment variable with the set DOS command BEFORE using the tool.
For Japanese, Korean, Simplified Chinese, and Traditional Chinese, the MS-DOS OEM code page (CJK) is identical to the ANSI code page meaning that, in this particular case, there is no need to set the NLS_LANG
parameter in MS-DOS mode.
In all other cases, you need to set it in order to overwrite the NLS_LANG
registry key already matching the ANSI code page. The new “MS-DOS dedicated” NLS_LANG
needs to match the MS-DOS OEM code page that could be retrieved by typing chcp
in a Command Prompt:
C:\> chcp
Active code page: 437
C:\> set NLS_LANG=american_america.US8PC437
If the NLS_LANG
parameter for the MS-DOS mode session is not set appropriately, error messages and data can be corrupted due to incorrect character set conversion.
Use the following list to find the Oracle character set that fits your MS-DOS code page in use on your system:
MS-DOS codepage | Oracle Client character set (3rd component of NLS_LANG ) |
---|---|
437 | US8PC437 |
737 | EL8PC737 |
850 | WE8PC850 |
852 | EE8PC852 |
857 | TR8PC857 |
858 | WE8PC858 |
861 | IS8PC861 |
862 | IW8PC1507 |
865 | N8PC865 |
866 | RU8PC866 |
List of common NLS_LANG settings used in the Windows Registry
Note: this is the correct setting for the GUI SQL*Plus version, (sqlplusW.exe
)
If you are testing with “special” characters please DO use the GUI and not the “DOS box” (sqlplus.exe
)!
Operating System Locale | Oracle NLS_LANG setting |
---|---|
Arabic (U.A.E.) | ARABIC_UNITED ARAB EMIRATES.AR8MSWIN1256 |
Bulgarian | BULGARIAN_BULGARIA.CL8MSWIN1251 |
Catalan | CATALAN_CATALONIA.WE8MSWIN1252 |
Chinese (PRC) | SIMPLIFIED CHINESE_CHINA.ZHS16GBK |
Chinese (Taiwan) | TRADITIONAL CHINESE_TAIWAN.ZHT16MSWIN950 |
Chinese (Hong Kong HKCS) | TRADITIONAL CHINESE_HONG KONG.ZHT16HKSCS |
Chinese (Hong Kong HKCS2001) | TRADITIONAL CHINESE_HONG KONG.ZHT16HKSCS2001 |
Croatian | CROATIAN_CROATIA.EE8MSWIN1250 |
Czech | CZECH_CZECH REPUBLIC.EE8MSWIN1250 |
Danish | DANISH_DENMARK.WE8MSWIN1252 |
Dutch (Netherlands) | DUTCH_THE NETHERLANDS.WE8MSWIN1252 |
Dutch (Belgium) | DUTCH_BELGIUM.WE8MSWIN1252 |
English (United Kingdom) | ENGLISH_UNITED KINGDOM.WE8MSWIN1252 |
English (United States) | AMERICAN_AMERICA.WE8MSWIN1252 |
Estonian | ESTONIAN_ESTONIA.BLT8MSWIN1257 |
Finnish | FINNISH_FINLAND.WE8MSWIN1252 |
French (Canada) | CANADIAN FRENCH_CANADA.WE8MSWIN1252 |
French (France) | FRENCH_FRANCE.WE8MSWIN1252 |
German (Germany) | GERMAN_GERMANY.WE8MSWIN1252 |
Greek | GREEK_GREECE.EL8MSWIN1253 |
Hebrew | HEBREW_ISRAEL.IW8MSWIN1255 |
Hungarian | HUNGARIAN_HUNGARY.EE8MSWIN1250 |
Icelandic | ICELANDIC_ICELAND.WE8MSWIN1252 |
Indonesian | INDONESIAN_INDONESIA.WE8MSWIN1252 |
Italian (Italy) | ITALIAN_ITALY.WE8MSWIN1252 |
Japanese | JAPANESE_JAPAN.JA16SJIS |
Korean | KOREAN_KOREA.KO16MSWIN949 |
Latvian | LATVIAN_LATVIA.BLT8MSWIN1257 |
Lithuanian | LITHUANIAN_LITHUANIA.BLT8MSWIN1257 |
Norwegian | NORWEGIAN_NORWAY.WE8MSWIN1252 |
Polish | POLISH_POLAND.EE8MSWIN1250 |
Portuguese (Brazil) | BRAZILIAN PORTUGUESE_BRAZIL.WE8MSWIN1252 |
Portuguese (Portugal) | PORTUGUESE_PORTUGAL.WE8MSWIN1252 |
Romanian | ROMANIAN_ROMANIA.EE8MSWIN1250 |
Russian | RUSSIAN_CIS.CL8MSWIN1251 |
Slovak | SLOVAK_SLOVAKIA.EE8MSWIN1250 |
Spanish (Spain) | SPANISH_SPAIN.WE8MSWIN1252 |
Swedish | SWEDISH_SWEDEN.WE8MSWIN1252 |
Thai | THAI_THAILAND.TH8TISASCII |
Spanish (Mexico) | MEXICAN SPANISH_MEXICO.WE8MSWIN1252 |
Spanish (Venezuela) | LATIN AMERICAN SPANISH_VENEZUELA.WE8MSWIN1252 |
Turkish | TURKISH_TURKEY.TR8MSWIN1254 |
Ukrainian | UKRAINIAN_UKRAINE.CL8MSWIN1251 |
Vietnamese | VIETNAMESE_VIETNAM.VN8MSWIN1258 |
List of common NLS_LANG
settings used in the Command Prompt (DOS box)
Note: this is the correct setting for the DOS BOX SQL*Plus version (sqlplus.exe
)!
Operating System Locale | Oracle Client character set (3rd component of NLS_LANG ) |
---|---|
Arabic | AR8ASMO8X |
Catalan | WE8PC850 |
Chinese (PRC) | ZHS16GBK |
Chinese (Taiwan) | ZHT16MSWIN950 |
Czech | EE8PC852 |
Danish | WE8PC850 |
Dutch | WE8PC850 |
English (United Kingdom) | WE8PC850 |
English (United States) | US8PC437 |
Finnish | WE8PC850 |
French | WE8PC850 |
German | WE8PC850 |
Greek | EL8PC737 |
Hebrew | IW8PC1507 |
Hungarian | EE8PC852 |
Italian | WE8PC850 |
Japanese | JA16SJIS |
Korean | KO16MSWIN949 |
Norwegian | WE8PC850 |
Polish | EE8PC852 |
Portuguese | WE8PC850 |
Romanian | EE8PC852 |
Russian | RU8PC866 |
Slovak | EE8PC852 |
Slovenian | EE8PC852 |
Spanish | WE8PC850 |
Swedish | WE8PC850 |
Turkish | TR8PC857 |
Other Frequently asked questions regarding NLS_LANG
What does the LANGUAGE
component of the NLS_LANG
parameter control?
The LANGUAGE
component of the NLS_LANG
parameter controls the operation of a subset of globalization support features. It specifies conventions such as the language used for Oracle error messages, sorting, day names, and month names. Each supported language has a unique name; for example, AMERICAN
, FRENCH
, or GERMAN
. The LANGUAGE
component specifies default values for the territory and character set components. If the LANGUAGE
component is not specified, then the value defaults to AMERICAN
.
What does the TERRITORY
component of the NLS_LANG
parameter control?
The TERRITORY
component of the NLS_LANG
parameter controls the operation of a subset of globalization support features. It specifies conventions such as the default date, monetary, and numeric formats. Each supported territory has a unique name; for example, AMERICA
, FRANCE
, or AUSTRIA
. If the territory is not specified, then the value is derived from the LANGUAGE
component.
How can I see what’s really stored in the database?
To find the real numeric value for a character stored in the database, you can use the DUMP
function:
The syntax of the function call is: DUMP( <value> [, <format> [, <offset> [, <length> ] ] ] )
Parameters:
<value>
– is the value to be displayed, for example, a column name containing the data.<format>
– is a number that describes the format in which bytes of the value are to be displayed: 8 – means octal, 10 – means decimal, 16 – means hexadecimal; other values between 0 and 16 mean decimal; values greater than 16 are a little confusing and mean: print bytes as ASCII characters if they correspond to printable ASCII codes, print them as “^x
” if they correspond to ASCII control codes and print them in hexadecimal otherwise; adding 1000 to the format number will add character set information for the character data type values to the return value.<offset>
– is the offset of the first byte of the value to display; negative values mean counting from the end.<length>
– is the number of bytes to display.
So for example:
SQL> SELECT DUMP(column, 1016) FROM table;
Typ=1 Len=39 CharacterSet=AL32UTF8: 227,131,143,227,131,170
returns the value of a column consisting of 3 Japanese characters in UTF-8
encoding. The 1st character is 227(*255)+131. You will probably need to convert this to UCS2 to verify the code point value with the Unicode Standard code page.
Where is the Character Conversion done?
Normally the character conversion is done on the client side for performance reasons. This is true from Oracle version 8.0.4 onwards. If the database is using a character set not known by the client then the conversion is done on the server side. This is true from Oracle version 8.1.6 onwards.
Windows SQL*Plus is not showing all my extended characters
If you see black squares instead of characters, you probably don’t have the right font defined for your code page. A font is a collection of glyphs (from “hieroglyphs”) that share a common appearance (typeface, character size). A font is used by the operating system to convert a numeric value into a graphical representation on screen. A font does not necessarily contain a graphical representation for all numeric values defined in the code page you are using. That’s why you sometimes get black squares on the screen if you change fonts and the new font has no representation for a certain symbol.
The Windows Character Set Map
utility can be used to see which glyphs are parts of a certain font.
On Windows 2000, XP and beyond:
Start -> Run...
Type charmap
, and click ok
.
Why do I get a question mark or inverted question mark back when selecting just inserted characters?
When characters are converted between the client’s and the database character set, or vice versa, the character should exist in both used character sets. If the character does not exist in the character set being converted to (the destination) then a replacement character is used. Some character sets specify replacement characters when translating from other specific character sets but for those character sets that do not, a default replacement character, such as a ?
, is used. It is important to note that a conversion from a replacement character back to the original character is not possible.
How do I check the code points managed by a UNIX Operating System?
To know which code point is generated for a character in a Unix environment, you can use the od
command:
echo "ü" | od -xc
0000000 bcc3 000a
303 274 \n
0000003
You can also check the character corresponding to a code point using the echo
command like this:
For Linux:
echo -e "\0303\0274"
ü
for Solaris, AIX, HP-UX, TRU64:
echo "\0303\0274"
ü
You can use Locale Builder or a printed code page (see links below) to verify that your native code page and NLS_LANG
setting properly correspond. If there is any ambiguity then use the command above to get the values for more than one character. For printed code pages, see:
What about command line tools like SQL*Loader, Import/Export, etc.?
Typically the NLS_LANG
needs to match the character set of the client terminal you are using. However, if you, for example, wish to load a file, you may find that your client terminal character set may not match the one that the file is encoded with. For example, your client could be using the iso885915@euro
character set as specified in its locale
environment variable LC_ALL="de_AT.iso885915@euro"
. The file, however, could be encoded with UTF-16
.
For tools like SQL*Loader you can, of course, temporarily change the NLS_LANG
on your client to match the character set of the file that you wish to load. An alternative to changing NLS_LANG
and the recommended approach is to specify the character set of the file using the CHARACTERSET
keyword in the .ctl
file. In that case, SQL*Loader will interpret the data in the file to be loaded in that character set regardless of the client character set setting specified by NLS_LANG
. Here is an example .ctl
file for utf16
:
-- Copyright (c) 2001 by Oracle Corporation
-- NAME
-- ulcase11.ctl - Load Data in the Unicode Character Set UTF-16
-- DESCRIPTION
-- Loads data in the Unicode character set UTF-16. The data is in little
-- Endean byte order. This means that depending on whether SQL*Loader is
-- running on a little Endean or a big Endean system, it will have to
-- byte swap the UTF-16 character data as necessary. This load uses
-- character length semantics, the default for the character set UTF-16.
--
-- This case study is modeled after case study 3 (ulcase3), which loads
-- variable length delimited (terminated and enclosed) data.
--
-- RETURNS
--
-- NOTES
-- None
-- MODIFIED (MM/DD/YY)
-- rpfau 02/06/01 - Merged rpfau_sqlldr_add_case_study_11
-- rpfau 01/30/01 - Creation
--
LOAD DATA
CHARACTERSET utf16
BYTEORDER little
INFILE ulcase11.dat
REPLACE
INTO TABLE EMP
FIELDS TERMINATED BY X'002c' OPTIONALLY ENCLOSED BY X'0022'
(empno integer external (5), ename, job, mgr,
hiredate DATE(20) "DD-Month-YYYY",
sal, comm,
deptno CHAR(5) TERMINATED BY ":",
projno,
loadseq SEQUENCE(MAX,1) )
From Oracle Database 9i onwards, the Export utilities always export user data, including Unicode data, in the character set of the database. The Import utilities automatically convert the user data to the character set of the target database.
In Oracle Database 8i, the Export utility exports user data converting it from the database character set to the character set specified by NLS_LANG
of the Export session. The Import utility first converts the data to the character set specified by NLS_LANG
of the Import session and then converts it to the character set of the target database. Care must be taken that the character set of NLS_LANG
for Export and Import sessions contain all characters to be migrated. This character set is usually chosen to be either the source database or the target database character set and it is usually the same for both Export and Import sessions. This choice is recommended especially with multibyte character sets, which pose some restrictions on export files. The Oracle Database 8i conversions to and from the NLS_LANG
character set happen in Oracle Database 9i for DDL statements contained in the Export file.
What about database links?
The NLS_LANG
on the server (or client) has no influence on character set conversion through a database link. Oracle Database will automatically do the conversion from the character set of the source database to the character set of the target database (or reverse).
What about multiple Oracle Homes on Windows?
There is nothing special with NLS_LANG
and multiple Oracle Homes on Windows. The parameter taken into account is the one specified in the ORACLE_HOME
registry key used by the Oracle Database. If the NLS_LANG
is set in the environment, it takes precedence over the value in the registry and is used for ALL Oracle Homes on the server/client.
The NLS_LANG
settings can be found in these registry keys:
HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE
HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\HOMEx
Is there an Oracle Unicode Client on Windows?
On Windows there are two kinds of tools/applications:
- Fully Unicode-enabled applications that accept Unicode code points and which can render them. It’s the applications that need to deal with Unicode. Windows provides the Unicode API but the Windows Graphical Interface (GUI) system itself is not Unicode “by nature”.
Fully Unicode-enabled applications can only show one glyph for a given Unicode code point. So there is NO confusion possible here, these applications will need to use a full Unicode font. If you have a full Unicode application, then you need to set the CHARACTERSET component ofNLS_LANG
toAL32UTF8
. - Standard ANSI application (like
sqlplusw.exe
) cannot use Unicode code points. So the Unicode code point stored in the database needs to be converted to an ANSI code point based on the correct setting of theNLS_LANG
. This allows Oracle to map the Unicode code point to the character set of the client itself. If the Unicode code point does not have a mapping to the character set of the client then a replacement character is used.
What is a Character Set or Code Page?
A character set is just an agreement on what numeric value a symbol has. A computer does not know “A
” or “B
“, it only knows the (binary) numeric value for that symbol, defined in the character set used by its Operating System (OS) or in hardware (firmware) for terminals. A computer can only manipulate numbers, which is why there is a need for character sets. Some examples are: ASCII
, an old 7-bit character set, ROMAN8
, an 8-bit character set on UNIX or UTF-8
a multi-byte character set.
A code page is a name for the Windows/DOS encoding schemes, for Oracle NLS you can consider it the same as a character set. You also have to distinguish between a FONT and a character set/code page. A font is used by the OS to convert a numeric value into a graphical “print” on the screen. The Wingdings font on Windows is the best example of a font where an “A
” is NOT shown as an “A
” on screen, but for the OS the numeric value represents an “A
“. So you don’t SEE it as an “A
“, but for Windows, it’s an “A
” and will be saved (or used) as an “A
“.
To better understand the explanation above, just open Microsoft Word, choose the Wingdings font, type your name (you will see symbols) and save this as an HTML file. If you open the HTML file with a text editor like Notepad you will see that in the <style>
section the fonts are declared and lower in the <body>
section you will find your name in plain text but with the style='font-family: Wingdings'
attribute. If you open the file in your browser, you will again see the Wingdings symbols. It is the presentation of the data that changes, but not the data itself.
It’s also possible with some fonts that you don’t see all the symbols defined in the code page that you are using, just because the creator of the font did not include a graphical representation for all the symbols in that font. That’s why you sometimes get black squares on the screen if you change fonts. On Windows, you can use the Character Map tool to see all the symbols defined in a font.
Why are there different Character Sets?
Two main reasons:
- Historically, vendors have defined different “character sets” for their hardware and software, mainly because there were no official standards.
- New character sets have been defined to support new languages. With an 8-bit character set, you are limited in the number of symbols you can support so there are different character sets for different written languages.
What is the difference between 7-bit, 8-bit and Unicode Character Sets?
A 7-bit character set can only represent 128 symbols (2^7 = 128).
An 8-bit character set can only represent 256 symbols (2^8 = 256).
Unicode (UTF-8
) is a multi-byte character set. Unicode has the capability to define over a million characters. For more information on Unicode see the white paper Oracle Unicode Database Support.
How to choose the right database character set?
A fundamental consideration for choosing a character set is ensuring it can handle any language that needs to be supported immediately and in the indeterminate future. Another often overlooked consideration is to consider what applications and technologies you may want to utilize or interact with the database. Remember that character set conversions can be tedious and expensive projects. It is best to choose a character set that will avoid ever having to go through a conversion. Use Locale Builder (from Oracle Database 9i onwards) to view the characters defined for a particular Oracle Database character set.
Choosing Unicode as the database character set ensures a strong foundation for whatever is built into and on top of the database. Oracle recommends using Unicode (AL32UTF8
) for all new system deployment. Migrating legacy systems to Unicode is also recommended. Deploying your systems today in Unicode offers many advantages in usability, compatibility, and extensibility. Oracle’s comprehensive support allows you to deploy high-performing systems faster and more efficiently while leveraging the true power of Unicode. Even if you don’t need to support multilingual data today or have any requirement for Unicode, it is still likely to be the best choice for a new system in the long run. It will ultimately save you time and money and give you competitive advantages.