Localization is the process of making your app ready to work well in different locales (or countries).
As you might know, not everyone uses the same character for decimal points or for thousands separators… or for currency.
These locales have names, and you can select one to use. For example, a US locale might write a number like:
100,000.00
Whereas in Brazil, the same might be written with the commas and decimal points swapped:
100.000,00
Makes it easier to write your code so it ports to other nationalities with ease!
Well, sort of. Turns out C only has one built-in locale, and it’s limited. The spec really leaves a lot of ambiguity here; it’s hard to be completely portable.
But we’ll do our best!
For these calls, include <locale.h>
.
There is basically one thing you can portably do here in terms of declaring a specific locale. This is likely what you want to do if you’re going to do locale anything:
(LC_ALL, ""); // Use this environment's locale for everything setlocale
You’ll want to call that so that the program gets initialized with your current locale.
Getting into more details, there is one more thing you can do and stay portable:
(LC_ALL, "C"); // Use the default C locale setlocale
but that’s called by default every time your program starts, so there’s not much need to do it yourself.
In that second string, you can specify any locale supported by your system. This is completely system-dependent, so it will vary. On my system, I can specify this:
(LC_ALL, "en_US.UTF-8"); // Non-portable! setlocale
And that’ll work. But it’s only portable to systems which have that exact same name for that exact same locale, and you can’t guarantee it.
By passing in an empty string (""
) for the second argument, you’re telling C, “Hey, figure out what the current locale on this system is so I don’t have to tell you.”
Because moving green pieces of paper around promises to be the key to happiness155, let’s talk about monetary locale. When you’re writing portable code, you have to know what to type for cash, right? Whether that’s “$”, “€”, “¥”, or “£”.
How can you write that code without going insane? Luckily, once you call setlocale(LC_ALL, "")
, you can just look these up with a call to localeconv()
:
struct lconv *x = localeconv();
This function returns a pointer to a statically-allocated struct lconv
that has all that juicy information you’re looking for.
Here are the fields of struct lconv
and their meanings.
First, some conventions. An _p_
means “positive”, and _n_
means “negative”, and int_
means “international”. Though a lot of these are type char
or char*
, most (or the strings they point to) are actually treated as integers156.
Before we go further, know that CHAR_MAX
(from <limits.h>
) is the maximum value that can be held in a char
. And that many of the following char
values use that to indicate the value isn’t available in the given locale.
Field | Description |
---|---|
char *mon_decimal_point |
Decimal pointer character for money, e.g. "." . |
char *mon_thousands_sep |
Thousands separator character for money, e.g. "," . |
char *mon_grouping |
Grouping description for money (see below). |
char *positive_sign |
Positive sign for money, e.g. "+" or "" . |
char *negative_sign |
Negative sign for money, e.g. "-" . |
char *currency_symbol |
Currency symbol, e.g. "$" . |
char frac_digits |
When printing monetary amounts, how many digits to print past the decimal point, e.g. 2 . |
char p_cs_precedes |
1 if the currency_symbol comes before the value for a non-negative monetary amount, 0 if after. |
char n_cs_precedes |
1 if the currency_symbol comes before the value for a negative monetary amount, 0 if after. |
char p_sep_by_space |
Determines the separation of the currency symbol from the value for non-negative amounts (see below). |
char n_sep_by_space |
Determines the separation of the currency symbol from the value for negative amounts (see below). |
char p_sign_posn |
Determines the positive_sign position for non-negative values. |
char p_sign_posn |
Determines the positive_sign position for negative values. |
char *int_curr_symbol |
International currency symbol, e.g. "USD " . |
char int_frac_digits |
International value for frac_digits . |
char int_p_cs_precedes |
International value for p_cs_precedes . |
char int_n_cs_precedes |
International value for n_cs_precedes . |
char int_p_sep_by_space |
International value for p_sep_by_space . |
char int_n_sep_by_space |
International value for n_sep_by_space . |
char int_p_sign_posn |
International value for p_sign_posn . |
char int_n_sign_posn |
International value for n_sign_posn . |
OK, this is a trippy one. mon_grouping
is a char*
, so you might be thinking it’s a string. But in this case, no, it’s really an array of char
s. It should always end either with a 0
or CHAR_MAX
.
These values describe how to group sets of numbers in currency to the left of the decimal (the whole number part).
For example, we might have:
2 1 0
--- --- --- $100,000,000.00
These are groups of three. Group 0 (just left of the decimal) has 3 digits. Group 1 (next group to the left) has 3 digits, and the last one also has 3.
So we could describe these groups, from the right (the decimal) to the left with a bunch of integer values representing the group sizes:
3 3 3
And that would work for values up to $100,000,000.
But what if we had more? We could keep adding 3
s…
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
but that’s crazy. Luckily, we can specify 0
to indicate that the previous group size repeats:
3 0
Which means to repeat every 3. That would handle $100, $1,000, $10,000, $10,000,000, $100,000,000,000, and so on.
You can go legitimately crazy with these to indicate some weird groupings.
For example:
4 3 2 1 0
would indicate:
$1,0,0,0,0,00,000,0000.00
One more value that can occur is CHAR_MAX
. This indicates that no more grouping should occur, and can appear anywhere in the array, including the first value.
3 2 CHAR_MAX
would indicate:
100000000,00,000.00
for example.
And simply having CHAR_MAX
in the first array position would tell you there was to be no grouping at all.
All the sep_by_space
variants deal with spacing around the currency sign. Valid values are:
Value | Description |
---|---|
0 |
No space between currency symbol and value. |
1 |
Separate the currency symbol (and sign, if any) from the value with a space. |
2 |
Separate the sign symbol from the currency symbol (if adjacent) with a space, otherwise separate the sign symbol from the value with a space. |
The sign_posn
variants are determined by the following values:
Value | Description |
---|---|
0 |
Put parens around the value and the currency symbol. |
1 |
Put the sign string in front of the currency symbol and value. |
2 |
Put the sign string after the currency symbol and value. |
3 |
Put the sign string directly in front of the currency symbol. |
4 |
Put the sign string directly behind the currency symbol. |
When I get the values on my system, this is what I see (grouping string displayed as individual byte values):
= "."
mon_decimal_point = ","
mon_thousands_sep = 3 3 0
mon_grouping = ""
positive_sign = "-"
negative_sign = "$"
currency_symbol = 2
frac_digits = 1
p_cs_precedes = 1
n_cs_precedes = 0
p_sep_by_space = 0
n_sep_by_space = 1
p_sign_posn = 1
n_sign_posn = "USD "
int_curr_symbol = 2
int_frac_digits = 1
int_p_cs_precedes = 1
int_n_cs_precedes = 1
int_p_sep_by_space = 1
int_n_sep_by_space = 1
int_p_sign_posn = 1 int_n_sign_posn
Notice how we passed the macro LC_ALL
to setlocale()
earlier… this hints that there might be some variant that allows you to be more precise about which parts of the locale you’re setting.
Let’s take a look at the values you can see for these:
Macro | Description |
---|---|
LC_ALL |
Set all of the following to the given locale. |
LC_COLLATE |
Controls the behavior of the strcoll() and strxfrm() functions. |
LC_CTYPE |
Controls the behavior of the character-handling functions157. |
LC_MONETARY |
Controls the values returned by localeconv() . |
LC_NUMERIC |
Controls the decimal point for the printf() family of functions. |
LC_TIME |
Controls time formatting of the strftime() and wcsftime() time and date printing functions. |
It’s pretty common to see LC_ALL
being set, but, hey, at least you have options.
Also I should point out that LC_CTYPE
is one of the biggies because it ties into wide characters, a significant can of worms that we’ll talk about later.