C++Talk.NET Forum Index -> C++ language (comp.lang.c++)

imbue(locale) and file encoding

 
Ralf Goertz
Posted: Wed Nov 15, 2006 10:10 am    Post subject: imbue(locale) and file encoding

Hi,

since my previous post
<455440ad$0$30326$9b4e6d93@newsspool1.arcor-online.net> is still
unanswered, I'd like to rephrase my question. In order to read/write a
wstring in UTF-8 encoding it is *not* sufficient to imbue the stream
with a locale like "de_DE.UTF-8". Doing so only takes care of the facets
for decimal numbers and the like. Rather, one has to call
locale::global(locale("de_DE.UTF-8")). Does this behaviour conform to the
standard? And if so, why? I mean, why wouldn't
wcin.imbue(locale("de_DE.UTF-8")) make wcin accept UTF-8 multibyte
characters while still allowing 5,7 to be parsed as 5.7?

file wcintest.cc:
-------------
#include <iostream>
#include <string>
#include <locale>
using namespace std;

float f;
wstring euro;

int main(){
    locale l("de_DE.UTF-8");
    wcin.imbue(l);                       // per-stream locale for input
    locale::global(l); // (*)
    wcin>>f>>euro;                       // expects input like "5,70 €"
    wcout.imbue(locale("en_US.UTF-8"));  // en_US number format for output
    wcout<<f<<L" "<<euro<<endl;
}
-------------

Calling

$ echo "5,70 €" |./wcintest

in a UTF-8 environment gives

5.70 €

but only if the line marked (*) is present. Otherwise you only get

5.70

It seems as if the encoding part of the locale is ignored by the imbue
calls but I don't see why this should be the case.

I use g++ (GCC) 4.1.0 under Linux (i386).

Ralf

ondra.holub
Posted: Wed Nov 15, 2006 10:10 am    Post subject: Re: imbue(locale) and file encoding

I do not have Linux here (at work) at the moment, so I am only guessing.
Did you try changing the locale of the output stream to the German locale?

wcout.imbue(l);

Maybe the euro sign is not accepted by the US locale.

ondra.holub
Posted: Thu Nov 16, 2006 10:10 am    Post subject: Re: imbue(locale) and file encoding

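I think the tolower function is not really designed for C++. Facets
should be used instead, although the code looks a bit complicated:

#include <iostream>
#include <locale>

int main()
{
    // Locale names are platform-specific; "german" may have to be
    // replaced by a name such as "de_DE" on other systems.
    std::locale loc("german");
    // ctype<char> works byte by byte, so this assumes a single-byte
    // encoding (e.g. Latin-1) for the umlauts.
    char s[] = "äÖü";

    // sizeof(s) - 1 excludes the terminating '\0' from the range.
    std::use_facet< std::ctype<char> >(loc).tolower(s, s + sizeof(s) - 1);
    std::cout << s << std::endl;

    std::use_facet< std::ctype<char> >(loc).toupper(s, s + sizeof(s) - 1);
    std::cout << s << std::endl;

    return 0;
}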
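
For wide strings, the same facet-based approach might look like the
sketch below. It is only an illustration: to_lower_copy is a hypothetical
helper name, not something from the thread or the standard library, and
whether umlauts are actually converted depends on the locale passed in
and on what the platform provides for it.

```cpp
#include <locale>
#include <string>

// Hypothetical helper (assumption, not from the thread): returns a
// lower-cased copy of `text` using the ctype<wchar_t> facet of `loc`.
std::wstring to_lower_copy(std::wstring text, const std::locale& loc)
{
    if (text.empty())
        return text;

    const std::ctype<wchar_t>& facet =
        std::use_facet< std::ctype<wchar_t> >(loc);

    // tolower(first, last) converts the whole range in place.
    facet.tolower(&text[0], &text[0] + text.size());
    return text;
}
```

A call such as to_lower_copy(L"ÄÖÜ", std::locale("de_DE.UTF-8")) would be
the case of interest here. Note that constructing a named locale throws
std::runtime_error if that locale is not installed.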

Ralf Goertz
Posted: Thu Nov 16, 2006 10:10 am    Post subject: Re: imbue(locale) and file encoding

ondra.holub wrote:

Quote:
Hi. I tried it on Open SUSE 10.1 and the behaviour is exactly the same
as you described. There is no problem when using cin, cout and string,
but it does not work with the wide-character versions :-(

I would use cin, cout and string, but then there is the problem that
string.size() and string.substr() do not work as expected, since they
operate on bytes rather than on characters.

Quote:
With wide strings it also works when you set the global locale to
locale("") - the current user's system locale. Maybe the standard library
expects Latin-1 encoding by default, which is not correct for UTF-8
systems. But I am only guessing. Anyway, I think it is not a problem to
start the main function with locale::global(locale("")); and it should
work everywhere (hopefully).

Yeah, it works, but I don't see the logic. Suppose you want to convert a
German UTF-8-encoded text file with floats and euro signs into a
Latin-1-encoded file using the en_US locale. Then you always have to
change the global locale when switching between reading from wcin and
writing to wcout. If source and destination had the same encoding, one
imbue call for each stream would be sufficient. As I have found nothing
on the net that says "imbue calls do not care about encoding", I
suspect it might be a bug in my libstdc++ implementation of the
standard. It would be nice to know how other compilers/libraries deal
with that situation.
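
For comparison, here is a sketch of the "one imbue call per stream" idea
using file streams instead of wcin/wcout. Everything in it is an
assumption made for illustration: GermanPunct, german_utf8_locale,
read_amount and write_amount are hypothetical names, and
std::codecvt_utf8 (C++11, deprecated since C++17) is used only so that
the example does not depend on which named locales are installed. The
output side stays UTF-8 here; a named locale with a different encoding
could be imbued there instead if the system provides one.

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

// Hypothetical facet: German-style number punctuation, so that "5,70"
// parses as 5.7 without requiring a named system locale.
struct GermanPunct : std::numpunct<wchar_t> {
protected:
    wchar_t do_decimal_point() const override { return L','; }
    wchar_t do_thousands_sep() const override { return L'.'; }
};

// Classic locale plus a UTF-8 codecvt facet plus German punctuation.
std::locale german_utf8_locale()
{
    std::locale utf8(std::locale::classic(),
                     new std::codecvt_utf8<wchar_t>);
    return std::locale(utf8, new GermanPunct);
}

// Reads "<number> <word>" from a UTF-8 encoded file.
bool read_amount(const char* path, float& value, std::wstring& unit)
{
    std::wifstream in;
    in.imbue(german_utf8_locale());  // imbue before open(): the codecvt
    in.open(path);                   // facet applies from the first byte
    return static_cast<bool>(in >> value >> unit);
}

// Writes the pair with "C"-style number formatting, encoded as UTF-8.
bool write_amount(const char* path, float value, const std::wstring& unit)
{
    std::wofstream out;
    out.imbue(std::locale(std::locale::classic(),
                          new std::codecvt_utf8<wchar_t>));
    out.open(path);
    out << value << L' ' << unit << L'\n';
    return static_cast<bool>(out);
}
```

Neither function touches the global locale: each file stream takes both
its number format and its encoding from the locale imbued on it.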

Another problem I encountered is that tolower() does not work on wchar_t
umlauts, although I use the correct global locale.

Ralf

Ralf Goertz
Posted: Thu Nov 16, 2006 10:10 am    Post subject: Re: imbue(locale) and file encoding

I wrote:

Quote:

Yeah, it works, but I don't see the logic. Suppose you want to convert
a German UTF-8-encoded text file with floats and euro signs into a
Latin-1-encoded file using the en_US locale. Then you always have to
change the global locale when switching between reading from wcin and
writing to wcout.

I just found the following in Stroustrup (retranslated from German):

"Setting the global locale does not affect existing input/output
streams. The streams continue to use those locales that were assigned to
them using imbue() during their creation."
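
The quoted statement can be checked with string streams, which do not
depend on the terminal or on installed locales. This is a minimal
sketch: CommaPunct and format_before_and_after are hypothetical names
introduced for the illustration, and it assumes the global locale is
still the classic "C" locale when the function is called.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: uses ',' as the decimal point.
struct CommaPunct : std::numpunct<char> {
protected:
    char do_decimal_point() const override { return ','; }
};

// Formats 5.7 on a stream created before the global locale changes and
// on one created afterwards, then restores the previous global locale.
void format_before_and_after(std::string& before, std::string& after)
{
    std::ostringstream early;  // copies the global locale on construction

    const std::locale previous =
        std::locale::global(std::locale(std::locale(), new CommaPunct));

    std::ostringstream late;   // picks up the new global locale

    early << 5.7;              // still uses the locale it was created with
    late << 5.7;

    std::locale::global(previous);
    before = early.str();
    after = late.str();
}
```

Under that assumption, the stream created first keeps printing a decimal
point, while the one created after the locale::global() call prints a
decimal comma.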

Ralf

All times are GMT