Proper handling of unicode strings

I'm currently in the process of writing an application which needs to support Unicode, but I'm still a little confused about how to handle it properly. Maybe someone can help me out here.

First off, is it valid to assume that e.g. UTF-8 strings are NUL-terminated? Would it be valid to call g_strdup on a UTF-8 string?

If not (and this is done quite often in the Unicode part of GLib), I assume I have to pass the byte length along with each string, right (which will bloat function declarations)?

_______________________________________________
gtk-list mailing list
gtk-list@...
http://mail.gnome.org/mailman/listinfo/gtk-list

Re: Proper handling of unicode strings

Yes, a UTF-8 string is a NUL-terminated, ASCII-compatible string. For all purposes except where you need to read it character by character (e.g. Gtk+/Pango "reading" the string to display it), you can just treat it like a normal ASCII string, so byte-oriented functions such as g_strdup work on it unchanged.

2008/7/6 LCID Fire <lcid-fire@...>:
> I'm currently in the process of writing an application which needs to

--
Please note that according to the German law on data retention, information on every electronic information exchange with me is retained for a period of six months.
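
To illustrate the point about byte-oriented functions, here is a minimal sketch in plain C. The helper dup_bytes() is a hypothetical stand-in written for this example, not GLib's actual g_strdup implementation; it only shows why a copy that stops at the terminating NUL byte preserves a UTF-8 string without knowing anything about its encoding.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a g_strdup-style copy: duplicates every byte
 * up to and including the terminating NUL. Nothing here is UTF-8 aware. */
char *dup_bytes(const char *s)
{
    assert(s != NULL);

    size_t n = strlen(s) + 1;   /* byte length plus the NUL terminator */
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;                /* caller frees; NULL if allocation failed */
}
```

This works because no byte of a multi-byte UTF-8 sequence is ever zero, so the first NUL byte really is the end of the string.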

Re: Proper handling of unicode strings

That's great, it simplifies a lot of things. But since one character might need more space than a gchar, is it safe to call strlen on that string?

Thanks

Milosz Derezynski wrote:
> Yes, a UTF-8 string is a NUL-terminated, ASCII-compatible string. For all
> purposes except where you need to read it character by character (e.g.
> Gtk+/Pango "reading" the string to display it), you can just treat it
> like a normal ASCII string.

Re: Proper handling of unicode strings

It's "safe" in the aforementioned sense: strlen() returns the length in bytes. If you want to count the characters in the UTF-8 string, you should use g_utf8_strlen() instead.

2008/7/7 LCID Fire <lcid-fire@...>:
> That's great - simplifies a lot of things. But since one character might
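
The difference between the two lengths can be sketched in plain C. The helper utf8_char_count() below is hypothetical and written only for this example; it assumes valid, NUL-terminated UTF-8 input and counts code points the simple way, by skipping continuation bytes. In a GLib program you would call g_utf8_strlen() rather than writing this yourself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts code points in a valid, NUL-terminated
 * UTF-8 string by skipping continuation bytes (bit pattern 10xxxxxx). */
size_t utf8_char_count(const char *s)
{
    assert(s != NULL);

    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if ((*p & 0xC0) != 0x80)    /* ASCII byte or lead byte of a sequence */
            count++;
    }
    return count;
}
```

For the string "na\xc3\xafve" (the word "naïve" encoded as UTF-8), strlen() reports 6 bytes while utf8_char_count() reports 5 characters.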

Re: Proper handling of unicode strings

On Mon, 7 Jul 2008 12:01:36 +0200 "Milosz Derezynski" <internalerror@...> wrote:
> It's "safe" in the aforementioned sense, but if you want to properly
> count characters in the UTF-8 string, you should use g_utf8_strlen()
> instead.
>
> 2008/7/7 LCID Fire <lcid-fire@...>:
> > That's great - simplifies a lot of things. But since one character
> > might need more space than a gchar is it safe to call strlen on
> > that string?

It is not just "safe" in the sense described above, but required if you need to know the byte length (say, to allocate storage on the heap). If you need to know the byte length, use strlen(). If you need to know the number of characters (which will be rare, unless you are thinking of converting, say, to UCS-4), then use g_utf8_strlen(). If you want to iterate over the string, then g_utf8_next_char() is handy.

Chris
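
As a rough sketch of the iteration and UCS-4 conversion mentioned above, here is a plain C version. Both helpers, utf8_decode_next() and utf8_to_ucs4(), are hypothetical names invented for this example; they assume valid UTF-8 and do no error checking. GLib ships its own g_utf8_next_char() and g_utf8_to_ucs4(), which are what a GLib program would normally use.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: decodes the code point starting at p, stores it in
 * *out and returns a pointer to the start of the next code point.
 * Assumes valid UTF-8; malformed input is not detected. */
const unsigned char *utf8_decode_next(const unsigned char *p, uint32_t *out)
{
    int extra;      /* number of continuation bytes that follow */
    uint32_t cp;

    if (*p < 0x80)      { cp = *p;        extra = 0; }  /* 0xxxxxxx */
    else if (*p < 0xE0) { cp = *p & 0x1F; extra = 1; }  /* 110xxxxx */
    else if (*p < 0xF0) { cp = *p & 0x0F; extra = 2; }  /* 1110xxxx */
    else                { cp = *p & 0x07; extra = 3; }  /* 11110xxx */

    p++;
    while (extra-- > 0)
        cp = (cp << 6) | (uint32_t)(*p++ & 0x3F);

    *out = cp;
    return p;
}

/* Hypothetical helper: converts a valid UTF-8 string to a heap-allocated,
 * zero-terminated UCS-4 array. The code point count goes into *len.
 * The caller frees the result; returns NULL if allocation fails. */
uint32_t *utf8_to_ucs4(const char *s, size_t *len)
{
    assert(s != NULL && len != NULL);

    /* strlen() gives the byte length, an upper bound on the code point
     * count, so (bytes + 1) elements always suffice. */
    size_t bytes = strlen(s);
    uint32_t *buf = malloc((bytes + 1) * sizeof *buf);
    if (buf == NULL)
        return NULL;

    size_t n = 0;
    const unsigned char *p = (const unsigned char *)s;
    while (*p != '\0')
        p = utf8_decode_next(p, &buf[n++]);

    buf[n] = 0;
    *len = n;
    return buf;
}
```

Sizing the buffer from strlen() over-allocates for non-ASCII text; counting characters first would give an exact size at the cost of a second pass over the string.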