New "Unicode" bundle in the Review trunk

by Hans-Jörg Bibiko

Dear all,

there's a new bundle called "Unicode" in the review trunk. It is meant
to be a place where we can gather any kind of scripts, commands, etc.
that relate to general Unicode issues, meaning non-ASCII text. It
should also be a place where we can gather scripts related to specific
languages such as Japanese, Chinese, Greek, etc.
This bundle is only the first stage. How we split it up later is a
future task.

So if anyone already has such scripts or is willing
to help, please let us/me know.

So far the bundle contains the following:

- Normalize according to canonical (de)composition of accented characters
- Delete Diacritics: façadë έ だ => facade ε た
- Convert to a similar Unicode Character: type the letter 'c' to get a  
list of "cçćĉċčƈ¢ɕʗḉ⒞ⓒc¢"
- Convert to Greek Character: type 'n' to get "ν"
- Show Unicode Name: select some letters to get a list of the Unicode  
names like LATIN SMALL LETTER A
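
As a rough illustration of the first two commands in the list (not the bundle's actual code), normalization and diacritic removal can be sketched with Python's standard unicodedata module. The sketch assumes Python 3, and the helper name delete_diacritics is made up for this example.

```python
import unicodedata

def delete_diacritics(text):
    """Drop combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    # Category "Mn" (nonspacing mark) covers accents, cedillas, dakuten, ...
    stripped = "".join(ch for ch in decomposed
                       if unicodedata.category(ch) != "Mn")
    # Recompose whatever is left so the result is in NFC again.
    return unicodedata.normalize("NFC", stripped)

# The sample string from the list above, written with escapes
print(delete_diacritics("fa\u00e7ad\u00eb \u03ad \u3060"))
```

Filtering on the "Mn" category is only one possible choice; a stricter or looser filter might suit some scripts better.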

I have many other scripts, but I need some time to polish them up.

To get this bundle, simply use the Subversion bundle's checkout with

http://macromates.com/svn/Bundles/trunk/Review/Bundles/Unicode.tmbundle

and save it to the Desktop or wherever you like.

I know that dealing with non-ASCII scripts in TM 1.x is a bit tricky, but
TM 2.0 will come ;)

Cheers,

--Hans
______________________________________________________________________
For new threads USE THIS: textmate@...
(threading gets destroyed and the universe will collapse if you don't)
http://lists.macromates.com/mailman/listinfo/textmate

Re: New "Unicode" bundle in the Review trunk

by Walter Dörwald

One small note:

In the character name script you should probably call unicodedata.name()
with a second argument in case the character has no name, i.e. replace

     res = a + " : " + unicodedata.name(a)

with

     res = a + " : " + unicodedata.name(a, "U+%04X" % ord(a))
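
For readers on a current interpreter, the same fallback idea might look like the following in Python 3. This is a sketch only; char_name is a hypothetical helper, not something from the bundle.

```python
import unicodedata

def char_name(ch):
    """Return the Unicode name of ch, or its code point if it has none."""
    # Without the second argument, name() raises ValueError for unnamed characters
    return unicodedata.name(ch, "U+%04X" % ord(ch))

for ch in "a\u00e7\x07":
    # U+0007 (BEL) is a control character and has no name in the database
    print("U+%04X : %s" % (ord(ch), char_name(ch)))
```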

Furthermore it would be great if this script could display all
information there is in the Python Unicode database, i.e. stuff like

    unicodedata.category()
    unicodedata.bidirectional()
    unicodedata.decimal()

etc.
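
A minimal sketch of what such a lookup could collect, assuming Python 3 and using only functions that exist in the standard unicodedata module. The function name properties and the choice of fields are illustrative.

```python
import unicodedata

def properties(ch):
    """Collect a few per-character properties from the unicodedata module."""
    return {
        "name": unicodedata.name(ch, None),
        "category": unicodedata.category(ch),
        "bidirectional": unicodedata.bidirectional(ch),
        "combining": unicodedata.combining(ch),
        # decimal() raises ValueError for non-digits unless a default is given
        "decimal": unicodedata.decimal(ch, None),
        "decomposition": unicodedata.decomposition(ch),
    }

for key, value in properties("\u00e9").items():
    print("%-14s: %r" % (key, value))
```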

Servus,
    Walter

Re: New "Unicode" bundle in the Review trunk

by Hans-Jörg Bibiko


On 30.05.2008, at 17:32, Walter Dörwald wrote:

> Hans-Joerg Bibiko wrote:
>> Dear all,
>> there's a new bundle called "Unicode" in the review trunk. It is  
>> meant to be a place where we can gather any kind of scripts,  
>> commands, etc. which are related to general Unicode issue, meaning  
>> non-ASCII. This should also a place ...
>
> One small note:
>
> In the character name script you should probably call  
> unicodedata.name() with a second argument in case the character has  
> no name, i.e. replace
>
>     res = a + " : " + unicodedata.name(a)
>
> with
>
>     res = a + " : " + unicodedata.name(a, "U+%04X" % ord(a))
Thanks for the hint! These are more or less the first scripts I have
written in Python ;)
I chose it because Python ships with some Unicode data by
default.

> Furthermore it would be great if this script could display all  
> information there is in the Python Unicode database, i.e. stuff like
>
>    unicodedata.category()
>    unicodedata.bidirectional()
>    unicodedata.decimal()
Yes. I have such a script in Perl, which also shows information about
Unicode code points etc.

Servus,

--Hans
Re: New "Unicode" bundle in the Review trunk

by Walter Dörwald

Here's another patch (against the current version). It shows both the
codepoint and the name.

BTW, you don't have to use a regular expression to split a string into
characters, simply iterating through it does the trick:

Index: Commands/Show Unicode Names.tmCommand
===================================================================
--- Commands/Show Unicode Names.tmCommand (revision 9813)
+++ Commands/Show Unicode Names.tmCommand (working copy)
@@ -8,11 +8,13 @@
  <string>#!/usr/bin/python
  import unicodedata
  import sys
-import re

-for a in re.compile("(?um)(.)").split(unicode(sys.stdin.read(), "UTF-8")):
-     if (len(a)==1) and (a != '\n'):
-          res = a + " : " + unicodedata.name(a, "U+%04X" % ord(a))
+for a in unicode(sys.stdin.read(), "UTF-8"):
+     if a != '\n':
+          res = u"%s : U+%04X" % (a, ord(a))
+          name = unicodedata.name(a, None)
+          if name:
+              res += u" : %s" % name
            print res.encode("UTF-8")</string>
  <key>fallbackInput</key>
  <string>character</string>
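
The patched loop relies on Python 2 (the unicode type and the print statement). A rough Python 3 equivalent is sketched below; describe is a hypothetical name, and the TextMate plumbing (stdin, command plist) is left out.

```python
import unicodedata

def describe(text):
    """Yield one 'char : U+XXXX : NAME' line per character, skipping newlines."""
    for ch in text:
        if ch == "\n":
            continue
        line = "%s : U+%04X" % (ch, ord(ch))
        name = unicodedata.name(ch, None)
        if name:
            line += " : %s" % name
        yield line

# U+E000 is a private-use code point, so it has no name
for line in describe("a\u03bd\n\ue000"):
    print(line)
```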


 >> Furthermore it would be great if this script could display all
 >> information there is in the Python Unicode database, i.e. stuff like
 >>
 >>    unicodedata.category()
 >>    unicodedata.bidirectional()
 >>    unicodedata.decimal()
 > Yes. I have such a script in Perl which also shows up info about Unicode
 > code points etc.

OK, now I see that the script displays information about every character
in the selection. Adding more info might be a space problem.

Another problem: Using Ctrl-Shift-U as the shortcut hides the "Convert
To Lowercase" command.

Servus,
    Walter

Re: New "Unicode" bundle in the Review trunk

by Hans-Jörg Bibiko

On 02.06.2008, at 00:04, Walter Dörwald wrote:

> Here's another patch (against the current version). It shows both  
> the codepoint and the name.
>
> BTW, you don't have to use a regular expression to split a string  
> into characters, simply iterating through it does the trick:
>
> Index: Commands/Show Unicode Names.tmCommand
> -for a in re.compile("(?um)(.)").split(unicode(sys.stdin.read(),  
> "UTF-8")):
> -     if (len(a)==1) and (a != '\n'):
> -          res = a + " : " + unicodedata.name(a, "U+%04X" % ord(a))
> +for a in unicode(sys.stdin.read(), "UTF-8"):
> +     if a != '\n':
> +          res = u"%s : U+%04X" % (a, ord(a))
> +          name = unicodedata.name(a, None)
> +          if name:
> +              res += u" : %s" % name
>            print res.encode("UTF-8")</string>
>   <key>fallbackInput</key>
>   <string>character</string>
Thanks! Just committed to the trunk.

> >> Furthermore it would be great if this script could display all
> >> information there is in the Python Unicode database, i.e. stuff  
> like
> >>
> >>    unicodedata.category()
> >>    unicodedata.bidirectional()
> >>    unicodedata.decimal()
> > Yes. I have such a script in Perl which also shows up info about  
> Unicode
> > code points etc.
I just added a prototype of 'Show Unicode Properties' to the bundle.


> Another problem: Using Ctrl-Shift-U as the shortcut hides the  
> "Convert To Lowercase" command.
Yes. That was a bad key combo. I have changed it temporarily to
CTRL+OPT+APPLE+U.

BTW: Can Python handle Unicode code points greater than U+FFFF, i.e.
those outside the Basic Multilingual Plane? I tried it out and found
that Python uses UTF-16 internally.
For example, UCS hex: 20000; UTF-16: D840 DC00.
I can print that character to TM, but unicodedata fails because it
expects one character, not two (?)

Servus,

--der Hans
Re: New "Unicode" bundle in the Review trunk

by Alexey Blinov

Sorry, but am I missing something? I get this error
-------------------------
Traceback (most recent call last):
  File "/tmp/temp_textmate.0WYiu4", line 50, in <module>
    result=dialog.menu([re.sub(r"(?=[^a-zA-Z0-9_ .\/\-\x7F-\xFF\n])",
r'\\', a) + "\t" + unicodedata.name(a, "U+%04X" % ord(a)) for a in
suggestions])
  File "/Applications/TextMate.app/Contents/SharedSupport/Support/lib/dialog.py",
line 51, in menu
    plist = to_plist(menu)
UnboundLocalError: local variable 'menu' referenced before assignment
-------------------------
when I try to run "Convert to Greek..." or "Convert to Similar..."

Alexey Blinov

Re: New "Unicode" bundle in the Review trunk

by Hans-Jörg Bibiko


On 2 Jun 2008, at 15:26, Alexey Blinov wrote:

> Sorry but do i miss something? I have that error
> -------------------------
> Traceback (most recent call last):
>  File "/tmp/temp_textmate.0WYiu4", line 50, in <module>
>    result=dialog.menu([re.sub(r"(?=[^a-zA-Z0-9_ .\/\-\x7F-\xFF\n])",
> r'\\', a) + "\t" + unicodedata.name(a, "U+%04X" % ord(a)) for a in
> suggestions])
>  File "/Applications/TextMate.app/Contents/SharedSupport/Support/lib/
> dialog.py",
> line 51, in menu
>    plist = to_plist(menu)
> UnboundLocalError: local variable 'menu' referenced before assignment
> -------------------------
> when try to "Convert to Greek..." or "Convert to Similar..."

You have to upgrade dialog.py in
/Applications/TextMate.app/Contents/SharedSupport/Support/lib

The old version didn't support UTF-8.

Cheers,

Hans

Re: New "Unicode" bundle in the Review trunk

by Walter Dörwald

Hans-Jörg Bibiko wrote:

> BTW: Can Python handle Unicode codepoints which are specified in Unicode
> pane B, meaning greater U+FFFF? I tried it out. I found out that Python
> uses UTF-16 internally.

At least the Python that ships with the OS uses 2-byte Unicode characters
with partial UTF-16 support:

Python 2.5.2 (r252:60911, Apr  8 2008, 18:54:00)
[GCC 3.3.5 (Debian 1:3.3.5-13)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
 >>> import sys
 >>> sys.maxunicode
65535

The size of a Unicode character is specified at compile time with the
--enable-unicode option, so you *could* compile a wide Python with:
./configure --enable-unicode=ucs4

> But e.g. UCS hex: 20000 ; UTF-16: D840 DC00 .
> I can print that character to TM but unicodedata fails because it
> expects one character but not two (?)

There are some spots in the Python code base where in narrow builds
surrogate pairs are interpreted properly as characters outside the BMP,
but unicodedata isn't one of them (so it's not actually real UTF-16
throughout). There's an open issue on the Python bugtracker about that:

http://bugs.python.org/issue1706460

So there are two options:

1) Apple starts compiling its Python with --enable-unicode=ucs4
2) Python gets fixed so that surrogate pairs can be passed to
unicodedata functions.

I think I might give 2) a try.
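
To make the surrogate arithmetic concrete, here is a small sketch of the mapping between a UTF-16 surrogate pair and its code point, using the D840 DC00 example from the question. It assumes Python 3; the function names are invented for illustration and are not part of unicodedata.

```python
import sys
import unicodedata

def combine_surrogates(high, low):
    """Turn a UTF-16 surrogate pair into the code point it encodes."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

def split_surrogates(codepoint):
    """Turn a code point above U+FFFF into its UTF-16 surrogate pair."""
    if not 0x10000 <= codepoint <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = codepoint - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)

cp = combine_surrogates(0xD840, 0xDC00)
print("U+%04X" % cp)
# On builds where sys.maxunicode is 0x10FFFF (every build since Python 3.3),
# the character can be handed to unicodedata as a single character.
if sys.maxunicode > 0xFFFF:
    print(unicodedata.name(chr(cp), "<no name>"))
```

On a narrow build like the one shown above, a script would have to do this combining itself before it could look anything up by code point.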

Servus,
    Walter

Re: New "Unicode" bundle in the Review trunk

by Walter Dörwald

Walter Dörwald wrote:

> Hans-Jörg Bibiko wrote:
>
>> On 02.06.2008, at 00:04, Walter Dörwald wrote:
>>> Here's another patch (against the current version). It shows both the
>>> codepoint and the name.
>>> [...]

Here are some more suggestions on the current bundle version:

To get the UTF-8 bytes of a character, you're doing the following:

     print "  UTF-8         : " + " ".join(repr(char.encode("UTF-8")).split('\\x')).lstrip("' ").rstrip("'").upper()

This only works for characters with a codepoint >= 128. The following
code should work better:

     print "  UTF-8         : %s" % " ".join(hex(ord(c))[2:].upper() for c in char)

Furthermore the code:

    decomp = unicodedata.decomposition(char).lstrip(' ').rstrip(' ')

can be simplified to:

    decomp = unicodedata.decomposition(char).strip()

(strip() strips from both ends and stripping all whitespace is the
default when no argument is given.)
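
A quick way to check the equivalence, sketched in Python 3. The second half assumes a canonical decomposition; compatibility decompositions carry a tag such as <compat> that this parsing would not handle.

```python
import unicodedata

raw = "  0065 0301 "
# Chained lstrip/rstrip with an explicit space and a bare strip() agree here
assert raw.lstrip(" ").rstrip(" ") == raw.strip() == "0065 0301"

# decomposition() returns space-separated hex code points ('' if there is none)
decomp = unicodedata.decomposition("\u00e9").strip()
parts = [chr(int(cp, 16)) for cp in decomp.split()]
print(decomp, [unicodedata.name(p) for p in parts])
```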

Hope that helps.

Servus,
    Walter


Re: New "Unicode" bundle in the Review trunk

by Walter Dörwald

Walter Dörwald wrote:

> Walter Dörwald wrote:
>
>> Hans-Jörg Bibiko wrote:
>>
>>> On 02.06.2008, at 00:04, Walter Dörwald wrote:
>>>> Here's another patch (against the current version). It shows both
>>>> the codepoint and the name.
>>>> [...]
>
> Here's another suggestions on the current Bundle version:
>
> To get the UTF-8 bytes of a character, you're doing the following:
>
>     print "  UTF-8         : " + "
> ".join(repr(char.encode("UTF-8")).split('\\x')).lstrip("'
> ").rstrip("'").upper()
>
> This only works for characters with a codepoint >= 128. The following
> code should work better:
>
>     print "  UTF-8         : %s" % " ".join(hex(ord(c))[2:].upper() for
> c in char)

Oops, that was of course supposed to be:

      print "  UTF-8         : %s" % " ".join(hex(ord(c))[2:].upper() for c in char.encode("utf-8"))
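
In Python 3 the same idea gets a little shorter, because iterating over a bytes object already yields integers. The sketch below uses a hypothetical helper utf8_hex and pads each byte to two hex digits, which hex(...)[2:] does not do for bytes below 0x10.

```python
def utf8_hex(ch):
    """Return the UTF-8 bytes of ch as space-separated two-digit hex values."""
    # Iterating over bytes yields ints in Python 3, so no ord() is needed
    return " ".join("%02X" % byte for byte in ch.encode("utf-8"))

for ch in ("a", "\u00e9", "\U00020000"):
    print("  UTF-8         : %s" % utf8_hex(ch))
```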

Servus,
    Walter

Re: New "Unicode" bundle in the Review trunk

by Hans-Jörg Bibiko


On 3 Jun 2008, at 17:29, Walter Dörwald wrote:

> Walter Dörwald wrote:
>> Walter Dörwald wrote:
>>> Hans-Jörg Bibiko wrote:
>>>
>>>> On 02.06.2008, at 00:04, Walter Dörwald wrote:
>>>>> Here's another patch (against the current version). It shows  
>>>>> both the codepoint and the name.
>>>>> [...]
>> Here's another suggestions on the current Bundle version:
>> To get the UTF-8 bytes of a character, you're doing the following:
>>    print "  UTF-8         : " + "  
>> ".join(repr(char.encode("UTF-8")).split('\\x')).lstrip("'  
>> ").rstrip("'").upper()
>> This only works for characters with a codepoint >= 128. The  
>> following code should work better:
>>    print "  UTF-8         : %s" % " ".join(hex(ord(c))[2:].upper()  
>> for c in char)
>
> Oops, that was of course supposed to be:
>
>     print "  UTF-8         : %s" % " ".join(hex(ord(c))[2:].upper()  
> for  c in char.encode("utf-8"))

Once again, thanks a lot for teaching me Python ;)
The code changes are in the SVN trunk.


--Hans
Re: New "Unicode" bundle in the Review trunk

by Hans-Jörg Bibiko


On 03.06.2008, at 17:29, Walter Dörwald wrote:

> Walter Dörwald wrote:
>> Walter Dörwald wrote:
>>> Hans-Jörg Bibiko wrote:
>>>
>>>> On 02.06.2008, at 00:04, Walter Dörwald wrote:
>>>>> Here's another patch (against the current version). It shows  
>>>>> both the codepoint and the name.
>>>>> [...]
>> Here's another suggestions on the current Bundle version:
>> To get the UTF-8 bytes of a character, you're doing the following:
>>     print "  UTF-8         : " + " ".join(repr(char.encode
>> ("UTF-8")).split('\\x')).lstrip("' ").rstrip("'").upper()
>> This only works for characters with a codepoint >= 128. The  
>> following code should work better:
>>     print "  UTF-8         : %s" % " ".join(hex(ord(c))[2:].upper
>> () for c in char)
>
> Oops, that was of course supposed to be:
>
>      print "  UTF-8         : %s" % " ".join(hex(ord(c))[2:].upper
> () for  c in char.encode("utf-8"))
Could it be that this isn't allowed in Python for Tiger? I get an  
error for invalid syntax referring to 'for'
Mac OSX 10.4.11 ppc; Python 2.4.2

On my 10.5.3 Mac it works(?)

--Hans
Re: New "Unicode" bundle in the Review trunk

by Walter Dörwald

Hans-Jörg Bibiko wrote:

>
> On 03.06.2008, at 17:29, Walter Dörwald wrote:
>
>> Walter Dörwald wrote:
>>> Walter Dörwald wrote:
>>>> Hans-Jörg Bibiko wrote:
>>>>
>>>>> On 02.06.2008, at 00:04, Walter Dörwald wrote:
>>>>>> Here's another patch (against the current version). It shows both
>>>>>> the codepoint and the name.
>>>>>> [...]
>>> Here's another suggestions on the current Bundle version:
>>> To get the UTF-8 bytes of a character, you're doing the following:
>>>     print "  UTF-8         : " + "
>>> ".join(repr(char.encode("UTF-8")).split('\\x')).lstrip("'
>>> ").rstrip("'").upper()
>>> This only works for characters with a codepoint >= 128. The following
>>> code should work better:
>>>     print "  UTF-8         : %s" % " ".join(hex(ord(c))[2:].upper()
>>> for c in char)
>>
>> Oops, that was of course supposed to be:
>>
>>      print "  UTF-8         : %s" % " ".join(hex(ord(c))[2:].upper()
>> for  c in char.encode("utf-8"))
> Could it be that this isn't allowed in Python for Tiger? I get an error
> for invalid syntax referring to 'for'
> Mac OSX 10.4.11 ppc; Python 2.4.2
>
> On my 10.5.3 Mac it works(?)

AFAICR Tiger has Python 2.3, which didn't support generator expressions.

The following should work:

print "  UTF-8         : %s" % " ".join([hex(ord(c))[2:].upper() for c in char.encode("utf-8")])

(i.e. replace the generator expression with a list comprehension by
adding [] around the join argument.)
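
The two spellings produce the same string; only the syntax differs. A small side-by-side sketch, written for Python 3 (where both forms are accepted), with variable names chosen for illustration:

```python
data = "\u00e9".encode("utf-8")

# Generator expression: this syntax was introduced in Python 2.4
as_generator = " ".join("%02X" % b for b in data)

# List comprehension: the bracketed form older interpreters also parse
as_list = " ".join(["%02X" % b for b in data])

print(as_generator == as_list, as_list)
```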

Hope that helps!

Servus,
    Walter

Re: New "Unicode" bundle in the Review trunk

by Hans-Jörg Bibiko


On 03.06.2008, at 22:44, Walter Dörwald wrote:

> print "  UTF-8         : %s" % " ".join([hex(ord(c))[2:].upper()  
> for c in char.encode("utf-8")])

Thanks. This did the trick.
I thought that I did try out this [ ]-notation, but anyway ... the  
main thing is that it works :)

--Hans
Re: New "Unicode" bundle in the Review trunk

by Hans-Jörg Bibiko

Hi,

there are some more commands available:





Furthermore I wrote some basic syntax highlighting to display
'no ASCII', 'no Latin', and 'all combining diacritics' characters.

Most of the commands also support Unicode code points higher than U+FFFF.
[But be careful! So far TM 1 can display these characters (more or
less), but each of them counts as two characters in TM 1! If you place
the caret between the two and invoke a command, TM will crash
immediately! But TM 2 supports these characters ;) ]

I wrote a new Chinese Traditional <> Simplified converter. It also
converts characters > U+FFFF (Apple's does not ;) ), and it shows all
characters that have more than one counterpart as snippets
{A=B|C}. There's a command that shows a menu with B, C, etc.
to disambiguate (Apple does not do that).
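
The {A=B|C} marking could be sketched roughly as follows. This is not the bundle's implementation: the two-entry table is a tiny illustrative sample rather than the bundle's data, and to_traditional is a hypothetical name. Python 3 is assumed.

```python
# Tiny illustrative sample, NOT the bundle's data: simplified -> traditional
SAMPLE_S2T = {
    "\u5934": ["\u982d"],            # U+5934 has one traditional counterpart
    "\u53d1": ["\u767c", "\u9aee"],  # U+53D1 has two possible counterparts
}

def to_traditional(text, table=SAMPLE_S2T):
    """Convert text, marking ambiguous characters as {A=B|C}."""
    out = []
    for ch in text:
        candidates = table.get(ch, [ch])  # unknown characters pass through
        if len(candidates) == 1:
            out.append(candidates[0])
        else:
            out.append("{%s=%s}" % (ch, "|".join(candidates)))
    return "".join(out)

print(to_traditional("\u5934\u53d1"))
```

A follow-up command would then let the user pick one candidate from each braced group.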

All Unicode data come from the latest Unicode 5.1, and they are easy
to upgrade.

Show Unicode Properties also shows all known information about
Chinese/Japanese/Korean ideographs, such as radical, readings, Wubi Xing
codes, etc. All these data come from Apple's Character Palette
internals ;)
But I am thinking about integrating Unicode's Unihan database. That zip
file (6 MB) won't be part of the bundle. Anyone who wants to use it
can download it (I will provide a command for that).

Last but not least I want to say thank you to Walter Dörwald who  
helped me a lot with the Python scripts.

Cheers,

--Hans





Attachment: uni.jpg (64K)