Jsoftware
High-Performance Development Platform

The APL character set strikes from beyond the grave!

by Devon McCormick

 Members of the Forum -

I just tracked down a bug in "fixargs.ijs" whereby it failed to fully
convert my old code from the "y." argument convention to "y", etc.

The function "fixarg" seems to fail silently for any file with double-byte
characters in it.  My subject line refers to (one place) where the problem
cropped up: I have some code converted from APL in which I've retained the
old APL code as comments.  Once again, that lovely character set causes
problems.

The root cause appears to be in the regular expression code which apparently
doesn't play well with the double-byte characters.  It fails to match the
target string if there's such a character anywhere in the source string.

Here's a short example of the problem:

   load 'regex'
   ]str=. a.{~67 104 97 114 97 99 116 101 114 32 133 32 110 117 109 98 101 114
Character ? number

   'er' rxmatches str   NB. No result though there should be two matches.
   'er' rxmatch str
_1 0

NB. But if we replace one (double-byte flag) character by a space:
   'er' rxmatches (' ') 10}str
7 2

16 2

The good fix for this - removing the offending characters, fixing the code,
then putting the removed characters back - appears complicated by the fact
that the (initial) replacement in "fixarg" - the line
"y=. ((sx;,2);x) rxrplc y" - changes the length of the string.  It's
probably OK to remove these characters without restoring them, but I'm not
exactly sure how to identify them.
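
A minimal sketch of one way to identify such characters, written in Python
rather than J and assuming that "offending" simply means any byte with a
value above 127. The function names are hypothetical and are not part of
"fixargs.ijs" or the regex library:

```python
# Byte values copied from the example string above.
data = bytes([67, 104, 97, 114, 97, 99, 116, 101, 114, 32, 133, 32,
              110, 117, 109, 98, 101, 114])


def non_ascii_positions(buf: bytes) -> list[int]:
    """Return the index of every byte whose value is above 127."""
    return [i for i, b in enumerate(buf) if b > 127]


def strip_non_ascii(buf: bytes) -> bytes:
    """Drop every byte above 127, keeping the plain ASCII bytes."""
    return bytes(b for b in buf if b < 128)


print(non_ascii_positions(data))  # [10]
print(strip_non_ascii(data))
```

Index 10 is the same position that the "10}str" amendment above overwrites
with a space. Recording the positions before stripping is what would allow
the bytes to be put back later, provided the intervening edit does not
shift them.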

Any ideas?

Thanks,

Devon

--
Devon McCormick, CFA
^me^ at acm.
org is my
preferred e-mail
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Re: The APL character set strikes from beyond the grave!

by Devon McCormick

It looks like the fix may be as simple as replacing the single "fread" with
"ufread" - I'm testing it now.
Though "ufread.ijs" is available on the J Wiki, it does not appear to be
part of the standard distribution.

On 6/30/08, Devon McCormick <devonmcc@...> wrote:

>
>  Members of the Forum -
>
> I just tracked down a bug in "fixargs.ijs" whereby it failed to fully
> convert all my old code from using the "y." to "y" etc. argument convention.

Re: The APL character set strikes from beyond the grave!

by bill lam-2

This is not a bug in fixargs. The root cause is that your APL characters are
not valid UTF-8. I bet those lovely characters will not display inside J602.

Turn off UTF-8 mode in regex with rxutf8 0
or, even better, convert those APL characters to true UTF-8.
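
To illustrate both points, here is a small Python sketch (not J) that
checks whether the example bytes form valid UTF-8 and then re-encodes them.
The source encoding is an assumption: cp1252 is used only as a plausible
single-byte code page, and the real files may use a different one. The
function names are hypothetical:

```python
# Byte values copied from the example string in the original message.
data = bytes([67, 104, 97, 114, 97, 99, 116, 101, 114, 32, 133, 32,
              110, 117, 109, 98, 101, 114])


def is_valid_utf8(buf: bytes) -> bool:
    """Return True if buf decodes cleanly as UTF-8."""
    try:
        buf.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def to_utf8(buf: bytes, source_encoding: str = "cp1252") -> bytes:
    """Re-encode single-byte text as UTF-8 (source encoding is assumed)."""
    return buf.decode(source_encoding).encode("utf-8")


converted = to_utf8(data)
print(is_valid_utf8(data))       # False: a lone byte 133 is not valid UTF-8
print(is_valid_utf8(converted))  # True
```

Byte 133 (hex 85) has the bit pattern of a UTF-8 continuation byte, so it
cannot appear on its own after a plain ASCII space.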

regards,


Re: The APL character set strikes from beyond the grave!

by Devon McCormick

Actually, the characters do display properly in both the session manager and
under emacs if I set my font to Dyalog.

As I mentioned, changing "fread" to "ufread" seems to fix the problem.


Re: The APL character set strikes from beyond the grave!

by bill lam-2

Did the J602 session manager display a glyph for characters above 127, e.g.
the 133 in your example?


Re: The APL character set strikes from beyond the grave!

by Devon McCormick

No, but that example was actually taken from a different file with a single
odd character, since it was a simpler illustration of the problem.

An example from the file with the APL code is:

   a.{~70 76 83 226 128 158 40 126 70 76 79 87 194 185 88 70 41 197 161 70 76 83
FLS„(~FLOW¹XF)šFLS

which displays incorrectly in my mail-writing window but OK in J.
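
As a cross-check outside J, the following Python sketch decodes the same
byte values as UTF-8. Unlike the lone 133 in the earlier example, this
sequence decodes without error, with the multi-byte groups mapping to
single characters. The variable names are only for illustration:

```python
# Byte values copied from the J expression above.
apl_bytes = bytes([70, 76, 83, 226, 128, 158, 40, 126, 70, 76, 79, 87,
                   194, 185, 88, 70, 41, 197, 161, 70, 76, 83])

# Strict decoding raises UnicodeDecodeError on any invalid sequence.
text = apl_bytes.decode("utf-8")

print(text)
print(len(apl_bytes), len(text))  # 22 bytes, 18 characters
```

Here 226 128 158, 194 185, and 197 161 each decode to one character, which
is why the byte count and the character count differ.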

