Summary of bidi branch?


by David Kastrup


Hi,

is there somewhere a summary of where the bidi branch stands nowadays?
How synched is it to actual developments in which branch?

I just got back from a conference where somebody doing critical
editions of Arabic text said that pretty much the only usable editor
(as in: renders characters correctly) for Unicode R-to-L was Unipad
under Windows.

Do people actually use emacs-bidi nowadays?  Does it work for writing?
Just for Hebrew, or other R-to-L scripts, too?

What about crazy things like T-to-B scripts (some Japanese and/or
Chinese variants IIRC)?

I'd be willing to try creating a Yiddish input encoding, though I'd
probably have to think quite a bit of how to encode the various
letters used in Hebrew words.  I don't think there is a standard
unambiguous transliteration scheme for those around.

--
David Kastrup



_______________________________________________
emacs-bidi mailing list
emacs-bidi@...
http://lists.gnu.org/mailman/listinfo/emacs-bidi

Re: Summary of bidi branch?

by Eli Zaretskii

> From: David Kastrup <dak@...>
> Date: Wed, 28 Mar 2007 16:11:25 +0200
>
> is there somewhere a summary of where the bidi branch stands nowadays?

Not that I know of, but I don't think there's much to summarize; see
below.

> How synched is it to actual developments in which branch?

I think no one synchs it.  But the patch is quite localized, so it
shouldn't be hard to merge it with the current CVS HEAD.  At least I
hope so.

> I just got back from a conference where somebody doing critical
> editions of Arabic text said that pretty much the only usable editor
> (as in: renders characters correctly) for Unicode R-to-L was Unipad
> under Windows.

Yes, it's very sad, especially since the core reordering code was
written 5 and a half years ago.  (However, I think there's also Yudit,
http://www.yudit.org/.)

> Do people actually use emacs-bidi nowadays?

I cannot imagine that someone uses it, since it crashes the moment you
turn on the bidi display option, even if the buffer consists of strict
left-to-right characters (e.g., ASCII).  The bidi display engine needs
work before it becomes even marginally usable.  Unfortunately, I don't
have anywhere near the time needed for this kind of job.  I can help
with advice and explanations about the code, though.

The reordering code was extensively tested wrapped in a stand-alone
program, so I expect most of the debugging will concern the way Emacs
calls the buffer iterator and moves the locus of the iteration around.
The testing I did emulated linear iteration through buffer text, which
is probably somewhat simpler than what Emacs actually does when it
prepares the next redisplay.

> Does it work for writing?  Just for Hebrew, or other R-to-L scripts,
> too?

It is supposed to work for all scripts that need bidirectional
display, provided that the directionality properties of the characters
are set up properly, which should already be so in Emacs 23, because
the properties are loaded from the Unicode character database.
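
As a small illustration of the directionality properties mentioned
above, the following sketch queries the Unicode bidi class of a few
characters.  It assumes an Emacs that provides
`get-char-code-property' (Emacs 23 or later); the helper name
`sketch-bidi-class' is made up for this example and is not part of the
bidi branch.

```elisp
;; Illustrative helper (hypothetical name): look up the Unicode
;; bidi class of a character from the character database.
(defun sketch-bidi-class (ch)
  "Return the Unicode bidi class symbol of character CH."
  (get-char-code-property ch 'bidi-class))

;; Latin a, digit 1, space, HEBREW LETTER ALEF, ARABIC LETTER AIN.
(mapcar #'sketch-bidi-class '(?a ?1 ?\s ?\u05d0 ?\u0639))
;; => (L EN WS R AL)
```

The classes R and AL are the strong right-to-left ones; a reordering
engine works from these values rather than from script names.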

The same code could be used, with a slightly different API, to produce
printed matter from bidirectional text.  I thought about both of these
applications during design and implementation of the reordering code.

> What about crazy things like T-to-B scripts (some Japanese and/or
> Chinese variants IIRC)?

The bidi reordering code I wrote doesn't support that.  It only
supports the functionality described in Unicode Standard Annex #9
(a.k.a. UAX #9).

> I'd be willing to try creating a Yiddish input encoding, though I'd
> probably have to think quite a bit of how to encode the various
> letters used in Hebrew words.  I don't think there is a standard
> unambiguous transliteration scheme for those around.

I don't understand: Yiddish uses Hebrew letters, and is written right
to left, like Hebrew.  So why do you need a new encoding and a new
input method?  You can just use the Hebrew input method (and hit the
same bidi non-support problem ;-).

What am I missing?



Re: Summary of bidi branch?

by Eli Zaretskii

> Cc: emacs-bidi@...
> From: David Kastrup <dak@...>
> Date: Wed, 28 Mar 2007 23:48:48 +0200
>
> U+05F0   HEBREW LIGATURE YIDDISH DOUBLE VAV (U+05F0)
> U+05F1 HEBREW LIGATURE YIDDISH VAV YOD (U+05F1)
> U+05F2 HEBREW LIGATURE YIDDISH DOUBLE YOD (U+05F2)

They are marked Yiddish because Hebrew words never use such
combinations of letters.  But otherwise they are part of the Hebrew
character set.

> That people writing occasional Yiddish texts might not be used to a
> Hebrew typewriter and would rather get along with the Latin-character
> input based YIVO transliterations.

You could indeed use YIVO, but you could also simply use German.
That's because many Yiddish words are actually Hebrew transliterations
of German words, and even words whose origin is Hebrew are written in
Latin-like transliterations, by adding transliterations of vowels
which the Hebrew original doesn't consider part of the word.  For
example, where in a Hebrew word an `a' is pronounced, the Yiddish
transliteration would add an `א', where `e' is pronounced, it would
add a `ע', etc.



Re: Summary of bidi branch?

by Eli Zaretskii

> Date: Thu, 29 Mar 2007 23:34:49 -0400
> From: Michael Blaustein <Michael.Blaustein@...>
> Cc: David Kastrup <dak@...>, emacs-bidi@...
>
> It's not that bad!  (Assuming you mean emacs-bidi-0.9.1 at
> www.m17n.org.)

No, that's not what I meant.  David was asking about the bidi branch
of the Emacs CVS, which has nothing in common with the m17n version.



Re: Summary of bidi branch?

by Eli Zaretskii

> Cc: emacs-bidi@...
> From: David Kastrup <dak@...>
> Date: Thu, 29 Mar 2007 08:44:10 +0200
>
> > You could indeed use YIVO, but you could also simply use German.
>
> For getting Hebrew letters?

Yes.  In Yiddish, Hebrew letters are used as transliterations of Latin
letters.

> > That's because many Yiddish words are actually Hebrew
> > transliterations of German words, and even words whose origin is
> > Hebrew are written in Latin-like transliterations, by adding
> > transliterations of vowels which the Hebrew original doesn't
> > consider part of the word.  For example, where in a Hebrew word an
> > `a' is pronounced, the Yiddish transliteration would add an `א',
>
> Uh, no.  The vowel mark is missing.

These marks are redundant (as they are in Hebrew): any speaker of the
language will have no difficulty reading the word without the
diacriticals (so-called Nikkud).

> Well, I know how to transliterate Yiddish with Latin characters.  But
> that's not the point.

Frankly, I don't know what the point is.  I thought you needed a way
to write Yiddish without bidi support, so I suggested a
transliteration, but perhaps you want something else.

> The usual alphabet used with Yiddish is a
> slightly modified Hebrew alphabet (pronunciation is somewhat different
> from most Hebrew words

There's no standard for Yiddish pronunciation, but most Yiddish
speakers use German pronunciation.



Re: Summary of bidi branch?

by Eli Zaretskii

> Cc: Michael Blaustein <Michael.Blaustein@...>, emacs-bidi@...
> From: David Kastrup <dak@...>
> Date: Fri, 30 Mar 2007 13:54:09 +0200
>
> Eli Zaretskii <eliz@...> writes:
>
> >> Date: Thu, 29 Mar 2007 23:34:49 -0400
> >> From: Michael Blaustein <Michael.Blaustein@...>
> >> Cc: David Kastrup <dak@...>, emacs-bidi@...
> >>
> >> It's not that bad!  (Assuming you mean emacs-bidi-0.9.1 at
> >> www.m17n.org.)
> >
> > No, that's not what I meant.  David was asking about the bidi branch
> > of the Emacs CVS, which has nothing in common with the m17n version.
>
> As far as I can tell, m17n provides a Mule based on 20.x, or on a 21
> pretest at most.
>
> Hard to figure out.  That code likely is not of much help, right?

There's some history to the m17n bidi version.  A patch to handle bidi
display was submitted to Emacs back when v21.1 was in the last stages of
development.  It was rejected (based on negative reaction of Gerd
Moellmann, the then head maintainer and the main developer of the v21
display engine) because it used techniques that defeated all the usual
display optimizations used by Emacs, such as when you just move the
cursor or insert or delete a single character.  The patch also used a
constant-size buffer to stack characters while they were being
reordered -- which doesn't scale well to very long lines.

The discussions that followed prompted me to design and implement a
bidi reordering engine that was crafted from the ground up to be
compatible with the Emacs redisplay requirements.  (It wasn't easy,
since UAX#9 describes the reordering in a way that assumes a
batch-style mechanism, i.e. that a string of characters is passed to
the engine and is expected to be reordered by the engine en bloc.  By
contrast, Emacs wants to reorder characters one by one, as it walks
its iterator through buffer text.)
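
To make the "batch-style" formulation concrete, here is a deliberately
tiny sketch that takes a whole string in logical order and returns it
in visual order.  It is not the UAX#9 algorithm and not the code on
the bidi branch: it assumes a left-to-right paragraph, treats only the
strong classes R and AL as right-to-left, and ignores neutrals,
numbers, and embedding levels.  The function names are invented for
the example, and `get-char-code-property' is assumed to be available
(Emacs 23 or later).

```elisp
(defun toy-bidi-rtl-p (ch)
  "Return non-nil if CH has a strong right-to-left bidi class (R or AL)."
  (memq (get-char-code-property ch 'bidi-class) '(R AL)))

(defun toy-bidi-visual-order (logical)
  "Return string LOGICAL with each maximal run of strong RTL chars reversed.
This is a batch operation: it needs the whole string up front."
  (let ((result "")   ; visual-order text produced so far
        (run ""))     ; current RTL run, collected already reversed
    (dolist (ch (string-to-list logical))
      (if (toy-bidi-rtl-p ch)
          ;; Prepending each character reverses the run as it grows.
          (setq run (concat (string ch) run))
        ;; A non-RTL character ends the run: flush it, then the character.
        (setq result (concat result run (string ch))
              run "")))
    ;; Flush a run that extends to the end of the string.
    (concat result run)))

;; ALEF BET GIMEL between two Latin words comes out as GIMEL BET ALEF.
(toy-bidi-visual-order "abc \u05d0\u05d1\u05d2 def")
;; => "abc \u05d2\u05d1\u05d0 def"
```

A display engine that asks for one character at a time cannot call a
function like this on every step without rescanning the line, which is
the mismatch the paragraph above describes.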

I understand that the rejected patch later became the basis for what
you now find on the m17n site.

In retrospect, perhaps the decision to reject the m17n patch was a
bad call, since we could have had a bidi Emacs in v21.x and gathered
valuable user experience by now.  But Gerd felt very strongly about
his opposition, and I at least trusted his judgment, since no one knew
the Emacs display engine as well as he did, and perhaps no one does
even now.



Re: Summary of bidi branch?

by Eli Zaretskii

> Cc: emacs-bidi@...
> From: David Kastrup <dak@...>
> Date: Fri, 30 Mar 2007 14:20:33 +0200
>
> > Yes.  In Yiddish, Hebrew letters are used as transliterations of
> > Latin letters.
>
> Uh, Eli?  I know Yiddish.

I never assumed you didn't.

> Uh no.  I need a way to write Yiddish with Hebrew letters, and so I
> was thinking of making emacs-bidi more interesting by adding (myself)
> another input encoding yielding Hebrew letters, based on the YIVO
> transliteration.

I think it would be a good addition, not only in the bidi Emacs.
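
A rough sketch of what such a Quail package might look like follows.
Everything in it is an assumption made for illustration: the package
name "yiddish-yivo-sketch" is invented, the table covers only a
handful of letters following the commonly cited YIVO romanization, and
it makes no attempt at final letter forms or at words of Hebrew
origin.  Translations that need more than one code point (a letter
plus a point) are written as a vector holding one string, because
Quail reads a bare string as a set of single-character candidates.

```elisp
(require 'quail)

(quail-define-package
 "yiddish-yivo-sketch" "Hebrew" "YIVO" nil
 "Illustrative YIVO-style transliteration input for Yiddish (partial)."
 nil t t)

(quail-define-rules
 ;; Vowels
 ("a"  ["\u05d0\u05b7"])   ; ALEF + PATAH  (pasekh alef)
 ("o"  ["\u05d0\u05b8"])   ; ALEF + QAMATS (komets alef)
 ("e"  ?\u05e2)            ; AYIN
 ("i"  ?\u05d9)            ; YOD
 ("u"  ?\u05d5)            ; VAV
 ;; Digraphs that have their own Unicode ligature characters
 ("v"  ?\u05f0)            ; YIDDISH DOUBLE VAV
 ("oy" ?\u05f1)            ; YIDDISH VAV YOD
 ("ey" ?\u05f2)            ; YIDDISH DOUBLE YOD
 ;; A few consonants
 ("b"  ?\u05d1)            ; BET
 ("g"  ?\u05d2)            ; GIMEL
 ("d"  ?\u05d3)            ; DALET
 ("l"  ?\u05dc)            ; LAMED
 ("m"  ?\u05de)            ; MEM
 ("n"  ?\u05e0)            ; NUN
 ("s"  ?\u05e1)            ; SAMEKH
 ("sh" ?\u05e9)            ; SHIN
 ("kh" ?\u05db)            ; KAF
 ("ts" ?\u05e6))           ; TSADI
```

Once the forms are evaluated, the method can be selected with
C-u C-\ yiddish-yivo-sketch RET.  The characters are inserted in
logical order, so without bidi display they appear left to right.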

> And I don't think we have any arabic input encoding in Emacs 22.

We do have the encoding (arabic-iso-8bit, albeit not thoroughly
supported), but not an input method.

> The only R-to-L script I can identify in Emacs 22 is Hebrew.  Of
> course, Emacs 22 will render it L-to-R, but making it possible to
> _input_ the Unicode might increase the number of people willing to
> invest work into emacs-bidi.

If all you want is to input characters, and don't care about
displaying them correctly, then making an input method that produces
mule-unicode-* characters should be easy.

> >> The usual alphabet used with Yiddish is a slightly modified Hebrew
> >> alphabet (pronunciation is somewhat different from most Hebrew
> >> words
> >
> > There's no standard for Yiddish pronunciation, but most Yiddish
> > speakers use German pronunciation.
>
> Well, since Hebrew is used as a sort of phonetic spelling of Yiddish,
> there is at least a way to pronounce written Yiddish texts sort of
> regularly.

If you mean use Hebrew pronunciation, then I think that's not right: I
think the canonical Hebrew pronunciation is very different from
Yiddish, as Hebrew uses, for example, guttural sounds for some
consonants, and Yiddish does not.



Re: Summary of bidi branch?

by Eli Zaretskii

> Cc: Michael.Blaustein@..., emacs-bidi@...
> From: David Kastrup <dak@...>
> Date: Fri, 30 Mar 2007 14:50:57 +0200
>
> During the Emacs 21 pretest phase, I submitted dozens of bug reports
> about display engine problems in connection with preview-latex.  I
> think that quite a number of optimizations got killed because of that:

Yes, but the most important (those I mentioned and a few similar ones)
are still there.  Gerd told me back then that he actually tried the
display engine without any optimizations, and found that it was
unusably slow.

> IIRC, emacs-bidi was supposed to be a rather clean design implementing
> the specs quite straightly, correct?

If you mean the UAX#9, then yes, the intent was to do exactly what it
says.

> So one will probably have more
> cleanup remaining outside of the patch itself than inside it, right?

I don't know.  There are a few gray areas in UAX#9, which we would
need to interpret as best for Emacs.

> I am just trying to get some feeling about the situation, to get a
> guess how much work might be entailed before it is realistic to think
> about getting it into some release in the future.

I think we will not have any idea about this until Someone(tm)
spends some time using and debugging the display engine with bidi
turned on.



Re: Summary of bidi branch?

by James Cloos

>>>>> "Eli" == Eli Zaretskii <eliz@...> writes:

Eli> We do have the encoding (arabic-iso-8bit, albeit not thoroughly
Eli> supported), but not an input method.

[...]

Eli> If all you want is to input characters, and don't care about
Eli> displaying them correctly, then making an input method that produces
Eli> mule-unicode-* characters should be easy.

Here is one that does the same chars as the base arabic layout in
xkeyboard-config.  It'll need a s/is not yet part of/is part of/
in the license and, I presume, assignment papers to be included.

In case it gets munged in transit, a copy is available at:

http://jhcloos.com/emacs/quail/arabic.el

-JimC

============================== cut ==============================
;;; arabic.el --- Quail package for inputting Arabic characters  -*-coding: iso-2022-7bit;-*-

;; Copyright (C) 2007 James Cloos <cloos@...>
;;  Copyright to be assigned to the Free Software Foundation upon request

;; Keywords: mule, input method, Arabic

;; This file is not yet part of GNU Emacs.

;; GNU Emacs is free software; you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation; either version 2, or (at your option)
;; any later version.

;; GNU Emacs is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
;; GNU General Public License for more details.

;; You should have received a copy of the GNU General Public License
;; along with GNU Emacs; see the file COPYING.  If not, write to the
;; Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
;; Boston, MA 02110-1301, USA.

;;; Commentary:

;;; Code:

(require 'quail)

(quail-define-package
 "arabic" "Arabic" "ع" nil "Arabic input method.

Based on Arabic table in X Keyboard Configuration DB.
" nil t t t t nil nil nil nil nil t)

;;  ذّ 1! 2@ 3# 4$ 5% 6^ 7& 8* 9( 0) -_ =+
;;      ضَ صً ثُ قٌ فﻹ غإ ع` ه÷ خ× ح؛ ج{ د} <>
;;       ش\ سS ي[ ب] لﻷ اأ تـ ن، م/ ك: ط"
;;        ئ~ ءْ ؤِ رٍ ﻻﻵ ىآ ة' و, ز. ظ؟
;;

(quail-define-rules
 ;; Shifted top-left key: the layout table above pairs ذ with shadda.
 ("~" ?ّ)

 ("Q" ?َ)
 ("W" ?ً)
 ("E" ?ُ)
 ("R" ?ٌ)
 ("T" ?ﻹ)
 ("Y" ?إ)
 ("U" ?`)
 ("I" ?÷)
 ("O" ?×)
 ("P" ?؛)

 ("A" ?\\)
 ("S" ?S)
 ("D" ?[)
 ("F" ?])
 ("G" ?ﻷ)
 ("H" ?أ)
 ("J" ?ـ)
 ("K" ?،)
 ("L" ?/)
 (":" ?:)

 ("Z" ?~)
 ("X" ?ْ)
 ("C" ?ِ)
 ("V" ?ٍ)
 ("B" ?ﻵ)
 ("N" ?آ)
 ("M" ?')
 ("<" ?,)
 (">" ?.)
 ("?" ?؟)

 ("`" ?ذ)

 ("q" ?ض)
 ("w" ?ص)
 ("e" ?ث)
 ("r" ?ق)
 ("t" ?ف)
 ("y" ?غ)
 ("u" ?ع)
 ("i" ?ه)
 ("o" ?خ)
 ("p" ?ح)

 ("a" ?ش)
 ("s" ?س)
 ("d" ?ي)
 ("f" ?ب)
 ("g" ?ل)
 ("h" ?ا)
 ("j" ?ت)
 ("k" ?ن)
 ("l" ?م)
 (";" ?ك)

 ("z" ?ئ)
 ("x" ?ء)
 ("c" ?ؤ)
 ("v" ?ر)
 ("b" ?ﻻ)
 ("n" ?ى)
 ("m" ?ة)
 ("," ?و)
 ("." ?ز)
 ("/" ?ظ)

 ("'" ?ط))

;;; arch-tag:
;;; arabic.el ends here
============================== cut ==============================
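
For anyone wanting to try the file, a minimal sketch of loading and
activating it is shown here.  The path is a placeholder, not something
given in this message; the method name "arabic" is the one passed to
quail-define-package above.

```elisp
;; Placeholder path: wherever the text between the cut lines was saved.
(load-file "~/elisp/arabic.el")

;; Turn the method on in the current buffer
;; (interactively: C-u C-\ arabic RET).
(set-input-method "arabic")
```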

--
James Cloos <cloos@...>         OpenPGP: 1024D/ED7DAEA6



Re: Summary of bidi branch?

by James Cloos

>>>>> "David" == David Kastrup <dak@...> writes:

David> Anybody have a clue whether this is a standard typewriter layout?

I don't.  The only provenience in the xkeyboard-config file is:

,----[ from /etc/X11/xkb/symbols/ara ]
| based on a keyboard map from an 'xkb/symbols/ar' file
`----

Looking at the history in the old CVS files, the Arabeyes team were
the last to make changes to xkb/symbols/ar:

,----[ from cvs log of /srv/anoncvs.freedesktop.org/cvs/xapps/xkbcomp-old/symbols/ar,v ]
| 683. Update the Arabic (ar) XKB keyboard map (#5145, Arabeyes team).
`----

so at least it was last worked on by Arabic experts.

David> And the ASCII graphics above were created by you?  If both are
David> "yes", we should be safe.

I just switched to the arabic layout and typed left to right, top to
bottom.  And did the map the same way (switch to arabic, hit Q on the
"Q" line, etc).

-JimC
--
James Cloos <cloos@...>         OpenPGP: 1024D/ED7DAEA6



Re: Summary of bidi branch?

by Eli Zaretskii

> Cc: emacs-bidi@...
> From: David Kastrup <dak@...>
> Date: Fri, 30 Mar 2007 15:26:39 +0200
>
> There is a rather straight relation between Yiddish spelling (which
> uses Hebrew letters, sometimes modified slightly) and its
> pronunciation.  This breaks down for words of Hebrew origin (of which
> there is a number in Yiddish): those are spelled just like in Hebrew
> (without vowel marks at all) and presumably pronounced like in modern
> Hebrew.

Again, I think that the Yiddish pronunciation is rather different from
modern Hebrew, because Yiddish uses a kind of Ashkenazi variant of
Hebrew pronunciation that is alien to the modern Hebrew ears.

For example, the word "לשון" ("language") is pronounced "lāshôn" in
Hebrew, but "lôshn" in Yiddish.  And note also "מאַמאַלאָשן" which
replaces the original Hebrew spelling with a direct transliteration of
the Yiddish pronunciation with Hebrew letters.



Re: Summary of bidi branch?

by James Cloos

>>>>> "David" == David Kastrup <dak@...> writes:

David> Not that it probably matters much for a first try: do you know Arabic?

No.  I've tried to learn the script a bit -- i.e., to be able to sound
out and/or transliterate arabic script into latin script -- as part
of a general interest in font design and i18n, but I cannot claim
any competence in the spoken language beyond the few words anyone
would pick up listening to the news.

I just knew it would be easy to activate X's input method and type
out the quail file.

-JimC
--
James Cloos <cloos@...>         OpenPGP: 1024D/ED7DAEA6


