Re: Books, metadata and chatbots - in search of the XML Rosetta Stone

by Dirk Scheuring-2

Hi Marcus,

I suggest not searching for the Rosetta Stone through the lens of some
new shiny tech (XML, RDF, OWL,...), as this often leads to lumping
issues together (by waving hands over tags of some markup language)
which are better kept separate for proper analysis and study of their
relations. Issues are problems which (using the currently available
knowledge) cannot be 'solved' through (deterministic) computational
action, but can at best be 'balanced' through (non-deterministic)
inter-action. I believe that you're addressing four different issues in
your post, and that the required 'balancing act' is different in all
four cases:

#1. Voice in (the speech-to-text issue)

#2. Input analysis (in AIML-speak, the pattern-side issue, where
'input' may also mean 'intermediary computational input' generated
using <srai/>)

#3. Output generation (the template-side issue - where to <srai/> to,
what to <condition/>, when to say <that/>)

#4. Voice out (the text-to-speech issue)

#4, in some way, seems to me to be on the verge of ceasing to be an
issue. Speech-generation technology has definitely been improving over
the past five years or so, with tonal expressiveness increasing so
that the monotonic drone of computational 'utterances' becomes less
annoying. It's not 'problem fixed', but it seems good enough to me to
be usable. Note, however, that nobody seems to have used 'semantics'
in any way to address this particular issue, even though it does seem
to have a significant semantic aspect to it (tone-of-voice can *mean*
quite a lot in a conversation, communicating lots of 'intent', which
computers still cannot tonally express at all). Required balancing:
tonal expressiveness and other 'natural' properties of spoken language
<-> file/application size/computational complexity.

#1 is, on the surface, a signal-to-noise issue - computers/robots
often operate in noisy environments, and it's difficult to filter a
complex waveform recordable in, say, an office environment, down to a
symbol string useful enough to match an AIML pattern. Semantics can
come into play by way of side effects: for instance, there's now a
quite large community of enthusiasts who use Alice, Verbot &c. as
interfaces to custom home automation systems. It turns out that, using
speech-to-text, some users may get their living room lights turned on
when they're watching TV and somebody on there says something about
turning on the lights. To me, that's a semantics issue, too, like in:
"Not you, dummy!" Required balancing: high reactivity + many false
positives <-> low reactivity + many false negatives.
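
The balancing act for #1 can be sketched in a few lines of Python. This
is only an illustration: the confidence scores and the "was it really
meant for the bot?" labels below are invented, and count_errors is a
hypothetical helper, not part of any speech-recognition package.

```python
# Hypothetical recognizer output: (confidence score, really addressed to the bot?)
# All numbers are invented for illustration only.
utterances = [
    (0.95, True),
    (0.90, True),
    (0.85, False),  # e.g. somebody on TV talking about turning on the lights
    (0.70, True),
    (0.60, False),
    (0.40, True),
    (0.30, False),
    (0.20, False),
]

def count_errors(samples, threshold):
    """Return (false_positives, false_negatives) when acting on score >= threshold."""
    false_positives = sum(1 for score, real in samples if score >= threshold and not real)
    false_negatives = sum(1 for score, real in samples if score < threshold and real)
    return false_positives, false_negatives

# A low threshold means high reactivity; a high threshold means low reactivity.
for threshold in (0.25, 0.50, 0.75, 0.92):
    fp, fn = count_errors(utterances, threshold)
    print(f"threshold={threshold:.2f}  false positives={fp}  false negatives={fn}")
```

On this toy data, lowering the threshold trades false negatives for
false positives and vice versa; no setting removes both.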

#2 and #3, to me, are the core semantic issues; #2 requires balancing
client intent <-> authorial intent, and #3 requires balancing the
amount and the properties of potentially useful data (like your travel
books) <-> the complexity/tractability of algorithms which use these
data to generate bot outputs.

For a good intro to #2, pattern-side semantics, you may read Rich's
essay about Zipf's law
<http://www.alicebot.org/articles/wallace/zipf.html>. In a nutshell,
starting with a set of AIML categories, some categories are used much
more often than others. An amazingly large number of chatbot clients
just enter, for example, "you" to the bot, or other single words like
"Maui" seemingly chosen on a whim, compared to the relative few who
say stuff you might read in a book, like "You may now reserve me a
hotel room on Maui, and with a beach view". As a botmaster, the latter
input may be the one you're hoping for, because you cunningly rigged
the category with

<pattern>_ RESERVE * A HOTEL ROOM * MAUI * WITH * BEACH VIEW</pattern>

to initiate an RPC to some room reservation app running on your
network, but for every one time that happens, there's n thousand times
when the input just matches the category with

<pattern>YOU</pattern>.

However, the category which is statistically the most likely to be
used in any bot is the one with

<pattern>*</pattern>.

That's when the bot has no idea of the semantic content of the input
at all. <topic/> may help you here, if you manage to semantically
filter an incoming RSS stream into both <pattern/> /and/ <topic/>
sub-streams (that's hard^2, at least). In general, as measurements
sourced by pandorabots.com and parsimony.net show, usage of the
default category by the average AIML bot ranges over 2-5%, with even a
bot with 2.000.000 categories not scoring significantly better than
one with a mere 100.000. With that, the default category clearly leads
the Zipf-style curve of category usage probabilities, with

<pattern>YES</pattern>

<pattern>NO</pattern>

<pattern>WHAT</pattern>

<pattern>WHY</pattern>

trailing at the expectable (for Zipf distributions) distances. That
usually makes for the head of the head of the long tail, the
very-high-frequency inputs, which to semantically exploit would be
rather important, IMO. Just because they're so, well, frequent.
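
To get a feel for such a usage curve, here is a rough Python sketch
that counts which category each input lands in. It is a simplification
under stated assumptions: both wildcards are treated as "one or more
words", patterns are simply tried in list order rather than by the
priority rules of a real AIML interpreter, and the chat log is made up.

```python
import re
from collections import Counter

def pattern_to_regex(pattern):
    """Translate an AIML-style pattern into a regex; * and _ match one or more words."""
    parts = []
    for token in pattern.split():
        if token in ("*", "_"):
            parts.append(r"\S+(?: \S+)*")
        else:
            parts.append(re.escape(token))
    return re.compile("^" + " ".join(parts) + "$")

def normalize(text):
    """Uppercase the input and drop punctuation, roughly as AIML interpreters do."""
    return " ".join(re.sub(r"[^A-Za-z0-9 ]", " ", text).upper().split())

# Tried from top to bottom; the catch-all * comes last.
PATTERNS = [
    "_ RESERVE * A HOTEL ROOM * MAUI * WITH * BEACH VIEW",
    "YOU", "YES", "NO", "WHAT", "WHY",
    "*",
]
COMPILED = [(p, pattern_to_regex(p)) for p in PATTERNS]

def match_category(text):
    """Return the first pattern that matches the normalized input, or None."""
    normalized = normalize(text)
    for pattern, regex in COMPILED:
        if regex.match(normalized):
            return pattern
    return None

# Invented chat log, for illustration only.
chat_log = [
    "you", "Maui", "yes", "why?", "asdf", "no", "you",
    "You may now reserve me a hotel room on Maui, and with a beach view",
]
usage = Counter(match_category(line) for line in chat_log)
for pattern, count in usage.most_common():
    print(count, pattern)
```

Fed with a real log instead of the invented one, the same counter
would give the rank/frequency table to compare against Zipf's law.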

For a broader take on the whole frequent input classification issue,
Robby Garner (LPC winner 1998, 1999) has compiled a noteworthy list of
"Human Intentions and Behavior" regarding chatbots
<http://www.robitron.com/Behavior.txt>. Members of the Robitron
discussion group may view the discussion thread which led up to this
list, starting at
<http://tech.groups.yahoo.com/group/Robitron/message/2383>. Another
useful view on this is by Jürgen Pirner (LPC winner 2003), who gave us
this <http://tech.groups.yahoo.com/group/Robitron/message/6431> gem.
Jürgen's Jabberwock <http://www.jabberwock.de.ms/> is another bot of
the jumbo class, which matches 2 million+ patterns (it's not
AIML-based, though). Jürgen also gave us these statistics:

<quote source="http://tech.groups.yahoo.com/group/Robitron/message/2418">

15% Abuse: user is starting a conversation by using foul language and
continue to use swear words (will be kicked, and banned after several
attempts)
12% Sex: user is insisting on this topic (kicked if saying rude or
perverse sex things)
11% Games: user asks the bot for telling jokes, riddles, stories or
playing a game (stories, and most jokes and riddles are generated on
the fly)
9% Knowledge: user is just asking random lexical questions instead of
following the topic of a conversation (will be warned and interrupted
after several attempts)
8% Repeat: user continues to repeat sentences several times (kicked
after 3 repeatings)
6% Garbage: user continues to type nonsense like "asdf" (kicked after
3 attempts)
4% Mocking: user is just copying the robot's replies like an "echo"
(kicked after 3 attempts)
2% Spam: user is flooding the text box, and other attacks (meanwhile
mostly protected)
2% Robot: user is another robot (mostly Elbot or Alice)

On average 17% of the users are kicked out off the chat because of
abusive behavior.
So up to 70% of the conversations are in fact something like a "fight"
instead of having a friendly conversation.

</quote>
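
The "up to 70%" figure looks like the plain sum of the listed
categories. That is an inference (the quoted post does not say how the
number was derived, or whether the categories overlap), but the
arithmetic is easy to check:

```python
# Percentages copied from the quoted statistics.
shares = {
    "Abuse": 15, "Sex": 12, "Games": 11, "Knowledge": 9, "Repeat": 8,
    "Garbage": 6, "Mocking": 4, "Spam": 2, "Robot": 2,
}
total = sum(shares.values())
print(total)  # 69
```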

As the online edition of The Times reported last spring, the Japanese
Ministry for Trade and Industry has issued a set of guidelines for
robot behavior and use, which include a definition of robot "misuse".
It reads like so:

<quote source="http://www.timesonline.co.uk/tol/news/world/asia/article1620558.ece">

 The reasonably predictable misuse of robots shall be defined as the
management, sale and use of next-generation robots for purposes not
intended by manufacturers

</quote>

Whether you call it "abuse", "misuse", or "fighting", the point is
that at least half of the overall client inputs can be counted on to
be given "for purposes not intended by manufacturers". There have
already been two scientific conferences mainly focused on the
phenomenon of chatbot abuse; rich material (though no proposed
solution of much substance) is to be found at
<http://www.agentabuse.org/>.

Now consider RDF, WordNet, Cyc, DOLCE, SUMO, OWL, KIF, SWRL, Protégé,
a long and ever-growing list of ontology-centred languages and
resources for the Semantic Web, and you won't even find a mention of
this issue in their documentation, much less any ontological mechanism
that would allow you to semantically classify the inputs chatbot users
actually submit, in order to find some useful semantic mapping to some
content your bot might have to distribute. You could, theoretically,
write an ontology of misuse yourself, and see if you can derive a map.
But there's no technological standard, no markup designed by other
people you can use. And it's a vast search space: Whatever you put out
on the web for people to use, people will misuse, in oh so many
unexpected ways. Only by now, we've learned to expect at least some of
them. Progress.

So on to issue #3: content generation; for AIMLers: the <template/>
side. It is not the case that a simple taxonomy, like a book index, is
sufficient to provide enough structure for an interactively generated
text to allow for its recursive semantic self-generation initiated by
a random input indexed "1". Which I guess is roughly what you want to
happen, though you might have said it in another way. An index just
gives you a sorted list of words, with very little knowledge of the
rich (obvious and non-obvious) semantic relations these words may have
as viewed by the reader of an emerging linear text, such as a sequence
of bot outputs interleaved with client inputs. As a bot writer, you'll
always have to work with the fact that you only get the chance to
write /half/ of the text you try to be writing, with - from your
working point of view - near-random (and at worst,
indistinguishable-from-fully-random) symbol strings inserted between
whatever lines of flowery prose you get the bot to generate by
whatever rulebase you know how to use, due to some hopelessly
out-of-control co-author (your client, your reader). Interactivity is
a b***h, semantically.

More technically speaking: you're not even content with some ordinary
first-order logic which can describe the relations between mere
/words/, but you want a second-order logic which can describe the
relationship between /sentences/, which contain words, which also have
relations, possibly with words in still other sentences. Automatic
reasoning over a sentence concordance? Good luck with that. If you get
it done, Google's Peter Norvig might want to have a word with you. If
so, tell him you're buying his employer, and sacking him for
incompetence.

To get there, you might want to ditch your personal taxonomy at some
point, and go for a big-gun, scientifically developed ontology. Which is
supposed to capture a rich set of semantic relations between
terms/words, right? Like the OpenCyc "common sense ontology". It's
free, and has been for some years. The same goes for CyN
<http://www.daxtron.com/123start.htm?Cyn>, a version of Gary Dubuque's
ProgramN/AIMLpad AIML interpreter developed by Kino Coursey, which
makes the Cyc family of ontologies usable through an AIML interface
(it basically wraps the Cyc knowledge base into AIML templates),
giving Cyc the natural language capabilities that it lacks.
CyN/AIMLpad also can use WordNet, and ConceptNet, and has its own
scripting language, which was designed to support the semantic
analysis of interviews, and it can generate AIML categories from chat
input. The works.

So the theory is that you should be able (by cleverly using the
available inference mechanisms //*waves hands*) to generate your
interactive text based on such a tool, and the tool itself has been
available, for free use, for some years now, and there's (sort of)
proof-of-concept that it works, technically speaking (for a subset of
FOL, anyway); alas, nobody has appeared to claim "I did it!", so far.
The crux of many would-be-interactive technologies today: How to
create the mythical "intelligent content object"? How to relate some
arbitrary linear text, like the complete content of a book, to an
ontology and its computational "reasoners", in order to "make it
interactive", and thereby invoke "procedural behavior"? Really, it's
an interesting problem.

In summary: how to close the semantic gap between what people actually
like to say to a chatbot and the content of some book that is supposed
to be accessed via a chatbot interface is very much a research
question, and has proven to be a hard one. Some developmental options
and possibilities exist, which may or may not use RDF, XML, or
NewNewShiny, to address various issues in various ways, with nothing
like a 'winning strategy' in sight so far. To my knowledge - and I
would be glad to be proven wrong - nobody (including Google, Microsoft,
Amazon, which all have been trying for years) has yet been able to put
these new possibilities - rather, this apparent potential - to any
good use in the context of chatbots. This may or may not change in the
future, and you may or may not be able to change it. If you try
(inducing from known cases here), you may expect to be trying long and
hard.

So, no Rosetta Stone, yet. But you're not alone in searching. Another
link of interest: WxGuru (Weather Guru) is an ontology-driven ALICE
clone (they claim, though I wouldn't have guessed it) designed as an
educational chatbot for Atmospheric Science
<http://gsa.confex.com/gsa/2007GE/finalprogram/abstract_122101.htm>.
And for the European Space Agency, there's actually an academic
proposal to use ontology-driven and AIML-based bots as a psychological
support for astronauts on their long way to Mars, HAL 9000-style, no
diggety <http://wwwhome.cs.utwente.nl/~anijholt/artikelen/esa2007.pdf>.
What do you think of that?

Dirk



-----------------------
Marcus L. Endicott wrote:

I am an author and I build chatbots (aka chatterbots). A chatbot is a
conversational agent, driven by a knowledgebase. I am currently trying
to understand the best way to convert a book into a chatbot
knowledgebase.

A knowledgebase is a form of database, and the chatbot is actually a
type of search - an anthropomorphic form of search and therefore an
ergonomic form of search. This simple fact is usually shrouded by the
jargon of 'natural language processing', which may or may not be
actual voice input or output.

According to the ruling precepts of the 'Turing test', chatbots must
be as close as possible to conversational, and this is what
differentiates them from pure 'search'. With chatbots there is a
significant element of 'smoke and mirrors' involved, which introduces
the human psychological element into the machine in the form of
cultural, linguistic and thematic assumptions and expectations, so
becoming in a sense a sort of 'mind game'.

I'm actually approaching this from two directions. I would also like
to be able to feed RSS into a chatbot knowledgebase. There is
currently no working example of this available. Parsing RSS into AIML
(Artificial Intelligence Markup Language), the most common chatbot
dialect, is problematic and yet to be cracked effectively. So, my
thinking arrived at somehow breaking a book into a form that resembles
RSS. The Wikipedia List of XML markup languages revealed a number of
attempts to add metadata to books.

Dr. Wallace, the originator of AIML, recently responded on the
pandorabots-general group, that using RSS title fields would usually
be too specific to make them useful as chatbot concept triggers.
However, I believe utilities such as the Yahoo! Term Extraction API
could be used to create tags for feed items, which might then prove
more useful when mapped to AIML patterns.

My supposition is that a *good* book index is in effect a 'taxonomy'
of that book. Paragraphs would generally be too large to meet the
specialized 'conversational' needs of a chatbot. The results of a
conventional concordance would be too general to be useful in a
chatbot. If RSS as we know it is currently too specific to function
effectively in a chatbot, what if that index were mapped back to the
referring sentences as 'tags', somewhat like RSS?

I figure that if you can relatively quickly break a book down into a
sentence 'concordance', you could then point that at something like
the Yahoo! Term Extraction API to quickly generate relevant keywords
(or 'tags') for each sentence, which could then be used in AIML as
triggers for those sentences in a chatbot. Is there such a beast as
a 'sentence parser' for a corpus such as a common book? All I want to
do at this point is strip out all the sentences and line them up, as a
conventional concordance does with individual words.
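
A very rough Python sketch of that pipeline follows. Everything in it
is an assumption made for illustration: the sentence splitter is a
naive regex (abbreviations like "Dr." will trip it), extract_tags is a
crude stand-in that does not call the Yahoo! Term Extraction API, and
the generated category uses the bare tag as its pattern, where a real
bot would also want wildcard variants around the keyword.

```python
import re
from xml.sax.saxutils import escape

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "the", "of", "on", "in", "is", "are", "to",
             "it", "with", "for", "has", "its", "from"}

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def extract_tags(sentence, max_tags=3):
    """Stand-in for a term-extraction service: longest non-stopwords first."""
    candidates = []
    for word in re.findall(r"[A-Za-z]+", sentence):
        lowered = word.lower()
        if lowered not in STOPWORDS and lowered not in candidates:
            candidates.append(lowered)
    candidates.sort(key=len, reverse=True)  # stable: ties keep text order
    return [w.upper() for w in candidates[:max_tags]]

def to_aiml_category(tag, sentence):
    """Wrap one tag/sentence pair in a minimal AIML category."""
    return ("<category><pattern>{0}</pattern>"
            "<template>{1}</template></category>").format(tag, escape(sentence))

# Invented sample text standing in for a book.
book = ("Maui is an island in Hawaii. The beaches are popular with surfers! "
        "Is Haleakala a volcano?")

for sentence in split_sentences(book):
    for tag in extract_tags(sentence):
        print(to_aiml_category(tag, sentence))
```

The output is one category per tag and sentence, i.e. the sentence
'concordance' with each sentence lined up under its keywords.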

There are a number of examples of desktop chatbots using proprietary
Windows speech recognition today, however to my knowledge there are
currently no chatbots available online or via VoIP that accept voice
input (*not* IM or IRC bots). So, I've also spent some time lately
looking into voiceXML (VXML), ccXML and the Voxeo callXML, as well as
the Speech Recognition Grammar Specification (SRGS) and the mythical
voice browser. The only thing I could find that actually accepts
voice input online for processing is Midomi.com, which accepts voice
input in the form of a hummed tune for tune recognition. Apparently
goog411, which is basically interactive voice response (IVR) rather
than true speech recognition, is as close as it gets to a practical
hybrid online/offline voice search application at this time. So, what
if Google could talk?
_______________________________________________
This is the pandorabots-general mailing list