Help, Need A.L.I.C.E Web page Spider


Help, Need A.L.I.C.E Web page Spider

by Demossoft IT

Please, please help with this.

I need a spider that crawls a web page, parses its text grammatically, and creates AIML (Artificial Intelligence Markup Language) data. This data will be saved to an AIML file and used to teach a chatterbot the contents of the web page.

The way we see it working is:

1. The spider crawls a page, examining the text of each sentence.

2. Then, using a grammatical parser, it reformulates each sentence into possible patterns and responses to be entered as data in the AIML file.

3. Then it formats this into a standard AIML file and lets you save the code or copy and paste it to another source.

This will require someone experienced in AIML as well as grammatical sentence parsing.

The idea is to end up with a stand-alone script, possibly a desktop VB script, but I hear that existing Perl and PHP extensions may make this easier. I am open to other suggestions. Preferably a desktop application to start.




Re: A.L.I.C.E Web page Spider

by Gary Dubuque

AIMLpad (Program N) can retrieve a web page using its scripting language.
With careful scripting it can probably find the text on the page.  Once
found, it can split the text into sentences.  It can extract from those
sentences nouns, verbs, etc. using WordNet.  When used in conjunction with
ConceptNet it can summarize the text into a few sentences or tag the
sentences with the parts of speech or even predict the mood of the text.
AIMLpad is designed to use its scripting language to create AIML categories
and even put them into the appropriate files.  When used with OpenCyc, it
can reason about the concepts such as "humans eat food" or "humans have two
hands", etc.  OpenCyc does deductive reasoning.  AIMLpad also has a simple
expert rule system (both forwards and backwards chaining) built in where you
could create the transformations to apply to the web page's text.  Its fuzzy
logic is not very effective since the multiple valued variables don't have
confidence factors on each multivalue, but the framework is there to be
fixed to make this work too. While all these tools are available in the one
editor, it still is not an easy solution to scrape web pages for content and
make it into AIML.  In theory, it should be possible.  In practice, it
probably won't happen.
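
As a rough illustration of the "split the text into sentences" step only, here is a minimal Python sketch. It is not AIMLpad's script language, and the function name is made up for this example. It assumes a sentence ends at '.', '!' or '?' followed by whitespace, so abbreviations such as "e.g." will be split incorrectly.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naively split text into sentences.

    Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    """
    # Collapse line breaks and runs of spaces first, so hard-wrapped
    # text does not confuse the splitter.
    collapsed = " ".join(text.split())
    parts = re.split(r"(?<=[.!?])\s+", collapsed)
    return [part for part in parts if part]


if __name__ == "__main__":
    for sentence in split_sentences("Humans eat food.  Humans have two\nhands."):
        print(sentence)
```

Anything beyond this (parts of speech, summaries, mood) would need the external resources mentioned above.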

On the other hand, if the content is laid out correctly, i.e., question
followed by answer followed by a blank line repeated for however many
questions you have, AIMLpad will automatically convert that into AIML
categories just by selecting one of its menu options (Tools --> Quick Build
Categories...).
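
For readers without AIMLpad, the same layout (question, then answer, then a blank line) can be converted with a short script. The Python sketch below is only an approximation of the idea, not AIMLpad's own Quick Build code. The function names are invented here, and the pattern clean-up (upper-case, ASCII letters and digits only) is an assumption about what a typical AIML interpreter expects.

```python
import re
from xml.sax.saxutils import escape


def to_pattern(question: str) -> str:
    """Upper-case the question and drop punctuation (assumed AIML convention)."""
    cleaned = re.sub(r"[^A-Z0-9 ]+", " ", question.upper())
    return " ".join(cleaned.split())


def quick_build(text: str) -> str:
    """Turn 'question / answer / blank line' text into AIML categories."""
    categories = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if len(lines) < 2:
            continue  # a block needs both a question and an answer
        question, answer = lines[0], " ".join(lines[1:])
        categories.append(
            "<category>\n"
            f"<pattern>{to_pattern(question)}</pattern>\n"
            f"<template>{escape(answer)}</template>\n"
            "</category>"
        )
    return "<aiml>\n" + "\n".join(categories) + "\n</aiml>"


if __name__ == "__main__":
    sample = "What is AIML?\nArtificial Intelligence Markup Language.\n"
    print(quick_build(sample))
```

The answer text is XML-escaped so that a stray '<' or '&' on the page does not break the file.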

I have used AIMLpad to get the quote or joke of the day or to get the
weather report by hacking web pages.

-----Original Message-----
From: alicebot-developer-bounces@...
[mailto:alicebot-developer-bounces@...]On Behalf Of Ty
Ademosu
Sent: Thursday, June 01, 2006 10:03 PM
To: alicebot-developer@...
Subject: [alicebot-developer] A.L.I.C.E Web page Spider




_______________________________________________
alicebot-developer mailing list
alicebot-developer@...
http://list.alicebot.org/mailman/listinfo/alicebot-developer



Re: A.L.I.C.E Web page Spider

by Helio Perroni Filho

--- Ty Ademosu <tyademosu@...> wrote:

> I need a spider created that will crawl a webpage
> and grammatically parse the page and create AIML
> (Artificial Intelligence Markup Language) data. This
> data will be saved into an AIML file and used to
> teach a chatterbot the contents of the web page.

You must understand that this is not a trivial
request: you could start a whole project -- perhaps
several -- to fulfill it. Then again, there are some
paths I can envision for providing something of that
sort in a relatively short time.

First, the spider. I don't see much need for a
full-fledged web robot, unless you intend the search
to occur during a conversation. Even if this is the
case, you could use some predefined resource, such as
Wikipedia (http://www.wikipedia.org), and
programmatically drive it to retrieve pages (in Java,
I'd use the java.net.URL class to do this). Otherwise,
you can just manually download the pages you want to
parse.
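
The single-page fetch described here can be sketched in a few lines. The example below uses Python's standard library rather than java.net.URL. The class and function names are invented for this sketch, and the rule "skip script and style contents, keep everything else" is only a crude assumption about what counts as page text.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Return the text of an HTML document as one space-joined string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Download one page and return its text (no crawling, no link following)."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return html_to_text(html)
```

Calling fetch_text() on a manually chosen list of URLs would cover the "just download the pages you want" case without writing a real web robot.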

Second, the grammatical parser. I found it surprisingly
difficult to Google up a Free / Open Source parser,
but you can look on SourceForge
(http://sourceforge.net) for some promising projects.
Anyway, why not use AIML itself for this? Most
Alicebots already know how to break inputs into
sentence lists; then, with the right set of AIML
files, the bot could create the request / response
pairs. For example, to retrieve responses for queries
of the form "What is a ...?", you could use this
category:

<category>
  <pattern>A * IS A *</pattern>
  <template>
    (request: What is a <star/>?)
    (response: A <star/> is a <star index="2"/>)
  </template>
</category>
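
To experiment with this wildcard idea without loading a bot, the match can be approximated with a regular expression. The Python sketch below is an emulation under assumptions, not how an AIML interpreter matches: the regular expression, the handling of a leading "A"/"An", and the function name are all choices made for this example.

```python
import re

# Optional leading article, a subject, "is a/an", then the kind of thing.
IS_A = re.compile(r"^(?:an?\s+)?(.+?)\s+is\s+an?\s+(.+?)[.!]?$", re.IGNORECASE)


def request_response(sentence: str):
    """Return a (request, response) pair for 'X is a Y' sentences, else None."""
    match = IS_A.match(sentence.strip())
    if match is None:
        return None
    subject, kind = match.group(1), match.group(2)
    return (f"What is a {subject}?", f"A {subject} is a {kind}.")


if __name__ == "__main__":
    print(request_response("A dog is a mammal."))
    print(request_response("Hello there"))
```

Sentences that do not fit the form simply return None, which mirrors a category that does not match.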

Third, the formatter: feed the web pages to the
"pre-bot" outlined above and save its outputs to a
file; you can later pass them through a simple
regular-expression parser that transforms them
into AIML files. I could do this manually inside jEdit
(http://www.jedit.org/), but most programming
languages provide enough RegExp support for automating
the task.
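
A minimal sketch of such a regular-expression formatter, in Python. It assumes the saved output consists of "(request: ...)" and "(response: ...)" pairs as in the category above. The function name and the pattern clean-up (upper-case, punctuation removed) are assumptions of this sketch.

```python
import re
from xml.sax.saxutils import escape

# One "(request: ...)" immediately followed by one "(response: ...)".
PAIR = re.compile(r"\(request:\s*(.*?)\)\s*\(response:\s*(.*?)\)", re.DOTALL)


def pairs_to_aiml(log: str) -> str:
    """Convert saved request/response pairs into an AIML document."""
    categories = []
    for request, response in PAIR.findall(log):
        # Upper-case the request and drop punctuation to form the pattern.
        words = re.sub(r"[^A-Za-z0-9 ]+", " ", request).upper().split()
        categories.append(
            "<category>\n"
            f"<pattern>{' '.join(words)}</pattern>\n"
            f"<template>{escape(response.strip())}</template>\n"
            "</category>"
        )
    return "<aiml>\n" + "\n".join(categories) + "\n</aiml>"


if __name__ == "__main__":
    saved = "(request: What is a dog?)\n(response: A dog is a mammal)"
    print(pairs_to_aiml(saved))
```

One known limitation: a response that itself contains ")" is cut short by the lazy match, so a less ambiguous delimiter would be safer for real pages.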

These are my two cents on the subject. I believe that
this would be a practical approach to the problem,
which would allow you to get started at once: with an
AIML parser and a RegExp-enabled text editor, you
could already hack a test implementation.

--
Ja mata ne.
Helio Perroni Filho



Re: Help, Need A.L.I.C.E Web page Spider

by Demossoft IT

Dear Sirs, I would like to invite you both to help me create this script.  I have a project link below and I am willing to pay you for your help, although the budget is not very big.  Please help with this; I appreciate your input.

http://www.getafreelancer.com/projects/65872.html

Re: Help, Need A.L.I.C.E Web page Spider

by Demossoft IT

Hi, I'm still looking for help and willing to pay.  I can even accept a prototype or something close.  

http://www.getafreelancer.com/projects/65872.html

Re: Help, Need A.L.I.C.E Web page Spider

by mehri

How much are you willing to pay?




.................................................o' \,=./ `o
Mehri                                               (o o)
---=--=---=--=--=---=--=--=--=--=---=--=--=-----ooO--(_)--Ooo---


Re: Help, Need A.L.I.C.E Web page Spider

by Demossoft IT

Please see the project page below and place a bid if interested.  Thanks, I appreciate any help.

http://www.getafreelancer.com/projects/65872.html

Re: Help, Need A.L.I.C.E Web page Spider

by mehri

Well from that link it looks like you're willing to
pay at the most 100 bucks for the project.

Is that correct?




.................................................o' \,=./ `o
Mehri                                               (o o)
---=--=---=--=--=---=--=--=--=--=---=--=--=-----ooO--(_)--Ooo---


Re: Help, Need A.L.I.C.E Web page Spider

by Demossoft IT

That's what I can afford, thanks.

Re: Help, Need A.L.I.C.E Web page Spider

by mehri


You're trying to solve the below problem set?

-----------------

1. The spider crawls a page examining the text of each
sentence.

2. Then using a grammatic parser it will reformulate
that sentence data into possible patterns and
responses to be entered as data in the AIML file.

3. Then it will format this into a standard AIML file
and allow you to save this code or copy and paste it
to another source

-----------------


I'll tell you one way to do what you want and I'll
tell you for free.

You know C and C++?

Here is a cross platform solution using those two
languages.


1) Download and use libwww
http://www.w3.org/Library/User/Architecture/

That is going to be your web bot that is going to
traverse the web and download text.  The instructions,
samples, and API to link to it are on that website.
They've already written several web bots for you.

So that solves #1.


2) Download and use link grammar:
http://www.link.cs.cmu.edu/link/

This will categorize the words of the sentence into
nouns, verbs, articles, etc...  

Use this to solve #2.


3) Write out your AIML by rearranging the text from
the link grammar.  

QED

Good luck regardless of the path you choose to
implement.  



Re: A.L.I.C.E Web page Spider

by Demossoft IT

Gary --

1) How do I fetch a webpage with AIMLpad?
2) I tried to connect to a public OpenCyc server but it doesn't seem to be helping my chat sessions.
3) Also, are there public worldnet and ConceptNet servers, and how would AIMLpad connect to them?
4) Finally, I cannot seem to disable the little Merlin MS Agent; AIMLpad crashes when I try to access that tab.

Many Thanks

Ty

Re: A.L.I.C.E Web page Spider

by Gary Dubuque

Ty--

Last question first...  Does that happen on the view menu option where you
check MS Agent on/off?  To turn off the agent manually, edit the options.ini
file (found in the same folder as AIMLpad.exe) and set MSagent=0; it will
probably be 1 or 2 before you change it.  The most likely reason the program
stops when you try to set MS Agent preferences is that the agent components
are not installed correctly.  There is a tutorial at AIMLpad.com for working
out the correct installation of MS Agent.

question 3.  I am assuming you mean WordNet because Worldnet is an AT&T
product.  It is not necessary to use a public site for WordNet: as a
dictionary of sorts, a local copy is just as current.  The problem with
WordNet is that it is a file-based system and really needs to be on your
computer to work efficiently.  Actually I haven't visited WordNet for
several months, so if there is a public interface through the internet, it
is new to me.  I'll have to look into that.

Once WordNet is installed, you can use AIMLpad script commands to extract
nouns, verbs, etc. from given text.  There are tutorials at AIMLpad.com for
learning how to do this too.  The prime example adds additional categories
(by substituting words) similar to a new one added for a response not
matched.

ConceptNet can be the public site if you want.  I had to add a fix to the
software for the server to offer the full functionality of the user
interface.  I doubt that the public version even exposes the XMLRPC
services.  If it does, the script code sample provided in your download can
be easily modified to use that resource.  You need to know that ConceptNet
is only interfaced through the AIMLpad scripting language.  The provided
script examples show how to get related concepts, paraphrase a text block,
calculate emotional content as well as a few other things like parsing
sentences into their parts of speech.

question 2.  The AIML for utilizing OpenCyc needs to be loaded.  Kino
Coursey extended AIML with new tags to accommodate functions provided by
Cyc.  Unless the AIML is written to use those extensions, the connection to
the public server will not be accessed.  How AIMLpad operates with Cyc is
explained in a separate pdf located in the doc folder of the standard
install.  I believe there are sample AIML files to try out included too.

question 1.  May I suggest something?  Part of the problem with web pages is
finding what text you are wanting to "index" or "translate".  Perhaps you
might want to consider RSS feeds which already have this organized in a way
that makes it much easier to find.  That said, I have not played with RSS
yet (although I bet AIMLpad is capable of doing so.)

This is another example where script commands are required.  Again there are
tutorials explaining the techniques.  Briefly, a script command named "url"
followed by the http address loads the web page into variables (predicates
in AIML terminology) starting with URLdata1 and continuing through URLdata2,
URLdata3, etc. in chunks of 32K.  If there are a lot of advertisements on a
page, you often have more than 32K and a whole bunch to wade through before
you get to readable content like news or weather or even the joke of the
day.  You can use the "find" command and the SUBSTR() function to extract
the text you want.
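
The chunking and find/substring steps can be mimicked outside AIMLpad to see what is going on. The Python sketch below is not AIMLpad script. It reuses the URLdata naming from the description, assumes "32K" means 32 * 1024 characters, and the marker strings in the usage lines are invented.

```python
CHUNK_SIZE = 32 * 1024  # assumption: "32K" counted in characters


def split_into_chunks(page: str, size: int = CHUNK_SIZE) -> dict[str, str]:
    """Store a page as URLdata1, URLdata2, ... pieces of at most `size` characters."""
    return {
        f"URLdata{index + 1}": page[start:start + size]
        for index, start in enumerate(range(0, len(page), size))
    }


def extract_between(page: str, start_marker: str, end_marker: str):
    """Return the text between two markers, or None if either is missing."""
    start = page.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)
    end = page.find(end_marker, start)
    if end == -1:
        return None
    return page[start:end].strip()


if __name__ == "__main__":
    page = "<html>ads ads ads<div id=\"joke\"> Why? </div></html>"
    print(list(split_into_chunks(page)))
    print(extract_between(page, "<div id=\"joke\">", "</div>"))
```

A marker can straddle a chunk boundary, so it is safer to search the re-joined text (or overlapping chunks) than each chunk on its own.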

So here we are with text to convert.  How is the question.  How do we take
text and extract patterns to index it?  I suppose you could send it to
ConceptNet to extract the keywords the text most likely contains.  Given
the words the text is about, are you guessing at forms like "What is ???"
or "Who is ???" as the prompts that get the web search information as a
response?
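
If the answer is yes, the guessing could be as simple as the Python sketch below. It assumes the keyword has already been extracted by some other tool. The list of prompt forms and the function name are illustrative only, and the alternative forms are redirected to the first one with <srai> so the text is stored once.

```python
from xml.sax.saxutils import escape

# Prompt forms to generate for each keyword; the first one holds the text.
PROMPT_FORMS = ("WHAT IS {kw}", "WHO IS {kw}")


def keyword_categories(keyword: str, text: str) -> list[str]:
    """Build one AIML category per prompt form for a single keyword."""
    kw = " ".join(keyword.upper().split())
    canonical = PROMPT_FORMS[0].format(kw=kw)
    categories = [
        f"<category><pattern>{canonical}</pattern>"
        f"<template>{escape(text)}</template></category>"
    ]
    for form in PROMPT_FORMS[1:]:
        categories.append(
            f"<category><pattern>{form.format(kw=kw)}</pattern>"
            f"<template><srai>{canonical}</srai></template></category>"
        )
    return categories


if __name__ == "__main__":
    for category in keyword_categories("link grammar", "A sentence parser."):
        print(category)
```

This still manufactures several categories per keyword, which is the cost the next paragraph tries to avoid.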

If you are thinking this, perhaps you want to explore the AnswerBus link
which can be adapted to look up keywords and only needs a couple of AIML
categories instead of manufacturing many, many categories from web crawling.

The bottom line comes down to what kind of text do you want to turn into
AIML.  The techniques probably will vary for different kinds of information.
Again, AIMLpad has some simple utilities for strictly formatted text to
convert to AIML.

If you can figure out the process for transforming the text, I'd be glad to
help make the tool to do it.

BTW, Google tries something like this and probably has the largest computer
complex in the world to do so.  IBM provides a specialized service for
extracting from the web (called WebFountain), but it is extremely expensive
and hand tailored to each specific request.  As I have said before, the
greatest minds and billions and billions of dollars are currently dedicated
to this - it is not a simple task.

Hope this helps,
  Gary Dubuque

-----Original Message-----
From: alicebot-developer-bounces@...
[mailto:alicebot-developer-bounces@...]On Behalf Of Ty
Ademosu
Sent: Thursday, June 15, 2006 1:17 PM
To: alicebot-developer@...
Subject: Re: [alicebot-developer] A.L.I.C.E Web page Spider




Re: A.L.I.C.E Web page Spider

by Demossoft IT

I have a prototype in VB6 working.  The output file looks like this:

<aiml>

<category>
<pattern>What is the purpose of the talk.origins Usenet newsgroup?%</pattern>

<template>The purpose of the talk.origins newsgroup is to provide a forum for discussion of issues related to biological and physical origins. See the talk.origins Newsgroup Welcome FAQ.
</template>
</category>

<category>
<pattern>What is the purpose of the Talk.Origins Archive?#!</pattern>

<template>The purpose of the TO Archive is to provide easy access to the many FAQ (frequently asked question) files and essays have been posted to the Usenet newsgroup talk.origins. The Archive exists expressly to provide mainstream scientific responses to the many issues that appear in the talk.origins newsgroup. See the Talk.Origins Archive's Welcome Page and the Talk.Origins Archive's Must-Read FAQs.  I thought evolution was just a theory. </template>
</category>
.
.
<aiml>

AIMLpad's chatbot does not seem to recognize this.  I added the AIML file to filelist.txt and hit Learn, but with no real effect.  As you can see above, some of the questions have strange characters after them, like !, # etc.  Could this be the problem, or is it just bad formatting, or are the questions too long?
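
One thing that may be worth checking here: AIML patterns are normally written as upper-case words without punctuation, because interpreters normalize the user's input that way before matching, so a pattern containing "?", "." or "#!" may never match. The Python sketch below rewrites the patterns in an existing file along those lines. The function names are made up, and how a given interpreter treats something like "talk.origins" is an assumption to verify. Separately, the listing above ends with <aiml> rather than </aiml>; if the real file does too, an XML parser would reject it.

```python
import re

PATTERN_TAG = re.compile(r"<pattern>(.*?)</pattern>", re.DOTALL)


def normalize(text: str) -> str:
    """Upper-case, keep letters, digits and the * and _ wildcards, drop the rest."""
    kept = re.sub(r"[^A-Za-z0-9*_ ]+", " ", text)
    return " ".join(kept.upper().split())


def clean_patterns(aiml: str) -> str:
    """Rewrite every <pattern> element in place; templates are left untouched."""
    return PATTERN_TAG.sub(
        lambda match: f"<pattern>{normalize(match.group(1))}</pattern>", aiml
    )


if __name__ == "__main__":
    line = "<pattern>What is the purpose of the Talk.Origins Archive?#!</pattern>"
    print(clean_patterns(line))
```

Running this over the generated file before loading it is a cheap way to rule the stray characters in or out as the cause.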


Re: A.L.I.C.E Web page Spider

by Stef-6

Those strange characters are likely to be the problem.  Easy test, just
remove those extraneous characters and see how you go.

Cheers,
Stefan Zakarias.

Ty Ademosu wrote:

> I have a prototype in vb6 working.  The output file looks like this
>
> <pattern>What is the purpose of the talk.origins Usenet
> newsgroup?%</pattern>
>
> <pattern>What is the purpose of the Talk.Origins Archive?#!</pattern>
>
>
> As you can see above
> some of the questions do have strange characters after them like !, # etc
> could this be the problem or just bad formatting or are questions too long.


Re: A.L.I.C.E Web page Spider

by Demossoft IT

Yeah, I'm pretty sure I already tried that.  Any other suggestions?  Perhaps I should use a chatbot other than AIMLpad?  I think I'm using the right learning procedure but it's still not working.

Re: Help, Need A.L.I.C.E Web page Spider

by Demossoft IT

I'd really like to thank you guys for all your help.  I successfully completed this project and got paid for it.

 Unix Tips
http://www.demossoft.netfirms.com




Re: Help, Need A.L.I.C.E Web page Spider

by mehri

Congrats on this Ty

If you don't mind me asking...

How is it being used?
Did you use any 3rd party software and if so what?
Is it possible for anyone to obtain a copy (either pay or free) of your work?
What were the licensing terms?  
 How much did you make off of this project?  
 
This type of information could help the rest of us better understand the industry of AIML.


Re: Help, Need A.L.I.C.E Web page Spider

by Demossoft IT

It's a prototype created to teach a chatbot a website's content, specifically foodwiki.net and halowiki.net.

I decided to use VB6 to drive Word and IE to read the site content and then create the AIML files -- no third-party software.

Well, someone on this list offered about $50 for it but I haven't heard back.  Miraculously, I charged the buyer $385, and the prototype took about a month.  I'm just glad it's done.  Not perfect, but it works OK most of the time.

I haven't really thought about licensing details; probably just $50 for the whole thing or so, since I got some pointers from here.

Rentacoder Portfolio  -- You can view many projects here

http://www.demossoft.netfirms.com

