|
|
Help, Need A.L.I.C.E Web page Spider
Please help with this.
I need a spider created that will crawl a web page, grammatically parse the page, and create AIML (Artificial Intelligence Markup Language) data. This data will be saved into an AIML file and used to teach a chatterbot the contents of the web page. The way we see it working is:
1. The spider crawls a page, examining the text of each sentence.
2. Then, using a grammatical parser, it reformulates that sentence data into possible patterns and responses to be entered as data in the AIML file.
3. Then it formats this into a standard AIML file and allows you to save this code or copy and paste it to another source.
This will require someone experienced in AIML as well as grammatical sentence parsing. The idea for this project is to get a stand-alone script, possibly a desktop VB script, but I hear that existing Perl and PHP extensions may make this easier. Open to other suggestions. Preferably a desktop application to start.
|
|
Re: A.L.I.C.E Web page Spider
AIMLpad (Program N) can retrieve a web page using its scripting language. With careful scripting it can probably find the text on the page. Once found, it can split the text into sentences. It can extract nouns, verbs, etc. from those sentences using WordNet. When used in conjunction with ConceptNet, it can summarize the text into a few sentences, tag the sentences with their parts of speech, or even predict the mood of the text. AIMLpad is designed to use its scripting language to create AIML categories and even put them into the appropriate files.

When used with OpenCyc, it can reason about concepts such as "humans eat food" or "humans have two hands". OpenCyc does deductive reasoning. AIMLpad also has a simple expert rule system (both forward and backward chaining) built in, where you could create the transformations to apply to the web page's text. Its fuzzy logic is not very effective, since the multiple-valued variables don't have confidence factors on each value, but the framework is there to be fixed to make this work too.

While all these tools are available in the one editor, scraping web pages for content and turning it into AIML is still not easy. In theory, it should be possible. In practice, it probably won't happen.

On the other hand, if the content is laid out correctly, i.e., a question followed by its answer followed by a blank line, repeated for however many questions you have, AIMLpad will automatically convert that into AIML categories just by selecting one of its menu options (Tools --> Quick Build Categories...).

I have used AIMLpad to get the quote or joke of the day, or to get the weather report, by hacking web pages.

-----Original Message-----
From: alicebot-developer-bounces@... [mailto:alicebot-developer-bounces@...] On Behalf Of Ty Ademosu
Sent: Thursday, June 01, 2006 10:03 PM
To: alicebot-developer@...
Subject: [alicebot-developer] A.L.I.C.E Web page Spider

View this message in context: http://www.nabble.com/A.L.I.C.E-Web-page-Spider-t1720827.html#a4674266
Sent from the Alicebot Developer forum at Nabble.com.
_______________________________________________
alicebot-developer mailing list
alicebot-developer@...
http://list.alicebot.org/mailman/listinfo/alicebot-developer
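The "question, answer, blank line" layout mentioned above is simple enough to convert mechanically. What follows is a minimal Python sketch of that idea, not AIMLpad's Quick Build Categories implementation; the function names `to_pattern` and `qa_text_to_aiml` are hypothetical, and the sketch assumes the target interpreter expects uppercase patterns without punctuation, as AIML interpreters commonly do.

```python
import re
from xml.sax.saxutils import escape


def to_pattern(question):
    """Uppercase the question and drop punctuation (assumed pattern convention)."""
    words = re.sub(r"[^A-Za-z0-9 ]", " ", question).upper().split()
    return " ".join(words)


def qa_text_to_aiml(text):
    """Turn 'question, answer, blank line' text into an AIML document string."""
    categories = []
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if len(lines) < 2:
            continue  # a question without an answer is skipped
        question, answer = lines[0], " ".join(lines[1:])
        categories.append(
            "<category>\n"
            f"<pattern>{to_pattern(question)}</pattern>\n"
            f"<template>{escape(answer)}</template>\n"
            "</category>"
        )
    return "<aiml>\n" + "\n".join(categories) + "\n</aiml>"


sample = (
    "What is AIML?\nA markup language for chatterbots.\n\n"
    "Who is Alice?\nA chatterbot.\n"
)
print(qa_text_to_aiml(sample))
```

The first line of each block is treated as the question and everything else in the block as the answer, so multi-line answers are joined into one template.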
|
|
Re: A.L.I.C.E Web page Spider
--- Ty Ademosu <tyademosu@...> wrote:
> I need a spider created that will crawl a webpage and grammatically parse the page and create AIML (Artificial Intelligence Markup Language) data. This data will be saved into an AIML file and used to teach a chatterbot the contents of the web page.

You must understand that this is not a trivial request: you could start a whole project, perhaps several, to fulfill it. Then again, there are some paths I can envision for providing something of that sort in a relatively short time.

First, the spider. I don't see much need for a full-fledged web robot, unless you intend the search to occur during a conversation. Even if this is the case, you could use some predefined resource, such as Wikipedia (http://www.wikipedia.org), and programmatically drive it to retrieve pages (in Java, I'd use the java.net.URL class to do this). Otherwise, you can just manually download the pages you want to parse.

Second, the grammatical parser. I found it surprisingly difficult to find a Free / Open Source parser through Google, but you can look on SourceForge (http://sourceforge.net) for some promising projects. Anyway, why not use AIML itself for this? Most Alicebots already know how to break inputs into sentence lists; then, with the right set of AIML files, the bot could create the request / response pairs. For example, to retrieve responses for queries of the form "What is a ..?", you could use this category:

    <category>
      <pattern>* IS A *</pattern>
      <template>
        (request: What is a <star/>?)
        (response: A <star/> is a <star index="2"/>)
      </template>
    </category>

Third, the formatter. If you save to a file the outputs from feeding web pages to the "pre-bot" outlined above, you can later pass them to a simple regular-expression parser that transforms them into AIML files. I could do this manually inside jEdit (http://www.jedit.org/), but most programming languages provide enough RegExp support for automating the task.

These are my two cents on the subject. I believe that this would be a practical approach to the problem, which would allow you to get started at once: with an AIML parser and a RegExp-enabled text editor, you could already hack a test implementation.

--
Ja mata ne.
Helio Perroni Filho
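The "formatter" step described above, turning saved "(request: ...) (response: ...)" output into AIML with a regular expression, might be sketched in Python as follows. This is only an illustration under assumptions: it supposes the pre-bot writes one pair per line in exactly that layout, the names `PAIR` and `pairs_to_aiml` are hypothetical, and the uppercase, punctuation-free pattern form is the convention most AIML interpreters expect rather than something stated in this thread.

```python
import re
from xml.sax.saxutils import escape

# One pair per line: "(request: <question>) (response: <answer>)".
# The response runs to the last ")" on the line, so it may contain parentheses.
PAIR = re.compile(r"^\(request: (.*?)\) \(response: (.*)\)$", re.M)


def pairs_to_aiml(log_text):
    """Convert saved request/response lines into an AIML document string."""
    categories = []
    for request, response in PAIR.findall(log_text):
        # Assumed pattern convention: uppercase words, no punctuation.
        pattern = " ".join(re.sub(r"[^A-Za-z0-9 ]", " ", request).upper().split())
        categories.append(
            "<category>\n"
            f"<pattern>{pattern}</pattern>\n"
            f"<template>{escape(response)}</template>\n"
            "</category>"
        )
    return "<aiml>\n" + "\n".join(categories) + "\n</aiml>"


log = "(request: What is a dog?) (response: A dog is a mammal)\n"
print(pairs_to_aiml(log))
```

Lines that do not match the expected layout are silently ignored, which is convenient for a first test but would hide malformed bot output in a real run.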
|
|
Re: Help, Need A.L.I.C.E Web page Spider
Dear Sirs,
I would like to invite you both to help me create this script. I have a project link below and I am willing to pay you for your help, although the budget is not too big. Please help with this; I appreciate your input.
http://www.getafreelancer.com/projects/65872.html
|
|
Re: Help, Need A.L.I.C.E Web page Spider
Hi, I'm still looking for help and willing to pay. I can even accept a prototype or something close.
http://www.getafreelancer.com/projects/65872.html
|
|
Re: Help, Need A.L.I.C.E Web page Spider
How much are you willing to pay?
Mehri
|
|
Re: Help, Need A.L.I.C.E Web page Spider
Please see the project page below and place a bid if interested. Thanks, I appreciate any help.
http://www.getafreelancer.com/projects/65872.html
|
|
Re: Help, Need A.L.I.C.E Web page Spider
Well, from that link it looks like you're willing to pay at most 100 bucks for the project. Is that correct?
Mehri
|
|
Re: Help, Need A.L.I.C.E Web page Spider
That's what I can afford, thanks.
|
|
|
Re: Help, Need A.L.I.C.E Web page Spider
You're trying to solve the problem set below?
-----------------
1. The spider crawls a page examining the text of each sentence.
2. Then using a grammatic parser it will reformulate that sentence data into possible patterns and responses to be entered as data in the AIML file.
3. Then it will format this into a standard AIML file and allow you to save this code or copy and paste it to another source
-----------------
I'll tell you one way to do what you want, and I'll tell you for free. You know C and C++? Here is a cross-platform solution using those two languages.

1) Download and use libwww: http://www.w3.org/Library/User/Architecture/
That is going to be your web bot that traverses the web and downloads text. The instructions, samples, and API to link against are on that website. They've already written several web bots for you. So that solves #1.

2) Download and use Link Grammar: http://www.link.cs.cmu.edu/link/
This will categorize the words of the sentence into nouns, verbs, articles, etc. Use this to solve #2.

3) Write out your AIML by rearranging the text from the link grammar.

QED

Good luck regardless of the path you choose to implement.
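For readers who want to try step 1 before committing to C or C++, here is a rough Python sketch of "download a page and get its sentences" using only the standard library. It is a stand-in for illustration, not libwww or Link Grammar: it does no part-of-speech tagging, the sentence splitter is deliberately naive, and the names `TextExtractor`, `html_to_text`, `split_sentences`, and `fetch_sentences` are all hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html):
    """Strip markup and collapse runs of whitespace into single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def split_sentences(text):
    """Naive split: a sentence ends at '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def fetch_sentences(url):
    """Download one page and return its text as a list of sentences."""
    with urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    return split_sentences(html_to_text(html))
```

Abbreviations such as "e.g." or "Dr." will be split incorrectly by this rule, which is one reason a real grammar parser is suggested for step 2.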
|
|
Re: A.L.I.C.E Web page Spider
Gary,
1) How do I fetch a web page with AIMLpad?
2) I tried to connect to a public OpenCyc server, but it doesn't seem to be helping my chat sessions.
3) Also, are there public worldnet and ConceptNet servers, and how would AIMLpad connect to them?
4) Finally, I cannot seem to disable the little Merlin MS Agent; AIMLpad crashes when I try to access that tab.
Many thanks,
Ty
|
|
Re: A.L.I.C.E Web page Spider
Ty,
Last question first. Does that happen on the View menu option where you check MS Agent on/off? To turn off the agent manually, edit the options.ini file (found in the same folder as AIMLpad.exe) and set MSagent=0; it will probably be 1 or 2 before you change it. The most likely reason the program stops when trying to set MS Agent preferences is that the agent software is not installed correctly. There is a tutorial at AIMLpad.com for figuring out the correct installation of MS Agent.

Question 3. I am assuming you mean WordNet, because Worldnet is an AT&T product. It is not necessary to use a public site for WordNet since, as a dictionary of sorts, it is just as current when used locally. The problem with WordNet is that it is a file-based system and really needs to be on your computer to work efficiently. Actually, I haven't visited WordNet for several months, so if there is a public interface through the internet, it is new to me. I'll have to look into that. Once WordNet is installed, you can use AIMLpad script commands to extract nouns, verbs, etc. from given text. There are tutorials at AIMLpad.com for learning how to do this too. The prime example adds additional categories (by substituting words) similar to a new one added for a response not matched.

ConceptNet can be the public site if you want. I had to add a fix to the software for the server to offer the full functionality of the user interface. I doubt that the public version even exposes the XML-RPC services. If it does, the script code sample provided in your download can easily be modified to use that resource. You need to know that ConceptNet is only interfaced through the AIMLpad scripting language. The provided script examples show how to get related concepts, paraphrase a text block, and calculate emotional content, as well as a few other things like parsing sentences into their parts of speech.

Question 2. The AIML for utilizing OpenCyc needs to be loaded. Kino Coursey extended AIML with new tags to accommodate functions provided by Cyc. Unless the AIML is written to use those extensions, the connection to the public server will not be accessed. How AIMLpad operates with Cyc is explained in a separate PDF located in the doc folder of the standard install. I believe there are sample AIML files included to try out too.

Question 1. May I suggest something? Part of the problem with web pages is finding the text you want to "index" or "translate". Perhaps you might want to consider RSS feeds, which already have this organized in a way that makes it much easier to find. That said, I have not played with RSS yet (although I bet AIMLpad is capable of doing so). This is another example where script commands are required. Again, there are tutorials explaining the techniques. Briefly, a script command named "url", followed by the http address, loads the web page into variables (predicates in AIML terminology) starting with URLdata1 and continuing through URLdata2, URLdata3, etc., in chunks of 32K. If there are a lot of advertisements on a page, you often have more than 32K and a whole bunch to wade through before you get to readable content like news or weather or even the joke of the day. You can use the "find" command and the SUBSTR() function to extract the text you want.

So here we are with text to convert. How is the question. How do we take text and extract patterns to index it? I suppose you could send it to ConceptNet to extract the keywords the text most likely contains. Given words which the text is about, are you guessing forms like "What is ???" or "Who is ???" to be the prompts to respond to with the web search information? If you are thinking this, perhaps you want to explore the AnswerBus link, which can be adapted to look up keywords and only needs a couple of AIML categories instead of manufacturing many, many categories from web crawling.

The bottom line comes down to what kind of text you want to turn into AIML. The techniques will probably vary for different kinds of information. Again, AIMLpad has some simple utilities for converting strictly formatted text to AIML. If you can figure out the process for transforming the text, I'd be glad to help make the tool to do it.

BTW, Google tries something like this and probably has the largest computer complex in the world to do so. IBM provides a specialized version of extracting from the web (called WebFountain), but it is extremely expensive and hand-tailored to each specific request. As I have said before, the greatest minds and billions and billions of dollars are currently dedicated to this. It is not a simple task.

Hope this helps,
Gary Dubuque

-----Original Message-----
From: alicebot-developer-bounces@... [mailto:alicebot-developer-bounces@...] On Behalf Of Ty Ademosu
Sent: Thursday, June 15, 2006 1:17 PM
To: alicebot-developer@...
Subject: Re: [alicebot-developer] A.L.I.C.E Web page Spider
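The "url", "find", and SUBSTR() workflow described above (page loaded in 32K chunks named URLdata1, URLdata2, ..., then searched for the readable part) can be mimicked in Python to see how it behaves. This is an analogue written for illustration, not AIMLpad script syntax; the function names `chunk_page` and `extract_between` and the sample markers are hypothetical.

```python
CHUNK_SIZE = 32 * 1024  # 32K, the chunk size mentioned for URLdata1, URLdata2, ...


def chunk_page(page, size=CHUNK_SIZE):
    """Split page text into consecutive chunks keyed URLdata1, URLdata2, ..."""
    return {
        f"URLdata{i // size + 1}": page[i:i + size]
        for i in range(0, len(page), size)
    }


def extract_between(text, start_marker, end_marker):
    """Return the text between two markers, or None if either is missing.

    Plays the role of a 'find' followed by a substring call.
    """
    start = text.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)
    end = text.find(end_marker, start)
    if end == -1:
        return None
    return text[start:end].strip()


# A page padded with 40,000 characters of "advertising" before the content.
page = "x" * 40000 + '<div id="joke">Why did the bot cross the road?</div>'
chunks = chunk_page(page)
print(list(chunks))  # chunk names, in order
print(extract_between("".join(chunks.values()), '<div id="joke">', "</div>"))
```

A marker can straddle a chunk boundary, so the sketch rejoins the chunks before searching; searching each chunk separately could miss the content.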
|
|
Re: A.L.I.C.E Web page Spider
I have a prototype in VB6 working. The output file looks like this:

<aiml>
<category>
<pattern>What is the purpose of the talk.origins Usenet newsgroup?%</pattern>
<template>The purpose of the talk.origins newsgroup is to provide a forum for discussion of issues related to biological and physical origins. See the talk.origins Newsgroup Welcome FAQ. </template>
</category>
<category>
<pattern>What is the purpose of the Talk.Origins Archive?#!</pattern>
<template>The purpose of the TO Archive is to provide easy access to the many FAQ (frequently asked question) files and essays have been posted to the Usenet newsgroup talk.origins. The Archive exists expressly to provide mainstream scientific responses to the many issues that appear in the talk.origins newsgroup. See the Talk.Origins Archive's Welcome Page and the Talk.Origins Archive's Must-Read FAQs. I thought evolution was just a theory. </template>
</category>
.
.
<aiml>

The AIMLpad chatbot does not seem to recognize this. I added the AIML file to filelist.txt and hit Learn, but with no real effect. As you can see above, some of the questions have strange characters after them, like !, #, etc. Could this be the problem, or is it just bad formatting, or are the questions too long?
|
|
Re: A.L.I.C.E Web page Spider
Those strange characters are likely to be the problem. Easy test: just remove those extraneous characters and see how you go.
Cheers,
Stefan Zakarias.
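One way to run the suggested test without editing the file by hand is to clean every pattern programmatically. The Python sketch below assumes the usual AIML convention that patterns are uppercase words and wildcards with no punctuation, so a pattern ending in "?%" would not match normalized user input; whether AIMLpad is this strict is not confirmed in this thread, and how an interpreter treats a word like "talk.origins" depends on its own substitution rules. The function names are hypothetical.

```python
import re


def normalize_pattern(raw):
    """Uppercase a pattern and replace punctuation with spaces.

    Letters, digits, spaces, and the AIML wildcards * and _ are kept.
    """
    cleaned = re.sub(r"[^A-Za-z0-9*_ ]", " ", raw)
    return " ".join(cleaned.upper().split())


def normalize_aiml_patterns(aiml_text):
    """Rewrite the body of every <pattern> element; leave templates untouched."""
    return re.sub(
        r"<pattern>(.*?)</pattern>",
        lambda m: f"<pattern>{normalize_pattern(m.group(1))}</pattern>",
        aiml_text,
        flags=re.S,
    )


before = (
    "<category><pattern>What is the purpose of the Talk.Origins Archive?#!</pattern>"
    "<template>See the Welcome Page.</template></category>"
)
print(normalize_aiml_patterns(before))
```

Because the rewrite is a plain regular-expression substitution rather than an XML parse, it also works on files that are not well-formed.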
|
|
Re: A.L.I.C.E Web page Spider
Yeah, I'm pretty sure I already tried that. Any other suggestions? Perhaps I should use a chatbot other than AIMLpad? I think I'm using the right learning procedure, but it's still not working.
|
|
|
Re: Help, Need A.L.I.C.E Web page Spider
I'd really like to thank you guys for all your help. I successfully completed this project and got paid for it.
Unix Tips http://www.demossoft.netfirms.com
|
|
|
Re: Help, Need A.L.I.C.E Web page Spider
Congrats on this, Ty.
If you don't mind me asking:
How is it being used?
Did you use any 3rd-party software, and if so, what?
Is it possible for anyone to obtain a copy (either paid or free) of your work?
What were the licensing terms?
How much did you make off of this project?
This type of information could help the rest of us better understand the AIML industry.
|
|
Re: Help, Need A.L.I.C.E Web page Spider
It's a prototype created to teach a chatbot a website's content, specifically foodwiki.net and halowiki.net.
I decided to use VB6 to drive Word and IE to read the site content and then create the AIML files, with no 3rd-party software. Someone on this list offered about $50 for it, but I haven't heard back. Miraculously, I charged the buyer $385, and the prototype took about 1 month. I'm just glad it's done. It's not perfect, but it works OK most of the time. I haven't really thought about licensing details; probably just $50 for the whole thing or something, since I got some pointers from here.
Rentacoder Portfolio: you can view many projects here http://www.demossoft.netfirms.com
|