Computer Association of SIUE - Forums

CAOS Forums => Questions and Answers => Topic started by: raptor on 2007-10-30T13:43:57-05:00 (Tuesday)

Title: Pulling text from an RSS feed.
Post by: raptor on 2007-10-30T13:43:57-05:00 (Tuesday)
Okay guys (and girl),

I'm going to be involved in a project where I need to pull text from RSS feeds. I haven't dealt with much web-based coding, or with the back end of an RSS feed.

So I guess what I'm looking for here is some general direction. Something tells me Perl, or something Perl-ish, lol.

Thanks,
Scott
Title: Re: Pulling text from an RSS feed.
Post by: William Grim on 2007-10-31T08:00:19-05:00 (Wednesday)
Use Java and StAX if you want a pull-based parser. Java's DOM API is good if you need a tree-based parser.
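Ruby has the same pull-based style. A minimal sketch, assuming Ruby's bundled REXML library (this is REXML's `REXML::Parsers::PullParser`, a Ruby analogue of StAX, not StAX itself; `item_titles` is a made-up helper name). It asks the parser for one event at a time and keeps only the titles of `<item>` elements, skipping the channel's own `<title>`:

```ruby
require 'rexml/parsers/pullparser'

# Return the text of every <title> that sits inside an <item>.
# The pull parser hands back one event at a time (start tag, text, end tag...),
# so no document tree is built in memory.
def item_titles(xml)
  parser = REXML::Parsers::PullParser.new(xml)
  titles = []
  in_item = false   # currently inside an <item> element?
  in_title = false  # currently inside that item's <title>?
  buffer = +''

  while parser.has_next?
    event = parser.pull
    case event.event_type
    when :start_element
      name = event[0]
      in_item = true if name == 'item'
      if in_item && name == 'title'
        in_title = true
        buffer = +''
      end
    when :text
      # event[1] is the text with entities such as &amp; already decoded
      buffer << event[1] if in_title
    when :cdata
      buffer << event[0] if in_title
    when :end_element
      name = event[0]
      if in_title && name == 'title'
        titles << buffer
        in_title = false
      end
      in_item = false if name == 'item'
    end
  end
  titles
end
```

The trade-off is the same as StAX versus DOM: you track state (`in_item`, `in_title`) yourself, but memory use stays small even for large feeds.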

There isn't a backend to RSS; it's simply structured XML.
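To make "structured XML" concrete, here is a minimal tree-based sketch using REXML's `Document` class (the Ruby counterpart of a DOM parser) and `REXML::XPath`. The feed in `SAMPLE_FEED` is hypothetical and uses only standard RSS 2.0 elements; `feed_items` is an illustrative helper, not a library call:

```ruby
require 'rexml/document'

# A tiny, made-up RSS 2.0 feed: <rss> wraps one <channel>, which holds
# channel metadata plus any number of <item> entries.
SAMPLE_FEED = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <rss version="2.0">
    <channel>
      <title>Example Feed</title>
      <link>http://example.com/</link>
      <description>A hypothetical feed</description>
      <item>
        <title>First post</title>
        <link>http://example.com/1</link>
        <description>Hello, RSS</description>
      </item>
      <item>
        <title>Second post</title>
        <link>http://example.com/2</link>
        <description>Still just XML</description>
      </item>
    </channel>
  </rss>
XML

# Parse the whole document into a tree, then use XPath to collect
# each item's title and link as a hash.
def feed_items(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '/rss/channel/item').map do |item|
    {
      title: item.elements['title'].text,
      link:  item.elements['link'].text
    }
  end
end

feed_items(SAMPLE_FEED).each { |i| puts "#{i[:title]} <#{i[:link]}>" }
```

Each `<item>` is a child of `<channel>`, which is a child of `<rss>`, so the single path `/rss/channel/item` finds every entry.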

Personally, I'd avoid Perl, but it's up to you.
Title: Re: Pulling text from an RSS feed.
Post by: raptor on 2007-10-31T11:17:01-05:00 (Wednesday)
Looks like I have lots of learning to do :)
Title: Re: Pulling text from an RSS feed.
Post by: Justin Camerer on 2007-10-31T11:18:10-05:00 (Wednesday)
I would (obviously) suggest trying Ruby. There's a Ruby library called Hpricot that parses HTML (and XML (and pretty much any *ML)) and makes it really easy to access everything.

Ex.

Digg's RSS

    Digg
    en-us
    http://digg.com/
    Digg

      Your Blog Might Suck If...
      http://digg.com/offbeat_news/Your_Blog_Might_Suck_If
      I'm not sure why but Jeff Foxworthy's "You might be a redneck" routine always stuck with me. So, with a tip of the hat to the original, I'd like to present "Your Blog Might Suck".
      Wed, 31 Oct 2007 15:50:12 GMT
      http://digg.com/offbeat_news/Your_Blog_Might_Suck_If
      184
      msaleem
      http://digg.com/users/msaleem/m.png
      Offbeat News
      40

Small Ruby Program

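require 'hpricot'

# Read the feed XML from a file named on the command line, or from stdin
digg_xml = ARGF.read

# Parses the XML into an Hpricot object; Hpricot.XML applies XML rules
# instead of HTML rules, so the feed's tags are kept as written
doc = Hpricot.XML( digg_xml )

# Prints out the title and description of the first item to stdout
puts "Title: " + (doc % 'item/title').inner_html
puts "Description: " + (doc % 'item/description').inner_html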
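The program above prints only the first item, because `%` returns the first match. Here is a sketch that loops over every item instead, written with Ruby's bundled REXML library rather than Hpricot so no extra gem is needed. `print_items` is a hypothetical helper name, and the `out` parameter is only there so the output can be redirected:

```ruby
require 'rexml/document'

# Print "Title:" and "Description:" lines for every <item> in an RSS
# feed, not just the first one. Returns the lines it printed.
def print_items(xml, out = $stdout)
  doc = REXML::Document.new(xml)
  lines = []
  doc.elements.each('rss/channel/item') do |item|
    # &. guards against items that lack a <title> or <description>
    lines << "Title: #{item.elements['title']&.text}"
    lines << "Description: #{item.elements['description']&.text}"
  end
  lines.each { |line| out.puts(line) }
  lines
end
```

To try it on a saved feed, call `print_items(File.read('digg.xml'))`, where `digg.xml` stands for whatever file you saved the feed to.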


Hope this helps. I promise it won't hurt to try Ruby.
Title: Re: Pulling text from an RSS feed.
Post by: William Grim on 2007-10-31T12:34:58-05:00 (Wednesday)
It depends on how much performance you need. Java generally performs much better than Ruby, but if the job is small, Ruby is fine.
Title: Re: Pulling text from an RSS feed.
Post by: raptor on 2007-10-31T13:00:13-05:00 (Wednesday)
I'm not yet sure of the problem size, but I'm under the impression that computing resources won't be an issue.

Very interesting stuff, though.