Filed under: aggregation

Updated news search and RSS feeds

We’ve tweaked the free RSS feeds section on our site and made it a bit friendlier to use.

The big new feature is that you can now run your keyword searches within any of our 600 news categories (‘woods’ within golf news, ‘black hole’ within space science news, etc), using the category as a filter to get very targeted results.



February 24, 2009

New language auto-detection over Blogs

We are pleased to announce that improved language detection for blogs in the UGC Metabase will launch in two weeks. We’re also introducing new blog lists sorted by language, so you can see all the English, French, German, Chinese blogs, etc, in our index.

And we’re adding a new date field, showing the time we indexed a particular post. This is in addition to the publish date already provided, as copied from the original XML/RSS feed.

 

1. Improved language detection at post level 

Blog feeds normally state which language they are in. However, this isn’t always reliable. Blog publishing platforms typically have a default language setting, and bloggers do not always update it to reflect the language they actually write in. The result is a significant portion of blog feeds that declare the wrong language.

We’ve been working hard in the background to produce a more reliable approach to language detection. We’ll be rolling this out next month as the basis for setting the post’s language, as provided in the <language> tag. Only when this approach cannot confidently determine the language will we fall back to the language tag provided in the original XML.
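The fallback rule described above can be sketched in a few lines of Python. This is only an illustration: the function name, the confidence score and the 0.9 threshold are assumptions made for the example, since the post does not describe how the detector measures confidence.

```python
from typing import Optional

# Hypothetical cut-off: the post does not say how confidence is measured.
CONFIDENCE_THRESHOLD = 0.9


def choose_post_language(
    detected: Optional[str],
    confidence: float,
    declared: Optional[str],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> Optional[str]:
    """Pick the value for the post-level <language> tag.

    detected   -- language guessed from the post text (None if no guess)
    confidence -- detector's confidence in that guess, assumed to be 0.0-1.0
    declared   -- language stated in the original XML/RSS feed
    """
    # Prefer the detected language when the detector is confident enough.
    if detected is not None and confidence >= threshold:
        return detected
    # Otherwise fall back to whatever the original feed declared.
    return declared
```

With these assumptions, a post detected as French with confidence 0.97 is tagged French even if the feed declares English, while a guess with confidence 0.4 leaves the declared English in place.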

 

2. New language tagging at feed level

Further to this, we are adding a new <feedLanguage> tag, showing the language of the blog feed. This is in addition to the existing <language> tag referred to above, which is at post level.

Adding language categorisation at feed level makes it possible to better organise the index by language – for example we can identify exactly which blogs are in French, which are in English, etc, and provide and manage these in lists.

The new language tag will appear in the UGC XML as follows:

<feedLink>http://blog.moreover.com/feed/</feedLink> 
<feedLanguage>English</feedLanguage>
<generator>http://wordpress.org/?v=MU</generator>
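As a rough sketch of how a client might use the feed-level tag to build per-language blog lists, the Python below groups feed links by <feedLanguage>. The <feeds> and <feed> wrapper elements and the second sample feed are assumptions made so the example is self-contained; the actual UGC XML envelope is not shown in this post.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical wrapper elements (<feeds>, <feed>) around the tags shown above.
SAMPLE = """<feeds>
  <feed>
    <feedLink>http://blog.moreover.com/feed/</feedLink>
    <feedLanguage>English</feedLanguage>
    <generator>http://wordpress.org/?v=MU</generator>
  </feed>
  <feed>
    <feedLink>http://example.org/blog/feed/</feedLink>
    <feedLanguage>French</feedLanguage>
  </feed>
</feeds>"""


def group_feeds_by_language(xml_text):
    """Return a dict mapping each feedLanguage value to a list of feed links."""
    root = ET.fromstring(xml_text)
    groups = defaultdict(list)
    for feed in root.iter("feed"):
        # Feeds without a <feedLanguage> tag are collected under "unknown".
        language = feed.findtext("feedLanguage", default="unknown")
        link = feed.findtext("feedLink", default="")
        groups[language].append(link)
    return dict(groups)


by_language = group_feeds_by_language(SAMPLE)
```

Here `by_language` ends up with one list of links for English and one for French.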

 

3. Introducing a new Harvest Date field

Lastly, we’re adding a new <itemHarvestDate> field to the feed. This gives the time Moreover actually indexed the item. We already pass on the publish date of the post, as provided in the original XML/RSS feed. The new harvest date complements it and can provide, for example, additional information about indexing latency across the feeds.

The new harvest date tag will appear in the UGC XML as follows:

<pubDate>2009-02-11 14:26:06.0</pubDate>
<itemHarvestDate>2009-03-13 18:38:21.0</itemHarvestDate>
<validDate>2009-03-13 18:37:18.0</validDate>

All times are shown in GMT.
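To illustrate the latency use case, here is a minimal Python sketch that subtracts <pubDate> from <itemHarvestDate>. It assumes both values always use the timestamp layout shown in the sample above; the function names are made up for the example.

```python
from datetime import datetime, timedelta

# Layout of the timestamps in the sample above, e.g. "2009-02-11 14:26:06.0".
UGC_TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_ugc_time(value: str) -> datetime:
    """Parse a UGC timestamp; both fields are GMT, so no time zone handling is needed."""
    return datetime.strptime(value.strip(), UGC_TIME_FORMAT)


def indexing_latency(pub_date: str, harvest_date: str) -> timedelta:
    """Time between a post's publish date and the moment it was indexed."""
    return parse_ugc_time(harvest_date) - parse_ugc_time(pub_date)


latency = indexing_latency("2009-02-11 14:26:06.0", "2009-03-13 18:38:21.0")
```

For the sample values above, the gap works out to a little over 30 days. Note that the publish date comes from the blog’s own feed, so a large gap can reflect an inaccurate publish date as much as slow indexing.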

 

We believe in being open and transparent about our crawling performance, and are confident about our technology. We invite comparison with other, similar services (for example, see Technorati and a recent comment on ReadWriteWeb), and welcome any feedback you, as customers and users, have.


February 19, 2009

Search Engine Toolkit gets a new portal

We have a new home for our News Search API – the Search Engine Toolkit product portal. Now customers can log in and get full search API details, all the search filters and output options (close to 30), along with example search integrations, a gallery and online FAQs and support.

Wondering what a Search Engine Toolkit does?

Skipping the smart-but-useless-Alec answer (“well that all depends on what you do with it….”), it basically gives you access to a live news search engine and a set of tools that let you control and focus the search results. There’s no frontend UI as such, just HTTP calls and RSS returns. That makes it inherently flexible, and how you integrate it depends on how and where you want your users to access and view news headlines (OK, back to smart Alec after all).

For example, the news search on our free feeds page uses the SET – it’s just a search box with a couple of filter options. Our own Newsdesk uses it too, where it powers a full enterprise application. Moreover client BusinessWeek uses it to power news headlines in their Business Exchange network, while news prediction site HubDub uses it to automatically match headlines to users’ questions (see `related news` on this page). Media analysis company MediaMiser uses the toolkit to integrate Moreover news into their enterprise service.
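To give a feel for the “HTTP calls and RSS returns” pattern, here is a small Python sketch. Everything specific in it is an assumption: the endpoint URL and the `q` and `category` parameter names are placeholders, not the real toolkit API (the actual filters and output options are documented in the portal), and the RSS sample is invented so the code runs without a network call.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Placeholder endpoint and parameter names: NOT the real toolkit API.
BASE_URL = "https://search.example.com/news"


def build_search_url(keywords, category=None, base_url=BASE_URL):
    """Build the HTTP GET URL for a keyword search, optionally limited to a category."""
    params = {"q": keywords}
    if category:
        params["category"] = category
    return base_url + "?" + urlencode(params)


def parse_headlines(rss_text):
    """Extract (title, link) pairs from the <item> elements of an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


# Invented RSS response, used here in place of a live search result.
SAMPLE_RSS = """<rss version="2.0"><channel><title>Sample results</title>
<item><title>Example headline</title><link>http://example.com/story</link></item>
</channel></rss>"""

url = build_search_url("black hole", category="space science news")
headlines = parse_headlines(SAMPLE_RSS)
```

In a real integration the URL would be fetched over HTTP and the response body passed to `parse_headlines`; how the headlines are then displayed is entirely up to the integrating site.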

In a nutshell, there really is no set way to deploy news content, so the toolkit is built to work to your scope and scale, and to fit with any design.

(Oh, did we mention that OpenCalais integration is coming to a Search Engine Toolkit near you soon?)

November 27, 2008



Moreover Technologies

Our company blog with the latest news, product updates, media intelligence insights, and other fine fare out of our Dayton (OH), Reston (VA), and London (UK) offices!
