<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Moreover Technologies Blog &#187; XML</title>
	<atom:link href="http://www.moreover.com/blog/tag/xml/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.moreover.com/blog</link>
	<description>Helping Companies Turn Mass Media into Media Intelligence</description>
	<lastBuildDate>Thu, 29 Nov 2012 14:00:36 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.4.2</generator>
		<item>
		<title>New language auto-detection over Blogs</title>
		<link>http://www.moreover.com/blog/2009/02/19/new-auto-language-detection-over-blogs/</link>
		<comments>http://www.moreover.com/blog/2009/02/19/new-auto-language-detection-over-blogs/#comments</comments>
		<pubDate>Thu, 19 Feb 2009 17:27:51 +0000</pubDate>
		<dc:creator>Brian Mackie</dc:creator>
				<category><![CDATA[aggregation]]></category>
		<category><![CDATA[metadata]]></category>
		<category><![CDATA[Moreover Technologies]]></category>
		<category><![CDATA[Products]]></category>
		<category><![CDATA[social media]]></category>
		<category><![CDATA[Social Media Metabase]]></category>
		<category><![CDATA[blog languages]]></category>
		<category><![CDATA[blogs]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://blog.moreover.com/?p=241</guid>
		<description><![CDATA[We are pleased to announce the upcoming launch of improved language detection for blogs in the UGC Metabase in two weeks. We&#8217;re also introducing new blog lists sorted by language, so you can see all the English, French, German, Chinese blogs, etc, in our index. And we&#8217;re adding a new date field, showing the time we indexed a particular post. [...]]]></description>
			<content:encoded><![CDATA[<p>We are pleased to announce that improved language detection for blogs will launch in the UGC Metabase in two weeks. We&#8217;re also introducing new blog lists sorted by language, so you can see all the blogs in English, French, German, Chinese, and other languages in our index.</p>
<p>And we&#8217;re adding a new date field, showing the time we indexed a particular post. This is in addition to the publish date already provided, as copied from the original XML/RSS feed.</p>
<p> </p>
<p><strong><span style="font-size:small;">1. Improved language detection at post level</span></strong> </p>
<p>Blog feeds normally state which language they are in. However, this isn&#8217;t always reliable: blog publishing platforms typically have a default language setting, and bloggers do not always change it to match the language they actually write in. As a result, a significant portion of blog feeds declare the wrong language.</p>
<p>We&#8217;ve been working hard in the background to produce a more reliable approach to language detection. We&#8217;ll be rolling this out next month as the basis for setting the post&#8217;s language, as provided in the <strong>&lt;language&gt;</strong> tag. Only when this approach cannot confidently determine the language will we fall back to the language tag provided in the original XML.</p>
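<p>The fallback rule described above can be sketched in a few lines of Python. This is only an illustration, not Moreover&#8217;s implementation: the function name <code>choose_post_language</code>, the idea of a numeric confidence score, and the 0.9 threshold are all assumptions made for the example.</p>

```python
def choose_post_language(detected, confidence, declared, threshold=0.9):
    """Pick the language to report for a post.

    Hypothetical sketch: `detected` and `confidence` stand in for the output
    of some language detector, `declared` is the language stated in the
    original feed, and `threshold` is an assumed cut-off.
    """
    # Trust the detector only when it produced a result and is confident.
    if detected is not None and confidence >= threshold:
        return detected
    # Otherwise fall back to the language declared in the original XML.
    return declared
```

<p>With these assumptions, a confident detection of French overrides a feed that declares English, while a low-confidence detection leaves the declared language in place.</p>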
<p> </p>
<p><strong><span style="font-size:small;">2. New language tagging at feed level</span></strong></p>
<p>We are also adding a new <strong>&lt;feedLanguage&gt;</strong> tag, showing the language of the blog <em>feed</em>. This is in addition to the existing <strong>&lt;language&gt;</strong> tag referred to above, which applies at <em>post</em> level.</p>
<p>Adding language categorisation at feed level makes it possible to better organise the index by language &#8211; for example, we can identify exactly which blogs are in French, which are in English, and so on, and provide and manage them as lists.</p>
<p>The new language tag will appear in the UGC XML as follows:</p>
<blockquote><p><span style="color:#0000ff;"><span style="color:#000000;">&lt;feedLink&gt;http://blog.moreover.com/feed/&lt;/feedLink&gt; </span><br />
</span><span style="color:#3333ff;"><strong>&lt;feedLanguage&gt;English&lt;/feedLanguage&gt;</strong></span><br />
<span style="color:#000000;">&lt;generator&gt;<span class="tx">http://wordpress.org/?v=MU</span>&lt;/generator&gt;</span></p></blockquote>
<p> </p>
<p><strong><span style="font-size:small;">3. Introducing a new Harvest Date field</span></strong></p>
<p>Lastly, we&#8217;re adding a new <strong>&lt;itemHarvestDate&gt;</strong> field to the feed. This gives the time Moreover actually indexed the item. We already pass on the publish date of the post, as provided in the original XML/RSS feed. The new harvest date complements the publish date and can provide, for example, additional information about indexing latency across the feeds.</p>
<p>The new harvest date tag will appear in the UGC XML as follows:</p>
<blockquote><p><span class="135592920-12022009">&lt;pubDate&gt;2009-02-11 14:26:06.0&lt;/pubDate&gt;<br />
</span><strong><span style="color:#0000ff;">&lt;itemHarvestDate&gt;2009-03-13 18:38:21.0&lt;/itemHarvestDate&gt;</span></strong><br />
<span style="color:#000000;">&lt;validDate&gt;2009-03-13 18:37:18.0&lt;/validDate&gt;</span></p></blockquote>
<p>All times are shown in GMT.</p>
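<p>One way to use the two dates together is to subtract the publish date from the harvest date to get the indexing latency for an item. The Python below is a minimal sketch of that calculation; the function name is hypothetical, and it assumes both values use the timestamp layout shown in the example above and are both in GMT.</p>

```python
from datetime import datetime

# Timestamp layout used in the example above, e.g. "2009-02-11 14:26:06.0".
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def indexing_latency(pub_date, item_harvest_date):
    """Return harvest time minus publish time as a timedelta.

    Both arguments are strings in TIMESTAMP_FORMAT, assumed to be in GMT.
    """
    published = datetime.strptime(pub_date, TIMESTAMP_FORMAT)
    harvested = datetime.strptime(item_harvest_date, TIMESTAMP_FORMAT)
    return harvested - published


# Values taken from the example XML above.
latency = indexing_latency("2009-02-11 14:26:06.0", "2009-03-13 18:38:21.0")
```

<p>For the sample values, <code>latency</code> works out to 30 days, 4 hours, 12 minutes and 15 seconds.</p>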
<p> </p>
<p><em>We believe in being open and transparent about our crawling performance, and are confident about our technology. We invite comparison with other, similar services (for example, see <a href="http://www.readwriteweb.com/archives/technorati_retiring_old_crawle.php" target="_self">Technorati and a recent comment on ReadWriteWeb</a></em><em>), and welcome any feedback you, as customers and users, have.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.moreover.com/blog/2009/02/19/new-auto-language-detection-over-blogs/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>