<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Data Preprocessing &#8211; Lessons Learned</title>
	<atom:link href="http://armin.diotavelli.net/wordpress/2010/02/data-preprocessing-lessons-learned/feed/" rel="self" type="application/rss+xml" />
	<link>http://armin.diotavelli.net/wordpress/2010/02/data-preprocessing-lessons-learned/</link>
	<description>Digital notes of an analog enthusiast</description>
	<lastBuildDate>Sun, 25 Jul 2010 20:28:25 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Raza</title>
		<link>http://armin.diotavelli.net/wordpress/2010/02/data-preprocessing-lessons-learned/comment-page-1/#comment-468</link>
		<dc:creator>Raza</dc:creator>
		<pubDate>Sun, 25 Jul 2010 20:28:25 +0000</pubDate>
		<guid isPermaLink="false">http://armin.diotavelli.net/wordpress/?p=57#comment-468</guid>
		<description>@OG Dude pre-processing can be a pain, but it is the MOST important step in any data mining project. Unless feature selection/extraction and data representation are done right, your DM technique is next to useless. Spending more time on pre-processing is actually a good thing.

&quot;Worry about the data first before you worry about the algorithm&quot; - Peter Norvig</description>
		<content:encoded><![CDATA[<p>@OG Dude pre-processing can be a pain, but it is the MOST important step in any data mining project. Unless feature selection/extraction and data representation are done right, your DM technique is next to useless. Spending more time on pre-processing is actually a good thing.</p>
<p>&#8220;Worry about the data first before you worry about the algorithm&#8221; &#8211; Peter Norvig</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tweets that mention Data Preprocessing – Lessons Learned « Let x = x -- Topsy.com</title>
		<link>http://armin.diotavelli.net/wordpress/2010/02/data-preprocessing-lessons-learned/comment-page-1/#comment-467</link>
		<dc:creator>Tweets that mention Data Preprocessing – Lessons Learned « Let x = x -- Topsy.com</dc:creator>
		<pubDate>Sun, 25 Jul 2010 13:08:48 +0000</pubDate>
		<guid isPermaLink="false">http://armin.diotavelli.net/wordpress/?p=57#comment-467</guid>
		<description>[...] This post was mentioned on Twitter by Alyona Medelyan, Tara. Tara said: RT @zelandiya: Data processing - Lessons learned: Tips from #nlproc grad student: http://bit.ly/bpifBY [...]</description>
		<content:encoded><![CDATA[<p>[...] This post was mentioned on Twitter by Alyona Medelyan, Tara. Tara said: RT @zelandiya: Data processing &#8211; Lessons learned: Tips from #nlproc grad student: <a href="http://bit.ly/bpifBY" rel="nofollow">http://bit.ly/bpifBY</a> [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: OG Dude</title>
		<link>http://armin.diotavelli.net/wordpress/2010/02/data-preprocessing-lessons-learned/comment-page-1/#comment-432</link>
		<dc:creator>OG Dude</dc:creator>
		<pubDate>Sun, 20 Jun 2010 10:42:26 +0000</pubDate>
		<guid isPermaLink="false">http://armin.diotavelli.net/wordpress/?p=57#comment-432</guid>
		<description>Yeah, I guess you could carve those truisms in stone... pre-processing especially is a b*tch. Frustrating to know you spend more time on squeezing stuff into the right shape than actually doing something noteworthy with the data.

Another thing I find hard to get right is the trade-off between &quot;early optimization&quot; - which is bad - and doing a quick and dirty job - which is also bad. I mean, most of the time you&#039;ll NEVER use code you wrote for a particular project again, but if you&#039;re like me then doing a half-assed job under that premise is like watching people use Comic Sans MS in PowerPoint - it makes you cringe on so many levels...</description>
		<content:encoded><![CDATA[<p>Yeah, I guess you could carve those truisms in stone&#8230; pre-processing especially is a b*tch. Frustrating to know you spend more time on squeezing stuff into the right shape than actually doing something noteworthy with the data.</p>
<p>Another thing I find hard to get right is the trade-off between &#8220;early optimization&#8221; &#8211; which is bad &#8211; and doing a quick and dirty job &#8211; which is also bad. I mean, most of the time you&#8217;ll NEVER use code you wrote for a particular project again, but if you&#8217;re like me then doing a half-assed job under that premise is like watching people use Comic Sans MS in PowerPoint &#8211; it makes you cringe on so many levels&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: DrNI@AM</title>
		<link>http://armin.diotavelli.net/wordpress/2010/02/data-preprocessing-lessons-learned/comment-page-1/#comment-290</link>
		<dc:creator>DrNI@AM</dc:creator>
		<pubDate>Wed, 03 Mar 2010 07:18:16 +0000</pubDate>
		<guid isPermaLink="false">http://armin.diotavelli.net/wordpress/?p=57#comment-290</guid>
		<description>Congrats on finishing your thesis! And thanks for the advice in this post. It comes too late for me, hence I can confirm most of the points. :-)

We tend to think of pre-processing tools as perfect and readily available things... but depending on the application, the errors in sentence splitting, tokenization, and tagging multiply. Fixing these things could fill a thesis of its own.</description>
		<content:encoded><![CDATA[<p>Congrats on finishing your thesis! And thanks for the advice in this post. It comes too late for me, hence I can confirm most of the points. <img src='http://armin.diotavelli.net/wordpress/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p>We tend to think of pre-processing tools as perfect and readily available things&#8230; but depending on the application, the errors in sentence splitting, tokenization, and tagging multiply. Fixing these things could fill a thesis of its own.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
