<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Our Geek Space</title>
	<atom:link href="http://blog.moove-it.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.moove-it.com</link>
	<description>be free to express yourself...</description>
	<lastBuildDate>Tue, 20 Mar 2012 18:29:19 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>That Big Data problem &#8211; Thinking the Hadoop way</title>
		<link>http://blog.moove-it.com/that-big-data-problem-thinking-the-hadoop-way/</link>
		<comments>http://blog.moove-it.com/that-big-data-problem-thinking-the-hadoop-way/#comments</comments>
		<pubDate>Tue, 20 Mar 2012 18:29:19 +0000</pubDate>
		<dc:creator>Fernando Doglio</dc:creator>
				<category><![CDATA[big data]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[moove-it]]></category>
		<category><![CDATA[analytics]]></category>

		<guid isPermaLink="false">http://blog.moove-it.com/?p=976</guid>
		<description><![CDATA[What is the “big data problem”? &#160; “On the night of July 9, 1958 an earthquake along the Fairweather Fault in the Alaska Panhandle loosened about 40 million cubic yards (30.6 million cubic meters) of rock high above the northeastern shore of Lituya Bay. This mass of rock plunged from an altitude of approximately 3000 [...]]]></description>
			<content:encoded><![CDATA[<h2 dir="ltr"><strong><strong>What is the “big data problem”?</strong></strong></h2>
<p>&nbsp;</p>
<blockquote><p>“On the night of July 9, 1958 an earthquake along the Fairweather Fault in the Alaska Panhandle loosened about 40 million cubic yards (30.6 million cubic meters) of rock high above the northeastern shore of Lituya Bay. This mass of rock plunged from an altitude of approximately 3000 feet (914 meters) down into the waters of Gilbert Inlet (see map below). The impact generated a local tsunami that crashed against the southwest shoreline of Gilbert Inlet. The wave hit with such power that it swept completely over the spur of land that separates Gilbert Inlet from the main body of Lituya Bay. The wave then continued down the entire length of Lituya Bay, over La Chaussee Spit and into the Gulf of Alaska. The force of the wave removed all trees and vegetation from elevations as high as 1720 feet (524 meters) above sea level. Millions of trees were uprooted and swept away by the wave. This is the highest wave that has ever been known.” (quoted from <a href="http://geology.com/records/biggest-tsunami.shtml">http://geology.com/records/biggest-tsunami.shtml</a>)</p></blockquote>
<p>Now let’s use our imagination a bit and pretend we’re in a digital world, where an even bigger wave can be seen on the horizon, except that this wave is made up of 1’s and 0’s. That’s the state of information on the net right now.</p>
<p>A huge wave of data is being generated every second, ranging from user-generated information such as tweets, status updates, uploaded pictures, blog posts, comments, text messages and e-mails to machine-generated data such as server access logs, error logs and transaction logs.</p>
<p>And the volume itself isn’t even the problem. The problem is that, to handle the big wave that’s coming, we need to start thinking in terms of TB or even PB of information, and of billions of rows instead of millions.</p>
<p><span id="more-976"></span></p>
<p>Normally, when you have to store information on your application you ask yourself one basic question:</p>
<p><strong>What do I need this information for?</strong></p>
<p>And from the answer you get, you plan your storage and you start saving that specific information.</p>
<p>Let’s look at an example from two different perspectives:</p>
<h4>Traditional way of thinking:</h4>
<p>Say, for example, that you’re a web development company and you’re asked to create a basic web analytics app for your company site. So you ask yourself:</p>
<p><strong>What do I need the information for?</strong></p>
<p>As an answer, you might get something like:</p>
<ul>
<li>To get number of visits to each page.</li>
<li>To get a list of referrer sites.</li>
<li>To get the number of unique visits.</li>
<li>To get a list of web browsers used on the site.</li>
</ul>
<p>It’s a short list, I know, but this is a basic example.</p>
<h5><em>Back to the problem:</em></h5>
<p>You have your answer: all that information can be fetched from the server’s access log, so you configure your log files to store it. Great, you’re done!</p>
<p>Yes, you’re done. Your system is ready and it shows the information you were asked to show. But you also closed the door on other potential analytics that could come out of the information stored in those access logs (such as the request method used, the response code given, the size of the object returned and so on) and out of other sources of information.</p>
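<p>As a rough sketch of that traditional approach, the snippet below parses a few access log lines and answers the questions on the list. The sample lines, the Apache-style "combined" log layout and the helper names (<code>parse_line</code>, <code>count_by</code>) are assumptions made for illustration, not part of any real project. Note that the method, status and size fields are already in every line; the traditional setup simply never looks at them.</p>

```ruby
# Hypothetical sample lines, assuming an Apache-style "combined" log format.
LOG_LINES = [
  '10.0.0.1 - - [20/Mar/2012:18:29:19 +0000] "GET /index.html HTTP/1.1" 200 5120 "http://example.com/" "Firefox"',
  '10.0.0.2 - - [20/Mar/2012:18:30:02 +0000] "GET /about.html HTTP/1.1" 200 2048 "-" "Chrome"',
  '10.0.0.1 - - [20/Mar/2012:18:31:45 +0000] "POST /index.html HTTP/1.1" 404 512 "-" "Firefox"'
]

# ip, two ignored fields, [time], "method path protocol", status, bytes, "referrer", "browser"
LINE_PATTERN = /\A(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\S+) "([^"]*)" "([^"]*)"\z/

# Turn one log line into a hash of fields, or nil if the line does not match.
def parse_line(line)
  m = LINE_PATTERN.match(line)
  return nil if m.nil?
  { ip: m[1], time: m[2], method: m[3], path: m[4],
    status: m[5].to_i, bytes: m[6].to_i, referrer: m[7], browser: m[8] }
end

# Count how many entries share each value of the given field.
def count_by(entries, field)
  counts = Hash.new(0)
  entries.each { |entry| counts[entry[field]] += 1 }
  counts
end

entries = LOG_LINES.map { |line| parse_line(line) }.compact

# The "traditional" questions:
puts count_by(entries, :path).inspect           # visits per page
puts entries.map { |e| e[:ip] }.uniq.size       # unique visitors
puts count_by(entries, :browser).inspect        # browsers used

# Questions nobody asked, answerable only because the data was kept:
puts count_by(entries, :status).inspect         # response codes
puts count_by(entries, :method).inspect         # request methods
```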
<h3>Thinking in “big data” terms:</h3>
<p>Thinking in “big data” terms means (at least to me) saving all the information you’re working with on your project and then finding new and exciting ways to interpret that information and get results out of it.</p>
<h5><em>Back to the problem, with the “big data” way of thinking this time:</em></h5>
<p>This time around, you think in “big data” terms, so you already have lots of data being saved for every visit, such as:</p>
<ul>
<li>Access log information.</li>
<li>Error log information.</li>
<li>User input (if there is any).</li>
<li>User behavior data (such as clicking patterns and similar).</li>
<li>and so on.</li>
</ul>
<p>That’s because when you created your website, you asked yourself a different question:</p>
<p><strong>What is all the information I can get from my website?</strong></p>
<p>And since you changed your question, you significantly changed the answer to your problem.</p>
<p>You now have a vast amount of information to analyse and get insight from.</p>
<p>This is great, but where do we store all this log information? It could potentially become too much for a single machine, and we don’t want to lose any information by rotating logs or using similar techniques.</p>
<p>So another valid question would be:</p>
<p><strong>What kind of hardware do I need to store and process all that information in a timely manner?</strong></p>
<h3>What kind of hardware do we need then?</h3>
<p>We need some kind of setup that will allow us to:</p>
<ul>
<li>Store vast amounts of data</li>
<li>Process this data in a timely manner</li>
<li>Be able to grow as much as we want (in both storage and processing power)</li>
<li>Be fault tolerant (in both storage and processing)</li>
<li>Be affordable</li>
</ul>
<p>That is a lot to ask of a single computer (especially if we consider the last point), isn’t it? So the answer will probably come in the form of a distributed system.</p>
<h2><strong>Enter Hadoop</strong></h2>
<p><img class="alignright" style="border-image: initial; border-width: 1px; border-color: black; border-style: solid;" title="Hadoop" src="http://www.sara.nl/sites/default/files/Hadoop_logo.jpg" alt="" width="216" height="162" /></p>
<p>What is Hadoop? In a nutshell, Hadoop is a solution to our problems: only one of several, mind you, but a pretty powerful one at that.</p>
<p>In more detail, <a href="http://hadoop.apache.org" target="_blank">Hadoop</a> is an open source Apache project dedicated to solving two major problems related to big data:</p>
<ol>
<li>Where to store all of the information?</li>
<li>How to process that information at an affordable cost and in a reasonable amount of time?</li>
</ol>
<p>To answer these questions, Hadoop provides the following solutions:</p>
<h3><span style="text-decoration: underline;">HDFS</span></h3>
<p>This is the Hadoop Distributed File System, which allows us to store all the information we need in a reliable way.<br />
It works by interconnecting commodity machines (affordable) and using the resulting shared storage (store vast amounts of data).<br />
HDFS takes whatever we throw at it, splits the files into fixed-size blocks of data (only the last block of a file may be smaller) and spreads them throughout the cluster. At this stage it also replicates each block on several nodes, providing data redundancy and fault tolerance.</p>
<p>Thanks to HDFS we can have as much storage capacity as we need, simply by adding new machines to the cluster (be able to grow as much as we want).<br />
We also gain a very important asset: fault tolerance. Since we’re replicating the information across several nodes of the cluster, our commodity machines are free to fail, and the only thing a failure will affect is performance (no data loss or incomplete information).</p>
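<p>To make the splitting and replication idea concrete, here is a toy model. It is only a sketch under simplifying assumptions: the function names are made up, the blocks are a few characters long, and replicas are placed round-robin, whereas a real HDFS cluster uses much larger blocks and its own placement policy.</p>

```ruby
# Toy model of block splitting and replication (NOT how HDFS is implemented).

# Split the data into blocks of block_size characters; the last one may be shorter.
def split_into_blocks(data, block_size)
  data.chars.each_slice(block_size).map { |chunk| chunk.join }
end

# For each block index, pick `replication` nodes in round-robin order.
# Assumption: replication is not larger than the number of nodes.
def place_replicas(block_count, nodes, replication)
  (0...block_count).map do |i|
    (0...replication).map { |r| nodes[(i + r) % nodes.size] }
  end
end

# The data survives a node failure if every block still has a replica elsewhere.
def readable_after_failure?(placement, failed_node)
  placement.all? { |replicas| replicas.any? { |node| node != failed_node } }
end

blocks = split_into_blocks("abcdefghij", 4)
nodes = ["node-a", "node-b", "node-c"]

replicated = place_replicas(blocks.size, nodes, 2)
unreplicated = place_replicas(blocks.size, nodes, 1)

puts blocks.inspect
puts replicated.inspect
puts readable_after_failure?(replicated, "node-b")    # every block has another copy
puts readable_after_failure?(unreplicated, "node-b")  # one block lived only on node-b
```

<p>With two replicas per block, losing any single node leaves every block readable; with a single copy, the same failure loses data, which is exactly what replication is there to prevent.</p>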
<h3><span style="text-decoration: underline;">MapReduce</span></h3>
<p>This is the other “leg” of Hadoop, an implementation of the MapReduce algorithm proposed by Google in 2004.</p>
<p>The MapReduce algorithm allows us to process large amounts of information (terabytes of it) in a distributed (thousands of nodes) and fault-tolerant manner (process this data in a timely manner). And since we already have a cluster of computers working for us with HDFS, MapReduce is the perfect match for taking advantage of the computational power sitting on every node of the cluster.</p>
<p>This algorithm has two basic steps:</p>
<ol>
<li><strong>Map step</strong>: In this stage, the input data will be split into smaller chunks to be analyzed and transformed by processes called “mappers”. Thanks to the integration with the HDFS, the main node will effectively schedule map jobs to use the data that’s already on the nodes they’re running on, allowing the system to utilize very little bandwidth. The output of these mapper jobs will be a set of (key, value) tuples.</li>
<li><strong>Reduce step</strong>: The output of the mappers will be sent to the reduce jobs. These jobs will process the information with the added benefit of knowing that their input will be delivered by the system in sorted order. Their main purpose is to aggregate the information given by the mappers and output only what is needed.</li>
</ol>
<p>There is an implicit step between 1 and 2, the <strong>shuffle &amp; sort step</strong>, which the system performs automatically. In this step, the system sorts the output of all mapper nodes by the key of each tuple and sends the sorted results to the reduce nodes, ensuring that all tuples with the same key go to the same reducer.</p>
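<p>The three steps can be sketched in a few lines of plain Ruby. This is a single-process illustration of the idea, not Hadoop’s API: the function names and the sample records are assumptions, and in a real cluster each chunk would be mapped on a different node and each key group reduced by a separate reducer.</p>

```ruby
# Single-process sketch of map, shuffle and sort, and reduce (not Hadoop's API).

# Map step: turn one input record ("METHOD PATH") into (key, value) tuples.
def map_record(record)
  path = record.split(" ")[1]
  [[path, 1]]
end

# Shuffle and sort step: group the tuples by key and order the groups by key,
# so that every value for a given key ends up in the same group.
def shuffle_and_sort(tuples)
  tuples.group_by { |key, _value| key }
        .sort
        .map { |key, pairs| [key, pairs.map { |_key, value| value }] }
end

# Reduce step: aggregate all the values that share a key.
def reduce_group(key, values)
  [key, values.inject(0) { |sum, value| sum + value }]
end

# Run the whole job over input that is already split into chunks.
def run_job(chunks)
  mapped = chunks.flat_map { |chunk| chunk.flat_map { |record| map_record(record) } }
  shuffle_and_sort(mapped).map { |key, values| reduce_group(key, values) }
end

# Hypothetical input: two chunks, as if stored on two different nodes.
chunks = [
  ["GET /index.html", "GET /about.html"],
  ["GET /index.html", "POST /contact.html"]
]

puts run_job(chunks).inspect
```

<p>Here the job counts visits per page, but swapping <code>map_record</code> and <code>reduce_group</code> for other functions is all it takes to ask a different question of the same data.</p>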
<div class="mceTemp mceIEcenter" style="text-align: center;">
<dl id="" class="wp-caption aligncenter" style="width: 600px;">
<dt class="wp-caption-dt"><img style="border-image: initial; border-width: 1px; border-color: black; border-style: solid;" src="https://lh5.googleusercontent.com/SdWtYHwQNXTHr58KwyDMeSCbWjn3q2yVh3xttmwyfGz-W3kQYVEuPEfnuwL6uqgwLDDUNHuSTZobVjr5jt0rHRDLZ7fkzlBJNwSsxI6Zz4UBzZ7Njvc" alt="" width="590px;" height="326px;" /></dt>
<dd class="wp-caption-dd">Graphical representation of the MapReduce steps</dd>
</dl>
</div>
<h2>Thinking the Hadoop way</h2>
<p>So, we have our data, we have our questions to ask of that data, we have our needs and we have our solution. What now?</p>
<p>Your next steps could include:</p>
<ol>
<li><strong>Install and configure your Hadoop cluster</strong>: For this step, the company <a href="http://www.cloudera.com/" target="_blank">Cloudera</a> has a standardized distribution of Hadoop, which they call Cloudera&#8217;s Distribution Including Apache Hadoop (CDH). You can download it for free and it comes with several other projects from the Hadoop ecosystem (such as Pig, Hive, HBase, and so on). And for managing your cluster, you could use their Cloudera Manager, which allows you to manage up to 50 nodes for free.</li>
<li><strong>Upload the information to your HDFS</strong>.</li>
<li><strong>Transform your information using a MapReduce job</strong>: I consider this step to be optional. I would use a “hand-written” MapReduce job if I had to transform my data set in a specific way in order to query it later on.</li>
<li><strong>Query your data set</strong>: There are several ways to do this. Tools like <a href="http://pig.apache.org/" target="_blank">Pig</a> or <a href="http://hive.apache.org/" target="_blank">Hive</a> allow you to write MapReduce jobs (for data transformation) in a higher-level language (Pig Latin or a SQL-like language). Others, like <a href="http://hbase.apache.org/" target="_blank">HBase</a> (which stores its data on HDFS) and <a href="http://cassandra.apache.org/">Cassandra</a> (which uses its own storage layer), work better for quick queries: they serve reads without going through the MapReduce framework, but you’re a bit more limited in what you can do with the information.</li>
</ol>
<p>&nbsp;</p>
<h3>And finally, a pretty common question:</h3>
<p><em><strong>Is Hadoop the best solution for big data analysis out there?</strong></em></p>
<p>Probably not, since “the best” is always relative to your needs, but it’s a pretty darn good one, so give it a try.</p>
<p>Besides, all the cool kids are doing it:</p>
<ul>
<li>Facebook &#8211; 15 PB of information the last time they revealed the number.</li>
<li>eBay &#8211; 5.3 PB of information on their clusters.</li>
<li>LinkedIn</li>
<li>Twitter</li>
</ul>
<p><strong>And many others; check out the complete list <a href="http://wiki.apache.org/hadoop/PoweredBy" target="_blank">here</a>.</strong></p>
<p>&nbsp;</p>
<!-- PHP 5.x -->]]></content:encoded>
			<wfw:commentRss>http://blog.moove-it.com/that-big-data-problem-thinking-the-hadoop-way/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Interesting Ruby&#8217;s websites for beginners &#8211; cool initiatives</title>
		<link>http://blog.moove-it.com/ruby-websites-for-beginners-cool-initiatives/</link>
		<comments>http://blog.moove-it.com/ruby-websites-for-beginners-cool-initiatives/#comments</comments>
		<pubDate>Thu, 26 Jan 2012 19:29:47 +0000</pubDate>
		<dc:creator>Mariana De Carli</dc:creator>
				<category><![CDATA[learn]]></category>
		<category><![CDATA[news]]></category>
		<category><![CDATA[ruby]]></category>

		<guid isPermaLink="false">http://blog.moove-it.com/?p=967</guid>
		<description><![CDATA[The coolest thing about Ruby is that even though it’s a dynamic and reflective programming language it’s very easy to learn. More and more programmers around the world are interested in learning this new tool for making cool things. We’ve selected 3 interesting websites to help people (kids or adults) to know a little bit more [...]]]></description>
			<content:encoded><![CDATA[<p>The coolest thing about Ruby is that even though it’s a dynamic and reflective programming language, it’s very easy to learn. More and more programmers around the world are interested in learning this tool for making cool things. We’ve selected 3 interesting websites to help people (kids or adults) get to know Ruby’s world a little better.</p>
<h1>Kids Ruby (for kids)</h1>
<p><a title="kids ruby" href="http://kidsruby.com/" target="_blank"><img class="alignleft size-large wp-image-968" title="captura_kidsruby" src="http://blog.moove-it.com/wp-content/uploads/2012/01/captura_kidsruby-1024x277.jpg" alt="" width="717" height="194" /></a></p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p><strong><a href="http://kidsruby.com/">http://kidsruby.com</a></strong>: Kids Ruby is especially focused on kids, with a very easy interface that lets you see the code, run it, and see what it outputs at the same time. Kids Ruby is also attractive for its Turtle graphics, which let you draw pictures and have fun mixing and trying colors. Kids Ruby includes a lot of useful resources, and you don’t even need an internet connection to use it. The developers also created a complete KidsRuby operating system based on Ubuntu Linux that makes programming in Ruby a lot easier for kids.</p>
<h1>Rails for Zombies</h1>
<p><a title="rails for zombies" href="http://railsforzombies.org/" target="_blank"><img class="alignleft size-full wp-image-969" title="railsforzombies" src="http://blog.moove-it.com/wp-content/uploads/2012/01/railsforzombies.jpeg" alt="" width="803" height="420" /></a></p>
<p>&nbsp;</p>
<p><strong><a href="http://railsforzombies.org/">http://railsforzombies.org/</a>: </strong>Rails for Zombies teaches Ruby on Rails, the open-source web framework built on the power of the Ruby language, with no additional configuration needed. On this site you can watch tutorial videos that teach you more about Ruby on Rails in just five levels. After watching each video you’ll be challenged with cool exercises to practice your new skills. So if you’re a zombie and you’re hungry for Ruby knowledge, this is the perfect site for you.</p>
<h1>Try Ruby</h1>
<p><a href="http://tryruby.org/" target="_blank"><img class="alignleft size-full wp-image-970" title="tryruby" src="http://blog.moove-it.com/wp-content/uploads/2012/01/tryruby.jpeg" alt="" width="922" height="474" /></a></p>
<p><strong><a href="http://tryruby.org/levels/1/challenges/0">http://tryruby.org/</a>: </strong>This website offers a very interactive Ruby tutorial; you can test new functions step by step and understand a little bit more about the language. In just 15 minutes you can understand what Ruby is about. The site also allows you to save your progress by signing up for free at <strong>Code School</strong>.</p>
<p>Now that you have these very easy options for learning Ruby, why don’t you try it out? Maybe we’ll see you soon as a new member of Moove-IT’s team <img src='http://blog.moove-it.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
<!-- PHP 5.x -->]]></content:encoded>
			<wfw:commentRss>http://blog.moove-it.com/ruby-websites-for-beginners-cool-initiatives/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Digital Blackout against SOPA &#8211; PIPA</title>
		<link>http://blog.moove-it.com/digital-blackout-against-sopa-pipa/</link>
		<comments>http://blog.moove-it.com/digital-blackout-against-sopa-pipa/#comments</comments>
		<pubDate>Wed, 18 Jan 2012 19:45:56 +0000</pubDate>
		<dc:creator>Mariana De Carli</dc:creator>
				<category><![CDATA[news]]></category>
		<category><![CDATA[18 of january]]></category>
		<category><![CDATA[blackout]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[Jan 18]]></category>
		<category><![CDATA[PIPA]]></category>
		<category><![CDATA[SOPA]]></category>
		<category><![CDATA[wikipedia]]></category>
		<category><![CDATA[wordpress]]></category>

		<guid isPermaLink="false">http://blog.moove-it.com/?p=957</guid>
		<description><![CDATA[The largest online protest in the history of Internet is taking place today, more than one hundred sites, including the popular Wikipedia, Google and WordPress confirmed their participation in this digital blackout against the new anti-piracy laws of the USA. Stop Online Piracy Act (SOPA) and Protect IP Act (PIPA) will be voted on Jan [...]]]></description>
			<content:encoded><![CDATA[<p>The largest online protest in the history of the Internet is taking place today. More than one hundred sites, including the popular Wikipedia, Google and WordPress, have confirmed their participation in this digital blackout against the new anti-piracy laws of the USA.</p>
<p><strong>Stop Online Piracy Act (SOPA) and Protect IP Act (PIPA) </strong>will be voted on by Congress on Jan 24<sup>th</sup>, in what opponents see as an attempt to pass internet censorship in the Senate. These two laws are probably the most widely rejected by American citizens, because some of them consider that the laws threaten the most appreciated thing on the internet: freedom.</p>
<p>The <strong>SOPA</strong> law attempts to shut down any foreign site that sells or shares pirated content from the USA, including music, films, books and any other product not authorized for free distribution on the internet. The <strong>PIPA </strong>law, meanwhile, focuses directly on protecting intellectual property, targeting economic threats to creativity and the theft of creative work.</p>
<p>These laws nonetheless have strong support from big industry players such as the National Cable &amp; Telecommunications Association, the National Association of Theatre Owners, Viacom, Copyright Alliance and NBC Universal, which argue that their businesses are dramatically affected by online piracy.</p>
<p>We will have to wait until January 24 to see whether public opinion will have a direct influence on the fate of these laws.</p>
<p style="text-align: left;"><strong>Wikipedia.org – Home page Jan 18</strong><sup><strong>th</strong></sup></p>
<p style="text-align: left;"><a href="http://blog.moove-it.com/wp-content/uploads/2012/01/wikipedia_black.jpeg" rel="lightbox[957]" title="wikipedia_blackout"><img class="alignnone size-medium wp-image-960" title="wikipedia_blackout" src="http://blog.moove-it.com/wp-content/uploads/2012/01/wikipedia_black-300x172.jpg" alt="" width="300" height="172" /></a></p>
<p style="text-align: left;"><strong>Google.com – Home page Jan 18</strong><sup><strong>th</strong></sup></p>
<p style="text-align: left;"><a href="http://blog.moove-it.com/wp-content/uploads/2012/01/google_black.jpeg" rel="lightbox[957]" title="google_black"><img class="alignnone size-medium wp-image-959" title="google_black" src="http://blog.moove-it.com/wp-content/uploads/2012/01/google_black-300x188.jpg" alt="" width="300" height="188" /></a></p>
<p style="text-align: left;"><strong>WordPress.com – Home page Jan 18</strong><sup><strong>th</strong></sup></p>
<p style="text-align: left;"><a href="http://blog.moove-it.com/wp-content/uploads/2012/01/wordpress_black.jpeg" rel="lightbox[957]" title="wordpress_black"><img class="size-medium wp-image-958 alignleft" title="wordpress_black" src="http://blog.moove-it.com/wp-content/uploads/2012/01/wordpress_black-300x171.jpg" alt="" width="300" height="171" /></a></p>
<!-- PHP 5.x -->]]></content:encoded>
			<wfw:commentRss>http://blog.moove-it.com/digital-blackout-against-sopa-pipa/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Meet Moove-iT&#8217;s UX Group</title>
		<link>http://blog.moove-it.com/meet-moove-it-ux-group/</link>
		<comments>http://blog.moove-it.com/meet-moove-it-ux-group/#comments</comments>
		<pubDate>Thu, 24 Nov 2011 16:49:33 +0000</pubDate>
		<dc:creator>sebastian.suttner</dc:creator>
				<category><![CDATA[best practices]]></category>
		<category><![CDATA[html]]></category>
		<category><![CDATA[moove-it]]></category>
		<category><![CDATA[talks]]></category>
		<category><![CDATA[usablility]]></category>
		<category><![CDATA[ux]]></category>
		<category><![CDATA[workshops]]></category>

		<guid isPermaLink="false">http://blog.moove-it.com/?p=950</guid>
		<description><![CDATA[What do users really need? That&#8217;s the question any software developer should ask themselves. Here at moove-it, we always put ourselves in our clients shoes to understand their needs and give them exactly what they are looking for. In order to do so we&#8217;ve created the UX Department. From the moment we started gathering up [...]]]></description>
			<content:encoded><![CDATA[<p>What do users really need? That&#8217;s the question any software developer should ask themselves.</p>
<p>Here at moove-it, we always put ourselves in our clients&#8217; shoes to understand their needs and give them exactly what they are looking for. In order to do so we&#8217;ve created the UX Department.</p>
<p>From the moment we started getting together to discuss the latest design patterns and the top UX trends, we knew something great would come out of it, and so it did. We managed to share what we&#8217;ve learned with the whole team, improve existing products and enhance new projects&#8217; designs from scratch.<br />
We&#8217;ll keep working as hard as possible on UX, not only because of how thrilled we are with the results, but also because the way users experience the product is what matters most.</p>
<p>Here we share a presentation (in Spanish) with the partial conclusions reached by the UX group.</p>
<div id="__ss_10310513" style="width: 425px;"><strong style="display: block; margin: 12px 0 4px;"></strong><object id="__sse10310513" width="425" height="355" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="wmode" value="transparent" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=uxenproyectosdemooveit-111124103713-phpapp02&amp;stripped_title=ux-en-proyectos-de-mooveit&amp;userName=martincabrera" /><param name="allowscriptaccess" value="always" /><param name="allowfullscreen" value="true" /><embed id="__sse10310513" width="425" height="355" type="application/x-shockwave-flash" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=uxenproyectosdemooveit-111124103713-phpapp02&amp;stripped_title=ux-en-proyectos-de-mooveit&amp;userName=martincabrera" allowfullscreen="true" allowscriptaccess="always" wmode="transparent" /></object></p>
</div>
<!-- PHP 5.x -->]]></content:encoded>
			<wfw:commentRss>http://blog.moove-it.com/meet-moove-it-ux-group/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>It&#8217;s about timing baby!</title>
		<link>http://blog.moove-it.com/its-about-timing-baby/</link>
		<comments>http://blog.moove-it.com/its-about-timing-baby/#comments</comments>
		<pubDate>Tue, 15 Nov 2011 13:44:45 +0000</pubDate>
		<dc:creator>Andreas Fast</dc:creator>
				<category><![CDATA[ruby]]></category>
		<category><![CDATA[ruby on rails]]></category>

		<guid isPermaLink="false">http://blog.moove-it.com/?p=943</guid>
		<description><![CDATA[Yeah, it&#8217;s about timing. There was a problem in one of our projects at moove-it related to slow processing. There is a daemon spawning new threads to process certain new entries to the database. The entries come from a different system, that&#8217;s the reason for this program which processes each new entry. Sometimes at certain [...]]]></description>
			<content:encoded><![CDATA[<p>Yeah, it&#8217;s about timing.</p>
<p><a href="http://blog.moove-it.com/wp-content/uploads/2011/11/27169w0tgpcaxy.jpg" rel="lightbox[943]" title="Timing"><img class="size-medium wp-image-944 alignright" title="Timing" src="http://blog.moove-it.com/wp-content/uploads/2011/11/27169w0tgpcaxy-300x300.jpg" alt="" width="300" height="300" /></a></p>
<p>There was a problem in one of our projects at <a title="moove-it" href="http://moove-it.com" target="_blank">moove-it</a> related to slow processing. There is a daemon that spawns new threads to process certain new entries in the database. The entries come from a different system, which is why this program exists to process each new entry. At certain hours of the day there are peaks in the entries to the database, and the process falls behind by about 20,000 entries or more. So we started analyzing the code to understand what was happening and why it took so long. We noted that each new thread the daemon spawned took about 5 seconds to complete its task. As we narrowed the measurement, we ended up with some code that took 5 seconds to execute but only involved access to the database. Thanks to Aaron Patterson&#8217;s (<a title="@tenderlove" href="http://twitter.com/tenderlove" target="_blank">@tenderlove</a>) talk at RubyConf Uruguay, &#8220;Who makes the best asado&#8221;, where he talked about Rails and how it manages threads and database connections, we knew where to look.</p>
<p>What he explained is that each new thread requests its own database connection from the connection pool. If there isn&#8217;t a free connection, Rails waits for about 5 seconds, and if there is still no free connection after that, it iterates over all the threads to take back the connections of dead threads (<a title="more info" href="http://tenderlovemaking.com/2011/10/20/connection-management-in-activerecord/" target="_blank">more info</a>). See the correlation with the 5 seconds I talked about in the previous paragraph? We immediately suspected that this was the problem. So we started searching the Rails API for a way to release the connection at the end of each thread&#8217;s execution. Surprisingly, our first round of googling didn&#8217;t turn up an easy and understandable explanation anywhere <img src='http://blog.moove-it.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> , so we dug deeper and came up with the following line:</p>
<blockquote><p>ActiveRecord::Base.connection_handler.clear_active_connections!</p></blockquote>
<p>The ActiveRecord::Base.connection_handler method returns the connection handler, which manages the connection pools, and the clear_active_connections! method does what it looks like, or, quoting the Rails doc: &#8220;Returns any connections in use by the current thread back to the pool, and also returns connections to the pool cached by threads that are no longer alive.&#8221;</p>
<p>So this line returns the connections in use by a thread to the pool and enables the new threads spawned by the daemon to use the freed connections. This way we avoid the 5-second wait for Rails to free the connections for us.</p>
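<p>The effect can be illustrated without Rails. The sketch below uses a made-up <code>ToyPool</code> class (an assumption for illustration only, not ActiveRecord&#8217;s implementation) and counts how many workers find the pool empty, which is the situation where a Rails 3.0 thread would sit through the 5-second wait. In the real daemon, the equivalent of <code>checkin</code> would be calling the line above in an <code>ensure</code> block at the end of each thread.</p>

```ruby
require "thread"

# Hypothetical stand-in for a connection pool; it never blocks, it just
# reports whether a connection was free at checkout time.
class ToyPool
  def initialize(size)
    @free = Queue.new
    size.times { |i| @free.push("conn-#{i}") }
  end

  # Return a free connection, or nil if the pool is exhausted.
  def checkout
    @free.pop(true)
  rescue ThreadError
    nil
  end

  def checkin(conn)
    @free.push(conn)
  end
end

# Run `workers` short-lived threads one after another and count how many
# found no free connection (the case where the real pool would make them wait).
def count_stalled_workers(pool_size, workers, release)
  pool = ToyPool.new(pool_size)
  stalled = 0
  workers.times do
    Thread.new do
      conn = pool.checkout
      if conn.nil?
        stalled += 1
      else
        begin
          # ... process one database entry here ...
        ensure
          # Hand the connection back before the thread dies.
          pool.checkin(conn) if release
        end
      end
    end.join
  end
  stalled
end

puts count_stalled_workers(2, 5, false)  # connections leak with each dead thread
puts count_stalled_workers(2, 5, true)   # every worker finds a free connection
```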
<p>This one line took our performance from processing 1,000 entries in almost 2 hours to processing 10,000 in 5 minutes. Nice, huh?!</p>
<p>That&#8217;s it. I&#8217;m not sure if this is the best way of doing it, since this method also &#8220;&#8230; returns connections to the pool cached by threads that are no longer alive.&#8221; I guess this means it does the iteration over all the threads that Aaron mentioned, but as you can see I&#8217;m happy with the performance improvement. We are using Rails 3.0.5; Aaron said that he will change this behavior, and you can read more about it <a title="here" href="http://tenderlovemaking.com/2011/10/20/connection-management-in-activerecord/" target="_blank">here</a>.</p>
<p>Special thanks to <a title="@cheloeloelo" href="http://twitter.com/cheloeloelo" target="_blank">@cheloeloelo</a>, who helped detect the problem and dig through the Rails API to find the proper method to free the connections.</p>
<p><a href="http://www.freedigitalphotos.net/images/view_photog.php?photogid=151">Image: Suat Eman / FreeDigitalPhotos.net</a></p>
<!-- PHP 5.x -->]]></content:encoded>
			<wfw:commentRss>http://blog.moove-it.com/its-about-timing-baby/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>We run Montevideo 2011 &#8211; 10K Nike competition</title>
		<link>http://blog.moove-it.com/we-run-montevideo-2011-10k-nike-competition/</link>
		<comments>http://blog.moove-it.com/we-run-montevideo-2011-10k-nike-competition/#comments</comments>
		<pubDate>Tue, 08 Nov 2011 19:05:19 +0000</pubDate>
		<dc:creator>Ariel Ludueña</dc:creator>
				<category><![CDATA[competitions]]></category>
		<category><![CDATA[moove-it]]></category>
		<category><![CDATA[competition]]></category>
		<category><![CDATA[marathon]]></category>
		<category><![CDATA[team]]></category>

		<guid isPermaLink="false">http://blog.moove-it.com/?p=922</guid>
		<description><![CDATA[Last Saturday a group of brave Moovetians decided to accept the challenge and run the Nike 10K competition. The Nike 10K consists of running 10 km along the coastline through some of Montevideo’s neighborhoods, enjoying the beautiful landscape. Take a look at the pictures. The whole team  Silvana, Martin and Nicolas Ariel before breaking the finish ribbon [...]]]></description>
			<content:encoded><![CDATA[<p>Last Saturday a group of brave Moovetians decided to accept the challenge and run the Nike 10K competition.</p>
<p>The Nike 10K consists of running 10 km along the coastline through some of Montevideo’s neighborhoods, enjoying the beautiful landscape.</p>
<p>Take a look at the pictures.</p>
<h3 style="text-align: left;">The whole team</h3>
<p style="text-align: center;"><a href="http://blog.moove-it.com/wp-content/uploads/2011/11/Nike-We-run-Montevideo-2011-358.jpg" rel="lightbox[922]" title="The whole team"><img class="aligncenter size-full wp-image-923" title="The whole team" src="http://blog.moove-it.com/wp-content/uploads/2011/11/Nike-We-run-Montevideo-2011-358.jpg" alt="" width="675" height="448" /></a></p>
<h3>Silvana, Martin and Nicolas</h3>
<p style="text-align: center;"><a href="http://blog.moove-it.com/wp-content/uploads/2011/11/Nike-We-run-Montevideo-2011-344.jpg" rel="lightbox[922]" title="Nike-We-run-Montevideo-2011-344"><img class="aligncenter size-full wp-image-928" title="Nike-We-run-Montevideo-2011-344" src="http://blog.moove-it.com/wp-content/uploads/2011/11/Nike-We-run-Montevideo-2011-344.jpg" alt="" width="675" height="448" /></a></p>
<h3>Ariel before breaking the finish ribbon  :-P</h3>
<p style="text-align: center;"><a href="http://blog.moove-it.com/wp-content/uploads/2011/11/Nike-We-run-Montevideo-2011-071.jpg" rel="lightbox[922]" title="Nike-We-run-Montevideo-2011-071"><img class="aligncenter size-full wp-image-932" title="Nike-We-run-Montevideo-2011-071" src="http://blog.moove-it.com/wp-content/uploads/2011/11/Nike-We-run-Montevideo-2011-071.jpg" alt="" width="675" height="448" /></a></p>
<h3>Bird’s eye view</h3>
<p><a href="http://blog.moove-it.com/wp-content/uploads/2011/11/389452_10150450348467704_37391212703_10478562_1447301236_n.jpg" rel="lightbox[922]" title="Bird's eye view"><img class="aligncenter size-full wp-image-938" title="Bird's eye view" src="http://blog.moove-it.com/wp-content/uploads/2011/11/389452_10150450348467704_37391212703_10478562_1447301236_n.jpg" alt="" width="800" height="432" /></a></p>
<!-- PHP 5.x -->]]></content:encoded>
			<wfw:commentRss>http://blog.moove-it.com/we-run-montevideo-2011-10k-nike-competition/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Dart &#8211; A new language for structured web programming</title>
		<link>http://blog.moove-it.com/dart-a-new-language-for-structured-web-programming/</link>
		<comments>http://blog.moove-it.com/dart-a-new-language-for-structured-web-programming/#comments</comments>
		<pubDate>Wed, 02 Nov 2011 13:40:47 +0000</pubDate>
		<dc:creator>Andreas Fast</dc:creator>
				<category><![CDATA[moove-it]]></category>

		<guid isPermaLink="false">http://blog.moove-it.com/?p=905</guid>
		<description><![CDATA[On October 10th Lars Bak &#38; Gilad Bracha presented a technology preview of Dart. Lars Bak &#38; Gilad Bracha are Google employees leading the development of Dart. Dart is open source, so anyone can use and change it. It’s still in the early stages but the design goals are very clear. It aims to be [...]]]></description>
			<content:encoded><![CDATA[<div>On October 10th Lars Bak &amp; Gilad Bracha presented a technology preview of Dart. Lars Bak &amp; Gilad Bracha are Google employees leading the development of Dart. Dart is open source, so anyone can use and change it. It’s still in the early stages but the design goals are very clear. It aims to be a structured yet flexible programming language for the web; to feel familiar and be easy to learn; to deliver high performance and fast startup; and to be appropriate for all devices, from phones and tablets to notebooks and servers. A lot of work is also being done on tools so that Dart runs fast on all major modern browsers. It runs on a virtual machine on the server, and there is a tool that compiles the code to JavaScript so it can run in a browser. It also provides a DOM API.<br />
The following presentation shows the basics of the language including some examples. In addition, here are some photos of the presentation at moove-iT!</div>
<div>

<a href='http://blog.moove-it.com/dart-a-new-language-for-structured-web-programming/dart1/' title='Dart1'><img width="150" height="150" src="http://blog.moove-it.com/wp-content/uploads/2011/11/Dart1-150x150.jpg" class="attachment-thumbnail" alt="Dart1" title="Dart1" /></a>
<a href='http://blog.moove-it.com/dart-a-new-language-for-structured-web-programming/dart2/' title='Dart2'><img width="150" height="150" src="http://blog.moove-it.com/wp-content/uploads/2011/11/Dart2-150x150.jpg" class="attachment-thumbnail" alt="Dart2" title="Dart2" /></a>
<a href='http://blog.moove-it.com/dart-a-new-language-for-structured-web-programming/dart3/' title='Dart3'><img width="150" height="150" src="http://blog.moove-it.com/wp-content/uploads/2011/11/Dart3-150x150.jpg" class="attachment-thumbnail" alt="Dart3" title="Dart3" /></a>

</div>
<p>&nbsp;</p>
<div style="width:595px" id="__ss_9976959"> <strong style="display:block;margin:12px 0 4px"><a href="http://www.slideshare.net/ahfast/structured-web-programming-9976959" title="Structured web programming" target="_blank">Structured web programming</a></strong> <object id="__sse9976959" width="595" height="497"><param name="movie" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=structuredwebprogramming1-111101090531-phpapp01&#038;rel=0&#038;stripped_title=structured-web-programming-9976959&#038;userName=ahfast" /><param name="allowFullScreen" value="true"/><param name="allowScriptAccess" value="always"/><embed name="__sse9976959" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=structuredwebprogramming1-111101090531-phpapp01&#038;rel=0&#038;stripped_title=structured-web-programming-9976959&#038;userName=ahfast" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="595" height="497"></embed></object> </div>
<!-- PHP 5.x -->]]></content:encoded>
			<wfw:commentRss>http://blog.moove-it.com/dart-a-new-language-for-structured-web-programming/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Second RubyConf in Uruguay &#8211; 11th and 12th November</title>
		<link>http://blog.moove-it.com/second-rubyconf-in-uruguay-11th-12th-november/</link>
		<comments>http://blog.moove-it.com/second-rubyconf-in-uruguay-11th-12th-november/#comments</comments>
		<pubDate>Wed, 02 Nov 2011 01:19:45 +0000</pubDate>
		<dc:creator>Gabriela Isnardi</dc:creator>
				<category><![CDATA[moove-it]]></category>
		<category><![CDATA[news]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[ruby on rails]]></category>
		<category><![CDATA[talks]]></category>
		<category><![CDATA[workshops]]></category>
		<category><![CDATA[rails]]></category>
		<category><![CDATA[rubyconf]]></category>
		<category><![CDATA[uruguay]]></category>

		<guid isPermaLink="false">http://blog.moove-it.com/?p=900</guid>
		<description><![CDATA[We are sponsoring one of the greatest technology events here in Uruguay. The second RubyConf Uruguay takes place in less than two weeks, on the 11th and 12th of November 2011, when many IT experts from all over the world will get together to immerse themselves in this dynamic world and get up to date with [...]]]></description>
			<content:encoded><![CDATA[<p>We are sponsoring one of the greatest technology events here in Uruguay. The second RubyConf Uruguay takes place in less than two weeks, on the 11th and 12th of November 2011, when many IT experts from all over the world will get together to immerse themselves in this dynamic world and get up to date with the latest trends in Ruby and Agile methodologies.</p>
<p><a title="RubyConf Uruguay" href="http://rubyconfuruguay.org/" target="_blank">RubyConf Uruguay 2011</a></p>
<p>We are hungry for knowledge and refreshment, and we all want to be on the same train.</p>
<p>Please welcome all the new members to this awesome community, and help spread the news. Even more important, do not miss the opportunity to meet the experts, discuss the future of RoR, and be Rail!</p>
<p><a href="http://blog.moove-it.com/wp-content/uploads/2011/11/newspaper-advertising-final-01-1024x710.jpg" rel="lightbox[900]" title="newspaper-advertising-final-01"><img class="alignleft size-large wp-image-901" title="newspaper-advertising-final-01" src="http://blog.moove-it.com/wp-content/uploads/2011/11/newspaper-advertising-final-01-1024x710.jpg" alt="" width="717" height="497" /></a></p>
<p>&nbsp;</p>
<!-- PHP 5.x -->]]></content:encoded>
			<wfw:commentRss>http://blog.moove-it.com/second-rubyconf-in-uruguay-11th-12th-november/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>You can’t help but Mooving…  (percussion workshop)</title>
		<link>http://blog.moove-it.com/you-can%e2%80%99t-help-but-mooving-percussion-workshop/</link>
		<comments>http://blog.moove-it.com/you-can%e2%80%99t-help-but-mooving-percussion-workshop/#comments</comments>
		<pubDate>Mon, 17 Oct 2011 18:59:24 +0000</pubDate>
		<dc:creator>Gabriela Isnardi</dc:creator>
				<category><![CDATA[moove-it]]></category>
		<category><![CDATA[news]]></category>
		<category><![CDATA[workshops]]></category>

		<guid isPermaLink="false">http://blog.moove-it.com/?p=875</guid>
		<description><![CDATA[Our main tradition, heritage and passion: The beat of the “Uruguayan” drums. It doesn’t matter if you listen to the drums once a month, every day, or once a year.  When it comes to local music, there is nothing more Uruguayan than the sound produced by rhythmically striking a drum, and especially when playing Candombe, a unique [...]]]></description>
			<content:encoded><![CDATA[<p>Our main tradition, heritage and passion: The beat of the “Uruguayan” drums.</p>
<p>It doesn’t matter if you listen to the drums once a month, every day, or once a year.  When it comes to local music, there is nothing more Uruguayan than the sound produced by rhythmically striking a drum, and especially when playing Candombe, a unique style of percussion that will make your hair stand on end.</p>
<p>Team building activities can range from treasure hunts to safari trips. This time, though, we decided on one that people could easily identify with and that requires no sophisticated skills, only the desire to unwind, switch off and connect. Drumming workshops come first; the bonding is just a consequence.</p>
<p>Last week we had our first percussion workshop. Pablo Leites, an excellent musician and percussionist also known as “Gancho”, was our instructor.  He has also been Martin Cabrera’s (Moove-IT cofounder) best friend for a long time.</p>
<p>Please have a look at the following pictures…</p>
<table>
<tbody>
<tr>
<td><a href="http://blog.moove-it.com/wp-content/uploads/2011/10/chica_DSC_0017.jpg" rel="lightbox[875]" title="chica_DSC_0017"><img class="alignleft size-medium wp-image-885" title="chica_DSC_0017" src="http://blog.moove-it.com/wp-content/uploads/2011/10/chica_DSC_0017-300x200.jpg" alt="" width="300" height="200" /></a></td>
<td><a href="http://blog.moove-it.com/wp-content/uploads/2011/10/chica_DSC_00060.jpg" rel="lightbox[875]" title="chica_DSC_00060"><img src="http://blog.moove-it.com/wp-content/uploads/2011/10/chica_DSC_00060-200x300.jpg" alt="" title="chica_DSC_00060" width="200" height="300" class="alignleft size-medium wp-image-895" /></a></td>
</tr>
<tr>
<td><a href="http://blog.moove-it.com/wp-content/uploads/2011/10/chica_DSC_0060.jpg" rel="lightbox[875]" title="chica_DSC_0060"><img class="alignleft size-medium wp-image-882" title="chica_DSC_0060" src="http://blog.moove-it.com/wp-content/uploads/2011/10/chica_DSC_0060-200x300.jpg" alt="" width="200" height="300" /></a></td>
<td><a href="http://blog.moove-it.com/wp-content/uploads/2011/10/chica_DSC_0036.jpg" rel="lightbox[875]" title="chica_DSC_0036"><img class="alignleft size-medium wp-image-885" title="chica_DSC_0036" src="http://blog.moove-it.com/wp-content/uploads/2011/10/chica_DSC_0036-300x200.jpg" alt="" width="300" height="200" /></a></td>
</tr>
<tr>
<td><a href="http://blog.moove-it.com/wp-content/uploads/2011/10/chica_DSC_0054.jpg" rel="lightbox[875]" title="chica_DSC_0054"><img class="alignleft size-medium wp-image-885" title="chica_DSC_0054" src="http://blog.moove-it.com/wp-content/uploads/2011/10/chica_DSC_0054-300x200.jpg" alt="" width="300" height="200" /></a></td>
<td><a href="http://blog.moove-it.com/wp-content/uploads/2011/10/chica_DSC_0056.jpg" rel="lightbox[875]" title="chica_DSC_0056"><img class="alignleft size-medium wp-image-885" title="chica_DSC_0056" src="http://blog.moove-it.com/wp-content/uploads/2011/10/chica_DSC_0056-300x200.jpg" alt="" width="300" height="200" /></a></td>
</tr>
<tr>
<td><a href="http://blog.moove-it.com/wp-content/uploads/2011/10/chica_DSC_0082.jpg" rel="lightbox[875]" title="chica_DSC_0082"><img class="alignleft size-medium wp-image-885" title="chica_DSC_0082" src="http://blog.moove-it.com/wp-content/uploads/2011/10/chica_DSC_0082-300x200.jpg" alt="" width="300" height="200" /></a></td>
<td><a href="http://blog.moove-it.com/wp-content/uploads/2011/10/chica_DSC_0088.jpg" rel="lightbox[875]" title="chica_DSC_0088"><img class="alignleft size-medium wp-image-885" title="chica_DSC_0088" src="http://blog.moove-it.com/wp-content/uploads/2011/10/chica_DSC_0088-300x200.jpg" alt="" width="300" height="200" /></a></td>
</tr>
<tr>
<td><a href="http://blog.moove-it.com/wp-content/uploads/2011/10/chica_DSC_0076.jpg" rel="lightbox[875]" title="chica_DSC_0076"><img class="alignleft size-medium wp-image-885" title="chica_DSC_0076" src="http://blog.moove-it.com/wp-content/uploads/2011/10/chica_DSC_0076-300x200.jpg" alt="" width="300" height="200" /></a></td>
<td><a href="http://blog.moove-it.com/wp-content/uploads/2011/10/chica_DSC_0016.jpg" rel="lightbox[875]" title="chica_DSC_0016"><img class="alignleft size-medium wp-image-885" title="chica_DSC_0016" src="http://blog.moove-it.com/wp-content/uploads/2011/10/chica_DSC_0016-300x200.jpg" alt="" width="300" height="200" /></a></td>
</tr>
</tbody>
</table>
<p><strong>Why percussion?</strong></p>
<p>Why is percussion so important to us? Being a country of immigrants, Uruguay was formed by people from all over the world. And, as always happens, music has played a tremendously positive role in bringing people together and creating stronger, more significant bonds. We connect with our basic instincts, we forget about language barriers, cultural differences and rank, and we just let ourselves feel and relax.  Just listen to a few strikes and you will feel multicolor, ageless and energized.</p>
<p>You may have heard of<em> Las Llamadas</em> (The Callings), a popular annual event during Carnival here in Uruguay, which gathers thousands of people from all over the world. The drums are the main stars, and the African musical roots brought by people once enslaved in this country (and happily freed more than 150 years ago) are now our genuinely local music.</p>
<p>I personally love this rhythm, and even though I am not a music expert I will recognize its pace wherever I go. I am not sure if it is the adrenaline that runs through your body, or the inseparable link to human nature, but percussion makes your body shake, like toddlers instinctively struggling to move their bodies to the rhythm of the music.</p>
<p>Team Building !!</p>
<!-- PHP 5.x -->]]></content:encoded>
			<wfw:commentRss>http://blog.moove-it.com/you-can%e2%80%99t-help-but-mooving-percussion-workshop/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Alternatives to full text queries (part II)</title>
		<link>http://blog.moove-it.com/alternatives-to-full-text-queries-part-ii/</link>
		<comments>http://blog.moove-it.com/alternatives-to-full-text-queries-part-ii/#comments</comments>
		<pubDate>Wed, 12 Oct 2011 18:48:23 +0000</pubDate>
		<dc:creator>Fernando Doglio</dc:creator>
				<category><![CDATA[moove-it]]></category>
		<category><![CDATA[full text]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[sphinx]]></category>
		<category><![CDATA[xapian]]></category>

		<guid isPermaLink="false">http://blog.moove-it.com/?p=653</guid>
		<description><![CDATA[For part one, click here&#8230; What do they have in common and what makes them different? Even though it’s hard to come up with a comparison table between all four alternatives, mainly because I can’t claim to have personal experience with all of them, the Internet has a lot of information on the subject, so [...]]]></description>
			<content:encoded><![CDATA[<p>For part one, <a href="http://blog.moove-it.com/alternatives-to-full-text-queries-part-i/">click here</a>&#8230;</p>
<p><span style="font-size: large;"><strong>What do they have in common and what makes them different?</strong></span></p>
<p>Even though it’s hard to come up with a comparison table between all four alternatives, mainly because I can’t claim to have personal experience with all of them, the Internet has a lot of information on the subject, so I went ahead and did a bit of research on the matter.</p>
<p>Another point to consider is that although, in the long run, all four solutions provide very similar services, they do so a bit differently, and they fall into two categories:</p>
<ul>
<li><em>Full text search servers</em>: They provide a finished solution, ready for developers to install and interact with. You don’t have to integrate them into your application; you only have to interact with them. Here we have <strong>Solr </strong>and <strong>Sphinx</strong>.</li>
<li><em>Full text search APIs</em>: They provide the functionality the developer needs, but at a lower level. You’ll need to integrate these APIs into your application, instead of just consuming their services through a standard interface (as happens with the servers). Here we have the <strong>Lucene Project </strong>and the <strong>Xapian project</strong>.</li>
</ul>
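<p>To make the "you only have to interact with them" part a bit more concrete, here is a minimal sketch of how an application might talk to a search server over HTTP, using Solr's <code>/select</code> handler as the example. The host, port, core name (<code>articles</code>) and field name (<code>title</code>) are assumptions made up for illustration, not values taken from any real setup.</p>
<pre>
```python
from urllib.parse import urlencode

# Hypothetical Solr location and core name; adjust to your own install.
SOLR_BASE_URL = "http://localhost:8983/solr/articles"


def build_solr_select_url(query, rows=10):
    """Return the URL of a Solr /select request for the given query string."""
    # q is the query, rows caps the number of results, wt picks the response format.
    params = {"q": query, "rows": rows, "wt": "json"}
    return SOLR_BASE_URL + "/select?" + urlencode(params)


if __name__ == "__main__":
    # The application only builds an HTTP request; indexing and ranking
    # happen inside the server, outside the application code.
    print(build_solr_select_url("title:hadoop"))
```
</pre>
<p>With a search API such as Lucene or Xapian there is no such endpoint: the indexing and searching calls would live inside your own application code instead.</p>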
<p>Taking all of this into account, we can now proceed into a more in-depth discussion about our options:</p>
<p><span id="more-653"></span></p>
<h3><em><strong>Full text search servers</strong></em></h3>
<p>As discussed above, search servers provide a finished solution, ready to be installed, tweaked to your needs and interfaced with.</p>
<p>The installation and tweaking process can be hard, depending on your specific needs, but the bright side is that all of it is done <strong>outside your application </strong>and your code. Once you’re done and the server is set up according to your needs, the only step left is interacting with it from your application using one of the methods provided by the server.</p>
<p>So, what do they have in common?</p>
<ul>
<li>They’ll both satisfy your needs regarding searching and indexing speed, since they do it very efficiently.</li>
<li>They both have a long list of high-traffic sites using them, as we’ve seen above.</li>
<li>Both offer commercial support, which is great if you’re planning on developing a commercial application to use them.</li>
<li>Both offer client API bindings for several platforms/languages.</li>
<li>Both can be distributed to increase speed and capacity.</li>
<li>They both have great support for advanced querying, including proximity search, relevance sorting and so on.</li>
</ul>
<p>Some differences:</p>
<ul>
<li>Solr is an Apache project built on top of the Lucene Project, which lets it improve some of its features whenever the Lucene project is updated. Sphinx is a standalone project, so it has to improve independently.</li>
<li>Solr supports the use of wildcards for its searches, while the current release version of Sphinx (0.9) does not. The latest beta for Sphinx (2.0), though, appears to support this feature.</li>
<li>Solr comes with spell check for search terms out of the box, while Sphinx does not.</li>
<li>Solr can parse rich text formats like Word documents, PDF files and so on; Sphinx does not provide this feature out of the box.</li>
<li>Sphinx integrates more tightly with RDBMSs, especially MySQL since it was built with this functionality in mind.</li>
<li>Sphinx can use stopwords for a more relevant result ranking by default, whereas Solr removes them before doing the search.</li>
<li>Sphinx supports SQL as a query language, whereas Solr requires you to learn its own query syntax.</li>
<li>Sphinx does not support multithreading on Windows machines, which slows down its performance on this OS.</li>
</ul>
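<p>The difference in query languages noted above can be sketched with two small helpers that build the same phrase search for each server. This is only an illustration under stated assumptions: the index name (<code>articles</code>) and field name (<code>title</code>) are hypothetical, the Sphinx statement uses the SQL-like SphinxQL dialect with its <code>MATCH()</code> clause, and the Solr string uses the Lucene-style <code>field:"phrase"</code> form.</p>
<pre>
```python
def sphinxql_query(index, text, limit=10):
    """Build a SphinxQL statement: plain SQL plus a MATCH() full-text clause."""
    # Escape backslashes first, then single quotes, so the text stays inside the literal.
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return "SELECT id FROM {0} WHERE MATCH('{1}') LIMIT {2}".format(
        index, escaped, int(limit)
    )


def solr_query(field, text):
    """Build a Solr query in Lucene syntax: field name, colon, quoted phrase."""
    # Escape backslashes first, then double quotes, so the phrase stays quoted.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '{0}:"{1}"'.format(field, escaped)


if __name__ == "__main__":
    # Hypothetical index and field names, used only for illustration.
    print(sphinxql_query("articles", "big data"))
    print(solr_query("title", "big data"))
```
</pre>
<p>The first string would be sent to Sphinx like any other SQL statement, while the second would be passed as the value of Solr's <code>q</code> parameter.</p>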
<p><strong>So, which one is better?</strong></p>
<p>As you might have guessed, there is no straight answer to this question, since they both do a similar job but have different strong and weak points, so each is better than the other in some specific cases.</p>
<p>On an out-of-the-box basis, for a generic search engine implementation, my recommendation (which should be taken lightly, since it’s only the opinion of a single developer) would be to implement it with Solr.</p>
<p>This recommendation is based on the features that Solr provides out of the box, some of which are quite important for a search engine, such as wildcard support and rich text format parsing.</p>
<p>On the other hand, if your needs are as specific as indexing content from a database, then Sphinx would be the way to go, since it appears to integrate natively with that kind of content, which should give it better performance than other solutions.</p>
<h3><em><strong>Full text search APIs</strong></em></h3>
<p>In this case, we’ll be comparing APIs, which could be thought of as “tools” to add to the “tool box” provided by the programming language you’re using.</p>
<p>Our two contestants are the <strong>Lucene Project </strong>and the <strong>Xapian project</strong>. They both have quite a number of followers and applications that use them, so let’s see what else they have in common:</p>
<ul>
<li>They’re both highly portable. Lucene is written in Java, and thus works on any Java-capable OS. Xapian, on the other hand, is written in C++ and supports most of the operating systems on the market.</li>
<li>They both support rich text formats, which is definitely a plus if you’re trying to index any type of information (no need to pre-process it yourself).</li>
<li>They both support advanced search mechanisms, like wildcards, proximity search, stemming, and so on.</li>
<li>They’re both able to add data into their index on a real-time basis, making information accessible immediately.</li>
</ul>
<p>Basically, Lucene and Xapian are both very much alike: they’re both very powerful and very customizable. And yet, they have some differences, as we’ll see next.</p>
<p>Some of the differences:</p>
<ul>
<li>Lucene does not support faceted search, whilst Xapian does.</li>
<li>Lucene does not support spelling corrections out of the box, whilst Xapian does.</li>
<li>Xapian has support for synonyms out of the box, Lucene does not.</li>
<li>Lucene has been ported to other programming languages (like PHP, Ruby, C#, etc.), providing Lucene-like APIs for those languages.</li>
<li>Xapian provides bindings for other languages, but maintains a core API that’s always the same, ensuring that no matter which binding you’re using, you’re always working with the official distribution and not a poorly done copy.</li>
</ul>
<p><strong>So, which one is better?</strong></p>
<p>In this case, according to my research, the answer is not the same as the one for Sphinx and Solr, as one might have expected.</p>
<p>Judging from our list of common and different features, one would be inclined to believe that Xapian is the way to go (at least I am!) since it is clearly superior (on an out-of-the-box basis, of course).</p>
<p>So, why would we ever pick Lucene? Well, for starters, Lucene’s ports tend to integrate well with some of those languages’ most famous frameworks (in PHP, Zend Framework implements the Lucene search API; Ruby has its own implementation as well, called Ferret, which in turn has a RoR plug-in; and so on), which would ease the development process considerably. This could actually be a major point in favor of Lucene if we’re already using (or planning to use) one of those frameworks.</p>
<h3><strong>Final thoughts</strong></h3>
<p>All in all, there are many solutions out there worth a try, even though I’ve only covered what appear to be the four best-known (or most used) options. There are others, like <a href="http://en.wikipedia.org/wiki/DataparkSearch"><span style="color: #000099;"><span style="text-decoration: underline;">DataparkSearch</span></span></a>, <a href="http://en.wikipedia.org/wiki/Ht-//Dig"><span style="color: #000099;"><span style="text-decoration: underline;">Ht-//Dig</span></span></a>, <a href="http://en.wikipedia.org/wiki/MnoGoSearch"><span style="color: #000099;"><span style="text-decoration: underline;">mnoGoSearch</span></span></a>, <a href="http://en.wikipedia.org/wiki/KinoSearch"><span style="color: #000099;"><span style="text-decoration: underline;">KinoSearch</span></span></a>, and many more, which might be the right pick for your needs.</p>
<p>What is really important to remember is that database engines are not the only solution out there for data handling. And also that full text search solutions are not the silver bullet needed to kill the proverbial werewolf that represents our searching problems either.</p>
<p>We need to think about our needs very carefully before choosing a technology or we can end up having that proverbial wolf biting our rear&#8230;</p>
<!-- PHP 5.x -->]]></content:encoded>
			<wfw:commentRss>http://blog.moove-it.com/alternatives-to-full-text-queries-part-ii/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

