MCB.guru

Real-Time Replication from MySQL to Cassandra

Earlier this month I blogged about our new Hadoop applier, I published the docs for that this week (http://docs.continuent.com/tungsten-replicator-3.0/deployment-hadoop.html) as part of the Tungsten Replicator 3.0 documentation (http://docs.continuent.com/tungsten-replicator-3.0/index.html). It contains some additional interesting nuggets that will appear in future blog posts. The main part of that functionality that performs the actual applier for Hadoop is…

February 27, 2014
5 Steps to Start Data Mining

I have a great guest blog on the basics of data mining over at SciTech Connect: 5 Steps to Start Data Mining – SciTech Connect | SciTech Connect.

February 27, 2014
Getting Data into Hadoop in real-time

Moving data between databases is hard. Without ever intending it, I seem to have spent a lifetime working on solutions for getting data into and out of databases, but more frequently between. In fact, my first job out of university was migrating data from BRS/Text, a free-text database (probably what we would call a NoSQL)…

February 12, 2014
Moving to MapReduce 2 with YARN

Moving to MapReduce 2 with YARN I have a guest blog post over on the Safari Books Online blog looking at the changes in the YARN system for managing Hadoop jobs.

February 12, 2014
Anonymizing Data During Replication

If you happen to work with personal data, chances are you are subject to SOX (Sarbanes-Oxley) whether you like it or not. One of the worst aspects of this is that if you want to be able to analyse your data and you replicate out to another host, you have to find a way of…

February 10, 2014
Process complex text for information mining

My latest article on data mining text information is now available: Text — an everyday component of nearly all social interaction, social networks, and social sites — is difficult to process. Even the basic task of picking out specific words, phrases, or ideas is challenging. String searches and regex tools don\’t suffice. But the Annotation…

February 5, 2014
Building flexible apps from big data sources

My article on how to build flexible apps on top of the BigInsights platform has been published. This demonstrates a cool way to combine some client-end JavaScript and existing technologies to build a Big Data query interface without developing a specialised application for the purpose. It’s no secret that a significant proportion of the needs…

December 28, 2013
Process big data with Big SQL in InfoSphere BigInsights

The ability to write an SQL statement against your Big Data stored in Hadoop provides some much needed flexibility. Sure, using Hive or HBase you can perform some of those operations, but there are other alternatives that may suit your needs better, such as the Big SQL utility. My latest article on this tool is…

December 24, 2013
SQL to Hadoop and back again, Part 3: Direct transfer and live data exchange

The third, and final article in my series on migrating data to and from Hadoop and SQL databases is now available: Big data is a term that has been used regularly now for almost a decade, and it — along with technologies like NoSQL — are seen as the replacements for the long-successful RDBMS solutions…

December 23, 2013
SQL to Hadoop and back again, Part 2: Leveraging HBase and Hive

The second article in a series covering Big Data and SQL interaction is available now: “Big data” is a term that has been used regularly now for almost a decade, and it — along with technologies like NoSQL — are seen as the replacements for the long-successful RDBMS solutions that use SQL. Today, DB2®, Oracle,…

October 9, 2013