MCB.guru

Category: Coalface

Continuent Replication to Hadoop – Now in Stereo!

Hopefully by now you have already seen that we are working on Hadoop replication. I’m happy to say that it is going really well. I’ve managed to push a few terabytes of data and different data sets through into Hadoop on Cloudera, HortonWorks, and Amazon’s Elastic MapReduce (EMR). For those who have been following my…

March 31, 2014
Parallel Extractor for Provisioning

Coming up as a new feature in Tungsten Replicator (and written by our replicator expert Stephane Giron) is the ability to provision a new database by using data from an existing database. This new feature comes in the form of a tool called the Parallel Extractor. The principles are very simple. On the master side: Start…

March 16, 2014
Using the Continuent Docs

As hopefully has been noticed, the Continuent documentation is achieving a pretty good critical mass. The content of the documentation is always the most important consideration. Secondary is making sure that the information in the documentation can be found, and that when reading, you can hover and click to get relevant information so that you…

March 12, 2014
Intelligent Linking and Indexing in DocBook

One of the issues I have with DocBook XML is that the links are a little forced and manual. By that, I mean that if I have a command, like trepctl, and I use it in a sentence or description, if I want to link trepctl back to the corresponding trepctl page, I have to…

March 10, 2014
Customizing Chunking in DocBook

I love DocBook XML. No, really. But one thing I hate is the way you have to set a global chunking level for your HTML and then live with it. For most documentation, you want to be able to choose whether a conveniently addressable section within a chapter, and then you want to combine it…

March 5, 2014
MySQL to Hadoop Step-By-Step

We had a great webinar on Thursday about replicating from MySQL to Hadoop (watch the whole thing). It was great, but one of the questions at the end was ‘is there an easy way to test’. Sadly we can’t go giving out convenient ready-to-run downloads of these things because of licensing and other complexities,…

March 1, 2014
Real-Time Replication from MySQL to Cassandra

Earlier this month I blogged about our new Hadoop applier, and I published the docs for that this week (http://docs.continuent.com/tungsten-replicator-3.0/deployment-hadoop.html) as part of the Tungsten Replicator 3.0 documentation (http://docs.continuent.com/tungsten-replicator-3.0/index.html). It contains some additional interesting nuggets that will appear in future blog posts. The main part of that functionality that performs the actual applier for Hadoop is…

February 27, 2014
5 Steps to Start Data Mining

I have a great guest blog on the basics of data mining over at SciTech Connect: 5 Steps to Start Data Mining.

February 27, 2014
Getting Data into Hadoop in real-time

Moving data between databases is hard. Without ever intending it, I seem to have spent a lifetime working on solutions for getting data into and out of databases, but more frequently between. In fact, my first job out of university was migrating data from BRS/Text, a free-text database (probably what we would call a NoSQL)…

February 12, 2014
Moving to MapReduce 2 with YARN

I have a guest blog post over on the Safari Books Online blog looking at the changes in the YARN system for managing Hadoop jobs.

February 12, 2014