Distributed Systems

TAG ARCHIVE

3 articles | Updated Oct 17, 2014

October 17, 2014

Consistent Hash Ring

Consistent hashing is a kind of hashing in which resizing the hash table requires remapping only K/n keys on average, where K is the number of keys and n is the number of slots. In contrast, in most traditional hash tables, a change in the number of array slots causes nearly all keys to be remapped. Consistent hashing achieves the same goals as rendezvous hashing (also called HRW hashing). The two techniques use different algorithms and were devised independently and contemporaneously.
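The K/n behavior is easier to see with a small ring. The following is a minimal Python sketch, not the implementation from the article: the `HashRing` class, the `replicas` parameter (virtual nodes per server), the node names, and the choice of MD5 as the ring hash are all assumptions made for illustration.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Map a string to a ring position (first 8 bytes of its MD5 digest)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Illustrative consistent hash ring with virtual nodes (hypothetical API)."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas  # virtual nodes per physical node (assumed value)
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        """Place `replicas` points for the node on the ring."""
        for i in range(self.replicas):
            point = ring_hash(f"{node}#{i}")
            if point in self._owner:
                continue  # skip the (very unlikely) position collision
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove_node(self, node: str) -> None:
        """Remove every point that belongs to the node."""
        for i in range(self.replicas):
            point = ring_hash(f"{node}#{i}")
            if self._owner.get(point) == node:
                del self._owner[point]
                del self._points[bisect.bisect_left(self._points, point)]

    def get_node(self, key: str) -> str:
        """Return the owner of the first point clockwise after the key's hash."""
        if not self._points:
            raise KeyError("ring is empty")
        idx = bisect.bisect_right(self._points, ring_hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


# Usage: grow a four-node ring to five nodes and see which keys move.
keys = [f"key-{i}" for i in range(10_000)]
ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
before = {k: ring.get_node(k) for k in keys}

ring.add_node("node-e")
after = {k: ring.get_node(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

In this sketch, every key that moves lands on the newly added node, and the moved fraction should come out near one fifth of the keys, in line with the K/n estimate. A table indexed by `hash(key) % n` would instead reassign most keys when n changes from 4 to 5.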

8 min read · Network Read more

August 8, 2014

Distributed Systems Hadoop

Getting Started with Hadoop 2.0

Apache™ Hadoop® is an open source software project that enables the distributed processing of large data sets across clusters of commodity servers. It is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance. Rather than relying on high-end hardware, the resiliency of these clusters comes from the software’s ability to detect and handle failures at the application layer. Hadoop 1 popularized MapReduce programming for batch jobs and demonstrated the potential value of large-scale, distributed processing. MapReduce, as implemented in Hadoop 1, can be I/O intensive, is not well suited to interactive analysis, and offers limited support for graph, machine learning, and other memory-intensive algorithms. To produce Hadoop 2, Hadoop developers rewrote major components of the framework, most notably moving resource management out of MapReduce and into YARN. To get started with the new version, it helps to understand the major differences between Hadoop 1 and 2.

16 min read · DevOps Read more

June 14, 2013

Hadoop Distributed Systems

Running Hadoop 1.1.2 on Ubuntu Linux (Single-Node Cluster)

In this tutorial I will describe the required steps for setting up a pseudo-distributed, single-node Hadoop cluster backed by the Hadoop Distributed File System, running on Ubuntu Linux. Note: This walkthrough targets Hadoop 1.1.2 (2013) on Ubuntu 13.04, both long past end-of-life. Hadoop 1.x predates YARN, so the JobTracker/TaskTracker daemons, the conf/ layout, and config keys such as fs.default.name and mapred.job.tracker no longer apply to current releases (3.x uses etc/hadoop/, fs.defaultFS, and different web-UI ports). It is kept here as a historical reference; if you are setting up Hadoop today, follow the official Single Node Cluster guide instead.

17 min read · DevOps Read more