Configuring Pig to work with a remote Hadoop cluster

1. First, download a stable release of Pig from the Apache Pig releases page.

2. As root (or some other privileged user), untar the Pig tarball into /usr/local; this creates a sub-directory such as /usr/local/pig-0.11.1.

3. Create a symbolic link so that /usr/local/pig keeps pointing at the release you installed (use the directory name created in step 2):

ln -s /usr/local/pig-0.11.1 /usr/local/pig

4. Update your .bashrc or .profile to include:

export PIG_HOME=/usr/local/pig
export PATH=$PATH:$PIG_HOME/bin
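After reloading your profile, you can confirm that steps 1 to 4 took effect. The function below is a hypothetical helper (it is not shipped with Pig) and is only a sketch: it checks that PIG_HOME is set, that $PIG_HOME/bin/pig is executable, and that $PIG_HOME/bin is on your PATH.

```shell
#!/bin/sh
# Hypothetical sanity check for steps 1-4 (not part of Pig itself).
# Returns 0 when PIG_HOME is set, $PIG_HOME/bin/pig is executable and
# $PIG_HOME/bin is on PATH; otherwise prints the problem and returns 1.
check_pig_install() {
    if [ -z "${PIG_HOME:-}" ]; then
        echo "PIG_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$PIG_HOME/bin/pig" ]; then
        echo "no executable pig launcher at $PIG_HOME/bin/pig" >&2
        return 1
    fi
    # Surround PATH with colons so the first and last entries also match.
    case ":$PATH:" in
        *":$PIG_HOME/bin:"*) ;;
        *)
            echo "$PIG_HOME/bin is not on PATH" >&2
            return 1
            ;;
    esac
    echo "Pig found at $PIG_HOME"
}
```

For example: . ~/.bashrc && check_pig_install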

5. Ask your Hadoop administrator (or, if you have access to the cluster, do it yourself) to create a tarball containing the necessary client configuration files:

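# The client config files live in the conf directory (Hadoop 1.x layout;
# Hadoop 2.x keeps them in $HADOOP_HOME/etc/hadoop instead)
cd $HADOOP_HOME/conf
tar -czvf client.tar.z core-site.xml hadoop-env.sh hdfs-site.xml log4j.properties mapred-site.xml ssl-client.xml.example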
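If any of those files is missing, tar prints an error but still writes a partial archive. The sketch below wraps the same step in a hypothetical helper, make_client_tarball, that refuses to build the tarball unless every file from the list above is present. The file list is copied from this step and is an assumption about your cluster; adjust it as needed.

```shell
#!/bin/sh
# Hypothetical wrapper around step 5.
# Usage: make_client_tarball CONF_DIR OUTPUT_TARBALL
make_client_tarball() {
    conf_dir=$1
    out=$2
    # Same file list as in step 5; adjust for your cluster.
    files="core-site.xml hadoop-env.sh hdfs-site.xml log4j.properties mapred-site.xml ssl-client.xml.example"
    for f in $files; do
        if [ ! -f "$conf_dir/$f" ]; then
            echo "missing $conf_dir/$f" >&2
            return 1
        fi
    done
    # -C switches into conf_dir first, so the archive holds bare file names
    # and unpacks straight into whatever directory you extract it in.
    # $files is left unquoted on purpose so it splits into separate names.
    tar -czf "$out" -C "$conf_dir" $files
}
```

For example: make_client_tarball "$HADOOP_HOME/conf" "$HOME/client.tar.z"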

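6. Copy client.tar.z to your home directory on the machine where you will run Pig (for example with scp). Then create a new directory for the configuration, either in your home directory or somewhere else if others will need to access it, and unpack the tarball into it: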

mkdir $HOME/hadoop.conf
cd $HOME/hadoop.conf
tar -zxvf ../client.tar.z
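Hadoop only picks up the files if they sit directly inside the directory that HADOOP_CONF_DIR points at, not in a nested sub-directory. The hypothetical helper below (a sketch, not part of Hadoop) unpacks the tarball with tar's -C option instead of cd and then checks that core-site.xml ended up at the top level.

```shell
#!/bin/sh
# Hypothetical helper for step 6.
# Usage: unpack_client_conf TARBALL DEST_DIR
unpack_client_conf() {
    archive=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # -C extracts into dest without changing the current directory.
    tar -xzf "$archive" -C "$dest" || return 1
    # A nested layout (e.g. dest/conf/core-site.xml) would not be found.
    if [ ! -f "$dest/core-site.xml" ]; then
        echo "core-site.xml not found directly under $dest" >&2
        return 1
    fi
}
```

For example: unpack_client_conf "$HOME/client.tar.z" "$HOME/hadoop.conf"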

7. Next, add this line to your .profile or .bashrc (adjust the path if you unpacked the files somewhere other than $HOME/hadoop.conf):

export HADOOP_CONF_DIR=$HOME/hadoop.conf

8. If it isn’t already set, export JAVA_HOME in your .profile or .bashrc, pointing it at your JDK installation:

export JAVA_HOME=/usr/local/jdk1.7.0_17
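Before starting Pig, a quick pre-flight check can catch a mistyped path in steps 7 and 8. This is a hypothetical sketch, not a Hadoop or Pig tool; it assumes core-site.xml, hdfs-site.xml and mapred-site.xml are the files your cluster needs on the client side.

```shell
#!/bin/sh
# Hypothetical pre-flight check for steps 7 and 8.
# Reports every problem it finds, then returns 0 only if none were found.
check_client_env() {
    if [ -z "${HADOOP_CONF_DIR:-}" ] || [ -z "${JAVA_HOME:-}" ]; then
        echo "HADOOP_CONF_DIR and JAVA_HOME must both be set" >&2
        return 1
    fi
    status=0
    for f in core-site.xml hdfs-site.xml mapred-site.xml; do
        if [ ! -f "$HADOOP_CONF_DIR/$f" ]; then
            echo "missing $HADOOP_CONF_DIR/$f" >&2
            status=1
        fi
    done
    if [ ! -x "$JAVA_HOME/bin/java" ]; then
        echo "no executable java at $JAVA_HOME/bin/java" >&2
        status=1
    fi
    return $status
}
```

For example: . ~/.bashrc && check_client_env && pig -x mapreduce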

9. Start Pig's interactive shell (Grunt) in MapReduce mode, so that jobs run on the remote cluster:

pig -x mapreduce

10. Test it all out with an actual Pig script. Copy and paste the following into wordcount.pig:

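-- Load every .txt file in the input directory, one line per record
documents = LOAD '/user/hduser/foo_input/*.txt' as (line:chararray);
-- Split each line into words, one word per record
words = foreach documents generate flatten(TOKENIZE(line)) as word;
-- Group identical words together and count each group
grpd = group words by word;
cntd = foreach grpd generate group, COUNT(words);
dump cntd;

Change the path in the LOAD statement to a directory on your HDFS that actually contains some text documents.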

Run it:

pig -x mapreduce wordcount.pig

If everything is set up correctly, Pig will print each distinct word found in the input together with the number of times it occurs.
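To sanity-check the Pig output, you can copy one or two of the input files to your local machine (for example with hadoop fs -get) and count them locally. The hypothetical helper below is only an approximation: it splits on spaces, tabs and carriage returns plus the characters the Pig documentation lists as TOKENIZE separators (double quote, comma, parentheses, asterisk), so counts on ordinary text should match, but edge cases may differ from Pig's tokenizer.

```shell
#!/bin/sh
# Hypothetical local cross-check for the Pig word count.
# Reads the given files (or stdin) and prints "word<TAB>count" lines,
# sorted by word in byte order.
local_wordcount() {
    awk -F '[ \t\r",()*]+' '
        # Leading separators produce an empty first field; skip empties.
        { for (i = 1; i <= NF; i++) if ($i != "") count[$i]++ }
        END { for (w in count) printf "%s\t%d\n", w, count[w] }
    ' "$@" | LC_ALL=C sort
}
```

For example: local_wordcount sample1.txt sample2.txt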
