1. First, download a stable release of Pig from the Apache Pig releases page.
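For example, to pull the 0.11.1 release straight from the Apache archive (the exact mirror URL is an assumption; any Apache mirror works):
wget https://archive.apache.org/dist/pig/pig-0.11.1/pig-0.11.1.tar.gz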
2. As root (or some other privileged user), untar the Pig tarball into /usr/local; this will create a sub-directory like /usr/local/pig-0.11.1.
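For example (assuming the tarball was downloaded to the current directory):
sudo tar -xzf pig-0.11.1.tar.gz -C /usr/local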
3. Create a symbolic link (to make things easier):
ln -s /usr/local/pig-0.11.1 /usr/local/pig
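A quick way to confirm the link points where you expect:
ls -l /usr/local/pig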
4. Update your .bashrc or .profile to include:
export PIG_HOME=/usr/local/pig
export PATH=$PATH:$PIG_HOME/bin
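Then reload your shell configuration and sanity-check that the pig executable is on your PATH:
source ~/.bashrc
pig -version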
5. Contact your Hadoop administrator (or get them yourself if you have access) and create a tarball containing the necessary client configuration files (on many installs these live in $HADOOP_HOME/conf or /etc/hadoop/conf; adjust the cd accordingly):
cd $HADOOP_HOME
tar -czvf client.tar.z core-site.xml hadoop-env.sh hdfs-site.xml log4j.properties mapred-site.xml ssl-client.xml.example
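You can double-check what went into the tarball before copying it anywhere:
tar -tzf client.tar.z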
6. Now, create a new directory (either in your home directory, or somewhere else if others will need access to it) and unpack the tarball into it:
mkdir hadoop.conf
cd hadoop.conf
tar -zxvf ../client.tar.z
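After extracting, a quick ls in that directory should show the client configs:
ls
core-site.xml hadoop-env.sh hdfs-site.xml log4j.properties mapred-site.xml ssl-client.xml.example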
7. Now update your .profile or .bashrc to point Pig at the Hadoop client configuration you just unpacked, so Pig knows how to reach the cluster (adjust the path if you put the directory somewhere other than your home directory):
export PIG_CLASSPATH=$HOME/hadoop.conf
8. If it isn’t already, export JAVA_HOME in your .profile or .bashrc, pointing at your JDK install:
export JAVA_HOME=/path/to/your/jdk
9. Run Pig in interactive mode (the Grunt shell), but with MapReduce execution so statements run against the cluster:
pig -x mapreduce
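Once the Grunt shell comes up, a quick way to confirm Pig can actually reach the cluster is to list a directory on HDFS (the path below is just an example; use one you know exists):
grunt> fs -ls /user/hduser
grunt> quit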
10. Test it all out with an actual pig script. Copy and paste the following into wordcount.pig:
documents = LOAD '/user/hduser/foo_input/*.txt' AS (line:chararray);
words = FOREACH documents GENERATE FLATTEN(TOKENIZE(line)) AS word;
grpd = GROUP words BY word;
cntd = FOREACH grpd GENERATE group, COUNT(words);
DUMP cntd;
Change the input path to a directory on your HDFS that actually contains a bunch of text documents, then run the script:
pig -x mapreduce wordcount.pig
If everything is set up correctly, you’ll get a listing of the words encountered and how many times each one appeared.
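For non-toy runs you will usually want the counts written back to HDFS rather than dumped to the console. Swapping the last line of the script for a STORE does that (the output path is an example, and it must not already exist or the job will fail):
STORE cntd INTO '/user/hduser/wordcount_output';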