Could not infer the matching function for org.apache.pig.builtin.SUM (or any function for that matter)

Pig – the language – may be like Pig – the animal – when it comes to ingesting data (not very picky), but syntax certainly does matter.  I learned this tonight while experimenting with Pig.  My script was pretty simple:

1. Load some data

2. Filter that data

3. Group that data

4. Aggregate that data

5. Sort that data

6. Limit the data to the top 10

7. Dump the data

What I didn’t realize is that the variable that populates one variable is very important – when it comes to aggregation and inferring the appropriate data type.  For example, this will fail with the error found in the post heading:

mydata = load '/user/hduser/foo.dat' using PigStorage(',') as (person:chararray,val:int);
mydata_filtered = filter mydata by val > 1;
group_mydata = group mydata_filtered by ( person );
sumval = foreach group_mydata generate group, SUM(mydata.val);
dump sumval;

The reason is that the variable that was used to populate group_mydata is mydata_filtered and not mydata.  This was a pretty sloppy mistake that I made when adding additional logic to filter the data AFTER the original script was written.  So if you get this error:

Could not infer the matching function for org.apache.pig.builtin.SUM

Confirm you are aggregating the column from the appropriate Pig relation and not an earlier created one (though it may be the parent!).

Advertisements
This entry was posted in hadoop, pig, scripting. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s