Author Archives: resilvajr

Using R to improve your fantasy football team

I’ve started playing around with R, and this week I decided to see whether I could pick up a free agent for my team more intelligently. The position I needed to fill? Kicker. The first … Continue reading

Posted in Uncategorized | 1 Comment

“Error occurred while loading translation library” when connecting R to IBM Netezza

When connecting my R-2.15 client to IBM Netezza v7 (NZA 2.5.4) for the first time, I got the error above. Here was the connect call: > nzConnectDSN("NZSQL") Error in odbcDriverConnect("DSN=VirtualNZ") : (converted from warning) [RODBC] ERROR: state HY000, code 45, … Continue reading

Posted in Uncategorized | Leave a comment

Securing (and sharing) password information in Sqoop jobs

Sqoop is a utility that moves data from a relational database system into HDFS (or exports it from Hadoop back to an RDBMS). One of the things to keep in mind as you start building Sqoop jobs … Continue reading

Posted in General, hadoop, scripting, sqoop | Leave a comment

Pig workflow optimization: splitting data flows

Pig supports the concept of non-linear data flows, where you have a single input but multiple outputs.  Pig’s optimizer is smart enough to recognize when the same input is referenced multiple times and implicitly splits those data flows.  You can … Continue reading

Posted in Uncategorized | Leave a comment

To copy or move: Implications of loading Hive managed table from HDFS versus local filesystem

When using the LOAD DATA statement to populate a Hive table, it’s important to understand what Hive does with the actual data files depending on whether the input data resides on your local file system or in HDFS. For example, … Continue reading

Posted in hadoop, hive, scripting, Uncategorized | Leave a comment

Hive’s collection data types

Hive offers several collection data types: struct, map and array. These data types don’t necessarily make a lot of sense if you are moving data from the well-structured world of the RDBMS, but if you are working directly with … Continue reading

Posted in hadoop, hive, scripting | Leave a comment

Passing parameters to Hive scripts

Like Pig and other scripting languages, Hive lets you write parameterized scripts, which greatly increases their reusability. To take advantage of this, write your Hive scripts like this: select yearid, sum(HR) from batting_stats where teamid … Continue reading

Posted in hadoop, hive, scripting | Leave a comment