To copy or move: Implications of loading a Hive managed table from HDFS versus the local filesystem

When you use the LOAD DATA statement to populate a Hive managed table, it’s important to understand what Hive does with the underlying data files. That behavior depends on whether the input data resides on your local file system or on HDFS.

For example, to load data from your local home directory into a Hive table:

hive> LOAD DATA LOCAL INPATH '/home/hduser/weather_data/input' INTO TABLE weather_data;

The output messages show each file being copied:

Copying data from file:/home/hduser/weather_data/input
Copying file: file:/home/hduser/weather_data/input/weather.16.csv
Copying file: file:/home/hduser/weather_data/input/weather.86.csv
...
...
Copying file: file:/home/hduser/weather_data/input/weather.52.csv
Copying file: file:/home/hduser/weather_data/input/weather.37.csv
Loading data to table default.weather_data
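To make the copy semantics concrete, here is a minimal Python sketch that models what LOAD DATA LOCAL INPATH does with the files. It uses two local temporary directories as stand-ins for the local input directory and the table's HDFS directory. The function name load_local and the sample file contents are hypothetical; Hive itself writes the copies into HDFS, not into a local folder.

```python
import shutil
import tempfile
from pathlib import Path


def load_local(src_dir, table_dir):
    """Hypothetical model of LOAD DATA LOCAL INPATH: copy every file and
    leave the source directory untouched."""
    table_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for f in sorted(src_dir.iterdir()):
        if f.is_file():
            dest = table_dir / f.name
            shutil.copy2(f, dest)  # copy: the original stays in src_dir
            copied.append(dest)
    return copied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-ins for the local input dir and the table's warehouse dir.
        src = Path(tmp) / "weather_data" / "input"
        table = Path(tmp) / "warehouse" / "weather_data"
        src.mkdir(parents=True)
        for n in (16, 86, 52):
            (src / f"weather.{n}.csv").write_text("2013-01-01,42\n")
        load_local(src, table)
        # Both directories now hold the same three files.
        print(len(list(src.iterdir())), len(list(table.iterdir())))  # 3 3
```

The point of the model is the final count: after a LOCAL load, the files exist in both places.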

Under the covers, Hive copies the files from the local input directory into the HDFS directory associated with the table weather_data (e.g. /user/hive/warehouse/weather_data/), and the local source files are left in place. To see which directory that is, run the following Hive command:

hive> describe extended weather_data;

In the Detailed Table Information section of the output, look for the ‘location’ value.
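If you want to pull that value out in a script, a small regular expression over the command's text output is enough. This is a sketch under assumptions: the sample below is an abbreviated, hypothetical rendering of the Detailed Table Information line (the host and port are made up), and table_location is an illustrative helper, not part of Hive. The pattern is case-insensitive and tolerates whitespace after the colon so that it should also match a "Location:" row.

```python
import re
from typing import Optional

# Matches "location:<uri>" (any case, optional whitespace after the colon)
# and captures the URI up to the next comma, whitespace, or ")".
_LOCATION_RE = re.compile(r"location:\s*([^,\s)]+)", re.IGNORECASE)


def table_location(describe_output: str) -> Optional[str]:
    """Return the first location URI found in the text, or None."""
    match = _LOCATION_RE.search(describe_output)
    return match.group(1) if match else None


# Abbreviated, hypothetical sample; real output contains many more fields.
sample = (
    "Detailed Table Information\tTable(tableName:weather_data, dbName:default, "
    "sd:StorageDescriptor(cols:[], "
    "location:hdfs://hadoop1:54310/user/hive/warehouse/weather_data, "
    "inputFormat:org.apache.hadoop.mapred.TextInputFormat))"
)

if __name__ == "__main__":
    print(table_location(sample))
    # hdfs://hadoop1:54310/user/hive/warehouse/weather_data
```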

If the data is already on HDFS, however, Hive moves the files instead of copying them. For example:

hduser@hadoop1:/home/hduser/$ hadoop dfs -ls /user/hduser/weather_data/ | wc -l
101
hive> load data inpath '/user/hduser/weather_data/' into table weather_data;

Now, let’s run dfs -ls | wc -l again (the relative path weather_data resolves to the HDFS home directory, /user/hduser):

hduser@hadoop1:~/weather_data$ hadoop dfs -ls weather_data | wc -l
0
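When reading these counts, keep in mind that hadoop dfs -ls usually prints a "Found N items" summary line before the entries, and wc -l counts that line too, so the 101 above most likely corresponds to 100 entries. The hypothetical helper below counts only the entries by skipping the summary line; the sample listing is made up for illustration.

```python
def count_listing_entries(ls_output: str) -> int:
    """Count entries in `hadoop dfs -ls` text output, skipping blank lines
    and the "Found N items" summary line that `wc -l` would also count."""
    count = 0
    for line in ls_output.splitlines():
        line = line.strip()
        if not line or line.startswith("Found "):
            continue
        count += 1
    return count


# Made-up listing in the usual -ls layout (header, then one line per entry).
sample = """Found 2 items
-rw-r--r--   1 hduser supergroup       1234 2013-05-01 10:00 /user/hduser/weather_data/weather.16.csv
-rw-r--r--   1 hduser supergroup       1234 2013-05-01 10:00 /user/hduser/weather_data/weather.86.csv
"""

if __name__ == "__main__":
    print(count_listing_entries(sample))  # 2
    print(count_listing_entries(""))      # 0 (empty listing, as after the move)
```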

As you can see, the files were moved out of /user/hduser/weather_data into the location associated with the Hive table, so they no longer exist at the original path. Within HDFS this move is a rename, so no data is rewritten.
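The move can be modeled the same way as the copy. In this minimal sketch (the function name load_hdfs_style is hypothetical), os.rename relocates each file, so the files disappear from the source directory; in the model the source directory itself remains, just empty. os.rename only updates directory entries on a single file system, which is loosely analogous to an HDFS rename being a metadata operation. It is a model of the observed behavior, not Hive's code.

```python
import os
import tempfile
from pathlib import Path


def load_hdfs_style(src_dir, table_dir):
    """Hypothetical model of LOAD DATA INPATH (no LOCAL): move each file
    into the table directory by renaming it."""
    table_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for f in sorted(src_dir.iterdir()):  # sorted() snapshots the listing first
        if f.is_file():
            dest = table_dir / f.name
            os.rename(f, dest)  # move: the file is gone from src_dir afterward
            moved.append(dest)
    return moved


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-ins for /user/hduser/weather_data and the warehouse dir.
        src = Path(tmp) / "user" / "hduser" / "weather_data"
        table = Path(tmp) / "user" / "hive" / "warehouse" / "weather_data"
        src.mkdir(parents=True)
        for n in (16, 86, 52):
            (src / f"weather.{n}.csv").write_text("2013-01-01,42\n")
        load_hdfs_style(src, table)
        # Source is now empty; the table directory holds all three files.
        print(len(list(src.iterdir())), len(list(table.iterdir())))  # 0 3
```

Compared with the LOCAL case, the only change is rename instead of copy, which is exactly why the second listing comes back empty.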
