Earlier I discussed writing reusable scripts with Apache Hive. Now let us see how to achieve the same functionality with Pig Latin.
Pig supports parameter substitution through its param option, which lets us write dynamic scripts.
Assume we have a file called numbers with the data below.
12
23
34
12
56
34
57
12
If we want to list the numbers equal to 12, we write Pig Latin code like this.
numbers = load '/data/numbers' as (number:int);
specificNumber = filter numbers by number == 12;
dump specificNumber;
Usually we save this code in a file. Let us assume we have saved it in a file called numbers.pig.
We then run the code from the file using
pig -f /path/to/numbers.pig
Later, if we want to see only the numbers equal to 34, we change the second line to
specificNumber = filter numbers by number == 34;
and re-run the code using the same command.
But it is not good practice to touch code in production, so we can make this script dynamic with the -param option of Pig.
Any value we want to decide at run time becomes a parameter. Here we want to choose the number to filter on when the job runs, so we write the second line like this.
specificNumber = filter numbers by number == $dynanumber;
and we run the code like this.
pig -param dynanumber=12 -f numbers.pig
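Before running a parameterized script against real data, it can help to check what Pig will actually execute. The sketch below assumes pig is on the PATH and uses Pig's -dryrun option, which substitutes the parameters without running the job. The directory name pig_dryrun_demo is only a placeholder for this illustration.

```shell
# Hypothetical scratch directory, used only for this illustration.
mkdir -p pig_dryrun_demo

# The quoted 'EOF' stops the shell from expanding $dynanumber itself,
# so the placeholder reaches Pig unchanged.
cat > pig_dryrun_demo/numbers.pig <<'EOF'
numbers = load '/data/numbers' as (number:int);
specificNumber = filter numbers by number == $dynanumber;
dump specificNumber;
EOF

# -dryrun only performs the substitution; the job itself is not executed.
# The step is skipped when pig is not installed.
if command -v pig >/dev/null 2>&1; then
    (cd pig_dryrun_demo && pig -x local -dryrun -param dynanumber=12 -f numbers.pig)
fi
```

Pig typically writes the substituted script next to the original as numbers.pig.substituted, where the filter line should read number == 12.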
Assume we also want to supply the input path when running the script. Now we write the code like this.
numbers = load '$path' as (number:int);
specificNumber = filter numbers by number == $dynanumber;
dump specificNumber;
The path stays inside quotes because load expects a string, while $dynanumber is left unquoted because number is an int.
And we run it like this.
pig -param path=/data/numbers -param dynanumber=34 -f numbers.pig
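If a parameter is missing at run time, the substitution fails before the job starts. One way to guard against that, sketched here as an assumption rather than as part of the original script, is Pig's %default statement, which supplies a value used only when none is passed in from outside. The directory name pig_default_demo is a placeholder.

```shell
# Hypothetical scratch directory, used only for this illustration.
mkdir -p pig_default_demo

# %default lines give each parameter a fallback value inside the script.
cat > pig_default_demo/numbers.pig <<'EOF'
%default path '/data/numbers'
%default dynanumber 12
numbers = load '$path' as (number:int);
specificNumber = filter numbers by number == $dynanumber;
dump specificNumber;
EOF

# Only dynanumber is overridden here; path falls back to its %default.
# -dryrun substitutes without executing; skipped when pig is not installed.
if command -v pig >/dev/null 2>&1; then
    (cd pig_default_demo && pig -x local -dryrun -param dynanumber=34 -f numbers.pig)
fi
```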
If you feel this command line is hard to read, we can put all these dynamic values in a file like the one below.
# dyna.params (file name)
path = /data/numbers
dynanumber = 34
Then you can run the script with the -param_file option like this.
pig -param_file dyna.params -f numbers.pig
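The whole parameter-file setup can be put together in a few shell commands. This is a minimal sketch, assuming pig is on the PATH. The directory name pig_paramfile_demo is a placeholder, and the run uses -dryrun so that it does not need the input data to exist.

```shell
# Hypothetical scratch directory, used only for this illustration.
mkdir -p pig_paramfile_demo

# One "name = value" pair per line; a comment takes a full line starting with #.
cat > pig_paramfile_demo/dyna.params <<'EOF'
# dyna.params
path = /data/numbers
dynanumber = 34
EOF

# The quoted 'EOF' keeps the shell from expanding $path and $dynanumber.
cat > pig_paramfile_demo/numbers.pig <<'EOF'
numbers = load '$path' as (number:int);
specificNumber = filter numbers by number == $dynanumber;
dump specificNumber;
EOF

# Substitute the values from the file without executing the job.
# The step is skipped when pig is not installed.
if command -v pig >/dev/null 2>&1; then
    (cd pig_paramfile_demo && pig -x local -dryrun -param_file dyna.params -f numbers.pig)
fi
```

Dropping -dryrun runs the job for real. If the same parameter is also given with -param on the command line, the command-line value should take precedence over the one in the file.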
Apache Hive has no dedicated parameter-file option like this; its variables are normally passed on the command line with --hivevar or --hiveconf.
So what benefits do we gain from this feature?
1. We avoid hard-coding values in Pig scripts.
2. Our scripts become more reusable and dynamic.
3. Reusable scripts improve our productivity.
Happy Hadooping, friends.