Hive has three types of functions. Single-row functions operate on one row at a time. Multi-row (aggregate) functions operate on multiple rows at a time. Table-generating functions produce multiple rows from a single row.
Hive has a good number of built-in functions in each of these categories. You can list all of them using
show functions;
To understand one particular function, such as concat, you can use
describe function concat;
It displays a short help text for the concat function.
However, you may sometimes need to write your own function when no built-in function fits your needs.
These custom functions can be of three types:
1. Single-row function (UDF = User Defined Function)
2. Multi-row function (UDAF = User Defined Aggregate Function)
3. Table-generating function (UDTF = User Defined Table-generating Function)
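To make the three categories concrete, here is a small plain-Java sketch of the three shapes: one value in and one value out, many values in and one value out, and one value in and many rows out. It does not use any Hive classes. The class name FunctionKindsSketch and its method names are made up for illustration, and the comparisons to Hive's upper(), sum() and explode() are only loose analogies.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration only: plain Java, no Hive dependency.
class FunctionKindsSketch {
    // Single-row shape: each input value produces exactly one output value.
    static List<String> singleRow(List<String> names) {
        return names.stream().map(String::toUpperCase).collect(Collectors.toList());
    }

    // Multi-row (aggregate) shape: many input values collapse into one result.
    static long aggregate(List<Long> salaries) {
        return salaries.stream().mapToLong(Long::longValue).sum();
    }

    // Table-generating shape: one input value expands into several output rows.
    static List<String> tableGenerating(String commaSeparated) {
        return Arrays.asList(commaSeparated.split(","));
    }

    public static void main(String[] args) {
        System.out.println(singleRow(Arrays.asList("Balu", "Sai")));    // [BALU, SAI]
        System.out.println(aggregate(Arrays.asList(100000L, 200000L))); // 300000
        System.out.println(tableGenerating("a,b,c"));                   // [a, b, c]
    }
}
```

A UDF, which is what this article builds, has the first shape.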
In this article, we learn how to develop a UDF in Hive.
Assume we have a table emp with data like below.
eno,ename,sal,dno
10,Balu,100000,15
20,Bala,200000,25
30,Sai,200000,35
40,Nirupam,300000,15
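We will develop a custom function that prepends "Hi " to the employee name.
Below are the steps.
1. Write a UDF by extending the UDF class, using Eclipse
To develop a UDF, extend the UDF class from hive-exec.jar and implement an evaluate method. Hive locates the evaluate method by reflection, so its argument and return types decide which column types the function accepts.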
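import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Prepends "Hi " to a string column; returns NULL for NULL or empty input.
public class HiPrepender extends UDF {
    public Text evaluate(Text column) {
        if (column != null && column.getLength() > 0) {
            return new Text("Hi " + column.toString());
        }
        return null;
    }
}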
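If you want to check the logic before packaging the jar, the same rule can be tried in plain Java without the Hive and Hadoop jars. The sketch below is only a stand-in: it uses String in place of Text, and the class name HiPrependerLogic is made up for this example, so it is not the class you register in Hive.

```java
// Hypothetical stand-in for the UDF logic, using String instead of Hadoop's Text.
class HiPrependerLogic {
    // Same rule as the UDF's evaluate method: null or empty input gives null.
    static String evaluate(String column) {
        if (column != null && column.length() > 0) {
            return "Hi " + column;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(evaluate("Balu")); // Hi Balu
        System.out.println(evaluate(""));     // null
        System.out.println(evaluate(null));   // null
    }
}
```

Note that an empty name yields null rather than "Hi ", which in a Hive query would show up as NULL for that row.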
To compile this, you need the following 3 jar files on the classpath:
hadoop-core*.jar
hive-exec*.jar
apache-commons*.jar
2. Create a jar file for the above program
File ----> Export ----> JAR file ----> specify file path for jar ----> Next ----> do not select main class ----> Finish
Assume you created a jar file named hiprepender.jar.
3. Transfer the jar file to the Unix box using FileZilla/WinSCP, if you are not already on it
If you developed the UDF on another operating system such as Windows, you have to transfer the jar to the machine from which you run Hive queries.
Assume you have transferred your jar file to the /root directory.
4. From the Hive prompt, add the jar file to your classpath
hive> add jar /root/hiprepender.jar;
5. Create a temporary function
create temporary function prependhi as 'HiPrepender';
Here HiPrepender is the class name we wrote in the first step.
6. Use the custom function
select prependhi(ename) from emp;
You will get output like below.
Hi Balu
Hi Bala
Hi Sai
Hi Nirupam
In coming articles, we will learn about UDAF and UDTF.