Word Count In Hive


In this post I am going to discuss how to write a word count program in Hive.

Assume we have data in our table like the following:

This is a Hadoop Post
and Hadoop is a big data technology

and we want to generate a word count like the following:

a 2
and 1
big 1
data 1
Hadoop 2
is 2
Post 1
technology 1
This 1

Now we will learn how to write the program for this.
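To follow along, the sample table can be set up roughly as below. This is only a sketch: the post does not show how the table was created, so the DDL and the INSERT statement are assumptions. The names texttable and sentence are the ones used by the queries in the steps that follow.

```sql
-- Sketch only: this DDL and INSERT are assumptions, not taken from the post.
-- texttable and sentence match the names used in the queries below.
CREATE TABLE IF NOT EXISTS texttable (sentence STRING);

-- INSERT ... VALUES needs Hive 0.14 or later.
INSERT INTO TABLE texttable VALUES
  ('This is a Hadoop Post'),
  ('and Hadoop is a big data technology');

-- Check that both sentences were loaded.
SELECT sentence FROM texttable;
```

On older Hive versions the same two lines could instead be placed in a text file and loaded with LOAD DATA.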


1. Convert sentences into words

The data we have is in sentences, so first we have to convert it into words, using a space as the delimiter. For this we use Hive's split function, which returns an array of strings.

split (sentence ,' ')
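As a quick check of what split returns, it can be tried on a literal string and on the table column. This is a small sketch; the '\\s+' variant is an addition that is not in the post, and it relies on the second argument of split being a regular expression.

```sql
-- Split a literal sentence on single spaces; returns an array with the five words.
SELECT split('This is a Hadoop Post', ' ');

-- Same thing on the table column: one array per input row.
SELECT split(sentence, ' ') FROM texttable;

-- Assumption beyond the post: the delimiter is a regular expression, so
-- '\\s+' treats any run of whitespace (repeated spaces, tabs) as one separator.
SELECT split(sentence, '\\s+') FROM texttable;
```

With a plain ' ' delimiter, two consecutive spaces in a sentence would produce an empty string in the array, which is why the regular expression form can be handy on real text.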


2. Convert column into rows

Now we have an array of strings like this:
[This,is,a,Hadoop,Post]
but we have to convert it into multiple rows like the following:

This
is
a
Hadoop
Post

In other words, we have to convert every line of data into multiple rows, one per word. For this Hive has a function called explode, which is a table-generating function (UDTF): it takes an array and emits one row per element.

SELECT explode(split(sentence, ' ')) AS word FROM texttable

We then use the above output as an intermediate table by wrapping the query in parentheses and giving it an alias:

(SELECT explode(split(sentence, ' ')) AS word FROM texttable)tempTable

After the second step you should get output like the following (shown sorted here for readability; Hive does not guarantee the row order):

a
a
and
big
data
Hadoop
Hadoop
is
is
Post
technology
This
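The same rows can also be produced with LATERAL VIEW, which is a different form from the one used in this post. It is sketched here because a table-generating function used directly in the SELECT list cannot be combined with other expressions, while LATERAL VIEW allows the original columns to be selected next to each word. The alias t is a hypothetical name chosen for illustration.

```sql
-- Alternative sketch (not the post's form): explode through LATERAL VIEW.
-- t is an illustrative alias for the generated table, word for its column.
SELECT t.word
FROM texttable
LATERAL VIEW explode(split(sentence, ' ')) t AS word;

-- With LATERAL VIEW the source column can be kept next to each word.
SELECT sentence, t.word
FROM texttable
LATERAL VIEW explode(split(sentence, ' ')) t AS word;
```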


3. Apply group by


After the second step it is straightforward: we apply group by on the word column and count the occurrences of each word.

SELECT word, count(1) AS count
FROM (SELECT explode(split(sentence, ' ')) AS word FROM texttable) tempTable
GROUP BY word;
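The query above counts words exactly as they appear, so "Hadoop" and "hadoop" would be counted separately and punctuation would stay attached to words. A possible variant that normalizes the text first is sketched below. The cleaning rules (lower-casing, keeping only a-z and 0-9) and the alias cnt are assumptions for illustration, not part of the original program.

```sql
-- Sketch of a normalized word count; the cleaning rules are assumptions.
SELECT word, count(1) AS cnt
FROM (
  SELECT explode(
           split(
             -- Lower-case the text and turn every run of characters other
             -- than a-z and 0-9 into a single space, then trim the ends.
             trim(regexp_replace(lower(sentence), '[^a-z0-9]+', ' ')),
             ' '
           )
         ) AS word
  FROM texttable
) tempTable
WHERE word <> ''               -- drop the empty word produced by blank sentences
GROUP BY word
ORDER BY cnt DESC, word;       -- explicit order; GROUP BY alone guarantees none
```

On the two sample sentences this variant reports every word in lower case, with a, hadoop and is at the top with a count of 2 each.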
