To practise Hadoop, you can use the sources below to obtain or generate big data (several GB), so that you get a real feel for the power of Hadoop.
1. clearbits.net
From clearbits.net you can get the quarterly full data set of Stack Exchange and use it while practising Hadoop. It contains around 10 GB of data.
2. grouplens.org
grouplens.org has collected several rating data sets that you can use for practising Hadoop.
If you have Hadoop installed on your machine, you can use the following two ways to generate data.
3. hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar randomwriter /random-data
This generates 10 GB of data per node under the /random-data folder in HDFS.
4. hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar randomtextwriter /random-text-data
This generates 10 GB of textual data per node under the /random-text-data folder in HDFS.
The path of hadoop-examples.jar may differ depending on your Hadoop installation.
5. Amazon provides many public data sets that you can use.
6. Check the answers to the same question on Stack Overflow.
7. The University of Waikato makes many data sets available for practising machine learning.
8. See the answers to a similar question on Quora.
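If 10 GB per node is too much for a small test cluster, the randomwriter job from item 3 can probably be scaled down. The sketch below is only an illustration and rests on a few assumptions: that the examples jar is the Hadoop 0.20/1.x one, where RandomWriter reads the properties test.randomwriter.maps_per_host and test.randomwrite.bytes_per_map (newer releases call them mapreduce.randomwriter.mapsperhost and mapreduce.randomwriter.bytespermap), and that the 10 GB figure comes from the defaults of 10 maps per host and 1 GB per map. The function name randomwriter_cmd, the EXAMPLES_JAR variable and the /random-data-small folder are made up for this example. The helper only prints the command, so you can review it before running it.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) a randomwriter command
# that writes less than the default amount of data per node.

# Assumption: path taken from the post; adjust it to your installation.
EXAMPLES_JAR="/usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar"

# Usage: randomwriter_cmd <maps_per_host> <mb_per_map> <hdfs_output_dir>
randomwriter_cmd() {
  maps_per_host="$1"
  mb_per_map="$2"
  out_dir="$3"

  # Convert megabytes to bytes, the unit the property expects.
  bytes_per_map=$((mb_per_map * 1024 * 1024))

  # Assumption: property names as in the Hadoop 0.20/1.x examples jar.
  printf '%s\n' "hadoop jar $EXAMPLES_JAR randomwriter -D test.randomwriter.maps_per_host=$maps_per_host -D test.randomwrite.bytes_per_map=$bytes_per_map $out_dir"
}

# Example: 2 maps per host, 512 MB each, so roughly 1 GB per node.
randomwriter_cmd 2 512 /random-data-small
```

Once the printed command looks right for your installation, run it, and then check how much data was written with hadoop fs -du /random-data-small.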
If you know of any other free data sets, please share them in the comments.
That's a very interesting point on data sets. Thanks a lot.
Hope this may be useful for you.
1. Airline Dataset Project
http://www.stat.purdue.edu/~sguha/rhipe/doc/html/airline.html
2. Gigabytes of airline data
https://github.com/0xdata/h2o/wiki/Hacking-Airline-DataSet-with-H2O
3. SFO - Airline Sample data
http://www.flysfo.com/web/page/about/news/pressres/airtrafficdata.html
4. Data Storage Online.
http://datahub.io/en/
5. Lots of Gov Data
http://data.gov.uk/
6.
http://blog.gopivotal.com/news-2/20-examples-of-getting-results-with-big-data
7. US Weather Data - 1990 to 2013 complete data.
http://ftp3.ncdc.noaa.gov/pub/data/noaa/
Thank you, Shrikanth, for sharing the links. That's really a great job.
Thanks for sharing valuable info, but most of the links are not working.
I recommend Kaggle datasets; you will find many data sets there.
Thanks & Regards
Venu