Mini Project Report
On
“Analyzing Web log data using Hadoop”
Submitted in partial fulfillment of the
Requirements for the award of the degree of
Bachelor of Technology
In
Computer Science & Engineering
By
G.VEERA REDDY (14R21A05J9)
Under the guidance of
Mr. K. SAI PRASAD,
Assistant Professor, Department of CSE.
Department of Computer Science & Engineering
2017
CERTIFICATE
This is to certify that the project entitled “Analyzing web log data using Hadoop” has been submitted by G Veera Reddy (14R21A05J9) in partial fulfillment of the requirements for the award of the degree of Bachelor of Technology in Computer Science and Engineering from Jawaharlal Nehru Technological University, Hyderabad. The results embodied in this project have not been submitted to any other University or Institution for the award of any degree.
Internal Guide
Head of the Department
External Examiner
DECLARATION
I hereby declare that the project entitled “Analyzing web log data using Hadoop” is work done by me during the period from June 2017 to December 2017 and is submitted in partial fulfillment of the requirements for the award of the degree of Bachelor of Technology in Computer Science and Engineering from Jawaharlal Nehru Technological University, Hyderabad. The results embodied in this project have not been submitted to any other University or Institution for the award of any degree or diploma.
G Veera Reddy
(14R21A05J9)
ACKNOWLEDGEMENT
Many people helped me, directly and indirectly, to complete my project successfully. I would like to take this opportunity to thank them all.
First of all, I would like to express my deep gratitude to my internal guide, Mr. K. SAI PRASAD, Assistant Professor, Department of CSE, for his support in the completion of my dissertation. I also wish to express my sincere thanks to N. CHANDRA SEKHAR REDDY, HOD, Dept. of CSE, and to the Principal, Dr. P. BHASKARA REDDY, for providing the facilities to complete the dissertation.
I would like to thank all the faculty, coordinators and my friends for their help and constructive criticism during the project period. Finally, I am deeply indebted to my parents for their moral support and encouragement in achieving my goals.
G Veera Reddy
(14R21A05J9)
ABSTRACT
In the contemporary world, data analysis has become a challenge that cuts across many disciplines, even though each discipline has its own specializations. Effective data analytics helps in analyzing the data of any business system, and big data technologies accelerate that analysis, paving the way for a successful business intelligence system. As an industry expands, so does the data it generates, and it becomes increasingly difficult to handle the huge amounts of data produced, whatever the business and across fields ranging from social media and finance to flight data, the environment and health.
Big Data can be used to assess risk in the insurance industry and to track reactions to products in real time. It is also used to monitor things as diverse as wave movements, flight data, traffic data, financial transactions, health and crime. The challenge of Big Data is how to use it to create something of value to the user: how it can be gathered, stored, processed and analyzed so that raw data is turned into information that supports decision making. In this project, Big Data is presented in the form of a case study: analyzing web log data using Hadoop.
Log files are often very large and can have a complex structure, although the process of generating them is quite simple and straightforward. This often leads to a common situation in which log files are continuously generated and occupy valuable space on storage devices, but nobody uses them or makes use of the information they contain. This approach can be applied to different kinds of log files, such as email logs, web logs, firewall logs, server logs and call data logs.
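To make the idea of extracting information from web logs concrete, the following is a minimal sketch of the map and reduce steps for one simple analysis: counting requests per HTTP status code. It assumes log lines in the Apache Common Log Format, which the report does not specify, and it is written in plain Python so that it runs standalone without a Hadoop cluster. The function names `map_line`, `reduce_counts` and `analyze` and the sample lines are hypothetical, chosen for illustration, and are not taken from the project's code.

```python
import re
from itertools import groupby
from operator import itemgetter

# Assumed input format (Apache Common Log Format):
# host ident authuser [timestamp] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def map_line(line):
    """Hypothetical map step: emit (status_code, 1) for each well-formed line."""
    match = CLF_PATTERN.match(line)
    if match:  # malformed lines are skipped instead of failing the whole job
        yield match.group("status"), 1


def reduce_counts(pairs):
    """Hypothetical reduce step: sum the counts for each key.

    Sorting the pairs here stands in for Hadoop's shuffle-and-sort phase,
    which delivers all values for one key to the same reducer.
    """
    result = {}
    for key, group in groupby(sorted(pairs), key=itemgetter(0)):
        result[key] = sum(count for _, count in group)
    return result


def analyze(lines):
    """Run the map step over every line, then reduce the emitted pairs."""
    pairs = [pair for line in lines for pair in map_line(line)]
    return reduce_counts(pairs)


if __name__ == "__main__":
    sample = [
        '127.0.0.1 - - [10/Oct/2017:13:55:36 +0530] "GET /index.html HTTP/1.1" 200 2326',
        '10.0.0.5 - - [10/Oct/2017:13:56:01 +0530] "GET /missing.png HTTP/1.1" 404 512',
        '127.0.0.1 - - [10/Oct/2017:13:57:12 +0530] "POST /login HTTP/1.1" 200 128',
        'this line is not a valid log entry',
    ]
    print(analyze(sample))  # {'200': 2, '404': 1}
```

On a real cluster, the same mapper and reducer logic could be packaged as Hadoop Streaming scripts or rewritten against the Hadoop Java MapReduce API; only the key emitted by the map step needs to change to count hits per host, per URL or per hour instead.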
CONTENTS
Certificate
Declaration
Acknowledgement
Abstract
1. Introduction
1.1 Big Data: The 3 V’s
1.2 Ecosystem
1.3 Applications of Big Data
1.4 Hue
2. Literature Survey
2.1 Existing System
2.2 Proposed System
3. Requirement Analysis
3.1 Hardware Requirements
3.2 Software Requirements
4. Implementation
4.1 Problem Definition
4.2 System Architecture
5. Methodology
5.1 HDFS
5.2 MapReduce
5.3 Hue
6. Screenshots
7. Limitations and Future Enhancements
8. Conclusion
Bibliography