How to Analyze Hadoop Logs with Pig
Published: 2025-12-02 · Author: 千家信息网 editor (last updated 2025-12-02)
This article explains how to analyze Hadoop logs with Pig. The walkthrough is simple and clear; follow along step by step to study the topic.
Goal
Compute the number of hits per IP, for example:
123.24.56.57 13
24.53.23.123 7
34.56.78.120 20
and so on.
File to be analyzed
220.181.108.151 - - [31/Jan/2012:00:02:32 +0800] "GET /home.php?mod=space&uid=158&do=album&view=me&from=space HTTP/1.1" 200 8784 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
208.115.113.82 - - [31/Jan/2012:00:07:54 +0800] "GET /robots.txt HTTP/1.1" 200 582 "-" "Mozilla/5.0 (compatible; Ezooms/1.0; ezooms.bot@gmail.com)"
220.181.94.221 - - [31/Jan/2012:00:09:24 +0800] "GET /home.php?mod=spacecp&ac=pm&op=showmsg&handlekey=showmsg_3&touid=3&pmid=0&daterange=2&pid=398&tid=66 HTTP/1.1" 200 10070 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
112.97.24.243 - - [31/Jan/2012:00:14:48 +0800] "GET /data/cache/style_2_common.css?AZH HTTP/1.1" 200 57752 "http://f.dataguru.cn/forum-58-1.html" "Mozilla/5.0 (iPhone; CPU iPhone OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Mobile/9A406"
112.97.24.243 - - [31/Jan/2012:00:14:48 +0800] "GET /data/cache/style_2_widthauto.css?AZH HTTP/1.1" 200 1024 "http://f.dataguru.cn/forum-58-1.html" "Mozilla/5.0 (iPhone; CPU iPhone OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Mobile/9A406"
112.97.24.243 - - [31/Jan/2012:00:14:48 +0800] "GET /data/cache/style_2_forum_forumdisplay.css?AZH HTTP/1.1" 200 11486 "http://f.dataguru.cn/forum-58-1.html" "Mozilla/5.0 (iPhone; CPU iPhone OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Mobile/9A406"
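Each record above is in Apache's combined log format, and the client IP is simply the first space-delimited field. That is what lets the Pig script load the file with PigStorage(' ') and keep only the first column. A minimal Python illustration (the sample line below is abbreviated from the first record above):

```python
# Extract the client IP from an Apache combined-log-format record.
# The IP is the first space-delimited field, so a plain split suffices.
line = ('220.181.108.151 - - [31/Jan/2012:00:02:32 +0800] '
        '"GET /robots.txt HTTP/1.1" 200 582 "-" "Mozilla/5.0"')

ip = line.split(' ', 1)[0]
print(ip)  # 220.181.108.151
```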
Environment setup
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi
# User specific environment and startup programs
export ANT_HOME=/home/wukong/usr/apache-ant-1.9.4
export HADOOP_HOME=/home/wukong/usr/hadoop-1.2.1
export PIG_HOME=/home/wukong/usr/pig-0.13.0
export PIG_CLASSPATH=$HADOOP_HOME/conf
PATH=$PATH:$HOME/bin:$ANT_HOME/bin:$HADOOP_HOME:$HADOOP_HOME/bin:$PIG_HOME/bin:$PIG_CLASSPATH
export PATH
Pig script
A = LOAD '/user/wukong/w08/access_log.txt' USING PigStorage(' ') AS (ip);
B = GROUP A BY ip;
C = FOREACH B GENERATE group AS ip, COUNT(A.ip) AS countip;
STORE C INTO '/user/wukong/w08/access_log.out.txt';
Execution
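The script loads each log line, keeps the first space-delimited field as ip, groups by it, and counts each group. A local Python sketch of the same GROUP BY / COUNT aggregation (the log lines here are hypothetical, abbreviated records):

```python
from collections import Counter

# Hypothetical abbreviated log lines; only the first field (the IP) matters.
lines = [
    '220.181.108.151 - - [31/Jan/2012:00:02:32 +0800] "GET / HTTP/1.1" 200 8784',
    '208.115.113.82 - - [31/Jan/2012:00:07:54 +0800] "GET /robots.txt HTTP/1.1" 200 582',
    '220.181.108.151 - - [31/Jan/2012:00:09:24 +0800] "GET /home.php HTTP/1.1" 200 10070',
]

# GROUP ... BY ip plus COUNT, as in the Pig script above.
counts = Counter(line.split(' ', 1)[0] for line in lines)
for ip, n in sorted(counts.items()):
    print(ip, n)
```

For the sample input this prints 208.115.113.82 once and 220.181.108.151 twice; on the full 7 MB access log, Pig distributes exactly this computation across map and reduce tasks.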
[wukong@bd11 ~]$ pig -x mapreduce
Warning: $HADOOP_HOME is deprecated.
14/08/28 01:10:51 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
14/08/28 01:10:51 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
14/08/28 01:10:51 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2014-08-28 01:10:51,242 [main] INFO org.apache.pig.Main - Apache Pig version 0.13.0 (r1606446) compiled Jun 29 2014, 02:29:34
2014-08-28 01:10:51,242 [main] INFO org.apache.pig.Main - Logging error messages to: /home/wukong/pig_1409159451241.log
2014-08-28 01:10:51,319 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/wukong/.pigbootup not found
2014-08-28 01:10:51,698 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://bd11:9000
2014-08-28 01:10:52,343 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: bd11:9001
grunt> ls
hdfs://bd11:9000/user/wukong/test
hdfs://bd11:9000/user/wukong/w05
hdfs://bd11:9000/user/wukong/w06
hdfs://bd11:9000/user/wukong/w07
grunt> mkdir w08
grunt> copyFromLocal ./access_log.txt ./w08/
grunt> ls
hdfs://bd11:9000/user/wukong/test
hdfs://bd11:9000/user/wukong/w05
hdfs://bd11:9000/user/wukong/w06
hdfs://bd11:9000/user/wukong/w07
hdfs://bd11:9000/user/wukong/w08
grunt> cd w08
grunt> ls
hdfs://bd11:9000/user/wukong/w08/access_log.txt 7118627
grunt> A = LOAD '/user/wukong/w08/access_log.txt' USING PigStorage(' ') AS (ip);
grunt> B = GROUP A BY ip;
grunt> C = FOREACH B GENERATE group AS ip, COUNT(A.ip) AS countip;
grunt> STORE C INTO '/user/wukong/w08/out';
Execution log
2014-08-28 01:13:56,741 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY
2014-08-28 01:13:56,875 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
2014-08-28 01:13:57,091 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2014-08-28 01:13:57,121 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer - Choosing to move algebraic foreach to combiner
2014-08-28 01:13:57,178 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2014-08-28 01:13:57,179 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2014-08-28 01:13:57,432 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2014-08-28 01:13:57,471 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2014-08-28 01:13:57,479 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
2014-08-28 01:13:57,480 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2014-08-28 01:13:57,492 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=7118627
2014-08-28 01:13:57,492 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2014-08-28 01:13:57,492 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2014-08-28 01:13:57,492 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job4751117514743080762.jar
2014-08-28 01:14:01,054 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job4751117514743080762.jar created
2014-08-28 01:14:01,077 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2014-08-28 01:14:01,095 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2014-08-28 01:14:01,095 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cache
2014-08-28 01:14:01,129 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2014-08-28 01:14:01,304 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2014-08-28 01:14:01,805 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2014-08-28 01:14:02,067 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2014-08-28 01:14:02,067 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2014-08-28 01:14:02,109 [JobControl] INFO org.apache.hadoop.util.NativeCodeLoader - Loaded the native-hadoop library
2014-08-28 01:14:02,109 [JobControl] WARN org.apache.hadoop.io.compress.snappy.LoadSnappy - Snappy native library not loaded
2014-08-28 01:14:02,114 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2014-08-28 01:14:04,382 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201408280106_0001
2014-08-28 01:14:04,382 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A,B,C
2014-08-28 01:14:04,382 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[1,4],C[3,4],B[2,4] C: C[3,4],B[2,4] R: C[3,4]
2014-08-28 01:14:04,382 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://bd11:50030/jobdetails.jsp?jobid=job_201408280106_0001
2014-08-28 01:14:18,476 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2014-08-28 01:14:18,476 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_201408280106_0001]
2014-08-28 01:14:30,058 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_201408280106_0001]
2014-08-28 01:14:39,202 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2014-08-28 01:14:39,210 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:

HadoopVersion PigVersion UserId StartedAt FinishedAt Features
1.2.1 0.13.0 wukong 2014-08-28 01:13:57 2014-08-28 01:14:39 GROUP_BY

Success!

Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTime AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReduceTime Alias Feature Outputs
job_201408280106_0001 1 1 6 6 6 6 11 11 11 11 A,B,C GROUP_BY,COMBINER /user/wukong/w08/access_log.out.txt,

Input(s):
Successfully read 28134 records (7118993 bytes) from: "/user/wukong/w08/access_log.txt"

Output(s):
Successfully stored 476 records (8051 bytes) in: "/user/wukong/w08/out"

Counters:
Total records written : 476
Total bytes written : 8051
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_201408280106_0001

2014-08-28 01:14:39,227 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!

Viewing the result
[wukong@bd11 ~]$ hadoop fs -cat ./w08/out/part-r-00000
Warning: $HADOOP_HOME is deprecated.
127.0.0.1 2
1.59.65.67 2
112.4.2.19 9
112.4.2.51 80
60.2.99.33 42
(remaining lines omitted)
221.194.180.166 4576
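Since STORE was used without a delimiter argument, Pig writes tab-separated ip/count pairs, and the single reducer produces one part-r-00000 file. The output above is keyed by IP, not ranked; if a ranking is wanted, in Pig one could add something like ORDER C BY countip DESC before the STORE. A local Python sketch of that sort (the sample rows are taken from the output above, and the tab delimiter is Pig's default):

```python
# Sort the stored (ip, count) pairs by click count, descending.
# Sample rows come from the part-r-00000 listing above; the tab
# delimiter is PigStorage's default for STORE.
rows = ["127.0.0.1\t2", "112.4.2.51\t80", "60.2.99.33\t42", "221.194.180.166\t4576"]

pairs = [(ip, int(n)) for ip, n in (row.split('\t') for row in rows)]
pairs.sort(key=lambda p: p[1], reverse=True)
print(pairs[0])  # ('221.194.180.166', 4576)
```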
Thanks for reading; that concludes "How to Analyze Hadoop Logs with Pig". After working through this article you should have a better grasp of the topic, though the specifics still need to be verified in your own practice.