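Apache Flink is an open-source distributed platform that unifies stream and batch processing. At its core is a stream processing engine that provides data distribution, communication, and automatic fault tolerance. Flink builds batch processing on top of the streaming engine and natively supports iterative computation, memory management, and program optimization.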
In this guide we will start from scratch and go from setting up a Flink project to running a streaming analysis program on a Flink cluster.
Wikipedia provides an IRC channel where all edits to the wiki are logged. We are going to read this channel in Flink and count the number of bytes that each user edits within a given window of time. This is easy enough to implement in a few minutes using Flink, but it will give you a good foundation from which to start building more complex analysis programs on your own.
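Before bringing Flink into the picture, it may help to see the computation itself in plain Java. The sketch below is not Flink code and does not use the Wikipedia connector. The `Edit` class, its fields, and the `sumBytesPerUser` method are hypothetical names invented for illustration. It assumes fixed-size, non-overlapping windows and sums the signed byte difference of each edit per user.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WindowedByteCount {

    /** Hypothetical stand-in for one wiki edit; not the connector's event type. */
    public static final class Edit {
        final String user;
        final long timestampMillis;
        final int byteDiff;

        public Edit(String user, long timestampMillis, int byteDiff) {
            this.user = user;
            this.timestampMillis = timestampMillis;
            this.byteDiff = byteDiff;
        }
    }

    /**
     * Assigns each edit to a fixed-size window by its timestamp and sums the
     * byte differences per user inside each window.
     * Returns: window start time (ms) -> (user -> summed bytes).
     */
    public static Map<Long, Map<String, Long>> sumBytesPerUser(List<Edit> edits, long windowMillis) {
        Map<Long, Map<String, Long>> result = new TreeMap<>();
        for (Edit e : edits) {
            // Round the timestamp down to the start of its window.
            long windowStart = e.timestampMillis - Math.floorMod(e.timestampMillis, windowMillis);
            result.computeIfAbsent(windowStart, k -> new TreeMap<>())
                  .merge(e.user, (long) e.byteDiff, Long::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Edit> edits = new ArrayList<>();
        edits.add(new Edit("alice", 1_000L, 120));
        edits.add(new Edit("bob", 2_500L, -40));
        edits.add(new Edit("alice", 4_000L, 30));
        edits.add(new Edit("alice", 6_000L, 7));

        // Five-second windows: the first three edits share a window, the last starts a new one.
        System.out.println(sumBytesPerUser(edits, 5_000L));
        // prints {0={alice=150, bob=-40}, 5000={alice=7}}
    }
}
```

This batch-style loop needs the whole list up front. A streaming engine such as Flink performs the same grouping and summing incrementally as events arrive, which is what the rest of the guide builds toward.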
We are going to use a Flink Maven Archetype for creating our project structure. Please see Java API Quickstart for more details about this. For our purposes, the command to run is this:
You can edit the groupId, artifactId and package if you like. With the above parameters, Maven will create a project structure that looks like this:
There is our pom.xml file that already has the Flink dependencies added in the root directory and several example Flink programs in src/main/java. We can delete the example programs, since we are going to start from scratch:
As a last step we need to add the Flink Wikipedia connector as a dependency so that we can use it in our program. Edit the dependencies section of the pom.xml so that it looks like this:
Notice the flink-connector-wikiedits_2.11 dependency that was added. (This example and the Wikipedia connector were inspired by the Hello Samza example of Apache Samza.)
It's coding time. Fire up your favorite IDE and import the Maven project or open a text editor and create the file src/main/java/wikiedits/WikipediaAnalysis.java:
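A minimal sketch of what that starting file might contain is shown here, assuming nothing more than an empty entry point. The `throws Exception` clause is an assumption made so that later calls which declare checked exceptions can be added without changing the signature.

```java
// In the Maven project this file would start with the line: package wikiedits;
// It is left out here so the sketch compiles on its own from any directory.
public class WikipediaAnalysis {

    public static void main(String[] args) throws Exception {
        // Intentionally empty: the analysis steps are added to this method later.
    }
}
```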
The program is very basic now, but we will fill it in as we go. Note that I'll not give import statements here since IDEs can add them automatically. At the end of this section I'll show the complete code with import statements if you simply want to skip ahead and enter that in your editor.
The first step in a Flink program is to create a StreamExecutionEnvironment (or ExecutionEnvironment if you are writing a batch job). This can be used to set execution parameters and create sources for reading from external systems. So let's go ahead and add this to the main method:
[Download]
Baidu Netdisk link: https://pan.baidu.com/s/1mKn5gwx3eSDLoagsbrNG5g
Extraction code: r5vt