Using Spark for model ensembling
Published: 2019-06-23



```scala
// Ensemble source code: https://github.com/XXXShao/EnsembleModelingInSpark
// It must be packaged as a jar and added to the classpath (e.g. with spark-shell --jars) before use.
import Ensemble.{Ensembler, EnsembleModel}
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.Normalizer
import org.apache.spark.ml.linalg.Vectors

//import org.apache.log4j.{Level, Logger}
//Logger.getLogger("org.apache.spark").setLevel(Level.WARN)

// Small in-memory alternative to the libsvm file:
// val training = spark.createDataFrame(Seq(
//   (1.0, Vectors.dense(0.0, 1.1, 0.1)),
//   (0.0, Vectors.dense(2.0, 1.0, -1.0)),
//   (0.0, Vectors.dense(2.0, 1.3, 1.0)),
//   (1.0, Vectors.dense(0.0, 1.2, -0.5))
// )).toDF("label", "features")

// `spark` is the SparkSession provided by spark-shell.
val training = spark.read.format("libsvm").load("/user/spark/H2O/data/sample_libsvm_data.txt")

// Create a LogisticRegression instance. This instance is an Estimator.
val lr = new LogisticRegression()
// We may set parameters using setter methods.
lr.setMaxIter(10).setRegParam(0.01)

val lr1 = new LogisticRegression()
val lr2 = new LogisticRegression().setMaxIter(10).setRegParam(0.05)

// The third component is a Pipeline: L1-normalize the features, then fit a logistic regression.
val normalizer = new Normalizer()
  .setInputCol("features")
  .setOutputCol("normFeatures")
  .setP(1.0)
val lr30 = new LogisticRegression().setFeaturesCol("normFeatures")
val lr3 = new Pipeline().setStages(Array(normalizer, lr30))

// Ensemble the models
val ensembling = new Ensembler().setComponents(Array(lr, lr1, lr2, lr3))
val model = ensembling.fit(training)

// Show each component's individual predictions
val transformers = model.components.map(t => t.transform(training))
transformers.foreach(x => x.show())

// Combined prediction
val prediction = model.transform(training)
prediction.show()

/**** Ensemble that also includes XGBoost
// Needs the xgboost4j-spark jar; XGBoostEstimator was provided by older releases
// (import ml.dmlc.xgboost4j.scala.spark.XGBoostEstimator).
val xgb = new XGBoostEstimator(Map("num_class" -> 2, "num_rounds" -> 5, "objective" -> "binary:logistic", "booster" -> "gbtree"))
  .setLabelCol("label")
  .setFeaturesCol("features")
// Ensemble the models
val ensembling = new Ensembler().setComponents(Array(xgb, lr, lr1, lr2, lr3))
val model = ensembling.fit(training)
// Show each component's individual predictions
val transformers = model.components.map(t => t.transform(training))
transformers.foreach(x => x.show())
// Combined prediction
val prediction = model.transform(training)
prediction.show()
****/
```
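The `Ensembler` class comes from the repository linked above, and the post does not say how it merges the component outputs. Purely as an illustration of what that merging step can look like, here is a minimal plain-Scala sketch of two common rules for a binary 0.0/1.0 label: hard (majority) voting and soft voting (averaging each model's class-1 probability). `VotingSketch` and its methods are hypothetical names, not part of the Ensembler API, and the tie rule and the 0.5 threshold are arbitrary assumptions.

```scala
// Hypothetical sketch of two ensemble combination rules for binary labels.
// NOT taken from the Ensembler source; its actual rule may differ.
object VotingSketch {

  /** Hard voting: each model casts a 0.0/1.0 vote and the majority wins.
    * A tie (e.g. 2 vs 2 with four models) goes to 1.0, an arbitrary choice. */
  def majorityVote(predictions: Seq[Double]): Double = {
    require(predictions.nonEmpty, "need at least one prediction")
    val positives = predictions.count(_ == 1.0)
    if (positives * 2 >= predictions.size) 1.0 else 0.0
  }

  /** Soft voting: average each model's probability of class 1.0,
    * then compare the mean against a threshold (0.5 by default). */
  def averageProbability(probs: Seq[Double], threshold: Double = 0.5): Double = {
    require(probs.nonEmpty, "need at least one probability")
    val mean = probs.sum / probs.size
    if (mean >= threshold) 1.0 else 0.0
  }

  /** Apply a combiner row by row: rows(i)(m) is model m's output for example i. */
  def combineRows(rows: Seq[Seq[Double]], combiner: Seq[Double] => Double): Seq[Double] =
    rows.map(combiner)
}
```

With four components, as in the script above, a 2-2 split can occur, which is why the sketch needs an explicit tie rule. To apply either rule to Spark output, each component's `probability` column (a Vector; for binary LogisticRegression, index 1 holds the class-1 probability) would first have to be lined up row by row across the component DataFrames, which this sketch leaves out.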

Reposted from: https://my.oschina.net/kyo4321/blog/3013079
