How to Analyze the Essentials of MLlib in Spark
Published: 2025-12-01 Author: 千家信息网 editor
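Readers without prior experience often struggle to see how MLlib in Spark is organized. This article summarizes its two package families, the core concepts, the available algorithms and models, and ends with a worked example.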
org.apache.spark.ml (http://spark.apache.org/docs/latest/ml-guide.html)
  org.apache.spark.ml.attribute
  org.apache.spark.ml.classification
  org.apache.spark.ml.clustering
  org.apache.spark.ml.evaluation
  org.apache.spark.ml.feature
  org.apache.spark.ml.param
  org.apache.spark.ml.recommendation
  org.apache.spark.ml.regression
  org.apache.spark.ml.source.libsvm
  org.apache.spark.ml.tree
  org.apache.spark.ml.tuning
  org.apache.spark.ml.util
org.apache.spark.mllib (http://spark.apache.org/docs/latest/mllib-guide.html)
  org.apache.spark.mllib.classification
  org.apache.spark.mllib.clustering
  org.apache.spark.mllib.evaluation
  org.apache.spark.mllib.feature
  org.apache.spark.mllib.fpm
  org.apache.spark.mllib.linalg
  org.apache.spark.mllib.linalg.distributed
  org.apache.spark.mllib.pmml
  org.apache.spark.mllib.random
  org.apache.spark.mllib.rdd
  org.apache.spark.mllib.recommendation
  org.apache.spark.mllib.regression
  org.apache.spark.mllib.stat
  org.apache.spark.mllib.stat.distributed
  org.apache.spark.mllib.stat.test
  org.apache.spark.mllib.tree
  org.apache.spark.mllib.tree.configuration
  org.apache.spark.mllib.tree.impurity
  org.apache.spark.mllib.tree.loss
  org.apache.spark.mllib.tree.model
  org.apache.spark.mllib.util
ML concepts
DataFrame: Spark ML uses DataFrame from Spark SQL as an ML dataset, which can hold a variety of data types. E.g., a DataFrame could have different columns storing text, feature vectors, true labels, and predictions.
Transformer: A Transformer is an algorithm which can transform one DataFrame into another DataFrame. E.g., an ML model is a Transformer which transforms a DataFrame with features into a DataFrame with predictions.
Estimator: An Estimator is an algorithm which can be fit on a DataFrame to produce a Transformer. E.g., a learning algorithm is an Estimator which trains on a DataFrame and produces a model.
Pipeline: A Pipeline chains multiple Transformers and Estimators together to specify an ML workflow.
Parameter: All Transformers and Estimators now share a common API for specifying parameters.
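The concepts above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it assumes a `sqlContext` is already available (as in spark-shell), and the four labeled documents and all variable names are made up for the example.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}

// Assumption: `sqlContext` is an existing SQLContext (e.g. from spark-shell).
// The rows below are made-up illustration data: (id, text, label).
val docs = sqlContext.createDataFrame(Seq(
  (0L, "a b c d e spark", 1.0),
  (1L, "b d", 0.0),
  (2L, "spark f g h", 1.0),
  (3L, "hadoop mapreduce", 0.0)
)).toDF("id", "text", "label")

// Transformer: splits the "text" column into a "words" column.
val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")

// Transformer: hashes the words into a fixed-length "features" vector.
val hashingTF = new HashingTF()
  .setNumFeatures(1000)
  .setInputCol("words")
  .setOutputCol("features")

// Estimator: learns a model from the "features" and "label" columns.
val classifier = new LogisticRegression().setMaxIter(10).setRegParam(0.01)

// Pipeline: an Estimator whose stages run in the given order.
val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, classifier))

// fit() returns a PipelineModel, which is itself a Transformer.
val pipelineModel = pipeline.fit(docs)

// transform() appends the prediction columns to the input DataFrame.
pipelineModel.transform(docs).select("id", "text", "prediction").show()
```

Note that the Pipeline is fitted once and the resulting PipelineModel can then be applied to any DataFrame with a "text" column, so the same feature steps are used for training and prediction.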
ML classification and regression
Classification
  Logistic regression
  Decision tree classifier
  Random forest classifier
  Gradient-boosted tree classifier
  Multilayer perceptron classifier
  One-vs-Rest classifier (a.k.a. One-vs-All)
Regression
  Linear regression
  Decision tree regression
  Random forest regression
  Gradient-boosted tree regression
  Survival regression
Decision trees
Tree Ensembles
  Random Forests
  Gradient-Boosted Trees (GBTs)
ML clustering
K-means
Latent Dirichlet allocation (LDA)
MLlib data types
Local vector
Labeled point
Local matrix
Distributed matrix
  RowMatrix
  IndexedRowMatrix
  CoordinateMatrix
  BlockMatrix
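As a rough sketch of how these types are constructed, the snippet below builds one value of each local type and a RowMatrix. It assumes an existing SparkContext named `sc` (as in spark-shell) for the distributed matrix; the numbers and variable names are illustrative only.

```scala
import org.apache.spark.mllib.linalg.{Matrices, Matrix, Vector, Vectors}
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.mllib.regression.LabeledPoint

// Local vector, dense form: every entry is stored.
val dense: Vector = Vectors.dense(1.0, 0.0, 3.0)
// Local vector, sparse form: size 3, non-zero entries at indices 0 and 2.
val sparse: Vector = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))

// Labeled point: a label paired with a feature vector.
val positive = LabeledPoint(1.0, dense)

// Local matrix: 3 rows x 2 columns, values listed in column-major order.
val localMatrix: Matrix = Matrices.dense(3, 2, Array(1.0, 3.0, 5.0, 2.0, 4.0, 6.0))

// Distributed matrix: a RowMatrix backed by an RDD of local vectors.
// Assumption: `sc` is an existing SparkContext.
val rowMatrix = new RowMatrix(sc.parallelize(Seq(dense, sparse)))

println(dense.toArray.sameElements(sparse.toArray)) // both forms hold the same values
println(localMatrix(2, 1))                          // entry at row 2, column 1
println(s"${rowMatrix.numRows()} x ${rowMatrix.numCols()}")
```

The sparse form is usually worth using only when most entries are zero; both forms implement the same Vector interface, so downstream algorithms accept either.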
MLlib classification and regression
Binary Classification: linear SVMs, logistic regression, decision trees, random forests, gradient-boosted trees, naive Bayes
Multiclass Classification: logistic regression, decision trees, random forests, naive Bayes
Regression: linear least squares, Lasso, ridge regression, decision trees, random forests, gradient-boosted trees, isotonic regression
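For comparison with the DataFrame-based API, here is a small sketch of binary classification with the RDD-based package. It assumes an existing SparkContext named `sc`; the four training points and the names `labeled` and `lrModel` are made up for illustration.

```scala
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// Assumption: `sc` is an existing SparkContext (e.g. from spark-shell).
// Made-up training data: an RDD of (label, features) pairs.
val labeled = sc.parallelize(Seq(
  LabeledPoint(1.0, Vectors.dense(0.0, 1.1, 0.1)),
  LabeledPoint(0.0, Vectors.dense(2.0, 1.0, -1.0)),
  LabeledPoint(0.0, Vectors.dense(2.0, 1.3, 1.0)),
  LabeledPoint(1.0, Vectors.dense(0.0, 1.2, -0.5))
)).cache()

// Train a binary logistic regression model directly on the RDD.
val lrModel = new LogisticRegressionWithLBFGS()
  .setNumClasses(2)
  .run(labeled)

// Predict the class (0.0 or 1.0) of a single unseen feature vector.
val predicted = lrModel.predict(Vectors.dense(-1.0, 1.5, 1.3))
println(s"predicted label: $predicted")
```

Here the input is an RDD of LabeledPoint rather than a DataFrame, and the trained model is queried with plain vectors instead of through transform().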
MLlib clustering
K-means
Gaussian mixture
Power iteration clustering (PIC, mostly used for image recognition)
Latent Dirichlet allocation (LDA, mostly used for topic modeling)
Bisecting k-means
Streaming k-means
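To make the first entry concrete, the following is a minimal K-means sketch using the RDD-based API. It assumes an existing SparkContext named `sc`; the six two-dimensional points and the choice of k = 2 are illustrative assumptions.

```scala
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Assumption: `sc` is an existing SparkContext (e.g. from spark-shell).
// Made-up data: two well-separated groups of 2-D points.
val points = sc.parallelize(Seq(
  Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.1), Vectors.dense(0.2, 0.0),
  Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 9.2), Vectors.dense(8.9, 9.0)
)).cache()

// Train with k = 2 clusters and at most 20 iterations.
val kmeansModel = KMeans.train(points, 2, 20)

// Inspect the learned cluster centers.
kmeansModel.clusterCenters.foreach(println)

// Within-set sum of squared errors: lower means tighter clusters.
val wssse = kmeansModel.computeCost(points)
println(s"WSSSE = $wssse")

// Assign a new point to its nearest cluster (returns the cluster index).
println(kmeansModel.predict(Vectors.dense(0.05, 0.05)))
```

The cluster indices themselves are arbitrary, so only whether two points share an index is meaningful.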
MLlib Models
DecisionTreeModel
DistributedLDAModel
GaussianMixtureModel
GradientBoostedTreesModel
IsotonicRegressionModel
KMeansModel
LassoModel
LDAModel
LinearRegressionModel
LocalLDAModel
LogisticRegressionModel
MatrixFactorizationModel
NaiveBayesModel
PowerIterationClusteringModel
RandomForestModel
RidgeRegressionModel
StreamingKMeansModel
SVMModel
Word2VecModel
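Example

    // Written against the Spark 1.x API (sqlContext, mllib.linalg vectors).
    // On Spark 2.0 and later, spark.ml expects org.apache.spark.ml.linalg vectors
    // and a SparkSession is normally used in place of sqlContext.
    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.ml.param.ParamMap
    import org.apache.spark.mllib.linalg.{Vector, Vectors}
    import org.apache.spark.sql.Row

    // Prepare training data from a list of (label, features) tuples.
    val training = sqlContext.createDataFrame(Seq(
      (1.0, Vectors.dense(0.0, 1.1, 0.1)),
      (0.0, Vectors.dense(2.0, 1.0, -1.0)),
      (0.0, Vectors.dense(2.0, 1.3, 1.0)),
      (1.0, Vectors.dense(0.0, 1.2, -0.5))
    )).toDF("label", "features")

    // Create a LogisticRegression instance. This instance is an Estimator.
    val lr = new LogisticRegression()
    // Print the parameters, their documentation, and any default values.
    println("LogisticRegression parameters:\n" + lr.explainParams() + "\n")

    // Set parameters with setter methods.
    lr.setMaxIter(10).setRegParam(0.01)

    // Learn a model using the parameters stored in lr.
    val model1 = lr.fit(training)
    println("Model 1 was fit using parameters: " + model1.parent.extractParamMap)

    // Alternatively, specify parameters with a ParamMap.
    val paramMap = ParamMap(lr.maxIter -> 20)
      .put(lr.maxIter, 30)                           // overwrites the earlier maxIter
      .put(lr.regParam -> 0.1, lr.threshold -> 0.55) // several Params at once

    // ParamMaps can be combined.
    val paramMap2 = ParamMap(lr.probabilityCol -> "myProbability") // rename the output column
    val paramMapCombined = paramMap ++ paramMap2

    // Learn a second model; paramMapCombined overrides the values set on lr.
    val model2 = lr.fit(training, paramMapCombined)
    println("Model 2 was fit using parameters: " + model2.parent.extractParamMap)

    // Prepare test data.
    val test = sqlContext.createDataFrame(Seq(
      (1.0, Vectors.dense(-1.0, 1.5, 1.3)),
      (0.0, Vectors.dense(3.0, 2.0, -0.1)),
      (1.0, Vectors.dense(0.0, 2.2, -1.5))
    )).toDF("label", "features")

    // Make predictions on the test data with Transformer.transform().
    model2.transform(test)
      .select("features", "label", "myProbability", "prediction")
      .collect()
      .foreach { case Row(features: Vector, label: Double, prob: Vector, prediction: Double) =>
        println(s"($features, $label) -> prob=$prob, prediction=$prediction")
      }

That concludes this overview of how to analyze MLlib in Spark. Thanks for reading.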