MatrixFactorizationModel

class pyspark.mllib.recommendation.MatrixFactorizationModel(java_model: py4j.java_gateway.JavaObject)
- A matrix factorisation model trained by regularized alternating least-squares.
- New in version 0.9.0.

Examples

>>> from pyspark.mllib.recommendation import ALS, MatrixFactorizationModel, Rating
>>> r1 = (1, 1, 1.0)
>>> r2 = (1, 2, 2.0)
>>> r3 = (2, 1, 2.0)
>>> ratings = sc.parallelize([r1, r2, r3])
>>> model = ALS.trainImplicit(ratings, 1, seed=10)
>>> model.predict(2, 2)
0.4...

>>> testset = sc.parallelize([(1, 2), (1, 1)])
>>> model = ALS.train(ratings, 2, seed=0)
>>> model.predictAll(testset).collect()
[Rating(user=1, product=1, rating=1.0...), Rating(user=1, product=2, rating=1.9...)]

>>> model = ALS.train(ratings, 4, seed=10)
>>> model.userFeatures().collect()
[(1, array('d', [...])), (2, array('d', [...]))]

>>> model.recommendUsers(1, 2)
[Rating(user=2, product=1, rating=1.9...), Rating(user=1, product=1, rating=1.0...)]
>>> model.recommendProducts(1, 2)
[Rating(user=1, product=2, rating=1.9...), Rating(user=1, product=1, rating=1.0...)]
>>> model.rank
4

>>> first_user = model.userFeatures().take(1)[0]
>>> latents = first_user[1]
>>> len(latents)
4

>>> model.productFeatures().collect()
[(1, array('d', [...])), (2, array('d', [...]))]

>>> first_product = model.productFeatures().take(1)[0]
>>> latents = first_product[1]
>>> len(latents)
4

>>> products_for_users = model.recommendProductsForUsers(1).collect()
>>> len(products_for_users)
2
>>> products_for_users[0]
(1, (Rating(user=1, product=2, rating=...),))

>>> users_for_products = model.recommendUsersForProducts(1).collect()
>>> len(users_for_products)
2
>>> users_for_products[0]
(1, (Rating(user=2, product=1, rating=...),))

>>> model = ALS.train(ratings, 1, nonnegative=True, seed=123456789)
>>> model.predict(2, 2)
3.73...

>>> df = sqlContext.createDataFrame([Rating(1, 1, 1.0), Rating(1, 2, 2.0), Rating(2, 1, 2.0)])
>>> model = ALS.train(df, 1, nonnegative=True, seed=123456789)
>>> model.predict(2, 2)
3.73...

>>> model = ALS.trainImplicit(ratings, 1, nonnegative=True, seed=123456789)
>>> model.predict(2, 2)
0.4...

>>> import os, tempfile
>>> path = tempfile.mkdtemp()
>>> model.save(sc, path)
>>> sameModel = MatrixFactorizationModel.load(sc, path)
>>> sameModel.predict(2, 2)
0.4...
>>> sameModel.predictAll(testset).collect()
[Rating(...
>>> from shutil import rmtree
>>> try:
...     rmtree(path)
... except OSError:
...     pass

Methods

- call(name, *a): Call method of java_model.
- load(sc, path): Load a model from the given path.
- predict(user, product): Predicts the rating for the given user and product.
- predictAll(user_product): Returns an RDD of predicted ratings for the input user and product pairs.
- productFeatures(): Returns a paired RDD, where the first element is the product and the second is an array of features corresponding to that product.
- recommendProducts(user, num): Recommends the top “num” products for a given user and returns a list of Rating objects sorted by predicted rating in descending order.
- recommendProductsForUsers(num): Recommends the top “num” products for all users.
- recommendUsers(product, num): Recommends the top “num” users for a given product and returns a list of Rating objects sorted by predicted rating in descending order.
- recommendUsersForProducts(num): Recommends the top “num” users for all products.
- save(sc, path): Save this model to the given path.
- userFeatures(): Returns a paired RDD, where the first element is the user and the second is an array of features corresponding to that user.

Attributes

- rank: Rank for the features in this model.

Methods Documentation
call(name: str, *a: Any) → Any
- Call the method “name” of java_model, passing the arguments “a”.
 - 
classmethod load(sc: pyspark.context.SparkContext, path: str) → pyspark.mllib.recommendation.MatrixFactorizationModel
- Load a model from the given path.
- New in version 1.3.1.
 - 
predict(user: int, product: int) → float
- Predicts the rating for the given user and product.
- New in version 0.9.0.
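To make the prediction concrete: in a matrix factorization model, a predicted rating is conventionally the inner product of the user's and the product's latent feature vectors. The following is a minimal pure-Python sketch of that idea, not Spark's implementation. The factor values and the helper `predict_rating` are invented for illustration.

```python
from array import array

# Hypothetical latent factors, shaped like the (id, array('d', ...)) pairs
# that userFeatures() and productFeatures() return. Values are invented.
user_features = {1: array("d", [0.5, 1.0]), 2: array("d", [1.0, 2.0])}
product_features = {1: array("d", [1.0, 0.5]), 2: array("d", [2.0, 1.0])}


def predict_rating(user, product):
    """Inner product of the user's and the product's feature vectors."""
    u = user_features[user]
    p = product_features[product]
    return sum(a * b for a, b in zip(u, p))


print(predict_rating(2, 2))  # 1.0*2.0 + 2.0*1.0 = 4.0
```

With a trained model the same lookup is done for you by `model.predict(2, 2)`; the sketch only shows where the number comes from.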
 - 
predictAll(user_product: pyspark.rdd.RDD[Tuple[int, int]]) → pyspark.rdd.RDD[pyspark.mllib.recommendation.Rating]
- Returns an RDD of predicted ratings (Rating objects) for the input user and product pairs.
- New in version 0.9.0.
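As a rough illustration of the batch case, the sketch below scores a list of (user, product) pairs and wraps each result in a Rating-like tuple. It is plain Python under stated assumptions: the `Rating` namedtuple is a stand-in with the same field names as pyspark's, the factors are invented, and `predict_all` is a hypothetical helper. In Spark the result is an RDD, and, as the class example shows, its order need not match the input order.

```python
from collections import namedtuple

# Stand-in for pyspark.mllib.recommendation.Rating (same field names).
Rating = namedtuple("Rating", ["user", "product", "rating"])

# Hypothetical rank-2 latent factors; values are invented.
user_features = {1: [0.5, 1.0], 2: [1.0, 2.0]}
product_features = {1: [1.0, 0.5], 2: [2.0, 1.0]}


def predict_all(user_product_pairs):
    """Score each (user, product) pair and wrap the result in a Rating."""
    ratings = []
    for user, product in user_product_pairs:
        u = user_features[user]
        p = product_features[product]
        score = sum(a * b for a, b in zip(u, p))
        ratings.append(Rating(user, product, score))
    return ratings


predictions = predict_all([(1, 2), (1, 1)])
print(predictions[0])  # Rating(user=1, product=2, rating=2.0)
```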
 - 
productFeatures() → pyspark.rdd.RDD[Tuple[int, array.array]]
- Returns a paired RDD, where the first element is the product and the second is an array of features corresponding to that product.
- New in version 1.2.0.
 - 
recommendProducts(user: int, num: int) → List[pyspark.mllib.recommendation.Rating]
- Recommends the top “num” products for a given user and returns a list of Rating objects sorted by predicted rating in descending order.
- New in version 1.4.0.
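The "top num, sorted descending" behaviour can be sketched in plain Python as a top-k selection over per-product scores. This is an illustration only: the scores are invented, `Rating` is a namedtuple stand-in with the same field names as pyspark's, and `recommend_products` is a hypothetical helper rather than the library method.

```python
import heapq
from collections import namedtuple

# Stand-in for pyspark.mllib.recommendation.Rating (same field names).
Rating = namedtuple("Rating", ["user", "product", "rating"])

# Hypothetical predicted scores for user 1, keyed by product id.
scores_for_user_1 = {10: 3.5, 11: 4.8, 12: 1.2, 13: 4.1}


def recommend_products(user, scores, num):
    """Return the `num` highest-scoring products, best first."""
    top = heapq.nlargest(num, scores.items(), key=lambda kv: kv[1])
    return [Rating(user, product, score) for product, score in top]


top2 = recommend_products(1, scores_for_user_1, 2)
print([r.product for r in top2])  # [11, 13]
```

`heapq.nlargest` returns its results already ordered from largest to smallest, which matches the documented descending order.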
 - 
recommendProductsForUsers(num: int) → pyspark.rdd.RDD[Tuple[int, Tuple[pyspark.mllib.recommendation.Rating, …]]]
- Recommends the top “num” products for all users. The number of recommendations returned per user may be less than “num”.
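One situation in which a user receives fewer than “num” recommendations is when the model simply knows fewer than “num” products. The plain-Python sketch below assumes that case; the scores and the helper `recommend_products_for_users` are hypothetical, and the output mimics the (user, tuple of Rating) shape shown in the class example.

```python
from collections import namedtuple

# Stand-in for pyspark.mllib.recommendation.Rating (same field names).
Rating = namedtuple("Rating", ["user", "product", "rating"])

# Hypothetical predicted scores: user -> {product: score}. Only two
# products exist, so at most two recommendations per user are possible.
scores = {
    1: {1: 1.0, 2: 1.9},
    2: {1: 1.9, 2: 3.7},
}


def recommend_products_for_users(scores, num):
    """Return (user, tuple_of_Ratings) pairs with up to `num` entries each."""
    result = []
    for user, by_product in sorted(scores.items()):
        ranked = sorted(by_product.items(), key=lambda kv: kv[1], reverse=True)
        ratings = tuple(Rating(user, p, s) for p, s in ranked[:num])
        result.append((user, ratings))
    return result


recs = recommend_products_for_users(scores, 5)
print(len(recs[0][1]))  # 2, although num was 5
```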
 - 
recommendUsers(product: int, num: int) → List[pyspark.mllib.recommendation.Rating]
- Recommends the top “num” users for a given product and returns a list of Rating objects sorted by predicted rating in descending order.
- New in version 1.4.0.
 - 
recommendUsersForProducts(num: int) → pyspark.rdd.RDD[Tuple[int, Tuple[pyspark.mllib.recommendation.Rating, …]]]
- Recommends the top “num” users for all products. The number of recommendations returned per product may be less than “num”.
 - 
save(sc: pyspark.context.SparkContext, path: str) → None
- Save this model to the given path.
- New in version 1.3.0.
 - 
userFeatures() → pyspark.rdd.RDD[Tuple[int, array.array]]
- Returns a paired RDD, where the first element is the user and the second is an array of features corresponding to that user.
- New in version 1.2.0.
Attributes Documentation

rank
- Rank for the features in this model.
- New in version 1.4.0.
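In the class example, a model trained with rank 4 reports `model.rank` as 4 and each feature array has length 4. The sketch below illustrates that relationship in plain Python, using invented data shaped like the collected output of `userFeatures()`; the variable names are hypothetical.

```python
from array import array

# Hypothetical output of model.userFeatures().collect() for a rank-4 model.
user_features = [
    (1, array("d", [0.1, 0.2, 0.3, 0.4])),
    (2, array("d", [0.5, 0.6, 0.7, 0.8])),
]

# Every feature vector has one entry per latent dimension.
lengths = {len(vector) for _, vector in user_features}
assert len(lengths) == 1, "all feature vectors should share one length"
inferred_rank = lengths.pop()
print(inferred_rank)  # 4
```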
 