| subset {SparkR} | R Documentation | 
Return subsets of SparkDataFrame according to given conditions
subset(x, ...)

## S4 method for signature 'SparkDataFrame,numericOrcharacter'
x[[i]]

## S4 replacement method for signature 'SparkDataFrame,numericOrcharacter'
x[[i]] <- value

## S4 method for signature 'SparkDataFrame'
x[i, j, ..., drop = F]

## S4 method for signature 'SparkDataFrame'
subset(x, subset, select, drop = F, ...)
| x | a SparkDataFrame. | 
| ... | currently not used. | 
| i, subset | (Optional) a logical expression to filter on rows. For extract operator [[ and replacement operator [[<-, the indexing parameter for a single Column. | 
| value | a Column, an atomic vector of length 1 used as a literal value, or NULL. If NULL, the specified Column is dropped. | 
| j, select | expression for the single Column or a list of columns to select from the SparkDataFrame. | 
| drop | if TRUE, a Column will be returned if the resulting dataset has only one column. Otherwise, a SparkDataFrame will always be returned. | 
A new SparkDataFrame containing only the rows that meet the condition with selected columns.
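The effect of the drop argument on the return type can be illustrated with a minimal sketch. It assumes a local Spark installation that sparkR.session() can start; the people data and the variable names are hypothetical and not part of the original page.

```r
library(SparkR)
sparkR.session()  # assumption: a local Spark installation is available

# Hypothetical sample data, used only for illustration
people <- createDataFrame(data.frame(
  name = c("Smith", "Jones", "Lee"),
  age = c(19, 30, 45),
  stringsAsFactors = FALSE
))

# Default (drop = F): a single-column result is still a SparkDataFrame
ages_df <- subset(people, people$age > 20, select = "age")
is(ages_df, "SparkDataFrame")

# drop = TRUE: a single-column result is returned as a Column instead
ages_col <- subset(people, people$age > 20, select = "age", drop = TRUE)
is(ages_col, "Column")

sparkR.session.stop()
```

A Column is an unevaluated expression rather than a dataset, so drop = TRUE is mainly useful when the result feeds into another expression, such as a filter or an assignment with [[<-.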
[[ since 1.4.0
[[<- since 2.1.1
[ since 1.4.0
subset since 1.5.0
Other SparkDataFrame functions: 
SparkDataFrame-class,
agg(),
alias(),
arrange(),
as.data.frame(),
attach,SparkDataFrame-method,
broadcast(),
cache(),
checkpoint(),
coalesce(),
collect(),
colnames(),
coltypes(),
createOrReplaceTempView(),
crossJoin(),
cube(),
dapplyCollect(),
dapply(),
describe(),
dim(),
distinct(),
dropDuplicates(),
dropna(),
drop(),
dtypes(),
exceptAll(),
except(),
explain(),
filter(),
first(),
gapplyCollect(),
gapply(),
getNumPartitions(),
group_by(),
head(),
hint(),
histogram(),
insertInto(),
intersectAll(),
intersect(),
isLocal(),
isStreaming(),
join(),
limit(),
localCheckpoint(),
merge(),
mutate(),
ncol(),
nrow(),
persist(),
printSchema(),
randomSplit(),
rbind(),
rename(),
repartitionByRange(),
repartition(),
rollup(),
sample(),
saveAsTable(),
schema(),
selectExpr(),
select(),
showDF(),
show(),
storageLevel(),
str(),
summary(),
take(),
toJSON(),
unionAll(),
unionByName(),
union(),
unpersist(),
withColumn(),
withWatermark(),
with(),
write.df(),
write.jdbc(),
write.json(),
write.orc(),
write.parquet(),
write.stream(),
write.text()
Other subsetting functions: 
filter(),
select()
## Not run: 
##D   # Columns can be selected using [[ and [
##D   df[[2]] == df[["age"]]
##D   df[,2] == df[,"age"]
##D   df[,c("name", "age")]
##D   # Or to filter rows
##D   df[df$age > 20,]
##D   # SparkDataFrame can be subset on both rows and Columns
##D   df[df$name == "Smith", c(1,2)]
##D   df[df$age %in% c(19, 30), 1:2]
##D   subset(df, df$age %in% c(19, 30), 1:2)
##D   subset(df, df$age %in% c(19), select = c(1,2))
##D   subset(df, select = c(1,2))
##D   # Columns can be selected and set
##D   df[["age"]] <- 23
##D   df[[1]] <- df$age
##D   df[[2]] <- NULL # drop column
## End(Not run)
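The examples above assume an existing SparkDataFrame named df. A self-contained sketch that builds one and exercises the same operators might look like the following; the sample data and the variable names adults, n_adults and remaining are hypothetical, and a local Spark installation is assumed.

```r
library(SparkR)
sparkR.session()  # assumption: a local Spark installation is available

# Hypothetical data standing in for the df used in the examples above
df <- createDataFrame(data.frame(
  name = c("Smith", "Jones", "Lee"),
  age = c(19, 30, 45),
  stringsAsFactors = FALSE
))

# Filter rows and select columns in a single call to [
adults <- df[df$age > 20, c("name", "age")]
n_adults <- nrow(adults)

# Replace a column with a literal value, then drop another column with NULL
df[["age"]] <- 23
df[["name"]] <- NULL
remaining <- columns(df)

sparkR.session.stop()
```

The row count and column names are captured before the session is stopped because both require a live connection to Spark.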