http://git-wip-us.apache.org/repos/asf/spark-website/blob/52917ac4/site/docs/2.4.0/api/R/column_collection_functions.html ---------------------------------------------------------------------- diff --git a/site/docs/2.4.0/api/R/column_collection_functions.html b/site/docs/2.4.0/api/R/column_collection_functions.html new file mode 100644 index 0000000..01e336e --- /dev/null +++ b/site/docs/2.4.0/api/R/column_collection_functions.html @@ -0,0 +1,503 @@ +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Collection functions for Column operations</title> +<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> +<link rel="stylesheet" type="text/css" href="R.css" /> + +<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css"> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script> +<script>hljs.initHighlightingOnLoad();</script> +</head><body> + +<table width="100%" summary="page for column_collection_functions {SparkR}"><tr><td>column_collection_functions {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table> + +<h2>Collection functions for Column operations</h2> + +<h3>Description</h3> + +<p>Collection functions defined for <code>Column</code>. +</p> + + +<h3>Usage</h3> + +<pre> +array_contains(x, value) + +array_distinct(x) + +array_except(x, y) + +array_intersect(x, y) + +array_join(x, delimiter, ...) + +array_max(x) + +array_min(x) + +array_position(x, value) + +array_remove(x, value) + +array_repeat(x, count) + +array_sort(x) + +arrays_overlap(x, y) + +array_union(x, y) + +arrays_zip(x, ...) + +concat(x, ...) + +element_at(x, extraction) + +explode(x) + +explode_outer(x) + +flatten(x) + +from_json(x, schema, ...) 
+ +map_from_arrays(x, y) + +map_keys(x) + +map_values(x) + +posexplode(x) + +posexplode_outer(x) + +reverse(x) + +shuffle(x) + +size(x) + +slice(x, start, length) + +sort_array(x, asc = TRUE) + +to_json(x, ...) + +## S4 method for signature 'Column' +reverse(x) + +## S4 method for signature 'Column' +to_json(x, ...) + +## S4 method for signature 'Column' +concat(x, ...) + +## S4 method for signature 'Column,characterOrstructType' +from_json(x, schema, + as.json.array = FALSE, ...) + +## S4 method for signature 'Column' +array_contains(x, value) + +## S4 method for signature 'Column' +array_distinct(x) + +## S4 method for signature 'Column,Column' +array_except(x, y) + +## S4 method for signature 'Column,Column' +array_intersect(x, y) + +## S4 method for signature 'Column,character' +array_join(x, delimiter, + nullReplacement = NULL) + +## S4 method for signature 'Column' +array_max(x) + +## S4 method for signature 'Column' +array_min(x) + +## S4 method for signature 'Column' +array_position(x, value) + +## S4 method for signature 'Column' +array_remove(x, value) + +## S4 method for signature 'Column,numericOrColumn' +array_repeat(x, count) + +## S4 method for signature 'Column' +array_sort(x) + +## S4 method for signature 'Column,Column' +arrays_overlap(x, y) + +## S4 method for signature 'Column,Column' +array_union(x, y) + +## S4 method for signature 'Column' +arrays_zip(x, ...) 
+ +## S4 method for signature 'Column' +shuffle(x) + +## S4 method for signature 'Column' +flatten(x) + +## S4 method for signature 'Column,Column' +map_from_arrays(x, y) + +## S4 method for signature 'Column' +map_keys(x) + +## S4 method for signature 'Column' +map_values(x) + +## S4 method for signature 'Column' +element_at(x, extraction) + +## S4 method for signature 'Column' +explode(x) + +## S4 method for signature 'Column' +size(x) + +## S4 method for signature 'Column' +slice(x, start, length) + +## S4 method for signature 'Column' +sort_array(x, asc = TRUE) + +## S4 method for signature 'Column' +posexplode(x) + +## S4 method for signature 'Column' +explode_outer(x) + +## S4 method for signature 'Column' +posexplode_outer(x) +</pre> + + +<h3>Arguments</h3> + +<table summary="R argblock"> +<tr valign="top"><td><code>x</code></td> +<td> +<p>Column to compute on. Note the difference in the following methods: +</p> + +<ul> +<li> <p><code>to_json</code>: it is the column containing the struct, array of the structs, +the map or array of maps. +</p> +</li> +<li> <p><code>from_json</code>: it is the column containing the JSON string. +</p> +</li></ul> +</td></tr> +<tr valign="top"><td><code>value</code></td> +<td> +<p>A value to compute on. +</p> + +<ul> +<li> <p><code>array_contains</code>: a value to be checked if contained in the column. +</p> +</li> +<li> <p><code>array_position</code>: a value to locate in the given array. +</p> +</li> +<li> <p><code>array_remove</code>: a value to remove in the given array. +</p> +</li></ul> +</td></tr> +<tr valign="top"><td><code>y</code></td> +<td> +<p>Column to compute on.</p> +</td></tr> +<tr valign="top"><td><code>delimiter</code></td> +<td> +<p>a character string that is used to concatenate the elements of column.</p> +</td></tr> +<tr valign="top"><td><code>...</code></td> +<td> +<p>additional argument(s). 
In <code>to_json</code> and <code>from_json</code>, this contains +additional named properties that control the conversion; it accepts the same +options as the JSON data source. In <code>arrays_zip</code>, this contains additional +Columns of arrays to be merged.</p> +</td></tr> +<tr valign="top"><td><code>count</code></td> +<td> +<p>a Column or constant determining the number of repetitions.</p> +</td></tr> +<tr valign="top"><td><code>extraction</code></td> +<td> +<p>index to look up in the array, or key to look up in the map.</p> +</td></tr> +<tr valign="top"><td><code>schema</code></td> +<td> +<p>a structType object to use as the schema when parsing the JSON string. +Since Spark 2.3, a DDL-formatted string is also supported for the schema.</p> +</td></tr> +<tr valign="top"><td><code>start</code></td> +<td> +<p>an index indicating the first element occurring in the result.</p> +</td></tr> +<tr valign="top"><td><code>length</code></td> +<td> +<p>the number of consecutive elements to include in the result.</p> +</td></tr> +<tr valign="top"><td><code>asc</code></td> +<td> +<p>a logical flag indicating the sorting order. +If TRUE, sorting is in ascending order. +If FALSE, sorting is in descending order.</p> +</td></tr> +<tr valign="top"><td><code>as.json.array</code></td> +<td> +<p>a logical flag indicating whether the input string is a JSON array of objects or a single object.</p> +</td></tr> +<tr valign="top"><td><code>nullReplacement</code></td> +<td> +<p>an optional character string that is used to replace the Null values.</p> +</td></tr> +</table> + + +<h3>Details</h3> + +<p><code>reverse</code>: Returns a reversed string or an array with reverse order of elements. +</p> +<p><code>to_json</code>: Converts a column containing a <code>structType</code>, a <code>mapType</code> +or an <code>arrayType</code> into a Column of JSON string. +Resolving the Column can fail if an unsupported type is encountered.
+</p> +<p><code>concat</code>: Concatenates multiple input columns together into a single column. +The function works with strings, binary and compatible array columns. +</p> +<p><code>from_json</code>: Parses a column containing a JSON string into a Column of <code>structType</code> +with the specified <code>schema</code> or array of <code>structType</code> if <code>as.json.array</code> is set +to <code>TRUE</code>. If the string is unparseable, the Column will contain the value NA. +</p> +<p><code>array_contains</code>: Returns null if the array is null, true if the array contains +the value, and false otherwise. +</p> +<p><code>array_distinct</code>: Removes duplicate values from the array. +</p> +<p><code>array_except</code>: Returns an array of the elements in the first array but not in the second +array, without duplicates. The order of elements in the result is not determined. +</p> +<p><code>array_intersect</code>: Returns an array of the elements in the intersection of the given two +arrays, without duplicates. +</p> +<p><code>array_join</code>: Concatenates the elements of column using the delimiter. +Null values are replaced with <code>nullReplacement</code> if set; otherwise they are ignored. +</p> +<p><code>array_max</code>: Returns the maximum value of the array. +</p> +<p><code>array_min</code>: Returns the minimum value of the array. +</p> +<p><code>array_position</code>: Locates the position of the first occurrence of the given value +in the given array. Returns NA if either argument is NA. +Note: The position is 1-based, not zero-based. Returns 0 if the given +value could not be found in the array. +</p> +<p><code>array_remove</code>: Removes all elements equal to <code>value</code> from the given array. +</p> +<p><code>array_repeat</code>: Creates an array containing <code>x</code> repeated the number of times +given by <code>count</code>. +</p> +<p><code>array_sort</code>: Sorts the input array in ascending order.
The elements of the input array +must be orderable. NA elements will be placed at the end of the returned array. +</p> +<p><code>arrays_overlap</code>: Returns true if the input arrays have at least one non-null element in +common. If they have no common element, both arrays are non-empty, and either contains a null, it returns null. +It returns false otherwise. +</p> +<p><code>array_union</code>: Returns an array of the elements in the union of the given two arrays, +without duplicates. +</p> +<p><code>arrays_zip</code>: Returns a merged array of structs in which the N-th struct contains all N-th +values of input arrays. +</p> +<p><code>shuffle</code>: Returns a random permutation of the given array. +</p> +<p><code>flatten</code>: Creates a single array from an array of arrays. +If a structure of nested arrays is deeper than two levels, only one level of nesting is removed. +</p> +<p><code>map_from_arrays</code>: Creates a new map column. The array in the first column is used for +keys. The array in the second column is used for values. No element of the key array +may be null. +</p> +<p><code>map_keys</code>: Returns an unordered array containing the keys of the map. +</p> +<p><code>map_values</code>: Returns an unordered array containing the values of the map. +</p> +<p><code>element_at</code>: Returns the element of the array at the index given in <code>extraction</code> if +<code>x</code> is an array. Returns the value for the key given in <code>extraction</code> if <code>x</code> is a map. +Note: The position is 1-based, not zero-based. +</p> +<p><code>explode</code>: Creates a new row for each element in the given array or map column. +</p> +<p><code>size</code>: Returns length of array or map. +</p> +<p><code>slice</code>: Returns an array containing all the elements in x from the index start +(or starting from the end if start is negative) with the specified length.
+</p> +<p><code>sort_array</code>: Sorts the input array in ascending or descending order according to +the natural ordering of the array elements. NA elements will be placed at the beginning of +the returned array in ascending order or at the end of the returned array in descending order. +</p> +<p><code>posexplode</code>: Creates a new row for each element with position in the given array +or map column. +</p> +<p><code>explode_outer</code>: Creates a new row for each element in the given array or map column. +Unlike <code>explode</code>, if the array/map is <code>null</code> or empty +then <code>null</code> is produced. +</p> +<p><code>posexplode_outer</code>: Creates a new row for each element with position in the given +array or map column. Unlike <code>posexplode</code>, if the array/map is <code>null</code> or empty +then the row (<code>null</code>, <code>null</code>) is produced. +</p> + + +<h3>Note</h3> + +<p>reverse since 1.5.0 +</p> +<p>to_json since 2.2.0 +</p> +<p>concat since 1.5.0 +</p> +<p>from_json since 2.2.0 +</p> +<p>array_contains since 1.6.0 +</p> +<p>array_distinct since 2.4.0 +</p> +<p>array_except since 2.4.0 +</p> +<p>array_intersect since 2.4.0 +</p> +<p>array_join since 2.4.0 +</p> +<p>array_max since 2.4.0 +</p> +<p>array_min since 2.4.0 +</p> +<p>array_position since 2.4.0 +</p> +<p>array_remove since 2.4.0 +</p> +<p>array_repeat since 2.4.0 +</p> +<p>array_sort since 2.4.0 +</p> +<p>arrays_overlap since 2.4.0 +</p> +<p>array_union since 2.4.0 +</p> +<p>arrays_zip since 2.4.0 +</p> +<p>shuffle since 2.4.0 +</p> +<p>flatten since 2.4.0 +</p> +<p>map_from_arrays since 2.4.0 +</p> +<p>map_keys since 2.3.0 +</p> +<p>map_values since 2.3.0 +</p> +<p>element_at since 2.4.0 +</p> +<p>explode since 1.5.0 +</p> +<p>size since 1.5.0 +</p> +<p>slice since 2.4.0 +</p> +<p>sort_array since 1.6.0 +</p> +<p>posexplode since 2.1.0 +</p> +<p>explode_outer since 2.3.0 +</p> +<p>posexplode_outer since 2.3.0 +</p> + + +<h3>Examples</h3> + +<pre><code
class="r">## Not run: +##D # Dataframe used throughout this doc +##D df <- createDataFrame(cbind(model = rownames(mtcars), mtcars)) +##D tmp <- mutate(df, v1 = create_array(df$mpg, df$cyl, df$hp)) +##D head(select(tmp, array_contains(tmp$v1, 21), size(tmp$v1), shuffle(tmp$v1))) +##D head(select(tmp, array_max(tmp$v1), array_min(tmp$v1), array_distinct(tmp$v1))) +##D head(select(tmp, array_position(tmp$v1, 21), array_repeat(df$mpg, 3), array_sort(tmp$v1))) +##D head(select(tmp, flatten(tmp$v1), reverse(tmp$v1), array_remove(tmp$v1, 21))) +##D tmp2 <- mutate(tmp, v2 = explode(tmp$v1)) +##D head(tmp2) +##D head(select(tmp, posexplode(tmp$v1))) +##D head(select(tmp, slice(tmp$v1, 2L, 2L))) +##D head(select(tmp, sort_array(tmp$v1))) +##D head(select(tmp, sort_array(tmp$v1, asc = FALSE))) +##D tmp3 <- mutate(df, v3 = create_map(df$model, df$cyl)) +##D head(select(tmp3, map_keys(tmp3$v3), map_values(tmp3$v3))) +##D head(select(tmp3, element_at(tmp3$v3, "Valiant"))) +##D tmp4 <- mutate(df, v4 = create_array(df$mpg, df$cyl), v5 = create_array(df$cyl, df$hp)) +##D head(select(tmp4, concat(tmp4$v4, tmp4$v5), arrays_overlap(tmp4$v4, tmp4$v5))) +##D head(select(tmp4, array_except(tmp4$v4, tmp4$v5), array_intersect(tmp4$v4, tmp4$v5))) +##D head(select(tmp4, array_union(tmp4$v4, tmp4$v5))) +##D head(select(tmp4, arrays_zip(tmp4$v4, tmp4$v5), map_from_arrays(tmp4$v4, tmp4$v5))) +##D head(select(tmp, concat(df$mpg, df$cyl, df$hp))) +##D tmp5 <- mutate(df, v6 = create_array(df$model, df$model)) +##D head(select(tmp5, array_join(tmp5$v6, "#"), array_join(tmp5$v6, "#", "NULL"))) +## End(Not run) + +## Not run: +##D # Converts a struct into a JSON object +##D df2 <- sql("SELECT named_struct('date', cast('2000-01-01' as date)) as d") +##D select(df2, to_json(df2$d, dateFormat = 'dd/MM/yyyy')) +##D +##D # Converts an array of structs into a JSON array +##D df2 <- sql("SELECT array(named_struct('name', 'Bob'), named_struct('name', 'Alice')) as people") +##D df2 <- mutate(df2, people_json 
= to_json(df2$people)) +##D +##D # Converts a map into a JSON object +##D df2 <- sql("SELECT map('name', 'Bob') as people") +##D df2 <- mutate(df2, people_json = to_json(df2$people)) +##D +##D # Converts an array of maps into a JSON array +##D df2 <- sql("SELECT array(map('name', 'Bob'), map('name', 'Alice')) as people") +##D df2 <- mutate(df2, people_json = to_json(df2$people)) +## End(Not run) + +## Not run: +##D df2 <- sql("SELECT named_struct('date', cast('2000-01-01' as date)) as d") +##D df2 <- mutate(df2, d2 = to_json(df2$d, dateFormat = 'dd/MM/yyyy')) +##D schema <- structType(structField("date", "string")) +##D head(select(df2, from_json(df2$d2, schema, dateFormat = 'dd/MM/yyyy'))) +##D df2 <- sql("SELECT named_struct('name', 'Bob') as people") +##D df2 <- mutate(df2, people_json = to_json(df2$people)) +##D schema <- structType(structField("name", "string")) +##D head(select(df2, from_json(df2$people_json, schema))) +##D head(select(df2, from_json(df2$people_json, "name STRING"))) +## End(Not run) + +## Not run: +##D df2 <- createDataFrame(data.frame( +##D id = c(1, 2, 3), text = c("a,b,c", NA, "d,e") +##D )) +##D +##D head(select(df2, df2$id, explode_outer(split_string(df2$text, ",")))) +##D head(select(df2, df2$id, posexplode_outer(split_string(df2$text, ",")))) +## End(Not run) +</code></pre> + + +<hr /><div style="text-align: center;">[Package <em>SparkR</em> version 2.4.0 <a href="00Index.html">Index</a>]</div> +</body></html>
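Reviewer note on the page above: the 1-based indexing documented for array_position and slice is easy to misread, so here is a minimal sketch in plain R that models only the documented behaviour on ordinary R vectors. It is not SparkR code and does not call Spark; the helper names `model_array_position` and `model_slice` are hypothetical, and returning an empty vector for an out-of-range start is an assumption of this sketch rather than something the page states.

```r
# Plain-R model of the documented 1-based semantics; NOT SparkR code.
# `model_array_position` and `model_slice` are hypothetical helper names.

# First 1-based position of `value` in `arr`, or 0 if it is absent.
model_array_position <- function(arr, value) {
  hits <- which(arr == value)
  if (length(hits) == 0) 0L else hits[[1]]
}

# `len` consecutive elements of `arr` starting at 1-based index `start`;
# a negative `start` counts back from the end of the array.
model_slice <- function(arr, start, len) {
  stopifnot(start != 0, len >= 0)
  n <- length(arr)
  first <- if (start > 0) start else n + start + 1
  # Assumption of this sketch: an out-of-range start yields an empty result.
  if (first < 1 || first > n || len == 0) return(arr[0])
  arr[first:min(n, first + len - 1)]
}

v1 <- c(21, 6, 110)            # mpg, cyl, hp of the first mtcars row
model_array_position(v1, 6)    # second element, so position 2
model_array_position(v1, 99)   # not found, so 0
model_slice(v1, 2, 2)          # elements 2 and 3
model_slice(c(1, 2, 3, 4), -2, 2)  # last two elements
```

The vector `v1` mirrors what `create_array(df$mpg, df$cyl, df$hp)` holds for the first mtcars row in the page's own example, so `model_slice(v1, 2, 2)` corresponds to the `slice(tmp$v1, 2L, 2L)` call shown there.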
http://git-wip-us.apache.org/repos/asf/spark-website/blob/52917ac4/site/docs/2.4.0/api/R/column_datetime_diff_functions.html ---------------------------------------------------------------------- diff --git a/site/docs/2.4.0/api/R/column_datetime_diff_functions.html b/site/docs/2.4.0/api/R/column_datetime_diff_functions.html new file mode 100644 index 0000000..6dd888b --- /dev/null +++ b/site/docs/2.4.0/api/R/column_datetime_diff_functions.html @@ -0,0 +1,217 @@ +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Date time arithmetic functions for Column operations</title> +<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> +<link rel="stylesheet" type="text/css" href="R.css" /> + +<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css"> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script> +<script>hljs.initHighlightingOnLoad();</script> +</head><body> + +<table width="100%" summary="page for column_datetime_diff_functions {SparkR}"><tr><td>column_datetime_diff_functions {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table> + +<h2>Date time arithmetic functions for Column operations</h2> + +<h3>Description</h3> + +<p>Date time arithmetic functions defined for <code>Column</code>. 
+</p> + + +<h3>Usage</h3> + +<pre> +add_months(y, x) + +datediff(y, x) + +date_add(y, x) + +date_format(y, x) + +date_sub(y, x) + +from_utc_timestamp(y, x) + +months_between(y, x) + +next_day(y, x) + +to_utc_timestamp(y, x) + +## S4 method for signature 'Column' +datediff(y, x) + +## S4 method for signature 'Column' +months_between(y, x) + +## S4 method for signature 'Column,character' +date_format(y, x) + +## S4 method for signature 'Column,character' +from_utc_timestamp(y, x) + +## S4 method for signature 'Column,character' +next_day(y, x) + +## S4 method for signature 'Column,character' +to_utc_timestamp(y, x) + +## S4 method for signature 'Column,numeric' +add_months(y, x) + +## S4 method for signature 'Column,numeric' +date_add(y, x) + +## S4 method for signature 'Column,numeric' +date_sub(y, x) +</pre> + + +<h3>Arguments</h3> + +<table summary="R argblock"> +<tr valign="top"><td><code>y</code></td> +<td> +<p>Column to compute on.</p> +</td></tr> +<tr valign="top"><td><code>x</code></td> +<td> +<p>For class <code>Column</code>, it is the column used to perform arithmetic operations +with column <code>y</code>. For class <code>numeric</code>, it is the number of months or +days to be added to or subtracted from <code>y</code>. For class <code>character</code>, it is +</p> + +<ul> +<li> <p><code>date_format</code>: date format specification. +</p> +</li> +<li> <p><code>from_utc_timestamp</code>, <code>to_utc_timestamp</code>: time zone to use. +</p> +</li> +<li> <p><code>next_day</code>: day of the week string. +</p> +</li></ul> +</td></tr> +</table> + + +<h3>Details</h3> + +<p><code>datediff</code>: Returns the number of days from <code>y</code> to <code>x</code>. +If <code>y</code> is later than <code>x</code> then the result is positive. +</p> +<p><code>months_between</code>: Returns number of months between dates <code>y</code> and <code>x</code>. +If <code>y</code> is later than <code>x</code>, then the result is positive. 
If <code>y</code> and <code>x</code> +are on the same day of month, or both are the last day of month, time of day will be ignored. +Otherwise, the difference is calculated based on 31 days per month, and rounded to 8 digits. +</p> +<p><code>date_format</code>: Converts a date/timestamp/string to a string in the format +specified by the second argument. For instance, the pattern +<code>dd.MM.yyyy</code> could return a string like '18.03.1993'. All +pattern letters of <code>java.text.SimpleDateFormat</code> can be used. +Note: Whenever possible, use specialized functions like <code>year</code>. These benefit from a +specialized implementation. +</p> +<p><code>from_utc_timestamp</code>: This is a common function for databases supporting TIMESTAMP WITHOUT +TIMEZONE. This function takes a timestamp which is timezone-agnostic, and interprets it as a +timestamp in UTC, and renders that timestamp as a timestamp in the given time zone. +However, a timestamp in Spark represents the number of microseconds from the Unix epoch, which is not +timezone-agnostic. So in Spark this function just shifts the timestamp value from UTC timezone to +the given timezone. +This function may return a confusing result if the input is a string with a timezone, e.g. +(<code>2018-03-13T06:18:23+00:00</code>). The reason is that Spark first casts the string to a +timestamp according to the timezone in the string, and finally displays the result by converting +the timestamp to a string according to the session local timezone. +</p> +<p><code>next_day</code>: Given a date column, returns the first date which is later than the value of +the date column that is on the specified day of the week. For example, +<code>next_day("2015-07-27", "Sunday")</code> returns 2015-08-02 because that is the first Sunday +after 2015-07-27. The day of the week parameter is case insensitive, and accepts the first three or +two characters: "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun".
+</p> +<p><code>to_utc_timestamp</code>: This is a common function for databases supporting TIMESTAMP WITHOUT +TIMEZONE. This function takes a timestamp which is timezone-agnostic, and interprets it as a +timestamp in the given timezone, and renders that timestamp as a timestamp in UTC. +However, a timestamp in Spark represents the number of microseconds from the Unix epoch, which is not +timezone-agnostic. So in Spark this function just shifts the timestamp value from the given +timezone to UTC timezone. +This function may return a confusing result if the input is a string with a timezone, e.g. +(<code>2018-03-13T06:18:23+00:00</code>). The reason is that Spark first casts the string to a +timestamp according to the timezone in the string, and finally displays the result by converting +the timestamp to a string according to the session local timezone. +</p> +<p><code>add_months</code>: Returns the date that is <code>x</code> months after <code>y</code>. +</p> +<p><code>date_add</code>: Returns the date that is <code>x</code> days after <code>y</code>. +</p> +<p><code>date_sub</code>: Returns the date that is <code>x</code> days before <code>y</code>.
+</p> + + +<h3>Note</h3> + +<p>datediff since 1.5.0 +</p> +<p>months_between since 1.5.0 +</p> +<p>date_format since 1.5.0 +</p> +<p>from_utc_timestamp since 1.5.0 +</p> +<p>next_day since 1.5.0 +</p> +<p>to_utc_timestamp since 1.5.0 +</p> +<p>add_months since 1.5.0 +</p> +<p>date_add since 1.5.0 +</p> +<p>date_sub since 1.5.0 +</p> + + +<h3>See Also</h3> + +<p>Other date time functions: <code><a href="column_datetime_functions.html">column_datetime_functions</a></code> +</p> + + +<h3>Examples</h3> + +<pre><code class="r">## Not run: +##D dts <- c("2005-01-02 18:47:22", +##D "2005-12-24 16:30:58", +##D "2005-10-28 07:30:05", +##D "2005-12-28 07:01:05", +##D "2006-01-24 00:01:10") +##D y <- c(2.0, 2.2, 3.4, 2.5, 1.8) +##D df <- createDataFrame(data.frame(time = as.POSIXct(dts), y = y)) +## End(Not run) + +## Not run: +##D tmp <- createDataFrame(data.frame(time_string1 = as.POSIXct(dts), +##D time_string2 = as.POSIXct(dts[order(runif(length(dts)))]))) +##D tmp2 <- mutate(tmp, datediff = datediff(tmp$time_string1, tmp$time_string2), +##D monthdiff = months_between(tmp$time_string1, tmp$time_string2)) +##D head(tmp2) +## End(Not run) + +## Not run: +##D tmp <- mutate(df, from_utc = from_utc_timestamp(df$time, "PST"), +##D to_utc = to_utc_timestamp(df$time, "PST")) +##D head(tmp) +## End(Not run) + +## Not run: +##D tmp <- mutate(df, t1 = add_months(df$time, 1), +##D t2 = date_add(df$time, 2), +##D t3 = date_sub(df$time, 3), +##D t4 = next_day(df$time, "Sun")) +##D head(tmp) +## End(Not run) +</code></pre> + + +<hr /><div style="text-align: center;">[Package <em>SparkR</em> version 2.4.0 <a href="00Index.html">Index</a>]</div> +</body></html> http://git-wip-us.apache.org/repos/asf/spark-website/blob/52917ac4/site/docs/2.4.0/api/R/column_datetime_functions.html ---------------------------------------------------------------------- diff --git a/site/docs/2.4.0/api/R/column_datetime_functions.html b/site/docs/2.4.0/api/R/column_datetime_functions.html new file mode 100644
index 0000000..4a1a13c --- /dev/null +++ b/site/docs/2.4.0/api/R/column_datetime_functions.html @@ -0,0 +1,403 @@ +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Date time functions for Column operations</title> +<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> +<link rel="stylesheet" type="text/css" href="R.css" /> + +<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css"> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script> +<script>hljs.initHighlightingOnLoad();</script> +</head><body> + +<table width="100%" summary="page for column_datetime_functions {SparkR}"><tr><td>column_datetime_functions {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table> + +<h2>Date time functions for Column operations</h2> + +<h3>Description</h3> + +<p>Date time functions defined for <code>Column</code>. +</p> + + +<h3>Usage</h3> + +<pre> +current_date(x = "missing") + +current_timestamp(x = "missing") + +date_trunc(format, x) + +dayofmonth(x) + +dayofweek(x) + +dayofyear(x) + +from_unixtime(x, ...) + +hour(x) + +last_day(x) + +minute(x) + +month(x) + +quarter(x) + +second(x) + +to_date(x, format) + +to_timestamp(x, format) + +unix_timestamp(x, format) + +weekofyear(x) + +window(x, ...) 
+ +year(x) + +## S4 method for signature 'Column' +dayofmonth(x) + +## S4 method for signature 'Column' +dayofweek(x) + +## S4 method for signature 'Column' +dayofyear(x) + +## S4 method for signature 'Column' +hour(x) + +## S4 method for signature 'Column' +last_day(x) + +## S4 method for signature 'Column' +minute(x) + +## S4 method for signature 'Column' +month(x) + +## S4 method for signature 'Column' +quarter(x) + +## S4 method for signature 'Column' +second(x) + +## S4 method for signature 'Column,missing' +to_date(x, format) + +## S4 method for signature 'Column,character' +to_date(x, format) + +## S4 method for signature 'Column,missing' +to_timestamp(x, format) + +## S4 method for signature 'Column,character' +to_timestamp(x, format) + +## S4 method for signature 'Column' +weekofyear(x) + +## S4 method for signature 'Column' +year(x) + +## S4 method for signature 'Column' +from_unixtime(x, format = "yyyy-MM-dd HH:mm:ss") + +## S4 method for signature 'Column' +window(x, windowDuration, slideDuration = NULL, + startTime = NULL) + +## S4 method for signature 'missing,missing' +unix_timestamp(x, format) + +## S4 method for signature 'Column,missing' +unix_timestamp(x, format) + +## S4 method for signature 'Column,character' +unix_timestamp(x, + format = "yyyy-MM-dd HH:mm:ss") + +## S4 method for signature 'Column' +trunc(x, format) + +## S4 method for signature 'character,Column' +date_trunc(format, x) + +## S4 method for signature 'missing' +current_date() + +## S4 method for signature 'missing' +current_timestamp() +</pre> + + +<h3>Arguments</h3> + +<table summary="R argblock"> +<tr valign="top"><td><code>x</code></td> +<td> +<p>Column to compute on. In <code>window</code>, it must be a time Column of +<code>TimestampType</code>. This is not used with <code>current_date</code> and +<code>current_timestamp</code></p> +</td></tr> +<tr valign="top"><td><code>format</code></td> +<td> +<p>The format for the given dates or timestamps in Column <code>x</code>. 
See the +format used in the following methods: +</p> + +<ul> +<li> <p><code>to_date</code> and <code>to_timestamp</code>: it is the string to use to parse +Column <code>x</code> to DateType or TimestampType. +</p> +</li> +<li> <p><code>trunc</code>: it is the string to use to specify the truncation method. +For example, "year", "yyyy", "yy" to truncate by year, or "month", "mon", +"mm" to truncate by month. +</p> +</li> +<li> <p><code>date_trunc</code>: it is similar to <code>trunc</code>'s format but additionally +supports "day", "dd", "second", "minute", "hour", "week" and "quarter". +</p> +</li></ul> +</td></tr> +<tr valign="top"><td><code>...</code></td> +<td> +<p>additional argument(s).</p> +</td></tr> +<tr valign="top"><td><code>windowDuration</code></td> +<td> +<p>a string specifying the width of the window, e.g. '1 second', +'1 day 12 hours', '2 minutes'. Valid interval strings are 'week', +'day', 'hour', 'minute', 'second', 'millisecond', 'microsecond'. Note that +the duration is a fixed length of time, and does not vary over time +according to a calendar. For example, '1 day' always means 86,400,000 +milliseconds, not a calendar day.</p> +</td></tr> +<tr valign="top"><td><code>slideDuration</code></td> +<td> +<p>a string specifying the sliding interval of the window. Same format as +<code>windowDuration</code>. A new window will be generated every +<code>slideDuration</code>. Must be less than or equal to +the <code>windowDuration</code>. This duration is likewise absolute, and does not +vary according to a calendar.</p> +</td></tr> +<tr valign="top"><td><code>startTime</code></td> +<td> +<p>the offset with respect to 1970-01-01 00:00:00 UTC with which to start +window intervals. For example, in order to have hourly tumbling windows +that start 15 minutes past the hour, e.g. 12:15-13:15, 13:15-14:15...
provide +<code>startTime</code> as <code>"15 minutes"</code>.</p> +</td></tr> +</table> + + +<h3>Details</h3> + +<p><code>dayofmonth</code>: Extracts the day of the month as an integer from a +given date/timestamp/string. +</p> +<p><code>dayofweek</code>: Extracts the day of the week as an integer from a +given date/timestamp/string. +</p> +<p><code>dayofyear</code>: Extracts the day of the year as an integer from a +given date/timestamp/string. +</p> +<p><code>hour</code>: Extracts the hour as an integer from a given date/timestamp/string. +</p> +<p><code>last_day</code>: Given a date column, returns the last day of the month which the +given date belongs to. For example, input "2015-07-27" returns "2015-07-31" since +July 31 is the last day of the month in July 2015. +</p> +<p><code>minute</code>: Extracts the minute as an integer from a given date/timestamp/string. +</p> +<p><code>month</code>: Extracts the month as an integer from a given date/timestamp/string. +</p> +<p><code>quarter</code>: Extracts the quarter as an integer from a given date/timestamp/string. +</p> +<p><code>second</code>: Extracts the second as an integer from a given date/timestamp/string. +</p> +<p><code>to_date</code>: Converts the column into a DateType. You may optionally specify +a format according to the rules in: +<a href="http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html">http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html</a>. +If the string cannot be parsed according to the specified format (or default), +the value of the column will be null. +By default, it follows casting rules to a DateType if the format is omitted +(equivalent to <code>cast(df$x, "date")</code>). +</p> +<p><code>to_timestamp</code>: Converts the column into a TimestampType. 
You may optionally specify +a format according to the rules in: +<a href="http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html">http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html</a>. +If the string cannot be parsed according to the specified format (or default), +the value of the column will be null. +By default, it follows casting rules to a TimestampType if the format is omitted +(equivalent to <code>cast(df$x, "timestamp")</code>). +</p> +<p><code>weekofyear</code>: Extracts the week number as an integer from a given date/timestamp/string. +</p> +<p><code>year</code>: Extracts the year as an integer from a given date/timestamp/string. +</p> +<p><code>from_unixtime</code>: Converts the number of seconds since the Unix epoch (1970-01-01 00:00:00 UTC) +to a string representing the timestamp of that moment in the current system time zone in the JVM +in the given format. +See <a href="http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html"> +Customizing Formats</a> for available options. +</p> +<p><code>window</code>: Bucketizes rows into one or more time windows given a column specifying the timestamp. +Window starts are inclusive but window ends are exclusive, e.g. 12:05 will be in the window +[12:05,12:10) but not in [12:00,12:05). Windows can support microsecond precision. Windows on +the order of months are not supported. By default, it returns a struct output column called 'window' +with the nested columns 'start' and 'end'. +</p> +<p><code>unix_timestamp</code>: Gets the current Unix timestamp in seconds. +</p> +<p><code>trunc</code>: Returns the date truncated to the unit specified by the format. +</p> +<p><code>date_trunc</code>: Returns the timestamp truncated to the unit specified by the format. +</p> +<p><code>current_date</code>: Returns the current date as a date column. +</p> +<p><code>current_timestamp</code>: Returns the current timestamp as a timestamp column. 
+</p> + + +<h3>Note</h3> + +<p>dayofmonth since 1.5.0 +</p> +<p>dayofweek since 2.3.0 +</p> +<p>dayofyear since 1.5.0 +</p> +<p>hour since 1.5.0 +</p> +<p>last_day since 1.5.0 +</p> +<p>minute since 1.5.0 +</p> +<p>month since 1.5.0 +</p> +<p>quarter since 1.5.0 +</p> +<p>second since 1.5.0 +</p> +<p>to_date(Column) since 1.5.0 +</p> +<p>to_date(Column, character) since 2.2.0 +</p> +<p>to_timestamp(Column) since 2.2.0 +</p> +<p>to_timestamp(Column, character) since 2.2.0 +</p> +<p>weekofyear since 1.5.0 +</p> +<p>year since 1.5.0 +</p> +<p>from_unixtime since 1.5.0 +</p> +<p>window since 2.0.0 +</p> +<p>unix_timestamp since 1.5.0 +</p> +<p>unix_timestamp(Column) since 1.5.0 +</p> +<p>unix_timestamp(Column, character) since 1.5.0 +</p> +<p>trunc since 2.3.0 +</p> +<p>date_trunc since 2.3.0 +</p> +<p>current_date since 2.3.0 +</p> +<p>current_timestamp since 2.3.0 +</p> + + +<h3>See Also</h3> + +<p>Other date time functions: <code><a href="column_datetime_diff_functions.html">column_datetime_diff_functions</a></code> +</p> + + +<h3>Examples</h3> + +<pre><code class="r">## Not run: +##D dts <- c("2005-01-02 18:47:22", +##D "2005-12-24 16:30:58", +##D "2005-10-28 07:30:05", +##D "2005-12-28 07:01:05", +##D "2006-01-24 00:01:10") +##D y <- c(2.0, 2.2, 3.4, 2.5, 1.8) +##D df <- createDataFrame(data.frame(time = as.POSIXct(dts), y = y)) +## End(Not run) + +## Not run: +##D head(select(df, df$time, year(df$time), quarter(df$time), month(df$time), +##D dayofmonth(df$time), dayofweek(df$time), dayofyear(df$time), weekofyear(df$time))) +##D head(agg(groupBy(df, year(df$time)), count(df$y), avg(df$y))) +##D head(agg(groupBy(df, month(df$time)), avg(df$y))) +## End(Not run) + +## Not run: +##D head(select(df, hour(df$time), minute(df$time), second(df$time))) +##D head(agg(groupBy(df, dayofmonth(df$time)), avg(df$y))) +##D head(agg(groupBy(df, hour(df$time)), avg(df$y))) +##D head(agg(groupBy(df, minute(df$time)), avg(df$y))) +## End(Not run) + +## Not run: +##D head(select(df, 
df$time, last_day(df$time), month(df$time))) +## End(Not run) + +## Not run: +##D tmp <- createDataFrame(data.frame(time_string = dts)) +##D tmp2 <- mutate(tmp, date1 = to_date(tmp$time_string), +##D date2 = to_date(tmp$time_string, "yyyy-MM-dd"), +##D date3 = date_format(tmp$time_string, "MM/dd/yyyy"), +##D time1 = to_timestamp(tmp$time_string), +##D time2 = to_timestamp(tmp$time_string, "yyyy-MM-dd")) +##D head(tmp2) +## End(Not run) + +## Not run: +##D tmp <- mutate(df, to_unix = unix_timestamp(df$time), +##D to_unix2 = unix_timestamp(df$time, 'yyyy-MM-dd HH'), +##D from_unix = from_unixtime(unix_timestamp(df$time)), +##D from_unix2 = from_unixtime(unix_timestamp(df$time), 'yyyy-MM-dd HH:mm')) +##D head(tmp) +## End(Not run) + +## Not run: +##D # One-minute windows every 15 seconds, starting 10 seconds after the minute, e.g. 09:00:10-09:01:10, +##D # 09:00:25-09:01:25, 09:00:40-09:01:40, ... +##D window(df$time, "1 minute", "15 seconds", "10 seconds") +##D +##D # One-minute tumbling windows starting 15 seconds after the minute, e.g. 09:00:15-09:01:15, +##D # 09:01:15-09:02:15... +##D window(df$time, "1 minute", startTime = "15 seconds") +##D +##D # Thirty-second windows every 10 seconds, e.g. 09:00:00-09:00:30, 09:00:10-09:00:40, ... 
+##D window(df$time, "30 seconds", "10 seconds") +## End(Not run) + +## Not run: +##D head(select(df, df$time, trunc(df$time, "year"), trunc(df$time, "yy"), +##D trunc(df$time, "month"), trunc(df$time, "mon"))) +## End(Not run) + +## Not run: +##D head(select(df, df$time, date_trunc("hour", df$time), date_trunc("minute", df$time), +##D date_trunc("week", df$time), date_trunc("quarter", df$time))) +## End(Not run) +## Not run: +##D head(select(df, current_date(), current_timestamp())) +## End(Not run) +</code></pre> + + +<hr /><div style="text-align: center;">[Package <em>SparkR</em> version 2.4.0 <a href="00Index.html">Index</a>]</div> +</body></html> http://git-wip-us.apache.org/repos/asf/spark-website/blob/52917ac4/site/docs/2.4.0/api/R/column_math_functions.html ---------------------------------------------------------------------- diff --git a/site/docs/2.4.0/api/R/column_math_functions.html b/site/docs/2.4.0/api/R/column_math_functions.html new file mode 100644 index 0000000..4a25ce7 --- /dev/null +++ b/site/docs/2.4.0/api/R/column_math_functions.html @@ -0,0 +1,415 @@ +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Math functions for Column operations</title> +<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> +<link rel="stylesheet" type="text/css" href="R.css" /> + +<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css"> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script> +<script>hljs.initHighlightingOnLoad();</script> +</head><body> + +<table width="100%" summary="page for column_math_functions {SparkR}"><tr><td>column_math_functions {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table> + +<h2>Math 
functions for Column operations</h2> + +<h3>Description</h3> + +<p>Math functions defined for <code>Column</code>. +</p> + + +<h3>Usage</h3> + +<pre> +bin(x) + +bround(x, ...) + +cbrt(x) + +ceil(x) + +conv(x, fromBase, toBase) + +hex(x) + +hypot(y, x) + +pmod(y, x) + +rint(x) + +shiftLeft(y, x) + +shiftRight(y, x) + +shiftRightUnsigned(y, x) + +signum(x) + +toDegrees(x) + +toRadians(x) + +unhex(x) + +## S4 method for signature 'Column' +abs(x) + +## S4 method for signature 'Column' +acos(x) + +## S4 method for signature 'Column' +asin(x) + +## S4 method for signature 'Column' +atan(x) + +## S4 method for signature 'Column' +bin(x) + +## S4 method for signature 'Column' +cbrt(x) + +## S4 method for signature 'Column' +ceil(x) + +## S4 method for signature 'Column' +ceiling(x) + +## S4 method for signature 'Column' +cos(x) + +## S4 method for signature 'Column' +cosh(x) + +## S4 method for signature 'Column' +exp(x) + +## S4 method for signature 'Column' +expm1(x) + +## S4 method for signature 'Column' +factorial(x) + +## S4 method for signature 'Column' +floor(x) + +## S4 method for signature 'Column' +hex(x) + +## S4 method for signature 'Column' +log(x) + +## S4 method for signature 'Column' +log10(x) + +## S4 method for signature 'Column' +log1p(x) + +## S4 method for signature 'Column' +log2(x) + +## S4 method for signature 'Column' +rint(x) + +## S4 method for signature 'Column' +round(x) + +## S4 method for signature 'Column' +bround(x, scale = 0) + +## S4 method for signature 'Column' +signum(x) + +## S4 method for signature 'Column' +sign(x) + +## S4 method for signature 'Column' +sin(x) + +## S4 method for signature 'Column' +sinh(x) + +## S4 method for signature 'Column' +sqrt(x) + +## S4 method for signature 'Column' +tan(x) + +## S4 method for signature 'Column' +tanh(x) + +## S4 method for signature 'Column' +toDegrees(x) + +## S4 method for signature 'Column' +toRadians(x) + +## S4 method for signature 'Column' +unhex(x) + +## S4 method for signature 
'Column' +atan2(y, x) + +## S4 method for signature 'Column' +hypot(y, x) + +## S4 method for signature 'Column' +pmod(y, x) + +## S4 method for signature 'Column,numeric' +shiftLeft(y, x) + +## S4 method for signature 'Column,numeric' +shiftRight(y, x) + +## S4 method for signature 'Column,numeric' +shiftRightUnsigned(y, x) + +## S4 method for signature 'Column,numeric,numeric' +conv(x, fromBase, toBase) +</pre> + + +<h3>Arguments</h3> + +<table summary="R argblock"> +<tr valign="top"><td><code>x</code></td> +<td> +<p>Column to compute on. In <code>shiftLeft</code>, <code>shiftRight</code> and +<code>shiftRightUnsigned</code>, this is the number of bits to shift.</p> +</td></tr> +<tr valign="top"><td><code>...</code></td> +<td> +<p>additional argument(s).</p> +</td></tr> +<tr valign="top"><td><code>fromBase</code></td> +<td> +<p>base to convert from.</p> +</td></tr> +<tr valign="top"><td><code>toBase</code></td> +<td> +<p>base to convert to.</p> +</td></tr> +<tr valign="top"><td><code>y</code></td> +<td> +<p>Column to compute on.</p> +</td></tr> +<tr valign="top"><td><code>scale</code></td> +<td> +<p>round to <code>scale</code> digits to the right of the decimal point when +<code>scale</code> > 0, the nearest even number when <code>scale</code> = 0, and <code>scale</code> digits +to the left of the decimal point when <code>scale</code> < 0.</p> +</td></tr> +</table> + + +<h3>Details</h3> + +<p><code>abs</code>: Computes the absolute value. +</p> +<p><code>acos</code>: Returns the inverse cosine of the given value, +as if computed by <code>java.lang.Math.acos()</code> +</p> +<p><code>asin</code>: Returns the inverse sine of the given value, +as if computed by <code>java.lang.Math.asin()</code> +</p> +<p><code>atan</code>: Returns the inverse tangent of the given value, +as if computed by <code>java.lang.Math.atan()</code> +</p> +<p><code>bin</code>: Returns the string representation of the binary value +of the given long column. 
For example, bin("12") returns "1100". +</p> +<p><code>cbrt</code>: Computes the cube root of the given value. +</p> +<p><code>ceil</code>: Computes the ceiling of the given value. +</p> +<p><code>ceiling</code>: Alias for <code>ceil</code>. +</p> +<p><code>cos</code>: Returns the cosine of the given value, +as if computed by <code>java.lang.Math.cos()</code>. Units in radians. +</p> +<p><code>cosh</code>: Returns the hyperbolic cosine of the given value, +as if computed by <code>java.lang.Math.cosh()</code>. +</p> +<p><code>exp</code>: Computes the exponential of the given value. +</p> +<p><code>expm1</code>: Computes the exponential of the given value, minus one (exp(x) - 1). +</p> +<p><code>factorial</code>: Computes the factorial of the given value. +</p> +<p><code>floor</code>: Computes the floor of the given value. +</p> +<p><code>hex</code>: Computes the hex value of the given column. +</p> +<p><code>log</code>: Computes the natural logarithm of the given value. +</p> +<p><code>log10</code>: Computes the logarithm of the given value in base 10. +</p> +<p><code>log1p</code>: Computes the natural logarithm of one plus the given value (log(1 + x)). +</p> +<p><code>log2</code>: Computes the logarithm of the given column in base 2. +</p> +<p><code>rint</code>: Returns the double value that is closest in value to the argument and +is equal to a mathematical integer. +</p> +<p><code>round</code>: Returns the value of the column rounded to 0 decimal places +using the HALF_UP rounding mode. +</p> +<p><code>bround</code>: Returns the value of the column <code>x</code> rounded to <code>scale</code> decimal places +using the HALF_EVEN rounding mode if <code>scale</code> >= 0, or at the integral part when <code>scale</code> < 0. +Also known as Gaussian rounding or bankers' rounding, which rounds to the nearest even number. +bround(2.5, 0) = 2, bround(3.5, 0) = 4. +</p> +<p><code>signum</code>: Computes the signum of the given value. +</p> +<p><code>sign</code>: Alias for <code>signum</code>. 
+</p> +<p><code>sin</code>: Returns the sine of the given value, +as if computed by <code>java.lang.Math.sin()</code>. Units in radians. +</p> +<p><code>sinh</code>: Returns the hyperbolic sine of the given value, +as if computed by <code>java.lang.Math.sinh()</code>. +</p> +<p><code>sqrt</code>: Computes the square root of the specified float value. +</p> +<p><code>tan</code>: Returns the tangent of the given value, +as if computed by <code>java.lang.Math.tan()</code>. +Units in radians. +</p> +<p><code>tanh</code>: Returns the hyperbolic tangent of the given value, +as if computed by <code>java.lang.Math.tanh()</code>. +</p> +<p><code>toDegrees</code>: Converts an angle measured in radians to an approximately equivalent angle +measured in degrees. +</p> +<p><code>toRadians</code>: Converts an angle measured in degrees to an approximately equivalent angle +measured in radians. +</p> +<p><code>unhex</code>: Inverse of hex. Interprets each pair of characters as a hexadecimal number +and converts to the byte representation of the number. +</p> +<p><code>atan2</code>: Returns the angle theta from the conversion of rectangular coordinates +(x, y) to polar coordinates (r, theta), +as if computed by <code>java.lang.Math.atan2()</code>. Units in radians. +</p> +<p><code>hypot</code>: Computes "sqrt(a^2 + b^2)" without intermediate overflow or underflow. +</p> +<p><code>pmod</code>: Returns the positive value of dividend mod divisor. +Column <code>x</code> is the divisor column, and column <code>y</code> is the dividend column. +</p> +<p><code>shiftLeft</code>: Shifts the given value numBits left. If the given value is a long value, +this function will return a long value; otherwise it will return an integer value. +</p> +<p><code>shiftRight</code>: (Signed) shifts the given value numBits right. If the given value is a long +value, it will return a long value; otherwise it will return an integer value. +</p> +<p><code>shiftRightUnsigned</code>: (Unsigned) shifts the given value numBits right. 
If the given value is +a long value, it will return a long value else it will return an integer value. +</p> +<p><code>conv</code>: Converts a number in a string column from one base to another. +</p> + + +<h3>Note</h3> + +<p>abs since 1.5.0 +</p> +<p>acos since 1.5.0 +</p> +<p>asin since 1.5.0 +</p> +<p>atan since 1.5.0 +</p> +<p>bin since 1.5.0 +</p> +<p>cbrt since 1.4.0 +</p> +<p>ceil since 1.5.0 +</p> +<p>ceiling since 1.5.0 +</p> +<p>cos since 1.5.0 +</p> +<p>cosh since 1.5.0 +</p> +<p>exp since 1.5.0 +</p> +<p>expm1 since 1.5.0 +</p> +<p>factorial since 1.5.0 +</p> +<p>floor since 1.5.0 +</p> +<p>hex since 1.5.0 +</p> +<p>log since 1.5.0 +</p> +<p>log10 since 1.5.0 +</p> +<p>log1p since 1.5.0 +</p> +<p>log2 since 1.5.0 +</p> +<p>rint since 1.5.0 +</p> +<p>round since 1.5.0 +</p> +<p>bround since 2.0.0 +</p> +<p>signum since 1.5.0 +</p> +<p>sign since 1.5.0 +</p> +<p>sin since 1.5.0 +</p> +<p>sinh since 1.5.0 +</p> +<p>sqrt since 1.5.0 +</p> +<p>tan since 1.5.0 +</p> +<p>tanh since 1.5.0 +</p> +<p>toDegrees since 1.4.0 +</p> +<p>toRadians since 1.4.0 +</p> +<p>unhex since 1.5.0 +</p> +<p>atan2 since 1.5.0 +</p> +<p>hypot since 1.4.0 +</p> +<p>pmod since 1.5.0 +</p> +<p>shiftLeft since 1.5.0 +</p> +<p>shiftRight since 1.5.0 +</p> +<p>shiftRightUnsigned since 1.5.0 +</p> +<p>conv since 1.5.0 +</p> + + +<h3>Examples</h3> + +<pre><code class="r">## Not run: +##D # Dataframe used throughout this doc +##D df <- createDataFrame(cbind(model = rownames(mtcars), mtcars)) +##D tmp <- mutate(df, v1 = log(df$mpg), v2 = cbrt(df$disp), +##D v3 = bround(df$wt, 1), v4 = bin(df$cyl), +##D v5 = hex(df$wt), v6 = toDegrees(df$gear), +##D v7 = atan2(df$cyl, df$am), v8 = hypot(df$cyl, df$am), +##D v9 = pmod(df$hp, df$cyl), v10 = shiftLeft(df$disp, 1), +##D v11 = conv(df$hp, 10, 16), v12 = sign(df$vs - 0.5), +##D v13 = sqrt(df$disp), v14 = ceil(df$wt)) +##D head(tmp) +## End(Not run) +</code></pre> + + +<hr /><div style="text-align: center;">[Package <em>SparkR</em> version 2.4.0 <a 
href="00Index.html">Index</a>]</div> +</body></html> http://git-wip-us.apache.org/repos/asf/spark-website/blob/52917ac4/site/docs/2.4.0/api/R/column_misc_functions.html ---------------------------------------------------------------------- diff --git a/site/docs/2.4.0/api/R/column_misc_functions.html b/site/docs/2.4.0/api/R/column_misc_functions.html new file mode 100644 index 0000000..06ebab2 --- /dev/null +++ b/site/docs/2.4.0/api/R/column_misc_functions.html @@ -0,0 +1,117 @@ +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Miscellaneous functions for Column operations</title> +<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> +<link rel="stylesheet" type="text/css" href="R.css" /> + +<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css"> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script> +<script>hljs.initHighlightingOnLoad();</script> +</head><body> + +<table width="100%" summary="page for column_misc_functions {SparkR}"><tr><td>column_misc_functions {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table> + +<h2>Miscellaneous functions for Column operations</h2> + +<h3>Description</h3> + +<p>Miscellaneous functions defined for <code>Column</code>. +</p> + + +<h3>Usage</h3> + +<pre> +crc32(x) + +hash(x, ...) + +md5(x) + +sha1(x) + +sha2(y, x) + +## S4 method for signature 'Column' +crc32(x) + +## S4 method for signature 'Column' +hash(x, ...) 
+ +## S4 method for signature 'Column' +md5(x) + +## S4 method for signature 'Column' +sha1(x) + +## S4 method for signature 'Column,numeric' +sha2(y, x) +</pre> + + +<h3>Arguments</h3> + +<table summary="R argblock"> +<tr valign="top"><td><code>x</code></td> +<td> +<p>Column to compute on. In <code>sha2</code>, it is one of 224, 256, 384, or 512.</p> +</td></tr> +<tr valign="top"><td><code>...</code></td> +<td> +<p>additional Columns.</p> +</td></tr> +<tr valign="top"><td><code>y</code></td> +<td> +<p>Column to compute on.</p> +</td></tr> +</table> + + +<h3>Details</h3> + +<p><code>crc32</code>: Calculates the cyclic redundancy check value (CRC32) of a binary column +and returns the value as a bigint. +</p> +<p><code>hash</code>: Calculates the hash code of given columns, and returns the result +as an int column. +</p> +<p><code>md5</code>: Calculates the MD5 digest of a binary column and returns the value +as a 32 character hex string. +</p> +<p><code>sha1</code>: Calculates the SHA-1 digest of a binary column and returns the value +as a 40 character hex string. +</p> +<p><code>sha2</code>: Calculates the SHA-2 family of hash functions of a binary column and +returns the value as a hex string. The second argument <code>x</code> specifies the number +of bits, and is one of 224, 256, 384, or 512. 
+</p> + + +<h3>Note</h3> + +<p>crc32 since 1.5.0 +</p> +<p>hash since 2.0.0 +</p> +<p>md5 since 1.5.0 +</p> +<p>sha1 since 1.5.0 +</p> +<p>sha2 since 1.5.0 +</p> + + +<h3>Examples</h3> + +<pre><code class="r">## Not run: +##D # Dataframe used throughout this doc +##D df <- createDataFrame(cbind(model = rownames(mtcars), mtcars)[, 1:2]) +##D tmp <- mutate(df, v1 = crc32(df$model), v2 = hash(df$model), +##D v3 = hash(df$model, df$mpg), v4 = md5(df$model), +##D v5 = sha1(df$model), v6 = sha2(df$model, 256)) +##D head(tmp) +## End(Not run) +</code></pre> + + +<hr /><div style="text-align: center;">[Package <em>SparkR</em> version 2.4.0 <a href="00Index.html">Index</a>]</div> +</body></html> http://git-wip-us.apache.org/repos/asf/spark-website/blob/52917ac4/site/docs/2.4.0/api/R/column_nonaggregate_functions.html ---------------------------------------------------------------------- diff --git a/site/docs/2.4.0/api/R/column_nonaggregate_functions.html b/site/docs/2.4.0/api/R/column_nonaggregate_functions.html new file mode 100644 index 0000000..e555e73 --- /dev/null +++ b/site/docs/2.4.0/api/R/column_nonaggregate_functions.html @@ -0,0 +1,351 @@ +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Non-aggregate functions for Column operations</title> +<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> +<link rel="stylesheet" type="text/css" href="R.css" /> + +<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css"> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script> +<script>hljs.initHighlightingOnLoad();</script> +</head><body> + +<table width="100%" summary="page for column_nonaggregate_functions 
{SparkR}"><tr><td>column_nonaggregate_functions {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table> + +<h2>Non-aggregate functions for Column operations</h2> + +<h3>Description</h3> + +<p>Non-aggregate functions defined for <code>Column</code>. +</p> + + +<h3>Usage</h3> + +<pre> +when(condition, value) + +bitwiseNOT(x) + +create_array(x, ...) + +create_map(x, ...) + +expr(x) + +greatest(x, ...) + +input_file_name(x = "missing") + +isnan(x) + +least(x, ...) + +lit(x) + +monotonically_increasing_id(x = "missing") + +nanvl(y, x) + +negate(x) + +rand(seed) + +randn(seed) + +spark_partition_id(x = "missing") + +struct(x, ...) + +## S4 method for signature 'ANY' +lit(x) + +## S4 method for signature 'Column' +bitwiseNOT(x) + +## S4 method for signature 'Column' +coalesce(x, ...) + +## S4 method for signature 'Column' +isnan(x) + +## S4 method for signature 'Column' +is.nan(x) + +## S4 method for signature 'missing' +monotonically_increasing_id() + +## S4 method for signature 'Column' +negate(x) + +## S4 method for signature 'missing' +spark_partition_id() + +## S4 method for signature 'characterOrColumn' +struct(x, ...) + +## S4 method for signature 'Column' +nanvl(y, x) + +## S4 method for signature 'Column' +greatest(x, ...) + +## S4 method for signature 'Column' +least(x, ...) + +## S4 method for signature 'character' +expr(x) + +## S4 method for signature 'missing' +rand(seed) + +## S4 method for signature 'numeric' +rand(seed) + +## S4 method for signature 'missing' +randn(seed) + +## S4 method for signature 'numeric' +randn(seed) + +## S4 method for signature 'Column' +when(condition, value) + +## S4 method for signature 'Column' +ifelse(test, yes, no) + +## S4 method for signature 'Column' +create_array(x, ...) + +## S4 method for signature 'Column' +create_map(x, ...) 
+ +## S4 method for signature 'missing' +input_file_name() +</pre> + + +<h3>Arguments</h3> + +<table summary="R argblock"> +<tr valign="top"><td><code>condition</code></td> +<td> +<p>the condition to test on. Must be a Column expression.</p> +</td></tr> +<tr valign="top"><td><code>value</code></td> +<td> +<p>result expression.</p> +</td></tr> +<tr valign="top"><td><code>x</code></td> +<td> +<p>Column to compute on. In <code>lit</code>, it is a literal value or a Column. +In <code>expr</code>, it contains an expression character object to be parsed.</p> +</td></tr> +<tr valign="top"><td><code>...</code></td> +<td> +<p>additional Columns.</p> +</td></tr> +<tr valign="top"><td><code>y</code></td> +<td> +<p>Column to compute on.</p> +</td></tr> +<tr valign="top"><td><code>seed</code></td> +<td> +<p>a random seed. Can be missing.</p> +</td></tr> +<tr valign="top"><td><code>test</code></td> +<td> +<p>a Column expression that describes the condition.</p> +</td></tr> +<tr valign="top"><td><code>yes</code></td> +<td> +<p>return values for <code>TRUE</code> elements of test.</p> +</td></tr> +<tr valign="top"><td><code>no</code></td> +<td> +<p>return values for <code>FALSE</code> elements of test.</p> +</td></tr> +</table> + + +<h3>Details</h3> + +<p><code>lit</code>: A new Column is created to represent the literal value. +If the parameter is a Column, it is returned unchanged. +</p> +<p><code>bitwiseNOT</code>: Computes bitwise NOT. +</p> +<p><code>coalesce</code>: Returns the first column that is not NA, or NA if all inputs are. +</p> +<p><code>isnan</code>: Returns true if the column is NaN. +</p> +<p><code>is.nan</code>: Alias for <a href="column_nonaggregate_functions.html">isnan</a>. +</p> +<p><code>monotonically_increasing_id</code>: Returns a column that generates monotonically increasing +64-bit integers. The generated ID is guaranteed to be monotonically increasing and unique, +but not consecutive. 
The current implementation puts the partition ID in the upper 31 bits, +and the record number within each partition in the lower 33 bits. The assumption is that the +SparkDataFrame has less than 1 billion partitions, and each partition has less than 8 billion +records. As an example, consider a SparkDataFrame with two partitions, each with 3 records. +This expression would return the following IDs: +0, 1, 2, 8589934592 (1L << 33), 8589934593, 8589934594. +This is equivalent to the MONOTONICALLY_INCREASING_ID function in SQL. +The method should be used with no argument. +Note: the function is non-deterministic because its result depends on partition IDs. +</p> +<p><code>negate</code>: Unary minus, i.e. negate the expression. +</p> +<p><code>spark_partition_id</code>: Returns the partition ID as a SparkDataFrame column. +Note that this is nondeterministic because it depends on data partitioning and +task scheduling. +This is equivalent to the <code>SPARK_PARTITION_ID</code> function in SQL. +</p> +<p><code>struct</code>: Creates a new struct column that composes multiple input columns. +</p> +<p><code>nanvl</code>: Returns the first column (<code>y</code>) if it is not NaN, or the second column +(<code>x</code>) if the first column is NaN. Both inputs should be floating point columns +(DoubleType or FloatType). +</p> +<p><code>greatest</code>: Returns the greatest value of the list of column names, skipping null values. +This function takes at least 2 parameters. It will return null if all parameters are null. +</p> +<p><code>least</code>: Returns the least value of the list of column names, skipping null values. +This function takes at least 2 parameters. It will return null if all parameters are null. +</p> +<p><code>expr</code>: Parses the expression string into the column that it represents, similar to +<code>SparkDataFrame.selectExpr</code> +</p> +<p><code>rand</code>: Generates a random column with independent and identically distributed (i.i.d.) 
+samples from U[0.0, 1.0]. +Note: the function is non-deterministic in the general case. +</p> +<p><code>randn</code>: Generates a column with independent and identically distributed (i.i.d.) samples +from the standard normal distribution. +Note: the function is non-deterministic in the general case. +</p> +<p><code>when</code>: Evaluates a list of conditions and returns one of multiple possible result +expressions. For unmatched conditions, null is returned. +</p> +<p><code>ifelse</code>: Evaluates a list of conditions and returns <code>yes</code> if the conditions are +satisfied. Otherwise <code>no</code> is returned for unmatched conditions. +</p> +<p><code>create_array</code>: Creates a new array column. The input columns must all have the same data +type. +</p> +<p><code>create_map</code>: Creates a new map column. The input columns must be grouped as key-value +pairs, e.g. (key1, value1, key2, value2, ...). +The key columns must all have the same data type, and can't be null. +The value columns must all have the same data type. +</p> +<p><code>input_file_name</code>: Creates a string column with the input file name for a given row. +The method should be used with no argument. 
+</p> + + +<h3>Note</h3> + +<p>lit since 1.5.0 +</p> +<p>bitwiseNOT since 1.5.0 +</p> +<p>coalesce(Column) since 2.1.1 +</p> +<p>isnan since 2.0.0 +</p> +<p>is.nan since 2.0.0 +</p> +<p>negate since 1.5.0 +</p> +<p>spark_partition_id since 2.0.0 +</p> +<p>struct since 1.6.0 +</p> +<p>nanvl since 1.5.0 +</p> +<p>greatest since 1.5.0 +</p> +<p>least since 1.5.0 +</p> +<p>expr since 1.5.0 +</p> +<p>rand since 1.5.0 +</p> +<p>rand(numeric) since 1.5.0 +</p> +<p>randn since 1.5.0 +</p> +<p>randn(numeric) since 1.5.0 +</p> +<p>when since 1.5.0 +</p> +<p>ifelse since 1.5.0 +</p> +<p>create_array since 2.3.0 +</p> +<p>create_map since 2.3.0 +</p> +<p>input_file_name since 2.3.0 +</p> + + +<h3>See Also</h3> + +<p>coalesce,SparkDataFrame-method +</p> +<p>Other non-aggregate functions: <code><a href="column.html">column</a></code>, +<code><a href="not.html">not</a></code> +</p> + + +<h3>Examples</h3> + +<pre><code class="r">## Not run: +##D # Dataframe used throughout this doc +##D df <- createDataFrame(cbind(model = rownames(mtcars), mtcars)) +## End(Not run) + +## Not run: +##D tmp <- mutate(df, v1 = lit(df$mpg), v2 = lit("x"), v3 = lit("2015-01-01"), +##D v4 = negate(df$mpg), v5 = expr('length(model)'), +##D v6 = greatest(df$vs, df$am), v7 = least(df$vs, df$am), +##D v8 = column("mpg")) +##D head(tmp) +## End(Not run) + +## Not run: +##D head(select(df, bitwiseNOT(cast(df$vs, "int")))) +## End(Not run) + +## Not run: head(select(df, monotonically_increasing_id())) + +## Not run: head(select(df, spark_partition_id())) + +## Not run: +##D tmp <- mutate(df, v1 = struct(df$mpg, df$cyl), v2 = struct("hp", "wt", "vs"), +##D v3 = create_array(df$mpg, df$cyl, df$hp), +##D v4 = create_map(lit("x"), lit(1.0), lit("y"), lit(-1.0))) +##D head(tmp) +## End(Not run) + +## Not run: +##D tmp <- mutate(df, r1 = rand(), r2 = rand(10), r3 = randn(), r4 = randn(10)) +##D head(tmp) +## End(Not run) + +## Not run: +##D tmp <- mutate(df, mpg_na = otherwise(when(df$mpg > 20, df$mpg), lit(NaN)), 
+##D mpg2 = ifelse(df$mpg > 20 & df$am > 0, 0, 1), +##D mpg3 = ifelse(df$mpg > 20, df$mpg, 20.0)) +##D head(tmp) +##D tmp <- mutate(tmp, ind_na1 = is.nan(tmp$mpg_na), ind_na2 = isnan(tmp$mpg_na)) +##D head(select(tmp, coalesce(tmp$mpg_na, tmp$mpg))) +##D head(select(tmp, nanvl(tmp$mpg_na, tmp$hp))) +## End(Not run) + +## Not run: +##D tmp <- read.text("README.md") +##D head(select(tmp, input_file_name())) +## End(Not run) +</code></pre> + + +<hr /><div style="text-align: center;">[Package <em>SparkR</em> version 2.4.0 <a href="00Index.html">Index</a>]</div> +</body></html> http://git-wip-us.apache.org/repos/asf/spark-website/blob/52917ac4/site/docs/2.4.0/api/R/column_string_functions.html ---------------------------------------------------------------------- diff --git a/site/docs/2.4.0/api/R/column_string_functions.html b/site/docs/2.4.0/api/R/column_string_functions.html new file mode 100644 index 0000000..763df94 --- /dev/null +++ b/site/docs/2.4.0/api/R/column_string_functions.html @@ -0,0 +1,522 @@ +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: String functions for Column operations</title> +<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> +<link rel="stylesheet" type="text/css" href="R.css" /> + +<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css"> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script> +<script>hljs.initHighlightingOnLoad();</script> +</head><body> + +<table width="100%" summary="page for column_string_functions {SparkR}"><tr><td>column_string_functions {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table> + +<h2>String functions for Column operations</h2> + +<h3>Description</h3> + 
+<p>String functions defined for <code>Column</code>. +</p> + + +<h3>Usage</h3> + +<pre> +ascii(x) + +base64(x) + +concat_ws(sep, x, ...) + +decode(x, charset) + +encode(x, charset) + +format_number(y, x) + +format_string(format, x, ...) + +initcap(x) + +instr(y, x) + +levenshtein(y, x) + +locate(substr, str, ...) + +lower(x) + +lpad(x, len, pad) + +ltrim(x, trimString) + +regexp_extract(x, pattern, idx) + +regexp_replace(x, pattern, replacement) + +repeat_string(x, n) + +rpad(x, len, pad) + +rtrim(x, trimString) + +split_string(x, pattern) + +soundex(x) + +substring_index(x, delim, count) + +translate(x, matchingString, replaceString) + +trim(x, trimString) + +unbase64(x) + +upper(x) + +## S4 method for signature 'Column' +ascii(x) + +## S4 method for signature 'Column' +base64(x) + +## S4 method for signature 'Column,character' +decode(x, charset) + +## S4 method for signature 'Column,character' +encode(x, charset) + +## S4 method for signature 'Column' +initcap(x) + +## S4 method for signature 'Column' +length(x) + +## S4 method for signature 'Column' +lower(x) + +## S4 method for signature 'Column,missing' +ltrim(x, trimString) + +## S4 method for signature 'Column,character' +ltrim(x, trimString) + +## S4 method for signature 'Column,missing' +rtrim(x, trimString) + +## S4 method for signature 'Column,character' +rtrim(x, trimString) + +## S4 method for signature 'Column' +soundex(x) + +## S4 method for signature 'Column,missing' +trim(x, trimString) + +## S4 method for signature 'Column,character' +trim(x, trimString) + +## S4 method for signature 'Column' +unbase64(x) + +## S4 method for signature 'Column' +upper(x) + +## S4 method for signature 'Column' +levenshtein(y, x) + +## S4 method for signature 'Column,character' +instr(y, x) + +## S4 method for signature 'Column,numeric' +format_number(y, x) + +## S4 method for signature 'character,Column' +concat_ws(sep, x, ...) + +## S4 method for signature 'character,Column' +format_string(format, x, ...) 
+ +## S4 method for signature 'character,Column' +locate(substr, str, pos = 1) + +## S4 method for signature 'Column,numeric,character' +lpad(x, len, pad) + +## S4 method for signature 'Column,character,numeric' +regexp_extract(x, pattern, idx) + +## S4 method for signature 'Column,character,character' +regexp_replace(x, pattern, + replacement) + +## S4 method for signature 'Column,numeric,character' +rpad(x, len, pad) + +## S4 method for signature 'Column,character,numeric' +substring_index(x, delim, count) + +## S4 method for signature 'Column,character,character' +translate(x, matchingString, + replaceString) + +## S4 method for signature 'Column,character' +split_string(x, pattern) + +## S4 method for signature 'Column,numeric' +repeat_string(x, n) +</pre> + + +<h3>Arguments</h3> + +<table summary="R argblock"> +<tr valign="top"><td><code>x</code></td> +<td> +<p>Column to compute on except in the following methods: +</p> + +<ul> +<li> <p><code>instr</code>: <code>character</code>, the substring to check. See 'Details'. +</p> +</li> +<li> <p><code>format_number</code>: <code>numeric</code>, the number of decimal places to +format to. See 'Details'. 
+</p> +</li></ul> +</td></tr> +<tr valign="top"><td><code>sep</code></td> +<td> +<p>separator to use.</p> +</td></tr> +<tr valign="top"><td><code>...</code></td> +<td> +<p>additional Columns.</p> +</td></tr> +<tr valign="top"><td><code>charset</code></td> +<td> +<p>character set to use (one of "US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", +"UTF-16LE", "UTF-16").</p> +</td></tr> +<tr valign="top"><td><code>y</code></td> +<td> +<p>Column to compute on.</p> +</td></tr> +<tr valign="top"><td><code>format</code></td> +<td> +<p>a character object of format strings.</p> +</td></tr> +<tr valign="top"><td><code>substr</code></td> +<td> +<p>a character string to be matched.</p> +</td></tr> +<tr valign="top"><td><code>str</code></td> +<td> +<p>a Column where matches are sought for each entry.</p> +</td></tr> +<tr valign="top"><td><code>len</code></td> +<td> +<p>maximum length of each output result.</p> +</td></tr> +<tr valign="top"><td><code>pad</code></td> +<td> +<p>a character string to be padded with.</p> +</td></tr> +<tr valign="top"><td><code>trimString</code></td> +<td> +<p>a character string to trim with</p> +</td></tr> +<tr valign="top"><td><code>pattern</code></td> +<td> +<p>a regular expression.</p> +</td></tr> +<tr valign="top"><td><code>idx</code></td> +<td> +<p>a group index.</p> +</td></tr> +<tr valign="top"><td><code>replacement</code></td> +<td> +<p>a character string that a matched <code>pattern</code> is replaced with.</p> +</td></tr> +<tr valign="top"><td><code>n</code></td> +<td> +<p>number of repetitions.</p> +</td></tr> +<tr valign="top"><td><code>delim</code></td> +<td> +<p>a delimiter string.</p> +</td></tr> +<tr valign="top"><td><code>count</code></td> +<td> +<p>number of occurrences of <code>delim</code> before the substring is returned. 
+A positive number means counting from the left, while a negative number means +counting from the right.</p> +</td></tr> +<tr valign="top"><td><code>matchingString</code></td> +<td> +<p>a source string where each character will be translated.</p> +</td></tr> +<tr valign="top"><td><code>replaceString</code></td> +<td> +<p>a target string where each <code>matchingString</code> character will +be replaced by the character in <code>replaceString</code> +at the same location, if any.</p> +</td></tr> +<tr valign="top"><td><code>pos</code></td> +<td> +<p>start position of search.</p> +</td></tr> +</table> + + +<h3>Details</h3> + +<p><code>ascii</code>: Computes the numeric value of the first character of the string column, +and returns the result as an int column. +</p> +<p><code>base64</code>: Computes the BASE64 encoding of a binary column and returns it as +a string column. This is the reverse of unbase64. +</p> +<p><code>decode</code>: Decodes the first argument from binary into a string using the provided +character set. +</p> +<p><code>encode</code>: Encodes the first argument from a string into binary using the provided +character set. +</p> +<p><code>initcap</code>: Returns a new string column by converting the first letter of +each word to uppercase. Words are delimited by whitespace. For example, "hello world" +will become "Hello World". +</p> +<p><code>length</code>: Computes the character length of string data or the number of bytes +of binary data. The length of string data includes trailing spaces. +The length of binary data includes binary zeros. +</p> +<p><code>lower</code>: Converts a string column to lower case. +</p> +<p><code>ltrim</code>: Trims the spaces from the left end of the specified string value. Optionally a +<code>trimString</code> can be specified. +</p> +<p><code>rtrim</code>: Trims the spaces from the right end of the specified string value. Optionally a +<code>trimString</code> can be specified. 
+</p> +<p><code>soundex</code>: Returns the soundex code for the specified expression. +</p> +<p><code>trim</code>: Trims the spaces from both ends of the specified string column. Optionally a +<code>trimString</code> can be specified. +</p> +<p><code>unbase64</code>: Decodes a BASE64 encoded string column and returns it as a binary column. +This is the reverse of base64. +</p> +<p><code>upper</code>: Converts a string column to upper case. +</p> +<p><code>levenshtein</code>: Computes the Levenshtein distance of the two given string columns. +</p> +<p><code>instr</code>: Locates the position of the first occurrence of a substring (<code>x</code>) +in the given string column (<code>y</code>). Returns null if either argument is null. +Note: The position is 1-based, not 0-based. Returns 0 if the substring +could not be found in the string column. +</p> +<p><code>format_number</code>: Formats numeric column <code>y</code> to a format like '#,###,###.##', +rounded to <code>x</code> decimal places with HALF_EVEN round mode, and returns the result +as a string column. +If <code>x</code> is 0, the result has no decimal point or fractional part. +If <code>x</code> is negative, the result will be null. +</p> +<p><code>concat_ws</code>: Concatenates multiple input string columns together into a single +string column, using the given separator. +</p> +<p><code>format_string</code>: Formats the arguments in printf-style and returns the result +as a string column. +</p> +<p><code>locate</code>: Locates the position of the first occurrence of <code>substr</code> in <code>str</code>. +Note: The position is 1-based, not 0-based. Returns 0 if <code>substr</code> +could not be found in <code>str</code>. +</p> +<p><code>lpad</code>: Left-pads the string column with <code>pad</code> to a length of <code>len</code>. +</p> +<p><code>regexp_extract</code>: Extracts a specific <code>idx</code> group identified by a Java regex, +from the specified string column. If the regex did not match, or the specified group did +not match, an empty string is returned. 
+</p> +<p><code>regexp_replace</code>: Replaces all substrings of the specified string value that +match <code>pattern</code> with <code>replacement</code>. +</p> +<p><code>rpad</code>: Right-pads the string column with <code>pad</code> to a length of <code>len</code>. +</p> +<p><code>substring_index</code>: Returns the substring from string (<code>x</code>) before <code>count</code> +occurrences of the delimiter (<code>delim</code>). If <code>count</code> is positive, everything to the left of +the final delimiter (counting from the left) is returned. If <code>count</code> is negative, everything to the +right of the final delimiter (counting from the right) is returned. <code>substring_index</code> +performs a case-sensitive match when searching for the delimiter. +</p> +<p><code>translate</code>: Translates characters of the string column (<code>x</code>) into characters of <code>replaceString</code>. +The characters in <code>replaceString</code> correspond by position to the characters in <code>matchingString</code>. +A translation happens whenever a character in the string matches a character +in <code>matchingString</code>. +</p> +<p><code>split_string</code>: Splits the string on a regular expression. +Equivalent to the <code>split</code> SQL function. +</p> +<p><code>repeat_string</code>: Repeats the string <code>n</code> times. +Equivalent to the <code>repeat</code> SQL function. 
+</p> + + +<h3>Note</h3> + +<p>ascii since 1.5.0 +</p> +<p>base64 since 1.5.0 +</p> +<p>decode since 1.6.0 +</p> +<p>encode since 1.6.0 +</p> +<p>initcap since 1.5.0 +</p> +<p>length since 1.5.0 +</p> +<p>lower since 1.4.0 +</p> +<p>ltrim since 1.5.0 +</p> +<p>ltrim(Column, character) since 2.3.0 +</p> +<p>rtrim since 1.5.0 +</p> +<p>rtrim(Column, character) since 2.3.0 +</p> +<p>soundex since 1.5.0 +</p> +<p>trim since 1.5.0 +</p> +<p>trim(Column, character) since 2.3.0 +</p> +<p>unbase64 since 1.5.0 +</p> +<p>upper since 1.4.0 +</p> +<p>levenshtein since 1.5.0 +</p> +<p>instr since 1.5.0 +</p> +<p>format_number since 1.5.0 +</p> +<p>concat_ws since 1.5.0 +</p> +<p>format_string since 1.5.0 +</p> +<p>locate since 1.5.0 +</p> +<p>lpad since 1.5.0 +</p> +<p>regexp_extract since 1.5.0 +</p> +<p>regexp_replace since 1.5.0 +</p> +<p>rpad since 1.5.0 +</p> +<p>substring_index since 1.5.0 +</p> +<p>translate since 1.5.0 +</p> +<p>split_string since 2.3.0 +</p> +<p>repeat_string since 2.3.0 +</p> + + +<h3>Examples</h3> + +<pre><code class="r">## Not run: +##D # Dataframe used throughout this doc +##D df <- createDataFrame(as.data.frame(Titanic, stringsAsFactors = FALSE)) +## End(Not run) + +## Not run: +##D head(select(df, ascii(df$Class), ascii(df$Sex))) +## End(Not run) + +## Not run: +##D tmp <- mutate(df, s1 = encode(df$Class, "UTF-8")) +##D str(tmp) +##D tmp2 <- mutate(tmp, s2 = base64(tmp$s1), s3 = decode(tmp$s1, "UTF-8"), +##D s4 = soundex(tmp$Sex)) +##D head(tmp2) +##D head(select(tmp2, unbase64(tmp2$s2))) +## End(Not run) + +## Not run: +##D tmp <- mutate(df, sex_lower = lower(df$Sex), age_upper = upper(df$Age), +##D sex_age = concat_ws(" ", lower(df$Sex), lower(df$Age))) +##D head(tmp) +##D tmp2 <- mutate(tmp, s1 = initcap(tmp$sex_lower), s2 = initcap(tmp$sex_age), +##D s3 = reverse(df$Sex)) +##D head(tmp2) +## End(Not run) + +## Not run: +##D tmp <- mutate(df, SexLpad = lpad(df$Sex, 6, " "), SexRpad = rpad(df$Sex, 7, " ")) +##D head(select(tmp, length(tmp$Sex), 
length(tmp$SexLpad), length(tmp$SexRpad))) +##D tmp2 <- mutate(tmp, SexLtrim = ltrim(tmp$SexLpad), SexRtrim = rtrim(tmp$SexRpad), +##D SexTrim = trim(tmp$SexLpad)) +##D head(select(tmp2, length(tmp2$Sex), length(tmp2$SexLtrim), +##D length(tmp2$SexRtrim), length(tmp2$SexTrim))) +##D +##D tmp <- mutate(df, SexLpad = lpad(df$Sex, 6, "xx"), SexRpad = rpad(df$Sex, 7, "xx")) +##D head(tmp) +## End(Not run) + +## Not run: +##D tmp <- mutate(df, d1 = levenshtein(df$Class, df$Sex), +##D d2 = levenshtein(df$Age, df$Sex), +##D d3 = levenshtein(df$Age, df$Age)) +##D head(tmp) +## End(Not run) + +## Not run: +##D tmp <- mutate(df, s1 = instr(df$Sex, "m"), s2 = instr(df$Sex, "M"), +##D s3 = locate("m", df$Sex), s4 = locate("m", df$Sex, pos = 4)) +##D head(tmp) +## End(Not run) + +## Not run: +##D tmp <- mutate(df, v1 = df$Freq/3) +##D head(select(tmp, format_number(tmp$v1, 0), format_number(tmp$v1, 2), +##D format_string("%4.2f %s", tmp$v1, tmp$Sex)), 10) +## End(Not run) + +## Not run: +##D # concatenate strings +##D tmp <- mutate(df, s1 = concat_ws("_", df$Class, df$Sex), +##D s2 = concat_ws("+", df$Class, df$Sex, df$Age, df$Survived)) +##D head(tmp) +## End(Not run) + +## Not run: +##D tmp <- mutate(df, s1 = regexp_extract(df$Class, "(\\d+)\\w+", 1), +##D s2 = regexp_extract(df$Sex, "^(\\w)\\w+", 1), +##D s3 = regexp_replace(df$Class, "\\D+", ""), +##D s4 = substring_index(df$Sex, "a", 1), +##D s5 = substring_index(df$Sex, "a", -1), +##D s6 = translate(df$Sex, "ale", ""), +##D s7 = translate(df$Sex, "a", "-")) +##D head(tmp) +## End(Not run) + +## Not run: +##D head(select(df, split_string(df$Sex, "a"))) +##D head(select(df, split_string(df$Class, "\\d"))) +##D # This is equivalent to the following SQL expression +##D head(selectExpr(df, "split(Class, '\\\\d')")) +## End(Not run) + +## Not run: +##D head(select(df, repeat_string(df$Class, 3))) +##D # This is equivalent to the following SQL expression +##D head(selectExpr(df, "repeat(Class, 3)")) +## End(Not run) 
+</code></pre> + + +<hr /><div style="text-align: center;">[Package <em>SparkR</em> version 2.4.0 <a href="00Index.html">Index</a>]</div> +</body></html> http://git-wip-us.apache.org/repos/asf/spark-website/blob/52917ac4/site/docs/2.4.0/api/R/column_window_functions.html ---------------------------------------------------------------------- diff --git a/site/docs/2.4.0/api/R/column_window_functions.html b/site/docs/2.4.0/api/R/column_window_functions.html new file mode 100644 index 0000000..0ae8cf4 --- /dev/null +++ b/site/docs/2.4.0/api/R/column_window_functions.html @@ -0,0 +1,187 @@ +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Window functions for Column operations</title> +<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> +<link rel="stylesheet" type="text/css" href="R.css" /> + +<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/styles/github.min.css"> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/highlight.min.js"></script> +<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.3/languages/r.min.js"></script> +<script>hljs.initHighlightingOnLoad();</script> +</head><body> + +<table width="100%" summary="page for column_window_functions {SparkR}"><tr><td>column_window_functions {SparkR}</td><td style="text-align: right;">R Documentation</td></tr></table> + +<h2>Window functions for Column operations</h2> + +<h3>Description</h3> + +<p>Window functions defined for <code>Column</code>. +</p> + + +<h3>Usage</h3> + +<pre> +cume_dist(x = "missing") + +dense_rank(x = "missing") + +lag(x, ...) + +lead(x, offset, defaultValue = NULL) + +ntile(x) + +percent_rank(x = "missing") + +rank(x, ...) 
+ +row_number(x = "missing") + +## S4 method for signature 'missing' +cume_dist() + +## S4 method for signature 'missing' +dense_rank() + +## S4 method for signature 'characterOrColumn' +lag(x, offset = 1, defaultValue = NULL) + +## S4 method for signature 'characterOrColumn,numeric' +lead(x, offset = 1, + defaultValue = NULL) + +## S4 method for signature 'numeric' +ntile(x) + +## S4 method for signature 'missing' +percent_rank() + +## S4 method for signature 'missing' +rank() + +## S4 method for signature 'ANY' +rank(x, ...) + +## S4 method for signature 'missing' +row_number() +</pre> + + +<h3>Arguments</h3> + +<table summary="R argblock"> +<tr valign="top"><td><code>x</code></td> +<td> +<p>In <code>lag</code> and <code>lead</code>, it is the column as a character string or a Column +to compute on. In <code>ntile</code>, it is the number of ntile groups.</p> +</td></tr> +<tr valign="top"><td><code>...</code></td> +<td> +<p>additional argument(s).</p> +</td></tr> +<tr valign="top"><td><code>offset</code></td> +<td> +<p>In <code>lag</code>, the number of rows back from the current row from which to obtain +a value. In <code>lead</code>, the number of rows after the current row from which to +obtain a value. If not specified, the default is 1.</p> +</td></tr> +<tr valign="top"><td><code>defaultValue</code></td> +<td> +<p>(optional) default to use when the offset row does not exist.</p> +</td></tr> +</table> + + +<h3>Details</h3> + +<p><code>cume_dist</code>: Returns the cumulative distribution of values within a window partition, +i.e. the fraction of rows that are below the current row: +(number of values before and including x) / (total number of rows in the partition). +This is equivalent to the <code>CUME_DIST</code> function in SQL. +The method should be used with no argument. +</p> +<p><code>dense_rank</code>: Returns the rank of rows within a window partition, without any gaps. 
+The difference between rank and dense_rank is that dense_rank leaves no gaps in ranking +sequence when there are ties. That is, if you were ranking a competition using dense_rank +and had three people tie for second place, you would say that all three were in second +place and that the next person came in third. Rank, by contrast, leaves gaps after ties, so +the person who finished next (after the three tied for second) would register as coming in fifth. +This is equivalent to the <code>DENSE_RANK</code> function in SQL. +The method should be used with no argument. +</p> +<p><code>lag</code>: Returns the value that is <code>offset</code> rows before the current row, and +<code>defaultValue</code> if there are fewer than <code>offset</code> rows before the current row. For example, +an <code>offset</code> of one will return the previous row at any given point in the window partition. +This is equivalent to the <code>LAG</code> function in SQL. +</p> +<p><code>lead</code>: Returns the value that is <code>offset</code> rows after the current row, and +<code>defaultValue</code> if there are fewer than <code>offset</code> rows after the current row. +For example, an <code>offset</code> of one will return the next row at any given point +in the window partition. +This is equivalent to the <code>LEAD</code> function in SQL. +</p> +<p><code>ntile</code>: Returns the ntile group id (from 1 to n inclusive) in an ordered window +partition. For example, if n is 4, the first quarter of the rows will get value 1, the second +quarter will get 2, the third quarter will get 3, and the last quarter will get 4. +This is equivalent to the <code>NTILE</code> function in SQL. +</p> +<p><code>percent_rank</code>: Returns the relative rank (i.e. percentile) of rows within a window +partition. +This is computed by: (rank of row in its partition - 1) / (number of rows in the partition - 1). +This is equivalent to the <code>PERCENT_RANK</code> function in SQL. +The method should be used with no argument. 
+</p> +<p><code>rank</code>: Returns the rank of rows within a window partition. +The difference between rank and dense_rank is that dense_rank leaves no gaps in ranking +sequence when there are ties. That is, if you were ranking a competition using dense_rank +and had three people tie for second place, you would say that all three were in second +place and that the next person came in third. Rank, by contrast, leaves gaps after ties, so +the person who finished next (after the three tied for second) would register as coming in fifth. +This is equivalent to the <code>RANK</code> function in SQL. +The method should be used with no argument. +</p> +<p><code>row_number</code>: Returns a sequential number starting at 1 within a window partition. +This is equivalent to the <code>ROW_NUMBER</code> function in SQL. +The method should be used with no argument. +</p> + + +<h3>Note</h3> + +<p>cume_dist since 1.6.0 +</p> +<p>dense_rank since 1.6.0 +</p> +<p>lag since 1.6.0 +</p> +<p>lead since 1.6.0 +</p> +<p>ntile since 1.6.0 +</p> +<p>percent_rank since 1.6.0 +</p> +<p>rank since 1.6.0 +</p> +<p>row_number since 1.6.0 +</p> + + +<h3>Examples</h3> + +<pre><code class="r">## Not run: +##D # Dataframe used throughout this doc +##D df <- createDataFrame(cbind(model = rownames(mtcars), mtcars)) +##D ws <- orderBy(windowPartitionBy("am"), "hp") +##D tmp <- mutate(df, dist = over(cume_dist(), ws), dense_rank = over(dense_rank(), ws), +##D lag = over(lag(df$mpg), ws), lead = over(lead(df$mpg, 1), ws), +##D percent_rank = over(percent_rank(), ws), +##D rank = over(rank(), ws), row_number = over(row_number(), ws)) +##D # Get ntile group id (1-4) for hp +##D tmp <- mutate(tmp, ntile = over(ntile(4), ws)) +##D head(tmp) +## End(Not run) +</code></pre> + + +<hr /><div style="text-align: center;">[Package <em>SparkR</em> version 2.4.0 <a href="00Index.html">Index</a>]</div> +</body></html>