Split strings in R with strsplit() (2024)

The strsplit() function splits a string into substrings at each occurrence of a given separator. This tutorial shows how to use it in several common cases.

Syntax of strsplit

The strsplit function takes a string or character vector and a delimiter or separator as input. The basic syntax of the function is the following:

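# x: character vector
# split: delimiter used for splitting
# fixed: if TRUE, 'split' is matched as is; if FALSE (default), 'split' is treated as a regular expression
# perl: if TRUE, use Perl-compatible regular expressions (needed for lookarounds)
strsplit(x, split, fixed = FALSE, perl = FALSE)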

The function returns a list with the same length as x. Each element of the list is a character vector holding the substrings obtained from the corresponding input string.
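As a quick check of that structure, the sketch below splits three short strings and inspects the result. It uses only base R; the names x and parts are illustrative, and lengths() (which counts the pieces in each list element) assumes R 3.2.0 or later.

```r
# Three input strings, so strsplit() returns a list with three elements
x <- c("a b", "c d e", "f")
parts <- strsplit(x, split = " ")

is.list(parts)   # TRUE
length(parts)    # 3, the same as length(x)
lengths(parts)   # 2 3 1: number of pieces produced from each string
parts[[2]]       # "c" "d" "e"
```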

Splitting a string by a delimiter

The strsplit function splits strings into substrings based on a delimiter. For instance, to split a string at spaces, pass the string as x and a single space (" ") as split.

strsplit("This is a string", split = " ")
[[1]]
[1] "This" "is" "a" "string"
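Because every single space counts as a separator, repeated or leading spaces produce empty strings, while a trailing separator does not add an empty last element. A minimal base R sketch of that behaviour, with made-up input strings; the regular expression " +" is one way to treat a run of spaces as a single delimiter.

```r
# Two spaces between "a" and "b": the empty text between them becomes ""
doubled <- strsplit("a  b", split = " ")[[1]]
print(doubled)   # "a" "" "b"

# A separator at the start yields an empty first element
leading <- strsplit(" a b", split = " ")[[1]]
print(leading)   # "" "a" "b"

# A separator at the end does not yield an empty last element
trailing <- strsplit("a b ", split = " ")[[1]]
print(trailing)  # "a" "b"

# The regex " +" treats any run of spaces as one separator
runs <- strsplit("a  b   c", split = " +")[[1]]
print(runs)      # "a" "b" "c"
```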

Any character or string can be used as the separator. Keep in mind that split is interpreted as a regular expression unless fixed = TRUE, so characters such as ., | or ( need to be escaped.

strsplit("This&is a string", split = "&")
[[1]]
[1] "This" "is a string"
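Since split is a regular expression by default, separators made of metacharacters behave differently from plain ones. This sketch, with invented color strings, shows a multi-character separator and a pipe matched literally through fixed = TRUE.

```r
# Multi-character separators work like single characters
commas <- strsplit("red, green, blue", split = ", ")[[1]]
print(commas)  # "red" "green" "blue"

# "|" is a regex metacharacter, so match it literally with fixed = TRUE
pipes <- strsplit("red|green|blue", split = "|", fixed = TRUE)[[1]]
print(pipes)   # "red" "green" "blue"
```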

Because the output is a list, you need to convert it to get a character vector: either extract the corresponding element with [[ ]] or flatten the list with the unlist function.

strsplit("This is a string", split = " ")[[1]]
# Equivalent (for a single input string) to:
unlist(strsplit("This is a string", split = " "))
[1] "This" "is" "a" "string"
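The two forms above only agree for a single input string. With several strings, [[1]] keeps the pieces of the first one, while unlist() flattens the pieces of all of them into one vector, as this small base R sketch (illustrative names x and parts) shows:

```r
x <- c("a b", "c d")
parts <- strsplit(x, split = " ")

first_only <- parts[[1]]     # "a" "b": only the pieces of the first string
all_pieces <- unlist(parts)  # "a" "b" "c" "d": pieces of all strings, flattened
print(first_only)
print(all_pieces)
```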

You can then access each substring by position. The following examples extract the first, second and last elements of the split string.

# Get the first element
strsplit("This is a string", split = " ")[[1]][1]
# Get the second element
strsplit("This is a string", split = " ")[[1]][2]
# Get the last element
split_string <- strsplit("This is a string", split = " ")[[1]]
split_string[length(split_string)]
[1] "This"
[1] "is"
[1] "string"
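Indexing with length() works, but base R's tail() (from the utils package, attached by default) is a shorter way to take the last piece or the last few pieces. A minimal sketch, reusing the example sentence; words is an illustrative name:

```r
words <- strsplit("This is a string", split = " ")[[1]]

# Last element with tail() instead of indexing by length()
last_word <- tail(words, 1)
print(last_word)  # "string"

# Last two elements
last_two <- tail(words, 2)
print(last_two)   # "a" "string"
```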

The input can also be a character vector with several strings. In that case the output is a list with one element per input string, each containing the pieces obtained by splitting that string at the separator.

strsplit(c("This is a string", "This is other string"), split = " ")
[[1]]
[1] "This" "is" "a" "string"

[[2]]
[1] "This" "is" "other" "string"
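To pull the same position out of every string at once, one option is to apply the [ operator to each list element with vapply(), which also checks that each result is a single string. This is a sketch with the example sentences; if a string has fewer pieces than the requested position, [ returns NA for it.

```r
x <- c("This is a string", "This is other string")
parts <- strsplit(x, split = " ")

# Take the third piece of every string; character(1) declares the result type
third_words <- vapply(parts, `[`, character(1), 3)
print(third_words)  # "a" "other"

# Number of pieces per input string
n_words <- lengths(parts)
print(n_words)      # 4 4
```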

Multiple delimiters

When x has more than one element, split can also be a vector of delimiters: each delimiter is applied to the corresponding element of x, and if there are fewer delimiters than strings, the delimiters are recycled along x. In the example below, the first string is split at spaces and the second at a slash.

strsplit(c("This is a string", "This is/other string"), split = c(" ", "/"))
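[[1]]
[1] "This" "is" "a" "string"

[[2]]
[1] "This is" "other string"

To split a single string at several different delimiters, combine them into one regular expression with the alternation operator |: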
strsplit("String-with/different&separators", split = "-|/|&")
[[1]]
[1] "String" "with" "different" "separators"
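When the delimiters come from a vector, the alternation can be built programmatically. The helper below, split_any(), is hypothetical (not part of base R): it wraps each delimiter in PCRE's \Q...\E quoting so that metacharacters such as . are matched literally, which is why it sets perl = TRUE. It assumes no delimiter itself contains the sequence \E.

```r
# split_any() is a hypothetical helper: split each string at any of the
# literal delimiters in `delims`, quoting each one with \Q...\E (PCRE)
split_any <- function(x, delims) {
  pattern <- paste0("\\Q", delims, "\\E", collapse = "|")
  strsplit(x, split = pattern, perl = TRUE)
}

result <- split_any("file-name/v1.2&final", c("-", "/", ".", "&"))[[1]]
print(result)  # "file" "name" "v1" "2" "final"
```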

Splitting a date

A common use case of strsplit is splitting a data frame column of dates into three new columns holding the year, month and day. To do this, split the dates at "-" (or whichever separator they use), stack the resulting pieces into a matrix with do.call and rbind, and then bind that matrix to the original data frame with cbind.

# Sample data frame with six consecutive dates
df <- data.frame(date = seq(as.Date("2023-11-19"), by = "day", length.out = 6))
# Split the dates with "-"
split_dates <- strsplit(as.character(df$date), split = "-")
# Stack the pieces row-wise into a matrix and add them to the data frame
df <- cbind(df, do.call(rbind, split_dates))
# Change column names
colnames(df) <- c("date", "year", "month", "day")
df
        date year month day
1 2023-11-19 2023    11  19
2 2023-11-20 2023    11  20
3 2023-11-21 2023    11  21
4 2023-11-22 2023    11  22
5 2023-11-23 2023    11  23
6 2023-11-24 2023    11  24
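Note that the year, month and day columns created this way hold character strings, not numbers. A minimal sketch for converting them with as.integer(), plus a strsplit()-free alternative that calls format() on the Date values, using two made-up dates (dates, pieces and ymd are illustrative names):

```r
dates <- as.Date(c("2023-11-19", "2023-11-20"))

# The pieces from strsplit() are character strings; convert them explicitly
pieces <- do.call(rbind, strsplit(as.character(dates), split = "-"))
ymd <- data.frame(
  year  = as.integer(pieces[, 1]),
  month = as.integer(pieces[, 2]),
  day   = as.integer(pieces[, 3])
)
print(ymd)

# Alternative without strsplit(): format() the Date values directly
years <- as.integer(format(dates, "%Y"))
print(years)  # 2023 2023
```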

Using regular expressions (regex) to split character vectors

The split argument accepts regular expressions. For example, to use any digit as a delimiter, pass the character class "[0-9]".

strsplit("A1B2C3D4", split = "[0-9]")
[[1]]
[1] "A" "B" "C" "D"
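Because "[0-9]" matches exactly one digit, adjacent digits leave empty strings between them. This sketch (illustrative input "A12B3C") compares it with "[0-9]+", which treats a run of digits as one delimiter, and shows that an empty split string breaks the input into single characters.

```r
# One digit per match: "12" leaves an empty string between 1 and 2
single <- strsplit("A12B3C", split = "[0-9]")[[1]]
print(single)  # "A" "" "B" "C"

# "[0-9]+" treats a run of digits as a single delimiter
runs <- strsplit("A12B3C", split = "[0-9]+")[[1]]
print(runs)    # "A" "B" "C"

# An empty split string breaks the input into single characters
chars <- strsplit("ABC", split = "")[[1]]
print(chars)   # "A" "B" "C"
```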

Keep in mind that in a regular expression a period (.) matches any character. To split a string at periods, either set fixed = TRUE so that the delimiter is matched literally, or escape it as "\\.".

strsplit("String.with.periods", split = ".", fixed = TRUE)
# Equivalent to:
# strsplit("String.with.periods", split = "\\.")
[[1]]
[1] "String" "with" "periods"
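To see why the period needs special handling, compare an unescaped "." with a character class. In this sketch the unescaped pattern matches every character, so only empty strings are left, while "[.]" matches a literal period just like "\\." or fixed = TRUE.

```r
# Unescaped, "." matches every character, so only empty strings remain
bad <- strsplit("a.b", split = ".")[[1]]
print(bad)   # "" "" ""

# A character class also matches a literal period
good <- strsplit("String.with.periods", split = "[.]")[[1]]
print(good)  # "String" "with" "periods"
```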

If you want to split the string but keep the delimiter, use a lookbehind such as "(?<=[-])", which matches the empty position right after each delimiter, and set perl = TRUE. Replace the - inside the brackets with your own delimiter characters. For instance, to split at "-" and keep it attached to the preceding piece:

strsplit("a-b-c", split = "(?<=[-])", perl = TRUE)
[[1]]
[1] "a-" "b-" "c"
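The same idea can move the delimiter to the start of the following piece instead, using a lookahead. In this sketch, (?=-) matches just before each hyphen, and the added (?<=.) is there to stop a match at the very start of the remaining text, where strsplit() would otherwise split off the hyphen as a piece of its own. A character class in the lookbehind, such as [-/], keeps several delimiters at once.

```r
# Keep each "-" at the start of the following piece
leading <- strsplit("a-b-c", split = "(?<=.)(?=-)", perl = TRUE)[[1]]
print(leading)  # "a" "-b" "-c"

# Several delimiters can share one lookbehind character class
multi <- strsplit("a-b/c", split = "(?<=[-/])", perl = TRUE)[[1]]
print(multi)    # "a-" "b/" "c"
```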

To reverse strsplit, use the paste function: unlist the output and paste the pieces back together, setting collapse to the delimiter used for splitting.

# Split string by white spaces
splitted <- strsplit("A B C", split = " ")
# [[1]]
# [1] "A" "B" "C"
# Unsplit
paste(unlist(splitted), collapse = " ")
# [1] "A B C"
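With several input strings, unlist() merges all pieces into one string. To rebuild one string per input, one option is to paste() each list element separately with vapply(); a base R sketch with illustrative names:

```r
x <- c("A B C", "D E")
parts <- strsplit(x, split = " ")

# paste() each element separately to rebuild one string per input
rejoined <- vapply(parts, paste, character(1), collapse = " ")
print(rejoined)  # "A B C" "D E"

# unlist() + paste() merges everything into a single string instead
merged <- paste(unlist(parts), collapse = " ")
print(merged)    # "A B C D E"
```

The round trip is only exact when a single fixed delimiter was used: a trailing delimiter, or a pattern such as " +" that matches runs of different lengths, cannot be recovered from the pieces.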

FAQs

What does strsplit() do in R?

As its name suggests, strsplit() splits a string up into substrings using user-defined rules. In a simple case, strsplit() splits a string every time a particular character or substring is present. The function can also use regular expressions to define more complex rules for splitting strings.

How do I split strings in R?

Use the strsplit() function. It splits each element of a character vector x into substrings at the matches of split (for example, a comma) and returns a list with one character vector per input string.

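What is the difference between str_split() and strsplit()?

strsplit() is part of base R, while str_split() comes from the stringr package. Both take the strings first and the pattern second and, by default, return a list with one element per input string. str_split() additionally accepts an n argument that limits the number of pieces and a simplify argument that returns a character matrix when set to TRUE. stringr also provides two variants that return a character vector: str_split_1() takes a single string and splits it into pieces, and str_split_i() splits each string in a character vector into pieces and extracts the i-th value.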

Which package is required for splitting strings in R?

None: strsplit() is part of base R and is available without loading any package. Packages such as stringr offer alternatives like str_split(), but they are optional. Understanding how to use strsplit() effectively is fundamental for data preprocessing, analysis, and manipulation tasks in R.

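How do I split a string into columns in R?

To split a column into multiple columns, you can use the str_split_fixed() function from the stringr package, which splits each string into a fixed number of pieces and returns a character matrix. In base R, when every string yields the same number of pieces, you can combine strsplit() with do.call(rbind, ...), as in the date example above.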

How do I split a data set in R?

Use the split() function, which divides data into groups. Its main parameters are:
  • x: the vector or data frame to divide.
  • f: the factor (or list of factors) that defines the groups.
  • drop: a logical value indicating whether levels that do not occur should be dropped.
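A short base R sketch of split() and its inverse unsplit(), using a made-up data frame scores with a group column:

```r
scores <- data.frame(group = c("a", "b", "a"), value = c(1, 2, 3))

# Split the value column into one vector per group
by_group <- split(scores$value, scores$group)
print(by_group)  # $a: 1 3   $b: 2

# unsplit() puts the pieces back in the original order
restored <- unsplit(by_group, scores$group)
print(restored)  # 1 2 3
```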

How do I split a string into multiple strings?

Pass strsplit() a regular expression that matches any of the delimiters, for example split = "-|/|&" to split at hyphens, slashes and ampersands. Each piece of text between two matches becomes a separate string in the result.

How do you separate two strings?

If two strings are joined by a separator, pass that separator to strsplit(): strsplit("first:second", split = ":") returns a list containing "first" and "second". Note that split is interpreted as a regular expression unless you set fixed = TRUE.

How does strsplit() work?

strsplit() searches each input string for matches of split and returns the text between consecutive matches as separate pieces; the matched delimiters are dropped. A match at the start of a string produces an empty first piece, while a match at the end does not add an empty last piece. If split is an empty string, the input is split into single characters.

Does strsplit() return a list?

Yes. strsplit() always returns a list with one element per input string, even when the input is a single string, which is why the examples above use [[1]] or unlist(). stringr::str_split() also returns a list, unless you set simplify = TRUE to get a character matrix.

How do I use substring() in R?

To extract part of a string with substring(), pass the string followed by the start and end positions of the part you wish to extract. For example, substring(myString, 1, 5) extracts the first 5 characters of myString.

What is the difference between REGEXP_EXTRACT and REGEXP_SUBSTR?

In SQL dialects that offer both names, such as Google BigQuery, REGEXP_SUBSTR is a synonym for REGEXP_EXTRACT.

What does split() do in R?

split() divides the data in the vector x into the groups defined by f. The replacement forms replace values corresponding to such a division. unsplit() reverses the effect of split().

How do I extract part of a string in R?

The str_sub() function in stringr extracts parts of strings based on their location. As with all stringr functions, the first argument, string , is a vector of strings. The arguments start and end specify the boundaries of the piece to extract in characters.

Does strsplit() include the delimiter?

No. strsplit() drops the delimiters and returns only the text between them. To keep each delimiter attached to the preceding piece, split at a zero-width lookbehind such as "(?<=[-])" and set perl = TRUE, as shown in the regular expressions section above.

What does strsplit() return?

strsplit() returns a list with the same length as x, whose i-th element is a character vector containing the pieces obtained by splitting x[i].

What does substring() do in R?

The substring() function in R extracts or replaces parts of a string given start and end positions. It is a fundamental tool for string manipulation in R.

Which functions work with strings in R?

Common R string functions and operations:
  • substr(): Extracts a portion of a string.
  • paste(): Concatenates (joins) strings.
  • tolower() / toupper(): Converts all of the letters in a string to lowercase or uppercase.
  • strsplit(): Splits a string at each match of a delimiter.

How do I split a string by comma in R?

You can first use stringr::str_split() to split the strings and then use tidyr::unnest() to expand the resulting list-column into separate rows. For example, given a data frame with a Skills column of comma-separated values, this approach first splits each Skills entry into a list of individual skills and then uses unnest() to create a separate row for each skill.
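The same reshaping can be sketched in base R without stringr or tidyr. The data frame people and its skills column below are invented for illustration, and the code assumes R 4.0 or later, where data.frame() keeps strings as character rather than converting them to factors:

```r
people <- data.frame(name = c("Ana", "Ben"), skills = c("R,SQL", "Python"))

# Split every skills string at commas; one list element per row
pieces <- strsplit(people$skills, split = ",", fixed = TRUE)

# Repeat each name once per skill, then flatten the skills
long <- data.frame(
  name  = rep(people$name, lengths(pieces)),
  skill = unlist(pieces)
)
print(long)
```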

Article information

Author: Rev. Leonie Wyman
