How can I find a subsequence in a &[u8] slice?

As far as I know, the standard library does not contain a function for this. Some libcs have memmem, but at the moment the libc crate does not wrap it. You can use the twoway crate, however, and rust-bio implements some pattern matching algorithms, too. All of those should be faster than using haystack.windows(..).position(..).


I found the memmem crate useful for this task:

use memmem::{Searcher, TwoWaySearcher};

let search = TwoWaySearcher::new("dog".as_bytes());
assert_eq!(
    search.search_in("The quick brown fox jumped over the lazy dog.".as_bytes()),
    Some(41)
);

How about Regex on bytes? That looks very powerful. See this Rust playground demo.

extern crate regex;

use regex::bytes::Regex;

fn main() {
    //see https://doc.rust-lang.org/regex/regex/bytes/

    let re = Regex::new(r"say [^,]*").unwrap();

    let text = b"say foo, say bar, say baz";

    // Collect the start offset of every match.
    // `find_iter` is enough here: only the whole match is needed, not capture groups.
    let starts: Vec<usize> =
        re.find_iter(text)
          .map(|m| m.start())
          .collect();

    assert_eq!(starts, vec![0, 9, 18]);
}

Here's a simple implementation based on the windows iterator.

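fn find_subsequence(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    // `windows` panics on a size of 0, so treat an empty needle as matching at offset 0.
    if needle.is_empty() {
        return Some(0);
    }
    haystack.windows(needle.len()).position(|window| window == needle)
}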

fn main() {
    assert_eq!(find_subsequence(b"qwertyuiop", b"tyu"), Some(4));
    assert_eq!(find_subsequence(b"qwertyuiop", b"asd"), None);
}
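
If you need every occurrence rather than just the first, the same windows approach can be extended. This is only a minimal sketch: find_all is a hypothetical helper name, overlapping matches are reported, and returning nothing for an empty needle is an arbitrary choice made here to avoid the panic that windows(0) would cause.

```rust
/// Returns the start offset of every (possibly overlapping) occurrence of `needle`.
/// `find_all` is a hypothetical helper, not part of the standard library.
fn find_all(haystack: &[u8], needle: &[u8]) -> Vec<usize> {
    // `windows(0)` panics, so an empty needle is defined here to have no matches.
    if needle.is_empty() {
        return Vec::new();
    }
    haystack
        .windows(needle.len())
        .enumerate()
        // Keep only the windows that are equal to the needle.
        .filter(|(_, window)| *window == needle)
        .map(|(offset, _)| offset)
        .collect()
}

fn main() {
    assert_eq!(find_all(b"say foo, say bar, say baz", b"say "), vec![0, 9, 18]);
    assert_eq!(find_all(b"aaaa", b"aa"), vec![0, 1, 2]);
    assert!(find_all(b"qwertyuiop", b"asd").is_empty());
}
```

Like find_subsequence, this compares the needle at every offset, so it is fine for short inputs but not a replacement for a dedicated search crate on large ones.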

The find_subsequence function can also be made generic:

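fn find_subsequence<T: PartialEq>(haystack: &[T], needle: &[T]) -> Option<usize> {
    // `windows` panics on a size of 0, so treat an empty needle as matching at offset 0.
    if needle.is_empty() {
        return Some(0);
    }
    haystack.windows(needle.len()).position(|window| window == needle)
}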
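
To illustrate what the generic version buys you, here is a small usage sketch with element types other than u8. The function is repeated so that the block compiles on its own. It assumes a plain T: PartialEq bound and treats an empty needle as a match at offset 0, since windows(0) would otherwise panic.

```rust
fn find_subsequence<T: PartialEq>(haystack: &[T], needle: &[T]) -> Option<usize> {
    // Assumption: an empty needle matches at offset 0 (`windows(0)` would panic).
    if needle.is_empty() {
        return Some(0);
    }
    haystack.windows(needle.len()).position(|window| window == needle)
}

fn main() {
    // Integers
    assert_eq!(find_subsequence(&[1, 2, 3, 4, 5], &[3, 4]), Some(2));
    // String slices
    assert_eq!(find_subsequence(&["a", "b", "c"], &["c"]), Some(2));
    // Chars, with no match
    assert_eq!(find_subsequence(&['x', 'y'], &['z']), None);
}
```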

Tags:

Rust