Parse RNA into codons
Retina, 30 bytes (previously 39, 38, 32)
The trailing linefeed is significant.
Output as a linefeed-separated list.
Try it online.
This is a match stage, which turns the input into a linefeed-separated list of all matches (due to the !). The regex matches every codon starting from the first AUG, and it does so with two alternatives. AUG matches unconditionally, so that it can start the list of matches. The second alternative can match any codon (... matches any three characters), but \G is a special anchor which ensures that it can only match immediately after another match. The only problem is that \G also matches at the beginning of the string, which we don't want. Since the input consists only of word characters, we use \B (any position that is not a word boundary) to ensure that this alternative is not used at the beginning of the input.
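For readers without a regex engine at hand, the result of this stage can be modelled in a few lines of Haskell. This is only a sketch of the stage's output, not of how the regex is evaluated, and the names fromStart, triples and matchStage are invented here for illustration.

```haskell
import Data.List (isPrefixOf, tails)

-- Drop everything before the first "AUG" (empty result if there is none).
-- This plays the role of the unconditional AUG alternative.
fromStart :: String -> String
fromStart s = case filter ("AUG" `isPrefixOf`) (tails s) of
  (t:_) -> t
  []    -> ""

-- Cut a string into consecutive full triples. A remainder shorter than
-- three characters is dropped, just as ... cannot match fewer than three.
-- This plays the role of the \G... alternative chaining match after match.
triples :: String -> [String]
triples (a:b:c:rest) = [a, b, c] : triples rest
triples _            = []

-- Hypothetical stand-in for the whole match stage.
matchStage :: String -> [String]
matchStage = triples . fromStart
```

For example, matchStage "GGAUGCCCUAAGG" yields ["AUG","CCC","UAA"]: the leading GG is skipped, and the trailing GG is too short to form a codon.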
The second stage finds the first stop codon, matched as U(AA|AG|GA), together with everything after it, and removes all of that from the string. Since the first stage split the codons into separate lines, we know that this match is properly aligned with the start codon. We use \D (non-digits) to match any character, since . wouldn't go past the linefeeds, and the input won't contain digits.
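The alignment argument can be made concrete with a small Haskell sketch that works on the linefeed-separated text directly. The names stopCodons and removeFromStop are made up for illustration. One difference from the regex is assumed away here: unlines always appends a trailing linefeed, so the output only matches the regex's result character for character when a stop codon is actually present.

```haskell
-- The three stop codons matched by U(AA|AG|GA).
stopCodons :: [String]
stopCodons = ["UAA", "UAG", "UGA"]

-- Keep the lines before the first stop codon and drop the rest.
-- Every line is exactly one codon, so comparing whole lines is
-- equivalent to a match that is aligned with the start codon.
removeFromStop :: String -> String
removeFromStop = unlines . takeWhile (`notElem` stopCodons) . lines
```

For example, removeFromStop "AUG\nCCC\nUAA\nGGG" yields "AUG\nCCC\n".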
Haskell, 112 bytes (previously 115)
import Data.Lists
fst.break(\e->elem e["UAA","UAG","UGA"]||length e<3).chunksOf 3.snd.spanList((/="AUG").take 3)
Usage example:
*Main> ( fst.break(\e->elem e["UAA","UAG","UGA"]||length e<3).chunksOf 3.snd.spanList((/="AUG").take 3) ) "AUGCUUAUGAAUGGCAUGUACUAAUAGACUCACUUAAGCGGUGAUGAA"
["AUG","CUU","AUG","AAU","GGC","AUG","UAC"]
How it works:
spanList((/="AUG").take 3)      -- split input at the first "AUG"
snd                             -- take 2nd part ("AUG" + rest)
chunksOf 3                      -- split into 3 element lists
fst.break(\e->                  -- take elements from this list
    elem e["UAA","UAG","UGA"]|| -- as long as we don't see end codons
    length e<3)                 -- or run out of full codons
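Data.Lists is not part of base (as far as I know it ships with the third-party lists package), so here is a hedged sketch that runs with only the Prelude. The local definitions of spanList and chunksOf are my own approximations of the library functions, written to match the behaviour described above, and codons is a name added for illustration.

```haskell
-- Like span, but the predicate sees the whole remaining list
-- instead of a single element.
spanList :: ([a] -> Bool) -> [a] -> ([a], [a])
spanList _ [] = ([], [])
spanList p xs@(x:rest)
  | p xs      = let (ys, zs) = spanList p rest in (x : ys, zs)
  | otherwise = ([], xs)

-- Split a list into pieces of length n; the last piece may be shorter.
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs = let (c, r) = splitAt n xs in c : chunksOf n r

-- The golfed expression from the answer, unchanged apart from spacing.
codons :: String -> [String]
codons = fst . break (\e -> elem e ["UAA", "UAG", "UGA"] || length e < 3)
       . chunksOf 3 . snd . spanList ((/= "AUG") . take 3)
```

With these definitions, codons "AUGCUUAUGAAUGGCAUGUACUAAUAGACUCACUUAAGCGGUGAUGAA" gives ["AUG","CUU","AUG","AAU","GGC","AUG","UAC"], the same result as the session shown earlier.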