You had me at hello

C (gcc), 81 80 76 75 72 71 70 69 bytes

main(n,c){while(~(c=getchar())&n-0xb33def<<7)n=n<<5^putchar(c)/96*c;}

Try it online!

How it works

This is a full program. To save bytes, main is declared with two arguments that default to int. This is undefined behavior, but in practice, n will be initialized as 1 when running the program without additional arguments, and c will hold the lower 32 bits of the pointer to the argument vector.

While the condition

~(c=getchar())&n-0xb33def<<7

holds, we'll execute the while loop's body:

n=n<<5^putchar(c)/96*c

To fully understand the condition, we must first examine the body. For now, all we observe is that c=getchar() reads a single byte from STDIN (if possible) and stores it in the variable c.

The byte sequence hello looks as follows in different representations.

char     decimal     binary (8 bits)
'h'      104         0 1 1 0 1 0 0 0
'e'      101         0 1 1 0 0 1 0 1
'l'      108         0 1 1 0 1 1 0 0
'l'      108         0 1 1 0 1 1 0 0
'o'      111         0 1 1 0 1 1 1 1

All of these fall in the range [96, 192), so c/96 will evaluate to 1 for each of these bytes, and to 0 for all remaining ASCII characters. This way, putchar(c)/96*c (putchar prints and returns its argument) will evaluate to c if c is `, a lowercase letter, one of {|}~, or the DEL character; for all other ASCII characters, it will evaluate to 0.
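
As a rough illustration of this filter, the following C sketch isolates the arithmetic without the printing side effect. The name keep_if_at_least_96 is hypothetical and not part of the answer, and the sketch assumes c is an ASCII code point in [0, 128).

```c
/* Hypothetical helper: the value that putchar(c)/96*c contributes to n,
   minus the printing. Assumes 0 <= c <= 127 (ASCII input). */
int keep_if_at_least_96(int c)
{
    /* c / 96 is 0 for c < 96 and 1 for 96..127, so the product is 0 or c. */
    return c / 96 * c;
}
```

Under that assumption, 'h' (104) maps to itself, while 'H' (72) and the space character (32) map to 0.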

n is updated by shifting it five bits to the left, then XORing the result with the result from the previous paragraph. Since an int is 32 bits wide (or so we assume in this answer), some of the shifted bits might "fall off the left" (signed integer overflow is undefined behavior, but gcc behaves as the x64 instruction it generates here). Starting with an unknown value of n, after updating it for all characters of hello, we get the following result.

 n  ?????????????????????????|???????
'h'                          |    01101000
'e'                          |         01100101
'l'                          |              01101100
'l'                          |                   01101100
'o'                          |                        01101111
-----------------------------+--------------------------------
    <------ discarded ------>|???????0101100110011110111101111

Note that the lower 25 bits form the integer 0xb33def, which is the magic constant in the condition. While there is some overlap between the bits of two adjacent bytes, mapping bytes below 96 to 0 makes sure that there aren't any false positives.
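
The claim about the lower 25 bits can be checked with a small C sketch. It is not the golfed code: the helper names fold and low25 are made up for illustration, and uint32_t is used instead of int so that bits shifted past bit 31 are discarded with defined behavior. ASCII input is assumed.

```c
#include <stdint.h>

/* Hypothetical helper mirroring n = n<<5 ^ c/96*c for every byte of s. */
uint32_t fold(uint32_t n, const char *s)
{
    for (; *s; s++) {
        uint32_t c = (unsigned char)*s;
        n = (n << 5) ^ (c / 96 * c);   /* bytes below 96 contribute 0 */
    }
    return n;
}

/* Hypothetical helper: the low 25 bits of n, which depend only on the
   last five bytes that were folded in. */
uint32_t low25(uint32_t n)
{
    return n & 0x1ffffff;
}
```

Whatever the starting value of n, low25(fold(n, "hello")) comes out as 0xb33def, because the old bits of n have been shifted to bit 25 and above.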

The condition consists of two parts:

~(getchar()) takes the bitwise NOT of the result of reading (or attempting to read) a byte from STDIN.

If getchar succeeds, it will return the value of the read byte as an int. Since the input consists entirely of ASCII characters, the read byte can only have its lower 7 bits set, so the bitwise NOT will have its highest 25 bits set in this case.

If getchar fails (no more input), it will return -1 and the bitwise NOT will be 0.

n-0xb33def<<7 subtracts the magic constant from before from n, then shifts the result 7 units to the left.

If the last 5 read bytes were hello, the lowest 25 bits of n will be equal to 0xb33def and the subtraction will zero them out. Shifting the difference will yield 0 as the 7 highest bits will "fall off the left".

On the other hand, if the last 5 read bytes were not hello, one of the lowest 25 bits of the difference will be set; after shifting, one of the highest 25 bits will be.

Finally, if getchar was successful and we didn't print hello yet, all of the highest 25 bits of the bitwise AND's left operand and at least one of the highest 25 bits of the right one will be set. This way, & will yield a non-zero integer and the loop continues.

On the other hand, if the input is exhausted or we have printed hello already, one of the bitwise AND's operands will be zero, and so will the result. In this case, we break out of the loop and the program terminates.
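
Putting the pieces together, here is an ungolfed sketch of the same loop. It is an approximation for illustration only: the function name copy_until_hello is hypothetical, it works on a buffer rather than STDIN/STDOUT, it uses uint32_t to avoid the undefined behavior mentioned above, and it assumes ASCII input.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical buffer-based equivalent of the golfed loop. Copies bytes
   from in to out until "hello" has been copied or the input ends, and
   returns the number of bytes copied. out must have room for len bytes. */
size_t copy_until_hello(const char *in, size_t len, char *out)
{
    uint32_t n = 1;   /* the golfed program starts with n == argc == 1 */
    size_t i = 0;

    /* (n - 0xb33def) << 7 is zero exactly when the low 25 bits of n
       equal 0xb33def, i.e. when the last five bytes were "hello". */
    while (i < len && (uint32_t)((n - 0xb33defu) << 7) != 0) {
        uint32_t c = (unsigned char)in[i];
        out[i++] = (char)c;               /* stands in for putchar(c) */
        n = (n << 5) ^ (c / 96 * c);
    }
    return i;
}
```

For the input "Hello hello world", this sketch copies the first 11 bytes ("Hello hello") and stops.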

Bash, 74 75 103 99 88 82 76 bytes

-10 bytes thanks to @DigitalTrauma!
-11 bytes thanks to @manatwork!
-6 bytes thanks to @Dennis!

IFS=
b=ppcg
while [ ${b/hello} ];do
read -rN1 a
b=${b: -4}$a
echo -n $a
done

Explanation:

IFS=    # making sure we can read whitespace properly
b=ppcg  # set the variable b to some arbitrary 4 letter string

while [ ${b/hello} ]; do  # while b is not "hello" (removing it leaves a non-empty string), do the following
    read -rN1 a           # read exactly one character of input into a
    b=${b: -4}$a          # set b to its last 4 chars + the character just read
    echo -n $a            # output the character just read
done
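
For readers more comfortable with C, the same sliding-window idea can be sketched as follows. This is only an analogy to the Bash loop, not a translation the answer provides: the function name copy_until_hello_window is hypothetical, and unlike the Bash version it works on a buffer and stops when the buffer is exhausted.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical C analogue of the Bash loop: keep the last five characters
   in b and stop once they spell "hello". Returns the number of bytes
   copied to out, which must have room for len bytes. */
size_t copy_until_hello_window(const char *in, size_t len, char *out)
{
    char b[5] = { 0, 'p', 'p', 'c', 'g' };   /* arbitrary start, like b=ppcg */
    size_t i = 0;

    while (i < len && memcmp(b, "hello", 5) != 0) {
        memmove(b, b + 1, 4);   /* like ${b: -4}: keep the last 4 characters */
        b[4] = in[i];           /* append the character just read */
        out[i] = in[i];         /* like echo -n $a */
        i++;
    }
    return i;
}
```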

Try it online!

Labyrinth, 43 41 bytes

Thanks to Sp3000 for saving 2 bytes.

<_%-742302873844_::%*:*:420#+.:%):,*652_>

Try it online!

Explanation

The basic idea is to encode the last five characters in base 256 in a single integer. When a new character comes in, we can "append" it by multiplying the integer by 256 and adding the new code point. If we want to look only at the last 5 characters, we take the value modulo 256⁵ = 2⁴⁰ = 1099511627776. Then we can simply check whether this value is equal to 448378203247, which is what we get when we treat the code points of hello as base-256 digits.
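
The arithmetic can be reproduced in a few lines of C. This is a sketch of the encoding only, not of the Labyrinth program: the names WINDOW_MOD, push_byte and encode_tail are invented for illustration, and uint64_t is enough here because the intermediate value stays below 2^48.

```c
#include <stdint.h>

#define WINDOW_MOD (UINT64_C(1) << 40)   /* 256^5 = 2^40 */

/* Hypothetical helper: append one byte as a base-256 digit and keep only
   the last five digits. */
uint64_t push_byte(uint64_t enc, unsigned char c)
{
    return (enc * 256 + c) % WINDOW_MOD;
}

/* Hypothetical helper: encode the last five bytes of a string. */
uint64_t encode_tail(const char *s)
{
    uint64_t enc = 0;
    for (; *s; s++)
        enc = push_byte(enc, (unsigned char)*s);
    return enc;
}
```

With these definitions, encode_tail("hello") equals 448378203247, and so does the encoding of any longer string that ends in hello.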

As for the code... <...> is a bit of a Labyrinth idiom. It allows you to write an infinite loop without any conditional control flow on a single line, saving a lot of bytes on spaces and linefeeds. The main condition for this to work is that there are two disposable values on top of the stack when we reach the < (we normally use 0s for that, but the actual value is arbitrary).

Of course, the program does need some conditional logic to figure out when to terminate. But conditionally ending the program is possible by dividing by a value which is zero when we want the program to end. The <...> construct works by shifting the entire row left (cyclically) when the IP is at the left end, and then immediately shifting it back into position. This means that the code is actually executed right to left. Let's reverse it:

_256*,:)%:.+#024:*:*%::_448378203247-%_

This is one iteration of the loop which reads a character, terminates if we've reached EOF, prints the character, adds it to our encoding, truncates that to 5 characters, checks for equality with hello and repeats. Here is how that works in detail (remember that Labyrinth is stack-based):

_256*            Multiply the encoding by 256 to make room for the next character.
,                Read one byte from STDIN.
:)%              Duplicate, increment, modulo. If we hit EOF, then , returns
                 -1, so incrementing and modulo terminates the program due to
                 the attempted division by zero. However, if we did read a
                 character, we've just computed n % (n+1), which is always n itself.
:.               Print a copy of the character we just read.
+                Add it to our encoding (we multiplied the encoding by 256 at
                 the start of the iteration, so there's room for our new
                 character).
#024             Push 1024, using the stack depth to push the initial 1.
:*:*             Square it twice. That gives 2^40.
%                Take the encoding modulo 2^40 to truncate it to the last 5
                 characters.
::               Make two copies of the encoding.
_448378203247    Push the value that corresponds to "hello".
-                Subtract it from the encoding, giving zero iff the last 5
                 characters were "hello".
%                Take the other copy of the encoding modulo this value, again
                 terminating if we've reached "hello".
                 The actual value of this modulo - if it didn't terminate the
                 program - is junk, but we don't really care, we just need
                 any disposable value here for the <...>
_                We push a zero as the second disposable value.
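
The control flow of one iteration can be approximated in C as below. This is a loose sketch under several assumptions: the function name run_loop is hypothetical, input comes from a buffer with -1 standing in for EOF, uint64_t replaces Labyrinth's integers, and the two places where the Labyrinth program terminates through division by zero become explicit break statements, since dividing by zero is undefined behavior in C.

```c
#include <stddef.h>
#include <stdint.h>

#define HELLO_CODE UINT64_C(448378203247)   /* "hello" as base-256 digits */

/* Hypothetical buffer-based analogue of the loop. Returns the number of
   bytes written to out, which must have room for len bytes. */
size_t run_loop(const char *in, size_t len, char *out)
{
    uint64_t enc = 0;
    size_t pos = 0, written = 0;

    for (;;) {
        enc *= 256;                                         /* _256* */
        int c = pos < len ? (unsigned char)in[pos++] : -1;  /* ,  (-1 at EOF) */
        if (c + 1 == 0)                /* :)%  would take a value modulo 0 */
            break;
        out[written++] = (char)c;                           /* :. */
        enc += (uint64_t)c;                                 /* + */
        uint64_t m = 1024;                                  /* #024 */
        m *= m;                                             /* :* gives 2^20 */
        m *= m;                                             /* :* gives 2^40 */
        enc %= m;                                           /* % */
        if (enc - HELLO_CODE == 0)     /* -%  would take a value modulo 0 */
            break;
    }
    return written;
}
```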
