The JIT in relation to PHP extensions

A few days ago I posted about Playing with the PHP JIT and included some simple benchmarking with the react-php-redis server project, which involves a lot of parsing but is ultimately still bound by I/O even when running async.

I got some questions on Twitter that revolve around misconceptions about what the JIT really can do for PHP applications and what it cannot do.

So to show what the JIT is good for, I wanted a truly CPU-bound problem that was realistic from my point of view.

Inside Tideways we use a datatype called HDRHistogram (high dynamic range histogram), a statistical datatype to calculate exact percentiles in monitoring data. For each minute and server we might have a histogram, and when rendering a chart, we merge and aggregate this data in large numbers.

At the moment we use a PHP Extension interfacing with a C library to use this datatype.

I have ported the necessary code to PHP to test this with the JIT, without the JIT and against the PHP extension.

<?php

function simulate_hdr() {
    $hdr = hdr_init(1, 1000, 1);
    for ($i = 1; $i < 1000; $i++) {
        for ($j = 0; $j < 1000; $j++) {
            hdr_record_value($hdr, $i);
        }
    }
    hdr_value_at_percentile($hdr, 95);
}


for ($i = 0; $i < 5; $i++) {
    $time = microtime(true);
    simulate_hdr();

    echo number_format(microtime(true) - $time, 4) . "\n";
}

Again, take the numbers with a grain of salt; they are only here to show the approximate relationships:

Run      PHP nojit (s)  PHP JIT (s)  C/PHP ext (s)
1        0.5916         0.3671       0.0775
2        0.6322         0.4038       0.0775
3        0.6025         0.3866       0.0799
4        0.6010         0.3892       0.0829
5        0.6137         0.3947       0.0828
Average  0.6082         0.3883       0.0801
%        100.00%        63.84%       13.17%

As you can see, the jitted code needs roughly 2/3 (63.84%) of the runtime of the original non-jitted code, a speedup of about 1.57x, which gets close to the doubling that the RFC claims for PHP's internal benchmark. The improvement is much better than with the react-php-redis server example from a few days ago, where the improvement was only in the 5-20% region.

But compared to implementing this code directly in C as a PHP extension, even the jitted code is still almost 5 times slower.
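The ratios quoted here can be re-derived from the averages in the table. This is a minimal shell sketch using awk; the helper names `pct` and `ratio` are made up for illustration and are not part of the benchmark.

```shell
#!/bin/bash
# Averages taken from the table above (seconds per run).
NOJIT=0.6082
JIT=0.3883
EXT=0.0801

# pct A B: print A as a percentage of B, with two decimals.
pct() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", 100 * a / b }'
}

# ratio A B: print A divided by B, with two decimals.
ratio() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

echo "JIT runtime relative to no JIT:       $(pct "$JIT" "$NOJIT")%"
echo "Extension runtime relative to no JIT: $(pct "$EXT" "$NOJIT")%"
echo "Speedup of JIT over no JIT:           $(ratio "$NOJIT" "$JIT")x"
echo "Speedup of extension over JIT:        $(ratio "$JIT" "$EXT")x"
```

The last line is the "almost 5 times" gap between the jitted PHP code and the C extension.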

Yes, with the JIT there is a massive improvement of this CPU bound problem, but it doesn’t mean we can now re-implement all PHP extensions in pure PHP and rely on the JIT to make them perform.

What the JIT does improve:

  • Make the parts of CPU bound problems that are written in PHP (!) faster.

What the JIT does not improve:

  • It does not improve performance of already fast internal functions written in C, for example hashing or encryption functions.
  • It does not improve performance (by a lot) for I/O bound problems.

To close the gap between JIT and C, we could look at the FFI extension that is included in PHP 7.4. It allows interfacing with C code more easily from PHP. Anthony Ferrara is building his “php-compiler” project on top of FFI, which would allow compiling a subset of PHP code directly to an FFI C extension.

Playing with the PHP JIT

The PHP JIT RFC is a hot topic on the internals list right now, and the voting has started for it to be included in PHP 8.0 and as an experimental feature in 7.4.

I wanted to test it out myself; here are the steps necessary to get started on a Linux (Ubuntu) server (or desktop):

git clone https://github.com/php/php-src.git
cd php-src
git remote add zendtech https://github.com/zendtech/php-src.git
git checkout zendtech/jit-dynasm-7.4
./buildconf
./configure  --prefix=/opt/php/php-7.4 --enable-opcache --enable-opcache-jit --with-zlib --enable-zip --enable-json --enable-sockets --without-pear
make -j4
sudo make install

For testing I needed a more realistic problem that was a bit more complex than PHP's internal benchmark (which the JIT doubles in speed).

Luckily I came across a good one at the Symfony User Group in Cologne this week: the react-php-redis server by @another_clue re-implements a Redis server in PHP with almost zero PHP extension dependencies. That means the vanilla build from above with no dependencies is enough to get it running.

In addition, it fully works with the redis-benchmark command that the original redis-server package includes, so it takes no effort to run some tests. The benchmark pegs the PHP Redis server at 100% CPU, making it a good candidate for testing the JIT.

The code is doing async I/O so the Redis protocol parsing and internal handling should play a significant role in this code that might be optimizable by the JIT.

git clone https://github.com/clue/php-redis-server.git
cd php-redis-server/
composer install

I ran it without the JIT:

/opt/php/php-7.4/bin/php bin/redis-server.php --port 6380

And with the JIT, using these flags:

/opt/php/php-7.4/bin/php -dopcache.enable_cli=1 -dopcache.jit_buffer_size=50000000 -dopcache.jit=1235 bin/redis-server.php --port 6380

The -dopcache.jit=1235 setting only JIT-compiles the hot functions, meaning those that are called often.
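A note on how to read that value: going by the PHP 8 documentation for the opcache.jit setting, the number is four separate digits, usually abbreviated CRTO. Whether the experimental 7.4 branch used here interprets every digit in exactly the same way is an assumption on my part. The sketch below only splits the value into its digits; the function name `describe_jit` is made up for illustration.

```shell
#!/bin/bash
# Split a four-digit opcache.jit value into its C, R, T and O digits.
# Field names follow the PHP 8 documentation (assumed to apply here too).
describe_jit() {
    local value="$1"
    if [[ ! "$value" =~ ^[0-9]{4}$ ]]; then
        echo "expected four digits, got: $value" >&2
        return 1
    fi
    echo "C (CPU-specific optimization) = ${value:0:1}"
    echo "R (register allocation)       = ${value:1:1}"
    echo "T (JIT trigger)               = ${value:2:1}"
    echo "O (optimization level)        = ${value:3:1}"
}

describe_jit 1235
```

The third digit is the one that matters for the statement above: a trigger of 3 is documented as profiling on the fly and compiling only hot functions.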

Then to check their performance in relation to each other, I ran redis-benchmark -p 6380 -q against the servers (from the redis-tools package).

Don’t take my numbers for gold (I ran them on my busy desktop machine), but you can see a 4-23% improvement depending on the benchmarked command.

Benchmark (requests per second)     Nojit     Jit       % change
PING_INLINE                         30674.85  31877.59  3.92%
PING_BULK                           87873.46  95969.28  9.21%
SET                                 81766.15  87336.24  6.81%
GET                                 81433.22  91575.09  12.45%
INCR                                77881.62  83682.01  7.45%
LPUSH                               71275.84  79617.83  11.70%
RPUSH                               67294.75  79239.3   17.75%
LPOP                                73529.41  84530.86  14.96%
RPOP                                76103.5   80450.52  5.71%
SADD                                84745.77  89686.1   5.83%
HSET                                82712.98  91074.68  10.11%
SPOP                                87260.03  99700.9   14.26%
LPUSH (needed to benchmark LRANGE)  68493.15  83822.3   22.38%
LRANGE_100 (first 100 elements)     21743.86  26759.43  23.07%
LRANGE_300 (first 300 elements)     9825.11   11923.21  21.35%
LRANGE_500 (first 450 elements)     6819.42   8272.67   21.31%
LRANGE_600 (first 600 elements)     5120.33   5707.11   11.46%
MSET (10 keys)                      45998.16  52631.58  14.42%

Looking at the results in a system profiler (perf) I can see that the PHP process is spending a lot of time in I/O functions, so these numbers are not showing the full potential of the JIT with CPU bound code.

Integrate Ansible Vault with 1Password Commandline

We are using Ansible to provision and deploy Tideways in development and production, and the Ansible Vault feature to unlock secrets on production. Since we recently introduced 1Password, I integrated the two and now unlock the Ansible Vault using 1Password.

This way we can centrally change the Ansible Vault password regularly, without any of the developers with access to production/deployment needing to know the actual password.

To make this integration work, you can set up the 1Password CLI to query your 1Password vault for secrets after logging in with your password and two-factor token.

Then you only need a bash script to act as an executable Ansible Vault password file.

First, download and install the 1Password CLI according to their documentation.

Next, you need to log in with your 1Password account, explicitly passing email, domain and secret key, so that the CLI can store this information in a configuration file.

$ op signin example.1password.com me@example.com
Enter the Secret Key for me@example.com at example.1password.com: A3-**********************************
Enter the password for me@example.com at example.1password.com:
Enter your six-digit authentication code: ******

After this one-time step, you can log in more easily by just specifying op signin example, so I created an alias for this in ~/.bash_aliases (I am on Ubuntu).

alias op-signin='eval $(op signin example)'
alias op-logout='op signout && unset OP_SESSION_example'

The eval line makes sure that an environment variable OP_SESSION_example is set for this terminal/shell only. It grants temporary access to your 1Password vault in subsequent calls to the op command. You can use the op-logout alias to invalidate this session and log out.
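Why the eval is needed can be seen in a small sketch. The function `fake_signin` is a hypothetical stand-in for op signin example; it assumes that the real command prints an export statement of this shape, and the token value is a dummy.

```shell
#!/bin/bash
# Hypothetical stand-in for "op signin example": it only prints an export
# statement, which is the kind of output the alias above evaluates.
fake_signin() {
    echo 'export OP_SESSION_example="dummy-session-token"'
}

# Without eval the statement is merely printed; the variable stays unset.
fake_signin > /dev/null
echo "before eval: ${OP_SESSION_example:-<unset>}"

# With eval the printed statement is executed in the current shell.
eval "$(fake_signin)"
echo "after eval:  ${OP_SESSION_example:-<unset>}"

# Unsetting the variable, as the op-logout alias does, removes the token.
unset OP_SESSION_example
echo "after unset: ${OP_SESSION_example:-<unset>}"
```

Because the variable only lives in the shell that ran the eval, a second terminal window has no session until you sign in there as well.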

Then I create the bash script in /usr/local/bin/op-vault that is used as the Ansible Vault password file. It needs to fetch the secret and print it to standard output.

#!/bin/bash
VAULT_ID="1234"
VAULT_ANSIBLE_NAME="Ansible Vault"
op get item --vault=$VAULT_ID "$VAULT_ANSIBLE_NAME" |jq '.details.fields[] | select(.designation=="password").value' | tr -d '"'

This one-liner uses the jq command to slice the JSON output and print only the password. The tr command removes the double quotes around the password.
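To see what the filter does without touching a real vault, here is a sketch that runs it against a hypothetical, heavily shortened item. The sample JSON and both function names are made up for illustration; real op get item output contains many more fields. The second function uses jq's -r (raw output) flag as a possible alternative to tr.

```shell
#!/bin/bash
# Hypothetical sample item; only the fields the filter looks at are present.
SAMPLE_ITEM='{"details":{"fields":[
  {"designation":"username","value":"deploy"},
  {"designation":"password","value":"s3cr3t"}
]}}'

# Same filter as in the script: jq prints the value as a JSON string,
# so tr strips the surrounding double quotes.
extract_password() {
    jq '.details.fields[] | select(.designation=="password").value' | tr -d '"'
}

# Variant with jq -r, which prints the raw string without quotes and,
# unlike tr, keeps double quotes that are part of the password itself.
extract_password_raw() {
    jq -r '.details.fields[] | select(.designation=="password").value'
}

printf '%s\n' "$SAMPLE_ITEM" | extract_password
printf '%s\n' "$SAMPLE_ITEM" | extract_password_raw
```

Both calls print the same password for this sample; they only differ for passwords that contain a double quote.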

Make sure to configure the VAULT_ID and VAULT_ANSIBLE_NAME variables to point to the ID of the vault in which the secret is stored and to the name of the item in that vault. To get the UUIDs of all your vaults, type op list vaults in your CLI.

Afterwards you can unlock your Ansible Vault with 1Password by calling:

ansible-playbook --vault-password-file=/usr/local/bin/op-vault -i inventory your_playbook.yml

This only works in the current terminal/shell, and only if you have called op-signin before to enter your password and two-factor token.
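If you forget to sign in, the failure can be made more obvious with a small guard. This is only a sketch of a check that could go at the top of the op-vault script; the function name `require_op_session` is made up, and OP_SESSION_example matches the example account used above.

```shell
#!/bin/bash
# Fail early with a readable message when no 1Password session exists
# in the environment of the current shell.
require_op_session() {
    if [ -z "${OP_SESSION_example:-}" ]; then
        echo "No 1Password session found, run op-signin first." >&2
        return 1
    fi
}

if require_op_session; then
    echo "session found, safe to call op"
else
    echo "no session, skipping the op call"
fi
```

In the real script the else branch would exit with a non-zero status so that ansible-playbook stops before it tries to decrypt anything.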

More about: Deployment / DevOps / Ansible / Automation

Unslacking Tideways Company

We have moved away from Slack at Tideways over the last three months, because I found Slack annoying even with just a four-person team (plus the occasional freelancer). For me, it disrupts deep work phases, and knowledge gets lost in the depths of the chat history.

As an engineer, I have learned to be productive when I have a quiet space and can tinker on a problem without getting interrupted. Slack makes this very difficult and at least for me, is a primary cause of anxiety and fear of missing out (FOMO).

While at first sight chat seems to be asynchronous, it really is not.

  1. If you wait a long time before replying to a message on chat, the discussion thread is already scattered through the history of the chat room, interleaved with many unrelated messages, which makes it hard to follow.
  2. In addition, chat's online status indicators exist just so that you know whether someone can answer a question directly, which increases communication anxiety.
  3. If you set yourself to some kind of do-not-disturb mode or mute a channel, then it's easy to miss important conversations and get left out of discussions or decisions. Reading up on long conversations you have missed is hard with chat tools.
  4. By also sending Github commit messages, OpsGenie and Tideways alerts, excerpts of HelpScout ticket updates and other “events” of the business into Slack through various integrations, we made Slack the primary tool to check if anything is going on. While it is important to see what is going on, this almost never has to happen in realtime, and by connecting it to the chat we exacerbated the previous points: everyone is checking chat even more frequently.

Jason Fried summed it up much better than I could in this blog post.

We are now using Github (issues and pull requests) and Basecamp (Messages, Todolists) to replace Slack. Both tools allow us to have context sensitive, asynchronous discussions on specific topics.

Since work never happens in a vacuum, I would be happy to have only a single realtime notification tool (OpsGenie/PagerDuty) that sends notifications to the people currently on call, and only about problems that require realtime attention. Everything else, including chat, can be part of a daily summary e-mail or screen.

My ultimate goal is to get longer stretches of uninterrupted time to work on features, customer support or operational issues. Under the time pressure of the last three years, I realized that productive and concentrated tinkering on projects is my number one driver of happiness at work. I consider this a primary value of my company, and it takes work and distance from the current status quo to make it happen.

If you want to follow up on these ideas on your own, I can recommend the books Deep Work by Cal Newport, The Entrepreneur’s Guide to Keeping Your Sh*t Together by Sherry Walling and It Doesn’t Have to Be Crazy at Work by Jason Fried and David Heinemeier Hansson.

More about: Bootstrapping / DeepWork

How we paid the hidden costs of bootstrapping Tideways

Justin Jackson started an extremely interesting discussion about the Bootstrappers paradox and the hidden costs of bootstrapping with replies from many others.

In this series of posts (there are more) he discusses how hard bootstrapping a SaaS business is in the first few years, when you are essentially investing a lot of personal time and money until you can finally get a return on this investment and pay yourself a decent salary.

All bootstrappers have to deal with the same tension: can I get this to
scale, while paying my bills, without burning out?

This is something I think about a lot to understand the journey and history of Tideways, and by writing down my thoughts I can share my unconventional route for “bootstrapping” a SaaS.

In his post Justin projects the growth to a sustainable $ 20.000 Monthly Recurring Revenue (MRR) to take 5 years, which is much too long for my taste. We projected to reach 20.000€ MRR (~ $23.000) after three years, and missing this target would probably have meant a re-evaluation, maybe even a shutdown or sale of Tideways.

We reached this goal this summer, after almost exactly three years.

What went into funding Tideways instead of money?

  • I was able to invest 50+ hours/week and live on 50-75% of my previous salary for two years only because I don’t have loans to pay off; my wife and I have no kids, no cars and no other large monthly expenses. We saved up a lot of money that we partially invested into the company and partially used for personal expenses during the first two years. Trading your own personal time for bootstrapping a business is probably the largest, most hidden cost, and I am pretty sure that I am never going to do this again.
  • My then employer Qafoo allowed me to work on Tideways a few days per month and invested a lot of their own time after we decided to found a company together and test product market fit. In return they now own shares of the business.
  • We struck a deal with our first business partner SysEleven: they provided hosting for Tideways in return for highly discounted licenses for almost all their customers. This helped us with the large four-digit monthly hosting costs that we had from day one, because as a monitoring company you need more servers than other SaaS businesses. In return for investing in us, they now have us as a happy paying customer, and they still get a good deal on licenses for their customers.
  • With over 7000 Twitter followers from my Doctrine open source project days and my reach within the European and German PHP community, I had accidentally already built a large audience of potential customers for Tideways. We didn’t need to invest more time into building an audience, and we luckily never had to pay for customer acquisition. However, realistically I have invested hundreds of unpaid hours into my open source work since 2009, which allowed me to build this audience.

I consider points 2 & 3 “unconventional” ways of bootstrapping Tideways, and already having an audience (4) a very long-term investment that I didn’t plan and got lucky with.

Roughly summing up these investments now, they amount to costs of around 400.000-500.000€ that Tideways would have required funding for.

In hindsight, raising 500.000€ might have been possible, but without previous founding experience and with no connections to investors, I don’t know if I would have succeeded.

I consider our approach similar to Fundstrapping but instead of money you raise freedom, time and resources from your investors.

More specifically, my advice is to find strategic early business partners that either become your first large customer from day one, or make your product available to their large customer base.

More about: Bootstrapping