Clean Code and Object Calisthenics Rules I try to Follow

I do a lot of code reviews, and without automating most of the low-level items that you usually end up “remarking” on to colleagues, it is a frustrating experience for everyone involved and takes more time than needed.

My natural tendency is towards action over planning, so I find arguments over coding style a waste of time if sufficient standards exist that you can just adopt. In my opinion this is something team leads and managers need to rule on, so that their teams don’t become unproductive in little coding-standard and style discussions.

Choosing a set of strict rules is very important. Enforcing them helps reduce the time spent on unimportant details during code review and allows more junior developers to prepare their changes in a way that most feedback can be automated.

Coding Styles alone are not enough in most cases. There should also be rules with respect to what type of language constructs and building blocks are preferred over others.

Equipped with a fixed ruleset, you can automate the low-level parts of code review in such a way that every developer can perform it themselves and fix violations before you even get to see the code in the first place.

In this post I give a (probably incomplete) overview of the rules and patterns we follow for the Tideways code base.

A strict coding style

The first thing I put in place when Tideways hired its first engineer was PHP Code Sniffer with a slightly adapted Doctrine Standard (PSR-2 + a lot more), coupled with static analysis using Psalm. Combined with the Github Checks Run API and Scott’s diff filter tool, all the violations are shown directly within Github commits and Pull Requests.

../../../_images/github-check1.png

Using the phpcbf command and a lot of manual work we are down to zero violations. If you start using a standard from the beginning this cleanup won’t be necessary, but strict enforcement of a coding standard is not something I was used to before.

Violation check runs on Github include a message with a command to copy paste and auto-fix violations if available for PHP Code Sniffer, ESLint and TSLint.

We also have PHPStorm configurations to integrate each tool into the IDE.

In my opinion PSR-2 is a good first step, but it leaves a lot of details open that the Doctrine Coding Standard locks down further. A lot of those rules are very strict and we had to disable them on our code base. If you are interested, our phpcs rule definitions are in this Gist.

Besides “simple” code style checks, we have some clean code guidelines that we don’t check with phpcs, because they are not black and white but require some deliberation.

For Javascript we enforce the strict defaults of ESLint/TSLint and for Go the default coding standard that go fmt produces.

A static code analyzer, but not at highest level

We use Psalm for static code analysis. It automatically tries to find out the types of all variables in your application and then checks hundreds of different rules that might indicate bugs or improve the design of your application.

While type safety is awesome, without generics, forcing Psalm or PHPStan to the highest level leads to a lot of clutter and boilerplate code that is only necessary to make the tool happy.

This is why I recommend failing the build only on the obvious rules and leaving the other ones as warnings that are visible to the developer in the IDE.

We use a lenient psalm.xml and the Baseline feature to get down to zero errors. Every error fails the build.

Take very good care of API Information Hiding and Encapsulation

A lot of accidental complexity in code comes from exposing implementation details to the outer layers of your application. This is mostly due to function arguments or return values that expose the implementation details.

This concept is called Information Hiding.

When adding arguments or return values think about the client calling the function:

  • Do they only get the information required to work, or does it return additional information (extra array keys, objects) that is unnecessary?
  • Does a function argument expose the internal implementation of a data structure? Can you turn it into a simpler input that is converted to the desired variable internally?
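
A minimal sketch of the two questions above. The class and method names (ServerStatusReader, getRow, isOnline) are hypothetical and not taken from the Tideways code base:

```php
<?php

// Hypothetical example: internally the data is an array of rows,
// but callers of isOnline() never see that structure.
final class ServerStatusReader
{
    /** @var array<string, array{status: string, lastSeen: int}> */
    private $rows;

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    // Leaky variant: returns the whole internal row, including keys the caller does not need.
    public function getRow(string $serverName): array
    {
        return $this->rows[$serverName] ?? [];
    }

    // Narrow variant: the caller gets exactly the answer it asked for.
    public function isOnline(string $serverName): bool
    {
        return ($this->rows[$serverName]['status'] ?? 'offline') === 'online';
    }
}

$reader = new ServerStatusReader([
    'web-1' => ['status' => 'online', 'lastSeen' => 1580000000],
]);

var_dump($reader->isOnline('web-1')); // bool(true)
var_dump($reader->isOnline('web-2')); // bool(false)
```

With the narrow variant, the storage format can change later without touching any caller.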

This is an extremely difficult skill to master and takes a lot of practice.

To New or Not to New?

One of the most influential blog posts on me has been To new or not to new by Misko Hevery.

It clearly defines which types of objects can be constructed with new in place (data objects, entities, value objects) and which ones should also be injected and centrally constructed by a factory object or a dependency injection container.
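
A small sketch of that distinction, assuming hypothetical classes (Money, PaymentGateway, Checkout) that are not from the blog post or the Tideways code base:

```php
<?php

// Value object: safe to create with `new` wherever it is needed.
final class Money
{
    /** @var int */
    public $cents;

    public function __construct(int $cents)
    {
        $this->cents = $cents;
    }
}

// Service: has behaviour and collaborators, so it gets injected.
interface PaymentGateway
{
    public function charge(Money $amount): bool;
}

final class Checkout
{
    /** @var PaymentGateway */
    private $gateway;

    // The gateway is passed in, never constructed here.
    public function __construct(PaymentGateway $gateway)
    {
        $this->gateway = $gateway;
    }

    public function pay(int $cents): bool
    {
        // Creating the value object in place is fine.
        return $this->gateway->charge(new Money($cents));
    }
}

// Fake gateway for demonstration; in an application a factory or container would wire the real one.
final class AlwaysSuccessfulGateway implements PaymentGateway
{
    public function charge(Money $amount): bool
    {
        return $amount->cents > 0;
    }
}

$checkout = new Checkout(new AlwaysSuccessfulGateway());

var_dump($checkout->pay(500)); // bool(true)
```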

Separate Statements and Declarations by empty lines

Blocks of statements and blocks of declarations usually get separated by an empty line, to make a very visual distinction. Statements with a different context tend to be split by empty lines as well. See this example from one of our controllers, where each statement and declaration is in its own paragraph.

<?php

 public function deleteAction(PageContext $context, $serverName)
 {
     $server = $this->findServer($serverName, $context);

     $this->serverRepository->remove($server);

     $this->jobQueue->queue(new BalanceServerTask(['applicationId' => $context->application->getId()]));

     return $context->redirectRoute('XhprofProfilerBundle.Server.environment');
 }

I would define paragraphs as statements or declarations that belong together. This concept closely follows paragraphs in natural languages and is easy to grasp and follow.

Use early exits and don’t use else

Code almost never needs the else keyword. It can be written to use early exits with return or continue instead. This rule keeps the main/happy path of the code on the main level of the method. The following code could have been written with an else, but uses an early return instead.

<?php

 public function formatQueryString(string $query): string
 {
     if (strlen($query) === 0) {
         return '';
     }

     parse_str($query, $queryArgs);

     $queryArgs = $this->truncateArrayValues($queryArgs);

     return json_encode($queryArgs, JSON_PRETTY_PRINT);
 }

For loops we continue to the next iteration for early exits:

<?php

 private function parseErrorTrace(string $trace, $removeArguments = false)
 {
     $traceResult = [];

     $parts = explode("\n", $trace);
     foreach ($parts as $line) {
         if (strpos($line, '{main}') !== false) {
             continue;
         }

         if (strpos($line, '#') !== 0) {
             continue;
         }

         // more
     }
 }

Only one level of indentation per method and a second for early exits

Another rule I try to follow is “only one level of indentation per method” from the Object Calisthenics rulebook.

Guilherme from the Doctrine team introduced me to Object Calisthenics many years ago and I found it a very good guidestick to improve my code without having to think too much.

The deleteAction in the controller example above follows this rule, having no additional level of indentation. This second example shows a method with one level of indentation, using the “Guard clause” pattern instead of nested conditions (see the refactoring):

<?php

 public function process(Task $task)
 {
     $application = $this->organizationRepository->findApplication($task->applicationId);

     if (!$application) {
         return;
     }

     $retentionDays = $application->getRetentionDays();
     $daysDiff = (new \DateTime('now'))->diff($task->date)->days;

     if ($daysDiff > $retentionDays) {
         return;
     }

     $this->historicalDataAction->compute($task->applicationId, $task->date, $task->overwrite);
 }

When methods use loops, two levels of indentation are ok if the second level is used only for guard clauses.

<?php
 $lastProductionCommunication = null;

 foreach ($servers as $server) {
     if ($server->getEnvironment() !== 'production') {
         continue;
     }

     if (!$server->getLastCommunication()) {
         continue;
     }

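     // assumption: $date is this server's last communication; the original snippet leaves it undefined
     $date = $server->getLastCommunication();

     if ($lastProductionCommunication !== null && $lastProductionCommunication > $date) {
         continue;
     }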

     $lastProductionCommunication = $date;
 }

It is not easy to follow this rule to the letter, and we do have a lot of code in Tideways that doesn’t follow it, as we don’t enforce it. But I view a violation of the one-level rule as an automatic, immediate signal of both complexity and potential for refactoring.

Don’t abbreviate names (too much)

This rule is easier in PHP than in Go (by design). As we use both languages, we force ourselves to write out what a variable, class, constant or function is doing, and avoid abbreviations.

func (previous PreviousSpans) GetLastIndex(category, summary string) int {
    if entry, ok := previous[category]; ok {
        if entry.Summary == summary {
            return entry.Index
        }
    }
    return -1
}

Certain names still get abbreviated though, for example i, idx, j and other common ones.

Prefer Null Coalesce Operator over if isset

We still work with a lot of arrays, because PHP makes it much easier to use them than complex types (objects). The risk of using arrays is not knowing if a key exists, and code being littered with if (isset($data['key'])) { statements all over the place. It is much better to use the null coalesce operator ?? to define default values for keys that you don’t know exist on an array:

<?php
 $hostName = $serverNames[$currentOccurrance['sId']] ?? 'unknown';

Don’t allow a variable to have two types

If something doesn’t exist it is NULL or false right? Wrong, if you care about simplicity of your code.

  • If a method is supposed to return an object but can’t, it may be better to throw an exception.
  • If an array is empty, then it is empty, not null or false. Iterating over an empty array can correctly do nothing.

I prefer methods written in one of two ways: either they handle only the default (happy) path and throw exceptions otherwise, or they define all variables in such a way that even the “unhappy/error” paths can run successfully through the happy code path and yield exactly the right result.
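
Both variants can be sketched like this. The function names and the ServerNotFound exception are hypothetical, chosen only for illustration:

```php
<?php

final class ServerNotFound extends \RuntimeException
{
}

/**
 * Always returns an array, never null or false.
 *
 * @param array<string, array<string>> $tagsByServer
 * @return array<string>
 */
function findTags(array $tagsByServer, string $serverName): array
{
    return $tagsByServer[$serverName] ?? [];
}

/**
 * Returns a string or throws, never null.
 *
 * @param array<string, string> $environments
 */
function getEnvironment(array $environments, string $serverName): string
{
    if (!isset($environments[$serverName])) {
        throw new ServerNotFound('Unknown server: ' . $serverName);
    }

    return $environments[$serverName];
}

// The caller needs no null check: iterating over an empty array does nothing.
$count = 0;

foreach (findTags(['web-1' => ['php', 'nginx']], 'web-2') as $tag) {
    $count++;
}

var_dump($count); // int(0)
```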

Don’t re-use a variable with a different type

PHP allows this, but it can lead to very confusing code if you re-use the same variable with different types in a method or function. Avoid at all costs.

<?php
 public function deleteAction(PageContext $context, string $serverName)
 {
     $server = $this->findServer($serverName, $context);

In the previous block the argument $serverName could easily have been called $server as well and would then be both a string and an object of type Server within different parts of the method.

Typehint Collection Items with assert

PHP has no way to typehint the items of collections (for example arrays, iterators). You either typehint array, \Doctrine\Common\Collections\Collection or any other class in your methods, with no way to specify the type of the key or value once you start iterating with foreach.

There are various docblock-supported styles to define these types, which I recommend using on class properties and return values, such as:

<?php
 class XhprofReport extends Struct
 {
     /** @var array<string> */
     public $namespaces = [];
 }

But to help both PHPStorm and Psalm (our static code analyser) understand the items of a collection, we use assert() with instanceof in places where we cannot define the typehint on the return value ourselves, for example when using Doctrine repositories:

<?php
 $organizations = $organizationRepository->findAll();

 foreach ($organizations as $organization) {
     assert($organization instanceof Organization);

     // ..
 }

Use structs to wrap arrays (especially from SQL results)

To avoid typical complex array problems such as not knowing what keys exist or what types values have, arrays should always be wrapped in objects extending from a Struct base class.

The struct base class maps an array to properties of the class in the constructor:

<?php
 abstract class Struct
 {
     public function __construct(array $data = [])
     {
         foreach ($data as $property => $value) {
             if (!property_exists($this, $property)) {
                 throw new \RuntimeException();
             }

             $this->$property = $value;
         }
     }
 }
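
A short usage sketch of such a base class, assuming a hypothetical ServerRow struct. The base class is repeated so the example runs on its own:

```php
<?php

// Same base class as above, repeated so the example is self-contained.
abstract class Struct
{
    public function __construct(array $data = [])
    {
        foreach ($data as $property => $value) {
            if (!property_exists($this, $property)) {
                throw new \RuntimeException();
            }

            $this->$property = $value;
        }
    }
}

// Hypothetical struct for a SQL result row.
class ServerRow extends Struct
{
    /** @var int */
    public $id;
    /** @var string */
    public $environment = 'production';
}

$row = new ServerRow(['id' => 42]);

var_dump($row->id);          // int(42)
var_dump($row->environment); // string(10) "production"

// A typo in a key fails loudly instead of silently creating a new property.
$rejected = false;

try {
    new ServerRow(['enviroment' => 'staging']);
} catch (\RuntimeException $e) {
    $rejected = true;
}

var_dump($rejected); // bool(true)
```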

Often data in Tideways is best represented in domain objects that don’t fit the entity or database model 100%. In those cases we manually map the database to the desired objects using this Struct pattern for simplicity.

Take this ServerConfiguration struct which is a combination of data from three Doctrine entities and is assembled in a repository from custom SQL.

<?php
class ServerConfiguration extends Struct
{
    /** @var int */
    public $id;
    /** @var string */
    public $environment;
    /** @var int */
    public $tracesPerMinute;
    /** @var bool */
    public $disabled;
    /** @var bool */
    public $newlyCreated = false;
    // ...
}

I am really looking forward to deploying PHP 7.4 with typed properties, where this pattern will become much more powerful and will allow us to write a “mini orm” for mapping SQL results to typed objects.
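
As a rough sketch of what typed properties add to this pattern (requires PHP 7.4 or later). The class name is hypothetical and this is only an illustration, not the “mini orm” mentioned above:

```php
<?php

// Hypothetical struct using PHP 7.4 typed properties.
class TypedServerConfiguration
{
    public int $id = 0;
    public string $environment = '';
    public bool $disabled = false;

    public function __construct(array $data = [])
    {
        foreach ($data as $property => $value) {
            if (!property_exists($this, $property)) {
                throw new \RuntimeException('Unknown property ' . $property);
            }

            $this->$property = $value;
        }
    }
}

// SQL drivers often return numbers as strings. Without strict_types,
// PHP coerces them to the declared property type on assignment.
$config = new TypedServerConfiguration(['id' => '7', 'disabled' => 0]);

var_dump($config->id);       // int(7)
var_dump($config->disabled); // bool(false)

// A value that cannot be coerced raises a TypeError instead of being stored.
$rejected = false;

try {
    new TypedServerConfiguration(['id' => 'not-a-number']);
} catch (\TypeError $e) {
    $rejected = true;
}

var_dump($rejected); // bool(true)
```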

You can read more about Struct objects on the Qafoo blog and look into Kore’s DataObject package that implements this pattern in a reusable Composer package.

Firegento Hackathon Mainz 2020

This weekend I attended the Firegento Hackathon at Netz98 in Mainz. Due to the storm I had to leave early and skip the presentation of my project, so I want to post a short summary of my work here on the blog.

I want to thank everyone from the Firegento organizer team, the hosts and sponsors. The event was really great and I was happy to get to know new people from the community.

Magento 2 Performance and Profiling

I proposed a project to work on Magento 2 Performance and Profiling and we started working on it on and off with a group of 6-7 people, which is quite an awesome crowd!

The biggest achievement was that we could introduce PHP profiling to a few people who had never used profilers before. After a general overview of how different profiling approaches work, we looked at Tideways and Xdebug as examples. It was really awesome to see the reactions to what is actually possible with PHP and that it’s possible to gather this data.

We looked at some data from production shops with performance problems on the category page with a lot of N+1 Product loads and Andreas explained how they refactored this bottleneck away.

With input from several people we poked into Magento 2 (demo shop) and found two things that seem worth investigating more:

  1. The Redis cache implementation could use a more efficient implementation for the app/config cache, where items are quite static over time. We saw 80-100ms of Redis calls for these caches across different shops (not only demo) that could be cut down by a more specialized cache implementation.

    I attempted a prototype for this but realized it probably needs 2-3 days of concentrated work to get it working.

    One thing that Symfony + Doctrine benefit from is caches that write and execute PHP files so that Opcache can make use of interned strings, packed and immutable arrays internally.

  2. There is a ProductViewCounter UI block that renders JSON into the output of each page. This widget is used for the feature to show the last viewed/compared products and is also responsible for the view counter in the backend. These features are often not used, but the output of this UI block is still rendered into every page anyway.

    The serialization takes about 60-80ms, so there is potential here for a performance gain if there was a setting to disable this feature completely.
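
The idea from the first item, a cache that writes and executes PHP files so that Opcache can hold the data, can be sketched in a few lines. This is a generic illustration with made-up config data and a temporary file path, not the actual cache implementation of Magento, Symfony or Doctrine:

```php
<?php

// Hypothetical config data that rarely changes.
$config = [
    'store' => ['name' => 'Demo Shop', 'currency' => 'EUR'],
    'features' => ['wishlist' => true, 'compare' => false],
];

$cacheFile = sys_get_temp_dir() . '/config-cache-' . getmypid() . '.php';

// Write the array as PHP source code. var_export() produces valid PHP for arrays and scalars.
$source = '<?php return ' . var_export($config, true) . ';';

// Write to a temporary file first and rename it, so readers never see a half-written file.
$tmpFile = $cacheFile . '.tmp';
file_put_contents($tmpFile, $source);
rename($tmpFile, $cacheFile);

// Loading the cache is a plain require. With Opcache enabled, the compiled
// file, including its constant arrays, can be served from shared memory.
$loaded = require $cacheFile;

var_dump($loaded === $config); // bool(true)

unlink($cacheFile);
```

Compared to unserializing a value fetched from Redis on every request, this trades a network round trip for a file that only has to be rewritten when the config changes.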

While working with Magento 2 shops in Tideways, we realized there are a lot of small optimizations that we can make to the callgraph profiler. There is room to add more specialized instrumentation and improve Magento 2 results. With the way Magento 2 uses interceptors (AOP) the profiling call stacks are sometimes really hard to understand, so I have written down around 10 improvement ideas we can add to the Profiler. Expect the results to go into Tideways over the next weeks and months.

This affects things like how Closures are rendered, or skipping them in the callgraph. And for a lot of calls in Magento, the actual function name should be augmented with more information, or changed to a different name to make it easier to grasp without knowing the details of Magento 2 interceptor magic.


A simple, responsive feature per plan table using CSS Grids

For a long time I was extremely unhappy with the unresponsiveness of the features list in Tideways. The landing page has used Bootstrap 3 since the beginning and its table-responsive class is supposed to help here, but it requires visitors to scroll and lose context. See a screenshot here of the mobile version of our features table.

https://beberlei.de/_static/cssgrid1.png

For a large table I can’t see which feature and plan a cell belongs to while scrolling around. At the size of the Tideways feature table, this is relatively useless in my opinion.

I came across the Webflow Pricing Page a few days ago and got inspired to redesign the page to use CSS Grids and Sticky Positioning. It took me a while to get my head around the CSS and understand the concepts, and I then started from scratch to try to come up with a bare bones solution.

In this blog post I try to explain the solution in my own words. I am by no means a CSS expert, so take all the explanations with a grain of salt. I am linking as many Mozilla Developer docs as possible for reference.

First, I want to use an ordered list for semantic reasons. I then need to group the feature name and its availability in different plans by splitting each list-item into several cells.

<section class="features">
   <ol>
      <li class="header">
         <div>Features</div>
         <div>Free</div>
         <div>Pro</div>
      </li>
      <li>
         <div>Simple A</div>
         <div>Yes</div>
         <div>Yes</div>
      </li>
      <li>
         <div>Fancy B</div>
         <div>No</div>
         <div>Yes</div>
      </li>
   </ol>
</section>

First we look at the style of the list item:

section.features ol li {
   /* hide the ordered list item numbers */
   list-style-type: none;

   /* set element to grid mode rendering */
   display: grid;

   /* grid has 3 columns with a pre-defined width */
   grid-template-columns: 50% 25% 25%;
}

The magic here is the grid-template-columns directive that can be thought of similar to defining the number and width of table columns.

Next we modify the .header class so that it sticks to the top of the screen for as long as the features table is visible, using sticky positioning.

section.features ol li.header {
   position: sticky;

   /* this must be modified if you have a static top navigation for example */
   top: 0px;

   /* hide feature cells when header is "over" them */
   background-color: #fff;

   /* some styling to make the header stand out a little from the features */
   border-bottom: 1px solid #666;
   font-weight: bold;
}

Lastly we align all text to center in all divs:

section.features ol li div {
    text-align: center;
}

This makes the feature table work nicely on desktop browsers. The white background is necessary so that the features that scroll under the header are not visible anymore.

Now to the responsive part: we use the CSS grid to change the three-column row into two rows, with the feature label spanning the width of both cells that indicate feature availability per plan.

@media(max-width: 672px) {
    section.features ol li {
        /* redefine the grid to have only two columns */
        grid-template-columns: 50% 50%;
        /* define two template "rows per grid" (this is my murky understanding) */
        grid-template-rows: auto auto;
    }

    section.features ol li div:nth-child(1) {
        /* define first div (cell) to be 2 columns wide and span a whole row */
        grid-column-start: 1;
        grid-column-end: 3;
        grid-row-start: 1;
        grid-row-end: 2;

        border-bottom: 1px solid #000;
    }
}

The magic is in the grid-column-start (Mozilla Docs) and grid-column-end directives that sort of act like colspan in tables. In addition the possibility to change a grid from one to two rows just with CSS does the rest of the trick here.

You can see a full code example of this blog post’s feature table and the re-designed Tideways feature table in action.

https://beberlei.de/_static/cssgrid3.png

Let me know if there are mistakes in my CSS or ways to simplify even further by contacting me under kontakt@beberlei.de.


P++ is a bad idea for non-technical reasons

Last week the idea of changing PHP to include two languages “PHP” (Classic) and “P++” was proposed on the internals mailing list by Zeev and in more detail by a FAQ answering questions. For context, Zeev is one of the original authors of the Zend Engine in PHP 3 and 4 and co-founder of the Zend company, so the proposal got a lot of responses and additional discussions on Reddit and HackerNews.

The goal of the proposal is to find a way to evolve the PHP language using a new dialect (P++) and stay backwards compatible by continuing to support the old dialect (PHP).

Zeev proposes a new tag <?p++ that sets the PHP compiler into a different mode than <?php now does, providing a “clean start” with BC breaks and stricter typing syntax.

tl;dr: The proposal for P++ is at the core a proposal to have the PHP runtime support multiple versions of the PHP language at the same time. Other languages already have this distinction for ages (C with 89, 99, 11, …). By going with a language version number instead of a new language name, we can avoid a lot of non-technical issues with P++.

I will start with a few non-technical arguments why I think it is a bad idea to introduce a distinct language called “P++”, or any other name, with its own brand:

  • From a “governing” perspective, introducing P++ is like a big bang that would force the community on a path without knowing all the implementation details up front. This goes against the current governing model of PHP where each larger technical decision is made democratically using the RFC process. At this point we have to respect and accept the fact that without a benevolent dictator, you cannot make these big bang changes in an open source project anymore. Improvements have to be made in incremental steps and P++ would not fit into this model.
  • From an evolutionary perspective, the premise that the PHP community and internal teams can design a new language from the ivory tower and get the details right the first time is a pretence of knowledge fallacy. It is much more likely that mistakes are made and then in 5 years we are back with the same problem. The P++ proposal sounds like a perpetual experimental version. It would be better to find a long term strategy to cope with incremental change to the language instead of a big bang change every 10 years.
  • From a marketing perspective, a new brand “P++” is going to be extremely hard to bring to the market. With “the PHP company” Zend swallowed by larger companies, there is no company with the primary goal of bringing forward the language anymore. PHP is truly a community effort now, without even a foundation. There is no centralized body that can effectively lead the marketing effort for this new P++ brand. We are not in 1979 anymore, when C++ was invented. The language market is highly contested and we as the PHP community are protected by PHP’s enormous market share, which we should not give up by fragmenting.
  • I recognize “P++” is just a working name right now, and a name without special characters is certainly a better idea. But a name different from PHP introduces even more problems w.r.t. SEO/Google, and with the way the PHP project is organized right now, there isn’t even a good process defined that would lead to a great outcome.
  • From a documentation perspective, one of PHP’s unique selling points is its awesome docs living on “php.net”. As both dialects PHP and P++ would run on the same engine, it becomes much harder to represent this on the website. Here the argument that the P++ project is feasible even with few internal developers falls apart. It would require a completely overhauled new website, an approach to represent both dialects sufficiently without confusing users, new mailing lists, new everything.
  • Also from a documentation perspective, assume P++ were to break BC on Core APIs compared to PHP. Would php.net/strpos show the PHP and the P++ function signature with haystack and needle switched? Or would we need to copy the entire documentation? This would be a huge documentation team effort whose time hasn’t been accounted for by the P++ FAQ/proposal.
  • From a teaching perspective, code examples in the wild on blogs, mailing lists and other sources would often need to make an extra effort to target either PHP, P++ or both. Knowledge would become clustered into two groups.
  • From an ecosystem perspective, a second name/brand would complicate everything for third party vendors, conferences, magazines. Examples: “PHPStorm, the lightning smart PHP & P++ IDE”, “Xdebug - Debugger and Profiler for PHP and P++”, “Dutch PHP and P++ conference”, “PHP and P++ Magazine”. We would probably need to introduce another name for the runtime, say PVM, to allow making a precise distinction. This adds even more confusion.
  • From an SEO perspective, Google and other search engines are a primary tool for software developers. If PHP and P++ now start fragmenting the community, it becomes much harder for developers to find solutions to problems, because “PHP sort array” will not find the articles on “P++ sort array” that offer the same solution.
  • A long time ago, PHP was described to me as the Borg of programming languages. Assimilating APIs, features, paradigms from everywhere. This is still a very good analogy. And today it supports even more paradigms than 15 years ago and gives users extreme freedom to choose between dynamic or strict typing. This has been done in a way with as few BC breaks as possible. Python 3 and Perl 6 are examples of languages that made it much much harder for users to upgrade. I don’t see why suddenly now this approach is not possible anymore and requires two separate dialects.

The P++ proposal makes two analogies to arrive at the P++ idea, but they are both flawed in my opinion:

  • The analogy that P++ is to PHP what C++ is to C is wrong. C++ introduced a completely new paradigm (object oriented programming). P++ as proposed is PHP with some BC breaks. It’s more comparable to Python 2 to 3.
  • The analogy that P++ is to PHP what ES6 is to ES5 is wrong. ES6 and ES5 are versions like PHP 5 and PHP 7 are. ECMAScript is much better at not breaking backwards compatibility than PHP is, but the language by design makes this easier. You can still write Javascript with just ES5 syntax on every ES6 and ES7 compiler. The same is true of PHP 7, where you can still write code that would also run on PHP 3, PHP 4 and PHP 5.

With these arguments I have hopefully established why separating PHP into two separate dialects is not a good idea.

An Alternative Approach

But what are the alternatives to evolve the PHP language?

PHP could avoid all the non-technical problems that P++ would introduce by going with an approach like C, C++, ECMAScript or Rust have: Define different versions of the language that the Runtime/Compiler can all support. Currently PHP combines runtime and language and upgrading to PHP 7 runtime requires you to update your code to PHP 7 semantics.

In C you tell the compiler which version of the standard a file should be compiled against:

gcc -std=c89 file.c
gcc -std=c99 file.c

And then you can combine their output to a new binary which includes code compiled with both versions.

Rust has a similar concept named editions. In ECMAScript you use a third party compiler (like Babel) to compile one version down into another.

Essentially the proposed semantics of P++ boil down to defining a new version of the PHP language; they don’t warrant a new language.

If we allow the PHP runtime to support several standards at the same time we can avoid fragmentation of the community, avoiding all the non-technical issues listed above.

PHP already uses declare for this kind of decision at the moment, so it would be natural to introduce a new declare option and make it responsible for switching the compiler between different versions. Example with made up option name and version:

<?php declare(std=20);

This could be defined to automatically include strict_types=1, but also include some cleanup to type juggling rules for example. The sky is the limit. If we improve the language for the next version, we can introduce the next standard version, but the compiler could still support the old ones for a few years.
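
For comparison, the existing strict_types declare already switches semantics per file in this way. The sketch below only uses that real option; the std option above remains made up, and the double function is just a stand-in:

```php
<?php

declare(strict_types=1);

function double(int $value): int
{
    return $value * 2;
}

var_dump(double(21)); // int(42)

// With strict_types=1 in this file, passing a numeric string is a TypeError.
// Without the declare, the same call would be coerced to int and succeed.
$strictError = false;

try {
    double('21');
} catch (\TypeError $e) {
    $strictError = true;
}

var_dump($strictError); // bool(true)
```

The declare only affects calls made from the file that contains it, which is the same per-file granularity a language version switch would need.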

PHP users could upgrade to the latest version of the PHP runtime, get security patches, bugfixes, and performance improvements, but can keep the semantics of the version their software was written against. This would simplify the process of keeping backwards compatibility.

Deciding on the actual naming and syntax would be a minor technical problem.

The JIT in relation to PHP extensions

A few days ago I posted about Playing with the PHP JIT and included some simple benchmarking with the react-php-redis server project, which involves a lot of parsing but is ultimately still bound by I/O even when running async.

I got some questions on Twitter that revolve around some misconceptions of what the JIT really can do for PHP applications and what it cannot do.

So to show what the JIT is good for, I wanted to have a truly CPU-bound problem that was realistic from my POV.

Inside Tideways we use a datatype called HDRHistogram (high dynamic range histogram), a statistical datatype to calculate exact percentiles in monitoring data. For each minute and server we might have a histogram and when rendering a chart, we merge and aggregate this data in large numbers.
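
To give an idea of the record, merge and percentile operations involved, here is a deliberately simplified sketch. It keeps one counter per distinct integer value and is not the HdrHistogram algorithm or the code used in Tideways; the class and method names are hypothetical:

```php
<?php

// Simplified histogram: one counter per distinct integer value.
// NOT the HdrHistogram bucket layout, only the same three operations.
final class SimpleHistogram
{
    /** @var array<int, int> value => count */
    private $counts = [];
    /** @var int */
    private $total = 0;

    public function record(int $value): void
    {
        $this->counts[$value] = ($this->counts[$value] ?? 0) + 1;
        $this->total++;
    }

    public function merge(SimpleHistogram $other): void
    {
        foreach ($other->counts as $value => $count) {
            $this->counts[$value] = ($this->counts[$value] ?? 0) + $count;
        }

        $this->total += $other->total;
    }

    public function valueAtPercentile(float $percentile): int
    {
        if ($this->total === 0) {
            throw new \LogicException('Histogram is empty');
        }

        ksort($this->counts);

        // Number of recorded values that must be at or below the result.
        $threshold = (int) ceil($percentile * $this->total / 100);
        $seen = 0;

        foreach ($this->counts as $value => $count) {
            $seen += $count;

            if ($seen >= $threshold) {
                return $value;
            }
        }

        return (int) array_key_last($this->counts);
    }
}

// One histogram per minute, merged before computing the percentile.
$minuteOne = new SimpleHistogram();
$minuteTwo = new SimpleHistogram();

for ($i = 1; $i <= 100; $i++) {
    $minuteOne->record($i);
    $minuteTwo->record($i + 100);
}

$minuteOne->merge($minuteTwo);

var_dump($minuteOne->valueAtPercentile(95.0)); // int(190)
```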

At the moment we use a PHP Extension interfacing with a C library to use this datatype.

I have ported the necessary code to PHP to test this with the JIT, without the JIT and against the PHP extension.

<?php

function simulate_hdr() {
    $hdr = hdr_init(1, 1000, 1);
    for ($i = 1; $i < 1000; $i++) {
        for ($j = 0; $j < 1000; $j++) {
            hdr_record_value($hdr, $i);
        }
    }
    hdr_value_at_percentile($hdr, 95);
}


for ($i = 0; $i < 5; $i++) {
    $time = microtime(true);
    simulate_hdr();

    echo number_format(microtime(true) - $time, 4) . "\n";
}

Again take the numbers with a grain of salt, these are just here to show the approximate relationships:

Runs      PHP nojit   PHP jit   C/PHP Ext
1         0.5916      0.3671    0.0775
2         0.6322      0.4038    0.0775
3         0.6025      0.3866    0.0799
4         0.6010      0.3892    0.0829
5         0.6137      0.3947    0.0828
Average   0.6082      0.3883    0.0801
%         100.00%     63.84%    13.17%

As you can see, the JIT code needs roughly 2/3 (63.84%) of the time of the original non-jitted code, which is about 1.6 times as fast and gets into the region of the “twice as fast” that the RFC claims for PHP’s internal benchmark. The improvement is much better than with the react-php-redis server example from a few days ago, where the improvement was only in the 5-20% region.

But compared to implementing this code directly in C as a PHP extension, even the jitted code is still 5 times slower.
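
The relationships in the table can be recomputed from the raw timings. A small sketch, with the numbers copied from the table above:

```php
<?php

// Timings in seconds, copied from the benchmark table.
$runs = [
    'nojit' => [0.5916, 0.6322, 0.6025, 0.6010, 0.6137],
    'jit'   => [0.3671, 0.4038, 0.3866, 0.3892, 0.3947],
    'ext'   => [0.0775, 0.0775, 0.0799, 0.0829, 0.0828],
];

$averages = [];

foreach ($runs as $name => $timings) {
    $averages[$name] = array_sum($timings) / count($timings);
}

foreach ($averages as $name => $average) {
    printf(
        "%-5s avg %.4fs  %6.2f%% of nojit\n",
        $name,
        $average,
        $average / $averages['nojit'] * 100
    );
}

// How much faster each variant is than the next slower one.
printf("jit speedup over nojit: %.2fx\n", $averages['nojit'] / $averages['jit']);
printf("ext speedup over jit:   %.2fx\n", $averages['jit'] / $averages['ext']);
```

On these numbers the JIT comes out at roughly 1.6 times the speed of the non-jitted run, and the extension at a little under 5 times the speed of the jitted code.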

Yes, with the JIT there is a massive improvement of this CPU bound problem, but it doesn’t mean we can now re-implement all PHP extensions in pure PHP and rely on the JIT to make them perform.

What the JIT does improve:

  • Make the parts of CPU bound problems that are written in PHP (!) faster.

What the JIT does not improve:

  • It does not improve performance of already fast internal functions written in C, for example hashing, encryption functions.
  • It does not improve performance (by a lot) for I/O bound problems.

To close the gap between JIT and C, we could look at the FFI extension included in PHP 7.4. It allows interfacing with C code more easily from PHP. Anthony Ferrara is building his “php-compiler” project on top of FFI, which would allow compiling a subset of PHP code directly to an FFI C extension.