I was unaware that fseek actually makes a syscall which is rather costly, which became painfully obvious during large world conversions on PM4.
On average this problem appeared to be adding about 5ms to the load time for a newly loaded region, which is insanely expensive.
In the old days, we used to try to correct this problem by adjusting the region header to match the
length found at the start of the chunk payload. However, this has a very good chance to cause corruption
of other chunks, since we can't do any fast overlap checks (an upsize might cause the chunk's alloocated
area to overlap into another one, causing corruption when either chunk's space gets written to).
This corruption risk has become more problematic since the
introduction of region garbage sector reuse, since a broken location
header could cause chunks to trash each others' saved data.
In addition, if there is a length mismatch, there's a good chance that the oversized chunk itself will
already be corrupted, so we'd just fail trying to decompress it later on.
So, instead of trying to fix this automatically, we bail and hope this doesn't occur often enough for
users to get upset, and allow external offline tools to attempt to repair the mess instead.
PHPStan has no idea what is going on in this code because unpack() returns mixed[].
Possibly it might be a good idea to implement a dynamic return type extension for this.
we already have a region growth problem due to the lack of garbage collection, but this bug was making it worse. If the region already contained 1024 allocated chunks, 4MB of file space would get wasted before the next chunk would be appended to the file.
this isn't required by the spec and PHPStan chokes on it. I was previously having it ignore these errors, but it turns out that PHPStan is not making use of extended typeinfo provided if it can't parse the tag, which is problematic on level 6 and also a problem for array-of-type. Therefore, we are going to have to take the hit.
this was making the location table point to an offset that did not yet exist, which caused the region header consistency check to discard the region as corrupted the next time it was loaded.
This is better for performance because these then don't need to be reevaluated every time they are called.
When encountering an unqualified function or constant reference, PHP will first try to locate a symbol in the current namespace by that name, and then fall back to the global namespace.
This short-circuits the check, which has substantial performance effects in some cases - in particular, ord(), chr() and strlen() show ~1500x faster calls when they are fully qualified.
However, this doesn't mean that PM is getting a massive amount faster. In real world terms, this translates to about 10-15% performance improvement.
But before anyone gets excited, you should know that the CodeOptimizer in the PreProcessor repo has been applying fully-qualified symbol optimizations to Jenkins builds for years, which is one of the reasons why Jenkins builds have better performance than home-built or source installations.
We're choosing to do this for the sake of future SafePHP integration and also to be able to get rid of the buggy CodeOptimizer, so that phar and source are more consistent.
This assumes that the region is properly garbage-collected and packed, but if the file contains uncollected garbage this may not be the case, resulting in a region larger than a gigabyte.
This now removes logging from the level providers (for the most part) and replaces it with exception throws and catches. The implementation using the providers should catch these exceptions if they are thrown.
Check size, check header size, check location table offsets point to valid locations, check for shared offsets, prevent issues with corrupted or junk data
Refactored level\format\generic\GenericChunk -> level\format\Chunk.
Re-added support for async chunk sending
Refactored most Level IO into new namespaces for more organisation
Removed LevelDB loader completely (will be re-added at a later date)