Compare commits

...

5 Commits

7 changed files with 259 additions and 0 deletions

View File

@@ -0,0 +1,47 @@
---
title: MISP 2.4.187 released with security fixes, new features and bug fixes.
date: 2024-03-24
layout: post
tags: ["MISP", "Threat Intelligence", "release" ]
banner: /img/blog/opinion-view.png
---
We are pleased to announce the immediate release of MISP 2.4.187, including security fixes, new features and bug fixes.
### New Features
- **CLI Enhancements:**
- Added `org list` to shell commands.
- New command to change user role.
- Fixes to role management.
- **OIDC Update:**
- New option `OidcAuth.update_user_role` to disable role changes from OIDC.
### Changes
- **Version and Software Updates:**
- Version bump.
- Updates to PyMISP, misp-galaxy, misp-warninglists, misp-objects, and taxonomies.
- **Internal Updates:**
- Added `ext-zstd` to suggested PHP extensions.
- Fixed non-focusable relationship dropdown search field in analyst data.
### Fixes
- **General Fixes:**
- Corrected variable unset in events:restsearch to prevent attribute override.
- Ensured sync pulls continue after an event save failure.
- Database update fixes for older MySQL versions.
- Improved API consistency.
- Fixed pulling from remote servers when analyst data is not supported.
- Logging fix for `removeTagFromObject()`.
- Security improvements for file and logo uploads. (Thanks to Rémi Matasse and Raphael Lob from Synacktiv for the report)
- [CVE-2024-29859](https://cvepremium.circl.lu/cve/CVE-2024-29859) (MISP < 2.4.187): add_misp_export in app/Controller/EventsController.php does not properly check for a valid file upload.
- [CVE-2024-29858](https://cvepremium.circl.lu/cve/CVE-2024-29858) (MISP < 2.4.187): __uploadLogo in app/Controller/OrganisationsController.php does not properly check for a valid logo upload.
- Correct message display when disabling a galaxy.
- **CLI Updates:**
- Added new functionalities including listing roles and creating users.
Detailed changes are available in the [Changelog](https://www.misp-project.org/Changelog.txt).
# MISP Professional Services
[MISP Professional Services (MPS)](https://www.misp-project.org/professional-services/) is a program run by the lead developers of the MISP Project to offer highly skilled services around MISP and to support the sustainability of the MISP project. This initiative addresses the policy requirements of companies/organisations requiring commercial support contracts. Don't hesitate to get in touch with us if you need specific services.

View File

@@ -0,0 +1,71 @@
---
title: MISP 2.4.188 released with major performance improvements and many bug fixes.
date: 2024-03-25
layout: post
tags: ["MISP", "Threat Intelligence", "release" ]
banner: /img/blog/opinion-view.png
---
We are pleased to announce the immediate release of MISP 2.4.188, with major performance improvements and many bug fixes.
### New Features
- **Datasource Improvements:**
- Updates to some datasources with the `ignoreIndexHint` parameter (`mysqlExtended`, `mysqlObserverExtended`).
- Fix for `forceIndexHint`.
- **Settings:**
- Added a setting to temporarily disable the loading of sightings via the API (affects `restsearch` and `/events/view` endpoints). This helps with performance issues caused by large sighting data sets.
### Changes
- **PyMISP:**
- Multiple version bumps.
- **Version and Internal Updates:**
- General version bump.
- Improved error handling and marking `BadRequestException` as fail log in CI.
- Attempt to fix a failing test.
- Updated `misp-galaxy`, `misp-object`, and `warning-lists`.
- **Attribute Search Rework:**
- Significant performance improvement when using MysqlExtended or MysqlObserverExtended data sources.
- Event level lookup moved to subqueries for faster queries.
- Ignoring the deleted index to improve speed.
- **OpenAPI Updates:**
- Added content for `analyst-data` and `event-reports`.
- **Sighting Policy Support:**
- Added support for the sighting policy in `sightings:getLastSighting`.
- **Attribute Search Performance:**
- Improved performance of `includeDecayScore` by a factor of 5.
- **Attribute Fetch Refactor:**
- Simplified conditions and optimizations.
### Fixes
- **Attribute Search:**
- Enforced `unpublishedprivate` directive.
- **Internal Error Handling:**
- Error handling improvements in AttachmentScan.
- **CurlClient HEAD Request:**
- Added `CURLOPT_NOBODY` for HEAD requests.
- **CLI and ECS Updates:**
- Fix for `redisReady` in dragonfly.
- Change type from `Exception` to `Throwable` in ECS.
- **OIDC:**
- Default organization handling if not provided by OIDC.
- **Publishing and Sync Issues:**
- Fix for publishing and sync errors.
- **Performance Improvements:**
- Bulk loading of analyst data to speed up event loading.
- **UI Update:**
- Added `MISP.email_reply_to` to server config.
### Other
- Multiple merges of branches and updates.
- Fixes and changes in `misp-stix`, attachment scan error handling, OIDC default org handling, alert email titles, shadow attribute handling, and community additions (ICS-CSIRT.io).
### Community and Contribution Updates
- Additions and changes to the community, including the introduction of the [ICS-CSIRT.io community](https://ICS-CSIRT.io).
Detailed changes are available in the [Changelog](https://www.misp-project.org/Changelog.txt).
# MISP Professional Services
[MISP Professional Services (MPS)](https://www.misp-project.org/professional-services/) is a program run by the lead developers of the MISP Project to offer highly skilled services around MISP and to support the sustainability of the MISP project. This initiative addresses the policy requirements of companies/organisations requiring commercial support contracts. Don't hesitate to get in touch with us if you need specific services.

View File

@@ -0,0 +1,141 @@
---
title: Poppy, a new Bloom filter format and open source project
date: 2024-03-25
layout: post
tags: ["information sharing", "poppy", "release", "Bloom filter", "sharing"]
banner: /img/blog/poppy/2.png
---
# Poppy, a new Bloom filter format and open source library
## Introduction
At [CIRCL](https://www.circl.lu) we regularly use Bloom filters for some of our use cases, especially in digital forensics, such as providing a small, fast, and shareable caching mechanism for the [Hashlookup](https://hashlookup.io/) database, which can be used by incident responders.
We initially worked with an existing great project, [bloom](https://github.com/DCSO/bloom) from [DCSO](https://github.com/DCSO), as it provided convenient features we were looking for, such as data serialization. To better suit our growing Bloom filter needs, we decided to re-implement the [bloom project](https://github.com/DCSO/bloom) in Rust, in a project called [Poppy](https://github.com/hashlookup/poppy). Over the course of the re-implementation, we noticed some challenges with the original implementation for our use cases, and we decided to move towards a new implementation, which we detail in this blog post.
To fully enjoy the content of this blog post, we highly recommend familiarizing yourself with the classical [Bloom filter](https://en.wikipedia.org/wiki/Bloom_filter) implementation.
## Reviewing the existing code
At first sight, when reviewing the original code, everything seemed pretty well written. We noticed some minor code optimizations that we decided to integrate directly into our Rust port.
After reading the code again and again, we did notice something strange in the **bit index generation algorithm** (shown below).
```go
// Excerpt from DCSO/bloom (uses Go's hash/fnv package).
// m == 0xffffffffffffffc5
const m uint64 = 18446744073709551557
const g uint64 = 18446744073709550147

// Fingerprint returns the fingerprint of a given value, as an array of index
// values.
func (s *BloomFilter) Fingerprint(value []byte, fingerprint []uint64) {
    hv := fnv.New64()
    hv.Write(value)
    // fnv(value) % m
    hn := hv.Sum64() % m
    for i := uint64(0); i < s.k; i++ {
        hn = (hn * g) % m
        // s.m is the size of the bloom filter in bits,
        // so this computes bit indexes
        fingerprint[i] = uint64(hn % s.m)
    }
}
```
Looking closer, this algorithm really looks like a [Lehmer random number generator](https://en.wikipedia.org/wiki/Lehmer_random_number_generator). The only notable difference from documented implementations is that `m` (used for the modulo) is in theory not of the same size as the dividend: for example, if the dividend is a `uint64`, `m` is supposed to fit in a `uint32`.
Another surprising thing is the very high value of `m`, which is `uint64::MAX - 58`. The surprise came from the observation that taking a `uint64` modulo such a big value has a fairly low chance of doing anything, as it almost always returns the dividend unchanged (except for the handful of `uint64` values greater than or equal to `m`). That being said, we initially implemented it this way, as we assumed the original code was correct.
## Looking for optimizations ... finding issues
Everything was going fine, until we looked for further optimizations to apply.
Digging into the implementations of other data structures (like hash tables), we noticed that a nice optimization can be applied to the **bit index generation algorithm**. It consists of using a bitset (holding all the bits of the filter) whose size is a **power of two**. If we do so, instead of computing a bit index with a modulo, we can compute it with a bitwise `and`, thanks to the following nice property: for any `m` that is a power of two, `i % m == i & (m - 1)`. This optimization saves CPU cycles, as fewer instructions are needed to compute the bit indexes.
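As a quick, self-contained illustration of this property (a minimal sketch in Rust; the values are arbitrary and nothing here is Poppy-specific):
```rust
// For any m that is a power of two, i % m == i & (m - 1):
// the bitwise and compiles to a single cheap instruction,
// while the modulo requires a (much slower) division.
fn main() {
    let m: u64 = 1 << 15; // 0x8000, a power of two
    for h in [0xdead_beef_u64, 0xcafe_babe, 42] {
        assert_eq!(h % m, h & (m - 1));
        println!("{:#x} -> bit index {}", h, h & (m - 1));
    }
}
```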
Applying this to a Bloom filter is fairly trivial: when computing the desired size (in bits) of the filter, we just need to round up to **the next power of two**. When we implemented this in our Rust port of [bloom](https://github.com/DCSO/bloom), we realized the implementation was broken: when the bit size of the filter becomes a power of two, the **real false positive probability** of the filter literally explodes (**x20** in some cases). Without going further into the details, we believe the weakness comes from the **bit index generation** algorithm described above. A [Github issue](https://github.com/DCSO/bloom/issues/19) has been opened in the original project to track it.
This issue put us in an awkward position, since fixing it requires changing the **bit index generation algorithm**, and such a change would break compatibility between versions before and after the fix.
This was not the only challenge we had to deal with during this optimization attempt, though the following ones are not directly related to the original implementation. Optimizing a filter with a **power of two** bit size is not ideal: in the worst case we need to double the size of the filter, so we face an **exponential growth** issue. We also have to keep in mind that the bigger the filter, the more costly the optimization becomes in terms of size (we always round up to the next power of two). This might be negligible for small filters, but not so much for big ones, such as the one (currently **~700MB**) generated to hold the full [Hashlookup](https://hashlookup.io/) database. In this particular example, we would need to grow the filter to **1GB** to benefit from such a speed optimization.
Given those conditions, we had a pretty solid motivation to move towards an improved version of the format, library, and tools.
### New Implementation: same same ... but different
In order to benefit from the **power of two** optimization discussed earlier without worrying about the exponential growth of the filter size, we opted for a less common Bloom filter implementation. We can see this structure as a **hash table containing Bloom filters**: a hybrid data structure that looks like a **hash table** but behaves like a **Bloom filter**. As a consequence, it has the same properties (intersection, union) as classical Bloom filters.
![](/img/blog/poppy/1.png)
To fully take advantage of this implementation, one needs to carefully choose the size of the inner Bloom filters. We decided on a size of **0x8000 bits**, corresponding to **4096 bytes**. This was not chosen at random: on many systems the memory page size is **4096 bytes**. So in addition to the **power of two** optimization, we benefit from another gain in terms of memory access. Compared to the traditional implementation, where the bits of an entry can be spread over several memory pages, this implementation guarantees that all the bits of an entry are contained in a single memory page.
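A hedged sketch of this layout in Rust follows; the field names, the bucket-selection rule, and the double-hashing scheme are our illustration of the idea, not Poppy's actual code:
```rust
// A "hash table of Bloom filters": each bucket is exactly one
// 4096-byte memory page (0x8000 bits), so all the bits of an
// entry land in a single page.
const BUCKET_BITS: u64 = 0x8000; // 4096 bytes

struct BucketedBloom {
    buckets: Vec<[u8; 4096]>, // the "hash table" of tiny Bloom filters
    k: u64,                   // number of bits set per entry
}

impl BucketedBloom {
    // h1 and h2 are two 64-bit hashes of the entry (e.g. from wyhash).
    fn insert(&mut self, h1: u64, h2: u64) {
        // one hash picks the bucket...
        let b = (h1 % self.buckets.len() as u64) as usize;
        // ...then double hashing derives k bit indexes inside that page
        for i in 0..self.k {
            let bit = h1.wrapping_add(i.wrapping_mul(h2)) % BUCKET_BITS;
            self.buckets[b][(bit / 8) as usize] |= 1 << (bit % 8);
        }
    }
}

fn main() {
    let mut f = BucketedBloom { buckets: vec![[0u8; 4096]; 1024], k: 7 };
    f.insert(0xdead_beef, 0xcafe_babe);
}
```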
In spite of its advantages, this implementation also has some cons:
* the minimal size of the filter is always **4096 bytes**
* we always need to retrieve one inner filter from memory for a lookup
Now is the time to address the main problem we found in [bloom](https://github.com/DCSO/bloom), namely the **bit index generation algorithm**. To address it, we opted for [double hashing](https://en.wikipedia.org/wiki/Double_hashing), a more traditional approach seen in other **Bloom filter** implementations. The only freedom we took in this regard is the **hashing function** used. The original implementation uses [fnv1](https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function), which is rather easy to implement but not well suited to hashing long strings: FNV-family algorithms process bytes one by one, which hurts hashing performance on large inputs. After several benchmarks, we decided to use [wyhash](https://github.com/wangyi-fudan/wyhash) for the following reasons (a sketch of the resulting index generation follows the list):
* one of the best of our [hash function benchmark](https://github.com/tkaitchuck/aHash?tab=readme-ov-file#comparison-with-other-hashers) (maybe for a later blog post)
* portable between CPU architectures
* implemented in other languages (important if this work needs to be ported)
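To make the change concrete, here is a minimal sketch of double hashing as an index generator. It uses the standard library hasher as a stand-in (Poppy itself uses wyhash), and the derivation of the second hash is our illustration, not the project's code:
```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Double hashing: derive the k bit indexes from two 64-bit hashes as
// index_i = (h1 + i * h2) mod m. Unlike the Lehmer-style loop shown
// earlier, each index depends only on h1 and h2, not on the previous one.
fn bit_indexes(value: &[u8], k: u64, m: u64) -> Vec<u64> {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    let h1 = hasher.finish();
    // Second hash, obtained by feeding the hasher again (illustrative;
    // any independent 64-bit hash works). Forcing it odd avoids short
    // cycles when m is a power of two.
    0xa5a5_a5a5_u64.hash(&mut hasher);
    let h2 = hasher.finish() | 1;
    (0..k)
        .map(|i| h1.wrapping_add(i.wrapping_mul(h2)) % m)
        .collect()
}

fn main() {
    println!("{:?}", bit_indexes(b"example.com", 7, 0x8000));
}
```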
In the spirit of providing an improved implementation, we explored some further optimizations applicable to this specific structure:
1. one can increase throughput (at the cost of space) by choosing a **table size** (cf. the structure drawing) that is a **power of two**
2. a **trade-off optimization** (a small space cost in exchange for speed): a pre-filter bitset keeping track of inserted hashes (an illustrative sketch follows this list)
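A purely illustrative sketch of the second idea, as we understand it (this is an assumption about the mechanism, not Poppy's actual design): a small, cache-friendly bitset indexed directly by the entry's hash can answer most negative lookups without fetching a bucket page at all.
```rust
// Hypothetical pre-filter: a small bitset keyed on the raw hash of an
// entry. On insert we set one bit; on lookup, a zero bit proves the
// entry was never inserted, so the page-sized bucket can be skipped.
struct PreFilter {
    bits: Vec<u8>, // bit count is a power of two
}

impl PreFilter {
    fn mask(&self) -> u64 {
        self.bits.len() as u64 * 8 - 1
    }
    fn set(&mut self, h: u64) {
        let bit = h & self.mask(); // power-of-two size: and, not modulo
        self.bits[(bit / 8) as usize] |= 1 << (bit % 8);
    }
    fn maybe_contains(&self, h: u64) -> bool {
        let bit = h & self.mask();
        (self.bits[(bit / 8) as usize] & (1 << (bit % 8))) != 0
    }
}

fn main() {
    let mut pf = PreFilter { bits: vec![0u8; 8192] }; // 65536 bits
    pf.set(0xdead_beef);
    assert!(pf.maybe_contains(0xdead_beef));
    assert!(!pf.maybe_contains(0x1234_5678)); // distinct bit, no collision
}
```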
## Benchmarks
In order to evaluate our modifications and compare them with the previous implementation, we ran some benchmarks on a common dataset. It is worth noting that the benchmark against the [DCSO implementation](https://github.com/DCSO/bloom) (i.e. v1) was done entirely in Rust. Hereafter, one can find information about our evaluation dataset.
**Dataset size**: varies between **10MB** and **7GB**
**Data type**: SHA-1 strings (40 bytes wide)
**False positive probability**: 0.001 (0.1%)
![](/img/blog/poppy/2.png)
The previous graph shows the impact of the dataset size on the throughput of the filter. We can observe the different advantages of the new implementation. The most obvious is the gain in speed (**~2x faster**). The second is that the new implementation's curve stays flatter as the dataset grows, making it less impacted by ever-growing datasets such as [Hashlookup](https://hashlookup.io)'s.
![](/img/blog/poppy/3.png)
The graph above zooms in on the start of the curves and reveals an interesting point: the speed optimization might not be beneficial for small datasets. This optimization needs to grow the table size to align it on a **power of two**, and this increase in size very likely costs more in memory accesses than the **power of two** trick saves in CPU. Let's now take a look at the space impact of the different optimizations.
![](/img/blog/poppy/4.png)
On this plot we can observe that both implementations (in their unoptimized forms) have the same size. We can clearly see the impact of aligning the filter size on the **next power of two** in the speed-optimized variant. For the trade-off optimization, the size grows more linearly, making it a reasonable default choice.
So that [Poppy](https://github.com/hashlookup/poppy/) users can compare and choose the best settings for their use case, we integrated a specific **bench** command. This command assesses the speed of the filter and also verifies that its **false positive probability** matches the expected one.
### Other minor yet interesting improvements
Filter creation from the [poppy](https://github.com/hashlookup/poppy/) command line is multi-threaded. This is not very relevant for small datasets, but it helps reduce read/write operations on large filters.
We also made filter creation over the CLI a bit easier. Initially, one had to count the number of entries in order to set the filter capacity. Now, entry counting can be done by [poppy](https://github.com/hashlookup/poppy/) itself, so the only parameter one has to provide when creating a filter from a dataset is the **false positive probability**.
```bash
# this creates a new filter saved in filter.pop with all entries (one per line)
# found in .txt files under the dataset directory using available CPUs (-j 0)
poppy -j 0 create -p 0.001 /path/to/output/filter.pop /path/to/dataset/*.txt
```
## Conclusions
### Lessons learned
Bloom filters belong to the category of probabilistic data structures, which means that many non-trivial factors can alter their behaviour. Maybe the most confusing aspect of such a structure is that what you get, in terms of false positive probability, is not necessarily what you expect. Over the course of this implementation, we had to make many adjustments to limit side effects (such as the one breaking the original implementation). Anyone wanting to implement their own Bloom filter really has to pay attention to the quality of the **bit index generation algorithm**, to make sure it does not create unexpected collisions. The other important parameter is the **hashing function** used, which needs a low collision rate regardless of the input data. To make sure everything works as expected, thorough testing mixing filter properties and data types is mandatory.
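For reference, the standard approximations for a classical Bloom filter with `m` bits, `k` hash functions, and `n` inserted entries make this "expected versus measured" check concrete:
```latex
% Expected false positive probability:
p \approx \left(1 - e^{-kn/m}\right)^{k}
% Optimal number of hash functions for a given m and n:
k_{\text{opt}} = \frac{m}{n}\ln 2
% Bits required to reach a target probability p:
m = -\frac{n \ln p}{(\ln 2)^{2}}
```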
### Future work
[Poppy](https://github.com/hashlookup/poppy) is set to become a pivotal Bloom filter library for a variety of upcoming initiatives and projects developed by CIRCL, such as the [MISP Project](https://www.misp-project.org/), [hashlookup](https://hashlookup.io/), the [AIL Project](https://www.ail-project.org/), and many other open source security tools. This versatile library will be instrumental in numerous areas, particularly in handling sensitive information sharing and threat intelligence correlation, among other applications. The roadmap for the Poppy library includes plans to extend its capabilities, allowing for network access to the Bloom filter. We also invite external contributors to join the project, bringing new ideas and enhancements.
### References
- [dcso-bloom](https://github.com/DCSO/bloom)
- [hashlookup](https://hashlookup.io/)
- [poppy](https://github.com/hashlookup/poppy)
## Acknowledgement
The Poppy open source project is developed in the scope of the NGSOTI project, co-funded under the Digital Europe Programme by the ECCC (European Cybersecurity Competence Centre and Network).
The NGSOTI project is dedicated to training the next generation of Security Operation Center (SOC) operators, focusing on the human aspect of cybersecurity. It underscores the significance of providing SOC operators with the necessary skills and open-source tools to address challenges such as detection engineering, incident response, and threat intelligence analysis. Involving key partners such as CIRCL, Restena, Tenzir, and the University of Luxembourg, the project aims to establish a real operational infrastructure for practical training. This initiative integrates academic curricula with industry insights, offering hands-on experience in cyber ranges.

BIN
static/img/blog/poppy/1.png Normal file

Binary file not shown.


BIN
static/img/blog/poppy/2.png Normal file

Binary file not shown.


BIN
static/img/blog/poppy/3.png Normal file

Binary file not shown.


BIN
static/img/blog/poppy/4.png Normal file

Binary file not shown.
