SimGrid
3.16
Versatile Simulation of Distributed Systems
|
SimGrid is a free software, written by a community of people.
It started as a little software to help ourselves in our own research, and as more people put their input into the pot, it turned into something that we hope to be valuable to many people. So yes. We hope that SimGrid is helping you doing what you want, and that you will join our community of happy simgriders.
There are several locations where you can connect and discuss about SimGrid. If you have a question, please have a look at the documentation and examples first, but if some remain don't hesitate to ask the community for help. If you do not have a question, just come to us and say hello! We love earing about how people use SimGrid.
We are sometimes asked by users how to give back to the project. Here are some ideas, but if you have new ones, feel free to share them with us.
There are many ways to help the SimGrid project. The first and most natural one is to use it for your research, and say so. Cite the SimGrid framework in your papers and discuss of its advantages with your colleagues to spread the word. When we ask for new fundings to sustain the project, the amount of publications enabled by SimGrid is always the first question we get. The more you use the framework, the better for us.
Make sure that your scientific publications using SimGrid actually cite the right paper. Also make sure that these citations are correctly listed on our list.
You can also help us constituting an active and welcoming user community. Subscribe to the mailing lists, and answer the questions that newscomers have if you can. Point them (gently ;) to the relevant part of the documentation on need, and help them becoming part of our community too.
Another easy way to help the project is to add a link to the SimGrid homepage on your site to improve SimGrid's ranking in search engines.
Finally, if you organize a scientific event where you expect many potential users, you can invite us to give a tutorial on SimGrid. We found that 45 minutes to one hour is very sharp, but doable. It allows us to explain the main motivations and outcomes of the project in order to motivate the attendees get more information on SimGrid, and eventually improve their scientific habits by using a sound simulation framework. Here is an example of such a presentation.
Because of its size and complexity, SimGrid is not perfect and contains a large amount of glitches and issues. When you find one, don't assume that it's here because we don't care. It survived only because nobody told us. We unfortunately cannot endlessly review our large code and documentation base. So please, report any issue you find, be it a typo in the documentation, a paragraph that needs to be reworded, a bug in the code, or any other problem. The best way to do so is to open an issue on our GitHub's Bug Tracker so that we don't forget about it (if you want to put some attachment, you can use this other bugtracker instead).
The worst way to report such issues is to go through private emails. These are unreliable, and we are trying to develop SimGrid openly, so private discussions are to be avoided if possible.
If you can provide a patch fixing the issue you report, that's even better. If you cannot, then you need to give us a minimal working example (MWE), that is a ready to use solution that reproduces the problem you face. Your bug will take much more time for us to reproduce and fix if you don't give us the MWE, so you want to help us helping you to get things efficient.
Of course, a very good way to give back to the SimGrid community is to triage and fix the bugs in the Bug Tracking Systems. If you can come up with a patch fixing them, we will be more than happy to apply your changes so that the whole community enjoys them.
If you deeply miss a feature in the framework, you should consider implementing it yourself. SimGrid is free software, meaning that you are free to help yourself. Of course, we'll do our best to assist you in this task, so don't hesitate to contact us with your idea.
You could write a new plugin extending SimGrid in some way, or a routing model for another kind of network. But even if you write your own platform file, this is probably interesting to other users too, and could be included to SimGrid. Modeling accurately a given platform is a difficult work, which outcome is very precious to us.
Or maybe you developed an independent tool on top of SimGrid. We'd love helping you gaining visibility by listing it in our Contrib section.
If you want to start working on the SimGrid codebase, here are a few ideas of things that could be done to improve the current code (not all of them are difficult, do trust yourself ;)
The code is being migrated to C++ but a large part is still C (or C++ with C idioms). It would be valuable to replace C idioms with C++ ones:
char*
strings with std::string
;std::unique_ptr
, etc.) instead of explicit malloc/free
or new/delete
;std::function
(or template functionoid arguments) instead of function pointers;SimGrid used to implement exceptions in C. This has been replaced with C++ exceptions but some bits of the C exceptions are still remaining:
xbt_ex
was the type of C exceptions. It is now a standard C++ exception. We might want to remove this exception and use a more idiomatic C++ solution with dedicated exception classes for different errors. std::system_error
might be used as well by replacing some xbt_errcat_t
with custom subclasses of std::error_category
.Some support for C++11-style time/duration is implemented (see chrono.hpp
) but only available in some (S4U) APIs. It would be nice to add support for them in the rest of the C++ code.
A related change would be to avoid using "-1" to mean "forever" at least in S4U and in the internal code. For compatibility, MSG should probably keep this semantic. We should probably always use separate functions (wait
vs wait_for
).
simgrid::kernel::Future
, simgrid::kernel::Promise
) could be extended to support additional features: when_any
, shared_future
, etc.simgrid::simix::Future
)..then()
is not available for user futures. We would need to add a basic user event loop in order to queue the pending continuations.Action
or Operation
class with an API compatible with Future
(and convertible to it) but with an additional .cancel()
method.Currently, all the simulated processes live in the same process as the SimGrid simulator. The benefit is that we don't have to do context switches and IPC between the simulator and the processes.
The fact that they share the same address space means that one memory corruption in one simulated process can propagate to the other ones and to the SimGrid simulator itself.
Moreover, the current design for SMPI applications is to compile the MPI code normally and execute it once per simulated process in the same system process: This means that all the existing simulated MPI processes share the same virtual address space and share by default the same global variables. This is not correct as each MPI process is expected to use its own address space and have its own global variables. In order to fix, this problem we have an optional SMPI privatization feature which creates an instanciation of the executable data segment per MPI process and map the correct one (using mmap
) at each context switch.
This approach has many problems:
In order to fix this, the standard solution is to move each MPI process in its own system process and use IPC to communicate with the simulator. One concern would be the impact on performance and memory consumption:
mmap
for SMPI privatization which is costly as well (TLB flush).We would need to modify the model-checker as well which currently can only manage on model-checked process. For the model-checker we can expect some benefits from this approach: if a process did not execute, we know its state did not change and we don't need to take its snapshot and compare its state.
Other solutions for this might include:
dlmopen
). Each process would have its own separate instanciation and would not share libraries.The state comparison code is quite complicated. It has very long functions and is programmed mostly using C idioms and is difficult to understand and debug. It is in need of an overhaul:
std::string
and std::vector
have a capacity which might be larger than the current size of the container. We should ignore the corresponding elements when comparing the states and infering the types.In order to speed up the state comparison an idea was to create a hash of the state. Only states with the same hash would need to be compared using the state comparison algorithm. Some information should not be inclueded in the hash in order to avoid considering different states which would otherwise would have been considered equal.
The states could be indexed by their hash. Currently they are indexed by the number of processes and the amount of heap currently allocated (see DerefAndCompareByNbProcessesAndUsedHeap
).
Good candidate informations for the state hashing:
Some basic infrastructure for this is already in the code (see mc_hash.cpp
) but it is currently disabled.
The model-checker reads many information about the model-checked process by process_vm_readv()
-ing brutally the data structure of the model-checked process leading to some horrible code such as walking a swag from another process. It prevents us as well from replacing some XBT data structures with standard C++ ones. We need a sane way to expose the relevant information to the model-checker.
We have introduced some generic simcalls which can be used to execute a callback in SimGrid Maestro context. It makes it a lot easier to interface the simulated process with the maestro. However, the callbacks for the model-checker which cannot decide how it should handle them. We would need a solution for this if we want to be able to replace the simcalls the model-checker cares about by generic simcalls.
Currently, writing a new model-checking algorithms in SimGridMC is quite difficult: the logic of the model-checking algorithm is mixed with a lot of low-level concerns about the way the model-checker is implemented. This makes it difficult to write new algorithms and difficult to understand, debug, and modify the existing ones. We need a clean API to express the model-checking algorithms in a form which is closer to the text-book/paper description. This API must be exposed in a a language which is more adequate to this task.
Tasks:
Session
class currently exists for this but is not feature complete and should probably be rewritten. It should be easy to create bindings for different languages on top of this API.