When I use preCICE and both participants do the data mapping (read, consistent), the simulation often hangs here:
Initialize preCICE
| precice::impl::SolverInterfaceImpl::initialize() | Setting up master communication to coupling partner/s
| precice::impl::SolverInterfaceImpl::initialize() | Coupling partner/s are connected
| precice::geometry::CommunicatedGeometry::sendMesh() | Gather mesh AcousticSurface_euler
| precice::geometry::CommunicatedGeometry::sendMesh() | Send global mesh AcousticSurface_euler
| precice::geometry::CommunicatedGeometry::receiveMesh() | Receive global mesh AcousticSurface_acoustic
| precice::geometry::BroadcastFilterDecomposition::broadcast() | Broadcast mesh AcousticSurface_acoustic
| precice::geometry::BroadcastFilterDecomposition::filter() | Filter mesh AcousticSurface_acoustic
| precice::geometry::BroadcastFilterDecomposition::feedback() | Feedback mesh AcousticSurface_acoustic
| precice::impl::SolverInterfaceImpl::initialize() | Setting up slaves communication to coupling partner/s
| precice::impl::SolverInterfaceImpl::initialize() | Slaves are connected
| precice::impl::SolverInterfaceImpl::initialize() | it 1 of 1 | dt# 1 of 200000000 | t 0 | dt 1e-05 | max dt 1e-05 | ongoing yes | dt complete no | write-initial-data |
and for the acoustic domain:
Initialize preCICE
| precice::impl::SolverInterfaceImpl::initialize() | Setting up master communication to coupling partner/s
| precice::impl::SolverInterfaceImpl::initialize() | Coupling partner/s are connected
| precice::geometry::CommunicatedGeometry::sendMesh() | Gather mesh AcousticSurface_acoustic
| precice::geometry::CommunicatedGeometry::sendMesh() | Send global mesh AcousticSurface_acoustic
| precice::geometry::CommunicatedGeometry::receiveMesh() | Receive global mesh AcousticSurface_euler
| precice::geometry::BroadcastFilterDecomposition::broadcast() | Broadcast mesh AcousticSurface_euler
| precice::geometry::BroadcastFilterDecomposition::filter() | Filter mesh AcousticSurface_euler
| precice::geometry::BroadcastFilterDecomposition::feedback() | Feedback mesh AcousticSurface_euler
| precice::impl::SolverInterfaceImpl::initialize() | Setting up slaves communication to coupling partner/s
| precice::impl::SolverInterfaceImpl::initialize() | Slaves are connected
| precice::impl::SolverInterfaceImpl::initialize() | it 1 of 1 | dt# 1 of 200000000 | t 0 | dt 1e-05 | max dt 1e-05 | ongoing yes | dt complete no | write-initial-data |
It is a rather small test case: 2D, with 2*1500 points at the interfaces and matching grids.
With debug flags enabled, this is the output at the point where it stops:
precice::com::SocketCommunication::acceptConnectionAsServer() | (72) Leaving (file:src/utils/Tracer.cpp,line:27)
| precice::com::SocketCommunication::getRemoteCommunicatorSize() | (72) Entering (file:src/utils/Tracer.cpp,line:21)
| precice::com::SocketCommunication::getRemoteCommunicatorSize() | (72) Leaving (file:src/utils/Tracer.cpp,line:27)
| precice::com::SocketCommunication::receive(int) | (72) Entering rankSender=0 (file:src/utils/Tracer.cpp,line:21)
| precice::com::SocketCommunication::receive(int) | (72) Leaving (file:src/utils/Tracer.cpp,line:27)
| precice::m2n::PointToPointCommunication::acceptConnection() | (72) Leaving (file:src/utils/Tracer.cpp,line:27)
| precice::m2n::M2N::acceptSlavesConnection() | (72) Leaving (file:src/utils/Tracer.cpp,line:27)
| precice::cplscheme::ParallelCouplingScheme::initialize() | (72) Entering startTime=0, startTimestep=1 (file:src/utils/Tracer.cpp,line:21)
| precice::cplscheme::ParallelCouplingScheme::initialize() | (72) Leaving (file:src/utils/Tracer.cpp,line:27)
| precice::utils::Parallel::synchronizeProcesses() | (72) Entering (file:src/utils/Tracer.cpp,line:21)
I figured out that jobs where both participants use fewer than 64 processes run fine. The strange part is that with the same executable, the same solver input files, the same preCICE config, and the same job script, larger jobs sometimes run and sometimes hang.
I worked a lot with Mohammed Shaheen (IBM support at LRZ) on this, but from the machine side he could not find any problem.
I am pretty sure that the problem is due to the data mapping on one participant. When I replace that participant's read-consistent mapping with a write-conservative mapping on the other participant, I never have a problem at that point.
This is quite urgent, since I want to run simulations :) and doing the data mapping on only one participant is really slow (see issue 43).
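For reference, the two variants correspond roughly to the mapping entries sketched below. This is only an illustration, not the actual config: the mapping type (nearest-neighbor), the participant names (Euler, Acoustic), and which participant holds which entry are assumptions. Only the mesh names are taken from the logs above.

```xml
<!-- Variant A (hangs for larger jobs): each participant maps the data it reads. -->
<participant name="Euler">  <!-- hypothetical participant name -->
  <!-- ... mesh and data declarations omitted ... -->
  <mapping:nearest-neighbor direction="read" constraint="consistent"
      from="AcousticSurface_acoustic" to="AcousticSurface_euler"/>
</participant>
<participant name="Acoustic">  <!-- hypothetical participant name -->
  <!-- ... mesh and data declarations omitted ... -->
  <mapping:nearest-neighbor direction="read" constraint="consistent"
      from="AcousticSurface_euler" to="AcousticSurface_acoustic"/>
</participant>

<!-- Variant B (no hang observed): one participant does both mappings. -->
<!-- The read mapping of Acoustic is dropped and replaced by a write mapping on Euler. -->
<participant name="Euler">
  <!-- ... mesh and data declarations omitted ... -->
  <mapping:nearest-neighbor direction="read" constraint="consistent"
      from="AcousticSurface_acoustic" to="AcousticSurface_euler"/>
  <mapping:nearest-neighbor direction="write" constraint="conservative"
      from="AcousticSurface_euler" to="AcousticSurface_acoustic"/>
</participant>
```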