All publications
Publications by category, in reverse chronological order.
International conferences
2023
- LSCPM: communities in massive real-world Link Streams by Clique Percolation Method. Alexis Baudin, Lionel Tabourier, and Clémence Magnien. In 30th International Symposium on Temporal Representation and Reasoning, TIME 2023, Sep 2023
Community detection is a popular approach to understand the organization of interactions in static networks. For that purpose, the Clique Percolation Method (CPM), which involves the percolation of k-cliques, is a well-studied technique that offers several advantages. Moreover, studying interactions that occur over time, which can be modeled by the link stream formalism, is useful in various contexts. The Dynamic Clique Percolation Method (DCPM) has been proposed for extending CPM to temporal networks. However, existing implementations are unable to handle massive datasets. We present a novel algorithm that adapts CPM to link streams and speeds up the computation compared with the existing DCPM method. We evaluate it experimentally on real datasets and show that it scales to massive link streams. For example, it obtains a complete set of communities in under twenty-five minutes for a dataset with thirty million links, which the state of the art fails to achieve even after a week of computation. We further show that our method provides communities similar to those of DCPM, but slightly more aggregated. We exhibit the relevance of the obtained communities in real-world cases, and show that they provide information on the importance of vertices in the link streams.
@inproceedings{baudin2023lscpm, title = {LSCPM: communities in massive real-world Link Streams by Clique Percolation Method}, author = {Baudin, Alexis and Tabourier, Lionel and Magnien, Cl{\'e}mence}, booktitle = {30th International Symposium on Temporal Representation and Reasoning, TIME 2023}, year = {2023}, month = sep, }
2022
- Clique percolation method: memory efficient almost exact communities. Alexis Baudin, Maximilien Danisch, Sergey Kirgizov, Clémence Magnien, and Marwan Ghanem. In Advanced Data Mining and Applications: 17th International Conference, ADMA 2021, Sydney, NSW, Australia, February 2–4, 2022, Proceedings, Part II, Sep 2022
Automatic detection of relevant groups of nodes in large real-world graphs, i.e. community detection, has applications in many fields and has received a lot of attention in the last twenty years. The most popular method designed to find overlapping communities (where a node can belong to several communities) is perhaps the clique percolation method (CPM). This method formalizes the notion of community as a maximal union of k-cliques that can be reached from each other through a series of adjacent k-cliques, where two cliques are adjacent if and only if they overlap on k−1 nodes. Despite much effort, CPM has not scaled to large graphs for medium values of k. Recent work has shown that it is possible to efficiently list all k-cliques in very large real-world graphs for medium values of k. We build on top of this work and scale up CPM. In cases where this first algorithm faces memory limitations, we propose another algorithm, CPMZ, that provides a solution close to the exact one, using more time but less memory.
@inproceedings{baudin2022clique, title = {Clique percolation method: memory efficient almost exact communities}, author = {Baudin, Alexis and Danisch, Maximilien and Kirgizov, Sergey and Magnien, Cl{\'e}mence and Ghanem, Marwan}, booktitle = {Advanced Data Mining and Applications: 17th International Conference, ADMA 2021, Sydney, NSW, Australia, February 2--4, 2022, Proceedings, Part II}, pages = {113--127}, year = {2022}, organization = {Springer}, }
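The CPM definition given in the abstract above (a community is a maximal union of k-cliques connected through chains of k-cliques that overlap on k−1 nodes) can be illustrated with a minimal Python sketch. This is a brute-force illustration for tiny graphs only, not the scalable algorithm or CPMZ from the paper; the function names `k_cliques` and `cpm_communities` are hypothetical.

```python
from itertools import combinations

def k_cliques(adj, k):
    """Brute-force list of all k-cliques as frozensets (tiny graphs only)."""
    nodes = sorted(adj)
    return [frozenset(c) for c in combinations(nodes, k)
            if all(v in adj[u] for u, v in combinations(c, 2))]

def cpm_communities(edges, k):
    """Return CPM communities as sorted lists of nodes (illustrative sketch)."""
    # Build an undirected adjacency map from the edge list.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    cliques = k_cliques(adj, k)

    # Union-find over k-cliques: merge two cliques when they share k-1 nodes.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    # A community is the union of the nodes of all cliques in one component.
    communities = {}
    for i, c in enumerate(cliques):
        communities.setdefault(find(i), set()).update(c)
    return sorted(sorted(c) for c in communities.values())

# Two triangles sharing only node 3: the overlap is 1 < k-1, so they form
# two communities, and node 3 belongs to both.
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (3, 5)]
print(cpm_communities(edges, 3))  # [[1, 2, 3], [3, 4, 5]]
```

The pairwise comparison of all k-cliques is quadratic in the number of cliques, which is exactly the kind of cost that prevents a naive approach from scaling.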
International journals
2021
- Assessing conservation of alternative splicing with evolutionary splicing graphs. Diego Javier Zea, Sofya Laskina, Alexis Baudin, Hugues Richard, and Élodie Laine. Genome Research, 2021
Understanding how protein function has evolved and diversified is of great importance for human genetics and medicine. Here, we tackle the problem of describing the whole transcript variability observed in several species by generalizing the definition of splicing graph. We provide a practical solution to construct parsimonious evolutionary splicing graphs where each node is a minimal transcript building block defined across species. We show a clear link between the functional relevance, tissue regulation, and conservation of alternative transcripts on a set of 50 genes. By scaling up to the whole human protein-coding genome, we identify a few thousand genes where alternative splicing modulates the number and composition of pseudorepeats. We have implemented our approach in ThorAxe, an efficient, versatile, robust, and freely available computational tool.
@article{zea2021assessing, title = {Assessing conservation of alternative splicing with evolutionary splicing graphs}, author = {Zea, Diego Javier and Laskina, Sofya and Baudin, Alexis and Richard, Hugues and Laine, Élodie}, journal = {Genome Research}, volume = {31}, number = {8}, pages = {1462--1473}, year = {2021}, publisher = {Cold Spring Harbor Lab}, }
2019
- Controlling large Boolean networks with single-step perturbations. Alexis Baudin, Soumya Paul, Cui Su, and Jun Pang. Bioinformatics, 2019
Motivation: The control of Boolean networks has traditionally focussed on strategies where the perturbations are applied to the nodes of the network for an extended period of time. In this work, we study if and how a Boolean network can be controlled by perturbing a minimal set of nodes for a single step and letting the system evolve afterwards according to its original dynamics. More precisely, given a Boolean network (BN), we compute a minimal subset C_min of the nodes such that BN can be driven from any initial state in an attractor to another ‘desired’ attractor by perturbing some or all of the nodes of C_min for a single step. This kind of control is attractive for biological systems because it is less time-consuming than the traditional control strategies while also being financially more viable. However, due to the phenomenon of state-space explosion, computing such a minimal subset is computationally inefficient, and an approach that deals with the entire network in one go does not scale well for large networks. Results: We develop a ‘divide-and-conquer’ approach by decomposing the network into smaller partitions, computing the minimal control on the projection of the attractors to these partitions and then composing the results to obtain C_min for the whole network. We implement our method and test it on various real-life biological networks to demonstrate its applicability and efficiency.
@article{baudin2019controlling, title = {Controlling large Boolean networks with single-step perturbations}, author = {Baudin, Alexis and Paul, Soumya and Su, Cui and Pang, Jun}, journal = {Bioinformatics}, volume = {35}, number = {14}, pages = {i558--i567}, year = {2019}, publisher = {Oxford University Press}, }
Preprints
2023
- Faster maximal clique enumeration in large real-world link streams. Alexis Baudin, Clémence Magnien, and Lionel Tabourier. arXiv preprint arXiv:2302.00360, Feb 2023
Link streams offer a good model for representing interactions over time. They consist of links (b,e,u,v), where u and v are vertices interacting during the whole time interval [b,e]. In this paper, we deal with the problem of enumerating maximal cliques in link streams. A clique is a pair (C,[t0,t1]), where C is a set of vertices that all interact pairwise during the full interval [t0,t1]. It is maximal when neither its set of vertices nor its time interval can be increased. Some of the main works solving this problem are based on the famous Bron-Kerbosch algorithm for enumerating maximal cliques in graphs. We take this idea as a starting point to propose a new algorithm which matches the cliques of the instantaneous graphs formed by links existing at a given time t to the maximal cliques of the link stream. We prove its validity and compute its complexity, which is better than that of state-of-the-art algorithms in many cases of interest. We also study the output-sensitive complexity, which is close to the output size, thereby showing that our algorithm is efficient. To confirm this, we perform experiments on link streams used in the state of the art, and on massive link streams, up to 100 million links. In all cases our algorithm is faster, mostly by a factor of at least 10 and up to a factor of 10^4. Moreover, it scales to massive link streams for which the existing algorithms are not able to provide the solution.
@article{baudin2023faster, title = {Faster maximal clique enumeration in large real-world link streams}, author = {Baudin, Alexis and Magnien, Cl{\'e}mence and Tabourier, Lionel}, journal = {arXiv preprint arXiv:2302.00360}, year = {2023}, month = feb, }
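As a small illustration of the definitions in the abstract above, the following Python sketch checks whether a pair (C,[t0,t1]) is a clique of a link stream given as a list of links (b,e,u,v). It only tests the clique property, not maximality, and it is not the enumeration algorithm of the paper. The function name `is_clique` is hypothetical, and the sketch assumes that a pair of vertices interacting continuously over an interval is stored as a single link, not as several abutting ones.

```python
from itertools import combinations

def is_clique(links, vertices, t0, t1):
    """Check that every pair in `vertices` interacts during all of [t0, t1].

    `links` is an iterable of (b, e, u, v) tuples, read as: u and v interact
    during the whole interval [b, e]. Assumption: one link per continuous
    interaction, so a pair covers [t0, t1] only if a single link contains it.
    """
    for u, v in combinations(sorted(vertices), 2):
        covered = any(
            {x, y} == {u, v} and b <= t0 and t1 <= e
            for (b, e, x, y) in links
        )
        if not covered:
            return False
    return True

links = [(0, 10, "a", "b"), (2, 8, "b", "c"), (1, 9, "a", "c")]
print(is_clique(links, {"a", "b", "c"}, 2, 8))  # True
print(is_clique(links, {"a", "b", "c"}, 1, 8))  # False: b and c start at time 2
```

In this example ({a,b,c},[2,8]) cannot be extended in time, since the link between b and c exists only on [2,8], which is the intuition behind time-maximality.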
French-speaking conferences
2023
- Énumération efficace des cliques maximales dans les flots de liens réels massifs. Alexis Baudin, Clémence Magnien, and Lionel Tabourier. Revue des Nouvelles Technologies de l’Information, Jan 2023
Link streams offer a formalism for describing interactions over time. A link corresponds to two vertices that interact over a time interval. A clique is a set of vertices associated with a time interval during which they are all connected. It is maximal if neither its set of vertices nor its time interval can be increased. Existing algorithms for enumerating these structures cannot handle real datasets of more than a few hundred thousand interactions. Yet access to ever more massive data requires adapting tools to larger scales. We therefore propose an algorithm that enumerates maximal cliques in massive real-world temporal networks reaching up to more than 100 million links. We show experimentally that it improves on the state of the art by several orders of magnitude.
@article{baudin2023enumeration, author = {Baudin, Alexis and Magnien, Clémence and Tabourier, Lionel}, title = {Énumération efficace des cliques maximales dans les flots de liens réels massifs}, journal = {Revue des Nouvelles Technologies de l'Information}, volume = {Extraction et Gestion des Connaissances, RNTI-E-39}, year = {2023}, pages = {139--150}, month = jan, }