Why PyMotifCounter
?¶
This package was born out of a necessity to obtain motif counts of networks that were
generated from a variety of sources and were generally available as networkx
objects.
The motivation to create it was twofold:
To provide an interface to key motif counting algorithms, to the extent that would allow a user to simply call the binary with a
networkx
object and receive a computable version of the motif distribution.To preserve the code of these counters
The interface aspect¶
Suppose that you have a set of networks whose node id’s are arbitrary strings (or country names, or EEG
electrode names, or state names or more generally anything other than an integer id) and you wanted to see what motifs
it is composed of. Suppose also that you wanted to perform this enumeration using mfinder
[1].
If you are working with Python you are probably using networkx to work with disparate graphs an expect that you could get the motif distribution in computable form be able to use it further in other calculations.
To obtain a single motif distribution using (for example) mfinder
, you have to go through the following steps:
If you don’t have
mfinder
, you need to obtain it, compile it and install it to your system.- Provided that you dealt with step #1:
- The
mfinder
binary is called as> mfinder network.el -s 3 -r 1000
where: network.el
is a file containing the edge list representation of your network in a particular format expected bymfinder
mfinder
expects the edge list file to be listing one edge in each line with the format:Source Node ID, Target Node ID, Weight
, where the node IDs must be numeric andWeight
, as it is not used, should have the value of1
.
-s 3
is the motif size to enumerated; and-r 1000
is the number of random networks to generate for statistical testing purposes
- The
At the end of step 2.1, you get a file with the filename
[123456]_OUT.txt
where[123456]
are the first 6 characters of the input file (network.el
), that contains the motif enumeration along with other information associated with it that looks like the following.Summary motif results ===================== mfinder Version 1.20 MOTIF FINDER RESULTS: Network name: alpha.el Network type: Directed Num of Nodes: 510 Num of Edges: 510 Num of Nodes with edges: 510 Maximal out degree (out-hub) : 2 Maximal in degree (in-hub) : 1 Roots num: 0 Leaves num: 256 Single Edges num: 508 Mutual Edges num: 0 Motif size searched 3 Total number of 3-node subgraphs : 762 Number of random networks generated : 100 Random networks generation method: Switches Num of Switches range: 100.0-200.0, Success switches Ratio:0.988+-0.00 The following motifs were found: Criteria taken : Nreal Zscore > 2.00 Pval ignored (due to small number of random networks) Mfactor > 1.10 Uniqueness >= 4 Full list includes 0 motifs MOTIF NREAL NRAND NREAL NREAL UNIQ CREAL ID STATS ZSCORE PVAL VAL [MILI] Full list of subgraphs size 3 ids: ( Total num of different subgraphs size 3 is : 13 ) MOTIF NREAL NRAND NREAL NREAL CREAL UNIQ ID STATS ZSCORE PVAL [MILI] 6 254 254.0+-0.0 -0.10 1.000 333.33 170 12 504 503.5+-1.3 0.36 0.870 661.42 72 14 0 0.0+-0.0 888888 1.000 0.00 0 36 0 0.0+-0.0 888888 1.000 0.00 0 38 0 0.0+-0.0 888888 1.000 0.00 0 46 0 0.0+-0.0 888888 1.000 0.00 0 74 0 0.0+-0.0 888888 1.000 0.00 0 78 0 0.0+-0.0 888888 1.000 0.00 0 98 0 0.2+-0.4 -0.36 1.000 0.00 0 102 0 0.0+-0.0 888888 1.000 0.00 0 108 0 0.0+-0.0 888888 1.000 0.00 0 110 0 0.0+-0.0 888888 1.000 0.00 0 238 0 0.0+-0.0 888888 1.000 0.00 0 (Application total runtime was: 18.0 seconds.) (Real network processing runtime was: 0.0 seconds.) (Single Random network processing runtime was: 0.2 seconds.)
To apply this process to a set of networks, you can now:
- Handle the analysis without Python by:
Writing an intermediate script that converts all of your networks to a bunch of files in edge-list format; then
Writing a bash script that scans these files and sends them to
mfinder
(including the specific enumeration parameters); thenWriting another script (possibly involving
sed, awk
or even a parser in Python) to scan the result files and creates something like a CSV file which you would then use for further analysis.
- Handle the analysis with Python by:
Using
networkx
to convert the network to a file; thenSending this file and the parameters to the
mfinder
binary; thenParsing the output of that file to get the enumeration.
The first option leaves a large margin for error as we now have to juggle a number of scripts (and a directory structure) to perform and analyse the data from each set of computational experiments we want to run.
The second option is slightly better but it still requires handling the conversion from and to the intermediate stages and calling the external process.
The code maintenance aspect¶
Suppose now that you decided that no road is easier than the other and you now have a working solution.
Years pass, your solution is still working but you now decide to extend the code that produces the binary, or you want to automate the installation process, or you want to perform any additional steps that augment the original functionality in any way.
You download the code base and try to compile it but you realise that things have moved on, certain libraries are not available any more or bugs have been found with the versions that were originally linked to a given motif counter.
Or, as it did happen with one of the binaries packaged along with PyMotifCounter
, the original source code is no
more available (i.e. the whole website has vanished)…
Now what?
The PyMotifCounter
module¶
Having abstracted mfinder
, it was a relatively small step to create models that
abstract the same process across a number of different motif counting algorithms, each one able to produce its own
kind of unique feature for a given network [2].
The primary objective of PyMotifCounter
at this stage is to provide an interface to external processes that
abstract away each algorithm’s input, output and parameters and enable a complete motif enumeration workflow
using Python end-to-end.
All that the user interacts with is a top level Python object that accepts a networkx
network and returns the motif
distribution in a pandas.DataFrame
, thus making it easier to use these results further within a Python
based network data analysis ecosystem.
Outlook¶
PyMotifCounter
uses subprocess.popen()
to call an external process that handles
the actual motif enumeration. Although it is possible to optimise this invocation of an external process at the
operating system level so that it is nearly instantaneous, this method is associated with a longer processing
overhead than actually packaging the C/C++ code that was used to write them, in a proper Python binding.
And this is exactly, the broader aim of PyMotifCounter
, to offer true Python bindings to the underlying
data structures and code for each one of these algorithms.
Until then, if you are looking for a convenient way to get computable forms of a network’s motif distribution, have a look at the: