Why PyMotifCounter?

This package was born out of a necessity to obtain motif counts of networks that were generated from a variety of sources and were generally available as networkx objects.

The motivation to create it was twofold:

  1. To provide an interface to key motif counting algorithms, to the extent that would allow a user to simply call the binary with a networkx object and receive a computable version of the motif distribution.

  2. To preserve the code of these counters

The interface aspect

Suppose that you have a set of networks whose node id’s are arbitrary strings (or country names, or EEG electrode names, or state names or more generally anything other than an integer id) and you wanted to see what motifs it is composed of. Suppose also that you wanted to perform this enumeration using mfinder [1].

If you are working with Python you are probably using networkx to work with disparate graphs an expect that you could get the motif distribution in computable form be able to use it further in other calculations.

To obtain a single motif distribution using (for example) mfinder, you have to go through the following steps:

  1. If you don’t have mfinder, you need to obtain it, compile it and install it to your system.

  2. Provided that you dealt with step #1:
    1. The mfinder binary is called as > mfinder network.el -s 3 -r 1000 where:
      • network.el is a file containing the edge list representation of your network in a particular format expected by mfinder

        • mfinder expects the edge list file to be listing one edge in each line with the format: Source Node ID, Target Node ID, Weight, where the node IDs must be numeric and Weight, as it is not used, should have the value of 1.

      • -s 3 is the motif size to enumerated; and

      • -r 1000 is the number of random networks to generate for statistical testing purposes

    2. At the end of step 2.1, you get a file with the filename [123456]_OUT.txt where [123456] are the first 6 characters of the input file (network.el), that contains the motif enumeration along with other information associated with it that looks like the following.

      
         Summary motif results
         =====================
      mfinder Version 1.20
      
      MOTIF FINDER RESULTS:
      
      	Network name: alpha.el
      	Network type: Directed
      	Num of Nodes: 510 Num of Edges: 510
      	Num of Nodes with edges: 510
      	Maximal out degree (out-hub) : 2
      	Maximal in degree (in-hub) : 1
      	Roots num: 0 Leaves num: 256
      	Single Edges num: 508 Mutual Edges num: 0
      
      	Motif size searched  3
      	Total number of 3-node subgraphs : 762
      	Number of random networks generated : 100
      	Random networks generation method: Switches
      	Num of Switches range: 100.0-200.0, Success switches Ratio:0.988+-0.00
      
      The following motifs were found:
      
      Criteria taken : Nreal Zscore > 2.00
                       Pval ignored (due to small number of random networks)
                       Mfactor > 1.10
                       Uniqueness >= 4
      
      
      
      	Full list includes 0 motifs
      MOTIF	NREAL	NRAND		NREAL	NREAL	UNIQ	CREAL
      ID		STATS		ZSCORE	PVAL	VAL	[MILI]	
      
      
      
      
      Full list of subgraphs size 3 ids:
      
      	( Total num of different subgraphs size 3 is : 13 )
      
      MOTIF	NREAL	NRAND		NREAL	NREAL	CREAL	UNIQ
      ID		STATS		ZSCORE	PVAL	[MILI]	
      6	254	254.0+-0.0	-0.10	1.000	333.33	170
      
      12	504	503.5+-1.3	0.36	0.870	661.42	72
      
      14	0	0.0+-0.0	888888	1.000	0.00	0
      
      36	0	0.0+-0.0	888888	1.000	0.00	0
      
      38	0	0.0+-0.0	888888	1.000	0.00	0
      
      46	0	0.0+-0.0	888888	1.000	0.00	0
      
      74	0	0.0+-0.0	888888	1.000	0.00	0
      
      78	0	0.0+-0.0	888888	1.000	0.00	0
      
      98	0	0.2+-0.4	-0.36	1.000	0.00	0
      
      102	0	0.0+-0.0	888888	1.000	0.00	0
      
      108	0	0.0+-0.0	888888	1.000	0.00	0
      
      110	0	0.0+-0.0	888888	1.000	0.00	0
      
      238	0	0.0+-0.0	888888	1.000	0.00	0
      
      
      
       (Application total runtime was:   18.0 seconds.)
      
       (Real network processing runtime was:    0.0 seconds.)
      
       (Single Random network processing runtime was:    0.2 seconds.)
      

To apply this process to a set of networks, you can now:

  1. Handle the analysis without Python by:
    • Writing an intermediate script that converts all of your networks to a bunch of files in edge-list format; then

    • Writing a bash script that scans these files and sends them to mfinder (including the specific enumeration parameters); then

    • Writing another script (possibly involving sed, awk or even a parser in Python) to scan the result files and creates something like a CSV file which you would then use for further analysis.

  2. Handle the analysis with Python by:
    • Using networkx to convert the network to a file; then

    • Sending this file and the parameters to the mfinder binary; then

    • Parsing the output of that file to get the enumeration.

The first option leaves a large margin for error as we now have to juggle a number of scripts (and a directory structure) to perform and analyse the data from each set of computational experiments we want to run.

The second option is slightly better but it still requires handling the conversion from and to the intermediate stages and calling the external process.

The code maintenance aspect

Suppose now that you decided that no road is easier than the other and you now have a working solution.

Years pass, your solution is still working but you now decide to extend the code that produces the binary, or you want to automate the installation process, or you want to perform any additional steps that augment the original functionality in any way.

You download the code base and try to compile it but you realise that things have moved on, certain libraries are not available any more or bugs have been found with the versions that were originally linked to a given motif counter.

Or, as it did happen with one of the binaries packaged along with PyMotifCounter, the original source code is no more available (i.e. the whole website has vanished)…

Now what?

The PyMotifCounter module

Having abstracted mfinder, it was a relatively small step to create models that abstract the same process across a number of different motif counting algorithms, each one able to produce its own kind of unique feature for a given network [2].

The primary objective of PyMotifCounter at this stage is to provide an interface to external processes that abstract away each algorithm’s input, output and parameters and enable a complete motif enumeration workflow using Python end-to-end.

All that the user interacts with is a top level Python object that accepts a networkx network and returns the motif distribution in a pandas.DataFrame, thus making it easier to use these results further within a Python based network data analysis ecosystem.

flowchart TB A["Network (networkx)"]; B["Transform to method (process) dependent representation"]; C["Check parameter validity"] D["Run external process"]; E["Collect output"] F["Transform output to computable form (DataFrame)"]; G["Return results (DataFrame)"]; subgraph Python A --> B subgraph PyMotifCounter B --> C E --> F end F --> G end subgraph System C --> D D --> E end style A fill:#ddcbbc style G fill:#ddcbbc style D fill:#E83B3C, color:#BBBBBB style B fill:#2b9c90 style C fill:#2b9c90 style E fill:#2b9c90 style F fill:#2b9c90

Typical flowchart of PyMotifCounter operation.

Outlook

PyMotifCounter uses subprocess.popen() to call an external process that handles the actual motif enumeration. Although it is possible to optimise this invocation of an external process at the operating system level so that it is nearly instantaneous, this method is associated with a longer processing overhead than actually packaging the C/C++ code that was used to write them, in a proper Python binding.

And this is exactly, the broader aim of PyMotifCounter, to offer true Python bindings to the underlying data structures and code for each one of these algorithms.

Until then, if you are looking for a convenient way to get computable forms of a network’s motif distribution, have a look at the: