Developer notes

Overview

graph LR A("Input <br/> (A Networkx Graph)") B("External Process <br/> (An executable)") C("Parameters <br/> (Motif size, sampling method, ...)") D("Output <br/> (Motif enumeration as DataFrame)") A --> B C --> B B --> D

High level class overview

classDiagram class PyMotifCounterParameter class PyMotifCounterOutputTransformerBase class PyMotifCounterInputTransformerBase class PyMotifCounterBase PyMotifCounterBase : -str _binary_location PyMotifCounterBase o-- PyMotifCounterInputTransformerBase:_input_transformer PyMotifCounterBase o-- PyMotifCounterOutputTransformerBase:_output_transformer PyMotifCounterBase *-- "0..*" PyMotifCounterParameter:_parameters class PyMotifCounterMfinder class PyMotifCounterNetMODE class PyMotifCounterFanmod class PyMotifCounterPgd PyMotifCounterMfinder --|> PyMotifCounterBase PyMotifCounterNetMODE --|> PyMotifCounterBase PyMotifCounterFanmod --|> PyMotifCounterBase PyMotifCounterPgd --|> PyMotifCounterBase

A high level view of the project’s design.

Inputs

Broadly speaking all programs expect an edge list format where nodes have unique numeric IDs. But, each program allows certain details to be passed along with the edgelist, depending on their specific objectives.

  1. mfinder
    • Expects an edge list of Source\tTarget\tWeight

    • The edge list must be TAB delimited

    • The Weight is not active and should default to 1

  2. fanmod_cmd
    • Expects an edge list of Source Target

    • Can also accept ….

  3. NetMODE
    • Expects an edge list of Source Target

    • Listing has to begin with the number of edges expected present in the file.

    • The edge list must be TAB delimited.

  4. PGD
    • Expects an edge list of Source Target

    • The edge list must be COMMA separated.

Parameters

mfinder

Usage : mfinder <Network input file name> -s <motif size> -r <no. of randon networks> [-f <output file name>] [more flags]

	-s <motif size>  :Motif size to search
	-r <rand net num> :Number of random networks to generate
	-f <output file name>  : Output file name
	-nd : Input network is a non-directed network.
	-p <num of samples>: run with Sampling method,
	-omem : output members list of all subgraphs
	-h : help

	Additional flags:

	Motif criteria flags:
	-m <value> : mfactor threshold to use when calculating motifs
	-z <value> : Z-score threshold to use when calculating motifs
	-u : Uniqueness threshold
	-nu : Dont count uniqueness and ignore uniqueness threshold

	Random networks randomization flags:

	-rs : use stubs method for generating random networks
	-rclust : Preserve clustering sequence in random networks
	-met :Use Metropolis algorithm to conserve triad-census
		in random networks
		(for s>3; Default : Do not use Metropolis)
	-t0 <(default 0.001)> :Initial temperature (-met option)
	-iter <(default 2)> :controls how many steps to perform (-met option) 
	-eth  <(default 0.005)> : energy threshhold (-met option)
	-rgrass <colony size>: generate random networks using grassberger 
		algorithm
	-rgrass_max_sz <max ratio>: Limit maximal colony size ratio
	-rdm: don't conserve mutual edges in random networks
	-rcl <layers num><size1 size2 ..sizem>: conserve layers in random
		 networks
	-nsr : Global Switches number when generating Random networks.


	Output files flags:

	-oi : output intermediate output file. Defualt :No: 
	-ospmem <subgraph id>: output members list of a specific subgraphs only
	-maxmem <list length>: limit length of members list to 'list length'.
		Defualt: 1000
	-omat : output matrix format file ('__MAT.txt')
	-omet : output metropolis log
	-olog : output general log file
	-orall : output matrix format of appearances in each random network
	-ornet : output random networks files
	-otop <no. of top motifs> : No. of top motifs to show
	-onodangl: output a list of all nan-dangling detected motifs

	Other flags

	-ts : <target,source,weight> Old format of input network file
	-q :Quiet mode - No output to the screen
	-dd : Don't die mode. Wait to user action before terminating the
		program
	-pold <num of samples>: run sampling method old version
	-nor : Dont search Real network. Defualt :No: 
	-cr : calculate roles statistics
mfinder Version 1.20

fanmod_cmd

Allowed options:
  --help                  produce help message
  -i [ --input ] arg      input graph file
  -o [ --output ] arg     output csv file
  -d [ --directed ]       the graph input is directed
  -s [ --motif_size ] arg the size of the searched motifs. default = 4
  -r [ --rnd_nets ] arg   the amount of random nets to compare the motif 
                          frequency. default = 1000

NetMODE

-k --size    = k-node subgraphs (=3,4,5 or 6)
-c --random  = # comparison graphs (some number, could be 0, an integer in [0, 2^31))
-b --burnin  = burnin = # comparison graphs discarded (some number, could be 0)
-e --method  = bidirectional edge random_method (=0, 1, 2, 3(default) or 4)
               0: fixed;
               1: no regard;
               2: global constant;
               3: local constant;
               4: uniform.
-t --thread  = number of threads to run
-v --verbose = interface mode (for interfacing with e.g. R) (=0 (default), 1)
-s --showall = show all subgraph statistics in random network while (k <= 5)

PGD

Usage: ./pgd -f path

         pgd options:
        =================================================================================
        Parallel Parameterized Graphlet Decomposition (PGD) Library
        =================================================================================
        -f, --file,--graph              : Input GRAPH file for computing the graphlets (e.g., matrix market format, simple edge list).
        -a, --algorithm                 : Algorithm for the GRAPHLET DECOMPOSITION. Default: exact, approximate, etc.
        ---------------------------------------------------------------------------------
        -w, --workers                   : Number of PROCESSING UNITS (workers) for the algorithm to use (default = max).
        -b, --block_size                : Size of batch (number of jobs) dynamically assigned to the processing unit, that is, 1, 64, 512, etc.  Default: -b 64
        ---------------------------------------------------------------------------------
        -o, --ordering                  : Strategy used to determine the order in which the edge/node graphlets are computed.
                                          Default: -o degree (kcore, rand, natural/off, etc.).
            --s2l                       : Direction of the ordering (default: largest to smallest).
                                          Note to order from smallest to largest, set '--s2l'
        -n, --neigh_ordering            : Strategy used to order the neighbors of each node. Default: degree (kcore, rand, natural, etc.)
                                          Note only applicable if CSC/CSR is used (-r csc).
            --s2l_neigh                 : Order direction for neighbor/csc ordering strategy
                                          (e.g., --neigh_ordering degree --s2l_neigh, default: largest to smallest)
        ---------------------------------------------------------------------------------
        -c, --counts,--macro            : Compute MACRO (GLOBAL) GRAPHLET FEATURES and save counts to file (e.g., --counts name.graphlets)
        -m, --micro                     : Compute MICRO (LOCAL) GRAPHLET FEATURES and save EDGE-by-MOTIF feature matrix (-m name.micro_graphlets)
                                          Default: OFF. NOTE: Turn ON edge graphlet counting by specifying an output file, e.g., '-m name.micro_graphlets'
        ---------------------------------------------------------------------------------
        -v, --verbose                   : Output additional details to the screen.
        -?, -h, --help                  : Print out this help menu.


        REPRESENTATION: Example: ./pgd -r csc (adj, etc.)
        =================================================================================
        -r,   --rep                     : Graph representation [adj, csc, hybrid, auto, etc].
                                          Default: Auto select optimal.
                'adj'    - adjacency matrix   : dense n by n matrix, O(|V|^2) storage cost
                'csc'    - comp. sparse col   : large sparse graphs, O(|V|+|E|) storage cost
                'hybrid' -  csc + adj         : small sparse and dense graphs, O(|V|^2 + |V| + |E|) storage cost
        -l, --adj_limit                 : Threshold/limit for creating adj representation. Default: '-l 10000' (that is <10000 nodes).


        ORDERING TECHNIQUES: Example: ./pgd -o degree (kcore, rand, etc.)
        =================================================================================
        'degree',   'deg'                    : O(|V|)
        'kcore',                             : O(|E|)
        'rand', 'random'                     : O(|V|)
        'off',  'natural'

         Other methods for ordering include:
        'kcore_degree', 'kcore_deg'          : O(|V|)
        'degree_vol',   'deg_vol'            : O(|E|)
        'kcore_vol',                         : O(|E|)
        'deg_kcore_vol'                      : O(|E|)
        ------------------------------------------------------------------
        NOTE: Default ordering is kcore (degeneracy order). For natural order, use '-o off' or '-o natural'

        Copyright Nesreen K. Ahmed (http://nesreenahmed.com) and Ryan A. Rossi (http://ryanrossi.com).
        Website: http://nesreenahmed.com/graphlets for news and updates.

Outputs

mfinder


   Summary motif results
   =====================
mfinder Version 1.20

MOTIF FINDER RESULTS:

	Network name: alpha.el
	Network type: Directed
	Num of Nodes: 510 Num of Edges: 510
	Num of Nodes with edges: 510
	Maximal out degree (out-hub) : 2
	Maximal in degree (in-hub) : 1
	Roots num: 0 Leaves num: 256
	Single Edges num: 508 Mutual Edges num: 0

	Motif size searched  3
	Total number of 3-node subgraphs : 762
	Number of random networks generated : 100
	Random networks generation method: Switches
	Num of Switches range: 100.0-200.0, Success switches Ratio:0.988+-0.00

The following motifs were found:

Criteria taken : Nreal Zscore > 2.00
                 Pval ignored (due to small number of random networks)
                 Mfactor > 1.10
                 Uniqueness >= 4



	Full list includes 0 motifs
MOTIF	NREAL	NRAND		NREAL	NREAL	UNIQ	CREAL
ID		STATS		ZSCORE	PVAL	VAL	[MILI]	




Full list of subgraphs size 3 ids:

	( Total num of different subgraphs size 3 is : 13 )

MOTIF	NREAL	NRAND		NREAL	NREAL	CREAL	UNIQ
ID		STATS		ZSCORE	PVAL	[MILI]	
6	254	254.0+-0.0	-0.10	1.000	333.33	170

12	504	503.5+-1.3	0.36	0.870	661.42	72

14	0	0.0+-0.0	888888	1.000	0.00	0

36	0	0.0+-0.0	888888	1.000	0.00	0

38	0	0.0+-0.0	888888	1.000	0.00	0

46	0	0.0+-0.0	888888	1.000	0.00	0

74	0	0.0+-0.0	888888	1.000	0.00	0

78	0	0.0+-0.0	888888	1.000	0.00	0

98	0	0.2+-0.4	-0.36	1.000	0.00	0

102	0	0.0+-0.0	888888	1.000	0.00	0

108	0	0.0+-0.0	888888	1.000	0.00	0

110	0	0.0+-0.0	888888	1.000	0.00	0

238	0	0.0+-0.0	888888	1.000	0.00	0



 (Application total runtime was:   18.0 seconds.)

 (Real network processing runtime was:    0.0 seconds.)

 (Single Random network processing runtime was:    0.2 seconds.)

fanmod_cmd

FANMOD 1.1 subgraph enumeration
-------------------------------

Network name: alpha_2.el
Network type: Undirected
Number of nodes: 100
Number of edges: 400
Number of single edges: 0
Number of mutual edges: 400

Algorithm: enumeration
Subgraph size: 3

Generated 1000 random networks
   with locally constant number of bidirectional edges,
   3 exchanges per edge and 3 tries per edge.

1798 subgraphs were enumerated in the original network.
2817794 subgraphs were enumerated in the random networks.
2819592 subgraphs were enumerated in all networks.

For the random networks: 1608151 tries were made, 1177966 were successful.
Randomization took 0.88589 seconds.
Enumeration took 5.49207 seconds.


Result overview:

ID,Adj-Matrix,Frequency,Mean-Freq,Standard-Dev,Z-Score,p-Value
,,[Original],[Random],[Random]

78,001,71.58%,99.961%,0.00074998,-378.43,1
,001
,110

238,011,28.42%,0.039255%,0.00074998,378.43,0
,101
,110

NetMODE

NetMODE produces both a file with results as well as a file with the adjacency matrices of the motifs it has detected.

calc Z-Score
gID:   6  freq:    80  ave_rand_freq:     -nan (sd:   -nan)  conc: 0.16393  ave_rand_conc: -nan (sd: -nan)  f-ZScore:   -nan  f-pValue: -nan  c-ZScore:   -nan  c-pValue: -nan
gID:  12  freq:   103  ave_rand_freq:     -nan (sd:   -nan)  conc: 0.21107  ave_rand_conc: -nan (sd: -nan)  f-ZScore:   -nan  f-pValue: -nan  c-ZScore:   -nan  c-pValue: -nan
gID:  14  freq:   105  ave_rand_freq:     -nan (sd:   -nan)  conc: 0.21516  ave_rand_conc: -nan (sd: -nan)  f-ZScore:   -nan  f-pValue: -nan  c-ZScore:   -nan  c-pValue: -nan
gID:  36  freq:    61  ave_rand_freq:     -nan (sd:   -nan)  conc: 0.12500  ave_rand_conc: -nan (sd: -nan)  f-ZScore:   -nan  f-pValue: -nan  c-ZScore:   -nan  c-pValue: -nan
gID:  38  freq:    13  ave_rand_freq:     -nan (sd:   -nan)  conc: 0.02664  ave_rand_conc: -nan (sd: -nan)  f-ZScore:   -nan  f-pValue: -nan  c-ZScore:   -nan  c-pValue: -nan
gID:  46  freq:    15  ave_rand_freq:     -nan (sd:   -nan)  conc: 0.03074  ave_rand_conc: -nan (sd: -nan)  f-ZScore:   -nan  f-pValue: -nan  c-ZScore:   -nan  c-pValue: -nan
gID:  74  freq:    69  ave_rand_freq:     -nan (sd:   -nan)  conc: 0.14139  ave_rand_conc: -nan (sd: -nan)  f-ZScore:   -nan  f-pValue: -nan  c-ZScore:   -nan  c-pValue: -nan
gID:  78  freq:    12  ave_rand_freq:     -nan (sd:   -nan)  conc: 0.02459  ave_rand_conc: -nan (sd: -nan)  f-ZScore:   -nan  f-pValue: -nan  c-ZScore:   -nan  c-pValue: -nan
gID:  98  freq:     1  ave_rand_freq:     -nan (sd:   -nan)  conc: 0.00205  ave_rand_conc: -nan (sd: -nan)  f-ZScore:   -nan  f-pValue: -nan  c-ZScore:   -nan  c-pValue: -nan
gID: 102  freq:     5  ave_rand_freq:     -nan (sd:   -nan)  conc: 0.01025  ave_rand_conc: -nan (sd: -nan)  f-ZScore:   -nan  f-pValue: -nan  c-ZScore:   -nan  c-pValue: -nan
gID: 108  freq:     7  ave_rand_freq:     -nan (sd:   -nan)  conc: 0.01434  ave_rand_conc: -nan (sd: -nan)  f-ZScore:   -nan  f-pValue: -nan  c-ZScore:   -nan  c-pValue: -nan
gID: 110  freq:    12  ave_rand_freq:     -nan (sd:   -nan)  conc: 0.02459  ave_rand_conc: -nan (sd: -nan)  f-ZScore:   -nan  f-pValue: -nan  c-ZScore:   -nan  c-pValue: -nan
gID: 238  freq:     5  ave_rand_freq:     -nan (sd:   -nan)  conc: 0.01025  ave_rand_conc: -nan (sd: -nan)  f-ZScore:   -nan  f-pValue: -nan  c-ZScore:   -nan  c-pValue: -nan
graphID = 6

000
000
110


graphID = 12

000
001
100


graphID = 14

000
001
110


graphID = 36

000
100
100


graphID = 38

000
100
110


graphID = 46

000
101
110


graphID = 74

001
001
010


graphID = 78

001
001
110


graphID = 98

001
100
010


graphID = 102

001
100
110


graphID = 108

001
101
100


graphID = 110

001
101
110


graphID = 238

011
101
110


pgd

[reading generic edge list: read_edge_list func]  filename: sample_graph.csv
|V|: 5748
|E|: 14267
p: 0.000863783
d_max: 11
d_avg: 4
r = 0.253437
K = 4
************************************************************
total_2_1edge = 14267
total_2_indep = 16502611
----------------------------------------
total_3_tris = 9286
total_2_star = 35397
total_3_1edge = 81879530
total_3_indep = 31553402783
----------------------------------------
total_4_clique = 2116
total_4_chordcycle = 9925
total_4_tailed_tris = 41154
total_4_cycle = 1267
total_3_star = 13838
total_4_path = 88568
----------------------------------------
total_4_1edge = 234712803384
total_4_2edge = 101544802
total_4_2star = 203029889
total_4_tri = 53278602
total_4_indep = 45201167584460
************************************************************
graphlet decomposition time: 0.0163529 sec
--------------------------------------------------------------------------------
Graphlet Frequency Distribution (GFD)
--------------------------------------------------------------------------------
4-clique     	4.65708e-11
4-chordal-cycle	2.18438e-10
4-tailed-tri	9.05753e-10
4-cycle       	2.78852e-11
3-star       	3.04559e-10
4-path       	1.94928e-09
4-node-1-tri   	1.1726e-06
4-node-2-star  	4.46846e-06
4-node-2-edge	2.23489e-06
4-node-1-edge  	0.00516576
4-node-indep  	0.994826


--------------------------------------------------------------------------------
Connected Motif Graphlet Frequency Distribution (GFD)
--------------------------------------------------------------------------------
4-clique     	0.013489
4-chordal-cycle	0.0632698
4-tailed-tri	0.262348
4-cycle       	0.00807685
3-star       	0.0882143
4-path       	0.564602


--------------------------------------------------------------------------------
Disconnected Motif Graphlet Frequency Distribution (GFD)
--------------------------------------------------------------------------------
4-node-1-tri   	1.1726e-06
4-node-2-star	4.46846e-06
4-node-2-edge  	2.23489e-06
4-node-1-edge  	0.00516576
4-node-indep  	0.994826


Recompiling the binaries

For the moment, please see this document with more information about this.

Other algorithms

  1. ORCA website
  2. Hypergraphlets