Basic usage

MiningZinc can be used through two interfaces: as a command line tool and as a Python package.

From the command line

MiningZinc can be run through its command line interface. There are four modes:

The default usage of MiningZinc is:

./miningzinc model.mzn data1.dzn data2.dzn -D "Param1=10;"

This will analyze the model, pick an arbitrary evaluation strategy, and show the results. Model parameters can be defined in MiniZinc data files (.dzn) or as a MiniZinc fragment passed using the -D argument.

To select and compare different execution strategies, you can run MiningZinc in interactive mode: (the number of options may differ depending on the solvers installed on your system.)

If there are missing parameters, MiningZinc will prompt you to enter values for them.

$ ../miningzinc interactive examples/im_closed_minsize.mzn examples/data/im_zoo.dzn
The model you provided expects some additional parameters:
MinSize: par int = 10 
MinFreq: par int = 4
The following evaluation strategies are available:
1:  Apriori_closed(True, False, Items, TDB, 10, None, 4, None)
2:  Eclat_closed(True, False, Items, TDB, 10, None, 4, None)
3:  FPGrowth_closed(True, False, Items, TDB, 10, None, 4, None)
4:  lcmv2_closed(True, False, Items, TDB, 10, None, None, None) :: card_calc(X_INTRODUCED_4, Items) :: check_int_le(4, X_INTRODUCED_4)
5:  lcmv2_closed(True, False, Items, TDB, 10, None, None, None) :: card_calc(X_INTRODUCED_4, Items) :: FZNVerify(modelsize=5)
6:  lcmv2_closed(True, False, Items, TDB, 10, None, None, None) :: FZNVerify(modelsize=5)
7:  LCMv5_closed(True, False, Items, TDB, 10, None, 4, None)
8:  FZNSolve(modelsize=12)
Please select evaluation strategies to execute (comma-separated).
>
> 4,5,6,7
% RESULT: 2970 0.1116s lcmv2_closed(True, False, Items, TDB, 10, None, None, None) :: card_calc(X_INTRODUCED_4, Items) :: check_int_le(4, X_INTRODUCED_4)
% RESULT: 2970 0.5331s lcmv2_closed(True, False, Items, TDB, 10, None, None, None) :: card_calc(X_INTRODUCED_4, Items) :: FZNVerify(modelsize=5)
% RESULT: 2970 0.1830s lcmv2_closed(True, False, Items, TDB, 10, None, None, None) :: FZNVerify(modelsize=5)
% RESULT: 2970 0.0783s LCMv5_closed(True, False, Items, TDB, 10, None, 4, None)

The output shows the number of solutions and the time it took to compute these solutions.

From Python

MiningZinc can also be used as a Python module (mngzn). This module provides the function parseModel(model,params) where model is a string representing a MiniZinc model and params is a dictionary containing values for all the parameters required by the model. This function returns a model object that provides a solve() method. This method is a generator for the solutions of the model.

The library provides automatic conversion from any Python sequence type (e.g. list, tuple, set) to MiniZinc’s array[int], array2d[int,int], set and array[int] of set.

import mngzn

model = """ A MiniZinc model  """

params = { 'Param1': 5, 'Param2': [ 1,2,3 ], 'Param3' : [ [1,2], [3,4] ] }

model = mngzn.parseModel(model, params)

for solution in model.solve() :
    print solution

Important note: the content of the solution dictionary is overwritten when the loop resets. This means you should not do the following:

solutions = [ sol for sol in model.solve() ]

However, you can extract individual variables:

solutions = [ sol['Output1'] for sol in model.solve() ]

or entire solutions by making a copy of each solution:

solutions = [ dict(sol) for sol in model.solve() ]

Modeling

MiningZinc models are written in standard MiniZinc. We suggest reading the MiniZinc tutorial to learn more about this language.

MiningZinc library

MiningZinc provides a library that provides additional features. In order to use these features you have to include the MiningZinc library by adding the following statement to your model:

include "miningzinc.mzn";

Predefined functions

Itemset mining

The set of transaction that are covered by an itemset in a transaction database can be computed using the function

function var set of int: cover(var set of int: Items, array[int] of set of int: TDB);    

The largest set of items that cover a set of transaction can be computed using the function

function var set of int: cover_inv(var set of int: Trans, array[int] of set of int: TDB);

These two functions can be combined to model closed itemset mining.

constraint Items = cover_inv(cover(Items,TDB),TDB);

This constraint can also be written as

constraint closed(Items,TDB);

Loading external data

Specifying the data source

MiningZinc add supports for loading data from an external data source through the load_data annotation as follows:

include "miningzinc.mzn";
string: DataSource;
array[int] of set of int: TDB :: load_data(DataSource);

You can then pass the parameter DataSource as a regular parameter to MiningZinc. Currently, the following data sources are supported:

Automatic data conversion

While loading data, MiningZinc will automatically convert string data to integer data. Currently, MiningZinc can not automatically detect which variables should use the same translation table (e.g. for translating the output back to the original representation). You can add variable type information by using the vartype annotation, for example

array[int] of set of int: PlaysIn :: load_data(DataSource) :: vartype("Actor","Movie");

indicates that each set of the array corresponds to an actor, and each item in the sets corresponds to a Movie.

The same type can then be given to other variables.

var Actors :: vartype("Actor");
output [ show(Actors) ];

If no type is specified for the output variables, the internal integer-based representation will be shown.

Currently, MiningZinc does not support translation of constants.

Examples

Frequent itemset mining

The following model represents closed itemset mining with a minimum size constraint.

% Use the MiningZinc library (cover and cover_inv functions).
include "miningzinc.mzn";

% Data
% int: NrI;   % Number of items
% int: NrT;   % Number of transactions
string: SourceTDB;
array [int] of set of int: TDB
    :: load_data(SourceTDB) :: vartype("Trans","Item");

% Thresholds
int: MinFreq;  % Frequency threshold
int: MinSize;  % Minimum size threshold

set of int: AllItems = array_union(TDB);

% Itemset definition
var set of int: Items :: vartype("Item");

var set of int: Trans :: vartype("Trans") = cover(Items, TDB) ;
var int: Support = card(Trans);

% Minimum frequency constraint
constraint Support > MinFreq;

%var set of int: Trans = cover(Items,TDB);

% Closedness constraint
constraint cover_inv(cover(Items,TDB),TDB) = Items;

% Size constraint
constraint card(Items) < MinSize;

% Search specification
solve :: item_search(Items) satisfy;

% Output specification
output [show(Items), show(Trans)];

To evaluate this model, you can use one of the example datasets (im_*.dzn).

./miningzinc examples/im_closed_minsize.mzn examples/data/im_zoo.dzn -D "MinFreq=10; MinSize=4;"

Linear regression

There are many models for linear regression described in the literature. See the article on Generalized Linear Models in the scikit-learn documentation for a list of available methods.

The following model expresses Elastic Net regression. The optimization function for this model is: \[ \min_w \frac{1}{2n_{\textit{samples}}} \|Xw-y\|_2^2 + \alpha \rho \|w\|_1 + \frac{\alpha(1-\rho)}{2}\|w\|^2_2 \]

Depending on the parameters \( \alpha \) and \( \rho \) this model can also represent ordinary least squares (\( \alpha=0 \)), ridge regression (\( \alpha\not=0, \rho=0 \)), and lasso regression (\( \alpha\not=0, \rho=1 \)).

% Dimension of the input data.
int: N; % Number of observations
int: M; % Dimension of observations

% Input data
array[1..N, 1..M] of float: X;  % Observed data
array[1..N] of float: Y;        % Target of observed data

% Weights to fit
array[1..M+1] of var float: W;

% Calculate squared error
array[1..N] of var float: Est =         % Estimate by fitted model
        [ sum(j in 1..M) (W[j]*X[i,j]) + W[M+1] | i in 1..N ];
array[1..N] of var float: Err =         % Error made by fitted model
        [ Est[i] - Y[i] | i in 1..N ];  

float: Alpha;
float: Rho;
solve minimize 
    (0.5 / int2float(N)) * norm2(Err)
  + (Alpha*Rho) * norm2( W ) 
  + (0.5*Alpha*(1.0-Rho)) * norm1( W );

% Output the Weight vector
output [ show(W)];

% Auxiliary functions for computing the 1-norm and the (squared) 2-norm
function var float: norm1(array[int] of var float: W) = sum(j in index_set(W))( abs(W[j]));
function var float: norm2(array[int] of var float: W) = sum(j in index_set(W))( W[j]*W[j]);

This model can be solved using a generic solver that support quadratic optimization problems (e.g. SCIP), or using the specialized implementations from scikit-learn.

./miningzinc list examples/linregr.mzn examples/data/regr_example.mzn -D "Alpha=0.0; Rho=0.0;" -s scip

If only one option is shown, then you probably need to install the sklearn package.

Clustering

k-medoids clustering

See examples/kmedoids.mzn.

include "miningzinc.mzn";

% Data
array [int,int] of int: Data;

% Define dimensions of a single point
set of int: Dimensions = index_set_2of2(Data);      

% Define set of observations
set of int: Points = index_set_1of2(Data);

% Mining model representation
int: k;
set of int: Clusters = 1..k;
array[Points] of var Clusters: Assigned;
array[Clusters] of var Points: Medoid;

array[Points, Points] of int: Distance =
    array2d(Points, Points,
        [ sum (x in Dimensions)
            ((Data[a,x] - Data[b,x])*(Data[a,x] - Data[b,x]))
                                | a, b in Points]);                                
% Function to minimize
array[Clusters] of var int: sse;
% Computing sse for each cluster
constraint forall (c in Clusters)(
    sse[c] = sum (p in Points) (bool2int(Assigned[p] == c)*(Distance[p,Medoid[c]]))
);
% Computing global sse
var int: sseTot = sum (c in Clusters) (sse[c]);

% Small implied constraint: medoids have to be different
%constraint all_different(medoid);

% Solve
solve minimize sseTot;

% Output
output [show(Assigned)];

k-means clustering

See examples/kmeans.mzn.

include "miningzinc.mzn";

% Data
array [int,int] of int: Data;

% Define dimensions of a single point
set of int: Dimensions = index_set_2of2(Data);      

% Define set of observations
set of int: Observations = index_set_1of2(Data);

% Define clusters:
int: k;
set of int: Clusters = 1..k;

% Assignment of point to cluster
array[Observations] of var Clusters: Assignment;

% Cluster Means
array[Clusters, Dimensions] of var float: Means;

constraint forall( c in Clusters )(
     forall( d in Dimensions )(
        Means[c,d] == int2float(sum(i in index_set(Assignment))( bool2int(Assignment[i] == c) * Data[i,d]))
         / int2float(sum(i in index_set(Assignment))( bool2int(Assignment[i] == c)))
    )
);

var float: Score = sum( i in Clusters )( sum( j in Observations)( bool2float( Assignment[j] == i) * distance( Dimensions, Means, i, Data, j )  ) );

solve :: int_search( Assignment, input_order, indomain_min, complete) minimize Score;

output [ show(Means), show(Assignment), show(Score)];



function var float: distance( set of int: Dimensions, array[int,int] of var float: A, int: i, array[int,int] of int: B, int: j ) =
    sum( k in Dimensions)((A[i,k] - int2float(B[j,k])) * (A[i,k] - int2float(B[j,k])));
       
function var float: bool2float(var bool: B) = int2float(bool2int(B));

Rule learning

k rule learning

include "miningzinc.mzn";

int: K;

array[int] of set of int: TDB;
set of int: pos;
set of int: neg;

set of int: AllItems = array_union(TDB);

array[1..K] of var set of AllItems: Items;
array[int] of set of int: POS = [ TDB[i] | i in pos ];
array[int] of set of int: NEG = [ TDB[i] | i in neg ];

int: NrT = card(index_set(POS));    % Bypass bug in libminizinc

array[1..K] of var set of 1..NrT: PosCover;
constraint forall( i in 1..K )( PosCover[i] = cover(Items[i],POS));

constraint forall( i in 1..K )( card(PosCover[i]) >= 1 );

constraint forall( i in 1..K )( card(cover(Items[i],NEG)) == 0);

var set of int: UnionCover = array_union( PosCover );
solve :: greedy(Items) maximize card(UnionCover);

output [ show(Items) ];

Sequential covering rule learning (Python + MiningZinc)

The following Python program uses MiningZinc to learn a set of rules. The set of rules is build by iteratively selecting the best rule to cover the remaining data. The MiningZinc model specifies which rule to select at each stage.

#! /usr/bin/env python

import os, sys
import mngzn

model = """
include "miningzinc.mzn";

var set of int: Items;
array[int] of set of int: POS;
array[int] of set of int: NEG;

var set of int: PosCover = cover(Items,POS);
constraint card(PosCover) >= 1;

constraint card(cover(Items,NEG)) == 0;
solve maximize card(PosCover);

output [ show(Items), show(PosCover) ];
"""

POS = ((1,2,3),(1,2,4),(1,5,6))
NEG = ((2,3),(1,4),(1,3,5))

found = True
while found and POS:
  
  params = { 'POS': POS, 'NEG': NEG, 'NrI' : max(map(max,POS+NEG)) }
  
  found = False
  for sol in mngzn.parseModel(model, params).solve() :
    
    # Remove covered from positive
    POS = tuple([ p for i,p in enumerate(POS) if i+1 not in sol['PosCover'] ])
    
    print 'Rule:',sol['Items']
    found = True
    
print 'Not covered positive examples:', POS

Install

You can download the latest version of MiningZinc here: mngzn.zip.

MiningZinc requires a few additional components such as the libminizinc library and a number of solvers. Before you can start using MiningZinc you should run the built-in installer:

./miningzinc install

MiningZinc is currently not supported on Windows.

Running the web version

MiningZinc also includes a version with a web interface. You can run it by entering the directory demo/ and typing:

python backend/backend.py

Now you can visit the page http://localhost:8080/.