MiningZinc can be used through two interfaces: as a command line tool and as a Python package.
MiningZinc can be run through its command line interface. There are four modes:
list
: analyze the model and list the possible execution planssolve
: solve the modelinteractive
: combination of previous two modes with a choice menuThe default usage of MiningZinc is:
./miningzinc model.mzn data1.dzn data2.dzn -D "Param1=10;"
This will analyze the model, pick an arbitrary evaluation strategy, and show the results. Model parameters can be defined in MiniZinc data files (.dzn) or as a MiniZinc fragment passed using the -D argument.
To select and compare different execution strategies, you can run MiningZinc in interactive mode: (the number of options may differ depending on the solvers installed on your system.)
If there are missing parameters, MiningZinc will prompt you to enter values for them.
$ ../miningzinc interactive examples/im_closed_minsize.mzn examples/data/im_zoo.dzn
The model you provided expects some additional parameters:
MinSize: par int = 10
MinFreq: par int = 4
The following evaluation strategies are available:
1: Apriori_closed(True, False, Items, TDB, 10, None, 4, None)
2: Eclat_closed(True, False, Items, TDB, 10, None, 4, None)
3: FPGrowth_closed(True, False, Items, TDB, 10, None, 4, None)
4: lcmv2_closed(True, False, Items, TDB, 10, None, None, None) :: card_calc(X_INTRODUCED_4, Items) :: check_int_le(4, X_INTRODUCED_4)
5: lcmv2_closed(True, False, Items, TDB, 10, None, None, None) :: card_calc(X_INTRODUCED_4, Items) :: FZNVerify(modelsize=5)
6: lcmv2_closed(True, False, Items, TDB, 10, None, None, None) :: FZNVerify(modelsize=5)
7: LCMv5_closed(True, False, Items, TDB, 10, None, 4, None)
8: FZNSolve(modelsize=12)
Please select evaluation strategies to execute (comma-separated).
>
> 4,5,6,7
% RESULT: 2970 0.1116s lcmv2_closed(True, False, Items, TDB, 10, None, None, None) :: card_calc(X_INTRODUCED_4, Items) :: check_int_le(4, X_INTRODUCED_4)
% RESULT: 2970 0.5331s lcmv2_closed(True, False, Items, TDB, 10, None, None, None) :: card_calc(X_INTRODUCED_4, Items) :: FZNVerify(modelsize=5)
% RESULT: 2970 0.1830s lcmv2_closed(True, False, Items, TDB, 10, None, None, None) :: FZNVerify(modelsize=5)
% RESULT: 2970 0.0783s LCMv5_closed(True, False, Items, TDB, 10, None, 4, None)
The output shows the number of solutions and the time it took to compute these solutions.
MiningZinc can also be used as a Python module (mngzn
).
This module provides the function parseModel(model,params)
where model
is a string representing a MiniZinc model and params
is a dictionary containing values for all the parameters required by the model.
This function returns a model object that provides a solve()
method.
This method is a generator for the solutions of the model.
The library provides automatic conversion from any Python sequence type (e.g. list
, tuple
, set
) to MiniZinc’s array[int]
, array2d[int,int]
, set
and array[int] of set
.
import mngzn
model = """ A MiniZinc model """
params = { 'Param1': 5, 'Param2': [ 1,2,3 ], 'Param3' : [ [1,2], [3,4] ] }
model = mngzn.parseModel(model, params)
for solution in model.solve() :
print solution
Important note: the content of the solution dictionary is overwritten when the loop resets. This means you should not do the following:
solutions = [ sol for sol in model.solve() ]
However, you can extract individual variables:
solutions = [ sol['Output1'] for sol in model.solve() ]
or entire solutions by making a copy of each solution:
solutions = [ dict(sol) for sol in model.solve() ]
MiningZinc models are written in standard MiniZinc. We suggest reading the MiniZinc tutorial to learn more about this language.
MiningZinc provides a library that provides additional features. In order to use these features you have to include the MiningZinc library by adding the following statement to your model:
include "miningzinc.mzn";
The set of transaction that are covered by an itemset in a transaction database can be computed using the function
function var set of int: cover(var set of int: Items, array[int] of set of int: TDB);
The largest set of items that cover a set of transaction can be computed using the function
function var set of int: cover_inv(var set of int: Trans, array[int] of set of int: TDB);
These two functions can be combined to model closed itemset mining.
constraint Items = cover_inv(cover(Items,TDB),TDB);
This constraint can also be written as
constraint closed(Items,TDB);
MiningZinc add supports for loading data from an external data source through the load_data
annotation as follows:
include "miningzinc.mzn";
string: DataSource;
array[int] of set of int: TDB :: load_data(DataSource);
You can then pass the parameter DataSource
as a regular parameter to MiningZinc.
Currently, the following data sources are supported:
csv;filename
: Load data from a space-separated file.sqlite;filename;query
: Load data from a sqlite database.dzn
or empty string: Ignore this annotation and load data the standard way (from a *.dzn
file or from the -D
flag).While loading data, MiningZinc will automatically convert string data to integer data.
Currently, MiningZinc can not automatically detect which variables should use the same translation table (e.g. for translating the output back to the original representation).
You can add variable type information by using the vartype
annotation, for example
array[int] of set of int: PlaysIn :: load_data(DataSource) :: vartype("Actor","Movie");
indicates that each set of the array corresponds to an actor, and each item in the sets corresponds to a Movie.
The same type can then be given to other variables.
var Actors :: vartype("Actor");
output [ show(Actors) ];
If no type is specified for the output variables, the internal integer-based representation will be shown.
Currently, MiningZinc does not support translation of constants.
The following model represents closed itemset mining with a minimum size constraint.
% Use the MiningZinc library (cover and cover_inv functions).
include "miningzinc.mzn";
% Data
% int: NrI; % Number of items
% int: NrT; % Number of transactions
string: SourceTDB;
array [int] of set of int: TDB
:: load_data(SourceTDB) :: vartype("Trans","Item");
% Thresholds
int: MinFreq; % Frequency threshold
int: MinSize; % Minimum size threshold
set of int: AllItems = array_union(TDB);
% Itemset definition
var set of int: Items :: vartype("Item");
var set of int: Trans :: vartype("Trans") = cover(Items, TDB) ;
var int: Support = card(Trans);
% Minimum frequency constraint
constraint Support > MinFreq;
%var set of int: Trans = cover(Items,TDB);
% Closedness constraint
constraint cover_inv(cover(Items,TDB),TDB) = Items;
% Size constraint
constraint card(Items) < MinSize;
% Search specification
solve :: item_search(Items) satisfy;
% Output specification
output [show(Items), show(Trans)];
To evaluate this model, you can use one of the example datasets (im_*.dzn
).
./miningzinc examples/im_closed_minsize.mzn examples/data/im_zoo.dzn -D "MinFreq=10; MinSize=4;"
There are many models for linear regression described in the literature. See the article on Generalized Linear Models in the scikit-learn documentation for a list of available methods.
The following model expresses Elastic Net regression. The optimization function for this model is: \[ \min_w \frac{1}{2n_{\textit{samples}}} \|Xw-y\|_2^2 + \alpha \rho \|w\|_1 + \frac{\alpha(1-\rho)}{2}\|w\|^2_2 \]
Depending on the parameters \( \alpha \) and \( \rho \) this model can also represent ordinary least squares (\( \alpha=0 \)), ridge regression (\( \alpha\not=0, \rho=0 \)), and lasso regression (\( \alpha\not=0, \rho=1 \)).
% Dimension of the input data.
int: N; % Number of observations
int: M; % Dimension of observations
% Input data
array[1..N, 1..M] of float: X; % Observed data
array[1..N] of float: Y; % Target of observed data
% Weights to fit
array[1..M+1] of var float: W;
% Calculate squared error
array[1..N] of var float: Est = % Estimate by fitted model
[ sum(j in 1..M) (W[j]*X[i,j]) + W[M+1] | i in 1..N ];
array[1..N] of var float: Err = % Error made by fitted model
[ Est[i] - Y[i] | i in 1..N ];
float: Alpha;
float: Rho;
solve minimize
(0.5 / int2float(N)) * norm2(Err)
+ (Alpha*Rho) * norm2( W )
+ (0.5*Alpha*(1.0-Rho)) * norm1( W );
% Output the Weight vector
output [ show(W)];
% Auxiliary functions for computing the 1-norm and the (squared) 2-norm
function var float: norm1(array[int] of var float: W) = sum(j in index_set(W))( abs(W[j]));
function var float: norm2(array[int] of var float: W) = sum(j in index_set(W))( W[j]*W[j]);
This model can be solved using a generic solver that support quadratic optimization problems (e.g. SCIP), or using the specialized implementations from scikit-learn.
./miningzinc list examples/linregr.mzn examples/data/regr_example.mzn -D "Alpha=0.0; Rho=0.0;" -s scip
If only one option is shown, then you probably need to install the sklearn
package.
include "miningzinc.mzn";
% Data
array [int,int] of int: Data;
% Define dimensions of a single point
set of int: Dimensions = index_set_2of2(Data);
% Define set of observations
set of int: Points = index_set_1of2(Data);
% Mining model representation
int: k;
set of int: Clusters = 1..k;
array[Points] of var Clusters: Assigned;
array[Clusters] of var Points: Medoid;
array[Points, Points] of int: Distance =
array2d(Points, Points,
[ sum (x in Dimensions)
((Data[a,x] - Data[b,x])*(Data[a,x] - Data[b,x]))
| a, b in Points]);
% Function to minimize
array[Clusters] of var int: sse;
% Computing sse for each cluster
constraint forall (c in Clusters)(
sse[c] = sum (p in Points) (bool2int(Assigned[p] == c)*(Distance[p,Medoid[c]]))
);
% Computing global sse
var int: sseTot = sum (c in Clusters) (sse[c]);
% Small implied constraint: medoids have to be different
%constraint all_different(medoid);
% Solve
solve minimize sseTot;
% Output
output [show(Assigned)];
See examples/kmeans.mzn.
include "miningzinc.mzn";
% Data
array [int,int] of int: Data;
% Define dimensions of a single point
set of int: Dimensions = index_set_2of2(Data);
% Define set of observations
set of int: Observations = index_set_1of2(Data);
% Define clusters:
int: k;
set of int: Clusters = 1..k;
% Assignment of point to cluster
array[Observations] of var Clusters: Assignment;
% Cluster Means
array[Clusters, Dimensions] of var float: Means;
constraint forall( c in Clusters )(
forall( d in Dimensions )(
Means[c,d] == int2float(sum(i in index_set(Assignment))( bool2int(Assignment[i] == c) * Data[i,d]))
/ int2float(sum(i in index_set(Assignment))( bool2int(Assignment[i] == c)))
)
);
var float: Score = sum( i in Clusters )( sum( j in Observations)( bool2float( Assignment[j] == i) * distance( Dimensions, Means, i, Data, j ) ) );
solve :: int_search( Assignment, input_order, indomain_min, complete) minimize Score;
output [ show(Means), show(Assignment), show(Score)];
function var float: distance( set of int: Dimensions, array[int,int] of var float: A, int: i, array[int,int] of int: B, int: j ) =
sum( k in Dimensions)((A[i,k] - int2float(B[j,k])) * (A[i,k] - int2float(B[j,k])));
function var float: bool2float(var bool: B) = int2float(bool2int(B));
include "miningzinc.mzn";
int: K;
array[int] of set of int: TDB;
set of int: pos;
set of int: neg;
set of int: AllItems = array_union(TDB);
array[1..K] of var set of AllItems: Items;
array[int] of set of int: POS = [ TDB[i] | i in pos ];
array[int] of set of int: NEG = [ TDB[i] | i in neg ];
int: NrT = card(index_set(POS)); % Bypass bug in libminizinc
array[1..K] of var set of 1..NrT: PosCover;
constraint forall( i in 1..K )( PosCover[i] = cover(Items[i],POS));
constraint forall( i in 1..K )( card(PosCover[i]) >= 1 );
constraint forall( i in 1..K )( card(cover(Items[i],NEG)) == 0);
var set of int: UnionCover = array_union( PosCover );
solve :: greedy(Items) maximize card(UnionCover);
output [ show(Items) ];
The following Python program uses MiningZinc to learn a set of rules. The set of rules is build by iteratively selecting the best rule to cover the remaining data. The MiningZinc model specifies which rule to select at each stage.
#! /usr/bin/env python
import os, sys
import mngzn
model = """
include "miningzinc.mzn";
var set of int: Items;
array[int] of set of int: POS;
array[int] of set of int: NEG;
var set of int: PosCover = cover(Items,POS);
constraint card(PosCover) >= 1;
constraint card(cover(Items,NEG)) == 0;
solve maximize card(PosCover);
output [ show(Items), show(PosCover) ];
"""
POS = ((1,2,3),(1,2,4),(1,5,6))
NEG = ((2,3),(1,4),(1,3,5))
found = True
while found and POS:
params = { 'POS': POS, 'NEG': NEG, 'NrI' : max(map(max,POS+NEG)) }
found = False
for sol in mngzn.parseModel(model, params).solve() :
# Remove covered from positive
POS = tuple([ p for i,p in enumerate(POS) if i+1 not in sol['PosCover'] ])
print 'Rule:',sol['Items']
found = True
print 'Not covered positive examples:', POS
You can download the latest version of MiningZinc here: mngzn.zip.
MiningZinc requires a few additional components such as the libminizinc library and a number of solvers. Before you can start using MiningZinc you should run the built-in installer:
./miningzinc install
MiningZinc is currently not supported on Windows.
MiningZinc also includes a version with a web interface.
You can run it by entering the directory demo/
and typing:
python backend/backend.py
Now you can visit the page http://localhost:8080/.