Weka command-line instructions

java weka.classifiers.trees.J48 -t data/weather.arff

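The form is: java <fully qualified class name>. The -t option indicates that the next argument is the name of the training dataset.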

java weka.classifiers.trees.J48 -h

This prints the meaning of each command-line option:

-h or -help
    Output help information.
-synopsis or -info
    Output synopsis for classifier (use in conjunction  with -h)
-t <name of training file>
    Sets training file.
-T <name of test file>
    Sets test file. If missing, a cross-validation will be performed
    on the training data.
-c <class index>
    Sets index of class attribute (default: last).
-x <number of folds>
    Sets number of folds for cross-validation (default: 10).
-no-cv
    Do not perform any cross validation.
-force-batch-training
    Always train classifier in batch mode, never incrementally.
-split-percentage <percentage>
    Sets the percentage for the train/test set split, e.g., 66.
-preserve-order
    Preserves the order in the percentage split.
-s <random number seed>
    Sets random number seed for cross-validation or percentage split
    (default: 1).
-m <name of file with cost matrix>
    Sets file with cost matrix.
-disable <comma-separated list of evaluation metric names>
    Comma separated list of metric names not to print to the output.
    Available metrics:
    Correct,Incorrect,Kappa,Total cost,Average cost,KB relative,KB information,
    Correlation,Complexity 0,Complexity scheme,Complexity improvement,
    MAE,RMSE,RAE,RRSE,Coverage,Region size,TP rate,FP rate,Precision,Recall,
    F-measure,MCC,ROC area,PRC area
-l <name of input file>
    Sets model input file. In case the filename ends with '.xml',
    a PMML file is loaded or, if that fails, options are loaded
    from the XML file.
-d <name of output file>
    Sets model output file. In case the filename ends with '.xml',
    only the options are saved to the XML file, not the model.
-v
    Outputs no statistics for training data.
-o
    Outputs statistics only, not the classifier.
-i
    Outputs detailed information-retrieval statistics for each class.
-k
    Outputs information-theoretic statistics.
-classifications "weka.classifiers.evaluation.output.prediction.AbstractOutput + options"
    Uses the specified class for generating the classification output.
    E.g.: weka.classifiers.evaluation.output.prediction.PlainText
-p range
    Outputs predictions for test instances (or the train instances if
    no test instances provided and -no-cv is used), along with the 
    attributes in the specified range (and nothing else). 
    Use '-p 0' if no attributes are desired.
    Deprecated: use "-classifications ..." instead.
-distribution
    Outputs the distribution instead of only the prediction
    in conjunction with the '-p' option (only nominal classes).
    Deprecated: use "-classifications ..." instead.
-r
    Only outputs cumulative margin distribution.
-z <class name>
    Only outputs the source representation of the classifier,
    giving it the supplied name.
-g
    Only outputs the graph representation of the classifier.
-xml filename | xml-string
    Retrieves the options from the XML-data instead of the command line.
-threshold-file <file>
    The file to save the threshold data to.
    The format is determined by the extensions, e.g., '.arff' for ARFF 
    format or '.csv' for CSV.
-threshold-label <label>
    The class label to determine the threshold data for
    (default is the first label)

Options specific to weka.classifiers.trees.J48:

-U
    Use unpruned tree.
-O
    Do not collapse tree.
-C <pruning confidence>
    Set confidence threshold for pruning.
    (default 0.25)
-M <minimum number of instances>
    Set minimum number of instances per leaf.
    (default 2)
-R
    Use reduced error pruning.
-N <number of folds>
    Set number of folds for reduced error
    pruning. One fold is used as pruning set.
    (default 3)
-B
    Use binary splits only.
-S
    Don't perform subtree raising.
-L
    Do not clean up after the tree has been built.
-A
    Laplace smoothing for predicted probabilities.
-J
    Do not use MDL correction for info gain on numeric attributes.
-Q <seed>
    Seed for random data shuffling (default 1).
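
The -x and -s options in the general list above set the number of cross-validation folds and the random seed. As a rough illustration of what those two numbers control, here is a minimal sketch that shuffles instance indices with a seed and deals them into folds. It is not Weka's own procedure, which may differ in detail (for example in how it balances classes across folds); FoldSketch and assignFolds are hypothetical names used only here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class FoldSketch {

    // Shuffle the instance indices 0..n-1 with the given seed, then deal them
    // round-robin into `folds` groups. Each group serves once as the test set
    // while the remaining groups form the training set.
    static List<List<Integer>> assignFolds(int n, int folds, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));

        List<List<Integer>> result = new ArrayList<>();
        for (int f = 0; f < folds; f++) {
            result.add(new ArrayList<>());
        }
        for (int pos = 0; pos < n; pos++) {
            result.get(pos % folds).add(indices.get(pos));
        }
        return result;
    }

    public static void main(String[] args) {
        // 14 instances with the default settings (-x 10 -s 1).
        List<List<Integer>> folds = assignFolds(14, 10, 1L);
        for (int f = 0; f < folds.size(); f++) {
            System.out.println("fold " + f + " test indices: " + folds.get(f));
        }
    }
}
```

Running it twice with the same seed gives the same folds, which is why fixing -s makes an evaluation repeatable.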

weka.core

The core Weka package; almost every other class is connected to it.

Key classes in the core package:

Attribute: contains the attribute's name, its type, and, in the case of a nominal or string attribute, its possible values

Instance: contains the attribute values of a particular instance

Instances: holds an ordered set of instances; in other words, a dataset
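
To make the relationship between these three classes concrete, here is a minimal sketch using simplified stand-ins. Attr and Dataset are hypothetical classes written for this note, not Weka's Attribute or Instances, and each double[] row plays the role of an Instance. Storing a nominal value as the index of its label is a design choice of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DatasetSketch {

    // Stand-in for an attribute: a name and, if nominal, its possible values.
    static final class Attr {
        final String name;
        final List<String> labels; // an empty list means the attribute is numeric

        Attr(String name, String... labels) {
            this.name = name;
            this.labels = Arrays.asList(labels);
        }

        boolean isNominal() {
            return !labels.isEmpty();
        }
    }

    // Stand-in for a dataset: attribute definitions plus an ordered list of rows.
    static final class Dataset {
        final List<Attr> attrs;
        final List<double[]> rows = new ArrayList<>();

        Dataset(Attr... attrs) {
            this.attrs = Arrays.asList(attrs);
        }

        // Encode one row: numeric cells are stored as-is, nominal cells as the
        // index of their label in the attribute's value list.
        void add(Object... cells) {
            if (cells.length != attrs.size()) {
                throw new IllegalArgumentException("expected " + attrs.size() + " cells");
            }
            double[] row = new double[cells.length];
            for (int i = 0; i < row.length; i++) {
                Attr a = attrs.get(i);
                if (a.isNominal()) {
                    int idx = a.labels.indexOf(cells[i]);
                    if (idx < 0) {
                        throw new IllegalArgumentException("unknown label: " + cells[i]);
                    }
                    row[i] = idx;
                } else {
                    row[i] = ((Number) cells[i]).doubleValue();
                }
            }
            rows.add(row);
        }

        // Decode a stored cell back into readable text.
        String cell(int rowIndex, int attrIndex) {
            Attr a = attrs.get(attrIndex);
            double v = rows.get(rowIndex)[attrIndex];
            return a.isNominal() ? a.labels.get((int) v) : String.valueOf(v);
        }
    }

    public static void main(String[] args) {
        // A few made-up columns in the spirit of a weather dataset.
        Dataset weather = new Dataset(
                new Attr("outlook", "sunny", "overcast", "rainy"),
                new Attr("temperature"),
                new Attr("play", "yes", "no"));
        weather.add("sunny", 85, "no");
        weather.add("overcast", 83, "yes");
        System.out.println(weather.cell(1, 0) + ", " + weather.cell(1, 1) + ", " + weather.cell(1, 2));
        // prints "overcast, 83.0, yes"
    }
}
```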

weka.classifiers

Contents: contains implementations of most of the algorithms for classification and numeric prediction

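Key abstract class: Classifier, which defines the general structure of any scheme for classification or numeric prediction (in Weka 3.7 and later, Classifier is an interface and the abstract base class is AbstractClassifier)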

It has three core methods: buildClassifier(), classifyInstance(), and distributionForInstance()
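
The sketch below shows one way the three methods can fit together: buildClassifier() learns a model, distributionForInstance() returns one probability per class, and classifyInstance() picks the most probable class. SimpleClassifier and MajorityClass are hypothetical, simplified stand-ins that work on plain double[] rows, not Weka classes. Having classifyInstance() fall back on distributionForInstance() is an assumption of this sketch, meant to show how a subclass that overrides only one of the two can still answer both.

```java
import java.util.Arrays;

public class ClassifierSketch {

    // Simplified stand-in for the classifier abstraction.
    // A row is a double[] whose last element is the class index.
    abstract static class SimpleClassifier {
        // Learn a model from the training rows.
        abstract void buildClassifier(double[][] train, int numClasses);

        // Return one probability per class for the given row.
        abstract double[] distributionForInstance(double[] row);

        // Predict the class as the index with the highest probability.
        double classifyInstance(double[] row) {
            double[] dist = distributionForInstance(row);
            int best = 0;
            for (int i = 1; i < dist.length; i++) {
                if (dist[i] > dist[best]) {
                    best = i;
                }
            }
            return best;
        }
    }

    // Baseline that ignores the attributes and predicts from class frequencies.
    static class MajorityClass extends SimpleClassifier {
        private double[] priors;

        @Override
        void buildClassifier(double[][] train, int numClasses) {
            priors = new double[numClasses];
            for (double[] row : train) {
                priors[(int) row[row.length - 1]] += 1.0;
            }
            for (int i = 0; i < numClasses; i++) {
                priors[i] /= train.length;
            }
        }

        @Override
        double[] distributionForInstance(double[] row) {
            return priors.clone();
        }
    }

    public static void main(String[] args) {
        // Last column is the class: 0 = yes, 1 = no (three "yes" rows, one "no").
        double[][] train = { {85, 1}, {83, 0}, {70, 0}, {68, 0} };
        MajorityClass model = new MajorityClass();
        model.buildClassifier(train, 2);

        double[] query = {75, 0}; // the class slot is ignored at prediction time
        System.out.println(Arrays.toString(model.distributionForInstance(query))); // [0.75, 0.25]
        System.out.println(model.classifyInstance(query)); // 0.0
    }
}
```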

An example of a class that extends this abstract class:

weka.classifiers.trees.DecisionStump
Overrides distributionForInstance().
Includes getRevision(), which simply returns the revision number of the classifier; it is used by Weka maintainers when diagnosing and debugging problems reported by users.
Includes globalInfo(), which returns a string describing the classifier; the Weka GUI displays this description together with the scheme's options.
Includes toString(), which returns a textual representation of the classifier.
Includes toSource(), which is used to obtain a source code representation of the learned classifier.
Includes main(), which is called when you ask for a decision stump from the command line; it is effectively the entry point for running this class.
Includes getCapabilities(), which is called by the generic object editor to provide information about the capabilities of a learning scheme.

Other important packages

weka.associations

Contains association-rule learners.

weka.clusterers

Contains methods for unsupervised learning.

weka.datagenerators

Generates artificial data.

weka.estimators

Computes different types of probability distributions.

weka.filters

Provides methods for data cleaning.