I am trying to experiment with pylearn2 directly using python (as I am unfamiliar with yaml). I am stealing much from a very nice blog post (http://www.arngarden.com/2013/07/29/neural-network-example-using-pylearn2/).
Here is the basic script I am using (I am using a data set from UCI regarding wine ratings).
# create hidden layer with 5 nodes, init weights in range -0.1 to 0.1 and add
# a bias with value 1
hidden_layer = mlp.Sigmoid(layer_name='hidden1', dim=5, irange=.1, init_bias=1.)

# create hidden layer with 2 nodes, init weights in range -0.1 to 0.1 and add
# a bias with value 1
hidden_layer2 = mlp.Sigmoid(layer_name='hidden2', dim=2, irange=.1, init_bias=1.)

# create Stochastic Gradient Descent trainer that runs for 200 epochs
trainer = sgd.SGD(learning_rate=.05, batch_size=100, termination_criterion=EpochCounter(200))

# according to the code, the last layer will be considered the output
layers = [hidden_layer, hidden_layer2, output_layer]

# create neural net that takes 11 inputs
ann = mlp.MLP(layers, nvis=11)
trainer.setup(ann, ds)

# train neural net until the termination criterion is true
while True:
    trainer.train(dataset=ds)
    ann.monitor.report_epoch()
    ann.monitor()
    if not trainer.continue_learning(ann):
        break
I. Weights. How do I see the weights from the trained model? I *think* I am adding a second hidden layer above but if I looked at ann.get_weights() the dimension of this resulting object does not change if I remove the second hidden layer. So I question if I am looking at the right thing. Ultimately I want to see the finished weights so (outside pylearn) I can visualize the network.
II. Regularization. How do I use regularization? Specifically, how should I adjust the code above to use 1) dropout and then 2) an L2 norm penalty?
Thanks!
Brian
b_m...@live.com
13-10-12
Through the ann.get_param_values() call I am now able to see the weight and bias values and through knowledge of the net architecture, accomplish question #1.
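The bookkeeping behind reading the parameters back can be sketched as follows. This is a minimal illustration, assuming each fully connected layer contributes one weight matrix of shape (inputs, units) and one bias vector of shape (units,). `layer_param_shapes` is a hypothetical helper, not part of pylearn2. It also suggests why a call that reports only the first layer's weights would look the same with or without the second hidden layer.

```python
def layer_param_shapes(nvis, dims):
    """Return [(weight_shape, bias_shape), ...] for a fully connected net.

    nvis is the number of inputs; dims lists the number of units per layer.
    """
    shapes = []
    n_in = nvis
    for n_out in dims:
        shapes.append(((n_in, n_out), (n_out,)))
        n_in = n_out  # this layer's output feeds the next layer
    return shapes


# 11 inputs -> hidden1 (5) -> hidden2 (2) -> output (2)
two_hidden = layer_param_shapes(11, [5, 2, 2])
# 11 inputs -> hidden1 (5) -> output (2)
one_hidden = layer_param_shapes(11, [5, 2])

for w_shape, b_shape in two_hidden:
    print(w_shape, b_shape)
```

The first entry is identical in both architectures, so only the later entries reveal the extra layer.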
I would still like to get some quick help on how to use regularization (especially dropout) and then how to predict new cases with such a model (does the ann.fprop(theano.shared(testMatrix, name='test')).eval() call still work?).
Thanks!
Kyle Kastner
13-10-12
Re: [pylearn-users] Re: Weights and Regularization
I am doing something similar, and had to set one_hot=True to recreate the MNIST yaml results in python. What is the error you are getting?
Kyle
-- You received this message because you are subscribed to the Google Groups "pylearn-users" group. To unsubscribe from this group and stop receiving emails from it, send an email to pylearn-user...@googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.
b_m...@live.com
13-10-13
Re: [pylearn-users] Re: Weights and Regularization
Kyle,
I am not getting an error; instead I am looking to learn/confirm the proper method to train an MLP using regularization (L2 as well as dropout) and then get predictions on a new data set. I am not using yaml, though; I want a way to use pylearn2 directly in python using its functions.
I referenced a blog that showed how to train an MLP w/o regularization (only number of epochs) and then predict new data using ann.fprop, where ann is the trained MLP. I *think* I can use dropout simply by adding the cost to the SGD call like this:
sgd.SGD(learning_rate=.05, batch_size=100, termination_criterion=EpochCounter(200), cost=Dropout())
and then to predict new data I *think* I just need to call dropout_fprop instead of fprop. Like this (where X_s is the new test set).
But I am hoping one of the developers will confirm this is correct and explain how to add a L2 penalty, as that is escaping me currently. I am not very experienced with Python yet so following the code is a challenge.
Kyle Kastner
13-10-14
Re: [pylearn-users] Re: Weights and Regularization
I don't know that you need to call dropout_fprop for the predictions. Once the network is trained, a regular fprop should be all you need, as the model averaging is done during training: the fprop output of a dropout net *should* represent the bagged estimate of many neural nets. I am having trouble finding the reference, but I am recalling that from somewhere. Maybe someone else can help/contradict me here?
In my code, I have called Dropout with an additional dictionary of parameters, so that the dropout from the visible layer is .8, while the others remain .5, as is recommended in some of the literature. The default value of .5 dropout should be OK though, so cost=Dropout() seems ok to me.
I am unsure about the need for an l2 penalty in addition to dropout, as dropout is already a very strong regularizer... what is driving the need for l2 regularization?
There is an LxReg class in the cost.py file - using that could give something useful. See https://github.com/lisa-lab/pylearn2/issues/273 for more details. I haven't used it, though, so I can't give much guidance beyond the link.
Kyle
b_m...@live.com
13-10-14
Re: [pylearn-users] Re: Weights and Regularization
Hey Kyle,
I. I see this description of dropout_fprop from models/mlp.py so I am not sure:
def dropout_fprop(self, state_below, default_input_include_prob=0.5,
                  input_include_probs=None, default_input_scale=2.,
                  input_scales=None, per_example=True):
    """
    state_below: The input to the MLP

    Returns the output of the MLP, when applying dropout to the input
    and intermediate layers.
II. Regarding L2, I would not be using both; I just want to see how to do it as another option.
I saw that class. I am also thinking that here https://github.com/lisa-lab/pylearn2/blob/master/pylearn2/costs/mlp/__init__.py
there is this:
class WeightDecay(Cost):
    """
    coeff * sum(sqr(weights))

    for each set of weights.
    """

    def __init__(self, coeffs):
        """
        coeffs: a list, one element per layer, specifying the coefficient
                to multiply with the cost defined by the squared L2 norm
                of the weights for each layer.

and this

class L1WeightDecay(Cost):
    """
    coeff * sum(abs(weights))

    for each set of weights.
    """

    def __init__(self, coeffs):
        """
        coeffs: a list, one element per layer, specifying the coefficient
                to multiply with the cost defined by the L1 norm of the
                weights(lasso) for each layer.
which might be the way to go for L1 and L2 reg.
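To make the two docstrings concrete, here is a small stand-alone sketch of the quantities they describe: coeff * sum(sqr(weights)) and coeff * sum(abs(weights)), summed over layers. The function names and the toy weights are made up for illustration; this is plain Python, not the pylearn2 implementation.

```python
def weight_decay_penalty(coeffs, weights):
    """Sum over layers of coeff * sum(w ** 2), the squared L2 norm penalty."""
    assert len(coeffs) == len(weights), "one coefficient per layer"
    return sum(c * sum(w * w for row in layer for w in row)
               for c, layer in zip(coeffs, weights))


def l1_weight_decay_penalty(coeffs, weights):
    """Sum over layers of coeff * sum(|w|), the L1 (lasso) penalty."""
    assert len(coeffs) == len(weights), "one coefficient per layer"
    return sum(c * sum(abs(w) for row in layer for w in row)
               for c, layer in zip(coeffs, weights))


# Two toy layers, each given as a list of weight-matrix rows.
toy_weights = [
    [[1.0, -2.0], [0.5, 0.0]],  # layer 1: 2 x 2
    [[3.0], [-1.0]],            # layer 2: 2 x 1
]
toy_coeffs = [0.5, 0.25]

l2_penalty = weight_decay_penalty(toy_coeffs, toy_weights)
l1_penalty = l1_weight_decay_penalty(toy_coeffs, toy_weights)
print(l2_penalty, l1_penalty)
```

Note that the penalty depends only on the weights, never on the training data, which matters for how it can be used as a cost.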
Kyle Kastner
13-10-14
Re: [pylearn-users] Re: Weights and Regularization
As far as dropout_fprop goes, I think that description matches my thoughts. You use dropout_fprop during the training stage, to apply dropout at each layer, which *effectively* creates many separate neural networks, each trained on one example. Then, once the training is all done, you can use a regular fprop, which will *effectively* give you the bagged decision result from all of these networks, by making a decision using all of the weights (see http://arxiv.org/pdf/1207.0580.pdf).
In short, I think that dropout_fprop is largely internal/used during training - while fprop is used for predictions with a trained net.
I did not see the WeightDecay/L1WeightDecay classes - I agree that those seem like the way to go. If I can get those working in my own code I will let you know.
Kyle
b_m...@live.com
13-10-14
Re: [pylearn-users] Re: Weights and Regularization
You may be right. I knew that with dropout, during test set prediction, all the weights are used but (in Hinton's original paper) divided by 2 to effectively replicate the 0.5 inclusion probability. So perhaps, if the weights of the trained model are adjusted in that way, fprop works. Apparently this site is not monitored regularly by developers.
I could not figure out how to use the WeightDecay class in my code (I am not using yaml). I tried this with no success:
trainer = sgd.SGD(learning_rate=0.005, batch_size=100, monitoring_dataset={'test': dt_test}, termination_criterion=EpochCounter(5000), cost=WeightDecay(coeffs=[0.005, 0.005, 0.005]))
With yaml, are you able to make predictions on a new dataset (and get the probabilities and not just the predicted class)?
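On probabilities versus predicted class: a softmax output turns the last layer's scores into per-class probabilities, and the predicted class is simply the index of the largest one. The sketch below shows that relationship in plain Python. `softmax` and `predict` are illustrative helpers written for this note, and the input scores are made up.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores):
    """Return (probabilities, index of the most probable class)."""
    probs = softmax(scores)
    label = max(range(len(probs)), key=probs.__getitem__)
    return probs, label


probs, label = predict([0.2, 1.5])
print(probs, label)
```

Keeping the probabilities around, rather than only the argmax, is what allows threshold tuning or ROC-style evaluation afterwards.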
Ian Goodfellow
13-10-15
Re: [pylearn-users] Re: Weights and Regularization
You need to use a SumOfCosts class that adds together a Dropout cost and a WeightDecay cost. If you train with only a WeightDecay cost it will just make the weights go to 0.
You indeed need to use dropout_fprop at train time and regular fprop at test time. Training using the Dropout cost will handle the calls to dropout_fprop for you.
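Why a plain forward pass is a sensible test-time counterpart can be checked on a single linear unit. The sketch below assumes an include probability of 0.5 and an input scale of 2, the defaults visible in the dropout_fprop signature quoted earlier in the thread. It enumerates every dropout mask and confirms that the average masked output equals the unmasked output. The helper names and numbers are made up, and the equality is exact only for a linear unit; for a full nonlinear network it is an approximation.

```python
from fractions import Fraction
from itertools import product


def linear_unit(weights, inputs):
    """Plain forward pass: weighted sum of the inputs."""
    return sum(w * x for w, x in zip(weights, inputs))


def dropout_linear_unit(weights, inputs, mask, scale=2):
    """Forward pass where kept inputs (mask bit 1) are multiplied by `scale`
    and dropped inputs (mask bit 0) are zeroed."""
    return sum(w * (m * scale * x) for w, x, m in zip(weights, inputs, mask))


weights = [3, -1, 2]
inputs = [4, 5, -6]

# All 2**3 masks are equally likely when the include probability is 0.5.
masks = list(product([0, 1], repeat=len(inputs)))
total = sum(dropout_linear_unit(weights, inputs, m) for m in masks)
mean_dropout = Fraction(total, len(masks))
plain = linear_unit(weights, inputs)

print(mean_dropout, plain)
```

Because the scaling happens on the training side in this scheme, no weight rescaling is needed before calling the regular forward pass at test time.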
b_m...@live.com
13-10-15
Re: [pylearn-users] Re: Weights and Regularization
Ian,
I. So, cost=Dropout() in the sgd call takes care of dropout, and then I use just fprop to predict the new test set?
II. How do you just use L1 or L2 regularization without dropout? Do I need to somehow add the L1 or L2 weight decay to the log lik?
b_m...@live.com
13-10-15
Re: [pylearn-users] Re: Weights and Regularization
I mean for I. that a user doesn't ever call dropout_fprop directly, correct? You just add cost=Dropout() to the sgd call?
Ian Goodfellow
13-10-15
Re: [pylearn-users] Re: Weights and Regularization
I. Yes. Yes to your follow-up question too.
II. Yes, the SumOfCosts class does the addition.
b_m...@live.com
13-10-15
Re: [pylearn-users] Re: Weights and Regularization
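Thanks! Last follow-up, how would I actually accomplish II?
I tried this but receive a NotImplementedError.
trainer = sgd.SGD(learning_rate=0.005, batch_size=100, monitoring_dataset={'test': dt_test}, termination_criterion=EpochCounter(5000), cost=SumOfCosts(costs=[WeightDecay(coeffs=[0.005, 0.005, 0.005])]))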
Re: [pylearn-users] Re: Weights and Regularization
Can you post the full trace and error message?
b_m...@live.com
13-10-15
Re: [pylearn-users] Re: Weights and Regularization
Traceback (most recent call last):
  File "", line 2, in
  File "C:\Anaconda\lib\site-packages\pylearn2-0.1dev-py2.7.egg\pylearn2\training_algorithms\sgd.py", line 314, in train
    "data_specs: %s" % str(data_specs))
NotImplementedError: Unable to train with SGD, because the cost does not actually use data from the data set. data_specs: (CompositeSpace(), ())
I can post the entire script (it is a simple 2 hidden layer mlp) if need be.
Ian Goodfellow
13-10-15
Re: [pylearn-users] Re: Weights and Regularization
Yeah, post the whole script I guess.
b_m...@live.com
13-10-15
Here is the code. The dataset is a small toy example. Not sure I can upload it here though....
import theano
from pylearn2.models import mlp
from pylearn2.training_algorithms import sgd
from pylearn2.termination_criteria import MonitorBased, EpochCounter
from pylearn2.costs.mlp.dropout import Dropout
from pylearn2.costs.cost import SumOfCosts, MethodCost
from pylearn2.models.mlp import WeightDecay, L1WeightDecay
from pylearn2.datasets.dense_design_matrix import DenseDesignMatrix
import numpy as np
from random import randint
from sklearn.metrics import confusion_matrix, roc_auc_score, accuracy_score
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import Binarizer
import pandas as pd

# create hidden layer with 5 nodes, init weights in range -0.1 to 0.1 and add
# a bias with value 1
hidden_layer = mlp.Sigmoid(layer_name='hidden1', dim=5, irange=.1, init_bias=1.)

# create hidden layer with 2 nodes, init weights in range -0.1 to 0.1 and add
# a bias with value 1
hidden_layer2 = mlp.Sigmoid(layer_name='hidden2', dim=2, irange=.1, init_bias=1.)

# create Softmax output layer
output_layer = mlp.Softmax(2, 'output', irange=.1)

# an epoch is a complete run through the data. if the training set is 2000
# records and the batch size is 100, there are 20 batches in an epoch
trainer = sgd.SGD(learning_rate=0.005, batch_size=100,
                  monitoring_dataset={'test': dt_test},
                  termination_criterion=EpochCounter(5000),
                  cost=SumOfCosts(costs=[WeightDecay(coeffs=[0.005, 0.005, 0.005])]))

# according to the code, the last layer will be considered the output
layers = [hidden_layer, hidden_layer2, output_layer]

# create neural net that takes 11 inputs
ann = mlp.MLP(layers, nvis=11)
trainer.setup(ann, dt_train)

# train neural net until the termination criterion is true
while True:
    trainer.train(dataset=dt_train)
    ann.monitor.report_epoch()
    ann.monitor()
    if not trainer.continue_learning(ann):
        break

ann.get_params()
ann.get_param_values()

# predict the test set
test_preds = ann.fprop(theano.shared(X_s, name='test')).eval()
Ian Goodfellow
13-10-15
Re: [pylearn-users] Re: Weights and Regularization
Can you just make it a public file on Google drive or something?
b_m...@live.com
13-10-15
Re: [pylearn-users] Re: Weights and Regularization
I placed the file here: https://docs.google.com/file/d/0B9dsnio60wRoRHptdHlTZjk2RU0/edit?usp=sharing
thanks Ian!
Pascal Lamblin
13-10-15
Re: [pylearn-users] Re: Weights and Regularization
There is only "WeightDecay" in the SumOfCosts. As Ian said, this would simply drive all weights to zero.
You need to have at least one cost that actually depends on the data, such as Dropout, CrossEntropy or NegativeLogLikelihood. You can also use a MethodCost to specify a method of the model to call, and use the return expression as the cost.
-- Pascal
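The point about weights going to zero can be checked on a one-parameter toy problem. The sketch below runs plain gradient descent on decay_coeff * w**2, first alone and then added to a made-up data term (w - 3)**2. Everything here (the `train` helper, the learning rate, the data term) is an illustrative assumption and has nothing to do with pylearn2's internals.

```python
def train(weight, steps, lr, decay_coeff, data_grad=None):
    """Gradient descent on data_cost(w) + decay_coeff * w ** 2."""
    for _ in range(steps):
        grad = 2.0 * decay_coeff * weight  # gradient of the penalty term
        if data_grad is not None:
            grad += data_grad(weight)      # gradient of the data term
        weight -= lr * grad
    return weight


# Penalty only: nothing pulls the weight away from zero.
w_decay_only = train(1.0, steps=2000, lr=0.1, decay_coeff=0.5)

# Penalty plus a data term (w - 3) ** 2, whose gradient is 2 * (w - 3).
w_with_data = train(1.0, steps=2000, lr=0.1, decay_coeff=0.5,
                    data_grad=lambda w: 2.0 * (w - 3.0))

print(w_decay_only, w_with_data)
```

With the data term included, the weight settles where the two gradients balance (w = 2 in this toy case) instead of collapsing to zero.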
Ian Goodfellow
13-10-15
Re: [pylearn-users] Re: Weights and Regularization
Pascal is correct. I hadn't read closely enough.
Brian Miner
13-10-15
Re: [pylearn-users] Re: Weights and Regularization
Hi Pascal,
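Can you give an example? How to change this:
trainer = sgd.SGD(learning_rate=0.005, batch_size=100, monitoring_dataset={'test': dt_test}, termination_criterion=EpochCounter(5000), cost=SumOfCosts(costs=[WeightDecay(coeffs=[0.005, 0.005, 0.005])]))
to incorporate the standard loss function used by the output layer?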
Thanks!
Ian Goodfellow
13-10-16
Re: [pylearn-users] Re: Weights and Regularization
Just put a second cost in the list. Like Dropout() or something.
b_m...@live.com
13-10-16
Re: [pylearn-users] Re: Weights and Regularization
What I am struggling with, and perhaps just did not explain well enough, is how to add the weight decay to the default cost that results from a call to sgd without the cost parameter at all. I don't want to combine weight decay with dropout. I want the output layer to dictate the cost, and then add the weight decay term to it.
For example, this call
sgd.SGD(learning_rate=0.005, batch_size=100, termination_criterion=EpochCounter(5000))
has some default cost. I expect it is the negative log-likelihood derived from the choice of output layer.
So, my question is simply how to add this cost to the weight decay (within SumOfCosts). There is a NegativeLogLikelihood in supervised_cost, but that seems to be deprecated.
Thanks for the time!
Ian Goodfellow
13-10-16
Re: [pylearn-users] Re: Weights and Regularization
Dropout does use the last layer to drive the base cost: https://github.com/lisa-lab/pylearn2/blob/master/pylearn2/costs/mlp/dropout.py#L62 All it does is compute that cost with the hidden states multiplied by 2 * dropout mask.
If you don't want dropout, then use costs.mlp.Default: https://github.com/lisa-lab/pylearn2/blob/master/pylearn2/costs/mlp/__init__.py#L11 That will also make the last layer drive the cost.
Most of the layers implement some kind of negative log likelihood as their cost.
The NegativeLogLikelihood cost has been deprecated because it's only the negative log likelihood for a specific model (maybe softmax? I haven't looked at it recently) so it doesn't make sense to apply it to other models.
Pascal Lamblin
13-10-16
Re: [pylearn-users] Re: Weights and Regularization
Hi Brian,
On Tue, Oct 15, 2013, Brian Miner wrote:
> Can you give an example? How to change this:
>
> trainer = sgd.SGD(learning_rate=0.005, batch_size=100,monitoring_dataset={ 'test': dt_test }, termination_criterion=EpochCounter(5000),cost=SumOfCosts(costs=[WeightDecay(coeffs=[0.005,0.005,0.005])]))
>
> to incorporate the standard loss function used by the output layer?
>
> Thanks!
>
> On 10/15/2013 10:44 AM, Pascal Lamblin wrote:
> > On Mon, Oct 14, 2013, b_m...@live.com wrote:
> > > On Monday, October 14, 2013 8:35:11 PM UTC-4, Ian Goodfellow wrote:
> > > > I. Yes. Yes to your follow up question 2.
> > > >
> > > > II. Yes, the SumOfCosts class does the addition.
> > > Thanks! Last follow-up, how would I actually accomplish II?
> > >
> > > I tried this but receive an NotImplementedError.
> > >
> > > trainer = sgd.SGD(learning_rate=0.005, batch_size=100,monitoring_dataset={ 'test': dt_test }, termination_criterion=EpochCounter(5000),cost=SumOfCosts(costs=[WeightDecay(coeffs=[0.005,0.005,0.005])]))
> > There is only "WeightDecay" in the SumOfCost. As Ian said, this would
> > simply put all weights to zero.
> >
> > You need to have at least a cost that actually depends on the data, such
> > as DropoutCost, CrossEntropy or NegativeLogLikelihood. You can also
> > use a MethodCost to specify a method of the model to call, and use the
> > return expression as the cost.
-- Pascal
Ian Goodfellow
13-10-16
Re: [pylearn-users] Re: Weights and Regularization
MethodCost works too. costs.mlp.Default should do exactly the same thing, without needing to write cost_from_X in the base script.
Brian
13-10-16
Re: [pylearn-users] Re: Weights and Regularization
Thank you Ian and Pascal! These did the trick. Is "cost_from_X" another way of using the default (output-layer-dependent) cost function (without assuming an MLP)?
Ian Goodfellow
13-10-16
Re: [pylearn-users] Re: Weights and Regularization
cost_from_X is the method that Default calls. MethodCost is a cost based on calling a method that you name, so if you use MethodCost and tell it to call cost_from_X it does the same thing as Default.
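The relationship described here can be mimicked with a few toy classes. The sketch below is not pylearn2 code: `ToyModel`, `ToyDefault` and `ToyMethodCost` are hypothetical stand-ins written for this note, and the model's cost (a mean squared error around a fixed prediction rule) is made up. It only illustrates the dispatch pattern: one cost always calls cost_from_X, the other calls whatever method name it was given.

```python
class ToyModel(object):
    """Stand-in model whose cost_from_X returns a mean squared error."""

    def cost_from_X(self, data):
        X, y = data
        preds = [2.0 * x for x in X]  # fixed toy prediction rule
        return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)


class ToyDefault(object):
    """Always delegates to the model's cost_from_X method."""

    def expr(self, model, data):
        return model.cost_from_X(data)


class ToyMethodCost(object):
    """Delegates to whichever model method was named at construction."""

    def __init__(self, method):
        self.method = method

    def expr(self, model, data):
        return getattr(model, self.method)(data)


model = ToyModel()
data = ([1.0, 2.0], [2.0, 5.0])

default_cost = ToyDefault().expr(model, data)
method_cost = ToyMethodCost('cost_from_X').expr(model, data)
print(default_cost, method_cost)
```

Both costs return the same value here, which is the sense in which naming cost_from_X through a method-based cost matches the default behaviour.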