Searn: Search-based Structured Prediction

Searn (search-based structured prediction) is a generic algorithm for solving structured prediction problems. This page contains papers, software and notes about using Searn to solve a variety of problems. Feel free to contact me AT hal3 DOT name with questions or comments.



I'm releasing a very simple and stripped-down implementation of Searn (limited to sequence labeling with Hamming loss) that should answer some questions people have been asking. You can download it here: SimpleSearn.tgz (requires Perl).

I'm also releasing a much more featureful implementation of Searn that can accommodate pretty much any structured prediction task. You can download it here: SearnShell_v0.1.tgz (requires Perl). The cool thing about SearnShell is that you provide your own code for doing search and computing features, and it automatically connects this to a base learning algorithm like megam or libsvm.

 Frequently Asked Questions

Here, I'll attempt to answer some questions that I've either been asked by email (me AT hal3 DOT name) or in person.

Q: In the Searn training algorithm, could you tell me in which step we should perform the search process using some search algorithm, such as greedy search or beam search?
A: Essentially never. I like to think of Searn as follows. We have a structured prediction problem, which entails solving an argmax during prediction. This argmax is intractable. So, normally, we would use some search algorithm to approximate it (beam or greedy or...). Whatever search algorithm we apply, the search is performed by making a bunch of sequential decisions (each decision is a step in the search process). The idea behind Searn is that instead of learning some sort of global model and then searching (as is standard), we will simply learn a classifier to make each of these decisions optimally. The question then becomes how to train such a classifier, and Searn is one possible solution. In the end, there is no search going on. All you're doing is making a bunch of sequential decisions, as if you were performing search, but you aren't actually searching. So there is nowhere in the training or prediction algorithm where you will run some search algorithm.
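To make the point concrete, here is a minimal sketch (in Python, with illustrative names not taken from any Searn release) of what prediction looks like: one classifier call per decision, no search anywhere.

```python
# Hypothetical sketch: structured prediction as sequential classification.
# `classifier` is any learned policy mapping a state (the input plus the
# partial output so far) to the next action; all names are illustrative.

def predict(words, classifier):
    """Label a sentence left to right: one classifier call per decision."""
    tags = []
    for _ in range(len(words)):
        # The "state" is the full input plus everything decided so far.
        state = (words, tags[:])
        tags.append(classifier(state))
    return tags

# A trivial stand-in classifier: tag capitalized words NNP, everything else NN.
toy = lambda state: "NNP" if state[0][len(state[1])][0].isupper() else "NN"
print(predict(["Searn", "rocks"], toy))  # -> ['NNP', 'NN']
```

The loop looks exactly like greedy search, but nothing is being scored or compared: each step is just a classification.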

Q: The Searn algorithm takes an optimal policy as input. Can I believe that actually the optimal policy is an initial classifier? Could you tell me how to construct the initial classifier?
A: When you think about Searn as described in response to the previous question, you see that training a classifier to make incremental decisions requires that we get labeled training data for incremental decisions. This is essentially where the optimal policy comes in. The optimal policy tells us: for a given input (e.g., a sentence), true output (e.g., the part of speech tags for this sentence) and some location in the search space (e.g., after 3 words have been tagged), what is the best action to take. Importantly, this is based on having access to the true output sequence. For instance, in sequence labeling under Hamming loss, the optimal policy will always choose simply to label the next word with its correct label.

The optimal policy is not a classifier, in the sense that it is not learned. It is, essentially, exactly what we want our classifiers to learn, since it always makes correct choices. It is up to us, as algorithm designers, to come up with a way of computing the optimal policy on the basis of the true output. Essentially, what we need to do is find some way of structuring the search space so that computing an optimal policy for our given loss function is easy. I don't know a magic method for constructing it, but if you look at the examples for sequence labeling, parsing and machine translation, hopefully that will give some sort of idea.
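For the sequence labeling case mentioned above, the optimal policy is short enough to write down completely. This is a hedged sketch (illustrative names, not code from any Searn release): under Hamming loss, the loss-minimizing next action is simply the true tag at the current position, regardless of any mistakes made earlier.

```python
# Optimal policy for sequence labeling under Hamming loss (sketch).
# It is not learned: it just reads the answer off the true output.

def optimal_policy(true_tags, partial_output):
    """Return the loss-minimizing next action given the gold output."""
    position = len(partial_output)   # how many words are tagged so far
    return true_tags[position]       # Hamming loss: copy the true label

gold = ["DT", "NN", "VBD"]
assert optimal_policy(gold, []) == "DT"
# Past mistakes don't change the best next action under Hamming loss:
assert optimal_policy(gold, ["DT", "XX"]) == "VBD"
```

Note that this is exactly the "easy to compute" property the paragraph above asks for; for other loss functions (or other search spaces), writing down the optimal policy can take more work.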

 More to come...

Last updated 5 May 2006