Prepare a shell script that connects the user programs through a pipe. For example, to run the syntax parsing system KNP with Juman, prepare the following shell script:
#!/bin/bash
juman | knp
Save this script as run_knp.sh and configure a service definition XML as follows:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE beans PUBLIC "-//SPRING//DTD BEAN//EN" "http://www.springframework.org/dtd/spring-beans.dtd">
<beans>
  <bean id="target"
      class="jp.go.nict.langrid.servicecontainer.handler.TargetServiceFactory">
    <property name="service">
      <bean class="jp.go.nict.wisdom.wrapper.StdIOCommandService">
        <property name="cmdLine" value="sh ___BASE_DIR___/run_knp.sh" />
        <property name="delimiterIn" value="\n" />
        <property name="delimiterOut" value="EOS\n" />
      </bean>
    </property>
  </bean>
</beans>
Given this service definition XML, RaSC starts the user programs connected through a pipe.
First, configure your RaSC service for parallel execution by following Execute a user program in parallel. Then read the inputs from standard input and call the RaSC service using analyzeArray. The following Java program is one example: it reads up to 1000 lines at a time and sends each batch to the service in a single analyzeArray call.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.InetSocketAddress;
import java.util.Arrays;
import org.apache.commons.lang.StringUtils;
import jp.go.nict.langrid.client.msgpackrpc.MsgPackClientFactory;
import jp.go.nict.wisdom.wrapper.api.TextAnalysisService;

public class RaSCClient {
    public static void main(String[] args) throws Exception {
        // args[0]: host name of the RaSC service, args[1]: port number
        try (MsgPackClientFactory factory = new MsgPackClientFactory()) {
            TextAnalysisService client = factory.create(
                    TextAnalysisService.class,
                    new InetSocketAddress(args[0], Integer.parseInt(args[1])));
            try (InputStreamReader isr = new InputStreamReader(System.in, "UTF-8");
                    BufferedReader br = new BufferedReader(isr)) {
                // Collect input lines into batches of up to 1000 lines.
                String[] list = new String[1000];
                int count = 0;
                while ((list[count] = br.readLine()) != null) {
                    count++;
                    if (count == list.length) {
                        // The batch is full: send it in one call.
                        String[] ret = client.analyzeArray(list);
                        System.out.println(StringUtils.join(ret, "\n"));
                        count = 0;
                    }
                }
                if (count > 0) {
                    // Send the remaining lines of the last, partial batch.
                    String[] ret = client.analyzeArray(Arrays.copyOf(list, count));
                    System.out.println(StringUtils.join(ret, "\n"));
                }
            }
        }
    }
}
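The batching logic of the client above can be sketched independently of the RPC layer. The following Python sketch is only an illustration: batches, analyze_all, and fake_analyze_array are hypothetical names, and fake_analyze_array merely stands in for the remote analyzeArray call so that the example runs without a RaSC service.

```python
from itertools import islice


def batches(lines, size):
    """Yield consecutive lists of at most `size` items from `lines`."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def analyze_all(lines, analyze_array, size=1000):
    """Send `lines` to `analyze_array` batch by batch and collect the results."""
    results = []
    for chunk in batches(lines, size):
        results.extend(analyze_array(chunk))
    return results


def fake_analyze_array(chunk):
    # Hypothetical stand-in for the remote call; a real client would
    # forward each chunk to the RaSC service instead.
    return [line.upper() for line in chunk]


sample = ["a", "b", "c", "d", "e"]
print(list(batches(sample, 2)))
print(analyze_all(sample, fake_analyze_array, size=2))
```

As in the Java program, the last batch may be shorter than the batch size, and an empty input produces no call at all.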
Save the program above as RaSCClient.java. With the RaSC core package (refer to Run a user program as a RaSC service), you can compile it with the following command:
javac -cp lib/*: RaSCClient.java
Run the program with a hostname and a port as arguments. This example assumes that the RaSC service is running on localhost, port 19999:
cat INPUT_TXT | java -cp ./lib/*: RaSCClient localhost 19999
Refer to Call a user program from various programming languages.
If the command line specified in a service definition XML contains a directory name or filename that includes a space, you cannot run it with the cmdLine property as shown in Run a user program as a RaSC service. The solution is to use the cmdArray property instead of cmdLine. The following example specifies the default path to mecab in a Windows environment:
<property name="cmdArray">
  <list>
    <value>C:\Program Files\Mecab\bin\mecab</value>
    <value>-O</value>
    <value>wakati</value>
  </list>
</property>
You cannot set both the cmdArray property and the cmdLine property in the same file.
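The difference between the two properties can be illustrated with a small Python sketch. It assumes that a single command-line string is split on whitespace before the program is started (this is how Java's Runtime.exec(String) tokenizes a command; whether RaSC does exactly this internally is an assumption here).

```python
# A command line given as one string, as with the cmdLine property.
cmd_line = r"C:\Program Files\Mecab\bin\mecab -O wakati"

# Assumed behavior: whitespace splitting cuts the path in two.
naive_args = cmd_line.split()

# The same command given as a list, as with the cmdArray property:
# each element is passed through unchanged, spaces included.
cmd_array = [r"C:\Program Files\Mecab\bin\mecab", "-O", "wakati"]

print(naive_args)
print(cmd_array)
```

With the string form, the program name becomes C:\Program, which does not exist; with the list form, the full path stays in one piece.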
Set includeDelim to true in the service XML. Refer to Service definition XML for detail.
You can use the method String getStatus(), which returns the command line, the number of pooled process instances, and the upper limit on that number. The following examples show how to call the method. They assume that SERVICE_HOST and SERVICE_PORT are the host and port on which the RaSC service is running.
Example in Java:
TextAnalysisService client = factory.create(TextAnalysisService.class,
new InetSocketAddress(SERVICE_HOST, SERVICE_PORT));
String ret = client.getStatus();
Example in Perl:
my $client = AnyEvent::MPRPC::Client->new(
host => "SERVICE_HOST", port => "SERVICE_PORT"
);
my $ret = $client->call('getStatus')->recv;
Example in Python:
client = msgpackrpc.Client(msgpackrpc.Address("SERVICE_HOST", SERVICE_PORT))
ret = client.call('getStatus')
The following is an example result for the Enju service started in Run a user program as a RaSC service. It shows the command line set in the service definition XML, the number of processes currently running, and the upper limit on that number.
Command line: /usr/local/bin/enju
Pooled processes: 1 / 20
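If you want to use these values in a monitoring script, the status text can be split into fields. The following Python sketch assumes the output always has the "key: value" layout of the sample above; parse_status is a hypothetical helper, not part of RaSC.

```python
import re


def parse_status(status):
    """Split a getStatus() result into a dict (layout assumed from the sample)."""
    info = {}
    for line in status.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    # "1 / 20" -> running processes and upper limit, if present.
    match = re.fullmatch(r"(\d+)\s*/\s*(\d+)", info.get("Pooled processes", ""))
    if match:
        info["pooled"] = int(match.group(1))
        info["limit"] = int(match.group(2))
    return info


sample_status = "Command line: /usr/local/bin/enju\nPooled processes: 1 / 20"
print(parse_status(sample_status))
```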
A call to a RaSC service may return no result and end in a timeout error. If the user program itself is known to finish in a short time, the following problem is a likely cause:
The input and output delimiters set in the service definition XML are used to recognize units of inputs and outputs.
For example, suppose that a user program recognizes the character string [END_OF_INPUT]\n (with a linefeed at the end) as the input delimiter. Then, if you set [END_OF_INPUT] (without a linefeed at the end) in the RaSC setting, the user program waits for the linefeed code, so that it looks as if there is no response. RaSC terminates and restarts the user program when it does not receive a proper output delimiter from the program after a certain time has elapsed (RaSC logs the fact that the program was restarted).
A similar problem occurs in the case of output. For example, suppose that a user program outputs the character string [END_OF_OUTPUT] (without a linefeed at the end) as the output delimiter. If you set [END_OF_OUTPUT]\n in the RaSC setting, RaSC continues to wait for the linefeed code.
If you have no response from the RaSC service, check the input/output delimiters (whether they have a linefeed code and the type of the linefeed code).
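The output-side mismatch can be reproduced with a small Python sketch. It is not RaSC's implementation; read_until is a hypothetical helper that reads characters until the configured delimiter appears, and an in-memory stream stands in for the user program's standard output.

```python
import io


def read_until(stream, delimiter):
    """Read from `stream` until `delimiter` appears.

    Returns the text before the delimiter, or None if the stream ends
    without the delimiter ever appearing.
    """
    buf = ""
    while True:
        ch = stream.read(1)
        if ch == "":
            return None  # a live process would block here instead
        buf += ch
        if buf.endswith(delimiter):
            return buf[: -len(delimiter)]


# The program writes its delimiter WITHOUT a trailing linefeed.
program_output = "result line\n[END_OF_OUTPUT]"

matched = read_until(io.StringIO(program_output), "[END_OF_OUTPUT]")
mismatched = read_until(io.StringIO(program_output), "[END_OF_OUTPUT]\n")
print(repr(matched))
print(repr(mismatched))
```

With the matching delimiter the result is returned; with the extra linefeed the delimiter is never seen. In this sketch the stream simply ends, whereas with a running process the reader would keep waiting until the timeout.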
You can find the description in Architecture of RaSC service.
For detailed information on setting the limit for the number of processes of a user program, refer to Service definition XML.
You can find the description in Programs that can run with RaSC.
To run a user program on the RaSC service, you must:
The following shows a modification of CRF++ that achieves the above. Because CRF++ already provides an option to read from standard input, all that remains is to decide the input and output delimiters. This example uses [END_OF_INPUT] as the input delimiter and EOS as the output delimiter. To do so, add the block marked "added for RaSC" in the following source code, which writes EOS\n to standard output when an input line is [END_OF_INPUT].
CRF++-0.58/tagger.cpp
bool TaggerImpl::read(std::istream *is) {
  scoped_fixed_array<char, 8192> line;
  clear();
  for (;;) {
    if (!is->getline(line.get(), line.size())) {
      is->clear(std::ios::eofbit|std::ios::badbit);
      return true;
    }
    // --- added for RaSC: begin ---
    // Emit the output delimiter when the input delimiter is read.
    if (std::strcmp(line.get(), "[END_OF_INPUT]") == 0) {
      std::cout << "EOS\n";
      return true;
    }
    // --- added for RaSC: end ---
    if (line[0] == '\0' || line[0] == ' ' || line[0] == '\t') {
      break;
    }
    if (!add(line.get())) {
      return false;
    }
  }
  return true;
}
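The same delimiter protocol can be sketched as a toy program in Python. This is only an illustration under the delimiters chosen above: serve is a hypothetical function, the upper-casing is a placeholder for real analysis, and in-memory streams replace standard input and output so that the example runs on its own.

```python
import io

IN_DELIM = "[END_OF_INPUT]"  # input delimiter chosen in the example above
OUT_DELIM = "EOS"            # output delimiter chosen in the example above


def serve(stdin, stdout):
    """Process requests separated by IN_DELIM and end each reply with OUT_DELIM."""
    pending = []
    for raw in stdin:
        line = raw.rstrip("\n")
        if line == IN_DELIM:
            for item in pending:
                stdout.write(item.upper() + "\n")  # placeholder analysis
            stdout.write(OUT_DELIM + "\n")
            stdout.flush()  # make sure the reply is not left in a buffer
            pending.clear()
        else:
            pending.append(line)


out = io.StringIO()
serve(io.StringIO("hello\nworld\n[END_OF_INPUT]\nagain\n[END_OF_INPUT]\n"), out)
print(out.getvalue(), end="")
```

A real program would pass sys.stdin and sys.stdout to serve. Flushing after each output delimiter matters because the process stays alive between requests.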
A large amount of data may overflow the input/output buffer and cause an error. You can change the buffer size using the settings described in Service definition XML.
Some user programs do not run correctly unless the environment variable LANG is set properly (e.g. ja_JP.UTF-8 for Japanese).
You can set environment variables in the service definition XML using the environment property (available since ver. 1.0.2).
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE beans PUBLIC "-//SPRING//DTD BEAN//EN" "http://www.springframework.org/dtd/spring-beans.dtd">
<beans>
  <bean id="target" class="jp.go.nict.langrid.servicecontainer.handler.TargetServiceFactory">
    <property name="service">
      <bean class="jp.go.nict.wisdom.wrapper.StdIOCommandService">
        <property name="cmdLine" value="___PATH_TO_PROGRAM___" />
        <property name="delimiterIn" value="___INPUT_DELIMITER___" />
        <property name="delimiterOut" value="___OUTPUT_DELIMITER___" />
        <property name="environment">
          <map>
            <entry key="VAR1">
              <value>VALUE1</value>
            </entry>
            <entry key="VAR2">
              <value>VALUE2</value>
            </entry>
          </map>
        </property>
      </bean>
    </property>
  </bean>
</beans>
For details, refer to Service definition XML.
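The effect of such a setting can be pictured as starting the user program with extra variables merged into its environment. The following Python sketch only illustrates that idea and is not how RaSC is implemented; run_with_env is a hypothetical helper, and a short Python child process stands in for the user program.

```python
import os
import subprocess
import sys


def run_with_env(cmd, extra_env):
    """Run `cmd` with `extra_env` merged into the current environment."""
    env = dict(os.environ)
    env.update(extra_env)
    done = subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)
    return done.stdout


# Stand-in for a user program: it just reports the LANG it was started with.
child = [sys.executable, "-c", "import os; print(os.environ['LANG'])"]
print(run_with_env(child, {"LANG": "ja_JP.UTF-8"}), end="")
```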