FAQs

How to connect multiple user programs through a pipe

Prepare a shell script that connects the user programs through a pipe. For example, to run the syntactic parser KNP together with Juman, prepare the following shell script:

#!/bin/bash
juman | knp

Save this script as run_knp.sh and configure a service definition XML as follows:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE beans PUBLIC "-//SPRING//DTD BEAN//EN" "http://www.springframework.org/dtd/spring-beans.dtd">
<beans>
  <bean id="target"
    class="jp.go.nict.langrid.servicecontainer.handler.TargetServiceFactory">
    <property name="service">
      <bean class="jp.go.nict.wisdom.wrapper.StdIOCommandService">
        <property name="cmdLine" value="sh ___BASE_DIR___/run_knp.sh" />
        <property name="delimiterIn" value="\n" />
        <property name="delimiterOut" value="EOS\n" />
      </bean>
    </property>
  </bean>
</beans>

Given this service definition XML, RaSC starts the user programs connected through a pipe.

How to parallelize processing on inputs from a pipe

First, configure your RaSC service for parallel execution by following Execute a user program in parallel. Then read the inputs from standard input and call the RaSC service using analyzeArray. The following Java program is an example:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.InetSocketAddress;
import java.util.Arrays;

import org.apache.commons.lang.StringUtils;

import jp.go.nict.langrid.client.msgpackrpc.MsgPackClientFactory;
import jp.go.nict.wisdom.wrapper.api.TextAnalysisService;

public class RaSCClient {
    public static void main(String[] args) throws Exception {
        // args[0]: host name of the RaSC service, args[1]: port number
        try (MsgPackClientFactory factory = new MsgPackClientFactory()) {
            TextAnalysisService client = factory.create(
                TextAnalysisService.class,
                new InetSocketAddress(args[0], Integer.parseInt(args[1])));

            try (InputStreamReader isr = new InputStreamReader(System.in, "UTF-8");
                 BufferedReader br = new BufferedReader(isr)) {

                // Collect up to 1000 input lines and send them in one call.
                String[] list = new String[1000];
                int count = 0;

                while ((list[count] = br.readLine()) != null) {
                    count++;
                    if (count == list.length) {
                        String[] ret = client.analyzeArray(list);
                        System.out.println(StringUtils.join(ret, "\n"));
                        count = 0;
                    }
                }
                // Send the remaining lines (fewer than 1000).
                if (count > 0) {
                    String[] ret = client.analyzeArray(Arrays.copyOf(list, count));
                    System.out.println(StringUtils.join(ret, "\n"));
                }
            }
        }
    }
}

After saving the above as RaSCClient.java, you can compile it with the RaSC core package using the following command (refer to Run a user program as a RaSC service):

javac -cp lib/*: RaSCClient.java

Run the program with a hostname and a port as follows. This example assumes that the RaSC service is running on localhost, port 19999.

cat INPUT_TXT | java -cp ./lib/*: RaSCClient localhost 19999
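
For reference, the same batching pattern can be sketched in Python. This is a minimal sketch, not part of RaSC: batches and analyze_in_batches are hypothetical helper names, and analyze_array stands in for whatever callable forwards a list of strings to the service's analyzeArray method (for example a msgpackrpc client call, which is an assumption here).

```python
from typing import Callable, Iterable, Iterator, List


def batches(lines: Iterable[str], size: int = 1000) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` lines (hypothetical helper)."""
    batch: List[str] = []
    for line in lines:
        batch.append(line.rstrip("\n"))
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Flush the final, partially filled batch.
        yield batch


def analyze_in_batches(
    lines: Iterable[str],
    analyze_array: Callable[[List[str]], List[str]],
    size: int = 1000,
) -> List[str]:
    """Send each batch through `analyze_array` and collect results in order."""
    results: List[str] = []
    for batch in batches(lines, size):
        results.extend(analyze_array(batch))
    return results
```

In a real client, lines would typically be sys.stdin and analyze_array a thin wrapper around the RPC call. Sending one array per call instead of one line per call is what gives a service configured for parallel execution several inputs to work on at once.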

How should I configure a service definition XML when a directory or filename includes a space in the command line?

If the command line in a service definition XML contains a directory name or filename with a space, the service cannot be run with the cmdLine property as shown in Run a user program as a RaSC service. Use the cmdArray property instead of cmdLine. The following example specifies the default path to mecab in a Windows environment:

<property name="cmdArray">
  <list>
    <value>C:\Program Files\Mecab\bin\mecab</value>
    <value>-O</value>
    <value>wakati</value>
  </list>
</property>

You cannot set both the cmdArray property and the cmdLine property in the same file.
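
To see why a single command-line string is fragile here, the sketch below splits the same command on whitespace. That a cmdLine-style string is tokenized roughly this way is an assumption, not something this page confirms, and the Python code is only an illustration that does not call RaSC.

```python
# Illustration only: naive whitespace tokenization of a command line.
cmd_line = r"C:\Program Files\Mecab\bin\mecab -O wakati"
naive_tokens = cmd_line.split()
# The path is broken at the space inside "Program Files":
# ['C:\\Program', 'Files\\Mecab\\bin\\mecab', '-O', 'wakati']

# An explicit argument list keeps the path as one element,
# which corresponds to one <value> per argument in cmdArray.
cmd_array = [r"C:\Program Files\Mecab\bin\mecab", "-O", "wakati"]
```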

I want to include a termination symbol such as “EOS” in the results

Set includeDelim to true in the service definition XML. Refer to Service definition XML for details.

How can I confirm whether a RaSC service is alive?

You can call the method String getStatus(), which returns the command line, the number of pooled process instances, and the upper limit on that number. The following are examples of calling the method. They assume that SERVICE_HOST and SERVICE_PORT are the host and port on which the RaSC service was started.

Example in Java:

TextAnalysisService client = factory.create(TextAnalysisService.class,
          new InetSocketAddress(SERVICE_HOST, SERVICE_PORT));
String ret = client.getStatus();

Example in Perl:

my $client = AnyEvent::MPRPC::Client->new(
  host => "SERVICE_HOST", port => "SERVICE_PORT"
 );
my $ret = $client->call('getStatus')->recv;

Example in Python:

client = msgpackrpc.Client(msgpackrpc.Address("SERVICE_HOST", SERVICE_PORT))
ret = client.call('getStatus')

The following is an example result for the Enju service started in Run a user program as a RaSC service. It shows the command line set in the service definition XML, the number of pooled processes, and the upper limit on that number.

Command line: /usr/local/bin/enju
Pooled processes: 1 / 20
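
If you want to check the status from a monitoring script, the returned text can be parsed. The following is a minimal sketch that assumes the status always has the two-line shape of the sample above; parse_status is a hypothetical helper, not part of RaSC, and the real format may contain more fields.

```python
import re
from typing import Dict, Union


def parse_status(status: str) -> Dict[str, Union[str, int]]:
    """Extract the command line and pool counts from a getStatus() string."""
    info: Dict[str, Union[str, int]] = {}
    for line in status.splitlines():
        # Split at the first colon only, so paths such as C:\... stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Command line":
            info["command_line"] = value
        elif key == "Pooled processes":
            match = re.fullmatch(r"(\d+)\s*/\s*(\d+)", value)
            if match:
                info["pooled"] = int(match.group(1))
                info["limit"] = int(match.group(2))
    return info
```

A caller could treat a missing "pooled" key, or an exception raised by the RPC call itself, as a sign that the service is not healthy.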

How can I know what is going on when a RaSC service does not respond?

When you call a RaSC service, you may get no result and eventually encounter a timeout error. If the user program itself would normally finish in a short time, one of the following problems is the likely cause:

  • The input delimiter is not properly sent to the user program.
  • The output delimiter output by the user program is not recognized.

The input and output delimiters set in the service definition XML are used to recognize units of inputs and outputs.

For example, suppose that a user program recognizes the character string [END_OF_INPUT]\n (with a linefeed at the end) as the input delimiter. If you set [END_OF_INPUT] (without a linefeed at the end) in the RaSC setting, the user program keeps waiting for the linefeed code, so it looks as if there is no response. RaSC terminates and restarts the user program when it does not receive a proper output delimiter from the program within a certain time (RaSC logs the fact that the program was restarted).

A similar problem occurs on the output side. For example, suppose that a user program outputs the character string [END_OF_OUTPUT] (without a linefeed at the end) as the output delimiter. If you set [END_OF_OUTPUT]\n in the RaSC setting, RaSC keeps waiting for the linefeed code.

If you have no response from the RaSC service, check the input/output delimiters (whether they have a linefeed code and the type of the linefeed code).
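
The effect of a mismatched delimiter can be reproduced in a few lines. This is a simplified model, assuming that a unit of output is everything up to the first exact occurrence of the configured delimiter; read_unit is a hypothetical function and not RaSC's actual implementation.

```python
from typing import Optional


def read_unit(stream_data: str, delimiter: str) -> Optional[str]:
    """Return the text before the first delimiter, or None if it never appears."""
    index = stream_data.find(delimiter)
    if index < 0:
        return None
    return stream_data[:index]


# The program ends its output with the delimiter but no trailing linefeed.
program_output = "result line\n[END_OF_OUTPUT]"

matched = read_unit(program_output, "[END_OF_OUTPUT]")      # delimiter found
unmatched = read_unit(program_output, "[END_OF_OUTPUT]\n")  # None: looks like a hang

# repr() makes linefeed codes (\n versus \r\n) visible when debugging.
print(repr(program_output))
```

Printing the raw program output with repr(), or an equivalent hex dump, is a quick way to see exactly which linefeed code the program emits.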

What will happen when more than one request is received at once?

You can find the description in Architecture of RaSC service.

For detailed information on setting the limit for the number of processes of a user program, refer to Service definition XML.

How can I modify existing user programs to run them on RaSC?

To run a user program on the RaSC service, you must:

  • Use standard input/output for the input/output
  • Decide the delimiters for input/output
  • Make sure the program does not terminate after it receives a unit of input

The following shows how to modify CRF++ to satisfy these requirements. Because CRF++ already provides an option to use standard input, all you need to do is decide the input/output delimiters. This example assumes that [END_OF_INPUT] is the input delimiter and EOS is the output delimiter. To support them, add the block that checks for [END_OF_INPUT] (the std::strcmp test in the source code below), which writes EOS\n to standard output when an input line is [END_OF_INPUT].

CRF++-0.58/tagger.cpp

bool TaggerImpl::read(std::istream *is) {
  scoped_fixed_array<char, 8192> line;
  clear();

  for (;;) {
    if (!is->getline(line.get(), line.size())) {
      is->clear(std::ios::eofbit|std::ios::badbit);
      return true;
    }
    if(std::strcmp(line.get(), "[END_OF_INPUT]") == 0){
      std::cout << "EOS\n";
      return true;
    }
    if (line[0] == '\0' || line[0] == ' ' || line[0] == '\t') {
      break;
    }
    if (!add(line.get())) {
      return false;
    }
  }

  return true;
}
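
For a program you write yourself, the three requirements can be sketched as a simple read loop. The following Python code is a minimal sketch under the same delimiter assumptions as the CRF++ example ([END_OF_INPUT] in, EOS out); serve and process are hypothetical names, and process is a placeholder for the real analysis.

```python
import io

INPUT_DELIMITER = "[END_OF_INPUT]"
OUTPUT_DELIMITER = "EOS"


def process(lines):
    """Placeholder analysis: upper-case each line of the unit."""
    return [line.upper() for line in lines]


def serve(instream, outstream):
    """Handle units of input until the input stream is closed."""
    unit = []
    for raw in instream:
        line = raw.rstrip("\r\n")
        if line == INPUT_DELIMITER:
            for result in process(unit):
                outstream.write(result + "\n")
            # Mark the end of this unit and flush so the caller sees it at once.
            outstream.write(OUTPUT_DELIMITER + "\n")
            outstream.flush()
            unit = []  # keep running and wait for the next unit
        else:
            unit.append(line)


# Demonstration with in-memory streams: two units, two EOS markers.
demo_out = io.StringIO()
serve(io.StringIO("a\nb\n[END_OF_INPUT]\nc\n[END_OF_INPUT]\n"), demo_out)
```

As an actual user program, the last two lines would be replaced by serve(sys.stdin, sys.stdout). The explicit flush matters because output written to a pipe is usually buffered, and an unflushed delimiter looks like a program that never answers.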

How can I process a large amount of data with a RaSC service?

A large amount of data may overflow the input/output buffer and cause an error. You can change the buffer size using the settings described in Service definition XML.

Characters become garbled

Some user programs may not run correctly unless the environment variable LANG is set properly (e.g. ja_JP.UTF-8 for Japanese).

How to set environment variables for a program running on RaSC

You can set environment variables in the service definition XML using the environment property (since ver. 1.0.2).

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE beans PUBLIC "-//SPRING//DTD BEAN//EN" "http://www.springframework.org/dtd/spring-beans.dtd">
<beans>
  <bean id="target" class="jp.go.nict.langrid.servicecontainer.handler.TargetServiceFactory">
    <property name="service">
      <bean class="jp.go.nict.wisdom.wrapper.StdIOCommandService">
        <property name="cmdLine" value="___PATH_TO_PROGRAM___" />
        <property name="delimiterIn" value="___INPUT_DELIMITER___" />
        <property name="delimiterOut" value="___OUTPUT_DELIMITER___" />
        <property name="environment">
          <map>
            <entry key="VAR1">
              <value>VALUE1</value>
            </entry>
            <entry key="VAR2">
              <value>VALUE2</value>
            </entry>
          </map>
        </property>
      </bean>
    </property>
  </bean>
</beans>

For details, refer to Service definition XML.
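
As an analogy for what such an environment map does, the sketch below starts a child process with extra variables added to the inherited environment. It only illustrates the general mechanism in Python; run_with_environment is a hypothetical helper, and how RaSC applies the map internally is not described on this page.

```python
import os
import subprocess
import sys


def run_with_environment(cmd_array, extra_env):
    """Run a command with `extra_env` merged into the current environment."""
    env = dict(os.environ)
    env.update(extra_env)
    completed = subprocess.run(
        cmd_array, env=env, capture_output=True, text=True, check=True
    )
    return completed.stdout


# The child process prints the value it sees for VAR1.
child = [sys.executable, "-c", "import os; print(os.environ['VAR1'])"]
output = run_with_environment(child, {"VAR1": "VALUE1"})
```

The same idea applies to LANG: a program that garbles characters when started by the service may behave correctly once LANG is supplied in its environment.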