RaSC is free, open-source middleware developed by the Information Analysis Laboratory at the National Institute of Information and Communications Technology (NICT). RaSC enables fast, highly parallel execution of user programs.
RaSC was developed to apply user programs such as morphological analyzers and dependency parsers to a huge number of Web pages. To this end, RaSC can run user programs and connect them across distributed computation nodes. A typical use of RaSC is to process multiple inputs from a file or a stream in parallel on multi-core CPUs, on many computation nodes, or on both. Although RaSC was originally designed for natural language processing (NLP), it works with many kinds of user programs, not only NLP programs. As long as a program reads its input from standard input or a file and writes its result to standard output or a file, it can in most cases be run in a distributed manner on RaSC with only slight changes.
The process instances of user programs running on RaSC stay resident in memory once they have started. For this reason, even programs that take a long time to start (for example, NLP programs that load a large dictionary file) run efficiently. In addition, RaSC makes it easy to use user programs on remote computers over a network. When many inputs are given, they can be distributed to multiple computers. User programs can be easily connected through stream communication, like a UNIX pipe, and they are executed in parallel without the user having to manage the parallelism (refer to Overview for more details).
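The way inputs are spread over several process instances can be pictured with a small shell sketch. This is not RaSC's implementation. It is a batch approximation under stated assumptions: the worker command writes exactly one output line per input line, `mktemp -d` is available, and the function name `parallel_lines` is hypothetical. Each worker is started once and handles many inputs, and the outputs are merged back in the original input order.

```shell
#!/bin/sh
# parallel_lines N CMD [ARGS...]   (hypothetical helper, not part of RaSC)
# Reads lines from stdin, deals them round-robin to N workers running CMD,
# and prints the results in the original input order.
# Assumption: CMD emits exactly one output line per input line.
parallel_lines() {
    pl_n=$1; shift
    pl_dir=$(mktemp -d) || return 1

    # Deal the input: line k goes to chunk file (k-1) mod N.
    awk -v n="$pl_n" -v dir="$pl_dir" \
        '{ print > (dir "/in." ((NR - 1) % n)) }'

    # Start each worker once; it processes every line of its chunk.
    pl_i=0
    while [ "$pl_i" -lt "$pl_n" ]; do
        if [ -f "$pl_dir/in.$pl_i" ]; then
            "$@" < "$pl_dir/in.$pl_i" > "$pl_dir/out.$pl_i" &
        fi
        pl_i=$((pl_i + 1))
    done
    wait

    # Merge: take one line from out.0, out.1, ... in turn. The first
    # exhausted (or missing) file marks the end of the input.
    awk -v n="$pl_n" -v dir="$pl_dir" 'BEGIN {
        for (;;) {
            f = dir "/out." (k % n)
            if ((getline line < f) <= 0) break
            print line
            k++
        }
    }'

    rm -rf "$pl_dir"
}

# Usage: three workers upper-case five lines; order is preserved.
printf 'a\nb\nc\nd\ne\n' | parallel_lines 3 tr a-z A-Z
```

Unlike this sketch, RaSC keeps the worker processes resident between requests and can place them on remote nodes, so the start-up cost of each program is paid only once.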
The following shows an example of executing the Dependency and Case Structure Analyzer KNP on RaSC. In the example, an input of 500 lines, one sentence per line, is split across multiple process instances and processed in parallel on multi-core CPUs (8 parallel instances on two Intel Xeon X5675 processors), which makes processing about 5 times faster. The order of the inputs in the original input file (INPUT_TXT) is preserved in the output file (OUTPUT_TXT). (This example can be implemented by the program described in How to parallelize processing on inputs from a pipe.)
$ time cat INPUT_TXT | juman | knp > OUTPUT_TXT # Directly run a user program without RaSC
real 2m28.456s # Without parallelization
user 2m17.557s
sys 0m1.011s
$ ./server.sh KNPService 19999 start # Start a RaSC service that runs KNP
$ time cat INPUT_TXT | java -cp ./lib/*: RaSCClient localhost 19999 > OUTPUT_TXT # Other computer nodes can be accessed by changing the host and port.
real 0m29.402s # Parallelization with RaSC (8 parallel processes on two Intel Xeon X5675)
user 0m0.566s
sys 0m0.045s
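The "about 5 times faster" figure can be checked from the two `real` (wall-clock) times above. The helper names below (`to_seconds`, `speedup`, `efficiency`) are hypothetical and only illustrate the arithmetic. Wall-clock time is the meaningful comparison here, since the `user` time of the RaSC run presumably covers only the client process and not the work done inside the service.

```shell
#!/bin/sh
# to_seconds: convert a `time`-style duration such as 2m28.456s to seconds.
to_seconds() {
    printf '%s\n' "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# speedup BASE PARALLEL: ratio of the two wall-clock times.
speedup() {
    awk -v base="$(to_seconds "$1")" -v par="$(to_seconds "$2")" \
        'BEGIN { printf "%.2f\n", base / par }'
}

# efficiency BASE PARALLEL INSTANCES: speedup as a percentage of the
# ideal speedup, which equals the number of parallel instances.
efficiency() {
    awk -v base="$(to_seconds "$1")" -v par="$(to_seconds "$2")" -v n="$3" \
        'BEGIN { printf "%.0f\n", 100 * base / par / n }'
}

speedup 2m28.456s 0m29.402s        # prints 5.05
efficiency 2m28.456s 0m29.402s 8   # prints 63
```

With 8 process instances, a speedup of about 5.05 corresponds to roughly 63% of the ideal 8-fold gain for this particular run.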
RaSC is used in the large-scale Web information analysis system WISDOM X and the Disaster Information Analysis System DISAANA, both of which are developed by NICT. Using RaSC, WISDOM X can apply various analyses, such as dependency analysis, sentiment analysis, and causality extraction, to up to 100 million Web documents per day.
RaSC is distributed under LGPL v2.1.
The following table lists the user programs that have been tested on RaSC. You can download the service definition XML files required to run these programs on RaSC. A service definition XML is a configuration file that specifies the command line for a user program, the number of process instances for parallel execution, and other settings. For details, refer to Run a user program as a RaSC service (when using only MessagePack RPC) and Work with various network protocols (when using JSON RPC, ProtocolBuffers, or SOAP).
Note
In the service definition XML, the paths to user programs and model files must be set to match your environment.
(Please replace [at] with an at sign, @.)