Python installation

The engine uses a Java/Python interface that submits scripts to, and retrieves results from, a Python environment installed on the machine where the Data Mining engine runs. To use this engine you therefore have to install Python (following the procedure for your OS) on the same machine where the Knowage server is installed. Installation instructions are available at https://www.python.org. The Data Mining engine supports only Python 3: the product has been tested with Python 3.4.0, but other 3.x releases are supported.
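
As a quick check of the requirement above, the following minimal sketch reports whether the interpreter that runs it is a Python 3 release. The helper name is_supported_python is an illustrative assumption and is not part of Knowage.

```python
import sys


def is_supported_python(version_info=sys.version_info):
    """Return True when the given interpreter version is a Python 3.x release."""
    # Only the major version matters: the text states that 3.x releases are supported.
    return version_info[0] == 3


if __name__ == "__main__":
    # Print the running interpreter version and whether it is a 3.x release.
    print("Python", sys.version.split()[0], "supported:", is_supported_python())
```

Run it with the same interpreter that the Knowage server machine will use, so that the result reflects the environment the engine actually sees.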

JPY installation

JPY is a connector that enables bidirectional communication between Python and Java. Its components must be installed on both sides: the dataminingengine Java project and the Python environment. The dataminingengine project is provided with jpy.jar, which allows the communication, but this is not sufficient: JPY must also be installed in your Python environment. To do this, download the JPY source files and build them on your machine (pre-built packages are not yet made available by the JPY creators). Detailed instructions for building and installing JPY in your Python environment are available at http://jpy.readthedocs.org/en/stable/install.html. Python 3.4 and JPY 0.8 (stable version) were used during the testing phase; the installation steps for those versions are described here. You will need:

  • Python 3.3 or higher (3.2 may work as well but is not tested),
  • Oracle JDK 7 or higher (JDK 6 may work as well),
  • Maven 3 or higher,
  • Microsoft Windows SDK 7.1 or higher.

If you build for a 32-bit Python, make sure to also install a 32-bit JDK. Accordingly, for a 64-bit Python, you will need a 64-bit JDK.
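
To decide which JDK matches your interpreter, you can ask Python for its own word size. This is a minimal sketch using only the standard library; the function name python_bitness is an illustrative assumption.

```python
import struct


def python_bitness():
    """Return 32 or 64, derived from the pointer size of the running interpreter."""
    # struct.calcsize("P") is the size in bytes of a C pointer for this build.
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print("This Python is a {}-bit build".format(python_bitness()))
```

A 64-bit result means a 64-bit JDK is needed, and a 32-bit result means a 32-bit JDK.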

The Python setup tools distutils can use the command-line C/C++ compilers of the free Microsoft Windows SDK. These will be used by distutils if the DISTUTILS_USE_SDK environment variable is set. The compilers are made accessible from the command line by the setenv tool of the Windows SDK. To install the Windows SDK, perform the following steps.

  • If you already use Microsoft Visual C++ 2010, make sure to uninstall the x86 and amd64 compiler redistributables first. Otherwise the installation of the Windows SDK will fail. This may also apply to higher versions of Visual C++.
  • Download and install Windows SDK 7.1.
  • Download and install Windows SDK 7.1 SP1.
  • Open the command line and execute:
    • "C:\Program Files\Microsoft SDKs\Windows\v7.1\bin\setenv" /x64 /release to prepare a build of the 64-bit version of jpy, or
    • "C:\Program Files\Microsoft SDKs\Windows\v7.1\bin\setenv" /x86 /release to prepare a build of the 32-bit version of jpy.

Now set other environment variables:

    SET DISTUTILS_USE_SDK=1
    SET JAVA_HOME=%JDK_HOME%
    SET PATH=%JDK_HOME%\jre\bin\server;%PATH%

Then, to build and test the jpy Python module, run the following command: python setup.py install. To use JPY, replace the jpyconfig.properties file in your project with the one generated by the build process, which is located in your JPY build folder jpy-master\build\lib.<SO-CPU-PYTHON_versions>. The properties file to replace is located under knowagedataminingengine\src\.
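
The replacement of jpyconfig.properties can also be scripted. The following is only a sketch: the function name replace_jpyconfig and its parameters are illustrative assumptions, and the two directories must be supplied by you, since the exact build folder name depends on your OS, CPU and Python version.

```python
import shutil
from pathlib import Path


def replace_jpyconfig(build_lib_dir, engine_src_dir, filename="jpyconfig.properties"):
    """Copy the generated properties file over the one shipped with the engine.

    build_lib_dir  -- the JPY build folder that contains the generated file
    engine_src_dir -- the engine source folder that holds the file to replace
    """
    source = Path(build_lib_dir) / filename
    target = Path(engine_src_dir) / filename
    if not source.is_file():
        # Fail early if the build did not produce the file where we expect it.
        raise FileNotFoundError("generated file not found: {}".format(source))
    # Overwrite the target with the generated file.
    shutil.copyfile(str(source), str(target))
    return target
```

Call it with the path of your own build folder as the first argument and the path of the engine src folder as the second.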

The Data Mining engine supports the use of all Python libraries: before importing a library in your script, install it in your native Python environment (for example using pip). To use Python you must install the following libraries: matplotlib, pandas, numpy, scipy. You can install them with pip by typing the following commands in a console where your native Python environment is available:

    pip install pandas
    pip install numpy
    pip install scipy
    pip install matplotlib
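
After installation, you can confirm that the four libraries are visible to the interpreter. This is a minimal sketch using the standard importlib module; the names REQUIRED_LIBRARIES and missing_libraries are illustrative assumptions.

```python
import importlib.util

# Libraries that the text above lists as required for the Python language.
REQUIRED_LIBRARIES = ("matplotlib", "pandas", "numpy", "scipy")


def missing_libraries(names=REQUIRED_LIBRARIES):
    """Return the subset of names that cannot be located in this environment."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_libraries()
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All required libraries are installed.")
```

Any name reported as missing can be installed with pip as described above.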

Listing 6 Example of a Knowage Data Mining engine template which uses a Python script

<?xml version="1.0" encoding="ISO-8859-15"?>
<DATA_MINING>
   <LANGUAGE name="Python"/>
   <DATASETS>
       <DATASET name="df" readType="csv" type="file" label="HairEyeColor" canUpload="true"><![CDATA[sep=',']]>
       </DATASET>
   </DATASETS>
   <SCRIPTS>
       <SCRIPT name="test01" mode="auto" datasets="df" label="HairEyeColor" libraries="csv,os,pandas,numpy">
        <![CDATA[
print(df.iloc[0,0])
y=df.iloc[0,0]
]]>
       </SCRIPT>
   </SCRIPTS>
   <COMMANDS>
       <COMMAND name="testcommand" scriptName="test01" label="test01" mode="auto">
           <OUTPUTS>
               <OUTPUT type="text" name="first_element" value="y" function="" mode="manual" label="first_element"/>
           </OUTPUTS>
       </COMMAND>
   </COMMANDS>
</DATA_MINING>

Note that the LANGUAGE tag specifies the language to use: name="Python" and name="R" are supported. If the LANGUAGE tag is missing or its name is not specified correctly, the language defaults to R.
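
The fallback rule can be illustrated with a short sketch. It is not the engine's actual implementation: the function name resolve_language is hypothetical, and treating the name attribute as case-sensitive is an assumption.

```python
import xml.etree.ElementTree as ET

SUPPORTED_LANGUAGES = ("Python", "R")
DEFAULT_LANGUAGE = "R"


def resolve_language(template_xml):
    """Return the language named by the template, falling back to R."""
    root = ET.fromstring(template_xml)
    language = root.find("LANGUAGE")
    # A missing tag or a missing attribute both yield None here.
    name = language.get("name") if language is not None else None
    # Assumption: only an exact match with a supported name counts as correct.
    return name if name in SUPPORTED_LANGUAGES else DEFAULT_LANGUAGE


if __name__ == "__main__":
    print(resolve_language('<DATA_MINING><LANGUAGE name="Python"/></DATA_MINING>'))
    print(resolve_language("<DATA_MINING/>"))
```

Under these assumptions, a template with a misspelled name is handled exactly like a template with no LANGUAGE tag at all.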