Tuesday, July 30, 2013

Taking Tableau Extract Process to the Next Step ...

So far I have seen handful of Tableau Extract samples using Python. They all followed the same template steps to create a table and dump some data into it. A more useful example would've been demonstrated  beyond just outlining the steps involved in the extract creation process.

My partner has suggested me to come up with something which can be used by our clients on daily basis. Then we had multiple discussions with one of our clients on their daily needs and came up with the following program. Though this is not the most complicated or most difficult program, this comes across like a simple configurable program using simple configuration files. This can be further enhanced to handle more complicated stuff.

Let us dive into the details.

Problem Statement: Problem (rather the requirement) we are trying to address here is to make the extract process configurable so that the users will be able to address the changing input formats of the input files / database queries.

Design: User needs to define the input data structure in a configuration file, from where program can read the input & create the tde schema accordingly, let us see in action in just few.

Format of the configuration & data files: For now, let us deal with Java API, my input file is as follows (for this sample, I will consider 8 columns).



and my data file looks like below, let us consider say 10,000 rows for this sample.

<pic>

Program Flow: Let us discuss the program flow in detail.


package com.nicheitllc.training.tableaudataextract;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Map.Entry;

import com.tableausoftware.TableauException;
import com.tableausoftware.DataExtract.Collation;
import com.tableausoftware.DataExtract.Extract;
import com.tableausoftware.DataExtract.Row;
import com.tableausoftware.DataExtract.Table;
import com.tableausoftware.DataExtract.TableDefinition;
import com.tableausoftware.DataExtract.Type;

public class TableauExtractCreator {

    public static void main(String[] args) {
        try {
            // Step 0. Load all Tableau related dll files.
            loadDllFiles();

            // Step 1. Read Config file TableConfig.properties & load into a
            // Linked Hashmap
            // Question: Why Hashmap? Why not properties?
            Map<String, Type> mapColumns = new LinkedHashMap<String, Type>();

            BufferedReader bufferedReader = null;
            try {
                String[] tokens;
                String currentLine;
                bufferedReader = new BufferedReader(new FileReader(
                        "TableConfig.properties"));
                while ((currentLine = bufferedReader.readLine()) != null) {
                    if (!currentLine.startsWith("#")
                            && currentLine.trim().length() > 0) {
                        tokens = currentLine.split("=");
                        mapColumns.put(tokens[0], Type.valueOf(tokens[1]));
                    }
                }
            } catch (IOException ioException) {
                ioException.printStackTrace();
            } finally {
                try {
                    if (bufferedReader != null)
                        bufferedReader.close();
                } catch (IOException ex) {
                    ex.printStackTrace();
                }
            }

            System.out.println("Print Configured Columns ....");
            for (Entry<String, Type> entry : mapColumns.entrySet()) {
                System.out.println(entry.getKey() + " - " + entry.getValue());
            }

            // Follow Extract Process
            // 2. Create Extract File
            Extract extract = new Extract("testOrders.tde");

            // 3. Create Table Definition
            TableDefinition tableDefinition = new TableDefinition();
            tableDefinition.setDefaultCollation(Collation.EN_GB);

            // 4. Create Table Structure as defined in the Configuration File
            for (Entry<String, Type> entry : mapColumns.entrySet()) {
                tableDefinition.addColumn(entry.getKey(), entry.getValue());
            }

            // 5. Add Table to the Extract
            Table table = extract.addTable("Extract", tableDefinition);

            // 6. Now read the Input Data File & create rows in the Table
            try {
                Row row = null;
                String[] tokens;
                String currentLine;
                bufferedReader = new BufferedReader(new FileReader(
                        "TestData.csv"));

                int columnNumber = 0;
                // For every line in CSV File
                while ((currentLine = bufferedReader.readLine()) != null) {
                    // We can leave the first line if there is a header, but for
                    // now I am treating as if there is no header
                    tokens = currentLine.split(",");
                    for (Entry<String, Type> entry : mapColumns.entrySet()) {
                        // Create a new Row in the Table
                        row = new Row(tableDefinition);
                        setRowData(row, entry.getValue(),
                                tokens[columnNumber++], columnNumber);
                    }
                    // 7. Insert the rowe in the Table
                    table.insert(row);
                    row.close();
                }

                // 8. Close all the used resources.
                tableDefinition.close();
                extract.close();
            } catch (IOException ioException) {
                ioException.printStackTrace();
            } finally {
                try {
                    if (bufferedReader != null)
                        bufferedReader.close();
                } catch (IOException ex) {
                    ex.printStackTrace();
                }
            }
        } catch (Throwable t) {
            t.printStackTrace(System.err);
        }
    }

    /**
     * Set Data Dynamic depending the Column Type
     *
     * @param row
     * @param columnNumber
     * @param value
     * @param string
     * @throws TableauException
     * @throws NumberFormatException
     */
    private static void setRowData(Row row, Type columnType, String columnData,
            int columnNumber) throws NumberFormatException, TableauException {
        switch (columnType) {
        case INTEGER:
            row.setInteger(columnNumber, Integer.parseInt(columnData));
            break;
        case DOUBLE:
            row.setDouble(columnNumber, Double.parseDouble(columnData));
            break;
        case BOOLEAN:
            row.setBoolean(columnNumber, Boolean.parseBoolean(columnData));
            break;
        case DATE:
            String[] dateTokens = columnData.split("-");
            row.setDate(columnNumber, Integer.parseInt(dateTokens[0]),
                    Integer.parseInt(dateTokens[1]),
                    Integer.parseInt(dateTokens[2]));
            break;
        case DATETIME:
            break;
        case DURATION:
            break;
        case CHAR_STRING:
            row.setCharString(columnNumber, columnData);
            break;
        case UNICODE_STRING:
            row.setString(columnNumber, columnData);
        }
    }

    /**
     * Load DLL Files in the same Order
     */
    private static void loadDllFiles() {
        // These files should be loaded in the same order, Don't ask me why:
        // Tableau's Top Secret
        System.load("C:\\Ravi\\SOFTWARE\\Tableau-Java\\32-bit\\bin\\msvcr100.dll");
        System.load("C:\\Ravi\\SOFTWARE\\Tableau-Java\\32-bit\\bin\\msvcp100.dll");
        System.load("C:\\Ravi\\SOFTWARE\\Tableau-Java\\32-bit\\bin\\tbb.dll");
        System.load("C:\\Ravi\\SOFTWARE\\Tableau-Java\\32-bit\\bin\\icudt44.dll");
        System.load("C:\\Ravi\\SOFTWARE\\Tableau-Java\\32-bit\\bin\\icuuc44.dll");
        System.load("C:\\Ravi\\SOFTWARE\\Tableau-Java\\32-bit\\bin\\tdeapi.dll");
        System.load("C:\\Ravi\\SOFTWARE\\Tableau-Java\\32-bit\\bin\\icuin44.dll");
        System.load("C:\\Ravi\\SOFTWARE\\Tableau-Java\\32-bit\\bin\\dataextract.dll");
    }

}


I tried to write this program in the simplest way, but feel free to ask any questions you may have (just make sure you don't ask me to extend the API to do something beyond Tableau Extract API abilities, you know what I am talking about yeah?)

See you guys in my blog soon with more interesting stuff.