Business intelligence in sales analysis using neural networks

Julian Vasilev

Acknowledgements

This case study was created within the project “Developing the innovative methodology of teaching business informatics” (DIMBI), 2015-1-PL01-KA203-0016636.

About neural networks…

A neural network is a business intelligence tool that can find dependencies in a dataset. These dependencies are usually not obvious when pivoting, sorting or filtering the data. Some columns of the analyzed dataset are input variables (covariates) and one column is the dependent (target) variable. One part of the rows is used for training the neural network, another part for testing it and a third part for validating it. The neural network can estimate the relative importance of each input variable for the target variable, but it cannot find the direction of the influence. It can also reveal rows with outliers. The neural network may be queried: the end user gives values to the input variables and the network calculates an expected value of the target variable. It is important to highlight that the result of a network query is an expected value, not a certain one (its probability is not equal to one). Certainly, other factors also affect the target variable, but the dataset holds no information about them.

About Alyuda Neurointelligence…

Alyuda Neurointelligence (Alyuda Research n.d.) is artificial intelligence software used for creating neural networks.

1. Source files

The dataset is provided by IBM on their web site https://www.ibm.com/communities/analytics/watson-analytics-blog/guide-to-sample-datasets/ as a CSV file https://community.watsonanalytics.com/wp-content/uploads/2015/03/WA_Retail-SalesMarketing_-ProfitCost.csv.

The dataset is filtered in MS Excel. All cases without numeric values are removed. Coding of variables is done. All variables are described in a new XLS file. All columns after the column “Revenue” are deleted.

MS Excel -> IBM_Sales_dataset.xls
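The same filtering can also be scripted. The following is only a minimal sketch: the sample rows are made up, the helper names `is_number` and `clean_rows` are hypothetical, and only the “Revenue” column is checked for numeric values. The coding of variables done in MS Excel is not reproduced here.

```python
import csv
import io

# Made-up sample in the spirit of the IBM file; the real column list may differ.
RAW_CSV = """Year,Product line,Order method type,Revenue,Planned revenue
2004,Camping Equipment,Web,5819.70,6586.16
2005,Golf Equipment,Telephone,,
2006,Camping Equipment,Web,n/a,1200.00
2007,Personal Accessories,Web,10904.28,11363.52
"""

def is_number(text):
    """Return True if text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False

def clean_rows(csv_text, last_column="Revenue"):
    """Keep columns up to last_column and rows whose last_column is numeric."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    keep = header.index(last_column) + 1      # drop every column after Revenue
    cleaned = [header[:keep]]
    for row in reader:
        if is_number(row[keep - 1]):          # skip rows without a numeric revenue
            cleaned.append(row[:keep])
    return cleaned

cleaned = clean_rows(RAW_CSV)
for row in cleaned:
    print(row)
```

Only the header and the rows for 2004 and 2007 survive, and the “Planned revenue” column is gone.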

2. Initial research questions

It is assumed that the following variables influence revenues: year, product line, product type, product, order method type and retailer country. The research hypotheses may be defined in other ways. We assume significant differences in average revenues in different years. We assume significant differences in average revenues in different product lines.

Statistical methods (Pallant 2011) are applied with statistical software (Hadzhiev 2009). Statistical methods may be used to check (accept or reject) the research hypotheses.
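One statistical method that fits the hypothesis about average revenues in different years is One-way ANOVA. The sketch below computes only its F statistic by hand, on made-up numbers that are not taken from the IBM file; the function name `one_way_anova_f` is hypothetical. The p-value, which statistical software reports, comes from the F distribution and is not computed here.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return the F statistic of a one-way ANOVA for a list of samples."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of the values around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Made-up revenues (in thousands) for three years.
revenue_by_year = {
    2005: [1.0, 2.0, 3.0],
    2006: [2.0, 3.0, 4.0],
    2007: [5.0, 6.0, 7.0],
}
f_stat = one_way_anova_f(list(revenue_by_year.values()))
print(round(f_stat, 2))
```

A large F value suggests that the average revenues differ between the years more than the variation inside each year would explain.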

Artificial intelligence methods (Vasilev & Atanasova 2015) may also be applied.

3. Working with Alyuda Neurointelligence

3.1. Importing the dataset in Alyuda Neurointelligence

Download the file IBM_Sales_dataset.xls and open it. Look at the column names. Look at the total number of rows.

Try to define the type of each variable (nominal, ordinal or interval scale).

Close the dataset. Start Alyuda Neurointelligence. From here on we use the short name “Alyuda”.

Open the dataset in Alyuda (File/Open).

Tick the check box “First row contains column names”.

If some columns are shown in gray, select each such column and choose “Accept”.

Before the name of each column there is some information in brackets.

For example, “(C4) Year” means a categorical column with four categories.

The following columns: “product line”, “product type”, “product”, “order method type” and “retailer country” are also categorical. The number of categories differs from column to column, which is normal.

The last column (Revenue) is a numeric column. You should see “(N) Revenue”.
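To make the bracket notation concrete, here is a small sketch that builds such labels from raw column values. It is not how Alyuda decides the column type: the rule used here (a column is categorical if it holds text or at most `max_categories` distinct values) and the helper `describe_column` are assumptions for illustration, and the sample values are made up.

```python
def is_number(text):
    """Return True if text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False

def describe_column(name, values, max_categories=10):
    """Return a label such as '(C4) Year' or '(N) Revenue'.

    Assumed rule: a column is numeric only if every value is a number and
    there are more than max_categories distinct values.
    """
    distinct = set(values)
    if all(is_number(v) for v in values) and len(distinct) > max_categories:
        return f"(N) {name}"
    return f"(C{len(distinct)}) {name}"

# Made-up sample values for three of the columns.
columns = {
    "Year": ["2004", "2005", "2006", "2007", "2004", "2007"],
    "Order method type": ["Web", "Telephone", "Web", "Fax", "Web", "E-mail"],
    "Revenue": [str(1000.5 + 37 * i) for i in range(12)],
}
labels = [describe_column(name, values) for name, values in columns.items()]
print(labels)
```

Note that “Year” holds numbers but is still labelled as categorical, because it has only four distinct values.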

The first six columns are input ones (input variables) – their titles are marked in blue.

The title of the last column is marked in light orange, which means it is an output column (output variable).

The last column (Revenue) has to be set as the target column.

3.2. Analyzing the dataset

To analyze the dataset, press the “Analyze” button.

Some of the rows are shown in blue: these are the training rows.

Other rows are green: these are the validation rows.

Other rows are red: these are the testing rows.
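The partition of rows into three groups can be sketched as follows. The shares (70% training, 15% validation, 15% testing), the fixed seed and the function name `partition_rows` are assumptions; the shares that Alyuda uses may differ.

```python
import random

def partition_rows(n_rows, train=0.70, validation=0.15, seed=42):
    """Assign each row index to 'training', 'validation' or 'testing'."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)      # fixed seed keeps the split repeatable
    n_train = round(n_rows * train)
    n_val = round(n_rows * validation)
    return {
        "training": indices[:n_train],
        "validation": indices[n_train:n_train + n_val],
        "testing": indices[n_train + n_val:],  # whatever is left over
    }

parts = partition_rows(100)
print({name: len(rows) for name, rows in parts.items()})
```

Every row lands in exactly one group, so the network is never tested on a row it was trained on.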

After the analysis, some columns are excluded (e.g. product type, product and retailer country).

3.3. Preparing the dataset

Press the “Preprocess” button.
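Preprocessing for a neural network typically turns categorical columns into numbers and rescales numeric columns. The sketch below shows two common techniques, one-hot encoding and min-max scaling to the range from -1 to 1. It is an assumption that Alyuda does something similar at this step; the functions `one_hot` and `min_max_scale` and the sample values are illustrative only.

```python
def one_hot(values):
    """Encode categories as 0/1 vectors, one position per category."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return encoded, categories

def min_max_scale(values, low=-1.0, high=1.0):
    """Rescale numbers linearly so the smallest maps to low and the largest to high."""
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:                              # constant column: nothing to scale
        return [low for _ in values]
    return [low + (high - low) * (v - v_min) / span for v in values]

# Made-up sample values.
order_method = ["Web", "Telephone", "Web", "Fax"]
revenue = [1000.0, 2500.0, 4000.0, 1000.0]

encoded, categories = one_hot(order_method)
scaled = min_max_scale(revenue)
print(categories)
print(encoded)
print(scaled)
```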

3.4. Choosing the best network architecture

Press the “Design” button.

Press the “Search Architecture” button. You need to wait about one minute. If the process of searching does not stop, press the “Stop” button.

If the search returns several possible architectures of the neural network, the one with the highest value of the fitness function is chosen.
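The selection rule itself is simple and can be written in one line. The architecture names and fitness values below are hypothetical and are not results of a real search.

```python
# Hypothetical search results: (architecture, fitness) pairs.
candidates = [
    ("3-2-1", 0.412),
    ("3-5-1", 0.467),
    ("3-9-1", 0.455),
]

# Pick the candidate whose fitness value is the highest.
best_architecture, best_fitness = max(candidates, key=lambda item: item[1])
print(best_architecture, best_fitness)
```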

3.5. Training the neural network

Press the “Train” button. You need to wait about one minute. If the process of training does not stop, press the “Stop” button. If you wait about two minutes instead, you will get the message “Network training completed”.

3.6. Testing the network

Press the “Test” button.

Look at the “Actual vs Output” window in the “Visualisation” panel. If the differences are large, some of the rows have to be excluded and the previous steps have to be repeated.
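The comparison of actual and output values can also be done numerically. The following sketch flags the rows whose absolute difference exceeds a threshold. The threshold of 500, the function name `flag_large_errors` and the sample values are assumptions; what counts as a large difference depends on the scale of the revenues.

```python
def flag_large_errors(actual, output, tolerance=500.0):
    """Return the indices of rows where |actual - output| exceeds tolerance."""
    flagged = []
    for i, (a, o) in enumerate(zip(actual, output)):
        if abs(a - o) > tolerance:
            flagged.append(i)
    return flagged

# Made-up actual revenues and network outputs for four test rows.
actual = [1000.0, 2000.0, 1500.0, 4000.0]
output = [1100.0, 1900.0, 3000.0, 3900.0]
print(flag_large_errors(actual, output))
```

Here only the third row (index 2) is flagged, so it is a candidate for exclusion before the steps are repeated.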

3.7. The result of the neural network

3.7.1. Relative importance of different factors

In the “Training” panel, open the “Network statistics” panel and press the “Input importance” button.

The values in the table must not be interpreted as absolute values. They have to be interpreted as ranks.

For example, the most important factor affecting sales is the “order method type”. The factors “year” and “product line” have a weaker influence on revenues.
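Reading the importance values as ranks can be sketched like this. The numbers are hypothetical and chosen only so that “order method type” comes first; in particular, the order between “year” and “product line” is made up.

```python
# Hypothetical importance values; only their order is meant to be read.
importance = {"Year": 12.4, "Product line": 21.9, "Order method type": 65.7}

# Sort the factors from most to least important and number them from 1.
ranked = sorted(importance, key=importance.get, reverse=True)
ranks = {factor: position for position, factor in enumerate(ranked, start=1)}
print(ranks)
```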

3.7.2. Query the network

We want to ask the neural network to predict the sales of camping equipment in 2007 for online buyers.

Go to the “Query” panel.

Choose:

Year: 2007

Product line: Camping equipment

Order method type: web

Press the “Manual query” button.

You will see a value in the “Revenue” column in the “Results table”.

This revenue value is an expected value and is given together with a probability expressed as a percentage.
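A simple way to sanity-check the answer of the network is to compare it with the average revenue of the rows that match the query exactly. The sketch below is such a baseline, not a neural network; the rows are made up and `baseline_query` is a hypothetical helper.

```python
from statistics import mean

# Made-up rows: (year, product line, order method type, revenue).
rows = [
    (2007, "Camping Equipment", "Web", 5200.0),
    (2007, "Camping Equipment", "Web", 4800.0),
    (2007, "Camping Equipment", "Telephone", 900.0),
    (2006, "Camping Equipment", "Web", 4100.0),
    (2007, "Golf Equipment", "Web", 7300.0),
]

def baseline_query(rows, year, product_line, order_method):
    """Average revenue of the rows that match all three query values."""
    matches = [r[3] for r in rows
               if (r[0], r[1], r[2]) == (year, product_line, order_method)]
    return mean(matches) if matches else None   # None when no row matches

baseline = baseline_query(rows, 2007, "Camping Equipment", "Web")
print(baseline)
```

If the value returned by the network is far from such a baseline, it is worth checking the chosen inputs and the training result again.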

4. Presenting the result in scientific papers

We did the exercise with Alyuda. Try to present the result of your research in a meaningful text.

Start by giving appropriate credit to the DIMBI project.

Continue with some information about the dataset. Put a citation if you use a dataset created by other people.

Continue with the research questions.

Find publications on the DIMBI project in Google Scholar, RePEc and DOAJ. Check if other researchers have tried to answer these research questions. Cite them.

Say some words about the appropriate method that may be applied (the neural network).

Say that other methods (such as One-way ANOVA) may be used.

Say some words about Alyuda Neurointelligence. Explain why it was chosen. Give appropriate credit to the software developer.

Present the result of Alyuda Neurointelligence: describe the input importance of the different factors and the manual query of the network.

If your text is ready, congratulations!

Please send it to Julian Vasilev (e-mail: vasilev@ue-varna.bg) and Miglena Stoyanova (e-mail: m_stoyanova@ue-varna.bg).

5. Personal work

Task 1. Go back to the “Analysis” panel. Accept all input columns. Find the input importance of each factor. Present the result.

Task 2. Filter the dataset for year 2007. Input the new data into Alyuda. Find the input importance of each factor. Present the result.

Task 3. Filter the dataset for year 2007 and order method type “web”. Input the new data into Alyuda. Choose “product” as the input column. Find the input importance of each factor. Present the result.

Write meaningful descriptions (solutions) of these tasks.

Please send them to Julian Vasilev (e-mail: vasilev@ue-varna.bg) and Miglena Stoyanova (e-mail: m_stoyanova@ue-varna.bg).

Literature

Alyuda Research, n.d. Alyuda Neurointelligence. Available at: http://www.alyuda.com/neural-networks-software.htm [Accessed January 20, 2016].

Hadzhiev, V. et al., 2009. Statistical and econometric software, Varna: Science and economics.

Pallant, J., 2011. SPSS Survival Manual: A step by step guide to data analysis using SPSS, Allen and Unwin.

Vasilev, J. & Atanasova, T., 2015. Parallel Testing of Hypotheses with Statistical and Artificial Intelligence Methods: A Study on Measuring the Complacency from Education. Computer Science and Applications, 2(5), pp.206–211.