Business intelligence in sales analysis using neural networks
Julian Vasilev
Acknowledgements
This case study was created within the project “Developing the innovative methodology of teaching business informatics” (DIMBI), 2015-1-PL01-KA203-0016636.
About neural networks
A neural network is a tool from the group of business intelligence software. It can find dependencies in a dataset that are usually not obvious when pivoting, sorting or filtering techniques are used. Some of the columns of the analyzed dataset are input variables (covariates) and one column is the dependent variable (target variable). Part of the rows in the dataset is used for training the neural network, another part for validating it and a third part for testing it. The neural network may find the relative importance of each input variable for the target variable, but it cannot find the direction of the influence. It may also find rows with outliers. The neural network may be queried: the end user gives values to the input variables and the network calculates an expected value of the target variable. It is important to highlight that the result of a network query is an expected value, not a certain one (its probability is not equal to one). Certainly, other factors affect the target variable, but we do not have information about them in our dataset.
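The split of rows into training, validation and testing sets can be illustrated with a short Python sketch. It assumes a plain random split; the function name split_rows and the 70/15/15 percentages are hypothetical choices for illustration, not the partition Alyuda actually uses.

```python
import random

def split_rows(rows, train_pct=70, validation_pct=15, seed=42):
    """Randomly partition rows into training, validation and testing sets.

    The percentages are illustrative assumptions, not Alyuda's settings;
    the rows left after training and validation go to testing.
    """
    if train_pct + validation_pct > 100:
        raise ValueError("train_pct + validation_pct must not exceed 100")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    n_train = len(shuffled) * train_pct // 100
    n_validation = len(shuffled) * validation_pct // 100
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_validation],
            shuffled[n_train + n_validation:])

# Usage with 20 dummy row numbers
training, validating, testing = split_rows(range(20))
print(len(training), len(validating), len(testing))  # 14 3 3
```

A fixed seed makes the split reproducible, which helps when the same experiment is repeated after excluding rows.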
About Alyuda Neurointelligence
Alyuda Neurointelligence (Alyuda Research n.d.) is artificial intelligence software for creating neural networks.
1. Source files
The dataset is provided by IBM on its website https://www.ibm.com/communities/analytics/watson-analytics-blog/guide-to-sample-datasets/ as a CSV file https://community.watsonanalytics.com/wp-content/uploads/2015/03/WA_Retail-SalesMarketing_-ProfitCost.csv.
The dataset is filtered in MS Excel: all cases without numeric values are removed, the variables are coded and all columns after the column “Revenue” are deleted. All variables are described in a new XLS file.
Result (MS Excel file): IBM_Sales_dataset.xls
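The filtering steps were done by hand in MS Excel. As a minimal sketch of the same logic in Python, assuming the data has been exported as CSV text and that the target column is named “Revenue”, the hypothetical helper clean_sales_csv below drops all columns after “Revenue” and removes rows whose revenue cell is empty or not numeric. The sample rows are invented, and the coding of variables is not covered.

```python
import csv
import io

def clean_sales_csv(text, target="Revenue"):
    """Keep the columns up to and including `target` and drop every row
    whose target value is missing or not numeric."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    last = header.index(target)  # ValueError if the target column is absent
    rows = []
    for row in reader:
        row = row[:last + 1]     # delete all columns after the target column
        try:
            float(row[last])
        except (ValueError, IndexError):
            continue             # case without a numeric value: remove it
        rows.append(row)
    return header[:last + 1], rows

# Hypothetical two-row sample; the second row has an empty Revenue cell.
sample = ("Year,Product line,Order method type,Revenue,Quantity\n"
          "2004,Camping Equipment,Web,1000.5,10\n"
          "2005,Golf Equipment,Fax,,3\n")
header, rows = clean_sales_csv(sample)
print(header)  # ['Year', 'Product line', 'Order method type', 'Revenue']
print(rows)    # [['2004', 'Camping Equipment', 'Web', '1000.5']]
```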
2. Initial research questions
We assume that the following variables influence revenues: year, product line, product type, product, order method type and retailer country. The research hypotheses may also be formulated in other ways. We assume significant differences in average revenues between years. We assume significant differences in average revenues between product lines.
Statistical methods (Pallant 2011), applied with statistical software (Hadzhiev et al. 2009), may be used to test (accept or reject) the research hypotheses. Artificial intelligence methods (Vasilev & Atanasova 2015) may also be applied.
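Both hypotheses about average revenues are typical cases for one-way ANOVA. The statistic compares the variation between group means with the variation within groups: F = [SS_between / (k - 1)] / [SS_within / (n - k)], where k is the number of groups (for example, years) and n is the number of cases. The sketch below computes F in plain Python on invented numbers; it does not compute a p-value. In practice, scipy.stats.f_oneway returns both the F statistic and the p-value.

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic: F = [SS_between/(k-1)] / [SS_within/(n-k)]."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the values around their own group mean
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Made-up revenues for two years, for illustration only
print(one_way_anova_f([[1, 2, 3], [4, 5, 6]]))  # 13.5
```

A large F suggests that the group means differ more than the spread within groups would explain; the p-value decides whether the difference is significant.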
3. Working with Alyuda Neurointelligence
3.1. Importing the dataset in Alyuda Neurointelligence
Download the file IBM_Sales_dataset.xls.
Open the file. Look at the column names and the total number of rows.
Try to determine the measurement scale of each variable (nominal, ordinal or interval).
Close the dataset and start Alyuda (we use this short name for “Alyuda Neurointelligence”).
Open the dataset in Alyuda (File/Open).
Tick the check box “First row contains column names”.
If some columns are shown in gray, select each such column and choose “Accept”.
Before the name of each column there is a label in brackets. For example, “(C4) Year” means a categorical column with four categories.
The columns “product line”, “product type”, “product”, “order method type” and “retailer country” are also categorical. The number of categories differs from column to column; this is normal.
The last column (Revenue) is numeric. You should see “(N) Revenue”.
The first six columns are input columns (input variables); their titles are marked in blue.
The title of the last column is marked in light orange, which means that it is an output column (output variable).
Make sure that the last column (Revenue) is selected as the target column.
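Alyuda assigns the “(Ck)” and “(N)” labels automatically. The rule it uses is not described in this case study, so the following Python sketch uses a hypothetical rule: a column is treated as categorical if any value is non-numeric or if it has at most max_categories distinct values (10 is an arbitrary choice). Under this rule a “Year” column with four distinct years is labelled “(C4)”, as in Alyuda.

```python
def is_number(text):
    """True if the cell text parses as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False

def column_label(values, max_categories=10):
    """Return a label like '(C4)' (categorical, 4 categories) or '(N)' (numeric).

    Hypothetical rule: a column is categorical if any value is non-numeric
    or if it has at most `max_categories` distinct values.
    """
    distinct = set(values)
    if len(distinct) <= max_categories or not all(is_number(v) for v in distinct):
        return f"(C{len(distinct)})"
    return "(N)"

print(column_label(["2004", "2005", "2006", "2007", "2007"]))  # (C4)
print(column_label([str(x * 1.5) for x in range(50)]))        # (N)
```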
3.2. Analyzing the dataset
To analyze the dataset, press the “Analyze” button.
Rows shown in blue are training rows.
Rows shown in green are validation rows.
Rows shown in red are testing rows.
After the analysis, some columns are excluded (e.g. product type, product and retailer country).
3.3. Preparing the dataset
Press the “Preprocess” button.
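This case study does not describe what Alyuda does internally during preprocessing. A common approach for neural networks, assumed here only for illustration, is to encode each categorical value as a one-of-N vector and to rescale numeric columns into a fixed range. Both steps are sketched below in Python; the function names are hypothetical.

```python
def one_of_n(value, categories):
    """Encode a categorical value as a one-of-N (one-hot) vector."""
    return [1.0 if value == c else 0.0 for c in categories]

def min_max_scale(values, low=0.0, high=1.0):
    """Linearly rescale numeric values into the range [low, high]."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        return [low for _ in values]  # constant column: nothing to spread out
    return [low + (v - v_min) * (high - low) / (v_max - v_min) for v in values]

print(one_of_n("Web", ["Fax", "Web", "Mail"]))  # [0.0, 1.0, 0.0]
print(min_max_scale([10, 20, 30]))               # [0.0, 0.5, 1.0]
```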
3.4. Choosing the best network architecture
Press the “Design” button.
Press the “Search Architecture” button and wait about one minute. If the search does not stop, press the “Stop” button.
If several candidate network architectures are found, the one with the highest value of the fitness function is chosen.
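How Alyuda computes the fitness value is not stated in this case study. Assuming, purely for illustration, that fitness is the inverse of the test error, choosing an architecture reduces to taking the candidate with the highest fitness. The architecture labels (inputs-hidden-outputs) and the error values below are made up.

```python
def fitness(test_error):
    """Hypothetical fitness: inverse of the test error (smaller error, higher fitness)."""
    return float("inf") if test_error == 0 else 1.0 / test_error

def best_architecture(test_errors):
    """Pick the architecture with the highest fitness.

    `test_errors` maps an architecture label such as "16-5-1"
    (inputs-hidden-outputs) to its test error.
    """
    return max(test_errors, key=lambda arch: fitness(test_errors[arch]))

# Made-up errors for three candidate architectures
print(best_architecture({"16-3-1": 0.25, "16-5-1": 0.20, "16-8-1": 0.40}))  # 16-5-1
```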
3.5. Training the neural network
Press the “Train” button and wait about one minute. If training does not stop, press the “Stop” button. Alternatively, wait about two minutes until the message “Network training completed” appears.
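Alyuda's training algorithm is not described in this case study. To show why training may run until it is stopped, here is a minimal Python sketch of a different, much simpler model: a single linear neuron y = w*x + b trained by batch gradient descent, with early stopping when the validation error has not improved for a number of iterations. The learning rate, the patience value and the data are assumptions chosen for illustration.

```python
def train_linear_neuron(train, validation, lr=0.05, max_iter=10_000, patience=50):
    """Train y = w*x + b with batch gradient descent on the squared error.

    Training stops after `max_iter` iterations or when the validation error
    has not improved for `patience` consecutive iterations (early stopping).
    Returns the (w, b) pair with the lowest validation error seen.
    """
    def mse(data, w, b):
        return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

    w, b = 0.0, 0.0
    best_w, best_b, best_err = w, b, mse(validation, w, b)
    stale = 0
    n = len(train)
    for _ in range(max_iter):
        # Gradients of the mean squared training error with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in train) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in train) / n
        w -= lr * grad_w
        b -= lr * grad_b
        err = mse(validation, w, b)
        if err < best_err:
            best_w, best_b, best_err, stale = w, b, err, 0
        else:
            stale += 1
            if stale >= patience:
                break  # validation error stopped improving
    return best_w, best_b

# Noise-free data from y = 2x + 1
train = [(i / 10, 2 * (i / 10) + 1) for i in range(11)]
validation = [(0.25, 1.5), (0.75, 2.5)]
w, b = train_linear_neuron(train, validation)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```

On noise-free data the sketch recovers w and b close to 2 and 1; real sales data will not fit a model exactly, which is why the validation error is used to decide when to stop.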
3.6. Testing the network
Press the “Test” button.
Look at the “Actual vs Output” window in the “Visualisation” panel. If there are large differences between the actual and the output values, the rows concerned have to be excluded and the previous steps have to be repeated.
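The “Actual vs Output” comparison can also be done numerically. The sketch below uses a hypothetical rule, not an Alyuda feature: a row is flagged when its absolute error exceeds three times the mean absolute error. Flagged rows are candidates for exclusion before the previous steps are repeated.

```python
def flag_large_errors(actual, output, factor=3.0):
    """Return the indices of rows whose |actual - output| exceeds
    `factor` times the mean absolute error (a hypothetical outlier rule)."""
    errors = [abs(a - o) for a, o in zip(actual, output)]
    mae = sum(errors) / len(errors)
    return [i for i, e in enumerate(errors) if e > factor * mae]

# Made-up values: the last row is predicted far off
print(flag_large_errors([10] * 10, [11] * 9 + [100]))  # [9]
```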
3.7. The result of the neural network
3.7.1. Relative importance of different factors
Look at the “Training” panel: in the “Network statistics” panel, press the “Input importance” button.
The values in the table must not be interpreted as absolute values. They have to be interpreted as ranks.
For example, the most important factor affecting sales is “order method type”. The factors “year” and “product line” have a weaker influence on revenues.
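Reading importance values as ranks simply means sorting the factors by importance and keeping only their position. The importance values below are invented for illustration and are not Alyuda output.

```python
def importance_ranks(importance):
    """Convert input-importance values into ranks (1 = most important)."""
    ordered = sorted(importance, key=importance.get, reverse=True)
    return {name: rank for rank, name in enumerate(ordered, start=1)}

# Made-up importance values, for illustration only
print(importance_ranks({"Order method type": 55.0, "Product line": 25.0, "Year": 20.0}))
# {'Order method type': 1, 'Product line': 2, 'Year': 3}
```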
3.7.2. Query the network
Suppose we want the neural network to predict the revenue from camping equipment sold to online buyers in 2007.
Go to the “Query” panel.
Choose:
Year: 2007
Product line: Camping equipment
Order method type: web
Press the “Manual query” button.
The predicted value appears in the “Revenue” column of the “Results table”.
It is an expected revenue value, given with a probability expressed as a percentage.
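A manual query has to be presented to the network in the same encoded form as the training rows. Assuming one-of-N encoding of the categorical inputs, the query above could be encoded as sketched below. The category lists are assumptions about the dataset and should be checked against IBM_Sales_dataset.xls; their spelling (for example “Web” rather than “web”) must match the data exactly.

```python
# Hypothetical category lists; check them against the actual dataset.
CATEGORIES = {
    "Year": ["2004", "2005", "2006", "2007"],
    "Product line": ["Camping Equipment", "Golf Equipment", "Mountaineering Equipment",
                     "Outdoor Protection", "Personal Accessories"],
    "Order method type": ["E-mail", "Fax", "Mail", "Sales visit", "Special",
                          "Telephone", "Web"],
}

def encode_query(query, categories=CATEGORIES):
    """Turn a manual query into the one-of-N input vector a trained network expects."""
    vector = []
    for column, values in categories.items():
        value = query[column]
        if value not in values:
            raise ValueError(f"unknown category {value!r} for column {column!r}")
        vector.extend(1.0 if value == v else 0.0 for v in values)
    return vector

vec = encode_query({"Year": "2007", "Product line": "Camping Equipment",
                    "Order method type": "Web"})
print(len(vec), [i for i, x in enumerate(vec) if x == 1.0])  # 16 [3, 4, 15]
```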
4. Presenting the result in scientific papers
Having completed the exercise in Alyuda, try to present the results of your research as a coherent text.
Start by giving appropriate credit to the DIMBI project.
Continue with some information about the dataset. Cite the source if you use a dataset created by other people.
Continue with the research questions.
Find publications on the DIMBI project in Google Scholar, RePEc and DOAJ. Check whether other researchers have tried to answer these research questions and cite them.
Briefly describe the method applied (the neural network).
Mention that other methods (such as one-way ANOVA) may also be used.
Briefly describe Alyuda Neurointelligence, explain why it was chosen and give appropriate credit to the software developer.
Present the results from Alyuda Neurointelligence: the input importance of the different factors and the result of the manual query of the network.
Once your text is ready, congratulations!
Please send it to Julian Vasilev (e-mail: vasilev@ue-varna.bg) and Miglena Stoyanova (e-mail: m_stoyanova@ue-varna.bg).
5. Personal work
Task 1. Go back to the “Analysis” panel. Accept all input columns, find the input importance of each factor and present the result.
Task 2. Filter the dataset for the year 2007 and import the new data into Alyuda. Find the input importance of each factor and present the result.
Task 3. Filter the dataset for the year 2007 and the order method type “web”. Import the new data into Alyuda and choose “product” as an input column. Find the input importance of each factor and present the result.
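For Tasks 2 and 3 the filtering can be done in MS Excel. As an alternative, assuming the dataset has been exported to CSV and read with csv.DictReader (which yields one dict per row, with string values), the hypothetical helper filter_rows below keeps only the rows that match all given column values. The three sample rows are made up.

```python
def filter_rows(rows, conditions):
    """Keep the rows (dicts keyed by column name) that match every condition."""
    return [row for row in rows
            if all(row.get(column) == value for column, value in conditions.items())]

# Rows shaped as csv.DictReader would return them (string values); made-up data.
rows = [
    {"Year": "2007", "Order method type": "Web", "Revenue": "1200.0"},
    {"Year": "2007", "Order method type": "Fax", "Revenue": "300.0"},
    {"Year": "2006", "Order method type": "Web", "Revenue": "800.0"},
]
task2 = filter_rows(rows, {"Year": "2007"})
task3 = filter_rows(rows, {"Year": "2007", "Order method type": "Web"})
print(len(task2), len(task3))  # 2 1
```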
Write clear descriptions (solutions) of these tasks.
Please send them to Julian Vasilev (e-mail: vasilev@ue-varna.bg) and Miglena Stoyanova (e-mail: m_stoyanova@ue-varna.bg).
Literature
Alyuda Research, n.d. Alyuda Neurointelligence. Available at: http://www.alyuda.com/neural-networks-software.htm [Accessed January 20, 2016].
Hadzhiev, V. et al., 2009. Statistical and econometric software, Varna: Science and Economics.
Pallant, J., 2011. SPSS Survival Manual: A step by step guide to data analysis using SPSS, Allen and Unwin.
Vasilev, J. & Atanasova, T., 2015. Parallel Testing of Hypotheses with Statistical and Artificial Intelligence Methods: A Study on Measuring the Complacency from Education. Computer Science and Applications, 2(5), pp.206–211.