Subject: Price Prediction Software, Prediction Accuracy Evaluation, Deep Learning
Data Science areas: Feature Analysis, Estimators, Machine Learning
Architectures: Linear Regression, Boosting Regressors, Deep Neural Networks
Tools: Python, Tensorflow, Sklearn
Summary: A team of data scientists at MindCraft applied Machine Learning algorithms to create a universal E-commerce pricing prediction system (Real estate). Our model can apply to any product and be of great business value. The possible industries are E-commerce and Wholesale, Automotive, Real Estate, and even Artwork and Antiques.
Exploratory Data Analysis & Engineering
We started the project research by analyzing the business data and visualizing it. We noticed that some variables are closely correlated with one and other. Some pairs are correlated by nature, such as “Basement-Finished area” and “Basement Unfinished Area” while other pairs were correlated by deduction, such as “Overall condition” and “Year built.” , “The Surroundings”, “Number of School Nearby”, “Medical Infrastructure”, “Parks and Green Zones”, etc.
For the automotive sellers, the product pricing predictions are based on such factors as “Car Brand”, “Model”, “Packaging”, “Engine Type”, “Transmission System”, “Mileage”, “Production Year”, “Popularity”. The model also takes the market situation into account: pricing amplitude – minimum prices, maximum cost, average price, the number of owners, certifications, fixes, etc. The season can also affect the car pricing.
Flexible Architecture Selection
Linear regression can be understood and implemented using, for example, Excel. It is not often thought that the final results exceed 70-80% of relative accuracy. MindCraft’s team of data analysts and developers created a house pricing predictions engine using linear regression on MS SQL script. The performance was great, but the accuracy was not good enough
Here is why. The age of the vehicle can have a great impact on the price and not in a linear way. For example, as time goes by car becomes cheaper. Sometime later the price might go up again. The car will already be considered a precious rarity.
Such modeling requires more flexible architectures like Boosting Regressors or Neural Networks.
Building a Machine Learning Solution
We started our new PoC with importing and processing data to filter out the anomalies. Then, the categorical data was normalized and converted. The numerical representation can be better grasped by the machine learning components. This required to manually analyze dozens of features and categories and select processing limits and methods.
We achieved almost 95% of average relative accuracy for product pricing. However, a new challenge was looming on the horizon. We wondered how we can know for sure whether the prices prediction is good. A general approach would be to apply statistical methods like count training samples around the prediction and measure dispersion.
We decided to try something else. Instead, we trained another DNN with the same architecture on the test data set. We targeted the model at the accuracy itself instead of the prices value. This approach helped us reach 95% of “accuracy of the accuracy” on the test data. It was pretty enough to cut off the bad predictions.
Finally, we ended up with a model that can predict the car price and, most importantly, predict the accuracy of this particular result. Applied to any product price prediction system, our model can be of great business value, no matter the industry: from e-commerce and wholesale to automotive, real estate and even artwork and antiques.
We managed to create a business intelligence system that not only predicts the prices of products but also measures its own accuracy. This will help to keep the error rate to the minimum. The main advantage of this model is that it’s highly customizable and can be applied by sellers in any subdomain of the E-commerce business, be it Real Estate, Automotive, Valuables Market, Retail, or others.