Machine Learning Solution for Extracting Information
Often, before we start developing a Machine Learning Solution for Extracting Information, we need to test several approaches to see whether the task can be solved at all and, if so, what the optimal solution would be. We build so-called Proofs of Concept (PoCs), in which we try out various ideas with minimal effort to see if they work.
Classifying SpaCy’s ORG Named Entities with Machine Learning
Recently, we were tagging a lot of texts with spaCy in order to extract organization names. We faced a problem: many entities tagged by spaCy were not valid organization names at all. This was not really a problem with spaCy itself: at first sight, all extracted entities did look like organization names. The results could be better if we trained the spaCy models further. However, this approach would require a large corpus of properly labeled data, including proper context. So we needed a simpler solution to filter out the wrong data.
Automated Document Classifier Solution For Banking
Our Main Challenges in Document Classification
A retail bank approached MindCraft for help with document classification. Their organization has an input queue of documents, scanned or captured with a camera or cell phone. Before the information can be processed, OCRed, and stored, the documents need to be classified by type. The reason is that different types of text content can be processed in different ways. Some can be easily captured by fields and OCRed. Others, like handwriting, need manual tagging before they are stored. Types of documents can vary (as shown below):
- a regular printed letter
- a handwritten document containing a table
- a mixed-type document
AI Document Recognition Software for FinTech
MindCraft helped automate document capture and recognition for a client in the Banking industry using AI Document Recognition Software. The system can process documents from any domain containing any kind of content, from handwritten text to fields and tables.
AI Searching for Related Organizations Addresses with Web-Search, spaCy, and RegExes
Recently, one of our clients contacted us with an interesting problem. It was an investment company that needed to calculate potential risks for its objects of investment. They required a solution that would help generate a list of all addresses (or GPS coordinates) associated with a specific company name.
How Anomaly Detection Can Boost Your Business
Every business produces and stores considerable amounts of data, regardless of its domain. Whether it deals in Retail, Fintech, eCommerce, or Manufacturing, it handles hundreds of thousands of data records. Using ML and AI solutions, data analysts can enable such basic processes as automated data processing, prediction, and clustering. One widely used method is Anomaly Detection, which lets you look at your business from a completely different angle.
Machine Learning-Based Sales Forecasting Tool for Automotive
We developed a Python-based Machine Learning solution for a Sales Forecasting tool. Built for a client in the Automotive Industry, the product analyzes sales time series data, predicts buying behavior, and helps boost Business Intelligence in Retail.
Predictive Sales Analytics Tool for Special Offers Evaluation
This machine learning tool doesn’t replace a professional employee with knowledge of the industry. However, it takes about one-fifth of the time this employee would spend on all data lookups and comparisons. What used to be done manually with MS Excel and accounting software is now available in a simple and elegant tool. In fact, the predictive sales analytics method we used can be of great benefit to any B2B or B2C retail business. With a little customization, this data-driven technology can help organizations quickly make the right marketing decisions.
Computer Vision Selective Object Recognition
MindCraft AI Research Lab is on a roll!
This time our task was a well-known one: object recognition using Internet Protocol-based surveillance cameras. The demand for this technology is booming right now, especially for selective recognition, which identifies objects that possess specific predetermined characteristics. In our case, we were looking for a person wearing red clothes. We were trying to catch the exact time when this person enters the monitored area and when they leave.
Collecting POI from Cameras Using AI
Map services, like those from Google and Bing, usually show us what the layout of streets and buildings looks like from above. They also let us check additional information, like traffic data and points of interest (POI). The problem is that this list of POI is typically limited and does not include specific things like speed limits, bus stops, or parking signs.
We decided to create an AI engine that would be able to collect POI right from the street cameras and place them on the map using machine learning. In this article, we will stick with the speed limit signs.
Time Series Analysis and Sales Forecasting for Automotive
Time series sales forecasting is one of the most important topics in every business, as it helps make sense of data collected over a long period of time. Stock exchanges, logistics, and retail are classic industries where the ability to build predictive models becomes a crucial differentiator in a highly competitive business environment. In our article, we’ll try to explain how time series analysis and sales forecasting methods can be used for typical business tasks: finding hidden patterns, detecting trends in sales over the years, and predicting sales in the future.
Artificial Intelligence In E-commerce
It is important to point out that this article is only a summary of the basic capabilities of AI in e-commerce. We created it to give you a better understanding of what Data Science can do for your business, and how it is already applied by companies in your industry.
To find out how Data Science can solve the challenges of your particular company, we need to see the whole picture and work with your data. The best way to do it is during an individual consultation.
Machine Learning Automation & AI Model for the Farm Industry
Build an Object Detection Tool for Corn Kernel Recognition (PoC)
A partner of a farming company reached out to MindCraft with a request to develop a Machine Learning Automation model that could count the kernels on a corn cob from a 2D photo. Kernel counting is currently done manually, using a certain algorithm that allows workers to count the corn grains. An automatic solution for kernel counting would help our client automate the tedious manual work of separate departments.
Meet The MindCraft AI Research Lab!
We are happy to announce the opening of the MindCraft AI R&D Lab. It’s been six months since we launched our Research Lab project in test mode. The MindCraft Research Lab focuses on startups and scale-ups urgently looking for insights into their AI-based projects to be presented to investors in the near future. In our R&D Lab, the MindCraft team will help you create a vision of your product by rapidly building a Demo: a well-structured and comprehensive visualization of your product idea which you can present to your investors or customers as an initial prototype.
Active Learning on MNIST – Saving on Labeling
Active Learning is a semi-supervised technique that lets you label less data by selecting the samples that matter most from the standpoint of the learning process (loss). It can have a huge impact on the project cost when the amount of data is large and the labeling rate is high, for example in object detection and NLP-NER problems.
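As a rough illustration of the selection idea, here is a minimal sketch of uncertainty sampling, one common active learning strategy. It ranks unlabeled samples by the entropy of the model's predicted class probabilities, which is a stand-in for the loss-based criterion mentioned above and not necessarily the method used in the article. The function names and the probability values are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(predictions, k):
    """Return indices of the k samples the model is least certain about."""
    ranked = sorted(
        range(len(predictions)),
        key=lambda i: entropy(predictions[i]),
        reverse=True,
    )
    return ranked[:k]


# Made-up class probabilities for four unlabeled samples.
predictions = [
    [0.98, 0.01, 0.01],  # confident prediction, low labeling value
    [0.40, 0.35, 0.25],
    [0.70, 0.20, 0.10],
    [0.34, 0.33, 0.33],  # almost uniform, highest uncertainty
]
print(select_for_labeling(predictions, k=2))
```

Only the selected samples would be sent to annotators; the model is then retrained and the selection repeated.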
Differentiable Programming – Inverse Graphics AutoEncoder
Deep Learning classifier, LSTM, YOLO detector, Variational AutoEncoder, GAN: are these truly architectures in the sense of meta-programs, or just wise implementations of ideas on how to solve particular optimization problems? Are machine learning engineers actually developers of decision systems or just operators of GPU-enabled computers with a predefined parameterized optimization program?
Automated NLP Without Much Effort
While building models for NLP on our own and from scratch, we were constantly communicating with our clients and asking ourselves: “Is there a faster and simpler way to handle NLP tasks, one that requires no specific knowledge or experience in natural language processing (NLP)?”
Checking Document Outline with LLM
Introduction
Let’s solve the problem of an organization that receives multiple documents which should contain the same information but differ in format and style, and which are sometimes even missing parts of the expected content. Suppose multiple CVs or NDAs arrive and we need to check whether they fit a generic template and contain all the required information.
Initially, we tackle this task by converting the documents into text using OCR. Take an NDA as an example:
Subsequently, we remove empty spaces and add line numbers throughout the entire document.
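A minimal sketch of this preprocessing step, assuming that "empty spaces" means blank lines and redundant whitespace, and that a `number: text` prefix is an acceptable numbering format. The function name and the sample text are hypothetical.

```python
def number_lines(text: str) -> str:
    """Drop blank lines, collapse extra whitespace, and prefix each line with its 1-based number."""
    # Collapse runs of whitespace inside each line and trim both ends.
    lines = [" ".join(line.split()) for line in text.splitlines()]
    # Keep only lines that still contain text.
    lines = [line for line in lines if line]
    return "\n".join(f"{i}: {line}" for i, line in enumerate(lines, start=1))


sample = """NON-DISCLOSURE AGREEMENT

This Agreement is made between   Alpha LLC and Beta Inc.

1. Confidential Information
"""
print(number_lines(sample))
```

The numbers give the model stable anchors to refer to, so it can answer with line numbers instead of quoting text.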
This simplifies the task for the LLM, allowing it to segment the document effectively. Now we can use prompt engineering to ask the model to generate the line numbers where document sections start and to propose names for the sections:
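A minimal sketch of such a prompt, with illustrative wording only. `PROMPT_TEMPLATE` and `build_outline_prompt` are hypothetical names, and the requested answer format is an assumption. The resulting string would be sent to whichever LLM client is in use.

```python
# Hypothetical prompt wording; adjust to the model and document type at hand.
PROMPT_TEMPLATE = """You are given a document in which every line starts with its line number.
Split the document into logical sections.
For each section, return the number of the line where it starts and a short section name,
one section per line, in the form `<line number>: <section name>`.

Document:
{document}
"""


def build_outline_prompt(numbered_document: str) -> str:
    """Insert the line-numbered document text into the prompt template."""
    return PROMPT_TEMPLATE.format(document=numbered_document)


print(build_outline_prompt("1: NON-DISCLOSURE AGREEMENT\n2: This Agreement is made between ..."))
```

Asking for a fixed, line-oriented answer format keeps the response easy to parse afterwards.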
As a result, we want to receive a list of line numbers and document section names:
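A minimal sketch of turning such a response into structured data, assuming (hypothetically) that the model answers with one `<line number>: <section name>` pair per line. The function name and the sample response are made up for illustration.

```python
import re


def parse_outline(response: str) -> list[tuple[int, str]]:
    """Extract (start line, section name) pairs and ignore any other text."""
    outline = []
    for raw in response.splitlines():
        # Accept "12: Name", "12. Name" or "12 - Name".
        match = re.match(r"\s*(\d+)\s*[:.\-]\s*(.+?)\s*$", raw)
        if match:
            outline.append((int(match.group(1)), match.group(2)))
    return sorted(outline)


response = """1: Title
3: Parties Identifications
Here are the remaining sections:
12: Confidential Information
"""
print(parse_outline(response))
```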
Of course, for larger documents, we need to break the text into chunks and process it chunk by chunk. Given a large enough dataset of documents of the same type, we may receive multiple versions of the same section name. For example, “Parties Identifications” can appear as just “Parties”. To prepare a universal document template, we use text embeddings (for example, from OpenAI’s ada v2 model) to obtain a vector representation of each text section. After applying agglomerative clustering, we arrive at a clean structure of document sections, even if their names differ slightly:
In this picture, one can see multiple sections collected in clusters that should share the same name.
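A minimal sketch of the clustering step in plain Python, assuming the embeddings are already available as lists of floats. The three-dimensional vectors below are made-up stand-ins for real embedding vectors. Average linkage and the cosine-distance threshold of 0.3 are illustrative choices, not necessarily the setup used here.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def agglomerative_clusters(vectors, threshold):
    """Average-linkage agglomerative clustering.

    Starts with one cluster per vector and keeps merging the two closest
    clusters until the closest pair is farther apart than `threshold`.
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        pairs = [
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        ]
        dist, a, b = min(pairs)
        if dist > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


names = ["Parties", "Parties Identifications", "Term", "Term and Termination", "Governing Law"]
vectors = [  # toy embeddings, not real model output
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.1],
    [0.2, 0.8, 0.0],
    [0.0, 0.1, 0.9],
]
for cluster in agglomerative_clusters(vectors, threshold=0.3):
    print([names[i] for i in cluster])
```

In practice a library implementation such as scikit-learn's `AgglomerativeClustering` would typically replace the hand-written loop; the pure-Python version only shows the mechanics.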
Then we can analyze the statistics of the section names, find which name is used most often in each cluster, and normalize the synonyms to this canonical name:
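A minimal sketch of this normalization, assuming each cluster is given as a list of the section names observed across documents (with repetitions). The function name and the sample clusters are hypothetical.

```python
from collections import Counter


def canonical_names(clusters: list[list[str]]) -> dict[str, str]:
    """Map every observed section name to the most frequent name in its cluster."""
    mapping = {}
    for cluster in clusters:
        most_common_name, _count = Counter(cluster).most_common(1)[0]
        for name in set(cluster):
            mapping[name] = most_common_name
    return mapping


clusters = [
    ["Parties", "Parties", "Parties Identifications"],
    ["Term", "Term and Termination", "Term"],
]
synonyms = canonical_names(clusters)
print(synonyms["Parties Identifications"])
```

The set of canonical names, one per cluster, then serves as the list of required sections in the template.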
With the document structure expressed as a list of required sections, we can use the LLM to check whether each incoming document contains every required section.
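A minimal sketch of the final comparison only, assuming the section names of an incoming document have already been extracted and that a synonym map from observed names to canonical names is available. The LLM call itself is not shown, and all names and sample values are hypothetical.

```python
def missing_sections(required, found, synonyms):
    """Return required sections that do not appear among the found section names."""
    # Normalize found names to their canonical form before comparing.
    normalized = {synonyms.get(name, name) for name in found}
    return [name for name in required if name not in normalized]


required = ["Parties", "Term", "Governing Law"]
found = ["Parties Identifications", "Term"]
synonym_map = {"Parties Identifications": "Parties"}
print(missing_sections(required, found, synonym_map))
```

An empty result means the document matches the template; otherwise the returned names can be reported back to the sender.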
Summary
Consequently, we automate the verification of standard documents such as declarations and CVs against a predefined template that is created automatically using LLM prompt engineering, embeddings, and unsupervised machine learning.
Named Entity Recognition (NER) – briefly about the current state.
At the moment, all available NER approaches fall into two big categories: those useful for applied NLP problems and those oriented primarily toward scientific development. Much work is in progress to close the gap, but it is still wide, especially after the so-called BERT explosion. An excellent example of a library for applied NLP is spaCy, covered in depth later.
AI & Machine Learning – an Engine That Can Heighten Demand for Your Services
Partners often reach out to MindCraft with a request to help create an AI & Machine Learning engine that can heighten demand for their services. This task sounds very appealing to us. After carefully researching our clients’ services and selecting effective solutions, we built up an important pool of knowledge about which means can be really effective in achieving this goal and which cannot.
AI & Machine Learning solution for Business: Getting Started
If you combine deep knowledge of your business needs with a thorough understanding of what Machine Learning can and cannot do, you’ll be able to automate many of your routine processes and redirect your efforts and resources into more important, strategic areas.
Engagement Model for Machine Learning Projects
The engagement model determines the basis for cooperation between a client and a vendor. For example, in software development, it defines the most mission-critical aspects: how the project will evolve (in a linear or flexible way), how the team will be assembled (hand-picked by the client to work exclusively on the project or assigned by the vendor), how the payments will be made (in a fixed bid, monthly, weekly, etc.). From our experience, a wisely selected engagement model can increase the chances of project success by 50%. If you choose the wrong one, it can create obstacles that the project will never be able to overcome and recover from.
Read more ❯Meet The MindCraft AI Research Lab!
We are happy to announce the opening of the MindCraft AI R&D Lab. It’s been six months since we launched our Research Lab project in a test mode. The Mindcraft Research Lab focuses on startups and scale-ups urgently looking for insights into their AI-based projects to be presented to investors in the nearest future. In our R&D Lab, the Mindcraft team will help you create a vision of your product, by means of rapidly building a Demo, a well-structured and comprehensive visualization of your product idea which you can present to your investors or customers as an initial prototype.
Read more ❯Сomputer Vision Selective Object Recognition
MindCraft АI Research Lab is on a roll!
MindCraft АI Research Lab This time our task was quite well-known – object recognition using Internet Protocol-based surveillance cameras. The demand for this technology is booming right now. Especially for the specific, selective recognition, identifying objects which possess specific predetermined character eristics. In our case, we were looking for a person wearing red clothes. We were trying to catch the exact time when this person enters the monitored area, and when they leave.
Machine Learning Automation & AI Model for the Farm Industry
Build an Object Detection Tool for Corn Kernel Recognition (PoC)
A Partner of a farming company reached out to MindCraft with a request to develop a Machine Learning Automation model that could count the corn kernels on corn using a 2d photo. Kernels calculation is currently done manually, using a certain algorithm allowing workers to count the corn grains. An automatic solution for corn calculation would help our client automate the tedious manual work of separate departments.
Active Learning on MNIST – Saving on Labeling
Active Learning is a semi-supervised technique that allows labeling less data by selecting the most important samples from the learning process (loss) standpoint. It can have a huge impact on the project cost in the case when the amount of data is large and the labeling rate is high. For example, object detection and NLP-NER problems.
Read more ❯Automated NLP Without Much Effort
While building models for NLP on our own and from scratch, we were constantly communicating with our clients and asking ourselves “Is there any faster & simpler way for NLP tasks, where no specific knowledge or experience in (NLP) natural language processing is required?
Read more ❯Named Entity Recognition (NER) – briefly about the current state.
At the moment all available NER approaches fall into two big categories: useful for applied NLP problems and oriented for primarily scientific development. Much work is in progress to close the gap but it is still wide especially after so-called BERT explosion. An excellent example of a library for applied NLP is spaCy covered in depth later.
Read more ❯AI & Machine Learning – engine who can hight your Demand for your Services
Partners often reach out to MindCraft with a request to help create AI & Machine Learning – engine who can hight their Demand for your Services. And this task sounds very appealing to us. After conducting careful research into our clients’ services and the selection of effective solutions we were able to generate an important knowledge pool of means that can be really effective in achieving this goal and those that cannot.
Read more ❯Checking Document Outline with LLM
Introduction
Let’s solve the problem of an organization receiving multiple documents that should have the same information, but different formats, and styles and even sometimes the full scope of the document is missing. Let’s consider that multiple CVs or NDAs are arriving and we need to check if they fit in some generic template and contain all required information.
Initially, we tackle these tasks by converting documents into text using OCR.). Let it be an NDA example:
Subsequently, we remove empty spaces and add line numbers throughout the entire document.
This simplifies the task for LLM, allowing it to segment the document effectively.. Now we can use prompt engineering to ask the model generate line numbers where document sections starts and propose names for the sections:
As a result, we want to receive a list of line numbers and document section names:
Of course, for larger documents, we need to break the text into chunks and process it chunk by chunk. Having enough big dataset of the same types of documents we can receive multiple versions of the same document section name. For example “Parties Identifications” can be just “Parties”. To prepare a universal document template we use text embeddings (for example with OpenAI adav2 model) to collect vector representation for each text section. After applying agglomerative clustering we will come up with a nice structure of document sections, even if they can have slightly different names:
In this picture, one can see multiple sections collected in clusters that should share the same name.
Then we can analyze the section names statistics and find what name is most often used and normalize synonyms with this correct name:
Having such a structure of a document as a list of required sections, we will use it to check with LLM each incoming document if it contains the section name.
Summary
Consequently, we automate the verification of standard documents such as declarations and CVs against a predefined template created automatically using LLM prompt engineering, embeddings, and unsupervised machine learning
Machine Learning Solution for Extracting Information
Oftentimes before we get to develop the Machine Learning Solution for Extracting Information, we need to test several approaches to see if it really is possible to solve the task and if yes, what the optimal solution would be. We develop the so-called PoC, Proofs of Concept, where we test various things applying just the minimum amount of effort to see if they work.
Read more ❯AI Searching for Related Organizations Addresses with Web-Search, spaCy, and RegExes
Recently, one of our clients has contacted us with an interesting problem. It was an investment company, which needed to calculate potential risks for their objects of investment. They required a solution that would help generate a list of all addresses (or GPS coordinates) that were associated with a specific company name.
Read more ❯Artificial Intelligence In E-commerce
It is important to point out that this article is only a summary of the basic capabilities of AI in e-commerce. We created it to give you a better understanding of what Data Science can do for your business, and how it is already applied by companies in your industry.
To find out how Data Science can solve the challenges of your particular company, we need to see the whole picture and work with your data. The best way to do it is during an individual consultation.
Read more ❯Engagement Model for Machine Learning Projects
The engagement model determines the basis for cooperation between a client and a vendor. For example, in software development, it defines the most mission-critical aspects: how the project will evolve (in a linear or flexible way), how the team will be assembled (hand-picked by the client to work exclusively on the project or assigned by the vendor), how the payments will be made (in a fixed bid, monthly, weekly, etc.). From our experience, a wisely selected engagement model can increase the chances of project success by 50%. If you choose the wrong one, it can create obstacles that the project will never be able to overcome and recover from.
Meet The MindCraft AI Research Lab!
We are happy to announce the opening of the MindCraft AI R&D Lab. It’s been six months since we launched our Research Lab project in a test mode. The MindCraft Research Lab focuses on startups and scale-ups urgently looking for insights into their AI-based projects to present to investors in the near future. In our R&D Lab, the MindCraft team will help you create a vision of your product by rapidly building a Demo: a well-structured and comprehensive visualization of your product idea that you can present to your investors or customers as an initial prototype.
Active Learning on MNIST – Saving on Labeling
Active Learning is a semi-supervised technique that allows labeling less data by selecting the samples that are most important from the learning process (loss) standpoint. It can have a huge impact on the project cost when the amount of data is large and the labeling rate is high, as in object detection and NLP NER problems.
Differentiable Programming – Inverse Graphics AutoEncoder
Deep learning classifier, LSTM, YOLO detector, Variational AutoEncoder, GAN – are these guys truly architectures in the sense of meta-programs, or just clever implementations of ideas on how to solve particular optimization problems? Are machine learning engineers actually developers of decision systems or just operators of GPU-enabled computers with a predefined parameterized optimization program?
Automated NLP Without Much Effort
While building models for NLP on our own and from scratch, we were constantly communicating with our clients and asking ourselves: “Is there any faster and simpler way to solve NLP tasks, where no specific knowledge or experience in natural language processing (NLP) is required?”
Named Entity Recognition (NER) – briefly about the current state.
At the moment, all available NER approaches fall into two big categories: those useful for applied NLP problems and those oriented primarily toward scientific development. Much work is in progress to close the gap, but it is still wide, especially after the so-called BERT explosion. An excellent example of a library for applied NLP is spaCy, covered in depth later.
Checking Document Outline with LLM
Introduction
Consider an organization that receives many documents that should contain the same information but arrive in different formats and styles, sometimes with parts of the expected content missing. For example, suppose multiple CVs or NDAs arrive, and we need to check whether they fit a generic template and contain all the required information.
First, we convert the documents into text using OCR. Take an NDA as an example:
Subsequently, we remove empty spaces and add line numbers throughout the entire document.
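The preprocessing step can be sketched in a few lines of Python. This is a minimal illustration, not the original implementation: the function name `number_lines`, the `N: text` numbering format, and the sample text are all assumptions, and "empty spaces" is interpreted here as blank lines plus leading and trailing whitespace.

```python
def number_lines(text: str) -> str:
    """Drop blank lines, trim whitespace, and prefix each line with its 1-based number."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return "\n".join(f"{i}: {line}" for i, line in enumerate(lines, start=1))


# Hypothetical OCR output used only for illustration.
sample = """NON-DISCLOSURE AGREEMENT

This Agreement is made between the parties below.

   1. Confidential Information
"""

numbered = number_lines(sample)
print(numbered)
```

The numbered text is what would be passed to the model, so that it can refer to section boundaries by line number instead of quoting the text back.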
This simplifies the task for the LLM, allowing it to segment the document effectively. Now we can use prompt engineering to ask the model to generate the line numbers where document sections start and to propose names for the sections:
As a result, we want to receive a list of line numbers and document section names:
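As a rough sketch of this step, the snippet below builds such a prompt and parses the model's reply into pairs of line number and section name. The prompt wording, the `<line number>: <section name>` reply format, and the helper names `build_prompt` and `parse_outline` are assumptions made for illustration; the actual call to the LLM is deliberately left out, and the reply shown is a made-up example.

```python
import re

# Hypothetical prompt wording; the original prompt is not shown in the article.
PROMPT_TEMPLATE = (
    "Below is a document with numbered lines.\n"
    "List the line number where each section starts and propose a short name for it.\n"
    "Answer with one section per line in the form '<line number>: <section name>'.\n\n"
    "{document}"
)


def build_prompt(numbered_document: str) -> str:
    """Insert the line-numbered document into the prompt template."""
    return PROMPT_TEMPLATE.format(document=numbered_document)


def parse_outline(reply: str) -> list[tuple[int, str]]:
    """Extract (line number, section name) pairs, skipping lines that do not match."""
    outline = []
    for line in reply.splitlines():
        match = re.match(r"\s*(\d+)\s*[:.\-]\s*(.+?)\s*$", line)
        if match:
            outline.append((int(match.group(1)), match.group(2)))
    return outline


# Made-up model reply, including one line that should be ignored.
reply = "1: Title\n3: Parties Identification\n12: Confidential Information\nNote: done"
print(parse_outline(reply))
```

Parsing defensively is useful here because a model may add commentary around the requested list.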
Of course, for larger documents, we need to break the text into chunks and process it chunk by chunk. With a large enough dataset of documents of the same type, we will receive multiple versions of the same section name. For example, “Parties Identifications” can be just “Parties”. To prepare a universal document template, we use text embeddings (for example, with OpenAI's ada v2 model, text-embedding-ada-002) to collect a vector representation for each text section. After applying agglomerative clustering, we arrive at a clean structure of document sections, even if their names differ slightly:
In this picture, one can see multiple sections collected in clusters that should share the same name.
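To make the clustering idea concrete, here is a small pure-Python sketch of agglomerative clustering. The article does not state the linkage, distance, or stopping rule, so average linkage, cosine distance, and a distance threshold are assumptions; the three-dimensional vectors are toy stand-ins for real embeddings, and the section names are illustrative.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def agglomerative_clusters(vectors, threshold):
    """Average-linkage clustering: repeatedly merge the closest pair of clusters
    until no pair is closer than `threshold`. Returns lists of item indices."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        dists = [cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        dist, a, b = min(
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if dist > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a stays valid
    return clusters


names = ["Parties", "Parties Identification", "Term", "Duration of Agreement"]
# Toy vectors standing in for real embeddings (assumption for illustration).
embeddings = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.2],
    [0.0, 0.8, 0.3],
]

clusters = agglomerative_clusters(embeddings, threshold=0.2)
for cluster in clusters:
    print([names[i] for i in cluster])
```

With real embeddings, a library implementation would normally be preferable; the hand-written loop is used here only to keep the example dependency-free.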
Then we can analyze the section name statistics, find the name used most often in each cluster, and normalize its synonyms to that name:
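A minimal sketch of this normalization, assuming each cluster is available as a list of the section names observed across documents (the function name `canonical_names` and the sample data are illustrative):

```python
from collections import Counter


def canonical_names(clusters):
    """Map every observed section name to the most frequent name in its cluster."""
    mapping = {}
    for observed in clusters:
        most_common_name, _ = Counter(observed).most_common(1)[0]
        for name in set(observed):
            mapping[name] = most_common_name
    return mapping


# Made-up clusters of section names collected from several documents.
clusters = [
    ["Parties", "Parties", "Parties Identification", "Parties"],
    ["Term", "Duration of Agreement", "Term"],
]

mapping = canonical_names(clusters)
print(mapping["Parties Identification"])  # Parties
print(mapping["Duration of Agreement"])   # Term
```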
With the document structure expressed as a list of required sections, we use the LLM to check whether each incoming document contains every required section.
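The final check can be approximated without a second LLM call once the outline of an incoming document has been extracted. The sketch below is a simplification and not the LLM-based check described above: it compares normalized section names against the template by plain set membership. The function name `missing_sections` and all sample values are assumptions.

```python
def missing_sections(required, found, synonyms=None):
    """Return the required section names absent from `found` after synonym normalization."""
    synonyms = synonyms or {}
    normalized = {synonyms.get(name, name) for name in found}
    return [name for name in required if name not in normalized]


# Illustrative template, extracted outline, and synonym mapping.
required = ["Parties", "Confidential Information", "Term"]
found = ["Parties Identification", "Confidential Information"]
synonyms = {"Parties Identification": "Parties"}

print(missing_sections(required, found, synonyms))  # ['Term']
```

An empty result means the document covers the whole template; a non-empty one lists the sections to flag for review.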
Summary
Consequently, we automate the verification of standard documents such as declarations and CVs against a predefined template created automatically using LLM prompt engineering, embeddings, and unsupervised machine learning.
How Anomaly Detection Can Boost Your Business
Every business produces and stores considerable amounts of data, regardless of its domain. Whether it deals in Retail, Fintech, eCommerce, or Manufacturing, it operates hundreds of thousands of units of data. Using ML and AI solutions, data analysts can enable such basic processes as automated data processing, predictions, clustering, and others. One of the widely used methods is called Anomaly Detection and it allows you to look at your business from a completely different angle.