Text classification to generate query params

Vukasin_Mladenovic · December 15, 2023, 2:31am

I have a task to build an ML model that decides what the user wants to see on the board, or, more precisely, generates parameters for my database query. I will create my dataset with the correct labels (supervised ML). I would appreciate any advice.

How this should work:
https://excalidraw.com/#json=dkAfIIA_xhqyYAhuepDWl,6pxhRcnSuV6-aF-WCH5mKw

I have an Issue model in a database with several properties:

Status: [Backlog, Unstarted, Started, Completed, Canceled]
Priority: [NoPriority, Low, Medium, High, Urgent]
Assignee: to which user an issue is assigned
Author/creator: Who created an issue
Labels: labels are assigned to an issue and describe it, for example “Bug”, “Feature”, “Improvement”, “Design”, “DevOps”, etc. (Users can create more labels; if this is too complex, I can train a model to recognize just some basic labels.)
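It helps to pin down exactly what the model will be filtering on. Here is a minimal Python sketch of the Issue model described above. The class, enum and field names are assumptions for illustration, not an existing schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Status(Enum):
    BACKLOG = "Backlog"
    UNSTARTED = "Unstarted"
    STARTED = "Started"
    COMPLETED = "Completed"
    CANCELED = "Canceled"


class Priority(Enum):
    NO_PRIORITY = "NoPriority"
    LOW = "Low"
    MEDIUM = "Medium"
    HIGH = "High"
    URGENT = "Urgent"


@dataclass
class Issue:
    status: Status
    priority: Priority
    assignee: Optional[str]  # user the issue is assigned to, if any
    author: str              # user who created the issue
    labels: List[str] = field(default_factory=list)  # e.g. ["Bug", "Design"]


issue = Issue(Status.COMPLETED, Priority.HIGH, "Joe", "Mike", ["Bug"])
print(issue.status.value, issue.priority.value, issue.labels)
```

Status and Priority have a fixed set of values, so a classifier can treat them as closed label sets. Assignee, author and (user-created) labels are open-ended, which is why they need a different treatment.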

Now I want to provide AI search to users:
They type a text query and I generate the query params.
They can ask to include or exclude something.

Examples:

“Find issues where I am assigned” (issues where assignee === currentUser)
“Find issues where Mike is not assigned” (issues where assignee !== Mike)
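The include/exclude examples map directly onto equality and inequality filters. Here is a minimal sketch, assuming issues come back as plain dicts and the current user's name is known. `Condition`, `matches` and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class Condition:
    field: str              # e.g. "assignee", "priority", "status", "labels"
    value: str              # e.g. "Mike", "High"
    exclude: bool = False   # True means "is not" / "does not have"


def matches(issue: Dict[str, Any], cond: Condition) -> bool:
    actual = issue.get(cond.field)
    # Labels are a list, so "has label" means membership; other fields compare directly.
    if isinstance(actual, list):
        hit = cond.value in actual
    else:
        hit = actual == cond.value
    return not hit if cond.exclude else hit


current_user = "Ana"  # hypothetical logged-in user
issues = [
    {"assignee": "Ana", "priority": "High"},
    {"assignee": "Mike", "priority": "Low"},
]

# "Find issues where I am assigned" -> assignee === currentUser
mine = [i for i in issues if matches(i, Condition("assignee", current_user))]
# "Find issues where Mike is not assigned" -> assignee !== Mike
not_mike = [i for i in issues if matches(i, Condition("assignee", "Mike", exclude=True))]
print(mine)
print(not_mike)
```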

Users can also combine categories:
Examples:

“Finished bugs where Joe is not assigned” => label: “Bug”, assignee: “Joe” (important: the user wants to exclude Joe), status: “Completed”
“Find high-priority issues where I am not assigned” => Priority: “High”, Assignee: “CurrentUser/me” (exclude)

Any of these statements can be combined with each other.
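Since any statement can be combined with any other, one way to model the output is a flat list of conditions, all AND-ed together and split into include and exclude buckets. The `Condition` type, the `to_query_params` helper and the output shape below are hypothetical; you would translate the result into whatever your database's query builder expects.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Condition(NamedTuple):
    field: str             # "status", "priority", "label", "assignee", ...
    value: str             # "Completed", "High", "Bug", "Joe", "currentUser", ...
    exclude: bool = False  # True for "not ..." conditions


def to_query_params(conditions: List[Condition]) -> Dict[str, Dict[str, List[str]]]:
    """Group conditions into include/exclude buckets keyed by field."""
    buckets: Dict[str, Dict[str, List[str]]] = {"include": defaultdict(list), "exclude": defaultdict(list)}
    for c in conditions:
        buckets["exclude" if c.exclude else "include"][c.field].append(c.value)
    # Plain dicts are easier to serialize and compare than defaultdicts.
    return {mode: dict(fields) for mode, fields in buckets.items()}


# "Finished bugs where Joe is not assigned"
params = to_query_params([
    Condition("label", "Bug"),
    Condition("assignee", "Joe", exclude=True),
    Condition("status", "Completed"),
])
print(params)
# {'include': {'label': ['Bug'], 'status': ['Completed']}, 'exclude': {'assignee': ['Joe']}}
```

How to combine several included values for the same field (e.g. "high or urgent issues") is a design decision: OR-ing values within a field and AND-ing across fields is a common choice, but that is an assumption here, not something the examples above specify.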

Dataset

1. How should I structure the dataset? I roughly know how to structure a dataset that only extracts which categories a user mentions, but I am not sure how to capture the context of each mention (for example, whether the user wants to include or exclude “high priority issues”). Should I create labels like [include_high, exclude_high, include_low, exclude_low, etc.] and mark them with zeros and ones?
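The label scheme proposed in question 1 works for fields with a fixed set of values: each (field, value) pair gets an `include_` and an `exclude_` column, and each example becomes a 0/1 vector, i.e. a multi-label classification target. A minimal sketch follows; `FIELDS`, `LABELS` and `encode` are hypothetical names. Assignee and author names are open-ended, so they cannot be columns and need a separate extraction step.

```python
from typing import Dict, List, Tuple

# Closed-vocabulary fields only; assignee/author names are open-ended and need extraction instead.
FIELDS: Dict[str, List[str]] = {
    "status": ["Backlog", "Unstarted", "Started", "Completed", "Canceled"],
    "priority": ["NoPriority", "Low", "Medium", "High", "Urgent"],
    "label": ["Bug", "Feature", "Improvement", "Design", "DevOps"],
}

# One include_ and one exclude_ column per (field, value) pair.
LABELS: List[str] = [
    f"{mode}_{field}_{value}"
    for field, values in FIELDS.items()
    for value in values
    for mode in ("include", "exclude")
]


def encode(annotation: List[Tuple[str, str, bool]]) -> List[int]:
    """Turn (field, value, exclude) triples into a 0/1 vector aligned with LABELS."""
    active = {f"{'exclude' if excl else 'include'}_{field}_{value}" for field, value, excl in annotation}
    unknown = active - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in active else 0 for name in LABELS]


# "Finished bugs where Joe is not assigned": the Joe part is not in this vector,
# it has to be handled by name extraction.
row = {
    "text": "Finished bugs where Joe is not assigned",
    "labels": encode([("status", "Completed", False), ("label", "Bug", False)]),
}
print(len(LABELS), sum(row["labels"]))
```

The label count grows as 2 × (number of values), and nothing stops a model from predicting both `include_X` and `exclude_X`, so you would need a rule to resolve that. An alternative is token-level tagging (slot filling): mark which words express a value and whether they are negated, which also covers open-ended names in the same model.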

2. Should I include as many examples of combined statements as possible, given that users can combine many statements?
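One way to get many combined statements without writing each by hand is to generate them from templates and mix them with real, hand-written queries. The sketch below samples each slot independently, so the labels come for free. The phrase banks, names and the `make_example` helper are hypothetical and far too small for real training; only the assignee slot produces exclusions here, and the same idea extends to the other fields with their own negated phrasings.

```python
import random
from typing import List, Tuple

Annotation = List[Tuple[str, str, bool]]  # (field, value, exclude)

# Hypothetical phrase banks; real data needs many paraphrases per value.
STATUS = {"Completed": "finished", "Started": "in-progress", "Backlog": "backlog"}
PRIORITY = {"High": "high-priority", "Low": "low-priority", "Urgent": "urgent"}
LABEL = {"Bug": "bugs", "Feature": "feature requests", "Design": "design issues"}
NAMES = ["Joe", "Mike", "me"]


def make_example(rng: random.Random) -> Tuple[str, Annotation]:
    """Randomly combine 0..4 conditions into one query plus its annotation."""
    words: List[str] = ["Find"]
    ann: Annotation = []
    for field, bank in (("status", STATUS), ("priority", PRIORITY)):
        if rng.random() < 0.5:
            value = rng.choice(list(bank))
            words.append(bank[value])
            ann.append((field, value, False))
    if rng.random() < 0.5:
        value = rng.choice(list(LABEL))
        words.append(LABEL[value])
        ann.append(("label", value, False))
    else:
        words.append("issues")
    if rng.random() < 0.5:
        name = rng.choice(NAMES)
        exclude = rng.random() < 0.5
        subject, verb = ("I", "am") if name == "me" else (name, "is")
        words.append(f"where {subject} {verb} {'not ' if exclude else ''}assigned")
        ann.append(("assignee", "currentUser" if name == "me" else name, exclude))
    return " ".join(words), ann


rng = random.Random(0)  # fixed seed so the generated set is reproducible
dataset = [make_example(rng) for _ in range(1000)]
for text, ann in dataset[:3]:
    print(text, "=>", ann)
```

Templated text is repetitive, so a model trained only on it may overfit the templates; holding out some hand-written combined queries for evaluation is a cheap way to check that.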

Model

How many layers should I create for this model?
Which prebuilt models can I use for this?
Which algorithms should I use?
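There is no single right number of layers. A common path is to start with a linear multi-label baseline and only move to fine-tuning a pretrained transformer (for example Hugging Face `transformers`, whose `AutoModelForSequenceClassification` can be configured with `problem_type="multi_label_classification"`) if the baseline is not good enough. With scikit-learn, `OneVsRestClassifier` around `LogisticRegression` is a typical baseline. To show the idea without dependencies, here is a pure-Python sketch of the same one-vs-rest approach: one binary perceptron per category over word and bigram features. The class, the feature function and the toy data are hypothetical.

```python
from typing import Dict, List, Set


def features(text: str) -> Set[str]:
    # Unigrams plus bigrams, so "not assigned" becomes a feature of its own.
    tokens = text.lower().replace("-", " ").split()
    return set(tokens) | {f"{a}_{b}" for a, b in zip(tokens, tokens[1:])}


class OneVsRestPerceptron:
    """One independent binary perceptron per label (multi-label classification)."""

    def __init__(self, labels: List[str]) -> None:
        self.labels = labels
        self.w: Dict[str, Dict[str, float]] = {lab: {} for lab in labels}
        self.b: Dict[str, float] = {lab: 0.0 for lab in labels}

    def score(self, label: str, feats: Set[str]) -> float:
        w = self.w[label]
        return self.b[label] + sum(w.get(f, 0.0) for f in feats)

    def fit(self, texts: List[str], targets: List[Set[str]], epochs: int = 10) -> "OneVsRestPerceptron":
        data = [(features(t), gold) for t, gold in zip(texts, targets)]
        for _ in range(epochs):
            for feats, gold in data:
                for lab in self.labels:
                    y = 1.0 if lab in gold else -1.0
                    if y * self.score(lab, feats) <= 0:  # wrong or on the boundary: update
                        w = self.w[lab]
                        for f in feats:
                            w[f] = w.get(f, 0.0) + y
                        self.b[lab] += y
        return self

    def predict(self, text: str) -> Set[str]:
        feats = features(text)
        return {lab for lab in self.labels if self.score(lab, feats) > 0}


# Toy training set: which categories does each query mention?
texts = [
    "find finished bugs",
    "find high priority issues",
    "find bugs where joe is not assigned",
    "find high priority bugs",
    "find issues where i am not assigned",
    "find completed issues",
]
targets = [
    {"status", "label"},
    {"priority"},
    {"label", "assignee"},
    {"priority", "label"},
    {"assignee"},
    {"status"},
]
model = OneVsRestPerceptron(["status", "priority", "label", "assignee"]).fit(texts, targets)
print(model.predict("find urgent bugs where mike is not assigned"))
```

Six examples are nowhere near enough to generalize, so predictions on new queries will be unreliable; the point is the shape of the problem (independent yes/no decisions per label), which stays the same when you swap in a bigger model.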

My thoughts on this are:

Create a category classifier that first finds which categories are mentioned and marks each with a zero or one.
Extract the sub-category (if the category is Priority => Low, Medium, High…); if the category is Assignee, use NER to extract names.
Use biases to decide between exclude and include.
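The three steps above can be prototyped without any ML, which gives a baseline to beat and a way to pre-label training data. The sketch below is a rule-based stand-in, not the trained classifier/NER pipeline: keyword lexicons replace steps 1 and 2, a crude "negation just before the phrase" check replaces step 3, and matching against users that exist in the database replaces NER (an extracted name has to map to a real user anyway). All lexicons, names and the `parse` function are hypothetical.

```python
import re
from typing import List, Tuple

# Hypothetical lexicons standing in for trained models (steps 1 and 2).
VALUE_WORDS = {
    ("status", "Completed"): ["finished", "completed", "done"],
    ("status", "Started"): ["started", "in progress"],
    ("priority", "High"): ["high-priority", "high priority"],
    ("priority", "Urgent"): ["urgent"],
    ("label", "Bug"): ["bug", "bugs"],
    ("label", "Feature"): ["feature", "features"],
}
KNOWN_USERS = ["Joe", "Mike"]  # stand-in for NER: names that exist in the database
NEGATIONS = ("not ", "n't ", "without ")


def parse(query: str) -> List[Tuple[str, str, bool]]:
    """Return (field, value, exclude) triples for one query."""
    q = query.lower()
    found: List[Tuple[str, str, bool]] = []
    for (field, value), phrases in VALUE_WORDS.items():
        for phrase in phrases:
            m = re.search(rf"\b{re.escape(phrase)}\b", q)
            if m:
                # Step 3 (crude): a negation within 8 characters before the phrase means "exclude".
                window = q[max(0, m.start() - 8):m.start()]
                found.append((field, value, any(neg in window for neg in NEGATIONS)))
                break  # one match per value is enough
    # Assignee pattern: "<name> is [not] assigned" or "I am [not] assigned".
    for name in KNOWN_USERS + ["I"]:
        verb = "am" if name == "I" else "is"
        m = re.search(rf"\b{re.escape(name.lower())}\s+{verb}\s+(not\s+)?assigned\b", q)
        if m:
            value = "currentUser" if name == "I" else name
            found.append(("assignee", value, m.group(1) is not None))
    return found


print(parse("Finished bugs where Joe is not assigned"))
# [('status', 'Completed', False), ('label', 'Bug', False), ('assignee', 'Joe', True)]
print(parse("Find high-priority issues where I am not assigned"))
# [('priority', 'High', False), ('assignee', 'currentUser', True)]
```

Rules like these break quickly on paraphrases ("anything Joe isn't working on"), which is where a trained model earns its keep; but they are useful for bootstrapping labels that you then correct by hand.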

PS: I am a newbie, so sorry if this question sounds naive.