I have a task to build an ML model that decides what the user wants to see on a board, or more precisely, one that generates the parameters for my database query. I will create my own dataset with correct labels (supervised ML). I would appreciate any advice.
How this should work:
https://excalidraw.com/#json=dkAfIIA_xhqyYAhuepDWl,6pxhRcnSuV6-aF-WCH5mKw
I have an Issue model in a database with several properties:
- Status: [Backlog, Unstarted, Started, Completed, Canceled]
- Priority: [NoPriority, Low, Medium, High, Urgent]
- Assignee: to which user an issue is assigned
- Author/creator: Who created an issue
- Labels: labels are assigned to an issue and describe it, for example “Bug”, “Feature”, “Improvement”, “Design”, “DevOps”, etc. (Users can create more labels; if this is too complex, I can train the model to recognize just some basic labels.)
Now I want to provide AI search to users:
They give text input and I generate query params.
They can ask to include or exclude something.
Examples:
- “Find issues where I am assigned” (issues where assignee === currentUser)
- “Find issues where Mike is not assigned” (issues where assignee !== Mike)
Users can also combine categories:
Examples:
- “Finished bugs where Joe is not assigned” => label: “Bug”, assignee: “Joe” (important to know the user wants to exclude Joe), status: “Completed”
- “Find high-priority issues where I am not assigned” => priority: “High”, assignee: “CurrentUser/me” (exclude)
Any of these conditions can be combined in a single request.
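To make the target concrete, here is a rough sketch of the output I imagine the model producing: one include/exclude pair per Issue property. The names `Filter` and `IssueQuery` are hypothetical and only illustrate the idea; they are not part of my actual code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Filter:
    """Values to match (include) and values to rule out (exclude)."""
    include: List[str] = field(default_factory=list)
    exclude: List[str] = field(default_factory=list)


@dataclass
class IssueQuery:
    # One Filter per Issue property; class and field names are assumptions.
    status: Filter = field(default_factory=Filter)
    priority: Filter = field(default_factory=Filter)
    assignee: Filter = field(default_factory=Filter)
    author: Filter = field(default_factory=Filter)
    labels: Filter = field(default_factory=Filter)


# "Finished bugs where Joe is not assigned"
query = IssueQuery()
query.status.include.append("Completed")
query.labels.include.append("Bug")
query.assignee.exclude.append("Joe")
```

Keeping include and exclude as separate lists means combined requests need no special handling: each mentioned condition just lands in the matching list.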
Dataset
1. How should I structure the dataset? I roughly know how to structure a dataset for extracting which category a user mentions, but I am not sure how to capture the context of the mention (for example, does the user want to include or exclude “high priority issues”?). Should I create labels such as [include_high, exclude_high, include_low, exclude_low, etc.] and mark them with zeros and ones?
2. Should I include as many examples of combined statements as possible? Keep in mind that users can combine many statements in one request.
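This is roughly what I mean by marking labels with zeros and ones: a minimal sketch of a multi-hot target vector with one include and one exclude slot per value. The label naming scheme (`include_priority_High`, etc.) and the `encode` helper are my own assumptions, and assignee names are left out because they are open-ended.

```python
STATUSES = ["Backlog", "Unstarted", "Started", "Completed", "Canceled"]
PRIORITIES = ["NoPriority", "Low", "Medium", "High", "Urgent"]

# One include_* and one exclude_* slot per value (hypothetical naming scheme).
LABEL_NAMES = [
    f"{mode}_{kind}_{value}"
    for kind, values in (("status", STATUSES), ("priority", PRIORITIES))
    for value in values
    for mode in ("include", "exclude")
]


def encode(active):
    """Turn a set of active label names into a 0/1 vector over LABEL_NAMES."""
    unknown = set(active) - set(LABEL_NAMES)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in active else 0 for name in LABEL_NAMES]


# One training row: the raw text plus its multi-hot target.
row = {
    "text": "Find high-priority issues where I am not assigned",
    "labels": encode({"include_priority_High"}),
}
```

With this layout a combined statement is just a row with several ones, so the same format covers single and combined requests.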
Model
- How many layers should I create for this model?
- Which pre-built models can I use to create this?
- Which algorithms should I use?
My thoughts on this are:
- Create a category classifier, which will first find the mentioned categories and mark them with zeros and ones.
- Extract the sub-category (e.g. if the category is Priority => low, medium, high…); if the category is assignee, use NER for name extraction.
- Biases for exclude/include
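To show the flow I have in mind, here is a rough sketch of the three steps with plain keyword lookups standing in for the trained classifier and the NER model. Everything in it is an assumption for illustration: the word lists, the `parse` function, the negation window of one token before and two after, and treating every mentioned user as the assignee. Labels are not handled.

```python
import re

# Keyword tables stand in for a trained classifier (assumed vocabulary).
PRIORITY_WORDS = {"low": "Low", "medium": "Medium", "high": "High", "urgent": "Urgent"}
STATUS_WORDS = {"finished": "Completed", "completed": "Completed",
                "started": "Started", "canceled": "Canceled", "backlog": "Backlog"}
NEGATIONS = {"not", "no", "without", "except"}


def parse(text, known_users):
    """Return {category: {"value": ..., "mode": "include" | "exclude"}}."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # A lookup against known user names stands in for NER.
    users = {name.lower(): name for name in known_users}
    users.update({"i": "CurrentUser", "me": "CurrentUser"})
    lookups = {"priority": PRIORITY_WORDS, "status": STATUS_WORDS, "assignee": users}

    result = {}
    for pos, token in enumerate(tokens):
        for category, words in lookups.items():
            if token in words:  # steps 1 and 2: category and its value
                # Step 3: a negation word near the value flips it to exclude.
                window = tokens[max(pos - 1, 0):pos] + tokens[pos + 1:pos + 3]
                mode = "exclude" if NEGATIONS & set(window) else "include"
                result[category] = {"value": words[token], "mode": mode}
    return result


parsed = parse("Find high-priority issues where I am not assigned", ["Mike", "Joe"])
```

For the example sentence, `parsed` marks priority “High” as include and the current user as an excluded assignee. A learned model would replace each lookup, but the three steps stay the same.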
P.S. I am a newbie, so sorry if this sounds bad.