Text classification to generate query params

I need to build an ML model that works out which issues a user wants to see on a board, or more precisely, that generates the params for my database query from the user's text. I will create my own dataset with the correct labels (supervised ML). I would appreciate any advice.

How this should work:
https://excalidraw.com/#json=dkAfIIA_xhqyYAhuepDWl,6pxhRcnSuV6-aF-WCH5mKw

I have an Issue model in a database with several properties:

  • Status: [Backlog, Unstarted, Started, Completed, Canceled]
  • Priority: [NoPriority, Low, Medium, High, Urgent]
  • Assignee: to which user an issue is assigned
  • Author/creator: Who created an issue
  • Labels: labels assigned to an issue that describe it, for example “Bug”, “Feature”, “Improvement”, “Design”, “DevOps”, etc. (Users can create more labels; if this is too complex, I can train the model to recognize just some basic labels.)

Now I want to provide AI search to users:
They give text input and I generate query params.
They can ask to include or exclude something.

Examples:

  • “Find issues where I am assigned” (issues where assignee === currentUser)
  • “Find issues where Mike is not assigned” (issues where assignee !== Mike)

Users can also combine categories:
Examples:

  • “Finished bugs where Joe is not assigned” => label: “Bug”, assignee: “Joe” (exclude; it is important to know the user wants to exclude Joe), status: “Completed”

  • “Find high-priority issues where I am not assigned” => Priority: “High”, Assignee: “CurrentUser/me” (exclude)

Any of these statements can be merged into a single query.
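To make the target concrete, here is a minimal sketch of the output I have in mind for the examples above. The names `Filter` and `QueryParams`, the `exclude` flag, and the `"currentUser"` placeholder are my own assumptions, not an existing API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Filter:
    category: str          # "status", "priority", "assignee", "author" or "label"
    value: str             # e.g. "Completed", "High", "Joe", "currentUser"
    exclude: bool = False  # True means "not equal" (assignee !== Joe)


@dataclass
class QueryParams:
    filters: List[Filter] = field(default_factory=list)


# "Finished bugs where Joe is not assigned"
expected = QueryParams(filters=[
    Filter("label", "Bug"),
    Filter("status", "Completed"),
    Filter("assignee", "Joe", exclude=True),
])
```

Whatever model I end up with would need to map the text input to something of this shape.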

Dataset

1. How should I structure the dataset? I roughly know how to structure a dataset to extract which category a user mentions, but I am not sure how to capture the context of the mention (for example, whether the user wants to include or exclude “high priority issues”). Should I create labels such as [include_high, exclude_high, include_low, exclude_low, etc.] and mark each with 0 or 1?

2. Should I include as many examples of merged statements as possible? Keep in mind that users can merge many statements in one query.
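The include/exclude labeling idea from question 1 could look roughly like this. It is only a sketch of a multi-label (multi-hot) encoding: the column naming scheme and the `encode` helper are hypothetical, the second example sentence is made up for illustration, and assignee names are left out because they are open-ended.

```python
PRIORITIES = ["NoPriority", "Low", "Medium", "High", "Urgent"]
STATUSES = ["Backlog", "Unstarted", "Started", "Completed", "Canceled"]

# One output column per (include/exclude, value) pair.
LABEL_COLUMNS = [
    f"{mode}_{kind}_{value}"
    for kind, values in (("priority", PRIORITIES), ("status", STATUSES))
    for value in values
    for mode in ("include", "exclude")
]


def encode(active):
    """Turn a set of active column names into a 0/1 vector."""
    unknown = set(active) - set(LABEL_COLUMNS)
    if unknown:
        raise ValueError(f"unknown label columns: {sorted(unknown)}")
    return [1 if col in active else 0 for col in LABEL_COLUMNS]


# Each row: (user text, multi-hot label vector).
dataset = [
    ("Find high-priority issues",
     encode({"include_priority_High"})),
    ("Finished issues that are not urgent",
     encode({"include_status_Completed", "exclude_priority_Urgent"})),
]
```

A merged statement simply has more than one column set to 1, so the same layout would cover single and combined queries.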

Model

  • How many layers should I create for this model?

  • Which prebuilt models can I use to create this?

  • Which algorithms should I use?

My thoughts on this are:

  • Create a category classifier, which will first find the mentioned categories, and mark them with zeros and ones.

  • Extract the sub-category (if the category is Priority => low, medium, high…); if the category is Assignee, use NER to extract names

  • Biases for exclude/include
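As a rough illustration of these three steps, here is a rule-based sketch. Plain keyword tables stand in for the trained category classifier and for NER, and the "negation within the next two words" rule is just a crude guess at how include/exclude could be decided. All table contents and the `parse` function are hypothetical, and only assignee (not author) is handled.

```python
import re

# Hypothetical keyword tables standing in for trained models.
STATUS_WORDS = {"finished": "Completed", "completed": "Completed",
                "canceled": "Canceled"}
PRIORITY_WORDS = {"low": "Low", "medium": "Medium",
                  "high": "High", "urgent": "Urgent"}
LABEL_WORDS = {"bug": "Bug", "bugs": "Bug",
               "feature": "Feature", "features": "Feature"}
NEGATIONS = {"not", "no", "without", "except"}


def parse(text, known_users=()):
    """Map free text to {category: {"value": ..., "exclude": bool}}."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # A lookup of known names stands in for NER here.
    users = {name.lower(): name for name in known_users}
    users.update({"i": "currentUser", "me": "currentUser"})
    tables = {"status": STATUS_WORDS, "priority": PRIORITY_WORDS,
              "label": LABEL_WORDS, "assignee": users}
    params = {}
    for i, tok in enumerate(tokens):
        for category, table in tables.items():
            if tok in table:
                # Crude scope rule: a negation within the next two
                # tokens flips only this value to "exclude".
                exclude = any(t in NEGATIONS for t in tokens[i + 1:i + 3])
                params[category] = {"value": table[tok], "exclude": exclude}
    return params


result = parse("Finished bugs where Joe is not assigned", known_users=["Joe"])
```

For this input, `result` marks status “Completed” and label “Bug” as included and assignee “Joe” as excluded. A learned model would have to replace both the keyword tables and the negation rule.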

PS. I am a newbie, so sorry if this sounds bad. :)