---
license: mit
language:
- en
metrics:
- accuracy
pipeline_tag: text-generation
widget:
- text: "<schema>CREATE TABLE radio(age VARCHAR, radio_id VARCHAR, frequency VARCHAR, wavelength VARCHAR); CREATE TABLE radio_faults(radio_id VARCHAR, fault_description VARCHAR)</schema><question>Get the radio id and defect descriptions of radios that have wavelength greater than 30 ?</question><sql>"
  example_title: "example1"
- text: "<schema>CREATE TABLE system(JobID: String, GID: String, UID: String, Start: Time(yyyy/mm/dd), End: Time, ElapsedRaw: Time, CPUTimeRAW: Time, NCPUS: Number, NNodes: Number, NodeList: List, State: String, Timelimit: Time);</schema><question>Get UID and job id for Jobs that started on Jan 20 , 2023</question><sql>"
  example_title: "example2"
- text: "<schema>CREATE TABLE department (Department_ID number, Name text, Creation text, Ranking number, Budget_in_Billions number, Num_Employees number) which has Department_ID as primary key and CREATE TABLE head (head_ID number, name text, born_state text, age number) which has head_ID as primary key and CREATE TABLE management (department_ID number, head_ID number, temporary_acting text) which has department_ID as primary key</schema><question>"
  example_title: "example3"
tags:
- code
- sql
- text2sql
- instruction_tuned
- jax
- pytorch
- 1b
- expert
datasets:
- PipableAI/spider-bird
---
# Pipable’s pipSQL

Please refer to https://huggingface.co/PipableAI/pipSQL-1.3b for our state-of-the-art model, which outperforms ChatGPT and Claude on SQL tasks across many benchmarks.
Pipable’s pipSQL is a model distilled from Llama 1B to generate SQL queries given a prompt and a schema.
We used a unique pipeline in which the model alternated between two objectives:
1. Maximizing the log probability of all tokens in the sequence (including the prompt tokens).
2. Minimizing the difference between the true value and the predicted maximum value of the output tokens, i.e. the generated tokens for the SQL-query slice of the sequence.
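The two objectives above can be sketched as a single toy loss. This is a minimal pure-Python illustration, not the released training code; `combined_loss`, `sql_start`, and the toy probabilities are our own hypothetical names and values.

```python
import math

def combined_loss(probs_per_pos, target_ids, sql_start):
    """Toy sketch of the two alternating objectives (hypothetical, not the real pipeline).

    probs_per_pos: one probability distribution over the vocabulary per position.
    target_ids:    the true token id at each position.
    sql_start:     index where the generated SQL slice begins.
    """
    # Objective 1: maximize the log-prob of every token (prompt + SQL),
    # i.e. minimize the average negative log-likelihood over the whole sequence.
    nll = -sum(math.log(p[t]) for p, t in zip(probs_per_pos, target_ids)) / len(target_ids)

    # Objective 2: over the SQL slice only, shrink the gap between the model's
    # maximum predicted probability and the true token's probability.
    sql = list(zip(probs_per_pos[sql_start:], target_ids[sql_start:]))
    gap = sum(max(p) - p[t] for p, t in sql) / len(sql)

    return nll + gap

# Toy example: 3-token vocabulary, 2 positions, SQL slice starts at position 1.
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
loss = combined_loss(probs, target_ids=[0, 1], sql_start=1)
print(round(loss, 4))  # → 0.2899 (the gap term is 0: the true token is already the argmax)
```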
## License

The model's new weights, along with all other assets involved with it, are open sourced under the MIT license.

## How to Use

```python
text = """<schema>{schema}</schema>
<question>{question}</question>
<sql>"""
```
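As an illustration, the placeholders can be filled with any schema and question; the toy values below are our own, not from the model card.

```python
schema = "CREATE TABLE radio(radio_id VARCHAR, wavelength VARCHAR)"  # toy schema
question = "Get the radio id of radios that have wavelength greater than 30"  # toy question

# Build the prompt in the <schema>/<question>/<sql> format the model expects.
text = f"""<schema>{schema}</schema>
<question>{question}</question>
<sql>"""
print(text)
```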
### PyTorch

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"
model = AutoModelForCausalLM.from_pretrained("PipableAI/pipSQL1b").to(device)
tokenizer = AutoTokenizer.from_pretrained("PipableAI/pipSQL1b")

inputs = tokenizer(text, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True).split('<sql>')[1].split('</sql>')[0])
```
### Flax

```python
from transformers import FlaxAutoModelForCausalLM, AutoTokenizer

model = FlaxAutoModelForCausalLM.from_pretrained("PipableAI/pipSQL1b", from_pt=True)
tokenizer = AutoTokenizer.from_pretrained("PipableAI/pipSQL1b")

inputs = tokenizer(text, return_tensors="np")
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs.sequences[0], skip_special_tokens=True).split('<sql>')[1].split('</sql>')[0])
```

## The PipableAI team

Avi Kothari, Pratham Gupta, Ritvik Aryan Kalra, Rohan Bhatial, Soham Acharya