---
license: cc-by-sa-4.0
datasets:
- Homie0609/MatchTime
language:
- en
tags:
- sports
- soccer
---

## Requirements

- Python >= 3.8 (we recommend [Anaconda](https://www.anaconda.com/download/#linux) or [Miniconda](https://docs.conda.io/en/latest/miniconda.html))
- [PyTorch >= 2.0.0](https://pytorch.org/) (if using an A100 GPU)
- transformers >= 4.42.3
- pycocoevalcap >= 1.2

A suitable [conda](https://conda.io/) environment named `matchtime` can be created and activated with:

```
cd MatchTime
conda env create -f environment.yaml
conda activate matchtime
```

## Training

Before training, make sure you have prepared the [features](https://pypi.org/project/SoccerNet/) and the caption [data](https://drive.google.com/drive/folders/14tb6lV2nlTxn3VygwAPdmtKm7v0Ss8wG), and put them into the corresponding folders. After collating, the structure should look like this:

``````
└─ MatchTime
   ├─ dataset
   │  ├─ MatchTime
   │  │  ├─ valid
   │  │  └─ train
   │  │     ├─ england_epl_2014-2015
   │  │     │  ├─ 2015-02-21 - 18-00 Chelsea 1 - 1 Burnley
   │  │     │  │  └─ Labels-caption.json
   │  │     │  ...
   │  │     ...
   │  ├─ SN-Caption
   │  └─ SN-Caption-test-align
   │     ├─ england_epl_2015-2016
   │     │  ├─ 2015-08-16 - 18-00 Manchester City 3 - 0 Chelsea
   │     │  │  └─ Labels-caption_with_gt.json
   │     │  ...
   │     ...
   ├─ features
   │  ├─ baidu_soccer_embeddings
   │  │  ├─ england_epl_2014-2015
   │  │  │  ├─ 2015-02-21 - 18-00 Chelsea 1 - 1 Burnley
   │  │  │  │  ├─ 1_baidu_soccer_embeddings.npy
   │  │  │  │  └─ 2_baidu_soccer_embeddings.npy
   │  │  │  ...
   │  │  ...
   │  ├─ C3D_PCA512
   │  ...
``````

The format of the features can be adjusted with:

```
python ./features/preprocess.py directory_path_of_feature
```
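Each feature file is a NumPy array, one per half of a match. As a quick sanity check after preprocessing, you can inspect a file's shape; the frame count and feature dimension below are illustrative assumptions, not guaranteed values of the repo's features:

```python
import os
import tempfile

import numpy as np

# A dummy feature file mimicking 1_baidu_soccer_embeddings.npy;
# the shape (frames x feature_dim) is an assumption for illustration.
frames, feat_dim = 90, 8576
dummy = np.random.rand(frames, feat_dim).astype(np.float32)

path = os.path.join(tempfile.gettempdir(), "1_baidu_soccer_embeddings.npy")
np.save(path, dummy)

# Each half of a match should load as a 2-D array:
# one row of features per (sampled) frame.
feats = np.load(path)
print(feats.shape)
```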

After preparing the data and features, you can pre-train (or fine-tune) the model with the following command (check the hyper-parameters at the bottom of *train.py*):

```
python train.py
```

## Inference

We provide two types of inference:

#### For the whole test set

You can test the ***MatchVoice*** model on the whole test set and generate a *.csv* file of results with the following command (check the hyper-parameters at the bottom of *inference.py*):

```
python inference.py
```

There is a sample of this type of inference result in *./inference_result/sample.csv*.

#### For a Single Video

We also provide a version that predicts commentary for a single video (for our checkpoints, use a 30-second clip):

```
python inference_single_video_CLIP.py single_video_path
```

Here we only provide the CLIP-feature version (using ViT-B/32); to extract CLIP features, please check [here](https://github.com/openai/CLIP). CLIP features do not give the best performance, but they are the easiest to obtain for new videos.
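As a rough illustration of what the single-video pipeline consumes, the sketch below picks evenly spaced frame indices from a 30-second clip for CLIP encoding. The frame rate and one-frame-per-second sampling are assumptions for illustration, not the script's actual parameters:

```python
import numpy as np

# Assumed clip parameters (hypothetical; the actual extraction script
# defines its own sampling): a 30 s clip at 25 fps, one frame per second.
video_seconds = 30
fps = 25
samples_per_second = 1

total_frames = video_seconds * fps
num_samples = video_seconds * samples_per_second

# Evenly spaced frame indices to pass through the ViT-B/32 image encoder;
# each encoded frame yields a 512-dim CLIP feature, so the whole clip
# becomes a (num_samples x 512) feature matrix.
indices = np.linspace(0, total_frames - 1, num_samples).astype(int)
print(len(indices))
```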

## Alignment

Before doing alignment, you should download the videos from [here](https://www.soccer-net.org/data) (224p is enough) and organize them in the following format:

``````
└─ MatchTime
   ├─ videos_224p
   │  ├─ england_epl_2014-2015
   │  │  ├─ 2015-02-21 - 18-00 Chelsea 1 - 1 Burnley
   │  │  │  ├─ 1_224p.mkv
   │  │  │  └─ 2_224p.mkv
   │  │  ...
   │  ...
``````

### Pre-process (Coarse Align)

We need to use [WhisperX](https://github.com/m-bain/whisperX) and [LLaMA3](https://huggingface.co/docs/transformers/model_doc/llama3) (as an agent) to finish coarse alignment with the following steps:

*WhisperX ASR:*
```
python ./alignment/soccer_whisperx.py --process_directory video_folder (e.g. ./videos_224p/england_epl_2014-2015) --output_directory output_folder (e.g. ./ASR_results/england_epl_2014-2015)
```

*Transform to Events:*
```
python ./alignment/soccer_asr2events.py --base_path ASR_results_folder (e.g. ./ASR_results/england_epl_2014-2015) --output_dir event_results_folder (e.g. ./event_results/england_epl_2014-2015)
```

*Align from Events:*
```
python ./alignment/soccer_align_from_event.py --event_path event_results_folder (e.g. ./event_results/england_epl_2014-2015) --output_dir output_directory (e.g. ./pre-processed/england_epl_2014-2015)
```

More details can be found in the paper.
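To make the coarse-align idea concrete: the ASR stage produces timestamped transcript segments, and an event extracted from a segment can inherit that segment's timestamp. The toy sketch below uses made-up data, with a simple keyword match standing in for the LLaMA3 agent:

```python
# Toy illustration of coarse alignment (hypothetical data; in the real
# pipeline a LLaMA3 agent extracts events, not a keyword match).
segments = [
    {"start": 12.0, "end": 15.5, "text": "and that is a corner for Chelsea"},
    {"start": 40.2, "end": 44.0, "text": "what a save by the keeper"},
]

def coarse_timestamp(event_keyword, segments):
    """Return the start time of the first ASR segment mentioning the event."""
    for seg in segments:
        if event_keyword in seg["text"]:
            return seg["start"]
    return None

print(coarse_timestamp("corner", segments))  # 12.0
```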

### Contrastive Learning (Fine-grained Align)

After downloading the checkpoints from [here](https://huggingface.co/Homie0609/MatchTime/tree/main), use the following command to finish alignment with contrastive learning:

```
python ./alignment/do_alignment.py
```

By changing the hyper-parameter ***finding_words***, you can freely align from ASR, events, or the original SN-Caption.

You can also use the alignment model directly:

```
from alignment.matchtime_model import ContrastiveLearningModel
```
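Conceptually, fine-grained alignment scores a caption embedding against per-timestamp video embeddings and moves the caption to the best-matching time. The numpy sketch below illustrates only that idea with random stand-in embeddings; the dimensions and cosine scoring are assumptions, not `ContrastiveLearningModel`'s actual interface:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-ins for learned embeddings (dimensions are hypothetical).
frame_emb = rng.normal(size=(60, 128))  # one embedding per second of video
caption_emb = rng.normal(size=(128,))   # one caption embedding

# Cosine similarity of the caption against every timestamp.
frame_norm = frame_emb / np.linalg.norm(frame_emb, axis=1, keepdims=True)
cap_norm = caption_emb / np.linalg.norm(caption_emb)
sims = frame_norm @ cap_norm

best_t = int(np.argmax(sims))  # the caption is aligned to this second
print(best_t)
```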

## Evaluation

We provide code to evaluate the prediction results:

```
# for a single csv file
python ./evaluation/scoer_single.py --csv_path ./inference_result/sample.csv
# for many csv files, recording the scores in a new csv file
python ./evaluation/scoer_group.py
# for the GPT score (needs an OpenAI API key)
python ./evaluation/scoer_gpt.py ./inference_result/sample.csv
```