[1/3] /usr/local/cuda-12.3/bin/nvcc --generate-dependencies-with-compile --dependency-output cuda.cuda.o.d -DTORCH_EXTENSION_NAME=matmul_sum_max -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /matx/u/aco/miniconda3/envs/myenv/lib/python3.11/site-packages/torch/include -isystem /matx/u/aco/miniconda3/envs/myenv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /matx/u/aco/miniconda3/envs/myenv/lib/python3.11/site-packages/torch/include/TH -isystem /matx/u/aco/miniconda3/envs/myenv/lib/python3.11/site-packages/torch/include/THC -isystem /usr/local/cuda-12.3/include -isystem /matx/u/aco/miniconda3/envs/myenv/include/python3.11 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -std=c++17 -c /sailhome/aco/.cache/torch_extensions/py311_cu121/matmul_sum_max/cuda.cu -o cuda.cuda.o 
FAILED: cuda.cuda.o 
/usr/local/cuda-12.3/bin/nvcc --generate-dependencies-with-compile --dependency-output cuda.cuda.o.d -DTORCH_EXTENSION_NAME=matmul_sum_max -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /matx/u/aco/miniconda3/envs/myenv/lib/python3.11/site-packages/torch/include -isystem /matx/u/aco/miniconda3/envs/myenv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /matx/u/aco/miniconda3/envs/myenv/lib/python3.11/site-packages/torch/include/TH -isystem /matx/u/aco/miniconda3/envs/myenv/lib/python3.11/site-packages/torch/include/THC -isystem /usr/local/cuda-12.3/include -isystem /matx/u/aco/miniconda3/envs/myenv/include/python3.11 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -std=c++17 -c /sailhome/aco/.cache/torch_extensions/py311_cu121/matmul_sum_max/cuda.cu -o cuda.cuda.o 
/sailhome/aco/.cache/torch_extensions/py311_cu121/matmul_sum_max/cuda.cu(24): error: identifier "dtype" is undefined
      auto out = torch::zeros({batch_size}, dtype=torch::float32, device=a.device);
                                            ^

/sailhome/aco/.cache/torch_extensions/py311_cu121/matmul_sum_max/cuda.cu(24): error: namespace "torch" has no member "float32"
      auto out = torch::zeros({batch_size}, dtype=torch::float32, device=a.device);
                                                         ^

/sailhome/aco/.cache/torch_extensions/py311_cu121/matmul_sum_max/cuda.cu(24): error: identifier "device" is undefined
      auto out = torch::zeros({batch_size}, dtype=torch::float32, device=a.device);
                                                                  ^

/sailhome/aco/.cache/torch_extensions/py311_cu121/matmul_sum_max/cuda.cu(24): error: a pointer to a bound function may only be used to call the function
      auto out = torch::zeros({batch_size}, dtype=torch::float32, device=a.device);
                                                                           ^

/sailhome/aco/.cache/torch_extensions/py311_cu121/matmul_sum_max/cuda.cu(30): error: no instance of overloaded function "at::Tensor::view" matches the argument list
            argument types are: (int, int)
            object type is: at::Tensor
      return out.view(-1, 1);
                 ^

5 errors detected in the compilation of "/sailhome/aco/.cache/torch_extensions/py311_cu121/matmul_sum_max/cuda.cu".
[2/3] c++ -MMD -MF main.o.d -DTORCH_EXTENSION_NAME=matmul_sum_max -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /matx/u/aco/miniconda3/envs/myenv/lib/python3.11/site-packages/torch/include -isystem /matx/u/aco/miniconda3/envs/myenv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /matx/u/aco/miniconda3/envs/myenv/lib/python3.11/site-packages/torch/include/TH -isystem /matx/u/aco/miniconda3/envs/myenv/lib/python3.11/site-packages/torch/include/THC -isystem /usr/local/cuda-12.3/include -isystem /matx/u/aco/miniconda3/envs/myenv/include/python3.11 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17  -c /sailhome/aco/.cache/torch_extensions/py311_cu121/matmul_sum_max/main.cpp -o main.o 
ninja: build stopped: subcommand failed.
Using /sailhome/aco/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...
Creating extension directory /sailhome/aco/.cache/torch_extensions/py311_cu121/matmul_sum_max...
Detected CUDA files, patching ldflags
Emitting ninja build file /sailhome/aco/.cache/torch_extensions/py311_cu121/matmul_sum_max/build.ninja...
/matx/u/aco/miniconda3/envs/myenv/lib/python3.11/site-packages/torch/utils/cpp_extension.py:1965: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation. 
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
  warnings.warn(
Building extension module matmul_sum_max...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
Traceback (most recent call last):
  File "/matx/u/aco/miniconda3/envs/myenv/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 2105, in _run_ninja_build
    subprocess.run(
  File "/matx/u/aco/miniconda3/envs/myenv/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/matx/u/aco/KernelBenchInternal/src/scratch/model_new.py", line 40, in <module>
    matmul_sum_max = load_inline(
                     ^^^^^^^^^^^^
  File "/matx/u/aco/miniconda3/envs/myenv/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1647, in load_inline
    return _jit_compile(
           ^^^^^^^^^^^^^
  File "/matx/u/aco/miniconda3/envs/myenv/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1722, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/matx/u/aco/miniconda3/envs/myenv/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1834, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/matx/u/aco/miniconda3/envs/myenv/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 2121, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'matmul_sum_max'
