X

Upload a Big CSV file into Big Query

Big query is a really good, cheap, quick (insert superlative) platform to query and visualise data. It has a simple connector into DataStudio which is great to use in reporting as it’s very sharable.

Big Query has its limits and one of them is getting data into it. To upload directly within the interface there is a 10mb limit. You can import from Cloud Storage but you might find issues with data format. E.g. commas in fields being read incorrectly by Big Query.

In this case we need to use the API and do a streaming insert.

I am a big advocate of Pandas for handling large CSV’s and thankfully there is a library that handles this and dumps dataframes into Big Query. https://github.com/pydata/pandas-gbq

Step 1) Read the CSV

df = pd.read_csv('path_to_csv', encoding='utf-8')

Replace the path to CSV with the relative or absolute path to your file

Set the projectid of your project you are going to import into, you can get this from the URL of the big query dashboard after this string: ?project=

Step 2) Call the to_gbp function

Finally set the dataset and the table name in the to_gbq function

to_gbq(df, 'datasetname.tablename', projectid ,chunksize=10000, if_exists='append')

There you have a it a simple way to import data into Big Query

Checkout the repo here! 

Step 3) Bringing it all together

import pandas as pd
from pandas.io.gbq import read_gbq, to_gbq
import sqlite3
import time
import glob
import os

projectid = PROJECT_ID
dataset_table = DATASET.TABLE

# Load all to memory and insert into Big Query

df = pd.read_csv('filename.csv', encoding='utf-8')

to_gbq(df, dataset_table, projectid ,chunksize=10000, if_exists='append')

# Read CSV in chunks (500k rows) and upload 10k at a time

for chunk in pd.read_sql_query("SELECT * FROM TABLE_NAME", con=con, chunksize=500000):
    df = chunk
    to_gbq(df, dataset_table, projectid, chunksize=10000, if_exists='append')

 

Will Cecil: Digital Marketer, Python Tinkerer & Tech Enthusiast. Follow me: Website / Twitter / Github
Related Post