This tool transforms CSV data. You can filter, group, sort, and clean rows, build pivot tables, pseudonymize sensitive columns, generate hash IDs, cluster similar values, and remove columns. It is useful for quick data wrangling and curation.
Filter rows based on column values containing specific text.
Column: Select the column to filter on.
Value Contains: Enter text to filter for (case-insensitive).
Example: Filter a "Country" column for rows containing "united" to find both "United States" and "United Kingdom".
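As a rough illustration of the case-insensitive "contains" filter described above, here is a minimal Python sketch. The function name `filter_rows` and the list-of-dicts row format are assumptions made for this example and are not taken from the tool itself.

```python
def filter_rows(rows, column, needle):
    """Keep rows whose value in `column` contains `needle`, ignoring case."""
    needle = needle.casefold()
    return [
        row for row in rows
        if needle in str(row.get(column, "")).casefold()
    ]


# Hypothetical sample data mirroring the "Country" example.
rows = [
    {"Country": "United States"},
    {"Country": "France"},
    {"Country": "United Kingdom"},
]
matches = filter_rows(rows, "Country", "united")
print([row["Country"] for row in matches])
```

Using `casefold()` on both sides is one common way to compare text without regard to case; the tool may normalize text differently.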
Group data by a column and perform calculations on another column.
Group By: Select a column to group records by (e.g., Category, Country).
Aggregate Column: Select a column (usually numeric) to perform calculations on.
Function: Choose an aggregation function:
Example: Group sales data by "Region" and sum the "Revenue" column to see total revenue by region.
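The group-and-aggregate step can be sketched as follows for the "sum" case. This is only an illustration: `group_and_sum` is a hypothetical helper, and it assumes every value in the aggregate column parses as a number.

```python
from collections import defaultdict


def group_and_sum(rows, group_by, aggregate):
    """Sum the `aggregate` column for each distinct value of `group_by`."""
    totals = defaultdict(float)
    for row in rows:
        # CSV values arrive as text, so convert before adding.
        totals[row[group_by]] += float(row[aggregate])
    return dict(totals)


# Hypothetical sample data mirroring the "Region" / "Revenue" example.
sales = [
    {"Region": "North", "Revenue": "100"},
    {"Region": "South", "Revenue": "50"},
    {"Region": "North", "Revenue": "25.5"},
]
totals = group_and_sum(sales, "Region", "Revenue")
print(totals)
```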
Order the dataset by values in a specific column.
Sort By: Select the column to sort on.
Direction: Choose ascending (A→Z, 1→9) or descending (Z→A, 9→1) order.
Example: Sort a product list by "Price" in descending order to see most expensive items first.
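One detail worth illustrating is that CSV values are text, so a plain string sort would place "7.50" above "45". The sketch below assumes numbers should be compared numerically and everything else alphabetically; `sort_key` and `sort_rows` are hypothetical names, and the tool's actual ordering rules may differ.

```python
def sort_key(value):
    """Order numeric values numerically and other values alphabetically."""
    try:
        return (0, float(value), "")
    except ValueError:
        # Non-numeric text sorts after numbers, ignoring case.
        return (1, 0.0, value.casefold())


def sort_rows(rows, column, descending=False):
    """Return a new list of rows ordered by `column`."""
    return sorted(rows, key=lambda row: sort_key(row[column]), reverse=descending)


# Hypothetical sample data mirroring the "Price" example.
products = [
    {"Name": "Pen", "Price": "7.50"},
    {"Name": "Laptop", "Price": "999"},
    {"Name": "Bag", "Price": "45"},
]
ordered = sort_rows(products, "Price", descending=True)
print([p["Name"] for p in ordered])
```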
Automatically clean and prepare data for analysis.
Actions performed:
Example: Clean survey data to remove incomplete responses and ensure numeric fields are properly formatted.
Create a cross-tabulation of data similar to Excel pivot tables.
Row Labels: Select a column for the row dimension.
Column Labels: Select a column for the column dimension.
Example: Create a table showing product sales (counts) by region and category, with regions as rows and categories as columns.
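A count-based cross-tabulation like the one described above can be sketched with a nested dictionary. The helper name `pivot_counts` and the sample rows are assumptions for illustration only.

```python
from collections import Counter


def pivot_counts(rows, row_label, column_label):
    """Count rows for every (row value, column value) combination."""
    counts = Counter((row[row_label], row[column_label]) for row in rows)
    row_keys = sorted({r for r, _ in counts})
    col_keys = sorted({c for _, c in counts})
    # Counter returns 0 for combinations that never occur.
    return {r: {c: counts[(r, c)] for c in col_keys} for r in row_keys}


# Hypothetical sample data: regions as rows, categories as columns.
sales = [
    {"Region": "East", "Category": "Toys"},
    {"Region": "East", "Category": "Toys"},
    {"Region": "East", "Category": "Books"},
    {"Region": "West", "Category": "Books"},
]
table = pivot_counts(sales, "Region", "Category")
for region, cells in table.items():
    print(region, cells)
```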
Replace identifying information with fictional data while maintaining consistency.
Pseudonymize Columns: Select columns containing sensitive data to replace with fictional values.
Type Selection: Choose the appropriate type for each column:
Remove Columns: Completely remove columns that shouldn't be included in the output.
Mapping: Maintains consistency by always replacing a specific value with the same pseudonym.
Example: Pseudonymize customer data for sharing with analysts while protecting privacy.
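The consistency guarantee of the mapping can be illustrated with a per-column dictionary that remembers each replacement. Note the assumptions: the tool generates fictional values by type, whereas this sketch substitutes simple numbered placeholders, and `pseudonymize` is a hypothetical function name.

```python
def pseudonymize(rows, columns, remove=()):
    """Replace values in `columns` with stable placeholders and drop `remove`."""
    mappings = {column: {} for column in columns}
    result = []
    for row in rows:
        new_row = {key: value for key, value in row.items() if key not in remove}
        for column in columns:
            mapping = mappings[column]
            original = row[column]
            if original not in mapping:
                # Placeholder format is an assumption, not the tool's output.
                mapping[original] = f"{column}_{len(mapping) + 1:04d}"
            new_row[column] = mapping[original]
        result.append(new_row)
    return result, mappings


# Hypothetical customer data; "Ann Lee" appears twice.
customers = [
    {"Name": "Ann Lee", "Email": "ann@example.com", "City": "Oslo"},
    {"Name": "Bob Roy", "Email": "bob@example.com", "City": "Rome"},
    {"Name": "Ann Lee", "Email": "ann@example.com", "City": "Oslo"},
]
safe_rows, mappings = pseudonymize(customers, ["Name"], remove=["Email"])
for row in safe_rows:
    print(row)
```

Because the mapping is kept, the first and third rows receive the same pseudonym, so analysts can still count repeat customers without seeing real names.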
Create a new column with hash identifiers based on values from selected columns.
ID Column Name: Name for the new column containing hash IDs.
Auto-generate name: Creates a descriptive column name based on selected columns.
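Hash Algorithm: Choose between a simple and a more complex hash algorithm.
Salt: Optionally add a secret key that is mixed into every hash. The same salt always produces the same IDs, but without the salt the IDs cannot be reproduced from the original values.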
Columns for Hash: Select which columns to include when generating the hash ID.
Example: Create persistent anonymous identifiers from demographic data, or generate unique IDs from multiple fields.
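To show how a salted hash ID can be derived from several columns, here is a minimal sketch. The tool's algorithms are not named in this text, so SHA-256 from Python's `hashlib` is used as a stand-in; `hash_id`, the separator character, and the 16-character truncation are all assumptions.

```python
import hashlib


def hash_id(row, columns, salt="", length=16):
    """Derive a short hexadecimal ID from the selected column values."""
    # A separator keeps ("ab", "c") distinct from ("a", "bc").
    joined = "\x1f".join(str(row[column]) for column in columns)
    digest = hashlib.sha256((salt + "\x1f" + joined).encode("utf-8")).hexdigest()
    return digest[:length]


# Hypothetical demographic record.
person = {"BirthYear": "1984", "PostalCode": "0150", "Gender": "F"}
columns = ["BirthYear", "PostalCode", "Gender"]

plain_id = hash_id(person, columns)
salted_id = hash_id(person, columns, salt="my-secret")
print(plain_id, salted_id)
```

The same record and salt always give the same ID, which is what makes the identifier persistent, while a different salt yields an unrelated ID.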
Find and merge similar values in a column using various clustering algorithms.
Select Column: Choose a column that may contain variations of the same value.
Clustering Methods:
Similarity Threshold: For the Levenshtein method, controls how similar two values must be to be placed in the same cluster.
Example: Find and standardize variations like "United States", "USA", "U.S.A", and "US" in a Country column.
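The role of the similarity threshold in the Levenshtein method can be sketched as below. This is an illustrative greedy grouping, not the tool's implementation: the normalization (1 minus edit distance divided by the longer length), the default threshold of 0.8, and the function names are all assumptions. Edit distance catches spelling variations; abbreviations such as "USA" versus "United States" are too far apart for it and would need a different method.

```python
def levenshtein(a, b):
    """Number of single-character edits needed to turn `a` into `b`."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Scale edit distance to a 0..1 score, where 1 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def cluster(values, threshold=0.8):
    """Greedily group each value with the first cluster it is close to."""
    clusters = []
    for value in values:
        for group in clusters:
            if similarity(value.casefold(), group[0].casefold()) >= threshold:
                group.append(value)
                break
        else:
            clusters.append([value])
    return clusters


# Hypothetical column values with spelling variations.
countries = ["United States", "United Sates", "united states", "Germany", "Germny"]
groups = cluster(countries)
print(groups)
```

Raising the threshold makes clusters stricter; lowering it merges more distant spellings at the risk of grouping unrelated values.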
Remove unnecessary columns from your dataset.
Select columns: Choose which columns to completely remove from the dataset.
Example: Remove sensitive columns like "SSN" or "Phone Number" before sharing data with others, or remove irrelevant columns to simplify your analysis.
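For completeness, dropping columns from CSV text can be sketched with Python's standard `csv` module. The helper `drop_columns` and the sample data are hypothetical.

```python
import csv
import io


def drop_columns(csv_text, to_remove):
    """Return CSV text with the named columns removed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [name for name in reader.fieldnames if name not in to_remove]
    out = io.StringIO()
    # extrasaction="ignore" skips the removed keys when writing each row.
    writer = csv.DictWriter(out, fieldnames=kept, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Hypothetical data containing a sensitive "SSN" column.
source = "Name,SSN,City\nAnn,123-45-6789,Oslo\nBob,987-65-4321,Rome\n"
cleaned = drop_columns(source, {"SSN"})
print(cleaned)
```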